1. Introduction
There is an extensive literature using spatial statistics that deals with cross-sectional correlation, and is popular in regional science, urban economics and geography, to mention a few; see Anselin [
1] for a nice introduction to this literature. Unlike time-series, there is typically no unique natural ordering for cross-sectional data. Spatial dependence models may use a metric of economic distance that provides cross-sectional data with a structure similar to that provided by the time index in time series. Examples in economics usually involve spillover effects or externalities due to geographical proximity. For example, the productivity of public capital, like roads and highways, on the output of neighboring states. Also, the pricing of welfare in one state that pushes recipients to other states. In a linear regression model, this spatial correlation may be in the disturbances and is called the spatial error model (SEM), or modeled on the dependent variable itself and named the spatial autoregression (SAR) model, or the spatial lag model. Unlike the autoregressive model or lagged model in time series where there is a natural ordering across time and lagged values are well defined, in cross-sections, this is dealt with using neighbors whose shocks or disturbances are affected by their neighbor’s shocks or disturbances in the SEM. For the SAR, the dependent variable is affected by neighbors, like house price being affected by neighboring house prices.This leads to the construction of a weight matrix that defines one’s neighbors by distance or contagion; see Anselin [
1]. So, rather than lagged house prices as in an autoregressive time series model, own house price is related to a weighted average of neighboring house prices. Spatial dependence has been extended from cross-section to panel data; see Chapter 13 of Baltagi [
2] or Elhorst [
3] for a textbook treatment of the subject.
OLS yields inconsistent estimates in the SAR model due to the endogeneity of the spatial lagged dependent variable and the disturbances. For the SEM model, OLS yields an unbiased but inefficient estimator. Because of the limitation of OLS, (Q)MLE is often used to estimate the spatial models as in Ord [
4], Anselin [
1] and Lee [
5]. However, MLE is sometimes computationally intensive especially for large sample sizes because of the requirement to compute the Jacobian term in the likelihood function. Ord [
4] proposes a simplified computational procedure only requiring the eigenvalues of the spatial weight matrix, but computing accurate eigenvalues is increasingly difficult for large
n. Kelejian and Prucha [
6] suggest a Generalized Method of Moments (GMM) estimator for the SEM with SAR structure. Alternatively, constructing instrumental variables (IVs) from the exogenous variables, Kelejian and Prucha [
7] propose the GS2SLS estimator and Lee [
8] discuss the best GS2SLS estimator by using more efficient IVs. Lee [
9] considers a GMM estimator for the SAR models with exogenoues variables and showed it is more efficient than the 2SLS estimator and is as efficient as the ML estimator asymptotically. Spatial panel data models (with dynamic terms) are also estimated using MLE, 2SLS or GMM as in Yu et al. [
10], Baltagi and Liu [
11], Kapoor et al. [
12].
However, for all of these estimation methods considered above, they constrain the parameter space of the spatial coefficient to limit the degree of correlation between units. This is because when such spatial correlation is too strong, the spatial echoes passing through each unit do not die out and so the system cannot achieve an equilibrium. When the spatial weight matrix is row-normalized by convention, such a constraint requires the spatial coefficient to be smaller than 1 in absolute value. In this survey, we summarize the developments that relax this constraint and allow the spatial coefficient to equal or sufficiently approach unity, which is known as the spatial unit root. This is of practical relevance because there are many cases where the spatial coefficients are close to 1. For example, Keller and Shiue [
13] detect the inter-regional trade of Chinese rice and find that rice prices for different provinces are highly related with spatial coefficients lying between 0.9 to 0.95.
When (near) spatial unit roots exist, the standard estimation procedures are not necessarily reliable and statistical inference is invalid. To remedy such a case, Fingleton [
14] avoids the circularity of the spatial weight matrix and conducts a Monte Carlo simulation to explore the performance of OLS estimation. Alternatively, Lee and Yu [
15] artificially let the spatial autoregression coefficient sufficiently approach the unit roots and derive the asymptotic behavior of QMLE and 2SLS estimators. The spatial unit roots have also been generalized to Spatial Dynamic Panel Data (SDPD) model by Yu and Lee [
16].
In order to investigate the possible spatial unit roots, several test procedures have been proposed. Fingleton [
14] suggests a “very high” value of the Moran’s
I statistic could be useful for testing spatial unit roots. Lauridsen and Kosfeld [
17,
18] propose a two-stage Lagrange Multiplier (LM) tests that distinguish the spatial unit roots from the stationary positive spatial correlation. By the fact that spatial impulses do not die out under spatial unit roots and lead to explosive variance, Beenstock et al. [
19] numerically calculate the critical value even under an irregular spatial weight matrix. These tests have been extensively used in the literature; see Yesilyurt and Elhorst [
20], Olejnik [
21], Machado et al. [
22], Beenstock and Felsenstein [
23].
This paper is organized as follows.
Section 2 introduces some basic concepts in spatial econometrics. Next, the parameter space of the spatial autoregressive coefficient and the corresponding singular points are considered. Stationarity and spatial cointegration concepts derived from spatial unit roots are introduced. General assumptions and the corresponding implications in spatial econometrics are discussed.
Section 3 introduces the potential problems with the existence of spatial unit roots: spurious and nonsense regression.
Section 4 investigates estimation methods and inference under spatial unit roots.
Section 5 discusses how to test for spatial unit roots.
Section 6 discusses spatial unit roots in the SAR model while
Section 7 concludes.
2. Basic Concepts in Spatial Econometrics
Spatial models study the spatial dependence between units. In practice, the spatial weight matrix is used to describe such dependence. Let be the contiguity-based spatial weight matrix, i.e., if unit i and j are contiguous and 0 otherwise. Also, the diagonal elements are set to 0 by convention. In practice, the spatial weight matrix is generally row-normalized by , so the row sum of will be one.
Different spatial model specifications have different implications. The SEM with spatial autoregressive (SAR) structure on the
error vector
can be expressed as
, where
is known as the spatial coefficient and satisfies some assumptions that will be introduced later, and
is the
independent and identically distributed (i.i.d.) innovations with variance
. The error covariance matrix for the
with SAR structure is
where
and
. Though
may be sparse,
is not necessarily so, thus the spatial covariance structure induced by such SEM model is classified as
global. Conversely, a spatial moving average (SMA) specification for the error vector
can be expressed as
, and the corresponding covariance matrix will be
including only
and
which are first and second order neighbors if
is defined as first-order contiguity. Hence, such a model is generally classified as
local. See Baltagi et al. [
24].
Kelejian and Prucha [
7] consider a “cross-sectional (first-order) autoregressive spatial model with (first-order) autoregressive disturbances” (SARAR) and is labeled as spatial autoregressive combined (SAC) model. If the right-hand side includes both the independent variable and the spatially lagged dependent variable then it is termed the mixed regressive, spatial autoregression (MRSAR) or mixed SAR model. The spatial Durbin model (SDM) includes both the spatial lagged dependent variable and independent variables. A full model labeled as general nesting spatial (GNS) model given in Elhorst [
3] is
where
is referred as spatial autoregressive coefficient (SAC) and
is called the spatial autocorrelation coefficient.
2.1. Parameter Space of the Spatial Autoregressive Coefficient
Consider the pure SAR model with data generating process (DGP):
where
is an
vector of observations on the dependent variable,
is an
spatial weight matrix and
is an
vector of disturbances which are assumed to be i.i.d.
. The reduced form equation of
can be written as
where
. So to guarantee the system achieves equilibrium, a crucial assumption is that the absolute value of the spatial coefficient
is strictly less than 1 (see Kelejian and Prucha [
6] (Assumption 2), Kelejian and Prucha [
7] (Assumption 2)) to ensure the nonsingularity of
. This assumption follows from a sufficient condition for the invertibility matrix in Horn and Johnson [
25] (Corollary 5.6.16, p. 351):
Theorem 1. An matrix is nonsingular if there exists a matrix norm such that . If this condition is satisified, .
Thus,
is invertible if there exists a matrix norm such that
. It is also well known that any norm of a matrix is larger than all of its eigenvalues. Let
be the eigenvector matrix and
,
, be the egienvalues of
, then
So it is easy to see that
and therefore
because
so that
. A useful result is given in Ord [
4]:
Theorem 2. If has eigenvalues , . Then for , .
Moreover, the log-likelihood function for
, given
in (
4) is
and
, of which a sufficient condition is
, for all
i. Again, since
, we obtain the range of
that is given in (
7).
However, either by Theorem 1 or 2, (
7) is a sufficient condition for the invertibility of
. The singular points of
are
by Theorem 2, and the number of these singular points are at most countably many as
. This raises a problem as stated in Kelejian and Robinson [
26]. These singular points can be determined generally by the
nth polynomial numerically, and to avoid inconsistency, they should be removed from the possible values of
. Griffith [
27] (p. 19) states that this condition also ensures stationarity, but Kelejian and Robinson [
26] give a counter example showing that it does not when
is a row-normalized double queen weight matrix.
When the matrix is row-normalized, Kelejian and Prucha [
6] (Note 8, p. 120) show that
by Geršgorin’s theorem, and typically,
[
26]. We assume
, and this is why
is generally interpreted as the spatial autocorrelation coefficient similar to its counterpart in time series.
2.2. Stationarity, Order of Integration and Cointegration
Stationarity is a key assumption in time series. Similarly, in spatial econometrics, when the stationary assumption does not hold, spurious or nonsense regression appears, as will be shown in the next section. Stationarity is tightly connected with
. The formal definition of stationarity is given in Anselin [
1] (p. 42):
Definition 1. A process is strictly stationary if any finite subset from the stochastic process has the same joint distribution as the subset for any s, where s represents an uniform shift in time, space or time–space.
But we generally consider a weaker version, covariance stationarity. For the intuition of stationarity and the connection with the inverse of
, see Beenstock et al. [
19]. Consider the pure SAR model in (
4) and (
5), the covariance matrix for
is
where
and
is defined in (
5). Letting
be the
element of the matrix
, by normalizing
,
has variance
and covariance
with
. By Definition 1, stationarity requires that
and
remain unchanged asymptotically (this implicitly assumes that both location
j and
k are far away from the edge). Note that by Theorem 1, we have
So the stationarity assumption is equivalent to
, for all
k as
. Since
and
,
m represents the “remote” area, and
is the
step neighborhood of unit
k. Thus, intuitively, stationarity requires that the shocks from far away locations will asymptotically not affect the epicenter area.
Another two concepts tightly connected with unit roots and stationarity are the order of integration and cointegration. The order of integration is originally a concept in time series that describes the minimum number of differences that a non-stationary process needs to be (covariance) stationary. Cointegration, on the other hand, describes the minimum order of integration of a combination of two or more series with the same order of integration. The formal definitions of the order of integration and cointegration are given in Hamilton [
28]:
Definition 2. A time series is integrated of order d, denoted , if is a stationary process, where L is the lag operator and is the first difference.
Definition 3. Time series X and Y are cointegrated of order , if both of them are , and there exists a cointegrated vector such that .
In time series, the lag operator
L is defined by
because of the natural order of temporal data. In spatial econometrics, we regard
as the spatial lag operator and
as the first order spatial difference, see Anselin [
1] (pp. 22–26). We also use
and
to refer to spatial integration and spatial cointegration respectively.
More specifically, for the pure SAR model in (
4) with a row-normalized weight matrix, if
,
since
is stationary. Also, suppose both
and
are
, but they have a long-term equilibrium relationship
, then obviously,
are
with cointegrated vector
.
2.3. Some Fundamental Assumptions
Different assumptions are made in spatial econometrics for different estimation methods. The most common ones are listed here, and the implications are explained.
Assumption 1. The disturbances are i.i.d. for all n (so uniformly) with zero mean and finite variance . Additionally, fourth moments exist.
Assumption 2. The elements of the exogenous variables are uniformly bounded for all n. The exists and is nonsingular.
Assumption 3. The matrix is nonsingular.
The existence of up to the fourth moment of disturbances is needed to apply the central limit theorem for (a system) of the linear-quadratic form (see Kelejian and Prucha [
29] (p. 226) and Kelejian and Prucha [
30] (p. 63)). The nonsigularity of
makes sure the system achieves an equilibrium as well as ensuring that the mean and variance of
exist.
Assumption 4. The matrices and are uniformly bounded (UB) in both row and column sums for all n. (We say a matrix is UB in row (column) sums if its maximum row (column) sum is finite. This property preserves under finite matrix multiplication.)
The UB condition for
implicitly assumes a limited number of neighbors for all units even as
, so the weight matrix
is sparse for large
n. This assumption is relaxed in Lee et al. [
31] by introducing dominant (popular) units. In practice, the spatial units have a limited number of neighbors. Though sometimes
may be defined as the inverse of the distance between
i,
j physically or economically,
tends to be 0 between far away units as
n increases. So in general this assumption is satisfied. The UB of
is to ensure the covariance matrix
in (
9) is still UB, which limits the correlation between two different units since the UB property preserves under matrix multiplication.
Other assumptions to ensure identification conditions or the derivation of asymptotic distributions of estimators will be mentioned when needed.
4. Estimation and Inference
The spurious regression studied in the previous section suggests that OLS is not a good estimator in the SAR model with spatial unit roots. QMLE and 2SLS are alternative methods of estimation. This section briefly reviews these estimators and their performance under near unit roots.
When the error term of the DGP has a spatial autoregressive structure, OLS and (feasible) generalized least squares ((F)GLS) estimators are consistent. The efficiency of the OLS estimator was considered by Krämer and Donninger [
40], Tilke [
41] for the symmetric spatial weight matrices, and generalized by Krämer and Baltagi [
42] with a broader covariance matrix. But the symmetry of the weight matrix is too restrictive to be used in practice, so Martellosio [
43] generalizes this to nonsymmetric weights matrices. The efficiency of the OLS estimator is defined as
. But these papers generally focus on the relationship between
and
X, for example, when the column space of
is contained by the column space of
X. Following Lee and Yu [
34], Baltagi et al. [
37] derive the asymptotic properties of OLS, (F)GLS of
and point out important differences from conventional theory based on stationary spatial error.
A special SAR model has also been considered in regular lattices under spatial unit roots with the form
The simplest case of this special SAR model is the doubly geometric spatial autoregressive process:
It is called “unilateral” because only the previous units have effects on the latter ones and have a lower triangular weight matrix. It can be considered as the combination of two autoregressive (AR) models [
44]. This model has been widely used in the area of image processing, agriculture trials, digital filtering, and other different fields. Model (
64) is unstable when either
or
because of the existence of spatial unit roots [
45,
46].
A more complicated special case of the unilateral model is
Because of its simple form, the estimation and inference of (
64) can be derived without too many assumptions, as will be seen shortly. Moreover, when spatial unit roots exist, the limit of the variance of
are analytically obtained in Baran [
47] when the parameters
are located on the interior, on the edges, and on the vertices of the domain of stability. Paulauskas [
48] explicitly shows that the growth rate of the variance of
Y is different in dimensions
and
. Though this approach studies spatial unit roots from a different angle than that of Fingleton [
14], it points to some similarity as in a recent paper [
49] that will be discussed in
Section 6.
Another possible way to remedy the problem caused by spatial unit roots in the SAR, SEM model is to relax the compactness assumption. When some parameters approach the boundary of the parameter space, consistency of extremum estimators could be obtained with compact parameter spaces. Thus the compactness assumption is standard in spatial econometrics because of the existence existence of the singular point
, see proofs in Kelejian and Prucha [
6] Lee [
5], Gupta [
50]. But such an assumption is also restrictive in the sense that if we choose an arbitrary compact set on the open parameter space, the true global optimizer may be exclusive, especially for near unit root cases. A recent paper by Liu et al. [
51] generalizes Theorem 2.7 in Newey and McFadden [
52] (p. 2133), which relaxes compactness when the objective function of an extremum estimator is concave and allows the non-stochastic objective functions to depend on the sample size
n. This generalization is suitable for spatial econometrics models because the sample observations are usually in triangular arrays. (A triangular array is a doubly indexed sequence of numbers or polynomials. Each row of the array is only as long as the row’s index. For example, the
ith row contains only
i elements.) The consistency of the QMLE of the SAR model and the MLE of the SAR Tobit model are investigated. But a closed-form solution is not obtained. On the other hand, Gupta [
53] proposes a Newton-step computational algorithm of QMLE for a large-parameter-size SAR model, which is free from the compactness assumptions. Under the normality assumption, it has the same asymptotic efficiency as MLE, but has a closed-form solution and is computationally simple.
4.1. QMLE and 2SLS Methods for the (Mixed) SAR Model
4.1.1. Quasi-Maximum Likelihood Estimation Method
Lee [
5] investigates the asymptotic distribution of the QMLE estimator of the mixed SAR model, which is the starting point for further analysis when spatial unit roots exist. This analysis is based on the discussion of the singularity of the information matrix of the log-likelihood function. Especially when the information matrix is singular, a scaling factor
is needed, where
is the order of the elements of the spatial weight matrix
and thus
is the order elements of
. We have seen in (
39) that the order of elements of
is
in the near unit roots case; thus, a similar scaling factor will be needed.
The (mixed) SAR model under consideration is
with its reduced form
since
.
Also, Lee [
5] imposes a weaker assumption about the spatial weight matrix and derives the information matrix.
Assumption 7. The elements of are at most of order , denoted by , uniformly in all where the rate sequence can be bounded or divergent. The ratio as n goes to infinity.
Let
, where
, then the log-likelihood function of (
67) is
where
. The information matrix is
where
with
. The existence of the extra
is because
is not necessarily normally distributed.
To ensure the asymptotic distribution of QMLE
exists,
must be well defined. Lee [
5] proves the nonsingularity of
can be guaranteed by the fact that there does not exist a nonzero vector
such that a linear combination of columns of
is 0. This condition could be simplified as: there does not exist a
, such that
Since each term in (
72) is greater or equal to 0 (the first term is non-negative because it is symmetric; for the second term,
where
and
), Lee [
5] studies the singularity of the information matrix in terms of these two terms, respectively. For the first term, by the partition matrix formula,
is nonsingular if and only if
and
are nonsingular. Moreover, under Assumption 7, if
and
are independent, one sufficient condition for the nonsingularity of
could be:
Assumption 8. exists and is nonsingular.
However, Lee [
5] states that if
and
are linearly dependent, for example, when
is row-normalized and the relevant regressor is only the constant term, Assumption 8 should be adjusted to guarantee the second term in (
72) is greater than 0:
Assumption 9. and the is a bounded sequence and, for any , Then under Assumption 7 and either 8 or 9, the asymptotic distribution of QMLE
will be
where
,
.
The above results are based on
being invertible of which a necessary condition is that
is a bounded sequence. However, when
will become singular because
. Also, the singularity of the information matrix implies that the score function will be too flat to be useful and thus an adjustment of the rate should be imposed as in Lee [
5]
Assumption 10. The is a divergent sequence, elements of have the uniform order , and with . Under this situation, either (a) , or (b) if whenever . Lee [
5] gives the asymptotic distributions of the QMLE under this rate-adjusted assumption:
4.1.2. Generalized Spatial Two-Stage Least Squares (GS2SLS) Method
Kelejian and Prucha [
7] proposed the GS2SLS method for the “cross-sectional (first-order) autoregressive spatial model with (first-order) autoregressive disturbances”:
where
and
. Since
is endogenous, it should be instrumented. Assume
is known (a consistent estimator is given by Kelejian and Prucha [
6]), a Cochrane–Orcutt (CO)-type transformation applied to (
77) yield the transformed regression
should be instrumented. The ideal instruments are of course
Then, by (
10),
, the ideal instruments are:
In practice,
is used.
Assumptions about the instrument matrix are made to ensure that
exists and is of full rank. In the near unit roots case, similar to the QMLE method, scaling factors are needed to guarantee this property as we will see later. With these assumptions, the GS2SLS procedure has three steps, as in Kelejian and Prucha [
7], as follows:
Run 2SLS on with instruments . This yields , where , is the projection matrix of , and .
Estimate
by Kelejian and Prucha [
6] according to the GMM system:
where
, then solve
, or by
. Both
and
are consistent, but
is more efficient.
Assuming
is known, run 2SLS on the CO transformed regression (
78) with instruments
yields
, where
By replacing
by its consistent estimation
(in Step 2). The feasible 2SLS estimator is
Obviously, this procedure is for a SARAR model. If it is a SAR model, only step 1 is needed and .
4.1.3. Best Generalized Spatial Two Stage Least Squares (BGS2SLS) Estimators
Lee [
8] does not drop the higher-order terms in (
80) but use the fact that
by the definition of
and proposes the best instrument:
With the corresponding simple instrumental variable estimator, the BGS2SLS is
If there is no SAR structure in the disturbance term, the best instrument is
and
in (
85) could be replaced by any consistent estimator such as the KP-GS2SLS estimator.
Compared with (
80),
does not drop off the higher-order terms and is numerically equivalent to the ideal instrument, which in turn yields asymptotically optimal instrumental variable estimators.
4.2. Near Unit Roots in the SAR Model
Lee and Yu [
15] study the asymptotic distribution of QMLE and 2SLS estiamtors of the SAR model by decomposing
(see
Section 3.3.1). The model is given as
where
. And the reduced form, again, is
4.2.1. QMLE
Obviously, the generated regressor
is explosive because of
in
, see (
39). This is very similar to the case in Assumption 10 (a) with
in
Section 4.1.1. This implies that the convergence rate of estimators of
is not the usual
case as in (
76).
Additional assumptions are made in Lee and Yu [
15] to control the magnitude of the unstable part of
(Assumption 11), and specify the identification condition (Assumption 12), which are adjusted from Assumption 8:
Assumption 11. (1) ; (2) ; (3) for any finite constant .
Assumption 12. holds.
Assumption 11 (1) (2) guarantee that
is not too small compared to
n and (3) implies that it is not too large. Assumption 12 is equivalent to (see Lee and Yu [
15] (p. 338, Lemma 1 (7)))
, which is a rate-adjusted version of Assumption 8, that ensures the identification uniqueness and implies nonsingularity of
. For a detailed discussion of these two assumptions, see Baltagi et al. [
37] (p. 6). Also, since the adjusted rate is
, QMLE
would be
-consistent, see (
76) derived under Assumption 10.
The information matrix will be the same as in (
70). In
Section 3.3.2, when studying the spurious regression of OLS, we mentioned that the scaling factor
is needed in terms of the order of
. Here, for the QMLE, similar scaling factor will also be introduced because elements of
and
have different orders as the existence of
. The second column and row of
and
, which are the derivatives with respect to the spatial coefficient
contain
, thus, they have to be scaled by a factor
. Specifically, the
element should be scaled by a factor
. This can be done by a left and right multiplying matrix
, where
Thus, Lee and Yu [
15] give
Let
and
and assume they exist, the asymptotic distribution of
is
Recently, Rossi and Lieberman [
54] combine the near unit roots with a similarity-based weighted matrix and study the consistency of the QMLE estimator when the spatial coefficient
and
, by allowing uncentered units. The element of similarity-based weighted matrix is
, where
is some function that measures the similarity between unit
i and
j according to some parameter
. The parameters they are most interested in are
. They establish the connection between the
and the order of uniform absolute row-sum norm of
,
. This means that
[
54] (p. 11, Proposition 1). Recall that
, so when
, the variance is independent of
n, corresponding to the standard SAR setup (
and fixed). In the case
, the variance increases with the sample size but with a lower speed, which is the case that we have seen when studying near unit root; but when
,
, the variance increases so fast that the non-standard limit distribution of
has to be established on a case-by-case basis, according to the resulting
. Their result is much more complicated than that of Lee and Yu [
15] because of the introduction of similarity structure in the weight matrix, but are much more flexible since now
is no longer fixed but data-driven, and is potentially more useful in empirical work.
4.2.2. GS2SLS and BGS2SLS
Lee and Yu [
15] derive the 2SLS estimators and their asymptotic distributions using the procedures mentioned above. Using instruments defined in (
80), the GS2SLS estimator of
is
Since
contains
, the adjustment by
is needed. The asymptotic properties of the GS2SLS estimators of
and
are obtained as follows:
where
. So the GS2SLS estimator
of
is
-consistent, which is higher than the usual
rate in Kelejian and Prucha [
7] and Lee [
8], but
has the usual
rate of convergence.
Choosing the instrument
as in (
85), the BGS2SLS estimator is
with asymptotic distribution:
where
. Since
is negative semidefinite, the BGS2SLS estimator is more efficient.
The above result is based on the fact that
and
are independent, which makes sure that the instrument matrix is of full rank, otherwise the 2SLS estimator will be inconsistent [
9]. Liu [
55] shows that even though
and
are linearly dependent, i.e.,
, where
is a nonzero vector, we still have
,
under near unit roots case; and
,
under the regular case, as long as
. This is equivalent to
and
, which are asymptotically independent.
To provide guidelines for empirical studies, Lee and Yu [
15] conduct simulations to compare the performance of QMLE and 2SLS methods. QMLEs are relatively robust whether the error term is normally distributed or not. Moreover, as
n increases (and the spatial coefficient is closer to the spatial unit roots), the QMLEs perform better than the 2SLS estimators because of smaller variances. One interesting phenomenon is that the best 2SLS estimators are even worse than the regular 2SLS estimators in some cases, which violates the theoretical result as shown in (
93). One possible reason for this is that the best 2SLS estimator requires an initial consistent estimator by construction (see (
85)) and under spatial unit roots, such an initial estimator may not be accurately calculated.
4.3. Near Unit Roots in the SEM Model
Baltagi et al. [
37] extend the study of near unit roots from SAR model to SEM model by considering the OLS, GLS and FGLS estimation and properties of the corresponding statistics. The model is given as
with
. Similar to Lee and Yu [
15],
could be decomposed to
One more assumption that they impose is
Assumption 13. The elements of are nonstochastic and bounded, uniformly in n, exists and is nonsingular. exists. Furthermore, exists and is nonsingular.
The OLS estimator
has the asymptotic distribution when
and when
,
. Thus
, which is
-consistent and is slower than the stationary error term case.
Baltagi et al. [
37] also study the asymptotic properties of the GLS and FGLS estimators. If
is known,
, and
which implies that
is robust for the near unit roots in the error term because it has
rate of convergence. The feasible GLS (FGLS) could be achieved by replacing
by a consistent estimator
, which yields
where
. It can be seen that FGLS is identical to the QMLE: concentrated log likelihood function of (
94) with respect to
is
where
The QMLE is of order
, which is
-consistent and
Comparing with (
98),
is a FGLS of
using
. Thus, the QMLE
and the infeasible GLS estimator
have the same asymptotic distribution as shown before. Next, Baltagi et al. [
37] consider the Wald test statistic for the null hypothesis
for OLS, GLS and FGLS, where
R is a
matrix of rank
and
r is
. For OLS,
where
does not have a standard
distribution, which is similar to the
F-statistic shown above. However, the GLS Wald statistic
has a chi-squared limiting distribution.
Baltagi et al. [
37] conduct extensive simulations. Using the root mean squared error (RMSE) as the evaluation criteria, the QML (FGLS) estimators perform uniformly better than the OLS estimator. In particular, when the spatial coefficient is sufficiently close to 1 and the sample size
n increases, the RMSE of the OLS estimator grows dramatically. Together with the fact that the Wald test statistic based on the QML method has a standard Chi-squared distribution, QMLE is recommended when near spatial unit roots exist in the spatial error model.
4.4. Doubly Geometric Spatial Autoregressive Process
The main difference between the SAR and the doubly geometric spatial autoregressive models is that the spatial dependence form of the latter is clearly specified. However, for the SAR model, such dependence relies on the specification of whose explicit form varies in different situations.
For model (
65), based on the observation
, Baran [
47] shows that the asymptotic normality of the estimators
is
in the stable case (
and
), with some covariance matrix
. For the unstable case (
,
), using the martingale central limit theorem, Bhattacharyya et al. [
56,
57] show “one step Gauss-Newton” estimators are asymptotically normal with convergence rate
. This is different from the classical time series
, where the OLS estimator converges to a fraction of functionals of the standard Brownian motion:
[
58] (p. 281).
Baran and Pap [
59] consider the more complicated model as in (
66). The model is stable if and only if
, where
S is the open tetrahedron with vertices
. They also prove that the OLS estimator is asymptotically normally distributed with the convergence rate
n when the model is stable, and
otherwise. (The simpler model
, with possibly
was investigated in Baran et al. [
60], Baran et al. [
44], Baran and Pap [
61] under stable and unstable cases. Under different settings, the limiting distribution of the OLS estimator is normal but has different rates of convergence.)
Roknossadati and Zarepour [
62,
63] study the limiting behavior of M-estimation for the near unit roots of model (
65). The M-estimator
of
is defined to minimize of the objective function
for some convex function
. Roknossadati and Zarepour [
62] show that the self-normalized M-estimators are asymptotically normal, and when the series is stable, the convergence rate of M-estimators is still
, same as in Bhattacharyya et al. [
56,
57]. But if it is unstable, i.e., when the model has infinite variance innovations, the M-estimates have a higher consistency rate.
5. Tests for Spatial Unit Roots and Nonstationarity
Recognizing the possible consequences of spatial unit roots, it is necessary to test for it. In fact, in nonstationary cases, the estimator is inconsistent and diverges [
64]. If the series contains spatial unit roots, one may employ the spatial first difference as recommended by Fingleton [
14]: after the first-order difference, such series will be converted to a stationary one, otherwise it is over-differenced and spatial correlation still exists. Based on this idea, Lauridsen and Kosfeld [
17,
18] propose two-stage LM tests to check for spatial unit roots. However, such LM tests have a high power function because of
in finite samples and are not useful for spatial cointegration since they mis-specify the regression in the second stage. A Wald test is proposed by Lauridsen and Kosfeld [
65] but it does not have a usual
distribution so simulation has to be conducted before each test to obtain the critical values. A different approach introduced by Beenstock et al. [
19] uses the fact that when spatial unit roots exist, the variance explodes and the spatial impulse does not die out as distance increases, so they iterate on the parameter space to find out the value of the unit roots (for irregular lattice) and then generate nonstationary series to conduct interval estimation.
Martellosio [
66] derives the power properties of invariant tests, for example,
, where
is the OLS residuals and
Q is a fixed matrix. When
, we obtain the Cliff–Ord test. When the regression contains only a constant, the Cliff–Ord test reduces to Moran’s test as introduced before, which is best locally invariant as shown by King [
67]. It has been shown that for the SEM model, as
, the test power vanishes. For the SAR model, as
, the limiting power is either 0 or 1. Krämer [
68] shows similar conclusions but focuses on the symmetric weight matrix. Martellosio [
69] further shows the power of any test vanishes as spatial correlation increases for a set of regression spaces. Heteroskedasticity robust tests have been studied. For example, Born and Breitung [
70], Baltagi and Yang [
71] design diagnostic tests for SEM and SAR employing the outer product of gradients (OPG) variant of the LM test which are robust against heteroskedastic (and non-normal) errors. But these tests suffer from the same deficiency as in Martellosio [
66] because such test is asymptotically equivalent to Moran’s
I. Baltagi and Yang [
72] have also shown that the standard LM test undergoes finite sample distortion when spatial dependence is heavy in both spatial and panel data settings. Recently Preinerstorfer [
73] suggests some modified tests to avoid this “zero-power trap” phenomenon, which works well for small spatial autocorrelation, but still has limiting power smaller than 1 (only 0.619 by simulation). Thus, the invariant test of
null hypothesis is not satisfactory when spatial unit roots exist, and methodologies to determine it (Tests of the
null hypothesis) deserve more attention.
5.1. Two-Stage LM Test for the Sources of Spurious Spatial Regression
Lauridsen and Kosfeld [
17] develop a two-stage LM test to distinguish between two possible sources for spurious regression. The first one is the existence of spatial (near) unit roots in the regressand and/or regressors as in Fingleton [
14], Lee and Yu [
34]; the second is that the spatial error term itself is nonstationary. So the LM tests are essentially testing if the spatial process is stable or not. The idea originates from the fact that Fingleton [
14] suggests a high value of Moran’s
I statistic as an indicator for both spatial nonstationarity and spurious regression, but we cannot distinguish between them or even distinguish between the nonstationarity and the positive spatial correlation among the error terms, which by definition imply a high value of Moran’s
I. Specifically, we are trying to distinguish if (i) the
is a SAR process and we regressed
on
or (ii) the model itself is SEM as
, where
, because both (i) and (ii) can cause spatial autocorrelation.
There are at least three advantages of the LM tests [
17]. First is that compared with Wald or LR, LM is usually simpler to compute because it is constructed under
. Second is that with the LM test, it is possible to control for some omitted model features such as heterogeneity and autoregression, as in Anselin [
74], which will be discussed later. The last one is that, other statistics may not have a standard asymptotic distribution, such as the OLS Wald type statistic as in Baltagi et al. [
37]. The proposed two-stage LM test is based on the SEM model and all four possible results are summarized in Lauridsen and Kosfeld [
17] (Table 1):
Under
, the LM error statistic (LME) developed by Anselin [
74] (p. 11, Equation (
35)) is
Thus large values of reject the null hypothesis, which implies either or .
The next step is to test if . This could be carried out by using the spatial differencing we introduced before. Under , , thus the first order difference on the regression, , yields i.i.d. error , which means the value of differenced LME (DLME) should be close 0 under . But if , represents overdifferencing, i.e., , so spatial correlation in the error term still exists, and we cannot reject .
Similar procedures could be used to investigate whether
or any
are spatially nonstationary, as the case in Lee and Yu [
34]. Letting
be one of
,
,
, Lauridsen and Kosfeld [
17] suggest using the regression
to obtain LME and DLME, respectively.
is regressed on a constant term because there is no meaningful regressor but we still need the residuals.
Spatial cointegration could also be tested using this LM test. Thus, after determining and are nonstationary, regress on and on to obtain LME and DLME. The cointegration relation exists if LME is 0; and non-cointegration if LME is positive and DLME is 0; the limiting case of “near cointegration” occurs if LME and DLME are positive.
Moreover, Lauridsen and Kosfeld [
18] generalize their two-stage LM test to account for unobserved heteroskedasticity. They specify the covariance matrix of
,
, have the diagonal element
, where
is
vector of observations of exogenous variables for region
i, related to
via the
vector of parameters
. So the statistic in (
107) should be adjusted as in Anselin [
74] (p. 9, Equation (
29)):
with
and
Z as the
matrix containing the
Z vectors that cause heteroskedasticity.
However, the Lauridsen and Kosfeld test procedure is not without problems. Beenstock et al. [
19] point out that this procedure is not suitable for testing spatial cointegration since the second stage is misspecified. To see it more clearly, regress
. The LM procedure asserts that if
is not spatially correlated, then
and
are spatially integrated; and if
is spatially correlated,
and
are not spatially integrated because of overdifferencing. Nevertheless, regressing
on
is equivalent to regressing two white noise series,
and
; because
Y and
X are both I(1). Hence, the corresponding residuals must be not spatially correlated as long as
and
are independent, regardless of whether
is spatially correlated, nonstationary, or not.
5.2. A Wald Test for Spatial Nonstationarity
Lauridsen and Kosfeld [
65] suggest a Wald post-test statistic. Based on MLE, under
, the general form of the Wald test is
, where
is the inverse of information matrix. If we specify the null hypothesis as
, then with
and
, we have
, where
is the diagonal element of
V corresponding to
. However, as mentioned before, Wald statistics may not have a standard distribution so simulations are conducted. Unlike Fingleton [
14], to generate SAR series with spatial unit roots, Lauridsen and Kosfeld [
65] do not introduce the noncircular matrix. Thus
is a singular point of
so the inverse does not exist. To solve this issue, they use the Moore–Penrose pseudoinverse.
According to Monte Carlo simulation, they find the critical limit of the Wald test under spatial nonstationarity is higher than the distribution, especially for the 5th and 10th percentile.
5.3. Test Unit Roots and Cointegration in the Sense of Spatial Impulses
Beenstock et al. [
19] come up with an innovative method to test spatial unit roots and spatial cointegration by considering the behavior of the variance and the spatial impulse. Also, they do not assume unconnected spatial units or row normalize the spatial weight matrix either. Thus, based on the topology of the unit neighborhood, the spatial unit roots are
in the regular lattices where
n is the maximum and general number of neighbors of each unit. For example,
, for bilateral space,
for rook lattice and
for queen lattice, with
respectively. (“The weight matrix with first-order contiguity according to the rook criterion has the cells immediately above, below, to the right, and to the left, for a total of four neighboring cells. The weight matrix with first-order contiguity according to the queen criterion is eight cells immediately surrounding the central cell” [
75] (p. 131). For the introduction of other types of the spatial weight matrices, see Kelejian and Robinson [
26] (pp. 94–95).) And with spatial unit roots,
is still well-defined because of the existence of the edge effect, that is, there exist some units having fewer neighbors than
n. However, even though
exists, the variance tends to explode even in finite sample space, which provides us with a way to determine the spatial unit roots for any arbitrary irregular lattices.
5.3.1. Spatial Impulse
The spatial impulse response is essentially the consequence of the shocks from one location to another. Intuitively, shocks should have no effect on the remote units if the spatial data are stationary. Beenstock et al. [
19] first consider the simplest SAR model in lateral space:
where
L denote a spatial lag operator such that
. The auxiliary equation is
When the discriminant of the above equation is greater than 0,
, and there are two different solutions,
, by Vieta’s formula. Hence, Beenstock et al. [
19] express
as
where (
112) is known as the Wold representation that expresses
in terms of the shocks. The impulse from location
to
i tends to 0 because
. Also,
varies with
. When
,
, so the impulse does not die out with distance and explodes. This fact can also be seen from
If
,
is finite and independent of
j; if
,
, and
is infinite.
For the bilateral space case,
are the spatial unit roots as shown before. Because of the edge effect, the singular point is strictly greater but approaches
. This fact shows a downside of the row-normalized spatial weight matrix: it overstates the true weight of the unit at the edge of the lattice. For example, in a rook lattice, the units have three neighbors with weight
at the edge, and four neighbors with weight
in the center. The row-normalized procedure assigns a higher weight to the neighbors of edge units. This weight assignment is not necessarily reasonable and makes the spatial unit roots the same as the singular points. Moreover, without row-normalization, the edging units play the role of the unconnected unit as in Fingleton [
14]. The general SAR model in bilateral space is
and the Wold representation is
, where
. Let the spatial impulse response be defined as
and
. Analytical solutions of spatial impulse response in bilateral and higher dimension lattices are not obtained, but Beenstock et al. [
19] expect
to be positively related to the number of spatial units because of the larger spillover effect and
varies inversely with the distance between
i and
j in the stationary case. If spatial unit roots exist, as in the lateral case, the impulse
would not die out as the distance increases. This is supported by the simulation, though only the finite sample case could be simulated, see Beenstock and Felsenstein [
76] (Figures 5.2 and 5.4). Compared with
, when
, it obviously shows a qualitative difference in the persistence of spatial impulses, as well as in the tendency for the explosion of the variance.
In the irregular lattices, the number of neighbors for the unit is undetermined generally, so the spatial unit roots, , cannot be calculated as the reciprocal of n. However, since the nonstationarity implies that spatial impulses do not disappear, one can find the empirical spatial unit roots by simulation.
The simulation method in Beenstock et al. [
19] to calculate the critical value is pretty flexible and can be adapted to different models. For example, when both dynamic and spatial terms are included, Beenstock and Felsenstein [
23] develop a similar procedure for testing cointegration in nonstationary panel data when estimating the spatial spillover effect in housing construction for Israel.
5.3.2. Spatial Unit Roots and Cointegration Tests
Knowing the spatial unit roots
, Beenstock et al. [
19] conduct Monte Carlo simulations to generate the artificial SAR series and use the MLE method to estimate SAC to obtain the corresponding distributions under different topologies (different sample size, criteria, etc.). Results show that the empirical distribution of SAC could be used to construct interval estimation and critical values for statistics that test spatial unit roots. For the spatial cointegration test, a similar procedure applies, but OLS estimation is used.
5.4. Some Applications
Kosfeld and Lauridsen [
77] offer an application of the two-stage LM test in
Section 5.1 to the income and productivity convergence in the German regional labor market. They find highly significant LME and DLME statistics (refer to formulas above like (
107)) for all variables, which means the spatial unit roots are rejected. Yesilyurt and Elhorst [
20] estimate the spatial interaction effects of inflation in Turkey. Because the regional inflation rates have a high tendency to co-move over time, they question whether the inflation rates of different regions are stationary in space. Using the two-stage LM statistics from (
108), they find that the inflation curve is stationary in space. Olejnik [
21] studies the income process of the extended European 25 based on the augmented Solow model taking into account the spatial autocorrelation effect. The stationarity of the error term as well as all variables in the model are investigated. No problem of spurious regression is found. Machado et al. [
22] examine the spatial correlation of traffic accidents of vulnerable road users (such as pedestrians and cyclists) in big cities and detect the factors that contribute to these accidents. Because their study covers several cities, the model specifications may vary across different locations. Thus they use the two-stage LM statistic to choose the best model, see Machado et al. [
22] (Table 4). Though the Wald post-test in Lauridsen and Kosfeld [
65] is asymptotically equivalent to the LM test, “It is generally recommended to choose among these alternatives on the basis of computational ease [
78] (p. 94)”.
6. Related Topics
Spatial panel data have been studied extensively. The spatial dependence is incorporated in the error component [
12,
75,
79] or by spatial lag dependence [
11,
80]. See Baltagi [
2] for a textbook discussion. Also, the panel data model can have time lagged dependent variables. If the panel data model includes both spatial and dynamic features, it is named as spatial dynamic panel data (SDPD) model by Yu et al. [
10]. Yu et al. [
10], Yu and Lee [
16] and Yu et al. [
36] study the QMLE estimator of the SDPD model under stable, unit roots, and spatial cointegration respectively. The concept of unit roots under the SDPD model is a combination of the spatial and dynamic one. To see this more clearly, Yu et al. [
10] specify the model as
where
and
are
column vectors. Since
, assuming
is invertable, the reduced form is
where
. If the infinite sums are well-defined, then by continuous substitution
So instead of focusing on the singular points of
,
should be considered, which contains
,
and
: the parameter of contemporaneous spatial effect, time lagged variable and time–spatial effect. A similar process as in
Section 3.3.1, letting
be the eigenvalue matrix of
, Yu et al. [
10] show that the eigenvalue matrix of
is
, which can be decomposed as
. The power matrix of
follows as
with
since the eigenvector matrix
is orthogonal and
. Thus, whether
is stable or not depends on the value of
compared with 1. Consequently, the decomposition of
, which is a generalization of (
40), can be expressed as
where
is a possible stable part,
is a possible unstable part, and
is the time effect part, see Lee and Yu [
81] for details. A data transformation procedure is imposed by them to eliminate both the time effects and the possible unstable term. Based on their analysis, the eigenvalues of
, the asymptotic properties of QMLE and bias are derived. When eigenvalues of
are all less than 1 (
), or some equal to 1 (
and
), or all equal to 1 (
and
), the information matrix has different properties, see Yu and Lee [
16] (Table 5).
Thus, the test for the unit eigenvalues of
is of great importance. Most attention is paid to the unit roots in the time dimension, i.e.,
and equivalently, if
. Unit root tests in panel data under spatial dependence have been extensively studied, see Baltagi [
2] (Section 12.3) for a summary. Also, the performance of different tests has been considered in Baltagi et al. [
24]. The test for
has been investigated in Lee and Yu [
81] (Section 14.3.4). Such a test works well when
. However, when
and
T is small, it is not reliable. Thus, further study of the unit root test for the SDPD model should be investigated.
Recently, another approach to describe strong spatial dependence has been proposed by Müller and Watson [
49]. Since the spatial units are not neatly arranged, i.e., irregular lattice, they do not model the spatial dependence by SAR model but “posit a continuous parameter model of spatial variation [
49]”. They use the Lévy–Brownian motion to define the spatial
process,
, which is a generalization of the Wiener process that is widely used in time series in
d dimensions (when
, it could be regarded as a random walk on the plane). The advantage of the Lévy–Brownian motion is that such a process is isotropic, which means the relative variance between two locations is determined by the distance but not the orientation (see Anselin [
1] (p. 42)). The functional central limit theorem (FCLT) is established to measure the asymptotic behavior of such a process. When regressing two independent
processes, spurious regression also occurs since classical, HAC-corrected and clustered standard errors
F statistics diverge to infinity, which is similar to that in Fingleton [
14]. To remedy this situation, “difference” regression is again considered. Unlike time series, Müller and Watson [
49] introduce the isotropic differences that treat all directions symmetrically. That is, they regard the weighted values of the neighborhood as the “average” value of the current location, just as
in the SAR model. And the difference transformation is defined as
, where
is some weighting function. Their simulations show regressions using isotropic differences do not suffer from spurious regression problems and valid inference can be conducted. Some test procedures for
and
are also suggested.
7. Conclusions
This paper briefly surveys spatial unit roots in spatial models. First, some fundamental concepts in spatial econometrics are introduced. Spatial unit roots in SAR and SEM models may lead to spurious regression. For the estimation and inference in the presence of spatial unit roots, QMLE and 2SLS methods are generally used and have satisfactory properties after scaling. The compactness assumption has been recently relaxed in spatial econometrics which potentially makes the spatial unit roots no longer a concern but its implication to concepts like stationarity, and spatial cointegration should be further investigated. The doubly geometric spatial autoregressive process, has been widely used in some scientific fields of which the most concern is about regular lattice. Similar to time series, exact orders of convergence for different estimators are obtained because of its simple specification. But this limits its application in economics where irregular lattice and different types of weight matrices are applied.
To detect possible spatial unit roots, as well as spatial cointegration, several test procedures have been proposed. Their applications are rather limited and depend heavily on simulations to obtain critical values. This could be explained by the fact that statistics under spatial scenarios generally do not have standard asymptotic distributions, not to mention the irregular lattice.
Lastly, some related topics were introduced. The idea of singular points is generalized in SDPD model because such a model includes the time lagged variable that is based on the traditional SAR model. However, the existing literature focuses on the temporal unit roots in the SDPD model. Recently, an innovative way to study spatial unit roots describes the underlying spatial process using Lévy–Brownian motion, which is a generalization and spatial analogy to the time series counterpart. The limitations of different approaches and further research were also discussed.