1. Introduction
In recent years, spatial econometrics has gradually become one of the ‘standard’ tools of econometric analysis; see Anselin et al. (2010). Because it accounts for spatial effects among research objects, a spatial econometric model provides more practical results than a classical econometric model [1]. There are two commonly used spatial econometric models: the Spatial Lag Model (SLM), also known as the Spatial Autoregressive Model (SAR), and the Spatial Error Model (SEM). The SLM describes how the behavior of the dependent variable in neighboring units influences the rest of the system. In the SEM, the spatial dependence is present in the error term, so the model describes how error shocks in neighboring units influence other areas.
Introducing spatial effects into the panel data model has broad prospects for practical application, since the resulting model not only controls for individual heterogeneity but also accounts for spatial dependence in the cross-sectional dimension. However, because the spatial panel data model combines the features of the spatial econometric model and the panel data model, its structure is more complex, and it is more difficult to test for spatial dependence and to estimate the parameters.
In consideration of the individual effects, the spatial panel econometric model is specified as follows:
In (1) and (2), and respectively present a dependent variable and an independent variable; stands for spatial weight matrix. The spatial weight matrix W needs to be standardized based on spatial distance. in (1) and in (2) are the spatial correlation coefficient of the model, with and . We define the error term in (1) by and in (2) by . According to individual effects , we can set the model as a fixed effects model or random effects model. In this paper, we let denote random effects, and assume that and are mutually independent, noting that .
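For orientation, a standard form of the two specifications that is consistent with the description above can be written as follows. The notation ($y_{it}$, $x_{it}$, $\beta$, $\rho$, $\lambda$, $\mu_i$, $\varepsilon_{it}$, $u_{it}$, $w_{ij}$) is assumed here for illustration and is not necessarily the notation of (1) and (2).

```latex
\begin{align*}
\text{SLM:}\quad & y_{it} = \rho \sum_{j=1}^{N} w_{ij}\, y_{jt} + x_{it}\beta + \mu_i + \varepsilon_{it}, \\
\text{SEM:}\quad & y_{it} = x_{it}\beta + \mu_i + u_{it}, \qquad
                   u_{it} = \lambda \sum_{j=1}^{N} w_{ij}\, u_{jt} + \varepsilon_{it},
\end{align*}
% i = 1, ..., N indexes individuals, t = 1, ..., T indexes periods,
% w_{ij} is the (i, j) element of the spatial weight matrix W,
% and \mu_i is the individual effect.
```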
Testing for spatial dependence is the premise of spatial econometric analysis. There are many mature methods for testing spatial dependence in the spatial panel data model, the most commonly used being Moran’s I test and the LM test. Moran’s I statistic can detect spatial dependence in a model but cannot determine its form. The LM test is generally divided into the LM-Lag test and the LM-Error test, which test for spatial dependence in the spatial lag model and the spatial error model, respectively.
The performance of the LM test for spatial dependence rests on the assumption of i.i.d. normal errors and on large samples. In empirical economic and management research, however, only finite samples are available in most cases, since the data are difficult to obtain. Moreover, owing to the complexity of real economies and the diversity of influential factors, the error term of the model usually violates the classical i.i.d. normal assumption. In such cases, the LM test for spatial dependence is no longer valid.
To solve this problem, Efron (1979) provided an effective method, namely the Bootstrap method [2]. The Bootstrap method needs neither an assumption about the population distribution nor a prior derivation of the distribution of the estimator. It simply resamples the data and recomputes the estimates repeatedly. Therefore, the Bootstrap method is widely applicable when the error term does not follow the classical distribution. The Residual Bootstrap and the Wild Bootstrap are both commonly used in spatial dependence tests; the Wild Bootstrap is better suited to heteroscedasticity or autocorrelation in the error term (Davidson and MacKinnon, 2007) [3]. Applying the Bootstrap method to panel data unit root tests, cross-sectional correlation tests and related problems in the panel data model, Chang (2003) [4], Cerrato (2006) [5] and Godfrey (2009) [6] showed that the Bootstrap test clearly outperforms the asymptotic test.
The Bootstrap method has broad applications in classical econometric research; however, as far as we know, it has seen few applications in spatial dependence testing. Warren (2008) used the Bootstrap method to study heteroscedasticity, serial correlation and spatial error autocorrelation in a spatial panel data model [7]. Yang et al. (2011) [8] and Lin et al. (2011) [9] also used the Bootstrap method to test spatial dependence in a spatial cross-sectional data model. Ren et al. (2014) [10] studied the performance of Bootstrap tests for spatial pooled data models, Yang (2015) [11] tested spatial dependence in a spatial cross-sectional data model through LM tests, and Lee (2015) [12] focused on the Moran’s I test statistic for spatial dependence in a spatial cross-sectional data model. Ou et al. (2015) proposed a robust LM test for a panel data model with a time-varying spatial weight matrix; with a longer time dimension and more individuals, this robust LM test is more efficient.
Zhai et al. (2022) studied the problem of testing for spatial effects in a fixed-effects variable-coefficient spatial autoregressive panel data model. Their numerical simulations show that the proposed method performs better in finite samples.
Nevertheless, the Bootstrap methods above have mainly been applied to spatial cross-sectional or pooled data models, and we have found no literature on spatial dependence LM tests in spatial panel data models. This paper applies the Fast Double Bootstrap (hereafter FDB) sampling method proposed by Davidson and MacKinnon (2007) [3] to the LM test in a spatial panel data econometric model, in order to address spatial dependence testing under small samples or non-classically distributed errors. We employ Monte Carlo simulation to study the efficiency of the Bootstrap LM test in a spatial panel data model in terms of size distortion and power. Due to space limitations, this paper reports only the efficiency of the Bootstrap LM-Error test under random effects.
The rest of this paper is organized as follows. Section 2 introduces the Bootstrap LM test in the spatial panel data econometric model. In Section 3, we conduct the Monte Carlo simulation and explain its results. Section 4 concludes.
2. Bootstrap LM-Error Test for Spatial Panel Data Econometric Model
The first step to test spatial correlation in a spatial panel data model is to establish test statistics. Then, a spatial panel data error model with random effect is given by (2), and the variable
denotes the random effect in the model [
13]. We are interested in testing the null hypothesis of the LM-Error test,
(with assumption that
), against the alternative,
, which means the presence of a spatial error autocorrelation. Baltagi [
14] et al. (2003) proposed LM-Error test statistics for an error term autocorrelation in a spatial panel data error model:
where
,
.
Under the null hypothesis,
and
, respectively, present the ML estimates of
and
. Similarly,
is the ML estimated residual under the null hypothesis [
15].
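As a hedged illustration of how such a statistic can be computed, the sketch below follows the conditional LM test for spatial error correlation in a random effects panel as it is commonly stated from Baltagi et al. (2003): $LM = \hat{D}^2 / \{[(T-1) + \hat\sigma_\nu^4/\hat\sigma_1^4]\, b\}$, with $b = \mathrm{tr}(W^2 + W'W)$. The function name, the argument names and the period-by-period stacking of the residuals are assumptions made for this example, and the notation may differ from that of (3).

```python
import numpy as np

def lm_error_re(resid, W, N, T):
    """LM statistic for spatial error correlation, allowing random effects.

    resid : residuals estimated under the null, stacked period by period
            (length N*T; the first N entries belong to the first period).
    W     : (N, N) spatial weight matrix.
    """
    U = np.asarray(resid, dtype=float).reshape(T, N)  # row t holds period t
    u_bar = U.mean(axis=0)                            # time mean per individual
    U_within = U - u_bar                              # deviations from time means

    # Variance components:
    #   sigma_1^2 = u'(Jbar_T kron I_N)u / N
    #   sigma_v^2 = u'(E_T kron I_N)u / (N (T - 1))
    s1 = T * (u_bar @ u_bar) / N
    sv = np.sum(U_within ** 2) / (N * (T - 1))

    Ws = W + W.T
    # u'(Jbar_T kron Ws)u = T * u_bar' Ws u_bar
    q_between = T * (u_bar @ Ws @ u_bar)
    # u'(E_T kron Ws)u = sum over t of within-residual quadratic forms
    q_within = np.einsum("ti,ij,tj->", U_within, Ws, U_within)

    D = 0.5 * (sv / s1 ** 2 * q_between + q_within / sv)
    b = np.trace(W @ W + W.T @ W)
    return D ** 2 / (((T - 1) + sv ** 2 / s1 ** 2) * b)
```

The quadratic forms are evaluated through time means and within deviations, which avoids building the $NT \times NT$ Kronecker products explicitly.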
The new samples obtained through ordinary Bootstrap sampling cannot be applied directly to a panel data model, because they do not approximate the population closely enough and do not account for the multidimensionality of panel data. Beran (1998) [16] therefore proposed the Double Bootstrap (DB) method, which can be used to test spatial dependence in a spatial panel data model. With the DB method, the distribution of the test statistic is obtained through resampling and made to approximate the true distribution. The DB method therefore not only yields a more accurate p value than ordinary Bootstrap sampling, but also improves the accuracy with which population properties are captured in a spatial panel data econometric model.
DB sampling is divided into two steps. initial Bootstrap samples will be generated by the first sampling, which are used to calculate the Bootstrap statistics , for j = 1, 2 … . Bootstrap samples, which are generated by the second sampling on the basis of each of the first samples, are used to calculate the Bootstrap statistics , for = 1, 2 … . However, through this method, the operation time is too long and the computation is too much. For every Bootstrap sample, the test statistics will be computed, thus test statistics are needed by using DB sampling.
Davidson and MacKinnon (2002) [17] further put forward the Fast Double Bootstrap (FDB) method to optimize DB sampling. (In earlier work, we used Monte Carlo simulation to compare the DB and FDB methods for some of the tests, and found that the FDB method greatly reduces computation time while performing very similarly to the DB method.) Using the FDB method to generate
initial Bootstrap samples, we obtain test statistics
. And to obtain one test statistic
on the basis of a second Bootstrap sampling, we need to calculate
test statistics. In this way, the FDB method greatly reduces Bootstrap sampling time and largely simplifies the operation. The formula for the
p value of FDB is given by MacKinnon [
18] (2006), as follows:
where
denotes the
p value of FDB,
denotes
p value of ordinary Bootstrap method, and
denotes
quantile of the statistic
under the second Bootstrap sampling [
19].
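A minimal sketch of this calculation is given below, assuming a right-tailed test (large LM values reject) and the usual FDB form in which the sample statistic is replaced by the $1 - \hat p^*$ quantile of the second-level statistics. The function and argument names are hypothetical, and the linear interpolation used by `np.quantile` is one of several reasonable quantile conventions.

```python
import numpy as np

def fdb_p_value(tau_hat, tau_first, tau_second):
    """Return (FDB p value, ordinary Bootstrap p value).

    tau_hat    : statistic computed from the observed sample.
    tau_first  : B first-level Bootstrap statistics.
    tau_second : B second-level Bootstrap statistics (one per first-level sample).
    """
    tau_first = np.asarray(tau_first, dtype=float)
    tau_second = np.asarray(tau_second, dtype=float)

    # Ordinary Bootstrap p value: share of first-level statistics above tau_hat.
    p_star = float(np.mean(tau_first > tau_hat))

    # (1 - p_star) quantile of the second-level statistics.
    q = np.quantile(tau_second, 1.0 - p_star)

    # FDB p value: share of first-level statistics above that quantile.
    p_fdb = float(np.mean(tau_first > q))
    return p_fdb, p_star
```

Only one second-level statistic is needed per first-level sample, which is what keeps the cost close to that of an ordinary Bootstrap test.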
In this paper, we employ a residual FDB method and a Wild Bootstrap method [20], the latter in both symmetric and asymmetric forms, for the LM-Error test statistic in a spatial panel data model. The steps are as follows:
(1) Based on the observation data set and without spatial dependence, we use a ML method to estimate the model in (2), and then we obtain estimated parameter and residual vector . Given (3), we calculate asymptotic LM-Error statistics and the p value of the asymptotic test.
(2) For the vector , we use Bootstrap sampling with N times replacement to generate a Bootstrap sample , where is a repeatable random order permutation of each element of . If , we suggest a standard residual Bootstrap method under heteroscedastic errors. If (for ), the Wild Bootstrap method is better, which can solve heteroscedasticity in the model. There, has two forms:
① Symmetric Wild FDB:
② Asymmetric Wild FDB:
In (5) and (6), , , respectively, correspond to the symmetric Wild Bootstrap and asymmetric Wild Bootstrap. In (5), the probability of the random variable , which is equal to 1 or −1, is . Similarly, the probability of the random variable , which is equal to or , is, respectively, and .
(3) As a result, we obtain , and residual , which are used to define the new dependent variable as . To obtain a new estimated parameter and residual vector , we apply the OLS method to estimate the model, and based on vector , calculate the first sampling Bootstrap LM-Error test statistics as given in (3), denoting it by .
(4) Bootstrap sample , which is generated via Bootstrap sampling for again, is used to define . As the preceding step, the new estimated parameter and are given by OLS estimation. According to (3), we obtain the second Bootstrap sampling LM-Error test statistic, denoted as .
(5) Repeat steps (2) to (4) B times. Thus, the B first Bootstrap sampling LM-Error test statistics (for ) and the B second Bootstrap sampling LM-Error test statistics will be generated.
(6) is the p value of first Bootstrap sampling, which we substitute into (4) to calculate the FDB p value .
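The resampling loop in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: OLS is used at every level (the paper uses ML for the initial fit), `stat_fn` is a placeholder for any function that maps a residual vector to a test statistic, and all function names are hypothetical. The two Wild Bootstrap weight distributions are the standard two-point choices (Rademacher for the symmetric case, Mammen for the asymmetric case), both with mean 0 and variance 1.

```python
import numpy as np

SQRT5 = np.sqrt(5.0)

def draw_errors(resid, scheme, rng):
    """Draw one Bootstrap error vector from the residuals."""
    n = resid.size
    if scheme == "residual":       # resample residuals with replacement
        return rng.choice(resid, size=n, replace=True)
    if scheme == "wild_sym":       # Rademacher: +1 or -1, each with prob. 1/2
        v = rng.choice([-1.0, 1.0], size=n)
    elif scheme == "wild_asym":    # Mammen two-point distribution
        lo, hi = -(SQRT5 - 1) / 2, (SQRT5 + 1) / 2
        p_lo = (SQRT5 + 1) / (2 * SQRT5)
        v = rng.choice([lo, hi], size=n, p=[p_lo, 1 - p_lo])
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return resid * v               # wild schemes rescale each residual

def ols(y, X):
    """Return the OLS coefficients and residuals."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta, y - X @ beta

def fdb_statistics(y, X, stat_fn, B=399, scheme="residual", seed=0):
    """Return the sample statistic and the B first- and second-level statistics."""
    rng = np.random.default_rng(seed)
    beta, resid = ols(y, X)                  # fit under the null
    tau_hat = stat_fn(resid)
    tau_first = np.empty(B)
    tau_second = np.empty(B)
    for j in range(B):
        # First level: rebuild y from the fitted model and Bootstrap errors.
        y1 = X @ beta + draw_errors(resid, scheme, rng)
        beta1, resid1 = ols(y1, X)
        tau_first[j] = stat_fn(resid1)
        # Second level: one further resample based on the first-level fit.
        y2 = X @ beta1 + draw_errors(resid1, scheme, rng)
        _, resid2 = ols(y2, X)
        tau_second[j] = stat_fn(resid2)
    return tau_hat, tau_first, tau_second
```

The three returned quantities are exactly what is needed to form the ordinary Bootstrap p value and then the FDB p value.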
Next, on the basis of Monte Carlo simulation, we examine the efficiency of the LM test for spatial dependence in the spatial panel data model when the FDB method is used.
The size distortion of the Bootstrap LM-Error test in the spatial panel data model is the difference between the actual test level and the given nominal significance level. The smaller the size distortion, the closer the Bootstrap test level is to the nominal significance level and the more reliable the hypothesis test result. The power of the Bootstrap LM-Error test is the probability that the test rejects the null hypothesis when the null hypothesis does not hold, i.e., when there is spatial dependence in the spatial panel model. The size distortion is given by:
where
denotes significance level, as usual
= 0.05, and
denotes Monte Carlo simulation times.
is indicator function, equal to 1 when
, otherwise, 0.
Furthermore, the power of Bootstrap LM-Error test in spatial panel data model is given by:
In the following section, following the steps above, we use Monte Carlo simulation to study the efficiency of the spatial dependence test in the spatial panel data model under the null and the alternative hypotheses from two aspects: size distortion and power.
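As a small sketch of these two measures, the functions below compute the empirical rejection rate from a set of Monte Carlo p values and compare it with the nominal level. The function names are hypothetical, and the strict inequality in the rejection rule (reject when the p value is below the significance level) is an assumption.

```python
import numpy as np

def rejection_rate(p_values, alpha=0.05):
    """Share of Monte Carlo replications in which the test rejects."""
    return float(np.mean(np.asarray(p_values, dtype=float) < alpha))

def size_distortion(p_values_under_null, alpha=0.05):
    """Empirical rejection rate under the null minus the nominal level."""
    return rejection_rate(p_values_under_null, alpha) - alpha

def power(p_values_under_alternative, alpha=0.05):
    """Empirical rejection rate when the null hypothesis is false."""
    return rejection_rate(p_values_under_alternative, alpha)
```

A size distortion of 0 means that the test rejects a true null hypothesis exactly as often as the nominal level prescribes.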
3. Monte Carlo Simulation Experiment
In this section, we use Monte Carlo simulation experiments to study the size distortion and power of the Bootstrap LM-Error test and the asymptotic test in the spatial panel data model, in order to assess the efficiency of the Bootstrap LM-Error test. Generally, the smaller the size distortion and the greater the power, the more valid the test.
3.1. Parameter Setting and Experimental Procedures of Monte Carlo Simulation Experiment
The parameters of simulation experiment are given as follows:
(1) The data generation process is spatial error model given by:
where,
, , respectively, denote matrices drawn from the uniform distribution and the standard normal distribution. The spatial weight matrix is specified as either the Rook or the Queen adjacency matrix. The Queen adjacency matrix treats two regions as adjacent if they share a common boundary or a common vertex, whereas the Rook adjacency matrix treats two regions as adjacent only if they share a common boundary.
(2) In (9), is the intercept and is all the matrixes of the N-dimension. We let , . And indicates random effects with assumption, . Notice that and are mutually independent, and .
(3) Error term is set to (, , ). Specifically speaking, denotes the standard normal error imposed and denotes heteroscedastic distribution, which is specified as the product of independent variable and random generating normally distributed variable. is time-series error, which is generated by , with and presents serial correlation coefficient for time-series.
(4) Study size distortion of LM-Error test in spatial panel data model by setting , and the study power when under condition of , which interval specified as 0.1.
(5) Run 5000 Monte Carlo replications. The number of Bootstrap replications is B = 199, 299, …, 999.
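The Rook and Queen matrices described in item (1) can be sketched as follows. The regular rectangular lattice is an assumption (it is suggested by the sample sizes N = 49 and N = 100, which are perfect squares), as are the row standardization and the function name.

```python
import numpy as np

def lattice_weights(rows, cols, kind="rook"):
    """Row-standardized contiguity matrix for a rows x cols regular lattice."""
    if kind == "rook":     # shared edge only
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif kind == "queen":  # shared edge or shared vertex
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)]
    else:
        raise ValueError(f"unknown contiguity type: {kind}")

    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in steps:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r * cols + c, rr * cols + cc] = 1.0
    # Each row is divided by its number of neighbors
    # (the lattice must contain at least two cells).
    return W / W.sum(axis=1, keepdims=True)
```

For example, `lattice_weights(7, 7, "rook")` gives a 49 x 49 matrix in which an interior cell has four neighbors, each with weight 1/4.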
According to the above-mentioned setting, the Monte Carlo simulation experimental procedures are as follows:
(1) Based on computer-generated variables and , we obtain the dependent variable from (9) and thereby sample .
(2) Conduct Bootstrap LM-Error test to obtain 1+B spatial dependence statistics and , , … ; we then calculate the p value of the FDB LM-Error test and the asymptotic LM-Error test denoted and , respectively.
(3) Repeating step (1), (2) M times, we obtain the M p values of the FDB LM-Error test and asymptotic LM-Error test, i.e., and (for ).
(4) Calculate the size distortion and power of asymptotic LM-Error test and FDB LM-Error test based on (7) and (8).
Following the steps above, we implement the simulation experiment in GAUSS 10.0.
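Step (1) of the procedure, data generation, can be sketched in Python as below. This is an illustrative stand-in rather than the GAUSS program: the coefficient values, the uniform range, the AR(1) form of the time-series error, and all names (`simulate_sem_panel`, `lam`, `phi`, `beta0`, `beta1`) are assumptions made for the example.

```python
import numpy as np

def simulate_sem_panel(W, T, lam, error="normal", phi=0.5,
                       beta0=1.0, beta1=1.0, seed=0):
    """Simulate a random effects panel with spatially correlated errors.

    Returns y and x as (T, N) arrays, one row per period.
    """
    rng = np.random.default_rng(seed)
    N = W.shape[0]
    x = rng.uniform(0.0, 1.0, size=(T, N))   # regressor, uniform draws
    mu = rng.standard_normal(N)              # random individual effects
    e = rng.standard_normal((T, N))          # standard normal innovations

    if error == "normal":
        eps = e
    elif error == "hetero":                  # regressor times normal draw
        eps = x * e
    elif error == "ar1":                     # serial correlation over time
        eps = np.empty((T, N))
        eps[0] = e[0]
        for t in range(1, T):
            eps[t] = phi * eps[t - 1] + e[t]
    else:
        raise ValueError(f"unknown error type: {error}")

    # Spatial error process per period:
    # u_t = lam * W u_t + eps_t, so u_t = (I - lam * W)^{-1} eps_t.
    A = np.eye(N) - lam * W
    u = np.linalg.solve(A, eps.T).T

    y = beta0 + beta1 * x + mu + u           # mu is broadcast over periods
    return y, x
```

Setting `lam=0.0` generates data under the null hypothesis (for size), while nonzero values generate data under the alternative (for power).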
3.2. Monte Carlo Simulation Results
Through Monte Carlo simulation, we study the efficiency of the FDB method when it is applied to the Bootstrap LM-Error test in a spatial panel data model with random effects, in the presence of normally distributed errors, heteroscedastic errors or time-series autocorrelation. We combine the FDB method with the asymmetric Wild Bootstrap method to perform our simulation and, due to limited space, report results only for the Rook matrix as the spatial weight matrix.
3.2.1. Size Distortion
Let and individual effect be the random effect. Respectively, under conditions that the error term obeys normal distribution (), or the presence of heteroscedastic errors () or autocorrelation in time-series (), we are interested in size distortion of FDB LM-Error test under the influence of Bootstrap simulation times (B = 199, 299 … 999), Rook matrix as spatial weight matrix, and sample size given by (N,T) = (49,10), (49,30), (100,10).
(1) Error term with normal distribution
In Figure 1, the horizontal axis represents the number of Bootstrap replications (B = 199, 299, …, 999), and the vertical axis shows the size distortion. The label ‘Theory’ stands for the theoretical value of the size distortion, i.e., 0. ‘Bp’ and ‘Asy’ show the size distortion of the FDB LM-Error test and the asymptotic LM-Error test, respectively. The smaller the size distortion, and the closer the size distortion curve is to the theoretical curve, the more valid the test.
Under standard normal errors, the Bp curve gradually approaches the Theory curve as the sample size increases, i.e., the size distortion of the FDB LM-Error test decreases, whereas the Asy curve gradually moves away from the Theory curve. The size distortion of the FDB LM-Error test is therefore smaller than that of the asymptotic test. Moreover, the size distortion curves of both tests level off as the number of Bootstrap replications increases.
(2) The presence of heteroscedastic errors
When the error term is heteroscedastic, we study how different sample sizes and numbers of Bootstrap replications influence the size distortion of the LM-Error test (see Figure 2).
Under heteroscedastic errors, with the Rook matrix as the spatial weight matrix, the size distortion of the FDB LM-Error test decreases as the sample size increases, and the Bp curve gradually approaches the Theory curve, whereas the Asy curve lies far from the Theory curve and thus shows a larger size distortion. In conclusion, the asymptotic LM-Error test has a large size distortion, which the FDB LM-Error test remedies, so the FDB test is more valid in the same situation.
(3) Time serial correlated errors
When we let the correlation coefficient be
, then the different sample size and the Bootstrap simulation times influence the size distortion of the FDB LM-Error test; this can be seen in
Figure 3.
As shown in Figure 3, under time-series correlated errors, the asymptotic LM-Error test has a large size distortion, while that of the FDB LM-Error test is very small and close to the Theory value of 0. Moreover, as the sample size increases, the size distortion of the asymptotic LM-Error test becomes larger, whereas that of the FDB LM-Error test remains stable and converges to the Theory value of 0. As the number of Bootstrap replications increases, the size distortion of the FDB LM-Error test also stays close to 0 and levels off. Therefore, the FDB LM-Error test is a more valid method for testing spatial dependence.
Furthermore, the Bootstrap simulations above show that the results tend to be stable once the number of Bootstrap replications reaches 399. In the following experiment, we set B = 399, take the sample sizes (N,T) = (49,10), (49,20), (49,30) and use the Rook matrix. We then examine how the size distortion of the asymptotic LM-Error test and the FDB LM-Error test changes as the time-series correlation coefficient and the time dimension T change.
As we can see from
Figure 4, when
increases, the size distortion of the asymptotic LM-Error test rises sharply and moves far away from the Theory value of 0. By contrast, the FDB LM-Error test performs well: its size distortion is very small and always close to 0. Thus, the FDB LM-Error test effectively corrects the size distortion of the asymptotic LM-Error test. Furthermore, as T changes from 10 to 30, the size distortion of the FDB LM-Error test remains stable, while that of the asymptotic LM-Error test becomes larger. This indicates that as the disturbance in the errors grows (i.e., as the time-series correlation coefficient increases), the asymptotic LM-Error test deviates further from the Theory value because it cannot deal with such disturbances, whereas the FDB LM-Error test mitigates them through resampling, so that its simulation results always remain near the Theory value. Consequently, the FDB LM-Error test is more valid than the asymptotic LM-Error test.
3.2.2. Power
When and ( meaning test level, covered earlier), letting the Bootstrap simulation times be B=399, taking the Rook matrix as spatial weight matrix, and the sample size is respectively (49,10), (49,30), (100,10), we research the power of the FDB LM-Error test for testing the spatial dependence in the spatial panel data model under normal distributed errors, heteroscedastic errors or time-serials correlation.
(1) Error term with normal distribution
Under the conditions described above, the power of the asymptotic LM-Error test and the FDB LM-Error test is shown in Figure 5.
In
Figure 5, ‘Asy’ and ‘Bp’ still, respectively, stands for asymptotic test and FDB test. The horizontal axis represents the correlation coefficient of spatial error
, and the longitudinal axis reflects the power. With the growing sample size, we find that the power of the asymptotic LM-Error test and FDB LM-Error test increases as
increases. Furthermore, the power of the FDB LM-Error test is quite close to that of asymptotic LM-Error test. Moreover, when
, the power of both are almost equal to 1, which show that when there is the presence of large spatial dependence, both tests are quite valid.
(2) The presence of heteroscedastic errors
The parameters are set as above. The power of the asymptotic LM-Error test and the FDB LM-Error test is shown in Figure 6.
Comparing across sample sizes, the power of both the asymptotic LM-Error test and the FDB LM-Error test increases as the sample size grows, and the power of the FDB LM-Error test is greater than or equal to that of the asymptotic LM-Error test. Meanwhile, both powers are almost equal to 1 when the condition is satisfied. Thus, both the FDB LM-Error test and the asymptotic LM-Error test perform well when the spatial dependence is large.
(3) Time serial correlated errors
In this section, we set the correlation coefficient as
, and for the others it is the same as previously mentioned. Then,
Figure 7 shows the power of the FDB LM-Error test for testing spatial dependence in the spatial panel data model with random effects, under the condition that the error term has a time-series correlation.
Comparing across sample sizes, the power of both the FDB LM-Error test and the asymptotic LM-Error test increases with the sample size, and the former is greater than or equal to the latter. When , they are almost equal to 1.
We further focus on the influence of the correlation coefficient
and time dimension T on the power of asymptotic LM-Error test and FDB LM-Error test. By setting the Rook matrix as the spatial weight matrix, Bootstrap times B = 399, and taking three sample size as (49,10), (49,20), (49,30), which different from aforementioned sample size, and letting
,
, the power of the two tests can be seen in
Figure 8.
It is not very difficult to find that both powers of the FDB LM-Error test and the asymptotic LM-Error present a decreasing trend while increases, and they are very close. As T changes from 10 to 30, the power of the two tests both increase and their curve nearly overlaps.
In summary, the Monte Carlo simulation results indicate that the asymptotic LM-Error test has a large size distortion when the error term violates the classical normal distribution in the spatial panel data model, while the FDB LM-Error test achieves a small size distortion with essentially no loss of power. Therefore, the FDB LM-Error test is a more valid method for testing spatial dependence in the spatial panel data model.