Extended Abstract

Testing Goodness-of-Fit of Parametric Spatial Trends †

by Andrea Meilán-Vila 1,*, Jean Opsomer 2, Mario Francisco-Fernández 1 and Rosa M. Crujeiras 3

1 Departamento de Matemáticas, Universidade da Coruña, 15071 A Coruña, Spain
2 Westat Inc., Rockville, MD 20850, USA
3 Departamento de Estadística, Análisis Matemático y Optimización, Universidade de Santiago de Compostela, 15782 Santiago de Compostela, Spain
* Author to whom correspondence should be addressed.
† Presented at the XoveTIC Congress, A Coruña, Spain, 27–28 September 2018.
Proceedings 2018, 2(18), 1185; https://doi.org/10.3390/proceedings2181185
Published: 17 September 2018
(This article belongs to the Proceedings of XoveTIC Congress 2018)

Abstract

The aim of this work is to propose and analyze the behavior of a test statistic for assessing a parametric trend surface, that is, a parametric form for the trend of a regression model with spatially correlated errors. The asymptotic behavior of the test under the null hypothesis, as well as its asymptotic power under local alternatives, will be analyzed. The finite-sample performance of the test is assessed by simulation, using a bootstrap calibration procedure.

1. Introduction

Consider a spatial stochastic process, that is, a collection of random variables indexed on a certain domain of $\mathbb{R}^2$, with a well-defined joint distribution. In this framework, the observed data usually exhibit an important feature: nearby observations tend to be more similar than those far apart. Therefore, such observations cannot be treated as independent, and the dependence structure should be taken into account in any descriptive or inferential procedure. In particular, in spatial regression models (a trend surface plus an error term), the dependence structure should be properly incorporated into the model.
A common task in statistics is to determine whether a parametric model is an appropriate representation of a dataset. Under the assumption of independent errors, several authors have developed goodness-of-fit tests for parametric regression models that compare the parametric fit with a smooth alternative obtained by nonparametric regression; see, for example, [1,2].
A new proposal for testing a parametric trend surface is given in this paper. The proposed test is based on a comparison, in terms of a distance, between a smooth version of the parametric fit and a nonparametric estimator of the trend (specifically, the multivariate local linear estimator).

2. Statistical Model

Let $\{Z(\mathbf{s}), \mathbf{s} \in D\}$ be a random spatial process consisting of a collection of random variables indexed in a domain $D \subset \mathbb{R}^2$, with a well-defined joint distribution. Consider $n$ locations $\{\mathbf{s}_1, \ldots, \mathbf{s}_n\}$ in the region $D$, generated from a density $f$. The random variables at those locations are denoted by $\{Z(\mathbf{s}_1), \ldots, Z(\mathbf{s}_n)\}$. Assume the model
$$Z(\mathbf{s}_i) = m(\mathbf{s}_i) + \varepsilon(\mathbf{s}_i), \quad i = 1, \ldots, n, \tag{1}$$
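where $m$ is an unknown smooth regression function, assumed to be twice continuously differentiable. The errors $\varepsilon(\mathbf{s}_i)$ are unobserved random variables with
$$\mathbb{E}[\varepsilon(\mathbf{s}_i)] = 0, \quad \operatorname{Cov}(\varepsilon(\mathbf{s}_i), \varepsilon(\mathbf{s}_j)) = \sigma^2 \rho_n(\mathbf{s}_i - \mathbf{s}_j), \quad i, j = 1, \ldots, n,$$
where $\sigma^2 < \infty$ and $\rho_n$ is a continuous correlation function satisfying $\rho_n(\mathbf{0}) = 1$, $\rho_n(\mathbf{s}) = \rho_n(-\mathbf{s})$ and $|\rho_n(\mathbf{s})| \leq 1$ for all $\mathbf{s}$. The goal of this work is to test whether the trend function belongs to a parametric family:
$$H_0: m \in \mathcal{M}_\beta = \{m_\beta, \beta \in B\} \quad \text{vs.} \quad H_a: m \notin \mathcal{M}_\beta, \tag{2}$$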
where $B \subset \mathbb{R}^p$ is a compact set. A usual approach is to compare a smooth version of a parametric fit with a nonparametric estimator of $m(\mathbf{s})$ and then to reject $H_0$ if the distance between both fits exceeds a critical value.

3. Test Statistic

A suitable test statistic for the testing problem (2) is a weighted $L_2$-distance between the nonparametric and parametric fits, as in [2]:
$$T_n = n|H|^{1/2} \int_D \left(\hat m_H^{LL}(\mathbf{s}) - \hat m_{H,\hat\beta}^{LL}(\mathbf{s})\right)^2 w(\mathbf{s})\, d\mathbf{s},$$
where $w$ is a weight function. The elements of $T_n$ are fully defined in Appendix A. The critical values are calibrated with a bootstrap procedure, described in Appendix B.

4. Simulations

In this section, a simulation study illustrating the performance of the bootstrap procedure is presented. For this purpose, 500 samples of size $n = 400$ are generated from an isotropic spatial process observed at regularly spaced locations $\{\mathbf{s}_1, \ldots, \mathbf{s}_n\}$ in the unit square, where $\mathbf{s}_i = (s_{i1}, s_{i2})$, $i = 1, \ldots, n$, following the model
$$Z(\mathbf{s}_i) = 2 + s_{i1} + s_{i2} + c\, s_{i1}^3 + \varepsilon(\mathbf{s}_i), \quad 1 \leq i \leq n.$$
The random errors $\varepsilon(\mathbf{s}_i)$ are normally distributed with zero mean and exponential covariance function $\operatorname{Cov}(\varepsilon(\mathbf{s}_i), \varepsilon(\mathbf{s}_j)) = \sigma^2 \exp(-\|\mathbf{s}_i - \mathbf{s}_j\| / a_e)$, with $\sigma = 0.4$ or $\sigma = 0.8$. Different values of the parameter $a_e$ are considered: $a_e = 0.4, 0.6, 0.8$. The bootstrap procedure was performed with $B = 500$ replicates for each sample. The weight function was $w(\mathbf{s}) = 1$. For simplicity, the bandwidth matrix was taken as $H = \operatorname{diag}(h, h)$, with bandwidths $h = 0.10, 0.15, 0.20$.
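The simulation design above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' code; the exact placement of the regular grid (here, cell midpoints of a 20 × 20 grid, giving $n = 400$) and the helper name `simulate_sample` are our assumptions.

```python
import numpy as np


def simulate_sample(c, sigma, a_e, n_side=20, seed=None):
    """Draw one sample Z(s_i) = 2 + s_i1 + s_i2 + c*s_i1**3 + eps(s_i).

    Locations: midpoints of a regular n_side x n_side grid on [0, 1]^2
    (an assumed layout; n = n_side**2 = 400 by default).
    """
    rng = np.random.default_rng(seed)
    g = (np.arange(n_side) + 0.5) / n_side
    g1, g2 = np.meshgrid(g, g, indexing="ij")
    locs = np.column_stack([g1.ravel(), g2.ravel()])  # shape (n, 2)

    # Trend: linear when c = 0 (null hypothesis), cubic term otherwise.
    trend = 2.0 + locs[:, 0] + locs[:, 1] + c * locs[:, 0] ** 3

    # Exponential covariance: sigma^2 * exp(-||s_i - s_j|| / a_e).
    dist = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    cov = sigma**2 * np.exp(-dist / a_e)

    # Gaussian errors with covariance `cov`, via its Cholesky factor.
    chol = np.linalg.cholesky(cov)
    eps = chol @ rng.standard_normal(len(locs))
    return locs, trend + eps, cov


locs, z, cov = simulate_sample(c=5, sigma=0.4, a_e=0.6, seed=1)
print(locs.shape, z.shape)  # (400, 2) (400,)
```

Drawing 500 such samples per configuration mimics the structure of the study; rejection rates obtained this way need not match Table 1 exactly, since seeds and the grid layout are assumptions.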
Table 1 presents the simulated rejection proportions of $T_n$ at significance level $\alpha = 0.05$ over the 500 trials. When $c = 0$ (the null hypothesis of a linear trend holds), the proportion of rejections is close to the nominal level, although it depends on the bandwidth $h$ and decreases as $h$ increases. When $c = 5$ or $c = 10$, the test shows good power, with rejection proportions between about 0.80 and 0.95. Again, these proportions depend on the bandwidth, and they are lowest for the configuration with the largest $\sigma$ and $a_e$ ($\sigma = 0.8$, $a_e = 0.8$).

Funding

This research has received financial support from the Xunta de Galicia and the European Union (European Social Fund-ESF). This research has been partially supported by MINECO grants MTM2014-52876-R, MTM2016-76969-P and MTM2017-82724-R and by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016-015 and Centro Singular de Investigación de Galicia ED431G/01), all of them through the ERDF.

Conflicts of Interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

The trend surface can be estimated using either a parametric or a nonparametric approach. In the parametric context, an iterative estimation procedure can be used. Denoting $\mathbf{Z} = (Z(\mathbf{s}_1), \ldots, Z(\mathbf{s}_n))^\top$ and $\mathbf{m}_\beta = (m_\beta(\mathbf{s}_1), \ldots, m_\beta(\mathbf{s}_n))^\top$, under $H_0$ the steps of the procedure are:
(1) Based on the sample, estimate the trend parameter $\beta$ using the ordinary least squares estimator, ignoring the dependence structure of the errors:
$$\tilde\beta = \arg\min_{\beta} (\mathbf{Z} - \mathbf{m}_\beta)^\top (\mathbf{Z} - \mathbf{m}_\beta).$$
(2) Estimate the variance-covariance matrix $\Sigma$ of the errors using the residuals $\tilde\varepsilon(\mathbf{s}_i) = Z(\mathbf{s}_i) - m_{\tilde\beta}(\mathbf{s}_i)$, $i = 1, \ldots, n$, obtained from the trend estimate of Step (1). Note that the entries of $\Sigma$ are
$$\Sigma(i, j) = C_\theta(\mathbf{s}_i - \mathbf{s}_j), \quad i, j = 1, \ldots, n,$$
where $C_\theta(\mathbf{s}_i - \mathbf{s}_j) = \sigma^2 - \gamma_\theta(\mathbf{s}_i - \mathbf{s}_j)$ and $\{2\gamma_\theta(\mathbf{u}) : \theta \in \Theta \subset \mathbb{R}^q\}$ is a valid parametric family of variogram functions.
(3) Estimate the trend parameter $\beta$ using the weighted least squares estimator, taking the dependence structure of the errors into account:
$$\hat\beta = \arg\min_{\beta} (\mathbf{Z} - \mathbf{m}_\beta)^\top \tilde\Sigma^{-1} (\mathbf{Z} - \mathbf{m}_\beta).$$
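Steps (1) and (3) can be illustrated for the linear trend family used in the simulations, $m_\beta(\mathbf{s}) = \beta_0 + \beta_1 s_1 + \beta_2 s_2$. The sketch below is an illustration under our own assumptions, not the authors' implementation: `design_matrix`, `ols_fit` and `gls_fit` are hypothetical helper names, and the weighted least squares step is computed by Cholesky whitening, which minimizes the same criterion as Step (3).

```python
import numpy as np


def design_matrix(locs):
    # Hypothetical linear trend family: m_beta(s) = b0 + b1*s1 + b2*s2.
    return np.column_stack([np.ones(len(locs)), locs[:, 0], locs[:, 1]])


def ols_fit(locs, z):
    """Step (1): ordinary least squares, ignoring spatial dependence."""
    X = design_matrix(locs)
    beta_tilde, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta_tilde, z - X @ beta_tilde  # estimate and residuals


def gls_fit(locs, z, sigma_tilde):
    """Step (3): minimize (z - X b)' Sigma^{-1} (z - X b).

    With Sigma = L L', this equals ||L^{-1}(z - X b)||^2, i.e. ordinary
    least squares on the whitened data L^{-1} X and L^{-1} z.
    """
    X = design_matrix(locs)
    chol = np.linalg.cholesky(sigma_tilde)
    Xw = np.linalg.solve(chol, X)
    zw = np.linalg.solve(chol, z)
    beta_hat, *_ = np.linalg.lstsq(Xw, zw, rcond=None)
    return beta_hat
```

Whitening avoids forming $\tilde\Sigma^{-1}$ explicitly, which is numerically preferable when neighbouring observations are strongly correlated.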
Therefore, the parametric trend estimator considered is $m_{\hat\beta}$. Note that an estimate of $\Sigma$ can be obtained from the residuals $\tilde\varepsilon(\mathbf{s}_i)$, $i = 1, \ldots, n$, as follows:
$$\tilde\Sigma(i, j) = C_{\tilde\theta_{LS}}(\mathbf{s}_i - \mathbf{s}_j) = \tilde\sigma^2 - \gamma_{\tilde\theta_{LS}}(\mathbf{s}_i - \mathbf{s}_j), \quad i, j = 1, \ldots, n,$$
where $\gamma_{\tilde\theta_{LS}}$ is the parametric least squares estimator of the variogram and $\tilde\sigma^2$ is an estimator of the variance, which can also be obtained by least squares.
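One way to obtain $\tilde\theta_{LS}$ and $\tilde\sigma^2$ is sketched below, assuming an exponential semivariogram without nugget, $\gamma_\theta(u) = \sigma^2\{1 - \exp(-u/a)\}$, so that $C_\theta(u) = \sigma^2 - \gamma_\theta(u) = \sigma^2 \exp(-u/a)$. The binning rule, the grid search over $a$ and the function names are our assumptions; the source only states that a least squares fit is used. For fixed $a$ the least squares sill has a closed form, which keeps the sketch free of external optimizers.

```python
import numpy as np


def empirical_semivariogram(locs, resid, n_bins=15, max_dist=None):
    """Binned Matheron estimator: mean of (e_i - e_j)^2 / 2 per distance bin."""
    d = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    sq = (resid[:, None] - resid[None, :]) ** 2
    iu = np.triu_indices(len(resid), k=1)  # each pair once
    d, sq = d[iu], sq[iu]
    if max_dist is None:
        max_dist = d.max() / 2  # rule-of-thumb cutoff (assumption)
    edges = np.linspace(0.0, max_dist, n_bins + 1)
    idx = np.digitize(d, edges) - 1
    centers, gammas = [], []
    for k in range(n_bins):
        mask = idx == k
        if mask.any():
            centers.append(d[mask].mean())
            gammas.append(sq[mask].mean() / 2)
    return np.array(centers), np.array(gammas)


def fit_exponential_variogram(u, gamma, a_grid=None):
    """Least squares fit of gamma(u) = sill * (1 - exp(-u / a))."""
    if a_grid is None:
        a_grid = np.linspace(0.01, 2.0, 400)
    best = None
    for a in a_grid:
        g = 1.0 - np.exp(-u / a)
        sill = (g @ gamma) / (g @ g)  # closed-form LS sill for this a
        sse = np.sum((gamma - sill * g) ** 2)
        if best is None or sse < best[0]:
            best = (sse, sill, a)
    return best[1], best[2]  # (sigma_tilde^2, a_tilde)


def covariance_from_variogram(locs, sill, a):
    """Sigma_tilde(i, j) = sill - gamma(||s_i - s_j||) = sill * exp(-d / a)."""
    d = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    return sill - sill * (1.0 - np.exp(-d / a))
```

Feeding the OLS residuals of Step (1) into `empirical_semivariogram`, then `fit_exponential_variogram`, yields a matrix $\tilde\Sigma$ that can be passed to the weighted least squares step.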
From a nonparametric point of view, model (1) has been studied by several authors, often with kernel-based methods. Here, the trend is estimated using the multivariate local linear estimator [3]. In the spatial framework, the local linear estimator of $m(\mathbf{s})$ at a location $\mathbf{s}$ can be written explicitly as
$$\hat m_H^{LL}(\mathbf{s}) = \mathbf{e}_1^\top (\mathbf{X}_{\mathbf{s}}^\top \mathbf{W}_{\mathbf{s}} \mathbf{X}_{\mathbf{s}})^{-1} \mathbf{X}_{\mathbf{s}}^\top \mathbf{W}_{\mathbf{s}} \mathbf{Z},$$
where $\mathbf{e}_1 = (1, 0, 0)^\top$, $\mathbf{X}_{\mathbf{s}}$ is an $n \times 3$ matrix whose $i$-th row is $(1, (\mathbf{s}_i - \mathbf{s})^\top)$, $i = 1, \ldots, n$, and $\mathbf{W}_{\mathbf{s}} = \operatorname{diag}\{K_H(\mathbf{s}_1 - \mathbf{s}), \ldots, K_H(\mathbf{s}_n - \mathbf{s})\}$ contains the kernel weights $K_H(\mathbf{s}) = |H|^{-1} K(H^{-1}\mathbf{s})$. Here $H$ is a $2 \times 2$ symmetric positive definite bandwidth matrix depending on the sample size $n$, and $K$ is a multivariate kernel function. For a given $\mathbf{s}$, the bandwidth $H$ controls the shape and size of the local neighborhood used to estimate $m$.
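The local linear estimator at a single location can be computed directly from the formula above. The sketch assumes a bivariate Gaussian kernel for $K$ (the source does not specify one), and `local_linear` is a hypothetical helper name; the diagonal matrix $\mathbf{W}_{\mathbf{s}}$ is never formed explicitly.

```python
import numpy as np


def gaussian_kernel(u):
    # Standard bivariate normal density, an assumed choice for K.
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2.0 * np.pi)


def local_linear(s, locs, values, H):
    """Return e1' (Xs' Ws Xs)^{-1} Xs' Ws values at location s."""
    diff = locs - s                                   # rows: (s_i - s)'
    Hinv = np.linalg.inv(H)
    # K_H(s_i - s) = |H|^{-1} K(H^{-1}(s_i - s)), computed row-wise.
    w = gaussian_kernel(diff @ Hinv.T) / abs(np.linalg.det(H))
    Xs = np.column_stack([np.ones(len(locs)), diff])  # n x 3
    XtW = Xs.T * w                                    # Xs' Ws without building Ws
    coef = np.linalg.solve(XtW @ Xs, XtW @ values)
    return coef[0]                                    # e1 picks the intercept
```

Because the fit is locally linear, it reproduces any linear trend exactly, whatever the bandwidth; with $H = \operatorname{diag}(h, h)$ the kernel weights depend only on $\|\mathbf{s}_i - \mathbf{s}\|/h$.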
With these estimators, the proposed test statistic is
$$T_n = n|H|^{1/2} \int_D \left(\hat m_H^{LL}(\mathbf{s}) - \hat m_{H,\hat\beta}^{LL}(\mathbf{s})\right)^2 w(\mathbf{s})\, d\mathbf{s},$$
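where $w$ is a weight function and $\hat m_{H,\hat\beta}^{LL}$ is a smooth version of the parametric estimator $m_{\hat\beta}$, defined by
$$\hat m_{H,\hat\beta}^{LL}(\mathbf{s}) = \mathbf{e}_1^\top (\mathbf{X}_{\mathbf{s}}^\top \mathbf{W}_{\mathbf{s}} \mathbf{X}_{\mathbf{s}})^{-1} \mathbf{X}_{\mathbf{s}}^\top \mathbf{W}_{\mathbf{s}} \mathbf{m}_{\hat\beta},$$
where $\mathbf{m}_{\hat\beta} = (m_{\hat\beta}(\mathbf{s}_1), \ldots, m_{\hat\beta}(\mathbf{s}_n))^\top$.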
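Since both fits use the same local linear weights, the integrand of $T_n$ is the squared local linear smooth of the residuals $\mathbf{Z} - \mathbf{m}_{\hat\beta}$. The sketch below approximates $T_n$ with $w(\mathbf{s}) = 1$ on $D = [0, 1]^2$ by a midpoint Riemann sum; the grid resolution, the Gaussian kernel and the helper names (`ll_weights`, `compute_tn`) are our assumptions, not part of the source.

```python
import numpy as np


def ll_weights(s, locs, H):
    """Row vector e1' (Xs' Ws Xs)^{-1} Xs' Ws (Gaussian kernel assumed)."""
    diff = locs - s
    u = diff @ np.linalg.inv(H).T
    k = np.exp(-0.5 * np.sum(u**2, axis=1)) / (2.0 * np.pi * abs(np.linalg.det(H)))
    Xs = np.column_stack([np.ones(len(locs)), diff])
    XtW = Xs.T * k
    return np.linalg.solve(XtW @ Xs, XtW)[0]


def compute_tn(locs, z, m_beta_hat, H, n_grid=30):
    """Approximate T_n = n |H|^{1/2} * integral over [0,1]^2 of the squared gap."""
    g = (np.arange(n_grid) + 0.5) / n_grid
    g1, g2 = np.meshgrid(g, g, indexing="ij")
    grid = np.column_stack([g1.ravel(), g2.ravel()])
    cell_area = 1.0 / n_grid**2
    integral = 0.0
    for s in grid:
        a = ll_weights(s, locs, H)
        # Nonparametric fit minus smoothed parametric fit at s.
        gap = a @ z - a @ m_beta_hat
        integral += gap**2 * cell_area
    return len(z) * np.sqrt(np.linalg.det(H)) * integral
```

Here `m_beta_hat` stands for the vector of fitted values $m_{\hat\beta}(\mathbf{s}_i)$; the cost is one $3 \times 3$ solve per integration node.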

Appendix B

Once a suitable test statistic is available, a crucial task is the calibration of the critical value $t_\alpha$ for a given level $\alpha$, that is, the value such that $P_{H_0}(T_n > t_\alpha) = \alpha$. Usually, $t_\alpha$ is approximated using the asymptotic distribution of $T_n$. However, using asymptotic theory to calibrate the test poses some problems, such as the need to estimate nuisance functions and the slow convergence to the limit distribution. Under these circumstances, calibration can instead be carried out by resampling procedures such as the bootstrap; see [4].
The procedure consists of generating a bootstrap sample $\{Z^*(\mathbf{s}_i), i = 1, \ldots, n\}$ and then computing a bootstrap statistic $T_n^*$, defined like $T_n$ as the squared deviation between the smooth version of the parametric fit, $\hat m_{H,\hat\beta^*}^{LL}$, and the nonparametric fit, $\hat m_H^{*LL}$, both computed from the bootstrap sample. The distribution of $T_n^*$ can then be approximated by Monte Carlo. From this approximation, the $(1 - \alpha)$ quantile $t_\alpha^*$ is obtained, and the parametric hypothesis is rejected if $T_n > t_\alpha^*$. The steps of the algorithm used in this work are the following:
  • Obtain the parametric estimate $\hat\beta$ of the trend parameter.
  • Estimate the covariance matrix of the errors, $\hat\Sigma$, from the residuals $\hat{\boldsymbol\varepsilon} = (\hat\varepsilon(\mathbf{s}_1), \ldots, \hat\varepsilon(\mathbf{s}_n))^\top$, where $\hat\varepsilon(\mathbf{s}_i) = Z(\mathbf{s}_i) - m_{\hat\beta}(\mathbf{s}_i)$, $i = 1, \ldots, n$, and find the matrix $L$ such that $\hat\Sigma = L L^\top$ using the Cholesky decomposition.
  • Compute the independent residuals $\mathbf{e} = (e(\mathbf{s}_1), \ldots, e(\mathbf{s}_n))^\top$, given by $\mathbf{e} = L^{-1} \hat{\boldsymbol\varepsilon}$.
  • Center these independent variables and draw from them, with replacement, an independent bootstrap sample of size $n$, denoted by $\mathbf{e}^* = (e^*(\mathbf{s}_1), \ldots, e^*(\mathbf{s}_n))^\top$.
  • Finally, the bootstrap errors are $\boldsymbol\varepsilon^* = (\varepsilon^*(\mathbf{s}_1), \ldots, \varepsilon^*(\mathbf{s}_n))^\top = L \mathbf{e}^*$, and the bootstrap samples are $Z^*(\mathbf{s}_i) = m_{\hat\beta}(\mathbf{s}_i) + \varepsilon^*(\mathbf{s}_i)$, $i = 1, \ldots, n$.
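The resampling steps above can be sketched as follows. This is an illustration under our own assumptions (NumPy, hypothetical function names), not the authors' code; in a full implementation each bootstrap sample would be refitted to obtain $\hat\beta^*$ and $T_n^*$, this would be repeated $B$ times, and the observed $T_n$ would be compared with the empirical $(1 - \alpha)$ quantile.

```python
import numpy as np


def bootstrap_sample(m_beta_hat, resid, sigma_hat, rng):
    """One bootstrap sample Z* = m_beta_hat + L e*, following the steps above."""
    L = np.linalg.cholesky(sigma_hat)        # Sigma_hat = L L'
    e = np.linalg.solve(L, resid)            # decorrelated residuals L^{-1} eps_hat
    e = e - e.mean()                         # center them
    e_star = rng.choice(e, size=e.size, replace=True)  # i.i.d. resampling
    eps_star = L @ e_star                    # re-introduce spatial dependence
    return m_beta_hat + eps_star


def bootstrap_decision(t_obs, t_boot, alpha=0.05):
    """Reject H0 when T_n exceeds the (1 - alpha) quantile of the T_n* values."""
    t_crit = np.quantile(t_boot, 1.0 - alpha)
    return bool(t_obs > t_crit), float(t_crit)
```

Resampling the decorrelated residuals rather than the raw residuals is what allows i.i.d. resampling here; multiplying by $L$ restores the estimated covariance in each bootstrap sample.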

References

  1. Härdle, W.; Mammen, E. Comparing nonparametric versus parametric regression fits. Ann. Stat. 1993, 21, 1926–1947.
  2. Alcalá, J.; Cristóbal, J.; González-Manteiga, W. Goodness-of-fit test for linear models based on local polynomials. Stat. Probab. Lett. 1999, 42, 39–46.
  3. Fan, J.; Gijbels, I. Local Polynomial Modelling and Its Applications: Monographs on Statistics and Applied Probability; CRC Press: Boca Raton, FL, USA, 1996; Volume 66.
  4. Francisco-Fernández, M.; Jurado-Expósito, M.; Opsomer, J.; López-Granados, F. A nonparametric analysis of the spatial distribution of Convolvulus arvensis in wheat-sunflower rotations. Environmetrics 2006, 17, 849–860.
Table 1. Proportion of rejections of the null hypothesis.
σ      a_e    c     h = 0.10   h = 0.15   h = 0.20
0.4    0.4    0     0.052      0.047      0.042
0.4    0.4    5     0.897      0.932      0.911
0.4    0.4    10    0.905      0.948      0.923
0.4    0.6    0     0.054      0.042      0.034
0.4    0.6    5     0.856      0.901      0.898
0.4    0.6    10    0.894      0.926      0.918
0.8    0.8    0     0.068      0.048      0.038
0.8    0.8    5     0.808      0.798      0.806
0.8    0.8    10    0.845      0.803      0.816

