Next Article in Journal
A Compressed Sensing Measurement Matrix Construction Method Based on TDMA for Wireless Sensor Networks
Next Article in Special Issue
Machine Learning Methods for Multiscale Physics and Urban Engineering Problems
Previous Article in Journal
Modeling and Analysis of Social Phenomena: Challenges and Possible Research Directions
Previous Article in Special Issue
Spatial Modeling of Precipitation Based on Data-Driven Warping of Gaussian Processes
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Estimation of the Covariance Matrix in Hierarchical Bayesian Spatio-Temporal Modeling via Dimension Expansion

Department of Mathematics and Statistics, York University, Toronto, ON M3J 1P3, Canada
*
Author to whom correspondence should be addressed.
Entropy 2022, 24(4), 492; https://doi.org/10.3390/e24040492
Submission received: 18 February 2022 / Revised: 18 March 2022 / Accepted: 18 March 2022 / Published: 31 March 2022

Abstract

:
Ozone concentrations are key indicators of air quality. Modeling ozone concentrations is challenging because they change both spatially and temporally with complicated structures. Missing data bring even more difficulties. One of our interests in this paper is to model ozone concentrations in a region in the presence of missing data. We propose a method without any assumptions on the correlation structure to estimate the covariance matrix through a dimension expansion method for modeling the semivariograms in nonstationary fields based on the estimations from the hierarchical Bayesian spatio-temporal modeling technique (Le and Zidek). Further, we apply an entropy criterion (Jin et al.) based on a predictive model to decide if new stations need to be added. This entropy criterion helps to solve the environmental network design problem. For demonstration, we apply the method to the ozone concentrations at 25 stations in the Pittsburgh region studied. The comparison of the proposed method and the one is provided through leave-one-out cross-validation, which shows that the proposed method is more general and applicable.

1. Introduction

Ozone concentrations are the daily maximum 8 h moving averages of hourly ozone concentration data recorded in micrograms per cubic meter, μ g/m 3 , which are key indicators of air quality. Monitoring the changes both spatially and temporally is very important for the assessment of air quality change, which has a great impact on our environment, society and economy. However, modeling the ozone concentrations is not an easy task since the ozone concentrations vary over space and time with complicated spatial structures, temporal structures and spatio-temporal interactions. Furthermore, the presence of missing data brings even more difficulties. As commented in [1], although we cannot escape the “curse of dimensionality”, we can take advantage of recent developments in computing speed and numerical advances (e.g., Markov chain Monte Carlo) that allow us to implement Bayesian spatio-temporal dynamical models in a hierarchical framework. Such a framework provides simple strategies for incorporating complicated spatio-temporal interactions at different stages of the models’ hierarchy, and the models are feasible to be implemented for high-dimensional data. Two popular hierarchical Bayesian spatio-temporal models can be found in [1,2], among others. The latter one was used in [3].
Ref. [3] studied the ozone concentrations within 79 to 81.5 longitude and 39.5 to 41.5 latitude around the Pittsburgh region ( 79.23 , 43.39 ), in which all of the monitoring stations have missing data. That paper dealt with the missing problems in two steps. First, it filled in some of the missing measurements by using linear models so that the pattern of missing data became monotone (the monotone missing is also referred to as the staircase pattern). Second, it applied hierarchical Bayesian spatio-temporal (HBST) modeling proposed in [2] on this staircase of missing data to estimate the hyperparameters of the spatial-temporal model. Based on the estimated hyperparemeters, it estimated the spatial correlation function for the monitoring stations. Then, it estimated the covariance matrix for all of the stations and derived the predictive distribution for the ungauged sites.
Generalized linear models can be used to accommodate non-Gaussian geostatistical data (e.g., see [4]). Ref. [3] selected the generalized linear model with the quasi-Poisson family as an appropriated spatial correlation function by examining the pattern of spatial correlations obtained via the hierarchical model in the plot. However, their link function is not appropriate if there are negative correlations. This is a strong restriction because negative correlations are common for the ozone concentrations and other spatial-temporal data. Moreover, choosing a model by examining the plots derived in terms of the observed data set is not rigorous enough and may only be suitable just for a particular data set.
In this paper, we propose a method to estimate the covariance matrix through a dimension expansion method for modeling the semivariograms in nonstationary fields based on the estimations from hierarchical Bayesian spatio-temporal modeling. For demonstration, we apply the proposed method on the same data as in Jin et al. [3]. Without any assumption on the correlation structure, the proposed method is more general than the method in [3] such that it is applicable to other spatio-temporal data sets. Using the covariance matrix estimated by the proposed method on the entropy criterion in the environmental network design problem, our study provides interesting findings, and the locations of the selected ungauged stations are more reasonable. We provide comparison of these two methods through leave-one-out cross-validation, which shows that the proposed method provides improved results.
The paper is arranged as follows. In Section 2, we briefly introduce hierarchical Bayesian spatio-temporal modeling. In Section 3, we describe the ozone concentrations in the Pittsburgh region and apply the hierarchical Bayesian spatio-temporal modeling techniques for filling in missing measurements following [3]. In Section 4, we model the ozone concentrations in the Pittsburgh region. We first introduce the method for estimating the covariance matrix through a dimension expansion method for modeling the semivariograms in nonstationary fields, and we then give spatial predictive distributions on the ungauged sites using the covariance matrix estimated by the proposed method. In Section 5, we present the results of the entropy of the predictive distributions and an optimality criterion for extending an environmental network. In Section 6, we provide the model evaluation through leave-one-out cross-validation. We conclude this paper with a conclusion in Section 7.
Throughout the rest of the paper, the L 1 -norm of a vector c is denoted by c 1 , a p × p identity matrix is denoted by I p , the transpose of a matrix A is denoted by A and the trace of a square matrix B is denoted by tr ( B ) . In addition, ‘⊗’ represents the Kronecker product, N k × ( · , · ) refers to a matrix Gaussian distribution, t k × ( · , · ) denotes a matric-t distribution, I W ( · , · ) stands for the inverted Wishart distribution (see (a) of the appendix for definitions of these distributions) and G I W ( · , · ) denotes the generalized inverted Wishart distribution.

2. Hierarchical Bayesian Spatio-Temporal Modeling

We briefly describe HBST modeling in this section, which is the same as that given in [3] excluding Step 3 in the HBST modeling procedure. It is noted that this modeling is a special case of the HBST modeling presented in Chapter 10 of Le and Zidek (2006) excluding Step 3 in the HBST modeling procedure.
Define the following notations:
d = number of different type stations (e.g., agricultural, residential, commercial and industrial);
n = number of time points (e.g., number of days);
u = number of locations with no monitors (i.e., ungauged sites);
g = number of locations with monitors (i.e., gauged sites).
The stations are organized into k blocks where the g j ( j = 1 , 2 , , k ) sites in the jth block have the same number of timepoints m j at which no measurements are taken. These blocks are numbered so that the measurements correspond to a monotone data pattern or a staircase structure, that is,
m 1 > m 2 > > m k 0 .
The response variables are written as
Y = [ Y [ u ] , Y [ g ] ] .
Here, Y [ u ] of dimension n × u denotes the unobserved responses at ungauged sites while Y [ g ] of dimension n × g is given by
Y [ g ] = Y [ g 1 ] , , Y [ g k ] = Y [ g 1 m ] Y [ g 1 o ] , , Y [ g k m ] Y [ g k o ] ,
where Y [ g j m ] is an m j × g j matrix of missing measurements at the g j gauged sites for the m j time points and Y [ g j o ] is an ( n m j ) × g j matrix of observed measurements at the g j gauged sites for the ( n m j ) time points.
We assume that the response matrix Y follows the Gaussian and generalized inverted Wishart model specified by
Y | B , Σ N ( X B , I n Σ ) , B | Σ , B 0 N ( B 0 , V B Σ ) , Σ G I W ( Θ , δ ) .
where B is an l × ( g + u ) coefficient matrix with the hyperparameter mean matrix B 0 and the variance components V B , X is the matrix of covariates which is defined in (4) and Θ , δ is a set of model parameters specified below.
We partition B corresponding to the l time-varying covariates in conformance with the block structure as
B = ( B [ u ] , B [ g 1 ] , , B [ g k ] ) .
By assuming an exchangeable structure across sites, B can be written as B = B ˜ R , where B ˜ is the l × d hyperparameter matrix and R = ( r i , j ) d × ( u + g ) with r i , j = 1 for Station j under Class i and r i j = 0 otherwise.
Likewise, we partition the ( u + g ) × ( u + g ) covariance matrix Σ over gauged and ungauged sites conformably as
Σ = Σ [ u , u ] Σ [ u , g ] Σ [ g , u ] Σ [ g , g ] ,
where Σ [ u , u ] is a u × u matrix being for the ungauged sites. Further, we partition the g × g covariance matrix Σ [ g , g ] for the gauged site blocks as follows:
Σ [ g , g ] = Σ [ g 1 , g 1 ] Σ [ g 1 , g k ] Σ [ g k , g 1 ] Σ [ g k , g k ] .
Similarly, for j = 1 , , k , we put
Σ [ g j , , g k ] = Σ [ g j , g j ] Σ [ g j , g k ] Σ [ g k , g j ] Σ [ g k , g k ] .
We reparametrize the matrix Σ through the recursive one-to-one Bartlett transformation for the two blocks:
Σ = Γ [ u ] + ( Υ [ u ] ) Σ [ g , g ] Υ [ u ] ( Υ [ u ] ) Σ [ g , g ] Σ [ g , g ] Υ [ u ] Σ [ g , g ] ,
where Γ [ u ] = Σ [ u , u ] Σ [ u , g ] Σ [ g , g ] 1 Σ [ g , u ] and Υ [ u ] = Σ [ g , g ] 1 Σ [ u , g ] . Similarly, by applying the Bartlett decomposition, we can represent the submatrix Σ [ g j , , g k ] , for j = 1 , , k 1 , as
Σ [ g j , , g k ] = Γ j + Υ j Σ [ g j + 1 , , g k ] Υ j Υ j Σ [ g j + 1 , , g k ] Σ [ g j + 1 , , g k ] Υ j Σ [ g j + 1 , , g k ] ,
where Γ k = Σ [ g k , g k ] and for j = 1 , , k 1 ,
Γ j : g j × g j = Σ [ g j , g j ] Σ [ g j , ( g j + 1 , , g k ) ] Σ [ g j + 1 , , g k ] 1 Σ [ ( g j + 1 , , g k ) , g j ] ,
Υ j : ( g j + 1 + + g k ) × g j = Σ [ g j + 1 , , g k ] 1 Σ [ ( g j + 1 , , g k ) , g j ] ,
with
Σ [ g j + 1 , , g k ] = Σ [ g j + 1 , g j ] Σ [ g k , g j ] .
Therefore, the GIW prior distribution for Σ in (1) is equivalently defined in terms of ( Γ [ u ] , Υ [ u ] ) and { ( Γ 1 , Υ 1 ) , , ( Γ k , Υ k ) , Σ k } as follows:
Υ [ u ] | Γ [ u ] N ( Υ 00 , H 0 Γ [ u ] ) , Γ [ u ] I W ( Λ 0 , δ 0 ) , Υ j | Γ j N ( Υ 0 j , H j Γ j ) , j = , k 1 , Γ j I W ( Λ j , δ j ) , j = 1 , , k ,
where Υ [ u ] is the slope of the optimal linear predictor of Y [ u ] based on Y [ g ] and Γ [ u ] is the residual covariance of the optimal linear predictor. Similar interpretations can be applied to Υ j and Γ j , for j = 1 , , k 1 .
Let H be the set of the hyperparameters in (1) and (2), i.e., H = { Θ , δ , V B , B 0 } , where Θ = { ( Υ 00 , H 0 , Λ 0 ) , ( Υ 01 , H 1 , Λ 1 ) , , ( Υ 0 k 1 , H k 1 , Λ k 1 ) , Λ k } with degrees of freedom parameters δ = ( δ 0 , δ 1 , , δ k ) . Write H = [ H u , H g ] . Here, H g = { V B , B 0 , ( Υ 01 , H 1 , Λ 1 , δ 1 ) , , ( Υ 0 , k 1 , H k 1 , Λ k 1 , δ k 1 ) , ( Λ k , δ k ) } , which represents the hyperparameters involved in the marginal distribution of Y [ g o ] .
If a data matrix appears to be an ascending staircase, the HBST modeling procedure is given as follows:
Step 1.
Compute the hyperparameter values that maximize the marginal distribution f ( Y [ g o ] | H g ) using an empirical Bayesian approach (see (b) of Appendix A). The EM algorithm is used to obtain H ^ g .
Step 2.
Obtain the predictive distributions f ( Y [ g k m ] | Y [ g o ] , H ^ g ) of missing measurements as in (c) of Appendix A. Fill in the missing data by using the predictive distributions.
Step 3.
Obtain the estimate Σ ^ [ g , g ] from the estimate of H ^ g . In terms of Σ ^ [ g , g ] , obtain the estimate of the covariance matrix by using a dimension expansion method given in Qin et al. [5] and the thin-plate spline method given in Wabba and Wendelberger (1980). The details are given in Section 4.1.
Step 4.
Estimate the hyperparameters H u and obtain the conditional predictive distribution f ( Y [ u ] | Y [ g ] , H ^ ) (see Section 4.2).

3. Ozone Concentrations from the Monitoring Stations in Pittsburgh Region

The ozone concentrations were recorded within 79 to 81.5 longitude and 39.5 to 41.5 latitude around the Pittsburgh region ( 79.23 , 43.39 ) for four consecutive summer months, June, July, August and September, over the period from 1995 to 2007. There were 25 monitoring stations in the region as shown in Figure 1, which is the same as Figure 1 in [3]. The original data set Y 0 was collected from 25 stations, and there were a total of 1586 (13 years × 122 days) measurements at each station. The number of missing data in Y 0 is shown by N1.Miss in Table 1, which is the same as Table 1 in [3]. In this section, we fill in missing measurements.

3.1. Filling in the Missing Measurements for Each Monitoring Station within the Period of Monitoring Blocks

Since there are missing data in the dataset, we follow the steps in [3] in filling in some missing measurements occurred during the operation of each monitoring station, using the regression model as
y 122 ( i 1 ) + j = a sin 2 ( 122 ( i 1 ) + j ) π 122 + b cos 2 ( 122 ( i 1 ) + j ) π 122 + c i + ε 122 ( i 1 ) + j = a sin j π 61 + b cos j π 61 + c i + ε 122 ( i 1 ) + j ,
for i = 1 , , 13 , and j = 1 , , 122 , where a and b are regression coefficients, c i , for i = 1 , , 13 , are the categorical factors and { ε t } is a sequence of independently and identically distributed Gaussian random variables with mean 0 and variance σ 2 . The model (3) assigns different means to the years with a yearly cycle of 122 days. We re-express the 13 factors in the model via Helmert contrasts, which compare the first level of the factor with all later levels, the second level with all later levels, and so forth. The Helmert matrix, Z 13 × 13 , is defined as follows.
Z = 1 1 1 1 1 1 1 1 1 1 1 0 2 1 1 1 0 0 11 1 1 0 0 0 12 .
Let X, the matrix of covariates, be
X = S Z 1 122 1586 × 15 ,
where 1 n = ( 1 , 1 , , 1 , 1 ) 1 × n and
S = sin ( π / 61 ) sin ( i π / 61 ) sin ( 1586 π / 61 ) cos ( π / 61 ) cos ( i π / 61 ) cos ( 1586 π / 61 ) 2 × 1586 ,
and let y = ( y 1 , y 2 , , y 1586 ) , β = ( a , b , d 1 , , d 13 ) and ε = ( ε 1 , ε 2 , , ε 1586 ) denote the response variables, regression coefficient vector and error variables, respectively. The model (3) is written as y = X β + ε .
We then fill in the missing measurements within the blocks by the least squares predictions plus errors and obtain a new data set Y 1 in which the unfilled missing measurements are either in the end of the time period or in the beginning of the time period. The number of missing data in Y 1 is shown in Table 1 by N2.Miss.

3.2. Filling in the Missing Measurements in Y 1

To fill in the missing measurements in Y 1 , we can proceed as follows [3]:
(i)
Obtain a new data set Y 2 from Y 1 by filling in the 488 missing measurements at Stations 5 and 25 during the end of the time period by using the HBST modeling technique. N3.Miss in Table 1 displays the number of missing data in the data set Y 2 , which shows that Y 2 has a staircase data structure, as all of the missing data are located in the beginning of the time period.
(ii)
Put d = 4 , l = 15 , n = 1586 , k = 7 , m 1 = 854 , m 2 = 610 , m 3 = 488 , m 4 = 366 , m 5 = 318 , m 6 = 244 , m 7 = 0 , g 1 = 1 , g 2 = 1 , g 3 = 0 , g 4 = 3 , g 5 = 1 , g 6 = 1 and g 7 = 16 . Fill in the remaining missing values in Y 2 by executing Steps 1–2 of the HBST modeling procedure.

4. Model the Ozone Concentrations in the Pittsburgh Region

To model ozone concentrations in the Pittsburgh region by spatial interpolation, we cover the region by the 100 grid boxes of a spatial resolution of latitude 0.2 × longitude 0.2 . Thus, u = 100 . The grid points are ungauged sites, and their classes are displayed in Figure 4. To derive the predictive distributions for these grid points, a key step is to estimate the covariance matrix.

4.1. Estimation of the Covariance Matrix

In this subsection, we introduce a method for estimating the covariance matrix through a dimension expansion method for modeling the semivariograms in nonstationary fields in terms of H ^ g from Step 1 of the HBST procedure.
Let Y ( x ) : x S , S R d , be an environmental random process, where x is a d-dimensional spatial index that varies continuously throughout the region S . At n spatial locations denoted by x i : i = 1 , , n , we observe realizations of the random process Y ( x ) , i.e., Y ( x i ) : i = 1 , , n . We are interested in learning the spatial dependency of the process through the observed data. The semivariogram function which describes the degree of spatial dependency of an intrinsic stationary random process is a cornerstone in spatial statistics. An intrinsic stationary random process satisfies the following two conditions (Cressie [6]):
  • E Y ( x ) = U , for x S ,
  • var Y ( x i ) Y ( x j ) = 2 γ ( x i x j ) ,
where a semivariogram is defined as γ ( x i x j ) = 1 2 var Y ( x i ) Y ( x j ) for two different locations, x i and x j , in the monitored region. The estimated covariance matrix of the monitoring stations Σ ^ [ g , g ] is based on the estimation of H g from Step 1 of the HBST procedure. We estimate the semivariograms of the ozone concentrations from the monitoring stations by
γ ^ ( x i x j ) = 1 2 var ^ ( Y ( x i ) ) + 1 2 var ^ ( Y ( x j ) ) cov ^ Y ( x i ) , Y ( x j ) .
From Figure 2, we notice that the estimated semivariograms related to Station 3 (marked by “×”) are much higher than the other stations. We examine the location of Station 3 and notice that it was on the edge of the monitored region. Moreover, there were over ten airports around this station. According to Xue et al. [7], there is a great impact of high-altitude aircrafts on the ozone layer in the stratosphere. This becomes an influential factor in modeling the ozone concentrations. Next, we introduce how this factor is considered in the proposed modeling technique.
It is obvious that this field is not stationary. Bornn et al. [8] proposed a novel approach to find the latent dimensions over which the nonstationary fields exhibit stationarity through dimension expansion. They justified that for a nonstationary Gaussian process Y ( x ) , where x R d , there exists a vector z R p , p > 0 , such that the expanded process Y [ x , z ] is stationary under appropriate moment constraints. Note that [ x , z ] is the concatenation of the vectors x and z . The stationary semivariogram with latent vectors can be expressed by
2 γ x i , z i x j , z j = E Y x i , z i Y x j , z j 2 ,
where x i , z i is the expanded spatial index for the ith location. Qin et al. [5] improved the method in Bornn et al. [8] by considering the covariance structure of the γ ^ i , j , for j i , which are generally correlated. In our application, we use the lasso-penalized weighted least-squares criterion (WLS) in Qin et al. [5] as follows,
ϕ ^ , Z W L S = argmin Œ , Z j < i 1 γ ϕ 2 d i , j [ X , Z ] γ ^ i , j γ ϕ d i , j [ X , Z ] 2 + λ k = 1 p Z . k 1 .
to estimate the parameters and the expanded dimensions. Here, γ ^ i , j is the estimated semivariogram by (5) and d i , j [ X , Z ] is the Euclidean distance between the locations x i , z i and x j , z j and Z . k is the kth column of Z . [ X , Z ] is the concatenation of the matrices X and Z . The tuning parameter λ in the group lasso is used to determine the number of latent dimensions and regularize the estimation of Z to prevent overfitting. γ ϕ d i , j [ X , Z ] is a parametric stationary semivariogram model with parameter ϕ . The most popular ones are the exponential model, the spherical model and the Gaussian model (see Journel and Huijbregts [9] and Cressie [6]), among others). For example, the exponential model is defined as
γ ϕ ( d ) = ϕ 1 1 exp d / ϕ 2 + ϕ 3 ,
where ϕ = ( ϕ 1 , ϕ 2 , ϕ 3 ) , ϕ 1 0 , ϕ 2 0 and ϕ 3 0 .
The semivariogram plot with estimated expanded dimensions (Figure 3) of the monitoring stations shows that the field is in good agreement with the theoretical model, as most of the points are near the solid red line, the fitted exponential semivariogram model. Two extra dimensions are added to the original coordinate with λ = 0.01 . Figure 3 shows that with the extra dimensions, Station 3 is pushed much further out of the two-dimensional plane, reflecting the impact of high-altitude aircrafts on the ozone layer in the stratosphere we have mentioned earlier.
After the expanded dimensions for the monitoring stations are obtained, we use the thin-plate spline method [10] to estimate the hidden dimensions for the ungauged sites. The semivariograms for the ungauged stations are estimated by the exponential model using the estimated parameter vector ϕ ^ . Next, we estimate the semivariograms γ s i , s j between stations s i and s j using the exponential model based on the distances over the space composed by the original and the expanded dimensions. Last, the covariance between any two sites can be estimated by
Σ ^ i , j = Cov ^ ( Y ( s i ) , Y ( s j ) ) = 1 2 σ ^ Y ( s i ) + 1 2 σ ^ Y ( s j ) γ ^ s i , s j ,
where σ ^ Y ( s i ) and σ ^ Y ( s j ) are estimates of σ Y ( s i ) and σ Y ( s j ) obtained by the thin-plate spline approach.

4.2. Prediction of the Daily Ozone Concentrations at the Grid Points

By Chapter 10 of Le and Zidek (2006), spatial predictive distributions at the grid points given the monitoring sites are as follows:
( Y [ u ] | Y [ g ] , H ) t n × u U u | g , Φ [ u | g ] Ψ [ u | g ] δ 0 * , δ 0 * ,
where δ 0 * = δ 0 u + 1 , Ψ [ u | g ] = Λ 0 , U [ u | g ] = X B 0 [ u ] + ( Y [ g ] X B 0 [ g ] ) Υ 00 and Φ [ u | g ] = I n + X V B X + ( Y [ g ] X B 0 [ g ] ) H 0 ( Y [ g ] X B 0 [ g ] ) (see (a) of Appendix A for definition of the matric-t distribution).
We estimate the hyperparameters associated with the grid points Λ 0 , Υ 00 , H 0 and δ 0 via
δ ^ 0 = δ ^ 1 + + δ ^ k k , H ^ 0 = Λ ^ [ 1 , , k ] , Υ ^ 00 = ( Σ ^ [ g , g ] ) 1 Σ ^ [ g , u ] ,
Λ ^ 0 = δ ^ 0 u 1 1 + t r ( Σ ^ [ g , g ] H ^ 0 ) ( Σ ^ [ u , u ] Υ ^ 00 Σ ^ [ g , g ] Υ ^ 00 )
with
Λ ^ [ j , , k ] = Λ ^ j + Υ ^ 0 j Λ ^ [ j + 1 , , k ] Υ ^ 0 j Υ ^ 0 j Λ ^ [ j + 1 , , k ] Λ ^ [ j + 1 , , k ] Υ ^ 0 j Λ ^ [ j + 1 , , k ] , j = 1 , , k 1 ,
and Λ ^ [ k ] = Λ ^ k .
After all of the hyperparameters in the predictive distributions are estimated, we can predict the daily ozone concentrations at all the grid points in the time period of study by generating samples from the predictive distributions.

5. Environmental Network Extension

Assume that Y has the density function f. The total reduction in uncertainty of Y can be presented by the entropy of its distribution, i.e., H ( Y ) = E [ log f ( Y ) / h ( Y ) ] , where h ( · ) is a not necessarily integrable reference density (Jaynes [11]). According to the predictive distribution (7), the total entropy H ( Y [ u ] | Y [ g ] ) can be defined as
H ( Y [ u ] | Y [ g ] ) = 1 2 log Ψ [ u | g ] + c u ( u , q ) ,
where c u ( u , q ) is a constant depending on the degree of freedom and the dimension of the ungauged sites.
The key step in expanding an environmental network is to find appropriate ungauged sites to add to the existing network that maximizes the corresponding entropy. We use the following optimality criterion as given in [3]:
max a d d 1 2 log Ψ [ u | g ] a d d
The a d d sites, in a vector of dimension u 1 , are selected to maximize the entropy in (8). In [3], the grid points 91 , 92 , 93 were selected with the highest entropy 11.3774. The proposed method selects the grid points 41 , 71 , 100 with entropy 12.1207. This selection is more reasonable, as they are not gathered in the southeast corner of the region like 91 , 92 , 93 . The selected sites among 100 grid points by the two methods are shown in Figure 4 below.

6. Model Evaluation

In this section, we use the leave-one-out cross-validation to evaluate the accuracy of the predictive model derived using the proposed method and compare the proposed method with the one in [3]. We select the observations from one of the original 25 stations as validation data, and observations in the remaining 24 stations are treated as training data. We use the data from day 855 to day 1586 at the end of the study from each station to evaluate the prediction because during this period, none of the stations has missing data. By choosing this period, we avoid using the Bayesian hierarchical modeling technique for estimating the missing data in the training data set, which is time-consuming and not our intention for evaluating the proposed method on estimating the covariance matrix. Station 22 is excluded because it is the only industrial station in the study. For each of the 24 stations, we generate 100 samples from the predictive distribution with parameters estimated using observations from the rest of the 23 stations. We compute the average of relative absolute bias (ARAB) as j = 1 100 y j , i , t y i , t y i , t , where y j , i , t is the jth sample generated from the predictive distributions and y i , t is the observation from Station i on time t. The results are given in Table 2.
In Table 2, “-” means that there is no prediction for the station because there are negative correlations and the method in [3] is not applicable to estimate the predictive distribution. The results in Table 2 show that the proposed method provides slightly more accurate predictions than the one in [3] for most of the stations. More important is that, when there are negative correlations obtained from the estimations of the hierarchical Bayesian spatio-temporal modeling technique, the method in [3] fails to estimate the covariance matrix, while the proposed method still provides accurate predictions except for Station 3. This is expected because Station 3 is an influential station. Therefore, if we use observations at Station 3 as the validation data set, it has a great impact on estimating the covariance matrix.

7. Conclusions

In this paper, we have derived a predictive model through the hierarchical Bayesian spatio-temporal modeling technique given in [12] at ungauged sites based on the covariance matrix estimated by a dimension expansion method for modeling semivariograms in nonstationary fields. Further, we have applied an entropy criterion (see [12] or [3] for details) based on the predictive model to decide if new stations need to be added. This entropy criterion helps to solve the environmental network design problem. For demonstration, we have applied the proposed method on ozone concentrations at 25 stations in the Pittsburgh region studied in [3]. The proposed method has provided satisfactory results. Moreover, the results have shown that the method is more general and applicable, as no assumption is imposed on the correlation structure.

Author Contributions

Conceptualization, B.S. and Y.W.; methodology, B.S. and Y.W.; software, B.S.; validation, B.S. and Y.W.; formal analysis, B.S.; investigation, B.S. and Y.W.; resources, Y.W.; data curation, B.S.; writing—original draft preparation, B.S. and Y.W.; writing—review & editing, Y.W.; supervision, Y.W.; project administration, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author, upon reasonable request.

Acknowledgments

The authors would like to thank the three anonymous reviewers for their helpful comments and constructive suggestions which led to the improvement of this article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Hierarchical Bayesian Spatio-Temporal Modeling

The following is mainly based on Chapter 10 of Le and Zidek (2006).
(a) Somedistributions
Matrix normal distribution. If, for an n × m matrix U and positive definite matrices A : n × n and B : m × m , the density function of an n × m random matrix X has the form
f ( X ) = ( 2 π ) n m 2 | A | m 2 | B | n 2 exp { 1 2 tr [ A 1 ( X U ) B 1 ( X U ) ] } ,
then X is said to have a matrix normal distribution and is denoted by X N n × m ( U , A B ) .
Inverted Wishart distribution. If, for a p × p positive definite matrix Ψ and a positive constant δ p , the density function of a p × p random matrix S has the form
f ( S ) = 1 2 δ p 2 Γ p ( δ 2 ) | Ψ | δ 2 | S | ( δ + p + 1 ) 2 exp 1 2 tr ( S 1 Ψ ) ,
then S is said to have an inverted Wishart distribution and is denoted by S I W ( Ψ , δ ) .
Matric-t distribution. If, for an n × m matrix U , positive definite matrices A : n × n and B : m × m and a positive constant δ > 0 , the density function of an n × m random matrix X has the form
f ( X ) | A | m 2 | B | n 2 I n + δ 1 A 1 ( X U ) B 1 ( X U ) δ + n + m 1 2 ,
then X is said to have a matric-t distribution and is denoted by X t n × m ( U , A B , δ ) .
(b) Estimation of H g = { V B , B 0 , ( Υ 01 , H 1 , Λ 1 , δ 1 ) , , ( Υ 0 , k 1 , H k 1 , Λ k 1 , δ k 1 ) , ( Λ k , δ k ) }
Le and Zidek (2006) showed that the predictive distributions derived through the integrated framework above are completely characterized by their hyperparameters, which are estimated by an empirical Bayes approach, that is, to estimate them by maximizing the marginal likelihood of all the measured responses (conditional on those hyperparameters) evaluated at their observed values. This procedure is referred to as type-II maximum likelihood estimation (type-II MLE). To estimate H g , the following procedure can be employed: Compute the hyperparameter values that maximize the marginal distribution f ( Y [ g o ] | H g ) , where Y [ g o ] = { Y [ g 1 o ] , , Y [ g k o ] } . The subscript g indicates that not all the hyperparameters are involved in this marginal distribution. The response matrix Y follows the GIW distribution specified by (1) and (2). The marginal distribution f ( Y [ g o ] | H g ) can be written as
Y [ g o ] | H g j = 1 k t ( n m j ) × g j ( U o [ j ] , Φ o [ j ] Ψ o [ j ] , δ o [ j ] ) ,
where U o [ j ] = U ( 2 ) [ j ] , Φ o [ j ] = A 22 [ j ] , Ψ o [ j ] = Λ j δ j g j + 1 , δ o [ j ] = δ j g j + 1 with ε ˜ [ g j + 1 , , g k ] = Y [ g j + 1 , , g k ] X B 0 [ g j + 1 , , g k ] , for j = 1 , , k 1 ,
U ( 1 ) [ j ] U ( 2 ) [ j ] : m j × g j ( n m j ) × g j = X B 0 [ g j ] + ε ˜ [ g j + 1 , , g k ] Υ 0 j ,
and
A 11 [ j ] A 12 [ j ] A 21 [ j ] A 22 [ j ] : m j × m j m j × ( n m j ) ( n m j ) × m j ( n m j ) × ( n m j ) = I n + X V B X + ε ˜ [ g j + 1 , , g k ] H j ( ε ˜ [ g j + 1 , , g k ] ) .
Although f ( Y [ g o ] | H g ) can be written as a matric-t distribution as in (A1), direct maximization of this marginal density presents a challenge. The EM algorithm helps circumvent it.
(c) Predictive distributions of missing data
By Theorem 10.1 of Le and Zidek (2006), it follows that
Y [ g k m ] | Y [ g o ] , H g t m k × g k ( U ( m | g ) [ k ] , Φ ( m | g ) [ k ] Ψ ( m | g ) [ k ] , δ ( m | g ) [ k ] ) ,
Y [ g j m ] | Y [ g j + 1 m , , g k m ] , Y [ g o ] , H g t m j × g j ( U ( m | g ) [ j ] , Φ ( m | g ) [ j ] Ψ ( m | g ) [ j ] , δ ( m | g ) [ j ] ) ,
where U ( m | g ) [ j ] = U ( 1 ) [ j ] + A 12 [ j ] ( A 22 [ j ] ) 1 ( Y [ g j o ] U ( 2 ) [ j ] ) , Φ ( m | g ) [ j ] = δ j g j + 1 δ j g j + n m j + 1 [ A 11 [ j ] A 12 [ j ] ( A 22 [ j ] ) 1 A 21 [ j ] ] , Ψ ( m | g ) [ j ] = 1 δ j g j + 1 [ Λ j + ( Y [ g j o ] U ( 2 ) [ j ] ) ( A 22 [ j ] ) 1 ( Y [ g j o ] U ( 2 ) [ j ] ) ] and δ ( m | g ) [ j ] = δ j g j + n m j + 1 .

References

  1. Wikle, C.K.; Berliner, L.M.; Cressie, N. Hierachichical Bayesian space-time models. Environ. Ecol. Stat. 1998, 5, 117–154. [Google Scholar] [CrossRef]
  2. Le, N.D.; Sun, W.; Zidek, J.V. Spatial prediction and temporal backcasting for environmental fields having monotone data patterns. Can. J. Stat. 2001, 29, 529–554. [Google Scholar] [CrossRef]
  3. Jin, B.; Wu, Y.; Chan, E. Hierarchical Bayesian spatial-temporal modeling of regional ozone concentrations and respecitve network design. J. Environ. Stat. 2012, 3, 1–32. [Google Scholar]
  4. Diggle, P.J.; Ribeiro, P.J., Jr. Model-Based Geostatistics; Springer: Berlin/Heidelberg, Germany, 2007. [Google Scholar]
  5. Qin, S.; Sun, B.; Wu, Y.; Fu, Y. Generalized least-squares in dimension expansion method for nonstationary processes. Environmetrics 2021, 32, e2684. [Google Scholar] [CrossRef]
  6. Cressie, N. Statistics for Spatial Data, Revised Edition; Wiley: New York, NY, USA, 2015. [Google Scholar]
  7. Xue, X.T.; Guy, B.; Xing, L.; Pierre, F.; Claire, G.; Philip, R. The Impact of High Altitude Aircraft on the Ozone Layer in the Stratosphere. J. Atmos. Chem. 1994, 18, 103–128. [Google Scholar]
  8. Bornn, L.; Shaddick, G.; Zidek, J.V. Modeling non-stationary processes through dimension expansion. J. Am. Stat. Assoc. 2012, 107, 281–289. [Google Scholar] [CrossRef] [Green Version]
  9. Journel, A.G.; Huijbregts, C.J. Mining Geostatistics; Academic: London, UK, 1978. [Google Scholar]
  10. Wabba, G.; Wendelberger, J. Some new mathematical methods for variational objective analysis using splines and cross-validation. Mon. Weather Rev. 1980, 108, 1122–1143. [Google Scholar] [CrossRef] [Green Version]
  11. Jaynes, E.T. Information Theory and Statistical Mechanics, Statistical Physics, 3rd ed.; Ford, K.W., Ed.; Benjamin: New York, NY, USA, 1963. [Google Scholar]
  12. Le, N.D.; Zidek, J.V. Statistical Analysis of Environmental Space-Time Processes; Springer: New York, NY, USA, 2006. [Google Scholar]
Figure 1. Monitoring stations in the Pittsburgh region.
Figure 1. Monitoring stations in the Pittsburgh region.
Entropy 24 00492 g001
Figure 2. Empirical semivariograms of the ozone concentrations from the monitoring stations versus the Euclidean distances between monitoring stations based on the Bayesian hierarchical model. The semivariograms related to Station 3 are marked by “×”.
Figure 2. Empirical semivariograms of the ozone concentrations from the monitoring stations versus the Euclidean distances between monitoring stations based on the Bayesian hierarchical model. The semivariograms related to Station 3 are marked by “×”.
Entropy 24 00492 g002
Figure 3. Semivariogram plot of the ozone concentrations from the monitoring stations over a larger range of distances than the range shown in Figure 2, owing to the application of dimension expansion. The fitted exponential semivariogram model is shown by the red solid line.
Figure 3. Semivariogram plot of the ozone concentrations from the monitoring stations over a larger range of distances than the range shown in Figure 2, owing to the application of dimension expansion. The fitted exponential semivariogram model is shown by the red solid line.
Entropy 24 00492 g003
Figure 4. The selected sites among 100 grid points (black circled points by [3] and red circled points by our method).
Figure 4. The selected sites among 100 grid points (black circled points by [3] and red circled points by our method).
Entropy 24 00492 g004
Table 1. Location of the stations and number of missing data.
Table 1. Location of the stations and number of missing data.
IDClassLonLatN1.MissN2.MissN3.MissIDClassLonLatN1.MissN2.MissN3.Miss
12−40.2480.66855854854142−40.3880.182200
22−41.0980.65610610610151−40.5680.501300
33−39.6479.92618610610161−40.6880.351100
43−40.3079.50488488488172−40.7480.31400
53−40.3680.61858854366182−41.2180.48500
63−40.4480.01370366366191−40.4480.421600
72−40.4179.94370366366203−40.1479.90300
83−40.8179.56328318318212−40.1780.26100
91−39.8180.28278244244224−40.9980.34000
102−40.9381.121200233−40.4279.69500
111−41.4580.59100242−40.4280.58500
123−40.4679.96200252−40.1280.694884880
132−40.6179.73800
The numbers 1, 2, 3 and 4 under Class denote agricultural, residential, commercial and industrial, respectively.
Table 2. Mean and SD of the average of relative absolute bias.
Table 2. Mean and SD of the average of relative absolute bias.
IDOur MethodJin et al. (2012) [3]IDOur MethodJin et al. (2012) [3]
10.0789 (0.0627)0.8134 (0.0682)130.1145 (0.1096 )0.2003 (0.1769)
20.1206 (0.1356)0.1221 (0.1121)140.1361 (0.1732)0.2211 ( 0.2283)
30.8517 (0.8517)0.1572 ( 0.1572)150.1911 (0.2052)-
40.1756 (0.1693)-160.1189 (0.1179)0.1285 (0.1161)
50.1575 (0.1731)0.1986 (0.1855)170.1496 (0.1594 )0.1669 (0.1727)
60.1336 (0.1513)0.1477 (0.1667 )180.1253 (0.1154 )0.1256 (0.1372)
70.1265 (0.1563 )0.1456 (0.1732)190.1369 (0.1272)0.1026 ( 0.0994)
80.0968 (0.0804)0.1135 (0.1023)200.1603 (0.1598)0.1310 (0.1134)
90.1497 (0.1104)0.1619 (0.1208)210.1351 (0.1154)0.1274 (0.1123)
100.1589 (0.1796 )-230.1617 (0.1858)-
110.6913 (0.6455)-240.1286 (0.1051)-
120.1406 (0.1409)0.1265( 0.1416)250.1583 (0.1701)0.1722 ( 0.1675)
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Sun, B.; Wu, Y. Estimation of the Covariance Matrix in Hierarchical Bayesian Spatio-Temporal Modeling via Dimension Expansion. Entropy 2022, 24, 492. https://doi.org/10.3390/e24040492

AMA Style

Sun B, Wu Y. Estimation of the Covariance Matrix in Hierarchical Bayesian Spatio-Temporal Modeling via Dimension Expansion. Entropy. 2022; 24(4):492. https://doi.org/10.3390/e24040492

Chicago/Turabian Style

Sun, Bin, and Yuehua Wu. 2022. "Estimation of the Covariance Matrix in Hierarchical Bayesian Spatio-Temporal Modeling via Dimension Expansion" Entropy 24, no. 4: 492. https://doi.org/10.3390/e24040492

APA Style

Sun, B., & Wu, Y. (2022). Estimation of the Covariance Matrix in Hierarchical Bayesian Spatio-Temporal Modeling via Dimension Expansion. Entropy, 24(4), 492. https://doi.org/10.3390/e24040492

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop