1. Introduction
The variational method of approximating intractable computations has its roots in the calculus of variations and is reviewed in Stephenson [1]. Ormerod and Wand [2] gave an overview of how variational approximations facilitate approximate inference for the parameters in complex statistical models and provide a fast, deterministic alternative to Monte Carlo methods. Moreover, Teschendorff et al. [3] and Flandin and Penny [4] applied the variational method to model gene-expression data and fMRI data, respectively. Hall et al. [5] and Hall et al. [6] applied the variational method to approximate the likelihood function of a Poisson linear mixed model with only one predictor and derived the maximum likelihood estimators and their asymptotic distributions.
The main idea of the variational method for approximating the density of a statistic from a complex model is to find a density from a pre-determined family of distributions such that the approximating density and the target density have the smallest Kullback–Leibler (KL) divergence. Mathematically, let p be the target density and q be a specific density from a pre-determined family of distributions Q. Then, the variational approximation of p is the member of Q that minimizes the KL divergence from p. When Q is the Gaussian family, the variational approximation is referred to as the Gaussian variational approximation (GVA).
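As a concrete illustration of this definition (not code from the paper), the following minimal R sketch approximates a log-gamma target density by a Gaussian, minimizing the KL divergence over the Gaussian mean and standard deviation with numerical integration and a general-purpose optimizer; the target and all starting values are purely illustrative.

## Gaussian variational approximation of an illustrative target density:
## p is the density of log Y with Y ~ Gamma(shape = 3, rate = 1).
log_p <- function(x) 3 * x - exp(x) - lgamma(3)        # log of the target density
kl_q_p <- function(par) {                              # KL(q || p) with q = N(mu, sd^2)
  mu <- par[1]; sd <- exp(par[2])                      # log-parameterize sd so that sd > 0
  integrand <- function(x)
    dnorm(x, mu, sd) * (dnorm(x, mu, sd, log = TRUE) - log_p(x))
  integrate(integrand, -Inf, Inf)$value
}
fit <- optim(c(0, 0), kl_q_p)                          # minimize KL over (mu, log sd)
c(mu = fit$par[1], sd = exp(fit$par[2]))               # the Gaussian variational approximation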
Recent advances in GVA include links to Bayesian posterior inference [7,8] and Bayesian Gaussian graphical model selection [9]; see also applications to particle inference [10], GVA with a factorial covariance structure [11], and variational approximation in a Poisson-lognormal model [12].
In this paper, we consider the Poisson lognormal mixed model studied in [5] but with more than one predictor. The GVA method is applied to obtain the maximum likelihood estimators of the parameters in the model, and the proposed method offers several advantages over the method discussed in [5].
The rest of the paper is organized as follows. Section 2 examines a special member of the exponential family. For this specific distribution, it is shown that the first three moments can be obtained directly from the normalizing constant; although the explicit form of the normalizing constant is unavailable, it can be approximated by the GVA method. Section 3 reviews the Poisson lognormal mixed model and shows that its density can be expressed in the form of the specific exponential model introduced in Section 2. Thus, the likelihood function of the Poisson lognormal mixed model is explicitly available, and the maximum likelihood estimators and their limiting distributions are derived. A real-life example is presented in Section 4 to demonstrate the application of the proposed method, and results from simulation studies are also presented there to illustrate its accuracy. Finally, we give concluding remarks and a discussion in Section 5.
2. A Special Member of the Exponential Family Model
Consider a special exponential family model with the density given in (1), where α and β are non-negative real numbers and the remaining factor is the normalizing constant. The first two moments of X then follow directly from the normalizing constant, and the variance of X can be obtained from them; following the same argument, the third moment can be obtained as well. Note that, in one limiting regime, Model (1) converges to a normal model with the corresponding mean and variance, while, in another limiting regime, it converges to a log-gamma model with the corresponding shape and rate parameters.
Although Model (1) is not commonly discussed in the statistics literature, it arises naturally in information theory. In particular, the concept of relative entropy in information theory, generally known as the Kullback–Leibler (KL) divergence in mathematics, is a measure of the closeness of two probability distributions P and Q. If both P and Q are continuous distributions with density functions p and q, respectively, the KL divergence is defined as KL(P ∥ Q) = ∫ p(x) log{p(x)/q(x)} dx. Note that, by Jensen's inequality, KL(P ∥ Q) is always non-negative.
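For completeness, the Jensen's-inequality argument is a single line; since −log is convex and q integrates to one,

\[
\mathrm{KL}(P \,\|\, Q) \;=\; -\int p(x)\,\log\frac{q(x)}{p(x)}\,dx \;\ge\; -\log\int p(x)\,\frac{q(x)}{p(x)}\,dx \;=\; -\log\int q(x)\,dx \;=\; 0 .
\]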
The variational method seeks the density that is closest to Model (1) in the sense of having the smallest KL divergence. When the approximating density is restricted to the family of normal distributions with mean μ and variance σ², the KL divergence can be written out explicitly. Mathematically, the variational method gives (3), and, since the KL divergence is always non-negative, it also gives the lower bound in (4). Note that Equation (3) indicates that minimizing the KL divergence is the same as minimizing the gap between the log of the normalizing constant and its lower bounds obtained from Jensen's inequality. Moreover, Equation (4) shows that (3) is equivalent to maximizing an expectation with respect to μ and σ².
The main idea of GVA is to approximate the log of an intractable integral by the maximum of all possible lower bounds derived from Jensen's inequality, and all such lower bounds on the log of the normalizing constant of Model (1) can be obtained in this way. Taking the approximating density to be normal with mean μ and variance σ² yields one such bound. The Jensen gap between the log of the normalizing constant and its lower bounds can be narrowed further by maximizing the bound over all values of μ and σ². Finally, we have the following theorem:
Theorem 1. The log of the normalizing constant approximated by GVA is given by (5), where the variational mean and variance satisfy an accompanying nonlinear system of equations; in particular, for large α and β, the explicit second-order approximation in (6) holds.

It is important to note that the drawback of Equation (5) is that a numerical procedure is needed to solve the nonlinear system for the estimated mean and variance. In contrast, Equation (6) has an explicit form with second-order accuracy. A comparison of these two equations is given in Figure 1. Inference for the Poisson lognormal mixed model is provided in Section 3.
Proof of Theorem 1. Suppose the variational mean and variance maximize the lower bound. Then they satisfy the first-order conditions in (7), and (7) yields expressions for both quantities. Substituting the expression obtained from (7) into (8) and solving for the remaining unknown gives (11), and substituting (11) back into (9) gives (12). Finally, replacing the variational mean and variance in the lower bound by (9) and (11), respectively, gives (14). With α and β large, substituting (13) and (12) into (14) yields the approximation stated in (6). □
To demonstrate the numerical accuracy of the GVA, we consider the density in (1) with specified values of α and β. The exact value of the log of the normalizing constant is obtained by numerical integration, and the results are plotted in Figure 1. From Figure 1, we observe that GVA by (5) is accurate for small parameter values, while GVA by (6) gives a good approximation for large parameter values. Overall, the explicit approximation in Equation (6) is preferable to the numerical approximation required by Equation (5).
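The following R sketch illustrates the mechanics behind Theorem 1 and Figure 1 in a case where the exact answer is known. Because the density in (1) is not reproduced here, the sketch uses the log-gamma kernel exp(a x − b eˣ), whose normalizing constant is Γ(a)/bᵃ, and maximizes the Jensen lower bound over a Gaussian mean and variance; the parameter values are illustrative only.

## Exact log normalizing constant versus its maximized Jensen (GVA) lower bound
a <- 5; b <- 2
exact <- lgamma(a) - a * log(b)                      # log of Gamma(a) / b^a
lower_bound <- function(par) {                       # E_q[log h(X)] + entropy of q,
  mu <- par[1]; s2 <- exp(par[2])                    # with q = N(mu, s2) and h(x) = exp(a*x - b*exp(x))
  a * mu - b * exp(mu + s2 / 2) + 0.5 * log(2 * pi * s2) + 0.5
}
opt <- optim(c(0, 0), lower_bound, control = list(fnscale = -1))  # maximize over (mu, log s2)
c(exact = exact, gva = opt$value)                    # the GVA value never exceeds the exact value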
3. GVA for Poisson Lognormal Mixed Model
In this section, we apply the results of Theorem 1 to obtain estimates of the parameters in a Poisson lognormal mixed model. The estimates are simpler to obtain than those in Hall et al. [5,6].
Let the response, which takes non-negative integer values, be the number of occurrences of an event for subject i at time t, where i indexes the m subjects and t indexes the repeated observations within a subject. Moreover, let the covariates for subject i at time t be the powers of a single covariate. Then, the conditional Poisson model is (16), where the mean of the conditional Poisson model is determined by the regression coefficients. An unobservable latent random variable for the ith subject is introduced into Model (16) to capture the within-subject correlation. The resulting model, which is known as the Poisson generalized linear mixed model, can be written as (17), where the latent random variables are assumed to be distributed independently of the covariates. Furthermore, if the latent random variables are assumed to be independently and identically distributed normal random variables with mean zero, the model is referred to as the Poisson lognormal mixed model, although [5] referred to it as the Poisson linear mixed model. McCulloch et al. [13] gave a detailed discussion of the connections between the Poisson lognormal mixed model and the models studied in longitudinal data analysis.
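To fix ideas, the following R sketch simulates data of this form; the sample sizes and parameter values are illustrative only and are not taken from the paper.

## Simulating from a Poisson lognormal mixed model with one covariate
set.seed(1)
m <- 50; n <- 5                                   # m subjects, n observations per subject
beta <- c(0.5, 1.0); sigma <- 0.8                 # intercept, slope, and random-effect SD
x <- matrix(runif(m * n), m, n)                   # covariate values, one row per subject
u <- rnorm(m, 0, sigma)                           # latent subject effects, U_i ~ N(0, sigma^2)
eta <- beta[1] + beta[2] * x + u                  # linear predictor; row i receives u[i]
y <- matrix(rpois(m * n, exp(eta)), m, n)         # Y_it | U_i ~ Poisson(exp(eta_it))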
It is challenging to obtain estimators of the regression coefficients and the variance of the latent variables because the likelihood involves the unobservable latent variables. Rue et al. [14] suggested eliminating the effect of the unobservable latent variables by using the marginal log-likelihood function, which can be written as (18). The development of likelihood-based inference for the parameters is hindered by the last term in (18), which involves a sum of m integrals. Hall et al. [5] applied the GVA method to overcome this integration problem; however, they considered only the model with one covariate, and the resulting estimating equations involved nuisance parameters. Due to the complexity of their method, it is difficult, if not impossible, to extend it to models with more than one covariate.
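To make the intractability concrete, the following R sketch evaluates one of the m integrals in (18) by numerical quadrature for a single subject; the counts, linear predictors, and variance below are hypothetical illustration values.

## One subject's likelihood contribution: Poisson likelihood integrated against the latent normal density
y_i <- c(2, 0, 3); eta_i <- c(0.4, -0.1, 0.8)     # counts and fixed-effect parts of the linear predictor
sigma <- 0.8                                       # standard deviation of the latent variable
integrand <- function(u)
  sapply(u, function(ui) exp(sum(dpois(y_i, exp(eta_i + ui), log = TRUE))) * dnorm(ui, 0, sigma))
log(integrate(integrand, -Inf, Inf)$value)         # log of this subject's contribution to (18)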
Using the notation introduced in Section 2, the marginal log-likelihood function (18) can be rewritten as (19). From (6), the log of each normalizing constant appearing in (19) admits an explicit second-order approximation, so the log-likelihood function (19) can be approximated in closed form. For each subject, denote the associated variational quantities as in Section 2; the partial derivatives of the approximated log-likelihood function with respect to the model parameters and the variational quantities are then available in closed form, and setting them to zero yields the estimating Equations (22)–(24). Although the maximum likelihood estimators do not have a closed-form solution, Equation (23) is sufficiently well structured that a numerical procedure, such as the Newton–Raphson method or the EM algorithm, can be applied to obtain a numerical solution.
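As a concrete illustration of such a numerical procedure, the following R sketch maximizes a Gaussian variational lower bound for the data simulated above. It does not reproduce Equations (19)–(24), which are not shown here; instead it uses the standard GVA lower bound for a Poisson mixed model, with one variational mean and one variational variance per subject, and a general-purpose optimizer in place of Newton–Raphson.

## Joint maximization of the standard GVA lower bound over (beta0, beta1, log sigma^2, mu_i, log lambda_i)
gva_objective <- function(theta) {
  b0 <- theta[1]; b1 <- theta[2]; s2 <- exp(theta[3])
  mu <- theta[4:(3 + m)]; lam <- exp(theta[(4 + m):(3 + 2 * m)])
  eta <- b0 + b1 * x + mu                          # E_q of the linear predictor; row i receives mu[i]
  sum(y * eta - exp(eta + lam / 2)) -              # E_q of the Poisson log-likelihood (up to log y! terms)
    sum((mu^2 + lam) / (2 * s2)) -                 # E_q of the latent normal log-density ...
    m * log(s2) / 2 + sum(log(lam)) / 2            # ... plus the Gaussian entropy, up to constants
}
start <- c(0, 0, 0, rep(0, 2 * m))
fit <- optim(start, gva_objective, method = "BFGS",
             control = list(fnscale = -1, maxit = 500))
fit$par[1:3]                                       # estimates of beta0, beta1, and log sigma^2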
To study the asymptotic properties of the maximum likelihood estimators, we need to make the following three assumptions.
Assumption 1. Let … as m → ∞, where the …'s are independent and have the same distribution as X, whose moment generating function (MGF), …, is well defined on the whole real line; … and … (the entries of …) are defined as …, where … and ….

Assumption 2. For …, ….

Assumption 3. For …, …, where …, …, …, and … are the solutions obtained from (22).

We give a few remarks on each assumption.
Remark 1. Assumption 1 is implied by the Central Limit Theorem; thus, we only need to calculate the moments, namely the expectation and the covariance. Similarly, from the definitions of … and …, we have …. To be more specific, when …, ….

Remark 2. In Assumption 2, …, and thus the variance is …. On account of Markov's inequality, we need to show …. Note that …. Deriving the second-order Taylor expansion of the ratio about the value … and using it yields ….
Remark 3. In Assumption 3, we obtain the …-order Taylor expansions of … and … about the value …, which yield …. If …, then …. Furthermore, we have …, where …. Note that … agrees with Equation (3.5) in Theorem 3.1 of Hall et al. [5]. After tedious calculations, we obtain the simplified versions of … and ….

Theorem 2. Under Assumptions 1–3, for …, …, where the …'s are the solutions obtained from (22), … denotes the entry in the jth row and kth column of the matrix …, and …. Thus, as m → ∞, … is a consistent estimator whose asymptotic distribution is …, where …. Moreover, … is a consistent estimator with asymptotic distribution …, where … is the solution provided in (23). Finally, … is also a consistent estimator, with asymptotic distribution …, where … is the solution presented in (24).

Proof of Theorem 2. From (23), we consider the following algebraic manipulation:
where the first term is …, the second term is …, and the last term is zero. As … and …, we have … for each …; by continuity, … for each …. This shows that the estimators are consistent. In addition, …, which gives the asymptotic result below by the definition of …. Therefore, by the Central Limit Theorem, we have …, where ….

Moreover, we have …, where …. From the estimator of … in (22), …. Therefore, …, and hence … is a consistent estimator. Moreover, by the Central Limit Theorem, we have ….

From the estimator of … in (24), combining the consistency of … and (25), we have …, where …. Hence, …, and thus … is a consistent estimator. Again, by the Central Limit Theorem, we have …. □
Remark 4. It is natural to consider independent covariates, in which case … can be modeled as …, where … and … are independent for …. We still have …, where …, and … if …; furthermore, … and … if …. Other modifications of … can be handled similarly.

Remark 5. According to Theorem 2, we can construct an approximate confidence interval (CI) for each parameter based on its limiting distribution. The intervals are Wald-type intervals, obtained as the estimate plus or minus the appropriate quantile of Φ times the estimated asymptotic standard error, where Φ denotes the distribution function of N(0, 1). Note that, when …, these confidence intervals agree with those given in Hall, Pham, Wand, and Wang (2011).
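As a small illustration of Remark 5, the helper below computes such a Wald-type interval; the function name and the numerical values are hypothetical, not taken from the paper.

## Wald-type confidence interval from an estimate and its asymptotic standard error
wald_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)                  # quantile of the standard normal distribution
  c(lower = est - z * se, upper = est + z * se)
}
wald_ci(est = 1.02, se = 0.11)                     # e.g., an interval for one regression coefficient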
5. Conclusions and Discussion
We proposed a special exponential family model and studied the GVA of its normalizing constant by connecting the KL divergence with Jensen's inequality. A closed-form expansion of the normalizing constant was given up to the second order, which simplifies both the calculations and the theoretical justification of inference for the Poisson lognormal mixed model. The real-data analysis demonstrated the importance of extending the model to multiple predictors, and the simulation studies showed that the empirical coverage of the proposed confidence intervals was consistent with the nominal level.
The proposed method was implemented in R and is available from the authors upon request. A flowchart of the proposed GVA algorithm is shown in Figure 6. Note that the glmer function in the R package lme4 performs similar calculations; however, our proposed method has two advantages over glmer.
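For reference, a minimal glmer call for a model of this form is shown below; the data frame layout (columns y, x, and id) is illustrative and is built from the matrices in the simulation sketch of Section 3.

## Fitting the Poisson mixed model with a random intercept per subject using lme4
library(lme4)
dat <- data.frame(y = as.vector(t(y)), x = as.vector(t(x)),
                  id = factor(rep(1:m, each = n)))   # long format: n rows per subject
fit_glmer <- glmer(y ~ x + (1 | id), family = poisson, data = dat)
summary(fit_glmer)                                   # fixed effects and random-effect variance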
In summary, the Gaussian variational approximation method was developed in this paper for the Poisson lognormal mixed model with one or more predictors. The explicit forms of the estimators of the parameters and their limiting distributions were derived, and hence inference for the parameters was obtained. A real-life example demonstrated the applicability of the proposed method, and simulation studies illustrated its accuracy. Although the same problem was studied in Hall et al. [5] and Hall et al. [6], their studies were restricted to one-predictor models and are difficult to generalize to multiple-predictor models. The proposed method gives results comparable to theirs but is applicable to multiple-predictor models.