1. Introduction
Quantile regression modeling has been widely applied in different fields such as economics, environmental science, ecology, and medicine, among many others (Cade and Noon [1], Yu et al. [2]). A number of studies on nonparametric quantile regression and its applications have been developed since the seminal work of Koenker and Bassett [3]. Recently, several parametric quantile models have been studied in the regression literature, which have motivated the study of probability distributions that are useful for this purpose.
In the univariate setting, some distributions suitable for parametric quantile modeling appear in Ferrari and Fumes [4], Gijbels et al. [5], Mazucheli et al. [6], and Smithson and Shou [7]. Multivariate quantile modeling is less frequent in the statistical literature and often uses nonparametric methods. Several studies are based on extensions of the quantile concept to a multivariate setting. Some examples can be found in Breckling and Chambers [8], Kong and Mizera [9], McKeague et al. [10], and Wei [11]. Other multivariate models are based on the univariate quantile notion. For instance, Petrella and Raponi [12], Morán-Vásquez and Ferrari [13], and Morán-Vásquez et al. [14] propose methods for jointly modeling univariate marginal quantiles, taking into account the potential correlation between marginals.
In the present article, we define a quantile-based multivariate log-normal distribution. This distribution has positive support and reduces to the quantile-based log-normal distribution (Saulo et al. [15]) in the univariate setting. Moreover, the usual multivariate log-normal distribution (Morán-Vásquez and Ferrari [13] and Morán-Vásquez et al. [14]) can be expressed as a quantile-based multivariate log-normal distribution. The parameters of the proposed distribution are interpretable in terms of marginal quantiles and associations between pairs of variables, making this model attractive for quantile modeling of correlated multivariate positive skewed data.
In this article, we study some statistical properties of the quantile-based multivariate log-normal family, describe the estimation of its parameters, and show its usefulness through an application to real data. We derive distributional properties obtained through transformations, as well as results related to the mixed moments, expected value, covariance matrix, mode, Shannon entropy, Kullback–Leibler divergence, marginal and conditional distributions, and independence. As special cases, some of these results establish new properties of the multivariate log-normal distribution. We compute the maximum likelihood estimates of the parameters of the quantile-based multivariate log-normal distribution from the maximum likelihood estimates of the multivariate log-normal distribution. We evaluate the performance of the proposed estimation procedure through Monte Carlo simulations. The usefulness of the proposed distribution for modeling multivariate positive skewed data is illustrated through an analysis of real data on children’s weights and heights.
The paper is organized as follows. Section 2 presents the quantile-based multivariate log-normal distribution. Section 3 deals with the derivation of various statistical properties of the proposed distribution. Section 4 focuses on maximum likelihood estimation and simulation studies. Also, a graphical method to assess the goodness of fit is described. Section 5 presents an application to real data. Finally, Section 6 closes the paper with concluding remarks.
2. Quantile-Based Multivariate Log-Normal Distribution
We denote vectors with lowercase Greek letters in bold and matrices with capital Greek letters in bold. For vectors and matrices, the components are denoted by the respective Greek letter in normal font. For example, if
and
are a real vector and a real matrix, respectively, then
and
. We denote by
and
the
p-dimensional vectors whose components are all zero and one, respectively. We denote by
the
identity matrix. Let
be a square matrix. We denote by
and
the determinant and trace of
, respectively. If
is a symmetric matrix, then
means that
is positive definite. Additionally,
is the unique symmetric positive definite square root of
. If
and
are matrices of the same dimension, then
denotes the Hadamard product of
and
. If
is a vector, then
denotes the diagonal matrix with diagonal elements of
, that is,
. We define the set
as
If
and
f is a real function, we denote
, provided that the components of
are in the domain of
f. If
and
, we write
. We denote random vectors and their components with capital Roman letters in bold and normal fonts, respectively.
It is well known that the PDF of a multivariate normal vector
is given by
where
is the square of the Mahalanobis distance between
and
with respect to
. On the other hand, the random vector
has a multivariate log-normal distribution with median vector
and dispersion matrix
, denoted by
, if
, where the log denotes the natural logarithm function. The PDF of
is (Morán-Vásquez et al. [14])
The multivariate log-normal distribution given in (1) has a slightly different parameterization than the one used by Fang et al. ([16], Section 2.8).
Let
be fixed values in
. Theorem 5 and Corollary 2 of Morán-Vásquez and Ferrari [13] permit us to establish that the
-quantile
of
satisfies
, where
is the
-quantile of a standard normal distribution,
. Note that the quantile vector
can be expressed as
where
and
. A reparameterization of the multivariate log-normal distribution in terms of
is obtained by replacing
in (1). Based on this, we present the quantile-based multivariate log-normal distribution in Definition 1.
Definition 1. Let and be fixed vectors such that is the -quantile of a standard normal distribution, . The random vector is said to have a quantile-based multivariate log-normal distribution with quantile vector and dispersion matrix , denoted by , if its PDF is
where .
If we choose
, then Definition 1 coincides with the definition of a multivariate log-normal distribution (Morán-Vásquez et al. [14]). In this case,
is the median vector. Note that
if
, which establishes the way in which the quantile-based multivariate log-normal and normal distributions are related through the logarithmic transformation.
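The link between the median vector and the quantile vector can be illustrated numerically. The following is a minimal sketch, assuming the relation implied by the definitions above: since the logarithm of each component is normal with mean equal to the log-median and variance `sigma_kk`, the `q_k`-quantile of the k-th marginal equals the marginal median times `exp(sqrt(sigma_kk) * z_k)`, where `z_k` is the `q_k`-quantile of the standard normal distribution. The function names and all numerical values are hypothetical and chosen only for illustration.

```python
from math import exp, sqrt
from statistics import NormalDist


def median_to_quantile(mu, sigma_diag, q):
    """Map marginal medians to marginal q_k-quantiles of a multivariate log-normal."""
    std_normal = NormalDist()
    return [
        m * exp(sqrt(s) * std_normal.inv_cdf(qk))
        for m, s, qk in zip(mu, sigma_diag, q)
    ]


def quantile_to_median(eta, sigma_diag, q):
    """Inverse map: recover the marginal medians from the marginal q_k-quantiles."""
    std_normal = NormalDist()
    return [
        e * exp(-sqrt(s) * std_normal.inv_cdf(qk))
        for e, s, qk in zip(eta, sigma_diag, q)
    ]


# Hypothetical values, not taken from the article.
mu = [20.0, 110.0]            # marginal medians
sigma_diag = [0.04, 0.0025]   # diagonal of the dispersion matrix
q = [0.25, 0.5]               # first quartile for Y1, median for Y2

eta = median_to_quantile(mu, sigma_diag, q)
mu_back = quantile_to_median(eta, sigma_diag, q)
print(eta, mu_back)
```

When every `q_k` equals 0.5, the two parameterizations coincide, which mirrors the remark that Definition 1 then reduces to the multivariate log-normal distribution.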
Figure 1 displays contour plots (at levels 0.15, 0.1, 0.05, 0.02, 0.01) of the quantile-based bivariate log-normal distribution. The legend indicates the values of
,
, and all the parameters considered in the first plot and the values that are changed from a plot to the subsequent one (in alphabetical order). The parameters
and
of the distribution in
Figure 1a are the marginal medians of
and
, respectively. For
Figure 1b–f, these parameters are the first quartile of
and the median of
, respectively. The parameter
impacts the scale of the marginal distribution of
(
Figure 1b,c). The parameter
controls the dispersion of the marginal distribution of
(
Figure 1c,d). The parameter
controls the association between the marginal distributions of
and
, ranging from a negative to positive association (
Figure 1d–f).
The quantile-based multivariate log-normal distribution is suitable in situations where it is necessary to model quantiles of the marginals, taking into account the correlation between them. Additionally, our model can be useful for regression modeling purposes. For instance, assume that, for fixed
k,
, where
are unknown regression parameters and
are fixed covariates. So,
is the multiplicative effect of a one unit increase in
on the
-quantile of
. This is a parametric methodology that allows us to jointly analyze marginal quantiles, taking into account the association among the response variables through the dispersion matrix
. These types of models can provide more accurate estimates than those that fit a univariate model to each marginal under the assumption of independence among them (Morán-Vásquez et al. [14]).
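The multiplicative interpretation of the regression parameters can be made explicit with a short worked step. This is a sketch under the assumption of a log-linear predictor for the marginal quantile, which is what a multiplicative effect suggests; the symbols $\eta_k(\boldsymbol{x})$, $x_j$, and $\beta_{kj}$ are illustrative notation and may differ from the authors' own.

```latex
\log \eta_k(\boldsymbol{x}) = \sum_{j} x_j \beta_{kj}
\quad\Longrightarrow\quad
\frac{\eta_k(x_1, \ldots, x_j + 1, \ldots)}{\eta_k(x_1, \ldots, x_j, \ldots)}
  = \exp(\beta_{kj}).
```

Under this assumption, increasing the $j$-th covariate by one unit while holding the others fixed multiplies the $q_k$-quantile of the $k$-th response by $\exp(\beta_{kj})$.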
3. Main Properties
Theorems 1–3 state distributional results involving the transformation of quantile-based multivariate log-normal random vectors.
Theorem 1. Let . If , then .
Proof. From the transformation
, with the Jacobian
, in (3), we arrive at
Since
, (4) can be expressed as
where the last line is obtained by using the identity
. □
Corollary 1. Let . If , then .
Proof. The result follows by applying Theorem 1 to the quantile-based multivariate log-normal distribution generated by . □
The result stated in the above corollary can also be obtained as a particular case of Theorem 3(1) of Morán-Vásquez and Ferrari [13].
Theorem 2. Let have nonzero components. If , then , where .
Proof. Transforming
, with the Jacobian
, in (3), we have
By using the identity
with
and
, in (5), we have
where the last line is derived by noting that
. □
Corollary 2. Let with nonzero components. If , then .
Proof. The result follows by applying Theorem 2 to the quantile-based multivariate log-normal distribution generated by . □
The above corollary can also be obtained as a particular case of Theorem 3(2) of Morán-Vásquez and Ferrari [13].
Theorem 3. Let . If , then
where .
Proof. Since
, we have
where
and
. This completes the proof. □
Corollary 3. Let . If , then
Proof. Simply apply Theorem 3 to the quantile-based multivariate log-normal distribution generated by . □
In Theorem 4, we give a closed-form expression for the mixed moments of quantile-based multivariate log-normal random vectors.
Theorem 4. Let . If , then
Proof. From (3), we have
By making the change of variables
, with the Jacobian
, in (7), we arrive at
, where
is the moment-generating function of
. This completes the proof. □
In the following corollary, we derive the expected value and the covariance matrix of a quantile-based multivariate log-normal random vector.
Corollary 4. Let . Then,
, where is the vector with elements being the main diagonal elements of Σ.
, where
Proof. For each
, by choosing
with all its components being 0, except the kth, which is 1, in (6), we obtain
From the above expression, we get the first assertion. Similarly, for each
, by choosing
with all components equal to 0, except the jth and kth, which are 1, in (6), we have
The second assertion is obtained from the identity
. □
In Section 2, we described the behavior of the quantile-based multivariate log-normal distribution in terms of the parameters involved in the matrix
. The following corollary establishes an exact interpretation of these parameters in terms of covariance between pairs of variables according to their signs.
Corollary 5. Let . Then,
if and only if , .
if and only if , .
Proof. The result follows from (8). □
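The expected value, the covariance matrix, and the sign relation of Corollary 5 can be checked numerically. The sketch below is written under stated assumptions rather than as a restatement of the article's formulas: it uses the standard log-normal moment identities E[Y_k] = exp(m_k + sigma_kk / 2) and Cov(Y_j, Y_k) = E[Y_j] E[Y_k] (exp(sigma_jk) - 1), where m_k = log(eta_k) - sqrt(sigma_kk) z_k is the mean of log Y_k implied by the quantile parameterization. The function name and the parameter values are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def qln_mean_cov(eta, sigma, q):
    """Mean vector and covariance matrix under the quantile parameterization.

    eta   : marginal q_k-quantiles
    sigma : dispersion matrix as a list of lists
    q     : quantile levels, one per component
    """
    p = len(eta)
    z = [NormalDist().inv_cdf(qk) for qk in q]
    # Mean of log Y_k implied by the quantile parameterization.
    m = [log(eta[k]) - sqrt(sigma[k][k]) * z[k] for k in range(p)]
    mean = [exp(m[k] + sigma[k][k] / 2.0) for k in range(p)]
    cov = [
        [mean[j] * mean[k] * (exp(sigma[j][k]) - 1.0) for k in range(p)]
        for j in range(p)
    ]
    return mean, cov


# Hypothetical parameter values, not taken from the article.
eta = [18.0, 110.0]
sigma = [[0.04, 0.005], [0.005, 0.0025]]
q = [0.25, 0.5]

mean_vec, cov_mat = qln_mean_cov(eta, sigma, q)
print(mean_vec)
print(cov_mat)
```

Because exp(sigma_jk) - 1 has the same sign as sigma_jk and the means are positive, the covariance between two components is positive, negative, or zero exactly when the corresponding off-diagonal element of the dispersion matrix is.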
Corollary 6. Let . If , then
Moreover, , where is the vector with elements being the main diagonal elements of Σ, and , with
Proof. Apply Theorem 4 and Corollary 4 to the quantile-based multivariate log-normal distribution generated by . □
Theorem 5 gives a closed-form expression for the mode of the quantile-based multivariate log-normal distribution.
Theorem 5. The mode of is given by . The value of the PDF of at the mode is
Proof. The mode of
is obtained by maximizing (3) with respect to
, which is equivalent to maximizing the function
with respect to
. By using results on vector differentiation (Seber ([17], Chapter 17)), we find that the equation
is equivalent to
The solution for
of the above equation is
.
Now, for
, we have
which implies that
for all
. Hence,
. □
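The maximization in the proof above can be sketched in a few lines. The notation here is assumed for illustration and may differ from the authors': $\boldsymbol{y}$ is the argument of the PDF, $\boldsymbol{w} = \log \boldsymbol{y}$, $\boldsymbol{m}$ is the mean vector of $\log \boldsymbol{Y}$ (the logarithm of the quantile vector minus the vector with components $\sqrt{\sigma_{kk}}\, z_k$), and $c = -\tfrac{p}{2}\log(2\pi) - \tfrac{1}{2}\log|\boldsymbol{\Sigma}|$.

```latex
\begin{aligned}
\log f(\boldsymbol{y}) &= c - \boldsymbol{1}_p^\top \boldsymbol{w}
  - \tfrac{1}{2} (\boldsymbol{w} - \boldsymbol{m})^\top \boldsymbol{\Sigma}^{-1} (\boldsymbol{w} - \boldsymbol{m}),
  \qquad \boldsymbol{w} = \log \boldsymbol{y}, \\
\frac{\partial \log f}{\partial \boldsymbol{w}}
  &= -\boldsymbol{1}_p - \boldsymbol{\Sigma}^{-1} (\boldsymbol{w} - \boldsymbol{m}) = \boldsymbol{0}_p
  \;\Longrightarrow\;
  \boldsymbol{w}^{*} = \boldsymbol{m} - \boldsymbol{\Sigma} \boldsymbol{1}_p, \\
\text{mode} &= \exp\!\left( \boldsymbol{m} - \boldsymbol{\Sigma} \boldsymbol{1}_p \right), \\
f(\text{mode}) &= (2\pi)^{-p/2} |\boldsymbol{\Sigma}|^{-1/2}
  \exp\!\left( \tfrac{1}{2} \boldsymbol{1}_p^\top \boldsymbol{\Sigma} \boldsymbol{1}_p
  - \boldsymbol{1}_p^\top \boldsymbol{m} \right).
\end{aligned}
```

The Hessian with respect to $\boldsymbol{w}$ is $-\boldsymbol{\Sigma}^{-1}$, which is negative definite, so the stationary point is the unique maximizer. In the univariate case this reduces to the familiar log-normal mode $\exp(m - \sigma^2)$.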
Corollary 7. The mode of is given by . The value of the PDF of at the mode is
Proof. Apply Theorem 5 to the quantile-based multivariate log-normal distribution generated by . □
Theorem 6 provides the distribution of a Mahalanobis-type distance involving a quantile-based multivariate log-normal random vector.
Theorem 6. If , then .
Proof. The result follows by noting that . □
The above result allows us to evaluate the goodness of fit of the quantile-based multivariate log-normal distribution by using quantile–quantile plots to compare empirical Mahalanobis distances with theoretical quantiles obtained from a chi-squared distribution with p degrees of freedom.
The Shannon entropy (also called differential entropy) of a continuous random vector
with PDF
is defined as
On the other hand, the Kullback–Leibler (KL) divergence between the distributions of two
p-dimensional random vectors
and
is given by
where
and
denote the PDFs of
and
, respectively. The above expected value is defined with respect to the PDF
. A detailed study of Shannon entropy and KL divergence can be found in Pardo [18].
Lemmas 1 and 2 provide the Shannon entropy and the KL divergence for the multivariate normal distribution, respectively.
Lemma 1. The Shannon entropy of is given by
Proof. See Pardo ([18], p. 32). □
Note that
in the above lemma can be expressed as
Lemma 2. The KL divergence between and is given by
Proof. See Pardo ([18], p. 33). □
In the following theorem, we derive the Shannon entropy of the quantile-based multivariate log-normal distribution.
Theorem 7. The Shannon entropy of is given by
Proof. By definition,
By making the change of variables
, with Jacobian
, in the above integral, we have
where
. The result follows by calculating
by using Lemma 1 and replacing
in the above expression. □
Corollary 8. The Shannon entropy of is given by
Proof. The result follows by applying Theorem 7 to the quantile-based multivariate log-normal distribution generated by . □
In Theorem 8, we derive the KL divergence between two quantile-based multivariate log-normal distributions.
Theorem 8. The KL divergence between and is given by
Proof. By definition,
We substitute
, with Jacobian
, above to arrive at
where
and
. By using Lemma 2 to calculate
we arrive at the desired result. □
Corollary 9. The KL divergence between and is given by
Proof. Take the quantile-based multivariate log-normal random vector in Theorem 8 generated by . □
Corollary 10. The KL divergence between and is given by
Proof. Generate the quantile-based multivariate log-normal random vector in Corollary 9 with . □
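The change-of-variables arguments used for the entropy and the KL divergence can be illustrated in the bivariate case. The sketch below rests on standard facts and assumed notation rather than on the article's displayed formulas: the entropy of a p-variate normal vector with covariance Sigma is (p/2)(1 + log 2π) + (1/2) log|Sigma|; the entropy of Y = exp(X) equals the entropy of X plus the sum of the means of X; and the KL divergence is unchanged by the one-to-one logarithmic transformation, so it equals the KL divergence between the two underlying normal distributions. All function names and numerical values are hypothetical.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist


def log_scale_mean(eta, sigma, q):
    """Mean vector of log Y: log(eta_k) - sqrt(sigma_kk) * z_k."""
    return [
        log(e) - sqrt(sigma[k][k]) * NormalDist().inv_cdf(qk)
        for k, (e, qk) in enumerate(zip(eta, q))
    ]


def det2(a):
    """Determinant of a 2 x 2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv2(a):
    """Inverse of a 2 x 2 matrix."""
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def qln_entropy(eta, sigma, q):
    """Bivariate case: entropy of log Y plus the sum of the means of log Y."""
    m = log_scale_mean(eta, sigma, q)
    normal_entropy = 1.0 + log(2.0 * pi) + 0.5 * log(det2(sigma))  # p = 2
    return normal_entropy + sum(m)


def qln_kl(eta1, sigma1, q1, eta2, sigma2, q2):
    """Bivariate case: KL divergence between the two underlying normals."""
    m1 = log_scale_mean(eta1, sigma1, q1)
    m2 = log_scale_mean(eta2, sigma2, q2)
    s2_inv = inv2(sigma2)
    diff = [m2[0] - m1[0], m2[1] - m1[1]]
    trace = sum(s2_inv[i][j] * sigma1[j][i] for i in range(2) for j in range(2))
    quad = sum(diff[i] * s2_inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return 0.5 * (trace + quad - 2.0 + log(det2(sigma2) / det2(sigma1)))


# Hypothetical parameter values, not taken from the article.
sigma = [[0.04, 0.005], [0.005, 0.0025]]
medians = [20.0, 110.0]
z_q1 = NormalDist().inv_cdf(0.25)
# The same distribution, described by its first quartile for Y1 instead of its median.
eta_q = [medians[0] * exp(sqrt(sigma[0][0]) * z_q1), medians[1]]

kl_same = qln_kl(medians, sigma, [0.5, 0.5], eta_q, sigma, [0.25, 0.5])
kl_diff = qln_kl(medians, sigma, [0.5, 0.5], [22.0, 115.0], sigma, [0.5, 0.5])
entropy = qln_entropy(medians, sigma, [0.5, 0.5])
print(kl_same, kl_diff, entropy)
```

The value `kl_same` is numerically zero because the two parameter sets describe one and the same distribution through different quantile levels.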
To derive results on marginal and conditional distributions and on independence for sub-vectors of a random vector having a quantile-based multivariate log-normal distribution, we introduce notation for partitions of
,
,
, and
as follows:
where
,
,
,
,
,
,
,
, and
and
are such that
. The Schur complement of the block
of
is given by
. Also, we define
,
, and
. The dimension
p is such that
.
In Lemma 3, we give a factorization of the PDF of the quantile-based multivariate log-normal distribution.
Lemma 3. Consider the partitions given in (9). The PDF of can be expressed as
where , with .
Proof. It suffices to show that
A straightforward calculation shows that
Now, using the result
we have
which is the desired result. □
In Theorem 9, we show that the quantile-based multivariate log-normal family is preserved under marginalization and conditioning. In this theorem, we also present a characterization of the independence between subvectors of this family.
Theorem 9. Let . Consider the partitions given in (9). Then, .
.
and are independent if and only if .
Proof. Statements 1 and 2 follow from the factorization given in (10). To prove statement 3, note that
and
are independent if and only if
which, from (10), is satisfied if and only if
. □
4. Parameter Estimation
The reparameterization used in Definition 1 permits us to compute the maximum likelihood estimates of the parameters of the quantile-based multivariate log-normal distribution through the maximum likelihood estimates of the parameters of the multivariate log-normal distribution. Let
be the observed values of a random sample
of
. We denote the maximum likelihood estimators of
and
by
and
, respectively. From (2), we have
where
, and
and
are the maximum likelihood estimators of the parameters of the multivariate log-normal distribution, given by (Morán-Vásquez et al. [14])
Note that the maximum likelihood estimator of
in the quantile-based multivariate log-normal distribution is the same as in the multivariate log-normal distribution. Furthermore, this estimator is the same for any choice of
.
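The estimation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation: the log-normal maximum likelihood estimates are taken to be the exponentiated sample mean of the log-data and the sample covariance of the log-data with divisor n, and the quantile vector is then obtained by the invariance property through the median-to-quantile relation. The function name and the small data set are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def qln_mle(sample, q):
    """Maximum likelihood estimates (eta_hat, sigma_hat) from positive data.

    sample : list of observations, each a sequence of p positive numbers
    q      : fixed quantile levels, one per component
    """
    n = len(sample)
    p = len(sample[0])
    logs = [[log(v) for v in row] for row in sample]
    mean_log = [sum(row[k] for row in logs) / n for k in range(p)]
    # Covariance of the log-data with divisor n (the MLE, not the unbiased version).
    sigma_hat = [
        [
            sum((row[j] - mean_log[j]) * (row[k] - mean_log[k]) for row in logs) / n
            for k in range(p)
        ]
        for j in range(p)
    ]
    z = [NormalDist().inv_cdf(qk) for qk in q]
    # Invariance: plug the log-normal estimates into the quantile relation.
    eta_hat = [exp(mean_log[k] + sqrt(sigma_hat[k][k]) * z[k]) for k in range(p)]
    return eta_hat, sigma_hat


# Hypothetical weights (kg) and heights (cm), for illustration only.
sample = [
    (18.2, 105.0), (21.5, 112.3), (19.8, 108.1),
    (24.1, 118.4), (17.0, 101.9), (22.7, 115.0),
]

eta_median, sigma_a = qln_mle(sample, [0.5, 0.5])
eta_quartile, sigma_b = qln_mle(sample, [0.25, 0.5])
print(eta_median, eta_quartile)
```

The two calls return the same dispersion estimate, in line with the remark that this estimator does not depend on the chosen quantile levels; only the estimated quantile vector changes.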
We assess the goodness of fit of the quantile-based multivariate log-normal distributions by using quantile–quantile plots, comparing the empirical Mahalanobis distances
,
, with the theoretical quantiles
, where
,
, obtained from a chi-squared distribution with
p degrees of freedom. Additionally, we plot simulated envelopes (Atkinson [19]) for the quantile–quantile plots to aid the comparison between quantiles and to judge the adequacy of the models.
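The two ingredients of the quantile–quantile plot can be computed as in the following bivariate sketch. Several details are assumptions made for illustration: the distance is the squared Mahalanobis distance of the log-data from the mean vector of log Y, the plotting positions are taken as (i - 0.5) / n, which may differ from the authors' choice, and the chi-squared quantile uses the closed form -2 log(1 - u) that holds only for 2 degrees of freedom. The function names and the data are hypothetical, and the simulated envelopes are not included.

```python
from math import log, sqrt
from statistics import NormalDist


def mahalanobis_distances(sample, eta, sigma, q):
    """Sorted squared Mahalanobis distances of log-data (bivariate case)."""
    z = [NormalDist().inv_cdf(qk) for qk in q]
    # Mean of log Y under the quantile parameterization.
    m = [log(eta[k]) - sqrt(sigma[k][k]) * z[k] for k in range(2)]
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    distances = []
    for y1, y2 in sample:
        r1, r2 = log(y1) - m[0], log(y2) - m[1]
        # Quadratic form r' Sigma^{-1} r written out for a symmetric 2 x 2 matrix.
        d = (sigma[1][1] * r1 * r1 - 2.0 * sigma[0][1] * r1 * r2
             + sigma[0][0] * r2 * r2) / det
        distances.append(d)
    return sorted(distances)


def chi2_quantiles_2df(n):
    """Chi-squared (2 df) quantiles at the assumed positions (i - 0.5) / n."""
    return [-2.0 * log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]


# Hypothetical data and parameter values, for illustration only.
sample = [
    (18.2, 105.0), (21.5, 112.3), (19.8, 108.1),
    (24.1, 118.4), (17.0, 101.9), (22.7, 115.0),
]
eta = [20.0, 110.0]
sigma = [[0.04, 0.005], [0.005, 0.0025]]
q = [0.5, 0.5]

empirical = mahalanobis_distances(sample, eta, sigma, q)
theoretical = chi2_quantiles_2df(len(sample))
for pair in zip(theoretical, empirical):
    print(pair)
```

Plotting the pairs of theoretical and empirical values gives the quantile–quantile plot; points close to the identity line indicate an adequate fit. For dimensions other than 2, a numerical chi-squared quantile function would be needed in place of the closed form.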
To evaluate the estimation procedure, we conducted simulations with the quantile-based bivariate log-normal distribution. We consider the sample sizes of , and Monte Carlo replicates. The random samples of were generated through the following steps:
Generate a random sample of of .
Compute . Then, is a random sample of .
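The two generation steps above can be sketched for the bivariate case as follows. This is an illustration under assumptions, not the authors' R code: the normal vector is drawn with mean log(eta_k) - sqrt(sigma_kk) z_k and covariance Sigma through a hand-written 2 x 2 Cholesky factor, and is then exponentiated. The function name, the seed, and the parameter values are hypothetical and are not the true values used in the article.

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist


def sample_qln2(n, eta, sigma, q, seed=None):
    """Draw n observations from a quantile-based bivariate log-normal distribution."""
    rng = random.Random(seed)
    z = [NormalDist().inv_cdf(qk) for qk in q]
    m = [log(eta[k]) - sqrt(sigma[k][k]) * z[k] for k in range(2)]
    # Cholesky factor L of the 2 x 2 dispersion matrix, Sigma = L L'.
    l11 = sqrt(sigma[0][0])
    l21 = sigma[0][1] / l11
    l22 = sqrt(sigma[1][1] - l21 * l21)
    draws = []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Step 1: bivariate normal draw on the log scale.
        x1 = m[0] + l11 * e1
        x2 = m[1] + l21 * e1 + l22 * e2
        # Step 2: exponentiate each component.
        draws.append((exp(x1), exp(x2)))
    return draws


# Hypothetical parameter values, for illustration only.
eta = [18.0, 110.0]
sigma = [[0.04, 0.005], [0.005, 0.0025]]
q = [0.25, 0.5]

draws = sample_qln2(20000, eta, sigma, q, seed=1)
frac1 = sum(y1 <= eta[0] for y1, _ in draws) / len(draws)
frac2 = sum(y2 <= eta[1] for _, y2 in draws) / len(draws)
print(frac1, frac2)
```

As a sanity check, the proportion of simulated values of each component that fall at or below the corresponding element of the quantile vector should be close to the chosen quantile level.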
The true parameter values were obtained by fitting the quantile-based bivariate log-normal distribution to the children data set considered in Section 5.
Table 1 reports the median and the interquartile range for the estimated values of the parameters of the investigated models. The medians get close to the true parameters and the interquartile range gets smaller as the sample size grows, indicating a satisfactory performance of the estimators. All the computations were conducted in the R software [20].
6. Final Remarks
In this article, we have proposed a multivariate distribution with positive support, obtained by reparameterizing the multivariate log-normal distribution in terms of its marginal quantiles. This distribution should be of interest to researchers in the area of quantile modeling for correlated multivariate positive skewed data. We derived a number of statistical properties of this distribution involving transformations, mixed moments, expected value, covariance matrix, mode, Shannon entropy, Kullback–Leibler divergence, marginalization, conditioning, and independence. The quantile-based multivariate log-normal distribution defined in this article is rich in theoretical properties and is mathematically tractable. Parameter estimation was carried out by the maximum likelihood method. The satisfactory behavior of the estimation procedure was verified through simulation studies. Also, a graphical diagnostic tool was employed to assess the quality of the fitted distributions. Finally, an application to real data was presented and discussed as an alternative for the quantile estimation of children’s weights and heights, considering the natural association between these variables.
There are several aspects that will be addressed in future articles. Bayesian approaches for the estimation of the parameters of the quantile-based multivariate log-normal distribution will be developed. The study of regression models based on the quantile-based multivariate log-normal distributions together with inferential developments and applications to real data will also be undertaken. These models will allow us to analyze the relationship between marginal quantiles of response vectors and a set of explanatory variables, taking into account the potential association among the marginal response variables. Additionally, a comparative analysis of this methodology with the model proposed by Petrella and Raponi [12] will be included in a forthcoming article.