1. Introduction
There is a substantial literature on the estimation of, and inference on, relative entropy measures of joint dependence used as measures of serial correlation. These measures of dependence were first proposed by Joe [1] and extended by Granger and Lin [2]. Relative entropy based measures of dependence have received considerable interest in econometrics because they provide a very general framework for gauging joint dependence, and they can be applied to a set of variables that is a mixture of continuous, ordinal-categorical, and nominal-categorical variables. Interested readers are referred to [3,4,5] for a concise review of important contributions in this area.
Econometricians have recently become interested in the computation of maximum entropy densities (see, e.g., Golan [6], Usta and Kantar [7], and the references therein for background and discussion of maximum entropy (ME) densities). ME densities are derived by maximizing an information criterion (the level of uncertainty) subject to mass- and mean-preserving constraints. The justification for using the ME principle in this context can be found in [8]. Rockinger and Jondeau [9] apply the ME method to determine the ME return distribution, which is then used to extend Bollerslev's GARCH model to allow for autoregressive conditional skewness and kurtosis. Maasoumi and Racine [10] employ a metric entropy measure of dependence to examine the predictability of asset returns. Hang [11] uses the ME method to determine flexible functional forms of regression functions subject to side conditions. Miller and Liu [12] propose a method to recover a joint distribution function by applying the Kullback-Leibler cross entropy (KLCE) distance while imposing a required degree of dependence through the joint moments. An example is the normal distribution, which is completely characterized by its first and second moments; in this case, the minimum KLCE distribution is the multivariate normal distribution, in which the dependence is specified through conventional linear correlation.
There has been a great deal of interest in copulas, especially in financial economics, because they make it possible to model and explain asymmetric dependence between random variables separately from their marginal distributions. For example, Patton [13] employs various families of copulas to investigate the inter-relationship between univariate skewness, asymmetric dependence between asset returns, and optimal asset portfolios. Rodriguez [14] models financial contagion using copulas. Chollete, Heinen, and Valdesogo [15] propose a multivariate regime-switching copula to capture asymmetric dependence and regime switching in portfolio selection. Ning, Xu, and Wirjanto [16] investigate asymmetric patterns in volatility clustering by employing a semi-parametric copula approach. Detailed discussions of various econometric aspects and applications of copulas in economics and finance can be found, for instance, in the survey papers by Patton [17] and Fan and Patton [18]. A comprehensive treatment of copula theory is presented in the monograph by Nelsen [19].
Given the broad context described above, we propose a theoretical framework to recover relative entropy measures of joint dependence from limited information by constructing a set of most entropic copulas (MECs). This is done, in essence, by maximizing Shannon entropy subject to constraints that enforce Uniform[0,1] marginal distributions and further constraints on copula-based measures of dependence (or on the distance between the MEC and an arbitrary nested copula). Within the class of MECs there exists a simplified form, namely the most entropic canonical copula (MECC). Moreover, it can be shown that the proposed MEC approach and the KLCE approach of Miller and Liu [12] are dual in the sense that they recover the same joint distribution. Applications of MECs in economics include Chu [20], Dempster, Medova, and Yang [21], Friedman and Huang [22], Veremyev, Tsyurmasto, Uryasev, and Rockafellar [23], and Zhao and Lin [24].
We shall now discuss the contributions of the current paper in relation to [20]. The similarity between the two papers is that rank correlations are employed as prior information about dependence in order to construct the MECC. The present paper differs from [20] in several respects. First, in [20], Carleman's condition permits moment constraints to be employed so as to ensure that the MEC satisfies all the properties of a copula, whereas in the present paper constraints are imposed explicitly on the marginal copula densities; the entropy maximization problem defined in [20] is therefore merely a good approximation to the entropy maximization problem studied here. Second, the main problem in [20] is a standard entropy maximization problem, while the main problem in the present paper involves a continuum of constraints on the marginal distributions, which can be written as integrals with varying end-points that need to be smoothed out by using kernels. This kernel smoother can generate MECs with smooth densities, whereas the discrete approximation technique proposed by [21] only allows for MECs with discrete densities. The feasibility and benefits of the proposed approach to constructing MECs are then demonstrated through a Monte-Carlo simulation study presented in Section 3.
Although our analysis is restricted to the bivariate case, the multivariate case is a straightforward extension. The remainder of the paper is organized in three sections. In Section 2, we formulate and approximate most entropic copulas (MECs). Next, we discuss the link between the MEC and the minimum KLCE density and the extent to which the MEC approach is more flexible than the KLCE method. We then compute the MEC and the MECC subject to marginal constraints and further constraints on various copula-based dependence measures such as Spearman's rho and tau. We also outline the large-sample properties of the relevant parameter estimators. These results are presented in Theorems 2.1-2.4. A simulation study is presented in Section 3, demonstrating that the MEC fits data well when compared with other competing procedures (e.g., parametric copulas and kernel estimators). Derivation of the statistical properties of the proposed copula estimator is rather challenging and is left for future research. Finally, to facilitate reading of the paper, we collect all material of a technical flavour in the three main appendices at the end of the paper.
2. Recovering the Most Entropic Copulas
2.1. Maximum Entropy and Copula
This section provides a brief explanation of entropy and copulas. We refer to [25] for a comprehensive review of entropy econometrics and to [19] for important results concerning copulas.
Shannon entropy has been used as an information criterion to construct probability densities for economic or financial variables such as stock returns, income, and GDP (see, inter alia, [26,27,28]). A univariate ME density is generally obtained by maximizing Shannon entropy, $H(f)=-\int f(x)\log f(x)\,dx$, with respect to the density $f$ under probability and moment constraints. A bivariate ME density that is closest to a given reference density, say the product of two univariate densities, can be obtained by minimizing the KLCE under joint moment constraints (see, e.g., [1] and [12]):
$\min_{f}\;\int\!\!\int f(x,y)\log\dfrac{f(x,y)}{f_{1}(x)f_{2}(y)}\,dx\,dy$ (1)
subject to
$\int\!\!\int f(x,y)\,dx\,dy=1 \quad\text{and}\quad \int\!\!\int h(x,y)\,f(x,y)\,dx\,dy=\eta,$
where $f$ is a bivariate density, $f_{1}$ and $f_{2}$ are some univariate densities, and $h$ is an arbitrary function such that the joint moment $\eta=E[h(X,Y)]$ exists.
The copula was proposed by Sklar [29] as a method for constructing joint distributions with given marginals. The advantage of copulas is that the dependence between random variables can be parametrically specified entirely separately from their marginals. A bivariate copula is defined as a function $C$ from $[0,1]^{2}$ to $[0,1]$ with the following properties: (1) for every $u,v\in[0,1]$ it holds that $C(u,0)=C(0,v)=0$, $C(u,1)=u$ and $C(1,v)=v$; and (2) $C$ is 2-increasing, i.e., $C(u_{2},v_{2})-C(u_{2},v_{1})-C(u_{1},v_{2})+C(u_{1},v_{1})\ge 0$ for every $u_{1},u_{2},v_{1},v_{2}\in[0,1]$ such that $u_{1}\le u_{2}$ and $v_{1}\le v_{2}$ (see, e.g., [19], p. 8). Note that Property (2) always holds if $C$ has a positive density $c$, and Property (1) implies that a copula is a function with Uniform[0,1] marginals. Sklar's theorem links a copula, $C$, to a joint distribution, $F$, via $F(x,y)=C(F_{1}(x),F_{2}(y))$, where $F_{1}$ and $F_{2}$ are the marginals.
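For concreteness, the following short sketch (our own illustration, not part of the original analysis; it uses Frank's copula with an arbitrary value $\theta=3$, a copula that reappears as the data-generating model in Section 3) checks Properties (1) and (2) numerically at random test points.

```python
import numpy as np

def frank_copula(u, v, theta=3.0):
    """Frank copula C(u, v; theta) for theta != 0."""
    num = (np.exp(-theta * u) - 1.0) * (np.exp(-theta * v) - 1.0)
    return -np.log1p(num / (np.exp(-theta) - 1.0)) / theta

rng = np.random.default_rng(0)
u, v = rng.uniform(size=1000), rng.uniform(size=1000)

# Property (1): uniform margins, C(u, 1) = u, C(1, v) = v, and C(u, 0) = C(0, v) = 0.
assert np.allclose(frank_copula(u, np.ones_like(u)), u)
assert np.allclose(frank_copula(np.ones_like(v), v), v)
assert np.allclose(frank_copula(u, np.zeros_like(u)), 0.0)

# Property (2): 2-increasing, i.e. the C-volume of every rectangle is non-negative.
u1, u2 = np.sort(rng.uniform(size=(2, 1000)), axis=0)
v1, v2 = np.sort(rng.uniform(size=(2, 1000)), axis=0)
volume = (frank_copula(u2, v2) - frank_copula(u2, v1)
          - frank_copula(u1, v2) + frank_copula(u1, v1))
assert (volume >= -1e-12).all()
print("Frank copula satisfies both defining properties at the test points.")
```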
We shall use measures of association and rank correlations to construct the MEC, which we discuss next. Measures of association are, unlike joint moments, invariant under strictly increasing (possibly nonlinear) transformations of the underlying random variables, and thus they are natural measures of dependence for non-elliptical random variables (see Appendix A for formal definitions of measures of association). A measure of association is, in general, defined as $\tau=\int_{0}^{1}\!\int_{0}^{1}h(u,v)\,dC(u,v)$, where $h$ is a bivariate function such that this integral exists. Such a measure, being based on $C$, is also referred to as a copula-based measure of dependence. In practice, $\tau$ can be estimated by the rank statistic $\hat{\tau}_{N}$, obtained by evaluating $h$ at the rescaled ranks and averaging over the sample, where $(R_{i},S_{i})$ denote the ranks of $(X_{i},Y_{i})$ in a sample of size $N$. An advantage of using rank statistics as nonparametric measures of nonlinear dependence is that they are robust, in the sense that they are insensitive to contamination and maintain high efficiency for heavier-tailed elliptical distributions as well as for multivariate normal distributions (see, e.g., [30] for a detailed treatment of rank statistics). Examples of such measures include Spearman's rho and Blest's rank correlations (see, e.g., [31]), which are summarized in Table 1.
Nonetheless, it is worth mentioning that this definition of $\tau$ is somewhat restrictive since it does not include Kendall's tau, for example.^1 Moreover, not every rank correlation can be formulated in terms of the above general rank statistic $\hat{\tau}_{N}$. For instance, the statistic proposed by Gideon and Hollister [32] as a coefficient of rank correlation resistant to outliers, even in small samples, is defined directly through the ordered ranks and the greatest-integer function rather than as a sample average of a fixed function $h$; it nevertheless estimates a copula-based measure of dependence.
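To make the rank-statistic construction concrete, the following sketch (our own illustration) computes the general rank statistic with $h(u,v)=12uv-3$, the choice associated with Spearman's rho; the scaling of the ranks by $N+1$ is a conventional assumption here rather than necessarily the paper's exact normalization.

```python
import numpy as np
from scipy import stats

def rank_statistic(x, y, h):
    """General rank statistic: average of h evaluated at the rescaled ranks."""
    n = len(x)
    r = stats.rankdata(x) / (n + 1.0)      # pseudo-observations for x
    s = stats.rankdata(y) / (n + 1.0)      # pseudo-observations for y
    return np.mean(h(r, s))

# Spearman's rho corresponds to the choice h(u, v) = 12*u*v - 3.
spearman_h = lambda u, v: 12.0 * u * v - 3.0

rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
y = 0.5 * x + rng.standard_normal(2000)    # a dependent bivariate sample

rho_rank = rank_statistic(x, y, spearman_h)
rho_scipy, _ = stats.spearmanr(x, y)
print(f"rank statistic: {rho_rank:.4f}   scipy spearmanr: {rho_scipy:.4f}")
```

For a sample of this size the two numbers agree closely, which is all the general rank statistic is meant to deliver.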
In the present paper, we use the bivariate Shannon entropy of a copula, given by
$H(c)=-\int_{0}^{1}\!\int_{0}^{1}c(u,v)\log c(u,v)\,du\,dv.$ (2)
By Sklar's theorem, the Shannon entropy of a copula is equivalent (up to sign) to the KLCE:
$\int\!\!\int f(x,y)\log\dfrac{f(x,y)}{f_{1}(x)f_{2}(y)}\,dx\,dy=-H(c).$
Hence, minimization of the KLCE and maximization of the bivariate Shannon entropy are dual problems. Let $c^{\ast}$ denote the density of the MEC. Then, in view of [1], the relative entropy measure of dependence (recovered from limited information) is given by a monotone transformation of $-H(c^{\ast})$. Generally speaking, a multivariate Shannon entropy can be defined in an obvious way, and this dual relationship still holds. However, as pointed out in Friedman and Huang [22], the problem of maximizing a multivariate Shannon entropy of copulas can suffer from the curse of dimensionality, because the number of constraints (on the marginal densities) needed for the MEC to satisfy all the properties of a copula increases as the problem involves more dimensions.
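This duality is easy to check numerically. The sketch below (again our own illustration, using Frank's copula density and standard normal marginals as an arbitrary example) computes the copula entropy $H(c)$ by quadrature on the unit square and the KLCE of the joint density from the product of its marginals in the original data space, and confirms that the two quantities differ only in sign.

```python
import numpy as np
from scipy import stats

THETA = 5.0

def frank_density(u, v, theta=THETA):
    """Density of Frank's copula (theta != 0)."""
    d = (1 - np.exp(-theta)) - (1 - np.exp(-theta * u)) * (1 - np.exp(-theta * v))
    return theta * (1 - np.exp(-theta)) * np.exp(-theta * (u + v)) / d**2

# Copula entropy H(c) = -\int\int c log c du dv by Gauss-Legendre quadrature on (0, 1)^2.
nodes, weights = np.polynomial.legendre.leggauss(80)
u = 0.5 * (nodes + 1.0)
w = 0.5 * weights
U, V = np.meshgrid(u, u)
W = np.outer(w, w)
c = frank_density(U, V)
H_c = -np.sum(W * c * np.log(c))

# KLCE of the joint density f(x, y) = c(F1(x), F2(y)) f1(x) f2(y) from f1 * f2,
# computed in the original (x, y) space with standard normal marginals.
x = 8.0 * nodes                      # quadrature nodes mapped to (-8, 8)
wx = 8.0 * weights
X, Y = np.meshgrid(x, x)
WX = np.outer(wx, wx)
cxy = frank_density(stats.norm.cdf(X), stats.norm.cdf(Y))
f = cxy * stats.norm.pdf(X) * stats.norm.pdf(Y)
klce = np.sum(WX * f * np.log(cxy))  # log(f / (f1 f2)) = log c by Sklar's theorem

print(f"copula entropy H(c) = {H_c:+.6f}")
print(f"KLCE from f1*f2     = {klce:+.6f}   (approximately equal to -H(c))")
```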
2.2. The Most Entropic Copula
We assume for the rest of this paper that the MEC is a differentiable function, so that its copula density exists. The bivariate MEC (or simply the MEC) is obtained by maximizing the bivariate Shannon entropy (2) under the following two constraints: (1) the marginals of the MEC are Uniform[0,1]; and (2) the measures of association defined in Section 2.1 are set equal to the corresponding rank correlations. We call this Problem EM:
$\max_{c}\; -\int_{0}^{1}\!\int_{0}^{1}c(u,v)\log c(u,v)\,du\,dv$ (3)
subject to
$\int_{0}^{1}\!\int_{0}^{1}c(u,v)\,du\,dv=1,$ (4)
$\int_{0}^{1}\!\int_{0}^{1}\mathbf{1}(s\le u)\,c(s,t)\,ds\,dt=u \quad\text{for all } u\in[0,1],$ (5)
$\int_{0}^{1}\!\int_{0}^{1}\mathbf{1}(t\le v)\,c(s,t)\,ds\,dt=v \quad\text{for all } v\in[0,1],$ (6)
$\int_{0}^{1}\!\int_{0}^{1}h(u,v)\,c(u,v)\,du\,dv=\hat{\tau}_{N},$ (7)
where (4) implies that $c$ is a joint density on the unit square; Equations (5) and (6) imply that the marginals of $c$ are Uniform[0,1] distributions; and Equation (7) imposes a constraint on the joint behavior of $U$ and $V$. To give an example, if $h(u,v)=12uv-3$, then the left-hand side of (7) becomes Spearman's rho and $\hat{\tau}_{N}$ (note that, in what follows, we sometimes omit the subscript $N$ for brevity) is the rank correlation associated with Spearman's rho. To give another example, suppose that the true data-generating copula belongs to a known family. Given this prior information, to recover a MECC from the data one may randomly choose a copula from that family and use it to construct (7), with the right-hand side set to an estimate of the difference between the probabilities of concordance and discordance (cf. Appendix A). By doing this, it is expected that some features of the family can be effectively incorporated into the MECC. Other examples of Equation (7) include Blest's coefficients and Gideon and Hollister's (1987) coefficient. Note also that we may impose more than one constraint of the form (7). It should be stressed at this point that some versions of the MEC problem may exhibit boundary solutions because of theoretical restrictions on the measures of dependence employed (e.g., the Hoeffding-Fréchet bounds on correlation statistics). Consequently, the large-sample theory stated in Section 2.3 below holds only for interior solutions of the stated problem.^2
For future reference, we shall denote by $c^{\ast}(u,v;\Lambda^{\ast})$, where $\Lambda^{\ast}$ is a vector of coefficients, the MEC that solves Problem EM. The MECs (and accordingly the MECC) can then be approximated by replacing the continua of varying end-points in (5) and (6) by finite sets of definite integrals. We present an approximate solution to Problem EM in Theorem 2.1 below.
THEOREM 2.1. The MEC, $c^{\ast}$, can be approximated by an approximator, $\hat{c}(u,v;\Lambda^{\ast})$, of the exponential form given in (8), with the terms entering its exponent defined in (9), where $\Lambda^{\ast}$ contains the minimizing values of the potential function in (10). Here $\Phi(\cdot)$ denotes the standard normal cdf (arising from smoothing the indicator functions in the marginal constraints (5) and (6) with the Gaussian kernel), and $\tilde{C}$ is an arbitrary copula (which may involve a nuisance parameter that needs to be estimated). In particular, the MEC, $\hat{c}$, can be symmetrized by letting $\tilde{C}(u,v)$ be equal to $\tilde{C}(v,u)$ and taking $h$ to be a symmetric function.
Proof: The proof utilizes the standard method of variational calculus for the maximization of functionals on normed linear spaces (see, e.g., [33], p. 129). See Appendix D. ■
As can be seen, the MEC density nests an arbitrary copula, $\tilde{C}$ (cf. Equation (9)). Indeed, the MEC depends on the (arbitrary) choice of $\tilde{C}$ and its nuisance parameter, if any, and thus is not unique. However, we can obtain a canonical form, which we call the MECC, by setting the term associated with $\tilde{C}$ to zero. This idea of a canonical model can be traced back to Jeffreys,^3 who proposed the principle of simplicity for deductive inference: for any given set of data there is usually an infinite number of possible laws that will "explain" the data precisely, and the simplest model should be chosen.
It is also worth noting at this point that, like the empirical copula, the MECC is a valid distribution function; however, it satisfies the Uniform[0,1] marginal constraints only asymptotically. In addition, the potential function in the above theorem is a multivariate convex function of $\Lambda$, which in general has a unique minimum because it is the product of (positive) univariate convex functions.
We claim that the MECC, $\hat{c}$, is equivalent to a maximum likelihood estimator (MLE). To verify this claim, note that, given a bivariate sample $(X_{i},Y_{i})$, $i=1,\dots,N$, the average maximized log-likelihood function is $N^{-1}\sum_{i=1}^{N}\log\hat{c}(\hat{u}_{i},\hat{v}_{i};\Lambda^{\ast})$, where $\hat{c}$ is defined in (8) and $(\hat{u}_{i},\hat{v}_{i})$ are the pseudo-observations constructed from $R_{i}$ and $S_{i}$, the ranks of $X_{i}$ and $Y_{i}$ in the sample, respectively. Assuming that $N$ is greater than $n$ and that $n$ is large enough, and in view of (9) with $\tilde{C}$ set equal to the independence copula, this average log-likelihood admits a representation in which the approximation (≈) follows because the Gaussian-smoothed indicator functions converge to the corresponding indicator functions at every evaluation point, and the last equality holds because the measure of association is set equal to its consistent rank estimator, $\hat{\tau}_{N}$. Maximizing this representation over $\Lambda$ is equivalent to minimizing the potential function (10); hence, the claim has been verified.
REMARK 2.1. To compute the MECC, we can use either a Monte-Carlo integration procedure or Gaussian quadratures to approximate the potential function (10) (see Appendix C for further details), and then employ a global optimization technique (for example, the stochastic search algorithm proposed by Csendes [34]) to minimize this function. More generally, we can also approximate the MEC by using a collection of equally-spaced partitions of the unit interval together with a higher-order kernel smoothing of the indicator function. This is stated in Theorem 2.2:
THEOREM 2.2. The MEC, $c^{\ast}$, can be approximated by an approximator of the same exponential form as in Theorem 2.1, with the Gaussian cdf replaced by the integrated kernel corresponding to some kernel function in the space of symmetric, Lebesgue-integrable kernel functions of order $r$ (cf. Definition B.1), and with $\Lambda^{\ast}$ containing the minimizing values of the corresponding potential function. Proof: The proof is very similar to that of Theorem 2.1, combined with Lemma B.1, so we omit the details. ■
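To illustrate the general computational strategy of Remark 2.1 in a deliberately simplified form, the sketch below is our own toy version rather than the estimator of Theorems 2.1-2.2: it builds an exponential-family copula density on a Gauss-Legendre grid, imposes a few Gaussian-smoothed marginal constraints (with end-points and bandwidth chosen arbitrarily) together with one Spearman-type constraint set to a hypothetical rank estimate of 0.35, and recovers the Lagrange multipliers by minimizing the convex dual (potential) function. Because this dual is smooth and convex, a standard quasi-Newton routine is used here in place of a global stochastic search.

```python
import numpy as np
from scipy import stats, optimize

# Gauss-Legendre quadrature grid on the unit square.
nodes, weights = np.polynomial.legendre.leggauss(40)
u1d = 0.5 * (nodes + 1.0)                   # nodes mapped from (-1, 1) to (0, 1)
w1d = 0.5 * weights
U, V = np.meshgrid(u1d, u1d)
U, V = U.ravel(), V.ravel()
W = np.outer(w1d, w1d).ravel()              # quadrature weights for the square

# Constraint functions: Gaussian-smoothed marginal constraints Phi((a_j - u)/h)
# in each coordinate plus one Spearman-type constraint 12uv - 3.
a = np.array([0.2, 0.4, 0.6, 0.8])          # end-points (arbitrary choice)
h = 0.05                                    # smoothing bandwidth (arbitrary choice)
g_cols = [stats.norm.cdf((aj - U) / h) for aj in a]
g_cols += [stats.norm.cdf((aj - V) / h) for aj in a]
g_cols += [12.0 * U * V - 3.0]
G = np.column_stack(g_cols)                 # shape: (n_grid_points, n_constraints)

# Targets: each smoothed marginal constraint must match its value under a
# Uniform[0,1] marginal; the dependence constraint is a hypothetical Spearman rho.
marg_target = stats.norm.cdf((a[:, None] - u1d[None, :]) / h) @ w1d
targets = np.concatenate([marg_target, marg_target, [0.35]])

def potential(lam):
    """Convex dual of the entropy problem: log-normalizer minus lam'targets."""
    s = G @ lam
    m = s.max()
    return m + np.log(np.sum(W * np.exp(s - m))) - lam @ targets

def gradient(lam):
    """Gradient: fitted expectations of the constraint functions minus targets."""
    s = G @ lam
    dens = W * np.exp(s - s.max())
    dens /= dens.sum()
    return dens @ G - targets

res = optimize.minimize(potential, np.zeros(G.shape[1]), jac=gradient, method="BFGS")
lam = res.x

# Fitted MECC-style density on the grid and a check that the constraints hold.
s = G @ lam
c_fit = np.exp(s - s.max())
c_fit /= np.sum(W * c_fit)                  # normalize so the density integrates to one
print("constraint residuals:", np.round((W * c_fit) @ G - targets, 4))
```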
2.3. Large Sample Properties with Unknown Parameters of Dependence
The approximate MECC densities are members of a statistical exponential family parametrized by the Lagrange multipliers. Since the true parameters of dependence, $\Theta$, in (7) are unknown, a random sample of size $N$ is used to form their consistent estimates, $\hat{\Theta}_{N}$. The sampling properties of the fitted Lagrange multipliers may therefore be derived from the sampling properties of $\hat{\Theta}_{N}$. Let $Q_{n}(\Lambda,\Theta)$ represent the approximate potential function with dependence parameters $\Theta$, as formulated in Section 2.2, and let $\Lambda(\Theta)$ and $\Lambda(\hat{\Theta}_{N})$ denote the values of $\Lambda$ minimizing $Q_{n}(\cdot,\Theta)$ and $Q_{n}(\cdot,\hat{\Theta}_{N})$, respectively; the corresponding Hessian matrices of $Q_{n}$ with respect to $\Lambda$ are denoted accordingly. The following assumptions are maintained:
- AS1. $\Theta\in\boldsymbol{\Theta}$, where $\boldsymbol{\Theta}\subset\mathbb{R}^{K}$ is some non-empty compact set and $K$ is the number of dependence constraints. Further, $\Lambda\in\boldsymbol{\Lambda}$, where $\boldsymbol{\Lambda}$ is also a non-empty compact set whose dimension equals the number of Lagrange multipliers in $\Lambda$; the number of marginal constraints is therefore determined by these two dimensions.
- AS2. The map from $\boldsymbol{\Theta}$ to $\boldsymbol{\Lambda}$ is a diffeomorphism (i.e., one-to-one, continuous, and onto in both directions).
- AS3. $Q_{n}(\Lambda,\Theta)$ is a strictly convex function of $\Lambda$ for all $\Theta$ and is uniformly continuous (in probability) in $\Theta$.
- AS4. The vector of dependence parameter estimates is asymptotically normal, i.e., $\sqrt{N}(\hat{\Theta}_{N}-\Theta)\xrightarrow{d}N(0,\Psi)$, where $\Psi$ is the asymptotic variance-covariance matrix of $\hat{\Theta}_{N}$.
AS2 states that the relationship between $\boldsymbol{\Theta}$ and $\boldsymbol{\Lambda}$ is one-to-one in both directions (i.e., for a given set of dependence parameter estimates in $\boldsymbol{\Theta}$ there exists a unique set of Lagrange multipliers in $\boldsymbol{\Lambda}$, which contains a unique subset of the Lagrange multipliers determining the dependence constraints). This assumption ensures that the potential function has unique minimizing values for a given set of parameters; conversely, these minimizing values are uniquely determined by a set of parameters. Regarding AS4, $\hat{\Theta}_{N}$ may be a set of sample moments based on $N$ draws from the kernel densities constructed from the actual data. If all the moments exist and Carleman's condition holds, then the $\hat{\Theta}_{N}$ are consistent and asymptotically normal estimates of $\Theta$ (see, e.g., Härdle, Müller, Sperlich, and Werwatz [35]).
THEOREM 2.3. Under AS1-AS4, the minimizers $\Lambda(\hat{\Theta}_{N})$ are consistent for $\Lambda(\Theta)$, and $\sqrt{N}\{\Lambda(\hat{\Theta}_{N})-\Lambda(\Theta)\}$ is asymptotically normal with an asymptotic variance-covariance matrix obtained from $\Psi$ by the delta method.
If the dependence constraints are linear in their parameters, i.e., each constraint of the form (7) has a right-hand side equal to a component of $\Theta$, we can redefine the potential function associated with the constraints of Problem EM as in (12), in which each dependence constraint enters through its own Lagrange multiplier.
THEOREM 2.4. If (12) satisfies AS1-AS4, then the conclusion of Theorem 2.3 holds with the relevant matrix in the asymptotic variance-covariance expression reducing to a diagonal matrix. Proof: Noting that, under linearity, the cross-derivatives of the redefined potential function with respect to the dependence parameters form a diagonal matrix, the result follows directly from Theorem 2.3. ■
Theorem 2.4 suggests that, in general, the efficiency of the estimators can be improved by using more marginal constraints. However, adding too many marginal constraints can decrease efficiency, since this may increase the probability that some of the covariances entering the Hessian matrix are negative; the Hessian matrix then contains negative elements, which may cause the asymptotic variance of the estimated Lagrange multipliers to increase overall. Theorems 2.3 and 2.4 can be used to develop tests of hypotheses about the "distance" between the MECC and another copula in the exponential family.
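As a schematic illustration of how sampling variability in a dependence estimate propagates to a fitted Lagrange multiplier (the message of Theorems 2.3 and 2.4), the sketch below works with a one-constraint exponential family on [0,1]: the multiplier solves a moment condition, its derivative with respect to the dependence parameter is the inverse of the Hessian of the potential function, and the delta method converts a hypothetical variance of the parameter estimate into an approximate variance of the multiplier. This is a generic textbook construction under simplifying assumptions, not the formula of Theorem 2.3.

```python
import numpy as np
from scipy import optimize

# One-constraint exponential family on [0, 1]: c(u; lam) proportional to exp(lam * g(u)),
# with g(u) = u.  The multiplier lam(theta) solves the moment condition
# E_lam[g(U)] = theta, the analogue of a single dependence constraint.
nodes, weights = np.polynomial.legendre.leggauss(60)
u = 0.5 * (nodes + 1.0)
w = 0.5 * weights
g = u

def fitted_moment_and_hessian(lam):
    dens = w * np.exp(lam * g)
    dens /= dens.sum()
    mean = dens @ g
    var = dens @ (g - mean) ** 2        # Hessian of the dual potential at lam
    return mean, var

def lam_of_theta(theta):
    """Solve E_lam[g(U)] = theta for the Lagrange multiplier lam."""
    return optimize.brentq(lambda l: fitted_moment_and_hessian(l)[0] - theta, -50.0, 50.0)

theta_hat, var_theta_hat = 0.55, 1e-4   # hypothetical estimate and its sampling variance
lam_hat = lam_of_theta(theta_hat)
_, hessian = fitted_moment_and_hessian(lam_hat)

# Delta method: d(lam)/d(theta) = 1/Hessian, so Var(lam_hat) ~ Var(theta_hat)/Hessian^2.
var_lam_hat = var_theta_hat / hessian ** 2
print(f"lam_hat = {lam_hat:.4f}, approximate standard error = {np.sqrt(var_lam_hat):.4f}")
```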
3. Simulation
In this section, we perform simulations to investigate the finite-sample properties of the MECC approximators proposed above. We address three main issues. First, the MECC can outperform the parametric copulas used in this study (the Gaussian copula, Student's t copula, the Clayton copula, and the Gumbel copula), while its performance remains comparable to that of other nonparametric estimators (i.e., the "shrunken" local linear (LLS) kernel copula estimator and the "shrunken" mirror-reflection (MRS) kernel copula estimator proposed by Omelka, Gijbels, and Veraverbeke [36]). Second, an increase in the number of marginal constraints leads to an improvement in the performance of the MECC. Third, the MECC, for the most part, becomes as stable as the other parametric copulas as more marginal constraints are utilized.
To accomplish the above objectives, we choose Frank's copula,
$C(u,v;\theta)=-\dfrac{1}{\theta}\log\!\left[1+\dfrac{(e^{-\theta u}-1)(e^{-\theta v}-1)}{e^{-\theta}-1}\right],\qquad \theta\neq 0,$
as the true model from which samples are generated (see [37,38] for the statistical properties of Frank's copula). This copula is radially symmetric and approaches the independence copula as $\theta$ approaches the origin, i.e., $C(u,v;\theta)\to uv$ as $\theta\to 0$. Below, we use two values for the true parameter $\theta$; roughly speaking, these correspond to the close-to-independence case and the weak-dependence case, respectively.
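For readers wishing to replicate this design, samples from Frank's copula can be drawn by conditional inversion: generate $U$ uniformly, then invert the conditional distribution $C_{2|1}(v\,|\,u)=\partial C(u,v)/\partial u$ in $v$. The sketch below is our own implementation using a numerical root-finder, with an illustrative value $\theta=3$ (the study's actual $\theta$ values are not reproduced here).

```python
import numpy as np
from scipy import optimize, stats

def frank_conditional_cdf(v, u, theta):
    """Conditional distribution C_{2|1}(v | u) = dC(u, v)/du for Frank's copula."""
    num = np.exp(-theta * u) * (np.exp(-theta * v) - 1.0)
    den = (np.exp(-theta) - 1.0) + (np.exp(-theta * u) - 1.0) * (np.exp(-theta * v) - 1.0)
    return num / den

def sample_frank(n, theta, rng):
    """Draw n pairs (U, V) from Frank's copula by conditional inversion."""
    u = rng.uniform(size=n)
    p = rng.uniform(size=n)
    v = np.array([
        optimize.brentq(lambda t: frank_conditional_cdf(t, ui, theta) - pi, 1e-10, 1.0 - 1e-10)
        for ui, pi in zip(u, p)
    ])
    return u, v

rng = np.random.default_rng(42)
u, v = sample_frank(5000, theta=3.0, rng=rng)     # theta = 3 is illustrative only

# Sanity check: the sample Spearman rho should be clearly positive for theta > 0.
print("sample Spearman rho:", round(stats.spearmanr(u, v)[0], 3))
```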
The simulation procedure is outlined as follows. First, we generate 100 samples of 5000 observations from Frank's copula for each value of $\theta$. With these samples in hand, we estimate the four commonly used parametric copulas mentioned above by the MLE method. We also estimate 12 MECCs (corresponding to combinations of the number of marginal constraints and the number of joint moment constraints) by using our proposed method. To gauge the errors of these estimators, we use the integrated mean squared error (IMSE),
$\mathrm{IMSE}=E\int_{0}^{1}\!\int_{0}^{1}\{\hat{c}(u,v)-c_{F}(u,v)\}^{2}\,du\,dv,$
where $c_{F}$ is the density of Frank's copula and $\hat{c}$ represents an estimate based on one of the above-mentioned parametric copulas or a MECC. Next, for each copula, we use the 100 samples of 5000 observations drawn from Frank's copula to estimate the squared bias and the variance (as functions of $u$ and $v$). Both the integrated squared bias (Int. Bias²) and the integrated variance (Int. Var.) are then obtained by evaluating the estimated squared bias, $\{\bar{c}(u,v)-c_{F}(u,v)\}^{2}$ (where $\bar{c}$ denotes the empirical mean of the estimates calculated over the 100 samples), and the estimated variance of the estimates at 10000 pseudo-random Uniform[0,1] points, and then taking their respective averages over those points. To gauge the errors of the nonparametric copula estimators, we use the expressions for the asymptotic bias and variance given in [36,39]; the optimal bandwidth is obtained by minimizing the integrated asymptotic MSE [39]. We report our simulation results in Table 2.
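The error decomposition described above is straightforward to compute once the replicated density estimates are available. The sketch below is a self-contained illustration with placeholder "estimates" (the true Frank density contaminated with artificial noise) standing in for the actual copula estimators; it evaluates the pointwise squared bias and variance at 10000 pseudo-random Uniform[0,1] points and averages them, their sum approximating the IMSE.

```python
import numpy as np

def frank_density(u, v, theta):
    """Density of Frank's copula (theta != 0)."""
    d = (1 - np.exp(-theta)) - (1 - np.exp(-theta * u)) * (1 - np.exp(-theta * v))
    return theta * (1 - np.exp(-theta)) * np.exp(-theta * (u + v)) / d**2

def error_decomposition(estimates, true_vals):
    """estimates: (n_replications, n_points) fitted densities at the evaluation
    points; true_vals: (n_points,) true density at the same points."""
    mean_fit = estimates.mean(axis=0)                  # pointwise empirical mean
    int_bias2 = np.mean((mean_fit - true_vals) ** 2)   # integrated squared bias
    int_var = np.mean(estimates.var(axis=0))           # integrated variance
    return int_bias2, int_var, int_bias2 + int_var     # IMSE ~ bias^2 + variance

theta = 3.0                                            # illustrative value only
rng = np.random.default_rng(7)
pts = rng.uniform(size=(10000, 2))                     # 10000 Uniform[0,1]^2 points
true_vals = frank_density(pts[:, 0], pts[:, 1], theta)

# Placeholder "estimates": the true density plus noise, standing in for 100
# replications of an actual copula estimator evaluated at the same points.
estimates = true_vals + 0.05 * rng.standard_normal((100, len(true_vals)))

int_bias2, int_var, imse = error_decomposition(estimates, true_vals)
print(f"Int. Bias^2 = {int_bias2:.5f}, Int. Var. = {int_var:.5f}, IMSE = {imse:.5f}")
```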
First, it can be seen from Table 2 that the MECCs significantly outperform the elliptical copulas (i.e., the Normal copula and Student's t copula) in terms of Int. Bias² and IMSE. However, with a small number of marginal constraints the MECCs are mostly less stable than the other parametric copulas; the only way to improve the stability (Int. Var.) of the MECCs is to increase the number of marginal constraints. For the close-to-independence case, the asymmetric copulas (i.e., the Clayton copula and the Gumbel copula) outperform the MECCs. The intuition for these asymmetric copulas having small Int. Bias² and Int. Var. is that Frank's copula, the Clayton copula, and the Gumbel copula all behave like the independence copula when the dependence parameter is close to its independence value. It is also interesting to note that the MECCs often outperform the LLS and MRS estimators in terms of Int. Bias², whilst these nonparametric estimators outperform the MECCs in terms of Int. Var. The reason for the non-zero Int. Bias² of the LLS and MRS estimators is that the optimal bandwidth (being shrunk close to zero at the corners of the unit square) keeps the bias bounded but does not remove it completely.
Second, for the larger value of θ the data become less independent, leading to a significant increase in Int. Bias² when the Clayton and Gumbel copulas are estimated from samples drawn from Frank's copula. In this case, MECC(4,1), MECC(16,1), MECC(64,1), MECC(4,2), MECC(64,2), and MECC(64,3) all show significant improvements in Int. Bias² over all the other estimators. It is also important to note that, for a fixed number of marginal constraints, Int. Bias² and IMSE tend to deteriorate as one increases the number of joint moment constraints. To ameliorate this, it suffices to increase the number of marginal constraints as each additional joint moment constraint is added to the MEC problem. Indeed, as shown in Table 2, with one joint moment constraint only four marginal constraints are needed to yield MECC(4,1) with minimum Int. Bias² and IMSE, whereas with two joint moment constraints up to 64 marginal constraints are needed to yield MECC(64,2) with minimum Int. Bias², Int. Var., and IMSE. Our final observation is that, for a fixed number of moment constraints, an increase in the number of marginal constraints always leads to a significant reduction in Int. Var. Finally, to check the general validity of the simulation results, we also replicate the above simulation study using data generated from Clayton copulas.
Table 3 shows that the good performance of the MECCs relative to the other copula estimators carries over to this case when a sufficient number of marginal constraints is used.