1. Introduction
There is a substantial literature on the estimation of, and inference on, relative entropy measures of joint dependence used as measures of serial correlation. These measures of dependence were first proposed by Joe [1] and extended by Granger and Lin [2]. Relative entropy based measures of dependence have received considerable interest in econometrics because they provide a very general framework for gauging joint dependence, and they can be applied to a set of variables that is a mixture of continuous, ordinal-categorical, and nominal-categorical variables. Interested readers are referred to [3,4,5] for a concise review of important contributions in this area.
Econometricians have recently become interested in the computation of maximum entropy densities (see, e.g., Golan [6], Usta and Kantar [7], and the references therein for background and discussion of maximum entropy (ME) densities). ME densities are derived by maximizing an information criterion (the level of uncertainty) subject to mass- and mean-preserving constraints. The justification for using the ME principle in this context can be found in [8]. Rockinger and Jondeau [9] apply the ME method to determine the ME return distribution, which is then used to extend Bollerslev's GARCH model to allow for autoregressive conditional skewness and kurtosis. Maasoumi and Racine [10] employ a metric entropy measure of dependence to examine the predictability of asset returns. Hang [11] uses the ME method to determine flexible functional forms of regression functions subject to side conditions. Miller and Liu [12] propose a method to recover a joint distribution function by applying the Kullback-Leibler cross entropy (KLCE) distance while imposing a required degree of dependence through the joint moments. An example is the normal distribution, which is completely characterized by its first and second moments; in this case, the minimum KLCE distribution is the multivariate normal distribution, in which the dependence is specified through conventional linear correlation.
There has been a great deal of interest in copulas, especially in financial economics, because they make it possible to model and explain asymmetric dependence between random variables separately from their marginal distributions. For example, Patton [13] employs various families of copulas to investigate the inter-relationship between univariate skewness, asymmetric dependence between asset returns, and optimal asset portfolios. Rodriguez [14] models financial contagion using copulas. Chollete, Heinen, and Valdesogo [15] propose a multivariate regime-switching copula to capture asymmetric dependence and regime switching in portfolio selection. Ning, Xu, and Wirjanto [16] investigate asymmetric patterns in volatility clustering by employing a semi-parametric copula approach. Detailed discussions of various econometric aspects and applications of copulas in economics and finance can be found, for instance, in the survey papers by Patton [17] and Fan and Patton [18]. A comprehensive treatment of copula theory is presented in the monograph by Nelsen [19].
Given the broad context described above, we propose a theoretical framework to recover relative entropy measures of joint dependence from limited information by constructing a set of most entropic copulas (MECs). This is done, in essence, by maximizing Shannon entropy subject to constraints that enforce Uniform[0,1] marginal distributions and further constraints on copula-based measures of dependence (or on the distance between the MEC and an arbitrary nested copula). Within the class of MECs there exists a simplified form, namely the most entropic canonical copula (MECC). Moreover, it can be shown that the proposed MEC approach and the KLCE approach of Miller and Liu [12] are dual in the sense that they recover the same joint distribution. Applications of MECs in economics include Chu [20], Dempster, Medova, and Yang [21], Friedman and Huang [22], Veremyev, Tsyurmasto, Uryasev, and Rockafellar [23], and Zhao and Lin [24].
We shall now discuss the contributions of the current paper in relation to [20]. The similarity between the two papers is that rank correlations are employed as prior information about dependence in order to construct the MECC. The present paper differs from [20] in several respects. First, in [20], Carleman's condition permits moment constraints to be employed so as to ensure that the MEC satisfies all the properties of a copula, whereas in the present paper constraints are imposed explicitly on the marginal copula densities; the entropy maximization problem defined in [20] is therefore merely a good approximation to the entropy maximization problem studied here. Second, the main problem in [20] is a standard entropy maximization problem, while the main problem in the present paper involves a continuum of constraints on the marginal distributions, which can be written as integrals with varying end-points that need to be smoothed out by using kernels. This kernel smoother can generate MECs with smooth densities, whereas the discrete approximation technique proposed by [21] only allows for MECs with discrete densities. The feasibility and benefits of the proposed approach to constructing MECs are then demonstrated through a Monte-Carlo simulation study presented in Section 3.
Although our analysis is restricted to the bivariate case, the multivariate case is a straightforward extension. The remainder of the paper is organized in three sections. In Section 2, we formulate and approximate most entropic copulas (MECs). Next, we discuss the link between the MEC and the minimum KLCE density and the extent to which the MEC approach is more flexible than the KLCE method. We then compute the MEC and the MECC subject to marginal constraints and further constraints on various copula-based dependence measures such as Spearman's rho and tau. We also outline the large-sample properties of the relevant parameter estimators. These results are presented in Theorems 2.1-2.4. A simulation study is presented in Section 3, demonstrating that the MEC fits data well when compared with other competing procedures (e.g., parametric copulas and kernel estimators). Derivation of the statistical properties of the proposed copula estimator is rather challenging and is left for future research. Finally, to facilitate reading of the paper, we collect all material of a technical flavour in the three main appendices at the end of the paper.
2. Recovering the Most Entropic Copulas
2.1. Maximum Entropy and Copula
This section provides a brief explanation of entropy and copulas. We refer to [25] for a comprehensive review of entropy econometrics and to [19] for important results concerning copulas.
Shannon entropy has been used as an information criterion to construct probability densities for economic or financial variables such as stock returns, income, and GDP (see, inter alia, [26,27,28]). A univariate ME density is generally obtained by maximizing Shannon entropy, $H(f)=-\int f(x)\log f(x)\,dx$, with respect to the density $f$ under probability and moment constraints. A bivariate ME density that is closest to a given reference density, say the product of two univariate densities, can be obtained by minimizing the KLCE under joint moment constraints (see, e.g., [1] and [12]):
$\min_{f}\;\int\!\!\int f(x,y)\log\dfrac{f(x,y)}{f_{1}(x)f_{2}(y)}\,dx\,dy$ (1)
subject to
$\int\!\!\int f(x,y)\,dx\,dy=1 \quad\text{and}\quad \int\!\!\int h(x,y)\,f(x,y)\,dx\,dy=\eta,$
where $f$ is a bivariate density, $f_{1}$ and $f_{2}$ are some univariate densities, and $h$ is an arbitrary function such that the joint moment $\eta=E[h(X,Y)]$ exists.
The copula was proposed by Sklar [29] as a method for constructing joint distributions with given marginals. The advantage of copulas is that the dependence between random variables can be parametrically specified entirely separately from their marginals. A bivariate copula is defined as a function $C$ from $[0,1]^{2}$ to $[0,1]$ with the following properties: (1) for every $u,v\in[0,1]$ it holds that $C(u,0)=C(0,v)=0$, $C(u,1)=u$ and $C(1,v)=v$; and (2) $C$ is 2-increasing, i.e., $C(u_{2},v_{2})-C(u_{2},v_{1})-C(u_{1},v_{2})+C(u_{1},v_{1})\ge 0$ for every $u_{1},u_{2},v_{1},v_{2}\in[0,1]$ such that $u_{1}\le u_{2}$ and $v_{1}\le v_{2}$ (see, e.g., [19], p. 8). Note that Property (2) always holds if $C$ has a positive density $c$, and Property (1) implies that a copula is a function with Uniform[0,1] marginals. Sklar's theorem links a copula, $C$, to a joint distribution, $F$, via $F(x,y)=C(F_{1}(x),F_{2}(y))$, where $F_{1}$ and $F_{2}$ are the marginals.
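For concreteness, the following short sketch (our own illustration, not part of the original analysis; it uses Frank's copula with an arbitrary value $\theta=3$, a copula that reappears as the data-generating model in Section 3) checks Properties (1) and (2) numerically at random test points.

```python
import numpy as np

def frank_copula(u, v, theta=3.0):
    """Frank copula C(u, v; theta) for theta != 0."""
    num = (np.exp(-theta * u) - 1.0) * (np.exp(-theta * v) - 1.0)
    return -np.log1p(num / (np.exp(-theta) - 1.0)) / theta

rng = np.random.default_rng(0)
u, v = rng.uniform(size=1000), rng.uniform(size=1000)

# Property (1): uniform margins, C(u, 1) = u, C(1, v) = v, and C(u, 0) = C(0, v) = 0.
assert np.allclose(frank_copula(u, np.ones_like(u)), u)
assert np.allclose(frank_copula(np.ones_like(v), v), v)
assert np.allclose(frank_copula(u, np.zeros_like(u)), 0.0)

# Property (2): 2-increasing, i.e. the C-volume of every rectangle is non-negative.
u1, u2 = np.sort(rng.uniform(size=(2, 1000)), axis=0)
v1, v2 = np.sort(rng.uniform(size=(2, 1000)), axis=0)
volume = (frank_copula(u2, v2) - frank_copula(u2, v1)
          - frank_copula(u1, v2) + frank_copula(u1, v1))
assert (volume >= -1e-12).all()
print("Frank copula satisfies both defining properties at the test points.")
```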
We shall use measures of association and rank correlations to construct the MEC, which we discuss next. Measures of association are, unlike joint moments, invariant under strictly increasing (possibly nonlinear) transformations of the underlying random variables, and thus they are natural measures of dependence for non-elliptical random variables (see Appendix A for formal definitions of measures of association). A measure of association is, in general, defined as $\tau=\int_{0}^{1}\!\int_{0}^{1}h(u,v)\,dC(u,v)$, where $h$ is a bivariate function such that this integral exists. Such a measure, being based on $C$, is also referred to as a copula-based measure of dependence. In practice, $\tau$ can be estimated by the rank statistic $\hat{\tau}_{N}$, obtained by evaluating $h$ at the rescaled ranks and averaging over the sample, where $(R_{i},S_{i})$ denote the ranks of $(X_{i},Y_{i})$ in a sample of size $N$. An advantage of using rank statistics as nonparametric measures of nonlinear dependence is that they are robust, in the sense that they are insensitive to contamination and maintain high efficiency for heavier-tailed elliptical distributions as well as for multivariate normal distributions (see, e.g., [30] for a detailed treatment of rank statistics). Examples of such measures include Spearman's rho and Blest's rank correlations (see, e.g., [31]), which are summarized in Table 1.
Nonetheless, it is worth mentioning that this definition of $\tau$ is somewhat restrictive since it does not include Kendall's tau, for example.^1 Moreover, not every rank correlation can be formulated in terms of the above general rank statistic $\hat{\tau}_{N}$. For instance, the statistic proposed by Gideon and Hollister [32] as a coefficient of rank correlation resistant to outliers, even in small samples, is defined directly through the ordered ranks and the greatest-integer function rather than as a sample average of a fixed function $h$; it nevertheless estimates a copula-based measure of dependence.
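To make the rank-statistic construction concrete, the following sketch (our own illustration) computes the general rank statistic with $h(u,v)=12uv-3$, the choice associated with Spearman's rho; the scaling of the ranks by $N+1$ is a conventional assumption here rather than necessarily the paper's exact normalization.

```python
import numpy as np
from scipy import stats

def rank_statistic(x, y, h):
    """General rank statistic: average of h evaluated at the rescaled ranks."""
    n = len(x)
    r = stats.rankdata(x) / (n + 1.0)      # pseudo-observations for x
    s = stats.rankdata(y) / (n + 1.0)      # pseudo-observations for y
    return np.mean(h(r, s))

# Spearman's rho corresponds to the choice h(u, v) = 12*u*v - 3.
spearman_h = lambda u, v: 12.0 * u * v - 3.0

rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
y = 0.5 * x + rng.standard_normal(2000)    # a dependent bivariate sample

rho_rank = rank_statistic(x, y, spearman_h)
rho_scipy, _ = stats.spearmanr(x, y)
print(f"rank statistic: {rho_rank:.4f}   scipy spearmanr: {rho_scipy:.4f}")
```

For a sample of this size the two numbers agree closely, which is all the general rank statistic is meant to deliver.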
In the present paper, we use the bivariate Shannon entropy of a copula, given by
$H(c)=-\int_{0}^{1}\!\int_{0}^{1}c(u,v)\log c(u,v)\,du\,dv.$ (2)
By Sklar's theorem, the Shannon entropy of a copula is equivalent (up to sign) to the KLCE:
$\int\!\!\int f(x,y)\log\dfrac{f(x,y)}{f_{1}(x)f_{2}(y)}\,dx\,dy=-H(c).$
Hence, minimization of the KLCE and maximization of the bivariate Shannon entropy are dual problems. Let $c^{\ast}$ denote the density of the MEC. Then, in view of [1], the relative entropy measure of dependence (recovered from limited information) is given by a monotone transformation of $-H(c^{\ast})$. Generally speaking, a multivariate Shannon entropy can be defined in an obvious way, and this dual relationship still holds. However, as pointed out in Friedman and Huang [22], the problem of maximizing a multivariate Shannon entropy of copulas can suffer from the curse of dimensionality, because the number of constraints (on the marginal densities) needed for the MEC to satisfy all the properties of a copula increases as the problem involves more dimensions.
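This duality is easy to check numerically. The sketch below (again our own illustration, using Frank's copula density and standard normal marginals as an arbitrary example) computes the copula entropy $H(c)$ by quadrature on the unit square and the KLCE of the joint density from the product of its marginals in the original data space, and confirms that the two quantities differ only in sign.

```python
import numpy as np
from scipy import stats

THETA = 5.0

def frank_density(u, v, theta=THETA):
    """Density of Frank's copula (theta != 0)."""
    d = (1 - np.exp(-theta)) - (1 - np.exp(-theta * u)) * (1 - np.exp(-theta * v))
    return theta * (1 - np.exp(-theta)) * np.exp(-theta * (u + v)) / d**2

# Copula entropy H(c) = -\int\int c log c du dv by Gauss-Legendre quadrature on (0, 1)^2.
nodes, weights = np.polynomial.legendre.leggauss(80)
u = 0.5 * (nodes + 1.0)
w = 0.5 * weights
U, V = np.meshgrid(u, u)
W = np.outer(w, w)
c = frank_density(U, V)
H_c = -np.sum(W * c * np.log(c))

# KLCE of the joint density f(x, y) = c(F1(x), F2(y)) f1(x) f2(y) from f1 * f2,
# computed in the original (x, y) space with standard normal marginals.
x = 8.0 * nodes                      # quadrature nodes mapped to (-8, 8)
wx = 8.0 * weights
X, Y = np.meshgrid(x, x)
WX = np.outer(wx, wx)
cxy = frank_density(stats.norm.cdf(X), stats.norm.cdf(Y))
f = cxy * stats.norm.pdf(X) * stats.norm.pdf(Y)
klce = np.sum(WX * f * np.log(cxy))  # log(f / (f1 f2)) = log c by Sklar's theorem

print(f"copula entropy H(c) = {H_c:+.6f}")
print(f"KLCE from f1*f2     = {klce:+.6f}   (approximately equal to -H(c))")
```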
2.2. The Most Entropic Copula
We assume for the rest of this paper that the MEC is a differentiable function, so that its copula density exists. The bivariate MEC (or simply the MEC) is obtained by maximizing the bivariate Shannon entropy (2) under the following two constraints: (1) the marginals of the MEC are Uniform[0,1]; and (2) the measures of association defined in Section 2.1 are set equal to the corresponding rank correlations. We call this Problem EM:
$\max_{c}\; -\int_{0}^{1}\!\int_{0}^{1}c(u,v)\log c(u,v)\,du\,dv$ (3)
subject to
$\int_{0}^{1}\!\int_{0}^{1}c(u,v)\,du\,dv=1,$ (4)
$\int_{0}^{1}\!\int_{0}^{1}\mathbf{1}(s\le u)\,c(s,t)\,ds\,dt=u \quad\text{for all } u\in[0,1],$ (5)
$\int_{0}^{1}\!\int_{0}^{1}\mathbf{1}(t\le v)\,c(s,t)\,ds\,dt=v \quad\text{for all } v\in[0,1],$ (6)
$\int_{0}^{1}\!\int_{0}^{1}h(u,v)\,c(u,v)\,du\,dv=\hat{\tau}_{N},$ (7)
where (4) implies that $c$ is a joint density on the unit square; Equations (5) and (6) imply that the marginals of $c$ are Uniform[0,1] distributions; and Equation (7) imposes a constraint on the joint behavior of $U$ and $V$. To give an example, if $h(u,v)=12uv-3$, then the left-hand side of (7) becomes Spearman's rho and $\hat{\tau}_{N}$ (note that, in what follows, we sometimes omit the subscript $N$ for brevity) is the rank correlation associated with Spearman's rho. To give another example, suppose that the true data-generating copula belongs to a known family. Given this prior information, to recover a MECC from the data one may randomly choose a copula from that family and use it to construct (7), with the right-hand side set to an estimate of the difference between the probabilities of concordance and discordance (cf. Appendix A). By doing this, it is expected that some features of the family can be effectively incorporated into the MECC. Other examples of Equation (7) include Blest's coefficients and Gideon and Hollister's (1987) coefficient. Note also that we may impose more than one constraint of the form (7). It should be stressed at this point that some versions of the MEC problem may exhibit boundary solutions because of theoretical restrictions on the measures of dependence employed (e.g., the Hoeffding-Fréchet bounds on correlation statistics). Consequently, the large-sample theory stated in Section 2.3 below holds only for interior solutions of the stated problem.^2
For future reference, we shall denote by $c^{\ast}(u,v;\Lambda^{\ast})$, where $\Lambda^{\ast}$ is a vector of coefficients, the MEC that solves Problem EM. The MECs (and accordingly the MECC) can then be approximated by replacing the continua of varying end-points in (5) and (6) by finite sets of definite integrals. We present an approximate solution to Problem EM in Theorem 2.1 below.
THEOREM 2.1. The MEC, $c^{\ast}$, can be approximated by an approximator, $\hat{c}(u,v;\Lambda^{\ast})$, of the exponential form given in (8), with the terms entering its exponent defined in (9), where $\Lambda^{\ast}$ contains the minimizing values of the potential function in (10). Here $\Phi(\cdot)$ denotes the standard normal cdf (arising from smoothing the indicator functions in the marginal constraints (5) and (6) with the Gaussian kernel), and $\tilde{C}$ is an arbitrary copula (which may involve a nuisance parameter that needs to be estimated). In particular, the MEC, $\hat{c}$, can be symmetrized by letting $\tilde{C}(u,v)$ be equal to $\tilde{C}(v,u)$ and taking $h$ to be a symmetric function.
Proof: The proof utilizes the standard method of variational calculus for the maximization of functionals on normed linear spaces (see, e.g., [33], p. 129). See Appendix D. ■
As can be seen, the MEC density nests an arbitrary copula, $\tilde{C}$ (cf. Equation (9)). Indeed, the MEC depends on the (arbitrary) choice of $\tilde{C}$ and its nuisance parameter, if any, and thus is not unique. However, we can obtain a canonical form, which we call the MECC, by setting the term associated with $\tilde{C}$ to zero. This idea of a canonical model can be traced back to Jeffreys,^3 who proposed the principle of simplicity for deductive inference: for any given set of data there is usually an infinite number of possible laws that will "explain" the data precisely, and the simplest model should be chosen.
It is also worth noting at this point that, like the empirical copula, the MECC is a valid distribution function; however, it satisfies the Uniform[0,1] marginal constraints only asymptotically. In addition, the potential function in the above theorem is a multivariate convex function of $\Lambda$, which in general has a unique minimum because it is the product of (positive) univariate convex functions.
We claim that the MECC, $\hat{c}$, is equivalent to a maximum likelihood estimator (MLE). To verify this claim, note that, given a bivariate sample $(X_{i},Y_{i})$, $i=1,\dots,N$, the average maximized log-likelihood function is $N^{-1}\sum_{i=1}^{N}\log\hat{c}(\hat{u}_{i},\hat{v}_{i};\Lambda^{\ast})$, where $\hat{c}$ is defined in (8) and $(\hat{u}_{i},\hat{v}_{i})$ are the pseudo-observations constructed from $R_{i}$ and $S_{i}$, the ranks of $X_{i}$ and $Y_{i}$ in the sample, respectively. Assuming that $N$ is greater than $n$ and that $n$ is large enough, and in view of (9) with $\tilde{C}$ set equal to the independence copula, this average log-likelihood admits a representation in which the approximation (≈) follows because the Gaussian-smoothed indicator functions converge to the corresponding indicator functions at every evaluation point, and the last equality holds because the measure of association is set equal to its consistent rank estimator, $\hat{\tau}_{N}$. Maximizing this representation over $\Lambda$ is equivalent to minimizing the potential function (10); hence, the claim has been verified.
REMARK 2.1. To compute the MECC, we can use either a Monte-Carlo integration procedure or Gaussian quadratures to approximate the potential function (10) (see Appendix C for further details), and then employ a global optimization technique (for example, the stochastic search algorithm proposed by Csendes [34]) to minimize this function. More generally, we can also approximate the MEC by using a collection of equally-spaced partitions of the unit interval together with a higher-order kernel smoothing of the indicator function. This is stated in Theorem 2.2:
THEOREM 2.2. The MEC, $c^{\ast}$, can be approximated by an approximator of the same exponential form as in Theorem 2.1, with the Gaussian cdf replaced by the integrated kernel corresponding to some kernel function in the space of symmetric, Lebesgue-integrable kernel functions of order $r$ (cf. Definition B.1), and with $\Lambda^{\ast}$ containing the minimizing values of the corresponding potential function. Proof: The proof is very similar to that of Theorem 2.1, combined with Lemma B.1, so we omit the details. ■
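To illustrate the general computational strategy of Remark 2.1 in a deliberately simplified form, the sketch below is our own toy version rather than the estimator of Theorems 2.1-2.2: it builds an exponential-family copula density on a Gauss-Legendre grid, imposes a few Gaussian-smoothed marginal constraints (with end-points and bandwidth chosen arbitrarily) together with one Spearman-type constraint set to a hypothetical rank estimate of 0.35, and recovers the Lagrange multipliers by minimizing the convex dual (potential) function. Because this dual is smooth and convex, a standard quasi-Newton routine is used here in place of a global stochastic search.

```python
import numpy as np
from scipy import stats, optimize

# Gauss-Legendre quadrature grid on the unit square.
nodes, weights = np.polynomial.legendre.leggauss(40)
u1d = 0.5 * (nodes + 1.0)                   # nodes mapped from (-1, 1) to (0, 1)
w1d = 0.5 * weights
U, V = np.meshgrid(u1d, u1d)
U, V = U.ravel(), V.ravel()
W = np.outer(w1d, w1d).ravel()              # quadrature weights for the square

# Constraint functions: Gaussian-smoothed marginal constraints Phi((a_j - u)/h)
# in each coordinate plus one Spearman-type constraint 12uv - 3.
a = np.array([0.2, 0.4, 0.6, 0.8])          # end-points (arbitrary choice)
h = 0.05                                    # smoothing bandwidth (arbitrary choice)
g_cols = [stats.norm.cdf((aj - U) / h) for aj in a]
g_cols += [stats.norm.cdf((aj - V) / h) for aj in a]
g_cols += [12.0 * U * V - 3.0]
G = np.column_stack(g_cols)                 # shape: (n_grid_points, n_constraints)

# Targets: each smoothed marginal constraint must match its value under a
# Uniform[0,1] marginal; the dependence constraint is a hypothetical Spearman rho.
marg_target = stats.norm.cdf((a[:, None] - u1d[None, :]) / h) @ w1d
targets = np.concatenate([marg_target, marg_target, [0.35]])

def potential(lam):
    """Convex dual of the entropy problem: log-normalizer minus lam'targets."""
    s = G @ lam
    m = s.max()
    return m + np.log(np.sum(W * np.exp(s - m))) - lam @ targets

def gradient(lam):
    """Gradient: fitted expectations of the constraint functions minus targets."""
    s = G @ lam
    dens = W * np.exp(s - s.max())
    dens /= dens.sum()
    return dens @ G - targets

res = optimize.minimize(potential, np.zeros(G.shape[1]), jac=gradient, method="BFGS")
lam = res.x

# Fitted MECC-style density on the grid and a check that the constraints hold.
s = G @ lam
c_fit = np.exp(s - s.max())
c_fit /= np.sum(W * c_fit)                  # normalize so the density integrates to one
print("constraint residuals:", np.round((W * c_fit) @ G - targets, 4))
```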
2.3. Large Sample Properties with Unknown Parameters of Dependence
The approximate MECC densities are members of a statistical exponential family parametrized by the Lagrange multipliers. Since the true parameters of dependence, $\Theta$, in (7) are unknown, a random sample of size $N$ is used to form their consistent estimates, $\hat{\Theta}_{N}$. The sampling properties of the fitted Lagrange multipliers may therefore be derived from the sampling properties of $\hat{\Theta}_{N}$. Let $Q_{n}(\Lambda,\Theta)$ represent the approximate potential function with dependence parameters $\Theta$, as formulated in Section 2.2, and let $\Lambda(\Theta)$ and $\Lambda(\hat{\Theta}_{N})$ denote the values of $\Lambda$ minimizing $Q_{n}(\cdot,\Theta)$ and $Q_{n}(\cdot,\hat{\Theta}_{N})$, respectively; the corresponding Hessian matrices of $Q_{n}$ with respect to $\Lambda$ are denoted accordingly. The following assumptions are maintained:
- AS1. $\Theta\in\boldsymbol{\Theta}$, where $\boldsymbol{\Theta}\subset\mathbb{R}^{K}$ is some non-empty compact set and $K$ is the number of dependence constraints. Further, $\Lambda\in\boldsymbol{\Lambda}$, where $\boldsymbol{\Lambda}$ is also a non-empty compact set whose dimension equals the number of Lagrange multipliers in $\Lambda$; the number of marginal constraints is therefore determined by these two dimensions.
- AS2. The map from $\boldsymbol{\Theta}$ to $\boldsymbol{\Lambda}$ is a diffeomorphism (i.e., one-to-one, continuous, and onto in both directions).
- AS3. $Q_{n}(\Lambda,\Theta)$ is a strictly convex function of $\Lambda$ for all $\Theta$ and is uniformly continuous (in probability) in $\Theta$.
- AS4. The vector of dependence parameter estimates is asymptotically normal, i.e., $\sqrt{N}(\hat{\Theta}_{N}-\Theta)\xrightarrow{d}N(0,\Psi)$, where $\Psi$ is the asymptotic variance-covariance matrix of $\hat{\Theta}_{N}$.
AS2 states that the relationship between $\boldsymbol{\Theta}$ and $\boldsymbol{\Lambda}$ is one-to-one in both directions (i.e., for a given set of dependence parameter estimates in $\boldsymbol{\Theta}$ there exists a unique set of Lagrange multipliers in $\boldsymbol{\Lambda}$, which contains a unique subset of the Lagrange multipliers determining the dependence constraints). This assumption ensures that the potential function has unique minimizing values for a given set of parameters; conversely, these minimizing values are uniquely determined by a set of parameters. Regarding AS4, $\hat{\Theta}_{N}$ may be a set of sample moments based on $N$ draws from the kernel densities constructed from the actual data. If all the moments exist and Carleman's condition holds, then the $\hat{\Theta}_{N}$ are consistent and asymptotically normal estimates of $\Theta$ (see, e.g., Härdle, Müller, Sperlich, and Werwatz [35]).
THEOREM 2.3. Under AS1-AS4, the minimizers $\Lambda(\hat{\Theta}_{N})$ are consistent for $\Lambda(\Theta)$, and $\sqrt{N}\{\Lambda(\hat{\Theta}_{N})-\Lambda(\Theta)\}$ is asymptotically normal with an asymptotic variance-covariance matrix obtained from $\Psi$ by the delta method.
If the dependence constraints are linear in their parameters, i.e., each constraint of the form (7) has a right-hand side equal to a component of $\Theta$, we can redefine the potential function associated with the constraints of Problem EM as in (12), in which each dependence constraint enters through its own Lagrange multiplier.
THEOREM 2.4. If (12) satisfies AS1-AS4, then the conclusion of Theorem 2.3 holds with the relevant matrix in the asymptotic variance-covariance expression reducing to a diagonal matrix. Proof: Noting that, under linearity, the cross-derivatives of the redefined potential function with respect to the dependence parameters form a diagonal matrix, the result follows directly from Theorem 2.3. ■
Theorem 2.4 suggests that, in general, the efficiency of the estimators can be improved by using more marginal constraints. However, adding too many marginal constraints can decrease efficiency, since this may increase the probability that some of the covariances entering the Hessian matrix are negative; the Hessian matrix then contains negative elements, which may cause the asymptotic variance of the estimated Lagrange multipliers to increase overall. Theorems 2.3 and 2.4 can be used to develop tests of hypotheses about the "distance" between the MECC and another copula in the exponential family.
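As a schematic illustration of how sampling variability in a dependence estimate propagates to a fitted Lagrange multiplier (the message of Theorems 2.3 and 2.4), the sketch below works with a one-constraint exponential family on [0,1]: the multiplier solves a moment condition, its derivative with respect to the dependence parameter is the inverse of the Hessian of the potential function, and the delta method converts a hypothetical variance of the parameter estimate into an approximate variance of the multiplier. This is a generic textbook construction under simplifying assumptions, not the formula of Theorem 2.3.

```python
import numpy as np
from scipy import optimize

# One-constraint exponential family on [0, 1]: c(u; lam) proportional to exp(lam * g(u)),
# with g(u) = u.  The multiplier lam(theta) solves the moment condition
# E_lam[g(U)] = theta, the analogue of a single dependence constraint.
nodes, weights = np.polynomial.legendre.leggauss(60)
u = 0.5 * (nodes + 1.0)
w = 0.5 * weights
g = u

def fitted_moment_and_hessian(lam):
    dens = w * np.exp(lam * g)
    dens /= dens.sum()
    mean = dens @ g
    var = dens @ (g - mean) ** 2        # Hessian of the dual potential at lam
    return mean, var

def lam_of_theta(theta):
    """Solve E_lam[g(U)] = theta for the Lagrange multiplier lam."""
    return optimize.brentq(lambda l: fitted_moment_and_hessian(l)[0] - theta, -50.0, 50.0)

theta_hat, var_theta_hat = 0.55, 1e-4   # hypothetical estimate and its sampling variance
lam_hat = lam_of_theta(theta_hat)
_, hessian = fitted_moment_and_hessian(lam_hat)

# Delta method: d(lam)/d(theta) = 1/Hessian, so Var(lam_hat) ~ Var(theta_hat)/Hessian^2.
var_lam_hat = var_theta_hat / hessian ** 2
print(f"lam_hat = {lam_hat:.4f}, approximate standard error = {np.sqrt(var_lam_hat):.4f}")
```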
3. Simulation
In this section, we perform simulations to investigate the finite-sample properties of the MECC approximators proposed above. We address three main issues. First, the MECC can outperform the parametric copulas used in this study (the Gaussian copula, Student's t copula, the Clayton copula, and the Gumbel copula), while its performance remains comparable to that of other nonparametric estimators (i.e., the "shrunken" local linear (LLS) kernel copula estimator and the "shrunken" mirror-reflection (MRS) kernel copula estimator proposed by Omelka, Gijbels, and Veraverbeke [36]). Second, an increase in the number of marginal constraints leads to an improvement in the performance of the MECC. Third, the MECC, for the most part, becomes as stable as the other parametric copulas as more marginal constraints are utilized.
To accomplish the above objectives, we choose Frank's copula,
$C(u,v;\theta)=-\dfrac{1}{\theta}\log\!\left[1+\dfrac{(e^{-\theta u}-1)(e^{-\theta v}-1)}{e^{-\theta}-1}\right],\qquad \theta\neq 0,$
as the true model from which samples are generated (see [37,38] for the statistical properties of Frank's copula). This copula is radially symmetric and approaches the independence copula as $\theta$ approaches the origin, i.e., $C(u,v;\theta)\to uv$ as $\theta\to 0$. Below, we use two values for the true parameter $\theta$; roughly speaking, these correspond to the close-to-independence case and the weak-dependence case, respectively.
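For readers wishing to replicate this design, samples from Frank's copula can be drawn by conditional inversion: generate $U$ uniformly, then invert the conditional distribution $C_{2|1}(v\,|\,u)=\partial C(u,v)/\partial u$ in $v$. The sketch below is our own implementation using a numerical root-finder, with an illustrative value $\theta=3$ (the study's actual $\theta$ values are not reproduced here).

```python
import numpy as np
from scipy import optimize, stats

def frank_conditional_cdf(v, u, theta):
    """Conditional distribution C_{2|1}(v | u) = dC(u, v)/du for Frank's copula."""
    num = np.exp(-theta * u) * (np.exp(-theta * v) - 1.0)
    den = (np.exp(-theta) - 1.0) + (np.exp(-theta * u) - 1.0) * (np.exp(-theta * v) - 1.0)
    return num / den

def sample_frank(n, theta, rng):
    """Draw n pairs (U, V) from Frank's copula by conditional inversion."""
    u = rng.uniform(size=n)
    p = rng.uniform(size=n)
    v = np.array([
        optimize.brentq(lambda t: frank_conditional_cdf(t, ui, theta) - pi, 1e-10, 1.0 - 1e-10)
        for ui, pi in zip(u, p)
    ])
    return u, v

rng = np.random.default_rng(42)
u, v = sample_frank(5000, theta=3.0, rng=rng)     # theta = 3 is illustrative only

# Sanity check: the sample Spearman rho should be clearly positive for theta > 0.
print("sample Spearman rho:", round(stats.spearmanr(u, v)[0], 3))
```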
The simulation procedure is outlined as follows. First, we generate 100 samples of 5000 observations from Frank's copula for each value of $\theta$. With these samples in hand, we estimate the four commonly used parametric copulas mentioned above by the MLE method. We also estimate 12 MECCs (corresponding to combinations of the number of marginal constraints and the number of joint moment constraints) by using our proposed method. To gauge the errors of these estimators, we use the integrated mean squared error (IMSE),
$\mathrm{IMSE}=E\int_{0}^{1}\!\int_{0}^{1}\{\hat{c}(u,v)-c_{F}(u,v)\}^{2}\,du\,dv,$
where $c_{F}$ is the density of Frank's copula and $\hat{c}$ represents an estimate based on one of the above-mentioned parametric copulas or a MECC. Next, for each copula, we use the 100 samples of 5000 observations drawn from Frank's copula to estimate the squared bias and the variance (as functions of $u$ and $v$). Both the integrated squared bias (Int. Bias²) and the integrated variance (Int. Var.) are then obtained by evaluating the estimated squared bias, $\{\bar{c}(u,v)-c_{F}(u,v)\}^{2}$ (where $\bar{c}$ denotes the empirical mean of the estimates calculated over the 100 samples), and the estimated variance of the estimates at 10000 pseudo-random Uniform[0,1] points, and then taking their respective averages over those points. To gauge the errors of the nonparametric copula estimators, we use the expressions for the asymptotic bias and variance given in [36,39]; the optimal bandwidth is obtained by minimizing the integrated asymptotic MSE [39]. We report our simulation results in Table 2.
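The error decomposition described above is straightforward to compute once the replicated density estimates are available. The sketch below is a self-contained illustration with placeholder "estimates" (the true Frank density contaminated with artificial noise) standing in for the actual copula estimators; it evaluates the pointwise squared bias and variance at 10000 pseudo-random Uniform[0,1] points and averages them, their sum approximating the IMSE.

```python
import numpy as np

def frank_density(u, v, theta):
    """Density of Frank's copula (theta != 0)."""
    d = (1 - np.exp(-theta)) - (1 - np.exp(-theta * u)) * (1 - np.exp(-theta * v))
    return theta * (1 - np.exp(-theta)) * np.exp(-theta * (u + v)) / d**2

def error_decomposition(estimates, true_vals):
    """estimates: (n_replications, n_points) fitted densities at the evaluation
    points; true_vals: (n_points,) true density at the same points."""
    mean_fit = estimates.mean(axis=0)                  # pointwise empirical mean
    int_bias2 = np.mean((mean_fit - true_vals) ** 2)   # integrated squared bias
    int_var = np.mean(estimates.var(axis=0))           # integrated variance
    return int_bias2, int_var, int_bias2 + int_var     # IMSE ~ bias^2 + variance

theta = 3.0                                            # illustrative value only
rng = np.random.default_rng(7)
pts = rng.uniform(size=(10000, 2))                     # 10000 Uniform[0,1]^2 points
true_vals = frank_density(pts[:, 0], pts[:, 1], theta)

# Placeholder "estimates": the true density plus noise, standing in for 100
# replications of an actual copula estimator evaluated at the same points.
estimates = true_vals + 0.05 * rng.standard_normal((100, len(true_vals)))

int_bias2, int_var, imse = error_decomposition(estimates, true_vals)
print(f"Int. Bias^2 = {int_bias2:.5f}, Int. Var. = {int_var:.5f}, IMSE = {imse:.5f}")
```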
First, it can be seen from Table 2 that the MECCs significantly outperform the elliptical copulas (i.e., the Normal copula and Student's t copula) in terms of Int. Bias² and IMSE. However, with a small number of marginal constraints the MECCs are mostly less stable than the other parametric copulas; the only way to improve the stability (Int. Var.) of the MECCs is to increase the number of marginal constraints. For the close-to-independence case, the asymmetric copulas (i.e., the Clayton copula and the Gumbel copula) outperform the MECCs. The intuition for these asymmetric copulas having small Int. Bias² and Int. Var. is that Frank's copula, the Clayton copula, and the Gumbel copula all behave like the independence copula when the dependence parameter is close to its independence value. It is also interesting to note that the MECCs often outperform the LLS and MRS estimators in terms of Int. Bias², whilst these nonparametric estimators outperform the MECCs in terms of Int. Var. The reason for the non-zero Int. Bias² of the LLS and MRS estimators is that the optimal bandwidth (being shrunk close to zero at the corners of the unit square) keeps the bias bounded but does not remove it completely.
Second, for the larger value of θ the data become less independent, leading to a significant increase in Int. Bias² when the Clayton and Gumbel copulas are estimated from samples drawn from Frank's copula. In this case, MECC(4,1), MECC(16,1), MECC(64,1), MECC(4,2), MECC(64,2), and MECC(64,3) all show significant improvements in Int. Bias² over all the other estimators. It is also important to note that, for a fixed number of marginal constraints, Int. Bias² and IMSE tend to deteriorate as one increases the number of joint moment constraints. To ameliorate this, it suffices to increase the number of marginal constraints as each additional joint moment constraint is added to the MEC problem. Indeed, as shown in Table 2, with one joint moment constraint only four marginal constraints are needed to yield MECC(4,1) with minimum Int. Bias² and IMSE, whereas with two joint moment constraints up to 64 marginal constraints are needed to yield MECC(64,2) with minimum Int. Bias², Int. Var., and IMSE. Our final observation is that, for a fixed number of moment constraints, an increase in the number of marginal constraints always leads to a significant reduction in Int. Var. Finally, to check the general validity of the simulation results, we also replicate the above simulation study using data generated from Clayton copulas.
Table 3 shows that the good performance of the MECCs relative to the other copula estimators carries over to this case when a sufficient number of marginal constraints is used.