1. Introduction
Canonical correlation analysis (CCA) is a multivariate statistical method that analyzes the correlation structure between two random vectors $\mathbf{x} \in \mathbb{R}^p$ and $\mathbf{y} \in \mathbb{R}^q$. It obtains the linear transformations $\mathbf{u} = A\mathbf{x}$ and $\mathbf{v} = B\mathbf{y}$ where the only nonnull correlations are those between components of $\mathbf{u}$ and $\mathbf{v}$ with the same indices, that is, $\mathrm{corr}(u_i, v_j) = 0$ for $i \neq j$. The random vector $(u_1, v_1)$ is the first canonical pair and the correlation between its components, that is, the first canonical correlation, is the highest among all correlations between a projection of $\mathbf{x}$ and a projection of $\mathbf{y}$. Similarly, the random vector $(u_i, v_i)$ is the i-th canonical pair and the correlation between its components, that is, the i-th canonical correlation, is the highest among all correlations between a projection of $\mathbf{x}$ and a projection of $\mathbf{y}$ which are orthogonal to the previous canonical pairs, for $i = 2, \ldots, \min(p, q)$.
Canonical correlation analysis is particularly appropriate when the joint distribution of the vectors $\mathbf{x}$ and $\mathbf{y}$ is multivariate normal, but it often performs poorly when the data are nonnormal [1]. The problem has been addressed nonparametrically [2], semiparametrically [1] and parametrically [3]. In this paper we introduce a semiparametric model to investigate the nonlinear dependence structure by means of canonical correlations. Kernel canonical correlation analysis (KCCA) and distance canonical correlation analysis (DCCA) play a prominent role among nonparametric generalizations of CCA aimed at addressing nonlinear dependencies (see, e.g., [4,5]).
The main contributions of the paper are as follows. Firstly, it defines the perturbed independence distribution as a statistical model for the joint distribution of two random vectors. The proposed model is somewhat reminiscent of copula models, in that the parameters addressing the dependence structure between two random vectors do not appear in the marginal distributions of the vectors themselves; however, the generating mechanism of perturbed independence distributions is very different from those of ordinary copulas.
Secondly, the perturbed independence model allows for flexible and tractable modeling of the nonlinear dependence structure between two random vectors, since the conditional distribution of a random vector with respect to the other is skew-symmetric. The proposed model provides a parametric interpretation of KCCA and DCCA, which are commonly regarded as nonparametric multivariate methods.
Thirdly, some appealing properties of canonical correlation analysis that hold true in the normal case still hold true in the perturbed independence case. For example, the first (second) component of a canonical pair is independent of the second (first) component of any other canonical pair. Further, if the marginal distributions of the two given vectors are normal, any canonical pair is independent of any other canonical pair.
Fourthly, the paper investigates the bivariate perturbed independence models within the framework of positive and negative association. In particular, it shows that the canonical pairs obtained from a perturbed independence distribution have the desirable property of being positive quadrant dependent, under mild assumptions on the perturbing function.
The rest of the paper is structured as follows.
Section 2 defines perturbed independence distributions and states some of their probabilistic and inferential properties.
Section 3 connects perturbed independence distributions, canonical correlation analysis, positive dependence orderings and ordinal measures of association.
Section 4 uses both theoretical and empirical results to find nonlinear transformations that increase correlations.
Appendix A contains all proofs.
2. Model
This section defines the perturbed independence model, states its invariance properties and the independence properties of its canonical pairs. The theoretical results are illustrated with the bivariate distribution
$$f(x, y) = 2\,\phi(x)\,\phi(y)\,\Phi(\lambda x y), \quad x, y \in \mathbb{R},$$
introduced by [6,7], where $\phi$ and $\Phi$ denote the probability density and the cumulative distribution functions of a standard normal distribution, while $\lambda$ is a real value. Ref. [8] thoroughly investigated its properties and proposed some generalizations.
A p-dimensional random vector $\mathbf{x}$ is centrally symmetric (simply symmetric, henceforth) if there is a p-dimensional real vector $\boldsymbol{\mu}$ such that $\mathbf{x} - \boldsymbol{\mu}$ and $\boldsymbol{\mu} - \mathbf{x}$ are identically distributed ([9]). A real-valued function $\pi$ is a skewing function (also known as a perturbing function) if it satisfies the equality $\pi(\mathbf{z}) + \pi(-\mathbf{z}) = 1$ and the inequalities $0 \le \pi(\mathbf{z}) \le 1$ for any real vector $\mathbf{z}$ [10]. The probability density function of a perturbed independence model is twice the product of two symmetric probability density functions and a skewing function evaluated at a bilinear function of the outcomes. A more formal definition follows.
Definition 1. Let the joint distribution of the random vectors $\mathbf{x}$ and $\mathbf{y}$ be
$$f(\mathbf{x}, \mathbf{y}) = 2\, f_{\mathbf{x}}(\mathbf{x} - \boldsymbol{\mu}_x)\, f_{\mathbf{y}}(\mathbf{y} - \boldsymbol{\mu}_y)\, \pi\!\left[(\mathbf{x} - \boldsymbol{\mu}_x)^\top \Psi\, (\mathbf{y} - \boldsymbol{\mu}_y)\right],$$
where $f_{\mathbf{x}}$ is the pdf of a p-dimensional, centrally symmetric distribution, $f_{\mathbf{y}}$ is the pdf of a q-dimensional, centrally symmetric distribution, Ψ is a $p \times q$ matrix and $\pi$ is a function satisfying $\pi(a) + \pi(-a) = 1$ and $0 \le \pi(a) \le 1$ for any real value a. We refer to this distribution as a perturbed independence model, with components $f_{\mathbf{x}}$ and $f_{\mathbf{y}}$, location vectors $\boldsymbol{\mu}_x$ and $\boldsymbol{\mu}_y$, perturbing function $\pi$ and association matrix Ψ. In the bivariate distribution $2\phi(x)\phi(y)\Phi(\lambda x y)$, both components coincide with the standard normal pdf, both location vectors coincide with the origin, the perturbing function is the standard normal cdf and the association matrix is the scalar parameter $\lambda$.
Random numbers having a perturbed independence distribution can be generated in a very simple way. For the sake of simplicity, we illustrate it in the simplified case where $\boldsymbol{\mu}_x$ and $\boldsymbol{\mu}_y$ are null vectors and $\pi$ is the cumulative distribution function of a distribution symmetric at the origin. First, generate the vectors $\mathbf{u}$ and $\mathbf{w}$ from the densities $f_{\mathbf{x}}$ and $f_{\mathbf{y}}$. Second, generate the scalar r from the distribution whose cumulative distribution function is $\pi$. Third, let the vector $(\mathbf{x}, \mathbf{y})$ be $(\mathbf{u}, \mathbf{w})$ if the bilinear form $\mathbf{u}^\top \Psi \mathbf{w}$ is greater than r, and either $(-\mathbf{u}, \mathbf{w})$ or $(\mathbf{u}, -\mathbf{w})$ in the opposite case. Then, the distribution of $(\mathbf{x}, \mathbf{y})$ is perturbed independence with components $f_{\mathbf{x}}$ and $f_{\mathbf{y}}$, null location vectors, perturbing function $\pi$ and association matrix Ψ.
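As an illustration, the three steps translate directly into Python. The sketch below is ours, not part of the original formulation: the function name and its sampler arguments are illustrative conveniences.

```python
import numpy as np

def rperturbed(n, sample_x, sample_y, sample_r, Psi, seed=None):
    """Draw n outcomes from a perturbed independence model with null
    location vectors. sample_x and sample_y draw from the centrally
    symmetric components, sample_r draws from the distribution whose
    cdf is the perturbing function, and Psi is the association matrix."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for _ in range(n):
        u, w = sample_x(rng), sample_y(rng)  # step 1: u ~ f_x, w ~ f_y
        r = sample_r(rng)                    # step 2: r has cdf pi
        if u @ Psi @ w > r:                  # step 3: keep (u, w) ...
            xs.append(u)
        else:                                # ... or reflect one component
            xs.append(-u)
        ys.append(w)
    return np.array(xs), np.array(ys)
```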
The bivariate distribution $2\phi(x)\phi(y)\Phi(\lambda x y)$ might be generated as follows. First, generate three mutually independent, standard normal random numbers U, W and Z. Second, set X equal to U and Y equal to W if the product $\lambda U W$ is greater than Z. Otherwise, set X equal to $-U$ and Y equal to W. Then the joint distribution of X and Y is $2\phi(x)\phi(y)\Phi(\lambda x y)$.
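For the bivariate model, the recipe reduces to a few vectorized lines; a minimal sketch (ours) with a quick sanity check on the marginals:

```python
import numpy as np

def rbiv(n, lam, seed=None):
    """Sample n pairs from 2 phi(x) phi(y) Phi(lam * x * y)."""
    rng = np.random.default_rng(seed)
    u, w, z = rng.standard_normal((3, n))
    x = np.where(lam * u * w > z, u, -u)  # flip the sign of U on rejection
    return x, w

x, y = rbiv(10_000, lam=2.0, seed=1)
print(x.mean(), x.std(), y.mean(), y.std())  # close to 0, 1, 0, 1
```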
A p-dimensional probability density function of the form $2 f_0(\mathbf{z} - \boldsymbol{\mu})\,\pi(\mathbf{z} - \boldsymbol{\mu})$ is skew-symmetric with kernel $f_0$ (i.e., a probability density function symmetric at the origin), location vector $\boldsymbol{\mu}$ and skewing function $\pi$. The function $\pi$ would be more precisely denoted by $\pi_p$, since it depends on the dimension of the corresponding random vector. However, we use $\pi$ instead of $\pi_p$ to relieve the notational burden. Ref. [11] discusses hypothesis testing on $\boldsymbol{\mu}$ for any choice of the function $\pi$. The most widely studied skew-symmetric distributions are the linearly skewed distributions, where the skewing function depends on $\mathbf{z}$ only through the linear function $\boldsymbol{\alpha}^\top \mathbf{z}$, as happens in the multivariate skew-normal case. Refs. [12,13] investigated their inferential properties. Ref. [14] used them to motivate kurtosis-based projection pursuit.
In the notation of the above definition, the first part of the following theorem states that the marginal distributions of $\mathbf{x}$ and $\mathbf{y}$ are $f_{\mathbf{x}}$ and $f_{\mathbf{y}}$. Thus, perturbed independence distributions separately model the marginal distributions and the association between two random vectors, and constitute an alternative to copulas. The second part of the following theorem states that the conditional distribution of one vector given the other is linearly skewed. Hence, the association between the two components has an analytical form, which has been thoroughly investigated.
Theorem 1.
Let the random vectors $\mathbf{x}$ and $\mathbf{y}$ have a perturbed independence distribution with components $f_{\mathbf{x}}$, $f_{\mathbf{y}}$ and location vectors $\boldsymbol{\mu}_x$, $\boldsymbol{\mu}_y$. Then the following statements hold true.
The marginal probability density functions of $\mathbf{x}$ and $\mathbf{y}$ are $f_{\mathbf{x}}(\mathbf{x} - \boldsymbol{\mu}_x)$ and $f_{\mathbf{y}}(\mathbf{y} - \boldsymbol{\mu}_y)$.
The conditional probability density functions of $\mathbf{x}$ given $\mathbf{y} = \mathbf{y}_0$ and of $\mathbf{y}$ given $\mathbf{x} = \mathbf{x}_0$ are skew-symmetric with kernels $f_{\mathbf{x}}$ and $f_{\mathbf{y}}$, while the associated location vectors are $\boldsymbol{\mu}_x$ and $\boldsymbol{\mu}_y$.
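Why the perturbation leaves the marginals untouched can be seen in one line (a sketch with null location vectors, using the symmetry of $f_{\mathbf{y}}$ and the identity $\pi(a) + \pi(-a) = 1$): averaging the integral below with its image under the substitution $\mathbf{y} \mapsto -\mathbf{y}$ gives
$$\int_{\mathbb{R}^q} 2\, f_{\mathbf{x}}(\mathbf{x})\, f_{\mathbf{y}}(\mathbf{y})\, \pi(\mathbf{x}^\top \Psi \mathbf{y})\, d\mathbf{y} = f_{\mathbf{x}}(\mathbf{x}) \int_{\mathbb{R}^q} f_{\mathbf{y}}(\mathbf{y}) \left[ \pi(\mathbf{x}^\top \Psi \mathbf{y}) + \pi(-\mathbf{x}^\top \Psi \mathbf{y}) \right] d\mathbf{y} = f_{\mathbf{x}}(\mathbf{x}).$$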
The marginal distributions of $(X, Y) \sim 2\phi(x)\phi(y)\Phi(\lambda x y)$ are standard normal: $X \sim N(0, 1)$ and $Y \sim N(0, 1)$. The conditional distributions are skew-normal: the probability density functions of $X \mid Y = y$ and of $Y \mid X = x$ are $2\phi(x)\Phi(\lambda x y)$ and $2\phi(y)\Phi(\lambda x y)$. The sign of the correlation between X and Y is the same as the sign of $\lambda$, but the two random variables are nonlinearly dependent [7]:
$$\mathrm{corr}(X, Y) = \sqrt{\frac{2}{\pi}}\; \mathbb{E}\!\left[\frac{\lambda X^2}{\sqrt{1 + \lambda^2 X^2}}\right].$$
There is a close connection between order statistics and skew-normal distributions and their generalizations. For example, any linear combination of the minimum and the maximum of a bivariate, exchangeable and elliptical random vector is skew-elliptical [15]. In particular, any skew-normal distribution might be represented as the maximum or the minimum of a bivariate, normal and exchangeable random vector. At present, it is not clear whether there exists a meaningful connection between order statistics and perturbed independence distributions, which would ease both the interpretation and the application of these distributions.
The mean vector and the covariance matrix of the data matrix Z are statistically independent, if the rows of Z are a random sample from a multivariate normal distribution. As a direct consequence, the components of the pairs $(\bar{\mathbf{x}}, S_x)$ and $(\bar{\mathbf{y}}, S_y)$ are statistically independent, too, where $\bar{\mathbf{x}}$ and $S_x$ ($\bar{\mathbf{y}}$ and $S_y$) are the mean vector and the covariance matrix of X (Y), that is the data matrix whose columns coincide with the first p (the last q) columns of Z. The same property holds true for perturbed independence models, as a corollary of the following theorem.
Theorem 2. Let the random vectors $\mathbf{x}$ and $\mathbf{y}$ have the perturbed independence distribution with location vectors $\boldsymbol{\mu}_x$ and $\boldsymbol{\mu}_y$. Then any even function of $\mathbf{x} - \boldsymbol{\mu}_x$ is independent of $\mathbf{y}$. Similarly, any even function of $\mathbf{y} - \boldsymbol{\mu}_y$ is independent of $\mathbf{x}$.
Let the joint distribution of the random variables X and Y be $2\phi(x)\phi(y)\Phi(\lambda x y)$. Then Y and $|X|$ are mutually independent. Similarly, X and $|Y|$ are mutually independent.
The components of the canonical covariates $\mathbf{u}$ and $\mathbf{v}$ are uncorrelated when their indices differ:
$$\mathrm{corr}(u_i, v_j) = 0, \quad i \neq j.$$
A p-dimensional random vector $\mathbf{x}$ is said to be sign-symmetric if there is a p-dimensional real vector $\boldsymbol{\mu}$ such that $\mathbf{x} - \boldsymbol{\mu}$ and $D(\mathbf{x} - \boldsymbol{\mu})$ are identically distributed, where D is any $p \times p$ diagonal matrix whose diagonal elements are either 1 or $-1$ [9]. For example, spherical random vectors are sign-symmetric. The following theorem shows that the canonical covariates belonging to different canonical vectors and with different indices are independent, if the joint distribution of the original variables is perturbed independence with sign-symmetric components.
Theorem 3. Let the random vectors $\mathbf{x}$ and $\mathbf{y}$ have a perturbed independence distribution with sign-symmetric components. Further, let $\mathbf{u} = (u_1, \ldots, u_p)^\top$ and $\mathbf{v} = (v_1, \ldots, v_q)^\top$ be the canonical covariates of $\mathbf{x}$ and $\mathbf{y}$. Then $u_i$ and $v_j$ are independent when $i \neq j$.
Under normal sampling, the components of different canonical pairs are statistically independent. The following corollary of the above theorem shows that the same property still holds true when the original variables have a perturbed independence distribution with normal components.
Corollary 1. Let the random vectors $\mathbf{x}$ and $\mathbf{y}$ have a perturbed independence distribution with normal components. Further, let $\mathbf{u}$ and $\mathbf{v}$ be the canonical covariates of $\mathbf{x}$ and $\mathbf{y}$. Then each of the variables $u_i$ and $v_i$ is independent of each of the variables $u_j$ and $v_j$ when $i \neq j$.
As remarked by [16], the default measures of multivariate skewness and kurtosis are those introduced by [17]. Mardia’s skewness is the sum of all squared, third-order, standardized moments, while Mardia’s kurtosis is the fourth moment of the Mahalanobis distance of the random vector from its mean.
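Both indices have simple sample counterparts; the following numpy helper (ours, for illustration) computes Mardia’s kurtosis as the average fourth power of the sample Mahalanobis distances, which is close to p(p + 2) under p-variate normality, e.g., 8 when p = 2.

```python
import numpy as np

def mardia_kurtosis(X):
    """Average fourth power of the Mahalanobis distances of the rows
    of the data matrix X from their sample mean."""
    Xc = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', Xc, S_inv, Xc)  # squared distances
    return (d2 ** 2).mean()
```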
Mardia’s kurtosis of $2\phi(x)\phi(y)\Phi(\lambda x y)$ increases with the squared correlation between X and Y ([18]).
It is tempting to generalize $2\phi(x)\phi(y)\Phi(\lambda x y)$ by letting $f(x, y, z) = 2\phi(x)\phi(y)\phi(z)\Phi(\lambda x y z)$, as performed in [6,7]. Unfortunately, this model does not preserve the nonlinear associations between pairs of its components. For example, the joint bivariate marginals of the trivariate distribution $2\phi(x)\phi(y)\phi(z)\Phi(\lambda x y z)$ are bivariate, standard normal random vectors [19]. Other generalizations of $2\phi(x)\phi(y)\Phi(\lambda x y)$ have been proposed by [8].
Let $f_{\mathbf{x}\mathbf{y}}$ be the joint probability density function of the p-dimensional random vector $\mathbf{x}$ and of the q-dimensional random vector $\mathbf{y}$. Further, let $f_{\mathbf{x}}$ and $f_{\mathbf{y}}$ be the marginal probability density functions of $\mathbf{x}$ and $\mathbf{y}$. The distance covariance between $\mathbf{x}$ and $\mathbf{y}$ with respect to the weight function w is
$$\mathcal{V}^2(\mathbf{x}, \mathbf{y}; w) = \int_{\mathbb{R}^{p+q}} \left| \varphi_{\mathbf{x}\mathbf{y}}(\mathbf{t}, \mathbf{s}) - \varphi_{\mathbf{x}}(\mathbf{t})\,\varphi_{\mathbf{y}}(\mathbf{s}) \right|^2 w(\mathbf{t}, \mathbf{s})\, d\mathbf{t}\, d\mathbf{s},$$
where $\varphi_{\mathbf{x}\mathbf{y}}$, $\varphi_{\mathbf{x}}$ and $\varphi_{\mathbf{y}}$ are the characteristic functions of $(\mathbf{x}, \mathbf{y})$, $\mathbf{x}$ and $\mathbf{y}$ [20]. If the joint distribution of $\mathbf{x}$ and $\mathbf{y}$ is a perturbed independence model with components $f_{\mathbf{x}}$ and $f_{\mathbf{y}}$, location vectors $\boldsymbol{\mu}_x$ and $\boldsymbol{\mu}_y$ (taken to be null without loss of generality), perturbing function $\pi$ and association matrix Ψ, we have
$$f_{\mathbf{x}\mathbf{y}}(\mathbf{x}, \mathbf{y}) - f_{\mathbf{x}}(\mathbf{x})\, f_{\mathbf{y}}(\mathbf{y}) = f_{\mathbf{x}}(\mathbf{x})\, f_{\mathbf{y}}(\mathbf{y}) \left[ 2\pi\!\left(\mathbf{x}^\top \Psi \mathbf{y}\right) - 1 \right].$$
A little algebra leads to the identities
$$f_{\mathbf{x}\mathbf{y}}(\mathbf{x}, \mathbf{y}) - f_{\mathbf{x}\mathbf{y}}(-\mathbf{x}, \mathbf{y}) = 2\left[ f_{\mathbf{x}\mathbf{y}}(\mathbf{x}, \mathbf{y}) - f_{\mathbf{x}}(\mathbf{x})\, f_{\mathbf{y}}(\mathbf{y}) \right] = f_{\mathbf{x}\mathbf{y}}(\mathbf{x}, \mathbf{y}) - f_{\mathbf{x}\mathbf{y}}(\mathbf{x}, -\mathbf{y}).$$
Hence, for perturbed independence models, the deviation of the joint density from independence is just half the difference between $f_{\mathbf{x}\mathbf{y}}(\mathbf{x}, \mathbf{y})$ and $f_{\mathbf{x}\mathbf{y}}(-\mathbf{x}, \mathbf{y})$, which are the probability density functions of $(\mathbf{x}, \mathbf{y})$ and $(-\mathbf{x}, \mathbf{y})$. In particular, if the joint distribution of the random variables X and Y is $2\phi(x)\phi(y)\Phi(\lambda x y)$, we have
$$f_{XY}(x, y) - \phi(x)\,\phi(y) = \phi(x)\,\phi(y)\left[ 2\Phi(\lambda x y) - 1 \right].$$
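For data, the distance covariance associated with the weight function adopted by [20] has a well-known sample form based on double-centered Euclidean distance matrices. A minimal numpy sketch (function names are ours):

```python
import numpy as np

def _dcenter(x):
    """Double-centered Euclidean distance matrix of a sample."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    return d - d.mean(0) - d.mean(1)[:, None] + d.mean()

def distance_corr(x, y):
    """Sample distance correlation of two samples of equal length."""
    A, B = _dcenter(x), _dcenter(y)
    dcov2 = max((A * B).mean(), 0.0)  # sample squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0
```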
3. Concordance
This section investigates the bivariate perturbed independence models within the framework of positive and negative association. In particular, it shows that the canonical pairs obtained from a perturbed independence distribution have the desirable property of being positive quadrant dependent, under mild assumptions on the perturbing function. The seminal paper by [21] started a vast literature on dependence orderings and their connections with ordinal measures of association. For the sake of brevity, here we mention only some thorough reviews of the concepts in this section: [22,23,24,25,26,27].
Two random variables are said to be either concordant, positively associated or positively dependent if larger (smaller) outcomes of one of them often occur together with larger (smaller) outcomes of the other random variable. Conversely, two random variables are said to be either discordant, negatively associated or negatively dependent if larger (smaller) outcomes of one of them often occur together with smaller (larger) outcomes of the other random variable. For example, financial returns from different markets are known to be positively dependent (see, e.g., [28,29,30]). The degree of concordance or discordance is assessed with ordinal measures of association, of which the most commonly used are Pearson’s correlation (simply correlation, for short), Spearman’s rho and Kendall’s tau.
The correlation is the best known measure of ordinal association. The correlation between two random variables X and Y is
$$\rho(X, Y) = \frac{\mathbb{E}\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sqrt{\mathbb{E}\left[(X - \mu_X)^2\right]\, \mathbb{E}\left[(Y - \mu_Y)^2\right]}},$$
where $\mu_X$ and $\mu_Y$ are the expectations of X and Y. The ordinal association between two random variables might be decomposed into a linear component and a nonlinear component. The linear component refers to the tendency of the random variables to deviate from their means in a proportional way. The correlation only detects and measures the linear component of the ordinal association. When the nonlinear component is not negligible, the information conveyed by the correlation needs to be integrated with information from other measures of ordinal association.
Spearman’s rho, also known as Spearman’s correlation, between the random variables X and Y is the correlation between the two variables after being transformed according to their marginal cumulative distribution functions:
$$\rho_S(X, Y) = \rho\left[F_X(X), F_Y(Y)\right],$$
where $F_X$ and $F_Y$ are the marginal cumulative distribution functions of X and Y. Its sample counterpart is the correlation between the observed ranks. Spearman’s rho is a measure of ordinal association detecting both linear and nonlinear dependence. It is also more robust to outliers than Pearson’s correlation.
Kendall’s tau, also known as Kendall’s correlation, between two random variables is the difference between their probability of concordance and their probability of discordance. The former (latter) is the probability that the difference between the first components of two independent outcomes from a bivariate distribution has the same sign as (a different sign than) the difference between the second components of the same pairs. More formally, Kendall’s tau between the random variables X and Y is
$$\tau(X, Y) = \mathbb{P}\left[(X_1 - X_2)(Y_1 - Y_2) > 0\right] - \mathbb{P}\left[(X_1 - X_2)(Y_1 - Y_2) < 0\right],$$
where $(X_1, Y_1)$ and $(X_2, Y_2)$ are two independent outcomes from the bivariate random vector $(X, Y)$. Just like Spearman’s rho, Kendall’s tau is an ordinal measure of association detecting linear as well as nonlinear dependence and is more robust to outliers than Pearson’s correlation.
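On data simulated from the bivariate model of Section 2, the three measures are immediate to compare with scipy (a usage sketch; the sampler is the one described in Section 2):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
u, w, z = rng.standard_normal((3, 10_000))
lam = 2.0
x, y = np.where(lam * u * w > z, u, -u), w  # sampler from Section 2

print("Pearson :", stats.pearsonr(x, y)[0])    # linear component only
print("Spearman:", stats.spearmanr(x, y)[0])   # correlation of the ranks
print("Kendall :", stats.kendalltau(x, y)[0])  # concordance - discordance
```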
Unfortunately, Pearson’s correlation, Spearman’s rho and Kendall’s tau might take different signs, thus making it difficult to measure ordinal association. In order to prevent this from happening, it is convenient to impose some constraints on the bivariate distribution. The distribution of a bivariate random vector $(X, Y)$ is said to be positively quadrant dependent (PQD) if its joint cdf is greater than or equal to the product of the marginal cdfs:
$$\mathbb{P}(X \le x, Y \le y) \ge \mathbb{P}(X \le x)\, \mathbb{P}(Y \le y)$$
for any two real values x and y. Similarly, the distribution of a bivariate random vector $(X, Y)$ is said to be negatively quadrant dependent (NQD) if its joint cdf is smaller than or equal to the product of the marginal cdfs:
$$\mathbb{P}(X \le x, Y \le y) \le \mathbb{P}(X \le x)\, \mathbb{P}(Y \le y)$$
for any two real values x and y. Pearson’s correlation, Spearman’s rho and Kendall’s tau of PQD (NQD) distributions are either null or have positive (negative) signs.
Independent random variables are special cases of PQD and NQD random variables. In order to rule this case out, the PQD and NQD conditions can be made more restrictive by requiring that the above inequalities be strict for measurable sets of x and y values. For example, a strictly positive quadrant dependent pair of random variables satisfies the inequality
$$\mathbb{P}(X \le x, Y \le y) > \mathbb{P}(X \le x)\, \mathbb{P}(Y \le y)$$
for any two real values x and y belonging to a given interval of positive length. Pearson’s correlation, Spearman’s rho and Kendall’s tau of strictly positive (negative) quadrant dependent distributions have positive (negative) signs. As shown in the following theorem, a bivariate perturbed independence model is strictly positive (negative) quadrant dependent if the perturbing function is a cumulative distribution function and the association parameter is a positive (negative) scalar.
Theorem 4. Let the joint distribution of the random variables X and Y be perturbed independence with components $f_X$ and $f_Y$, perturbing function $\pi$ and association parameter λ: $f(x, y) = 2 f_X(x)\, f_Y(y)\, \pi(\lambda x y)$. Further, let $\pi$ be the cumulative distribution function of a symmetric distribution. Then the random variables X and Y are strictly positive (negative) quadrant dependent when λ is positive (negative).
The joint distribution $2\phi(x)\phi(y)\Phi(\lambda x y)$ of the bivariate random vector $(X, Y)$ introduced in the previous section fulfills the assumptions of Theorem 4. In particular, if the association parameter λ is positive, the random variables X and Y are strictly positive quadrant dependent:
$$\mathbb{P}(X \le a, Y \le b) > \mathbb{P}(X \le a)\, \mathbb{P}(Y \le b)$$
for any two real values a and b. As a direct consequence, their Pearson’s correlation $\rho$, their Spearman’s rho $\rho_S$ and their Kendall’s tau $\tau$ are positive.
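The strict PQD inequality is easy to probe empirically by comparing the empirical joint cdf with the product of the empirical marginal cdfs on a grid; a sketch (ours) for a positive association parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
u, w, z = rng.standard_normal((3, 50_000))
x, y = np.where(1.5 * u * w > z, u, -u), w     # lambda = 1.5 > 0

for a in np.linspace(-2, 2, 9):
    for b in np.linspace(-2, 2, 9):
        joint = ((x <= a) & (y <= b)).mean()   # empirical P(X<=a, Y<=b)
        prod = (x <= a).mean() * (y <= b).mean()
        assert joint >= prod - 0.01            # PQD up to sampling noise
```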
Pearson’s correlation between the components of a canonical pair is nonnegative. However, within a nonparametric framework, their Spearman’s rho and their Kendall’s tau can take any sign. When Pearson’s correlation between the components of a canonical pair is positive but their Spearman’s rho and their Kendall’s tau are negative, the former ordinal association measure becomes quite unreliable and canonical correlation analysis provides little insight into the dependence structure. This problem does not occur under a perturbed independence model satisfying the assumptions stated in the following theorem.
Theorem 5. Let $(u_1, v_1)$, …, $(u_k, v_k)$, with $k = \min(p, q)$, be the canonical pairs obtained from a perturbed independence distribution, and let their density be
$$f(\mathbf{u}, \mathbf{v}) = 2\, f_{\mathbf{u}}(\mathbf{u})\, f_{\mathbf{v}}(\mathbf{v})\, \pi(\lambda_1 u_1 v_1 + \cdots + \lambda_k u_k v_k),$$
where $\pi$ is a strictly increasing perturbing function. Then the joint distribution of the i-th canonical pair is a bivariate perturbed independence model:
$$f(u_i, v_i) = 2\, f_{u_i}(u_i)\, f_{v_i}(v_i)\, \pi_i(\lambda_i u_i v_i),$$
where $\pi_i$ is a strictly increasing perturbing function. We illustrate the above theorem with the perturbed independence distribution
$$f(\mathbf{x}, \mathbf{y}) = 2\, \phi_q(\mathbf{x}; \Sigma)\, \phi_q(\mathbf{y}; \Sigma)\, \pi(\mathbf{x}^\top \Psi \mathbf{y}),$$
where $\phi_q(\cdot\,; \Sigma)$ is the q-dimensional normal density with null mean vector and covariance matrix Σ, $\pi$ is the cdf of a continuous distribution symmetric at the origin and Ψ is a symmetric $q \times q$ matrix. The distribution of the canonical variates $\mathbf{u}$ and $\mathbf{v}$ is
$$f(\mathbf{u}, \mathbf{v}) = 2\, \phi_q(\mathbf{u}; I_q)\, \phi_q(\mathbf{v}; I_q)\, \pi(\lambda_1 u_1 v_1 + \cdots + \lambda_q u_q v_q),$$
which fulfills the assumptions in Theorem 5. Then the joint distribution of the i-th canonical pair $(u_i, v_i)$ is
$$f(u_i, v_i) = 2\, \phi(u_i)\, \phi(v_i)\, \pi_i(\lambda_i u_i v_i),$$
where $\phi$ is the pdf of a univariate, standard normal distribution and $\pi_i$ is the cdf of a continuous distribution symmetric at the origin. By Theorems 4 and 5, and since the i-th canonical correlation is nonnegative, the association parameter $\lambda_i$, Kendall’s tau $\tau_i$ and Spearman’s rho $\rho_{S_i}$ are nonnegative, too. Moreover, if the i-th canonical correlation is positive, the association parameter, Kendall’s tau and Spearman’s rho are positive, too.
4. Nonlinearity
As a desirable property, CCA decomposes the covariance matrix between the p-dimensional random vector $\mathbf{x}$ and the q-dimensional random vector $\mathbf{y}$ into linear combinations of the covariances between uncorrelated linear functions of $\mathbf{x}$ and $\mathbf{y}$. Ref. [31] thoroughly investigates the interpretation of CCA within the framework of linear dependence. The first output of CCA is the pair of linear combinations of $\mathbf{x}$ and $\mathbf{y}$ which are maximally correlated:
$$\left(\mathbf{a}_1, \mathbf{b}_1\right) = \arg\max_{\mathbf{a} \in A,\ \mathbf{b} \in B} \mathrm{corr}\!\left(\mathbf{a}^\top \mathbf{x},\ \mathbf{b}^\top \mathbf{y}\right),$$
where A and B are the sets of p-dimensional and q-dimensional nonnull, real vectors.
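In sample terms, all canonical correlations and directions follow from whitening each block and taking a singular value decomposition of the whitened cross-covariance; a minimal numpy sketch (ours, not tied to any particular package):

```python
import numpy as np

def cca(X, Y):
    """Canonical correlations and direction vectors from the data
    matrices X (n x p) and Y (n x q)."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = len(X)
    Sxx, Syy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n
    Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, np.linalg.solve(Ly, Sxy.T).T)  # whitened Sxy
    U, rho, Vt = np.linalg.svd(M)
    A = np.linalg.solve(Lx.T, U)     # columns: canonical vectors for x
    B = np.linalg.solve(Ly.T, Vt.T)  # columns: canonical vectors for y
    return rho, A, B
```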
As mentioned in the Introduction and in the previous section, both the interpretability and the usefulness of CCA are severely diminished by nonlinear dependencies between $\mathbf{x}$ and $\mathbf{y}$. A solution would be to look for the linear and nonlinear transformations of $\mathbf{x}$ and $\mathbf{y}$ which are maximally correlated:
$$\left(g_1, h_1, \mathbf{a}_1, \mathbf{b}_1\right) = \arg\max_{g, h \in M,\ \mathbf{a} \in A,\ \mathbf{b} \in B} \mathrm{corr}\!\left[g\!\left(\mathbf{a}^\top \mathbf{x}\right),\ h\!\left(\mathbf{b}^\top \mathbf{y}\right)\right],$$
where M is the set of all real-valued monotonic functions. In the general case, the maximization needs to be performed simultaneously with respect to the nonlinear functions g, h and the real vectors $\mathbf{a}$, $\mathbf{b}$, thus being difficult to compute and difficult to interpret. Ref. [1] addressed the problem by proposing the Gaussian copula model, where the components of $\mathbf{x}$ and $\mathbf{y}$ have a joint distribution that is multivariate normal, after being transformed according to monotonic and nonlinear functions. However, these monotonic transformations do not have a clear interpretation and they are not guaranteed to increase the correlations.
Perturbed independence models do not suffer from these limitations. Firstly, the monotonic transformations have a simple interpretation, being the expectations of one variable conditioned on the other. Secondly, the same transformations are guaranteed to increase the correlations, under mild assumptions. These statements are made more precise in the following theorem.
Theorem 6. Let the joint distribution of the random variables X and Y be perturbed independence with null location parameters, nonnull association parameter and increasing perturbing function. Finally, let X and Y have finite second moments. Then the conditional expectation $\mathbb{E}(Y \mid X = x)$ is a monotone, odd and nonlinear function, while the correlation between Y and X is smaller than the correlation between $\mathbb{E}(Y \mid X)$ and $\mathbb{E}(X \mid Y)$.
We illustrate the above theorem with the distribution $2\phi(x)\phi(y)\Phi(\lambda x y)$ of the bivariate random vector $(X, Y)$ introduced in Section 2. The conditional expectations of Y and X with respect to the outcomes x of X and y of Y are
$$\mathbb{E}(Y \mid X = x) = \sqrt{\frac{2}{\pi}}\, \frac{\lambda x}{\sqrt{1 + \lambda^2 x^2}} \quad \text{and} \quad \mathbb{E}(X \mid Y = y) = \sqrt{\frac{2}{\pi}}\, \frac{\lambda y}{\sqrt{1 + \lambda^2 y^2}},$$
so that the nonlinear functions of X and Y maximally correlated with Y and X are proportional to
$$g(x) = \frac{\lambda x}{\sqrt{1 + \lambda^2 x^2}} \quad \text{and} \quad h(y) = \frac{\lambda y}{\sqrt{1 + \lambda^2 y^2}}.$$
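The correlation increase guaranteed by Theorem 6 is easy to verify by simulation, using g and h as above (a sketch of ours, under the illustrative choice λ = 2):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 2.0
u, w, z = rng.standard_normal((3, 10_000))
x, y = np.where(lam * u * w > z, u, -u), w

g = lam * x / np.sqrt(1 + (lam * x) ** 2)  # proportional to E(Y | X = x)
h = lam * y / np.sqrt(1 + (lam * y) ** 2)  # proportional to E(X | Y = y)

print(np.corrcoef(x, y)[0, 1])  # correlation of the raw variables
print(np.corrcoef(g, h)[0, 1])  # larger after the transformations
```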
The above theorem does not guarantee that $g(X) = \mathbb{E}(Y \mid X)$ is the nonlinear transformation of one component that is maximally correlated with Y, nor that such correlation is smaller than the correlation between $g(X)$ and $h(Y)$. We empirically address this point by simulating n = 10,000 bivariate data from $2\phi(x)\phi(y)\Phi(\lambda x y)$, where λ takes increasing positive values. The left-hand scatterplots in Figure 1 clearly hint at positive dependence: more points lie in the first and in the third quadrants as the association parameter increases, despite the absence of the ellipsoidal shapes associated with bivariate normality. For each simulated sample, we computed Kendall’s tau, Spearman’s rho and Pearson’s correlation and report their values in Table 1. The three measures of ordinal association are positive and they increase with the association parameter, consistently with the theoretical results in Section 3. More surprisingly, Spearman’s rho is always greater than Kendall’s tau and Pearson’s correlation, unlike in the bivariate normal distribution, where Pearson’s correlation is always greater than Kendall’s tau and Spearman’s rho.
Finally, for each simulated sample $(x_1, y_1)$, …, $(x_n, y_n)$, we computed Pearson’s correlation between $g(x_1)$, …, $g(x_n)$ and $h(y_1)$, …, $h(y_n)$, where g and h are proportional to the sample counterparts of the expectation of Y given $X = x$ and of X given $Y = y$ under the model $2\phi(x)\phi(y)\Phi(\lambda x y)$. For each simulated sample, these correlations are always greater than the correlations between the original data, consistently with Theorem 6. Moreover, Pearson’s correlations between $g(x_1)$, …, $g(x_n)$ and $h(y_1)$, …, $h(y_n)$ are always greater than their Spearman’s correlations. As shown in the right-hand scatterplots of Figure 1 and Figure 2, the transformed data lie at the lower left corner and at the upper right corner of a square. This pattern becomes more evident as the association parameter increases. The histograms of $g(x_1)$, …, $g(x_n)$ in Figure 2 are symmetric and bimodal, with both modes at the ends of the observed range. Bimodality becomes more evident as the association parameter increases. The behavior of the transformed data $h(y_1)$, …, $h(y_n)$ is virtually identical and therefore is not reported.
We conclude that perturbed independence distributions, by modeling the nonlinear association between random variables, might help in finding the nonlinear transformations that are maximally correlated with each other. A positive Pearson’s correlation much lower than Spearman’s rho and Kendall’s tau hints at the presence of nonlinear association, whose analytical form might be estimated by looking for the maximally correlated nonlinear transformations of the random variables. This approach is particularly appropriate for the single index regression model $Y = g(X) + \varepsilon$, where the response variable Y is the sum of a smooth function of the predictor X and the error term $\varepsilon$. When g is monotone, its analytical form might be estimated by looking for the transformation of X that is maximally correlated with Y.
As remarked in the Introduction, kernel canonical correlation analysis (KCCA) and distance canonical correlation analysis (DCCA) are the two most popular generalizations of CCA aimed at dealing with nonlinear dependencies. A formal description of KCCA, based on Hilbert spaces and their inner products, might be found in the seminal papers by [32,33]. For most practical purposes, KCCA might be defined as the statistical method searching for linear projections of nonlinear functions of a random vector that are maximally correlated with linear projections of nonlinear functions of another random vector. Let G be a class of p-dimensional random vectors whose i-th components are nonlinear functions of the p-dimensional random vector $\mathbf{x}$. Similarly, let H be a class of q-dimensional random vectors whose i-th components are nonlinear functions of the q-dimensional random vector $\mathbf{y}$. Then KCCA looks for the random vectors $\mathbf{g} \in G$, $\mathbf{h} \in H$ and for the real vectors $\mathbf{a} \in \mathbb{R}^p$, $\mathbf{b} \in \mathbb{R}^q$ such that $\mathbf{a}^\top \mathbf{g}$ and $\mathbf{b}^\top \mathbf{h}$ are maximally correlated with each other.
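In practice, KCCA is computed from centered kernel matrices. The following numpy sketch uses a Gaussian kernel and one common ridge-regularized formulation; the kernel width gamma and the regularization kappa are our own illustrative choices, not prescriptions from [32,33]:

```python
import numpy as np

def rbf(X, gamma=0.5):
    """Gaussian kernel matrix of the rows of X."""
    sq = (X ** 2).sum(1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def kcca_first(X, Y, gamma=0.5, kappa=0.1):
    """First kernel canonical correlation and the scores of both views."""
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kx, Ky = H @ rbf(X, gamma) @ H, H @ rbf(Y, gamma) @ H
    A, B = Kx + kappa * np.eye(n), Ky + kappa * np.eye(n)  # ridge terms
    C = np.linalg.solve(A, Kx) @ np.linalg.solve(B, Ky).T
    U, s, Vt = np.linalg.svd(C)          # top singular triplet
    alpha, beta = np.linalg.solve(A, U[:, 0]), np.linalg.solve(B, Vt[0])
    return s[0], Kx @ alpha, Ky @ beta
```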
In a nonparametric framework, the choice of the nonlinear functions may not be straightforward. On the other hand, in the perturbed independence framework, the theoretical and empirical results in this section suggest setting them equal to the conditional expectations $\mathbb{E}(\mathbf{y} \mid \mathbf{x})$ and $\mathbb{E}(\mathbf{x} \mid \mathbf{y})$. In particular, for the perturbed independence model $2\phi(x)\phi(y)\Phi(\lambda x y)$, the suggested nonlinear functions of X and Y are proportional to
$$g(x) = \frac{\lambda x}{\sqrt{1 + \lambda^2 x^2}} \quad \text{and} \quad h(y) = \frac{\lambda y}{\sqrt{1 + \lambda^2 y^2}}.$$
DCCA looks for two projections whose joint distribution differs the most from the product of their marginal distributions, where the difference is measured by distance correlation. The distance correlation between the random variables X and Y with respect to the weight function w is
$$\mathcal{R}(X, Y; w) = \frac{\mathcal{V}(X, Y; w)}{\sqrt{\mathcal{V}(X, X; w)\, \mathcal{V}(Y, Y; w)}},$$
where $\mathcal{V}(X, Y; w)$ is the distance covariance between X and Y with respect to w, as defined in the previous section. Hence the first distance canonical correlation between the p-dimensional random vector $\mathbf{x}$ and the q-dimensional random vector $\mathbf{y}$ is
$$\max_{\mathbf{a} \in A,\ \mathbf{b} \in B} \mathcal{R}\!\left(\mathbf{a}^\top \mathbf{x}, \mathbf{b}^\top \mathbf{y}; w\right).$$
For other distance canonical correlations, the distance canonical pairs and the distance canonical transformations are defined similarly to their CCA analogues.
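Lacking a closed form, the first distance canonical pair can be approximated by searching over projection directions. A crude random-search sketch (ours, illustrative only), reusing the distance_corr helper sketched in Section 2:

```python
import numpy as np

def dcca_first(X, Y, n_trials=500, seed=0):
    """Approximate the first distance canonical pair by random search
    over unit projection vectors."""
    rng = np.random.default_rng(seed)
    best = (-1.0, None, None)
    for _ in range(n_trials):
        a = rng.standard_normal(X.shape[1]); a /= np.linalg.norm(a)
        b = rng.standard_normal(Y.shape[1]); b /= np.linalg.norm(b)
        r = distance_corr(X @ a, Y @ b)  # helper defined in Section 2
        if r > best[0]:
            best = (r, a, b)
    return best
```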
A natural question to ask is whether CCA and DCCA lead to identical projections, under the assumption of perturbed independence. At present, we are unable to either prove or disprove this statement, which we conjecture to be true, under the assumptions of Theorem 6: increasing perturbing functions that increase more steeply are more likely to imply both higher Pearson and distance correlations. We plan to investigate this conjecture by means of both theoretical arguments and simulation studies.