Article

Exact Expressions for Kullback–Leibler Divergence for Multivariate and Matrix-Variate Distributions

Victor Nawa 1 and Saralees Nadarajah 2
1 Department of Mathematics and Statistics, University of Zambia, Lusaka 10101, Zambia
2 Department of Mathematics, University of Manchester, Manchester M13 9PL, UK
* Author to whom correspondence should be addressed.
Entropy 2024, 26(8), 663; https://doi.org/10.3390/e26080663
Submission received: 9 July 2024 / Revised: 2 August 2024 / Accepted: 3 August 2024 / Published: 4 August 2024
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

The Kullback–Leibler divergence is a measure of the divergence between two probability distributions, often used in statistics and information theory. However, exact expressions for it are not known for multivariate or matrix-variate distributions apart from a few cases. In this paper, exact expressions for the Kullback–Leibler divergence are derived for over twenty multivariate and matrix-variate distributions. The expressions involve various special functions.

1. Introduction

The Kullback–Leibler divergence (KLD) due to [1] is a fundamental concept in information theory and statistics used to measure the divergence between two probability distributions. It quantifies how one probability distribution diverges from a second, reference probability distribution. Specifically, it calculates the expected extra amount of information required to represent data sampled from one distribution using a code optimized for another distribution. The KLD is asymmetric and not a true metric as it does not satisfy the triangle inequality. It is widely employed in various fields including machine learning, where it serves as a key component in tasks such as model comparison, optimization, and generative modeling, providing a measure of dissimilarity or discrepancy between probability distributions [2].
Suppose $X$ is a continuous vector-variate random variable or a continuous matrix-variate random variable having one of two probability density functions $f_i(\cdot; \theta_i)$, $i = 1, 2$, parameterized by $\theta_i$, $i = 1, 2$. The KLD between $f_1(\cdot; \theta_1)$ and $f_2(\cdot; \theta_2)$ is defined by
$$
\mathrm{KLD} = E \left[ \log \frac{f_1\left(X; \theta_1\right)}{f_2\left(X; \theta_2\right)} \right], \qquad (1)
$$
where the expectation is with respect to $f_1(\cdot; \theta_1)$.
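For orientation, (1) can always be approximated by Monte Carlo when both densities can be evaluated: sample from $f_1$ and average the log-density ratio. The short sketch below is an illustration only (not part of the paper's derivations); it does this for two univariate normal distributions, for which the exact KLD $\log(\sigma_2/\sigma_1) + [\sigma_1^2 + (\mu_1-\mu_2)^2]/(2\sigma_2^2) - 1/2$ is known, with arbitrary parameter values.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0   # illustrative parameter values

# Monte Carlo estimate of (1): expectation of log(f1/f2) under f1.
x = rng.normal(mu1, s1, size=1_000_000)
kld_mc = np.mean(norm.logpdf(x, mu1, s1) - norm.logpdf(x, mu2, s2))

# Exact KLD between two univariate normals, for comparison.
kld_exact = np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

print(kld_mc, kld_exact)  # the two values agree to Monte Carlo error
```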
Because of the increasing applications of the KLD, it is useful to have exact expressions for (1). Apart from the multivariate normal distribution, few such expressions have been derived for multivariate or matrix-variate distributions: the KLD for the multivariate generalized Gaussian distribution was derived only in 2019 [3], that for the multivariate Cauchy distribution only in 2022 [4], and that for the multivariate t distribution only in 2023 [5].
The aim of this paper is to derive exact expressions for (1) for over twenty multivariate and matrix-variate distributions. The exact expressions for multivariate distributions are presented in Section 2, and those for matrix-variate distributions in Section 3. The derivations of all of the expressions, including a technical lemma needed for them, are given in Section 4. The distributions considered in this paper are continuous; we do not consider discrete distributions or mixtures.
The functions and parameters used in this paper are all real-valued. The calculations involve several real-valued special functions, listed in Appendix A.

2. Exact Expressions for Multivariate Distributions

In this section, we state the exact expressions for (1) for the Dirichlet, multivariate generalized Gaussian, inverted Dirichlet, multivariate Gauss hypergeometric, multivariate Kotz type, multivariate logistic (in the versions of [6] and [7]), multivariate normal of [8], multivariate Pearson type II, multivariate Selberg beta, multivariate weighted exponential and von Mises distributions.
A closed form for (1) for the multivariate generalized Gaussian distribution was derived by [3], but it involved a special function defined as a $(p-1)$-fold infinite sum. The expression we give in Section 2.2 is much simpler in that it involves a single infinite sum. A closed form for (1) for the Dirichlet distribution is available in [9] and at https://statproofbook.github.io/P/dir-kl.html (accessed on 1 July 2024).

2.1. Dirichlet Distribution

Consider the joint probability density functions
$$
f_1(x) = \frac{\prod_{i=1}^{K} x_i^{a_i - 1}}{B\left(a_1, \ldots, a_{K-1}; a_K\right)}
$$
and
$$
f_2(x) = \frac{\prod_{i=1}^{K} x_i^{b_i - 1}}{B\left(b_1, \ldots, b_{K-1}; b_K\right)}
$$
for $K \geq 2$, $a_1 > 0, \ldots, a_K > 0$, $b_1 > 0, \ldots, b_K > 0$, $0 \leq x_1 \leq 1, \ldots, 0 \leq x_K \leq 1$ and $x_1 + \cdots + x_K = 1$. The corresponding KLD is
$$
\mathrm{KLD} = \log \frac{B\left(b_1, \ldots, b_{K-1}; b_K\right)}{B\left(a_1, \ldots, a_{K-1}; a_K\right)} + \sum_{i=1}^{K} \left(a_i - b_i\right) \left[ \psi\left(a_i\right) - \psi\left(a_1 + \cdots + a_K\right) \right].
$$
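As a numerical illustration (not part of the paper), the closed form above can be evaluated with standard special-function routines and checked against a Monte Carlo estimate of (1); here $B(a_1, \ldots, a_{K-1}; a_K) = \prod_i \Gamma(a_i)/\Gamma(\sum_i a_i)$, and the parameter values in the sketch are arbitrary.

```python
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kld(a, b):
    """Closed form stated above, with B(a_1,...,a_{K-1}; a_K) = prod Gamma(a_i) / Gamma(sum a_i)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    log_B = lambda c: np.sum(gammaln(c)) - gammaln(np.sum(c))
    return log_B(b) - log_B(a) + np.sum((a - b) * (digamma(a) - digamma(np.sum(a))))

def dirichlet_logpdf(x, a):
    """Log density of the Dirichlet distribution; x has shape (n, K)."""
    return gammaln(a.sum()) - gammaln(a).sum() + np.sum((a - 1) * np.log(x), axis=1)

a = np.array([2.0, 3.0, 4.0])   # illustrative parameter values
b = np.array([1.5, 1.0, 2.5])

# Monte Carlo check of (1).
rng = np.random.default_rng(1)
x = rng.dirichlet(a, size=200_000)
kld_mc = np.mean(dirichlet_logpdf(x, a) - dirichlet_logpdf(x, b))
print(dirichlet_kld(a, b), kld_mc)   # the two values should agree closely
```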

2.2. Multivariate Generalized Gaussian Distribution ([10], p. 215)

Consider the joint probability density functions
$$
f_1\left(x_1, \ldots, x_p\right) = \frac{\alpha \Gamma\left(\frac{p}{2}\right)}{2 \pi^{p/2} \Gamma\left(\frac{p}{\alpha}\right)} \left| V_1 \right|^{-1/2} \exp\left[ - \left( x^T V_1^{-1} x \right)^{\alpha/2} \right]
$$
and
$$
f_2\left(x_1, \ldots, x_p\right) = \frac{\beta \Gamma\left(\frac{p}{2}\right)}{2 \pi^{p/2} \Gamma\left(\frac{p}{\beta}\right)} \left| V_2 \right|^{-1/2} \exp\left[ - \left( x^T V_2^{-1} x \right)^{\beta/2} \right]
$$
for $-\infty < x_1 < \infty, \ldots, -\infty < x_p < \infty$, $\alpha > 0$, $\beta > 0$ and $V_1, V_2$ positive definite symmetric matrices. The corresponding KLD is
K L D = log α Γ p β β Γ p α + 1 2 log V 2 V 1 p α + Γ p 2 Γ p + β 2 π p 2 Γ p α r 1 = 0 r 2 = 0 r 1 r p 1 = 0 r p 2 β 2 r 1 × r 1 r 2 r p 2 r p 1 λ 1 β 2 r 1 j = 1 p 1 λ j + 1 λ j r j r j + 1 B r j + p j 2 , 1 2
provided that the infinite sum converges.

2.3. Inverted Dirichlet Distribution

Consider the joint probability density functions
$$
f_1(x) = \frac{\Gamma\left(a_1 + \cdots + a_{K+1}\right)}{\Gamma\left(a_1\right) \cdots \Gamma\left(a_{K+1}\right)} \left(1 + \sum_{i=1}^{K} x_i\right)^{-a_1 - \cdots - a_{K+1}} \prod_{i=1}^{K} x_i^{a_i - 1}
$$
and
$$
f_2(x) = \frac{\Gamma\left(b_1 + \cdots + b_{K+1}\right)}{\Gamma\left(b_1\right) \cdots \Gamma\left(b_{K+1}\right)} \left(1 + \sum_{i=1}^{K} x_i\right)^{-b_1 - \cdots - b_{K+1}} \prod_{i=1}^{K} x_i^{b_i - 1}
$$
for $K \geq 2$, $a_1 > 0, \ldots, a_{K+1} > 0$, $b_1 > 0, \ldots, b_{K+1} > 0$ and $x_1 > 0, \ldots, x_K > 0$. The corresponding KLD is
$$
\mathrm{KLD} = \log \frac{\Gamma\left(a_1 + \cdots + a_{K+1}\right) \Gamma\left(b_1\right) \cdots \Gamma\left(b_{K+1}\right)}{\Gamma\left(b_1 + \cdots + b_{K+1}\right) \Gamma\left(a_1\right) \cdots \Gamma\left(a_{K+1}\right)} + \sum_{i=1}^{K} \left(a_i - b_i\right) \left[ \psi\left(a_i\right) - \psi\left(a_{K+1}\right) \right] + \left(b_1 + \cdots + b_{K+1} - a_1 - \cdots - a_{K+1}\right) \left[ \psi\left(a_1 + \cdots + a_{K+1}\right) - \psi\left(a_{K+1}\right) \right].
$$
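A quick numerical check, given for illustration only (not part of the paper): the inverted Dirichlet can be simulated as $X_i = G_i / G_{K+1}$ with independent $G_i \sim \mathrm{Gamma}(a_i)$, so the closed form above can be compared against a sample average of the log-density ratio. Parameter values are arbitrary.

```python
import numpy as np
from scipy.special import gammaln, digamma

def inv_dirichlet_logpdf(x, a):
    """Log density of the inverted Dirichlet; a has length K+1, x has shape (n, K)."""
    K = x.shape[1]
    return (gammaln(a.sum()) - gammaln(a).sum()
            - a.sum() * np.log1p(x.sum(axis=1))
            + np.sum((a[:K] - 1) * np.log(x), axis=1))

def inv_dirichlet_kld(a, b):
    """Closed form stated above."""
    K = len(a) - 1
    term0 = gammaln(a.sum()) + gammaln(b).sum() - gammaln(b.sum()) - gammaln(a).sum()
    term1 = np.sum((a[:K] - b[:K]) * (digamma(a[:K]) - digamma(a[K])))
    term2 = (b.sum() - a.sum()) * (digamma(a.sum()) - digamma(a[K]))
    return term0 + term1 + term2

a = np.array([2.0, 3.0, 4.0])   # a_1, a_2, a_{K+1} with K = 2 (illustrative values)
b = np.array([1.5, 2.0, 3.0])

rng = np.random.default_rng(2)
g = rng.gamma(a, size=(100_000, 3))
x = g[:, :2] / g[:, 2:3]        # X_i = G_i / G_{K+1}
kld_mc = np.mean(inv_dirichlet_logpdf(x, a) - inv_dirichlet_logpdf(x, b))
print(inv_dirichlet_kld(a, b), kld_mc)
```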

2.4. Multivariate Gauss Hypergeometric Distribution [11]

Consider the joint probability density functions
$$
f_1(x) = C\left(a_1, \ldots, a_K, b, c\right) \left(1 - \sum_{i=1}^{K} x_i\right)^{b-1} \prod_{i=1}^{K} x_i^{a_i - 1} \left(1 + \sum_{i=1}^{K} x_i\right)^{-c}
$$
and
$$
f_2(x) = C\left(d_1, \ldots, d_K, e, f\right) \left(1 - \sum_{i=1}^{K} x_i\right)^{e-1} \prod_{i=1}^{K} x_i^{d_i - 1} \left(1 + \sum_{i=1}^{K} x_i\right)^{-f}
$$
for $K \geq 2$, $a_1 > 0, \ldots, a_K > 0$, $b > 0$, $-\infty < c < \infty$, $d_1 > 0, \ldots, d_K > 0$, $e > 0$, $-\infty < f < \infty$, $0 \leq x_1 \leq 1, \ldots, 0 \leq x_K \leq 1$ and $x_1 + \cdots + x_K \leq 1$. The corresponding KLD is
$$
\mathrm{KLD} = \log \frac{C\left(a_1, \ldots, a_K, b, c\right)}{C\left(d_1, \ldots, d_K, e, f\right)} + \sum_{i=1}^{K} \left(a_i - d_i\right) \left. \frac{\partial}{\partial \alpha} \frac{C\left(a_1, \ldots, a_i, \ldots, a_K, b, c\right)}{C\left(a_1, \ldots, a_i + \alpha, \ldots, a_K, b, c\right)} \right|_{\alpha = 0} + \left(b - e\right) \left. \frac{\partial}{\partial \alpha} \frac{C\left(a_1, \ldots, a_K, b, c\right)}{C\left(a_1, \ldots, a_K, b + \alpha, c\right)} \right|_{\alpha = 0} - \left(c - f\right) \left. \frac{\partial}{\partial \alpha} \frac{C\left(a_1, \ldots, a_K, b, c\right)}{C\left(a_1, \ldots, a_K, b, c - \alpha\right)} \right|_{\alpha = 0}.
$$
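Several of the expressions in this paper, including the one above, use the device of writing an expectation of a logarithm as a derivative of the normalizing constant with respect to an auxiliary exponent, evaluated at zero. A minimal scalar sketch (an illustration only, using the ordinary beta function in place of the constant $C$) is given below: for a $\mathrm{Beta}(a, b)$ random variable, $E[\log X] = \partial/\partial\alpha \left[B(a+\alpha, b)/B(a, b)\right]|_{\alpha=0} = \psi(a) - \psi(a+b)$, and the derivative can be taken numerically.

```python
import numpy as np
from scipy.special import betaln, digamma

a, b, h = 2.7, 4.1, 1e-5   # illustrative values; h is the finite-difference step

# Numerical derivative of alpha -> B(a + alpha, b)/B(a, b) at alpha = 0.
ratio = lambda alpha: np.exp(betaln(a + alpha, b) - betaln(a, b))
e_log_x_deriv = (ratio(h) - ratio(-h)) / (2 * h)

# Same quantity via digamma functions.
e_log_x_exact = digamma(a) - digamma(a + b)

print(e_log_x_deriv, e_log_x_exact)  # agree to roughly 1e-10
```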

2.5. Multivariate Kotz Type Distribution [12]

Consider the joint probability density functions
f 1 x 1 , , x p = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a Σ 1 1 2 x T Σ 1 1 x N 1 exp q x T Σ 1 1 x a
and
f 2 x 1 , , x p = b Γ p 2 s 2 M + p 2 2 b π p 2 Γ 2 M + p 2 2 b Σ 2 1 2 x T Σ 2 1 x M 1 exp s x T Σ 2 1 x b
for < x 1 < , , < x p < , a > 0 , q > 0 , N > 1 p 2 , b > 0 , s > 0 , M > 1 p 2 and Σ 1 , Σ 2 positive definite symmetric matrices. The corresponding KLD is
K L D = log a q 2 N + p 2 2 a Γ 2 M + p 2 2 b b s 2 M + p 2 2 b Γ 2 N + p 2 2 a + 1 2 log Σ 2 Σ 1 + N 1 a ψ 2 N + p 2 2 a log q 2 N + p 2 2 a ( M 1 ) Γ p 2 a π p 2 ψ p + 2 N 2 2 a log q j = 1 p 1 B p j 2 , 1 2 ( M 1 ) Γ p 2 π p 2 log λ 1 j = 1 p 1 B p j 2 , 1 2 + ( M 1 ) Γ p 2 π p 2 k = 1 ( 1 ) k k i 1 + + i p 1 = k k i 1 , , i p 1 × j = 1 p 1 λ j + 1 λ j λ 1 i j B = j p 1 i + p j 2 , 1 2 + s Γ p 2 Γ 2 N + p + 2 b 2 2 a π p 2 q b a Γ 2 N + p 2 2 a r 1 = 0 r 2 = 0 r 1 r p 1 = 0 r p 2 b r 1 r 1 r 2 r p 2 r p 1 λ 1 b r 1 × j = 1 p 1 λ j + 1 λ j r j r j + 1 B r j + p j 2 , 1 2
provided that the infinite sum converges.

2.6. Multivariate Logistic Distribution [6]

Consider the joint probability density functions
f 1 x 1 , , x p = p ! a 1 a p exp a 1 x 1 a p x p 1 + exp a 1 x 1 + + exp a p x p p + 1
and
f 2 x 1 , , x p = p ! b 1 b p exp b 1 x 1 b p x p 1 + exp b 1 x 1 + + exp b p x p p + 1
for < x 1 < , , < x p < , a 1 > 0 , , a p > 0 and b 1 > 0 , , b p > 0 . The corresponding KLD is
K L D = log a 1 a p b 1 b p ( p + 1 ) k = 1 ( 1 ) k k i 1 + + i p = k k i 1 , , i p j = 1 p Γ a j + i j b j a j Γ 1 j = 1 p i j b j a j ( p + 1 ) Γ ( p + 1 ) Γ ( p + 1 ) Γ ( 1 ) p !
provided that j = 1 p i j b j a j < 1 and the infinite series converges.

2.7. Multivariate Logistic Distribution [7]

Consider the joint probability density functions
f 1 x 1 , , x p = ( b ) p a 1 a p exp a 1 x 1 a p x p 1 + exp a 1 x 1 + + exp a p x p b + p
and
f 2 x 1 , , x p = ( d ) p c 1 c p exp c 1 x 1 c p x p 1 + exp c 1 x 1 + + exp c p x p d + p
for < x 1 < , , < x p < , a 1 > 0 , , a p > 0 , c 1 > 0 , , c p > 0 , b > 0 and d > 0 . The corresponding KLD is
K L D = log ( b ) p a 1 a p ( d ) p b 1 b p d + p Γ ( b ) k = 1 ( 1 ) k k i 1 + + i p = k k i 1 , , i p j = 1 p Γ a j + i j c j a j Γ b j = 1 p i j c j a j ( b + p ) Γ ( b ) Γ ( b + p ) Γ ( b + p ) Γ ( b ) Γ ( b ) Γ ( b + p )
provided that j = 1 p i j c j a j < b and the infinite series converges.

2.8. Sarabia [8]’s Multivariate Normal Distribution

Consider the joint probability density functions
f 1 x 1 , , x p = a 1 a p β p c , a 1 , , a p ( 2 π ) p 2 exp 1 2 i = 1 p a i x i 2 + c i = 1 p a i x i 2
and
f 2 x 1 , , x p = b 1 b p β p d , b 1 , , b p ( 2 π ) p 2 exp 1 2 i = 1 p b i x i 2 + d i = 1 p b i x i 2
for < x 1 < , , < x p < , a 1 > 0 , , a p > 0 , b 1 > 0 , , b p > 0 , c > 0 and d > 0 , where β p c , a 1 , , a p and β p d , b 1 , , b p denote normalizing constants. The corresponding KLD is
K L D = log a 1 a p β p c , a 1 , , a p b 1 b p β p d , b 1 , , b p + 1 2 c β p c , a 1 , , a p β p c , a 1 , , a p i = 1 p b i a i 1 + d b 1 b p a 1 a p c β p c , a 1 , , a p β p c , a 1 , , a p ,
where β p c , a 1 , , a p = c β p c , a 1 , , a p .

2.9. Multivariate Pearson Type II Distribution

Consider the joint probability density functions
$$
f_1(x) = \frac{\Gamma\left(\frac{K}{2}\right) \Gamma\left(\frac{K}{2} + a + b - 1\right)}{\pi^{K/2} \Gamma(a) \Gamma\left(\frac{K}{2} + b - 1\right)} \left(\sum_{i=1}^{K} x_i^2\right)^{b-1} \left(1 - \sum_{i=1}^{K} x_i^2\right)^{a-1}
$$
and
$$
f_2(x) = \frac{\Gamma\left(\frac{K}{2}\right) \Gamma\left(\frac{K}{2} + c + d - 1\right)}{\pi^{K/2} \Gamma(c) \Gamma\left(\frac{K}{2} + d - 1\right)} \left(\sum_{i=1}^{K} x_i^2\right)^{d-1} \left(1 - \sum_{i=1}^{K} x_i^2\right)^{c-1}
$$
for $K \geq 2$, $a > 0$, $b > 0$, $c > 0$, $d > 0$ and $0 < x_1^2 + \cdots + x_K^2 < 1$. The corresponding KLD is
$$
\mathrm{KLD} = \log \frac{\Gamma\left(\frac{K}{2} + a + b - 1\right) \Gamma(c) \Gamma\left(\frac{K}{2} + d - 1\right)}{\Gamma(a) \Gamma\left(\frac{K}{2} + b - 1\right) \Gamma\left(\frac{K}{2} + c + d - 1\right)} + (b - d) \left[ \psi\left(\frac{K}{2} + b - 1\right) - \psi\left(\frac{K}{2} + a + b - 1\right) \right] + (a - c) \left[ \psi(a) - \psi\left(\frac{K}{2} + a + b - 1\right) \right].
$$
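As an illustration (not part of the paper), the expression above involves only gamma and digamma functions; it can be evaluated directly and, if desired, checked by sampling $R^2 = \sum_i X_i^2 \sim \mathrm{Beta}(K/2 + b - 1, a)$, since the log-density ratio depends on $x$ only through $\sum_i x_i^2$. Parameter values below are arbitrary.

```python
import numpy as np
from scipy.special import gammaln, digamma

def pearson2_kld(K, a, b, c, d):
    """Closed form above for the multivariate Pearson type II KLD."""
    h = K / 2.0
    log_norm = (gammaln(h + a + b - 1) + gammaln(c) + gammaln(h + d - 1)
                - gammaln(a) - gammaln(h + b - 1) - gammaln(h + c + d - 1))
    return (log_norm
            + (b - d) * (digamma(h + b - 1) - digamma(h + a + b - 1))
            + (a - c) * (digamma(a) - digamma(h + a + b - 1)))

K, a, b, c, d = 3, 2.0, 1.5, 3.0, 2.5   # illustrative values

# Monte Carlo check: the log-density ratio is
# log(norm1/norm2) + (b - d) log(R^2) + (a - c) log(1 - R^2),
# with R^2 ~ Beta(K/2 + b - 1, a) under f1.
rng = np.random.default_rng(3)
r2 = rng.beta(K / 2 + b - 1, a, size=500_000)
log_norm = (gammaln(K/2 + a + b - 1) + gammaln(c) + gammaln(K/2 + d - 1)
            - gammaln(a) - gammaln(K/2 + b - 1) - gammaln(K/2 + c + d - 1))
kld_mc = log_norm + np.mean((b - d) * np.log(r2) + (a - c) * np.log1p(-r2))
print(pearson2_kld(K, a, b, c, d), kld_mc)
```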

2.10. Multivariate Selberg Beta Distribution [13]

Consider the joint probability density functions
f 1 x = C a , b , c 1 i < j p x i x j 2 c i = 1 p x i a 1 1 x i b 1
and
f 2 x = C d , e , f 1 i < j p x i x j 2 f i = 1 p x i d 1 1 x i e 1
for a > 0 , b > 0 , c > 0 , d > 0 , e > 0 , f > 0 and 0 < x 1 < 1 , , 0 < x p < 1 . The corresponding KLD is
K L D = log C d , e , f C a , b , c + ( a d ) α C ( a , b , c ) C ( a + α , b , c ) α = 0 + ( b e ) α C ( a , b , c ) C ( a , b + α , c ) α = 0 + 2 ( c f ) α C ( a , b , c ) C ( a , b , c + α ) α = 0 .

2.11. Multivariate Weighted Exponential Distribution [14]

Consider the joint probability density functions
f 1 x 1 , , x p = i = 1 p a i i = 1 p a i a p + 1 1 exp a p + 1 min x 1 , , x p i = 1 p exp a i x i
and
f 2 x 1 , , x p = i = 1 p b i i = 1 p b i b p + 1 1 exp b p + 1 min x 1 , , x p i = 1 p exp b i x i
for x 1 > 0 , , x p > 0 , a 1 > 0 , , a p + 1 > 0 and b 1 > 0 , , b p + 1 > 0 . The corresponding KLD is
K L D = log b p + 1 i = 1 p a i i = 1 p a i a p + 1 i = 1 p b i i = 1 p b i + i = 1 p b i a i 1 a i + 1 a p + 1 k = 1 a 1 + + a p a 1 + + a p + a p + 1 k a 1 + + a p + k a p + 1 a 1 + + a p + ( k + 1 ) a p + 1 + k = 1 a 1 + + a p a 1 + + a p + a p + 1 k a 1 + + a p + k b p + 1 a 1 + + a p + a p + 1 + k b p + 1 ,
which follows from properties stated in [14] provided that the infinite series converge.

2.12. Von Mises Distribution

Consider the joint probability density functions
$$
f_1(x) = \frac{\kappa_1^{p/2 - 1} \exp\left(\kappa_1 \mu_1^T x\right)}{(2\pi)^{p/2} I_{p/2 - 1}\left(\kappa_1\right)}
$$
and
$$
f_2(x) = \frac{\kappa_2^{p/2 - 1} \exp\left(\kappa_2 \mu_2^T x\right)}{(2\pi)^{p/2} I_{p/2 - 1}\left(\kappa_2\right)}
$$
for $\kappa_1 > 0$, $\kappa_2 > 0$, $\mu_1^T \mu_1 = 1$, $\mu_2^T \mu_2 = 1$ and $x^T x = 1$. The corresponding KLD is
$$
\mathrm{KLD} = \log \left[ \frac{\kappa_1^{p/2 - 1}}{\kappa_2^{p/2 - 1}} \frac{I_{p/2 - 1}\left(\kappa_2\right)}{I_{p/2 - 1}\left(\kappa_1\right)} \right] + \kappa_1 - \kappa_2 \mu_2^T \mu_1.
$$

3. Exact Expressions for Matrix-Variate Distributions

In this section, we state exact expressions for (1) for matrix-variate beta, matrix-variate Dirichlet, matrix-variate gamma, matrix-variate Gauss hypergeometric, matrix-variate inverse beta, matrix-variate inverse gamma, matrix-variate Kummer beta, matrix-variate Kummer gamma, matrix-variate normal and matrix-variate two-sided power distributions.

3.1. Matrix-Variate Beta Distribution [15]

Consider the joint probability density functions
f 1 x = Ω x a p + 1 2 x b p + 1 2 Ω a + b B p ( a , b )
and
f 2 x = Ω x c p + 1 2 x d p + 1 2 Ω c + d B p ( c , d )
for a > p 1 2 , b > p 1 2 , c > p 1 2 , d > p 1 2 and Ω , x , Ω x being p × p positive definite matrices. The corresponding KLD is
K L D = log Ω c + d B p ( c , d ) Ω a + b B p ( a , b ) + ( a c ) α Ω α B p ( α + a , b ) B p ( a , b ) α = 0 + ( b d ) α Ω α B p ( a , α + b ) B p ( a , b ) α = 0 .

3.2. Matrix-Variate Dirichlet Distribution

Consider the joint probability density functions
f 1 x 1 , , x n = 1 B p a 1 , , a n ; a n + 1 x 1 a 1 p x n a n p I p i = 1 n x i a n + 1 p
and
f 2 x 1 , , x n = 1 B p b 1 , , b n ; b n + 1 x 1 b 1 p x n b n p I p i = 1 n x i b n + 1 p
for a 1 > p 1 2 , , a n + 1 > p 1 2 , b 1 > p 1 2 , , b n + 1 > p 1 2 and x 1 , , x n , I p x 1 x n being p × p positive definite matrices. The corresponding KLD is
K L D = log B p b 1 , , b n ; b n + 1 B p a 1 , , a n ; a n + 1 + a n + 1 b n + 1 α B p a 1 , , a n ; a n + 1 + α B p a 1 , , a n ; a n + 1 α = 0 + i = 1 n a i b i α B p a 1 , , a i + α , , a n ; a n + 1 B p a 1 , , a i , , a n ; a n + 1 α = 0 .

3.3. Matrix-Variate Gamma Distribution

Consider the joint probability density functions
f 1 x = Σ 1 a b a p Γ p ( a ) x a p + 1 2 exp tr 1 b Σ 1 1 x
and
f 2 x = Σ 2 c d c p Γ p ( c ) x c p + 1 2 exp tr 1 d Σ 2 1 x
for a > p 1 2 , b > 0 , c > p 1 2 , d > 0 and x , Σ 1 , Σ 2 being p × p positive definite symmetric matrices. The corresponding KLD is
K L D = log d p c Γ p ( c ) b p a Γ p ( a ) Σ 2 c Σ 1 a + a c Γ p ( a ) α Σ 1 α b p α Γ p ( a + α ) α = 0 + 2 a p b tr 2 a d Σ 2 1 Σ 1 .

3.4. Matrix-Variate Gauss Hypergeometric Distribution [16]

Consider the joint probability density functions
f 1 x = x a p + 1 2 I p x b p + 1 2 B p ( a , b )   2 F 1 a , c ; a + b ; B I p + B x c
and
f 2 x = x d p + 1 2 I p x e p + 1 2 B p ( d , e )   2 F 1 d , f ; d + e ; B I p + B x f
for a > p 1 2 , b > p 1 2 , 0 c < , d > p 1 2 , e > p 1 2 , 0 f < and x , I p x , B , I p + B being p × p positive definite matrices, where Γ p ( c ) and Γ p ( f ) are assumed to exist. The corresponding KLD is
K L D = log B p ( d , e )   2 F 1 d , f ; d + e ; B B p ( a , b )   2 F 1 a , c ; a + b ; B + ( a d ) α B p ( a + α , b )   2 F 1 a + α , c ; a + b + α ; B B p ( a , b )   2 F 1 a , c ; a + b ; B α = 0 + ( b e ) α B p ( a , b + α )   2 F 1 a , c ; a + b + α ; B B p ( a , b )   2 F 1 a , c ; a + b ; B α = 0 + ( f c ) α   2 F 1 a , c α ; a + b ; B   2 F 1 a , c ; a + b ; B α = 0 .

3.5. Matrix-Variate Inverse Beta Distribution

Consider the joint probability density functions
f 1 x = Ω + x a b x b p + 1 2 Ω a B p ( a , b )
and
f 2 x = Ω + x c d x d p + 1 2 Ω c B p ( c , d )
for a > p 1 2 , b > p 1 2 , c > p 1 2 , d > p 1 2 and x , Ω , Ω + x being p × p positive definite matrices. The corresponding KLD is
K L D = log Ω a c B p ( c , d ) B p ( a , b ) + ( c + d a b ) α Ω α B p ( a α , b ) B p ( a , b ) α = 0 + ( b d ) α Ω α B p ( a α , α + b ) B p ( a , b ) α = 0 .

3.6. Matrix-Variate Inverse Gamma Distribution

Consider the joint probability density functions
f 1 x = Σ 1 a b a p Γ p ( a ) x a p + 1 2 exp tr 1 b Σ 1 x 1
and
f 2 x = Σ 2 c d c p Γ p ( c ) x c p + 1 2 exp tr 1 d Σ 2 x 1
for a > p 1 2 , b > 0 , c > p 1 2 , d > 0 and x , Σ 1 , Σ 2 being p × p positive definite symmetric matrices. The corresponding KLD is
K L D = log d p c Γ p ( c ) b p a Γ p ( a ) Σ 1 a Σ 2 c + c a Γ p ( a ) α Σ 1 α Γ p ( a α ) b p α α = 0 + 2 a d tr Σ 2 Σ 1 1 2 a p b .

3.7. Matrix-Variate Kummer Beta Distribution [17]

Consider the joint probability density functions
f 1 x = x a p + 1 2 I p x b p + 1 2 exp tr B 1 x B p ( a , b )   1 F 1 a , a + b ; B 1
and
f 2 x = x c p + 1 2 I p x d p + 1 2 exp tr B 2 x B p ( c , d )   1 F 1 c ; c + d ; B 2
for a > p 1 2 , b > p 1 2 , c > p 1 2 , d > p 1 2 and x , I p x , B 1 , B 2 being p × p positive definite matrices. The corresponding KLD is
K L D = log B p ( c , d )   1 F 1 c ; c + d ; B 2 B p ( a , b )   1 F 1 a , a + b ; B 1 + ( a c ) α B p ( a + α , b )   1 F 1 a + α ; a + b + α ; B 1 B p ( a , b )   1 F 1 a ; a + b ; B 1 α = 0 + ( b d ) α B p ( a , b + α )   1 F 1 a ; a + b + α ; B 1 B p ( a , b )   1 F 1 a ; a + b ; B 1 α = 0 + tr B 2 B 1 z   1 F 1 a ; a + b ; z B 1   1 F 1 a ; a + b ; B 1 z = 0 .

3.8. Matrix-Variate Kummer Gamma Distribution [18]

Consider the joint probability density functions
f 1 x = x a p + 1 2 I p + x b exp tr B 1 x Γ p ( a ) Ψ 1 a ; a b + p + 1 2 ; B 1
and
f 2 x = x c p + 1 2 I p + x d exp tr B 2 x Γ p ( c ) Ψ 1 c ; c d + p + 1 2 ; B 2
for a > p 1 2 , < b < , c > p 1 2 , < d < and x , I p + x , B 1 , B 2 being p × p positive definite matrices, where Ψ 1 a ; a b + p + 1 2 ; B 1 and Ψ 1 c ; c d + p + 1 2 ; B 2 denote Kummer functions with matrix arguments and the parameters are chosen such that these functions exist. The corresponding KLD is
K L D = log Γ p ( c ) Ψ 1 c ; c d + p + 1 2 ; B 2 Γ p ( a ) Ψ 1 a ; a b + p + 1 2 ; B 1 + ( a c ) α Γ p ( a + α ) Ψ 1 a + α ; a b + α + p + 1 2 ; B 1 Γ p ( a ) Ψ 1 a ; a b + p + 1 2 ; B 1 α = 0 + ( d p ) α Ψ 1 a ; a b + α + p + 1 2 ; B 1 Ψ 1 a ; a b + p + 1 2 ; B 1 α = 0 + tr B 2 B 1 z Ψ 1 a ; a b + p + 1 2 ; B 1 z Ψ 1 a ; a b + p + 1 2 ; B 1 z = 0 .

3.9. Matrix-Variate Normal Distribution

Consider the joint probability density functions
$$
f_1(x) = \frac{1}{(2\pi)^{np/2} \left| V_1 \right|^{n/2} \left| U_1 \right|^{p/2}} \exp\left\{ -\frac{1}{2} \operatorname{tr}\left[ V_1^{-1} \left(x - M_1\right)^T U_1^{-1} \left(x - M_1\right) \right] \right\}
$$
and
$$
f_2(x) = \frac{1}{(2\pi)^{np/2} \left| V_2 \right|^{n/2} \left| U_2 \right|^{p/2}} \exp\left\{ -\frac{1}{2} \operatorname{tr}\left[ V_2^{-1} \left(x - M_2\right)^T U_2^{-1} \left(x - M_2\right) \right] \right\}
$$
for U 1 , U 2 being positive definite symmetric matrices of dimension n × n , V 1 , V 2 being positive definite symmetric matrices of dimension p × p and M 1 , M 2 being matrices of dimension n × p . The corresponding KLD is
K L D = log V 2 n 2 U 2 p 2 V 1 n 2 U 1 p 2 + 1 2 tr U 1 tr V 2 1 V 1 U 2 1 + 1 2 tr V 2 1 M 1 T M 1 U 2 1 1 2 tr V 2 1 M 1 T M 2 U 2 1 1 2 tr V 2 1 M 2 T M 1 U 2 1 + 1 2 tr V 2 1 M 2 T M 2 U 2 1 1 2 tr U 1 tr U 1 1 .
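Closed forms such as the one above can be sanity-checked numerically. The sketch below is an illustration only (not taken from the paper): it estimates (1) directly from the two matrix-variate normal densities by Monte Carlo, sampling $X = M_1 + L_U Z L_V^T$ with $Z$ an $n \times p$ matrix of independent standard normals and $U_1 = L_U L_U^T$, $V_1 = L_V L_V^T$ Cholesky factorizations; the parameter values are arbitrary.

```python
import numpy as np

def matrix_normal_logpdf(X, M, U, V):
    """Log density of the matrix-variate normal defined above, with U (n x n) and V (p x p)."""
    n, p = M.shape
    R = X - M
    quad = np.trace(np.linalg.solve(V, R.T) @ np.linalg.solve(U, R))
    return (-0.5 * n * p * np.log(2 * np.pi)
            - 0.5 * n * np.linalg.slogdet(V)[1]
            - 0.5 * p * np.linalg.slogdet(U)[1]
            - 0.5 * quad)

rng = np.random.default_rng(4)
n, p = 3, 2
M1, M2 = rng.normal(size=(n, p)), rng.normal(size=(n, p))
A, B = rng.normal(size=(n, n)), rng.normal(size=(p, p))
U1, V1 = A @ A.T + n * np.eye(n), B @ B.T + p * np.eye(p)
U2, V2 = U1 + np.eye(n), V1 + 0.5 * np.eye(p)

# Monte Carlo estimate of (1): sample X ~ f1 and average log f1 - log f2.
L_U, L_V = np.linalg.cholesky(U1), np.linalg.cholesky(V1)
vals = []
for _ in range(20_000):
    X = M1 + L_U @ rng.normal(size=(n, p)) @ L_V.T
    vals.append(matrix_normal_logpdf(X, M1, U1, V1) - matrix_normal_logpdf(X, M2, U2, V2))
print(np.mean(vals))  # compare against the closed form stated above
```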

3.10. Matrix-Variate Two-Sided Power Distribution [19]

Consider the joint probability density functions
f 1 x = C ( a ) x a p + 1 2 B a + p + 1 2 , 0 p < x B , I p x a p + 1 2 I p B a + p + 1 2 , B x I p
and
f 2 x = C ( b ) x b p + 1 2 B b + p + 1 2 , 0 p < x B , I p x b p + 1 2 I p B b + p + 1 2 , B x I p ,
for a > p 1 2 , b > p 1 2 and x , I p x , B , I p B being p × p positive definite matrices, where
C ( a ) = B p + 1 2 B p a , p + 1 2 + I p B p + 1 2 B p p + 1 2 , a
and
C ( b ) = B p + 1 2 B p b , p + 1 2 + I p B p + 1 2 B p p + 1 2 , b .
The corresponding KLD is
K L D = log C ( a ) C ( b ) + ( a b ) α B α + p + 1 2 B p a + α , p + 1 2 α = 0 + ( a b ) α I p B α + p + 1 2 B p p + 1 2 , a + α α = 0 + ( b a ) B p + 1 2 log B B p a , p + 1 2 + ( b a ) I p B p + 1 2 log I p B B p p + 1 2 , a .

4. Proofs

Before presenting the proofs of the expressions in Section 2 and Section 3, we state a lemma and give its proof.

4.1. A Technical Lemma

Lemma 1. 
Let
$$
I\left(a_1, \ldots, a_p, t_1, \ldots, t_p, b\right) = \int_{\mathbb{R}^p} \exp\left(-\sum_{j=1}^{p} t_j x_j\right) \left[1 + \sum_{j=1}^{p} \exp\left(-a_j x_j\right)\right]^{-b} dx_1 \cdots dx_p
$$
for $t_j > 0$, $a_j > 0$, $j = 1, 2, \ldots, p$ and $b > 0$. Then,
$$
I\left(a_1, \ldots, a_p, t_1, \ldots, t_p, b\right) = \frac{1}{a_1 \cdots a_p \Gamma(b)} \prod_{j=1}^{p} \Gamma\left(\frac{t_j}{a_j}\right) \Gamma\left(b - \sum_{j=1}^{p} \frac{t_j}{a_j}\right)
$$
provided that $b - \sum_{j=1}^{p} t_j / a_j > 0$.
Proof. 
Setting $y_j = \exp\left(-a_j x_j\right)$ and assuming the conditions in the lemma, we can write
$$
\begin{aligned}
I\left(a_1, \ldots, a_p, t_1, \ldots, t_p, b\right) &= \frac{1}{\Gamma(b)} \int_{\mathbb{R}^p} \int_0^{\infty} t^{b-1} \exp\left[-\left(1 + \sum_{j=1}^{p} \exp\left(-a_j x_j\right)\right) t\right] \exp\left(-\sum_{j=1}^{p} t_j x_j\right) dt\, dx_1 \cdots dx_p \\
&= \frac{1}{\Gamma(b)} \int_0^{\infty} t^{b-1} e^{-t} \prod_{j=1}^{p} \int_{-\infty}^{\infty} \exp\left(-t_j x_j - t \exp\left(-a_j x_j\right)\right) dx_j\, dt \\
&= \frac{1}{a_1 \cdots a_p \Gamma(b)} \int_0^{\infty} t^{b-1} e^{-t} \prod_{j=1}^{p} \int_0^{\infty} y_j^{\frac{t_j}{a_j} - 1} \exp\left(-t y_j\right) dy_j\, dt \\
&= \frac{1}{a_1 \cdots a_p \Gamma(b)} \prod_{j=1}^{p} \Gamma\left(\frac{t_j}{a_j}\right) \int_0^{\infty} t^{b - \sum_{j=1}^{p} \frac{t_j}{a_j} - 1} e^{-t}\, dt.
\end{aligned}
$$
The result follows. □
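A quick numerical check of Lemma 1 in the simplest case $p = 1$ (an illustration only, with arbitrary parameter values satisfying $b - t/a > 0$): the integral can be evaluated by quadrature and compared with $\Gamma(t/a)\,\Gamma(b - t/a)/(a\,\Gamma(b))$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

a, t, b = 1.3, 0.8, 2.5   # illustrative values; requires b - t/a > 0

# Left-hand side of Lemma 1 for p = 1, by numerical quadrature.
# The integrand is written via logaddexp for numerical stability at large |x|.
integrand = lambda x: np.exp(-t * x - b * np.logaddexp(0.0, -a * x))
lhs, _ = quad(integrand, -np.inf, np.inf)

# Right-hand side of Lemma 1.
rhs = np.exp(gammaln(t / a) + gammaln(b - t / a) - gammaln(b)) / a

print(lhs, rhs)  # agree to quadrature accuracy
```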

4.2. Proof for Section 2.1

The corresponding KLD can be expressed as
$$
\mathrm{KLD} = \log \frac{B\left(b_1, \ldots, b_{K-1}; b_K\right)}{B\left(a_1, \ldots, a_{K-1}; a_K\right)} + \sum_{i=1}^{K} \left(a_i - b_i\right) E\left[\log X_i\right]. \qquad (2)
$$
It is easy to show that
$$
E\left[\log X_i\right] = \psi\left(a_i\right) - \psi\left(a_1 + \cdots + a_K\right),
$$
so (2) reduces to the required.
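The identity $E[\log X_i] = \psi(a_i) - \psi(a_1 + \cdots + a_K)$ can be confirmed numerically; the snippet below is an illustration only, with arbitrary parameter values.

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(5)
a = np.array([2.0, 3.0, 4.0])   # illustrative Dirichlet parameters

x = rng.dirichlet(a, size=500_000)
print(np.log(x).mean(axis=0))          # Monte Carlo estimate of E[log X_i]
print(digamma(a) - digamma(a.sum()))   # psi(a_i) - psi(a_1 + ... + a_K)
```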

4.3. Proof for Section 2.2

The corresponding KLD can be expressed as
K L D = log α Γ p β β Γ p α + 1 2 log V 2 V 1 + E X T V 2 1 X β 2 E X T V 1 1 X α 2 .
The second expectation in (3) can be expressed as
E X T V 1 1 X α 2 = α Γ p 2 2 π p 2 Γ p α V 1 1 2 R p x T V 1 1 x α 2 exp x T V 1 1 x α 2 d x = α Γ p 2 2 π p 2 Γ p α R p y T y α 2 exp y T y α 2 d y = α 2 Γ p α 0 u p 2 + α 2 1 exp u α 2 d u = 1 Γ p α 0 t p α exp ( t ) d t = p α ,
where y = V 1 1 2 x , u = y T y and t = u α 2 .
Let V = V 1 1 2 V 2 1 V 1 1 2 and V = P D P 1 , where P is an orthonormal matrix composed of eigenvectors of V and D is a diagonal matrix composed of eigenvalues say λ i of V . Then, the first expectation in (3) can be expressed as
E X T V 2 1 X β 2 = α Γ p 2 2 π p 2 Γ p α R p tr DP T yy T P β 2 exp y T y α 2 d y = α Γ p 2 2 π p 2 Γ p α R p x T D x β 2 exp x T x α 2 d x = α Γ p 2 2 π p 2 Γ p α R R i p λ i x i 2 β 2 exp i p x i 2 α 2 d x ,
where y = V 1 1 2 x and z = P T y . Using the pseudo-polar transformation z 1 = r sin θ 1 , z 2 = r cos θ 1 sin θ 2 , , z p = r cos θ 1 cos θ 2 cos θ p 1  https://en.wikipedia.org/wiki/Polar_coordinate_system (accessed on 1 July 2024), (4) can be expressed as
E X T V 2 1 X β 2 = α Γ p 2 2 π p 2 Γ p α 0 r p 1 π 2 π 2 π π r 2 λ 1 sin 2 θ 1 + + λ p cos 2 θ 1 cos 2 θ p 1 β 2 × exp r 2 α 2 j = 1 p 1 cos θ j p j 1 d r j = 1 p 1 d θ j = α Γ p 2 π p 2 Γ p α 0 r p + β 1 exp r α 0 1 0 1 j = 1 p 1 x j p j 2 1 1 x j 1 2 × B p x 1 , , x p 1 β 2 d x 1 d x p 1 d r = Γ p 2 Γ p + β 2 π p 2 Γ p α 0 1 0 1 j = 1 p 1 x j p j 2 1 1 x j 1 2 B p x 1 , , x p 1 β 2 d x 1 d x p 1 ,
where x i = cos 2 θ i and B p x 1 , , x p 1 = λ 1 + λ 2 λ 1 x 1 + + λ p λ p 1 x 1 x 2 x p 1 . Provided that λ 1 λ 2 λ 1 x 1 + + λ p λ p 1 x 1 x 2 x p 1 holds, we can apply the generalized multinomial theorem to calculate (5) as
E X T V 2 1 X β 2 = Γ p 2 Γ p + β 2 π p 2 Γ p α 0 1 0 1 j = 1 p 1 x j p j 2 1 1 x j 1 2 × r 1 = 0 r 2 = 0 r 1 r p 1 = 0 r p 2 β 2 r 1 r 1 r 2 r p 2 r p 1 × λ 1 β 2 r 1 λ 2 λ 1 x 1 r 1 r 2 λ 3 λ 2 x 1 x 2 r 2 r 3 λ p 1 λ p 2 x 1 x 2 x p 2 r p 2 r p 1 λ p λ p 1 x 1 x 2 x p 1 r p 1 d x 1 d x p 1 = Γ p 2 Γ p + β 2 π p 2 Γ p α 0 1 0 1 j = 1 p 1 x j p j 2 1 1 x j 1 2 × r 1 = 0 r 2 = 0 r 1 r p 1 = 0 r p 2 β 2 r 1 r 1 r 2 r p 2 r p 1 λ 1 β 2 r 1 × j = 1 p 1 λ j + 1 λ j = 1 j x r j r j + 1 d x 1 d x p 1 = Γ p 2 Γ p + β 2 π p 2 Γ p α r 1 = 0 r 2 = 0 r 1 r p 1 = 0 r p 2 β 2 r 1 r 1 r 2 r p 2 r p 1 λ 1 β 2 r 1 × j = 1 p 1 λ j + 1 λ j r j r j + 1 B r j + p j 2 , 1 2
provided that the infinite sum converges. Hence, the required.

4.4. Proof for Section 2.3

The corresponding KLD can be expressed as
$$
\mathrm{KLD} = \log \frac{\Gamma\left(a_1 + \cdots + a_{K+1}\right) \Gamma\left(b_1\right) \cdots \Gamma\left(b_{K+1}\right)}{\Gamma\left(b_1 + \cdots + b_{K+1}\right) \Gamma\left(a_1\right) \cdots \Gamma\left(a_{K+1}\right)} + \sum_{i=1}^{K} \left(a_i - b_i\right) E\left[\log X_i\right] + \left(b_1 + \cdots + b_{K+1} - a_1 - \cdots - a_{K+1}\right) E\left[\log\left(1 + \sum_{i=1}^{K} X_i\right)\right]. \qquad (6)
$$
It is easy to show that
$$
E\left[\log X_i\right] = \psi\left(a_i\right) - \psi\left(a_{K+1}\right)
$$
and
$$
E\left[\log\left(1 + \sum_{i=1}^{K} X_i\right)\right] = \psi\left(a_1 + \cdots + a_{K+1}\right) - \psi\left(a_{K+1}\right),
$$
so (6) reduces to the required.

4.5. Proof for Section 2.4

The corresponding KLD can be expressed as
K L D = log C a 1 , , a K , b , c C d 1 , , d K , e , f + i = 1 K a i d i E log X i + ( b e ) E log 1 i = 1 K X i ( c f ) E log 1 + i = 1 K X i .
It is easy to show that
E log X i = α C a 1 , , a i , , a K , b , c C a 1 , , a i + α , , a K , b , c α = 0 ,
E log 1 i = 1 K X i = α C a 1 , , a i , , a K , b , c C a 1 , , a K , b + α , c α = 0
and
E log 1 + i = 1 K X i = α C a 1 , , a i , , a K , b , c C a 1 , , a K , b , c α α = 0 ,
so (7) reduces to the required.

4.6. Proof for Section 2.5

The corresponding KLD can be expressed as
K L D = log a q 2 N + p 2 2 a Γ 2 M + p 2 2 b b s 2 M + p 2 2 b Γ 2 N + p 2 2 a + 1 2 log Σ 2 Σ 1 + ( N 1 ) E log X T Σ 1 1 X q E X T Σ 1 1 X a ( M 1 ) E log X T Σ 2 1 X + s E X T Σ 2 1 X b .
The second expectation in (8) can be calculated as
E X T Σ 1 1 X a = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a Σ 1 1 2 R p x T Σ 1 1 x a + N 1 exp q x T Σ 1 1 x a d x = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a R p y T y a + N 1 exp q y T y a d y = 1 q Γ 2 N + p 2 2 a 0 t 2 N + p 2 2 a exp ( t ) d t = 2 N + p 2 2 a q ,
where y = Σ 1 1 2 x and t = q y T y a .
Let Σ = Σ 1 1 2 Σ 2 1 Σ 1 1 2 . The first expectation in (8) can be calculated as
E log X T Σ 1 1 X = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a Σ 1 1 2 R p x T Σ 1 1 x N 1 log x T Σ 1 1 x exp q x T Σ 1 1 x a d x = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a R p y T y N 1 log y T y exp q y T y a d y = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a N R p y T y N 1 exp q y T y a d y = q 2 N + p 2 2 a Γ 2 N + p 2 2 a N 0 t 2 N + p 2 2 a 1 q 2 N + p 2 2 a exp ( t ) d t = q 2 N + p 2 2 a Γ 2 N + p 2 2 a N Γ 2 N + p 2 2 a q 2 N + p 2 2 a = 1 a ψ 2 N + p 2 2 a log q ,
where y = Σ 1 1 2 x and t = q y T y a .
As in Section 4.3, write Σ = P D P 1 , where P is an orthonormal matrix composed of eigenvectors of Σ and D is a diagonal matrix composed of eigenvalues say λ i of Σ . Then, the fourth expectation in (8) can be expressed as
E X T Σ 2 1 X b = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a Σ 1 1 2 R p x T Σ 2 1 x b x T Σ 1 1 x N 1 exp q x T Σ 1 1 x a d x = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a R p y T Σ y b y T y N 1 exp q y T y a d y = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a R p tr D P T y y T P b y T y N 1 exp q y T y a d y = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a R p v T D v b v T v N 1 exp q v T v a d v = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a R p i = 1 p λ i v i 2 b i = 1 p v i 2 N 1 exp q i = 1 p v i 2 a d v ,
where y = Σ 1 1 2 x and v = P T y .
Using the pseudo-polar transformation v 1 = r sin θ 1 , v 2 = r cos θ 1 sin θ 2 , , v p = r cos θ 1 cos θ 2 cos θ p 1 , (9) can be expressed as
E X T Σ 2 1 X b = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a 0 r p 1 π 2 π 2 π π r 2 λ 1 sin 2 θ 1 + + λ p cos 2 θ 1 cos 2 θ p 1 b × r 2 N 1 exp q r 2 a j = 1 p 1 cos θ j p j 1 d r j = 1 p 1 d θ j = Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a 0 t 2 N + p + 2 b 2 2 a 1 q 2 N + p + 2 b 2 2 a exp ( t ) 0 1 0 1 j = 1 p 1 x j p j 2 1 1 x j 1 2 × B p x 1 , , x p 1 b d x 1 d x p 1 d t = Γ p 2 Γ 2 N + p + 2 b 2 2 a π p 2 q b a Γ 2 N + p 2 2 a 0 1 0 1 j = 1 p 1 x j p j 2 1 1 x j 1 2 × B p x 1 , , x p 1 b d x 1 d x p 1 ,
where x i = cos 2 θ i , t = q r 2 a and B p x 1 , , x p 1 = λ 1 + λ 2 λ 1 x 1 + + λ p λ p 1 x 1 x 2 x p 1 .
Provided that λ 1 λ 2 λ 1 x 1 + + λ p λ p 1 x 1 x 2 x p 1 holds, we can apply the generalized multinomial theorem to calculate (10) as
E X T Σ 2 1 X b = Γ p 2 Γ 2 N + p + 2 b 2 2 a π p 2 q b a Γ 2 N + p 2 2 a 0 1 0 1 j = 1 p 1 x j p j 2 1 1 x j 1 2 × r 1 = 0 r 2 = 0 r 1 r p 1 = 0 r p 2 b r 1 r 1 r 2 r p 2 r p 1 λ 1 b r 1 λ 2 λ 1 x 1 r 1 r 2 λ 3 λ 2 x 1 x 2 r 2 r 3 λ p 1 λ p 2 x 1 x 2 x p 2 r p 2 r p 1 λ p λ p 1 x 1 x 2 x p 1 r p 1 d x 1 d x p 1 = Γ p 2 Γ 2 N + p + 2 b 2 2 a π p 2 q b a Γ 2 N + p 2 2 a 0 1 0 1 j = 1 p 1 x j p j 2 1 1 x j 1 2 × r 1 = 0 r 2 = 0 r 1 r p 1 = 0 r p 2 b r 1 r 1 r 2 r p 2 r p 1 × λ 1 b r 1 j = 1 p 1 λ j + 1 λ j = 1 j x r j r j + 1 d x 1 d x p 1 = Γ p 2 Γ 2 N + p + 2 b 2 2 a π p 2 q b a Γ 2 N + p 2 2 a r 1 = 0 r 2 = 0 r 1 r p 1 = 0 r p 2 b r 1 r 1 r 2 r p 2 r p 1 λ 1 b r 1 × j = 1 p 1 λ j + 1 λ j r j r j + 1 B r j + p j 2 , 1 2
provided that the infinite sum converges.
The third expectation in (8) can be expressed as
E log X T Σ 2 1 X = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a Σ 1 1 2 R p x T Σ 1 1 x N 1 log x T Σ 2 1 x exp q x T Σ 1 1 x a d x = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a R p y T y N 1 log tr D P T y y T P exp q y T y a d y = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a R p i = 1 p v i 2 N 1 log i = 1 p λ i v i 2 exp q i = 1 p v i 2 a d v = a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a 0 r p 1 π 2 π 2 π π r 2 N 1 × log r 2 λ 1 sin 2 θ 1 + + λ p cos 2 θ 1 cos 2 θ p 1 × exp q r 2 a j = 1 p 1 cos θ j p j 1 d r j = 1 p 1 d θ j = I 1 + I 2
say, where y = Σ 1 1 2 x , v = P T y ,
I 1 = 2 a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a 0 r p + 2 N 3 log r 2 exp q r 2 a × 0 1 0 1 j = 1 p 1 x j p j 2 1 1 x j 1 2 d x 1 d x p 1 d r
and
I 2 = 2 a Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a 0 r p + 2 N 3 exp q r 2 a × 0 1 0 1 j = 1 p 1 x j p j 2 1 1 x j 1 2 log B p x 1 , , x p 1 d x 1 d x p 1 d r .
The I 1 can be calculated as
I 1 = Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a 0 t p + 2 N 2 2 a 1 q 2 N + p 2 2 a log t q 1 a exp ( t ) j = 1 p 1 B p j 2 , 1 2 d t = Γ p 2 a π p 2 Γ 2 N + p 2 2 a a N 0 t p + 2 N 2 2 a 1 exp ( t ) Γ 2 N + p 2 2 a log q j = 1 p 1 B p j 2 , 1 2 = Γ p 2 a π p 2 Γ 2 N + p 2 2 a a N Γ p + 2 N 2 2 a Γ 2 N + p 2 2 a log q j = 1 p 1 B p j 2 , 1 2 = Γ p 2 a π p 2 ψ p + 2 N 2 2 a log q j = 1 p 1 B p j 2 , 1 2 ,
where t = q r 2 a .
The I 2 can be calculated as
I 2 = Γ p 2 q 2 N + p 2 2 a π p 2 Γ 2 N + p 2 2 a 0 t 2 N + p 2 2 a 1 q 2 N + p 2 2 a exp ( t ) × 0 1 0 1 j = 1 p 1 x j p j 2 1 1 x j 1 2 log B p x 1 , , x p 1 d x 1 d x p 1 d t = Γ p 2 π p 2 0 1 0 1 j = 1 p 1 x j p j 2 1 1 x j 1 2 × log λ 1 + log 1 + λ 2 λ 1 x 1 λ 1 + + λ p λ p 1 x 1 x 2 x p 1 λ 1 d x 1 d x p 1 = Γ p 2 π p 2 log λ 1 j = 1 p 1 B p j 2 , 1 2 Γ p 2 π p 2 0 1 0 1 j = 1 p 1 x j p j 2 1 1 x j 1 2 × k = 1 ( 1 ) k k λ 2 λ 1 x 1 λ 1 + + λ p λ p 1 x 1 x 2 x p 1 λ 1 k d x 1 d x p 1 = Γ p 2 π p 2 log λ 1 j = 1 p 1 B p j 2 , 1 2 Γ p 2 π p 2 k = 1 ( 1 ) k k i 1 + + i p 1 = k k i 1 , , i p 1 × 0 1 0 1 j = 1 p 1 x j p j 2 1 1 x j 1 2 λ j + 1 λ j λ 1 = 1 j x i j d x 1 d x p 1 = Γ p 2 π p 2 log λ 1 j = 1 p 1 B p j 2 , 1 2 Γ p 2 π p 2 k = 1 ( 1 ) k k i 1 + + i p 1 = k k i 1 , , i p 1 × j = 1 p 1 λ j + 1 λ j λ 1 i j B = j p 1 i + p j 2 , 1 2
provided that the infinite sum converges.
Hence, the required.

4.7. Proof for Section 2.6

The corresponding KLD can be expressed as
K L D = log a 1 a p b 1 b p + = 1 p b a E X + ( p + 1 ) E log 1 + exp b 1 X 1 + + exp b p X p ( p + 1 ) E log 1 + exp a 1 X 1 + + exp a p X p = log a 1 a p b 1 b p + ( p + 1 ) E log 1 + exp b 1 X 1 + + exp b p X p ( p + 1 ) E log 1 + exp a 1 X 1 + + exp a p X p
since the expectations are zero. Using the Taylor expansion for log ( 1 + z ) , the first expectation in (11) can be expressed as
k = 1 ( 1 ) k k E exp b 1 X 1 + + exp b p X p k = k = 1 ( 1 ) k k i 1 + + i p = k k i 1 , , i p E exp i 1 b 1 X 1 i p b p X p = k = 1 ( 1 ) k k i 1 + + i p = k k i 1 , , i p j = 1 p Γ a j + i j b j a j Γ 1 j = 1 p i j b j a j ,
where the last step follows by Lemma 1 provided that j = 1 p i j b j a j < 1 and the infinite series converges. The second expectation in (11) can be expressed as
p ! a 1 a p R p α 1 + exp a 1 x 1 + + exp a p x p α α = 0 × exp a 1 x 1 a p x p d x 1 d x p 1 + exp a 1 x 1 + + exp a p x p p + 1 = p ! a 1 a p α R p exp a 1 x 1 a p x p 1 + exp a 1 x 1 + + exp a p x p p + 1 α d x 1 d x p α = 0 = p ! α Γ ( 1 α ) Γ ( p + 1 α ) α = 0 = Γ ( p + 1 ) Γ ( p + 1 ) Γ ( 1 ) p ! ,
where the penultimate step follows from Lemma 1. Hence, the required.

4.8. Proof for Section 2.7

The corresponding KLD can be expressed as
K L D = log ( b ) p a 1 a p ( d ) p b 1 b p + = 1 p c a E X + ( d + p ) E log 1 + exp c 1 X 1 + + exp c p X p ( b + p ) E log 1 + exp a 1 X 1 + + exp a p X p = log ( b ) p a 1 a p ( d ) p b 1 b p + ( d + p ) E log 1 + exp c 1 X 1 + + exp c p X p ( b + p ) E log 1 + exp a 1 X 1 + + exp a p X p
since the expectations are zero. Using the Taylor expansion for log ( 1 + z ) , the first expectation in (12) can be expressed as
k = 1 ( 1 ) k k E exp c 1 X 1 + + exp c p X p k = k = 1 ( 1 ) k k i 1 + + i p = k k i 1 , , i p E exp i 1 c 1 X 1 i p c p X p = 1 Γ ( b ) k = 1 ( 1 ) k k i 1 + + i p = k k i 1 , , i p j = 1 p Γ a j + i j c j a j Γ b j = 1 p i j c j a j ,
where the last step follows by Lemma 1 provided that j = 1 p i j c j a j < b and the infinite series converges. The second expectation in (12) can be expressed as
( b ) p a 1 a p R p α 1 + exp a 1 x 1 + + exp a p x p α α = 0 × exp a 1 x 1 a p x p d x 1 d x p 1 + exp a 1 x 1 + + exp a p x p b + p = ( b ) p a 1 a p α R p exp a 1 x 1 a p x p 1 + exp a 1 x 1 + + exp a p x p b + p α d x 1 d x p α = 0 = ( b ) p α Γ ( b α ) Γ ( b + p α ) α = 0 = Γ ( b ) Γ ( b + p ) Γ ( b + p ) Γ ( b ) Γ ( b ) Γ ( b + p ) ,
where the penultimate step follows from Lemma 1. Hence, the required.

4.9. Proof for Section 2.8

The corresponding KLD can be expressed as
K L D = log a 1 a p β p c , a 1 , , a p b 1 b p β p d , b 1 , , b p + 1 2 i = 1 p b i a i E X i 2 + 1 2 d b 1 b p c a 1 a p E X 1 2 X p 2 .
Using results in [8], (13) can be calculated to give the required expression.

4.10. Proof for Section 2.9

The corresponding KLD can be expressed as
K L D = log Γ K 2 + a + b 1 Γ c Γ K 2 + d 1 Γ a Γ K 2 + b 1 Γ K 2 + c + d 1 + ( b d ) E log i = 1 K X i 2 + ( a c ) E log 1 i = 1 K X i 2 .
It is easy to show that
E log i = 1 K X i 2 = ψ K 2 + b 1 ψ K 2 + a + b 1
and
E log 1 i = 1 K X i 2 = ψ a ψ K 2 + a + b 1 ,
so (14) reduces to the required.

4.11. Proof for Section 2.10

The corresponding KLD can be expressed as
K L D = log C d , e , f C a , b , c + ( a d ) E log i = 1 p X i + ( b e ) E log i = 1 p 1 X i + 2 ( c f ) E log 1 i < j p X i X j .
Easy calculations show that
E log i = 1 p X i = α C ( a , b , c ) C ( a + α , b , c ) α = 0 ,
E log i = 1 p 1 X i = α C ( a , b , c ) C ( a , b + α , c ) α = 0
and
E log 1 i < j p X i X j = α C ( a , b , c ) C ( a , b , c + α ) α = 0 ,
so (15) reduces to the required.

4.12. Proof for Section 2.11

The corresponding KLD can be expressed as
K L D = log b p + 1 i = 1 p a i i = 1 p a i a p + 1 i = 1 p b i i = 1 p b i + i = 1 p b i a i E X i + E log 1 exp a p + 1 min X 1 , , X p E log 1 exp b p + 1 min X 1 , , X p .
Using the series expansion for log ( 1 − z ) , we can express (16) as
K L D = log b p + 1 i = 1 p a i i = 1 p a i a p + 1 i = 1 p b i i = 1 p b i + i = 1 p b i a i E X i k = 1 E exp k a p + 1 min X 1 , , X p k + k = 1 E exp k b p + 1 min X 1 , , X p k .
Hence, the required.

4.13. Proof for Section 2.12

The corresponding KLD can be expressed as
K L D = log κ 1 p 2 1 κ 2 p 2 1 I p 2 1 κ 2 I p 2 1 κ 1 + κ 1 μ 1 T κ 2 μ 2 T E X = log κ 1 p 2 1 κ 2 p 2 1 I p 2 1 κ 2 I p 2 1 κ 1 + κ 1 μ 1 T κ 2 μ 2 T μ 1 .
Hence, the required.

4.14. Proof for Section 3.1

The corresponding KLD can be expressed as
K L D = log Ω c + d a b B p ( c , d ) B p ( a , b ) + ( a c ) E log Ω X + ( b d ) E log X .
The expectations in (17) can be calculated as
E log Ω X = α Ω x α + a p + 1 2 x b p + 1 2 Ω a + b B p ( a , b ) d x α = 0 = α Ω α B p ( α + a , b ) B p ( a , b ) α = 0
and
E log X = α Ω x a p + 1 2 x α + b p + 1 2 Ω a + b B p ( a , b ) d x α = 0 = α Ω α B p ( a , α + b ) B p ( a , b ) α = 0 .
Hence, the required.

4.15. Proof for Section 3.2

The corresponding KLD can be expressed as
K L D = log B p b 1 , , b n ; b n + 1 B p a 1 , , a n ; a n + 1 + a n + 1 b n + 1 E log I p i = 1 n X i + i = 1 n a i b i E log X i .
It is easy to show that
E log X i = α B p a 1 , , a i + α , , a n ; a n + 1 B p a 1 , , a i , , a n ; a n + 1 α = 0
and
E log I p i = 1 n X i = α B p a 1 , , a n ; a n + 1 + α B p a 1 , , a n ; a n + 1 α = 0 ,
so (18) reduces to the required.

4.16. Proof for Section 3.3

The corresponding KLD can be expressed as
K L D = log d p c Γ p ( c ) b p a Γ p ( a ) Σ 2 c Σ 1 a + ( a c ) E log X + tr 1 b Σ 1 1 E X tr 1 d Σ 2 1 E X .
The first expectation in (19) can be calculated as
E log X = Σ 1 a b a p Γ p ( a ) α x α + a p + 1 2 exp tr 1 b Σ 1 1 x α = 0 = 1 Γ p ( a ) α Σ 1 α b p α Γ p ( a + α ) α = 0 .
Since E ( X ) = 2 a Σ 1 , the second and third terms in (19) are equal to
tr 1 b Σ 1 1 E X = 2 a p b
and
tr 1 d Σ 2 1 E X = tr 2 a d Σ 2 1 Σ 1 ,
respectively. Hence, the required.

4.17. Proof for Section 3.4

The corresponding KLD can be expressed as
K L D = log B p ( d , e )   2 F 1 d , f ; d + e ; B B p ( a , b )   2 F 1 a , c ; a + b ; B + ( a d ) E log X + ( b e ) E log I p X + ( f c ) E log I p + B X .
The expectations in (20) can be easily calculated as
E log X = α B p ( a + α , b )   2 F 1 a + α , c ; a + b + α ; B B p ( a , b )   2 F 1 a , c ; a + b ; B α = 0 ,
E log I p X = α B p ( a , b + α )   2 F 1 a , c ; a + b + α ; B B p ( a , b )   2 F 1 a , c ; a + b ; B α = 0
and
E log I p + B X = α   2 F 1 a , c α ; a + b ; B   2 F 1 a , c ; a + b ; B α = 0 .
Hence, the required.

4.18. Proof for Section 3.5

The corresponding KLD can be expressed as
K L D = log Ω a c B p ( c , d ) B p ( a , b ) + ( c + d a b ) E log Ω + X + ( b d ) E log X .
The expectations in (21) can be calculated as
E log Ω + X = α Ω + x α a b x b p + 1 2 Ω a B p ( a , b ) d x α = 0 = α Ω α B p ( a α , b ) B p ( a , b ) α = 0
and
E log X = α Ω + x a b x α + b p + 1 2 Ω a B p ( a , b ) d x α = 0 = α Ω α B p ( a α , α + b ) B p ( a , b ) α = 0 .
Hence, the required.

4.19. Proof for Section 3.6

The corresponding KLD can be expressed as
K L D = log d p c Γ p ( c ) b p a Γ p ( a ) Σ 1 a Σ 2 c + ( c a ) E log X + 1 d tr Σ 2 E X 1 1 b tr Σ 1 E X 1 .
The first expectation in (22) can be calculated as
E log X = Σ 1 a b a p Γ p ( a ) α x α a p + 1 2 exp tr 1 b Σ 1 x α = 0 = 1 Γ p ( a ) α Σ 1 α Γ p ( a α ) b p α α = 0 .
Since E X 1 = 2 a Σ 1 1 , the second and third terms in (22) are equal to
tr Σ 2 E X 1 = 2 a tr Σ 2 Σ 1 1
and
tr Σ 1 E X 1 = 2 a p ,
respectively. Hence, the required.

4.20. Proof for Section 3.7

The corresponding KLD can be expressed as
K L D = log B p ( c , d )   1 F 1 c ; c + d ; B 2 B p ( a , b )   1 F 1 a , a + b ; B 1 + ( a c ) E log X + ( b d ) E log I p X + tr B 2 B 1 E X .
The expectations in (23) can be easily calculated as
E log X = α B p ( a + α , b )   1 F 1 a + α ; a + b + α ; B 1 B p ( a , b )   1 F 1 a ; a + b ; B 1 α = 0 ,
E log I p X = α B p ( a , b + α )   1 F 1 a ; a + b + α ; B 1 B p ( a , b )   1 F 1 a ; a + b ; B 1 α = 0
and
E X = z   1 F 1 a ; a + b ; z B 1   1 F 1 a ; a + b ; B 1 z = 0 .
Hence, the required.

4.21. Proof for Section 3.8

The corresponding KLD can be expressed as
K L D = log Γ p ( c ) Ψ 1 c ; c d + p + 1 2 ; B 2 Γ p ( a ) Ψ 1 a ; a b + p + 1 2 ; B 1 + ( a c ) E log X + ( d b ) E log I p + X + tr B 2 B 1 E X .
The expectations in (24) can be easily calculated as
E log X = α Γ p ( a + α ) Ψ 1 a + α ; a b + α + p + 1 2 ; B 1 Γ p ( a ) Ψ 1 a ; a b + p + 1 2 ; B 1 α = 0 ,
E log I p + X = α Ψ 1 a ; a b + α + p + 1 2 ; B 1 Ψ 1 a ; a b + p + 1 2 ; B 1 α = 0
and
E X = z Ψ 1 a ; a b + p + 1 2 ; B 1 z Ψ 1 a ; a b + p + 1 2 ; B 1 z = 0 .
Hence, the required.

4.22. Proof for Section 3.9

The corresponding KLD can be expressed as
K L D = log V 2 n 2 U 2 p 2 V 1 n 2 U 1 p 2 + 1 2 E tr V 2 1 X M 2 T U 2 1 X M 2 1 2 E tr V 1 1 X M 1 T U 1 1 X M 1 .
The second expectation in (25) can be expressed as
tr V 1 1 E X M 1 T X M 1 U 1 1 = tr U 1 tr U 1 1 .
The first expectation in (25) can be expressed as
tr V 2 1 E X M 2 T X M 2 U 2 1 = tr V 2 1 E X T X X T M 2 M 2 T X + M 2 T M 2 U 2 1 = tr V 2 1 E X T X E X T M 2 E M 2 T X + M 2 T M 2 U 2 1 = tr U 1 tr V 2 1 V 1 U 2 1 + tr V 2 1 M 1 T M 1 U 2 1 tr V 2 1 M 1 T M 2 U 2 1 tr V 2 1 M 2 T M 1 U 2 1 + tr V 2 1 M 2 T M 2 U 2 1 .
Hence, the required.

4.23. Proof for Section 3.10

The corresponding KLD can be expressed as
K L D = log C ( a ) C ( b ) + ( a b ) E log X I 0 p < X B + ( a b ) E log I p X I B < X I p + ( b a ) B p + 1 2 log B B p a , p + 1 2 + ( b a ) I p B p + 1 2 log I p B B p p + 1 2 , a .
The expectations in (26) can be easily calculated as
E log X I 0 p < X B = α B α + p + 1 2 B p a + α , p + 1 2 α = 0
and
E log I p X I B < X I p = α I p B α + p + 1 2 B p p + 1 2 , a + α α = 0 .
Hence, the required.

Author Contributions

Conceptualization, V.N. and S.N.; methodology, V.N. and S.N.; investigation, V.N. and S.N. All authors have read and agreed to the published version of the manuscript.

Funding

Research of the first author was partially supported by grants from the IMU-CDC, Simons Foundation, and the Heilbronn Institute for Mathematical Research.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the Editor and the two referees for careful reading and comments which greatly improved the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Special Functions

The following special functions are used in the paper: the gamma function, defined by
$$
\Gamma(a) = \int_0^{\infty} t^{a-1} \exp(-t)\, dt
$$
for $a > 0$; the digamma function, defined by
$$
\psi(a) = \frac{d \log \Gamma(a)}{d a}
$$
for $a > 0$; the beta function, defined by
$$
B(a, b) = \int_0^1 t^{a-1} (1 - t)^{b-1}\, dt
$$
for $a > 0$ and $b > 0$; the type I Dirichlet integral, defined by
$$
B\left(a_1, \ldots, a_n; a_{n+1}\right) = \int_{0 \leq t_1 + \cdots + t_n \leq 1} t_1^{a_1 - 1} \cdots t_n^{a_n - 1} \left(1 - \sum_{i=1}^{n} t_i\right)^{a_{n+1} - 1} dt_1 \cdots dt_n
$$
for $a_j > 0$, $j = 1, 2, \ldots, n+1$; the modified Bessel function of the first kind of order $\nu$, defined by
$$
I_{\nu}(x) = \sum_{k=0}^{\infty} \frac{1}{\Gamma(k + \nu + 1)\, k!} \left(\frac{x}{2}\right)^{2k + \nu}
$$
for $k + \nu + 1 \neq 0, -1, -2, \ldots$; the matrix-variate gamma function, defined by
$$
\Gamma_p(\alpha) = \int \left| x \right|^{\alpha - \frac{p+1}{2}} \exp\left[-\operatorname{tr}(x)\right] dx
$$
for $x$ a $p \times p$ positive definite matrix and $\alpha > \frac{p-1}{2}$; the matrix-variate beta function, defined by
$$
B_p(\alpha, \beta) = \int \left| x \right|^{\alpha - \frac{p+1}{2}} \left| I_p - x \right|^{\beta - \frac{p+1}{2}} dx
$$
for $x$ a $p \times p$ positive definite matrix, $I_p - x$ a $p \times p$ positive definite matrix, $\alpha > \frac{p-1}{2}$ and $\beta > \frac{p-1}{2}$; the matrix-variate type I Dirichlet integral, defined by
B p a 1 , , a n ; a n + 1 = x 1 a 1 p x n a n p I p i = 1 n x i a n + 1 p d x 1 d x n
for $x_i$, $i = 1, 2, \ldots, n$, and $I_p - \sum_{i=1}^{n} x_i$ being $p \times p$ positive definite matrices, and $a_j > \frac{p-1}{2}$, $j = 1, 2, \ldots, n+1$; the matrix-variate confluent hypergeometric function, defined by
$$
{}_1F_1\left(a; b; X\right) = \sum_{k=0}^{\infty} \sum_{\kappa} \frac{(a)_{\kappa}}{(b)_{\kappa}} \frac{C_{\kappa}(X)}{k!}
$$
for $X$ a $p \times p$ positive definite matrix and provided that $\Gamma_p(a)$ and $\Gamma_p(b)$ exist; the matrix-variate Gauss hypergeometric function, defined by
$$
{}_2F_1\left(a, b; c; X\right) = \sum_{k=0}^{\infty} \sum_{\kappa} \frac{(a)_{\kappa} (b)_{\kappa}}{(c)_{\kappa}} \frac{C_{\kappa}(X)}{k!}
$$
for $X$ a $p \times p$ positive definite matrix and provided that $\Gamma_p(a)$, $\Gamma_p(b)$ and $\Gamma_p(c)$ exist, where $C_{\kappa}(X)$ denotes the zonal polynomial of the $p \times p$ symmetric matrix $X$ corresponding to the ordered partition $\kappa = \left(k_1, \ldots, k_p\right)$ with $k_1 \geq \cdots \geq k_p \geq 0$ and $k_1 + \cdots + k_p = k$, $\sum_{\kappa}$ denotes summation over all such partitions $\kappa$, and
$$
(a)_{\kappa} = \prod_{i=1}^{p} \left(a - \frac{i-1}{2}\right)_{k_i},
$$
where $(a)_0 = 1$ and $(a)_k = a (a+1) \cdots (a + k - 1)$.
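For completeness, a small numerical note (not part of the paper): the matrix-variate gamma function admits the product representation $\Gamma_p(\alpha) = \pi^{p(p-1)/4} \prod_{i=1}^{p} \Gamma\left(\alpha - \frac{i-1}{2}\right)$, which is what `scipy.special.multigammaln` returns on the log scale. This is the form needed when evaluating the matrix-variate expressions of Section 3 numerically; the parameter values below are arbitrary.

```python
import numpy as np
from scipy.special import multigammaln, gammaln

def log_matrix_gamma(alpha, p):
    """log Gamma_p(alpha) via its product representation."""
    i = np.arange(1, p + 1)
    return p * (p - 1) / 4.0 * np.log(np.pi) + np.sum(gammaln(alpha - (i - 1) / 2.0))

p, alpha = 4, 3.7   # requires alpha > (p - 1)/2
print(log_matrix_gamma(alpha, p), multigammaln(alpha, p))  # identical values
```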

References

  1. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
  2. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Cham, Switzerland, 2006. [Google Scholar]
  3. Bouhlel, N.; Dziri, A. Kullback–Leibler divergence between multivariate generalized gaussian distributions. IEEE Signal Process. Lett. 2019, 26, 1021–1025. [Google Scholar] [CrossRef]
  4. Bouhlel, N.; Rousseau, D. A generic formula and some special cases for the Kullback–Leibler divergence between central multivariate Cauchy distributions. Entropy 2022, 24, 838. [Google Scholar] [CrossRef] [PubMed]
  5. Bouhlel, N.; Rousseau, D. Exact Rényi and Kullback–Leibler divergences between multivariate t-distributions. IEEE Signal Process. Lett. 2023, 30, 1672–1676. [Google Scholar] [CrossRef]
  6. Malik, H.J.; Abraham, B. Multivariate logistic distributions. Ann. Stat. 1973, 1, 588–590. [Google Scholar] [CrossRef]
  7. Satterthwaite, S.P.; Hutchinson, T.P. A generalization of Gumbel’s bivariate logistic distribution. Metrika 1978, 25, 163–170. [Google Scholar] [CrossRef]
  8. Sarabia, J.-M. The centered normal conditional distributions. Commun. Stat.-Theory Methods 1995, 24, 2889–2900. [Google Scholar] [CrossRef]
  9. Penny, W.D. Kullback-Liebler Divergences of Normal, Gamma, Dirichlet and Wishart Densities; Wellcome Department of Cognitive Neurology: London, UK, 2001. [Google Scholar]
  10. Kotz, S.; Balakrishnan, N.; Johnson, N.L. Continuous Multivariate Distributions; John Wiley and Sons: New York, NY, USA, 2000. [Google Scholar]
  11. Nagar, D.K.; Bedoya-Valencia, D.; Nadarajah, S. Multivariate generalization of the Gauss hypergeometric distribution. Hacet. J. Math. Stat. 2015, 44, 933–948. [Google Scholar] [CrossRef]
  12. Kotz, S. Multivariate distributions at a cross-road. In Statistical Distributions in Scientific Work; Patil, G.P., Kotz, S., Ord, J.K., Eds.; D. Reidel Publishing Company: Dordrecht, The Netherlands, 1975; Volume 1, pp. 247–270. [Google Scholar]
  13. Pham-Gia, T. The multivariate Selberg beta distribution and applications. Statistics 2009, 43, 65–79. [Google Scholar] [CrossRef]
  14. Al-Mutairi, D.K.; Ghitany, M.E.; Kundu, D. A new bivariate distribution with weighted exponential marginals and its multivariate generalization. Stat. Pap. 2011, 52, 921–936. [Google Scholar] [CrossRef]
  15. Dawid, A.P. Some matrix-variate distribution theory: Notational considerations and a Bayesian application. Biometrika 1981, 68, 265–274. [Google Scholar] [CrossRef]
  16. Gupta, A.K.; Nagar, D.K. Matrix-variate Gauss hypergeometric distribution. J. Aust. Math. Soc. 2012, 92, 335–355. [Google Scholar] [CrossRef]
  17. Nagar, D.K.; Gupta, A.K. Matrix-variate Kummer-beta distribution. J. Aust. Math. Soc. 2002, 73, 11–25. [Google Scholar] [CrossRef]
  18. Nagar, D.K.; Cardeno, L. Matrix variate Kummer-gamma distribution. Random Oper. Stoch. Equ. 2001, 9, 207–218. [Google Scholar]
  19. Zinodiny, S.; Nadarajah, S. Matrix variate two-sided power distribution. Methodol. Comput. Appl. Probab. 2022, 24, 179–194. [Google Scholar] [CrossRef]