Hellinger Information Matrix and Hellinger Priors

Shemyakin, Arkady

doi:10.3390/e25020344

Open AccessArticle

Hellinger Information Matrix and Hellinger Priors

by

Arkady Shemyakin

Mathematics Department, University of St. Thomas, 2115 Summit Ave, St. Paul, MN 55105, USA

Entropy 2023, 25(2), 344; https://doi.org/10.3390/e25020344

Submission received: 20 December 2022 / Revised: 16 January 2023 / Accepted: 6 February 2023 / Published: 13 February 2023

(This article belongs to the Special Issue Information Theory and Information Geometry in Dynamical Systems and Machine Learning)

Download

Browse Figures

Versions Notes

Abstract

:

Hellinger information as a local characteristic of parametric distribution families was first introduced in 2011. It is related to the much older concept of the Hellinger distance between two points in a parametric set. Under certain regularity conditions, the local behavior of the Hellinger distance is closely connected to Fisher information and the geometry of Riemann manifolds. Nonregular distributions (non-differentiable distribution densities, undefined Fisher information or denisities with support depending on the parameter), including uniform, require using analogues or extensions of Fisher information. Hellinger information may serve to construct information inequalities of the Cramer–Rao type, extending the lower bounds of the Bayes risk to the nonregular case. A construction of non-informative priors based on Hellinger information was also suggested by the author in 2011. Hellinger priors extend the Jeffreys rule to nonregular cases. For many examples, they are identical or close to the reference priors or probability matching priors. Most of the paper was dedicated to the one-dimensional case, but the matrix definition of Hellinger information was also introduced for higher dimensions. Conditions of existence and the nonnegative definite property of Hellinger information matrix were not discussed. Hellinger information for the vector parameter was applied by Yin et al. to problems of optimal experimental design. A special class of parametric problems was considered, requiring the directional definition of Hellinger information, but not a full construction of Hellinger information matrix. In the present paper, a general definition, the existence and nonnegative definite property of Hellinger information matrix is considered for nonregular settings.

Keywords:

Hellinger information; Hellinger priors; Bayes risk

1. Introduction

Information geometry plays an important role in parametrical statistical analysis. Fisher information is the most common information measure that is instrumental in the construction of lower bounds for quadratic risk (information inequalities of Cramer–Rao type), optimal experimental designs (E-optimality) and noninformative priors in Bayesian analysis (Jeffreys prior). These applications of Fisher information require certain regularity conditions on the distributions of the parametric family, which include the existence and integrability of the partial derivatives of the distribution density function with respect to the components of vector parameter and independence of the density support on the parameter. If these regularity conditions are not satisfied, Cramer–Rao lower bounds might be violated, and Jeffreys prior might not be defined.

There exist a number of ways to define information quantities (for the scalar parameter case) and matrices (for the vector parameter case) in the nonregular cases when Fisher information might not exist. One of such suggestions is the Wasserstein information matrix [1], which has been recently applied to the construction of objective priors [2]. The Wasserstein matrix does not require the differentiablity of the distribution density function, but it cannot be extended to the case of discontinuous densities. This latter case requires a more general definition of information.

Our approach is based on analyzing the local behavior of parametric sets using finite differences of pdf values at two adjacent points instead of derivatives at a point, which allows us to include differentiable densities as a special case but also to treat non-differentiable densities including jumps and other types of nonregular behavior (for classification of nonregularities, see [3]). A logical approach is to use (in lieu of Fisher information) the Hellinger information closely related to the definition of Hellinger distance between adjacent points of the parametric set.

Hellinger information for the case of scalar parameter was first defined in [4] and suggested for the construction of noninformative Hellinger priors. Section 2 is dedicated to the revision of this definition and the relationship of Hellinger information to the information inequalities for the scalar parameter proven in [5,6]. It contains some examples of noninformative Hellinger priors comparing the priors obtained in [4] with more recent results.

Lin et al. [7] extended the definition of Hellinger information to a special multiparameter case, where all components of a parameter expose the same type of nonregularity. This is effective in the resolution of some optimization problems in experimental design. However, the most interesting patterns of local behavior of Hellinger distance bringing about differences in the behavior of matrix lower bounds of the quadratic risk and multidimensional noninformative priors are observed when the parametric distribution family has different orders of nonregularity [3] for different components of the vector parameter. Thus, the main challenge in the construction of a Hellinger information matrix in the general case consists in the necessity to consider different magnitudes of increments in different directions of the vector parametric space.

A general definition of the Hellinger information matrix was attempted in [6] as related to information inequalities and in [4] as related to noninformative priors. Important questions were left out, such as the conditions of Hellinger information matrix being positive definite and the existence of non-trivial matrix lower bounds for the quadratic risk in case of the vector parameter. These questions are addressed in Section 3 of the paper. The main results are formulated, and several new examples are considered. General conclusions and possible future directions of study are discussed in Section 4.

2. Hellinger Information for Scalar Parameter

In this section, we address the case of probability measures parametrized by a single parameter. We provide necessary definitions of information measures along with the discussion of their properties, including the new definition of Hellinger information. Then, we consider the applications of Hellinger information to the information inequalities of the Cramér–Frechet–Rao type, the construction of objective priors, and problems of optimal design.

Definition (2) is modified from [8]. Inequality (5) was obtained in [5]. Examples in Section 2.3 are modified from [4].

2.1. Definitions

A family of probability measures

{P_{θ}, θ \in Θ \subset R}

is defined on a measurable space

(X, B)

so that all the measures from the family are absolutely continuous with respect to some

σ

-finite measure

λ

on

B

. The square of the Hellinger distance between any two parameter values can be defined in terms of densities

p (x; θ) = \frac{d P_{θ}}{d λ}

as

h^{2} (θ_{1}, θ_{2}) = \int_{X} {(\sqrt{p (x; θ_{1})} - \sqrt{p (x; θ_{2})})}^{2} λ (d x) .

(1)

This definition of the Hellinger distance (also known as Hellinger–Bhattacharyya distance) in its modern form was given in [9]. We use this definition to construct a new information measure. If for almost all

θ

from

Θ

(with regard to measure

λ

) there exists an

α \in (0, 2]

(index of regularity) such that

lim_{ϵ \to 0} \frac{h^{2 / α} (θ, θ + ϵ)}{{| ϵ |}^{2}} = J (θ),

(2)

we define Hellinger information at a point

θ

as

J (θ)

. The index of regularity is related to the local behavior of the density

p (x; θ)

. Using the classification of [3],

1 < α < 2

corresponds to singularities of the first and the second type,

α = 1

to densities with jumps, and

0 < α < 1

to singularities of the third type.

Notice that in the regular situations classified in [3] (

p (x; θ)

is twice continuously differentiable with respect to

θ

for almost all

x \in X

with respect to

λ

, the density support

{x : p (x; θ) > 0}

does not depend on parameter

θ

, Fisher information

I (θ) = E_{θ} {(\frac{d}{d θ} log p (x; θ))}^{2} = \int_{X} {(\frac{d}{d θ} p (x; θ))}^{2} / p (x; θ) λ (d x)

(3)

is continuous, strictly positive and finite for almost all

θ

from

Θ

), it is true that

α = 2, J (θ) = \frac{1}{4} I (θ)

. Under the regularity conditions above, the score function

\frac{d}{d θ} log p (x; θ)

has mean zero and Fisher information as variance. This helps to establish the connection of Fisher information to the limiting distribution of maximum likelihood estimators, its additivity with respect to i.i.d. sample observations, and its role in the lower bounds of risk (information inequalities).

Wasserstein information [1] can be defined for the scalar parameter through the c.d.f.

F (x; θ) = \int_{- \infty}^{x} p (x; θ) d θ

as

W (θ) = \int_{X} {(\frac{d}{d θ} F (x; θ) / p (x; θ))}^{2} λ (d x),

which does not require differentiablity of the density function

p (x; θ)

. That opens new possibilities for the construction of an objective prior in the case of non-differentiable densities, see [2].

However, we are interested in even less regular situations (including uniform densities with support depending on parameter) for which neither Fisher information

I (θ)

nor Wasserstein information

W (θ)

can be helpful, while Hellinger information

J (θ)

may function as their substitute.

2.2. Information Inequalities

We define the quadratic Bayes risk for an estimator

θ^{*} = θ^{*} (X^{(n)})

constructed by an independent identically distributed sample

X^{(n)} = (X_{1}, \dots, X_{n})

of size n from the model considered above with

p (x^{(n)}; θ) = \frac{d P_{θ}^{(n)}}{d λ^{n}}

and prior

π (θ)

as

R (θ^{*}) = \int_{X^{n}} \int_{Θ} {(θ^{*} (x^{(n)}) - θ)}^{2} p (x^{(n)}; θ) π (θ) λ^{n} (d x) d θ .

Let us consider an integral version of the classical Cramér–Frechet–Rao inequality, which under certain regularity conditions leads to the following asymptotic lower bound for the Bayes risk in terms of Fisher information:

inf_{θ^{*}} R (θ^{*}) \geq n^{- 1} \int_{Θ} I^{- 1} (θ) π (θ) d θ + O (n^{- 2}) .

(4)

This lower bound, which can be proven to be tight, was first obtained by [10], also in [11,12] under slightly different regularity assumptions. This bound can be extended to the nonregular case, when Fisher information may not exist. One of these extensions is Hellinger information inequality, providing an asymptotic lower bound

inf_{θ^{*}} R (θ^{*}) \geq n^{- 2 / α} C (α) \int_{Θ} J^{- 1} (θ) π (θ) d θ + O (n^{- 2 / α})

(5)

obtained in [6] under the assumptions of Hellinger information

J (θ)

being strictly positive, almost surely continuous, bounded on any compact subset of

Θ

, and satisfying condition

\int_{Θ} J^{- 1} (θ) π (θ) d θ < \infty

, where

Θ

is an open subset of real numbers and the constant

C (α)

is related to technical details of the proof and is not necessarily tight.

The key identity establishing the role of

J (θ)

in the case of i.i.d. samples

x^{(n)} = (x_{1}, \dots, x_{n})

h_{n}^{2} (θ_{1}, θ_{2}) = \int_{X} {(\sqrt{p (x^{(n)}; θ_{1})} - \sqrt{p (x^{(n)}; θ_{2})})}^{2} λ^{n} (d x) = 2 [1 - {(1 - \frac{1}{2} h^{2} (θ_{1}, θ_{2}))}^{n}]

(6)

easily follows from the definition and independence of

{X_{i}}

. Similar to the additivity of Fisher information, it allows for a transition from a single observation to a sample.

2.3. Hellinger Priors

The three most popular ways to obtain a non-informative prior might be as follows:

The Jeffreys rule $π (θ) \propto \sqrt{I (θ)}$ [13],
Probability matching priors [14,15],
Reference priors [16,17].

For many regular parameter families, probability matching and reference priors both satisfy the Jeffreys rule. However, it is not necessary in case of multi-parametric families and the loss of regularity. Let us focus on the nonregular case, when Fisher information may not be defined. Most comprehensive results on reference priors in the nonregular case were obtained in [18].

Define Hellinger prior for the parametric set as in [4,8]:

π_{H} (θ) \propto \sqrt{J (θ)} .

(7)

Hellinger priors will often coincide with well-known priors obtained by the approaches described above. However, there are some distinctions. A special role might be played by Hellinger priors in nonregular cases. We provide two simple examples of densities with support depending on the parameter.

Example 1.

Uniform

X \sim Unif (θ^{- 1}, θ), θ \in (1, \infty)

.

α = 1, π_{H} (θ) \propto J (θ) = \frac{θ^{2} + 1}{θ (θ^{2} - 1)} .

The same prior can be constructed as the probability matching prior g or the reference prior [18].

Example 2.

Uniform

X \sim Unif (θ, θ^{2}), θ \in (1, \infty)

.

α = 1, π_{H} (θ) \propto J (θ) = \frac{2 θ + 1}{θ (θ - 1)} .

(8)

This prior is different from the reference prior obtained in [18] by rather technical calculations, maximizing the Kullback–Leibler divergence from the prior to the posterior:

π_{R 1} (θ) \propto \frac{2 θ - 1}{θ (θ - 1)} exp \{ψ (\frac{2 θ}{2 θ - 1})\},

(9)

where

ψ (z) = \frac{d}{d z} log Γ (z), z > 0

is the polygamma function order 1. Tri Minh Le [19] used a similar approach, maximizing Hellinger distance between the prior and the posterior, and obtained

π_{R 2} (θ) \propto \frac{{(2 θ + 1)}^{2} {(2 θ - 1)}^{5}}{θ^{3} (θ - 1) {[20 \sqrt{π} {(2 θ - 1)}^{2} - 2 {(2 θ + 1)}^{2}]}^{2}} .

(10)

All three priors in (8), (9), and (10) have distinct functional forms. However, they are very close after appropriate re-normalization on the entire domain, which can be demonstrated graphically and numerically. See Figure 1 and the following comment.

For instance, the ratio

π_{H} (θ) / π_{R 2} (θ)

monotonically increases for

θ

from 1 to ∞ so that

0.996 < π_{H} (θ) / π_{R 2} (θ) < 1.083 .

2.4. Optimal Design

A polynomial model of the experimental design may be presented as in [20],

y_{i} = \sum_{k = 0}^{q} θ_{k} x_{i}^{k} + ϵ_{i}, i = 1, \dots, n,

where

x_{i}

are scalars,

θ

is the unknown vector parameter of interest, and errors

ϵ_{i}

are non-negative i.i.d variables with density

p_{0} (y; α) \sim α c (α) y^{α - 1}

(e.g., Weibull or Gamma). The space of balanced designs is defined as

Ξ = {ξ = (w_{i}, x_{i}), i = 1, \dots, n : \sum_{i = 1}^{n} w_{i} x_{i} = 0, x_{i} \in [- A, A]},

and there exist several definitions of optimal design. Lin et al. [7] suggest using criterion

ξ_{o p t} = arg max_{ξ} J_{ξ} (θ), J_{ξ} (θ) = inf_{u} J_{ξ} (θ, u)

where

J_{ξ} (θ, u)

is Hellinger information in the direction

u \in R^{q + 1}

, defined as

lim_{ϵ \to 0} \frac{h (θ, θ + ϵ u)}{{| ϵ |}^{α}} = J_{ξ} (θ, u),

(11)

which is similar to the definition given in Section 1, but notice the difference with (2) in the treatment of powers:

ϵ^{α}

versus

ϵ^{2}

in the denominator. Notice also that index of regularity

α

is assumed to be the same for all components of the vector parameter.

3. Hellinger Information Matrix for Vector Parameter

In this section, we concentrate on the multivariate parameter case allowing for different degrees of regularity for different components of the vector parameter. We define the Hellinger information matrix, determine our understanding of matrix information inequalities, formulate main results establishing lower bounds for the Bayes risk in terms of Hellinger information matrix, and provide examples of Hellinger priors illustrating the conditions of Theorems 1 and 2.

Proofs of the main results use the approach developed in [5]; Example 3 was previously mentioned in [8].

3.1. Definitions

Extending definitions of Section 1 to the vector case

Θ \subset R^{m}, m = 1, 2, \dots

, we first introduce, as in [6], the Hellinger distance matrix H with elements

H_{i j} (θ, U) = \int_{X} (\sqrt{p (x; θ)} - \sqrt{p (x; θ + u_{i})}) (\sqrt{p (x; θ)} - \sqrt{p (x; θ + u_{j})}) λ (d x)

(12)

where increments

u_{i}

are columns of an

m \times m

matrix U. Define also vectors

α = {(α_{1}, \dots, α_{m})}^{'}

(index of regularity with components

0 < α_{i} \leq 2

) and

δ = {(δ_{1}, \dots, δ_{m})}^{'}

with components

δ_{i} = ϵ^{2 / α_{i}}

,

Δ = D i a g (δ)

such that for all

i = 1, \dots, m

there exist finite non-degenerate limits

0 < lim_{ϵ \to 0} {| ϵ |}^{- 2} H_{i i}^{2 / α_{i}} (θ, Δ) < \infty

Then, the Hellinger information matrix will be defined by its components

J_{i j} (θ) = lim_{ϵ \to 0} {| ϵ |}^{- 2} H_{i j}^{1 / α_{i} + 1 / α_{j}} (θ, Δ) .

(13)

Notice that components of the vector index of regularity

α

can be different, and therefore, components of the vector of increments

δ

can have different orders of magnitude with respect to

ϵ

. As a result, while the elements of matrix

H (θ, Δ)

may expose different local behavior depending on the components of

α

, the elements of matrix

J (θ)

are all finite.

3.2. Information Inequalities

Define the matrix of Bayes risk for an i.i.d. sample

x^{(n)} = (x_{1}, \dots, x_{n})

with

p (x^{(n)}; θ) = \frac{d P_{θ}^{(n)}}{d λ^{n}}

as

R (θ^{*}) = \int_{X^{(n)}} \int_{Θ} (θ^{*} (x^{(n)}) - θ) {(θ^{*} (x^{(n)}) - θ)}^{'} p (x^{(n)}; θ) π (θ) λ^{n} (d x) d θ

(14)

and matrix ordering

A (n) ⪰ B (n)

as asymptotic positive semi-definite property

inf_{| v | = 1} \sum_{i = 1}^{m} \sum_{j = 1}^{m} {(A (n) - B (n))}_{i j} v_{i} v_{j} \geq - δ_{n}, lim_{n \to \infty} δ_{n} | | A (n) - {B (n) | |}^{- 1} = 0, δ_{n} > 0 .

(15)

Let also

E (g (X^{(n)}, θ)) = \int_{X^{(n)}} \int_{Θ} g (x^{(n)}, θ) p (x^{(n)}; θ) π (θ) λ^{n} (d x) d θ

denote the expectation over

(X^{(n)}, Θ)

. The following results formulate conditions under which the lower bounds for risk (14) in the sense of (15) are obtained in terms of Hellinger information and the index of regularity.

3.3. Main Results

Theorem 1.

Let

α = {(2, \dots, 2, α_{m})}^{'}

,

α_{m} \in [1, 2)

,

J_{i j} (θ) = I_{i j} (θ), i, j = 1, \dots, m - 1; J_{i m} = 0, J_{m m} \neq 0 .

(\begin{matrix} 0 \\ I (θ) & \dots \\ 0 \\ 0 & \dots & 0 & J_{m m} (θ) \end{matrix})

Then, if

E I^{- 1} (θ) < \infty

,

E J_{m m}^{- 1} (θ) < \infty

, it is true that

1.: $J (θ) > 0$
2.: ${inf}_{θ^{*}} R (θ^{*}) ⪰ C_{1} (α_{m}) D i a g (n^{- 2 / α_{m}}) E J^{- 1} (θ)$

Theorem 2.

Let

α_{1} = \dots = α_{m} = 1

,

J_{i j} (θ) = 0, i \neq j; J_{i i} \neq 0 .

(\begin{matrix} J_{11} (θ) & 0 & \dots & 0 \\ 0 & J_{22} (θ) & 0 & \dots \\ \dots & 0 & \dots & 0 \\ 0 & \dots & 0 & J_{m m} (θ) \end{matrix})

Then, if

E J_{i i}^{- 1} (θ) < \infty

, it is true that

1.: $J (θ) > 0$
2.: ${inf}_{θ^{*}} R (θ^{*}) ⪰ C_{2} n^{- 2} E J^{- 1} (θ)$

Proofs of Theorems 1 and 2 are technically similar to the proof of the main results of [5], although the definition of Hellinger information was not explicitly provided in that paper.

3.4. Hellinger Priors

If

J > 0

, as in the conditions of Theorems 1 and 2, the vector Hellinger prior can be defined as

π_{H} (θ) \propto \sqrt{det J (θ)} .

In the case of all components

α_{i} \equiv 2

, Hellinger information reduces to the Fisher information matrix, and our approach leads to the Jeffreys prior [12].

Example 3.

Truncated Weibull distribution (see Theorem 1):

p (x; β, ϕ, τ) = β ϕ^{β} x^{β - 1} exp {- ϕ^{β} (x^{β} - τ^{β})}, x \in [τ, \infty)

with two parameters of interest: regular pseudo-scale

ϕ > 0

and nonregular threshold

τ > 0

. Assume

β > 2

fixed. See Figure 2.

Using notation

θ = (ϕ, τ)

, obtain

α_{1} = 2, α_{2} = 1, δ_{1} = ϵ, δ_{2} = ϵ^{2}

and

H (θ, Δ) = (\begin{matrix} δ_{1} ϕ^{- 2} & o (δ_{1} δ_{2}) \\ o (δ_{1} δ_{2}) & δ_{2} ϕ^{2 β} τ^{2 β - 2} \end{matrix}) .

After limit transition in

δ

,

J (θ) = (\begin{matrix} ϕ^{- 2} & 0 \\ 0 & ϕ^{2 β} τ^{2 β - 2} \end{matrix}) .

Therefore,

π_{H} (θ) \propto ϕ^{β - 1} τ^{β - 1}

, which is also the reference prior for the vector parameter [21].

Example 4.

Circular beta distribution on a disc (see Theorem 1):

p (r, w; β_{1}, β_{2}, ρ) = \frac{r}{π ρ^{2}} {(\frac{w}{2 π})}^{β_{1} - 1} {(1 - \frac{w}{2 π})}^{β_{1} - 1}

with three parameters of interest: regular

β_{1}, β_{2} > 1

and nonregular radius

ρ > 0

. See Figure 3.

J (θ) = (\begin{matrix} ψ (β_{1}) - ψ (β_{1} + β_{2}) & - ψ (β_{1} + β_{2}) & 0 \\ - ψ (β_{1} + β_{2}) & ψ (β_{1}) - ψ (β_{1} + β_{2}) & 0 \\ 0 & 0 & 4 ρ^{- 2} \end{matrix}),

where

ψ (z)

is the polygamma function of order 1.

Therefore,

π_{H} (θ) \propto ρ^{- 1} \sqrt{ψ (β_{1}) ψ (β_{2}) - ψ (β_{1} + β_{2}) [ψ (β_{1}) + ψ (β_{2})]}

Example 5.

Uniform on a rectangle (see Theorem 2):

X \sim Unif (θ, θ^{2}), θ \in (1, \infty)

with two parameters of interest: regular pseudo-scale

ϕ > 0

and nonregular threshold

τ > 0 .

Using notation

θ = (θ_{1}, θ_{2})

, obtain

α_{1} = α_{2} = 1, δ = δ_{1} = δ_{2} = ϵ^{2}

and

H (θ, Δ) = (\begin{matrix} δ θ_{1}^{- 2} & δ^{2} {(θ_{1} θ_{2})}^{- 1} \\ δ^{2} {(θ_{1} θ_{2})}^{- 1} & δ θ_{2}^{- 2} \end{matrix}) .

After the limit transition,

J (θ) = (\begin{matrix} θ_{1}^{- 2} & 0 \\ 0 & θ_{2}^{- 2} \end{matrix}) .

Therefore,

π_{H} (θ) \propto θ_{1}^{- 1} θ_{2}^{- 1}

, also the reference prior.

Example 6.

Uniform with two moving boundaries

X \sim Unif (θ_{1}, θ_{1} + θ_{2}), θ_{2} \in (0, \infty)

(neither Theorem 1 nor Theorem 2 applies): using notation

θ = (θ_{1}, θ_{2})

, obtain

α_{1} = α_{2} = 1, δ = δ_{1} = δ_{2} = ϵ^{2}

and

H (θ, Δ) = (\begin{matrix} 2 δ θ_{2}^{- 2} & δ θ_{2}^{- 2} \\ δ θ_{2}^{- 2} & δ θ_{2}^{- 2} \end{matrix}) .

After limit transition,

J (θ) = (\begin{matrix} 2 θ_{2}^{- 2} & θ_{2}^{- 2} \\ θ_{2}^{- 2} & θ_{2}^{- 2} \end{matrix}) .

Therefore,

π_{H} (θ) \propto θ_{2}^{- 2}

, also the reference prior.

4. Discussion

A Hellinger information matrix can be defined in a reasonable way to serve as a substitute for the Fisher information matrix in multivariate nonregular cases. It can be used as a technically simple tool for the elicitation of non-informative priors.

Properties of the Hellinger distance (symmetry, etc.) grant certain advantages vs analogous constructions based on Kullback–Leibler divergence.

Some interesting nonregularities are not covered by the conditions of Theorems 1 and 2 (see Example 6). More general results related to a positive definite property of

J (θ)

would be interesting.

It is tempting to obtain Hellinger priors as the solution of a particular optimization problem (similar to the reference priors).

Funding

This research was supported by the Center for Applied Mathematics at the University of St. Thomas.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The author is grateful for helpful discussions at the online Borovkov Readings and O-Bayes-22 in Santa Cruz, CA in August 2022. The author is also thankful for many helpful comments by the reviewers.

Conflicts of Interest

The author declares no conflict of interest.

References

Li, W.; Zhao, J. Wasserstein information matrix. arXiv 2019, arXiv:1910.11248. [Google Scholar]
Li, W.; Rubio, F.J. On a prior based on the Wasserstein information matrix. arXiv 2022, arXiv:2202.03217. [Google Scholar] [CrossRef]
Ibragimov, I.A.; Has’minskii, R.Z. Statistical Estimation: Asymptotic Theory; Springer: New York, NY, USA, 1981. [Google Scholar]
Shemyakin, A. A New Approach to Construction of Objective Priors: Hellinger Information. Appl. Econom. 2012, 28, 124–137. (In Russian) [Google Scholar]
Shemyakin, A. Rao-Cramer type multidimensional integral inequalities for parametric families with singularities. Sib. Math. J. 1991, 32, 706–715. (In Russian) [Google Scholar] [CrossRef]
Shemyakin, A. On Information Inequalities in the Parametric Estimation. Theory Probab. Appl. 1992, 37, 89–91. [Google Scholar] [CrossRef]
Lin, Y.; Martin, R.; Yang, M. On Optimal Designs for Nonregular Models. Ann. Statist. 2019, 47, 3335–3359. [Google Scholar] [CrossRef]
Shemyakin, A. Hellinger Distance and Non-informative Priors. Bayesian Anal. 2014, 9, 923–938. [Google Scholar] [CrossRef]
Bhattacharyya, A. On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc. 1943, 35, 99–109. [Google Scholar]
Bobrovsky, B.Z.; Mayer-Wof, E.; Zakai, M. Some classes of global Cramer-Rao bounds. Ann. Statist. 1987, 15, 1421–1438. [Google Scholar] [CrossRef]
Brown, L.D.; Gajek, L. Information inequalities for the Bayes risk. Ann. Statist. 1990, 18, 1578–1594. [Google Scholar] [CrossRef]
Jeffreys, H. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. Ser. Math. Phys. Sci. 1946, 186, 453–461. [Google Scholar]
Ghosal, S. Probability matching priors for non-regular cases. Biometrika 1999, 86, 956–964. [Google Scholar] [CrossRef]
Berger, J.O.; Bernardo, J.M.T. On the development of reference priors (with discussion). Bayesian Anal. 1992, 4, 35–60. [Google Scholar]
Ghosal, S.; Samanta, T. Expansion of Bayes risk for entropy loss and reference prior in nonregular cases. Statist. Decis. 1997, 15, 129–140. [Google Scholar] [CrossRef]
Berger, J.O.; Bernardo, J.M.; Sun, D. The formal definition of reference priors. Ann. Statist. 2009, 37, 905–938. [Google Scholar] [CrossRef]
Sun, D.; Berger, J.O. Reference priors with partial information. Biometrika 1998, 85, 55–71. [Google Scholar] [CrossRef]
Le, T.M. The Formal Definition of Reference Priors under a General Class of Divergence. Electronic Dissertations, University of Missouri-Columbia, Columbia, MO, USA, 2014. [Google Scholar] [CrossRef]
Smith, R.L. Nonregular regression. Biometrika 1994, 81, 173–183. [Google Scholar] [CrossRef]
Ghosal, S.; Ghosh, J.; Samanta, T. On convergence of posterior distributions. Ann. Statist. 1995, 23, 2145–2152. [Google Scholar] [CrossRef]
Sun, D. A note on non-informative priors for Weibull distribution. J. Stat. Plan. Inference 1997, 61, 319–338. [Google Scholar] [CrossRef]

Figure 1. Three priors for Example 2:

π_{H} (θ), π_{R 1} (θ), π_{R 2} (θ)

.

Figure 1. Three priors for Example 2:

π_{H} (θ), π_{R 1} (θ), π_{R 2} (θ)

.

Figure 2. Truncated Weibull

p (x

; 5, 0.1, 0);

p (x

; 5, 0.11, 0);

p (x

; 5, 0.1, 5).

Figure 2. Truncated Weibull

p (x

; 5, 0.1, 0);

p (x

; 5, 0.11, 0);

p (x

; 5, 0.1, 5).

Figure 3. Truncated beta on a disc.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Shemyakin, A. Hellinger Information Matrix and Hellinger Priors. Entropy 2023, 25, 344. https://doi.org/10.3390/e25020344

AMA Style

Shemyakin A. Hellinger Information Matrix and Hellinger Priors. Entropy. 2023; 25(2):344. https://doi.org/10.3390/e25020344

Chicago/Turabian Style

Shemyakin, Arkady. 2023. "Hellinger Information Matrix and Hellinger Priors" Entropy 25, no. 2: 344. https://doi.org/10.3390/e25020344

APA Style

Shemyakin, A. (2023). Hellinger Information Matrix and Hellinger Priors. Entropy, 25(2), 344. https://doi.org/10.3390/e25020344

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Hellinger Information Matrix and Hellinger Priors

Abstract

1. Introduction

2. Hellinger Information for Scalar Parameter

2.1. Definitions

2.2. Information Inequalities

2.3. Hellinger Priors

2.4. Optimal Design

3. Hellinger Information Matrix for Vector Parameter

3.1. Definitions

3.2. Information Inequalities

3.3. Main Results

3.4. Hellinger Priors

4. Discussion

Funding

Institutional Review Board Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI