1. Introduction
Probability theory, information theory, learning theory, statistical signal processing and other related disciplines greatly benefit from non-negative measures of dissimilarity (a.k.a. divergence measures) between pairs of probability measures defined on the same measurable space (see, e.g., [1,2,3,4,5,6,7]). An axiomatic characterization of information measures, including divergence measures, was provided by Csiszár [8]. Many useful divergence measures belong to the set of f-divergences, independently introduced by Ali and Silvey [9], Csiszár [10,11,12,13], and Morimoto [14] in the early sixties. The family of f-divergences generalizes the relative entropy (a.k.a. the Kullback–Leibler divergence) while also satisfying the data processing inequality, among other pleasing properties (see, e.g., [3] and references therein).
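To make this family concrete, the following minimal sketch (ours, not part of the paper) evaluates D_f(P||Q) = Σ_x Q(x) f(P(x)/Q(x)) for discrete distributions with full support, and recovers the relative entropy, the chi-squared divergence and the total variation distance (with the normalization |P−Q| = Σ_x |P(x)−Q(x)|) as special cases; the distributions below are arbitrary illustrative choices.

```python
# A hedged numerical sketch: a generic f-divergence for discrete P, Q with Q(x) > 0,
# D_f(P||Q) = sum_x Q(x) f(P(x)/Q(x)), using natural logarithms throughout.
import numpy as np

def f_divergence(p, q, f):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

kl   = f_divergence(p, q, lambda t: t * np.log(t))    # relative entropy D(P||Q)
chi2 = f_divergence(p, q, lambda t: (t - 1.0) ** 2)   # chi-squared divergence
tv   = f_divergence(p, q, lambda t: np.abs(t - 1.0))  # total variation |P - Q|
print(kl, chi2, tv)
```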
Integral representations of f-divergences serve to study properties of these information measures, and they are also used to establish relations among these divergences. An integral representation of f-divergences, expressed by means of the DeGroot statistical information, was provided in [3], with a simplified proof in [15]. The importance of this integral representation stems from the operational meaning of the DeGroot statistical information [16], which is strongly linked to Bayesian binary hypothesis testing. Some earlier specialized versions of this integral representation were introduced in [17,18,19,20,21], and a variation of it also appears in [22] Section 5.B. Implications of the integral representation of f-divergences by means of the DeGroot statistical information include an alternative proof of the data processing inequality, and a study of conditions for the sufficiency or ε-deficiency of observation channels [3,15].
Since many distance measures of interest fall under the paradigm of an f-divergence [23], bounds among f-divergences are very useful in many instances, such as the analysis of rates of convergence and concentration of measure bounds, hypothesis testing, testing goodness of fit, minimax risk in estimation and modeling, strong data processing inequalities and contraction coefficients, etc. Earlier studies developed systematic approaches to obtain f-divergence inequalities while dealing with pairs of probability measures defined on arbitrary alphabets. A list of some notable existing f-divergence inequalities is provided, e.g., in [22] Section 1 and [23] Section 3. State-of-the-art techniques which serve to derive bounds among f-divergences include:
- (1) Moment inequalities which rely on log-convexity arguments ([22] Section 5.D, [24,25,26,27,28]);
- (2) Inequalities which rely on a characterization of the exact locus of the joint range of f-divergences [29];
- (3) f-divergence inequalities via functional domination ([22] Section 3, [30,31,32]);
- (4) Sharp f-divergence inequalities by using numerical tools for maximizing or minimizing an f-divergence subject to a finite number of constraints on other f-divergences [33];
- (5) Inequalities which rely on powers of f-divergences defining a distance [34,35,36,37];
- (6) Vajda and Pinsker-type inequalities for f-divergences ([4,10,13], [22] Sections 6–7, [38,39]);
- (7) Bounds among f-divergences when the relative information is bounded ([22] Sections 4–5, [40,41,42,43,44,45,46,47]), and reverse Pinsker inequalities ([22] Section 6, [40,48]);
- (8) Inequalities which rely on the minimum of an f-divergence for a given total variation distance and related bounds [4,33,37,38,49,50,51,52,53];
- (9) Bounds among f-divergences (or functions of f-divergences, such as the Rényi divergence) via integral representations of these divergence measures ([22] Section 8);
- (10) Inequalities which rely on variational representations of f-divergences (e.g., [54] Section 2); a small numerical sketch of one such variational representation is given after this list.
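As promised in item (10), the following small sketch (ours, not from the references) illustrates one classical variational representation, the Donsker–Varadhan formula D(P||Q) = sup_g { E_P[g(X)] − log E_Q[exp(g(X))] }; the discrete distributions and the random perturbations are illustrative assumptions.

```python
# The Donsker-Varadhan variational representation of the relative entropy:
# the supremum over test functions g is attained at g = log(dP/dQ) (up to a constant).
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

def dv_objective(g):
    return float(np.dot(p, g) - np.log(np.dot(q, np.exp(g))))

kl = float(np.sum(p * np.log(p / q)))
g_star = np.log(p / q)
print(kl, dv_objective(g_star))              # the two values coincide

rng = np.random.default_rng(0)               # perturbed test functions never exceed D(P||Q)
print(max(dv_objective(g_star + 0.3 * rng.standard_normal(3)) for _ in range(100)) <= kl + 1e-12)
```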
Following earlier studies of the local behavior of f-divergences and their asymptotic properties (see related results by Csiszár and Shields [55] Theorem 4.1, Pardo and Vajda [56] Section 3, and Sason and Verdú [22] Section 3.F), it is known that the local behavior of f-divergences scales like the chi-square divergence (up to a scaling factor which depends on f), provided that the first distribution approaches the reference measure in a certain strong sense. The study of the local behavior of f-divergences is an important aspect of their properties, and we further study it in this work.
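The following numerical sketch (ours, not part of the paper) illustrates this local behavior for two standard f-divergences: along the path P_ε = (1−ε)Q + εR, the relative entropy behaves like χ²/2 and the squared Hellinger distance like χ²/4 as ε → 0, in agreement with the scaling factor f''(1)/2; the specific distributions are arbitrary choices.

```python
# Local behavior check: D_f(P_eps || Q) / ((f''(1)/2) * chi^2(P_eps || Q)) -> 1 as eps -> 0.
import numpy as np

q = np.array([0.4, 0.4, 0.2])
r = np.array([0.2, 0.3, 0.5])

def kl(p, q):         return float(np.sum(p * np.log(p / q)))          # f''(1) = 1
def chi2(p, q):       return float(np.sum((p - q) ** 2 / q))
def hellinger2(p, q): return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))  # f''(1) = 1/2

for eps in (1e-1, 1e-2, 1e-3):
    p = (1 - eps) * q + eps * r
    print(eps, kl(p, q) / (0.5 * chi2(p, q)), hellinger2(p, q) / (0.25 * chi2(p, q)))
# both ratios approach 1 as eps decreases
```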
This paper considers properties of f-divergences, while first introducing in Section 2 the basic definitions and notation needed, and in particular the various measures of dissimilarity between probability measures used throughout this paper. The presentation of our new results is then structured as follows:

Section 3 is focused on the derivation of new integral representations of f-divergences, expressed as a function of the relative information spectrum of the pair of probability measures and the convex function f. The novelty of Section 3 is in the unified approach which leads to integral representations of f-divergences by means of the relative information spectrum, where the latter cumulative distribution function plays an important role in information theory and statistical decision theory (see, e.g., [7,54]). Particular integral representations of the type of results introduced in Section 3 have been recently derived by Sason and Verdú on a case-by-case basis for some f-divergences (see [22] Theorems 13 and 32), while lacking the approach which is developed in Section 3 for general f-divergences. In essence, an f-divergence is expressed in Section 3 as an inner product of a simple function of the relative information spectrum (depending only on the probability measures P and Q) and a non-negative weight function which only depends on f. This kind of representation, followed by a generalized result, serves to provide new integral representations of various useful f-divergences. It also enables us in
Section 3 to characterize the interplay between the DeGroot statistical information (or another useful family of f-divergences, namely the E_γ divergence with γ ≥ 1) and the relative information spectrum.
Section 4 provides a new approach for the derivation of f-divergence inequalities, where an arbitrary f-divergence is lower bounded by means of the E_γ divergence [57] or the DeGroot statistical information [16]. The approach used in Section 4 yields several generalizations of the Bretagnolle–Huber inequality [58], which provides a simple closed-form upper bound on the total variation distance as a function of the relative entropy; the Bretagnolle–Huber inequality has proved to be useful, e.g., in the context of lower bounding the minimax risk in non-parametric estimation (see, e.g., [5] pp. 89–90, 94), and in the problem of density estimation (see, e.g., [6] Section 1.6). Although Vajda's tight lower bound in [59] is slightly tighter everywhere than the Bretagnolle–Huber inequality, our motivation for the generalization of the latter bound is justified later in this paper. The utility of the new inequalities is exemplified in the setup of Bayesian binary hypothesis testing.
Section 5 finally derives new results on the local behavior of f-divergences, i.e., the characterization of their scaling when the pair of probability measures are sufficiently close to each other. The starting point of our analysis in Section 5 relies on the analysis in [56] Section 3, regarding the asymptotic properties of f-divergences.
3. New Integral Representations of f-Divergences
The main result in this section provides new integral representations of f-divergences as a function of the relative information spectrum (see Definition 2). The reader is referred to other integral representations (see [15] Section 2, [4] Section 5, [22] Section 5.B, and references therein), expressing a general f-divergence by means of the DeGroot statistical information or the E_γ divergence.
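Before the formal statements, the following minimal sketch (ours, not from the paper) computes the relative information log(dP/dQ) and its spectrum, taken here as the CDF of the relative information evaluated at X ~ P, for a simple discrete pair; this is the object through which the representations below are expressed. The distributions and the sanity check at the end are illustrative assumptions.

```python
# Relative information and its spectrum for discrete P, Q with full support.
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
info = np.log(p / q)                       # relative information per symbol (nats)

def spectrum(x):
    """F(x) = P[ log(dP/dQ)(X) <= x ] with X distributed according to P."""
    return float(np.sum(p[info <= x]))

print([spectrum(x) for x in (-0.3, 0.0, 0.1, 0.3)])
# sanity check: the mean of the relative information under P is the relative entropy
print(float(np.dot(p, info)), float(np.sum(p * np.log(p / q))))
```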
Lemma 1. Letbe a strictly convex function at 1. Letbe defined aswheredenotes the right-hand derivative of f at 1 (due to the convexity of f on,
it exists and it is finite). Then, the function g is non-negative, it is strictly monotonically decreasing on,
and it is strictly monotonically increasing onwith.
Proof. For any function , let be given by and let be the conjugate function, as given in (12). The function g in (54) can be expressed in the form (56), as is next verified. For , we get from (12) and (55), and the substitution for yields (56) in view of (54).

By assumption, is strictly convex at 1, and therefore these properties are inherited by . Since also , it follows from [3] Theorem 3 that both and are non-negative on , and they are also strictly monotonically decreasing on . Hence, from (12), it follows that the function is strictly monotonically increasing on . Finally, the claimed properties of the function g follow from (56), and in view of the fact that the function is non-negative with , strictly monotonically decreasing on and strictly monotonically increasing on . ☐
Lemma 2. Letbe a strictly convex function at 1, and letbe as in (
54)
. Letand let and be the two inverse functions of g. Then, Proof. In view of Lemma 1, it follows that is strictly monotonically increasing and is strictly monotonically decreasing with .
Let a and b be as defined in (58) and (59), respectively. Then, (60) follows from the chain of steps (61)–(70), where (61) relies on Proposition 1; (62) relies on Proposition 2; (64) follows from (3); (65) follows from (56); (66) holds by the definition of the random variable V; (67) holds in view of Lemma 1 and the identity E[Z] = ∫_0^∞ P[Z > t] dt, which holds for any non-negative random variable Z; (68) holds in view of the monotonicity properties of g in Lemma 1, the definition of a and b in (58) and (59), and by expressing the event as a union of two disjoint events; (69) holds again by the monotonicity properties of g in Lemma 1, and by the definition of its two inverse functions as above; in (67)–(69) we are free to substitute > by ≥, and < by ≤; finally, (70) holds by the definition of the relative information spectrum in (4). ☐
Remark 1. The functionin (
54)
is invariant to the mapping ,
for ,
with an arbitrary .
This invariance of g (and, hence, also the invariance of its inverse functions and )
is well expected in view of Proposition 1 and Lemma 2.
Example 1. For the chi-squared divergence in (
26)
, letting f be as in (
27)
, it follows from (
54)
thatwhich yields, from (
58)
and (59), .
Calculation of the two inverse functions of g, as defined in Lemma 2, yields the following closed-form expression:
Substituting (
72)
into (
60)
provides an integral representation of .
Proof. Let
. Then, we have
where (
74) holds by (
4); (
75) follows from (
3); (
76) holds by the substitution
; (
77) holds since
, and finally (
78) holds since
. ☐
Remark 2. Unlike Example 1, in general, the two inverse functions in Lemma 2 are not expressible in closed form, motivating our next integral representation in Theorem 1.
The following theorem provides our main result in this section.
Theorem 1. The following integral representations of an f-divergence, by means of the relative information spectrum, hold:
- (1)
Let
- -
be differentiable on;
- -
be the non-negative weight function given, for,
by - -
the functionbe given by
- (2)
More generally, for an arbitrary,
letbe a modified real-valued function defined as
Proof. We start by proving the special integral representation in (81), and then extend our proof to the general representation in (83).
- (1)
We first assume an additional requirement that
f is strictly convex at 1. In view of Lemma 2,
Since by assumption
is differentiable on
and strictly convex at 1, the function
g in (
54) is differentiable on
. In view of (
84) and (
85), substituting
in (
60) for
implies that
where
is given by
for
, where (
88) follows from (
54). Due to the monotonicity properties of
g in Lemma 1, (
87) implies that
for
, and
for
. Hence, the weight function
in (
79) satisfies
The combination of (
80), (
86) and (
89) gives the required result in (
81).
We now extend the result in (
81) when
is differentiable on
, but not necessarily strictly convex at 1. To that end, let
be defined as
This implies that
is differentiable on
, and it is also strictly convex at 1. In view of the proof of (81) under the strict convexity of f at 1, the application of this result to the function s in (90) yields
In view of (
6), (
22), (
23), (
25) and (
90),
from (
79), (
89), (
90) and the convexity and differentiability of
, it follows that the weight function
satisfies
for
. Furthermore, by applying the result in (
81) to the chi-squared divergence
in (
25) whose corresponding function
for
is strictly convex at 1, we obtain
Finally, the combination of (
91)–(
94), yields
; this asserts that (
81) also holds by relaxing the condition that
f is strictly convex at 1.
- (2)
In view of (
80)–(
82), in order to prove (
83) for an arbitrary
, it is required to prove the identity
Equality (
95) can be verified by Lemma 3: by rearranging terms in (
95), we get the identity in (
73) (since
). ☐
Remark 3. Due to the convexity of f, the absolute value in the right side of (
79)
is only needed for (see (
88)
and (
89)
). Also, since .
Remark 4. The weight functiononly depends on f, and the functiononly depends on the pair of probability measures P and Q. In view of Proposition 1, it follows that, for,
the equalityholds onif and only if (
11)
is satisfied with an arbitrary constant .
It is indeed easy to verify that (
11)
yields on .
Remark 5. An equivalent way to writein (
80)
iswhere .
Hence, the function is monotonically increasing in ,
and it is monotonically decreasing in ;
note that this function is in general discontinuous at 1 unless .
If ,
then Note that if, thenis zero everywhere, which is consistent with the fact that.
Remark 6. In the proof of Theorem 1-(1), the relaxation of the condition of strict convexity at 1 for a differentiable function is crucial, e.g., for the divergence with . To clarify this claim, note that, in view of (32), the function is differentiable if , and with ; however, if , so it is not strictly convex at 1 unless .

Remark 7. Theorem 1-(1) with enables, in some cases, a simplification of the integral representations of f-divergences. This is next exemplified in the proof of Theorem 2.
Theorem 1 yields integral representations for various f-divergences and related measures; some of these representations were previously derived by Sason and Verdú in [22] on a case-by-case basis, without the unified approach of Theorem 1. We next provide such integral representations. Note that, for some f-divergences, the function is not differentiable on ; hence, Theorem 1 is not necessarily directly applicable.
Theorem 2. The following integral representations hold as a function of the relative information spectrum:
- (1) Relative entropy [22] (219):
- (2) Hellinger divergence of order [22] (434) and (437): In particular, the chi-squared divergence, squared Hellinger distance and Bhattacharyya distance satisfy , where (100) appears in [22] (439).
- (3) Rényi divergence [22] (426) and (427): For ,
- (4) divergence: For . In particular, the following identities hold for the total variation distance: , where (105) appears in [22] (214).
- (5) DeGroot statistical information:
- (6) Triangular discrimination:
- (7) Lin's measure: For , where denotes the binary entropy function. Specifically, the Jensen-Shannon divergence admits the integral representation:
- (8)
- (9) divergence: For ,
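The next sketch (ours, not part of the paper) illustrates the E_γ divergence numerically, taking as an assumption the standard choice f(t) = max(t − γ, 0) with γ ≥ 1 for the definition in (45): computed as an f-divergence, it coincides with the tail form P[dP/dQ > γ] − γ Q[dP/dQ > γ], which depends on P and Q only through the distribution of the likelihood ratio, in the spirit of the representations above.

```python
# E_gamma divergence two ways, for discrete P, Q with Q(x) > 0.
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

def e_gamma_fdiv(p, q, gamma):
    t = p / q                                        # likelihood ratio dP/dQ
    return float(np.sum(q * np.maximum(t - gamma, 0.0)))

def e_gamma_tail(p, q, gamma):
    mask = p > gamma * q
    return float(np.sum(p[mask]) - gamma * np.sum(q[mask]))

for gamma in (1.0, 1.1, 1.2, 1.5):
    print(gamma, e_gamma_fdiv(p, q, gamma), e_gamma_tail(p, q, gamma))
# the two computations agree; at gamma = 1, E_1(P||Q) = |P - Q| / 2
```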
An application of (112) yields the following interplay between the E_γ divergence and the relative information spectrum.
Theorem 3. Let , and let the random variable have no probability masses. Denote Then,
is a continuously differentiable function of γ on , and ;
the sets and determine, respectively, the relative information spectrum on and ;
Proof. We start by proving the first item. By our assumption,
is continuous on
. Hence, it follows from (
112) that
is continuously differentiable in
; furthermore, (
45) implies that
is monotonically decreasing in
, which yields
.
We next prove the second and third items together. Let
and
. From (
112), for
,
which yields (
115). Due to the continuity of
, it follows that the set
determines the relative information spectrum on
.
To prove (
116), we have
where (
120) holds by switching
P and
Q in (
46); (
121) holds since
; (
122) holds by switching
P and
Q in (
115) (correspondingly, also
and
are switched); (
123) holds since
; (
124) holds by the assumption that
has no probability masses, which implies that the sign < can be replaced with ≤ at the term
in the right side of (
123). Finally, (
116) readily follows from (
120)–(
124), which implies that the set
determines
on
.
Equalities (117) and (118) finally follow by letting , respectively, on both sides of (115) and (116). ☐
A similar application of (107) yields an interplay between the DeGroot statistical information and the relative information spectrum.
Theorem 4. Let , and let the random variable have no probability masses. Denote Then,
is a continuously differentiable function of ω on ,and is, respectively, non-negative or non-positive on and ; the sets and determine, respectively, the relative information spectrum on and ;
for for and
Remark 8. By relaxing the condition in Theorems 3 and 4 where has no probability masses with , it follows from the proof of Theorem 3 that each one of the sets determines at every point on where this relative information spectrum is continuous. Note that, as a cumulative distribution function, is discontinuous at a countable number of points. Consequently, under the condition that is differentiable on , the integral representations of in Theorem 1 are not affected by the countable number of discontinuities for .

In view of Theorems 1, 3 and 4 and Remark 8, we get the following result.
Corollary 1. Let be a differentiable function on , and let be probability measures. Then, each one of the sets and in (131) and (132), respectively, determines .

Remark 9. Corollary 1 is supported by the integral representation of in [3] Theorem 11, expressed as a function of the set of values in , and its analogous representation in [22] Proposition 3 as a function of the set of values in . More explicitly, [3] Theorem 11 states that if , then where is a certain σ-finite measure defined on the Borel subsets of ; it is also shown in [3] (80) that if is twice differentiable on , then

4. New f-Divergence Inequalities
Various approaches for the derivation of f-divergence inequalities were studied in the literature (see Section 1 for references). This section suggests a new approach, leading to a lower bound on an arbitrary f-divergence by means of the E_γ divergence of an arbitrary order γ ≥ 1 (see (45)) or the DeGroot statistical information (see (50)). This approach leads to generalizations of the Bretagnolle–Huber inequality [58]; these generalizations are further motivated later in this section. The utility of the f-divergence inequalities in this section is exemplified in the setup of Bayesian binary hypothesis testing.
In the following, we provide the first main result in this section for the derivation of new f-divergence inequalities by means of the E_γ divergence. Generalizing the total variation distance, the E_γ divergence in (45)–(47) is an f-divergence whose utility in information theory has been exemplified in [17] Chapter 3, [54], [57] p. 2314 and [69]; the properties of this measure were studied in [22] Section 7 and [54] Section 2.B.
Theorem 5. Let , and let be the conjugate convex function as defined in (12). Let P and Q be probability measures. Then, for all ,

Proof. Let and be the densities of P and Q with respect to a dominating measure . Then, for an arbitrary , where (139) follows from the convexity of and by invoking Jensen's inequality. Setting with gives and where (146) follows from (143) by setting . Substituting (143) and (146) into the right side of (139) gives (135). ☐
An application of Theorem 5 gives the following lower bounds on the Hellinger and Rényi divergences with arbitrary positive orders, expressed as a function of the E_γ divergence with an arbitrary order γ ≥ 1.
Corollary 2. For all and ,

Proof. Inequality (147), for , follows from Theorem 5 and (22); for , it holds in view of Theorem 5 and equalities (17) and (24). Inequality (148), for , follows from (30) and (147); for , it holds in view of (24), (147) and since . ☐
Specialization of Corollary 2 for in (147) and in (148) gives the following result.

Corollary 3. For , the following upper bounds on the E_γ divergence hold as a function of the relative entropy and of the chi-squared divergence:

Remark 10. The bound in (151) is a tight lower bound on the chi-squared divergence as a function of the total variation distance. In view of (49), we compare (151) with the specialized version of (149) when . The latter bound is expected to be looser than the tight bound in (151), as a result of the use of Jensen's inequality in the proof of Theorem 5; however, it is interesting to examine how much we lose in the tightness of this specialized bound with . From (49), the substitution of in (149) gives , and it can be easily verified that if , then the lower bound in the right side of (152) is smaller than the tight lower bound in the right side of (151) by a factor of at most 2; if , then the lower bound in the right side of (152) is smaller than the tight lower bound in the right side of (151) by a factor of at most .
Remark 11. Setting in (150), and using (49), specializes (150) to the Bretagnolle–Huber inequality [58]: Inequality (153) forms a counterpart to Pinsker's inequality: proved by Csiszár [12] and Kullback [70], and independently, a bit later, by Kemperman [71]. As upper bounds on the total variation distance, (154) outperforms (153) if nats, and (153) outperforms (154) for larger values of .
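The comparison in Remark 11 is easy to reproduce numerically. The following sketch (ours, not part of the paper) uses the convention |P−Q| = Σ_x |P(x)−Q(x)| ∈ [0,2], under which Pinsker's inequality reads |P−Q| ≤ sqrt(2 D(P||Q)) and the Bretagnolle–Huber inequality reads |P−Q| ≤ 2 sqrt(1 − exp(−D(P||Q))), consistent with the numerical values quoted in Remark 12 below; the grid of relative-entropy values is an arbitrary choice.

```python
# Pinsker vs. Bretagnolle-Huber as upper bounds on |P - Q|, as a function of D(P||Q) in nats.
import numpy as np

d = np.linspace(0.01, 5.0, 500)                         # relative entropy in nats
pinsker = np.sqrt(2.0 * d)
bretagnolle_huber = 2.0 * np.sqrt(1.0 - np.exp(-d))     # never exceeds 2, the range of |P - Q|

crossover = d[np.argmax(bretagnolle_huber < pinsker)]   # first grid point where BH is tighter
print(f"Bretagnolle-Huber becomes the tighter bound for D(P||Q) above ~{crossover:.2f} nats")
```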
Remark 12. In [59] (8), Vajda introduced a lower bound on the relative entropy as a function of the total variation distance: The lower bound in the right side of (155) is asymptotically tight in the sense that it tends to ∞ if , and the difference between and this lower bound is everywhere upper bounded by (see [59] (9)). The Bretagnolle–Huber inequality in (153), on the other hand, is equivalent to Although it can be verified numerically that the lower bound on the relative entropy in (155) is everywhere slightly tighter than the lower bound in (156) (for ), both lower bounds on are of the same asymptotic tightness in the sense that they both tend to ∞ as and their ratio tends to 1. Apart from their asymptotic tightness, the Bretagnolle–Huber inequality in (156) is appealing since it provides a closed-form simple upper bound on as a function of (see (153)), whereas such a closed-form simple upper bound cannot be obtained from (155). In fact, by the substitution and the exponentiation of both sides of (155), we get the inequality whose solution is expressed by the Lambert W function [72]; it can be verified that (155) is equivalent to the following upper bound on the total variation distance as a function of the relative entropy: where W in the right side of (157) denotes the principal real branch of the Lambert W function. The difference between the upper bounds in (153) and (157) can be verified to be marginal if is large (e.g., if nats, then the upper bounds on are respectively equal to 1.982 and 1.973), though the former upper bound in (153) is clearly simpler and more amenable to analysis. The Bretagnolle–Huber inequality in (153) is proved to be useful in the context of lower bounding the minimax risk (see, e.g., [5] pp. 89–90, 94), and the problem of density estimation (see, e.g., [6] Section 1.6). The utility of this inequality motivates its generalization in this section (see Corollaries 2 and 3, and also see later Theorem 7 followed by Example 2).
In [22] Section 7.C, Sason and Verdú generalized Pinsker's inequality by providing an upper bound on the E_γ divergence, for γ > 1, as a function of the relative entropy. In view of (49) and the optimality of the constant in Pinsker's inequality (154), it follows that the minimum achievable is quadratic in for small values of . It has been proved in [22] Section 7.C that this situation ceases to be the case for , in which case it is possible to upper bound as a constant times , where this constant tends to infinity as we let . We next cite the result in [22] Theorem 30, extending (154) by means of the E_γ divergence for γ > 1, and compare it numerically to the bound in (150).
Theorem 6. ([22] Theorem 30) For every , where the supremum is over , and is a universal function (independent of ), given by where in (161) denotes the secondary real branch of the Lambert W function [72].

As an immediate consequence of (159), it follows that which forms a straight-line bound on the E_γ divergence as a function of the relative entropy for γ > 1. Similarly to the comparison of the Bretagnolle–Huber inequality (153) and Pinsker's inequality (154), we exemplify numerically that the extension of Pinsker's inequality to the E_γ divergence in (162) forms a counterpart to the generalized version of the Bretagnolle–Huber inequality in (150).
Figure 1 plots an upper bound on the E_γ divergence, for γ > 1, as a function of the relative entropy (or, alternatively, a lower bound on the relative entropy as a function of the E_γ divergence). The upper bound on , for , as a function of , is composed of the following two components:
- the straight-line bound, which refers to the right side of (162), is tighter than the bound in the right side of (150) if the relative entropy is below a certain value that is denoted by in nats (it depends on );
- the curvy line, which refers to the bound in the right side of (150), is tighter than the straight-line bound in the right side of (162) for larger values of the relative entropy.
It is supported by Figure 1 that is positive and monotonically increasing, and ; e.g., it can be verified that , , , and (see Figure 1).
Bayesian Binary Hypothesis Testing
The DeGroot statistical information [16] has the following meaning: consider two hypotheses and , and let and with . Let P and Q be probability measures, and consider an observation Y where and . Suppose that one wishes to decide which hypothesis is more likely given the observation Y. The operational meaning of the DeGroot statistical information, denoted by , is that this measure is equal to the minimal difference between the a priori error probability (without side information) and the a posteriori error probability (given the observation Y). This measure was later identified as an f-divergence by Liese and Vajda [3] (see (50) here).
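A minimal sketch (ours, not part of the paper) of this operational definition for discrete P and Q: the DeGroot statistical information is the a priori Bayes error min(w, 1−w) minus the a posteriori Bayes error Σ_y min(w P(y), (1−w) Q(y)); the distributions and the prior below are illustrative assumptions.

```python
# DeGroot statistical information as "prior Bayes error minus posterior Bayes error".
import numpy as np

def degroot_information(p, q, w):
    prior_error = min(w, 1.0 - w)
    posterior_error = float(np.sum(np.minimum(w * p, (1.0 - w) * q)))
    return prior_error - posterior_error

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(degroot_information(p, q, 0.5))    # equals |P - Q| / 4 when w = 1/2
```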
Theorem 7. The DeGroot statistical information satisfies the following upper bound as a function of the chi-squared divergence: and the following bounds as a function of the relative entropy:
- (1) where for is introduced in (160);
- (2)

Proof. The first bound in (163) holds by combining (53) and (149); the second bound in (164) follows from (162) and (53) for , and it follows from (52) and (154) when ; finally, the third bound in (165) follows from (150) and (53). ☐
Remark 13. The bound in (164) forms an extension of Pinsker's inequality (154) when (i.e., in the asymmetric case where the hypotheses and are not equally probable). Furthermore, in view of (52), the bound in (165) is specialized to the Bretagnolle–Huber inequality in (153) by letting .

Remark 14. Numerical evidence shows that none of the bounds in (163)–(165) supersedes the others.

Remark 15. The upper bounds on in (163) and (165) are asymptotically tight when we let and tend to infinity. To verify this, first note that (see [23] Theorem 5), which implies that also and tend to infinity. In this case, it can be readily verified that the bounds in (163) and (165) are specialized to ; this upper bound, which is equal to the a priori error probability, is also equal to the DeGroot statistical information, since the a posteriori error probability tends to zero in the considered extreme case where P and Q are sufficiently far from each other, so that and are easily distinguishable with high probability when the observation Y is available.

Remark 16. Due to the one-to-one correspondence between the E_γ divergence and the DeGroot statistical information in (53), which shows that the two measures are related by a multiplicative scaling factor, the numerical results shown in Figure 1 also apply to the bounds in (164) and (165); i.e., for , the first bound in (164) is tighter than the second bound in (165) for small values of the relative entropy, whereas (165) becomes tighter than (164) for larger values of the relative entropy.

Corollary 4. Let , and let be as defined in (12). Then,

Proof. Inequalities (167) and (168) follow by combining (135) and (53). ☐
We end this section by exemplifying the utility of the bounds in Theorem 7.
Example 2. Let and with , and assume that the observation Y, given that the hypothesis is or , is Poisson distributed with the positive parameter μ or λ, respectively: where Without any loss of generality, let . The bounds on the DeGroot statistical information in Theorem 7 can be expressed in closed form by relying on the following identities: In this example, we compare the simple closed-form bounds on in (163)–(165) with its exact value. To simplify the right side of (174), let , and define where, for , denotes the largest integer that is smaller than or equal to x. It can be verified that To exemplify the utility of the bounds in Theorem 7, suppose that μ and λ are close, and we wish to obtain a guarantee on how small is. For example, let , , and . The upper bounds on in (163)–(165) are, respectively, equal to , and ; we therefore get an informative guarantee by easily calculable bounds. The exact value of is, on the other hand, hard to compute since (see (175)), and the calculation of the right side of (178) appears to be sensitive to the selected parameters in this setting.
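Since the specific values of w, μ and λ used in this example were lost in the conversion of the paper, the following sketch (ours) re-creates the computation with hypothetical parameters: it evaluates the closed forms D(P||Q) = μ log(μ/λ) + λ − μ and χ²(P||Q) = exp((μ−λ)²/λ) − 1 for Poisson measures, and the exact DeGroot statistical information by truncating the sum over the Poisson support.

```python
# Poisson binary hypothesis testing: closed-form divergences and the exact
# DeGroot statistical information via a truncated sum over the support.
import math

def poisson_pmf(k, rate):
    # log-domain evaluation avoids overflow of factorials for large k
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))

def degroot_poisson(w, mu, lam, kmax=500):
    posterior_error = sum(min(w * poisson_pmf(k, mu), (1.0 - w) * poisson_pmf(k, lam))
                          for k in range(kmax + 1))
    return min(w, 1.0 - w) - posterior_error

w, mu, lam = 0.4, 10.0, 10.5            # hypothetical values, for illustration only
kl = mu * math.log(mu / lam) + lam - mu
chi2 = math.exp((mu - lam) ** 2 / lam) - 1.0
print("D(P||Q) =", kl, " chi^2(P||Q) =", chi2, " DeGroot info =", degroot_poisson(w, mu, lam))
```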