1. Preliminaries
Throughout this paper, X, Y and Z denote discrete random variables taking the values {x1, · · ·, x|X|}, {y1, · · ·, y|Y|} and {z1, · · ·, z|Z|}, respectively, where |A| denotes the number of values taken by the discrete random variable A. We denote by U a discrete random variable following the uniform distribution. We set the probabilities p(xi) ≡ Pr(X = xi), p(yj) ≡ Pr(Y = yj) and p(zk) ≡ Pr(Z = zk). If |U| = n, then $\Pr(U = u_k) = 1/n$ for all k = 1, · · ·, n. In addition, we denote by p(xi, yj) = Pr(X = xi, Y = yj) and p(xi, yj, zk) = Pr(X = xi, Y = yj, Z = zk) the joint probabilities, by p(xi|yj) = Pr(X = xi|Y = yj) and p(xi|yj, zk) = Pr(X = xi|Y = yj, Z = zk) the conditional probabilities, and so on.
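To make this notation concrete in computations, a finite distribution can be represented as an array of probabilities. The following Python sketch (an illustration only; the joint distribution chosen here is arbitrary) builds a joint distribution for X and Y and recovers the marginal and conditional probabilities used above.

```python
import numpy as np

# Hypothetical example: |X| = 3, |Y| = 2; the numbers are arbitrary.
p_xy = np.array([[0.10, 0.20],   # p(x1, y1), p(x1, y2)
                 [0.30, 0.15],   # p(x2, y1), p(x2, y2)
                 [0.05, 0.20]])  # p(x3, y1), p(x3, y2)

p_x = p_xy.sum(axis=1)           # marginal p(x_i)
p_y = p_xy.sum(axis=0)           # marginal p(y_j)
p_x_given_y = p_xy / p_y         # conditional p(x_i | y_j); each column sums to 1
p_u = np.full(3, 1.0 / 3.0)      # uniform distribution U with |U| = n = 3

assert np.isclose(p_xy.sum(), 1.0)
assert np.isclose(p_x.sum(), 1.0)
assert np.allclose(p_x_given_y.sum(axis=0), 1.0)
```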
The notion of entropy was used in statistical thermodynamics by Boltzmann [1] in 1871 and Gibbs [2] in 1902 in order to quantify the diversity, uncertainty and randomness of isolated systems. Later, it was seen as a measure of “information, choice and uncertainty” in the theory of communication, when Shannon [3] defined it by:
$H(X) = -\sum_{i=1}^{|X|} p(x_i)\log p(x_i).$
In what follows, we consider |X| = |Y| = |U| = n, unless otherwise specified.
Making use of the concavity of the logarithmic function, one can easily check that the equiprobable states maximize the entropy, that is:
$H(X) \le \log n.$
The right-hand side of this inequality has been known since 1928 as the Hartley entropy [4].
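As a quick numerical illustration of this bound (a sketch with an arbitrarily chosen distribution), one can compare H(X) with log n:

```python
import numpy as np

def shannon_entropy(p):
    """H(X) = -sum_i p_i log p_i, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

p = np.array([0.5, 0.25, 0.15, 0.10])    # arbitrary distribution, n = 4
n = len(p)
print(shannon_entropy(p), np.log(n))     # H(X) <= log n = H(U)
assert shannon_entropy(p) <= np.log(n) + 1e-12
```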
For two random variables X and Y following the distributions {p(xi)} and {p(yi)}, the Kullback–Leibler [5] discrimination function (divergence or relative entropy) is defined by:
$D(X||Y) \equiv \sum_{i=1}^{n} p(x_i)\log\frac{p(x_i)}{p(y_i)}.$
(We note that the relative entropy is usually defined for two probability distributions P = {pi} and Q = {qi} as $D(P||Q) = \sum_i p_i \log\frac{p_i}{q_i}$ in the standard notation of information theory. D(P||Q) is often rewritten as D(X||Y) for random variables X and Y following the distributions P and Q. Throughout this paper, we use the style of Equation (3) for relative entropies, to unify the notation with simple descriptions.) Here, the conventions $0\log 0 = 0$ and $0\log\frac{0}{0} = 0$ are used. (We also note that the convention is often given in the following way with the definition of D(X||Y). If there exists i such that p(xi) ≠ 0 = p(yi), then we define D(X||Y) ≡ +∞ (in this case, D(X||Y) is no longer significant as an information measure). Otherwise, D(X||Y) is defined by Equation (3) with the convention $0\log\frac{0}{0} = 0$. This fact has been mentioned in the abstract of the paper.) In what follows, we use such conventions in the definitions of the entropies and divergences; however, we do not state them repeatedly.
It holds that:
$D(X||Y) \ge 0.$
Moreover, the cross-entropy (or inaccuracy) $-\sum_{i=1}^{n} p(x_i)\log p(y_i)$ satisfies the identity:
$-\sum_{i=1}^{n} p(x_i)\log p(y_i) = H(X) + D(X||Y).$
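Both the nonnegativity of D(X||Y) and the cross-entropy identity are easy to verify numerically; the following sketch (with arbitrary distributions, assuming p(yi) > 0 wherever p(xi) > 0) uses the convention 0 log 0 = 0.

```python
import numpy as np

def kl_divergence(p, q):
    """D(X||Y) = sum_i p_i log(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

def cross_entropy(p, q):
    """-sum_i p_i log q_i (the inaccuracy)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(q[nz]))

def shannon_entropy(p):
    p = np.asarray(p, float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])
assert kl_divergence(p, q) >= 0.0
# cross-entropy = H(X) + D(X||Y)
assert np.isclose(cross_entropy(p, q), shannon_entropy(p) + kl_divergence(p, q))
```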
Many extensions of the Shannon entropy have been studied; the Rényi entropy [6] and the α-entropy [7] are among the best known. The mathematical results obtained up to the 1970s are well presented in the book [8]. In the present paper, we focus on the hypoentropy introduced by Carlo Ferreri and on the Tsallis entropy introduced by Constantino Tsallis.
The hypoentropy at the level λ (λ-entropy) was introduced in 1980 by Ferreri [9] as an alternative measure of information, in the following form:
$F_\lambda(X) = \frac{1+\lambda}{\lambda}\log(1+\lambda) - \frac{1}{\lambda}\sum_{i=1}^{n}(1+\lambda p(x_i))\log(1+\lambda p(x_i)),$
for λ > 0. According to Ferreri [9], the parameter λ can be interpreted as a measure of the information inaccuracy of economic forecasts. The name hypoentropy comes from the property Fλ(X) ≤ H(X), which we will show in the second section.
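A small numerical sketch illustrates the two properties just mentioned, namely Fλ(X) ≤ H(X) and the monotone increase of Fλ(X) in λ; the hypoentropy is implemented here in the Ferreri form written above (a reconstruction consistent with the proofs below; the distribution and the grid of λ values are arbitrary).

```python
import numpy as np

def shannon_entropy(p):
    p = np.asarray(p, float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def hypoentropy(p, lam):
    """Ferreri hypoentropy F_lambda(X), written as
    ((1+lam)/lam) log(1+lam) - (1/lam) sum_i (1 + lam p_i) log(1 + lam p_i)
    (form reconstructed from Lemma 1 and Proposition 1)."""
    p = np.asarray(p, float)
    return ((1 + lam) / lam) * np.log(1 + lam) \
        - np.sum((1 + lam * p) * np.log(1 + lam * p)) / lam

p = np.array([0.6, 0.3, 0.1])            # arbitrary distribution
lams = [0.5, 1.0, 5.0, 50.0, 5000.0]
values = [hypoentropy(p, lam) for lam in lams]
assert all(v <= shannon_entropy(p) + 1e-9 for v in values)      # F_lambda(X) <= H(X)
assert all(a <= b + 1e-12 for a, b in zip(values, values[1:]))  # increasing in lambda
print(values[-1], shannon_entropy(p))    # F_lambda(X) approaches H(X) for large lambda
```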
On the other hand, Tsallis introduced a one-parameter extension of the entropy in 1988 [10], for handling systems that appear to deviate from standard statistical distributions. It plays an important role in the nonextensive statistical mechanics of complex systems, being defined as:
$T_q(X) \equiv \frac{1-\sum_{i=1}^{n} p(x_i)^q}{q-1} = -\sum_{i=1}^{n} p(x_i)^q \ln_q p(x_i).$
Here, the q-logarithmic function for x > 0 is defined by $\ln_q(x) \equiv \frac{x^{1-q}-1}{1-q}$, which converges to the usual logarithmic function log(x) in the limit q → 1. The Tsallis divergence (relative entropy) [11] is given by:
$D_q(X||Y) \equiv \sum_{i=1}^{n} p(x_i)^q\left(\ln_q p(x_i) - \ln_q p(y_i)\right) = -\sum_{i=1}^{n} p(x_i)\ln_q\frac{p(y_i)}{p(x_i)}.$
Note that some important properties of the Tsallis relative entropy were given in the papers [12–14].
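The following sketch (with arbitrary distributions) checks numerically that the Tsallis entropy and the Tsallis divergence written above reduce to the Shannon entropy and the Kullback–Leibler divergence as q → 1, and that the Tsallis divergence is nonnegative.

```python
import numpy as np

def ln_q(x, q):
    """q-logarithm: (x**(1-q) - 1)/(1-q), with ln_1 = log."""
    x = np.asarray(x, float)
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_entropy(p, q):
    """T_q(X) = -sum_i p_i^q ln_q p_i = (1 - sum_i p_i^q)/(q - 1)."""
    p = np.asarray(p, float)
    nz = p > 0
    return -np.sum(p[nz] ** q * ln_q(p[nz], q))

def tsallis_divergence(p, r, q):
    """D_q(X||Y) = -sum_i p_i ln_q(r_i / p_i)."""
    p, r = np.asarray(p, float), np.asarray(r, float)
    nz = p > 0
    return -np.sum(p[nz] * ln_q(r[nz] / p[nz], q))

p = np.array([0.5, 0.3, 0.2])
r = np.array([0.2, 0.5, 0.3])
q = 1.0001
# As q -> 1 the Tsallis quantities approach the Shannon entropy and the KL divergence.
print(tsallis_entropy(p, q), -np.sum(p * np.log(p)))
print(tsallis_divergence(p, r, q), np.sum(p * np.log(p / r)))
assert tsallis_divergence(p, r, q) >= 0.0
```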
2. Hypoentropy and Hypodivergence
For nonnegative real numbers ai and bi (i = 1, · · ·, n), we define the generalized relative entropy (for incomplete probability distributions):
$D^{(gen)}(a_1, \cdots, a_n||b_1, \cdots, b_n) \equiv \sum_{i=1}^{n} a_i \log\frac{a_i}{b_i}.$
Then, we have the so-called “log-sum” inequality:
$\sum_{i=1}^{n} a_i \log\frac{a_i}{b_i} \ge \left(\sum_{i=1}^{n} a_i\right)\log\frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i},$
with equality if and only if $\frac{a_i}{b_i} = \frac{\sum_{j=1}^{n} a_j}{\sum_{j=1}^{n} b_j}$ for all i = 1, · · ·, n. If we impose the condition:
$\sum_{i=1}^{n} a_i = \sum_{i=1}^{n} b_i = 1,$
then D(gen)(a1, · · ·, an||b1, · · ·, bn) is just the relative entropy.
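The “log-sum” inequality itself can be checked numerically on random nonnegative vectors; a minimal sketch:

```python
import numpy as np

def log_sum_lhs(a, b):
    """sum_i a_i log(a_i / b_i) for positive a_i, b_i."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.sum(a * np.log(a / b))

def log_sum_rhs(a, b):
    """(sum_i a_i) log(sum_i a_i / sum_i b_i)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return a.sum() * np.log(a.sum() / b.sum())

rng = np.random.default_rng(0)
for _ in range(1000):
    a = rng.random(5) + 1e-6          # arbitrary positive vectors
    b = rng.random(5) + 1e-6
    assert log_sum_lhs(a, b) >= log_sum_rhs(a, b) - 1e-10
# Equality holds when a_i / b_i is constant:
a = np.array([0.1, 0.2, 0.3])
assert np.isclose(log_sum_lhs(a, 2 * a), log_sum_rhs(a, 2 * a))
```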
We put $a_i = \frac{1+\lambda p(x_i)}{\lambda}$ and $b_i = \frac{1+\lambda p(y_i)}{\lambda}$ with λ > 0 and $\sum_{i=1}^{n} p(x_i) = \sum_{i=1}^{n} p(y_i) = 1$, p(xi) ≥ 0, p(yi) ≥ 0. Then, we find that $D^{(gen)}(a_1, \cdots, a_n||b_1, \cdots, b_n)$ is equal to the hypodivergence (λ-divergence) introduced by Ferreri in [9],
$D_\lambda(X||Y) = \frac{1}{\lambda}\sum_{i=1}^{n}(1+\lambda p(x_i))\log\frac{1+\lambda p(x_i)}{1+\lambda p(y_i)}.$
Clearly, we have:
$\lim_{\lambda\to+\infty} D_\lambda(X||Y) = D(X||Y).$
Using the “log-sum” inequality, we have the nonnegativity:
$D_\lambda(X||Y) \ge 0,$
with equality if and only if p(xi) = p(yi) for all i = 1, · · ·, n.
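The nonnegativity of the hypodivergence can also be verified numerically; the sketch below uses the 1/λ normalization written above (any positive constant factor in front of the sum would leave the nonnegativity and the equality condition unchanged).

```python
import numpy as np

def hypodivergence(p, r, lam):
    """D_lambda(X||Y) = (1/lam) sum_i (1 + lam p_i) log((1 + lam p_i)/(1 + lam r_i)).
    Normalization 1/lam as written above; the checks below do not depend on it."""
    p, r = np.asarray(p, float), np.asarray(r, float)
    return np.sum((1 + lam * p) * np.log((1 + lam * p) / (1 + lam * r))) / lam

rng = np.random.default_rng(1)
for _ in range(1000):
    p = rng.random(4); p /= p.sum()     # random probability vectors
    r = rng.random(4); r /= r.sum()
    lam = 10.0 ** rng.uniform(-2, 2)
    assert hypodivergence(p, r, lam) >= -1e-12       # nonnegativity
    assert np.isclose(hypodivergence(p, p, lam), 0.0)  # equality when p = r
```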
For the hypoentropy Fλ(X) defined in Equation (7), we first show some fundamental relations. To do so, we prepare the following lemma.
Lemma 1
For any a > 0 and 0 ≤ x ≤ 1, we have:
$x(1+a)\log(1+a) \ge (1+ax)\log(1+ax).$
Proof
We set f(x) ≡ x(1 + a) log(1 + a) – (1 + ax) log(1 + ax). For any a > 0, we then have $f''(x) = -\frac{a^2}{1+ax} < 0$ and f(0) = f(1) = 0. Thus, f is concave on [0, 1], and we have the inequality.
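Lemma 1 can also be confirmed numerically on a grid (the values of a and the grid are arbitrary):

```python
import numpy as np

def f(x, a):
    """f(x) = x (1+a) log(1+a) - (1 + a x) log(1 + a x) from the proof of Lemma 1."""
    return x * (1 + a) * np.log(1 + a) - (1 + a * x) * np.log(1 + a * x)

xs = np.linspace(0.0, 1.0, 501)
for a in [0.01, 0.5, 1.0, 10.0, 1000.0]:      # arbitrary positive values of a
    vals = f(xs, a)
    assert np.all(vals >= -1e-9)              # x (1+a) log(1+a) >= (1+ax) log(1+ax)
    assert np.isclose(vals[0], 0.0) and np.isclose(vals[-1], 0.0)  # f(0) = f(1) = 0
```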
Proposition 1
For λ > 0, we have the following inequalities:
$0 \le F_\lambda(X) \le \frac{1+\lambda}{\lambda}\log(1+\lambda) - \frac{n+\lambda}{\lambda}\log\frac{n+\lambda}{n}.$
The equality in the first inequality holds if and only if p(xj) = 1 for some j (and then p(xi) = 0 for all i ≠ j). The equality in the second inequality holds if and only if p(xi) = 1/n for all i = 1, · · ·, n.
Proof
From the nonnegativity of the hypodivergence Equation (15), we get:
$\frac{1}{\lambda}\sum_{i=1}^{n}(1+\lambda p(x_i))\log(1+\lambda p(x_i)) \ge \frac{1}{\lambda}\sum_{i=1}^{n}(1+\lambda p(x_i))\log\left(1+\frac{\lambda}{n}\right) = \frac{n+\lambda}{\lambda}\log\frac{n+\lambda}{n}.$
Thus, we have:
$-\frac{1}{\lambda}\sum_{i=1}^{n}(1+\lambda p(x_i))\log(1+\lambda p(x_i)) \le -\frac{n+\lambda}{\lambda}\log\frac{n+\lambda}{n}.$
Adding $\frac{1+\lambda}{\lambda}\log(1+\lambda)$ to both sides, we have:
$F_\lambda(X) \le \frac{1+\lambda}{\lambda}\log(1+\lambda) - \frac{n+\lambda}{\lambda}\log\frac{n+\lambda}{n},$
with equality if and only if p(xi) = 1/n for all i = 1, · · ·, n.
For the first inequality, it is sufficient to prove:
$\frac{1+\lambda}{\lambda}\log(1+\lambda) \ge \frac{1}{\lambda}\sum_{i=1}^{n}(1+\lambda p(x_i))\log(1+\lambda p(x_i)).$
Since $\sum_{i=1}^{n} p(x_i) = 1$, the above inequality is written as:
$\sum_{i=1}^{n} p(x_i)(1+\lambda)\log(1+\lambda) \ge \sum_{i=1}^{n}(1+\lambda p(x_i))\log(1+\lambda p(x_i)),$
so that we have only to prove:
$p(x_i)(1+\lambda)\log(1+\lambda) \ge (1+\lambda p(x_i))\log(1+\lambda p(x_i)),$
for any λ > 0 and 0 ≤ p(xi) ≤ 1. Lemma 1 shows this inequality and the equality condition.
It is a known fact [9] that Fλ(X) is monotonically increasing as a function of λ and:
$\lim_{\lambda\to+\infty} F_\lambda(X) = H(X),$
so that Fλ(X) ≤ H(X), whence its name, as we noted in the Introduction. Thus, the hypoentropy appears as a generalization of Shannon’s entropy. One can see that the hypoentropy also equals zero, as the entropy does, in the case of certainty (i.e., for a so-called pure state, when all probabilities but one vanish).
It also holds that:
It is of some interest for the reader to look at the hypoentropy that arises for equiprobable states,
$F_\lambda(U) = \frac{1+\lambda}{\lambda}\log(1+\lambda) - \frac{n+\lambda}{\lambda}\log\frac{n+\lambda}{n}.$
Seen as a function of the two variables n and λ, it increases in each variable [9]. Since:
$\lim_{\lambda\to+\infty} F_\lambda(U) = \log n,$
we shall call it the Hartley hypoentropy. (Throughout the paper, we add the name Hartley to the name of mathematical objects whenever they are considered for the uniform distribution. In the same way, we proceed with the name Tsallis, which we add to the name of some mathematical objects that we define, to emphasize that they are used in the framework of Tsallis statistics. This means that we will have Tsallis hypoentropies, Tsallis hypodivergences, and so on.) We have the cross-hypoentropy:
$\frac{1+\lambda}{\lambda}\log(1+\lambda) - \frac{1}{\lambda}\sum_{i=1}^{n}(1+\lambda p(x_i))\log(1+\lambda p(y_i)).$
It holds that the cross-hypoentropy equals $F_\lambda(X) + D_\lambda(X||Y)$; therefore, by the nonnegativity of the hypodivergence, the cross-hypoentropy is bounded from below by Fλ(X).
We can show an upper bound for Fλ(X) as a direct consequence.
Proposition 2
The following inequality holds:
$F_\lambda(X) \le (1 - p_{\max})\log(1+\lambda),$
for all λ > 0, where pmax ≡ max {p(x1), · · ·, p(xn)}.
Proof
In the inequality (30), if for a fixed k one takes the probability of the k-th component of Y to be p(yk) = 1, then:
$\frac{1+\lambda}{\lambda}\log(1+\lambda) - \frac{1}{\lambda}\sum_{i=1}^{n}(1+\lambda p(x_i))\log(1+\lambda p(y_i)) = \frac{1+\lambda}{\lambda}\log(1+\lambda) - \frac{1+\lambda p(x_k)}{\lambda}\log(1+\lambda) = (1-p(x_k))\log(1+\lambda).$
This implies that:
$F_\lambda(X) \le (1-p(x_k))\log(1+\lambda).$
Since k is arbitrarily fixed, the conclusion follows.
3. Tsallis Hypoentropy and Hypodivergence
Now, we turn our attention to the Tsallis statistics. We extend the definition of hypodivergences as follows:
Definition 1
The Tsallis hypodivergence (q-hypodivergence, Tsallis relative hypoentropy) is defined by:
for λ > 0 and q ≥ 0.
Then, we have the relation:
$\lim_{\lambda\to+\infty} D_{\lambda,q}(X||Y) = D_q(X||Y),$
which is the Tsallis divergence, and:
$\lim_{q\to 1} D_{\lambda,q}(X||Y) = D_\lambda(X||Y),$
which is the hypodivergence.
Definition 2
For λ > 0 and q ≥ 0, the Tsallis hypoentropy (q-hypoentropy) is defined by:
where the function h(λ, q) > 0 satisfies two conditions,
and:
These conditions are equivalent to:
and, respectively,
Some interesting examples are h(λ, q) = λ^{1−q} and h(λ, q) = (1 + λ)^{1−q}.
Lemma 2
For any a > 0, q ≥ 0 and 0 ≤ x ≤ 1, we have:
Proof
We set g(x) to be the difference between the left-hand and right-hand sides of the claimed inequality. For any a > 0 and q ≥ 0, we then have that g is concave on [0, 1], with g(0) = g(1) = 0. Thus, we have the inequality.
Proposition 3
For λ > 0, q ≥ 0 and h(λ, q) > 0 satisfying (43) and (44), we have the following inequalities:
$0 \le H_{\lambda,q}(X) \le H_{\lambda,q}(U).$
The equality in the first inequality holds if and only if p(xj) = 1 for some j (and then p(xi) = 0 for all i ≠ j). The equality in the second inequality holds if and only if p(xi) = 1/n for all i = 1, · · ·, n.
Proof
In a similar way to the proof of Proposition 1, for the first inequality, it is sufficient to prove:
so that we have only to prove:
for any λ > 0, q ≥ 0 and 0 ≤ p(xi) ≤ 1. Lemma 2 shows this inequality with the equality condition.
The second inequality is proven by the use of the nonnegativity of the Tsallis hypodivergence in the following way:
which implies:
The equality condition of the second inequality follows from the equality condition of the nonnegativity of the Tsallis hypodivergence (41).
We may call:
the Hartley–Tsallis hypoentropy. We study the monotonicity in n or λ of the Hartley–Tsallis hypoentropy Hλ,q(U) and of the Tsallis hypoentropy Hλ,q(X). (Throughout the present paper, the term “monotonicity” means monotone increasing/decreasing as a function of the parameter λ. We emphasize that it does not mean monotonicity with respect to stochastic maps.)
Lemma 3
The function:
is monotonically increasing in x, for any q ≥ 0.
Proof
By direct calculations, we have:
and:
Combining these two relations, the claim follows.
Proposition 4
The Hartley–Tsallis hypoentropy:
is a monotonically increasing function of n, for any λ > 0 and q ≥ 0.
Proof
Note that:
Putting this expression, for λ > 0 fixed, into Lemma 3, we get the function:
which is a monotonically increasing function of n. Thus, we have the present proposition.
Proposition 5
We assume h(λ, q) = (1 + λ)^{1−q}. Then, Hλ,q(X) is a monotonically increasing function of λ > 0 when 0 ≤ q ≤ 2.
Proof
Note that:
where:
is defined on 0 ≤ x ≤ 1, 0 ≤ q ≤ 2 and λ > 0. Then, we have:
where:
is defined on 0 ≤ x ≤ 1, 0 ≤ q ≤ 2 and λ > 0. By some computations, we have:
since (x − 1)(q − 1) + 1 ≥ 0 for 0 ≤ x ≤ 1 and 0 ≤ q ≤ 2. We easily find sλ,q(0) = sλ,q(1) = 0. Thus, we have sλ,q(x) ≥ 0 for 0 ≤ x ≤ 1, 0 ≤ q ≤ 2 and λ > 0. Therefore, we have ∂Hλ,q(X)/∂λ ≥ 0 for 0 ≤ q ≤ 2 and λ > 0.
This result agrees with the known fact that the usual (Ferreri) hypoentropy is increasing as a function of λ.
Closing this subsection, we give a q-extended version of Proposition 2.
Proposition 6
Let pmax ≡ max{p(x1), ···, p(xn)}. Then, we have the following inequality.
for all λ > 0 and q ≥ 0.
Proof
From the “ln_q-sum” inequality, we have Dλ,q(X||Y) ≥ 0. Since λ > 0, we have:
which is equivalent to:
Thus, we have:
which extends the result given by the inequality (30). For arbitrarily fixed k, we set p(yk) = 1 (and p(yi) = 0 for i ≠ k) in the above inequality; then, we have:
Then, we have:
Multiplying both sides by an appropriate positive factor and then adding the corresponding term to both sides, we have:
Since k is arbitrary, we have this proposition.
Letting q → 1 in the above proposition, we recover Proposition 2.
4. The Subadditivities of the Tsallis Hypoentropies
Throughout this section, we assume |X| = n, |Y | = m, |Z| = l. We define the joint Tsallis hypoentropy at the level λ by:
Note that Hλ,q(X, Y ) = Hλ,q(Y, X).
For all i = 1, ···, n for which p(xi) ≠ 0, we define the Tsallis hypoentropy of Y given X = xi, at the level λp(xi), by:
For n = 1, this coincides with the Tsallis hypoentropy Hλ,q(Y). In the particular case m = 1, we get Hλp(xi),q(Y|xi) = 0.
Definition 3
The Tsallis conditional hypoentropy at the level λ is defined by:
(As a usual convention, the corresponding summand is defined as zero, if p(xi) = 0.)
Throughout this section, we consider the particular function h(λ, q) = λ^{1−q} for λ > 0 and q ≥ 0.
Lemma 4
We assume h(λ, q) = λ^{1−q}. The chain rule for the Tsallis hypoentropy holds:
Proof
The proof is done by straightforward computation as follows.
In the limit λ → ∞, the identity (66) becomes Tq(X, Y) = Tq(X) + Tq(Y|X), where $T_q(Y|X) \equiv \sum_{i=1}^{n} p(x_i)^q\, T_q(Y|x_i)$ is the Tsallis conditional entropy and $T_q(X,Y) \equiv -\sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i,y_j)^q \ln_q p(x_i,y_j)$ is the Tsallis joint entropy (see also p. 3 in [17]).
Taking the limit q → 1 in Lemma 4, we also obtain the identity Fλ(X, Y) = Fλ(X) + Fλ(Y|X), which naturally leads to the definition of Fλ(Y|X) as the conditional hypoentropy.
In order to obtain the subadditivity for the Tsallis hypoentropy, we prove the monotonicity in λ of the Tsallis hypoentropy.
Lemma 5
We assume h(λ, q) = λ^{1−q}. The Tsallis hypoentropy Hλ,q(X) is a monotonically increasing function of λ > 0 when 0 ≤ q ≤ 2 and a monotonically decreasing function of λ > 0 when q ≥ 2 (or q ≤ 0).
Proof
Note that:
where:
is defined on 0 ≤ x ≤ 1 and λ > 0. Then, we have:
where:
is defined on 0 ≤ x ≤ 1 and λ > 0. By elementary computations, we obtain:
Since we have lλ,q(0) = lλ,q(1) = 0, we find that lλ,q(x) ≥ 0 for 0 ≤ q ≤ 2 and any λ > 0. We also find that lλ,q(x) ≤ 0 for q ≥ 2 (or q ≤ 0) and any λ > 0. Therefore, we have ∂Hλ,q(X)/∂λ ≥ 0 when 0 ≤ q ≤ 2 and ∂Hλ,q(X)/∂λ ≤ 0 when q ≥ 2 (or q ≤ 0).
This result also agrees with the known fact that the usual (Ferreri) hypoentropy is increasing as a function of λ.
Theorem 1
We assume h(λ, q) = λ^{1−q}. It holds that Hλ,q(Y|X) ≤ Hλ,q(Y) for 1 ≤ q ≤ 2.
Proof
We prove this theorem by the method used in [18], with Jensen’s inequality. We note that Lnλ,q(x) is a nonnegative and concave function in x when 0 ≤ x ≤ 1, λ > 0 and q ≥ 0. Here, we use the notation for the conditional probability $p(y_j|x_i) \equiv \frac{p(x_i, y_j)}{p(x_i)}$ when p(xi) ≠ 0. By the concavity of Lnλ,q(x), we have:
Summing both sides of the above inequality over j, we have:
Since p(xi)^q ≤ p(xi) for 1 ≤ q ≤ 2 and Lnλ,q(x) ≥ 0 for 0 ≤ x ≤ 1, λ > 0 and q ≥ 0, we have:
Summing both sides of the above inequality over i, we have:
By the two inequalities (74) and (76), we have:
Here, we can see that the quantity above is the Tsallis hypoentropy for fixed xi, and the Tsallis hypoentropy is a monotonically increasing function of λ in the case 1 ≤ q ≤ 2, due to Lemma 5. Thus, we have:
By the two inequalities (77) and (78), we finally have:
which implies:
since the corresponding inequality holds for all fixed xi. Therefore, we have Hλ,q(Y|X) ≤ Hλ,q(Y).
Corollary 1
We have the following subadditivity for the Tsallis hypoentropies:
$H_{\lambda,q}(X,Y) \le H_{\lambda,q}(X) + H_{\lambda,q}(Y),$
in the case h(λ, q) = λ^{1−q}, for 1 ≤ q ≤ 2.
Proof
The proof is easily done by Lemma 4 and Theorem 1.
We are now in a position to prove the strong subadditivity for the Tsallis hypoentropies. The strong subadditivity for entropy is one of the interesting subjects in entropy theory [19]. For this purpose, we first give a chain rule for three random variables X, Y and Z.
Lemma 6
We assume h(λ, q) = λ^{1−q}. The following chain rule holds:
Proof
The proof can be done following the recipe used in Lemma 4.
Theorem 2
We assume h(λ, q) = λ^{1−q}. The strong subadditivity for the Tsallis hypoentropies,
$H_{\lambda,q}(X,Y,Z) + H_{\lambda,q}(Z) \le H_{\lambda,q}(X,Z) + H_{\lambda,q}(Y,Z),$
holds for 1 ≤ q ≤ 2.
Proof
This theorem is proven in a similar way as Theorem 1. By the concavity of the function Lnλp(zk),q(x) in x and by using Jensen’s inequality, we have:
Multiplying both sides by p(zk)^q and summing over i and k, we have:
By p(yj|zk)^q ≤ p(yj|zk) for all j, k and 1 ≤ q ≤ 2, and by the nonnegativity of the function Lnλp(zk),q, we have:
Multiplying both sides by p(zk)^q and summing over j and k in the above inequality, we have:
From the two inequalities (84) and (85), we have:
which implies:
since p(yj, zk) ≤ p(zk) (because of $\sum_{j=1}^{m} p(y_j, z_k) = p(z_k)$) for all j and k, and the function Lnλp(zk),q is monotonically increasing in λp(zk) > 0 when 1 ≤ q ≤ 2. Thus, we have Hλ,q(X|Y, Z) ≤ Hλ,q(X|Z), which is equivalent to the inequality:
by Lemmas 4 and 6.
Definition 4
Let 1 ≤ q ≤ 2 and λ > 0. The Tsallis mutual hypoentropy is defined by:
and the Tsallis conditional mutual hypoentropy is defined by:
From the chain rule given in Lemma 4, we find that the Tsallis mutual hypoentropy is symmetric, that is, Iλ,q(X, Y) = Iλ,q(Y, X). In addition, we have:
$0 \le I_{\lambda,q}(X, Y) \le \min\{H_{\lambda,q}(X), H_{\lambda,q}(Y)\},$
from the subadditivity given in Theorem 1 and the nonnegativity of the Tsallis conditional hypoentropy. We also find Iλ,q(X, Y|Z) ≥ 0 from the strong subadditivity given in Theorem 2.
Moreover, we have the following chain rule for the Tsallis mutual hypoentropies:
From the strong subadditivity, we have Hλ,q(X|Y, Z) ≤ Hλ,q(X|Z); thus, we have:
for 1 ≤ q ≤ 2 and λ > 0.
5. Jeffreys and Jensen–Shannon Hypodivergences
In what follows, we indicate extensions of two known information measures.
Definition 5 ([21,22])
The Jeffreys divergence is defined by:
$D(X||Y) + D(Y||X),$
and the Jensen–Shannon divergence is defined as:
$\frac{1}{2}\sum_{i=1}^{n} p(x_i)\log\frac{2p(x_i)}{p(x_i)+p(y_i)} + \frac{1}{2}\sum_{i=1}^{n} p(y_i)\log\frac{2p(y_i)}{p(x_i)+p(y_i)}.$
The Jensen–Shannon divergence was introduced in 1991 in [23], but its roots may be older, since one can see some analogous formulae used in thermodynamics under the name entropy of mixing (p. 598 in [24]), for the study of gaseous, liquid or crystalline mixtures.
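For reference, the two classical symmetrizations of Definition 5 can be computed as follows (a sketch; the distributions are arbitrary).

```python
import numpy as np

def kl(p, r):
    p, r = np.asarray(p, float), np.asarray(r, float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / r[nz]))

def jeffreys(p, r):
    """Jeffreys divergence: D(X||Y) + D(Y||X)."""
    return kl(p, r) + kl(r, p)

def jensen_shannon(p, r):
    """Jensen-Shannon divergence: (1/2) D(X||M) + (1/2) D(Y||M), M the equal-weight mixture."""
    p, r = np.asarray(p, float), np.asarray(r, float)
    m = 0.5 * (p + r)
    return 0.5 * kl(p, m) + 0.5 * kl(r, m)

p = np.array([0.5, 0.3, 0.2])
r = np.array([0.1, 0.6, 0.3])
print(jeffreys(p, r), jensen_shannon(p, r))
assert 0.0 <= jensen_shannon(p, r) <= np.log(2)   # the JS divergence is bounded by log 2
```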
Jeffreys and Jensen–Shannon divergences have been extended to the context of Tsallis theory in [25]:
Definition 6
The Jeffreys–Tsallis divergence is:
and the Jensen–Shannon–Tsallis divergence is:
Note that:
This expression was used in [26] as the Jensen–Tsallis divergence.
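A numerical sketch of these q-deformed symmetrizations is given below under the assumption (not spelled out in the text above) that the Jeffreys–Tsallis divergence is the sum Dq(X||Y) + Dq(Y||X) and that the Jensen–Shannon–Tsallis divergence averages the two Tsallis divergences taken with respect to the equal-weight mixture; at q = 1 both reduce to the classical Jeffreys and Jensen–Shannon divergences.

```python
import numpy as np

def ln_q(x, q):
    x = np.asarray(x, float)
    return np.log(x) if np.isclose(q, 1.0) else (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_div(p, r, q):
    """D_q(X||Y) = -sum_i p_i ln_q(r_i / p_i)."""
    p, r = np.asarray(p, float), np.asarray(r, float)
    nz = p > 0
    return -np.sum(p[nz] * ln_q(r[nz] / p[nz], q))

def jeffreys_tsallis(p, r, q):
    # Assumed form: symmetrized sum of the two Tsallis divergences.
    return tsallis_div(p, r, q) + tsallis_div(r, p, q)

def jensen_shannon_tsallis(p, r, q):
    # Assumed form: average of the Tsallis divergences to the equal-weight mixture.
    m = 0.5 * (np.asarray(p, float) + np.asarray(r, float))
    return 0.5 * tsallis_div(p, m, q) + 0.5 * tsallis_div(r, m, q)

p = np.array([0.5, 0.3, 0.2])
r = np.array([0.1, 0.6, 0.3])
for q in [0.5, 1.0, 1.5, 2.0]:
    # Both symmetrizations are nonnegative for these values of q.
    assert jeffreys_tsallis(p, r, q) >= 0.0
    assert jensen_shannon_tsallis(p, r, q) >= 0.0
```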
In accordance with the above definition, we define the directed Jeffreys and Jensen–Shannon q-hypodivergence measures between two distributions and emphasize the mathematical significance of our definitions.
Definition 7
The Jeffreys–Tsallis hypodivergence is:
and the Jensen–Shannon–Tsallis hypodivergence is:
Here, we point out that again, one has:
where:
Lemma 7
The following inequality holds:
for q ≥ 0 and λ > 0.
Proof
Using the inequality between the arithmetic and geometric mean, one has:
Thus, the proof is completed.
In the limit λ → ∞, Lemma 7 recovers Lemma 3.4 in [25].
Lemma 8 ([25])
The function:
is concave for 0 ≤ r ≤ q.
The next two results of the present paper are stated in order to establish the counterpart of Theorem 3.5 in [25] for hypodivergences.
Proposition 7
It holds:
for q ≥ 0 and λ > 0.
Proof
By the use of Lemma 7, one has:
This completes the proof.
Proposition 8
It holds that:
for 0 ≤ r ≤ q and λ > 0.
Proof
According to Lemma 8,
Then:
Thus, the proof is completed.
We further define the dual symmetric hypodivergences.
Definition 8
The dual symmetric Jeffreys–Tsallis hypodivergence is defined by:
and the dual symmetric Jensen–Shannon–Tsallis hypodivergence is defined by:
Using Lemma 7, we have the following inequality.
Proposition 9
It holds:
for 0 ≤ q ≤ 2 and λ > 0.
In addition, we have the following inequality.
Proposition 10
It holds:
for 1 < r ≤ 2, r ≤ q and λ > 0.
Proof
The proof can be done by calculations similar to those in Proposition 8, applying the facts (see Lemmas 3.9 and 3.10 in [25]) that exp_q(x) is a monotonically increasing function of q for x ≥ 0 and that the inequality −ln_{2−r}(x) ≤ −ln_r(x) holds for 1 < r ≤ 2 and x > 0.