1. Introduction
Some of the most famous inequalities in mathematics are surely the Jensen inequality and its converse. The converse Jensen inequality was given by Lah and Ribarič in [1] and independently by Edmundson in [2], so it is sometimes referred to as the Edmundson–Lah–Ribarič inequality. The Jensen inequality and its converse are closely connected with the Hermite–Hadamard inequality, and these three inequalities have always been a great inspiration for further investigations, generalizations, refinements, improvements and extensions. The interested reader may consult several very recent papers, published in the last few months, for a more comprehensive understanding of the recent research progress in this field (see for example [3,4,5,6,7,8,9,10]).
In the recent papers [11,12] the authors investigated the sharpness of the Jensen inequality. However, how sharp is the converse of the Jensen inequality?
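For orientation, the two classical statements in question can be written, in a standard generic notation (not necessarily the notation used below), as follows:

```latex
% Jensen's inequality: for mu a probability measure on [a,b],
% g with values in [m,M], and f convex on [m,M]:
\[
  f\!\left(\int_a^b g\,d\mu\right) \;\le\; \int_a^b f(g)\,d\mu .
\]
% Its converse, the Edmundson--Lah--Ribaric inequality:
\[
  \int_a^b f(g)\,d\mu \;\le\; \frac{M-\bar g}{M-m}\,f(m)
  + \frac{\bar g-m}{M-m}\,f(M),
  \qquad \bar g = \int_a^b g\,d\mu .
\]
```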
Let be an interval in . Consider the Green functions , defined by
By means of these functions, the authors in [13,14] gave a uniform treatment of Jensen-type inequalities, allowing the measure also to be negative. In this paper, we continue this investigation, and we concentrate on the converse Jensen inequality.
The paper is organized as follows: after this introduction, in Section 2 we give our main results. We analyse the sharpness of the converse of the Jensen inequality; here, instead of the convexity of the function, we use the previously mentioned Green functions. After the first theorem, the subsequent corollaries give some further results and an example with a condition that is easier to verify. In Section 3, the analogous results in the discrete case are presented. As is well known, the Jensen inequality is important for obtaining inequalities for divergences. Therefore, in Section 4 we use our results on the converse Jensen inequality to derive new inequalities for different types of generalized f-divergences. By definition, divergences measure the differences between probability distributions. So, to conclude the paper, in Section 5 we apply our results on f-divergences to a special kind of probability distribution, the Zipf–Mandelbrot law.
2. Main Results
To simplify the notation, we denote
We give our first result.
Theorem 1. Let be a continuous function and , , where . Let be such that for all . Let be a function of bounded variation, such that . Let , , be such that .
Proof. Using the functions we can represent every function , , as: which can easily be shown by integration by parts. For instance, for we have which proves the first identity. For we have and this gives us the second identity. The other identities can be proved analogously.
Furthermore, by a simple calculation using these identities, it can be shown that for every function , , and for any the following holds: Now, using the triangle inequality for integrals, and then applying the Hölder inequality, we get the statement of our theorem. □
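As a quick sanity check, the classical discrete forms of the two bounds involved here can be verified numerically. The sketch below assumes positive weights summing to one (the theorems above also allow signed measures, which this simple check does not cover), and the function names are ours:

```python
import random

def jensen_gap_bounds(f, xs, ps, m, M):
    """Return (f(mean), E[f(X)], Lah-Ribaric upper bound) for points xs in [m, M]
    and positive weights ps summing to 1."""
    xbar = sum(p * x for p, x in zip(ps, xs))          # weighted mean
    ef = sum(p * f(x) for p, x in zip(ps, xs))         # weighted mean of f
    # converse Jensen (Edmundson-Lah-Ribaric) bound built from the endpoints
    ub = (M - xbar) / (M - m) * f(m) + (xbar - m) / (M - m) * f(M)
    return f(xbar), ef, ub

random.seed(0)
f = lambda t: t * t                                    # a convex function
m, M = 0.0, 1.0
xs = [random.uniform(m, M) for _ in range(6)]
ws = [random.random() for _ in range(6)]
s = sum(ws)
ps = [w / s for w in ws]                               # positive weights, sum = 1

lo, mid, hi = jensen_gap_bounds(f, xs, ps, m, M)
assert lo <= mid <= hi   # Jensen and its converse sandwich E[f(X)]
```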
Let us see what happens for , . If the term has the same sign for all , then we can calculate Q. The following result holds.
Corollary 1. Let be a continuous function and , , where . Let be such that for all . Let be a function of bounded variation, such that . If for any for all the inequality holds, or if for any for all the reverse inequality in (1) holds, then
Proof. Applying Theorem 1 for , we get
If the term does not change its sign for all , we can calculate the integral on the right-hand side of (3).
Let us start with the case when . We have and therefore it is Similarly, when we consider the case when , we have and we obtain For we have for and for and, using the same procedure, we get the same result. Thus, for we have that If for all the inequality (1) holds, then (3) becomes and if for all the reverse inequality in (1) holds, then (3) becomes So, if for all the inequality (1) holds, or if for all the reverse inequality in (1) holds, then in both cases (2) is valid. □
Remark 1. Note that (2) can also be expressed as
Let us consider the case when . If we set and , we have that , and the inequality (1) transforms into
Therefore, we have the following result.
Corollary 2. Let be a continuous function and , , where . Let be a function of bounded variation, such that . If for all the inequality (4) holds, or if for all the reverse inequality in (4) holds, then
As we can see, this case is much simpler, since condition (4) is easier to verify than condition (1). Similar results could also be given for the cases when , but we do not state them here because their conditions are not as simple.
3. On the Converse Jensen Type Inequality in Discrete Case
In this section, we give our results in the discrete case. We omit the proofs, as they are similar to those in the integral case from the previous section. We introduce the notation: , .
For , , such that , we have that for every function , , the following holds:
Using this fact, we obtain the following result.
Theorem 2. Let , , be such that , and let , . Let , , be such that .
Let us now see what happens for , . If the term has the same sign for all , then we can calculate L. The following result holds.
Corollary 3. Let , , be such that , and let , . If for any for all the inequality holds, or if for any for all the reverse inequality in (5) holds, then
In the case when , and , the inequality (5) transforms into and we obtain the following result.
Corollary 4. Let , , be such that , and let , . If for all the inequality (6) holds, or if for all the reverse inequality in (6) holds, then
4. Inequalities for Generalized f-Divergences
I. Csiszár in [15] defined the f-divergence for a function and two positive probability distributions , . He considered the case when the function f is convex. Although several other authors ([16,17]) also introduced and studied this divergence, it is well known as the Csiszár f-divergence.
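For concreteness, the classical (unweighted) Csiszár f-divergence can be computed directly from its defining sum. The weighted generalization studied below is not reproduced here, and the function name is ours:

```python
import math

def csiszar_f_divergence(f, p, q):
    """Classical Csiszar f-divergence  D_f(p, q) = sum_i q_i * f(p_i / q_i)
    for two positive probability distributions p and q."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

# f(t) = t*log(t) is convex with f(1) = 0 and recovers the Kullback-Leibler divergence
kl = csiszar_f_divergence(lambda t: t * math.log(t), p, q)
assert kl >= 0.0                 # nonnegativity follows from Jensen's inequality
assert csiszar_f_divergence(lambda t: t * math.log(t), p, p) == 0.0  # D_f(p, p) = 0
```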
There exist various kinds of divergences, all of which measure the differences between probability distributions. We focus here on the f-divergences which are generalized using weights (see [18,19]), and we apply our results from the previous section in order to obtain new results and inequalities for these generalized f-divergences.
The generalized Csiszár -divergence is defined by where and .
To simplify our results, we use the following notation:
Theorem 3. Let be such that , , , and let , , be such that . If , , then holds, where
Proof. Substituting , our result directly follows from Theorem 2. □
The generalized Kullback–Leibler divergence is defined by where . For this divergence we have the following result.
Theorem 4. Let be such that , , , and let , , be such that . Then holds, where is the identity function and L is as defined in (7). Proof. This result follows directly from Theorem 3 by setting , . □
For the generalized Hellinger divergence, defined by the following result holds.
Theorem 5. Let be such that , , , and let , , be such that . Then holds, where and L is as defined in (7). Proof. This result follows directly from Theorem 3 by setting , . □
The generalized Rényi divergence is defined by where .
Theorem 6. Let be such that , , , and let , , be such that . Then holds, where () and L is as defined in (7). Proof. This result follows directly from Theorem 3 by setting (). □
The generalized -divergence is defined by The following result holds.
Theorem 7. Let be such that , , , and let , , be such that . Then holds, where and L is as defined in (7). Proof. This result follows directly from Theorem 3 by setting . □
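The divergences above all arise from the classical f-divergence by specific choices of the generating function f. The sketch below uses the standard (unweighted) forms; normalizations in the literature vary, so the constants here follow one common convention, and the function names are ours:

```python
import math

def f_div(f, p, q):
    # classical form: D_f(p, q) = sum_i q_i * f(p_i / q_i)
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

chi2 = f_div(lambda t: (t - 1) ** 2, p, q)              # chi-square divergence
hel = f_div(lambda t: (math.sqrt(t) - 1) ** 2, p, q)    # Hellinger-type divergence
alpha = 2.0
# Renyi divergence of order alpha > 1, via the f-divergence with f(t) = t**alpha
renyi = math.log(f_div(lambda t: t ** alpha, p, q)) / (alpha - 1)

assert chi2 >= 0 and hel >= 0 and renyi >= 0
```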
The generalized Shannon entropy of a positive probability distribution is defined by It is a special case of the generalized Csiszár f-divergence if we set and , . We have the following.
Theorem 8. Let be such that , , , and let , , be such that . Then holds, where L is as defined in (7).
5. Applications to Zipf–Mandelbrot Law
Definition 1 ([20]). The Zipf–Mandelbrot law is a discrete probability distribution which depends on three parameters , and , and it is defined by where When , the Zipf–Mandelbrot law becomes the Zipf law.
The Zipf–Mandelbrot law is named after the linguist George Kingsley Zipf, who gave its original form, and the mathematician Benoit Mandelbrot, who generalized it. The Zipf law describes the frequency of words in a text, and it is used in bibliometrics and information science. It is used in linguistics, but also in economics (as Pareto's law) when analysing the distribution of wealth. Apart from that, this law also appears in other disciplines such as mathematics, physics, biology, computer science, social sciences, demography, etc. Here, of course, we concentrate on its mathematical aspects. (More about the Zipf–Mandelbrot law in a mathematical context can be found in [21].)
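The definition above can be made concrete with a short computation. The sketch below uses the standard parametrization, with probabilities proportional to (i + q)^(-s) and normalized by the generalized harmonic number; the function name is ours:

```python
def zipf_mandelbrot_pmf(N, q, s):
    """Zipf-Mandelbrot pmf: f(i; N, q, s) = (i + q)^(-s) / H_{N,q,s}, i = 1..N,
    where H_{N,q,s} = sum_{k=1}^N (k + q)^(-s)."""
    H = sum((k + q) ** (-s) for k in range(1, N + 1))
    return [(i + q) ** (-s) / H for i in range(1, N + 1)]

pmf = zipf_mandelbrot_pmf(N=100, q=1.5, s=1.2)
assert abs(sum(pmf) - 1.0) < 1e-12   # a genuine probability distribution

# q = 0 reduces to the Zipf law
zipf = zipf_mandelbrot_pmf(N=100, q=0.0, s=1.2)
```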
As the Zipf–Mandelbrot law is a probability distribution, and f-divergences measure the differences between two probability distributions, we can apply the results from the previous section to the Zipf–Mandelbrot law.
Suppose are two Zipf–Mandelbrot laws with parameters , , and , , respectively. Then and where
The generalized Csiszár divergence for such , and for is given by
Using (8) and (9), we get the following expressions for and : and we obtain the following result.
Corollary 5. Let be two Zipf–Mandelbrot laws with parameters , , and , respectively, and such that Let , , be such that . If , , then where , , , , are as defined in (8)–(12), and L is as defined in (7).
The generalized Kullback–Leibler divergence of two Zipf–Mandelbrot laws
with parameters , , and , , respectively, and , is given by:
The following holds.
Corollary 6. Let be two Zipf–Mandelbrot laws with parameters , , and , respectively, and such that Let , , be such that . Then holds, where is the identity function and , , , , are as defined in (8), (9), (11)–(13), and L is as defined in (7).
The generalized Hellinger divergence for two Zipf–Mandelbrot laws
with parameters , , and , , respectively, and , has the following representation:
The following result holds.
Corollary 7. Let be two Zipf–Mandelbrot laws with parameters , , and , respectively, and such that Let , , be such that . Then where , and , , , , are as defined in (8), (9), (11), (12) and (14), respectively, and L is as defined in (7).
The generalized Rényi divergence for two Zipf–Mandelbrot laws
with parameters , , and , , respectively, and , has the following representation: where .
The following result holds.
Corollary 8. Let be two Zipf–Mandelbrot laws with parameters , , and , respectively, and such that Let , , be such that . Then holds, where (), and , , , , are as defined in (8), (9), (11), (12) and (15), respectively, and L is as defined in (7).
The generalized
-divergence for two Zipf–Mandelbrot laws with parameters , , and , , respectively, and , has the following representation:
We have the following result.
Corollary 9. Let be two Zipf–Mandelbrot laws with parameters , , and , respectively, and such that Let , , be such that . Then holds, where and , , , , are as defined in (8), (9), (11), (12) and (16), respectively, and L is as defined in (7).
In addition, at the end, we also give the result for the generalized Shannon entropy of a Zipf–Mandelbrot law
with parameters , , , and , which has the following representation:
Corollary 10. Let be a Zipf–Mandelbrot law with parameters , and , and , such that Let , , be such that . Then holds, where , , are as defined in (8), (11) and (17), respectively, and L is as defined in (7).
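As an illustration of the quantities entering Corollary 10, the classical Shannon entropy of a Zipf–Mandelbrot law can be computed directly. This sketch uses the standard (unweighted) entropy, not the weighted generalization of the corollary, and the function names are ours:

```python
import math

def zm_pmf(N, q, s):
    """Zipf-Mandelbrot pmf (standard parametrization): f(i) = (i + q)^(-s) / H_{N,q,s}."""
    H = sum((k + q) ** (-s) for k in range(1, N + 1))
    return [(i + q) ** (-s) / H for i in range(1, N + 1)]

def shannon_entropy(p):
    """Classical Shannon entropy  H(p) = -sum_i p_i * log(p_i)."""
    return -sum(pi * math.log(pi) for pi in p)

p = zm_pmf(N=1000, q=2.0, s=1.1)
H = shannon_entropy(p)
assert 0.0 <= H <= math.log(1000)   # entropy on N points is bounded by log N
```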