1. Introduction
The theory of sums of random variables belongs to the core of modern probability theory. The fundamental contribution to the formation of the classical core was made by A. de Moivre, J. Bernoulli, P.-S. Laplace, S.D. Poisson, P.L. Chebyshev, A.A. Markov, A.M. Lyapunov, E. Borel, S.N. Bernstein, P. Lévy, J. Lindeberg, H. Cramér, A.N. Kolmogorov, A.Ya. Khinchin, B.V. Gnedenko, J.L. Doob, W. Feller, Yu.V. Prokhorov, A.A. Borovkov, Yu.V. Linnik, I.A. Ibragimov, A. Rényi, P. Erdős, M. Csörgő, P. Révész, C. Stein, P. Hall, V.V. Petrov, V.M. Zolotarev, J. Jacod and A.N. Shiryaev, among others. The first steps led to limit theorems for appropriately normalized partial sums of sequences of independent random variables. Besides the laws of large numbers, special attention was paid to the emergence of Gaussian and Poisson limit laws. Note that despite many efforts to find necessary and sufficient conditions for the validity of the central limit theorem (the term was proposed by G. Pólya for a class of limit theorems describing weak convergence of distributions of normalized sums of random variables to the Gaussian law), this problem was completely resolved for independent summands only in the second half of the 20th century in the works of V.M. Zolotarev and V.I. Rotar. Also in the last century, the beautiful theory of infinitely divisible and stable laws was constructed. New developments of infinite divisibility along with the classical theory can be found in [1]. For an exposition of the theory of stable distributions and their applications, we refer to [2] and the references therein.
Parallel to partial sums of a sequence of random variables (and vectors), other significant schemes have appeared, for instance, arrays of random variables. Moreover, in physics, biology and other domains, researchers found that it was essential to study sums of random variables when the number of summands is random. Thus, random sums with random summands became an important object of investigation. One can mention the branching processes, which stem from the 19th-century population models of I.J. Bienaymé, F. Galton and H.W. Watson and are still being intensively developed, see, e.g., [3]. In the theory of risk, it is worth recalling the celebrated Cramér–Lundberg model for the dynamics of the capital of an insurance company, see, e.g., Ch. 6 in [4]. Various examples of models described by random sums are considered in Ch. 1 of [5], including (see Example 1.2.1) the relationship between certain random sums analysis and the famous Pollaczek–Khinchin formula in queuing theory. A vast literature deals with the so-called geometric sums. There, one studies the sum of independent identically distributed random variables, where the summation index follows the geometric distribution and is independent of the summands. Such random sums can model many real-world phenomena, e.g., in queuing, insurance and reliability, see the Section “Origin of Geometric Sums” in the Introduction of [6]. Furthermore, a multitude of important stochastic models described by systems of dependent random variables arose to meet diverse applications, see, e.g., [7]. In particular, the general theory of stochastic processes and random fields emerged in the last century (for an introduction to random fields, see, e.g., [8]).
An intriguing problem of estimating the convergence rate to a limit law was addressed by A.C. Berry and C.-G. Esseen. Their papers initiated the study of proximity for distribution functions of the normalized partial sums of independent random variables to the distribution function of a standard Gaussian law in the framework of the classical theory of random sums.
To assess the proximity of distributions, we will employ various integral probability metrics. Usually, for random variables $Y$, $Z$ and a specified class $\mathcal{H}$ of functions $h: \mathbb{R} \to \mathbb{R}$, one sets
$$d_{\mathcal{H}}(Y, Z) = \sup_{h \in \mathcal{H}}\,\big|\mathbb{E}h(Y) - \mathbb{E}h(Z)\big|. \qquad (1)$$
Clearly, $d_{\mathcal{H}}(Y, Z)$ is a functional depending on $\mathrm{Law}(Y)$ and $\mathrm{Law}(Z)$, i.e., the distributions of $Y$ and $Z$. A class $\mathcal{H}$ should be rich enough to guarantee that $d_{\mathcal{H}}$ possesses the properties of a metric (or semi-metric). The general theory of probability metrics is presented, e.g., in [9,10]. In terms of such metrics, one often compares the distribution of a random variable $Y$ under consideration with that of a target random variable $Z$. In Section 2, we recall the definitions of the Kolmogorov and Kantorovich (alternatively called Wasserstein) distances and the Zolotarev ideal metrics corresponding to adequate choices of $\mathcal{H}$, denoted below as $d_K$, $d_W$ and $\zeta_s$, respectively.
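As a quick numerical illustration of Equation (1) (our own sketch, not part of the original exposition), the following Python snippet approximates two such metrics from samples: the Kolmogorov distance, corresponding to indicator test functions, and the Kantorovich distance, computed here as the $L_1$ distance between distribution functions; the compared laws are chosen arbitrarily.

```python
import numpy as np

def kolmogorov_and_kantorovich(sample_y, sample_z, grid_size=4000):
    """Approximate d_K = sup_z |F_Y(z) - F_Z(z)| and d_W = integral of |F_Y - F_Z|
    using empirical distribution functions evaluated on a common grid."""
    lo = min(sample_y.min(), sample_z.min())
    hi = max(sample_y.max(), sample_z.max())
    grid = np.linspace(lo, hi, grid_size)
    f_y = np.searchsorted(np.sort(sample_y), grid, side="right") / sample_y.size
    f_z = np.searchsorted(np.sort(sample_z), grid, side="right") / sample_z.size
    diff = np.abs(f_y - f_z)
    return diff.max(), np.trapz(diff, grid)

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=200_000)          # Exp(1)
z = rng.gamma(shape=1.1, scale=1.0, size=200_000)     # a nearby law
d_k, d_w = kolmogorov_and_kantorovich(y, z)
print(d_k, d_w)   # both are small: Gamma(1.1, 1) is close to Exp(1)
```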
It should be emphasized that, for sums of random variables, deep results were established along with the creation and development of different methods of analysis. One can mention the method of characteristic functions due to the works of J. Fourier, P.-S. Laplace and A.M. Lyapunov, the method of moments proposed by P.L. Chebyshev and developed by A.A. Markov, the Lindeberg method of employing auxiliary Gaussian random variables and the Bernstein technique of large and small boxes. In 1972, C. Stein in [11] (see also [12]) introduced a new method to estimate the proximity of the distribution under consideration to a normal law. Furthermore, this powerful method was developed in the framework of classical limit theorems of probability theory. We describe this method in Section 2. Applying the Stein method along with other tools, one can establish, in certain cases, sharp estimates of the closeness between a target distribution and other ones in specified metrics (see, e.g., [13,14]). We recommend the books [15,16] and the paper [17] for the basic ideas of the ingenious Stein method. The development of these techniques under mild moment restrictions on summands is treated in [18,19]. We mention in passing that there are deep generalizations of the Stein techniques involving generators of certain Markov processes; a compact exposition is provided, e.g., on p. 2 of [20].
In the theory of random sums of random summands, the limit theorems with the exponential law as a target distribution play a role similar to the central limit theorem for (nonrandom) sums of random variables. Here, one has to underline the principal role of the Rényi classical theorem for geometric sums published in [21]. Recall this famous result. Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed (i.i.d.) random variables such that $\mathbb{E}X_1 = 1/\lambda$ for some $\lambda > 0$. Take a geometric random variable $N_p$ with parameter $p \in (0, 1)$, defined as follows:
$$\mathbb{P}(N_p = k) = p\,(1 - p)^{k - 1}, \quad k \in \mathbb{N}. \qquad (2)$$
Assume that $N_p$ and $(X_n)_{n \in \mathbb{N}}$ are independent. Set $S_n = X_1 + \cdots + X_n$, $n \in \mathbb{N}$, and $S_{N_p} = \sum_{k=1}^{N_p} X_k$. Then,
$$p\,S_{N_p} \xrightarrow{\;\mathcal{D}\;} Z, \quad p \to 0+, \qquad (3)$$
where $\xrightarrow{\;\mathcal{D}\;}$ stands for convergence in distribution, and $Z$ follows the exponential law $\mathrm{Exp}(\lambda)$ with parameter $\lambda$, i.e., $\mathbb{P}(Z \le x) = 1 - e^{-\lambda x}$, $x \ge 0$. In fact, instead of $N_p$, A. Rényi considered the shifted geometric random variable $\widetilde{N}_p$ such that $\mathbb{P}(\widetilde{N}_p = k) = p\,(1 - p)^{k}$, $k = 0, 1, 2, \ldots$. Clearly, $\widetilde{N}_p + 1$ has the same law as $N_p$. He supposed that the i.i.d. random variables $X_1, X_2, \ldots$ are non-negative, and $\widetilde{N}_p$ and $(X_n)_{n \in \mathbb{N}}$ are independent. Then, $p\,S_{\widetilde{N}_p}$ converges in distribution to $Z$ as $p \to 0+$, where $S_{\widetilde{N}_p} = \sum_{k=1}^{\widetilde{N}_p} X_k$ (the empty sum being zero). It was explained in [22] that both statements are equivalent and the assumption of nonnegativity of summands can be omitted.
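The following Monte Carlo sketch (ours; the uniform choice of summands and all parameter values are assumptions made only for concreteness) illustrates the convergence in Equation (3): the empirical distribution function of $p\,S_{N_p}$ approaches that of $\mathrm{Exp}(\lambda)$ as $p \to 0+$.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0          # target Exp(lambda); summands have mean 1/lam
reps = 20_000

for p in (0.1, 0.01):
    n_p = rng.geometric(p, size=reps)          # P(N_p = k) = p(1-p)^{k-1}, k = 1, 2, ...
    # i.i.d. summands uniform on [0, 2/lam], so that E X_1 = 1/lam
    x = rng.uniform(0.0, 2.0 / lam, size=n_p.sum())
    starts = np.concatenate(([0], np.cumsum(n_p)[:-1]))
    s_geom = np.add.reduceat(x, starts)        # the geometric sums S_{N_p}
    scaled = p * s_geom
    ts = np.array([0.25, 0.5, 1.0])
    emp = np.array([(scaled <= t).mean() for t in ts])
    print(p, np.round(emp, 3), np.round(1 - np.exp(-lam * ts), 3))
```

As $p$ decreases, the empirical values approach $1 - e^{-\lambda t}$, in accordance with the Rényi theorem.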
Building on the previous investigations discussed below in this section, we study different instances of quantifying the approximation of random sums by limit laws and also extend the employment of the Stein method. The main goals of our paper are the following: (1) to find sharp estimates (i.e., optimal ones, which cannot be diminished) of the proximity of geometric sums of independent (in general, non-identically distributed) random variables to the exponential law using the probability metric $\zeta_2$; (2) to prove a new version of the Rényi theorem when the summands are described by a model of exchangeable random variables, establishing the due non-exponential limit law together with an optimal bound on the convergence rate with respect to $\zeta_2$; (3) to obtain the exact convergence rate of appropriately normalized random sums of random summands to the generalized gamma distribution when the number of summands follows the generalized negative binomial distribution, employing $\zeta_2$; (4) to introduce the inverse transformation to the “equilibrium distribution transformation”, give a full description of its existence and demonstrate the advantage of applying the Stein method combined with that inverse transform; and (5) to use such an approach in deriving a new approximation in the Kolmogorov metric of the Pareto distribution by an exponential one, which is important in signal processing.
The main idea is to apply the Stein method and deduce (Lemma 2) new estimates of the solution of Stein’s equation (corresponding to an exponential law as a target distribution) when the function h appearing on its right-hand side belongs to a specified class of functions. This entails the established sharp estimates. The integral probability metrics and the techniques of integration with respect to signed measures are essentially employed. It should be stressed that we consider random summands which, in general, take positive and negative values and in certain cases need not have the same law.
Now, we briefly comment on the relevance of the five groups of the paper results mentioned above. Some upper bounds for convergence rates in Equation (3) were obtained previously by different tools (the renewal techniques and the memoryless property of the geometric distribution), and those estimates were not sharp. We refer to the results by A.D. Soloviev, V.V. Kalashnikov and S.Y. Vsekhsvyatskii, M. Brown, V.M. Kruglov and V.Yu. Korolev, where the authors either used the Kolmogorov distance or proved specified nonuniform estimates for differences of the corresponding distribution functions. For instance, in [23] an estimate with an explicit constant was proved which, moreover, is asymptotically exact as $p \to 0+$. Some improvements are obtained in [24] under certain (hazard rate) assumptions. E.V. Sugakova obtained a version of the Rényi theorem for independent, in general not identically distributed, random variables. We also mention the contributions by V.V. Kalashnikov, E.F. Peköz, A. Röllin, N. Ross and T.L. Hung, which gave estimates in terms of the Zolotarev ideal metrics. We do not reproduce all these results here since they can be viewed on pages 3 and 4 of [22], with references to where they were published.
In Corollary 3.6 of [25], for nondegenerate i.i.d. positive random variables with mean $\mu$ and finite second moment, an explicit upper bound was proved for the $\zeta_2$-proximity of the normalized geometric sum to the exponential law, where $\zeta_2$ is the Zolotarev ideal metric of order two. In [22], the estimates for the proximity of geometric sums distributions to the exponential law were provided in the Kantorovich and Kolmogorov metrics. A substantial contribution of the authors of [22] is the study of random summands that need not be positive (see also [26]). The general estimate for the deviation of random sums from the limit law in the ideal metric of order $s$ was proved in [27]. We do not assume that the random sum is constructed by means of i.i.d. random variables and, moreover, we demonstrate that our estimate (for summands taking real values) involving the metric $\zeta_2$ is sharp.
The exchangeable random variables form an important class having various applications in statistics and combinatorics; see, e.g., [28]. As far as we know, the model of exchangeable random variables is studied in the context of random sums for the first time here. It is interesting that, instead of the exponential limit law, we indicate an explicit expression for the new limit law. In addition, we establish the sharp estimate of the proximity of random sums distributions to this law using $\zeta_2$.
A natural generalization of the Rényi theorem is to study a summation index following a non-geometric distribution. In this way, an upper bound for the convergence rate of random sums of random summands to the generalized gamma distribution was proved in [29]. Theorem 3.1 in [30] contains the estimates in the Kolmogorov and Kantorovich distances for approximations of a non-negative random variable law by a specified (nongeneralized) gamma distribution. The proof relies on Stein’s identity for the gamma distribution established in H.M. Luk’s PhD thesis (see the reference in [30]). New estimates of the solutions of the gamma Stein equation are given in [31]. We derive the sharp estimate for the approximation of random sums by the generalized gamma law using the Zolotarev metric of order two. In a quite recent paper [32], the author established deep results concerning further generalizations of the Rényi theorem. Namely, Theorem 1 of [32] demonstrates how one can provide upper bounds for the convergence rate of specified random sums to a more general law than an exponential one using the estimates in the Rényi theorem. This approach is appealing since the author employs the ideal metric of order $s$. However, the sharpness of these estimates was not examined.
Note that in [33] the important “equilibrium transformation of distributions” was proposed and employed along with the Stein techniques. We will consider this transformation $X \mapsto X_e$ for a random variable $X$ in Section 7 and also tackle other useful transformations. In the present paper, the inverse to the “equilibrium distribution transformation” is introduced. We completely describe the possibility of constructing such a transformation and provide an explicit formula for the corresponding density. The idea to apply such an inverse transformation, whenever it exists, is based on the result of [33] demonstrating that one can obtain a more precise estimate for the proximity in the Kantorovich metric between $X_e$ and $Z$ than between $X$ and $Z$, where $X_e$ is the result of the equilibrium transformation applied to $X$ and $Z$ follows the exponential law with parameter $\lambda = 1/\mathbb{E}X$. We extend this result. Moreover, we prove that in this way one can obtain a new estimate of the approximation of the Pareto distribution by an exponential one. It is shown that our new estimate is advantageous for a wide range of parameters of the Pareto distribution. Let $X_e \sim \mathrm{Par}(\alpha)$, i.e., the distribution function of $X_e$ is $F_{X_e}(x) = 1 - (1 + x)^{-\alpha}$, $x \ge 0$, $\alpha > 1$. We show that the preimage $X \sim \mathrm{Par}(\alpha + 1)$. Thus, for any $\alpha > 1$, one obtains an explicit bound for $d_K(X_e, Z)$, where $Z \sim \mathrm{Exp}(\lambda)$ with $\lambda = \alpha$, and $d_K$ stands for the Kolmogorov distance. This bound is more precise than the previous ones applied in signal processing; see, e.g., [34].
This paper is organized as follows. After the Introduction, the auxiliary results are provided in Section 2. Here we include the material important for understanding the main results. We recall the concept of probability metrics, consider the Kolmogorov and the Kantorovich distances and examine the Zolotarev ideal metrics. We describe the basic ideas of Stein’s method, especially for the exponential target distribution. In this section, we formulate a simple but useful Lemma 1, concerning the essential supremum of the derivative of a Lipschitz function, and an important Lemma 2, giving the solution of the Stein equation for different functional classes. We explain the essential role of the generalized equilibrium transformation proposed in [22], which permits studying summands taking both positive and negative values. We formulate Lemma 3 to be able to solve an integral equation involving the generalized equilibrium transformation. The proofs of the auxiliary lemmas are placed in Appendix A.
Section 3 is devoted to the approximation of normalized geometric sums by an exponential law. Here, the sharp convergence rate is found (see Theorem 1) by means of the probability metric $\zeta_2$. The proof is based on the Lebesgue–Stieltjes integration techniques, the formula of integration by parts for functions of bounded variation, Lemma 2, various limit theorems for integrals and an important result of [22] concerning the estimates involving the Kantorovich distance. In Section 4, for the first time, an analog of the Rényi theorem is proved for a model of exchangeable random variables proposed in [35]. We demonstrate (Theorem 2) that, in contrast to Rényi’s theorem, the limit distribution for the random sums under consideration is a specified mixture of two explicitly indicated laws. Moreover, the sharp convergence rate to this limit law is obtained (Theorem 3) by means of $\zeta_2$. In Section 5, the distance between the generalized gamma law and the suitably normalized sum of independent random variables is estimated when the number of summands has the generalized negative binomial distribution. Theorem 4 demonstrates that this estimate is sharp. For the proof, we employ various truncation techniques, transformations of the parameters of the initial random variables, the monotone convergence theorem and an explicit formula for the generalized gamma distribution moments of a suitable order, obtained in [27].
Section 6 provides the pioneering study of the same problem in the framework of exchangeable random variables and also gives the sharp estimate in the $\zeta_2$ metric (Theorem 5). In Section 7, we introduce the inverse to the equilibrium transformation of probability measures. Lemma 6 contains a full description of the situations when a unique preimage $X$ of a random variable $X_e$ exists and gives an explicit formula for the distribution of $X$. This approach permits us to obtain new estimates of the closeness of probability measures in the Kolmogorov and Kantorovich metrics (Theorem 6). In particular, due to Theorem 6 and Lemmas 2 and 6, it becomes possible to find a useful estimate of the proximity of the Pareto law to the exponential one (Example 2). Section 8, containing the conclusions and directions for further research, is followed by Appendix A and the list of references.
2. Auxiliary Results
Let $\mathcal{H}_K = \{h_z : z \in \mathbb{R}\}$, where
$$h_z(x) = \mathbb{1}\{x \le z\}, \quad x \in \mathbb{R},$$
and $\mathbb{1}\{A\} = 1$ if $A$ holds and zero otherwise. The choice $\mathcal{H} = \mathcal{H}_K$ in Equation (1) corresponds to the Kolmogorov distance $d_K$. Note that $h_z$ above is a function in $x$, whereas $z$ is the index parameterizing the class.
A function $h: \mathbb{R} \to \mathbb{R}$ is called a Lipschitz one if
$$\mathrm{Lip}(h) := \sup_{x \neq y} \frac{|h(x) - h(y)|}{|x - y|} < \infty. \qquad (4)$$
Then,
$$|h(x) - h(y)| \le C\,|x - y|, \quad x, y \in \mathbb{R}, \qquad (5)$$
and in light of Equation (4), $\mathrm{Lip}(h)$ is the smallest possible constant $C$ appearing in Equation (5). We write $h \in \mathrm{Lip}(1)$, where $\mathrm{Lip}(C)$ stands for the collection of the Lipschitz functions having $\mathrm{Lip}(h) \le C$. For $s > 0$, set $m = \lceil s \rceil - 1$ (where, for $a \in \mathbb{R}$, $\lceil a \rceil$ stands for the minimal integer number which is equal to or greater than $a$). Introduce a class of functions
$$\mathcal{H}_s = \big\{h : \mathbb{R} \to \mathbb{R} \,:\, |h^{(m)}(x) - h^{(m)}(y)| \le |x - y|^{s - m},\ x, y \in \mathbb{R}\big\}.$$
As usual, $h^{(0)} := h$, $h^{(k)} := (h^{(k-1)})'$, $k \in \mathbb{N}$. We write $\zeta_s$ for a metric defined according to Equation (1) with $\mathcal{H} = \mathcal{H}_s$. V.M. Zolotarev and many other researchers defined an ideal metric of order $s$ involving only bounded functions from $\mathcal{H}_s$. We will use the collections $\mathcal{H}_1$ and $\mathcal{H}_2$ without the assumption that the functions $h$ are bounded on $\mathbb{R}$. This is the reason why we write $\zeta_s$ for the metrics based on these wider classes. Thus, we employ
$$\zeta_s(Y, Z) = \sup_{h \in \mathcal{H}_s}\,\big|\mathbb{E}h(Y) - \mathbb{E}h(Z)\big|, \quad s = 1, 2.$$
Note that in the definitions of $\zeta_1$ and $\zeta_2$ we deal with $h \in C^{m}(\mathbb{R})$, where the space $C^{m}(\mathbb{R})$ consists of functions $h$ such that $h^{(m)}(x)$ exists for all $x \in \mathbb{R}$, and $h^{(m)}$ is continuous on $\mathbb{R}$ (evidently a Lipschitz function is continuous). One calls $\zeta_1$ the Kantorovich metric (the term Wasserstein metric appears in the literature as well). One also uses the bounded Kantorovich metric when the class contains all the bounded functions from $\mathcal{H}_1$. The metric $\zeta_s$ was introduced in [36] and called an ideal metric in light of its important properties. The properties of the $\zeta_s$ metrics, where $s > 0$, are collected in Sec. 2 of [32]. We mention in passing that various functionals are ubiquitous in assessing the proximity of distributions. In this regard, we refer, e.g., to [37,38].
To apply the Stein method, we begin with fixing the target random variable $Z$ (or its distribution) and describe a class $\mathcal{H}$ to estimate $d_{\mathcal{H}}(Y, Z)$ for a random variable $Y$ under consideration. Then, the problem is to indicate an operator $T$ (with specified domain of definition) so that the Stein equation
$$Tf(x) = h(x) - \mathbb{E}h(Z) \qquad (6)$$
has a solution $f = f_h$, $x \in \mathbb{R}$, for each function $h \in \mathcal{H}$. After that, one can substitute $Y$ instead of $x$ in Equation (6) and take the expectation of both sides, assuming that all these expectations are finite. As a result, one comes to the relation
$$\mathbb{E}\,Tf_h(Y) = \mathbb{E}h(Y) - \mathbb{E}h(Z). \qquad (7)$$
It is not a priori clear why the estimation of the left-hand side of Equation (7) is more adequate than the estimation of $|\mathbb{E}h(Y) - \mathbb{E}h(Z)|$ for $h \in \mathcal{H}$. However, in many situations justifying the method, this occurs. The choice of $T$ depends on the distribution of $Z$. Note that in certain cases (e.g., when $Z$ follows the Poisson law), one considers functions $f$ defined on a subset of $\mathbb{R}$. We emphasize that the construction of the operator $T$ is a nontrivial problem, see, e.g., [33,39,40,41].
The basic idea in this way is the following. For many probability distributions (Gaussian, Laplace, exponential, etc.), one can find an operator $T$ characterizing the law of a target variable $Z$. In other words, for a rather large class of functions $f$, one has $\mathbb{E}\,Tf(Y) = 0$ if and only if $\mathrm{Law}(Y) = \mathrm{Law}(Z)$ (i.e., the laws of $Y$ and $Z$ coincide). Thus, if $|\mathbb{E}\,Tf_h(Y)|$ is small enough for a suitable class of functions $h$, this leads to the assertion that the law of $Y$ is close (in a sense) to the law of $Z$. One has to verify that this kind of “continuity” takes place. Clearly, if for any $h \in \mathcal{H}$, where $\mathcal{H}$ defines the integral probability metric in Equation (1), one can find a solution $f_h$ of Equation (6), then the relation $|\mathbb{E}\,Tf_h(Y)| \le \varepsilon$ for all $h \in \mathcal{H}$, $\varepsilon > 0$, yields $d_{\mathcal{H}}(Y, Z) \le \varepsilon$ and, consequently, the closeness of the laws of $Y$ and $Z$.
Further, we assume that $Z \sim \mathrm{Exp}(\lambda)$, i.e., $Z$ has exponential distribution with parameter $\lambda > 0$. In this case (see, e.g., Sec. 5 in [17]), one uses the operator
$$Tf(x) = f'(x) - \lambda f(x), \quad x \ge 0, \qquad (8)$$
and writes the Stein Equation (6) as follows:
$$f'(x) - \lambda f(x) = h(x) - \mathbb{E}h(Z). \qquad (9)$$
It should be stipulated that $\mathbb{E}|h(Z)| < \infty$ for a test function $h$, and that there exists a differentiable solution $f$ of Equation (9). Therefore, if one can find such a solution $f$, then
$$\mathbb{E}f'(Y) - \lambda\,\mathbb{E}f(Y) = \mathbb{E}h(Y) - \mathbb{E}h(Z) \qquad (10)$$
under the hypothesis that all these expectations are finite. If $f$ is absolutely continuous, then (see, e.g., Theorem 13.18 of [42]) for almost all $x$ with respect to the Lebesgue measure there exists $f'(x)$. Moreover, one can find an integrable (on each interval) function $g$ to guarantee, for each $x \ge 0$, that
$$f(x) = f(0) + \int_0^x g(t)\,dt, \qquad (11)$$
where $g(t) = f'(t)$ for almost all $t$. Thus, $Tf$ is defined for such $f$ according to Equation (8) for almost all $x$. In general, for an arbitrary random variable $Y$, one cannot write $\mathbb{E}f'(Y)$, since the value of this expectation depends on the choice of a version of $f'(x)$, $x \ge 0$. Really, let $B \in \mathcal{B}(\mathbb{R})$ be such that $m(B) = 0$, where $m$ stands for the Lebesgue measure. Assume that $Y$ takes values in $B$. Then, it is clear that $\mathbb{E}g(Y)$ depends on the choice of the version $g$ of $f'$ defined on $B$. However, if the distribution of a random variable $Y$ has a density with respect to $m$, then $\mathbb{E}g(Y)$ will be the same for any version $g$ of $f'$ (with respect to the Lebesgue measure). In certain cases, the Stein operator is applied to smoothed functions (see, e.g., [33,43]). Otherwise, Equation (6) does not hold at each point of $\mathbb{R}$ (see, e.g., Lemma 2.2 in [16]), and complementary efforts are needed. For our study, it is convenient to employ in Equation (8), in the capacity of $f'(x)$, $x \ge 0$, the right derivative. In many cases, for a real-valued function $f$ defined on a fixed set $D \subset \mathbb{R}$, one considers $\|f\| := \operatorname{ess\,sup}_{x \in D}|f(x)|$. Recall that a function $\widetilde{f}$ is a version of $f$ (and vice versa) if the measure (here the Lebesgue measure) of the points $x$ such that $\widetilde{f}(x) \neq f(x)$ is zero. The notation $\operatorname{ess\,sup}_{x \in D}|f(x)|$ means that one takes $\inf_{\widetilde{f}} \sup_{x \in D} |\widetilde{f}(x)|$, where $\widetilde{f}$ belongs to the class of all versions of $f$. Clearly, $\|f\|$ will be the same if we change $f$ on a subset of $D$ having measure zero. Thus, we write $\|f'\|$ instead of the supremum of $|g|$ appearing in Equation (11). The following simple observation is useful. Its proof is provided in Appendix A.
Lemma 1. A function $h$ is a Lipschitz function on $\mathbb{R}$ with $\mathrm{Lip}(h) = C$ if and only if $h$ is absolutely continuous and $\|h'\| = C$ (its essential supremum norm).
Remark 1. Note that $|h(x)| \le |h(0)| + \mathrm{Lip}(h)\,|x|$, $x \in \mathbb{R}$, for any Lipschitz function $h$. If $h \in \mathcal{H}_2$, then $h'$ is a Lipschitz function, hence $|h'(x)| \le A + B|x|$ for some positive $A$, $B$ (one can take $A = |h'(0)|$, $B = 1$) and any $x \in \mathbb{R}$. As $h'$ is continuous on each interval, it follows that $|h(x)| \le a + b|x| + c\,x^2$ for some positive $a$, $b$, $c$ and each $x \in \mathbb{R}$.

Lemma 2. For any $\lambda > 0$ and each $h \in \mathcal{H}_1 \cup \mathcal{H}_2$, the equation
$$f'(x) - \lambda f(x) = h(x) - \mathbb{E}h(Z), \quad x \ge 0, \qquad (12)$$
where $Z \sim \mathrm{Exp}(\lambda)$, has a solution
$$f_h(x) = -e^{\lambda x}\int_x^{\infty} e^{-\lambda t}\,\big(h(t) - \mathbb{E}h(Z)\big)\,dt, \qquad (13)$$
and $\|f_h\|$, $\|f_h'\|$ (and, for $h \in \mathcal{H}_2$, also $\|f_h''\|$) admit explicit upper bounds in terms of the corresponding characteristics of $h$.
The right-hand side of Equation (13) is well defined for each $x \ge 0$ in light of Remark 1. Lemma 4.1 of [33] contains, for $\lambda = 1$, some statements of Lemma 2. We will use the above estimates for any $\lambda > 0$. Estimates for $h \in \mathcal{H}_2$ were not considered in [33]. The proof of Lemma 2 is given in Appendix A.
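As a numerical sanity check (our illustration; the test function and the parameter value are arbitrary), one can verify that formula (13) indeed solves Equation (12): the finite-difference derivative of $f_h$ minus $\lambda f_h$ reproduces $h - \mathbb{E}h(Z)$.

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0                                  # parameter of the target Exp(lambda)
h = np.tanh                                # a bounded Lipschitz test function
eh_z = quad(lambda t: h(t) * lam * np.exp(-lam * t), 0, np.inf)[0]   # E h(Z)

def f_h(x):
    """Solution (13): f(x) = -exp(lam x) * int_x^inf exp(-lam t)(h(t) - Eh(Z)) dt."""
    tail = quad(lambda t: np.exp(-lam * t) * (h(t) - eh_z), x, np.inf)[0]
    return -np.exp(lam * x) * tail

x, eps = 0.7, 1e-4
lhs = (f_h(x + eps) - f_h(x - eps)) / (2 * eps) - lam * f_h(x)   # f'(x) - lam f(x)
print(lhs, h(x) - eh_z)    # the two values coincide up to numerical error
```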
The following concept was introduced in [33].
Definition 1 ([33]). Let $X$ be a non-negative random variable with finite $\mathbb{E}X > 0$. One says that a random variable $X_e$ has the equilibrium distribution with respect to $X$ if, for any Lipschitz function $f$,
$$\mathbb{E}f(X) - f(0) = \mathbb{E}X\;\mathbb{E}f'(X_e). \qquad (14)$$
Note that Definition 1 deals separately with the distributions of $X$ and $X_e$. One says that $X_e$ is the result of the equilibrium transformation applied to $X$. The same terminology is used for the transition from $\mathrm{Law}(X)$ to $\mathrm{Law}(X_e)$. For the sake of completeness, we explain in Appendix A (Comments to Definition 1) why one can take the law of $X_e$ having the density, with respect to the Lebesgue measure,
$$p_{X_e}(x) = \frac{\mathbb{P}(X > x)}{\mathbb{E}X}\,\mathbb{1}\{x \ge 0\} \qquad (15)$$
to guarantee the validity of Equation (14).
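For a concrete check of Equation (14) (a hypothetical example of ours: $X$ uniform on $[0, 1]$ and test function $f = \sin$), the equilibrium density (15) is $p_{X_e}(x) = (1 - x)/\mathbb{E}X$ on $[0, 1]$, and both sides of the identity can be computed by quadrature.

```python
import numpy as np
from scipy.integrate import quad

ex = 0.5                                   # EX for X ~ U[0, 1]
p_e = lambda x: (1.0 - x) / ex             # density (15): P(X > x)/EX on [0, 1]

# identity (14) with f = sin (Lipschitz, f(0) = 0): E f(X) - f(0) = EX * E f'(X_e)
lhs = quad(np.sin, 0, 1)[0]                            # E sin(X), density 1 on [0, 1]
rhs = ex * quad(lambda x: np.cos(x) * p_e(x), 0, 1)[0] # EX * E cos(X_e)
print(lhs, rhs)                            # equal up to quadrature error
```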
Remark 2. For a non-negative random variable $X$ with finite $\mathbb{E}X > 0$, one can construct a random variable $X_e$ having density (15). Accordingly, we then have a random vector $(X, X_e)$ with specified marginal distributions. However, the joint law of $X$ and $X_e$ is not fixed and can be chosen in an appropriate way. If $(X_n)_{n \in \mathbb{N}}$ is a sequence of independent random variables, we will assume that the sequence of vectors $((X_n, X_{n,e}))_{n \in \mathbb{N}}$ consists of independent vectors, and that these vectors are independent of the other random variables under consideration. In the recent paper [22], a generalization of the equilibrium transformation of distributions was proposed without assuming that the random variable $X$ is non-negative.
Definition 2 ([22]). Let $X$ be a random variable having a distribution function $F(x)$, $x \in \mathbb{R}$. Assume the existence of finite $\mathbb{E}X \neq 0$. An equilibrium distribution function $F_e$ corresponding to $X$ (or $F$) is introduced by way of Equation (16); this function can be written as the integral of the function specified in Equation (17), which is thus a density (with respect to the Lebesgue measure) of a signed measure corresponding to $F_e$. In other words, Equation (17) demonstrates the Jordan decomposition (see, e.g., Sec. 29 of [44]) of this signed measure. Clearly, for a non-negative random variable, the functions defined in Equation (15) and Equation (16) coincide. For a nonpositive random variable, the function $F_e$ appearing in Equation (16) is a distribution function of a probability measure. In general, when $X$ can take positive and negative values, the function introduced in Equation (16) is not a distribution function. We will call $F_e$ the generalized equilibrium distribution function. Note that $F_e$ has a finite Lipschitz constant. Thus, $F_e$ is a Lipschitz function and consequently continuous ($F_e$ is well defined for each $x$ since $\mathbb{E}X$ is finite and nonzero). Moreover, $F_e$ is absolutely continuous, being a Lipschitz function. Each absolutely continuous function has bounded variation. If $G$ is a function of bounded variation, then $G = G_1 - G_2$, where $G_1$ and $G_2$ are nondecreasing functions (see, e.g., [42], Theorem 12.18). One can employ the canonical choice $G_1(x) = V_a^x(G)$, where $V_a^x(G)$ means the variation of $G$ on $[a, x]$, $x \ge a$ (if $x = a$, then $V_a^a(G) := 0$). If $G$ is right-continuous (on $[a, b]$), then evidently $G_1$ and $G_2$ are also right-continuous. Thus, for a right-continuous $G$ having bounded variation, a nondecreasing function $G_i$ in its representation corresponds to a $\sigma$-finite measure $\mu_i$ on $\mathcal{B}(\mathbb{R})$, $i = 1, 2$. More precisely, there exists a unique $\sigma$-finite measure $\mu_i$ on $\mathcal{B}(\mathbb{R})$ such that, for each finite interval $(u, v]$, one has $\mu_i((u, v]) = G_i(v) - G_i(u)$, $i = 1, 2$. Recall that one writes, for the Lebesgue–Stieltjes integral with respect to a function $G$,
$$\int_{\mathbb{R}} f\,dG := \int_{\mathbb{R}} f\,d\mu_1 - \int_{\mathbb{R}} f\,d\mu_2 \qquad (18)$$
whenever the integrals on the right-hand side exist (with values in $[-\infty, \infty]$), and the cases $\infty - \infty$ and $-\infty + \infty$ are excluded. The integral $\int_{\mathbb{R}} f\,d\mu_i$ means the integration with respect to the measure $\mu_i$, $i = 1, 2$. The signed measure $Q$ corresponding to $G$ is $Q = \mu_1 - \mu_2$. Thus, $\int_{\mathbb{R}} f\,dG$ means the integration with respect to the signed measure $Q$. Note that if $G = G_3 - G_4$, where $G_3$ and $G_4$ are right-continuous and nondecreasing (with corresponding measures $\mu_3$ and $\mu_4$), then
$$\int_{\mathbb{R}} f\,dG = \int_{\mathbb{R}} f\,d\mu_3 - \int_{\mathbb{R}} f\,d\mu_4. \qquad (19)$$
The left-hand side and the right-hand side of Equation (19) make sense simultaneously and, if so, are equal to each other. Indeed, for any finite interval $(u, v]$ ($u < v$), one has $\mu_1((u, v]) - \mu_2((u, v]) = \mu_3((u, v]) - \mu_4((u, v])$. Thus, the signed measures corresponding to the pairs $(G_1, G_2)$ and $(G_3, G_4)$ coincide on $\mathcal{B}(\mathbb{R})$. We mention in passing that one can also employ the Jordan decomposition of a signed measure.
For $F_e$ introduced in Equation (16), the analog of Equation (14) takes the form
$$\mathbb{E}f(X) - f(0) = \mathbb{E}X \int_{\mathbb{R}} f'(x)\,dF_e(x). \qquad (20)$$
Taking into account Equation (17), one can rewrite Equation (20) equivalently as the integral of $f'$ against the corresponding signed density. The right-hand side of the latter relation does not depend on the choice of a version of $f'$. Due to Theorem 1(d) of [22], Equation (20) is valid for any Lipschitz function $f$. Evidently, an arbitrary function $h \in \mathcal{H}_2$ need not be a Lipschitz one and vice versa.
Lemma 3. Let $X$ be a random variable such that $\mathbb{E}X^2 < \infty$ and $\mathbb{E}X \neq 0$. Then, Equation (20) is satisfied for all $f \in \mathcal{H}_2$.

7. Inverse to Equilibrium Transformation
The development of Stein’s method is closely connected with various transformations of distributions. Let a random variable $W \ge 0$ have $\mathbb{E}W = \mu \in (0, \infty)$. Then, one says that a random variable $W^s$ has the $W$-size biased distribution if, for all $f$ such that $\mathbb{E}[W f(W)]$ exists,
$$\mathbb{E}[W f(W)] = \mu\,\mathbb{E}f(W^s).$$
The connection of this transformation with Stein’s equation was considered in [50,51]. It was pointed out in [51] that this transformation works well for combinatorial problems, such as counting the number of vertices in a random graph having prespecified degrees; see also [52]. In [53], another transformation was introduced. Namely, if a random variable $W$ has mean zero and variance $\sigma^2 \in (0, \infty)$, then the authors of [53] write (Definition 1.1) that a variable $W^*$ has the $W$-zero biased distribution whenever, for all differentiable $f$ such that $\mathbb{E}[W f(W)]$ exists, the following relation holds:
$$\mathbb{E}[W f(W)] = \sigma^2\,\mathbb{E}f'(W^*).$$
This definition is inspired by the equation $\mathbb{E}[W f(W)] = \sigma^2\,\mathbb{E}f'(W)$ characterizing the normal law $N(0, \sigma^2)$. The authors of [53] explain that $W^*$ always exists if $\mathbb{E}W = 0$ and $\mathrm{Var}\,W = \sigma^2 < \infty$. Zero-biased coupling for products of normal random variables is treated in [54]. In Sec. 2 of [30], it is demonstrated that the gamma distribution is uniquely characterised by the property that its size-biased distribution is the same as its zero-biased distribution. Two generalizations of zero biasing were proposed in [55]; see p. 104 of that paper for a discussion of these transformations. We refer also to the survey [56].
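A small simulation (ours; the Poisson choice is an assumption used because its size-biased version is known in closed form) illustrates the size-bias identity: for $W \sim \mathrm{Poisson}(\nu)$, one has $\mathbb{E}[W f(W)] = \nu\,\mathbb{E}f(W + 1)$, i.e., $W^s$ has the law of $W + 1$.

```python
import numpy as np

rng = np.random.default_rng(3)
nu = 3.0
w = rng.poisson(nu, size=400_000)

f = np.sin                                  # an arbitrary bounded test function
lhs = (w * f(w)).mean()                     # E[W f(W)]
rhs = nu * f(w + 1).mean()                  # EW * E f(W^s) with W^s = W + 1
print(lhs, rhs)                             # nearly equal
```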
Now, we turn to the equilibrium distribution transformation introduced in [33] and concentrate on the approximation of the law under consideration by means of an exponential law; see the corresponding Definition 1 in Section 2. According to the second part of Theorem 2.1 of [33] (in our notation), for $Z \sim \mathrm{Exp}(\lambda)$ and a non-negative random variable $X$ with $\mathbb{E}X = 1/\lambda$ and finite second moment, an estimate of the proximity of $X$ to $Z$ holds, and at the same time Equation (77) bounds the proximity of $X_e$ to $Z$ in terms of the Kantorovich distance between $X$ and $X_e$. Notice that the estimate for $X_e$ is more precise than that for $X$.
Now, we turn to Equation (77) and demonstrate how to find the distribution of $X$ when we know the distribution of $X_e$. In other words, we concentrate on the inverse of the equilibrium distribution transformation.
Assume that $\mathbb{E}X \in (0, \infty)$. Recall that a random variable $X_e$ exists if $F_e$ appearing in Equation (16) is a distribution function. The latter statement is equivalent to the nonnegativity of $X$. Indeed, for non-negative $X$, $F_e$ coincides with a distribution function having density (15). Conversely, if $F_e$ in Equation (16) is a distribution function, then its density must vanish on $(-\infty, 0)$, which holds only if $\mathbb{P}(X < 0) = 0$.
Thus, a random variable $X_e$ has a (version of) density $q$ introduced in Equation (15). Obviously, the function $q$ has the following properties: it is nonincreasing on $[0, \infty)$, and $q(x) = 0$ for $x < 0$. This density is right-continuous on $[0, \infty)$, and consequently $q(0) = \mathbb{P}(X > 0)/\mathbb{E}X$. Now, we are able to provide a full description of the class of densities of random variables $X_e$ relevant to all non-negative $X$ with positive mean.
Lemma 6. Let a non-negative random variable $X_e$ have a version of density $q$ (with respect to the Lebesgue measure) such that this function is nonincreasing and right-continuous on $[0, \infty)$, $q(x) = 0$ for $x < 0$, and there is finite $q(0)$. Then, there exists a unique preimage $X$ of this distribution having a distribution function $F$ continuous at $0$. Namely,
$$F(x) = \Big(1 - \frac{q(x)}{q(0)}\Big)\,\mathbb{1}\{x \ge 0\}. \qquad (78)$$
Proof. First of all, note that $q(0) > 0$,
as otherwise $q(x) = 0$ for all $x \ge 0$ ($q$ is a nonincreasing function on $[0, \infty)$) and $q$ could not be a density. We also know that there exist a left-sided limit and a right-sided limit of $q$ at each point $x > 0$, as well as the right-sided limit of $q$ at $0$. The set of discontinuity points of $q$ is at most countable, and we can take a version which is right-continuous at each point of $[0, \infty)$. Then, Equation (78) introduces a distribution function. Consider a random variable $X$ with distribution function $F$ and check the validity of Equation (14).
The integration by parts formula yields, for any $b > 0$,
Summands on the right-hand side of Equation (79) are non-negative. Therefore, for any $b > 0$, each of them is bounded by the left-hand side. Hence, the monotone convergence theorem implies that $\mathbb{E}X$ is finite. According to Equation (78), relation (80) follows. Taking the limit in Equation (79) as $b \to \infty$, one obtains $\mathbb{E}X = 1/q(0)$. Now, we are ready to verify Equation (14). For any Lipschitz function $f$, $\mathbb{E}f(X)$ is finite.
Taking into account Equation (80), we infer that the corresponding boundary term vanishes as $b \to \infty$. Consequently, applying integration by parts once again ($f$ has bounded variation on each finite interval), we obtain
Uniqueness of the distribution of $X$ corresponding to $X_e$ is a consequence of Equation (15) and the continuity of $F$ at $0$. Indeed, assume that for random variables $X_1$ and $X_2$ one has $(X_1)_e \stackrel{d}{=} (X_2)_e$. Then, Equation (15) yields that, for almost all $x \ge 0$, $\mathbb{P}(X_1 > x)/\mathbb{E}X_1 = \mathbb{P}(X_2 > x)/\mathbb{E}X_2$, and therefore $\mathbb{P}(X_1 > x) = c\,\mathbb{P}(X_2 > x)$, where $c$ is a positive constant (the equilibrium distribution in Definition 1 is introduced for random variables with positive expectation only). Since the distribution functions of $X_1$ and $X_2$ are continuous at $0$ and these random variables are non-negative, one has $\mathbb{P}(X_i > x) \to 1$ as $x \to 0+$ along points of continuity, whence $c = 1$. Thus, the distributions of $X_1$ and $X_2$ coincide. □
Remark 6. Let $X$ be the Bernoulli random variable taking values 1 and 0 with probabilities $p$ and $1 - p$, respectively. Then, it is easily seen that the distribution of $X_e$ is uniform on $[0, 1]$, whatever the value of $p$. Thus, in contrast to Lemma 6, without the assumption of continuity of $F$ at the point $0$, one cannot guarantee, in general, the uniqueness of the preimage under the inverse of the equilibrium transformation.
In the proof of Lemma 6, we find out that $\mathbb{E}X = 1/q(0)$. Set $\lambda = q(0)$. Then, $\mathbb{E}X = 1/\lambda$. Further, we suppose that this choice of $\lambda$ is made.
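The following sketch (ours; the particular density is hypothetical, chosen to satisfy the hypotheses of Lemma 6) reconstructs the preimage $X$ from a given equilibrium density $q$ via Equation (78) and then verifies the defining identity (14) by simulation.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha = 3.0
# assumed equilibrium density q(x) = alpha (1 + x)^{-(alpha + 1)}, x >= 0:
# nonincreasing, right-continuous, q(0) = alpha finite, as required in Lemma 6
q0 = alpha                                   # q(0); hence EX = 1/q0

# Equation (78): F(x) = 1 - q(x)/q(0) = 1 - (1 + x)^{-(alpha + 1)}, x >= 0
u = rng.random(500_000)
x = (1.0 - u) ** (-1.0 / (alpha + 1)) - 1.0  # X sampled via the inverse of F
v = rng.random(500_000)
x_e = (1.0 - v) ** (-1.0 / alpha) - 1.0      # X_e sampled from the density q

# check Equation (14) with the Lipschitz function f = sin (note f(0) = 0)
print(np.sin(x).mean(), (1.0 / q0) * np.cos(x_e).mean())   # nearly equal
```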
Recall that random variables $U$ and $V$ are stochastically ordered if either $\mathbb{P}(U > x) \le \mathbb{P}(V > x)$ for every $x \in \mathbb{R}$, or the opposite inequality holds (for all $x \in \mathbb{R}$). Now, we clarify one of the statements of Theorem 2.1 of [33] (see also Theorem 3 of [22], where a result similar to Theorem 2.1 of [33] is formulated employing the generalized equilibrium distributions).
Theorem 6. Let a random variable $X_e$ satisfy the conditions of Lemma 6, let $\lambda = q(0)$, and let $X$ be a preimage under the equilibrium transformation. Then, Equation (77) holds. Moreover, the inequality becomes an equality when $X$ and $X_e$ are stochastically ordered. Proof. Apply the Stein Equation (10) along with the equilibrium transformation (14). Then, in light of Equation (14) and the normalization $f(0) = 0$, we can write
The last inequality in (
82) is true due to Lemma 2. Now, we demonstrate that equality in (
82) can be attained. Taking
, we have a solution
of Equation (
12). Then,
Employing the integration by parts formula, one can show that the expression in the right-hand side of the last equality is equal to the Kantorovich distance between
X and
when these variables are stochastically ordered. Note that
,
as
and
,
as
because
and
are finite. Thus,
since
(or ≤) for all
. It is well known that the Kantorovich distance is the minimal one for the metric $\mathbb{E}|X - X_e|$ (see, e.g., [9], Ch. 1, §1.3). Therefore, $d_W(X, X_e) = \inf \mathbb{E}|X' - X'_e|$, where the infimum is taken over all joint laws of pairs $(X', X'_e)$ such that $\mathrm{Law}(X') = \mathrm{Law}(X)$ and $\mathrm{Law}(X'_e) = \mathrm{Law}(X_e)$ (see also Remark 2 and [10], Corollary 5.3.2). Consequently, in the framework of Theorem 6, the desired equality holds. □
Remark 7. One can show that, by means of Lemma 2 and Equation (82), it is possible to provide an analogous estimate in the Kantorovich metric. For each function $h$ belonging to the corresponding class, in a similar way to Equation (82), one can apply Equation (10) together with the equilibrium transformation. Now, it is sufficient to study the Stein equation with the right derivative. Formula (13) gives a solution of the Stein equation according to Lemma 2. Note that the right derivative coincides almost everywhere with the derivative, and the law of $X_e$ is absolutely continuous according to Equation (15). Thus, for a Lipschitz function $h$ (see Lemma 2), one can use the equilibrium transformation. Example 1. Consider the distribution functions
of random variables
, taking values
and
with probabilities
,
. Formula (
15) yields that
has the following piecewise linear structure
If
then, for all
, the following inequality holds:
, i.e.,
and
are stochastically ordered. We see that for
, the inequality is violated in the right neighborhood of a point
. Thus, besides the stochastically ordered pairs $(X, X_e)$, there are also pairs of a different kind.
Now, we turn to another example, with stochastically ordered $X$ and $X_e$.
Example 2. Take $X_e$ having the Pareto distribution. The notation $Y \sim \mathrm{Par}(\alpha)$ means that $Y$ has a density $p_Y(x) = \alpha (1 + x)^{-(\alpha + 1)}$ ($x \ge 0$) and the corresponding distribution function $F_Y(x) = 1 - (1 + x)^{-\alpha}$, where $x \ge 0$ and $\alpha > 0$.
Further, we consider only $\alpha > 1$, since in this case there exists finite $\mathbb{E}X_e$. By means of Lemma 6, we obtain the distribution function of the preimage under the equilibrium transformation:
$$F(x) = 1 - (1 + x)^{-(\alpha + 1)}, \quad x \ge 0.$$
Thus, one can state that $X \sim \mathrm{Par}(\alpha + 1)$, and $\lambda = q(0) = \alpha$. It is not difficult to see that $F(x) \ge F_{X_e}(x)$ for all $x \ge 0$, i.e., the random variables $X_e$ and $X$ are stochastically ordered. Due to Theorem 6, one has equality in the corresponding bound. In such a way, we find the bound for the Kolmogorov distance between the distributions of $X_e$ and $Z \sim \mathrm{Exp}(\alpha)$. This relation demonstrates the convergence rate to zero as $\alpha \to \infty$. The estimate is nontrivial for all $\alpha$ large enough.
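Under the Pareto form assumed above (our reading of the elided formulas: $F_{X_e}(x) = 1 - (1 + x)^{-\alpha}$, $x \ge 0$), the quantities entering Theorem 6 can be evaluated numerically: the exact Kolmogorov distance between $X_e$ and $Z \sim \mathrm{Exp}(\alpha)$ and the Kantorovich distance between the stochastically ordered pair $X \sim \mathrm{Par}(\alpha + 1)$ and $X_e$.

```python
import numpy as np
from scipy.integrate import quad

alpha = 3.0                            # Pareto shape, alpha > 1
lam = alpha                            # lambda = q(0) = 1/EX for X ~ Par(alpha + 1)

t = np.linspace(0.0, 60.0, 600_001)
f_e = 1.0 - (1.0 + t) ** -alpha        # df of X_e ~ Par(alpha)
f_z = 1.0 - np.exp(-lam * t)           # df of Z ~ Exp(alpha)
d_k = np.abs(f_e - f_z).max()          # Kolmogorov distance, on a fine grid

# Kantorovich distance between the ordered pair X ~ Par(alpha + 1) and X_e ~ Par(alpha)
d_w = quad(lambda s: (1 + s) ** -alpha - (1 + s) ** -(alpha + 1), 0, np.inf)[0]

print(d_k, d_w)   # for alpha = 3: d_k ≈ 0.08 and d_w = 1/(alpha(alpha - 1)) = 1/6
```

Both quantities decay as $\alpha$ grows, which matches the convergence rate discussed above.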
Remark 8. It is interesting that estimation of the proximity of the Pareto law to the exponential one became important in signal processing; see [34] and references therein. Let $X$ follow the Pareto law with shape parameter $\alpha$, and let $Z \sim \mathrm{Exp}(\lambda)$. In [34], the author indicates that the Pinsker–Csiszár inequality was employed to derive
$$\sup_{x}\big|\mathbb{P}(X > x) - \mathbb{P}(Z > x)\big| \le \sqrt{\tfrac{1}{2}\,\mathrm{KL}(X\,\|\,Z)}, \qquad (85)$$
where $\mathrm{KL}(X\,\|\,Z)$ is the Kullback–Leibler divergence between the laws of $X$ and $Z$. More precisely, on the left-hand side of Equation (85) one can write the total variation distance $d_{TV}(X, Z)$ between the distributions of $X$ and $Z$. Clearly, $d_K(X, Z) \le d_{TV}(X, Z)$. By evaluating the Kullback–Leibler divergence and performing an optimal choice of the parameter $\lambda$, the explicit bound in Equation (86) was demonstrated (formula (19) in [34]). The author of [34] on page 8 writes that in his previous work [57] the inequality in Equation (87) was established by the Stein method with the same choice of $\lambda$. He also notes that in most cases the estimate in Equation (86) involving the Kullback–Leibler divergence is more precise than the estimate in Equation (87) obtained by the Stein method. Moreover, on page 4 of [34] we read: “The problem with the Stein approach is that the bounds do not suggest a suitable way in which, for a given Pareto model, an appropriate approximating Exponential distribution can be specified”. However, we have demonstrated that the application of the inverse equilibrium transformation together with the Stein method permits specifying the corresponding approximating exponential distribution, with proximity closer than the right-hand sides of Equation (86) and Equation (87) can provide.
8. Conclusions
Our principal goal was to find sharp estimates of the proximity of random sums distributions to exponential and more general laws. This goal is achieved when we employ the probability metric $\zeta_2$. Thus, it would be valuable to find the best possible approximations of random sums distributions by means of specified laws using the metrics $\zeta_s$ of other orders $s > 0$. The results of [32] provide the basis for this approach.
There are various complementary refinements of the Rényi theorem. One approach is related to the employment of Brownian motion. It is interesting that in [58] (p. 1071) the authors proposed an explanation of the Rényi theorem involving the embedding theorem. We provide a slightly different complete proof. Let $X_1, X_2, \ldots$ be i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2$, whereas $S_n$, $n \in \mathbb{N}$, denote the corresponding partial sums. According to Theorem 12.6 of [59], which is due to A.V. Skorokhod and V. Strassen, there exists a standard Brownian motion $B = (B(t))_{t \ge 0}$ (perhaps defined on an extension of the initial probability space) such that
and
where $\xrightarrow{\;\mathbb{P}\;}$ stands for convergence in probability, and a.s. means almost surely. Thus, in light of Equation (
89), we can write, for
,
where
and
a.s. when
. Substitute
(see Equation (
2)) in Equation (
90) instead of
t. It is easily seen that
(i.e., for each
, one has
as
) and by means of characteristic functions one can verify that
as
, where
. Therefore,
,
. In the proof of Lemma 4, we showed (Equation (
24)) that
. Consequently,
Hence,
as
. Now, we demonstrate that
For any
and any
,
In light of Equation (
88), for arbitrary
and
, one can take
such that
. Then, for any
, we obtain
Since
, we can find
such that
if
. Therefore,
as
. The Slutsky lemma yields the desired relation
which implies Equation (
3). However, it seems that there is no clear intuitive reason why the law of the random sum converges to an exponential in the Rényi theorem. Moreover, in Ch. 3, Sec. 2 “The Rényi Limit Theorem” of [
20] (see Sec. 2.1 “Motivation”), one can find examples demonstrating that intuition behind the Rényi theorem is poor.
Actually, relation (90) leads to refinements of Equation (3). In [58], it is proved that if $X_1$ has finite exponential moments and other specified conditions are satisfied, then there exists a more sophisticated approximation for the distribution of the geometric sum, and its accuracy is estimated. The results are applied to the study of a single-server queue for both light-tailed and heavy-tailed service time distributions. Note that in [58], Section 5, the authors study the model where the distribution of the summands can depend on $p$. For future research, it would be desirable to establish analogues of our theorems for such a model.
The results concerning the accuracy of approximating a distribution under consideration by an exponential law are applicable to some queuing models. Let, for a single-server queue, the inter-arrival times follow a common distribution and let $S$ stand for the generic service time. Introduce the stationary waiting time $W$ and define $\rho$ to be the load of the system. Due to [60], in the heavy-traffic regime $\rho \uparrow 1$, the suitably normalized stationary waiting time converges in distribution to the exponential law. Theorem 3.1 of [45] contains an upper bound for the accuracy of this exponential approximation. This estimate is used by the authors for the analysis of queueing systems with a single server. It would be interesting to obtain sharp approximations in the framework of queueing systems.
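As an illustration of the heavy-traffic exponential approximation mentioned above (our simulation; the $M/M/1$ setting and all parameters are assumptions, chosen because the stationary law is known: $\mathbb{P}(W > x) = \rho\,e^{-\mu(1-\rho)x}$), the Lindley recursion shows that the normalized positive waiting times are nearly $\mathrm{Exp}(1)$ when the load $\rho$ is close to one.

```python
import numpy as np

rng = np.random.default_rng(5)

def waiting_times(rho, mu=1.0, n=500_000):
    """Lindley recursion W_{k+1} = max(W_k + S_k - A_k, 0) for an M/M/1 queue."""
    s = rng.exponential(1.0 / mu, n)              # service times, ES = 1/mu
    a = rng.exponential(1.0 / (rho * mu), n)      # inter-arrival times, rate rho * mu
    w = np.zeros(n)
    for k in range(n - 1):
        w[k + 1] = max(w[k] + s[k] - a[k], 0.0)
    return w[n // 10:]                            # discard burn-in

rho = 0.95
w = waiting_times(rho)
v = w[w > 0] * (1.0 - rho)   # for mu = 1: mu(1 - rho) W given W > 0 is near Exp(1)
print(v.mean(), np.quantile(v, 0.5), np.log(2))   # Exp(1) has mean 1 and median ln 2
```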
For the model of exchangeable random variables, Theorem 2 in Section 4 ensures the weak convergence of the distributions under consideration to a specified mixture of explicitly indicated laws. Theorem 3 proves the sharp convergence rate estimate to this limit law by means of the ideal probability metric of the second order. It would be worthwhile to establish such an estimate of the distributions’ proximity applying the Lévy–Prokhorov distance, because convergence in this metric is equivalent to the weak convergence of distributions of random variables. All the more so since, at present, there is no unified theory of probability metrics. In this regard, one can mention Proposition 1.2 of [17], stating that if a random variable $Z$ has a Lebesgue density bounded by $C$, then, for any random variable $Y$,
$$d_K(Y, Z) \le \sqrt{2C\,d_W(Y, Z)}.$$
However, this estimate only gives sub-optimal convergence rates. We also highlight the important total variation distance $d_{TV}$. The authors of [61] study a sum of locally dependent non-negative integer-valued random variables. Using perturbations of Stein’s operator, they establish upper bounds for the total variation distance between the law of this sum and the law of $M$, where the law of $M$ is a mixture of a Poisson distribution and either a binomial or a negative binomial distribution. It would be desirable to obtain sharp estimates and, moreover, to consider a more general model where the set of summation is random. In this connection, it seems helpful to employ the paper [62], where the authors proved results concerning the weak convergence of distributions of statistics constructed from samples of random size. In addition, it would be interesting to extend these results to stratified samples by invoking Lemma 1 of [63].
Special attention is paid to various generalizations of geometric sums. In Theorem 3.3 of [64], the authors consider random sums whose summation index is built from i.i.d. random variables following the geometric law; see Equation (2). Then, they show that the suitably normalized random sums converge in distribution to the gamma law with certain parameters. In [62], it is demonstrated that the Linnik and the Mittag–Leffler laws arise naturally in the framework of limit theorems for random sums. Hopefully, in the future, a complete picture of the limit laws involving the general theory of distribution mixtures will appear. In addition, it is desirable to study various models of random sums of dependent random variables. On this track, it could be useful to consider decompositions of exchangeable random sequences extending the fundamental de Finetti theorem; see, e.g., [65].
One can try to generalize the results of Section 7 to the accumulative laws proposed in [66]. These laws are akin to both the Pareto distribution and the lognormal distribution. In addition, we refer to [43], where the “variance-gamma distributions” were studied. These distributions form a four-parameter family and comprise as special and limiting cases the normal, gamma and Laplace distributions. Employment of these distributions permits enlarging the range of applications in modeling and fitting real data.
To complete the indication of further research directions, we note that the next essential and nontrivial step is to establish the limit theorem in functional spaces for processes generated by a sequence of random sums of random variables. For such stochastic processes, one can obtain the analogues of the classical invariance principles.