1. Introduction
The origins of
q-calculus date back to the works of [
1] in connection to infinite series in his famous book
Introductio in Analysin Infinitorum. In the last few decades, there has been a considerable interest in the topic due to its numerous applications in different areas of mathematics and physics, such as number theory, special functions, orthogonal polynomials, operator theory, combinatorics, probability, quantum mechanics, relativity, string theory, etc. An extensive treatment on
q-calculus, its history and applications can be found in [
2].
In this paper, we explore a connection between q-calculus and the distribution theory of records from sequences of random variables geometrically distributed.
Ref. [
3] started the probabilistic study of record values and gave many of their basic distributional properties. Let us recall some definitions and results. Let
be a sequence of independent and identically distributed (iid) random variables with the common distribution function
F. The sequence of (upper)
ordinary record times,
, is defined as
,
with
. The
m-th
(ordinary) record of the sequence is the random variable
,
. For
, the
record indicators are the binary random variables
and
with probability one. In the case in which
F is continuous, the basic distribution theory of the statistics above defined is very well-known; see, for instance, chapter 2 of [
4]. In the continuous case, the distributions of the record indicators, the record times and the inter-record times do not depend on the parent distribution, and we have results like the following (for further references, see [
5]):
The record indicators are independent Bernoulli random variables with
,
. Then
The joint probability mass function (pmf) of the first
m (non-trivial) record times is
for integers
; also
with
being the Stirling numbers of the first kind defined by the equality
The pmf of the inter-record times,
,
satisfies (see [
6])
so then, for
,
The number of records in a sequence of length
n, denoted
, has pmf
and its expected value is
.
If the parent distribution, F, is discrete, the previous results are no longer true, and in general, the distribution of record times, inter-record times, record indicators and number of records depend on F. Thus, the distribution theory for records from discrete parents is more complicated than in the continuous case. In fact, there exist in the literature two definitions of records, ordinary and weak records, which are equivalent in the case of continuous parents, but not so in the discrete case. The weak record time sequence is defined as , and . Then, the weak record of the sequence (or from the parent distribution F) is the random variable , . Note that the difference with ordinary records is that ties with the previous records are also considered as records. Obviously, for continuous distributions, weak records coincide (almost surely) with ordinary records. In the case of discrete distributions, ordinary and weak records lead to different definitions.
We say that a random variable, X, follows a geometric distribution with parameter if its pmf is for and . We denote .
The number of records in geometrically distributed sequences has been studied in connection with a data structure called skiplist; see [
7,
8] or [
9]. Related studies can be found in [
10], which established the asymptotic normality of the number of records from geometric parents and also studied large deviations, local limit theorems and approximations. Some of these results have been extended to other discrete parents, so [
11] established a CLT for the number of records from discrete distributions; rates of growth for the number of records and laws of large numbers can be found in [
12,
13].
Most of the results above deal with limit properties, and not very much attention has been paid to the exact distribution of statistics related to the number of records. Our main contribution in this paper is the use of techniques of q-calculus for the study of the distribution theory of records from geometric parents.
In
Section 2, we present the Jackson integral or
q-integral and the extension of this concept to multiple
q-integration in a convex polytope; see [
14]. We also present in this section some useful lemmas for the calculation of multiple
q-integrals (proofs are deferred to the
Appendix A) and the non-central
q-Stirling numbers; see [
15]. No previous knowledge of
q-calculus is necessary to understand this section. In
Section 3, we study some statistics related to the sequence of weak records from a sequence of geometrically distributed random variables. Theorem 1 is a central result in this section. It gives a multiple
q-integral representation of the joint probability function of the (weak) record times. As a consequence of this result, we show that the distribution of record times and that of the number of records in a sample of size
n can be written in terms of the central and non-central
q-Stirling numbers. We also study the distribution of inter-(weak) record times and we prove its asymptotic log-normal behavior. In
Section 4, it becomes clear that the results obtained for a geometric parent, Ge
, are
q-analogues of the corresponding results in the case of an absolutely continuous distribution, so we can say that the absolutely continuous case is the limit case of geometric parents when
. Finally, in
Section 5, we consider the case of ordinary records. We show that there is the following duality: the formulas in the case of ordinary records can be obtained from the corresponding ones in the case of weak records by replacing
with
.
Throughout this paper, we will use the usual conventions with sums and products: for , and , and more generally, the summation (product) over an empty set of indexes is equal to zero (one). Additionally, .
2. The Jackson Integral and -Analogues
In the following, we present some basic concepts of
q-calculus. The interested reader can find more details in [
16], or, for a deeper treatment of the subject, see [
2]. Regardless, this section contains all the results of
q-calculus that we need in this paper.
The most basic concept in
q-calculus is perhaps that of the
q-number. Let
. The
q-numbers are defined as
for
. For a positive integer,
m, its
q-factorial is
and
. Throughout this paper,
. Thus, for instance, for
, we have
.
Given a function
f defined on the non-negative semi-axis of the real line,
and
, the Jackson integral or
q-integral of
f in the interval
is defined as
provided that the series in the right hand side (rhs) of (
5) is absolutely convergent.
The Jackson integral has some resemblances to the Riemann integral; for instance, for
,
Among the differences with respect to the ordinary Riemann integrals, we mention that the formula of the change of variables is no longer valid, so, for natural
,
The definition of a
q-integral can be extended to a multiple
q-integral in a convex polytope; see [
14]. Let us denote as
the set of permutations of
. Let
. For a function of
m variables,
and a convex polytope
, the
q-integral of
f over
C with respect to the order
of integration is defined as
where
,
is the set of real numbers depending on the values of
with
and
(implicitly, hereafter we assume the absolute convergence of the involved series).
It follows from the definition that when the polytope
C is a hyper-rectangle, i.e.,
, the ordering
is irrelevant; that is to say, we can change the order of integration, and so, for any
,
for
f, a function in
m variables, such that the series in the rhs of (
8) is absolutely convergent. Thus, for example,
for any
and
,
.
The next Lemmas are useful in connection with iterated and multiple Jackson integrals.
Lemma 1. Let f be a real function of m variables such that the series appearing below are absolutely convergent. Then, for any ,
- (a)
.
- (b)
In general, when the polytope of integration is not a hyper-rectangle, the order of integration is important. We illustrate this fact with the following example:
but,
Throughout this paper we will use the reverse ordering,
. For
, we define the polytopes
and from the definition of a multiple
q-integral given in (
7),
Lemma 2. Let f be a real function of m variables such that the series appearing below are absolutely convergent. Then,
- (a)
- (b)
Comparing the results in Lemmas 1 and 2, we obtain
and using these results in combination with (
6), it can be confirmed that for
and
, real numbers such that
, for
,
and in particular, for
,
and for
and
,
Given a (formal) power series
, we denote
,
. For non-negative integers,
n and
r, the non-central
q-Stirling numbers of the first kind (see [
15]) are defined as
and by convention
if
or
or
. The case
corresponds to the
q-Stirling numbers of the first kind; see [
2], Ch. 5. We denote
. The non-central
q-Stirling numbers satisfy the recurrence
Some useful relationships for our purposes are the following:
By equating coefficients of
in both sides of
we obtain
Lemma 3. For the integers , and ,
- (a)
.
- (b)
, where .
Lemma 4. For and f, a non-negative function defined in the non-negative real numbers,
- (a)
- (b)
Many variants of Lemma 4 (all of them with similar proofs) are possible; for instance, for a non-negative function
,
,
Note that results like (
25) or those in Lemma 4 justify the formal interchange of the summation and integration symbols.
Suppose that
is an expression involving the parameter
q. Then, it is said that
is a
q-analogue of
A if
when
.
q-analogues appear in different areas of mathematics and physics, such as, for instance, in quantum groups, string theory, fractals, dynamical systems, elliptic curves, etc. It is clear that
q-numbers are
q-analogues of the corresponding real numbers. Expression (
6) is a
q-analogue of the well-known formula for Riemann integrals
,
. Similarly, (
9), (
14)–(17) are
q-analogues of the corresponding expressions obtained by substituting
q-integrals for Riemann integrals and
q-numbers for real numbers. Then, the following lemma is immediate.
Lemma 5. If is a polynomial with real coefficients in the variables , then the multiple q-integralis a q-analogue of the multiple Riemann integral Finally, we show a connection between multiple
q-integrals and expected values involving geometric samples. Let
,
be independent random variables with geometric distributions of the parameter
(i.e.,
). Let
. Let
be a function such that
and thus, from (
8) we get
3. Weak Records from Geometric Parent
Let be a sequence of iid random variables with a common distribution Ge with and . The index n in the process can be regarded as discrete time. A trajectory of the process is a sequence of non-negative integers. A trajectory up to time n is a finite sequence of length n of non-negative integers. We will denote a finite trajectory of length by a sequence where , are non-negative integers. Consecutive equal integers in a trajectory are denoted with exponents; for instance, is the sequence of length t. A sequence of length 1 formed with a non-negative integer less than k is denoted by . Similar notations are used in the cases ≤, >, ≥, etc. As an example of these notations, consider the trajectory . Whenever it is convenient, this trajectory could be denoted as: or , or ,…etc.
Theorem 1. For ,
- (a)
For , the joint pmf of from admits the following q-integral representation: - (b)
The joint probability generating function of isfor , .
Proof. (a) For the integers
,
, the event
corresponds to finite trajectories of length
of the process
, which are of the form:
with
. The probability associated with the trajectory given in (
28) is
. Then, using Lemma 2(a),
(b) Firstly, observe that for the integers
, and
,
For
and
,
,
. Using Theorem 1(a) and (
29), we have
where the interchange of the summation and integral symbols can be justified with a similar argument to the one in the proof of Lemma 4. □
Theorem 2. For and ,
- (a)
- (b)
Proof. - (a)
From Theorem 1(b), the probability-generating function of
is
If
, then
. Therefore, if
,
then we have that
, and the following expansion holds:
On the other hand, it is not difficult to check that
for
,
and
.
Thus, for
, using (
30), (
31), Lemma 3 and (
32),
- (b)
For
, using Theorem 2(a),
□
Some other statistics related to weak record indicators can be expressed in terms of q-integrals and q-numbers as shown below.
Theorem 3. For ,
- (a)
.
- (b)
.
- (c)
.
Proof. (a) For
, firstly observe that
The event
consists of all the trajectories of the process
with a weak record at position
n, i.e., trajectories of the form
; then, using the definition of the Jackson integral (see (
5) and (
33))
(b) The trajectories corresponding to the event
are of the form
, with
. Then, using Lemma 2 and (
16),
(c) The trajectories corresponding to the event
are of the form
, with
. Then
(see the proof of
(a) for the last step). □
The weak record indicators from geometric parents, as opposed to the continuous case, are not independentl for instance,
The number of weak records in a sample of size n is (note that the 0-th record, , is counted).
Theorem 4. For ,
- (a)
.
- (b)
.
- (c)
Proof. - (a)
Observe that if , . Then, the result is immediate using Theorem 2(b).
- (b)
For
, using (a) and the properties (22) and (23)
For
, note that if in a sample of size
n, there is only one weak record. Then, it is in the first position, so, using Theorem 3(c),
where in the last equality we have used the property
.
- (c)
In the proof of Theorem 3(a), we obtained the integral representation
Clearly, this formula is also valid for
. Therefore,
□
The inter-(weak)-record times are defined as , .
Theorem 5. For ,
- (a)
The joint pmf of the inter-record times from a Ge(p) parent isfor , . - (b)
The pmf of the m-th inter-weak record is - (c)
Proof. (a) It is straightforward from Theorem 1(a).
(b) For
(integer), from (
34), using Lemma 4(a),
where the last equality follows from (
18).
(c) Using (
36) and (
12), for
,
□
The next result proves the asymptotic log-normality of the inter-(weak) record times from geometric parents.
Theorem 6. The inter-weak record times from a parent, , satisfy Proof. Let
,
, be independent random variables. The CLT yields,
For
, let us define
. From Theorem 5(c) and (
26),
for
.
Let us define the functions . These functions satisfy the following properties:
- (a)
- (b)
For a fixed value of , the function is non-decreasing.
Then, for
,
where
.
Let
be the cumulative distribution function of a standard normal random variable. If we show that
then we can conclude
and this proves (
37).
In order to prove (
38), for
, we have
where
and this quantity can be made arbitrarily small choosing an
yet small enough. □
A by-product of Theorem 6 is the following weak law of large numbers:
When
,
, and this is the limit in probability for
as
, where
denotes the waiting-time between the
-th and the
m-th upper record in a sequence of iid random variables with a continuous distribution; see [
6]. Then, (
39) can be seen as a
q-analogue of Theorem 1 in [
6]. Similarly, our Theorem 6 is a
q-analogue of Theorem 2 in [
6].