1. Introduction
Waiting time problems for runs and patterns in a random sequence of trials are considered important, as they are of theoretical interest and have practical applications in various areas of statistics and applied probability, such as reliability, sampling inspection, quality control, DNA/RNA sequence analysis, and hypothesis testing [1]. For comprehensive surveys and applications of related waiting time problems, refer to the books of Balakrishnan and Koutras [2] and Fu and Lou [3].
Let $\{X_n\}_{n \ge 1}$ be a sequence of random variables taking values in a finite set $A$. A finite sequence of elements of $A$ is called a pattern. We consider a finite collection $\Lambda = \{\Lambda_1, \Lambda_2, \ldots, \Lambda_K\}$ of patterns, possibly of different lengths. For $i = 1, 2, \ldots, K$, let $\tau_i$ be the waiting time until the first occurrence of pattern $\Lambda_i$ as a run in the series $X_1, X_2, \ldots$. Let $W$ be the waiting time until one of the $K$ patterns appears, i.e.,
$$W = \min\{\tau_1, \tau_2, \ldots, \tau_K\}.$$
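The quantities just defined are straightforward to estimate by simulation, which provides a useful check on the exact results derived later. A minimal Monte Carlo sketch (the alphabet, probabilities, and patterns below are illustrative assumptions, not taken from the paper):

```python
import random

def simulate_sooner(patterns, alphabet, probs, rng):
    """Run one i.i.d. trial sequence until some pattern occurs as a run;
    return (W, index i of the pattern with W = tau_i)."""
    lmax = max(len(p) for p in patterns)
    history, n = "", 0
    while True:
        n += 1
        # Only the last lmax symbols can matter for a pattern occurrence.
        history = (history + rng.choices(alphabet, probs)[0])[-lmax:]
        for i, p in enumerate(patterns, 1):
            if history.endswith(p):
                return n, i

rng = random.Random(1)
patterns = ["001", "111"]                    # assumed example patterns
runs = [simulate_sooner(patterns, ["0", "1"], [0.5, 0.5], rng)
        for _ in range(100_000)]
for i, p in enumerate(patterns, 1):
    print(f"P(W = tau_{i}) ~", sum(1 for _, j in runs if j == i) / len(runs))
print("E[W] ~", sum(w for w, _ in runs) / len(runs))
```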
Many researchers have studied waiting time problems for general and specific choices of $\Lambda$ in a random sequence of trials. When $\{X_n\}_{n \ge 1}$ is a sequence of independent and identically distributed (i.i.d.) Bernoulli trials, Fu and Koutras [4] developed a finite Markov chain embedding method, first employed by Fu [5], to study the exact distributions of the numbers of specified runs and patterns. Fu [6] extended the finite Markov chain embedding method to study the exact distributions of the numbers of runs and patterns in a sequence of i.i.d. multi-state trials; in addition, he obtained the waiting time distribution of a specified pattern.
In this paper, we are mainly interested in computing the waiting time distribution, as well as the stopping probabilities $P(W = \tau_i)$, $i = 1, \ldots, K$. Li [7], Gerber and Li [8], Guibas and Odlyzko [9], Blom and Thorburn [10], and Antzoulakos [11] considered the case when $\{X_n\}_{n \ge 1}$ is a sequence of i.i.d. multi-state trials. Li [7] and Gerber and Li [8] used the martingale approach to obtain the mean waiting time $E[W]$ and the stopping probabilities $P(W = \tau_i)$, $i = 1, \ldots, K$, for a finite collection $\Lambda$ of patterns. Guibas and Odlyzko [9] used the combinatorial method to obtain the probability generating function of the waiting time. Blom and Thorburn [10] also used the combinatorial method to obtain the mean waiting time $E[W]$ and the stopping probabilities $P(W = \tau_i)$, $i = 1, \ldots, K$, for a finite collection $\Lambda$ of patterns with the same length. Antzoulakos [11] used the finite Markov chain embedding method to study waiting time problems for a single pattern as well as a finite collection $\Lambda$ of patterns.
Han and Hirano [12], Fu and Chang [13], Glaz et al. [14], Pozdnyakov [15], Gava and Salotti [16], Zhao et al. [17], and Kerimov and Öner [18] considered the case when $\{X_n\}_{n \ge 1}$ is a discrete time homogeneous Markov chain with a finite state space, i.e., a sequence of Markov dependent multi-state trials. Han and Hirano [12] studied waiting time problems for two different patterns. Fu and Chang [13] studied waiting time problems for a finite collection of patterns by using the finite Markov chain embedding method. Glaz et al. [14] obtained the mean waiting time $E[W]$ and the probability generating function of the waiting time for a finite collection of patterns in a two-state Markov chain by using the method of gambling teams and the martingale approach. Pozdnyakov [15] investigated the same problems as in Glaz et al. [14] for multi-state Markovian trials. Gava and Salotti [16] obtained a system of linear equations for the stopping probabilities $P(W = \tau_i)$, $i = 1, \ldots, K$, by using the methods developed for gambling teams in [14,15]. Recently, Zhao et al. [17] found a method, based on that of [9], to calculate $E[W]$ and $P(W = \tau_i)$, $i = 1, \ldots, K$. Even more recently, Kerimov and Öner [18] found oscillation properties of the expected stopping times and stopping probabilities for patterns consisting of two consecutive states. For useful reviews of different approaches to waiting time problems of patterns for both i.i.d. and Markov dependent trials, refer to Fu and Lou [3].
Antzoulakos [11] and Fu and Chang [13] obtained the probability generating function of the waiting time for a finite collection of patterns in a sequence of i.i.d. and Markov dependent multi-state trials, respectively. They used a Markov chain with absorbing states corresponding to the patterns and considered the waiting time as the first entrance time into an absorbing state. The Markov chain has a transition probability matrix $P$ of the form
$$P = \begin{pmatrix} T & T^{0} \\ O & I \end{pmatrix}, \tag{1}$$
where $T$ is the submatrix of $P$ whose entries are the transition probabilities from a transient state to a transient state, $T^{0}$ is the submatrix of $P$ whose entries are the transition probabilities from a transient state to an absorbing state, $O$ is the zero matrix, and $I$ is the identity matrix. By using the general formula for the probability generating function of the first entrance time into an absorbing state, they obtained the probability generating function of the waiting time. Their results are expressed in terms of the submatrices $T$ and $T^{0}$, as well as variants of them. Chang [19] also studied waiting time problems for a finite collection of patterns: he investigated the distribution of the waiting time until the $r$th occurrence of any pattern in the collection, also using the expression (1) for the analysis.
In this paper, we consider a sequence of i.i.d. multi-state trials. We also use a Markov chain with a transition probability matrix of the form (1); however, we investigate the structure of the submatrices $T$ and $T^{0}$ in detail. This enables us to construct a finite GI/M/1-type Markov chain with a disaster and to regard the waiting time as the time until the occurrence of the disaster. Based on this construction and the matrix analytic method, we obtain the probability generating functions of the waiting time $W$ on the events $\{W = \tau_i\}$, $E[z^W 1_{\{W = \tau_i\}}]$, $i = 1, \ldots, K$. From these we can obtain not only the stopping probabilities $P(W = \tau_i)$, $i = 1, \ldots, K$, and the conditional/unconditional mean waiting times $E[W \mid W = \tau_i]$ and $E[W]$, but also the waiting time distribution itself, by a numerical inversion. The benefit of our method is that it remains efficient even when the lengths of the patterns are large. Our method can also be extended to Markov dependent multi-state trials.
The paper is organized as follows. In Section 2, we formulate our waiting time problems. In Section 3, we construct a GI/M/1-type Markov chain with a disaster. From this we obtain our results, which are given in Section 4. In Section 5, numerical examples are presented to illustrate our results. Conclusions are given in Section 6.
3. GI/M/1-Type Markov Chain with a Disaster
In this section, we construct a GI/M/1-type Markov chain with a disaster to obtain an expression for (2). Throughout, write pattern $\Lambda_i$ as $\Lambda_i = a_{i1} a_{i2} \cdots a_{i l_i}$, where $l_i$ denotes the length of $\Lambda_i$. We define the following three terms:

- A pattern $\Lambda' = b_1 b_2 \cdots b_m$ is a subpattern of a pattern $\Lambda = a_1 a_2 \cdots a_l$ if $b_1 b_2 \cdots b_m = a_{k+1} a_{k+2} \cdots a_{k+m}$ for some $k$ with $0 \le k \le l - m$; when $m = 0$, $\Lambda'$ means the null pattern (i.e., the pattern with length 0).
- A subpattern $\Lambda'$ of a pattern $\Lambda$ is proper if $\Lambda' \neq \Lambda$ (i.e., $m < l$).
- A subpattern $\Lambda'$ of a pattern $\Lambda$ is a leading subpattern of $\Lambda$ if $\Lambda' = a_1 a_2 \cdots a_j$ for some $j$ with $0 \le j \le l$.

Assume that for $i \neq i'$, $\Lambda_i$ is not a proper subpattern of $\Lambda_{i'}$.
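For concreteness, these three definitions translate directly into code; a minimal sketch with patterns encoded as Python strings (the helper names are ours, not the paper's):

```python
def is_subpattern(sub: str, pat: str) -> bool:
    # The null pattern "" is a subpattern of every pattern.
    return sub in pat

def is_proper_subpattern(sub: str, pat: str) -> bool:
    return sub in pat and sub != pat

def is_leading_subpattern(sub: str, pat: str) -> bool:
    return pat.startswith(sub)

# "01" is a proper leading subpattern of "011"; "11" is a proper subpattern
# of "011" but not a leading one.
assert is_proper_subpattern("01", "011") and is_leading_subpattern("01", "011")
assert is_proper_subpattern("11", "011") and not is_leading_subpattern("11", "011")
```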
We now introduce a two-dimensional process $\{(J_n, M_n)\}_{n \ge 0}$, where $J_n$ and $M_n$ are defined as follows:

(i) $J_0 = 0$, and for $n \ge 1$,
- $J_n$ is the largest $k$ such that $X_{n-k+1} X_{n-k+2} \cdots X_n$ is a proper leading subpattern of a pattern in $\Lambda$, if $n < W$.
- $J_n = \Delta$, where $\Delta$ is an extra point, if $n \ge W$.

(ii) $M_0 = 1$, and for $n \ge 1$,
- $M_n$ is the smallest $i$ such that $X_{n-J_n+1} \cdots X_n$ is a proper leading subpattern of pattern $\Lambda_i$, if $n < W$ and $J_n \ge 1$.
- $M_n = 1$, if $n < W$ and $J_n = 0$.
- $M_n = i$, if $n \ge W$ and $W = \tau_i$.
To clarify the definitions of $J_n$ and $M_n$, we provide the following example. Let $\{X_n\}_{n \ge 1}$ be a sequence of i.i.d. trials taking values in a finite set $A$, and suppose $\Lambda = \{\Lambda_1, \Lambda_2\}$ for two specific patterns $\Lambda_1$ and $\Lambda_2$. For one sample sequence of trials, the resulting values of $J_n$, $M_n$ are given in Table 1; for another sample sequence of trials, they are given in Table 2. Note that $\{(J_n, M_n)\}_{n \ge 0}$ is a discrete time Markov chain.
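The following sketch traces $(J_n, M_n)$ along a given sequence of trials directly from the definitions (the patterns and the trial sequence here are illustrative assumptions; in particular, they are not the ones used for Table 1 and Table 2):

```python
def trace_JM(patterns, trials):
    """Yield (J_n, M_n), n = 1, 2, ..., until a pattern occurs (J_n = Delta)."""
    lmax = max(len(p) for p in patterns)
    hist = ""
    for x in trials:
        hist += x
        # Disaster: some pattern occurs as a run ending at trial n.
        done = [i for i, p in enumerate(patterns, 1) if hist.endswith(p)]
        if done:
            yield ("Delta", min(done))
            return
        # J_n: largest k such that the last k trials form a proper leading
        # subpattern of some pattern; M_n: smallest index of such a pattern.
        for k in range(min(len(hist), lmax - 1), -1, -1):
            suffix = hist[len(hist) - k:]
            owners = [i for i, p in enumerate(patterns, 1)
                      if len(p) > k and p.startswith(suffix)]
            if owners:
                yield (k, min(owners))
                break

for n, jm in enumerate(trace_JM(["011", "1101"], "0100110"), 1):
    print(n, jm)
```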
Define $l^{*} = \max_{1 \le i \le K} l_i$, and for $0 \le k \le l^{*} - 1$, let $K_k$ be the number of patterns in $\Lambda$ whose lengths are larger than $k$, i.e.,
$$K_k = \#\{i \in \{1, \ldots, K\} : l_i > k\}.$$
We also assume, without loss of generality, that the patterns are arranged in decreasing order of length, i.e., $l_1 \ge l_2 \ge \cdots \ge l_K$. Note that $K_0 = K$. If $J_n = k$ with $0 \le k \le l^{*} - 1$, then $1 \le M_n \le K_k$. Furthermore, the set of all possible values of $M_n$ when $J_n = k$ is given by
$$\{1, 2, \ldots, K_k\}.$$
Therefore, the state space of the discrete time Markov chain $\{(J_n, M_n)\}_{n \ge 0}$ is
$$E = \{(k, i) : 0 \le k \le l^{*} - 1,\ 1 \le i \le K_k\} \cup \{(\Delta, i) : 1 \le i \le K\}. \tag{3}$$
For each state $(k, i)$, the first component $k$ is called the level. The one-step transition probability matrix $P$ of $\{(J_n, M_n)\}_{n \ge 0}$ is given, in lexicographic order with $\Delta$ being the last element in the set of levels, as follows:
$$P = \begin{pmatrix}
A_{0,0} & A_{0,1} & & & D_0 \\
A_{1,0} & A_{1,1} & A_{1,2} & & D_1 \\
\vdots & & \ddots & \ddots & \vdots \\
A_{l^{*}-1,0} & A_{l^{*}-1,1} & \cdots & A_{l^{*}-1,l^{*}-1} & D_{l^{*}-1} \\
O & O & \cdots & O & I
\end{pmatrix}, \tag{4}$$
where the submatrices are described below. A matrix consisting of components $q_{(k,i),(k',i')}$ with $1 \le i \le K_k$ and $1 \le i' \le K_{k'}$ will be called a $K_k \times K_{k'}$ matrix. Throughout, write $p_x = P(X_1 = x)$ for $x \in A$, and note that in state $(k, i)$ the last $k$ trials form the leading subpattern $a_{i1} a_{i2} \cdots a_{ik}$ of $\Lambda_i$.

For $0 \le k \le l^{*} - 1$, $A_{k,0}$ is the $K_k \times K_0$ matrix whose $(i, i')$-component is
$$(A_{k,0})_{i,i'} = \begin{cases} \sum_{x \in B_{k,i}} p_x, & i' = 1, \\ 0, & \text{otherwise}, \end{cases}$$
where $B_{k,i}$ is the subset of $A$ consisting of $x$ such that $a_{i,k-j+2} \cdots a_{ik}\, x$ is not a leading subpattern of a pattern in $\Lambda$ for any $j$ with $1 \le j \le k + 1$.

For $0 \le k \le l^{*} - 2$, $A_{k,k+1}$ is a $K_k \times K_{k+1}$ matrix. The $(i, i')$-component of $A_{k,k+1}$ is $p_{a_{i',k+1}}$ if $a_{i1} \cdots a_{ik}\, a_{i',k+1}$ is a proper leading subpattern of pattern $\Lambda_{i'}$ but not of $\Lambda_{i''}$ for any $i'' < i'$. Otherwise, $(A_{k,k+1})_{i,i'} = 0$.

For $1 \le k' \le k$, and $0 \le k \le l^{*} - 1$, $A_{k,k'}$ is a $K_k \times K_{k'}$ matrix. The $(i, i')$-component of $A_{k,k'}$ is $p_x$ if the following three conditions hold:
- (i) $a_{i,k-k'+2} \cdots a_{ik}\, x$ is a proper leading subpattern of pattern $\Lambda_{i'}$;
- (ii) $a_{i,k-k'+2} \cdots a_{ik}\, x$ is not a proper leading subpattern of pattern $\Lambda_{i''}$ for $i'' < i'$;
- (iii) $a_{i,k-j+2} \cdots a_{ik}\, x$ is not a leading subpattern of a pattern in $\Lambda$ for $k' + 1 \le j \le k + 1$.

Otherwise, $(A_{k,k'})_{i,i'} = 0$.

For $0 \le k \le l^{*} - 1$, $D_k$ is the $K_k \times K$ matrix of transition probabilities into the disaster level $\Delta$: its $(i, i')$-component is $p_x$ if $l_{i'} \le k + 1$ and $a_{i,k-l_{i'}+2} \cdots a_{ik}\, x = \Lambda_{i'}$, i.e., if appending $x$ completes pattern $\Lambda_{i'}$; otherwise it is $0$. (Two patterns cannot be completed simultaneously, since no pattern is a proper subpattern of another.)

$O$'s are zero matrices (possibly of different sizes).

$I$ is the $K \times K$ identity matrix.
To make it easier to understand how the matrix $P$ in (4) is constructed, we explain with an example. For the previously described example with the two patterns $\Lambda_1$ and $\Lambda_2$, the quantities $l^{*}$ and $K_0, K_1, \ldots, K_{l^{*}-1}$ are obtained directly from the pattern lengths, and $P$ takes the block form (4), where each submatrix $A_{k,k'}$ and $D_k$ is computed entrywise from the rules described above; writing out all of the entries gives $P$ explicitly.
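To complement the example, the following sketch (our own code, under the assumptions that the patterns are sorted in decreasing order of length and that, as noted above, ties at the disaster cannot occur) assembles the nonzero blocks of $P$ numerically:

```python
import numpy as np

# Assumed toy data: two patterns and a fair two-letter alphabet.
probs = {"0": 0.5, "1": 0.5}
patterns = sorted(["011", "1101"], key=len, reverse=True)   # l_1 >= l_2 >= ...
K = len(patterns)
lmax = max(map(len, patterns))
Kk = [sum(len(p) > k for p in patterns) for k in range(lmax)]  # K_0..K_{l*-1}

def next_state(s, x):
    """State reached from remembered string s (a leading subpattern) on symbol x."""
    t = s + x
    hit = [i for i, p in enumerate(patterns) if t.endswith(p)]
    if hit:
        return "Delta", min(hit)            # disaster (0-based pattern index)
    for k in range(min(len(t), lmax - 1), -1, -1):
        suf = t[len(t) - k:]
        own = [i for i, p in enumerate(patterns) if len(p) > k and p.startswith(suf)]
        if own:
            return k, min(own)

# A[(k, k2)]: K_k x K_{k2} block from level k to level k2; D[k]: block to Delta.
A = {(k, k2): np.zeros((Kk[k], Kk[k2]))
     for k in range(lmax) for k2 in range(min(k + 2, lmax))}
D = {k: np.zeros((Kk[k], K)) for k in range(lmax)}
for k in range(lmax):
    for i in range(Kk[k]):                  # state (k, i+1) remembers patterns[i][:k]
        for x, px in probs.items():
            k2, i2 = next_state(patterns[i][:k], x)
            (D[k] if k2 == "Delta" else A[(k, k2)])[i, i2] += px
```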
Let $\{(\tilde{J}_n, \tilde{M}_n)\}_{n \ge 0}$ be a two-dimensional discrete time Markov chain with the same state space $E$ as that given in (3) and the same transition probability matrix $P$ as that given in (4), but with an arbitrary initial state. Note that $\{(\tilde{J}_n, \tilde{M}_n)\}_{n \ge 0}$ is a finite GI/M/1-type Markov chain with a disaster: the disaster occurs when $\tilde{J}_n$ reaches $\Delta$.
4. Probability Generating Function of the Waiting Time
In this section, we derive an expression for (2). The analysis is based on the matrix analytic method; for more details, refer to Neuts [20,21] and Latouche and Ramaswami [22]. Let
$$T_k = \inf\{n \ge 0 : \tilde{J}_n = k\}, \qquad k \in \{0, 1, \ldots, l^{*}-1, \Delta\},$$
denote the first visit time of $\{(\tilde{J}_n, \tilde{M}_n)\}_{n \ge 0}$ to level $k$. For $0 \le n \le l^{*} - 2$, we define
$$g^{(n)}_{i,i'}(z) = E\big[z^{T_{n+1}}\, 1_{\{T_{n+1} < T_\Delta,\ \tilde{M}_{T_{n+1}} = i'\}} \,\big|\, (\tilde{J}_0, \tilde{M}_0) = (n, i)\big], \qquad 1 \le i \le K_n,\ 1 \le i' \le K_{n+1},$$
which means the probability generating functions for the time of the first visit to state $(n+1, i')$, starting from state $(n, i)$ at time 0, before the first visit to level $\Delta$. Let $G_n(z)$ be the matrix of the probability generating functions whose $(i, i')$-component is $g^{(n)}_{i,i'}(z)$. Conditioning on the first transition, we have
$$G_n(z) = z A_{n,n+1} + z \sum_{k=0}^{n} A_{n,k}\, G_{k,n+1}(z), \tag{5}$$
where the $(i, i')$-component of $G_{k,n+1}(z)$ is the probability generating function for the time of the first visit to state $(n+1, i')$, starting from state $(k, i)$, before the first visit to level $\Delta$. Since the chain is skip-free to the right, a first passage from level $k$ to level $n+1$ must pass through the levels $k+1, k+2, \ldots, n$ in turn, so that we have
$$G_{k,n+1}(z) = G_k(z)\, G_{k+1}(z) \cdots G_n(z). \tag{6}$$
Substituting (6) into (5), we obtain
$$G_n(z) = z A_{n,n+1} + z \sum_{k=0}^{n} A_{n,k}\, G_k(z)\, G_{k+1}(z) \cdots G_n(z). \tag{7}$$
Equation (7) can be interpreted as follows: starting from level $n$, the Markov chain may visit level $n+1$ (while avoiding level $\Delta$) in two ways: it may move up to level $n+1$ at the very next transition (contributing the factor $z A_{n,n+1}$), or it may move to level $k$ ($0 \le k \le n$) at the first transition, move up from level $k$ to level $k+1$, then from level $k+1$ to level $k+2$, and so on, until finally moving from level $n$ to level $n+1$ (contributing the factor $z A_{n,k}\, G_k(z)\, G_{k+1}(z) \cdots G_n(z)$). From (7), we obtain
$$G_n(z) = \Big(I - z \sum_{k=0}^{n} A_{n,k}\, G_k(z) \cdots G_{n-1}(z)\Big)^{-1} z A_{n,n+1}, \qquad 0 \le n \le l^{*} - 2, \tag{8}$$
where $I$ is the $K_n \times K_n$ identity matrix and an empty matrix product (the case $k = n$) is interpreted as $I$. In particular, $G_0(z), G_1(z), \ldots, G_{l^{*}-2}(z)$ can be computed recursively in increasing order of $n$.
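As a concrete illustration of the recursion (8), the following minimal sketch computes $G_0(z), \ldots, G_{l^{*}-2}(z)$ in increasing order of $n$, assuming the blocks $A_{k,k'}$ are available as NumPy arrays (e.g., from the construction sketch in Section 3); the names `A`, `Kk`, and `G_matrices` are our own, not the paper's:

```python
import numpy as np

def G_matrices(A, Kk, z):
    """Compute G_0(z), ..., G_{lmax-2}(z) via the recursion (8).

    A[(n, k)] is the K_n x K_k block of P from level n to level k;
    Kk[k] is K_k, the number of states in level k.
    """
    lmax = len(Kk)
    G = []
    for n in range(lmax - 1):
        S = np.zeros((Kk[n], Kk[n]))
        for k in range(n + 1):
            prod = np.eye(Kk[k])          # empty product = identity
            for m in range(k, n):
                prod = prod @ G[m]        # G_k(z) G_{k+1}(z) ... G_{n-1}(z)
            S += A[(n, k)] @ prod
        # Solve (I - z S) G_n(z) = z A_{n,n+1}.
        G.append(np.linalg.solve(np.eye(Kk[n]) - z * S, z * A[(n, n + 1)]))
    return G
```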
For $0 \le n \le l^{*} - 1$, we define
$$h^{(n)}_{i,i'}(z) = \begin{cases}
E\big[z^{T_\Delta}\, 1_{\{T_\Delta < T_{n+1},\ \tilde{M}_{T_\Delta} = i'\}} \,\big|\, (\tilde{J}_0, \tilde{M}_0) = (n, i)\big], & 0 \le n \le l^{*} - 2, \\[2pt]
E\big[z^{T_\Delta}\, 1_{\{\tilde{M}_{T_\Delta} = i'\}} \,\big|\, (\tilde{J}_0, \tilde{M}_0) = (n, i)\big], & n = l^{*} - 1,
\end{cases}$$
which means that $h^{(n)}_{i,i'}(z)$ ($0 \le n \le l^{*} - 2$) is the probability generating function for the time of the first visit to state $(\Delta, i')$, starting from state $(n, i)$, before the first visit to level $n+1$, and that $h^{(l^{*}-1)}_{i,i'}(z)$ is the probability generating function for the time of the first visit to state $(\Delta, i')$, starting from state $(l^{*}-1, i)$. Let $H_n(z)$ be the matrix of the probability generating functions whose $(i, i')$-component is $h^{(n)}_{i,i'}(z)$. Conditioning on the first transition, we have
$$H_n(z) = z D_n + z \sum_{k=0}^{n} A_{n,k}\, H_{k,n}(z), \tag{9}$$
where the $(i, i')$-component of $H_{k,n}(z)$ ($0 \le k \le n$) is
$$\big(H_{k,n}(z)\big)_{i,i'} = E\big[z^{T_\Delta}\, 1_{\{T_\Delta < T_{n+1},\ \tilde{M}_{T_\Delta} = i'\}} \,\big|\, (\tilde{J}_0, \tilde{M}_0) = (k, i)\big]. \tag{10}$$
Since a first passage from level $k$ to level $\Delta$ that avoids level $n+1$ either avoids level $k+1$ as well, or first visits level $k+1$ and then reaches level $\Delta$ while avoiding level $n+1$, we have
$$H_{k,n}(z) = \sum_{m=k}^{n} G_k(z)\, G_{k+1}(z) \cdots G_{m-1}(z)\, H_m(z). \tag{11}$$
Substituting (11) into (9), we obtain
$$H_n(z) = z D_n + z \sum_{k=0}^{n} A_{n,k} \sum_{m=k}^{n} G_k(z) \cdots G_{m-1}(z)\, H_m(z),$$
which can be written as
$$\Big(I - z \sum_{k=0}^{n} A_{n,k}\, G_k(z) \cdots G_{n-1}(z)\Big) H_n(z) = z D_n + z \sum_{k=0}^{n} A_{n,k} \sum_{m=k}^{n-1} G_k(z) \cdots G_{m-1}(z)\, H_m(z).$$
From this equation, we obtain
$$H_n(z) = \Big(I - z \sum_{k=0}^{n} A_{n,k}\, G_k(z) \cdots G_{n-1}(z)\Big)^{-1} \Big(z D_n + z \sum_{k=0}^{n} A_{n,k} \sum_{m=k}^{n-1} G_k(z) \cdots G_{m-1}(z)\, H_m(z)\Big), \tag{12}$$
so that $H_0(z), H_1(z), \ldots, H_{l^{*}-1}(z)$ can also be computed recursively in increasing order of $n$.

Recall that $(J_0, M_0) = (0, 1)$. Since $W$ is the time of the first visit of $\{(J_n, M_n)\}_{n \ge 0}$ to level $\Delta$, with $W = \tau_i$ if and only if the state visited is $(\Delta, i)$, and since this first visit trivially occurs before any visit to the nonexistent level $l^{*}$ (convention $T_{l^{*}} = \infty$), we have
$$E\big[z^W 1_{\{W = \tau_i\}}\big] = E\big[z^{T_\Delta}\, 1_{\{T_\Delta < T_{l^{*}},\ \tilde{M}_{T_\Delta} = i\}} \,\big|\, (\tilde{J}_0, \tilde{M}_0) = (0, 1)\big],$$
which means, by (10),
$$E\big[z^W 1_{\{W = \tau_i\}}\big] = \big(H_{0,\,l^{*}-1}(z)\big)_{1,i}.$$
Therefore, by (11),
$$E\big[z^W 1_{\{W = \tau_i\}}\big] = \Big(\sum_{m=0}^{l^{*}-1} G_0(z)\, G_1(z) \cdots G_{m-1}(z)\, H_m(z)\Big)_{1,i}, \qquad i = 1, \ldots, K.$$
In summary, we obtain the following theorem.
Theorem 1. The probability generating functions of the sooner waiting time $W$ on $\{W = \tau_i\}$, $E[z^W 1_{\{W = \tau_i\}}]$, $i = 1, \ldots, K$, are given by
$$E\big[z^W 1_{\{W = \tau_i\}}\big] = \Big(\sum_{m=0}^{l^{*}-1} G_0(z)\, G_1(z) \cdots G_{m-1}(z)\, H_m(z)\Big)_{1,i}, \qquad i = 1, \ldots, K, \tag{13}$$
where $G_n(z)$, $0 \le n \le l^{*} - 2$, and $H_n(z)$, $0 \le n \le l^{*} - 1$, are given by (8) and (12), respectively.

From Theorem 1, we can obtain the following results.
Corollary 1.
- (i) The stopping probabilities $P(W = \tau_i)$, $i = 1, \ldots, K$, are given by
$$P(W = \tau_i) = \Big(\sum_{m=0}^{l^{*}-1} G_0(1) \cdots G_{m-1}(1)\, H_m(1)\Big)_{1,i}.$$
- (ii) The conditional probability generating functions of $W$, given $\{W = \tau_i\}$, are given by
$$E\big[z^W \,\big|\, W = \tau_i\big] = \frac{E\big[z^W 1_{\{W = \tau_i\}}\big]}{P(W = \tau_i)}, \qquad i = 1, \ldots, K. \tag{14}$$
- (iii) The marginal probability generating function of $W$ is given by
$$E\big[z^W\big] = \sum_{i=1}^{K} E\big[z^W 1_{\{W = \tau_i\}}\big]. \tag{15}$$
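For numerical work, (12) and (13) translate directly into a short routine; the following continues the sketch given after (8) (the helper names `H_matrices` and `sooner_pgf`, and the block arrays `A` and `D`, are our own assumptions):

```python
import numpy as np

def H_matrices(A, D, G, Kk, z):
    """Compute H_0(z), ..., H_{lmax-1}(z) via the recursion (12)."""
    lmax = len(Kk)
    H = []
    for n in range(lmax):
        S = np.zeros((Kk[n], Kk[n]))
        R = z * D[n]                               # the z D_n term of (12)
        for k in range(n + 1):
            prod = np.eye(Kk[k])
            for m in range(k, n):
                R = R + z * (A[(n, k)] @ prod @ H[m])   # inner sum of (12)
                prod = prod @ G[m]
            S += A[(n, k)] @ prod
        H.append(np.linalg.solve(np.eye(Kk[n]) - z * S, R))
    return H

def sooner_pgf(G, H):
    """Row vector of (13): component i is E[z^W 1{W = tau_i}], initial state (0,1)."""
    total = np.zeros((1, H[0].shape[1]))
    prefix = np.eye(H[0].shape[0])                 # G_0(z)...G_{m-1}(z); empty at m=0
    for m, Hm in enumerate(H):
        total = total + (prefix @ Hm)[:1, :]
        if m < len(G):
            prefix = prefix @ G[m]
    return total

# Stopping probabilities of Corollary 1(i):
# G1 = G_matrices(A, Kk, 1.0); probs = sooner_pgf(G1, H_matrices(A, D, G1, Kk, 1.0))
```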
Remark. As mentioned in Section 2, the conditional probability mass functions $P(W = n \mid W = \tau_i)$, $i = 1, \ldots, K$, can be computed from (14) by a numerical inversion. In addition, the probability mass function $P(W = n)$ can be computed from (15) by a numerical inversion. For the numerical inversion of probability generating functions, refer to Abate and Whitt [23].
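For instance, a crude inversion along these lines can be sketched as follows (this is a simplified FFT variant rather than the exact algorithm of [23]; the helper `invert_pgf` and its parameters are our own):

```python
import numpy as np

def invert_pgf(phi, nmax, r=0.99):
    """Approximate P(W = n) for n = 0, ..., nmax - 1 from the PGF phi."""
    N = 2 * nmax                                   # points on the circle |z| = r
    zs = r * np.exp(2j * np.pi * np.arange(N) / N)
    vals = np.array([phi(z) for z in zs])
    # P(W = n) r^n is approximately the n-th Fourier coefficient of phi(r e^{iu});
    # the neglected aliasing terms are of order r^N.
    coef = np.fft.fft(vals).real / N
    return coef[:nmax] / r ** np.arange(nmax)

# Check against a known case: P(W = n) = 0.4 * 0.6**(n - 1), n >= 1, whose PGF
# is 0.4 z / (1 - 0.6 z).
print(np.round(invert_pgf(lambda z: 0.4 * z / (1 - 0.6 * z), 8), 6))
```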
By Theorem 1, we can also obtain the conditional/unconditional means of the sooner waiting time. To get these, we introduce
$$G_n = G_n(1), \quad G_n' = \frac{d}{dz} G_n(z)\Big|_{z=1}, \qquad H_n = H_n(1), \quad H_n' = \frac{d}{dz} H_n(z)\Big|_{z=1}.$$
Recall that $E[W 1_{\{W = \tau_i\}}] = \frac{d}{dz} E[z^W 1_{\{W = \tau_i\}}]\big|_{z=1}$, $i = 1, \ldots, K$. By differentiating (13) with respect to $z$ and evaluating at $z = 1$, we have
$$E\big[W 1_{\{W = \tau_i\}}\big] = \Big(\sum_{m=0}^{l^{*}-1}\Big(\sum_{j=0}^{m-1} G_0 \cdots G_{j-1}\, G_j'\, G_{j+1} \cdots G_{m-1}\, H_m + G_0 \cdots G_{m-1}\, H_m'\Big)\Big)_{1,i}. \tag{16}$$
Therefore, to obtain an expression for $E[W 1_{\{W = \tau_i\}}]$, $i = 1, \ldots, K$, we need to determine $G_n'$, $0 \le n \le l^{*} - 2$, and $H_n'$, $0 \le n \le l^{*} - 1$. For brevity, write
$$S_n = \sum_{k=0}^{n} A_{n,k}\, G_k \cdots G_{n-1}, \qquad S_n' = \sum_{k=0}^{n} A_{n,k} \sum_{m=k}^{n-1} G_k \cdots G_{m-1}\, G_m'\, G_{m+1} \cdots G_{n-1}.$$
Equation (8) may be written as
$$\Big(I - z \sum_{k=0}^{n} A_{n,k}\, G_k(z) \cdots G_{n-1}(z)\Big) G_n(z) = z A_{n,n+1},$$
from which, by differentiating with respect to $z$ and evaluating at $z = 1$,
$$(I - S_n)\, G_n' - (S_n + S_n')\, G_n = A_{n,n+1}.$$
Therefore, $G_n'$, $0 \le n \le l^{*} - 2$, are obtained as follows:
$$G_n' = (I - S_n)^{-1}\big(A_{n,n+1} + (S_n + S_n')\, G_n\big). \tag{17}$$
Similarly, we can obtain $H_n'$, $0 \le n \le l^{*} - 1$, by using Equation (12). Equation (12) may be written as
$$\Big(I - z \sum_{k=0}^{n} A_{n,k}\, G_k(z) \cdots G_{n-1}(z)\Big) H_n(z) = z D_n + z \sum_{k=0}^{n} A_{n,k} \sum_{m=k}^{n-1} G_k(z) \cdots G_{m-1}(z)\, H_m(z),$$
from which, by differentiating with respect to $z$ and evaluating at $z = 1$,
$$(I - S_n)\, H_n' - (S_n + S_n')\, H_n = R_n + R_n',$$
where
$$R_n = D_n + \sum_{k=0}^{n} A_{n,k} \sum_{m=k}^{n-1} G_k \cdots G_{m-1}\, H_m,$$
$$R_n' = \sum_{k=0}^{n} A_{n,k} \sum_{m=k}^{n-1}\Big(\sum_{j=k}^{m-1} G_k \cdots G_{j-1}\, G_j'\, G_{j+1} \cdots G_{m-1}\, H_m + G_k \cdots G_{m-1}\, H_m'\Big).$$
Therefore, $H_n'$, $0 \le n \le l^{*} - 1$, are obtained as follows:
$$H_n' = (I - S_n)^{-1}\big(R_n + R_n' + (S_n + S_n')\, H_n\big). \tag{18}$$
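Since the explicit expressions (17) and (18) are easy to get wrong in an implementation, it can be helpful to cross-check them against a numerical derivative of $G_n(z)$ and $H_n(z)$ at $z = 1$; a minimal central-difference sketch (our own helper, reusing the routines sketched earlier in this section):

```python
import numpy as np

def ddz_at_one(matrix_fn, h=1e-6):
    """Central-difference derivative at z = 1 of a function z -> list of matrices."""
    plus, minus = matrix_fn(1.0 + h), matrix_fn(1.0 - h)
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]

# e.g.  Gp = ddz_at_one(lambda z: G_matrices(A, Kk, z))
#       Hp = ddz_at_one(lambda z: H_matrices(A, D, G_matrices(A, Kk, z), Kk, z))
```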
Since $E[W 1_{\{W = \tau_i\}}] = E[W \mid W = \tau_i]\, P(W = \tau_i)$, $i = 1, \ldots, K$, we can obtain the conditional mean waiting times $E[W \mid W = \tau_i]$, $i = 1, \ldots, K$, from $E[W 1_{\{W = \tau_i\}}]$ and the stopping probabilities. We can also obtain the unconditional mean waiting time from $E[W] = \sum_{i=1}^{K} E[W 1_{\{W = \tau_i\}}]$. From these two formulas and (16), we obtain the following theorem.
Theorem 2. The conditional and unconditional means of the sooner waiting time $W$ are given by, respectively,
$$E[W \mid W = \tau_i] = \frac{E\big[W 1_{\{W = \tau_i\}}\big]}{P(W = \tau_i)}, \quad i = 1, \ldots, K, \qquad\text{and}\qquad E[W] = \sum_{i=1}^{K} E\big[W 1_{\{W = \tau_i\}}\big],$$
where $E[W 1_{\{W = \tau_i\}}]$ is given by (16), with $G_n'$, $0 \le n \le l^{*} - 2$, and $H_n'$, $0 \le n \le l^{*} - 1$, given by (17) and (18), respectively.

5. Numerical Examples
In this section, we present numerical results for the stopping probabilities, the probability mass function (along with the tail probabilities) of the sooner waiting time, and the conditional/unconditional means of the sooner waiting time. To illustrate our results, we provide two examples.
Example 1. Let $\{X_n\}_{n \ge 1}$ be a sequence of i.i.d. trials taking values in a finite set $A$, with a given common distribution $P(X_1 = x)$, $x \in A$. Suppose that $K = 10$, i.e., the collection $\Lambda$ consists of 10 patterns, $\Lambda_1, \ldots, \Lambda_{10}$. We select the collection of patterns as shown in Table 3, where the lengths of the patterns, $l_1, \ldots, l_{10}$, are chosen from the order statistics of i.i.d. random variables with mean 5. The set of patterns given in Table 3 is an example of a randomly selected pattern set in which no pattern is a subpattern of another; the procedure for randomly selecting such a pattern set is omitted here. By Theorem 1, we can compute the stopping probabilities $P(W = \tau_i)$, $i = 1, \ldots, 10$. Table 4 shows these stopping probabilities.
In Figure 1, we plot the joint probabilities $P(W = n, W = \tau_i)$, $i = 1, \ldots, 10$, with $n$ varying. These can be computed by the numerical inversion of the generating function in (13):
$$E\big[z^W 1_{\{W = \tau_i\}}\big] = \sum_{n=1}^{\infty} P(W = n, W = \tau_i)\, z^n.$$
In Table 5, we present the probability mass function of $W$, $P(W = n)$, and the tail probability of $W$, $P(W > n)$, with $n$ varying. Here, $P(W > n)$ can be computed by the numerical inversion of its generating function:
$$\sum_{n=0}^{\infty} P(W > n)\, z^n = \frac{1 - E[z^W]}{1 - z}.$$
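In terms of the sketch `invert_pgf` given after the Remark in Section 4, this amounts to the following (with `phi` an assumed callable evaluating the marginal PGF $E[z^W]$):

```python
# tails[n] approximates P(W > n); the factor (1 - phi(z)) / (1 - z) is the
# generating function of the tail probabilities displayed above.
tails = invert_pgf(lambda z: (1 - phi(z)) / (1 - z), 50)
```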
By Theorem 2, we can compute the conditional means of the sooner waiting time $W$, $E[W \mid W = \tau_i]$, $i = 1, \ldots, 10$, and the unconditional mean of $W$, $E[W]$. Table 6 shows the conditional and unconditional mean waiting times for Example 1.
The next example concerns Bernoulli trials.
Example 2. Let $\{X_n\}_{n \ge 1}$ be a sequence of i.i.d. Bernoulli trials, i.e., $X_n$ takes values in the two-element set $A = \{0, 1\}$, with a given success probability $P(X_1 = 1)$. Suppose that the collection $\Lambda$ consists of 5 given patterns, $\Lambda_1, \ldots, \Lambda_5$.

For Example 2, the joint probabilities $P(W = n, W = \tau_i)$, $i = 1, \ldots, 5$, are shown in Figure 2. Also, the stopping probabilities $P(W = \tau_i)$, $i = 1, \ldots, 5$, the probability mass function of $W$ (along with the tail probability), and the conditional/unconditional means of $W$ are shown in Table 7, Table 8 and Table 9, respectively.
6. Conclusions
We have derived the probability generating functions of the sooner waiting time for a finite collection of patterns in a sequence of i.i.d. multi-state trials. From these probability generating functions we have obtained the stopping probabilities and the mean waiting times, and they have also enabled us to compute the waiting time distribution by a numerical inversion. As mentioned in the introduction, our method can be extended to Markov dependent multi-state trials.
For further research, we will investigate the tail asymptotics of the sooner waiting time $W$. From Figure 1 and Figure 2, we can expect the distribution of $W$ to have a geometric tail. This is true under a certain aperiodicity condition, because $W$ is the first passage time to a subset of the state space in a discrete time Markov chain with a finite state space. Under some assumptions about periodicity, the distribution of $W$ exhibits the geometric tail behavior
$$P(W > n) \sim c\, \eta^n \quad \text{as } n \to \infty,$$
for some $c > 0$ and $\eta \in (0, 1)$. Here "$\sim$" means that the limit of the ratio of the two sides is 1. It would be of interest to find explicit expressions for $c$ and $\eta$. We also have the following geometric tail behavior:
$$P(W > n, W = \tau_i) \sim c_i\, \eta^n \quad \text{as } n \to \infty,$$
for some $c_i > 0$, $i = 1, \ldots, K$, and $\eta \in (0, 1)$. Here $\eta$ is independent of $i$ and is the same as that described above. It would also be of interest to find explicit expressions for $c_i$, $i = 1, \ldots, K$.