Article

The Binomial–Natural Discrete Lindley Distribution: Properties and Application to Count Data

1 Department of Statistics, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
2 Faculty of Graduate Studies for Statistical Research, Department of Mathematical Statistics, Cairo University, Giza 12613, Egypt
3 Department of Statistics, CHRIST (Deemed to be University), Hosur Road, Bangalore 560029, India
* Author to whom correspondence should be addressed.
Math. Comput. Appl. 2022, 27(4), 62; https://doi.org/10.3390/mca27040062
Submission received: 8 June 2022 / Revised: 14 July 2022 / Accepted: 15 July 2022 / Published: 20 July 2022
(This article belongs to the Special Issue Computational Mathematics and Applied Statistics)

Abstract

In this paper, a new discrete distribution called the binomial–natural discrete Lindley distribution is proposed by compounding the binomial and natural discrete Lindley distributions. Some properties of the distribution are discussed, including the moment-generating function, moments and hazard rate function. Estimation of the distribution’s parameter is studied by the methods of moments, proportions and maximum likelihood. A simulation study is performed to compare the performance of the different estimates in terms of bias and mean square error. An application to SO2 data is also presented to show that the new distribution is useful in modeling count data.

1. Introduction

Count data modeling is a challenging task in many areas, including, but not limited to, public health, medicine, epidemiology, applied science, sociology, and agriculture. In many situations, the life length of a device cannot be measured on a continuous scale, and the survival function is taken to be a function of a count random variable rather than of a continuous-time random variable. Discrete distributions are therefore appropriate for modeling lifetime data whenever the output is of a discrete nature. The traditional discrete distributions have limited applicability as models for reliability, failure times, aggregate loss, etc., especially for over-dispersed count data, in which the variance is greater than the mean. This has led to the development of discrete distributions based on popular continuous models from reliability analysis, actuarial sciences, survival analysis, etc. The discretization of continuous distributions has produced many discrete distributions in the statistical literature over the last few decades. However, the search for a model that suits a wide range of data remains open.
One of the many approaches to defining new models is the discretization of continuous distributions. Until recently, the majority of discrete lifetime distributions in the statistical literature were obtained by discretizing the survival function $S(x)$ of a continuous lifetime distribution (see, for example, references [1,2,3,4,5,6,7,8,9,10,11,12]).
In that approach, the probability mass function (pmf) $P(X = x)$ is defined as
$$P(X = x) = S(x) - S(x+1), \qquad x = 0, 1, 2, \ldots$$
Departing from this method, Afify et al. [12] introduced and studied a new discrete Lindley distribution by constructing a mixture of discrete analogs of the continuous components used in creating the continuous Lindley distribution.
In this paper, we propose and study a new pmf, denoted by $p_x$, obtained by compounding the binomial and the natural discrete Lindley (NDL) distributions. The basic principle of this method is as follows: if $N$ (input) and $X$ (output) are two random variables denoting the number of particles entering and leaving an attenuator, then their probability functions $p(n)$ and $f(x)$ are connected by the binomial decay transformation
$$P(X = x) = \sum_{n=x}^{\infty} \binom{n}{x} p^{x} (1-p)^{n-x}\, p(n), \qquad x = 0, 1, \ldots, \tag{1}$$
where $0 \le p \le 1$ is the attenuating coefficient discussed by Hu et al. [7]. They took $p(n)$ to be a Poisson distribution with parameter $\lambda > 0$ and showed that $P(X = x)$ is then the Poisson distribution with parameter $\lambda p$. For clarity, attenuators are electrical devices built to lower the power of a signal passing through them without severely compromising the signal’s integrity. They serve as a safeguard against systems being exposed to signals with power levels that are too high to be decoded. Déniz [13] introduced the uniform Poisson distribution using the idea of Hu et al. [7], replacing the binomial distribution in Equation (1) with the discrete uniform distribution and keeping $p(n)$ as the Poisson distribution. Other discrete distributions have also been proposed in the literature using the methodology of [7]: Akdoğan et al. [14] proposed the uniform-geometric distribution and Kuş et al. [15] constructed the binomial–discrete Lindley distribution.
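The Poisson result of Hu et al. [7] quoted above can be checked numerically. The following Python sketch applies the binomial decay transformation of Equation (1) to a Poisson input and compares the outcome with a Poisson pmf with parameter $\lambda p$. The function names, the truncation point `n_max` and the values of `lam` and `p` are illustrative assumptions, not part of the paper.

```python
import math

def poisson_pmf(k, lam):
    """Poisson pmf with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def thinned_pmf(x, p, lam, n_max=150):
    """Equation (1) with a Poisson(lam) input; the infinite sum is truncated at n_max."""
    return sum(
        math.comb(n, x) * p**x * (1 - p)**(n - x) * poisson_pmf(n, lam)
        for n in range(x, n_max + 1)
    )

lam, p = 3.0, 0.4  # illustrative values only
max_gap = max(
    abs(thinned_pmf(x, p, lam) - poisson_pmf(x, lam * p)) for x in range(10)
)
print(max_gap)  # largest absolute difference over x = 0, ..., 9
```

The printed gap should be at the level of floating-point rounding, since the truncated tail of the sum is negligible for these values.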
The rest of the paper is arranged as follows: Section 2 defines the natural discrete Lindley distribution and proposes the new binomial–natural discrete Lindley distribution with important properties, subsequently. In Section 3, various parameter estimation and simulation studies are given. Section 4 concerns the real data illustration of the findings. In Section 5, some conclusions are provided.

2. Natural Discrete Lindley Distribution

Recently, Al-Babtain et al. [16] proposed and studied a new natural discrete analog of the continuous Lindley distribution as a mixture of geometric and negative binomial distributions. The new distribution is called the natural discrete Lindley (NDL) distribution, and it has many properties that make it attractive relative to many other discrete distributions, particularly in analyzing over-dispersed count data. The NDL can be applied in collective risk models and is competitive with the Poisson distribution for fitting automobile-claim-frequency data. Let $N$ be a non-negative random variable obtained as a finite mixture of geometric$(p)$ and negative binomial$(2, p)$ distributions with mixing probabilities $\frac{p}{p+1}$ and $\frac{1}{p+1}$, respectively; then, the probability mass function of the NDL distribution is defined as
$$P(N = n) = \frac{p^{2}}{p+1}\,(2+n)\,(1-p)^{n}, \qquad n = 0, 1, 2, \ldots, \quad p \in (0,1). \tag{2}$$
One of the most important features of this distribution is that it has a single parameter and attractive properties, which make it suitable for applications not only in insurance settings but also in other fields where over-dispersion is observed. For more details about this distribution, see Al-Babtain et al. [16]. Given the usefulness of the NDL, the compound model built on it, the binomial NDL (BNDL), is a natural candidate to explore.
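The mixture representation of the NDL distribution can be verified directly. The sketch below, with illustrative function names and an arbitrary value of `p`, compares the closed-form pmf in (2) with the stated mixture of a geometric$(p)$ and a negative binomial$(2, p)$ pmf, both counted as numbers of failures (an assumption about the parameterization that is consistent with (2)).

```python
def geometric_pmf(n, p):
    """Geometric pmf on n = 0, 1, 2, ... (failures before the first success)."""
    return p * (1 - p)**n

def negbin2_pmf(n, p):
    """Negative binomial(2, p) pmf on n = 0, 1, 2, ... (failures before the second success)."""
    return (n + 1) * p**2 * (1 - p)**n

def ndl_pmf(n, p):
    """Closed-form NDL pmf, Equation (2)."""
    return p**2 / (p + 1) * (2 + n) * (1 - p)**n

def ndl_mixture(n, p):
    """Mixture with weights p/(p+1) and 1/(p+1)."""
    w = p / (p + 1)
    return w * geometric_pmf(n, p) + (1 - w) * negbin2_pmf(n, p)

p = 0.35  # illustrative value
gap = max(abs(ndl_pmf(n, p) - ndl_mixture(n, p)) for n in range(200))
total = sum(ndl_pmf(n, p) for n in range(500))
print(gap, total)
```

Both the agreement of the two expressions and the unit total mass follow from the algebra; the code only confirms them numerically.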

2.1. The Proposed Discrete Analog

The probability mass function (1) can be expressed as
$$P(X = x) = \sum_{n=x}^{\infty} P(X = x \mid N = n)\, P(N = n),$$
where $X \mid N = n$ has the binomial $b(n, p)$ distribution. Suppose that $N$ is a random variable from the NDL distribution with parameter $p$ given in (2); then, writing $k = n - x$, the probability mass function of the discrete random variable $X$ is obtained as
$$\begin{aligned}
p_x(x;p) = P(X = x) &= \sum_{n=x}^{\infty} \binom{n}{x} p^{x} (1-p)^{n-x}\, \frac{p^{2}}{p+1}\,(2+n)\,(1-p)^{n} \\
&= \sum_{k=0}^{\infty} \binom{x+k}{x} p^{x} (1-p)^{k}\, \frac{p^{2}}{p+1}\,(2+x+k)\,(1-p)^{x+k} \\
&= \frac{p^{2}}{p+1} \sum_{k=0}^{\infty} \binom{x+k}{x} p^{x}\,(2+x+k)\,(1-p)^{x+2k} \\
&= \frac{(1-p)^{x}\,(1+x+2p-p^{2})}{(p+1)\,(2-p)^{x+2}}, \qquad x = 0, 1, 2, \ldots, \quad p \in (0,1).
\end{aligned} \tag{3}$$
The last step uses $\sum_{k \ge 0} \binom{x+k}{x} r^{k} = (1-r)^{-(x+1)}$ and $\sum_{k \ge 0} k \binom{x+k}{x} r^{k} = (x+1)\, r\, (1-r)^{-(x+2)}$ with $r = (1-p)^{2}$, so that $1 - r = p(2-p)$.
If $X$ has the pmf (3), then it is called a binomial natural discrete Lindley (BNDL) random variable, denoted by $X \sim \mathrm{BNDL}(p)$. The case $n = 0$ means that no particles enter the attenuator, and it is termed a failure. The corresponding cumulative distribution function (cdf) of the BNDL distribution is given by
$$F(x;p) = P(X \le x) = \sum_{t=0}^{x} p_x(t) = \sum_{t=0}^{x} \frac{(1-p)^{t}\,(1+t+2p-p^{2})}{(p+1)\,(2-p)^{t+2}} = 1 - \frac{(1-p)^{x+1}\,(3+x+p-p^{2})}{(p+1)\,(2-p)^{x+2}}. \tag{4}$$
Figure 1 shows plots of the pmf of the proposed distribution for various values of the parameter $p$. The pmf is always a decreasing function, and the new discrete random variable tends to take smaller values as $p$ increases. In other words, the modeled process tends to terminate more quickly as the parameter value grows. The BNDL model can therefore serve as a discrete alternative to the traditional exponential distribution for characterizing such phenomena. Additionally, the flexibility of the proposed BNDL can be tested on varied sources of count data. For example, the model may be helpful for modeling aggregate losses, which are typically associated with actuarial data, or for sizing problems in which the overall garment fit is maximized for a given number of sizes and accommodation rate, which is crucial to assessing the quality of a sizing system. Furthermore, it may help with over-dispersed data in the social sciences, for instance in anthropology, where civilizations grew near consistent water sources necessary for human survival. Figure 2 complements the results of Figure 1.
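The closed forms (3) and (4) can be cross-checked against the compounding sum itself. The following Python sketch is a numerical check under stated assumptions: the function names, the truncation point `n_max` and the value of `p` are illustrative, and the pmf is coded with the ratio $(1-p)/(2-p)$, an algebraic rewrite of (3) chosen only to avoid overflow for large $x$.

```python
import math

def ndl_pmf(n, p):
    """NDL pmf, Equation (2)."""
    return p**2 / (p + 1) * (2 + n) * (1 - p)**n

def bndl_pmf(x, p):
    """BNDL pmf, Equation (3), written with r = (1 - p) / (2 - p)."""
    r = (1 - p) / (2 - p)
    return r**x * (1 + x + 2 * p - p**2) / ((p + 1) * (2 - p)**2)

def bndl_cdf(x, p):
    """BNDL cdf, Equation (4)."""
    r = (1 - p) / (2 - p)
    return 1 - r**(x + 1) * (3 + x + p - p**2) / ((p + 1) * (2 - p))

def compound_pmf(x, p, n_max=400):
    """Direct evaluation of the binomial-NDL compounding sum, truncated at n_max."""
    return sum(
        math.comb(n, x) * p**x * (1 - p)**(n - x) * ndl_pmf(n, p)
        for n in range(x, n_max + 1)
    )

p = 0.3  # illustrative value
pmf_gap = max(abs(compound_pmf(x, p) - bndl_pmf(x, p)) for x in range(8))
cdf_gap = abs(sum(bndl_pmf(t, p) for t in range(6)) - bndl_cdf(5, p))
print(pmf_gap, cdf_gap)
```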

2.2. Statistical Properties of the BNDL Distribution

Primarily in this section, we provide some explicit results based on the mathematical properties of the BNDL distribution.

2.2.1. Moment-Generating Function

If $X \sim \mathrm{BNDL}(p)$, then the moment-generating function of $X$ is given by
$$M_X(t) = E\left(e^{tX}\right) = \sum_{x=0}^{\infty} e^{tx}\, \frac{(1-p)^{x}\,(1+x+2p-p^{2})}{(p+1)\,(2-p)^{x+2}} = \frac{1 - p\,(e^{t}-2) + p^{2}\,(e^{t}-1)}{\left(2 - e^{t} + p\,e^{t} - p\right)^{2} (p+1)}.$$
For more on generating functions, see Yalcin and Simsek [17,18] and Simsek [19].

2.2.2. Probability-Generating Function

The probability-generating function of the random variable $X \sim \mathrm{BNDL}(p)$, that is, $E(t^{X})$, can be obtained from its moment-generating function:
$$G_X(t) = E\left(t^{X}\right) = M_X(\log t) = \frac{1 - p\,(t-2) + p^{2}\,(t-1)}{\left(2 - t + p\,(t-1)\right)^{2} (p+1)}.$$
Since
$$G_X^{(k)}(t) = \frac{d^{k} G_X(t)}{dt^{k}} = E\left\{X(X-1)(X-2)\cdots(X-k+1)\, t^{X-k}\right\},$$
at $t = 1$ we obtain
$$G_X^{(k)}(1) = \left.\frac{d^{k} G_X(t)}{dt^{k}}\right|_{t=1} = E\left\{X(X-1)(X-2)\cdots(X-k+1)\right\},$$
where $\mu_{(k)} = E\{X(X-1)(X-2)\cdots(X-k+1)\}$ is the $k$th factorial moment of $X$.
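As a quick numerical illustration of how the probability-generating function encodes the moments, the sketch below evaluates the closed form of $G_X(t)$, compares it with the defining series, and approximates $G_X'(1)$ by a central difference to recover the first factorial moment (the mean). The function names, the step size `h` and the parameter values are assumptions made for illustration.

```python
def bndl_pmf(x, p):
    """BNDL pmf, Equation (3), written with r = (1 - p) / (2 - p) to avoid overflow."""
    r = (1 - p) / (2 - p)
    return r**x * (1 + x + 2 * p - p**2) / ((p + 1) * (2 - p)**2)

def bndl_pgf(t, p):
    """Closed-form probability-generating function G_X(t)."""
    num = 1 - p * (t - 2) + p**2 * (t - 1)
    den = (2 - t + p * (t - 1))**2 * (p + 1)
    return num / den

p, t = 0.4, 0.7  # illustrative values
series = sum(t**x * bndl_pmf(x, p) for x in range(400))  # truncated E(t^X)

h = 1e-5  # step for the central difference
first_factorial_moment = (bndl_pgf(1 + h, p) - bndl_pgf(1 - h, p)) / (2 * h)
mean = (p + 2) * (1 - p) / (p + 1)  # closed-form E(X) from Section 2.2.3
print(series - bndl_pgf(t, p), first_factorial_moment - mean)
```

Higher factorial moments could be approximated in the same way with higher-order differences, although symbolic differentiation would be more accurate.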

2.2.3. Non-Central Moments and Variance

If $X \sim \mathrm{BNDL}(p)$, then the $k$th moment about zero of $X$ is given by
$$\mu'_k = E\left(X^{k}\right) = \sum_{x=0}^{\infty} x^{k}\, p_x(x) = \sum_{x=0}^{\infty} x^{k}\, \frac{(1-p)^{x}\,(1+x+2p-p^{2})}{(p+1)\,(2-p)^{x+2}}.$$
The first four raw moments are
$$\mu'_1 = E(X) = \frac{(p+2)(1-p)}{p+1},$$
$$\mu'_2 = E\left(X^{2}\right) = \frac{(1-p)\,(8 - 3p - 2p^{2})}{p+1},$$
$$\mu'_3 = E\left(X^{3}\right) = \frac{(1-p)\,(44 - 53p + 6p^{2} + 6p^{3})}{p+1},$$
and
$$\mu'_4 = E\left(X^{4}\right) = \frac{(1-p)\,(308 - 615p + 346p^{2} - 12p^{3} - 24p^{4})}{p+1}.$$
The variance of the random variable $X$ is
$$\mathrm{Var}(X) = E\left(X^{2}\right) - [E(X)]^{2} = \frac{(1-p)\,(4 + 5p - 2p^{2} - p^{3})}{(p+1)^{2}}.$$
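The closed-form moments can be compared with truncated series computed directly from the pmf. The sketch below does this for the first three raw moments and the variance; the function names and the truncation point `x_max` are illustrative assumptions, and the pmf is coded in the algebraically equivalent ratio form to avoid overflow.

```python
def bndl_pmf(x, p):
    """BNDL pmf, Equation (3), written with r = (1 - p) / (2 - p)."""
    r = (1 - p) / (2 - p)
    return r**x * (1 + x + 2 * p - p**2) / ((p + 1) * (2 - p)**2)

def raw_moment(k, p, x_max=1000):
    """k-th raw moment by direct summation, truncated at x_max."""
    return sum(x**k * bndl_pmf(x, p) for x in range(x_max + 1))

def mean_closed(p):
    return (p + 2) * (1 - p) / (p + 1)

def second_closed(p):
    return (1 - p) * (8 - 3 * p - 2 * p**2) / (p + 1)

def third_closed(p):
    return (1 - p) * (44 - 53 * p + 6 * p**2 + 6 * p**3) / (p + 1)

def variance_closed(p):
    return (1 - p) * (4 + 5 * p - 2 * p**2 - p**3) / (p + 1)**2

for p in (0.1, 0.5, 0.9):
    series_var = raw_moment(2, p) - raw_moment(1, p)**2
    print(p, mean_closed(p), variance_closed(p), series_var)
```

For $p = 0.1$ the printed mean and variance can be compared with the corresponding entries of Table 1.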

2.2.4. Central Moments

The $k$th moment about the mean of $X$ is
$$\mu_k = E\left[(X - \mu'_1)^{k}\right] = \sum_{x=0}^{\infty} (x - \mu'_1)^{k}\, p_x(x) = \sum_{x=0}^{\infty} (x - \mu'_1)^{k}\, \frac{(1-p)^{x}\,(1+x+2p-p^{2})}{(p+1)\,(2-p)^{x+2}}.$$
Therefore, the second, third and fourth central moments of the random variable $X$ are
$$\mu_2 = \frac{(1-p)\,(4 + 5p - 2p^{2} - p^{3})}{(p+1)^{2}},$$
$$\mu_3 = \frac{(1-p)\,(12 + 21p - 7p^{2} - 21p^{3} + 5p^{4} + 2p^{5})}{(p+1)^{3}},$$
and
$$\mu_4 = \frac{(1-p)\,(100 + 181p - 123p^{2} - 285p^{3} + 50p^{4} + 137p^{5} - 27p^{6} - 9p^{7})}{(p+1)^{4}}.$$

2.2.5. Skewness and Kurtosis

The coefficient of skewness and the coefficient of kurtosis of the BNDL distribution are, respectively,
$$\sqrt{\beta_1} = \frac{\mu_3}{\sqrt{\mu_2^{3}}} = \frac{(1-p)\,(12 + 21p - 7p^{2} - 21p^{3} + 5p^{4} + 2p^{5})}{\left(4 + p - 7p^{2} + p^{3} + p^{4}\right)^{3/2}},$$
$$\beta_2 = \frac{\mu_4}{\mu_2^{2}} = \frac{100 + 181p - 123p^{2} - 285p^{3} + 50p^{4} + 137p^{5} - 27p^{6} - 9p^{7}}{(1-p)\,(4 + 5p - 2p^{2} - p^{3})^{2}}.$$

2.2.6. Index of Dispersion

The index of dispersion (ID) indicates whether a certain distribution is suitable for under- or over-dispersed datasets. For example, ID = 1 for the Poisson distribution, where the variance is equal to the mean; ID > 1 for the geometric and negative binomial distributions; and ID < 1 for the binomial distribution.
Theorem 1.
If $X \sim \mathrm{BNDL}(p)$, then $\mathrm{Var}(X) > E(X)$ for all $p \in (0,1)$.
Proof.
We have
$$\mathrm{ID}(X) = \frac{\mathrm{Var}(X)}{E(X)} = \frac{4 + 5p - 2p^{2} - p^{3}}{p^{2} + 3p + 2}.$$
This function is monotonically decreasing in $p \in (0,1)$. It converges to 2 as $p \to 0$ and to 1 as $p \to 1$; therefore, $\mathrm{ID}(X) \in (1, 2)$, which means that $\mathrm{ID}(X) > 1$ and hence $\mathrm{Var}(X) > E(X)$. □
By Theorem 1, the BNDL distribution should only be used for count data that exhibit over-dispersion. Table 1 reports numerical values of the mean, variance, skewness, kurtosis and ID for selected values of $p$.
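Theorem 1 can be illustrated on a grid of parameter values. The short sketch below evaluates the closed form of the index of dispersion and checks that it stays between 1 and 2 and decreases in $p$; the grid and the function name are illustrative choices.

```python
def index_of_dispersion(p):
    """ID(X) = Var(X) / E(X) for the BNDL distribution."""
    return (4 + 5 * p - 2 * p**2 - p**3) / (p**2 + 3 * p + 2)

grid = [i / 100 for i in range(1, 100)]  # p = 0.01, ..., 0.99
values = [index_of_dispersion(p) for p in grid]

inside_bounds = all(1 < v < 2 for v in values)
decreasing = all(a > b for a, b in zip(values, values[1:]))
print(inside_bounds, decreasing, index_of_dispersion(0.1), index_of_dispersion(0.5))
```

The values at $p = 0.1$ and $p = 0.5$ can be compared with the ID row of Table 1.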

2.2.7. Log-Concavity

A necessary and sufficient condition for $p_x$ to be strongly unimodal is that it is log-concave, i.e., $p_{x+1}^{2} \ge p_x\, p_{x+2}$ for all $x$ (see Keilson and Gerber [20]).
Theorem 2.
The pmf of the BNDL distribution in (3) is log-concave.
Proof.
From (3), we obtain directly
$$p_{x+1}^{2} = \frac{(1-p)^{2x+2}\,(2+x+2p-p^{2})^{2}}{(p+1)^{2}\,(2-p)^{2x+6}},$$
and
$$p_x\, p_{x+2} = \frac{(1-p)^{2x+2}\,(1+x+2p-p^{2})\,(3+x+2p-p^{2})}{(p+1)^{2}\,(2-p)^{2x+6}}.$$
After some algebraic operations, we find that
$$p_{x+1}^{2} - p_x\, p_{x+2} = \frac{(1-p)^{2x+2}}{(p+1)^{2}\,(2-p)^{2x+6}} > 0,$$
for all $x$ and for all choices of $p \in (0,1)$.
Theorem 2 confirms that the BNDL distribution is strongly unimodal. □
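The identity used in the proof of Theorem 2 can be confirmed numerically. The sketch below computes $p_{x+1}^{2} - p_x p_{x+2}$ from the pmf and compares it with the closed-form expression; the function names and the grid of $x$ and $p$ values are illustrative assumptions.

```python
import math

def bndl_pmf(x, p):
    """BNDL pmf, Equation (3), written with r = (1 - p) / (2 - p)."""
    r = (1 - p) / (2 - p)
    return r**x * (1 + x + 2 * p - p**2) / ((p + 1) * (2 - p)**2)

def log_concavity_gap(x, p):
    """p_{x+1}^2 - p_x * p_{x+2}, computed from the pmf."""
    return bndl_pmf(x + 1, p)**2 - bndl_pmf(x, p) * bndl_pmf(x + 2, p)

def gap_closed_form(x, p):
    """Closed form of the gap from the proof of Theorem 2."""
    return (1 - p)**(2 * x + 2) / ((p + 1)**2 * (2 - p)**(2 * x + 6))

checks = [
    math.isclose(log_concavity_gap(x, p), gap_closed_form(x, p), rel_tol=1e-8)
    for p in (0.1, 0.5, 0.9)
    for x in range(15)
]
print(all(checks))
```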

2.3. Reliability Properties of the BNDL Distribution

2.3.1. Survival Function

If $X \sim \mathrm{BNDL}(p)$, then from (4), the survival function of $X$ is
$$S(x;p) = P(X > x) = 1 - F(x;p) = \frac{(1-p)^{x+1}\,(3+x+p-p^{2})}{(p+1)\,(2-p)^{x+2}}.$$

2.3.2. Hazard Rate and Mean Residual Life Functions

The hazard (failure) rate function describes the chance that an item fails at time $x$, given that it has survived up to that time. If $X \sim \mathrm{BNDL}(p)$, then its hazard rate (failure rate) function, taken here as the ratio of the pmf to the survival function $S(x;p) = P(X > x)$, is given by
$$r(x;p) = \frac{p_x(x;p)}{S(x;p)} = \frac{1 + x + 2p - p^{2}}{(1-p)\,(3 + x + p - p^{2})}.$$
The failure rate function is bounded above by $\frac{1}{1-p}$, i.e., $\lim_{x \to \infty} r(x;p) = \frac{1}{1-p}$. Graphical illustrations of the hazard rate function are presented in Figure 3, while descriptive measures are presented in Figure 4.
The mean residual life function of $X$, defined here through the survival function, is given by
$$m(x;p) = \frac{\sum_{t=x+1}^{\infty} S(t;p)}{S(x;p)} = \frac{(p-1)\,(p^{2} - x - 5)}{3 + p - p^{2} + x}.$$
Corollary 1.
If $X \sim \mathrm{BNDL}(p)$, then it has an increasing failure rate and a decreasing mean residual life.
As shown in Theorem 2, the BNDL distribution is log-concave; therefore, according to Gupta et al. [21], the BNDL distribution has the IFR property. According to Kemp [22], the following implications hold:
$$\mathrm{IFR} \Rightarrow \mathrm{IFRA} \Rightarrow \mathrm{NBU} \Rightarrow \mathrm{NBUE} \quad \text{and} \quad \mathrm{IFR} \Rightarrow \mathrm{DMRL}.$$
So, the BNDL distribution is
  • IFR (increasing failure rate).
  • IFRA (increasing failure rate average).
  • NBU (new better than used).
  • NBUE (new better than used in expectation).
  • DMRL (decreasing mean residual lifetime).
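The reliability functions of this section can be evaluated and checked against their closed forms. The following sketch assumes the conventions used above, namely $S(x;p) = P(X > x)$ and a hazard rate defined as $p_x/S$; the function names, the truncation point `t_max` and the value of `p` are illustrative.

```python
def bndl_pmf(x, p):
    """BNDL pmf, Equation (3), written with r = (1 - p) / (2 - p)."""
    r = (1 - p) / (2 - p)
    return r**x * (1 + x + 2 * p - p**2) / ((p + 1) * (2 - p)**2)

def survival(x, p):
    """S(x; p) = P(X > x)."""
    r = (1 - p) / (2 - p)
    return r**(x + 1) * (3 + x + p - p**2) / ((p + 1) * (2 - p))

def hazard(x, p):
    """Hazard rate as the ratio of pmf to survival function."""
    return bndl_pmf(x, p) / survival(x, p)

def hazard_closed(x, p):
    return (1 + x + 2 * p - p**2) / ((1 - p) * (3 + x + p - p**2))

def mrl(x, p, t_max=2000):
    """Mean residual life as a truncated sum of survival probabilities."""
    return sum(survival(t, p) for t in range(x + 1, t_max + 1)) / survival(x, p)

def mrl_closed(x, p):
    return (1 - p) * (5 + x - p**2) / (3 + p - p**2 + x)

p = 0.4  # illustrative value
hazards = [hazard(x, p) for x in range(10)]
mrls = [mrl(x, p) for x in range(10)]
print(hazards[:3], mrls[:3])
```

With these definitions the hazard values increase in $x$ toward $1/(1-p)$ and the mean residual life values decrease, in line with Corollary 1.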

2.4. Stochastic Orderings

Stochastic orders are important tools for comparing the behavior of random variables. Shaked and Shanthikumar [8] showed that many stochastic orders exist and have various applications. Given two random variables $X$ and $Y$, we say that $X$ is smaller than $Y$ in the
  • Usual stochastic order, denoted by $X \le_{st} Y$, if $F_X(x) \ge F_Y(x)$ for all $x$.
  • Hazard rate order, denoted by $X \le_{hr} Y$, if $h_X(x) \ge h_Y(x)$ for all $x$.
  • Reversed hazard rate order, denoted by $X \le_{rh} Y$, if $F_X(x)/F_Y(x)$ decreases in $x$.
  • Mean residual life order, denoted by $X \le_{mrl} Y$, if $m_X(x) \le m_Y(x)$ for all $x$.
  • Likelihood ratio order, denoted by $X \le_{lr} Y$, if $f_X(x)/f_Y(x)$ decreases in $x$.
For these orders, we have the following chains of implications:
$$X \le_{lr} Y \Rightarrow X \le_{hr} Y \Rightarrow X \le_{st} Y,$$
and
$$X \le_{lr} Y \Rightarrow X \le_{rh} Y \Rightarrow X \le_{st} Y;$$
also,
$$X \le_{hr} Y \Rightarrow X \le_{mrl} Y.$$
Theorem 3.
Let $X \sim \mathrm{BNDL}(p_1)$ and $Y \sim \mathrm{BNDL}(p_2)$; then, $X \le_{lr} Y$ for all $p_1 > p_2$.
Proof.
Let
$$L(x; p_1, p_2) = \frac{p_X(x; p_1)}{p_Y(x; p_2)}.$$
Now,
$$L(x; p_1, p_2) = \frac{(p_2+1)\,(2-p_2)^{x+2}\,(1-p_1)^{x}\,(1+x+2p_1-p_1^{2})}{(p_1+1)\,(2-p_1)^{x+2}\,(1-p_2)^{x}\,(1+x+2p_2-p_2^{2})},$$
and
$$L(x+1; p_1, p_2) = \frac{(p_2+1)\,(2-p_2)^{x+3}\,(1-p_1)^{x+1}\,(2+x+2p_1-p_1^{2})}{(p_1+1)\,(2-p_1)^{x+3}\,(1-p_2)^{x+1}\,(2+x+2p_2-p_2^{2})}.$$
Therefore,
$$\frac{L(x+1; p_1, p_2)}{L(x; p_1, p_2)} = \frac{(2-p_2)\,(1-p_1)\,(2+x+2p_1-p_1^{2})\,(1+x+2p_2-p_2^{2})}{(2-p_1)\,(1-p_2)\,(2+x+2p_2-p_2^{2})\,(1+x+2p_1-p_1^{2})}. \tag{5}$$
Let $p_1 = 1 - \delta$ and $p_2 = 1 - \delta - \varepsilon$, where $0 < \delta < 1$ and $0 < \varepsilon < 1 - \delta$.
After substituting these values of $p_1$ and $p_2$ in (5), and noting that $2p - p^{2} = 1 - (1-p)^{2}$, we obtain
$$\frac{L(x+1; p_1, p_2)}{L(x; p_1, p_2)} = \frac{\eta_1\,(\delta + \delta^{2} + \delta\varepsilon)}{\eta_2\,(\delta + \delta\varepsilon + \delta^{2} + \varepsilon)},$$
where
$$\eta_1 = \left(3 + x - \delta^{2}\right)\left(2 + x - (\delta+\varepsilon)^{2}\right),$$
and
$$\eta_2 = \left(3 + x - (\delta+\varepsilon)^{2}\right)\left(2 + x - \delta^{2}\right).$$
After some algebraic operations, we find that
$$\eta_1 - \eta_2 = -\varepsilon\,(2\delta + \varepsilon) < 0 \;\Rightarrow\; \eta_1 < \eta_2.$$
Therefore, since $\eta_1$ and $\eta_2$ are positive,
$$\eta_1\,(\delta + \delta^{2} + \delta\varepsilon) < \eta_2\,(\delta + \delta\varepsilon + \delta^{2} + \varepsilon).$$
This implies that
$$\frac{L(x+1; p_1, p_2)}{L(x; p_1, p_2)} < 1 \;\Rightarrow\; L(x+1; p_1, p_2) < L(x; p_1, p_2),$$
so that $L(x; p_1, p_2)$ is decreasing in $x$ and hence $X \le_{lr} Y$. □
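Theorem 3 states that the likelihood ratio $p_X(x;p_1)/p_Y(x;p_2)$ decreases in $x$ whenever $p_1 > p_2$. The sketch below checks this on a finite range of $x$ for a few parameter pairs; the pairs, the range and the function names are illustrative assumptions, and a finite check of this kind only illustrates the result, it does not prove it.

```python
def bndl_pmf(x, p):
    """BNDL pmf, Equation (3), written with r = (1 - p) / (2 - p)."""
    r = (1 - p) / (2 - p)
    return r**x * (1 + x + 2 * p - p**2) / ((p + 1) * (2 - p)**2)

def likelihood_ratio(x, p1, p2):
    """L(x; p1, p2) = p_X(x; p1) / p_Y(x; p2)."""
    return bndl_pmf(x, p1) / bndl_pmf(x, p2)

def ratio_is_decreasing(p1, p2, x_max=40):
    """True if L(x; p1, p2) is strictly decreasing for x = 0, ..., x_max."""
    ratios = [likelihood_ratio(x, p1, p2) for x in range(x_max + 1)]
    return all(a > b for a, b in zip(ratios, ratios[1:]))

pairs = [(0.7, 0.3), (0.5, 0.4), (0.9, 0.1)]  # each pair has p1 > p2
results = [ratio_is_decreasing(p1, p2) for p1, p2 in pairs]
print(results)
```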

2.5. Entropy

Entropy is a measure of the uncertainty of a random variable. The entropy of a discrete random variable $X$ with pmf $p(x)$ and alphabet $\mathcal{X}$ is given by
$$H(X) = -E\left(\log p(X)\right) = -\sum_{x \in \mathcal{X}} p(x) \log p(x).$$
Entropy can be interpreted as the average uncertainty in $X$ or the average number of bits needed to describe $X$. For more details on entropy and information theory, we refer the reader to Gray [23].
Now, if $X \sim \mathrm{BNDL}(p)$, then the entropy of the random variable $X$ can be calculated by the following formula:
$$H(X) = \frac{1}{(2-p)^{2}(1+p)} \left\{ (2-p)^{2} \left[ (-2 + p + p^{2}) \log(1-p) + (4 + p - p^{2}) \log(2-p) + (1+p) \log(1+p) \right] + \mathrm{LerchPhi}^{(0,1,0)}\!\left[ \frac{1-p}{2-p},\, -1,\, 1 + 2p - p^{2} \right] \right\},$$
where $\mathrm{LerchPhi}^{(0,1,0)}[z, s, a]$ denotes the derivative with respect to $s$ of the Lerch transcendent $\Phi(z, s, a) = \sum_{k=0}^{\infty} \frac{z^{k}}{(a+k)^{s}}$. Table 2 presents some numerical values of the entropy of $X \sim \mathrm{BNDL}(p)$ for different choices of $p$. From Table 2, one can observe that $H(X)$ is monotonically decreasing in $p \in (0,1)$, tending to about 1.88 as $p \to 0$ and to 0 as $p \to 1$.
Figure 5 plots $H(X)$ against the parameter $p$ and shows the same monotone decrease.
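The entropy can also be computed without the Lerch transcendent by summing $-p(x)\log p(x)$ directly. The sketch below does this with natural logarithms, which is an assumption that appears consistent with the magnitudes in Table 2; the function names and the truncation point `x_max` are illustrative.

```python
import math

def bndl_pmf(x, p):
    """BNDL pmf, Equation (3), written with r = (1 - p) / (2 - p)."""
    r = (1 - p) / (2 - p)
    return r**x * (1 + x + 2 * p - p**2) / ((p + 1) * (2 - p)**2)

def entropy(p, x_max=2000):
    """Entropy by direct summation (natural log), truncated at x_max."""
    h = 0.0
    for x in range(x_max + 1):
        q = bndl_pmf(x, p)
        if q > 0.0:  # skip terms that underflow to zero in floating point
            h -= q * math.log(q)
    return h

grid = [0.1, 0.3, 0.5, 0.7, 0.9]
values = [entropy(p) for p in grid]
print(list(zip(grid, values)))
```

The value at $p = 0.5$ can be compared with the corresponding entry of Table 2.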

3. Estimation and Simulation

In this section, we estimate the unknown parameter $p$ by the maximum likelihood, moment and proportion methods.

3.1. Method of Maximum Likelihood Estimation

Let $x_1, x_2, \ldots, x_n$ be the observed values from the BNDL distribution with parameter $p$. The likelihood and log-likelihood functions are given, respectively, as
$$L(p) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \frac{(1-p)^{x_i}\,(1 + x_i + 2p - p^{2})}{(p+1)\,(2-p)^{x_i+2}},$$
and
$$l(p) = \log(1-p) \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \log\left(1 + x_i + 2p - p^{2}\right) - n \log(p+1) - 2n \log(2-p) - \log(2-p) \sum_{i=1}^{n} x_i.$$
The maximum likelihood estimate (MLE) of the parameter $p$ can be obtained by solving the following equation numerically:
$$\frac{\partial l(p)}{\partial p} = \frac{3pn}{2 + p - p^{2}} - \frac{\sum_{i=1}^{n} x_i}{2 - 3p + p^{2}} + 2 \sum_{i=1}^{n} \frac{1-p}{1 + 2p - p^{2} + x_i} = 0.$$
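One simple numerical procedure for the likelihood equation is bisection on the score function. The sketch below is a minimal illustration under stated assumptions: it assumes the score changes sign on $(0, 1)$, which need not hold for every sample, and the sample shown is hypothetical rather than taken from the paper. The function names, bracket end points and iteration count are illustrative choices.

```python
from collections import Counter

def score(p, counts):
    """Derivative of the BNDL log-likelihood; counts maps each value to its frequency."""
    n = sum(counts.values())
    total = sum(x * f for x, f in counts.items())
    s = 3 * p * n / (2 + p - p**2) - total / ((1 - p) * (2 - p))
    s += 2 * sum(f * (1 - p) / (1 + 2 * p - p**2 + x) for x, f in counts.items())
    return s

def mle_bisection(sample, lo=1e-9, hi=1 - 1e-9, iters=100):
    """Root of the score by bisection; assumes a sign change on (lo, hi)."""
    counts = Counter(sample)
    if score(lo, counts) <= 0 or score(hi, counts) >= 0:
        raise ValueError("score does not change sign on the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid, counts) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid
    return 0.5 * (lo + hi)

sample = [0, 0, 1, 0, 2, 1, 0, 3, 0, 1]  # hypothetical counts, not from the paper
p_hat = mle_bisection(sample)
print(p_hat, score(p_hat, Counter(sample)))
```

Note that $(1-p)(2-p) = 2 - 3p + p^{2}$, so the second term of `score` matches the likelihood equation; the factored form is used only for numerical stability near $p = 1$.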

3.2. Method of Moments Estimation

Let $X_1, X_2, \ldots, X_n$ be a random sample from the BNDL distribution with parameter $p$. The moment estimate (ME) of the parameter $p$ can be obtained by solving the following equation:
$$\frac{(p+2)(1-p)}{p+1} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
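The moment equation is a quadratic in $p$. Writing $\bar{x}$ for the sample mean, it rearranges to $p^{2} + (1 + \bar{x})\,p + (\bar{x} - 2) = 0$, whose root in $(0,1)$ exists when $0 < \bar{x} < 2$. The sketch below implements this root; the rearrangement and the function names are an illustration added here, not a formula stated in the paper.

```python
import math

def bndl_mean(p):
    """E(X) for the BNDL distribution."""
    return (p + 2) * (1 - p) / (p + 1)

def moment_estimate(xbar):
    """Positive root of p^2 + (1 + xbar) p + (xbar - 2) = 0; requires 0 < xbar < 2."""
    if not 0 < xbar < 2:
        raise ValueError("the moment equation has no root in (0, 1) for this mean")
    return (-(1 + xbar) + math.sqrt(xbar**2 - 2 * xbar + 9)) / 2

for xbar in (0.2, 0.8, 1.5):  # illustrative sample means
    p_me = moment_estimate(xbar)
    print(xbar, p_me, bndl_mean(p_me))
```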

3.3. Method of Proportions Estimation

Let $X_1, X_2, \ldots, X_n$ be a random sample from the BNDL distribution with parameter $p$. For $i = 1, 2, \ldots, n$, we define the indicator functions
$$I(X_i) = \begin{cases} 1 & \text{if } X_i = 0, \\ 0 & \text{if } X_i > 0. \end{cases}$$
The proportion of 0s in the sample is then $\Pi = \frac{1}{n} \sum_{i=1}^{n} I(X_i)$. The proportion estimate (PE) of the parameter $p$ can be obtained by solving the following equation with respect to $p$:
$$\Pi = \frac{1 + 2p - p^{2}}{(p+1)\,(2-p)^{2}}.$$
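The proportion equation can be solved by bisection, because the right-hand side, which is $P(X = 0)$, increases from $1/4$ at $p = 0$ to $1$ at $p = 1$. The sketch below assumes an observed proportion of zeros strictly between $1/4$ and $1$; the function names and the iteration count are illustrative.

```python
def zero_probability(p):
    """P(X = 0) for the BNDL distribution."""
    return (1 + 2 * p - p**2) / ((p + 1) * (2 - p)**2)

def proportion_estimate(prop_zeros, iters=100):
    """Solve zero_probability(p) = prop_zeros by bisection on (0, 1)."""
    if not 0.25 < prop_zeros < 1:
        raise ValueError("proportion of zeros must lie in (0.25, 1)")
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if zero_probability(mid) < prop_zeros:
            lo = mid  # need a larger p to reach the target proportion
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_pe = proportion_estimate(zero_probability(0.6))  # round trip with p = 0.6
print(p_pe)
```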

3.4. Simulation Study

In this section, we assess the behavior of the maximum likelihood estimator for finite samples of size $n$ by means of a simulation study based on the BNDL distribution. The study proceeds in the following steps: first, generate $N = 1000$ samples of sizes $n = 25, 50, \ldots, 500$ from the BNDL distribution. Then, compute the maximum likelihood estimate of the model parameter for each sample. Lastly, compute the MSEs given by
$$\mathrm{MSE}(\hat{p}) = \frac{1}{1000} \sum_{i=1}^{1000} \left(\hat{p}_i - p\right)^{2}.$$
For various parameter values, the simulation results in Figure 6 indicate that the estimated MSEs decrease toward zero as the sample size $n$ increases. This behavior is in line with the classical large-sample properties of the MLE. In a parametric model, an estimator $\hat{p}$ based on $X_1, X_2, X_3, \ldots, X_n$ is said to be consistent if $\hat{p} \to p$ in probability as $n \to \infty$, and asymptotically normal if $\sqrt{n}\,(\hat{p} - p)$ converges in distribution to a normal distribution. The simulation results support the consistency of $\hat{p}$ and agree with the asymptotic normality expected of the MLE.
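The steps of the simulation study can be sketched in a few lines. The code below is a reduced illustration, not the authors' R implementation: it draws BNDL variates through the compounding mechanism ($N$ from the NDL mixture, then binomial thinning), estimates $p$ by bisection on the score, and compares the MSE at two sample sizes. The seed, the number of replications, the sample sizes, the bracket end points and all function names are assumptions made for the example.

```python
import math
import random
from collections import Counter

def sample_bndl(p, rng):
    """Draw X by compounding: N ~ NDL(p), then X | N ~ Binomial(N, p)."""
    def geometric():
        # Failures before the first success, generated by inversion.
        return int(math.log(1.0 - rng.random()) / math.log(1.0 - p))
    # NDL is a mixture of geometric (weight p/(p+1)) and negative binomial(2, p).
    n = geometric() if rng.random() < p / (p + 1) else geometric() + geometric()
    return sum(rng.random() < p for _ in range(n))

def score(p, counts, n, total):
    """Derivative of the BNDL log-likelihood."""
    s = 3 * p * n / (2 + p - p**2) - total / ((1 - p) * (2 - p))
    return s + 2 * sum(f * (1 - p) / (1 + 2 * p - p**2 + x) for x, f in counts.items())

def mle(sample, lo=1e-6, hi=1 - 1e-6, iters=50):
    """Bisection on the score; returns a bracket end point if there is no sign change."""
    counts, n, total = Counter(sample), len(sample), sum(sample)
    if score(lo, counts, n, total) <= 0:
        return lo
    if score(hi, counts, n, total) >= 0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid, counts, n, total) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mse_of_mle(p, n, reps, rng):
    """Monte Carlo estimate of the MSE of the MLE for sample size n."""
    errors = [(mle([sample_bndl(p, rng) for _ in range(n)]) - p)**2 for _ in range(reps)]
    return sum(errors) / reps

rng = random.Random(2022)  # fixed seed for reproducibility
p_true = 0.5
mse_small = mse_of_mle(p_true, 25, 200, rng)
mse_large = mse_of_mle(p_true, 400, 200, rng)
print(mse_small, mse_large)
```

With these settings the MSE at $n = 400$ should be markedly smaller than at $n = 25$, mirroring the pattern described for Figure 6.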

4. Applications to Count Data

In this section, we use a real-life data set, recently studied by Balakrishnan et al. [24] and consisting of 744 discrete observations, to examine the performance of the BNDL distribution in modeling real data. Santiago, Chile is recognized as one of the most environmentally contaminated cities in the world. In order to assess the level of air pollution and its associated adverse effects on humans in Santiago, the National Commission of Environment (CONAMA) of the government of Chile collects data on sulfur dioxide (SO2) concentrations in the air. The data corresponding to the hourly SO2 concentrations (in ppm) observed at a monitoring station located in Santiago city are:
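| x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 and above |
|---|---|---|---|---|---|---|---|---|---|---|
| f | 86 | 235 | 120 | 119 | 35 | 15 | 11 | 9 | 4 | 10 |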
The descriptive statistics of the data set are: mean = 2.93, median = 2, mode = 3, SD = 2.02, coefficient of variation = 0.69, skewness = 4.32, kurtosis = 34.57, range = 24, minimum = 1 and maximum = 25.
We compare the BNDL distribution with the binomial–discrete Lindley distribution (BDLD) of Kuş et al. [15] and with the negative binomial distribution. The pmf of the BDLD is given as
$$p_x(x;p) = \frac{p^{2x} \left[ \left\{ p^{3} - (1-p)(1 - p + x) \right\} \log(p) + (1-p)\left\{ 1 - p(1-p) \right\} \right]}{\left\{ 1 - \log(p) \right\} \left\{ 1 - p(1-p) \right\}^{x+2}}.$$
We considered the AIC (Akaike Information Criterion), CAIC (Consistent Akaike Information Criterion), BIC (Bayesian Information Criterion) and HQIC (Hannan–Quinn Information Criterion). The model with the minimum values of these statistics is chosen as the best model for the data. All results in Table 3 were obtained using R.
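The information criteria are simple functions of the maximized log-likelihood $\ell$, the number of parameters $k$ and the sample size $n$. The sketch below uses the standard forms of AIC, BIC and HQIC; for CAIC it uses $-2\ell + 2kn/(n-k-1)$, which is an inference from the values reported in Table 3 rather than a definition stated in the paper. The log-likelihood is backed out from the reported AIC of the BNDL fit, and the function name is illustrative.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, CAIC, BIC and HQIC from a maximized log-likelihood."""
    return {
        "AIC": 2 * k - 2 * loglik,
        # Assumed form, chosen because it matches the CAIC values in Table 3.
        "CAIC": -2 * loglik + 2 * k * n / (n - k - 1),
        "BIC": k * math.log(n) - 2 * loglik,
        "HQIC": -2 * loglik + 2 * k * math.log(math.log(n)),
    }

# Log-likelihood implied by the reported AIC of the BNDL fit (k = 1, n = 744).
loglik_bndl = -(2681.839 - 2 * 1) / 2
criteria = information_criteria(loglik_bndl, k=1, n=744)
print(criteria)
```

The printed CAIC, BIC and HQIC can be compared with the BNDL row of Table 3.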
Figure 7 gives the quantile–quantile (Q-Q) plot and the box plot, and Figure 8 gives the Total Time on Test (TTT) plot together with the EHRF for the given data set. The TTT plot shows that the data set has an increasing hazard rate shape, which is confirmed by the EHRF. Figure 9 and Figure 10 show the fitted model against the competing distributions. Together with the criteria in Table 3, these plots indicate that the BNDL model fits the data better than the BDLD and negative binomial models.

5. Concluding Remarks

A new one-parameter discrete distribution was proposed, and its important distributional, monotonic, and reliability characteristics were explored. Some statistical and reliability properties of the proposed discrete model were derived. Various estimation approaches were discussed. A simulation study was conducted to assess the accuracy and precision of the MLE. The applicability of the proposed distribution in modeling a real-life discrete data set was demonstrated. The comparison shows that, among the distributions tested, the new distribution provides the best fit to the data set, and it should be a useful contribution to the field of count data modeling.

Author Contributions

Conceptualization, S.S. and S.K.; methodology, W.M.; software, J.G.; validation, S.S. and S.K.; formal analysis, W.M.; investigation, S.S.; resources, F.J.; data curation, W.M.; writing—original draft preparation, S.S. and W.M.; writing—review and editing, S.K.; visualization, J.G.; supervision, S.K.; project administration, F.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Aryuyuen, S.; Bodhisuwan, W.; Volodin, A. Discrete Generalized Odd Lindley–Weibull Distribution with Applications. Lobachevskii J. Math. 2020, 41, 945–955.
  2. Chakraborty, S. A New Discrete Distribution Related to Generalized Gamma Distribution and Its Properties. Commun. Stat. Theory Methods 2015, 44, 1691–1705.
  3. Chakraborty, S.; Chakravarty, D. Discrete Gamma Distributions: Properties and Parameter Estimations. Commun. Stat. Theory Methods 2012, 41, 3301–3324.
  4. Chakraborty, S.; Dhrubajyoti, C. A Discrete Gumbel Distribution. arXiv 2014. Available online: https://arxiv.org/abs/1410.7568 (accessed on 8 June 2022).
  5. El-Morshedy, M.; Eliwa, M.S.; Nagy, H. A New Two-Parameter Exponentiated Discrete Lindley Distribution: Properties, Estimation and Applications. J. Appl. Stat. 2018, 47, 354–375.
  6. Gómez-Déniz, E.; Calderín-Ojeda, E. The Discrete Lindley Distribution: Properties and Applications. J. Stat. Comput. Simul. 2011, 81, 1405–1416.
  7. Hu, Y.; Peng, X.; Li, T.; Guo, H. On the Poisson Approximation to Photon Distribution for Faint Lasers. Phys. Lett. A 2007, 367, 173–176.
  8. Shaked, M.; Shanthikumar, J.G. Stochastic Orders; Springer: New York, NY, USA, 2007.
  9. Nekoukhou, V.; Alamatsaz, M.H.; Bidram, H. Discrete Generalized Exponential Distribution of a Second Type. Statistics 2013, 47, 876–887.
  10. Para, B.A.; Jan, T.R. Discrete Generalized Weibull Distribution: Properties and Applications in Medical Sciences. Pak. J. Stat. 2017, 33, 337–354.
  11. Roy, D. The Discrete Normal Distribution. Commun. Stat. Theory Methods 2003, 32, 1871–1883.
  12. Afify, A.Z.; Elmorshedy, M.; Eliwa, M.S. A New Skewed Discrete Model: Properties, Inference, and Applications. Pak. J. Stat. Oper. Res. 2021, 17, 799–816.
  13. Déniz, E.G. A New Discrete Distribution: Properties and Applications in Medical Care. J. Appl. Stat. 2013, 40, 2760–2770.
  14. Akdoğan, Y.; Kuş, C.; Asgharzadeh, A.; Kinaci, I.; Sharafi, F. Uniform-Geometric Distribution. J. Stat. Comput. Simul. 2016, 86, 1754–1770.
  15. Kuş, C.; Akdoğan, Y.; Asgharzadeh, A.; Kınacı, I.; Karakaya, K. Binomial-Discrete Lindley Distribution. Commun. Fac. Sci. Univ. Ank. Ser. A1 Math. Stat. 2019, 68, 401–411.
  16. Al-Babtain, A.A.; Ahmed, A.H.N.; Afify, A.Z. A New Discrete Analog of the Continuous Lindley Distribution, with Reliability Applications. Entropy 2020, 22, 603.
  17. Yalcin, F.; Simsek, Y. Formulas for characteristic function and moment generating functions of beta type distribution. Rev. Real Acad. Cienc. Exactas Físicas Y Naturales. Ser. A Matemáticas 2022, 116, 86.
  18. Yalcin, F.; Simsek, Y. A new class of symmetric beta type distributions constructed by means of symmetric Bernstein type basis functions. Symmetry 2020, 12, 779.
  19. Simsek, B. Formulas derived from moment generating functions and Bernstein polynomials. Appl. Anal. Discret. Math. 2019, 13, 839–848.
  20. Keilson, J.; Gerber, H. Some Results for Discrete Unimodality. J. Am. Stat. Assoc. 1971, 66, 386–389.
  21. Gupta, P.L.; Gupta, R.C.; Tripathi, R.C. On the monotonic properties of discrete failure rates. J. Stat. Plan. Inference 1997, 65, 255–268.
  22. Kemp, A.W. Classes of discrete lifetime distributions. Commun. Stat. Theory Methods 2004, 33, 3069–3093.
  23. Gray, R.M. Entropy and Information Theory; Springer: New York, NY, USA, 2011.
  24. Balakrishnan, N.; Leiva, V.; Sanhueza, A.; Cabrera, E. Mixture inverse Gaussian distributions and its transformations, moments and applications. Statistics 2009, 43, 91–104.
Figure 1. Pmf of BNDL distribution for some choices of p.
Figure 2. Histograms of the BNDL model for simulated data.
Figure 3. Plots of hazard rate of BNDL distribution for some choices of p.
Figure 4. Plots of the BNDL model for (a) Mean, (b) Variance, (c) Skewness, (d) Kurtosis and (e) ID.
Figure 5. Entropy H(X) of X versus p.
Figure 6. Plots of the estimated parameter and MSEs for various values of p.
Figure 7. (a) QQ plot and (b) box plot for the given data.
Figure 8. (a) TTT plot and (b) Expected Hazard Rate Function (EHRF) for the BDLD model for the dataset.
Figure 9. Fitted plots of BNDL and BDLD distribution for given data set.
Figure 10. Fitted plot of Negative Binomial distributions for given data set.
Table 1. Mean, variance, skewness, kurtosis and ID of the BNDL distribution for different values of the parameter p.

| p | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| Mean | 1.71818 | 1.4666 | 1.2384 | 1.0285 | 0.8333 | 0.6500 | 0.4764 | 0.3111 | 0.1526 |
| Variance | 3.3314 | 2.7288 | 2.1923 | 1.7191 | 1.3055 | 0.9475 | 0.6412 | 0.3832 | 0.1703 |
| Skewness | 1.5578 | 1.6186 | 1.6831 | 1.7542 | 1.8372 | 1.9427 | 2.0935 | 2.3522 | 2.9813 |
| Kurtosis | 7.7069 | 9.4991 | 11.8378 | 15.0902 | 19.9488 | 27.8656 | 42.3746 | 74.4447 | 180.1786 |
| ID | 1.9389 | 1.8606 | 1.770268 | 1.6714 | 1.5666 | 1.4576 | 1.3459 | 1.2317 | 1.1159 |
Table 2. Numerical results of H(X) for different values of the parameter p.

| p | H(X) | p | H(X) |
|---|---|---|---|
| 0.0001 | 1.87934 | 0.5 | 1.25943 |
| 0.01 | 1.86852 | 0.55 | 1.18391 |
| 0.03 | 1.84654 | 0.6 | 1.10402 |
| 0.05 | 1.82437 | 0.65 | 1.01888 |
| 0.07 | 1.80201 | 0.7 | 0.927315 |
| 0.09 | 1.77948 | 0.75 | 0.827736 |
| 0.11 | 1.75675 | 0.8 | 0.717861 |
| 0.14 | 1.72231 | 0.85 | 0.594157 |
| 0.17 | 1.6874 | 0.9 | 0.450497 |
| 0.2 | 1.652 | 0.95 | 0.273684 |
| 0.25 | 1.59181 | 0.96 | 0.231718 |
| 0.3 | 1.52994 | 0.97 | 0.186252 |
| 0.35 | 1.46611 | 0.98 | 0.135994 |
| 0.4 | 1.40002 | 0.99 | 0.078212 |
| 0.45 | 1.33128 | 0.999 | 0.0112562 |
Table 3. MLEs and their standard errors (in parentheses) with AIC, CAIC, BIC and HQIC values for the given data.

| Distribution | MLE (SE) | AIC | CAIC | BIC | HQIC |
|---|---|---|---|---|---|
| BNDL (p) | 0.6283 (0.0129) | 2681.839 | 2681.844 | 2686.451 | 2683.616 |
| BDLD (p) | 0.6922 (0.0055) | 3092.3700 | 3092.3760 | 3096.9820 | 3094.1480 |
| Negative Binomial (n, k) | 17.2957, 2.9262 (4.7378, 0.0678) | 2824.156 | 2849.44 | 2833.38 | 2818.69 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Shafiq, S.; Khan, S.; Marzouk, W.; Gillariose, J.; Jamal, F. The Binomial–Natural Discrete Lindley Distribution: Properties and Application to Count Data. Math. Comput. Appl. 2022, 27, 62. https://doi.org/10.3390/mca27040062
