Next Article in Journal
On Soft Generalized ω-Closed Sets and Soft T1/2 Spaces in Soft Topological Spaces
Next Article in Special Issue
Dynamic Behaviors of an Obligate Commensal Symbiosis Model with Crowley–Martin Functional Responses
Previous Article in Journal
Continuous-Stage Runge–Kutta Approximation to Differential Problems
Previous Article in Special Issue
Research on the n-Stage Delay Distribution Method Based on a Compensation Mechanism in a Random Environment
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

On Discrete Poisson–Mirra Distribution: Regression, INAR(1) Process and Applications

by
Radhakumari Maya
1,
Muhammed Rasheed Irshad
2,
Christophe Chesneau
3,*,
Soman Latha Nitin
4 and
Damodaran Santhamani Shibu
4
1
Department of Statistics, Government College for Women, Thiruvananthapuram 695 014, Kerala, India
2
Department of Statistics, Cochin University of Science and Technology, Cochin 682 022, Kerala, India
3
Department of Mathematics, Université de Caen Basse-Normandie, LMNO, UFR de Sciences, F-14032 Caen, France
4
Department of Statistics, University College, Thiruvananthapuram 695 034, Kerala, India
*
Author to whom correspondence should be addressed.
Axioms 2022, 11(5), 193; https://doi.org/10.3390/axioms11050193
Submission received: 28 March 2022 / Revised: 17 April 2022 / Accepted: 20 April 2022 / Published: 21 April 2022
(This article belongs to the Special Issue Advances in Applied Mathematical Modelling)

Abstract

:
Several pieces of research have spotlighted the importance of count data modelling and its applications in real-world phenomena. In light of this, a novel two-parameter compound-Poisson distribution is developed in this paper. Its mathematical functionalities are investigated. The two unknown parameters are estimated using both maximum likelihood and Bayesian approaches. We also offer a parametric regression model for the count datasets based on the proposed distribution. Furthermore, the first-order integer-valued autoregressive process, or INAR(1) process, is also used to demonstrate the utility of the suggested distribution in time series analysis. The unknown parameters of the proposed INAR(1) model are estimated using the conditional maximum likelihood, conditional least squares, and Yule–Walker techniques. Simulation studies for the suggested distribution and the INAR(1) model based on this innovative distribution are also undertaken as an assessment of the long-term performance of the estimators. Finally, we utilized three real datasets to depict the new model’s real-world applicability.

1. Introduction

Many studies have underlined the importance of modelling count data as well as the time series of counts, which has sparked considerable interest in a variety of sectors, including medical science, earth science, physics, finance, and insurance. For modelling count datasets, the Poisson distribution is the most frequently used, but it has the drawback of not being able to model over-dispersed datasets, while the negative binomial distribution is used for modelling over-dispersed data. When investigating counting data, the existence of at least one overdispersion warrants extra consideration when selecting a count model (for details, see [1,2]). As a consequence, the conventional Poisson regression model is rarely used in these situations.
Furthermore, one of the most widely utilized methodologies for analyzing time-dependent data are time series analysis. The corresponding time series is known as a time series of counts or, equivalently, an integer-valued time series when the data are generated by a random counting procedure. The demand for modelling count time series is seen in diverse real-life situations—for example, the monthly number of earthquakes in a specific place, the monthly number of automobile sales, the monthly number of deaths due to a specific disease, the monthly number of traffic accidents, and the number of living cells (see [3]). Refs. [4,5,6] proposed the class of first-order non-negative integer-valued autoregressive, shortly INAR(1), processes with Poisson innovation based on the binomial thinning operator to model time series of counts. INAR models are frequently built using the binomial thinning operator.
Moreover, the INAR(1) model can be altered by changing the innovation distributions according to various real-life situations. According to [7], the INAR(1) model with respect to negative binomial innovations (NB-INAR(1)) is effective for generating overdispersion. Since the common occurrence in overdispersion is that the incidence of zero counts is higher than expected from the Poisson distribution, Ref. [8] proposed an INAR(1) process with geometric distribution as the innovation model. Ref. [9] proposed the PL-INAR(1) model, which combines the INAR(1) model with Poisson–Lindley innovations. Then, Ref. [10] introduced the INAR(1) process with Poisson–Bilal innovations. These are a few of the most significant studies on the overdispersed INAR(1) process. Even while these methods give fine solutions for over-dispersed time series count datasets, they still have significant drawbacks that can pose computing challenges in real-world applications. Thus, discovering more INAR(1) models will provide more possibilities for superiorly fitting the real datasets by choosing those models according to the situations.
In this paper, we construct a novel discrete two-parameter distribution by mixing the Poisson and Mirra distributions, which can be used to model overdispersed count data sets. Hereafter, we call this new distribution the Poisson–Mirra distribution (PMiD). The two-parameter Mirra distribution by [11] is considered to be the generalization of the Xgamma distribution with one parameter, which is proposed by [12]. In addition, Ref. [13] considers the Poisson–Xgamma distribution to be an alternative or rival to the well-known one-parameter Poisson–Lindley distribution proposed by [14]. Interestingly, we discovered that the proposed PMiD is the generalization of the Poisson–Xgamma distribution. As a result, more research into the PMiD is unavoidable, both theoretically and in terms of applied aspects, which is the primary motivation for this work. The suggested distribution has the advantages of having a simple probability mass function (pmf) and cumulative distribution function (cdf), as well as explicit probability and moment generating functions, and can monitor over-dispersed count datasets, which are common in real-world data modelling. By using the PMiD as an innovation process, we also emphasize the significance of the PMiD in the INAR(1) process.
The following is how the rest of the article is sorted. The detailed description of the Mirra distribution is covered in Section 2. The definition of the new distribution, its factorial moments, moments about the origin, skewness, kurtosis, mode, generating functions, entropy measures, and other details are presented in Section 3. In Section 4, the maximum likelihood (ML) estimation and the Bayesian estimation technique are defined to estimate the unknown parameters of the new distribution, and the performance of the PMiD parameters for the ML estimation is also studied using simulation technique in the same section. A regression model with respect to the new distribution is discussed in Section 5. Section 6 develops the INAR(1) model construction with PMiD innovations; the PMiD-INAR(1) process is created. The methods for the estimation of parameters, which include the conditional maximum likelihood, conditional least squares, and the Yule–Walker estimation procedures for the PMiD-INAR(1) process are discussed in Section 7, and the simulation studies based on these three estimation procedures are also conducted in the same section. The applications and the empirical studies based on the new model concerning three real datasets are conducted in Section 8. A detailed discussion of the article is presented in Section 9. Then, Section 10 finishes with the decisive concluding words.

2. The Mirra Distribution

An inaugural approach to the Mirra distribution (MiD) was implemented in the literature by [11]. Let T be a continuous random variable which follows the MiD with parameters α and θ . Then, the probability density function (pdf), and the survival function (sf) of the random variable T are respectively given by
f ( t ) = θ 3 θ 2 + α 1 + α t 2 2 e θ t
and
S ( t ) = θ 2 θ 2 + α 1 + α θ 2 + α t θ + α t 2 2 e θ t ,
where t > 0 and the parameters α , θ > 0 . This distribution is denoted by M i ( α , θ ) . The plots in Figure 1 portray the graphical representation of the pdf.
Now, the hazard function (hf) of the MiD is given as
h ( t ) = θ 1 + α t 2 2 1 + α θ 2 + α t θ + α t 2 2 .
As indicated by [11], the hf of the MiD is shaped as a bathtub, decreasing for t < 2 α , and increasing for t > 2 α .
As a special case, for α = θ = λ , the Xgamma distribution (XGD) is obtained with the pdf given by
f ( t ) = λ 2 λ + 1 1 + λ t 2 2 e t λ ,
where t > 0 and the parameter λ > 0 . For more information on the XGD, see [12].

3. The Poisson–Mirra Distribution

3.1. Presentation

Through the following proposition, a novel mixed-Poisson distribution is introduced by compounding the Poisson and Mirra distributions.
Proposition 1.
Assume that X follows the new compound Poisson–Mirra distribution (PMiD), which has the stochastic representation as follows:
X | λ P o i s s o n ( λ ) λ | α , θ M i ( α , θ ) ,
where λ , α and θ > 0 . Then, the pmf of X is given by
p ( x ; α , θ ) = Pr ( X = x ) = θ 3 ( θ 2 + α ) ( 1 + θ ) x + 1 1 + α ( x + 1 ) ( x + 2 ) 2 ( 1 + θ ) 2 , x = 0 , 1 , 2 ,
This distribution is denoted as P M i D ( α , θ ) , and one can note X P M i D ( α , θ ) to inform that X follows the PMiD with parameters α and θ.
Proof. 
The pmf of X can be derived using the general compounding formula as follows:
p ( x ; α , θ ) = 0 Pr ( X = x | λ ) f ( λ | α , θ ) d λ = 0 e λ λ x x ! θ 3 θ 2 + α 1 + α λ 2 2 e θ λ d λ = θ 3 x ! ( θ 2 + α ) 0 e λ ( 1 + θ ) λ x d λ + α 2 0 e λ ( 1 + θ ) λ x + 2 d λ = θ 3 x ! ( θ 2 + α ) Γ ( x + 1 ) ( 1 + θ ) x + 1 + α 2 Γ ( x + 3 ) ( 1 + θ ) x + 3 = θ 3 ( θ 2 + α ) ( 1 + θ ) x + 1 1 + α ( x + 1 ) ( x + 2 ) 2 ( 1 + θ ) 2 .
We have employed the gamma function defined by Γ ( x ) = 0 t x 1 e t d t , with the relation Γ ( m ) = ( m 1 ) ! for any positive integer m. The proof is completed. □
Now, for α = θ = λ , the pmf of the PMiD reduces to
p ( x ; λ ) = λ 2 2 ( 1 + λ ) x + 4 2 ( 1 + λ ) 2 + λ ( x + 1 ) ( x + 2 ) , x = 0 , 1 , 2 , .
The expression in Equation (2) is the pmf of Poisson–Xgamma distribution (PXGD), which was introduced by [13]. Thus, the PXGD is a special case of the PMiD.
Now, the possible pmf plots for various values of the parameters of the PMiD are portrayed in Figure 2.
The pmf appears to be declining, growing, and unimodal, with some fluctuation in the mode and tails.

3.2. Mode

The next result clarifies the mode analysis of the PMiD.
Proposition 2.
Let X be a random variable following a PMiD. Then, if α 8 θ 2 ( 1 + θ ) 2 ( θ + 2 ) 2 , the mode of X, denoted by x m , exists, and lies in either of these two cases:
max a 1 ( α , θ ) , b 2 ( α , θ ) x m b 1 ( α , θ ) o r a 1 ( α , θ ) x m min b 1 ( α , θ ) , a 2 ( α , θ ) ,
where
a 1 ( α , θ ) = α ( 2 θ ) ϵ 2 α θ , b 1 ( α , θ ) = α ( 2 θ ) + ϵ 2 α θ , a 2 ( α , θ ) = α ( 2 3 θ ) ϵ 2 α θ , b 2 ( α , θ ) = α ( 2 3 θ ) + ϵ 2 α θ ,
with ϵ = α 2 ( θ + 2 ) 2 8 α θ 2 ( 1 + θ ) 2 . The condition α 8 θ 2 ( 1 + θ ) 2 ( θ + 2 ) 2 is to ensure that ϵ 0 , such that ϵ has sense.
Proof. 
We have to find the integer x = x m , for which p ( x ; α , θ ) takes its maximum value. That is, we aim to solve p ( x ; α , θ ) p ( x 1 ; α , θ ) and p ( x ; α , θ ) p ( x + 1 ; α , θ ) , which is equivalent to solving:
α θ x 2 + α ( θ 2 ) x + 2 [ θ ( 1 + θ ) 2 α ] 0 ,
and
α θ x 2 + α ( 3 θ 2 ) x + 2 [ α ( θ 2 ) + θ ( 1 + θ ) 2 ] 0 ,
respectively. On solving the quadratic inequality in Equation (4), we obtain a 1 ( α , θ ) x m b 1 ( α , θ ) , and on solving the quadratic inequality in Equation (5), we obtain b 2 ( α , θ ) x m or x m a 2 ( α , θ ) , where a 1 ( α , θ ) , b 1 ( α , θ ) , a 2 ( α , θ ) , and b 2 ( α , θ ) are given in Equation (3). Combining these three inequalities, we obtain the mode x m such that
max a 1 ( α , θ ) , b 2 ( α , θ ) x m b 1 ( α , θ ) o r a 1 ( α , θ ) x m min b 1 ( α , θ ) , a 2 ( α , θ ) .
This completes the proof. □

3.3. Cdf and Hf

The corresponding cdf of the PMiD is given by
F ( x ; α , θ ) = Pr ( X x ) = ( θ + 1 ) x 3 2 ( α + θ 2 ) { α [ 2 ( θ + 1 ) x + 3 θ ( x + 3 ) θ ( x + 2 ) + 2 2 ] + 2 θ 2 ( θ + 1 ) 2 [ ( θ + 1 ) x + 1 1 ] } ,
and the hf of the PMiD is given by
H ( x ; α , θ ) = p ( x ; α , θ ) 1 F ( x ; α , θ ) ,
where p ( x ; α , θ ) and F ( x ; α , θ ) are respectively given in Equations (1) and (6). Furthermore, plots in Figure 3 refer to the shapes of the hf of the PMiD.
The hf is found to have all of the typical shapes, such as increasing, decreasing, bathtub, and upside-down bathtub shapes.

3.4. Moments

Some aspects of the PMiD are now being studied using various moment measures.
Proposition 3.
The r t h factorial moment of X P M i D ( α , θ ) is given by
μ r = E ( X ( X 1 ) ( X r + 1 ) ) = θ 3 ( θ 2 + α ) r ! θ r + 1 1 + α ( r + 1 ) ( r + 2 ) 2 θ 2 .
Proof. 
Based on the compound-Poisson theory (see [14]), the r t h factorial moment of X can be obtained as follows:
μ r = 0 λ r θ 3 θ 2 + α 1 + α λ 2 2 e θ λ d λ = θ 3 θ 2 + α 0 e θ λ λ x d λ + α 2 0 e θ λ λ x + 2 d λ = θ 3 ( θ 2 + α ) r ! θ r + 1 1 + α ( r + 1 ) ( r + 2 ) 2 θ 2 .
Thus, the proof is complete. □
The first four factorial moments of the PMiD can be obtained by substituting r = 1 , 2 , 3 , and 4 in Equation (7). That is,
μ 1 = θ θ 2 + α 1 + 3 α θ 2 , μ 2 = 2 θ 2 + α 1 + 6 α θ 2 ,
μ 3 = 6 θ ( θ 2 + α ) 1 + 10 α θ 2 , and μ 4 = 24 θ 2 ( θ 2 + α ) 1 + 15 α θ 2 .
Now, the first four moments about the origin of the PMiD are obtained by utilizing the general relationship between factorial moments and moments about the origin. We get
Mean = μ 1 = E ( X ) = μ 1 = θ θ 2 + α 1 + 3 α θ 2 ,
μ 2 = E ( X 2 ) = 1 θ 2 + α 2 1 + 6 α θ 2 + θ 1 + 3 α θ 2 ,
μ 3 = E ( X 3 ) = 1 θ 2 + α 6 θ 1 + 10 α θ 2 + 6 1 + 6 α θ 2 + θ 1 + 3 α θ 2
and
μ 4 = E ( X 4 ) = 1 θ 2 + α 24 θ 2 1 + 15 α θ 2 + 36 θ 1 + 10 α θ 2 + 14 1 + 6 α θ 2 + θ 1 + 3 α θ 2 .
Therefore, the variance of the PMiD is obtained as
V a r ( X ) = 1 θ 2 ( θ 2 + α ) 2 ( θ 2 + 6 α ) + θ ( θ 2 + 3 α ) 1 θ θ 2 + α 1 + 3 α θ 2 .
The dispersion index of the PMiD is given by
D I = 1 + 2 θ θ 2 + 6 α θ 2 + 3 α θ 2 + 3 α θ ( θ 2 + α ) .
Since α and θ > 0 , one can establish that the PMiD’s dispersion index is greater than one, i.e., D I > 1 , indicating that the PMiD is over-dispersed. The following formulae can be used to get explicit expressions for the skewness and kurtosis of the PMiD:
S k e w n e s s ( X ) = μ 3 3 μ 2 μ 1 + 2 ( μ 1 ) 3 [ V a r ( X ) ] 3 / 2
and
K u r t o s i s ( X ) = μ 4 4 μ 3 μ 1 + 6 μ 2 ( μ 1 ) 2 3 ( μ 1 ) 4 [ V a r ( X ) ] 2 .
Now, Table 1 shows some numerical values for the mean, variance, DI, skewness, and kurtosis for the PMiD for various parameter settings.
Proposition 4.
The probability generating function (pgf) of X P M i D ( α , θ ) is obtained as
G ( s ) = E ( s X ) = θ 3 ( θ 2 + α ) ( θ + 1 s ) 1 + α ( θ + 1 s ) 2 ,
for s ( 1 , 1 ) .
Proof. 
Owing to the well-known compound-Poisson theory, the pgf of the PMiD is obtained as follows:
G ( s ) = 0 e λ ( s 1 ) θ 3 θ 2 + α 1 + α λ 2 2 e θ λ d λ = θ 3 θ 2 + α 0 e λ ( θ + 1 s ) d λ + α 2 0 e λ ( θ + 1 s ) λ 2 d λ = θ 3 θ 2 + α Γ ( 1 ) θ + 1 s + α 2 Γ ( 3 ) ( θ + 1 s ) 3 = θ 3 ( θ 2 + α ) ( θ + 1 s ) 1 + α ( θ + 1 s ) 2 .
Thus, the proof is complete. □
When s is replaced by e t and e i t in Equation (11), we obtain the moment generating function (mgf) and characteristic function (cf) of the PMiD, respectively. They are respectively given by
M ( t ) = θ 3 ( θ 2 + α ) ( θ + 1 e t ) 1 + α ( θ + 1 e t ) 2 ,
for t 0 , and
ϕ ( t ) = θ 3 ( θ 2 + α ) ( θ + 1 e i t ) 1 + α ( θ + 1 e i t ) 2 ,
for t R .

3.5. Rényi and Shannon Entropies

Entropy is a measure of uncertainty fluctuation in a stochastic situation, in which higher entropy indicates less information. The most popular entropy measures are Rényi entropy and Shannon entropy, which are among the most accessible in the literature.
For every discrete distribution with pmf  p ( x ) , the related Rényi entropy is defined by
H γ = 1 1 γ log x p γ ( x ) ,
for γ > 0 and γ 1 .
In the context of the PMiD, by using Equation (1), we obtain
x = 0 p γ ( x ; α , θ ) = θ 3 γ ( θ 2 + α ) γ x = 0 1 ( 1 + θ ) x + 1 + α ( x + 1 ) ( x + 2 ) 2 ( 1 + θ ) x + 3 γ .
Thus, the Rényi entropy of the PMiD is simplified to the following formula:
H γ = 1 1 γ log θ 3 γ ( θ 2 + α ) γ + log x = 0 1 ( 1 + θ ) x + 1 + α ( x + 1 ) ( x + 2 ) 2 ( 1 + θ ) x + 3 γ .
Now, the Shannon entropy for a discrete distribution with pmf  p ( x ) is given by
H 1 = x = 0 p ( x ) log p ( x ) .
Hence, the Shannon entropy for the PMiD can be expressed as
H 1 = log θ 3 θ 2 + α x = 0 p ( x ) log 1 ( 1 + θ ) x + 1 + α ( x + 1 ) ( x + 2 ) 2 ( 1 + θ ) x + 3 .

4. Estimation of the Parameters

Hereafter, we perform the estimation of parameters of the PMiD using two well-known estimation approaches: ML and Bayesian methods.

4.1. Maximum Likelihood Estimation

Let X 1 , X 2 , , X n be a random sample of size n from X P M i D ( α , θ ) (so n independent and identically distributed (iid) random variables with the PMiD), with unknown α and θ , and x 1 , x 2 , , x n be observations of X 1 , X 2 , , X n . Then, the log-likelihood function is
log L n = 3 n log ( θ ) n log ( θ 2 + α ) log ( 1 + θ ) i = 1 n ( x i + 1 ) + i = 1 n log 1 + α ( x i + 1 ) ( x i + 2 ) 2 ( 1 + θ ) 2 .
The maximization of log L n with respect to the parameters give their ML estimates (MLEs).
The following approach can be considered. The score function associated with this log-likelihood function is
U = log L n α , log L n θ T .
Now, by solving log L n α = 0 , and log L n θ = 0 , we obtain the associated nonlinear log-likelihood equations. They are respectively given by
i = 1 n ( x i + 1 ) ( x i + 2 ) 2 ( 1 + θ ) 2 + α ( x i + 1 ) ( x i + 2 ) n θ 2 + α = 0 ,
and
3 n θ 2 n θ θ 2 + α i = 1 n x i + 1 1 + θ 1 2 α ( x i + 2 ) 2 ( 1 + θ ) 2 + α ( x i + 1 ) ( x i + 2 ) = 0 .
The solutions of these two equations give the MLEs. We obtained the MLEs numerically using the fitdistrplus package of the R  software (see [15]). For more details on the fitdistrplus package, one should go through the lin k https://CRAN.R-project.org/package=fitdistrplus accessed on 14 February 2021. For the detailed R-code for finding the MLEs of the PMiD, see Appendix A of this article.

4.2. Bayesian Estimation

The Bayesian estimation technique is used to estimate the PMiD parameters in this subsection. That is, each parameter of PMiD must have some prior densities. For both of the parameters α and θ , the half-Cauchy (hC) distribution is used as the prior densities. The hereunder is the pdf of the hC distribution with scale parameter δ :
f h C ( u ) = 2 δ π ( u 2 + δ 2 ) , u > 0 , δ > 0 .
There is no mean and variance for the hC distribution. Other than that, its mode is equal to 0. The hC distribution with the value of δ equals 25 is the preferable alternative to the uniform distribution, if more information is needed, according to [16]. Thus, as a noninformative prior distribution for the parameters α and θ , we utilize hC distribution with its δ value fixed to 25. That is, we use
α , θ h C ( 25 ) .
Thus, using Equation (12), the joint posterior pdf is given by
ψ * ( α , θ | x ) L n × ψ ( α ) × ψ ( θ ) ,
where L n is the likelihood function for the PMiD, and ψ ( x ) is the pdf of the hC distribution with δ = 25 . It is clear from Equation (13) that there is no analytical solution for determining the Bayesian estimates. As a consequence, we adopt the Metropolis–Hastings algorithm (MHA) of the Markov Chain Monte Carlo (MCMC) technique, which is a phenomenal simulation method, using the R software.

4.3. Performance of the PMiD Parameters Using Simulation Study

For some finite sample sizes, we execute the simulation studies to test the long-run accuracy of the MLEs of the PMiD parameters. We have generated samples of sizes n = 100 , 250 , 500 , 750 , and 1000 from the PMiD using various sets of parameter values. The R-code for generating the PMiD random samples for the specified parameter settings are given in Appendix A. The iteration is conducted 1001 times in this case. As a consequence, we calculated the average of the biases, mean squared errors (MSEs), coverage probabilities (CPs), and average lengths (ALs) of each parameter estimate for all iterations in the relevant sample sizes. The results are reported in Table 2. It can be seen that, as the sample size increases, the MSEs and ALs were associated with each estimate decrease. Interestingly, the CPs of the confidence intervals (CIs) for each parameter are relatively close to the nominal 95 percent level. This illustrates the steady performances of the MLEs.

5. PMiD Regression Model

In this section, we define a new count regression model based on the PMiD known as a PMiD regression model. Let Y P M i D ( α , θ ) . By considering the re-parametrization α = θ 2 ( μ θ 1 ) / ( 3 μ θ ) , the pmf of the PMiD can be expressed in terms of the mean E ( Y ) = μ > 0 and θ > 0 , and it is given by
P ( Y = y | μ , θ ) = θ ( 3 θ μ ) 2 ( 1 + θ ) y + 1 1 + θ 2 ( θ μ 1 ) ( y + 1 ) ( y + 2 ) 2 ( 3 θ μ ) ( 1 + θ ) 2 , y = 0 , 1 , 2 ,
The pmf in Equation (14) that defines a distribution is denoted as P M i D ( μ , θ ) . Thus, we have Y P M i D ( μ , θ ) . Assume that the response variable Y satisfies Y P M i D ( μ , θ ) , and the covariates are in relation with the i t h mean by the log-link function given as follows:
μ i = E ( Y i ) = exp v i T τ , i = 1 , 2 , , n ,
where v i T = 1 , v i 1 , v i 2 , , v i p is the vector of covariates and τ = τ 0 , τ 1 , , τ p T is the unknown vector of regression coefficients. The log-likelihood function of the PMiD regression model is derived by inserting the log-link function of Equation (15) in Equation (14), and is given by
log L ( Θ ) = n log ( θ / 2 ) + i = 1 n log 3 θ μ i log ( 1 + θ ) i = 1 n ( y i + 1 ) + i = 1 n log 1 + θ 2 ( θ μ i 1 ) ( y i + 1 ) ( y i + 2 ) 2 ( 3 θ μ i ) ( 1 + θ ) 2 ,
where y i is the i t h observations of Y, μ i is given in Equation (15) for i = 1 , 2 , , n , and Θ = τ , θ are obtained by maximizing Equation (16) using optim function in the R software.

6. INAR(1) Model with PMiD Innovations

The novel distribution is particularly well suited for modelling integer-valued time series data.
The stochastic process, X t t Z , is an INAR(1) process if it is given by
X t = p X t 1 + ε t , t Z ,
where p [ 0 , 1 ) and ε t is represented as the innovation process, thus composed of iid integer-valued random variables, with mean E ( ε t ) = μ ε and variance V a r ( ε t ) = σ ε 2 . The operator symbol ‘∘’ represents the binomial thinning operator, which is defined as
p X t 1 = j = 1 X t 1 B j ,
where B j j 1 is a sequence of iid Bernoulli random variables with success probability, p. For p [ 0 , 1 ) , the INAR(1) process is stationary, while, for the case p = 1 , the process is non-stationary. The INAR(1) process, according to [4,5], is a homogeneous Markov chain with 1-step transition probabilities given by
P ( X t = k | X t 1 = l ) = i = 0 min ( k , l ) l i p i ( 1 p ) l i Pr ( ε t = k i ) , k , l 0 ,
where p ( 0 , 1 ) . Therefore, in general, the mean, variance, and dispersion index of the INAR(1) process are, respectively, given by
E ( X t ) = μ ε 1 p ,
Var ( X t ) = p μ ε + σ ε 2 1 p 2 ,
and
D I X t = D I ε + p 1 + p ,
where μ ε , σ ε 2 , and D I ε are given in Equations (8)–(10), respectively. For the detailed information on the INAR process, one can go through [17].
Now, in this section, based on the PMiD innovations, we present a new over-dispersed INAR(1) model, and we call it the PMiD-INAR(1) process.
Definition 1.
Assume that the innovation process ε t t Z follows a PMiD. Then, the (one-step) transition probability of the PMiD-INAR(1) process is given by
P k , l = Pr ( X t = k | X t 1 = l ) = i = 0 min ( k , l ) l i p i ( 1 p ) l i θ 3 ( θ 2 + α ) ( 1 + θ ) k i + 1 1 + α ( k i + 1 ) ( k i + 2 ) 2 ( 1 + θ ) 2 .
Now, using Equations (17)–(19), the mean, variance, and dispersion index of the PMiD-INAR(1) process are respectively obtained as
μ X t = θ ( θ 2 + α ) ( 1 p ) 1 + 3 α θ 2 ,
σ X t 2 = θ ( θ 2 + α ) ( 1 p 2 ) 1 + 3 α θ 2 p + 1 1 ( θ 2 + α ) 1 + 3 α θ 2 + 2 1 + 6 α θ 2 ,
and
D I X t = 1 + 2 ( 1 + p ) θ θ 2 + 6 α θ 2 + 3 α θ 2 + 3 α 2 θ ( θ 2 + α ) .
Since D I X t is clearly greater than 1, the process is appropriate for over-dispersed integer valued autoregressive time series data. The conditional expectation and variance of the PMiD-INAR(1) process are given by
E ( X t | X t 1 ) = E p X t 1 | X t 1 + E ε t | X t 1 = p X t 1 + θ θ 2 + α 1 + 3 α θ 2 ,
and
V a r ( X t | X t 1 ) = V a r p X t 1 | X t 1 + V a r ε t | X t 1 = p ( 1 p ) X t 1 + σ ε 2 ,
where σ ε 2 is given in Equation (9) (see [17,18]).

7. Estimation of the Parameters: PMiD-INAR(1) Process

In this section, the conditional maximum likelihood (CML), conditional least squares (CLS), and Yule–Walker (YW) methods are used to investigate the inference of the PMiD-INAR(1) process. To examine the efficiency of these three strategies, a simulation study is also carried out.

7.1. Conditional Maximum Likelihood (CML) Estimation

Let X 1 , X 2 , , X T be a random sample of the stationary PMiD-INAR(1) process, and x 1 , x 2 , , x T be observations of X 1 , X 2 , , X T . Then, the conditional log-likelihood function is given by
L ( p , α , θ ) = t = 2 T log P ( X t = x t | X t 1 = x t 1 ) = t = 2 T log i = 0 min ( x t , x t 1 ) x t 1 i p i ( 1 p ) x t 1 i θ 3 ( θ 2 + α ) ( 1 + θ ) x t i + 1 1 + α ( x t i + 1 ) ( x t i + 2 ) 2 ( 1 + θ ) 2 .
By applying the direct maximization technique on Equation (23), the respective CML estimates, say p ^ c l s , α ^ c l s , and θ ^ c l s corresponding to the parameters of the PMiD-INAR(1) process, p , α , and θ can be obtained using the optim function of the R software.

7.2. Conditional Least Squares (CLS) Estimation

By using Equation (21), the CLS estimates of the PMiD-INAR(1) process parameters ζ = ( p , α , θ ) are obtained by minimizing the function given below:
S ( ζ ) = t = 2 T x t E ( X t | X t 1 = x t 1 ) 2 = t = 2 T x t p x t 1 θ θ 2 + α 1 + 3 α θ 2 2 .
As a result, by solving ζ S ( ζ ) = 0 , the system of equations based on the estimates can be derived. The CLS estimate of α is obtained as
p ^ C L S = ( T 1 ) t = 2 T x t x t 1 t = 2 T x t t = 2 T x t 1 ( T 1 ) t = 2 T x t 1 2 t = 2 T x t 1 2 .
Hence, in S ( ζ ) , p is switched with p ^ C L S , and the resultant function S ( α , θ ) should be minimized by concerning α and θ to obtain their CLS estimates. Since α ^ C L S and θ ^ C L S do not have the closed-form expression, we utilize optim function of the R software to obtain it numerically by minimizing S ( α , θ ) .

7.3. Yule–Walker (YW) Estimation

The concept of the YW approach is to synchronously solve the theoretical moments with the empirical ones. Because of the autocorrelation function (ACF) of the INAR(1) process at lag τ is ρ x ( τ ) = p τ , the YW estimate of p is given by
p ^ Y W = t = 2 T ( x t x ¯ ) ( x t 1 x ¯ ) t = 1 T ( x t x ¯ ) 2 .
Now, the theoretical mean and dispersion index of the PMiD-INAR(1) process is then solved with their empirical equivalents to derive the YW estimates of α and θ . When the theoretical mean is equated to the empirical mean, the YW estimate α ^ Y W of α is calculated as follows:
α ^ Y W = θ ^ Y W 2 X ¯ θ ^ Y W ( 1 p ^ Y W ) 1 3 X ¯ θ ^ Y W ( 1 p ^ Y W ) ,
where x ¯ = 1 N t = 1 T x t . Then, substituting α ^ Y W with Equation (24) in Equation (20) and equating Equation (20) to the empirical value of the dispersion index ( D I ^ X t ), we obtain the YW estimate θ ^ Y W of θ . Since θ ^ Y W does not have the closed-form expression, we utilize the uniroot function available in the R software to obtain it numerically.

7.4. Simulation: PMiD-INAR(1) Process

In this section, a simulation study is conducted for the comprehensive assessment of the long-standing performances of CML, YW, and CLS estimates of the PMiD-INAR(1) process parameters. We have generated samples of sizes n = 100 , 250 , 500 , 750 , and 1000 using values of the parameter setting ( p = 0.5 , α = 0.6 , θ = 0.7 ), and the iteration process is conducted 1001 times. The simulation results are analyzed using estimated biases, MSEs, and mean relative errors (MREs), and are presented in Table 3. The following formulae are used to determine these values:
Average bias = 1 1001 i = 1 1001 ( ζ ^ i ζ ) , Average MSE = 1 1001 i = 1 1001 ( ζ ^ i ζ ) 2 , Average MRE = 1 1001 i = 1 1001 ζ ^ i ζ ,
where ζ = ( p , α , θ ) .

8. Applications and Empirical Study

To show the usage of the PMiD model, we utilize three real datasets in this section: the first is COVID-19 data, the second is hospital length of stay, and the third is the number of earthquakes data.

8.1. COVID-19 Data: Armenia

First, we utilize the dataset of the daily new deaths in Armenia due to the COVID-19 disease. The data are available at https://www.worldometers.info/coronavirus/country/armenia/ accessed on the 10 January 2021 and are also studied by [19]. They contain the daily new COVID cases between 15 February 2020 and 4 October 2020. To demonstrate the PMiD’s potential benefit, the distributions given in Table 4 are considered for comparison.
We compare the competitive distributions to the suggested distribution using the statistical techniques provided, namely the negative log-likelihood ( log L ), Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), χ 2  statistic, and p-value. Table 5 and Table 6 display the corresponding MLEs (with standard errors (SEs) and CIs) and goodness-of-fit (GOF) results, respectively. The PMiD’s GOF statistics values are less than the other examined distributions, as shown in these tables. As a result, the suggested model is the most appropriate for modelling the given COVID-19 data. Now, the empirical mean, variance, and DI of this Armenia dataset are 4.1931 , 18.7944 , and 4.4822 , respectively, and the theoretical values for the mean, variance, and DI measures of the PMiD are 4.1931 , 19.6659 , and 4.6900 . It is observed that the empirical and the theoretical means are exactly the same, and the empirical and the theoretical values of variances, and DIs are the closest to each other. The empirical cdf, pmf, and P-P plots for the Armenia dataset are respectively given in Figure 4, Figure 5 and Figure 6. It again gives some better-shaped curves for those fitted in the PMiD.
The next goal is to use the Bayesian technique to estimate the parameters of the PMiD using the above-mentioned COVID-19 dataset. The analysis was carried out using the MHA of the MCMC technique with 1000 iterations in this perspective. For comparison purposes, both the Bayesian and MLE estimates of the PMiD parameters for the real dataset are given in Table 7. R programming is used to perform the numerical computations.

8.2. Length of Hospital Stay

By applying the PMiD regression model to an actual dataset, we can confirm its prominence. The dataset is about the 1991 Arizona cardiovascular patients (AZPRO data), which is available in COUNT package of the R software. The goal of this study is to investigate the combined relationship between the length of hospital stay ( y i ) with the covariates x i 1 - the cardiovascular procedures ( C A B G = 1 , P T C A = 0 ) , x i 2 - the sex of the patients ( m a l e = 1 , f e m a l e = 0 ) , x i 3 - the type of the admission ( u r g e n t = 1 , e l e c t i v e = 0 ) , and x i 4 - the age of the patients ( ( a g e > 75 ) = 1 , ( a g e 75 ) = 0 ) . The fitted nonlinear regression model is given by
μ i = exp τ 0 + τ 1 x i 1 + τ 2 x i 2 + τ 3 x i 3 + τ 4 x i 4 .
In Table 8, we compare the performance of the PMiD regression model with the Poisson regression model, the NPWE regression model, and the PXGD regression model and also display the summaries due to the real dataset, which include the SEs, p-value, negative log-likelihood, AIC, and BIC values. Table 8 points out that the PMiD regression model has the lowest values across all the model selection criteria, indicating that it is the better count regression model than all the Poisson, NPWE, and PXGD regression models.

8.3. Japan Earthquake Data

We use data from the ETAS package of R software to calculate annual counts of earthquakes with a magnitude of 4.5 or higher that occurred in Japan between the years 1926 and 2007. The data comprise 82 observations. For more details on this package, one can go through [23]. In addition, Figure 7 depicts the Japan earthquake catalog. The spatial distribution of earthquakes in the study area is depicted in the top-left picture. The three figures in the right half of Figure 7 depict variations in the latitude, longitude, and magnitude of the earthquakes over time. The two figures in the bottom-left corner of Figure 7 depict the earthquake catalog’s completeness and time stationarity. Here, the number of earthquakes having a magnitude higher than or equal to m is represented by N m . If the plot of l o g ( N m ) versus m exhibits a linear trend, it reflects the completeness of the earthquake catalog. Furthermore, the time stationarity of the catalog can be determined by looking at the plot of N t versus t, where N t is the total number of occurrences in the catalog up to time t. The earthquake time series is stationary if the plotted points of N t versus t have a functional form in such a way that N t = λ 0 t , where λ > 0 .
We calculated the CML estimates of parameters, as well as the negative log-likelihood ( log L ), AIC, and BIC of the PMiD-INAR(1) as well as the INAR(1) with the innovations, new Poisson-weighted exponential (INAR-NPWE(1)), Poisson-transmuted exponential (INAR-PTE(1)), discrete Poisson–Lindley (INAR-DPLi(1)), and Poisson (PINAR(1)), for the comparison. For more details of these innovations, see Table 4. Now, Table 9 displays the results corresponding to the earthquake data. We see that the log L , AIC, and BIC values of the PMiD-INAR(1) model is smaller than those of the other compared INAR(1) models, and we conclude that the PMiD-INAR(1) is the most suitable model for the earthquake data compared to that of the other models.
The residual analysis is done to make sure the fitted model is accurate for the earthquake data. To do so, the Pearson residuals for the PMiD-INAR(1) process are computed using the following formula:
r t = x t E ( X t | X t 1 = x t 1 ) V a r ( X t | X t 1 = x t 1 ) 1 / 2 ,
where E ( X t | X t 1 = x t 1 ) and V a r ( X t | X t 1 = x t 1 ) can be found in Equations (21) and (22), respectively. In general, the mean and variance of Pearson residuals should be near zero and one, respectively, and the computed Pearson residuals should not have any autocorrelation issues if the fitted INAR(1) process is statistically accurate. Here, the obtained Pearson residuals of the PMiD-INAR(1) process have a mean and variance of 0.0012 and 1.1612 , respectively, which were quite close to the anticipated values. Therefore, the PMiD-INAR(1) process is well judged for the given earthquake data. Now, the fitted PMiD-INAR(1) process is obtained as follows:
X t = 0.2813 X t 1 + ε t ,
where the innovation process ε t PMiD( 0.6869 , 0.0247 ). Now, we can calculate the predicted number of earthquakes in Japan via
X ^ 1 = θ ^ ( θ ^ 2 + α ^ ) ( 1 p ^ ) 1 + 3 α ^ θ ^ 2 = 168.8961 X ^ t = p ^ x t 1 + θ ^ θ ^ 2 + α ^ 1 + 3 α ^ θ ^ 2 = 0.2813 x t 1 + 121.3856 .

9. Discussion

9.1. Context

Discovering new count models is inevitable in the scenario of overdispersion, which will provide more possibilities for superiorly fitting the real datasets by choosing the right models according to the situations. In that sense, we proposed a new overdispersed count model, discussed its regression aspects, and constructed an INAR(1) process based on them. The main motivation behind the construction of this model is also discussed. In all the aspects, we found that the proposed model outperforms the compared models.

9.2. This Work

The Poisson–Mirra distribution (PMiD), a novel two-parameter discrete distribution, is introduced and thoroughly investigated. We delivered specific expressions for the factorial moments, mean, variance, dispersion index, skewness, kurtosis, mode, probability generating function, moment generating function, characteristic function, and the entropy measures. The distribution parameters were estimated by using the classical maximum likelihood and Bayesian estimation methods and were also studied in their real-data analysis. A simulation study on MLE was also conducted. A new regression model for count data based on the PMiD is proposed and compared with its competitive regression models based on a real dataset. More importantly, we introduced the integer-valued autoregressive model based on the PMiD, known as the PMiD-INAR(1) process. The parameter estimation problems, which include the conditional maximum likelihood, conditional least squares, and the Yule–Walker estimation procedures for the PMiD-INAR(1) process, are discussed, and simulation studies based on these three estimation procedures are also conducted. In total, three real-world datasets were used to demonstrate the use of the novel model, the first regarding COVID-19 data, the second regarding the length of hospital stay, and the third concerning earthquake data.

9.3. Contributions and Limitations

The article thus developed the new distribution PMiD from which we derived a new count model, its regression model, and the first-order integer-valued autoregressive model. The proposed distribution, we believe, is ideal for researching data from COVID-19 related areas, biology, and earthquake-related fields, and we hope the PMiD applies to other fields of study also. The absence of a bimodal feature is the possible limitation of the proposed distribution.

9.4. Future Work

This research could take another route if the bivariate version of the PMiD and the associated BINAR(1) process are built. This work needs considerable modifications and studies, which we shall delegate to future research.

10. Conclusions

In this article, we fitted three real datasets and concluded that the PMiD model outperforms all the compared models in all aspects. We anticipate that the proposed model will increase its prevalence and have a wider variety of applications in the modelling of positive integer-valued real-world datasets from different study areas such as physics, astronomy, survival analysis, and so on.   

Author Contributions

Conceptualization, R.M., M.R.I., C.C., S.L.N. and D.S.S.; methodology, R.M., M.R.I., C.C., S.L.N. and D.S.S.; software, R.M., M.R.I., C.C., S.L.N. and D.S.S.; validation, R.M., M.R.I., C.C., S.L.N. and D.S.S.; formal analysis, R.M., M.R.I., C.C., S.L.N. and D.S.S.; investigation, R.M., M.R.I., C.C., S.L.N. and D.S.S.; writing—original draft preparation, R.M., M.R.I., C.C., S.L.N. and D.S.S.; writing—review and editing, R.M., M.R.I., C.C., S.L.N. and D.S.S.; visualization, R.M., M.R.I., C.C., S.L.N. and D.S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

We appreciate the constructive feedback from the three reviewers on the first version of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
INAR(1)First-order Integer-valued Autoregressive
NBNegative Binomial
PLPoisson–Lindley
PMiDPoisson–Mirra Distribution
MiDMirra Distribution
XGDXgamma Distribution
pdfProbability Density Function
cdfCumulative Distribution Function
sfSurvival Function
hfHazard Function
pmfProbability Mass Function
PXGDPoisson–Xgamma Distribution
DIDispersion Index
pgfProbability Generating Function
mgfMoment Generating Function
cfCharacteristic Function
MLEMaximum Likelihood Estimate
hChalf-Cauchy
MHAMetropolis–Hastings Algorithm
MCMCMarkov Chain Monte Carlo
MSEMean Squared Error
CPCoverage Probability
ALAverage Length
CMLConditional Maximum Likelihood
CLSConditional Least Squares
YWYule–Walker
ACFAutocorrelation Function
MREMean Relative Error
AICAkaike Information Criterion
BICBayesian Information Criterion
GOFGoodness-of-Fit
DGLiDiscrete Generalized Lindley
DPLiDiscrete Poisson–Lindley
DLiDiscrete Lindley
NPWENew Poisson-Weighted Exponential
PTEPoisson-Transmuted Exponential
SEStandard Error
CIConfidence Interval
dfDegrees of Freedom

Appendix A

R-code for generating the PMiD random samples is given by
ppmid <- function(q, alpha, theta){
p <- (1/(2*(alpha+theta^2))) * ((theta+1)^(-q-3)) *
(alpha*(2*((theta+1)^(q+3)) - theta*(q+3)*(theta*(q+2)+2)-2) +
2*(theta^2)*((theta+1)^2)* (((theta+1)^(q+1)) - 1))
return(p)
}
 
rpmid <- function(n, alpha, theta)
{
U <- runif(n)
X <- rep(0,n)
for(i in 1:n)
{
if(U[i] < ppmid(0, alpha, theta))
{
X[i] <- 0
} else
{
B = FALSE
I = 0
while(B == FALSE)
{
int <- c( ppmid(I, alpha, theta), ppmid(I+1, alpha, theta) )
if( (U[i] > int[1]) & (U[i] < int[2]) )
{
X[i] <- I+1
B = TRUE
} else
{
# If not, continue the while loop and increase I by 1
I=I+1
}
}
}
}
return(X)
}.
R-code for finding the MLEs and GOF statistics values of the PMiD is given by
library(fitdistrplus)
 
dpmid <- function(x, alpha, theta){
d <- (theta^3)/((theta^2)+alpha) * 1/((1+theta)^(x+1)) *
(1 + (alpha*(x+1)*(x+2))/(2*(1+theta)^2))
return(d)
}
 
ppmid <- function(q, alpha, theta){
p <- (1/(2*(alpha+theta^2))) * ((theta+1)^(-q-3)) *
(alpha*(2*((theta+1)^(q+3)) - theta*(q+3)*(theta*(q+2)+2)-2) +
2*(theta^2)*((theta+1)^2)* (((theta+1)^(q+1)) - 1))
return(p)
}
 
pre <- prefit(x, "pmid", "mle", list(alpha=0.3, theta=0.3),
lower=c(0, 0), upper = c(Inf, Inf))
 
fit.pmid <- fitdist(x, "pmid",
start = list(alpha = pre$alpha, theta = pre$theta),
optim.method = "L-BFGS-B", lower=c(0, 0), upper = c(Inf, Inf),
discrete = TRUE)
 
summary(fit.pmid)
gofstat(fit.pmid).

References

  1. Rigby, R.; Stasinopoulos, D.; Akantziliotou, C. A framework for modelling overdispersed count data, including the poisson-shifted generalized inverse gaussian distribution. Comput. Stat. Data Anal. 2008, 53, 381–393. [Google Scholar] [CrossRef]
  2. Sellers, K.F.; Raim, A. A flexible zero-inflated model to address data dispersion. Comput. Stat. Data Anal. 2016, 99, 68–80. [Google Scholar] [CrossRef]
  3. Contreras-Reyes, J.E. Lerch distribution based on maximum nonsymmetric entropy principle: Application to Conway’s game of life cellular automaton. Chaos Solitons Fractals 2021, 151, 111272. [Google Scholar] [CrossRef]
  4. Al-Osh, M.A.; Alzaid, A.A. First-order integer-valued autoregressive (INAR(1)) process. J. Time Ser. Anal. 1987, 8, 261–275. [Google Scholar] [CrossRef]
  5. McKenzie, E. Some simple models for discrete variate time series1. JAWRA J. Am. Water Resour. Assoc. 1985, 21, 645–650. [Google Scholar] [CrossRef]
  6. McKenzie, E. Autoregressive moving-average processes with negative-binomial and geometric marginal distributions. Adv. Appl. Probab. 1986, 18, 679–705. [Google Scholar] [CrossRef]
  7. Jung, R.; Ronning, G.; Tremayne, A. Estimation in conditional first order autoregression with discrete support. Stat. Pap. 2005, 46, 195–224. [Google Scholar] [CrossRef]
  8. Jazi, M.A.; Jones, G.; Lai, C.-D. Integer valued ar(1) with geometric innovations. J. Iran. Stat. Soc. 2012, 11, 173–190. [Google Scholar]
  9. Lívio, T.; Khan, N.M.; Bourguignon, M.; Bakouch, H. An inar(1) model with poisson-lindley innovations. Econ. Bull. 2018, 38, 1505–1513. [Google Scholar]
  10. Altun, E. A new one-parameter discrete distribution with associated regression and integer-valued autoregressive models. Math. Slovaca 2020, 70, 979–994. [Google Scholar] [CrossRef]
  11. Sen, S.; Ghosh, S.; Al-Mofleh, H. The Mirra distribution for modeling time-to-event data sets. In Strategic Management, Decision Theory, and Decision Science; Sinha, B.K., Bagchi, S.B., Eds.; Springer: Singapore, 2021; pp. 59–73. [Google Scholar]
  12. Sen, S.; Maiti, S.; Chandra, N. The xgamma distribution: Statistical properties and application. J. Mod. Appl. Stat. Methods 2016, 15, 774–788. [Google Scholar] [CrossRef] [Green Version]
  13. Altun, E.; Cordeiro, G.M.; Ristić, M.M. An one-parameter compounding discrete distribution. J. Appl. Stat. 2021, 1–22. [Google Scholar] [CrossRef]
  14. Sankaran, M. The discrete poisson-lindley distribution. Biometrics 1970, 26, 145–149. [Google Scholar] [CrossRef]
  15. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021; Available online: https://www.R-project.org/ (accessed on 6 September 2021).
  16. Gelman, A.; Hill, J. Data Analysis Using Regression and Multilevel/Hierarchical Models; Analytical Methods for Social Research; Cambridge University Press: Cambridge, UK, 2006. [Google Scholar]
  17. Weiß, C. An Introduction to Discrete-Valued Time Series; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2018. [Google Scholar]
  18. Alzaid, A.; Alosh, M. First-order integer-valued autoregressive (inar (1)) process: Distributional and regression properties. Stat. Neerl. 1988, 42, 53–61. [Google Scholar] [CrossRef]
  19. El-morshedy, M.; Altun, E.; Eliwa, M.S. A new statistical approach to model the counts of novel coronavirus cases. Math. Sci. 2021, 1–14. [Google Scholar] [CrossRef]
  20. Gómez-Déniz, E.; Calderín-Ojeda, E. The discrete lindley distribution: Properties and applications. J. Stat. Comput. Simul. 2011, 81, 1405–1416. [Google Scholar] [CrossRef]
  21. Altun, E. A new generalization of geometric distribution with properties and applications. Commun. Stat.-Simul. Comput. 2020, 49, 793–807. [Google Scholar] [CrossRef]
  22. Bhati, D.; Kumawat, P.; Gómez-Déniz, E. A new count model generated from mixed poisson transmuted exponential family with an application to health care data. Commun. Stat. Theory Methods 2017, 46, 11060–11076. [Google Scholar] [CrossRef]
  23. Jalilian, A. Etas: An r package for fitting the space-time etas model to earthquake data. J. Stat. Software Code Snippets 2019, 88, 1–39. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Plots of the pdf of the MiD.
Figure 1. Plots of the pdf of the MiD.
Axioms 11 00193 g001
Figure 2. Plots of the pmf of the PMiD.
Figure 2. Plots of the pmf of the PMiD.
Axioms 11 00193 g002
Figure 3. Plots of the hf of the PMiD.
Figure 3. Plots of the hf of the PMiD.
Axioms 11 00193 g003
Figure 4. Empirical cdfs of the fitted distributions for the Armenia dataset.
Figure 4. Empirical cdfs of the fitted distributions for the Armenia dataset.
Axioms 11 00193 g004
Figure 5. Empirical pmfs of the fitted distributions for the Armenia dataset.
Figure 5. Empirical pmfs of the fitted distributions for the Armenia dataset.
Axioms 11 00193 g005
Figure 6. Empirical PP plots of the fitted distributions for the Armenia dataset.
Figure 6. Empirical PP plots of the fitted distributions for the Armenia dataset.
Axioms 11 00193 g006
Figure 7. The plots of the earthquake catalog of Japan.
Figure 7. The plots of the earthquake catalog of Japan.
Axioms 11 00193 g007
Table 1. Values of some moment measures of the PMiD for various values of α and θ .
Table 1. Values of some moment measures of the PMiD for various values of α and θ .
α = 0.5 and Various Values of θ
Measures θ = 1 . 5 θ = 3 . 5 θ = 5 . 5 θ = 7 . 5 θ = 9 . 5
Mean0.90910.30810.18770.13570.1064
Variance1.77960.40850.22400.15440.1179
DI1.95761.32561.19311.13791.1076
Skewness2.14072.59132.93193.24823.5398
Kurtosis9.387211.887813.683015.597817.5592
α = 1 . 5 and Various Values of θ
Measures θ = 1 . 5 θ = 3 . 5 θ = 5 . 5 θ = 7 . 5 θ = 9 . 5
Mean1.20.34810.19900.14030.1087
Variance2.42670.47920.24110.16080.1209
DI2.02221.37691.21171.14621.1118
Skewness1.82892.53142.90323.22583.5212
Kurtosis7.471311.540213.597215.519417.4747
Table 2. The simulation results for ( α = 2.5 , θ = 0.5 ).
Table 2. The simulation results for ( α = 2.5 , θ = 0.5 ).
ParametersnMLEBiasMSECPAL
α 1002.82960.32965.46120.830218.9348
2502.99590.49594.39590.870112.1873
5002.94020.44022.72540.91317.6664
7502.88960.38962.11990.91715.9141
10002.82830.32831.57740.91814.8197
θ 1000.4922−0.00780.00190.95700.1855
2500.4973−0.00270.000750.96300.1161
5000.4989−0.00110.000380.96400.0816
7500.4998−0.000230.000250.97100.0666
10000.50030.000310.00020.96100.0578
Table 3. Simulation results for ( p = 0.5 , α = 0.6 , θ = 0.7 ).
Table 3. Simulation results for ( p = 0.5 , α = 0.6 , θ = 0.7 ).
ζ nCMLCLSYW
BiasMSEMREBiasMSEMREBiasMSEMRE
p100−0.00290.00210.99430.03660.00790.9269−0.06740.01810.8653
250−0.00170.000950.9966−0.01370.00300.9725−0.02460.00470.9508
500−0.00150.000540.9984−0.00690.00160.9862−0.01010.00190.9799
750−0.00120.000350.9966−0.00560.00110.9888−0.00700.00130.9859
1000−0.00110.000290.9979−0.00270.00080.9946−0.00370.00080.9927
α 1000.07010.12511.11690.08260.02481.13771.869620.51814.1161
2500.06280.09571.10470.05170.01071.08620.94827.74212.5804
5000.04830.07401.08050.04560.00711.07600.42341.82461.7056
7500.04560.06171.07600.04270.00531.07120.24600.72041.4099
10000.04210.05661.07850.03840.00441.06410.14950.25791.2492
θ 100−0.02030.01200.9710−0.01510.00490.97840.02910.15331.0416
250−0.00920.00580.9868−0.00160.00200.9979−0.00830.02360.9881
500−0.00570.00380.99180.00140.00111.0022−0.00470.00910.9933
750−0.00320.00210.99540.00110.00081.0038−0.00400.00440.9943
1000−0.00110.00180.99840.00090.00061.0067−0.00210.00350.9970
Table 4. The considered competitive distributions.
Table 4. The considered competitive distributions.
DistributionAbbreviationReference
Discrete generalized LindleyDGLi[19]
Poisson–XgammaPXGD[13]
Discrete Poisson–LindleyDPLi[9]
Discrete LindleyDLi[20]
New Poisson-weighted exponentialNPWE[21]
Poisson-transmuted exponentialPTE[22]
PoissonP-
Table 5. Armenia dataset: MLEs of the parameters.
Table 5. Armenia dataset: MLEs of the parameters.
Distribution α θ
MLESECIMLESECI
PMiD0.10290.0586(−0.0121, 0.2178)0.41620.0463(0.3254, 0.5070)
PXGD0.54310.0275(0.4892, 0.5969)---
DGLi0.24770.0843(0.0825, 0.4128)0.77630.0316(0.7144, 0.8382)
DPLi0.410010.0227(0.3656, 0.4545)---
DLi0.69140.0121(0.6677, 0.7151)---
NPWE0.21673.1158(−5.8903, 6.3236)0.100815.8322(−30.9297, 31.1313)
PTE0.00010.2144(−0.4202, 0.4202)0.23850.0303(0.1790, 0.2980)
P4.19310.1342(−3.9302, 4.4561)---
Table 6. Armenia dataset: Goodness-of-fit test.
Table 6. Armenia dataset: Goodness-of-fit test.
XOFExpected Frequency
PMiDPXGDDGLiDPLiDLiNPWEPTEP
05645.165435.441943.687533.673928.481944.867444.86723.5181
13135.003931.499935.801533.791733.094036.227636.227414.7516
22228.012828.707129.257530.993632.146529.251529.251430.9278
32522.883525.770123.849726.965528.631823.618723.618643.2281
41118.897422.505619.397222.659424.224819.070619.070545.3153
51415.664619.099515.743318.577519.810815.398315.398238.0026
61412.973015.790912.753514.953515.814112.433112.433126.5583
71010.703312.761410.313511.866312.397410.038910.039015.9089
8118.783410.11338.32699.31019.58338.10588.10598.3385
937.16377.88126.71317.23727.32556.54496.54503.8850
10105.80536.05355.40455.58265.54855.28465.28461.6290
1174.67464.59194.34554.27834.17064.26704.26700.6210
1243.74093.44543.48983.26053.11473.44533.44530.2170
1352.97622.56072.79952.47292.31332.78192.78180.0700
1422.35471.88702.24341.86761.70992.24622.24620.0210
1521.85341.38021.79601.40531.25861.81361.81400.0059
≥1666.34393.51067.07754.10443.37447.60487.60490.0020
Total233233233233233233233233
log L 590.3751596.7075592.6174598.9318605.3913592.7991592.7991827.4472
AIC 1184.7501195.4151189.2351199.8641212.7831189.5981189.5981656.894
BIC 1191.6521198.8661196.1371203.3151216.2341196.5001196.5001660.345
χ 2 9.318725.161512.371028.132845.723612.055012.0549483.1912
df 67687667
p value 0.1564<0.0010.05420.0004<0.0010.06080.0607<0.001
Table 7. MLEs and Bayes estimates of the PMiD parameters on the COVID-19 dataset.
Table 7. MLEs and Bayes estimates of the PMiD parameters on the COVID-19 dataset.
ParameterMLBayes
α 0.10290.1688
θ 0.41620.4515
Table 8. Regression results on the length of the hospital stay dataset (Standard errors in brackets).
Table 8. Regression results on the length of the hospital stay dataset (Standard errors in brackets).
CovariatesPoissonNPWEPXGDPMiD
Estimate (SE)p-ValueEstimate (SE)p-ValueEstimate (SE)p-ValueEstimate (SE)p-Value
τ 0 1.4560 (0.0159)<0.0011.3871 (0.0458)<0.0011.3996 (0.0349)<0.0012.1007 (0.0228)<0.001
τ 1 0.9604 (0.0122)<0.0010.9982 (0.0363)<0.0010.9721 (0.0270)<0.0010.3693 (0.0789)<0.001
τ 2 −0.1240 (0.0118)<0.001−0.1276 (0.0380)<0.001−0.1269 (0.0280)<0.001−0.0553 (0.0130)<0.001
τ 3 0.3266 (0.0121)<0.0010.4047 (0.0376)<0.0010.1201 (0.0298)<0.0010.2087 (0.0220)<0.001
τ 4 0.1222 (0.0125)<0.0010.1189 (0.0405)<0.0010.3732 (0.0280)<0.0010.0309 (0.0108)<0.001
θ 0.5311 (1.4829 × 103)0.9997 0.3296 (0.0054)<0.001
log L11,189.90 11,194.01 10,569.80 10,382.38
AIC22,389.80 22,400.02 21,149.60 20,776.76
BIC22,420.72 22,437.13 21,180.60 20,813.88
Table 9. The CML estimates of the fitted INAR(1) models and corresponding GOF statistics.
Table 9. The CML estimates of the fitted INAR(1) models and corresponding GOF statistics.
ModelParametersEstimateSE log LAICBIC
INAR-PMiD(1)p0.28130.0293446.0982898.1965905.4166
α 0.68692.8522
θ 0.02470.0019
INAR-NPWE(1)p0.35030.0204465.8715937.7431944.9632
α 0.00680.0043
θ 0.34080.8622
INAR-PTE(1)p0.03200.0293451.1948908.3896915.6098
α −0.99990.2543
θ 0.01300.00124
INAR-DPLi(1)p0.31790.0238450.902905.804910.6174
α 0.01720.0015
PINAR(1)p0.05920.01451418.9182841.8362846.65
α 158.60022.7801
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Maya, R.; Irshad, M.R.; Chesneau, C.; Nitin, S.L.; Shibu, D.S. On Discrete Poisson–Mirra Distribution: Regression, INAR(1) Process and Applications. Axioms 2022, 11, 193. https://doi.org/10.3390/axioms11050193

AMA Style

Maya R, Irshad MR, Chesneau C, Nitin SL, Shibu DS. On Discrete Poisson–Mirra Distribution: Regression, INAR(1) Process and Applications. Axioms. 2022; 11(5):193. https://doi.org/10.3390/axioms11050193

Chicago/Turabian Style

Maya, Radhakumari, Muhammed Rasheed Irshad, Christophe Chesneau, Soman Latha Nitin, and Damodaran Santhamani Shibu. 2022. "On Discrete Poisson–Mirra Distribution: Regression, INAR(1) Process and Applications" Axioms 11, no. 5: 193. https://doi.org/10.3390/axioms11050193

APA Style

Maya, R., Irshad, M. R., Chesneau, C., Nitin, S. L., & Shibu, D. S. (2022). On Discrete Poisson–Mirra Distribution: Regression, INAR(1) Process and Applications. Axioms, 11(5), 193. https://doi.org/10.3390/axioms11050193

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop