1. Introduction
Discrete models are very important for handling count data encountered in several theoretical and applied sciences such as medicine, insurance, life testing, biology, and agriculture. Recently, there has been increased interest among statisticians in constructing new flexible discrete distributions. Chakraborty and Chakravarty [1] noted that almost all observed values are actually discrete, because they are measured to only a finite number of decimal places and cannot really constitute all points in a continuum.
On the other hand, in some life testing and survival analysis studies, lifetimes can be treated as discrete random variables, so the reliability function is a function of a discrete random variable. For example, the reliability of a switching device is a function of the number of times the switch is operated, and the reliability of a computer is a function of the number of times the computer has broken down. Recently, many continuous lifetime distributions have been discretized for modeling discrete lifetime data, for example, the discrete Weibull [2], discrete Burr, and discrete Pareto [3] distributions.
Furthermore, some discrete distributions have been proposed by compounding two discrete distributions, for example, the uniform-Poisson distribution [4], the uniform-geometric distribution [5], and the binomial discrete-Lindley distribution [6]. Recently, Al-Babtain et al. [7] proposed a natural discrete analog of the continuous Lindley distribution as a mixture of negative binomial and geometric distributions.
In recent decades, several works have been introduced in the statistical literature to discretize continuous distributions. However, there is still a clear need for more flexible discrete lifetime distributions to model several types of count data in many applied areas, including insurance, social sciences, reliability studies, and economics. It is worth noting that the probability mass functions (pmfs) of most recently introduced discrete distributions are obtained by discretizing the survival functions of continuous distributions, and their rather complex structure complicates parameter estimation, for example, the two discrete Lindley models in [8,9].
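Apart from discretization techniques, we were motivated to propose a more flexible extension of the geometric distribution using the transmuted record type (TRT) method due to [10]. The TRT approach can be summarized as follows. Let $Y_1, Y_2, \ldots$ be a random sample from a distribution having cdf $F(x)$. Let $X_{U(1)}$ and $X_{U(2)}$ be the first and second upper records, respectively, based on this sample. Consider a random variable X that is defined as follows:
$$
X = \begin{cases} X_{U(1)}, & \text{with probability } 1 - p,\\ X_{U(2)}, & \text{with probability } p. \end{cases}
$$
Hence, the cdf of X follows as
$$
G(x) = (1 - p)\,F_{U(1)}(x) + p\,F_{U(2)}(x), \tag{1}
$$
where $0 \le p \le 1$.
According to Equation (3.1) of [11], the cdf of the first upper record, $F_{U(1)}(x)$, reduces to:
$$
F_{U(1)}(x) = F(x). \tag{2}
$$
The cdf of the second upper record, $F_{U(2)}(x)$, takes the form:
$$
F_{U(2)}(x) = F(x) + [1 - F(x)]\log[1 - F(x)]. \tag{3}
$$
By inserting Equations (2) and (3) in (1), we obtain:
$$
G(x) = (1 - p)\,F(x) + p\,\{F(x) + [1 - F(x)]\log[1 - F(x)]\}.
$$
After some algebra, the cdf of X reduces to:
$$
G(x) = F(x) + p\,[1 - F(x)]\log[1 - F(x)], \tag{4}
$$
where $F(x)$ is any baseline cdf, and Equation (4) is referred to as the cdf of the TRT method. Tanış and Saraçoğlu [12] constructed the TRT-Weibull distribution using the TRT approach. Further information about the TRT approach can be found in [10].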
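Equation (4) can be checked numerically against the definition of X as a mixture of the first two upper records. The Python sketch below is illustrative and is not the authors' code; the function names `trt_cdf`, `sample_trt`, and `exp_quantile` are hypothetical. It assumes a continuous baseline (a standard exponential is used only for the demonstration) so that the second upper record can be drawn directly from F restricted to values above the first record.

```python
import math
import random


def trt_cdf(F_x, p):
    """TRT cdf, Eq. (4): G = F + p (1 - F) log(1 - F), given F_x = F(x) in [0, 1]."""
    if F_x >= 1.0:
        return 1.0  # (1 - F) log(1 - F) -> 0 as F -> 1
    return F_x + p * (1.0 - F_x) * math.log(1.0 - F_x)


def sample_trt(quantile, p, rng):
    """Draw X = X_U(1) with probability 1 - p, else X = X_U(2) (continuous baseline)."""
    u1 = rng.random()  # uniform score of the first observation, i.e. of X_U(1)
    if rng.random() >= p:
        return quantile(u1)
    # X_U(2) is the first later observation exceeding X_U(1); for continuous F it
    # follows F restricted to (X_U(1), inf), drawn here as Q(u1 + (1 - u1) V).
    u2 = u1 + (1.0 - u1) * rng.random()
    return quantile(u2)


def exp_quantile(u):
    """Quantile function of the standard exponential baseline used for the demo."""
    return -math.log(1.0 - u)


rng = random.Random(2024)
p, x0, n_sim = 0.6, 1.0, 200_000
draws = [sample_trt(exp_quantile, p, rng) for _ in range(n_sim)]
empirical = sum(d <= x0 for d in draws) / n_sim
theoretical = trt_cdf(1.0 - math.exp(-x0), p)
print(f"empirical G({x0}) = {empirical:.4f}; Eq. (4) gives {theoretical:.4f}")
```

Drawing the second record from the conditional distribution avoids simulating the waiting time for an exceedance, whose expected length is infinite for a continuous baseline.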
To the best of our knowledge, this is the first article that applies the TRT method to construct an extended form of the geometric distribution. The proposed discrete model, called the transmuted record type geometric (TRTG) distribution, is suitable for over-dispersed data. Hence, it can be applied in collective risk models and can be considered a competitive alternative to the negative binomial and Poisson distributions for fitting automobile claim frequency data.
Additionally, we derive explicit expressions for its basic distributional properties, including the moment generating and probability generating functions, mean, variance, skewness, kurtosis, quantile function, stochastic orders, mean deviation, and mean residual life. In addition, we derive two important risk measures, namely the value at risk and the tail value at risk, for the TRTG model. The TRTG parameters are estimated via the maximum likelihood, moments, proportions, and Bayesian estimation methods, and a simulation study explores the performance of these methods in estimating the TRTG parameters p and q. The applicability of the TRTG distribution is illustrated using three data sets from actuarial science, which show its superiority over competing models, namely the transmuted geometric [13], discrete Burr [3], discrete Chen [14], negative binomial, geometric, and Poisson distributions. We were also motivated to propose a count regression model based on the TRTG distribution. The new TRTG regression model outperforms the Poisson, geometric, and Poisson–Lindley (PL) [15] regression models.
The rest of the paper is organized as follows. We define the TRTG distribution in Section 2. Some of its distributional properties are provided in Section 3. In Section 4, we derive two important risk measures of the TRTG model and provide some numerical computations for them. The TRTG parameters, p and q, are estimated via four estimation methods in Section 5. In Section 6, a Monte Carlo simulation study is conducted to investigate the efficiency of the different proposed estimators. In Section 7, we analyze three insurance data sets to illustrate the flexibility of the TRTG model. Section 8 discusses the TRTG count regression model and its use in modeling real-life count data. Finally, the paper is concluded in Section 9.
2. The TRTG Distribution
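First, we apply the TRT methodology to propose the two-parameter TRTG distribution as an extended version of the geometric distribution with pmf $f(x) = (1 - q)\,q^{x}$ and cumulative distribution function (cdf) $F(x) = 1 - q^{x+1}$, $x = 0, 1, 2, \ldots$, where $0 < q < 1$.
By inserting the cdf of the geometric distribution in Equation (4), we obtain the cdf of the TRTG distribution as follows:
$$
G(x) = 1 - q^{x+1}\,[1 - p\,(x + 1)\log q], \quad x = 0, 1, 2, \ldots \tag{5}
$$
The TRTG distribution is specified by the following pmf:
$$
P(X = x) = q^{x}\,\{(1 - q) - p\log q\,[x(1 - q) - q]\}, \quad x = 0, 1, 2, \ldots \tag{6}
$$
where $0 < q < 1$ and $0 \le p \le 1$. If X has the pmf (6), then it is denoted by $X \sim \mathrm{TRTG}(p, q)$.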
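For completeness, a short derivation of the pmf from the cdf, assuming the geometric baseline $F(x) = 1 - q^{x+1}$ above: since X takes values in $\{0, 1, 2, \ldots\}$, $P(X = x) = G(x) - G(x - 1)$ with $G(-1) = 0$, which gives Equation (6).

```latex
\begin{aligned}
P(X = x) &= G(x) - G(x - 1) \\
&= q^{x}\,(1 - p\,x\log q) - q^{x+1}\,[1 - p\,(x + 1)\log q] \\
&= q^{x}\,\{(1 - q) - p\,x\log q + p\,q\,(x + 1)\log q\} \\
&= q^{x}\,\{(1 - q) - p\log q\,[x(1 - q) - q]\}.
\end{aligned}
```

Setting $p = 0$ recovers the geometric pmf $(1 - q)\,q^{x}$.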
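The survival function of the TRTG model is specified by
$$
S(x) = P(X \ge x) = q^{x}\,(1 - p\,x\log q), \quad x = 0, 1, 2, \ldots \tag{7}
$$
It is clear that $\lim_{p \to 0} S(x) = q^{x}$, which is the survival function of the geometric distribution. In other words, the TRTG distribution behaves like a geometric model when p lies around zero.
Consequently, the hazard function (hf) of X reduces to:
$$
h(x) = \frac{P(X = x)}{P(X \ge x)} = \frac{(1 - q) - p\log q\,[x(1 - q) - q]}{1 - p\,x\log q}, \quad x = 0, 1, 2, \ldots \tag{8}
$$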
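The closed forms for the pmf, cdf, survival function, and hf are easy to cross-check numerically. The Python sketch below is illustrative only: the helper names are hypothetical, the values p = 0.4 and q = 0.7 are arbitrary, and it assumes the discrete survival convention S(x) = P(X ≥ x) used in Equations (7) and (8). It verifies that the pmf equals successive cdf differences and sums to one.

```python
import math


def trtg_pmf(x, p, q):
    """Eq. (6): q^x {(1 - q) - p log(q) [x (1 - q) - q]}, x = 0, 1, 2, ..."""
    return q**x * ((1.0 - q) - p * math.log(q) * (x * (1.0 - q) - q))


def trtg_cdf(x, p, q):
    """Eq. (5): 1 - q^(x+1) [1 - p (x + 1) log q]; G(x) = 0 for x < 0."""
    if x < 0:
        return 0.0
    return 1.0 - q ** (x + 1) * (1.0 - p * (x + 1) * math.log(q))


def trtg_survival(x, p, q):
    """Eq. (7): S(x) = P(X >= x) = q^x (1 - p x log q)."""
    return q**x * (1.0 - p * x * math.log(q))


def trtg_hazard(x, p, q):
    """Eq. (8): h(x) = P(X = x) / P(X >= x)."""
    return trtg_pmf(x, p, q) / trtg_survival(x, p, q)


p, q = 0.4, 0.7  # arbitrary illustrative values
for x in range(6):
    diff = trtg_cdf(x, p, q) - trtg_cdf(x - 1, p, q)  # should match the pmf
    print(x, round(trtg_pmf(x, p, q), 6), round(diff, 6), round(trtg_hazard(x, p, q), 6))
total = sum(trtg_pmf(x, p, q) for x in range(500))
print("sum of pmf over x = 0..499:", round(total, 12))
```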
Figure 1 presents the plots of the pmf of the TRTG model for some choices of q and p.
Figure 1 shows that the pmf can only be decreasing or increasing–decreasing (unimodal) in shape. Furthermore, in most panels, the mode moves to the right as p increases, which shows that the TRTG model is versatile and that even small values of p have a substantial effect on the shape of the distribution.
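The movement of the mode can be reproduced directly from Equation (6). Writing the pmf as $q^{x}(a + b\,x)$ with $a = 1 - q + p\,q\log q \ge 0$ and $b = -p\,(1 - q)\log q \ge 0$ shows that it is a log-concave sequence, so the first x at which the pmf stops increasing is a mode. The Python sketch below relies on that observation; `trtg_mode` is a hypothetical helper, and q = 0.9 is an arbitrary illustrative choice.

```python
import math


def trtg_pmf(x, p, q):
    """TRTG pmf, Eq. (6)."""
    return q**x * ((1.0 - q) - p * math.log(q) * (x * (1.0 - q) - q))


def trtg_mode(p, q):
    """Smallest mode of TRTG(p, q).

    The pmf equals q^x (a + b x) with a = 1 - q + p q log q >= 0 and
    b = -p (1 - q) log q >= 0, a log-concave sequence, so the first x at which
    the pmf stops increasing is a mode.
    """
    x = 0
    while trtg_pmf(x + 1, p, q) > trtg_pmf(x, p, q):
        x += 1
    return x


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p = {p:.2f}: mode = {trtg_mode(p, 0.9)}")
```

With q = 0.9, the modes for p = 0, 0.25, 0.5, 0.75, and 1 are 0, 0, 0, 6, and 9, so the pmf switches from decreasing to unimodal as p grows.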
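Figure 2 displays the hf plots of the TRTG model for some choices of q and p. Since the hf can be rewritten as $h(x) = (1 - q) + p\,q\log q/(1 - p\,x\log q)$, where $p\,q\log q \le 0$ and the denominator grows with x, the discrete hazard rate is constant (equal to $1 - q$) for $p = 0$ and increases toward $1 - q$ for $p > 0$.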
6. Simulation Study
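To assess the performance of the previous estimators, we conducted a simulation study. In this simulation, we generated samples of size n from the TRTG(p, q) distribution and then computed the ML, MM, MP, and Bayes estimates of q and p. We calculated the average absolute biases (ABBs), mean square errors (MSEs), and mean relative errors (MREs) of the estimates for all methods. The ABBs, MREs, and MSEs are calculated by
$$
\mathrm{ABB}(\hat{\theta}) = \frac{1}{N}\sum_{i=1}^{N}\lvert\hat{\theta}_i - \theta\rvert, \qquad \mathrm{MRE}(\hat{\theta}) = \frac{1}{N}\sum_{i=1}^{N}\frac{\lvert\hat{\theta}_i - \theta\rvert}{\theta},
$$
and
$$
\mathrm{MSE}(\hat{\theta}) = \frac{1}{N}\sum_{i=1}^{N}(\hat{\theta}_i - \theta)^2,
$$
where $\theta$ denotes p or q, $\hat{\theta}_i$ is its estimate in the $i$th replication, and N is the number of replications.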
The estimates were computed in R with the optim routine (CG method), and 5000 replications were generated to compute these measures for the ML, MM, MP, and Bayes estimates. Different sample sizes and two parameter settings were considered. The results are given in Table 3 and Table 4. From Table 3 and Table 4, it can be concluded that the ABBs, MREs, and MSEs of all estimates decrease as n increases, as expected. Moreover, the Bayes, ML, and MM methods provide the best estimates in terms of these performance criteria: their ABBs, MSEs, and MREs are almost identical, and they perform better than the MP estimates.
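A scaled-down version of this design can be sketched in Python. This is not the authors' R code: it draws TRTG samples by inverting the cdf in Equation (5), replaces the optim (CG) maximization with a crude grid search over (p, q), covers only the ML method (not MM, MP, or Bayes), and uses arbitrary settings (p = 0.5, q = 0.6, n = 200, 30 replications). All function names are hypothetical.

```python
import math
import random
from collections import Counter


def trtg_pmf(x, p, q):
    """TRTG pmf, Eq. (6)."""
    return q**x * ((1.0 - q) - p * math.log(q) * (x * (1.0 - q) - q))


def trtg_cdf(x, p, q):
    """TRTG cdf, Eq. (5)."""
    return 1.0 - q ** (x + 1) * (1.0 - p * (x + 1) * math.log(q))


def trtg_sample(p, q, rng):
    """Inverse-cdf draw: the smallest x with G(x) >= U."""
    u, x = rng.random(), 0
    while trtg_cdf(x, p, q) < u:
        x += 1
    return x


def trtg_loglik(counts, p, q):
    """Log-likelihood from a frequency table {x: n_x}; guards against log(0)."""
    return sum(n * math.log(max(trtg_pmf(x, p, q), 1e-300)) for x, n in counts.items())


def trtg_mle_grid(sample, step=0.02):
    """Crude ML estimate by grid search (a stand-in for a numerical optimizer)."""
    counts = Counter(sample)
    grid = [i * step for i in range(int(round(1.0 / step)) + 1)]
    _, p_hat, q_hat = max(
        (trtg_loglik(counts, p, q), p, q) for p in grid for q in grid if 0.0 < q < 1.0
    )
    return p_hat, q_hat


def metrics(estimates, true_value):
    """Return (ABB, MSE, MRE) of a list of estimates of true_value."""
    n_rep = len(estimates)
    abb = sum(abs(e - true_value) for e in estimates) / n_rep
    mse = sum((e - true_value) ** 2 for e in estimates) / n_rep
    mre = sum(abs(e - true_value) / true_value for e in estimates) / n_rep
    return abb, mse, mre


# Illustrative settings only, not the configurations used in the paper.
rng = random.Random(7)
p_true, q_true, n, n_rep = 0.5, 0.6, 200, 30
p_hats, q_hats = [], []
for _ in range(n_rep):
    sample = [trtg_sample(p_true, q_true, rng) for _ in range(n)]
    p_hat, q_hat = trtg_mle_grid(sample)
    p_hats.append(p_hat)
    q_hats.append(q_hat)

print("p: ABB=%.4f MSE=%.4f MRE=%.4f" % metrics(p_hats, p_true))
print("q: ABB=%.4f MSE=%.4f MRE=%.4f" % metrics(q_hats, q_true))
```

The grid search keeps the sketch dependency-free; in practice a gradient-based optimizer, such as the CG method mentioned above, is faster and more precise.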
7. Modeling Three Actuarial Data Sets
In this section, the TRTG distribution is fitted to three real actuarial data sets and compared with the transmuted geometric (TRAG), discrete Burr (DB), discrete Chen (DC), negative binomial (NB), geometric (G), and Poisson (P) distributions.
First data set: These data were reported in Klugman et al. [19] and represent the number of claims for automobile liability policies.
Second data set: These data were analyzed by Klugman et al. [19] and represent the number of hospitalizations per family member per year.
Third data set: These data were studied by Willmot [20] and refer to the number of automobile insurance claims per policy in two portfolios from Belgium during the period 1975–1976.
The TRTG, TRAG, DB, DC, NB, G, and P distributions were fitted to each of the three data sets, and their parameters were estimated via the ML method. The chi-square procedure was adopted to test $H_0: X \sim \mathrm{TRTG}(p, q)$. To compute the $\chi^2$ statistic, the unknown parameters q and p were estimated from each data set. Under the null hypothesis, the estimated probabilities can be calculated by $\hat{\pi}_x = P(X = x;\, \hat{p}, \hat{q})$ from Equation (6), and the estimated expected frequencies are $E_x = n\,\hat{\pi}_x$, where $(\hat{p}, \hat{q})$ is the ML estimate of $(p, q)$ and n is the number of observations. For each data set, the statistic $\chi^2 = \sum_x (O_x - E_x)^2/E_x$, with $O_x$ the observed frequency, was computed for the TRTG distribution and the competing distributions. The observed and expected frequencies and the test results are listed in Table 5, Table 6, and Table 7 for the three data sets, respectively. These tables reveal that the TRTG distribution has the lowest goodness-of-fit statistics among all competing discrete models, so it provides a better fit for the given data sets than the TRAG, DB, DC, NB, G, and P distributions. Based on these results, we cannot reject $H_0$ at the adopted significance level.
Furthermore, for visual comparison, the observed and fitted distributions are displayed in Figure 3, Figure 4, and Figure 5 for the three data sets, respectively.
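The expected frequencies and the Pearson statistic described above can be sketched as follows. This Python code is illustrative, not the authors' implementation. The function names are hypothetical, and it assumes that the last class pools the upper tail ("K − 1 or more"), so the expected frequencies add up to n. How classes were grouped for Tables 5–7 is not stated in this section.

```python
import math


def trtg_pmf(x, p, q):
    """TRTG pmf, Eq. (6)."""
    return q**x * ((1.0 - q) - p * math.log(q) * (x * (1.0 - q) - q))


def trtg_survival(x, p, q):
    """P(X >= x) = q^x (1 - p x log q)."""
    return q**x * (1.0 - p * x * math.log(q))


def chi_square_trtg(observed, p_hat, q_hat):
    """Pearson chi-square statistic for a fitted TRTG(p_hat, q_hat) model.

    observed[x] holds O_x for x = 0, ..., K - 1; the last class is assumed to
    mean "K - 1 or more", so the expected frequencies E_x = n * pi_x sum to n.
    """
    n = sum(observed)
    k = len(observed)
    probs = [trtg_pmf(x, p_hat, q_hat) for x in range(k - 1)]
    probs.append(trtg_survival(k - 1, p_hat, q_hat))  # pooled upper tail
    expected = [n * pr for pr in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat


# Synthetic counts: rounded expected frequencies of TRTG(0.3, 0.5) with n = 1000.
exp_counts, _ = chi_square_trtg([200] * 5, 0.3, 0.5)
obs = [round(e) for e in exp_counts]
_, stat = chi_square_trtg(obs, 0.3, 0.5)
print(f"chi-square = {stat:.4f} on {len(obs) - 1 - 2} degrees of freedom")
```

The degrees of freedom subtract one for the total count and two for the estimated parameters p and q.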
8. TRTG Count Regression Model
Let X be the response variable and $\boldsymbol{x}$ be its associated vector of covariates. Assume that the response variable X follows the TRTG distribution with mean $\mu$. Furthermore, the mean of the response variable is linked to the explanatory variables through the log-linear form $\log(\mu_i) = \boldsymbol{x}_i^{\top}\boldsymbol{\beta}$, where $\boldsymbol{\beta}$ is the vector of unknown regression coefficients. Re-parameterizing the pmf in Equation (6) in terms of $\mu_i$ gives the TRTG regression model and its log-likelihood function, Equation (14). The likelihood equations obtained from (14) have no closed-form solution, so numerical methods are used to obtain the estimates. We illustrate the TRTG regression model by analyzing real data on the count of infected blood cells (per mm²) on microscope slides prepared from randomly selected individuals [21]. The response variable (X: count of infected blood cells) was related to the following explanatory variables: the smoking status of the subject ($x_1$; 0: yes, 1: no) and their sex ($x_2$; 0: female, 1: male). The log-likelihood and the Akaike information criterion (AIC) of the fitted models are reported in Table 8. Based on Table 8, the proposed TRTG count regression model outperforms the Poisson, geometric, and PL regression models.
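The re-parameterized pmf itself is not reproduced in this section, so the Python sketch below uses one possible mean parameterization, which is an assumption and not necessarily the authors' choice. It keeps p common to all observations and solves for $q_i$ from the mean. Summing Equation (7) gives $E[X] = \sum_{x \ge 1} P(X \ge x) = q/(1 - q) - p\,q\log q/(1 - q)^2$, which increases in q, so $q_i$ can be found by bisection from $\mu_i = \exp(\boldsymbol{x}_i^{\top}\boldsymbol{\beta})$. All names are hypothetical, and the toy inputs are not the data of [21].

```python
import math


def trtg_pmf(x, p, q):
    """TRTG pmf, Eq. (6)."""
    return q**x * ((1.0 - q) - p * math.log(q) * (x * (1.0 - q) - q))


def trtg_mean(p, q):
    """E[X] = sum_{x>=1} P(X >= x) = q/(1 - q) - p q log(q)/(1 - q)^2."""
    return q / (1.0 - q) - p * q * math.log(q) / (1.0 - q) ** 2


def q_from_mean(mu, p, tol=1e-12):
    """Solve trtg_mean(p, q) = mu for q by bisection (the mean increases in q)."""
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if trtg_mean(p, mid) < mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def regression_loglik(beta, p, covariates, y):
    """Log-likelihood of a hypothetical mean-parameterized TRTG regression.

    beta[0] is the intercept; each row of `covariates` holds x_i1, ..., x_ik;
    mu_i = exp(beta_0 + sum_j beta_j x_ij) and q_i solves E[X_i] = mu_i.
    """
    total = 0.0
    for row, yi in zip(covariates, y):
        eta = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
        q_i = q_from_mean(math.exp(eta), p)
        total += math.log(trtg_pmf(yi, p, q_i))
    return total


# Hypothetical toy inputs (columns: smoking x1, sex x2), not the data of [21].
print(regression_loglik([0.5, -0.2, 0.1], 0.3, [[0, 1], [1, 0], [1, 1]], [1, 0, 3]))
```

Maximizing this function over $(\boldsymbol{\beta}, p)$ with a general-purpose numerical optimizer would give the ML fit, in line with the numerical approach described above.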