1. Introduction
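A random variable $X$ follows the xgamma distribution with parameter $\theta > 0$ if its probability density function (pdf) is given by
$$f(x;\theta) = \frac{\theta^{2}}{1+\theta}\left(1 + \frac{\theta}{2}x^{2}\right)e^{-\theta x}, \quad x > 0, \tag{1}$$
and its cumulative distribution function (CDF) is
$$F(x;\theta) = 1 - \frac{1 + \theta + \theta x + \frac{\theta^{2}x^{2}}{2}}{1+\theta}\,e^{-\theta x}, \quad x > 0. \tag{2}$$
Plots of the pdf of the xgamma distribution are presented in Figure 1 for some values of $\theta$.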
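As an extension of the xgamma distribution, the two-parameter xgamma (TPXG) distribution was proposed in [1]. It adds a second parameter to the xgamma distribution to gain flexibility in modeling real data sets, motivated by the wide use of the xgamma distribution in survival analysis. When a random variable $X$ follows the TPXG distribution with parameters $\alpha > 0$ and $\theta > 0$, the probability density function and the cumulative distribution function are, respectively, given by
$$f(x;\alpha,\theta) = \frac{\theta^{2}}{\alpha+\theta}\left(1 + \frac{\alpha\theta}{2}x^{2}\right)e^{-\theta x}, \quad x > 0, \tag{3}$$
$$F(x;\alpha,\theta) = 1 - \frac{\alpha + \theta + \alpha\theta x + \frac{\alpha\theta^{2}x^{2}}{2}}{\alpha+\theta}\,e^{-\theta x}, \quad x > 0. \tag{4}$$
For $\alpha = 1$ in (3), we obtain the xgamma distribution with parameter $\theta$ as a special case of the TPXG distribution.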
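The $r$th moment of the TPXG distribution with parameters $\alpha$ and $\theta$ is obtained by
$$E(X^{r}) = \frac{r!\left[2\theta + \alpha(r+1)(r+2)\right]}{2\theta^{r}(\alpha+\theta)}, \quad r = 1, 2, \ldots$$
The characteristic function (CF) and hazard function $h(x)$ of the model are, respectively, given by
$$\phi_{X}(t) = \frac{\theta^{2}}{\alpha+\theta}\left[\frac{1}{\theta - it} + \frac{\alpha\theta}{(\theta - it)^{3}}\right], \qquad h(x) = \frac{\theta^{2}\left(1 + \frac{\alpha\theta}{2}x^{2}\right)}{\alpha + \theta + \alpha\theta x + \frac{\alpha\theta^{2}x^{2}}{2}}.$$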
Figure 2 shows some possible pdf shapes of the TPXG distribution for selected values of $\alpha$ and $\theta$, which reveal the flexibility of the distribution in modeling right-skewed observations. Further, Figure 3 shows the possible shapes of the hazard function $h(x)$: bathtub, increasing, decreasing, and decreasing-increasing. For more details on the TPXG distribution, see [1].
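The distributional quantities above can be evaluated numerically with a few lines of code. The following is a minimal sketch, assuming the TPXG density has the form $f(x;\alpha,\theta) = \frac{\theta^{2}}{\alpha+\theta}(1 + \frac{\alpha\theta}{2}x^{2})e^{-\theta x}$; the function names are illustrative and do not come from [1].

```python
import math

def tpxg_pdf(x, alpha, theta):
    """Assumed TPXG density: theta^2/(alpha+theta) * (1 + alpha*theta*x^2/2) * exp(-theta*x)."""
    if x <= 0:
        return 0.0
    return theta**2 / (alpha + theta) * (1 + alpha * theta * x**2 / 2) * math.exp(-theta * x)

def tpxg_cdf(x, alpha, theta):
    """CDF matching the assumed density above."""
    if x <= 0:
        return 0.0
    num = alpha + theta + alpha * theta * x + alpha * theta**2 * x**2 / 2
    return 1 - num / (alpha + theta) * math.exp(-theta * x)

def tpxg_hazard(x, alpha, theta):
    """Hazard function h(x) = f(x) / (1 - F(x))."""
    return tpxg_pdf(x, alpha, theta) / (1 - tpxg_cdf(x, alpha, theta))

def tpxg_moment(r, alpha, theta):
    """r-th raw moment obtained by integrating x^r against the assumed density."""
    return math.factorial(r) * (2 * theta + alpha * (r + 1) * (r + 2)) / (2 * theta**r * (alpha + theta))
```

Setting `alpha = 1` in these functions gives the corresponding xgamma quantities, which offers a quick consistency check against the one-parameter model.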
When the variable of interest is expensive or difficult to measure but cheap and simple to rank, ranked set sampling is recognized as an effective sampling strategy for improving the accuracy and efficiency of parameter estimation. McIntyre [2] proposed the ranked set sampling scheme for estimating pasture and forage yields.
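Let $X$ follow the TPXG distribution with pdf $f(x)$ and CDF $F(x)$, where $\mu$ and $\sigma^{2}$ represent, respectively, the population mean and variance. Let $X_{1}, X_{2}, \cdots, X_{k}$ be a random sample with the same pdf $f(x)$. The method of ranked set sampling (RSS) can be described as follows:
Step 1: Select $k^{2}$ units at random from the population and divide them randomly into $k$ sets of size $k$ each.
Step 2: Rank the units within each set with respect to the variable of interest, visually or by any inexpensive method that does not require actual measurement.
Step 3: Measure the $i$th ranked unit of the $i$th set, $i = 1, 2, \ldots, k$.
Step 4: Repeat Steps 1 to 3 for $n$ cycles to obtain an RSS of size $nk$.
The selected RSS units are denoted by $X_{(i:k)j}$, $i = 1, \ldots, k$, $j = 1, \ldots, n$, where $X_{(i:k)j}$ is the $i$th ranked unit (the $i$th order statistic) in a set of size $k$ in the $j$th cycle. Notice that although $k^{2}$ units are selected in each cycle, only $k$ of them are measured; these units are not identically distributed, but they are independent because they are selected from different sets.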
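The RSS procedure under perfect ranking can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: `draw_tpxg` and `ranked_set_sample` are hypothetical helper names, the parameter values are arbitrary, and the TPXG draw assumes the density can be written as a mixture of an exponential distribution with rate $\theta$ (weight $\theta/(\alpha+\theta)$) and a gamma distribution with shape 3 and rate $\theta$ (weight $\alpha/(\alpha+\theta)$).

```python
import random

def draw_tpxg(alpha, theta, rng):
    """One TPXG variate via the assumed mixture: Exp(theta) or Gamma(shape=3, rate=theta)."""
    if rng.random() < theta / (alpha + theta):
        return rng.expovariate(theta)
    return rng.gammavariate(3, 1.0 / theta)  # second argument is the scale, 1/rate

def ranked_set_sample(k, n, draw, rng):
    """Return the n*k measured units as (rank i, cycle j, value), assuming perfect ranking."""
    sample = []
    for j in range(1, n + 1):          # cycles
        for i in range(1, k + 1):      # one fresh set of size k per rank
            ranked = sorted(draw(rng) for _ in range(k))
            sample.append((i, j, ranked[i - 1]))  # keep only the i-th ranked unit
    return sample

rng = random.Random(2024)
rss = ranked_set_sample(k=5, n=4, draw=lambda r: draw_tpxg(1.5, 0.5, r), rng=rng)
```

Here ranking is done by sorting the actual values, which corresponds to perfect ranking; in practice the ranking step would rely on a cheap auxiliary judgment rather than on measurement.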
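Takahasi and Wakimoto [3] developed the mathematical theory of RSS and showed that, under perfect ranking, the RSS estimator of the mean is unbiased and has a smaller variance than the simple random sampling (SRS) estimator. The SRS mean estimator based on $nk$ observations, with its variance, is given by
$$\bar{X}_{SRS} = \frac{1}{nk}\sum_{i=1}^{nk} X_{i}, \qquad \mathrm{Var}(\bar{X}_{SRS}) = \frac{\sigma^{2}}{nk}.$$
The RSS estimator of the population mean and its variance are given by
$$\bar{X}_{RSS} = \frac{1}{nk}\sum_{j=1}^{n}\sum_{i=1}^{k} X_{(i:k)j}, \qquad \mathrm{Var}(\bar{X}_{RSS}) = \frac{1}{nk^{2}}\sum_{i=1}^{k}\sigma_{(i:k)}^{2},$$
where $\sigma_{(i:k)}^{2}$ is the variance of the $i$th order statistic in a sample of size $k$, with pdf $f_{(i:k)}(x)$. Note that since
$$f(x) = \frac{1}{k}\sum_{i=1}^{k} f_{(i:k)}(x),$$
we have
$$\mu = \frac{1}{k}\sum_{i=1}^{k}\mu_{(i:k)}.$$
They also showed that
$$\mathrm{Var}(\bar{X}_{RSS}) = \frac{\sigma^{2}}{nk} - \frac{1}{nk^{2}}\sum_{i=1}^{k}\left(\mu_{(i:k)} - \mu\right)^{2},$$
where
$$\mu_{(i:k)} = \int x\, f_{(i:k)}(x)\, dx.$$
Under perfect ranking, this relation shows the efficiency of the RSS mean estimator, whose variance cannot exceed the variance $\sigma^{2}/(nk)$ of the SRS estimator for the same number of quantified observations, regardless of the underlying distribution. Even with ranking errors, Dell and Clutter [4] demonstrated that RSS is more efficient than simple random sampling.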
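As a small worked illustration of the variance comparison (an example added under assumed values, not taken from [3] or [4]), consider one cycle with set size $k = 2$ from the standard uniform distribution, for which $\mu = 1/2$, $\sigma^{2} = 1/12$, and the order-statistic means are $\mu_{(1:2)} = 1/3$ and $\mu_{(2:2)} = 2/3$. Using the standard decomposition $\mathrm{Var}(\bar{X}_{RSS}) = \sigma^{2}/k - k^{-2}\sum_{i}(\mu_{(i:k)} - \mu)^{2}$ for a single cycle:

```latex
\begin{align*}
\mathrm{Var}(\bar{X}_{SRS}) &= \frac{\sigma^{2}}{2} = \frac{1}{24}, \\
\mathrm{Var}(\bar{X}_{RSS}) &= \frac{1}{24} - \frac{1}{4}\left[\left(\tfrac{1}{3} - \tfrac{1}{2}\right)^{2}
      + \left(\tfrac{2}{3} - \tfrac{1}{2}\right)^{2}\right]
   = \frac{1}{24} - \frac{1}{72} = \frac{1}{36}, \\
\frac{\mathrm{Var}(\bar{X}_{SRS})}{\mathrm{Var}(\bar{X}_{RSS})} &= \frac{36}{24} = 1.5 .
\end{align*}
```

In this case two ranked measurements carry as much information about the mean as three simple random measurements.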
Several modifications of RSS have been suggested in the literature, such as extreme RSS by Samawi et al. [5], median RSS by Muttlak [6], double RSS by Al-Saleh and Al-Kadiri [7], percentile RSS by Muttlak [8], L RSS by Al-Nasser [9], partial RSS by Haq et al. [10], and neoteric RSS by Zamanzade and Al-Omari [11]. In addition to these modifications, many authors have investigated parameter estimation for specific distributions using RSS or its modifications. For example, the parameters of the logistic model are estimated based on SRS and RSS by Abu-Dayyeh et al. [12]. Parameter estimation for the generalized quasi-Lindley distribution is considered by Al-Omari et al. [13]. Yousef and Al-Subh [14] used maximum likelihood methods to estimate the Gumbel parameters under RSS. Akgul and Şenoglu [15] investigated some modifications of RSS in estimating the Weibull distribution parameters. Bayesian and maximum likelihood estimation approaches are considered by Hussian [16] to estimate the parameters of the Kumaraswamy distribution under RSS. Chen et al. [17] used moving extremes RSS to estimate the scale parameter of scale distributions. Al-Omari and Bouza [18] considered ratio estimators of the population mean with missing values using RSS. Later, Al-Omari [19] considered the varied L RSS and used the MLE in location-scale families. Hassan et al. [20] used median RSS to estimate the stress–strength reliability for the generalized inverted exponential distribution.
The TPXG distribution is important among lifetime distributions, and to our knowledge, this is the first study to consider the RSS design for estimating its parameters. Hence, the main focus of this paper is to estimate the TPXG distribution parameters under the RSS design using several well-known estimation methods: the maximum product of spacings method, the maximum likelihood method, the ordinary least squares method, the Cramer–von Mises method, the weighted least squares method, and the Anderson–Darling method. The suggested RSS-based estimators are then compared with their SRS counterparts for the same number of measured observations. A real data set is analyzed to illustrate the usefulness of the proposed estimators. Based on the results, the RSS estimators are found to be better than their SRS counterparts in terms of mean squared error (MSE), bias, and efficiency for all estimation methods considered in the study.
The layout of this paper is as follows. The estimation methods for the TPXG distribution parameters are presented in Section 2. A simulation study is conducted in Section 3 to show the superiority of the RSS estimators relative to the SRS estimators. In Section 4, the usefulness of the suggested estimators is examined using a real data set fitted to the TPXG distribution. The last section presents the conclusions and remarks.
2. Method of Estimation
Here, based on the RSS design, six estimation methods are considered to estimate the parameters $\alpha$ and $\theta$ of the TPXG distribution: the maximum likelihood (MLE) method, the maximum product of spacings (MPS) method, the ordinary least squares (OLS) method, the weighted least squares (WLS) method, the Cramer–von Mises (CV) method, and the Anderson–Darling (AD) method. In all methods, we denote by $X_{(i:k)j}$, $i = 1, \ldots, k$, $j = 1, \ldots, n$, the $i$th order statistic from the $i$th set of size $k$ in the $j$th cycle and take these to be the RSS data for $X$ with sample size $N = nk$.
2.1. MLE Method
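Considering an RSS sample of size $nk$, with observed values $x_{(i:k)j}$, $i = 1, \ldots, k$, $j = 1, \ldots, n$, the likelihood function is obtained by
$$L(\alpha,\theta) = \prod_{j=1}^{n}\prod_{i=1}^{k}\frac{k!}{(i-1)!\,(k-i)!}\left[F(x_{(i:k)j})\right]^{i-1}\left[1 - F(x_{(i:k)j})\right]^{k-i} f(x_{(i:k)j}),$$
with $f$ and $F$ the TPXG pdf and CDF evaluated at $(\alpha,\theta)$. Let the log-likelihood function $\ell(\alpha,\theta)$ be
$$\ell(\alpha,\theta) = C + \sum_{j=1}^{n}\sum_{i=1}^{k}\left[(i-1)\ln F(x_{(i:k)j}) + (k-i)\ln\left(1 - F(x_{(i:k)j})\right) + \ln f(x_{(i:k)j})\right],$$
where $C$ is a constant that does not depend on the parameters. The likelihood equations $\partial\ell/\partial\alpha = 0$ and $\partial\ell/\partial\theta = 0$ have no closed-form solution. Hence, they must be solved numerically to find the MLEs $\hat{\alpha}_{MLE}$ and $\hat{\theta}_{MLE}$ of $\alpha$ and $\theta$, respectively.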
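Since the MLEs must be found numerically, it may help to see the objective written out. The sketch below evaluates the RSS log-likelihood under perfect ranking, assuming the TPXG density $\frac{\theta^{2}}{\alpha+\theta}(1 + \frac{\alpha\theta}{2}x^{2})e^{-\theta x}$; `rss_loglik` is a hypothetical name, and the toy observations are made up for illustration only.

```python
import math

def tpxg_pdf(x, alpha, theta):
    """Assumed TPXG density."""
    return theta**2 / (alpha + theta) * (1 + alpha * theta * x**2 / 2) * math.exp(-theta * x)

def tpxg_cdf(x, alpha, theta):
    """CDF matching the assumed density."""
    num = alpha + theta + alpha * theta * x + alpha * theta**2 * x**2 / 2
    return 1 - num / (alpha + theta) * math.exp(-theta * x)

def rss_loglik(alpha, theta, sample, k):
    """Log-likelihood of an RSS sample given as (rank i, value x) pairs, set size k."""
    total = 0.0
    for i, x in sample:
        F = tpxg_cdf(x, alpha, theta)
        # k! / ((i-1)! (k-i)!) is the order-statistic normalizing constant
        const = math.factorial(k) // (math.factorial(i - 1) * math.factorial(k - i))
        total += (math.log(const) + (i - 1) * math.log(F)
                  + (k - i) * math.log(1 - F) + math.log(tpxg_pdf(x, alpha, theta)))
    return total

# Toy data (hypothetical values): one cycle with set size k = 3
toy = [(1, 0.8), (2, 4.3), (3, 11.2)]
value = rss_loglik(alpha=1.5, theta=0.5, sample=toy, k=3)
```

A function of this form can be passed, with its sign flipped, to any general-purpose numerical minimizer over $\alpha > 0$ and $\theta > 0$.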
2.2. Method of MPS
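Cheng and Amin [21,22] introduced this method, which is based on maximizing the geometric mean of the data spacings. Consider $x_{(1)} \le x_{(2)} \le \cdots \le x_{(N)}$ to be an ordered sample forming an RSS of size $N = nk$ from the TPXG distribution. The uniform spacings are given by
$$D_{i}(\alpha,\theta) = F(x_{(i)}) - F(x_{(i-1)}), \quad i = 1, 2, \ldots, N+1.$$
Note that $F(x_{(0)}) = 0$ and $F(x_{(N+1)}) = 1$. It is clear that
$$\sum_{i=1}^{N+1} D_{i}(\alpha,\theta) = 1.$$
Let the geometric mean of the spacings be
$$G(\alpha,\theta) = \left[\prod_{i=1}^{N+1} D_{i}(\alpha,\theta)\right]^{\frac{1}{N+1}}. \tag{11}$$
The natural logarithm of (11) is
$$\ln G(\alpha,\theta) = \frac{1}{N+1}\sum_{i=1}^{N+1}\ln D_{i}(\alpha,\theta).$$
The estimators $\hat{\alpha}_{MPS}$ and $\hat{\theta}_{MPS}$ are the values of $\alpha$ and $\theta$ that maximize the geometric mean of the spacings. They can be found by solving the following nonlinear equations:
$$\frac{\partial \ln G}{\partial \alpha} = \frac{1}{N+1}\sum_{i=1}^{N+1}\frac{\Delta_{1}(x_{(i)}) - \Delta_{1}(x_{(i-1)})}{D_{i}(\alpha,\theta)} = 0, \qquad \frac{\partial \ln G}{\partial \theta} = \frac{1}{N+1}\sum_{i=1}^{N+1}\frac{\Delta_{2}(x_{(i)}) - \Delta_{2}(x_{(i-1)})}{D_{i}(\alpha,\theta)} = 0,$$
where
$$\Delta_{1}(x) = \frac{\partial F(x)}{\partial \alpha} = -\frac{\theta^{2}x(2 + \theta x)}{2(\alpha+\theta)^{2}}\,e^{-\theta x} \tag{12}$$
and
$$\Delta_{2}(x) = \frac{\partial F(x)}{\partial \theta} = \frac{x\,e^{-\theta x}}{(\alpha+\theta)^{2}}\left[\theta(2\alpha+\theta) + \frac{\alpha\theta^{2}x}{2} + \frac{\alpha\theta^{2}(\alpha+\theta)x^{2}}{2}\right], \tag{13}$$
with $\Delta_{1}$ and $\Delta_{2}$ equal to zero at $x_{(0)}$ and $x_{(N+1)}$. These equations can be solved numerically.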
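The MPS criterion is simple to evaluate once the fitted CDF values at the ordered observations are available. The following sketch computes the logarithm of the geometric mean of the spacings from those values; `log_geometric_mean_spacing` is an illustrative name, and the input is assumed to be the sorted values $F(x_{(1)}), \ldots, F(x_{(N)})$ under a candidate $(\alpha,\theta)$.

```python
import math

def log_geometric_mean_spacing(u):
    """Log geometric mean of spacings for sorted CDF values u_1 <= ... <= u_N in (0, 1)."""
    padded = [0.0] + list(u) + [1.0]          # F(x_(0)) = 0 and F(x_(N+1)) = 1
    spacings = [b - a for a, b in zip(padded, padded[1:])]
    return sum(math.log(d) for d in spacings) / len(spacings)

# Equal spacings give the largest possible value, ln(1/(N+1))
even = log_geometric_mean_spacing([0.25, 0.50, 0.75])
uneven = log_geometric_mean_spacing([0.10, 0.50, 0.90])
```

Tied observations produce a zero spacing, for which the logarithm is undefined, so a practical implementation needs a rule for ties; none is assumed here.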
2.3. Methods of LS
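Well-known results in probability theory indicate that $F(X_{(i:N)}) \sim \mathrm{Beta}(i, N-i+1)$, where $F$ is a distribution function and $X_{(i:N)}$ is the $i$th order statistic of the sample $X_{1}, X_{2}, \ldots, X_{N}$. Therefore, $E[F(X_{(i:N)})] = \frac{i}{N+1}$ and $\mathrm{Var}[F(X_{(i:N)})] = \frac{i(N-i+1)}{(N+1)^{2}(N+2)}$.
Using this expectation and variance, two variants of the least squares (LS) method can be obtained. Swain et al. [23] were the first to use the LS method to estimate the parameters of the beta distribution.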
2.3.1. OLS Method
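The OLS estimators $\hat{\alpha}_{OLS}$ and $\hat{\theta}_{OLS}$ of $\alpha$ and $\theta$, respectively, can be found by minimizing the following function with respect to $\alpha$ and $\theta$:
$$S(\alpha,\theta) = \sum_{i=1}^{N}\left[F(x_{(i)}) - \frac{i}{N+1}\right]^{2},$$
where $x_{(1)} \le \cdots \le x_{(N)}$ is the ordered RSS sample of size $N = nk$. Alternatively, we can obtain the estimators by solving simultaneously the nonlinear equations:
$$\sum_{i=1}^{N}\left[F(x_{(i)}) - \frac{i}{N+1}\right]\Delta_{s}(x_{(i)}) = 0, \quad s = 1, 2,$$
where $\Delta_{1}$ and $\Delta_{2}$, the partial derivatives of $F$ with respect to $\alpha$ and $\theta$, are defined as in (12) and (13), respectively.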
2.3.2. WLS Method
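The WLS estimators of $\alpha$ and $\theta$, say, $\hat{\alpha}_{WLS}$ and $\hat{\theta}_{WLS}$, respectively, can be determined by minimizing the following function with respect to $\alpha$ and $\theta$:
$$W(\alpha,\theta) = \sum_{i=1}^{N}\frac{(N+1)^{2}(N+2)}{i(N-i+1)}\left[F(x_{(i)}) - \frac{i}{N+1}\right]^{2},$$
where $x_{(1)} \le \cdots \le x_{(N)}$ is the ordered RSS sample of size $N = nk$. Note that these estimators are also the solution to the following nonlinear equations:
$$\sum_{i=1}^{N}\frac{(N+1)^{2}(N+2)}{i(N-i+1)}\left[F(x_{(i)}) - \frac{i}{N+1}\right]\Delta_{s}(x_{(i)}) = 0, \quad s = 1, 2,$$
where $\Delta_{1}$ and $\Delta_{2}$, the partial derivatives of $F$ with respect to $\alpha$ and $\theta$, are specified as in (12) and (13), respectively.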
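Both least squares criteria compare the fitted CDF at the $i$th ordered observation with its expected value $i/(N+1)$, and WLS additionally weights each term by the inverse of the variance $i(N-i+1)/((N+1)^{2}(N+2))$. A minimal sketch of the two objectives follows, written in terms of the sorted CDF values $u_{i} = F(x_{(i)})$ at a candidate parameter pair; the function names are illustrative.

```python
def ols_objective(u):
    """Sum of squared deviations of sorted CDF values from i/(N+1)."""
    N = len(u)
    return sum((ui - i / (N + 1)) ** 2 for i, ui in enumerate(u, start=1))

def wls_objective(u):
    """Same deviations, each weighted by the inverse variance of F(X_(i:N))."""
    N = len(u)
    return sum(
        (N + 1) ** 2 * (N + 2) / (i * (N - i + 1)) * (ui - i / (N + 1)) ** 2
        for i, ui in enumerate(u, start=1)
    )

# Only the first value deviates from its expectation 1/4
u = [0.30, 0.50, 0.75]
s_ols = ols_objective(u)
s_wls = wls_objective(u)
```

The weights are largest for the extreme order statistics, so WLS penalizes misfit in the tails more heavily than OLS does.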
2.4. Methods of Minimum Distances
Several estimation methods can be based on minimizing a goodness-of-fit statistic that measures the distance between the empirical and theoretical cumulative distribution functions. The Cramer–von Mises and Anderson–Darling methods are considered here (see D’Agostino and Stephens [24]).
2.4.1. CV Method
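The CV estimators $\hat{\alpha}_{CV}$ and $\hat{\theta}_{CV}$ of $\alpha$ and $\theta$, respectively, can be found by minimizing the following function with respect to $\alpha$ and $\theta$:
$$C(\alpha,\theta) = \frac{1}{12N} + \sum_{i=1}^{N}\left[F(x_{(i)}) - \frac{2i-1}{2N}\right]^{2},$$
where $x_{(1)} \le \cdots \le x_{(N)}$ is the ordered RSS sample of size $N = nk$. Consequently, these estimators are also the solution to the nonlinear equations:
$$\sum_{i=1}^{N}\left[F(x_{(i)}) - \frac{2i-1}{2N}\right]\Delta_{s}(x_{(i)}) = 0, \quad s = 1, 2,$$
where $\Delta_{1}$ and $\Delta_{2}$, the partial derivatives of $F$ with respect to $\alpha$ and $\theta$, are given in (12) and (13), respectively.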
2.4.2. AD Method
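The AD estimates of the TPXG distribution parameters $\alpha$ and $\theta$, denoted by $\hat{\alpha}_{AD}$ and $\hat{\theta}_{AD}$, can be obtained by minimizing the following function with respect to $\alpha$ and $\theta$:
$$A(\alpha,\theta) = -N - \frac{1}{N}\sum_{i=1}^{N}(2i-1)\left[\ln F(x_{(i)}) + \ln\left(1 - F(x_{(N+1-i)})\right)\right],$$
where $x_{(1)} \le \cdots \le x_{(N)}$ is the ordered RSS sample of size $N = nk$, or by simultaneously solving the two equations:
$$\sum_{i=1}^{N}(2i-1)\left[\frac{\Delta_{1}(x_{(i)})}{F(x_{(i)})} - \frac{\Delta_{1}(x_{(N+1-i)})}{1 - F(x_{(N+1-i)})}\right] = 0$$
and
$$\sum_{i=1}^{N}(2i-1)\left[\frac{\Delta_{2}(x_{(i)})}{F(x_{(i)})} - \frac{\Delta_{2}(x_{(N+1-i)})}{1 - F(x_{(N+1-i)})}\right] = 0,$$
where $\Delta_{1}$ and $\Delta_{2}$, the partial derivatives of $F$ with respect to $\alpha$ and $\theta$, are specified in (12) and (13), respectively.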
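The two minimum distance criteria can be sketched in the same way. The functions below take the sorted CDF values $u_{i} = F(x_{(i)})$ at a candidate parameter pair and return the standard Cramer–von Mises and Anderson–Darling statistics; the names are illustrative, and the code is a sketch of these textbook statistics rather than the authors' implementation.

```python
import math

def cvm_objective(u):
    """Cramer-von Mises statistic for sorted CDF values u_1 <= ... <= u_N."""
    N = len(u)
    return 1 / (12 * N) + sum(
        (ui - (2 * i - 1) / (2 * N)) ** 2 for i, ui in enumerate(u, start=1)
    )

def ad_objective(u):
    """Anderson-Darling statistic; u[N - i] is the CDF value at x_(N+1-i)."""
    N = len(u)
    s = sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1 - u[N - i]))
        for i in range(1, N + 1)
    )
    return -N - s / N

c_best = cvm_objective([0.25, 0.75])   # values sit exactly at (2i-1)/(2N)
a_one = ad_objective([0.5])            # single observation at the median
```

Because the AD statistic takes logarithms of $F$ and $1 - F$, it reacts more strongly than the CV statistic to fitted values close to 0 or 1.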
4. Application to Real Data
The usefulness of the proposed RSS estimators is examined in this section using a well-known real data set consisting of the waiting times (in minutes) before service of 100 bank customers. These data were studied by Ghitany et al. [
25]. The data observations are: 2.6, 2.7, 2.9, 3.1, 3.2, 3.3, 3.5, 3.6, 4.0, 4.1, 4.2, 4.2, 4.3, 4.3, 4.4, 4.4, 4.6, 4.7, 4.7, 4.8, 4.9, 4.9, 5.0, 0.8, 0.8, 1.3, 1.5, 1.8, 1.9, 1.9, 2.1, 5.3, 5.5, 5.7, 5.7, 6.1, 7.1, 7.1, 7.4, 7.6, 7.7, 8.0, 8.2, 8.6, 8.6, 8.6, 8.8, 8.8, 11.0, 11.1, 11.2, 6.2, 6.2, 6.2, 8.9, 8.9, 9.5, 9.6, 9.7, 9.8, 10.7, 10.9, 11.0, 6.3, 6.7, 6.9, 7.1, 7.1, 11.2, 11.5, 11.9, 12.4, 12.5, 12.9, 13.0, 13.1, 13.3, 17.3, 17.3, 18.1, 18.2, 18.4, 18.9, 19.0, 19.9, 20.6, 21.3, 21.4, 21.9, 23.0, 27.0, 31.6, 33.1, 13.6, 13.7, 13.9, 14.1, 15.4, 15.4, 38.5.
The TPXG distribution is fitted to these data. We considered several model selection criteria: the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the Hannan–Quinn information criterion (HQIC), and the consistent Akaike information criterion (CAIC). Details of these criteria can be found in Akaike [26], Schwarz [27], Hannan and Quinn [28], and Bozdogan [29]. Additionally, the Kolmogorov–Smirnov (KS) statistic is obtained for each model.
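The formulae for these criteria are: AIC $= 2h - 2L$, CAIC $= -2L + h(\ln n + 1)$, HQIC $= 2h\ln(\ln n) - 2L$, BIC $= h\ln n - 2L$, and KS $= \sup_{x}\left|F_{n}(x) - F(x)\right|$, where $h$ is the number of parameters, $n$ is the sample size, $L$ is the maximized value of the log-likelihood function, $F_{n}$ is the empirical distribution function, and $F$ is the fitted CDF.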
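These criteria are straightforward to compute from a fitted model. The sketch below assumes the usual definitions AIC $= 2h - 2L$, BIC $= h\ln n - 2L$, and HQIC $= 2h\ln(\ln n) - 2L$, and takes CAIC in the form $h(\ln n + 1) - 2L$ given by Bozdogan; some applied papers use a different small-sample expression under the same label, so this choice is an assumption. The function names are illustrative.

```python
import math

def information_criteria(loglik, h, n):
    """Model selection criteria from the maximized log-likelihood, h parameters, n observations."""
    return {
        "AIC": 2 * h - 2 * loglik,
        "BIC": h * math.log(n) - 2 * loglik,
        "HQIC": 2 * h * math.log(math.log(n)) - 2 * loglik,
        "CAIC": h * (math.log(n) + 1) - 2 * loglik,  # assumed Bozdogan form
    }

def ks_distance(sorted_x, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a fitted CDF."""
    n = len(sorted_x)
    return max(
        max(i / n - cdf(x), cdf(x) - (i - 1) / n)
        for i, x in enumerate(sorted_x, start=1)
    )

# Hypothetical inputs: log-likelihood of -100 for a two-parameter model fitted to 100 observations
crit = information_criteria(loglik=-100.0, h=2, n=100)
```

For all of these measures, smaller values indicate a better fit, which is the convention used when comparing the models and designs.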
Since the distribution under study has two parameters, we compared its fit with that of two two-parameter distributions, the Darna distribution and the Marshall–Olkin Esscher transformed Laplace distribution, and one one-parameter distribution, the inverse length-biased Maxwell distribution. The pdfs of these distributions are given below.
Darna distribution with pdf:
Marshall–Olkin Esscher transformed Laplace distribution (MOETL) with pdf:
Inverse length-biased Maxwell distribution (ILBMD) with pdf:
The results are reported in Table 5. They show that the TPXG distribution provides a better fit than the competing continuous models, since it has the smallest values of all the information criteria and the smallest Kolmogorov–Smirnov distance; Figure 4 supports this claim.
The total time on test (TTT) plot plays a vital role in selecting a proper model for the data because it indicates the shape of the failure rate. If the plot is a straight diagonal line, the data have a constant failure rate. The failure rate is decreasing if the plot is convex and increasing if it is concave. For a bathtub-shaped failure rate, the TTT plot is convex first and then concave, whereas if the TTT plot is concave first and then convex, the failure rate has an inverted bathtub shape. The TTT and density plots of the TPXG distribution for the bank customers’ data are given in Figure 5. The probability–probability (P-P) and quantile–quantile (Q-Q) plots for the TPXG model based on the real data are given in Figure 6. Figure 7 presents the box and bee swarm plots for these data.
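The points of a TTT plot can be computed directly from the ordered data. The sketch below uses the standard scaled TTT transform, $T(r/n) = \left[\sum_{i=1}^{r} x_{(i)} + (n-r)x_{(r)}\right] / \sum_{i=1}^{n} x_{(i)}$; the function name and the three-point data set are illustrative only.

```python
def scaled_ttt(data):
    """Return the points (r/n, T(r/n)) of the scaled total time on test transform."""
    x = sorted(data)
    n = len(x)
    total = sum(x)
    points = [(0.0, 0.0)]
    running = 0.0
    for r, xr in enumerate(x, start=1):
        running += xr                                  # sum of the r smallest values
        points.append((r / n, (running + (n - r) * xr) / total))
    return points

# Illustrative data: the points lie above the diagonal (concave shape)
pts = scaled_ttt([3.0, 1.0, 2.0])
```

Plotting the second coordinate against the first and comparing the curve with the diagonal gives the visual diagnosis of the failure rate shape described above.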
Based on these data, we take an SRS of size 20, while for the ranked set sampling, a small set size of $k = 5$ is considered with the number of cycles $n = 4$. Table 6 and Table 7 include the RSS ($n = 4$ and $k = 5$) and SRS (size 20) samples taken from the bank customers’ data. Note that the SRS and RSS methods are compared based on the same number of measured units. Using the previous methods, we calculate the estimates of $\alpha$ and $\theta$ under each design. Here, we assume that the ranking is perfect. To compare the estimators, we consider the previous criteria: AIC, BIC, CAIC, HQIC, and KS. The results are summarized in Table 8.
The findings in Table 8 show that the TPXG parameter estimates based on the RSS method improve on their SRS counterparts, giving smaller values of AIC, BIC, CAIC, HQIC, and KS under the MLE, MPS, OLS, WLS, CV, and AD methods.