1. Introduction
The Nikulin-Rao-Robson (N-RR) test statistic measures the goodness of fit of a statistical model to a set of data. It is a general test that can be used to assess the fit of a wide range of statistical models, including survival models, regression models, and time series models. Because it can compare the fit of different models to the same data, it helps in model selection by identifying the model that provides the best fit. A small value of the N-RR test statistic indicates a good fit between the model and the data, whereas a large value indicates a poor fit. The N-RR statistic can also be used to detect outliers, that is, data points that do not follow the general pattern of the data and can have a significant impact on the fit of the model; identifying such outliers can help improve the fit. A large N-RR value can likewise diagnose problems with a statistical model, indicating that the model is mis-specified or that its assumptions are violated. For time series models, which analyze data collected over time, the N-RR test can help identify the best model for the data. The N-RR test statistic is thus a valuable tool in statistical analysis, particularly for model selection, for assessing the goodness of fit of a model, and for diagnosing problems with a model. Based on the great importance of the N-RR test, in this work we apply the test, present a modified N-RR version of it, and harness it in statistical modeling and in the theory of testing statistical hypotheses.
Unlike many probability distribution researchers, we do not approach this new distribution in the usual way in this work. For example, we pay less attention to the traditional study of the new distribution, not because it is unimportant, but because we are more interested in the real-world applications of mathematical and statistical modeling, as well as in verifying a sizable portion of the distribution's behavior using censored data. We omit a collection of theoretical mathematical features, a large number of algebraic derivations, and related theory in order to highlight the importance and adaptability of the new distribution, its wide range of applications in statistical and mathematical modeling, and its handling of censored data.
A new probability distribution may be required to effectively model and evaluate real-life data that cannot be fully represented by existing distributions. New distributions are created for a variety of reasons, including addressing specific data traits or characteristics, enhancing the precision of simulations or forecasts, and offering more adaptable modeling alternatives; they can also lead to advances in statistical theory and its use. In particular, a new distribution becomes necessary when the data exhibit distinct traits that cannot be captured by pre-existing distributions, such as asymmetry, heavy tails, or multi-modality.
To provide a better match to the data and boost the precision of statistical analysis, forecasts, or simulations under these circumstances, a new distribution may be created. Depending on the particular application or issue being solved, a new distribution’s motivations can change. To represent the distribution of financial returns or exceptional occurrences, for instance, new distributions may be created in the field of finance. To model the distribution of gene expression levels in biology, new distributions may be created. New distributions may be created in engineering to simulate the distribution of material strength or fatigue life.
In this context, we must briefly mention some details about the emergence of the new distribution and how it was derived and formed, so as to be useful to researchers in this field and help them present other similar, and possibly more flexible, distributions. Following Aryal and Yousof [1], the cumulative distribution function (CDF) of the quasi-Poisson-exponential (QPE) distribution can be expressed in the closed form given in Equation (1). The corresponding probability density function (PDF) can then be obtained by differentiation, as given in Equation (2), for all admissible values of the argument and the model parameter. The hazard rate function (HRF) of the new model can be obtained as $\mathrm{HRF}(x) = \mathrm{PDF}(x)/[1 - \mathrm{CDF}(x)]$
. The exponential distribution and its new flexible extensions have drawn the attention of many academics, who are also interested in the applications of these extensions in a variety of scientific fields, including engineering, insurance, medicine, reliability, and actuarial science. The purpose of devising a new distribution is ultimately to offer a more precise and adaptable tool for modeling and analyzing data, which can result in a better comprehension of the underlying mechanisms and improved decision-making. To determine whether a new distribution can accurately characterize the data and forecast the future, it is crucial to thoroughly assess its attributes and compare them to those of existing distributions.
In this work, a modified version of the Nikulin-Rao-Robson (N-RR) goodness-of-fit test for real data is presented. The method of constructing the new test is presented first, with all its related algebraic derivations and theoretical results. Secondly, the test is applied to uncensored real data. Finally, the modified test is applied to some censored real data sets. In this context, it is worth pointing out some recent works closely related to statistical hypothesis tests, such as Yousof et al. [2] (distributional validation of an exponential extension under a modified goodness-of-fit test, with applications to censored and complete data) and Emam et al. [3] (right-censored Bayesian and non-Bayesian validation and testing).
When exploring the flexibility of a new probability distribution, it is important to consider the characteristics of the data being modeled and to compare the new distribution to other commonly used distributions to determine its suitability for the task at hand. The flexibility of a distribution can also be improved by combining multiple distributions in a mixture model or by using transformations to model non-standard data patterns. In this work, we will test the flexibility of the new distribution from several aspects, including theoretical aspects (mathematical and statistical) and practical and applied aspects (statistical modeling and statistical hypothesis tests).
It is worth noting that the first thing researchers typically examine is the density function and the failure rate function graphically, and this is what we give attention to in the following lines.
Figure 1 gives some plots of the new PDF (left graph) and HRF (right graph) for the QPE model. From the left graph of Figure 1, it is seen that the new PDF can be an asymmetric density with one high peak, an asymmetric density with one wide peak, or an asymmetric density with a heavy right tail. From the right graph of Figure 1, it is seen that the new HRF can be increasing-constant or upside-down-constant.
The main motives behind introducing this new distribution can be listed in the following main points:
- I.
Introducing a new probability distribution with one parameter, since the statistical literature contains dozens of probability distributions but few with a single parameter; the fewer parameters a distribution has, the easier it is to handle in applied modeling, estimation, simulation experiments, etc.
- II.
Introducing a new probability distribution with mathematical properties that are easy to derive, calculate, and apply. For the current distribution, as will be shown later, all the mathematical and statistical properties are available in explicit formulas, except for the quantile function; however, modern statistical packages largely overcome this problem with numerical methods and solutions. Numerical methods, and the numerical solutions they provide, have become necessary in such cases to handle the complex formulas that researchers may face.
- III.
Presenting probability distributions whose density function has a heavy tail to the right or to the left and accommodates other forms that indicate the flexibility of the new distribution. In many fields, such as finance, economics, physics, and engineering, heavy-tailed distributions are important because they more accurately reflect real-world phenomena where extreme events are more frequent than would be predicted by a normal distribution.
- IV.
Assessing many classical estimation methods under the new distribution, whether through simulated experiments or through practical applications on real data.
- V.
Employing the new distribution in statistical modeling processes to be a suitable alternative to many of the well-known distributions found in the statistical literature. As indicated earlier, the current distribution has only one parameter; however, it has proved its importance and flexibility compared to many distributions that have a greater number of parameters, such as Marshall–Olkin exponential, the beta exponential, the Marshall–Olkin Kumaraswamy exponential, and the Kumaraswamy Marshall–Olkin exponential.
- VI.
Introducing a new distribution with a small number of parameters suitable for modeling real data with outliers, as shown in Section 4.2.
- VII.
Although the new distribution is not bimodal, it proved its ability in the statistical modeling of bimodal data; this feature gives an added advantage, constitutes a new characteristic of the distribution, and indicates an important aspect of the flexibility of probability distributions (see Figure 2 (the middle-right graph, for the relief times data) and Figure 5 (the middle-right graph, for the survival times data)).
- VIII.
Employing the new probability distribution in the modeling domains of complete and right censored data.
- IX.
Harnessing new distributions in modeling processes is imperative in most applied statistical work. In this work, we did so using a well-known goodness-of-fit test and a new modified goodness-of-fit test, and we presented evidence and arguments that support both the importance of the new distribution and the value of the new modified test.
- X.
Exploring the statistical literature on probability distributions and their applications reveals many new distributions presented in a routine way, both in form and in applied content. Therefore, in this work, we were keen to present a new distribution and to study it thoroughly, both theoretically and practically. Simply presenting a new distribution is not, in itself, interesting; thus, we studied the new distribution from a variety of aspects: the mathematical side; the modeling side, including estimation and simulation processes by various methods; statistical hypothesis tests; and statistical modeling of censored data, validation tests, and quality of fit.
2. Main Properties
There are many mathematical properties associated with probability distributions that are important to understand. Some of the most important properties include quantile function, moments, mean, variance, moment generating function (MGF), and incomplete moments. Understanding these mathematical properties is crucial for several reasons. Firstly, they allow us to compute various statistical measures that can help us understand the behavior of a random variable.
For example, the mean and variance can describe the central tendency and variability of a probability distribution. Secondly, these properties allow us to make predictions about the future behavior of a random variable. Although this paper focuses largely on the practical and applied aspects of statistical modeling, with applications to data that are diverse in nature and in functional form, in this section we also review some of the mathematical properties of the new distribution in order to cover this important aspect of the work. We look into the quantile function, the generating function, and the full moments, as well as some other mathematical aspects of the new model. It may be more effective to use established algebraic expansions than to compute parts of this family's structural features directly by numerically integrating its density function.
2.1. Quantile Function
The importance of the quantile function lies in its ability to provide key information about the distribution of a random variable. Specifically, it can be used to calculate a wide range of statistical measures, including the median, quartiles, deciles, and percentiles of a distribution, which makes it a useful tool for describing the shape and variability of the distribution. The quantile function is an important tool in statistics and data analysis, with a wide range of applications in different fields; it can be used to estimate a variety of statistical measures and to make important decisions based on data. When $u \in (0, 1)$, setting $F(x_u) = u$ and inverting leads to the non-linear quantile equation, Equation (3).
By solving non-linear Equation (3), we can generate data from the proposed model. The quantile function is used in quality control to set tolerance limits and control charts for manufacturing processes. It can be used to ensure that products meet specifications and minimize defects. The quantile function is used to estimate risk and calculate value-at-risk (VaR) in finance. The VaR is a measure of the potential loss in value of a portfolio of financial assets, and it is calculated using the quantile function. The quantile function is often used in exploratory data analysis to identify patterns and relationships in data. It can be used to identify outliers, estimate the central tendency and variability of a distribution, and compare different groups of data.
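To make this concrete, the following R sketch shows how such a non-linear quantile equation can be inverted numerically with uniroot(); since the QPE CDF of Equation (1) is not reproduced here, `pqpe` is a hypothetical stand-in, and `qqpe`/`rqpe` are illustrative helper names rather than the paper's code.

```r
# Numerical inversion of the CDF with uniroot(); `pqpe` is a hypothetical
# stand-in for the QPE CDF of Equation (1), not the actual formula.
pqpe <- function(x, lambda) 1 - exp(-lambda * x)   # placeholder CDF

qqpe <- function(u, lambda, upper = 1e8) {
  # Solve F(x) = u for each u in (0, 1)
  sapply(u, function(p)
    uniroot(function(x) pqpe(x, lambda) - p,
            lower = 0, upper = upper)$root)
}

# Inverse-transform sampling: plug uniform draws into the quantile function
rqpe <- function(n, lambda) qqpe(runif(n), lambda)

set.seed(1)
x <- rqpe(100, lambda = 0.5)
summary(x)
```

The same inversion also yields the median, quartiles, and any percentile by evaluating `qqpe` at the corresponding probability.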
2.2. Asymptotic Analysis for CDF, PDF, and HRF
Asymptotic properties are important in many areas of mathematics, including probability theory and statistics. In general, asymptotic properties describe the behavior of a mathematical function or a sequence of numbers as the input grows arbitrarily large or small. In probability theory and statistics, the asymptotic properties of statistical estimators and test statistics are of particular interest, because they allow us to make statistical inferences using large sample sizes. Asymptotic results provide valuable information about the behavior of an estimator or test statistic as the sample size increases, and this information can be used to derive important statistical properties and make inferences about the population. The asymptotic behavior of the CDF, PDF, and HRF of the QPE model as $x \to 0$ and as $x \to \infty$ follows directly from Equations (1) and (2); these results show the effect of the parameter on the left and right tails. Asymptotic consistency is a fundamental property of statistical estimators: as the sample size increases, the estimator approaches the true value of the parameter being estimated. This property is important because it ensures that the estimator converges to the true value in the long run.
Asymptotic efficiency is a measure of the precision of a statistical estimator. It states that the estimator that has the smallest asymptotic variance is the most efficient. This property is important because it allows us to compare the performance of different estimators and choose the most precise one. Asymptotic theory is used to derive the distribution of test statistics, such as the t-test or F-test, as the sample size increases.
2.3. Moments and Incomplete Moments
Moments and incomplete moments are important statistical tools that are used in a wide range of applications, including physics, engineering, economics, and finance. They are mathematical functions that provide information about the shape, location, and variability of a probability distribution. The $r$th moment of a random variable is the expected value of the $r$th power of the random variable, where the expectation is taken with respect to the probability distribution. First, the PDF of the QPE model can be expanded as in Equation (4). Based on Equation (4), we can derive many of the relevant properties of the new model. Let $\mu'_r$ denote the $r$th moment of $X$; then, using Equation (4), the moments can be obtained in closed form.
Incomplete moments are similar to moments, but they involve integrating over only part of the range of the random variable. Using Equation (4), the $r$th incomplete moment of $X$ can be expressed in terms of the lower incomplete gamma function $\gamma(\cdot, \cdot)$. The importance of moments and incomplete moments lies in their ability to provide information about the properties of a probability distribution. Specifically, they can be used to calculate a variety of statistical measures, including the mean, variance, skewness, and kurtosis of a distribution. In addition, moments and incomplete moments are used in the construction and analysis of statistical models.
For example, in physics, moments are used to describe the spatial and temporal distributions of particles, while in finance, moments are used to model stock price movements and calculate risk measures. Incomplete moments are also used in a variety of applications, such as image processing, where they are used to describe the intensity distribution of pixels in an image.
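As a hedged illustration of how such quantities can be checked numerically, the following R sketch computes ordinary and incomplete moments by integrating a density; `dqpe` is a hypothetical stand-in for the QPE PDF of Equation (4), not the actual formula.

```r
# Ordinary and incomplete moments by numerical integration; `dqpe` is a
# hypothetical stand-in for the QPE density of Equation (4).
dqpe <- function(x, lambda) lambda * exp(-lambda * x)   # placeholder PDF

moment_qpe <- function(r, lambda)        # E[X^r] over (0, Inf)
  integrate(function(x) x^r * dqpe(x, lambda), 0, Inf)$value

inc_moment_qpe <- function(r, t, lambda) # integral of x^r f(x) over (0, t)
  integrate(function(x) x^r * dqpe(x, lambda), 0, t)$value

mu1 <- moment_qpe(1, lambda = 0.5)
mu2 <- moment_qpe(2, lambda = 0.5)
c(mean = mu1, variance = mu2 - mu1^2,
  first_incomplete_at_2 = inc_moment_qpe(1, 2, lambda = 0.5))
```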
2.4. The MGF
The MGF is an important mathematical tool in probability theory and statistics. It provides a powerful method for analyzing probability distributions and making statistical inferences in a wide range of applications. Let $M_X(t)$ denote the moment generating function of $X$; then, using Equation (4), the MGF can be obtained in closed form.
The MGF is used to derive the sampling distribution of a statistic. For example, it can be used to derive the distribution of the sample mean or sample variance, which are important statistics in statistical inference. The MGF is used to model financial variables and estimate financial parameters.
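A minimal R sketch of this idea, again using a hypothetical stand-in density `dqpe` in place of Equation (4):

```r
# MGF by numerical integration, M(t) = E[exp(tX)]; `dqpe` is the same
# hypothetical stand-in density as above. The integral converges only
# for t small enough relative to the right-tail decay.
dqpe <- function(x, lambda) lambda * exp(-lambda * x)
mgf_qpe <- function(t, lambda)
  integrate(function(x) exp(t * x) * dqpe(x, lambda), 0, Inf)$value
mgf_qpe(0.1, lambda = 0.5)   # finite here since 0.1 < 0.5
```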
2.5. Residual Life Function
The residual life function, also known as the remaining life function, is a function used in reliability theory to describe the probability that an item will fail in a given time interval, given that it has survived up to a certain time. The residual life function can be defined as the conditional survival probability, given that the item has already survived up to a certain time, and it is a useful tool in many areas where the analysis of reliability/survival data is important. The $r$th moment of the residual life can be derived from Equation (4) and the incomplete moments obtained above.
The residual life function can be used to predict when a machine or component is likely to fail. By monitoring the condition of a machine and calculating the residual life function, maintenance personnel can determine when to perform maintenance or replace the machine before it fails. The residual life function can be used to analyze product warranties. Manufacturers can use the function to estimate the likelihood of failure of their products during the warranty period and adjust the warranty terms accordingly. The residual life function can be used to analyze reliability/survival data in medical research. For example, it can be used to estimate the probability of a patient surviving a certain amount of time after a medical treatment or diagnosis. The residual life function can be used to calculate the expected remaining life of an insured item or person, which can be used to determine insurance premiums. The residual life function can be used to determine the optimal time to replace or retire assets, such as buildings, vehicles, or machinery. By predicting the residual life of the asset, managers can optimize the use of the asset and minimize costs.
2.6. The Reversed Residual Life Function
The reversed residual life function, also known as the exceedance probability function, is the complement of the residual life function. It describes the probability that an item will fail before a given time, given that it has already survived up to a certain time. The reversed residual life function is a valuable tool in many applications where the analysis of probabilities associated with time-to-event data is important. The $r$th moment of the reversed residual life of $X$ can be derived in a similar manner from Equation (4). The reversed residual life function is often used in reliability engineering to analyze the reliability of complex systems; it can be used to estimate the probability of a system component failing before a specified time, given that it has already operated for a certain period. In risk management, it can be used to estimate the probability of an event occurring within a given time frame and thus to assess the risk associated with a particular activity or investment. In quality control, it can be used to estimate the probability of a defect occurring within a specified time period, to set quality control standards, and to ensure that products meet quality specifications. In environmental analysis, it can be used to estimate the probability of an environmental hazard, such as a natural disaster, pollution, or climate change, occurring within a given time frame. In finance and investments, it can be used to estimate the probability of a financial asset or investment performing poorly within a specified period, and thus to make investment decisions based on the estimated probability of returns. Further mathematical properties can be derived along the same lines.
3. Estimation and Assessment
In the statistical literature, there are many methods that can be used in estimation. The preference for a particular method is constrained by the new probability distribution. In this part of the paper, we present comprehensive simulation studies and use them to compare the methods with each other; through these comparisons, we judge the performance of the methods (through the behavior of their estimators) as the sample size increases. The maximum likelihood estimation (MLE), the Cramér–von Mises estimation (CVME), the ordinary least squares (OLSQ) estimation, the weighted least squares (WLSQ) estimation, the method of moments, and the Kolmogorov estimation (KE) are the six non-Bayesian estimation techniques covered in this section. Several MCMC simulation studies are carried out to compare these conventional methods. The results are presented in Table 1, Table 2, and Table 3, each corresponding to a different value of the parameter. The numerical assessments are based on the mean squared errors (MSEs). First, we generate N = 1000 samples of the QPE model.
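A sketch of this Monte Carlo design is given below, assuming stand-in functions `dqpe`, `pqpe`, and `rqpe` for the QPE density, CDF, and sampler, and showing only two of the six methods (MLE and OLSQ) for brevity; it is illustrative, not the code behind Tables 1-3.

```r
# Monte Carlo assessment of estimators by MSE, mirroring the design of
# Tables 1-3; `dqpe`, `pqpe`, `rqpe` are hypothetical stand-ins for the
# QPE density, CDF, and sampler.
dqpe <- function(x, l) l * exp(-l * x)
pqpe <- function(x, l) 1 - exp(-l * x)
rqpe <- function(n, l) -log(runif(n)) / l

mle_est <- function(x)   # maximize the log-likelihood numerically
  optimize(function(l) -sum(log(dqpe(x, l))), c(1e-4, 100))$minimum

ols_est <- function(x) { # minimize sum((F(x_(i)) - i/(n+1))^2)
  n <- length(x); s <- sort(x)
  optimize(function(l) sum((pqpe(s, l) - (1:n) / (n + 1))^2),
           c(1e-4, 100))$minimum
}

set.seed(1); true_l <- 0.5; N <- 1000
for (n in c(50, 200, 500)) {
  est <- replicate(N, { x <- rqpe(n, true_l)
                        c(mle_est(x), ols_est(x)) })
  mse <- rowMeans((est - true_l)^2)
  cat(sprintf("n = %3d  MSE(MLE) = %.5f  MSE(OLSQ) = %.5f\n",
              n, mse[1], mse[2]))
}
```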
Based on Table 1, Table 2, and Table 3, it is seen that for all estimation methods, performance improves as the sample size n increases. For example, we can highlight the following results:
- (1)
The MSE for the MLE method starts at 0.31999 for n = 50 and falls to 0.03047 for n = 500;
- (2)
The MSE for the OLSQ method starts at 0.33138 for n = 50 and falls to 0.03702 for n = 500;
- (3)
The MSE for the WLSQ method starts at 0.32409 for n = 50 and falls to 0.03684 for n = 500;
- (4)
The MSE for the CVME method starts at 0.36573 for n = 50 and falls to 0.03819 for n = 500;
- (5)
The MSE for the method of moments starts at 0.37736 for n = 50 and falls to 0.03966 for n = 500;
- (6)
The MSE for the KE method starts at 0.37287 for n = 50 and falls to 0.03969 for n = 500;
- (7)
The MLE method remains the best, despite the diversity and abundance of the other classic methods, as shown in Table 1, Table 2, and Table 3; this assessment is based primarily on the comprehensive simulation study reported there. This section uses simulation studies mainly to assess the various estimation approaches rather than to contrast them, although simulation can certainly also be used for such contrasts. In practice, real data are frequently used to evaluate estimation techniques, which is why we describe four examples devoted to this purpose. Two further applications are provided to compare the rival models.
4. Numerical Studies
4.1. Comparison with Existing Methods
Preferring one estimation method over another must be based on comparisons, and such comparisons rest on numerical results, whether from simulation experiments or from applications to actual data. In this section, we present a set of comparisons that examine the different estimation methods through applied modeling of real data. As will be seen, this modeling uses two sets of real data with particular characteristics; these sets were carefully selected, since the statistical literature offers many data sets to which the methods could be applied.
The first data set is the failure time data, which represent lifetime data on the relief times (in minutes) of patients receiving an analgesic. The second data set consists of the survival times (in days) of 72 guinea pigs infected with virulent tubercle bacilli. Both data sets were recently analyzed by Ibrahim et al. [4], Al-Babtain et al. [5], and Shehata et al. [6], among others. For more reliability data sets, see Wang et al. [7], Wang et al. [8], Zhang et al. [9], and Xu et al. [10].
Table 4 gives the application results (the Kolmogorov–Smirnov (KS) statistic and its p-value) for comparing the methods under the relief data. Table 5 lists the corresponding results for the survival data. Based on Table 4, the CVME method performs best, with p-value = 0.39460 (KS = 0.20094), followed by the OLSQ method, with p-value = 0.38332 (KS = 0.20278). Based on Table 5, the MLE method performs best, with p-value = 0.66842 (KS = 0.08551), followed by the next-best method, with p-value = 0.59476 (KS = 0.09067).
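For illustration, the following R sketch shows how such a KS statistic and p-value can be computed for one fitted method; `pqpe` is a hypothetical stand-in CDF, `lambda_hat` is an illustrative estimate rather than the paper's, and `relief` is assumed to be the widely used Gross-Clark relief times data.

```r
# KS statistic and p-value for a fitted model; all names below are
# illustrative stand-ins, not the paper's code.
pqpe <- function(x, l) 1 - exp(-l * x)   # placeholder CDF
relief <- c(1.1, 1.4, 1.3, 1.7, 1.9, 1.8, 1.6, 2.2, 1.7, 2.7,
            4.1, 1.8, 1.5, 1.2, 1.4, 3.0, 1.7, 2.3, 1.6, 2.0)
lambda_hat <- 1 / mean(relief)           # illustrative estimate
# Ties in the data trigger a warning from ks.test, which is harmless here
ks.test(relief, function(x) pqpe(x, lambda_hat))
```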
4.2. Comparison with Competing Models
The fit of the QPE distribution will be compared with that of several competing models, including the exponential (E), odd Lindley exponential (OLE), Marshall–Olkin exponential (MOE), moment exponential (ME), Burr–Hatke exponential (BHE), generalized Marshall–Olkin exponential (GMOE), beta exponential (BE), Marshall–Olkin Kumaraswamy exponential (MOKE), and Kumaraswamy Marshall–Olkin exponential (KMOE) distributions. More competitive models may be developed using the results of Aboraya [
11], Aboraya [
12], Aboraya [
13], Refaie ([
14,
15,
16,
17,
18]), Refaie et al. [
19], Korkmaz et al. [
20], Karamikabir et al. [
21] and Khalil et al. [
22]. However, many flexible families can be used to generate new useful exponential versions based on the newly proposed model (see Eliwa et al. [
23], El-Morshedy and Eliwa [
24], and Refaie et al. [
19]). Following Salem et al. [
25], one can also model actuarial data using the new model.
For comparing models, we consider the Cramér–von Mises (CVMS) and Anderson–Darling (AD) statistics. We investigate the skewness–kurtosis plot (the Cullen and Frey plot) in these applications for assessing initial fits of theoretical distributions such as the normal, uniform, exponential, logistic, beta, lognormal, and Weibull; both plotting and bootstrapping are employed for greater accuracy. We also present scattergram plots; the nonparametric kernel density estimation (N-KDE) plot for examining the initial shape of the data density (see Zárate and Cepeda [26] for more new related tools); the quantile–quantile (Qu–Qu) plot, which plots the quantiles of the data set against the corresponding quantiles of a theoretical distribution to visually assess whether the data follow that distribution; the total time in test (TTT) plot for examining the initial shape of the empirical hazard rate function (HRF); and the box plot for identifying extreme observations.
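A sketch of these diagnostics in R is given below; it assumes the Gross-Clark relief times values for `y` (as in the earlier sketch) and uses `descdist()` from the `fitdistrplus` package for the Cullen and Frey plot, with the TTT curve computed manually.

```r
# Exploratory diagnostics named above; needs `fitdistrplus` for descdist().
library(fitdistrplus)

y <- c(1.1, 1.4, 1.3, 1.7, 1.9, 1.8, 1.6, 2.2, 1.7, 2.7,
       4.1, 1.8, 1.5, 1.2, 1.4, 3.0, 1.7, 2.3, 1.6, 2.0)

boxplot(y, main = "Box plot")        # flags extreme observations
qqnorm(y); qqline(y)                 # Qu-Qu plot (against the normal here)
plot(density(y), main = "N-KDE")     # nonparametric kernel density
descdist(y, boot = 1000)             # Cullen and Frey plot with bootstrap

# Scaled TTT plot for the shape of the empirical HRF:
# T(i/n) = [sum of the i smallest + (n - i) * x_(i)] / sum of all
s <- sort(y); n <- length(s)
ttt <- (cumsum(s) + (n - seq_len(n)) * s) / sum(s)
plot(seq_len(n) / n, ttt, type = "b", xlab = "i/n", ylab = "T(i/n)",
     main = "TTT plot")
abline(0, 1, lty = 2)  # concave above the diagonal suggests increasing HRF
```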
Figure 2 gives the box plot (first row, left graph), Qu–Qu plot (first row, right graph), TTT plot (second row, left graph), N-KDE plot (second row, right graph), Cullen and Frey plot (third row, left graph), and scattergrams (third row, right graph) for the relief times data. Figure 3 gives the estimated PDF (E-PDF) (left graph) and estimated CDF (E-CDF) (right graph) for the relief times data. Figure 4 gives the Kaplan–Meier survival (KMS) plot (left graph) and Pr–Pr plot (right graph) for the relief times data. Based on Figure 2 (first row), the relief data contain only one extreme observation; based on Figure 2 (second row, left graph), the HRF of the relief times is monotonically increasing; based on Figure 2 (second row, right graph), the N-KDE is bimodal and right skewed with an asymmetric shape; and based on Figure 2 (third row, left graph), the relief times data do not follow any of the theoretical distributions, such as the normal, uniform, exponential, logistic, beta, lognormal, and Weibull. Figure 3 (E-PDF, left graph; E-CDF, right graph) and Figure 4 (KMS plot, left graph; Pr–Pr plot, right graph) clearly indicate that the new model fits these data, the empirical results agreeing with the fitted ones.
Table 6 lists the MLEs and standard errors (SEs) for the relief data under the maximum likelihood method. Table 7 lists the AD and CVMS statistics for the relief data under the maximum likelihood method. Based on Table 7, we conclude that the proposed lifetime QPE model is much better than all the other mentioned models, with AD = 0.573 and CVMS = 0.097; thus, the new lifetime model is a good alternative to these models for the relief times data set. As is clear from these results, the new distribution is superior to all competing distributions in modeling the relief times data, which are bimodal and right skewed with an asymmetric shape.
Figure 5 gives the box plot (first row, left graph), Qu–Qu plot (first row, right graph), TTT plot (second row, left graph), N-KDE plot (second row, right graph), Cullen and Frey plot (third row, left graph), and scattergrams (third row, right graph) for the survival times data. Figure 6 gives the E-PDF (left graph) and E-CDF (right graph) for the survival times data. Figure 7 gives the KMS plot (left graph) and Pr–Pr plot (right graph) for the survival times data. Based on Figure 5 (first row), the reliability/survival data contain four extreme observations; based on Figure 5 (second row, left graph), the HRF of the survival times is monotonically increasing; based on Figure 5 (second row, right graph), the N-KDE is bimodal and right skewed with an asymmetric shape; and based on Figure 5 (third row, left graph), the survival times data do not follow any of the theoretical distributions, such as the normal, uniform, exponential, logistic, beta, lognormal, and Weibull. Figure 6 (E-PDF, left graph; E-CDF, right graph) and Figure 7 (KMS plot, left graph; Pr–Pr plot, right graph) clearly indicate that the new model fits these data, the empirical results agreeing with the fitted ones.
Table 8 lists the MLEs and SEs for the reliability/survival data under the maximum likelihood method. Table 9 lists the AD and CVMS statistics for the reliability/survival data under the maximum likelihood method. Based on Table 9, we conclude that the proposed lifetime QPE model is much better than all the other mentioned models, with AD = 0.589 and CVMS = 0.098; thus, the new lifetime model is a good alternative to these models for the survival times data set. These findings clearly demonstrate that the novel distribution outperformed all the other distributions in modeling the survival times data, which are bimodal, right skewed, and asymmetric in shape.
5. Construction of N-RR Statistic for the QPE Model
Hypothesis testing is a statistical method used to determine whether a given hypothesis about a population is true or false, based on a sample of data. When dealing with right-censored real-life datasets, in which for some observations only a lower bound on the time-to-event is known because the event has not yet occurred, several methods are available for hypothesis testing. One common method is the Kaplan–Meier estimator, a non-parametric approach used to estimate the survival function of the population from the censored data. The Kaplan–Meier estimator can be used to test hypotheses about the survival curves of different populations or to compare the survival curves of a single population under different conditions.
Another approach to hypothesis testing under right-censored data is to use the Cox proportional hazards model. This model is a semi-parametric method that can be used to estimate the hazard function of a population and to compare the hazard rates between different populations or different conditions within the same population. The Cox proportional hazards model does not require any assumptions about the distribution of the survival times, making it a flexible and widely used method for hypothesis testing in survival analysis. In addition to these methods, other techniques, such as parametric survival models, Bayesian methods, and accelerated failure time models, can also be used for hypothesis testing under right-censored real-life datasets. The choice of method will depend on the specific research question, the nature of the data, and the assumptions made about the underlying distribution of the survival times. Hypothesis testing under right-censored real-life datasets requires careful consideration of the available methods and the assumptions made about the data. It is essential to choose the most appropriate method based on the research question and the nature of the data, to ensure accurate and reliable results. Hence in this work, the N-RR test statistic is chosen for checking the distributional validity for the complete data. Moreover, a new modified N-RR version is presented for the right censored datasets.
The significance of right filtered data comes from the fact that the censoring threshold has an impact on how the response variable’s underlying distribution is estimated. Censoring could provide results that are skewed or false if it is not properly taken into account. Hypothesis testing is an important statistical tool for evaluating claims about a population based on a sample of data. In the context of censored data, hypothesis testing can be used to make inferences about a population when some of the observations are not fully observed or missing, which is known as censoring.
Censored data can arise in various ways, such as in survival analysis, where the length of time a subject survives is recorded, but the exact time of death is unknown. In this case, censored data may result from subjects who are still alive at the end of the study or when the event of interest has not yet occurred. In such situations, standard statistical methods, such as t-tests and ANOVA, may not be appropriate because the censored data can bias the results and lead to incorrect conclusions. Hypothesis testing for censored data, on the other hand, takes into account the censoring information and provides more accurate inferences.
For example, one can use the log-rank test, which is a commonly used hypothesis test in survival analysis, to compare the survival times of two or more groups. The log-rank test accounts for the censored data by considering only the times at which events occur and not the times at which they are censored. In conclusion, hypothesis testing for censored data is crucial in accurately making inferences about a population when some of the data is missing or not fully observed. It helps researchers to account for the censoring information and make more informed decisions based on the data available.
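For reference, the three censored-data tools named above are all available in the R `survival` package; the sketch below uses its built-in NCCTG `lung` data (n = 228 with 63 censored observations, apparently the same source as the data analyzed in Section 7.2.1).

```r
# Standard right-censored machinery: Kaplan-Meier, log-rank, and Cox.
library(survival)

# Kaplan-Meier estimate of the survival function
km <- survfit(Surv(time, status) ~ 1, data = lung)
plot(km, xlab = "Days", ylab = "Survival probability")

# Log-rank test comparing survival curves between two groups (here, sex)
survdiff(Surv(time, status) ~ sex, data = lung)

# Cox proportional hazards model for the same comparison
coxph(Surv(time, status) ~ sex, data = lung)
```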
The N-RR test statistic, which is based on the differences between two estimators of the probability of falling into the grouping intervals, is a well-known modification of the conventional chi-squared tests in the case of complete data. One estimator is based on the empirical distribution function, and the other on maximum likelihood estimates of the tested model's unknown parameters computed from the ungrouped initial data (see Nikulin [27,28] and Rao and Robson [29]). The N-RR statistic, a natural adaptation of the Pearson statistic to the complete data set, was introduced by Nikulin [27,28] and Rao and Robson [29]. Bagdonavičius and Nikulin [30], as well as Bagdonavičius et al. [31], suggested modifying the N-RR statistic to account for random right-censored real-life data. For newer test statistics with some applications, see Chaturvedi and Kumar [32] and Noughabi et al. [33,34].
For the QPE model in the current study, we recommend constructing a modified chi-square test. To test the hypothesis $H_0$ that a sample $X_1, \dots, X_n$ belongs to the parametric family $F(x; \theta)$, Nikulin [27,28] and Rao and Robson [29] created the N-RR statistic $Y_n^2$ as follows:

$$Y_n^2(\hat{\theta}_n) = X_n^2(\hat{\theta}_n) + \frac{1}{n}\, L^{T}(\hat{\theta}_n)\left[ I(\hat{\theta}_n) - J(\hat{\theta}_n) \right]^{-1} L(\hat{\theta}_n),$$

where

$$X_n^2(\theta) = \sum_{j=1}^{r} \frac{(\nu_j - n p_j(\theta))^2}{n p_j(\theta)}$$

is the Pearson-type statistic, $J(\theta)$ gives the information matrix for the grouped data, with elements

$$J_{kl}(\theta) = \sum_{j=1}^{r} \frac{1}{p_j(\theta)} \frac{\partial p_j(\theta)}{\partial \theta_k} \frac{\partial p_j(\theta)}{\partial \theta_l},$$

and $L(\theta) = (L_1(\theta), \dots, L_s(\theta))^{T}$ with

$$L_k(\theta) = \sum_{j=1}^{r} \frac{\nu_j}{p_j(\theta)} \frac{\partial p_j(\theta)}{\partial \theta_k},$$

where $\hat{\theta}_n$ is the MLE of $\theta$ and the estimated Fisher information is $I(\hat{\theta}_n)$. The $Y_n^2$ statistic follows a limiting chi-square distribution whose degrees of freedom depend only on the number of grouping intervals. Consider a set of observations $X_1, \dots, X_n$ grouped into $r$ mutually disjoint subintervals $I_j = (a_{j-1}, a_j]$, $j = 1, \dots, r$. The limits $a_j$ of the intervals $I_j$, for $j = 1, \dots, r-1$, are determined so that the cells are equiprobable under the fitted model. Dividing the data into these intervals creates the vector of frequencies $\nu = (\nu_1, \dots, \nu_r)^{T}$, where:

$$\nu_j = \#\{\, i : X_i \in I_j \,\}, \quad j = 1, \dots, r.$$
In this work, we construct an N-RR test statistic as a modified goodness-of-fit test to check whether the data at hand are distributed in line with the QPE model when the parameter is unknown.
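To fix ideas, the following R sketch assembles the complete-data N-RR statistic for a one-parameter model with equiprobable cells, following the presentation above; `pqpe` and `dqpe` are hypothetical stand-ins for the QPE formulas, the derivatives and Fisher information are approximated numerically, and the sketch is illustrative rather than the paper's code.

```r
# Sketch of the complete-data N-RR statistic, one-parameter case.
pqpe <- function(x, l) 1 - exp(-l * x)   # placeholder CDF
dqpe <- function(x, l) l * exp(-l * x)   # placeholder PDF

nrr_complete <- function(x, l_hat, r = 5, eps = 1e-4) {
  n     <- length(x)
  # Equiprobable limits a_j solving F(a_j | l_hat) = j/r
  brk   <- sapply(seq_len(r - 1) / r, function(u)
    uniroot(function(a) pqpe(a, l_hat) - u, c(0, 1e8))$root)
  cells <- c(0, brk, Inf)
  nu    <- as.numeric(table(cut(x, cells)))        # frequencies nu_j
  p     <- rep(1 / r, r)                           # cell probabilities
  X2    <- sum((nu - n * p)^2 / (n * p))           # Pearson part
  pcell <- function(l) diff(pqpe(cells, l))        # p_j as a function of l
  dp    <- (pcell(l_hat + eps) - pcell(l_hat - eps)) / (2 * eps)
  L     <- sum(nu * dp / p)                        # L(theta) component
  J     <- sum(dp^2 / p)                           # grouped information
  ll    <- function(l) sum(log(dqpe(x, l)))        # log-likelihood
  h     <- 1e-3                                    # step for the Hessian
  I1    <- -(ll(l_hat + h) - 2 * ll(l_hat) + ll(l_hat - h)) / (h^2 * n)
  X2 + L^2 / (n * (I1 - J))                        # Y_n^2
}

set.seed(1)
x <- -log(runif(200)) / 0.5                        # stand-in sample
nrr_complete(x, l_hat = 1 / mean(x))
```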
6. Uncensored Distributional Validation of QPE Model
This section is allocated to the uncensored distributional validation of the QPE model; the validation is performed on both sides of statistical modeling (simulation studies and applications to real data).
6.1. Uncensored Simulation Study
To verify the assertions made in this paper, we carried out a thorough investigation utilizing numerical simulation. To test the null hypothesis $H_0$ that the sample belongs to the QPE model, we generated the N-RR statistics for a large number of simulated samples of several sample sizes. For different theoretical levels $\varepsilon$, we calculated the average number of non-rejections of the null hypothesis $H_0$.
Table 10 shows the corresponding empirical and theoretical levels. It is clear that the determined empirical level value and its equivalent theoretical level value are fairly similar. As a result, we draw the conclusion that the suggested test is excellent for the QPE distribution.
6.2. Uncensored Real-Life Applications
6.2.1. Uncensored Strengths of Glass Fibers Data Modeling
The data set was created by researchers at the UK National Physical Laboratory and consists of 63 measurements of the strengths of 1.5 cm glass fibers (see Smith and Naylor [35]). Assuming that our QPE model can match the glass fibers strength data, we determine the MLE using the BB technique, and the estimated Fisher information follows from this value. After calculation, the N-RR test statistic turns out to be smaller than the critical value; we can therefore state that our QPE model can satisfactorily represent the data on 1.5 cm glass fibers.
6.2.2. Uncensored Gene Expression Breast Cancer Data Modeling
We use breast cancer gene expression data, specifically gene expression from breast tumors, to demonstrate the utility of our QPE model. This information is available in R in the “breastCancerNKI” package as an eSet containing the gene expression data from a breast cancer study (see Yousof et al. [36]). The MLE is obtained using the BB algorithm and the “breastCancerNKI” package, and the estimated Fisher information follows. After calculation, the N-RR test statistic is smaller than the critical value; we can therefore assert that the gene expression data for breast cancer are effectively matched by our QPE model.
6.2.3. Uncensored Breaking Stress of Carbon Fibers Data Modeling
This data collection contains 100 carbon fiber fracture stresses (in Gba) from Nichols and Padgett [37]. Assuming that our QPE model can fit the carbon fiber strength data, we obtain the MLE using the BB algorithm and, using this value, calculate the estimated Fisher information. After calculation, the N-RR test statistic is smaller than the corresponding critical value; we can therefore be confident that the carbon fiber data can be accurately modeled by our QPE distribution.
6.2.4. Uncensored Heat Exchanger Tube Crack Data Modeling
The crack data were taken from Meeker and Escobar [38] and comprise tests performed until fractures appeared in comparable turbine components inspected at eight predetermined intervals.
Time of inspection: 186, 606, 902, 1077, 1209, 1377, 1592, 1932.
Number of fans found to have cracks: 5, 16, 12, 18, 18, 2, 6, 17.
Using the N-RR statistic constructed above, we tested the null hypothesis that these data follow our QPE distribution. Using R programming and the BB approach, we computed the MLE (see Ravi (2009)), from which the estimated Fisher information follows. Since the N-RR statistic for this model is smaller than the critical value at the chosen significance level, we can say that the data accurately match the QPE model.
7. Censored Distributional Validation of QPE Model
When the parameters are unknown and the data are censored, we use a test statistic based on a variation of the N-RR statistic to confirm the adequacy of the QPE model. We adapt the test of Nikulin [27,28], Rao and Robson [29], Bagdonavičius and Nikulin [30], and Bagdonavičius et al. [31] to the QPE model, since the lifetimes follow a QPE distribution. The QPE distribution's survival function is $S(x) = 1 - F(x)$, and its cumulative hazard function is $\Lambda(x) = -\ln S(x)$.
With this selection of intervals, we have a constant value of the expected number of failures $e_j$ for every $j$. The intervals must be computed iteratively, since the inverse of the hazard function of the QPE distribution has no closed form. The estimated limits $\hat{a}_j$ can be expressed through $\Lambda^{-1}$, the inverse of the cumulative hazard function, evaluated at quantities computed from the ordered statistics $X_{(1)} \le \dots \le X_{(n)}$; the limits $\hat{a}_j$ are therefore random functions of the data, chosen so that the expected failure counts in the $k$ chosen intervals are equal. The modified N-RR test statistic can then be expressed as a quadratic form in the standardized differences between the $U_j$, the total numbers of failures observed over these intervals, and their expected values $e_j$, and it can be used to test the hypothesis $H_0$. Following Nikulin [27,28], Rao and Robson [29], Bagdonavičius and Nikulin [30], and Bagdonavičius et al. [31], the test statistic is written as:

$$Y_n^2 = \sum_{j=1}^{k} \frac{(U_j - e_j)^2}{U_j} + Q,$$

where $Q$ is a quadratic correction term built from the score vector and the estimated Fisher information. For the QPE model, we compute each element of the statistic. The limiting distribution of the statistic is chi-square, with degrees of freedom equal to the number of grouping intervals $k$, provided the relevant covariance matrix is non-degenerate. If $Y_n^2$ exceeds $\chi_{\varepsilon}^2(k)$, the quantile of the chi-square distribution with $k$ degrees of freedom at significance level $\varepsilon$, the hypothesis $H_0$ is rejected. The major elements of the test statistic for the QPE model are easy to derive.
7.1. Censored Example via Simulation Study under the N-RR Statistic
In this section, a censored simulation study under the N-RR statistic $Y_n^2$ is envisaged for generated samples censored at a fixed rate, with $k$ grouping intervals. We calculated the average number of non-rejections of the null hypothesis for different theoretical levels $\varepsilon$.
Table 11, which compares the theoretical and empirical levels, shows how closely the calculated empirical levels match the corresponding theoretical levels. As a result, we infer that the constructed test is well matched to the QPE model. These results lead us to conclude that the empirical significance level of the statistic corresponds to the theoretical level of the limiting chi-square distribution, which suggests that the proposed test can successfully fit censored data obtained from the QPE distribution.
7.2. Censored Applications under the N-RR Statistics
Modeling right-censored data using probability distributions refers to the process of fitting a statistical distribution to data that have some observations that are only partially observed or censored. Right-censored data are data where the censoring occurs at the right tail of the distribution, meaning that the exact value of the censored observations is not known, but only that it is greater than a certain threshold. In survival analysis, right-censored data are often encountered when studying time-to-event outcomes, such as the time until a disease progresses, the time until failure of a mechanical component, or the time until a customer churns. There are various probability distributions that can be used to model right-censored data, including the exponential, Weibull, log-normal, and log-logistic distributions. The choice of distribution depends on the nature of the data, the underlying assumptions about the population, and the scientific questions being asked. Once a distribution has been chosen, the parameters of the distribution can be estimated from the data using maximum likelihood estimation. This involves finding the parameter values that maximize the likelihood of observing the data given the distribution. Once the parameters have been estimated, the distribution can be used to make inferences about the population, such as calculating survival probabilities, hazard rates, and median survival times. In conclusion, modeling right-censored data using probability distributions is a powerful tool for making inferences about time-to-event outcomes in survival analysis. It allows researchers to account for the censoring information and make more informed decisions based on the data available.
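As a brief illustration of this workflow, the R sketch below fits the four named distributions to right-censored data by maximum likelihood using `survreg` from the `survival` package, on the built-in NCCTG `lung` data; it is a sketch of the general approach, not the paper's QPE fit.

```r
# Parametric MLE fits to right-censored data with survreg().
library(survival)

dists <- c("exponential", "weibull", "lognormal", "loglogistic")
fits  <- lapply(dists, function(d)
  survreg(Surv(time, status) ~ 1, data = lung, dist = d))

# Compare the fitted models by log-likelihood and AIC
data.frame(dist   = dists,
           logLik = sapply(fits, function(f) f$loglik[2]),
           AIC    = sapply(fits, AIC))
```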
7.2.1. Example 1: Lung Cancer Data
Consider the censored lung cancer data of the North Central Cancer Treatment Group, where n = 228 and the number of censored observations is 63 (see Yousof et al. [39] and Emam et al. [3]). Using the maximum likelihood estimation method, and assuming that the data are distributed according to the QPE distribution, we estimate the parameter; we use k = 8 as the number of classes. The elements of the test statistic $Y_n^2$ are displayed as follows:
$\hat{a}_j$: 92.086, 171.584, 216.125, 283.169, 355.404, 456.477, 685.183, 1022.3174.
$U_j$: 29, 30, 35, 31, 32, 25, 28, 18.
$e_j$: 2.1904 for every interval.
The estimated elements of the statistic and the estimated Fisher information are as follows: 0.4751, 0.6044, −0.8376, 0.7715, 0.8731, −0.0039, 0.9222, 0.2467. Using these results, the estimated statistic for the recommended test is smaller than the critical value for the chi-squared test; because the tabulated value of the $\chi^2$ statistic is higher than the calculated value, our hypothesis $H_0$ is accepted. In light of all this, we can accept the null hypothesis that the data on times to infection of lung cancer follow the QPE distribution.
7.2.2. Example 2: Capacitor Reliability Data
Consider the censored reliability data of Yousof et al. [39] and Emam et al. [3]. Assuming that the data are distributed according to the QPE distribution, the maximum likelihood estimator is obtained, and we use k = 8 classes. The test statistic $Y_n^2$ has the following components:
$\hat{a}_j$: 346.1486, 469.347, 587.697, 679.108, 1078.874, 1089.357, 1102.161, 1106.444.
$U_j$: 11, 15, 6, 10, 6, 5, 6, 5.
$e_j$: 3.62266 for every interval.
The estimated elements of the statistic and Fisher's estimated matrix are: 0.40132, 0.64441, −0.96358, −0.73841, 0.26485, 0.83212, 0.60845, 0.97468. The value of the test statistic $Y_n^2$ is determined and is smaller than the critical value. We conclude that the lifetime data for the glass capacitors are well described by the QPE model; therefore, we can accept the null hypothesis that the capacitor data follow the QPE distribution.
7.2.3. Example 3: Aluminum Reduction Cells Data
The data of Whitmore [40] consider the times to failure of 20 aluminum reduction cells; the failure times, in units of 1000 days, were: 0.468, 0.725, 0.838, 0.853, 0.965, 1.139, 1.142, 1.304, 1.317, 1.427, 1.554, 1.658, 1.764, 1.776, 1.990, 2.010, 2.224, 2.279*, 2.244*, 2.286* (where “*” denotes a censored observation). Assuming that these data are distributed in accordance with the QPE distribution, the maximum likelihood estimator is obtained; we select k = 4 as the number of classes. The elements of the test statistic $Y_n^2$ are given as:
$\hat{a}_j$: 0.9603, 1.19069, 1.7004, 2.2945.
$U_j$: 4, 3, 5, 8.
$e_j$: 1.3792 for every interval.
The estimated elements of the statistic and Fisher's estimated matrix are: 0.39477, 0.38625, 0.29814, 0.84079. The value of the test statistic $Y_n^2$ is smaller than the critical value; we conclude that the data on aluminum reduction cells are in accord with the QPE model. Therefore, we can accept the null hypothesis that the data of aluminum reduction cells follow the QPE distribution.
7.2.4. Example 4: Cancer Data
The data considered below (collected by the Northern California Oncology Group) were used by Efron [41] for the logistic distribution. The survival times in days for the n = 51 patients are: 7, 34, 42, 63, 64, 74*, 83, 84, 91, 108, 112, 129, 133, 133, 139, 140, 140, 146, 149, 154, 157, 160, 160, 165, 173, 176, 185*, 218, 225, 241, 248, 273, 277, 279*, 297, 319*, 405, 417, 420, 440, 523*, 523, 583, 594, 1101, 1116*, 1146, 1226*, 1349*, 1412*, 1417. We use the data after converting the survival times to months. The maximum likelihood estimator is obtained, and we select k = 7 as the number of intervals. The elements of the test statistic $Y_n^2$ are presented as follows:
$\hat{a}_j$: 2.753, 5.100, 9.779, 21.004, 37.183, 44.457, 46.903.
$U_j$: 7, 7, 20, 10, 2, 3, 2.
$e_j$: 1.9788 for every interval.
The estimated elements of the statistic and Fisher's estimated matrix are: 0.1975, 0.4437, 0.7948, 0.3367, −0.8791, −0.4777, 0.881. After calculation, we find that the test statistic is smaller than the critical value, so this dataset can be properly modeled by our QPE model. Therefore, we can accept the null hypothesis that the head-and-neck cancer data follow the QPE distribution. Many more real datasets can be obtained from Alizadeh et al. [42], Merovci et al. [43,44], Elgohari et al. [45], and Yousof et al. [46].
8. Conclusions
The quasi-Poisson exponential (QPE) model, a new adaptable variation of the exponentiated exponential model, is introduced and studied in this article. We examine, describe, and apply six established estimation techniques. When modeling the relief times and survival times data sets, the new model performs better than many existing comparable models. Further results can be found throughout the article and its applications; the following results can be highlighted in particular:
Despite the variety and richness of the other classic approaches, the maximum likelihood method remains the most efficient and reliable of the classic methods considered. For statistical modeling and applications, the Bayesian technique and the maximum likelihood method are recommended.
The proposed lifetime quasi-Poisson exponential model performs considerably better than the other discussed models in modeling the asymmetric, bimodal, right-skewed relief data, with 36.198, 38.19, 36.904, 36.587, 0.2891, 0.0494, K.S = 0.12901, and p-value = 0.8932; thus, the new lifetime model is a good alternative to these models for the relief times data set. As is clear from these results, the new distribution showed its superiority over all the competing distributions.
The proposed lifetime quasi-Poisson exponential model outperforms all previously specified models in describing the asymmetric, bimodal, right-skewed reliability/survival data, with 202.08, 206.6, 202.258, 203.897, 0.562, 0.0949, K.S = 0.0792, and p-value = 0.7572; thus, the new lifetime model is a good alternative to these models for the survival times data set. In view of these results, the new distribution shows its superiority over all the competing distributions.
Four real data applications are evaluated in a censored scenario: lung cancer data (medical data), reliability data on capacitors, data on aluminum reduction cells, and head-and-neck cancer data. From these applications, we conclude that the suggested test can successfully fit censored data from the quasi-Poisson exponential distribution.
For the uncensored distributional validation under the QPE model, we have the following results:
For the uncensored strengths of glass fibers data: we can state that our quasi-Poisson-exponential model can satisfactorily represent the uncensored data of 1.5 cm glass fibers.
For the uncensored gene expression breast cancer data: we can assert that the uncensored gene expression data for breast cancer can effectively match our QPE model.
For the uncensored breaking stress of carbon fibers data: we can be confident that the uncensored carbon fiber data can be accurately modeled by our quasi-Poisson exponential distribution.
For the uncensored heat exchanger tube crack data: we can state that our quasi-Poisson exponential model can satisfactorily represent the uncensored heat exchanger tube crack data.
For the censored distributional validation under the QPE model, we have the following results:
For the censored lung cancer data: we can state that the quasi-Poisson exponential model can satisfactorily represent the censored lung cancer data.
For the censored reliability data: we can say that our quasi-Poisson exponential model can satisfactorily represent the censored reliability data.
For the censored aluminum reduction cells data: we can state that the quasi-Poisson exponential model can satisfactorily represent the censored reduction cells data.
For the censored head-and-neck cancer data: we can state that the quasi-Poisson exponential model can satisfactorily represent the censored cancer data.
Author Contributions
H.M.Y.: review and editing, software, validation, writing the original draft preparation, conceptualization, supervision. H.G.: validation, conceptualization, software. W.E.: validation, writing the original draft preparation, conceptualization, data curation, formal analysis, software. Y.T.: methodology, conceptualization, software. M.A.: conceptualization, review and editing. M.M.A.: review and editing, conceptualization, supervision. M.I.: review and editing, software, validation, writing the original draft preparation, conceptualization. All authors have read and agreed to the published version of the manuscript.
Funding
The study was funded by Researchers Supporting Project number (RSP2023R488), King Saud University, Riyadh, Saudi Arabia.
Data Availability Statement
The dataset can be provided upon request.
Acknowledgments
The study was supported by Researchers Supporting Project number (RSP2023R488), King Saud University, Riyadh, Saudi Arabia.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Aryal, G.R.; Yousof, H.M. The exponentiated generalized-G Poisson family of distributions. Stoch. Qual. Control. 2017, 32, 7–23.
- Yousof, H.M.; Ali, M.M.; Goual, H.; Ibrahim, M. A new reciprocal Rayleigh extension: Properties, copulas, different methods of estimation and a modified right-censored test for validation. Stat. Transit. New Ser. 2021, 23, 99–121.
- Emam, W.; Tashkandy, Y.; Goual, H.; Hamida, T.; Hiba, A.; Ali, M.M.; Yousof, H.M.; Ibrahim, M. A New One-Parameter Distribution for Right Censored Bayesian and Non-Bayesian Distributional Validation under Various Estimation Methods. Mathematics 2023, 11, 897.
- Ibrahim, M.; Altun, E.; Goual, H.; Yousof, H.M. Modified goodness-of-fit type test for censored validation under a new Burr type XII distribution with different methods of estimation and regression modeling. Eurasian Bull. Math. 2020, 3, 162–182.
- Al-Babtain, A.A.; Elbatal, I.; Yousof, H.M. A new flexible three-parameter model: Properties, Clayton copula, and modeling real data. Symmetry 2020, 12, 440.
- Shehata, W.A.M.; Butt, N.S.; Yousof, H.; Aboraya, M. A New Lifetime Parametric Model for the Survival and Relief Times with Copulas and Properties. Pak. J. Stat. Oper. Res. 2022, 18, 249–272.
- Wang, B.; Yu, K.; Jones, M. Inference under progressively type II right-censored sampling for certain lifetime distributions. Technometrics 2010, 52, 453–460.
- Wang, P.; Tang, Y.; Bae, S.J.; He, Y. Bayesian analysis of two-phase degradation data based on change-point Wiener process. Reliab. Eng. Syst. Saf. 2018, 170, 244–256.
- Zhang, L.; Xu, A.; An, L.; Li, M. Bayesian inference of system reliability for multicomponent stress-strength model under Marshall-Olkin Weibull distribution. Systems 2022, 10, 196.
- Xu, A.; Zhou, S.; Tang, Y. A unified model for system reliability evaluation under dynamic operating conditions. IEEE Trans. Reliab. 2021, 70, 65–72.
- Aboraya, M. A new extremely flexible version of the exponentiated Weibull model: Theorem and applications to reliability and medical data sets. Pak. J. Stat. Oper. Res. 2019, 15, 195–215.
- Aboraya, M. A new flexible lifetime model with statistical properties and applications. Pak. J. Stat. Oper. Res. 2018, 14, 881–901.
- Aboraya, M. A New One-parameter G Family of Compound Distributions: Copulas, Statistical Properties and Applications. Stat. Optim. Inf. Comput. 2021, 9, 942–962.
- Refaie, M.K.A. A new two-parameter exponentiated Weibull model with properties and applications to failure and survival times. Int. J. Math. Arch. 2019, 10, 1–13.
- Refaie, M. A New Family of Continuous Distributions: Properties, Copulas and Real-Life Data Modeling. Stat. Optim. Inf. Comput. 2021, 9, 748–768.
- Refaie, M.K. Extended Poisson-exponentiated Weibull distribution: Theoretical and computational aspects. Pak. J. Statist. 2018, 34, 513–530.
- Refaie, M.K. Burr X exponentiated exponential distribution. J. Stat. Appl. 2018, 1, 71–88.
- Refaie, M. A New Compound Generalization of the Lomax Lifetime Model: Properties, Copulas and Modeling Real Data. Stat. Optim. Inf. Comput. 2022, 10, 484–504.
- Refaie, M.K.A.; Butt, N.S.; Ali, E.I.A. A New Reciprocal System of Burr Type X Densities with Applications in Engineering, Reliability, Economy, and Medicine. Pak. J. Stat. Oper. Res. 2023, 19, forthcoming.
- Korkmaz, M.C.; Altun, E.; Yousof, H.M.; Hamedani, G.G. The Hjorth's IDB Generator of Distributions: Properties, Characterizations, Regression Modeling and Applications. J. Stat. Theory Appl. 2020, 19, 59–74.
- Karamikabir, H.; Afshari, M.; Yousof, H.M.; Alizadeh, M.; Hamedani, G. The Weibull Topp-Leone Generated Family of Distributions: Statistical Properties and Applications. J. Iran. Stat. Soc. 2020, 19, 121–161.
- Khalil, M.G.; Ali, E.I.A. A Generalization of Burr Type XII Distribution with Properties, Copula and Modeling Symmetric and Skewed Real Data Sets. Pak. J. Stat. Oper. Res. 2023, 19, 77–101.
- Eliwa, M.S.; El-Morshedy, M.; Sajid, A. Exponentiated odd Chen-G family of distributions: Statistical properties, Bayesian and non-Bayesian estimation with applications. J. Appl. Stat. 2020, 48, 1948–1974.
- El-Morshedy, M.; Eliwa, M.S. The odd flexible Weibull-H family of distributions: Properties and estimation with applications to complete and upper record data. Filomat 2019, 33, 2635–2652.
- Salem, M.; Khalil, M.G. Short-Term Insurance Claims Payments Forecasting with Holt-Winter Filtering and Residual Analysis. Pak. J. Stat. Oper. Res. 2023, 19, 167–186.
- Zárate, H.; Cepeda, E. Semiparametric Smoothing Spline in Joint Mean and Dispersion Models with Responses from the Biparametric Exponential Family: A Bayesian Perspective. Stat. Optim. Inf. Comput. 2021, 9, 351–367.
- Nikulin, M.S. Chi-squared test for continuous distributions with shift and scale parameters. Theory Probab. Its Appl. 1973, 18, 559–568.
- Nikulin, M.S. On a Chi-squared test for continuous distributions. Theory Probab. Its Appl. 1973, 19, 638–639.
- Rao, K.C.; Robson, D.S. A Chi-square statistic for goodness-of-fit tests within the exponential family. Commun. Stat. 1974, 3, 1139–1153.
- Bagdonavičius, V.; Nikulin, M. Chi-squared goodness-of-fit test for right censored data. Int. J. Appl. Math. Stat. 2011, 24, 30–50.
- Bagdonavičius, V.; Levuliene, R.J.; Nikulin, M. Chi-squared goodness-of-fit tests for parametric accelerated failure time models. Commun. Stat.-Theory Methods 2013, 42, 2768–2785.
- Chaturvedi, A.; Kumar, S. Estimation and Testing Procedures for the Reliability Characteristics of Chen Distribution Based on Type II Censoring and the Sampling Scheme of Bartholomew. Stat. Optim. Inf. Comput. 2020, 9, 99–122.
- Noughabi, H.A. Testing the Validity of Laplace Model Against Symmetric Models, Using Transformed Data. Stat. Optim. Inf. Comput. 2022, 10, 1162–1167.
- Noughabi, H.A. Testing the Validity of Lindley Model Based on Informational Energy with Application to Real Medical Data. Stat. Optim. Inf. Comput. 2022, 10, 372–382.
- Smith, R.L.; Naylor, J.C. A comparison of maximum likelihood and Bayesian estimators for the three-parameter Weibull distribution. Appl. Stat. 1987, 36, 358–369.
- Yousof, H.M.; Al-nefaie, A.H.; Aidi, K.; Ali, M.M.; Ibrahim, M. A Modified Chi-square Type Test for Distributional Validity with Applications to Right Censored Reliability and Medical Data. Pak. J. Stat. Oper. Res. 2021, 17, 1113–1121.
- Nichols, M.D.; Padgett, W.J. A Bootstrap control chart for Weibull percentiles. Qual. Reliab. Eng. Int. 2006, 22, 141–151.
- Meeker, W.Q.; Escobar, L.A. Statistical Methods for Reliability Data; Wiley: New York, NY, USA, 1998.
- Yousof, H.M.; Tashkandy, Y.; Emam, W.; Ali, M.M.; Ibrahim, M. A New Reciprocal Weibull Extension for Modeling Extreme Values with Risk Analysis Under Insurance Data. Mathematics 2023, 11, 966.
- Whitmore, G.A. A regression method for censored inverse-Gaussian data. Can. J. Stat. 1983, 11, 305–315.
- Efron, B. Logistic Regression, Survival Analysis, and the Kaplan-Meier Curve. J. Am. Stat. Assoc. 1988, 83, 414–425.
- Alizadeh, M.; Yousof, H.M.; Jahanshahi, S.M.A.; Najibi, S.M.; Hamedani, G.G. The transmuted odd log-logistic-G family of distributions. J. Stat. Manag. Syst. 2020, 23, 761–787.
- Merovci, F.; Alizadeh, M.; Yousof, H.M.; Hamedani, G.G. The exponentiated transmuted-G family of distributions: Theory and applications. Commun. Stat.-Theory Methods 2017, 46, 10800–10822.
- Merovci, F.; Yousof, H.M.; Hamedani, G.G. The Poisson Topp Leone Generator of Distributions for Lifetime Data: Theory, Characterizations and Applications. Pak. J. Stat. Oper. Res. 2020, 16, 343–355.
- Elgohari, H. A New Version of the Exponentiated Exponential Distribution: Copula, Properties and Application to Relief and Survival Times. Stat. Optim. Inf. Comput. 2021, 9, 311–333.
- Yousof, H.M.; Aidi, K.; Hamedani, G.; Ibrahim, M. A new parametric lifetime distribution with modified Chi-square type test for right censored validation, characterizations and different estimation methods. Pak. J. Stat. Oper. Res. 2021, 17, 399–425.
Figure 1. PDF (the left panel) and HRF (the right panel) for the QPE model.
Figure 2. The box plot (top left), Q–Q plot (top right), TTT plot (middle left), N-KDE (middle right), Frey plot (bottom left), and scattergram (bottom right) for the relief times.
Figure 3. E-PDF (left panel) and E-CDF (right panel) for relief times data.
Figure 4. Kaplan–Meier survival plot (left panel) and P–P plot (right panel) for relief times data.
Figure 5. The box plot (top left), Q–Q plot (top right), TTT plot (middle left), N-KDE (middle right), Frey plot (bottom left), and scattergram (bottom right) for the survival times.
Figure 6. E-PDF (left panel) and E-CDF (right panel) for survival times data.
Figure 7. Kaplan–Meier survival plot (left panel) and P–P plot (right panel) for survival times data.
Table 1. MSEs under β = 0.3.

| n | MLE | OLS | WLS | CVM | Moment | KE |
|---|---|---|---|---|---|---|
| 50 | 0.25030 | 0.24381 | 0.27054 | 0.23614 | 0.33289 | 0.27121 |
| 100 | 0.12121 | 0.11887 | 0.12193 | 0.11704 | 0.15141 | 0.12384 |
| 200 | 0.06021 | 0.06017 | 0.06822 | 0.06121 | 0.07235 | 0.06315 |
| 300 | 0.04003 | 0.04002 | 0.04559 | 0.04218 | 0.04965 | 0.04224 |
| 500 | 0.02354 | 0.02411 | 0.02665 | 0.02504 | 0.03126 | 0.02509 |
Table 2. MSEs under β = 1.2.

| n | MLE | OLS | WLS | CVM | Moment | KE |
|---|---|---|---|---|---|---|
| 50 | 0.25532 | 0.25173 | 0.26247 | 0.27340 | 0.31672 | 0.27380 |
| 100 | 0.13060 | 0.13337 | 0.14453 | 0.12982 | 0.15810 | 0.14834 |
| 200 | 0.06509 | 0.06791 | 0.07288 | 0.06506 | 0.07908 | 0.07320 |
| 300 | 0.04236 | 0.04155 | 0.04517 | 0.04380 | 0.05097 | 0.04436 |
| 500 | 0.02530 | 0.02624 | 0.02824 | 0.02728 | 0.03223 | 0.02746 |
Table 3. MSEs under β = 2.5.

| n | MLE | OLS | WLS | CVM | Moment | KE |
|---|---|---|---|---|---|---|
| 50 | 0.31999 | 0.33138 | 0.32409 | 0.36573 | 0.37736 | 0.37287 |
| 100 | 0.16430 | 0.18758 | 0.18922 | 0.17463 | 0.18264 | 0.20350 |
| 200 | 0.07683 | 0.08428 | 0.08907 | 0.08744 | 0.09733 | 0.09216 |
| 300 | 0.05202 | 0.05647 | 0.05943 | 0.05968 | 0.06420 | 0.06269 |
| 500 | 0.03047 | 0.03702 | 0.03684 | 0.03819 | 0.03966 | 0.03969 |
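Tables 1–3 come from a Monte Carlo comparison of the six estimators. The sketch below mirrors the structure of that experiment under stated assumptions: an exponential model stands in for the QPE generator (whose quantile function appears earlier in the paper), and only two of the six methods (MLE and OLS) are shown; it illustrates the design, not the tabulated numbers.

```python
# Minimal Monte Carlo sketch behind Tables 1-3: estimate beta by MLE and
# by OLS on the fitted CDF, and report the mean squared error (MSE).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
beta_true, B = 1.2, 1000  # true rate and number of replications

def ols_estimate(x):
    x = np.sort(x)
    n = len(x)
    p = np.arange(1, n + 1) / (n + 1)          # plotting positions
    # OLS: minimize sum_i (F(x_(i); beta) - p_i)^2 over beta.
    obj = lambda b: np.sum((1.0 - np.exp(-b * x) - p) ** 2)
    return minimize_scalar(obj, bounds=(1e-6, 50.0), method="bounded").x

for n in (50, 100, 200, 300, 500):
    mle = np.empty(B)
    ols = np.empty(B)
    for i in range(B):
        x = rng.exponential(1.0 / beta_true, n)
        mle[i] = 1.0 / x.mean()                # exponential MLE of the rate
        ols[i] = ols_estimate(x)
    print(n, ((mle - beta_true) ** 2).mean(), ((ols - beta_true) ** 2).mean())
```

As in the tables, both MSE sequences should shrink roughly like 1/n as the sample size grows.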
Table 4. p-values for comparing methods under the relief data.

| Method | β | K.S | p Value |
|---|---|---|---|
| ML | −3.74140 | 0.24686 | 0.17462 |
| LS | −2.92506 | 0.20278 | 0.38332 |
| WLS | −3.79664 | 0.25135 | 0.15970 |
| CVM | −2.94830 | 0.20094 | 0.39460 |
| Moment | −2.41894 | 0.24598 | 0.17768 |
| KE | −2.77326 | 0.21510 | 0.31302 |
Table 5. p-values for comparing methods with the relief/survival data.

| Method | β | K.S | p Value |
|---|---|---|---|
| ML | −2.09206 | 0.09067 | 0.59476 |
| LS | −1.90356 | 0.10705 | 0.38132 |
| WLS | −2.16387 | 0.08551 | 0.66842 |
| CVM | −1.90964 | 0.10651 | 0.38759 |
| Moment | −1.88017 | 0.10914 | 0.35771 |
| KE | −2.08576 | 0.09119 | 0.58722 |
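For completeness, the sketch below writes down two further minimum-distance objectives compared in Tables 4 and 5, WLS and CVM, for a generic one-parameter CDF `cdf(x, beta)`. These follow the usual textbook definitions (the WLS weights shown are the standard choice, an assumption on our part); the paper's own expressions, given earlier in the text, take precedence.

```python
# Hedged sketch: WLS and CVM minimum-distance objectives, to be minimized
# over beta with any scalar optimizer (e.g., scipy.optimize.minimize_scalar).
import numpy as np

def wls_objective(beta, x, cdf):
    x = np.sort(np.asarray(x))
    n = len(x)
    i = np.arange(1, n + 1)
    # Standard weights: inverse variance of the i-th uniform order statistic.
    w = (n + 1) ** 2 * (n + 2) / (i * (n - i + 1))
    return np.sum(w * (cdf(x, beta) - i / (n + 1)) ** 2)

def cvm_objective(beta, x, cdf):
    x = np.sort(np.asarray(x))
    n = len(x)
    i = np.arange(1, n + 1)
    return 1.0 / (12.0 * n) + np.sum((cdf(x, beta) - (2 * i - 1) / (2 * n)) ** 2)
```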
Table 6. MLEs and SEs for the relief times data.

| Models | MLE and StErs | The MLEs (Corresponding SEs) |
|---|---|---|
| | MLE | 0.52555 |
| | StErs | (0.1172) |
| | MLE | 0.60444 |
| | StErs | (0.0536) |
| | MLE | 0.95042 |
| | StErs | (0.15041) |
| | MLE | 0.52635 |
| | StErs | (0.11833) |
| | MLE | 54.474, 2.316 |
| | StErs | (35.582), (0.374) |
| | MLE | 81.633, 0.5419, 3.5138 |
| | StErs | (120.411), (0.336), (1.414) |
| | MLE | 0.1342, 33.251, 0.571, 1.666 |
| | StErs | (0.333), (57.86), (0.667), (1.881) |
| | MLE | 8.868, 34.827, 0.2988, 4.8978 |
| | StErs | (9.15), (22.31), (0.24), (3.18) |
| | MLE | 1.1635, 0.3207 |
| | StErs | (0.334), (0.036) |
| | MLE | −3.740625 |
| | StErs | (1.036281) |
Table 7. AD and CVM for relief times.

| Models | AD | CVM |
|---|---|---|
| E | 4.603 | 0.962 |
| OLE | 1.347 | 0.222 |
| ME | 2.764 | 0.529 |
| BHE | 0.624 | 0.105 |
| MOE | 0.849 | 0.144 |
| BE | 0.738 | 0.124 |
| MOKE | 0.655 | 0.128 |
| KMOE | 1.189 | 0.195 |
| BXE | 1.396 | 0.257 |
Table 8. MLEs and StErs for the survival times data.

| Models | MLE and StErs | The MLEs (Corresponding SEs) |
|---|---|---|
| | MLE | 0.5401 |
| | StErs | (0.0631) |
| | MLE | 0.38153 |
| | StErs | (0.0212) |
| | MLE | 0.92534 |
| | StErs | (0.0777) |
| | MLE | 0.54419 |
| | StErs | (0.0644) |
| | MLE | 8.783, 1.381 |
| | StErs | (3.559), (0.188) |
| | MLE | 0.18, 47.64, 4.47 |
| | StErs | (0.072), (44.91), (1.334) |
| | MLE | 3.3041, 1.1002, 1.0372 |
| | StErs | (1.1064), (0.763), (0.614) |
| | MLE | 0.8074, 3.4616, 1.33139 |
| | StErs | (0.69614), (1.003), (0.862) |
| | MLE | 0.0083, 2.716, 1.986, 0.099 |
| | StErs | (0.003), (1.3163), (0.784), (0.05) |
| | MLE | 0.3732, 3.4783, 3.306, 0.2991 |
| | StErs | (0.135), (0.863), (0.779), (1.112) |
| | MLE | 0.48437, 0.21349 |
| | StErs | (0.06144), (0.01229) |
| | MLE | −2.19453 |
| | StErs | (0.45686) |
Table 9. AD and CVM for survival times.

| Models | AD | CVM |
|---|---|---|
| E | 6.531 | 1.253 |
| OLE | 1.944 | 0.334 |
| ME | 1.523 | 0.256 |
| BHE | 0.714 | 0.115 |
| MOE | 1.185 | 0.177 |
| GMOE | 1.025 | 0.168 |
| KE | 0.745 | 0.157 |
| BE | 0.977 | 0.152 |
| MOKE | 0.791 | 0.137 |
| KMOE | 0.614 | 0.149 |
| BXE | 2.955 | 0.518 |
| QPE | 0.589 | 0.098 |
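The AD and CVM columns of Tables 7 and 9 can be computed directly from the probability-integral transform of the data under each fitted model. A minimal Python sketch, assuming only a callable fitted CDF, follows the standard Anderson–Darling and Cramér–von Mises formulas:

```python
# Minimal sketch: Anderson-Darling (AD) and Cramer-von Mises (CVM)
# statistics for a fitted CDF, as tabulated in Tables 7 and 9.
import numpy as np

def ad_cvm(data, cdf):
    u = np.sort(cdf(np.asarray(data)))   # PIT values, sorted ascending
    n = len(u)
    i = np.arange(1, n + 1)
    # AD: A^2 = -n - (1/n) * sum (2i - 1) [ln u_i + ln(1 - u_{n+1-i})]
    ad = -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))
    # CVM: W^2 = 1/(12n) + sum (u_i - (2i - 1)/(2n))^2
    cvm = 1.0 / (12.0 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)
    return ad, cvm
```

Smaller values of both statistics indicate a better fit, which is how the QPE row is read against the competing models.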
Table 10. Empirical levels and the corresponding theoretical levels.

| n ↓ & ε → | ε1 = 1% | ε2 = 2% | ε3 = 5% | ε4 = 10% |
|---|---|---|---|---|
| n1 = 25 | 0.9935 | 0.9825 | 0.9525 | 0.9031 |
| n2 = 50 | 0.9922 | 0.9818 | 0.9518 | 0.9021 |
| n3 = 150 | 0.9920 | 0.9811 | 0.9513 | 0.9011 |
| n4 = 350 | 0.9907 | 0.9807 | 0.9507 | 0.9009 |
| n5 = 600 | 0.9904 | 0.9803 | 0.9506 | 0.9006 |
| n6 = 1000 | 0.9901 | 0.9802 | 0.9503 | 0.9004 |
Table 11. Simulation results for the empirical levels versus the corresponding theoretical levels.

| n ↓ & ε → | ε1 = 1% | ε2 = 2% | ε3 = 5% | ε4 = 10% |
|---|---|---|---|---|
| n1 = 25 | 0.9930 | 0.9828 | 0.9529 | 0.9026 |
| n2 = 50 | 0.9928 | 0.9814 | 0.9522 | 0.9014 |
| n3 = 150 | 0.9921 | 0.9811 | 0.9512 | 0.9010 |
| n4 = 350 | 0.9915 | 0.9804 | 0.9504 | 0.9007 |
| n5 = 600 | 0.9905 | 0.9802 | 0.9503 | 0.9004 |
| n6 = 1000 | 0.9903 | 0.9801 | 0.9501 | 0.9002 |
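Tables 10 and 11 report, for each n and ε, the proportion of Monte Carlo samples in which the test does not reject at level ε; agreement with 1 − ε indicates that the test holds its nominal level. The sketch below mimics that design under stated assumptions: a simplified equiprobable-cells chi-square statistic stands in for the exact Y², and an exponential model stands in for the QPE null.

```python
# Minimal sketch of the level simulation in Tables 10-11: the proportion of
# non-rejections at level eps should approach 1 - eps as n grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def empirical_level(n, eps, B=5000, r=5):
    crit = stats.chi2.ppf(1 - eps, df=r - 1)
    keep = 0
    for _ in range(B):
        x = rng.exponential(1.0, n)            # stand-in for the QPE null
        beta_hat = 1.0 / x.mean()              # refit under the null
        u = 1.0 - np.exp(-beta_hat * x)        # PIT under the fitted model
        counts, _ = np.histogram(u, bins=np.linspace(0.0, 1.0, r + 1))
        x2 = np.sum((counts - n / r) ** 2 / (n / r))
        keep += int(x2 <= crit)
    return keep / B

print(empirical_level(150, 0.05))  # expected to be close to 0.95
```

Because the parameter is re-estimated from each sample and the N-RR correction is omitted, this simplified statistic is slightly conservative; the exact Y² used in the paper restores the chi-square(r − 1) reference law.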
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).