1. Introduction
Data fitting is an essential procedure in statistical analysis to ensure the precision and dependability of outcomes. Recently, there has been a growing interest in developing innovative models for data fitting, particularly within the unit interval. This statistical approach aims to address the challenges associated with manipulating data within a specific range and offers a more robust framework for analysis. In this manuscript, we present a groundbreaking model for data fitting within the unit interval, which exhibits promising potential for enhancing the quality of statistical inferences.
The proposed distribution is particularly useful for modeling data within the unit interval (0, 1), making it relevant in various applied areas. For instance, in finance, this distribution can be used to model the probability of investment returns, especially in assets with variable risk or volume. In environmental science, it can be applied to model proportion data, such as species coverage in an ecosystem. Additionally, in public health, this distribution can be utilized to analyze data on the incidence or prevalence rates of diseases in a population.
The proposed model is based on the transformation of a random variable that follows a truncated positive normal (TPN) distribution [1]. Using the TPN distribution as the foundation of the suggested model offers several advantages. First, this distribution allows for greater flexibility in data modeling and analysis than the conventional half-normal (HN) distribution. Second, the proposed model incorporates an additional shape parameter that further enhances its flexibility and enables a closer fit. Among the models associated with the normal distribution, the HN model is notable: it emerges as a specific instance of the TPN when the shape parameter equals zero. This property makes the TPN a more adaptable model than the HN.
The objective of this study is to propose a flexible distribution for modeling real-world data with support in the interval (0, 1) using the exponential transformation $X = \exp(-Z)$, where Z follows a TPN distribution. Models to which this transformation is applied are called unitary distributions. Various alternative transformations of Z can also be employed to achieve a unitary distribution. In the current study, the proposed transformation yields closed-form expressions and a simple structure for the distribution of X, and the statistical properties derived from it also take a simple closed form. For instance, the moment-generating function remains closed, unlike with other transformations.
This range encompasses well-established models, such as the beta and Kumaraswamy distributions. Additionally, the literature presents various unitary distribution models, including the Topp–Leone distribution [2], unit-gamma [3], log-Lindley [4], two-parameter unit-logistic [5], two-parameter unit-Birnbaum–Saunders [6], unit-Weibull [7], unit-Gompertz [8], unit-inverse Gaussian [9], unit modified Burr-III [10], one-parameter unit-Lindley [11], alpha-unit [12], modified unit-half-normal [13], unit-half-normal [14], unit-exponential [15], and unit upper truncated Weibull [16].
The proposed model provides a somewhat better fit compared to these models. Another crucial feature of the model is its ability to adequately fit small data sets that exhibit extreme right skewness within the unit interval.
The proposed distribution provides superior fits compared to some established distributions on the unit interval, such as the unit-logistic, beta, and Kumaraswamy distributions. The rationale for introducing the unit-truncated positive normal (UTPN) distribution is based on (i) its ability to model constant, increasing, or inverted bathtub-shaped hazard rates, allowing it to capture different behaviors over time; (ii) its suitability for fitting skewed data that other common distributions may not fit adequately, together with its applicability to a variety of problems in diverse fields, such as environmental studies, industrial reliability, and survival analysis; and (iii) its favorable comparison to three alternative distributions on failure and environmental data in two practical data applications.
The following summary provides an overview of the remaining sections of the paper. The proposed distribution is presented in Section 2 along with a discussion of its basic characteristics. In Section 3, the expected Fisher information matrix and estimators of the unknown parameters by the maximum likelihood (ML) technique are presented. Section 4 conducts Monte Carlo simulations to assess the effectiveness of the ML estimators and the parameters’ asymptotic confidence intervals. Two sets of real-world data are analyzed and presented in Section 5. The paper is finally concluded in Section 6.
2. The Model and Its Properties
In this section, we outline the proposed bounded distribution based on transforming a truncated positive normal variable, as described in [1].
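A random variable Z follows the TPN distribution, denoted as $Z \sim \mathrm{TPN}(\sigma, \lambda)$, if its cumulative distribution function (CDF) is given by
$$F_Z(z; \sigma, \lambda) = \frac{\Phi\left(\frac{z}{\sigma} - \lambda\right) + \Phi(\lambda) - 1}{\Phi(\lambda)}, \tag{1}$$
where $z \geq 0$, $\lambda \in \mathbb{R}$ is a shape parameter, $\sigma > 0$ is a scale parameter, and $\Phi(\cdot)$ represents the CDF of the standard normal (SN) distribution.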
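Let Z be a non-negative random variable following the TPN distribution. Its probability density function (PDF) is expressed as
$$f_Z(z; \sigma, \lambda) = \frac{1}{\sigma\,\Phi(\lambda)}\,\phi\left(\frac{z}{\sigma} - \lambda\right), \quad z \geq 0, \tag{2}$$
where $\phi(\cdot)$ represents the PDF of the SN distribution.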
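The quantile function (the inverse of the CDF given in (1)) of the variable Z is given by
$$Q_Z(p) = \sigma\left[\Phi^{-1}\left(1 + \Phi(\lambda)(p - 1)\right) + \lambda\right], \quad 0 < p < 1,$$
where $\Phi^{-1}(\cdot)$ is the SN distribution’s quantile function.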
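Proposition 1. Using the transformation $X = \exp(-Z)$, we derive a new distribution with support on the interval $(0, 1)$, known as the unit-TPN distribution, denoted as $X \sim \mathrm{UTPN}(\sigma, \lambda)$. Its PDF is defined as
$$f(x; \sigma, \lambda) = \frac{1}{\sigma\, x\, \Phi(\lambda)}\,\phi\left(\frac{\log x}{\sigma} + \lambda\right), \quad 0 < x < 1, \tag{3}$$
where $\lambda \in \mathbb{R}$ and $\sigma > 0$ are the parameters for shape and scale, respectively.
Proof. Using the change-of-variable approach and the symmetry of the SN density ($\phi(-t) = \phi(t)$), the random variable $X = \exp(-Z)$ takes values within the interval $(0, 1)$ and has the density function
$$f(x; \sigma, \lambda) = f_Z(-\log x; \sigma, \lambda)\left|\frac{d(-\log x)}{dx}\right| = \frac{1}{\sigma\, x\, \Phi(\lambda)}\,\phi\left(-\frac{\log x}{\sigma} - \lambda\right) = \frac{1}{\sigma\, x\, \Phi(\lambda)}\,\phi\left(\frac{\log x}{\sigma} + \lambda\right).$$
□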
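The associated CDF and hazard rate (HR) functions for Equation (3) are provided, respectively, by
$$F(x; \sigma, \lambda) = \frac{\Phi\left(\frac{\log x}{\sigma} + \lambda\right)}{\Phi(\lambda)}, \quad 0 < x < 1, \tag{4}$$
and
$$h(x; \sigma, \lambda) = \frac{f(x; \sigma, \lambda)}{1 - F(x; \sigma, \lambda)} = \frac{\phi\left(\frac{\log x}{\sigma} + \lambda\right)}{\sigma\, x\left[\Phi(\lambda) - \Phi\left(\frac{\log x}{\sigma} + \lambda\right)\right]}, \quad 0 < x < 1. \tag{5}$$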
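As a quick numerical illustration, the sketch below evaluates the PDF, CDF, and HR function of a unit-TPN variable in Python. It assumes the forms obtained from $X = \exp(-Z)$ with $Z \sim \mathrm{TPN}(\sigma, \lambda)$, namely $f(x) = \phi(\log x/\sigma + \lambda)/\{\sigma x \Phi(\lambda)\}$ and $F(x) = \Phi(\log x/\sigma + \lambda)/\Phi(\lambda)$. The function names and the parameter values are illustrative choices, not part of the original study.

```python
from statistics import NormalDist
import math

_STD = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi


def utpn_pdf(x, sigma, lam):
    """Assumed UTPN density on (0, 1): phi(log(x)/sigma + lam) / (sigma * x * Phi(lam))."""
    w = math.log(x) / sigma + lam
    return _STD.pdf(w) / (sigma * x * _STD.cdf(lam))


def utpn_cdf(x, sigma, lam):
    """Assumed UTPN CDF on (0, 1]: Phi(log(x)/sigma + lam) / Phi(lam)."""
    w = math.log(x) / sigma + lam
    return _STD.cdf(w) / _STD.cdf(lam)


def utpn_hazard(x, sigma, lam):
    """Hazard rate f(x) / (1 - F(x)), defined for 0 < x < 1."""
    return utpn_pdf(x, sigma, lam) / (1.0 - utpn_cdf(x, sigma, lam))


def midpoint_integral(f, a, b, n=20000):
    """Composite midpoint rule; never evaluates f at the endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Illustrative parameter values (hypothetical, chosen only for the demo).
sigma, lam = 0.8, 0.3
total_mass = midpoint_integral(lambda x: utpn_pdf(x, sigma, lam), 0.0, 1.0)
print("integral of the PDF over (0, 1):", total_mass)
print("hazard at x = 0.5:", utpn_hazard(0.5, sigma, lam))
```

The midpoint rule is used only as a sanity check that the density integrates to one and that differences of the CDF match areas under the PDF.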
2.1. Characterizations of the UTPN Distribution Based on Its Hazard Function
Characterizing a PDF using the hazard function is essential for grasping temporal event dynamics. This linkage offers a valuable understanding of how event probabilities evolve over time and with pertinent factors. Particularly in survival analysis, it aids in predicting survival probabilities, while in reliability engineering, it assists in evaluating failure rates and directing maintenance strategies. In essence, this characterization proves to be a potent instrument for analyzing time-to-event data and facilitating informed decision-making across diverse domains. Various researchers, including Glänzel [17,18] and Hamedani [19], have delved into different techniques for such characterizations of continuous probability distributions.
According to Akhila et al. [20], the PDF and the hazard function are related in the following manner:
$$\frac{f'(x)}{f(x)} = \frac{h'(x)}{h(x)} - h(x), \tag{6}$$
where $f(x)$ is the PDF and $h(x)$ is the hazard function.
For the following result, we will redefine the two earlier functions of the UTPN distribution as follows: Let represent and represent .
Theorem 1. Let be a continuous random variable. Equation (3) provides the PDF of X if and only if the next differential equation is satisfied by its hazard function, : Proof. The PDF
and the hazard function
of
X are given by Equations (
3) and (
5), respectively. Then, we have
Utilizing Equation (
6), we can express
which implies
Now, given that Equation (
7) holds,
from which we derive
□
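The computation behind a characterization of this kind can be sketched as follows. The sketch assumes the density $f(x) = \phi(\log x/\sigma + \lambda)/\{\sigma x \Phi(\lambda)\}$ on $(0, 1)$ and the standard identity $f'(x)/f(x) = h'(x)/h(x) - h(x)$, which follows from $f(x) = h(x)\{1 - F(x)\}$. The resulting equation is one equivalent way of writing the hazard-based condition and may differ in form from the statement of Theorem 1.

```latex
% Log-density of the assumed UTPN PDF, for 0 < x < 1
\log f(x) = -\frac{1}{2}\left(\frac{\log x}{\sigma} + \lambda\right)^{2}
            - \log x - \log\!\left(\sigma\sqrt{2\pi}\,\Phi(\lambda)\right)

% Differentiate with respect to x
\frac{f'(x)}{f(x)} = -\frac{1}{\sigma x}\left(\frac{\log x}{\sigma} + \lambda\right) - \frac{1}{x}
                   = -\frac{\log x + \sigma\lambda + \sigma^{2}}{\sigma^{2} x}

% Combine with f'/f = h'/h - h to obtain a differential equation for h
\frac{h'(x)}{h(x)} - h(x) = -\frac{\log x + \sigma\lambda + \sigma^{2}}{\sigma^{2} x}

% Conversely, integrating f'/f recovers the kernel of the density
f(x) \propto \frac{1}{x}\exp\!\left\{-\frac{1}{2}\left(\frac{\log x}{\sigma} + \lambda\right)^{2}\right\}
```

The last line shows why the condition is also sufficient: integrating the log-derivative returns the UTPN kernel, and the normalizing constant is fixed by requiring the density to integrate to one on $(0, 1)$.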
2.2. Shapes
The UTPN distribution’s PDF is unimodal and log-concave. Indeed, the second derivative of
is
Figure 1 and Figure 2 present the different curves for the PDF and the HR function, respectively, of the UTPN distribution for varying values of $\sigma$ and $\lambda$. Figure 1 reveals that there are four possible behaviors for the UTPN distribution: increasing, unimodal, reversed J-shaped, and right-skewed. Figure 2 demonstrates that the HR function of the UTPN distribution can have an inverted bathtub shape, be increasing, or remain constant. An advantage of the UTPN distribution over the TPN distribution is that the latter is unable to describe situations with an inverted bathtub-shaped hazard function.
2.3. Quantile Function
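The quantile function of the UTPN distribution is derived by inverting Equation (4) as follows:
$$Q(p) = \exp\left\{\sigma\left[\Phi^{-1}\left(p\,\Phi(\lambda)\right) - \lambda\right]\right\}, \quad 0 < p < 1.$$
Note that $Q(0.5)$, $Q(0.25)$, and $Q(0.75)$ are the median, first quartile, and third quartile of the UTPN distribution, respectively.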
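The inversion can be checked numerically. The following sketch assumes the CDF $F(x) = \Phi(\log x/\sigma + \lambda)/\Phi(\lambda)$, whose inverse is $Q(p) = \exp\{\sigma[\Phi^{-1}(p\,\Phi(\lambda)) - \lambda]\}$, and verifies the round trip $F(Q(p)) = p$ at the three quartiles. The function names and the parameter values are hypothetical.

```python
from statistics import NormalDist
import math

_STD = NormalDist()  # standard normal


def utpn_cdf(x, sigma, lam):
    """Assumed UTPN CDF: Phi(log(x)/sigma + lam) / Phi(lam)."""
    return _STD.cdf(math.log(x) / sigma + lam) / _STD.cdf(lam)


def utpn_quantile(p, sigma, lam):
    """Inverse of the assumed CDF, valid for 0 < p < 1."""
    return math.exp(sigma * (_STD.inv_cdf(p * _STD.cdf(lam)) - lam))


# Illustrative parameter values (hypothetical).
sigma, lam = 1.2, -0.5
quartiles = {p: utpn_quantile(p, sigma, lam) for p in (0.25, 0.5, 0.75)}
for p, q in quartiles.items():
    # The third column should reproduce p up to rounding error.
    print(p, q, utpn_cdf(q, sigma, lam))
```

Because `NormalDist.inv_cdf` requires an argument strictly between 0 and 1, the function is only defined for $0 < p < 1$, which matches the open support of the distribution.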
2.4. Mode
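The mode of $f(x)$ is the root of the equation
$$\frac{d}{dx}\log f(x) = -\frac{\log x + \sigma\lambda + \sigma^{2}}{\sigma^{2} x} = 0.$$
Therefore, if $\lambda > -\sigma$, the root
$$x_0 = \exp\{-\sigma(\sigma + \lambda)\}$$
lies in $(0, 1)$; this means that $x_0$ is the sole point where $f(x)$ reaches its maximum.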
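A small numerical check of the mode is sketched below. It assumes the density $f(x) = \phi(\log x/\sigma + \lambda)/\{\sigma x \Phi(\lambda)\}$, for which setting the derivative of $\log f(x)$ to zero gives $x_0 = \exp\{-\sigma(\sigma + \lambda)\}$, an interior point of $(0, 1)$ only when $\lambda > -\sigma$. The helper names and the grid search are illustrative assumptions, used here only to compare against the closed form.

```python
from statistics import NormalDist
import math

_STD = NormalDist()  # standard normal


def utpn_pdf(x, sigma, lam):
    """Assumed UTPN density on (0, 1)."""
    w = math.log(x) / sigma + lam
    return _STD.pdf(w) / (sigma * x * _STD.cdf(lam))


def utpn_mode(sigma, lam):
    """Closed-form interior mode, or None when the density increases on all of (0, 1)."""
    if lam <= -sigma:
        return None
    return math.exp(-sigma * (sigma + lam))


def grid_argmax(sigma, lam, n=20000):
    """Brute-force maximizer of the density on a midpoint grid of (0, 1)."""
    grid = ((i + 0.5) / n for i in range(n))
    return max(grid, key=lambda x: utpn_pdf(x, sigma, lam))


# Illustrative parameter values (hypothetical).
print(utpn_mode(0.8, 0.3), grid_argmax(0.8, 0.3))    # interior mode
print(utpn_mode(0.5, -1.0), grid_argmax(0.5, -1.0))  # no interior mode
```

In the second call the grid maximizer sits at the right end of the grid, which agrees with the increasing density shape reported for some parameter values.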
2.5. Hazard Rate Function
Lemma 1. Given that , for , is a twice-differentiable density function of a positive real-valued continuous random variable with an HR function , let . Consequently, (i) if decreases (increases) as x increases, then increases (decreases) as x increases, and (ii) if follows a bathtub (inverted bathtub) pattern, then will also follow a bathtub (inverted bathtub) pattern.
The proof of this result is provided by Glaser [21]. Based on this finding, the shape of the HR function of the UTPN distribution can be inferred as follows.
Proposition 2. The HR function of the UTPN distribution exhibits an inverted bathtub shape.
Proof. Given that
it follows that
Consequently, by
, the global maximum of
is
, because
This demonstrates that
exhibits an inverted bathtub shape. Therefore, according to Glaser’s Lemma,
also exhibits an inverted bathtub shape. Additionally, with
for
,
, and
, it follows that
is an increasing function. □
2.6. Moments and Moment Generating Function
Here, we derive the expressions for the moments and moment-generating function of the distribution, which are crucial for any statistical analysis, particularly in applied research. The distribution’s moments, including its mean, variance, skewness, and kurtosis, provide insight into its most significant characteristics.
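Proposition 3. If the random variable X follows a UTPN distribution, its r-th moment about zero can be calculated as
$$E(X^{r}) = \exp\left(\frac{r^{2}\sigma^{2}}{2} - r\sigma\lambda\right)\frac{\Phi(\lambda - r\sigma)}{\Phi(\lambda)}, \quad r = 1, 2, \ldots$$
Proof. Using the stochastic representation $X = \exp(-Z)$, where $Z \sim \mathrm{TPN}(\sigma, \lambda)$, one can write
$$E(X^{r}) = E\left(e^{-rZ}\right) = \frac{1}{\sigma\,\Phi(\lambda)}\int_{0}^{\infty} e^{-rz}\,\phi\left(\frac{z}{\sigma} - \lambda\right)dz,$$
and defining $u = z/\sigma - \lambda$ implies
$$E(X^{r}) = \frac{e^{-r\sigma\lambda}}{\Phi(\lambda)}\int_{-\lambda}^{\infty} e^{-r\sigma u}\,\phi(u)\,du = \frac{e^{r^{2}\sigma^{2}/2 - r\sigma\lambda}}{\Phi(\lambda)}\int_{-\lambda}^{\infty}\phi(u + r\sigma)\,du = \exp\left(\frac{r^{2}\sigma^{2}}{2} - r\sigma\lambda\right)\frac{\Phi(\lambda - r\sigma)}{\Phi(\lambda)}.$$
This is accomplished using some simplifications and minor algebraic manipulation. □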
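To make the moment calculation concrete, the sketch below compares a closed form with direct numerical integration. The closed form used, $E(X^{r}) = \exp(r^{2}\sigma^{2}/2 - r\sigma\lambda)\,\Phi(\lambda - r\sigma)/\Phi(\lambda)$, is derived here from $X = \exp(-Z)$ with $Z \sim \mathrm{TPN}(\sigma, \lambda)$ by completing the square. It is stated as an assumption of this example, as are the function names and the parameter values.

```python
from statistics import NormalDist
import math

_STD = NormalDist()  # standard normal


def utpn_pdf(x, sigma, lam):
    """Assumed UTPN density on (0, 1)."""
    w = math.log(x) / sigma + lam
    return _STD.pdf(w) / (sigma * x * _STD.cdf(lam))


def utpn_moment(r, sigma, lam):
    """Assumed closed form for E(X^r)."""
    return (math.exp(0.5 * (r * sigma) ** 2 - r * sigma * lam)
            * _STD.cdf(lam - r * sigma) / _STD.cdf(lam))


def utpn_moment_numeric(r, sigma, lam, n=20000):
    """E(X^r) by the composite midpoint rule on (0, 1)."""
    h = 1.0 / n
    return h * sum(((i + 0.5) * h) ** r * utpn_pdf((i + 0.5) * h, sigma, lam)
                   for i in range(n))


# Illustrative parameter values (hypothetical).
sigma, lam = 0.8, 0.3
mean = utpn_moment(1, sigma, lam)
variance = utpn_moment(2, sigma, lam) - mean ** 2
print("mean:", mean, "variance:", variance)
for r in (1, 2, 3):
    print(r, utpn_moment(r, sigma, lam), utpn_moment_numeric(r, sigma, lam))
```

The mean and variance follow from the first two raw moments in the usual way; skewness and kurtosis can be built from the first four raw moments with the same function.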
Corollary 1. (i) Let ; then, the mean, variance, skewness (), and kurtosis () coefficients are, respectively, given as(ii) As a result of the central limit theorem, let be independent random variables, and utilizing the identical distribution of , then, if , one haswhere In
Figure 3, the UTPN distribution’s mean, variance, skewness, and kurtosis are displayed as functions of
and
. Observations indicate that as
varies, the mean decreases independently of the values of
. In terms of variance, the curves exhibit concavity and unimodality for all
values, with values decreasing as
decreases. On the other hand, negative skewness is observed for
, while positive skewness is observed for
. In addition, smaller
values are associated with reduced skewness, while larger
values are associated with increased skewness. Finally, kurtosis decreases as
decreases, which occurs for
(approximately).
Proposition 4. If the random variable X is UTPN distributed, then its moment-generating function is given by Proof. The moment-generating function’s definition implies
using the exponential series
and taking the change of variables
□
2.7. Curves of Bonferroni and Lorenz
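Bonferroni and Lorenz curves [22] are commonly used in economics to study income and poverty, although they are also helpful in other domains, including reliability, insurance, demography, and medicine. The definition of the Bonferroni curve is
$$B(p) = \frac{1}{p\,\mu}\int_{0}^{q} x\,f(x)\,dx, \quad 0 < p \leq 1,$$
where $\mu = E(X)$ and $q = F^{-1}(p)$. The Lorenz curve is obtained by the expression $L(p) = p\,B(p)$. Specifically, the Bonferroni curve for the UTPN distribution can be calculated as
$$B(p) = \frac{\Phi\left(\Phi^{-1}\left(p\,\Phi(\lambda)\right) - \sigma\right)}{p\,\Phi(\lambda - \sigma)}.$$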
Figure 4 illustrates the Bonferroni curve for the UTPN distribution with
, showing different values for
. It is clear that the Bonferroni value increases as
decreases. Additionally, the graph indicates that as the probability
p increases, the Bonferroni value also increases.
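The behavior of the curve can be explored numerically. The sketch below assumes the closed form $B(p) = \Phi(\Phi^{-1}(p\,\Phi(\lambda)) - \sigma)/\{p\,\Phi(\lambda - \sigma)\}$, obtained here from the definition $B(p) = (p\mu)^{-1}\int_{0}^{q} x f(x)\,dx$ with $q = F^{-1}(p)$ and the assumed density of $X = \exp(-Z)$. It cross-checks that form against direct integration; all function names and parameter values are hypothetical.

```python
from statistics import NormalDist
import math

_STD = NormalDist()  # standard normal


def utpn_pdf(x, sigma, lam):
    """Assumed UTPN density on (0, 1)."""
    w = math.log(x) / sigma + lam
    return _STD.pdf(w) / (sigma * x * _STD.cdf(lam))


def utpn_quantile(p, sigma, lam):
    """Assumed UTPN quantile function, 0 < p < 1."""
    return math.exp(sigma * (_STD.inv_cdf(p * _STD.cdf(lam)) - lam))


def bonferroni_closed(p, sigma, lam):
    """Assumed closed form of the Bonferroni curve, 0 < p < 1."""
    a = _STD.inv_cdf(p * _STD.cdf(lam))
    return _STD.cdf(a - sigma) / (p * _STD.cdf(lam - sigma))


def partial_mean(upper, sigma, lam, n=20000):
    """Midpoint-rule value of the integral of x * f(x) over (0, upper)."""
    h = upper / n
    return h * sum((i + 0.5) * h * utpn_pdf((i + 0.5) * h, sigma, lam)
                   for i in range(n))


def bonferroni_numeric(p, sigma, lam):
    """Bonferroni curve straight from its definition."""
    q = utpn_quantile(p, sigma, lam)
    return partial_mean(q, sigma, lam) / (p * partial_mean(1.0, sigma, lam))


def lorenz(p, sigma, lam):
    """Lorenz curve L(p) = p * B(p)."""
    return p * bonferroni_closed(p, sigma, lam)


# Illustrative parameter values (hypothetical).
for p in (0.1, 0.5, 0.9):
    print(p, bonferroni_closed(p, 0.8, 0.3), bonferroni_numeric(p, 0.8, 0.3))
```

Under these assumptions the curve is increasing in $p$ and bounded by one, so the Lorenz curve lies below the diagonal, as expected for a concentration curve.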
2.8. Entropy
The entropy of a random variable X with a given PDF measures the variation of its uncertainty. A high entropy value indicates greater uncertainty in the data. The Rényi entropy [23], $I_R(\gamma)$, for X is defined as
$$I_R(\gamma) = \frac{1}{1 - \gamma}\log\int_{0}^{1} f(x)^{\gamma}\,dx, \tag{18}$$
where $\gamma > 0$ and $\gamma \neq 1$. Suppose X has the UTPN distribution; then, by substituting (3) in (18), we obtain
So one obtains the Rényi entropy as follows:
Shannon entropy [
24] defined by
is the particular case of Equation (
18) when
. Then calculating the
and using L’Hospital’s rule, after some algebraic work, one obtains the result that
4. Simulation Study
This section examines the performance of the MLEs and the asymptotic confidence intervals for the parameters indexing the UTPN distribution through Monte Carlo simulations. The sample size is set at
, 35, 50, 100, 200, and 500, while the parameters are fixed at
−
, 0.3, and 4.5, with
and 2.3. For each combination,
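pseudo-random samples are generated from the UTPN distribution using the inverse CDF method, meaning
$$x = \exp\left\{\sigma\left[\Phi^{-1}\left(u\,\Phi(\lambda)\right) - \lambda\right]\right\},$$
where $u$ is a uniform $(0, 1)$ observation.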
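The sampling step can be sketched in Python as follows. The example assumes the quantile form $Q(u) = \exp\{\sigma[\Phi^{-1}(u\,\Phi(\lambda)) - \lambda]\}$ and the CDF $F(x) = \Phi(\log x/\sigma + \lambda)/\Phi(\lambda)$. The function names, the seed, the sample size, and the parameter values are illustrative and do not reproduce the design of the study.

```python
from statistics import NormalDist, fmean
import math
import random

_STD = NormalDist()  # standard normal


def utpn_cdf(x, sigma, lam):
    """Assumed UTPN CDF: Phi(log(x)/sigma + lam) / Phi(lam)."""
    return _STD.cdf(math.log(x) / sigma + lam) / _STD.cdf(lam)


def utpn_sample(n, sigma, lam, rng):
    """Draw n values by the inverse CDF method, x = Q(u) with u ~ U(0, 1)."""
    phi_lam = _STD.cdf(lam)
    draws = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # inv_cdf needs an argument strictly inside (0, 1)
            u = rng.random()
        draws.append(math.exp(sigma * (_STD.inv_cdf(u * phi_lam) - lam)))
    return draws


# Illustrative settings (hypothetical).
rng = random.Random(2024)
sample = utpn_sample(20000, 0.8, 0.3, rng)

# Probability integral transform: F(X) should look uniform on (0, 1),
# so its sample mean should be close to 0.5.
pit_mean = fmean(utpn_cdf(x, 0.8, 0.3) for x in sample)
print("min:", min(sample), "max:", max(sample), "mean of F(X):", pit_mean)
```

Passing an explicit `random.Random` instance keeps each replication reproducible, which is convenient when bias, RMSE, and coverage are averaged over many simulated samples.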
To assess the performance of the MLEs and their asymptotic confidence intervals, the bias (Bias), standard error (SE), root mean squared error (RMSE), and coverage probability (CP) of the 95% confidence intervals are calculated. The results are reported in Table 1.
The simulation study, detailed in Table 1, shows how the ML estimates for the UTPN distribution behave under varying sample sizes ($n$) and true values of the scale parameter ($\sigma$). Overall, the estimates converge as the sample size increases. When $\lambda$ is positive, for the smaller true value of $\sigma$, the estimates of $\sigma$ and $\lambda$ stabilize as $n$ grows: the bias diminishes, the SE decreases, and the RMSE shrinks, indicating that the ML estimates are dependable and accurate. The CP of the 95% confidence intervals approaches the nominal level. Similarly, for the larger true value of $\sigma$, the bias, SE, and RMSE decrease as the sample size increases, reflecting the consistency and efficiency of the estimation approach, and the CP of the confidence intervals remains close to the expected 95% level. When $\lambda$ is negative, the asymptotic convergence is slow for small sample sizes. In summary, the simulation results support the use of ML estimation for the UTPN distribution, which yields reliable parameter estimates as the sample size increases, regardless of the true value of the scale parameter.
6. Conclusions
In numerous applied scientific fields, various metrics, such as indicators, percentages, proportions, ratios, and rates, measured on the scale of (0, 1) serve as crucial study variables for characterizing diverse phenomena. However, the current statistical literature offers limited model options for handling these variables. The beta and Kumaraswamy distributions are two of the main models. This study introduces a flexible two-parameter probability distribution with a bounded domain, derived using an exponential transformation of a truncated positive normal variable. This transformation yields statistical properties of the distribution in simple, closed form. We investigate several statistical properties of the proposed distribution and carry out maximum likelihood analyses on two practical datasets. Notably, the findings demonstrate that the proposed distribution exhibits greater flexibility than commonly used statistical distributions, such as the beta, Kumaraswamy, and unit-logistic distributions. Particularly in the modeling of small samples, the results underscore the superior performance of the UTPN distribution. On the other hand, the study’s findings suggest that the UTPN distribution may not be ideal for modeling lifetime data with a decreasing HR function or a bathtub-shaped HR, which includes burn-in and wear-out phases along with extended periods of low, constant hazard. Therefore, further research may aim to improve the distribution to overcome these limitations. Moreover, future research could explore the derivation of alternative models from the TPN distribution using different transformations, as well as the examination of the proposed model within the quantile regression framework.