1. Introduction
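The Gumbel distribution, also known as the type-I generalized extreme value distribution, is commonly used to model data with extreme observations. This distribution and its extensions have a wide range of applications in disciplines such as hydrology, economics, finance, climatology and seismology. The probability density function (pdf), the cumulative distribution function (cdf) and the quantile function of a random variable X that follows the Gumbel distribution are given by
$$f(x) = \frac{1}{\sigma}\exp\left\{-\frac{x-\mu}{\sigma} - \exp\left(-\frac{x-\mu}{\sigma}\right)\right\}, \quad F(x) = \exp\left\{-\exp\left(-\frac{x-\mu}{\sigma}\right)\right\}, \quad Q(p) = \mu - \sigma\log(-\log p),$$
where $x, \mu \in \mathbb{R}$, $0 < p < 1$
and $\sigma > 0$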
. Applications of this model in different scenarios can be found in Bhaskaran et al. [1], Gurung et al. [2], Purohit et al. [3], Li et al. [4] and Kang et al. [5].
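As a quick numerical companion to the three functions above, the following minimal Python sketch implements the standard Gumbel (maxima) pdf, cdf and quantile function. The names `mu` (location) and `sigma` (scale), as well as the function names, are assumptions made for this illustration and are not taken from the original text.

```python
import math


def gumbel_pdf(x, mu=0.0, sigma=1.0):
    """Density of the Gumbel (type-I extreme value, maxima) distribution."""
    z = (x - mu) / sigma
    return math.exp(-z - math.exp(-z)) / sigma


def gumbel_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function F(x) = exp(-exp(-(x - mu) / sigma))."""
    z = (x - mu) / sigma
    return math.exp(-math.exp(-z))


def gumbel_quantile(p, mu=0.0, sigma=1.0):
    """Inverse of the cdf: Q(p) = mu - sigma * log(-log(p)), for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return mu - sigma * math.log(-math.log(p))


# Round trip: the cdf evaluated at the p-th quantile returns p.
median = gumbel_quantile(0.5, mu=2.0, sigma=1.5)
print(median, gumbel_cdf(median, mu=2.0, sigma=1.5))
```

The closed-form quantile function is what makes a quantile-based reparametrization of Gumbel-type models convenient.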
Several extensions of the Gumbel distribution have recently been proposed in the literature. Hossam et al. [6] presented a statistical model that combines the new alpha power transformation method with the Gumbel distribution. Watthanawisut and Bodhisuwan [7] proposed a new extension of the so-called Topp-Leone Gumbel distribution that is used to model minimum flow data. Fayomi et al. [8] presented the exponentiated Gumbel-G family of distributions and explored a special case called EGuNH. Nagode et al. [9] introduced a three-parameter Gumbel distribution, which was applied to rope failure data. Oseni and Okasha [10] derived the Gumbel-geometric distribution, which was applied to precipitation and maximum annual wind speed data. Note that none of these extensions considers a regression framework; their main objective is to fit univariate data.
Regression models have become relevant tools in the era of data science. Among them, the so-called quantile regression models (introduced by Koenker and Bassett [11]) are an alternative to the usual regression techniques, in which the mean response conditional on the values of covariates (or explanatory variables) is estimated. Quantile regression models allow us to measure the effects of covariates at different quantiles of the response variable distribution. Thus, they provide an analysis across the entire conditional distribution, as can be seen in Cade et al. [12], Koenker [13] and Wei et al. [14]. The mean, as the only summary measure, is generally quite poor for assessing risk, as it is greatly affected by the presence of outliers. Outliers may be rare, but they can be enough to cause serious problems in the analysis; see, for example, Gómez-Déniz et al. [15], who analyzed extreme values in insurance companies. To our knowledge, there are no studies on quantile regression models based on the Gumbel distribution. Thus, the objectives of this work were to introduce a new generalization of the truncated Gumbel distribution and then to establish a quantile regression model based on this generalization. To do this, we reparametrized the new generalization by incorporating a parameter that represents the quantile. The proposed generalization builds on the work of Neamah and Qasim [16] and the transformation provided by Cooray and Ananda [17]. The latter authors developed an extension of the half-normal (HN) distribution through the relation
, where
.
The rest of the paper is organized as follows. In Section 2, we introduce our proposal, the generalized truncated Gumbel (GTG) distribution, and present several important properties of this new model. In Section 3, inference is performed, including initial points for obtaining the maximum likelihood (ML) estimators and the observed Fisher information matrix for the proposed model. In Section 4, the model reparametrized in terms of a quantile is presented. In Section 5, we discuss a simulation study carried out to analyze the performance of the ML estimators in finite samples for the proposed model, without and with covariates. In Section 6, two real-data applications illustrate the proposed models, without and with covariates. Finally, in Section 7, some concluding comments are presented.
2. Generalized Truncated Gumbel Distribution
Neamah and Qasim [16] derived a new model with positive support from the Gumbel distribution by truncating its pdf from the left. We will refer to the resulting model as the truncated Gumbel (TG) distribution, which is defined in the interval
. Under the reparametrization
, the pdf of the TG distribution can be written as follows:
where
is a scale parameter,
is a shape parameter, and
and
are the pdf and cdf for the standard Gumbel distribution, respectively.
In this work, we considered the transformation developed by Cooray and Ananda [17] to extend the TG distribution. That is, we considered the transformation
, where
. We will refer to this extension as the generalized truncated Gumbel (GTG) distribution. Important functions, such as the pdf, cdf, hazard and quantile functions of the GTG distribution are provided below.
2.1. Pdf, cdf and Hazard Function
Proposition 1. Let . Then, the pdf of Z is given bywhere , and . Proof. Considering the stochastic representation of
Z, we have that
. Then,
Therefore, the result is obtained by replacing in . □
Remark 1. We previously mentioned that if , where . Thus, when , we obtain the TG distribution; that is, .
Proposition 2. Let . Then, the cdf and hazard function of Z are given byandrespectively, for all . Proof. Both functions are obtained immediately from their definitions. □
Figure 1 shows the pdf, cdf and hazard function for the
model, considering some combinations for
and
. We observe that the GTG model can have decreasing or unimodal shapes for the pdf, whereas for the hazard function, we can have decreasing or increasing shapes. Also, we observe that for some combinations of
and
, the cdf rapidly increases, although all of them tend to 1 when
z increases.
2.2. Mode
The shape of the pdf of
can be examined based on its critical points. By computing the first derivative of
with respect to
z, where
is the pdf for the GTG model, we obtain that
where
. By equating the previous expression to 0, we obtain that
from which the mode of
Z can be obtained numerically. The nature of these points is determined by
, where
is given by
Depending on whether
or
, where
is a solution of Equation (5), the critical points can be local maxima or minima.
Figure 2 shows the shape of
for
and selected values of
and
. From here, we observe that the pdf of the GTG distribution is zero when
, whether
takes a positive or negative value.
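To illustrate how a mode can be located numerically, the sketch below maximizes a unimodal density with a derivative-free golden-section search. This is an alternative to solving the first-derivative equation directly, and it uses a Weibull density purely as a stand-in with positive support and a known closed-form mode; it is not the GTG density, and the function names are hypothetical.

```python
import math


def weibull_pdf(x, shape, scale):
    """Stand-in unimodal density on (0, inf); NOT the GTG pdf."""
    if x <= 0.0:
        return 0.0
    u = x / scale
    return (shape / scale) * u ** (shape - 1.0) * math.exp(-(u ** shape))


def golden_section_max(f, lo, hi, tol=1e-10):
    """Locate the maximizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


# For shape k > 1 the Weibull mode is scale * ((k - 1) / k) ** (1 / k).
mode = golden_section_max(lambda x: weibull_pdf(x, 2.0, 1.0), 0.0, 5.0)
print(mode, math.sqrt(0.5))
```

The search assumes a single interior maximum on the bracketing interval; for a decreasing density the maximizer would sit at the left boundary instead.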
2.3. Quantiles
Proposition 3. If , then the quantile function of Z is given by Proof. It follows from a direct computation, by applying the definition of the quantile function. □
Corollary 1. The quartiles of the GTG distribution are as follows:
- 1.
(First quartile) .
- 2.
(Median) .
- 3.
(Third quartile) .
Proof. It is immediate from Proposition 3. □
2.4. Moments
Proposition 4. Let and n be a positive integer. Then, the n-th moment of Z is given bywhere ,, and is the generalized binomial coefficient. When , the sum in stops at . Proof. Given the stochastic representation of
Z, it is immediate that
, where
. Then, the
moment of
Y can be computed by following the properties presented in Neamah and Qasim [16]. □
Corollary 2. If , then the first four moments and the variance of Z are obtained as follows:
- 1.
- 2.
- 3.
- 4.
- 5.
.
Proof. It is immediate from Proposition 4. □
Corollary 3. Let . Then, the skewness coefficient () and the kurtosis coefficient () are given bywhere , and , for . Proof. The expressions above are obtained using the definitions of the skewness and kurtosis coefficients; that is,
where
, for
, are given in Corollary 2. □
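The definitions used in this proof translate directly into code. The following minimal sketch computes the skewness and kurtosis coefficients from the first four raw moments of any distribution; the function name is hypothetical, and the example uses the unit exponential distribution (raw moments 1, 2, 6, 24) only because its coefficients are known.

```python
def skewness_kurtosis(m1, m2, m3, m4):
    """Skewness and kurtosis coefficients from raw moments m_k = E(Z^k)."""
    var = m2 - m1 ** 2
    if var <= 0.0:
        raise ValueError("the raw moments imply a non-positive variance")
    # Third and fourth central moments expressed through raw moments.
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return mu3 / var ** 1.5, mu4 / var ** 2


# Unit exponential distribution: E(Z^k) = k!
print(skewness_kurtosis(1.0, 2.0, 6.0, 24.0))
```

Note that the kurtosis returned here is the raw (non-excess) coefficient, so a normal distribution would give the value 3.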
Remark 2. Proposition 4 shows that the moments of the GTG distribution depend essentially on the moments of the TG model. Plots of the expected value, variance, skewness and kurtosis coefficients of the GTG model are given in Figure 3 for different values of the λ and α parameters. The bottom plots in Figure 3 reflect the effect of the α parameter: a lower value of α produces higher values of the skewness and kurtosis coefficients. This can also be seen in Table 1 and Table 2.
2.5. Bonferroni Curves
In different disciplines, such as socio-economics and public health sciences, it is often necessary to compare and analyze the inequality of non-negative distributions. Bonferroni curves are commonly used as a graphical method for this purpose (see Bonferroni [18] and Arcagni and Porro [19] for further discussion of these curves). The following result gives the expression of these curves for the GTG model.
Proposition 5. If , then the Bonferroni curves, say , are given bywhere , , and . Proof. The expression above is obtained using the definition of the Bonferroni curves; that is,
where
is the expected value of the corresponding non-negative random variable, and
. □
Figure 4 shows the Bonferroni curves for the
model, considering different values for
and
.
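For readers who want to trace such curves numerically, the sketch below evaluates the usual definition of the Bonferroni curve, B(p) = (1 / (p·μ)) ∫₀ᵖ Q(t) dt, with a midpoint rule. It is written for a generic quantile function Q and mean μ; the unit exponential distribution is used as a stand-in because its curve has a closed form, so nothing here is specific to the GTG model, and the function names are hypothetical.

```python
import math


def bonferroni_curve(quantile, mean, p, n_grid=20000):
    """Midpoint-rule approximation of B(p) = (1 / (p * mean)) * int_0^p Q(t) dt."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    h = p / n_grid
    integral = h * sum(quantile((j + 0.5) * h) for j in range(n_grid))
    return integral / (p * mean)


def exp_quantile(t):
    """Quantile function of the unit exponential distribution (stand-in)."""
    return -math.log(1.0 - t)


# Closed form for the unit exponential: B(p) = 1 + (1 - p) * log(1 - p) / p.
p = 0.5
print(bonferroni_curve(exp_quantile, 1.0, p), 1.0 + (1.0 - p) * math.log(1.0 - p) / p)
```

Any model with a computable quantile function and mean can be plugged into `bonferroni_curve` in the same way.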
3. Inference for the GTG Distribution
In this section, we discuss the maximum likelihood (ML) approach for parameter estimation in the GTG model.
3.1. Maximum Likelihood Estimators
Let
be a random sample of size
n from the
model. Then, the log-likelihood function for
is given by
where
. Therefore, the score assumes the form
, where
and
The ML estimators are then obtained by numerically solving the equation
, where
denotes a vector of zeros with length
p. Solutions for Equations (9)–(11) can be obtained using numerical procedures in R [20], such as the Newton–Raphson method. To initialize the numerical algorithm that solves
, in the next subsection, we propose an initial point for the vector
.
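The root-finding step can be sketched on a simpler stand-in. The Python code below applies a Newton–Raphson iteration to the ML equations of the ordinary two-parameter Gumbel distribution (location `mu`, scale `sigma`), not to the GTG score: the location is profiled out in closed form, and the remaining one-dimensional equation in the scale is solved with a central-difference slope in place of an analytic derivative. All function names and the simulated data are assumptions made for illustration.

```python
import math
import random


def simulate_gumbel(n, mu, sigma, seed=0):
    """Draw n Gumbel(mu, sigma) values by inverting the cdf."""
    rng = random.Random(seed)
    return [mu - sigma * math.log(-math.log(rng.random())) for _ in range(n)]


def gumbel_mle(x, tol=1e-9, max_iter=100):
    """ML estimates (mu, sigma) for the Gumbel stand-in via Newton-Raphson."""
    n = len(x)
    xbar = sum(x) / n

    def g(s):
        # Profile score equation in the scale: g(s) = 0 at the ML estimate.
        w = [math.exp(-xi / s) for xi in x]
        return s - xbar + sum(xi * wi for xi, wi in zip(x, w)) / sum(w)

    # Moment-based starting value for the scale: sd * sqrt(6) / pi.
    sd = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    s = sd * math.sqrt(6.0) / math.pi
    for _ in range(max_iter):
        h = 1e-6 * s
        slope = (g(s + h) - g(s - h)) / (2.0 * h)  # numerical derivative
        step = g(s) / slope
        while s - step <= 0.0:  # keep the scale positive
            step *= 0.5
        s -= step
        if abs(step) < tol:
            break
    # Closed-form location given the scale.
    mu = -s * math.log(sum(math.exp(-xi / s) for xi in x) / n)
    return mu, s


data = simulate_gumbel(2000, mu=2.0, sigma=1.5, seed=1)
print(gumbel_mle(data))
```

For a three-parameter model such as the GTG, the same idea applies to the full score vector, with the scalar slope replaced by a Jacobian (or Hessian) matrix.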
3.2. Initial Points
In this subsection, we propose quantile-based estimators for the GTG distribution. These estimators are an alternative to the moment estimators, and they serve as initial values for computing the maximum likelihood estimators of the GTG distribution.
Let
and
be the sample quartiles that are based on
. Initial values for
can be obtained by equating the sample quartiles with the theoretical quartiles. The resulting equations are given by
and
The solutions for
and
, say
and
, can be expressed in terms of
(the solution for
) as follows:
whereas
is obtained from the non-linear equation
Therefore, the initial point based on this method is given by .
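The quartile-matching idea can be illustrated on a stand-in model with a closed-form quantile function. The sketch below does this for the ordinary two-parameter Gumbel distribution, whose quantile function is Q(p) = μ − σ log(−log p): the interquartile range fixes the scale and the median then fixes the location. This is not the three-parameter GTG system described above, and the function names are hypothetical.

```python
import math
import statistics


def gumbel_from_quartiles(q1, q2, q3):
    """Match Gumbel quartiles to (q1, q2, q3); returns (mu0, sigma0)."""
    # Q(0.75) - Q(0.25) = sigma * [log(log 4) - log(log(4 / 3))]
    spread = math.log(math.log(4.0)) - math.log(math.log(4.0 / 3.0))
    sigma0 = (q3 - q1) / spread
    # Q(0.5) = mu - sigma * log(log 2)
    mu0 = q2 + sigma0 * math.log(math.log(2.0))
    return mu0, sigma0


def gumbel_initial_point(sample):
    """Initial values computed from the sample quartiles."""
    q1, q2, q3 = statistics.quantiles(sample, n=4)
    return gumbel_from_quartiles(q1, q2, q3)


print(gumbel_initial_point([1.2, 0.4, 3.1, 2.2, 1.9, 0.8, 2.7, 1.5, 4.0, 1.1]))
```

Because sample quartiles are robust to extreme observations, starting values of this kind tend to be stable even when moment-based starts are not.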
3.3. Observed Fisher Information Matrix
The asymptotic variance of the ML estimators, say
, can be estimated from the observed Fisher information matrix defined as
, with
given in Equation (8). Under regularity conditions,
where
stands for convergence in distribution, and
denotes the standard trivariate normal distribution (see Wang et al. [21]). Moreover,
can be estimated from the matrix
, whose elements are given by
,
, and so on. Explicitly, we have that
where
.
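When analytic second derivatives are tedious, the observed information can also be approximated numerically. The sketch below builds a finite-difference Hessian of a log-likelihood, negates it, and inverts it to obtain standard errors. A two-parameter normal model is used as a stand-in because its ML estimates and information matrix are known in closed form; the GTG case would need a 3×3 inverse and the GTG log-likelihood instead. All function names and the data are illustrative assumptions.

```python
import math


def normal_loglik(theta, data):
    """Stand-in log-likelihood (normal model, additive constant dropped)."""
    mu, sigma = theta
    return sum(-math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2 for x in data)


def numerical_hessian(f, theta, h=1e-4):
    """Central finite-difference approximation of the Hessian of f at theta."""
    k = len(theta)
    hess = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            def shifted(di, dj):
                t = list(theta)
                t[i] += di * h
                t[j] += dj * h
                return f(t)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return hess


def invert_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def standard_errors(data):
    """Standard errors from the inverse observed information at the ML estimate."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
    hess = numerical_hessian(lambda t: normal_loglik(t, data), [mu_hat, sigma_hat])
    info = [[-v for v in row] for row in hess]  # observed information
    cov = invert_2x2(info)
    return math.sqrt(cov[0][0]), math.sqrt(cov[1][1])


print(standard_errors([4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]))
```

For the normal stand-in the two standard errors should be close to σ̂/√n and σ̂/√(2n), which gives a simple check of the numerical Hessian.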
4. GTG Quantile Regression Model
For the GTG model, the mean has a complicated form, so a mean-parameterized version of the model is not recommended. Moreover, when observations are heterogeneous, quantile regression is a more appropriate tool for analyzing data in the presence of covariates because it allows for a complete description of the distribution of the response variable (not just a single summary measure, as is the case when regression on the mean is conducted).
Specifically, for the GTG model and considering that
represents the
pth quantile of the distribution, we obtain the equation
,
. By solving such an equation, we obtain
where
.
Thus, we can reparameterize the pdf and cdf of the GTG model as
and
respectively, where
,
,
, and
is fixed. We refer to this model as the reparameterized GTG (RGTG) model.
The consideration of
as a set of
q known covariates related to the
p-th quantile of the
i-th individual can be introduced in the model as follows:
where
is a
q-dimensional vector of unknown regression parameters (
), and
is a link function, which is continuous, invertible and at least twice differentiable. A natural choice in this context is the logarithm link, i.e.,
.
With this framework, the corresponding log-likelihood function for the RGTG quantile regression model is given by
where
. The regression parameters are estimated by maximizing this function directly.
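The structure of such a log-likelihood can be sketched with a stand-in distribution. The Python code below uses a Weibull model, chosen only because it has positive support and a closed-form quantile function; it is not the RGTG model. The scale is rewritten in terms of the p-th quantile `rho`, and `rho` is tied to the covariates through the logarithm link, log(rho_i) = x_i'β. All names (`beta`, `shape`, `rho`) are assumptions for this illustration.

```python
import math


def weibull_cdf_by_quantile(y, rho, shape, p):
    """Weibull cdf parameterized by its p-th quantile rho (stand-in model)."""
    c = (-math.log(1.0 - p)) ** (1.0 / shape)  # Q(p) = scale * c
    scale = rho / c
    return 1.0 - math.exp(-((y / scale) ** shape))


def quantile_regression_loglik(beta, shape, y, X, p=0.5):
    """Log-likelihood with log link: log(rho_i) = x_i' beta, p held fixed."""
    c = (-math.log(1.0 - p)) ** (1.0 / shape)
    total = 0.0
    for yi, xi in zip(y, X):
        rho = math.exp(sum(b * v for b, v in zip(beta, xi)))  # p-th quantile
        scale = rho / c
        u = yi / scale
        total += (math.log(shape) - math.log(scale)
                  + (shape - 1.0) * math.log(u) - u ** shape)
    return total


# Toy design matrix with an intercept and one covariate (made-up numbers).
X = [[1.0, 0.2], [1.0, 0.5], [1.0, 0.9], [1.0, 1.4]]
y = [1.1, 1.6, 2.0, 3.9]
print(quantile_regression_loglik([0.1, 0.8], 2.0, y, X, p=0.5))
```

By construction the cdf evaluated at `rho` equals `p` for every observation, so each regression coefficient acts on the chosen conditional quantile; a general-purpose optimizer can then be applied to the negative of this function.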