1. Introduction
Extreme value theory (EVT) is a set of statistical tools for modeling and predicting the occurrence of rare events outside the range of available data. It has been widely used to study events more extreme than any previously observed, e.g., in disciplines such as climatology: extreme events of temperature [1,2,3,4], precipitation [5,6,7,8,9,10], and solar climatology [11,12,13]; finance and insurance: applications to risk theory [14,15,16,17,18,19,20]; and engineering: design of modern buildings [21].
There are two approaches to modeling an extreme value problem. The first is the block maxima method: dividing the sample space into blocks of equal size, the maxima of the blocks follow, under certain domain of attraction conditions, approximately a Generalized Extreme Value (GEV) distribution [22]. The second way to deal with an extreme value data set makes use of more of the available information about the upper tail of the distribution than just the block maxima. The so-called peaks-over-threshold (POT) method is due to hydrologists trying to model floods. Loosely speaking, References [23,24] showed that the distribution of data above a certain threshold u can usually be approximated by a properly scaled Generalized Pareto distribution (GPD) as u tends to the endpoint of the distribution. This is the point of view considered in this article.
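To fix the POT idea, the following minimal sketch (not part of the original analysis; it assumes scipy's `genpareto`, whose argument `c` plays the role of the shape parameter) samples a heavy-tailed baseline, keeps the exceedances over a high threshold, and fits a GPD to them:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Sample from a heavy-tailed baseline (standard Cauchy) and keep
# only the exceedances over a high threshold u.
x = rng.standard_cauchy(100_000)
u = np.quantile(x, 0.99)          # threshold = empirical 99% quantile
exceedances = x[x > u] - u        # peaks over threshold

# Fit a GPD to the exceedances (location fixed at 0).
xi, loc, sigma = stats.genpareto.fit(exceedances, floc=0)
print(xi, sigma)  # for a Cauchy baseline, xi should be close to 1
```

For a Cauchy baseline the fitted shape is close to 1, in agreement with the stable-law results developed later in the paper.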
Due to its importance, several methods have been proposed for estimating the shape and scale parameters of the GPD. Classical methods include the method of moments, probability weighted moments, maximum likelihood, and others; an exhaustive review of them can be found in [25,26]. Other researchers have proposed generalizations of the GPD: Reference [27] proposed a three-parameter Pareto distribution and employed POT to estimate Value at Risk, and [28] introduced an extension of the GPD and performed parametric estimation. However, classical methods might be inappropriate in certain situations, as explained in [26], which is why Bayesian inference can be advisable. There are not many approaches to GPD parameter estimation through Bayesian techniques. We can cite [29], who recommended the use of conjugate prior distributions; Reference [30] estimated the shape parameter when it is positive, computing the posterior distribution via the Markov Chain Monte Carlo (MCMC) method with Gibbs sampling; Reference [31] employed Gamma and Pareto priors via MCMC; Reference [32] proposed a Bayesian mixture model in which the threshold is also a random parameter; Reference [33] employed Jeffreys' prior and the Metropolis–Hastings (MH) method; and [34] employed the GPD itself as the prior density.
In this paper, we aim to seize all the available information coming from the data in order to estimate the parameters of the GPD as accurately as possible. A similar idea was implemented in [35] for the estimation of the parameters of the Gumbel distribution when the baseline distribution is Gumbel, Exponential or Normal.
We take into account all the data of the baseline distribution and study the relation between the baseline parameters and the parameters of the limit GPD, in order to incorporate this relation into the design of new estimation methods. Concretely, we propose two methods and compare them with the classical MH method applied only to the data over the threshold. In addition, we analyze four particular examples of underlying distribution: Exponential, Lévy, Cauchy and Normal, and make a special study of stable distributions.
2. Generalized Pareto Distribution and Its Relation with Extreme Value Theory
Let $X$ be a random variable with distribution function $F$. Define $u$ as the threshold value and let $X_u$ be the random variable with distribution function
$$F_u(x) = P(X - u \le x \mid X > u) = \frac{F(u+x) - F(u)}{1 - F(u)},$$
for $0 \le x < x_F - u$, being $x_F$ the right endpoint of $F$, that is, $x_F = \sup\{x : F(x) < 1\}$.
Notice that $X_u$ is the random variable that we obtain when we consider the distribution of data above the threshold, which we usually call the tail distribution.
Given a random variable $X$, we say that it follows a Generalized Pareto Distribution (GPD) when its distribution function is
$$G_{\xi,\sigma}(x) = \begin{cases} 1 - \left(1 + \dfrac{\xi x}{\sigma}\right)^{-1/\xi}, & \xi \ne 0, \\[4pt] 1 - e^{-x/\sigma}, & \xi = 0, \end{cases} \qquad (1)$$
where $\sigma > 0$ and $\xi$ are the scale and shape parameters, respectively. Equation (1) is valid for $x \ge 0$ when $\xi \ge 0$, and for $0 \le x \le -\sigma/\xi$ when $\xi < 0$.
The fundamental result that connects EVT and the GPD is due to [23,24], and it establishes that, under certain mild conditions, for a random variable $X$, the distribution of $X_u$ for a sufficiently high threshold $u$ approximately follows a properly scaled Generalized Pareto distribution (GPD).
We will call $F$, the distribution function of $X$, the baseline distribution of the GPD. The parameters $\sigma$ and $\xi$ depend on the value of the threshold $u$ and on the baseline distribution. For example, $\xi$ is determined by the shape of the upper tail of the baseline distribution $F$: positive values of the shape parameter correspond to heavy tails, while negative ones come from light tails. The special case $\xi = 0$ appears when the upper tail of the distribution tends to an exponential distribution.
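The role of the sign of the shape parameter can be seen directly in the support of the GPD. A small sketch (using scipy's `genpareto`, where the argument `c` plays the role of $\xi$; this parametrization is an assumption of the example, not taken from the paper):

```python
from scipy import stats

# Shape parameter xi > 0: heavy tail, unbounded support.
heavy = stats.genpareto(c=0.5, scale=1.0)
# Shape parameter xi < 0: light tail, support bounded at -sigma/xi.
light = stats.genpareto(c=-0.5, scale=1.0)

print(heavy.support())  # unbounded above
print(light.support())  # upper endpoint -sigma/xi = 2
```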
In this work, we will consider several types of baseline distributions for the GPD. Traditionally, estimation for the parameters of the limit GPD in Extreme Value Theory has been made taking into account only values above the threshold, but this information might be scarce. We propose a new strategy consisting in seizing all the data from the baseline distribution. We will show how this strategy can produce accurate estimations for the parameters of the GPD.
In the case when the baseline distribution is an exponential distribution with parameter $\lambda$, with distribution function $F(x) = 1 - e^{-\lambda x}$ for every $x \ge 0$,
$$F_u(x) = \frac{F(u+x) - F(u)}{1 - F(u)} = 1 - e^{-\lambda x} = F(x).$$
Consequently, the tail distribution $F_u$ is the same as the underlying distribution in the exponential case. Therefore, we must employ all the available data to estimate the parameter $\sigma = 1/\lambda$ in the definition of the GPD (1).
The case when the baseline distribution is not exponential is different. In this paper, we will consider some of the most employed distributions as underlying distributions: the normal distribution, which has light tails, and the Cauchy and Lévy distributions, which lead to heavy tails ($\xi > 0$). These distributions are stable; therefore, they have additional properties that will be helpful to estimate the parameters of the GPD. We will study such properties below.
With respect to the threshold, it can be set as a known value with a physical meaning, depending on the characteristics of the data, or it can be defined as an upper order statistic. It is generally defined as a $p$-quantile $x_p$ of the underlying distribution $F$, for appropriate values of $p$.
3. Stable Distributions
Stable distributions are a rich class of probability distributions characterized by [36], and they have been proposed as models for many types of physical and economic systems because of their interesting properties. They are the attractors of sums of independent, identically distributed random variables, whether or not the mean or variance is finite. Good references for their study are [16] or [37].
Let $Z$ be a random variable with parameters $\alpha \in (0, 2]$ and $\beta \in [-1, 1]$ defined by its characteristic function:
$$E\!\left[e^{itZ}\right] = \begin{cases} \exp\!\left(-|t|^{\alpha}\left[1 - i\beta \tan\frac{\pi\alpha}{2}\,\mathrm{sgn}(t)\right]\right), & \alpha \ne 1, \\[4pt] \exp\!\left(-|t|\left[1 + i\beta \frac{2}{\pi}\,\mathrm{sgn}(t)\ln|t|\right]\right), & \alpha = 1, \end{cases} \qquad (2)$$
where the parameter $\alpha$ is called the index of stability and $\beta$ is the skewness parameter. When $\beta = 0$, the distribution is symmetric.
A random variable $X$ is said to follow a stable distribution with parameters $\alpha$, $\beta$, $a > 0$ and $b$ if it satisfies
$$X = aZ + b. \qquad (3)$$
Generally, the densities can be expressed only through complicated special functions, but there are three special cases of stable distributions whose probability density functions can be expressed analytically:
When $\alpha = 1/2$ and $\beta = 1$, we obtain the Lévy distributions, $\mathrm{L\acute{e}vy}(\mu, c)$. If $X \sim \mathrm{L\acute{e}vy}(\mu, c)$, then $a = c$ and $b = \mu$ in (3).
When $\alpha = 1$ and $\beta = 0$, we obtain the family of Cauchy distributions, $\mathrm{Cauchy}(x_0, \gamma)$. If $X \sim \mathrm{Cauchy}(x_0, \gamma)$, then $a = \gamma$ and $b = x_0$ in (3).
When $\alpha = 2$ and $\beta = 0$, we obtain the normal distribution. If $X$ is normal with mean $\mu$ and variance $\sigma^2$, then $a = \sigma/\sqrt{2}$ and $b = \mu$ in (3), since the standard stable $Z$ with $\alpha = 2$ has variance 2. As usual, in this case we will denote $X \sim N(\mu, \sigma^2)$.
In particular, as we can standardize the stable distributions of a family, the $p$-quantiles $x_p$ for a stable distribution $X$ can be expressed in terms of the $p$-quantiles $z_p$ of the standard distribution $Z$ as
$$x_p = a z_p + b. \qquad (4)$$
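Relation (4) can be verified directly with scipy's quantile functions (an illustrative sketch for the Cauchy family; the parameter values are arbitrary):

```python
from scipy import stats

a, b, p = 2.0, 5.0, 0.99

# p-quantile of the standard Cauchy Z ...
z_p = stats.cauchy.ppf(p)
# ... rescaled via x_p = a * z_p + b ...
x_p = a * z_p + b
# ... equals the p-quantile of X = a Z + b computed directly.
print(x_p, stats.cauchy.ppf(p, loc=b, scale=a))
```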
Let us assume that $X$ follows a stable distribution with parameters $a$ and $b$ in Equation (3), and fix $u = x_p$ as the threshold for the extreme value problem. Then, for $u$ large enough, $z_p$ from (4) is also large and, consequently, denoting by $Z_{z_p}$ the tail variable of $Z$ above $z_p$, Theorem 1 guarantees that $Z_{z_p}$ approximately follows a $GPD(\xi_Z, \sigma_Z)$. Therefore, as $X_u = a Z_{z_p}$, and rescaling a GPD by $a$ multiplies its scale parameter while leaving the shape unchanged, then
$$\xi = \xi_Z, \qquad \sigma = a\,\sigma_Z. \qquad (8)$$
The parameter $\xi$ remains constant for all the random variables of the same stable family, whatever the parameters of the baseline variable are, while the scale parameter $\sigma$ is obtained as the product of the standardization parameter $a$ and the scale parameter $\sigma_Z$ of the GPD limiting distribution of $Z_{z_p}$.
In the case when the underlying distribution is a Cauchy or a Lévy, or any stable distribution $X$ with index of stability $\alpha < 2$, the tail of the distribution is considered to be “heavy”; therefore, it leads to a GPD with $\xi > 0$.
Stable distributions with index of stability $\alpha < 2$ (all of them except the normal distribution) also verify an interesting property. As we can see in [36], given the standard distribution $Z$, its survival function $\overline{F}_Z(x) = 1 - F_Z(x)$ can be approximated for large $x$ by
$$\overline{F}_Z(x) \approx C_\alpha\, x^{-\alpha}, \qquad (9)$$
where $C_\alpha$ is a constant depending only on $\alpha$ and the skewness $\beta$.
From this approximation, we can infer that the shape of the tail of the distribution depends only on the index of stability $\alpha$. Therefore, if we consider the GPD that models $Z_{z_p}$, the shape parameter $\xi_Z$ is fixed. We will estimate it through simulation.
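The quality of the power-law tail approximation can be checked numerically. For the standard Cauchy ($\alpha = 1$), the survival function behaves like $x^{-1}/\pi$, i.e., $C_1 = 1/\pi$ (a known fact about the Cauchy tail, used here only for illustration):

```python
import math
from scipy import stats

# Standard Cauchy survival function versus its tail approximation
# C_alpha * x^(-alpha) with alpha = 1 and C_1 = 1/pi.
for x in (10.0, 100.0, 1000.0):
    exact = stats.cauchy.sf(x)
    approx = 1.0 / (math.pi * x)
    print(x, exact, approx)  # agreement improves as x grows
```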
Proposition 1. When the baseline distribution is a standard stable distribution $Z$ with $\alpha < 2$, the relation between the parameters of $Z$ and the parameters of the GPD that models the distribution above the $p$-quantile $z_p$ of $Z$ is:
$$\xi_Z = \frac{1}{\alpha}, \qquad \sigma_Z = \frac{z_p}{\alpha}. \qquad (10)$$
Proof. From Theorem 1, for $z_p$ big enough,
$$P(Z - z_p > x \mid Z > z_p) \approx \left(1 + \frac{\xi_Z x}{\sigma_Z}\right)^{-1/\xi_Z},$$
and by (9), also for big values of $z_p$,
$$P(Z - z_p > x \mid Z > z_p) = \frac{\overline{F}_Z(z_p + x)}{\overline{F}_Z(z_p)} \approx \left(1 + \frac{x}{z_p}\right)^{-\alpha}.$$
Then, making both expressions equal, $1/\xi_Z = \alpha$ and $\sigma_Z/\xi_Z = z_p$. Therefore, we can take $\xi_Z = 1/\alpha$ and $\sigma_Z = z_p/\alpha$ as estimators (notice that the shape of the tail of the distribution depends only on the value of $\alpha$). □
Substituting $z_p$ by $(x_p - b)/a$ in (10), we have $\sigma = a\,\sigma_Z = (x_p - b)/\alpha$, so $\sigma \approx x_p/\alpha = u/\alpha$, as $b$ can be negligible. Therefore, we define
$$\hat{\xi} = \frac{1}{\alpha}, \qquad \hat{\sigma} = \frac{u}{\alpha}.$$
In Section 5, we will assess the accuracy of these estimators through an extensive simulation study.
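Proposition 1 can be sanity-checked deterministically. For the standard Cauchy ($\alpha = 1$), it predicts $\xi_Z = 1$ and $\sigma_Z = z_p$; the following sketch (again assuming scipy's `genpareto` parametrization) compares the exact conditional tail of $Z$ above $z_p$ with the predicted GPD survival function:

```python
from scipy import stats

p = 0.99
z_p = stats.cauchy.ppf(p)  # threshold: p-quantile of the standard Cauchy

# Proposition 1 for alpha = 1: xi_Z = 1, sigma_Z = z_p.
gpd = stats.genpareto(c=1.0, scale=z_p)

for x in (1.0, 10.0, 100.0):
    cond = stats.cauchy.sf(z_p + x) / stats.cauchy.sf(z_p)  # exact tail
    print(x, cond, gpd.sf(x))                               # close agreement
```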
4. Metropolis–Hastings (MH) Method
In this section, we will explain how to apply the Markov chain Monte Carlo (MCMC) method through the Metropolis–Hastings (MH) algorithm to make the estimations for stable distributions. We have to distinguish between light tails and heavy tails ($\xi > 0$). Let us assume that the threshold is $u = x_p$ and that we dispose of $m$ values above it.
Let $x_1, \dots, x_n$ be a sample of $n$ values from $X$ and $y_1, \dots, y_m$ be a sample of $m$ values from $X_u$.
4.1. Light Tails
In this case, with the likelihood function of the GPD for the sample $y_1, \dots, y_m$, we consider suitable prior distributions for both parameters. Then, the MH algorithm is applied, and we obtain estimations for $\sigma$ and $\xi$ from the corresponding posterior distributions.
4.2. Heavy Tails
In this case, the likelihood function is that of the GPD with $\xi > 0$. Taking a type I Pareto distribution as prior for one of the parameters and an appropriate prior for the other, the posterior conditional distributions are obtained, and the MH algorithm is applied, as in the previous case.
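Since the likelihood and prior expressions above were lost in extraction, the following minimal random-walk Metropolis–Hastings sketch may help fix ideas. It is written in Python (not the paper's R/coda implementation) and uses simplified flat priors on $\xi > 0$ and $\sigma > 0$ rather than the Pareto-type priors mentioned above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic exceedances from a GPD with xi = 1, sigma = 2.
data = stats.genpareto.rvs(c=1.0, scale=2.0, size=500, random_state=rng)

def log_post(xi, sigma):
    """Log-posterior with flat priors restricted to xi > 0, sigma > 0."""
    if xi <= 0 or sigma <= 0:
        return -np.inf
    return stats.genpareto.logpdf(data, c=xi, scale=sigma).sum()

# Random-walk Metropolis-Hastings on (xi, sigma).
xi, sigma = 0.5, 1.0
chain = []
for _ in range(5000):
    xi_new = xi + 0.1 * rng.standard_normal()
    sigma_new = sigma + 0.1 * rng.standard_normal()
    # Accept with probability min(1, posterior ratio).
    if np.log(rng.uniform()) < log_post(xi_new, sigma_new) - log_post(xi, sigma):
        xi, sigma = xi_new, sigma_new
    chain.append((xi, sigma))

burned = np.array(chain[1000:])       # discard burn-in
print(burned.mean(axis=0))            # posterior means, near (1.0, 2.0)
```

In practice one would also thin the chain and check convergence diagnostics, as the paper does with the coda library in R.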
5. Baseline MH Method (BMH)
In this section, we will introduce the Baseline Metropolis–Hastings (BMH) method, designed according to the objectives of seizing all the available data from the baseline distribution and making use of the existing relations between the baseline parameters and the limit GPD parameters. The method consists of:
Applying the MH algorithm to estimate the parameters of the baseline distribution $F$.
Making use of the relations between the baseline parameters and the GPD parameters to compute estimations for $\sigma$ and $\xi$.
In the case of stable distributions, these relations have been explained in previous sections and are given by (
8). We will detail below the application of the BMH method for the three selected stable distributions.
For the rest of the baseline distributions, the strategy to find such relations would be to carry out thorough simulation studies, in order to establish correspondences between the baseline parameters and the tail GPD parameters. At the moment, there are no studies in the literature on this subject; therefore, it would be interesting to perform them in future research.
Then, in the case of stable distributions, the application of BMH would be:
5.1. Estimations for $\xi$ and $\sigma_Z$
In order to provide good estimations for $\xi$ and $\sigma_Z$, we made a thorough simulation study for the three baseline distributions we have considered: Lévy, Cauchy, and Normal. We took a grid of values for $p$ and set the threshold $u = z_p$. For each distribution, and for each value of the threshold, values from $Z$ were generated. This sequence was repeated 100 times. Therefore, we obtained 100 point estimations for each $p$.
To guarantee the convergence of the MCMC algorithm, we must be sure that the posterior distribution has been reached. These computations were made using the library coda [38] for the R software, taking 10,000 values for the burn-in period, 25 values for the thinning, and selecting initial values for each sample. Finally, to obtain the posterior distribution of each parameter, a Markov chain of length 10,000 was obtained and we considered the mean as the estimator. The results of the simulation study are shown in Figure 1.
5.1.1. Lévy and Cauchy Baseline Distributions
By Proposition 1 (10), we have estimations for $\xi$ and $\sigma_Z$. Concretely, these are
$$\hat{\xi} = 2, \qquad \hat{\sigma}_Z = 2 z_p \qquad (11)$$
for the Lévy distribution ($\alpha = 1/2$), and
$$\hat{\xi} = 1, \qquad \hat{\sigma}_Z = z_p \qquad (12)$$
for the Cauchy distribution ($\alpha = 1$).
5.1.2. Normal Baseline Distribution
In the case of the normal distribution, property (10) is not verified. In this case, the adjustment obtained from the simulation study is given by (13).
Now, we will estimate the scale parameter $a$ of the baseline distribution $X$. Notice that the parameter $b$ for the stable distribution $X$ defined in (3) does not have any influence on the estimation of the parameters of the GPD, as we have shown before. Consequently, we will assume $b = 0$ from now on.
5.2. Lévy Distribution
The likelihood function for the Lévy distribution, $\mathrm{L\acute{e}vy}(0, a)$, is
$$L(a \mid x_1, \dots, x_n) = \left(\frac{a}{2\pi}\right)^{n/2} \prod_{i=1}^{n} x_i^{-3/2} \exp\!\left(-\frac{a}{2}\sum_{i=1}^{n} \frac{1}{x_i}\right).$$
Taking a prior distribution for $a$, and making use of (8) and (11), we obtain the estimations $\hat{\sigma}$ and $\hat{\xi}$.
5.3. Cauchy Distribution
The likelihood function for the Cauchy distribution, $\mathrm{Cauchy}(0, a)$, is
$$L(a \mid x_1, \dots, x_n) = \prod_{i=1}^{n} \frac{a}{\pi\left(a^2 + x_i^2\right)}.$$
Taking a prior distribution for $a$ and making use of (8) and (12), we obtain the estimations $\hat{\sigma}$ and $\hat{\xi}$.
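The two BMH steps for a Cauchy baseline can be sketched as follows. This is an illustrative Python version in which step 1 uses maximum likelihood on all the data instead of an MH chain (for brevity); the parameter values are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Step 1: estimate the baseline scale a from ALL the data
# (here by maximum likelihood rather than an MH chain).
a_true = 2.0
x = stats.cauchy.rvs(scale=a_true, size=5000, random_state=rng)
_, a_hat = stats.cauchy.fit(x, floc=0)   # location fixed at b = 0

# Step 2: map baseline estimates to GPD parameters via the stable-law
# relations (8) and (12): for Cauchy (alpha = 1), xi = 1 and
# sigma = a * sigma_Z = a * z_p.
p = 0.99
z_p = stats.cauchy.ppf(p)
xi_hat, sigma_hat = 1.0, a_hat * z_p
print(xi_hat, sigma_hat)
```

Note that step 2 uses no tail data at all: all $n$ observations contribute to $\hat{a}$, which is the point of the BMH strategy.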
5.4. Normal Distribution
In this case, we consider a suitable prior distribution for the scale parameter and, making use of (8) and (13), we obtain the estimations $\hat{\sigma}$ and $\hat{\xi}$.
6. Informative Priors Baseline MH (IPBMH)
Finally, we propose an MH method to estimate the parameters $\sigma$ and $\xi$ in which highly informative priors are employed, seizing the estimations computed before. Employing the estimations of $\xi$, $\sigma_Z$ and $a$ obtained in the previous sections, we can set very informative priors for $\sigma$ and $\xi$. Notice that this way of proceeding keeps the original idea of Extreme Value Theory, granting more weight to tail values because they are employed twice: to compute the estimations for $\sigma$ and $\xi$, and also through the likelihood function. As we commented before, this method could also be implemented for other baseline distributions once relations between baseline and GPD parameters have been found.
For stable distributions, highly informative priors are built from these estimations. Then, for the three distributions studied, $a$ is estimated through BMH, and $\hat{\sigma}_Z$, $\hat{\xi}$ are the estimations computed through (11)–(13). In addition, the prior dispersion for $\xi$ is constant, being 0.03, 0.065 and 0.1 for the Normal, Cauchy and Lévy baseline distributions, respectively; the remaining prior values are given in Table 1.
The joint posterior distribution and the corresponding marginal distributions are then computed, and the MH algorithm is applied.
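The IPBMH idea can be sketched as an MH sampler whose priors are tightly centered at baseline-derived estimates. The following Python sketch is illustrative only: the prior centers, the normal prior form, and the value 0.5 for the scale-prior dispersion are assumptions of the example (the paper's 0.065 value is used for the $\xi$ prior of a Cauchy-type case):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Exceedance data from a GPD(xi = 1, sigma = 3).
data = stats.genpareto.rvs(c=1.0, scale=3.0, size=100, random_state=rng)

# Informative priors centered at (illustrative) baseline-derived
# estimates, with small dispersions, as in the IPBMH idea.
xi0, sigma0 = 1.0, 3.0

def log_post(xi, sigma):
    if sigma <= 0:
        return -np.inf
    lp = stats.norm.logpdf(xi, xi0, 0.065)        # tight prior on xi
    lp += stats.norm.logpdf(sigma, sigma0, 0.5)   # prior on sigma
    return lp + stats.genpareto.logpdf(data, c=xi, scale=sigma).sum()

# Random-walk MH, as before, but now guided by the informative priors.
xi, sigma = xi0, sigma0
draws = []
for _ in range(3000):
    prop = (xi + 0.05 * rng.standard_normal(),
            sigma + 0.2 * rng.standard_normal())
    if np.log(rng.uniform()) < log_post(*prop) - log_post(xi, sigma):
        xi, sigma = prop
    draws.append((xi, sigma))

print(np.array(draws[500:]).mean(axis=0))  # pulled strongly toward the priors
```

The tail data thus enter twice, through the prior centers and through the likelihood, which is exactly the double weighting of tail information described above.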
7. Simulation Study
Now, we will develop a thorough simulation study in order to compare the accuracy of the three MH methods of estimation: MH, BMH and IPBMH.
We fixed $p$ and the threshold $u = x_p$, and we take $b = 0$. We sampled values from the three baseline distributions considered, with scale $a$, for several sample sizes $n$. We obtained an MCMC chain of length 10,000, taking 10,000 values for the burn-in period, 25 values for the thinning, and selecting initial values for each sample. Finally, this sequence was repeated 100 times and we considered the mean as the estimator.
In Figure 2, we can see the posterior distribution of the parameters $\sigma$ and $\xi$ for the different sample sizes $n$, when the baseline distribution is each of the three distributions considered (left, middle and right charts), for the methods MH and BMH. For the first one, the distribution is right skewed, although the skewness becomes smaller as $n$ increases. BMH offers a point estimation, plotted as a vertical red line.
In Figure 3, we can see the posterior distribution of $\xi$ for both methods. MH (upper charts) shows much more skewness, as in the case of $\sigma$, while the estimations from BMH (lower charts) are less skewed and less dispersed.
Now, we will compare the mean absolute errors (MAE) for MH and BMH for different sample sizes. In Figure 4, Figure 5 and Figure 6, we can see that BMH provides smaller errors than MH.
Clearly, BMH provides more accurate estimations for the parameters of the GPD when the baseline distribution is stable and known. However, in practical situations, we might not know which is the baseline distribution, or data could better fit a mixture of distributions rather than a simple one. In these situations, the use of highly informative priors, built with the information available from all the data, could be advisable. To develop this idea, we simulated from different mixtures and computed values of MAE for the three methods, finding that IPBMH is the method that shows the best behavior when data differ from the simple distributions.
In Figure 7, we can see the MAE for the three methods in the case of the mixtures employed (left, middle and right charts), for the three baseline distributions considered (Lévy, Cauchy, Normal). In general, IPBMH is the most advisable method, especially when data are scarce. Notice that when the data approach one of the pure stable distributions, for example, in the case of the Cauchy mixtures (which are still quite similar to a simple Cauchy) and the second mixture for the Normal distribution, BMH and IPBMH show very similar results. However, when the mixtures differ significantly from the simple distribution, the methods IPBMH and MH are more advisable than BMH.
8. An Application: PM 2.5 in Badajoz (Spain) during the Period 2011–2020
As we mentioned before, both baseline methods can be generalized for other baseline distributions by studying the relations between the parameters of the baseline distributions and the parameters of the limit GPD. We show an example, employing real data whose baseline distribution can be fitted by a Gamma distribution.
Data from measurements of the levels of diverse air pollutants in many municipalities in Spain are publicly available on the website https://www.miteco.gob.es/es/calidad-y-evaluacion-ambiental/temas/atmosfera-y-calidad-del-aire/, accessed on 15 January 2022. Particulate matter is a mixture of solid particles and liquid droplets that can be inhaled and cause serious health problems. In particular, we have selected particulate matter less than 2.5 micrometers in diameter, known as PM 2.5, because it is considered especially dangerous to human health. In this context, studying the tail distribution above a certain threshold is essential because the World Health Organization recommends not exceeding an average daily value of 25 μg/m³ and an average annual value of 10 μg/m³. We studied levels of PM 2.5, measured in μg/m³, for the last ten years available, 2011–2020, in Badajoz. There are $n$ observations and, as can be seen in Figure 8, the data can be fitted by a Gamma distribution. As in the previous simulations, the threshold was set as a $p$-quantile, $u = x_p$, measured in μg/m³.
Making a simulation study similar to the previous ones for the Normal, Lévy and Cauchy distributions, we obtained a relation between the parameters of the baseline Gamma distribution and the parameters of the limit GPD.
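Such a correspondence can be tabulated by simulation. The following sketch (parameter values hypothetical, not those of the PM 2.5 data) fits a GPD to the exceedances of a simulated Gamma baseline; for a Gamma tail, the fitted shape should be close to 0 and the fitted scale close to the Gamma scale:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# Simulate a Gamma baseline (shape k and scale theta are illustrative).
k, theta = 2.0, 1.5
x = stats.gamma.rvs(a=k, scale=theta, size=200_000, random_state=rng)

# Exceedances over the 95% quantile.
u = np.quantile(x, 0.95)
exc = x[x > u] - u

# Fit a GPD; the Gamma tail is light, so the fitted shape should be
# near 0 and the fitted scale of the same order as theta.
xi_hat, _, sigma_hat = stats.genpareto.fit(exc, floc=0)
print(xi_hat, sigma_hat)
```

Repeating this for a grid of baseline parameters and thresholds yields the correspondence table that the baseline methods require.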
Then, we randomly selected 50 data points and applied the three methods MH, BMH and IPBMH to fit the tail data. This procedure was repeated many times, and we found three possible behaviors, as shown in Figure 9. In the first case (left chart), there is an example of the most usual situation: IPBMH offers intermediate estimations, between MH and BMH. When tail data are scarce (middle chart), MH differs significantly from the real density, while BMH and IPBMH provide better estimations. Finally, in the right chart, a common situation is shown in which IPBMH clearly offers the best estimations.
9. Conclusions
In the parameter estimation of GPD, usual EVT methods waste a lot of information. We have proposed two MH methods that make the most of all the information available from the data set. They are based on making use of the existing relations between baseline and GPD parameters, through informative priors.
When considering the GPD coming from stable baseline distributions, we employed singular properties of stable distributions to simplify the problem (reducing to standard cases) and to provide estimators for the parameters of the GPD.
We have achieved very accurate estimations for the parameters of the GPD when the baseline distribution is Cauchy, Normal or Lévy, making use of the properties of stable distributions and MH methods.
We have studied the goodness of the estimations for the classical MH method and BMH when the baseline distribution is standard Cauchy, Normal or Lévy. Clearly, BMH provides more accurate estimations than MH.
In most real situations, data do not fit a simple distribution. We simulated some examples of mixtures of stable distributions and showed that IPBMH provides more accurate estimations than the other methods.
These proposals could be generalized for other baseline distributions by studying the relations between the parameters of the baseline distributions and the parameters of the limit GPD. We provide an application with real data to illustrate this situation.