Next Article in Journal
An Extensive Investigation into the Use of Machine Learning Tools and Deep Neural Networks for the Recognition of Skin Cancer: Challenges, Future Directions, and a Comprehensive Review
Next Article in Special Issue
Bayesian Inference for the Gamma Zero-Truncated Poisson Distribution with an Application to Real Data
Previous Article in Journal
On the Breaking of the U(1) Peccei–Quinn Symmetry and Its Implications for Neutrino and Dark Matter Physics
Previous Article in Special Issue
Symmetrical and Asymmetrical Distributions in Statistics and Data Science
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Statistical Inference and Application of Asymmetrical Generalized Pareto Distribution Based on Peaks-Over-Threshold Model

1
School of Mathematics, Statistics and Mechanics, Beijing University of Technology, Beijing 100124, China
2
China National Environmental Monitoring Center, Beijing 100012, China
3
School of Economics, Nanjing University of Finance and Economics, Nanjing 210023, China
4
University of Chinese Academy of Sciences, Beijing 100049, China
5
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
*
Authors to whom correspondence should be addressed.
Current address: No.100, Pingleyuan, Chaoyang District, Beijing 100124, China.
Symmetry 2024, 16(3), 365; https://doi.org/10.3390/sym16030365
Submission received: 6 February 2024 / Revised: 1 March 2024 / Accepted: 8 March 2024 / Published: 18 March 2024

Abstract

:
Generalized Pareto distribution (GPD), an asymmetrical distribution, primarily models exceedances over a high threshold in many applications. Within the peaks-over-threshold (POT) framework, we consider a new GPD parameter estimation method to estimate a common tail risk measure, the value at risk (VaR). The proposed method is more suitable for the POT framework and makes full use of data information. Specifically, our estimation method builds upon the generalized probability weighted moments method and integrates it with the nonlinear weighted least squares method. We use exceedances for the GPD, minimizing the sum of squared differences between the sample and population moments of a function of GPD random variables. At the same time, the proposed estimator uses three iterations and assigns weight to further improving the estimated performance. Under Monte Carlo simulations and with a real heavy-tailed dataset, the simulation results show the advantage of the newly proposed estimator, particularly when VaRs are at high confidence levels. In addition, by simulating other heavy-tailed distributions, our method still exhibits good performance in estimating misjudgment distributions.

1. Introduction

Some scholars have found that the tails of various measured data distributions tend to be heavier than those of a normal distribution, as reported in [1,2]. Embrechts et al. [3] proposed the assumption that normal distribution will underestimate the risk associated with extreme tails. Since the 1990s, the extreme value theory (EVT) has been widely used in insurance, earthquake analysis, hydrology, transportation, climate, reliability analysis and other fields. The EVT mainly uses two approaches for modeling random variables: the block maxima model (BMM) and peaks over threshold (POT). The BMM first divides the sample interval, takes the maximum value in each sample interval, and then asymptotically fits the obtained maximum value series as the analysis object to the generalized extreme value (GEV) distributions. However, the POT fits all exceedances over a given threshold for heavy-tailed data. Pickands [4] first proposed that the excesses can be approximated via the generalized Pareto distribution (GPD) when the distribution of excesses is in the maximum domain of attraction. In the field of financial market analysis, risk measurement is an extremely important issue. As one of the most important and widely accepted risk measurement tools, Value at Risk (VaR) can reflect the risk and play an early-warning role in financial markets. The VaR from the fitted GPD is extremely sensitive to the GPD parameter estimators and a small difference in the parameter value will cause a significant impact on a bank’s financial position, highlighting the importance of accurately estimating GPD parameters.
Since Pickands first proposed the GPD, various parameter estimation methods for this distribution have been studied. For example, Hosking and Wallis [5] considered the method of moments (MOMs); Greenwood et al. [6] discussed probability weighted moments (PWMs); Smith [7] introduced maximum likelihood (ML); and Moharram et al. [8] discussed the least square (LS) of the GPD. Hosking [9] developed an L-moment method based on the PWM linear combination; Ashkar and Ouarda [10] proposed the generalized method of moments (GMOM) based on the MOM method. Castillo and Hadi [11] used the elemental percentile method (EPM). Based on the EPM, From and Ratnasingam [12] proposed various efficient closed-form estimators for the GPD. Product of spacing (POS) and logarithmic Cramér–von Mises (LCVM) methods were proposed at the same time. Rasmussen [13] proposed generalized probability weighted moments (GPWMs) based on the PWM method, both of which broadened the scope for parameter estimation. In recent years, for the estimation of GPD parameters, Zhang [14] introduced the likelihood moment estimator (LME), which solves the iterative convergence problem in ML methods. The LME always has the advantages of high asymptotic efficiency and simple calculation. Zhang and Stephens [15] combined Bayesian methods to propose an estimation method based on likelihood functions, which has strong practicality. Song and Song [16] considered the use of nonlinear least squares (NLSs) estimation. It is based on the least squares estimation method to minimize the sum of squares of the deviation between the empirical distribution function (EDF) and the theoretical distribution function (GPD) of the excess data. Park and Kim [17] developed weighted nonlinear least squares (WNLS) based on NLS, further improving the estimation accuracy of high quantiles. Based on the GPWM method, the generalized probability weighted moment equation (GPWME) method was proposed by Chen et al. [18], confirming that the GPWM method is a special case of the GPWME method, and the estimation method has no restrictions on parameter values. Chen et al. [19] combined the minimum distance estimation and the M-estimation in a linear regression. Martín et al. [20] proposed informative priors baseline Metropolis–Hastings (IPBMH) to improve the accuracy of Bayesian parameter estimation.
Choosing the appropriate threshold is a prerequisite for accurately estimating GPD parameters. If the selected threshold is too high, the actual sample size becomes smaller, resulting in too little data on the fitted excess distribution function such that the variance of the parameter estimate may be high. In contrast, if the selected threshold is too low, it may cause a biased estimate and an increase in estimated deviation. Several threshold selection procedures are available in the literature. Langousis et al. [21] detailed many of these methods for the chosen threshold. One category of methods applies the goodness of fit of the GPD. Choulakian and Stephens [22] proposed a goodness-of-fit test for a two-parameter GPD, using the Cramér–von Mises (CvM) and Anderson–Darling (AD) tests to select the lowest threshold at a given confidence level. Combining the AD and CvM tests with the stop rule proposed by G’Sell et al. [23], Bader et al. [24] proposed an automatic threshold selection method. Yang et al. [25] performed threshold selection based on the relationship between eigenvalues and thresholds. Saadatmand-Tarzjan [26] proposed a global threshold selection method based on fuzzy expert systems. Curceac et al. [27] used an automated threshold determination method based on the stability of shape parameters and modified scale parameters. Based on the L-moment theory, an automatic L-moment ratio threshold selection method was proposed by Silva Lomba and Fraga Alves [28].
In this study, within the POT framework, a new GPD parameter estimation method is provided to estimate VaRs. The method is derived from the GPWME method. The proposed method, however, is suitable for POT by employing the exceedances over a sufficiently high threshold. Firstly, based on the GPWME, we select three suitable objective functions, the specific forms of which can be found in Section 3.2. Secondly, using the exceedance over a certain high threshold for a heavy-tailed dataset, based on moment estimation and the nonlinear weighted least squares methods, the sum of squared differences between the sample and population moments of a function of the GPD random variables is minimized. Finally, we select appropriate weights to modify the objective function to obtain a more accurate parameter estimate. Then, we estimate the extreme VaR via the proposed method. To evaluate its performance, we apply it to different heavy-tailed distributions and a real dataset. We then compare it to various common parameter estimation methods. The results show that our method performs better than the compared methods to some extent, particularly for VaRs with extremely high confidence levels in some cases.
This paper is organized as follows: In Section 2, the relevant theories in GPD, POT and VaR are briefly introduced. In Section 3, under the POT framework, we present a new GPD parameter estimator by comparing it with the existing method. Section 4 introduces numerical simulations, and we show the performance of different existing methods for estimating tail extreme quantiles under different heavy-tailed common parameter distributions. In Section 5, a similar exercise is performed using a real dataset. In Section 6, we conclude this paper.

2. EVT for Extreme Tail Risk Measures

2.1. The GPD

Let X be a random variable (r.v.). The cumulative distribution function (cdf) of the GPD is defined as
G μ , ξ , σ ( x ) = 1 1 + ξ x μ σ 1 ξ , ξ 0 , 1 exp x μ σ , ξ = 0 ,
where ξ ( ξ R ) , μ ( μ R ) and σ ( σ > 0 ) are the shape, location, and scale parameters, respectively, and 1 / ξ is the tail index. When μ = 0 , the three-parameter GPD reduces to the two-parameter GPD ( ξ , σ ). When ξ > 0 , the domain of x is ( μ , ) and G μ , ξ , σ ( x ) is heavy-tailed. When ξ < 0 , x ( μ , μ σ / ξ ) , it is short-tailed. When ξ = 0 , the GPD is a medium-tailed exponential distribution (more details can be found in [16]).
The corresponding probability density function (pdf) is
g μ , ξ , σ ( x ) = 1 σ 1 + ξ x μ σ 1 ξ 1 , ξ 0 , 1 σ exp x μ σ , ξ = 0 .
In order to show the flexibility and asymmetry of the GPD, we plot the pdfs of the GPD with various shape parameters ξ ( μ = 0 , σ = 1 ) in Figure 1.
The quantile function, x ( G ) , is
x ( G ) = μ σ ξ [ 1 ( 1 G ) ξ ] , ξ 0 , μ σ log ( 1 G ) , ξ = 0 .

2.2. Peaks over Threshold (POT)

The EVT mainly studies the tail distribution characteristics of the continuous r.v. X, that is, the tail of F ( x ) , F ¯ ( x ) = 1 F ( x ) , 0 < x < and F ¯ ( x ) is also called the survival function [29]. If
lim x F ¯ ( x λ ) F ¯ ( x ) = λ α , λ > 0 ,
we say that F ¯ changes regularly varying with index α > 0 or F ¯ α . When α = 0 , F ¯ is said to be slowly varying [17]. Based on this definition, there is a distribution that changes regularly as F ¯ L ( x ) x α , where L ( · ) is slowly varying. It is known that f ( x ) g ( x ) means lim x f ( x ) g ( x ) = 1 . It can be concluded that the tail of a regularly varying distribution can be expressed by multiplying a slowly varying function by a power function. At this point, the distribution is called the maximum domain of attraction of the Fréchet distribution ( F MDA ( Φ α ) , where Φ α = exp x α ), representing a distribution class with heavy tails if and only if F ¯ α , where α > 0 . The GPD is in MDA ( Φ α ) [17].
The Pickands–Balkema–de Haan theorem (see [4,29]) states that, for F ¯ α , the excess loss ( X u | X > u ) from such a distribution with a high threshold u > 0 converges to the GPD with Pareto parameter ξ > 0 and 1 ξ = α . That is,
lim u x F sup 0 < x < x F u | F u ( x ) G ξ , σ ( x ) | = 0 ,
where F u ( x ) = P ( X u x | X > u ) , and x F is the right endpoint of F. This means that the excess distribution F u ( x ) converges to the GPD when F is a heavy-tailed distribution in MDA ( Φ α ) (see [4,29]).
It is worth noting that the GPD is stable with respect to excess over threshold operations [30]. This means that, if X GPD( ξ , σ ), the r.v. Y = X u | X > u will follow the GPD ( ξ , σ + ξ μ ), where u is a threshold. This can also be derived for any u ˜ > u , X u ˜ | X > u ˜ that follows a parameter of ( ξ , σ + ξ ( u u ˜ ) ) of the GPD. This characteristic of the GPD illustrates that the excess above a threshold does not change the GPD shape parameter ξ , but the GPD scale parameter is altered.

2.3. Value at Risk (VaR)

VaR reflects the maximum possible loss of the value of a financial dataset or portfolio of securities within a future time period, under a given probability level. Let F ( x ) be the distribution function of the population X; then, the p quantile of F ( x ) is F X 1 ( p ) , denoted by Q p ( x ) = F X 1 ( p ) . The exceedance distribution above the threshold u can be defined by [30]
F u ( x ) = P ( X u x | X > u ) = F ( u + x ) F ( u ) 1 F ( u ) , 0 < x < x F u ,
which is assumed to be G ξ , σ ( x ) . Therefore, F ( x ) can be expressed as
F ( x ) = ( 1 F ( u ) ) G u , ξ , σ ( x ) + F ( u ) .
Based on the estimated parameters,
F ^ ( x ) = ( 1 F n ( u ) ) G u , ξ ^ , σ ^ ( x ) + F n ( u ) ,
where F n is the EDF, and ξ ^ and σ ^ are the estimations of the GPD parameters using the exceedances over the threshold. Then, the estimated p quantile of F ( x ) is
Q ^ p ( X ) = u + σ ^ ξ ^ n n n u ( 1 p ) ξ ^ 1 ,
where n is the total sample size of observations, and n u is the number of observations lower than the threshold u.

3. Parameter Estimation Method

3.1. Existing Method GPWME

Chen et al. [18] proposed the population GPWME, and defined M g , r , s ( θ ) as follows
M g , r , s ( θ ) = E g ( Z ; θ ) F Z ( Z ; θ ) r ( 1 F Z ( Z ; θ ) ) s ,
where both r and s are real numbers, and g ( x ) is a Borel measurable function for a parameter. Accordingly, the sample generalized probability weighted quasi-moment is defined as [18]
m g , r , s ( θ ) = 1 n i = 1 n g ( Z i ; θ ) F Z ( Z i ; θ ) r ( 1 F Z ( Z i ; θ ) ) s .
For convenience, note that X GPD ( ξ , σ ) represents the excess threshold data, and the corresponding F ( x ) represents the distribution function of the GPD. Denote X 1 , X 2 , , X n as a random sample of size n from F ( x ) and denote X 1 : n < X 2 : n < < X n : n as order statistics. Chen et al. [18] set β = ξ / σ , g 1 ( x ; β , ξ ) = ( 1 β x ) 1 ξ , g 2 ( x ; β , ξ ) = log ( 1 β x ) , β < X n : n 1 . The population GPWMEs are
M g 1 , 0 , s 1 ( β , ξ ) = E ( 1 β X ) 1 ξ ( 1 F ( X ; β , ξ ) ) s 1 = 1 2 + s 1 ,
M g 2 , 0 , s 2 ( β , ξ ) = ξ E log ( 1 β X ) ( 1 F ( X ; β , ξ ) ) s 2 = ξ ( 1 + s 2 ) 2 .
For the sample, when s 1 > 2 , s 2 > 1 ,
m g 1 , 0 , s 1 ( β , ξ ) = 1 n i = 1 n ( 1 β X i ) 1 / ξ ( 1 F ( X i ; β , ξ ) ) s 1 ,
m g 2 , 0 , s 2 ( β , ξ ) = 1 n i = 1 n log ( 1 β X i ) ( 1 F ( X i ; β , ξ ) ) s 2 .
Notice that
1 n i = 1 n g j ( X i ; β , ξ ) ( 1 F ( X i ; β , ξ ) ) s j = 1 n i = 1 n g j ( X i : n ; β , ξ ) ( 1 F ( X i : n ; β , ξ ) ) s j , j = 1 , 2 .
By combining Equations (6)–(10), we have
1 n i = 1 n ( 1 β X i : n ) 1 / ξ ( 1 p i : n ) s 1 1 2 + s 1 = 0 ,
1 n i = 1 n ( 1 p i : n ) s 2 log ( 1 β X i : n ) ξ ( 1 + s 2 ) 2 = 0 ,
where p i : n = ( i + γ ) / ( n + δ ) is the estimator of F ( X i : n ; β , ξ ) , where δ = 0 and γ = 0.35 [31]. The estimated values of β and ξ can be obtained from Equations (11) and (12), denoted by β ^ n and ξ ^ n , from which σ ^ n = ξ ^ n / β ^ n can be obtained.

3.2. New Methods

3.2.1. GWNLSM Estimation

GPWME applies parameter estimation where the F ( x ) distribution is ideal. That is, in practice, for all observations, the specific distribution F ( x ) is unknown in advance. It is known that, based on the Pickands–Balkema–de Haan theorem, the tail region of the observations can be modeled via the GPD. Therefore, a parameter estimate of the GPD is required. We propose an improvement on the method by combining it with the WNLS proposed by Park and Kim [17]. Therefore, generalized weighted nonlinear least squares moment (GWNLSM) estimation is proposed within the POT framework.
For the threshold u, data less than the threshold are recorded as 0, and for data greater than the threshold, the exceedance is defined as X u . In order to obtain more accurate parameter estimates, the combination of the nonlinear least squares method and moment estimation is considered based on the GPWME method. The specific method is as follows.
Set
g 1 ( x ; ξ , σ ) = 1 + ξ σ ( x u ) 1 / ξ ,
g 2 ( x ; ξ , σ ) = log 1 + ξ σ ( x u ) ,
g 3 ( x ; ξ , σ ) = 1 .
Within the POT framework, F ( x ) is replaced by Equation (2), and F ( u ) is replaced by F n ( u ) . Using Equations (4) and (5), we have
M g 1 , 0 , s 1 ( ξ , σ ) = E 1 + ξ σ ( X i : n u ) 1 / ξ ( 1 F ( X i : n ; ξ , σ ) ) s 1 = B ( n + 1 , n + s 1 + 2 i ) B ( n i + 1 , n + s 1 + 2 ) ( 1 F n ( u ) ) ,
M g 2 , 0 , s 2 ( ξ , σ ) = E log ( 1 + ξ σ ( X i : n u ) ) ( 1 F ( X i : n ; ξ , σ ) ) s 2 = ξ B ( n + 1 , n + s 2 + 1 i ) B ( n i + 1 , n + s 2 + 1 ) j = n + s 2 + 1 n + s 2 + i j 1 + log ( 1 F n ( u ) ) ,
M g 3 , 0 , 1 ( ξ , σ ) = 1 E F ( X i : n ; ξ , σ ) = 1 i n + 1 ,
where ξ > σ / ( X n : n u ) , s 1 > 2 , s 2 > 1 . The specific calculation process is shown in the Appendix A. For the sample moments, we have
m g 1 , 0 , s 1 ( ξ , σ ) = 1 + ξ σ ( X i : n u ) 1 / ξ 1 F ( X i : n ; ξ , σ ) s 1 = 1 F n ( u ) s 1 1 + ξ X i : n u σ ( s 1 + 1 ) / ξ ,
m g 2 , 0 , s 2 ( ξ , σ ) = log ( 1 + ξ σ ( X i : n u ) ) ( 1 F ( X i : n ; ξ , σ ) ) s 2 = ξ 1 F n ( u ) s 2 1 + ξ X i : n u σ s 2 / ξ log 1 + ξ X i : n u σ ,
m g 3 , 0 , 1 ( ξ , σ ) = 1 F ( X i : n ; ξ , σ ) = 1 F n ( u ) ( 1 G ( X i : n u ) ) .
In the first step, the interim estimator ( ξ ^ 1 , σ ^ 1 ) can be obtained via nonlinear minimization:
( ξ ^ 1 , σ ^ 1 ) = arg min ( ξ , σ ) i = n u + 1 n [ M g 1 , 0 , s 1 ( ξ , σ ) m g 1 , 0 , s 1 ( ξ , σ ) ] 2 ,
The following second step with ( ξ ^ 1 , σ ^ 1 ) as initial values and the third step with ( ξ ^ 2 , σ ^ 2 ) as initial values lead to another optimization:
( ξ ^ 2 , σ ^ 2 ) = arg min ( ξ , σ ) i = n u + 1 n [ M g 2 , 0 , s 2 ( ξ , σ ) m g 2 , 0 , s 2 ( ξ , σ ) ] 2 ,
( ξ ^ 3 , σ ^ 3 ) = arg min ( ξ , σ ) i = n u + 1 n [ M g 3 , 0 , 1 ( ξ , σ ) m g 3 , 0 , 1 ( ξ , σ ) ] 2 .
Based on the idea of weighted least squares regression, we find that the performance of the third step estimation ( ξ ^ 3 , σ ^ 3 ) can be further improved by adding suitable weights in Equation (15). Set the weight to w i =(Var [ F ( X i : n ) ] ) 2 , where Var [ F ( X i : n ) ] = i ( n i + 1 ) ( n + 1 ) 2 ( n + 2 ) . A revised version of the given nonlinear optimization is produced by adding weights:
( ξ ^ 3 , σ ^ 3 ) = arg min ( ξ , σ ) i = n u + 1 n w i [ M g 3 , 0 , 1 ( ξ , σ ) m g 3 , 0 , 1 ( ξ , σ ) ] 2 .
Note that M g 3 , 0 , 1 ( ξ , σ ) = 1 i n + 1 ; we modify M g 3 , 0 , 1 ( ξ , σ ) as 1 i 0.35 n to smooth the error of VaR via numerous simulations. Combined with Equations (13), (14) and (16), the GWNLSM was proposed. One advantage of the proposed GWNLSM over the GPWME is that it obtains a more stable extreme quantile estimator. The effect of the error is reduced because each squared deviation term is multiplied by the corresponding weight. The standard “optim” function in R is used to implement the GWNLSM estimator in our numerical studies.Without a loss of generality, we take all starting values as ( ξ , σ ) = ( 0.1 , 0.1 ) in the following simulation and application. When ξ > 0 , the GPD is heavy-tailed. The proposed estimated method is only applicable to the statistical inference of the heavy-tailed GPD.

3.2.2. ( s 1 , s 2 ) Estimation

Based on the GWNLSM method, the values of s 1 and s 2 are calculated via a large number of simulations. With s 1 and s 2 fixed within a selection of ranges (−2, 2) and (0, 2), and a step size of 0.05, s 1 and s 2 have a total of 3280 pairs of combinations. A total of 3280 synthetic samples were employed to evaluate the root-mean-square error (RMSE) for VaR p of the fixed ξ , p, and F n ( u ) values. The values of ξ range from 0.1 to 0.9, the values of p are 0.999 and 0.9999 and those of F n ( u ) are 0.98, 0.99 and 0.995, where p > F n ( u ) .
First, for a given ξ , p and F n ( u ) , count the minimum and maximum values of the RMSE for VaR p corresponding to each set of parameters and then change the values of u and p, in turn, to calculate the corresponding RMSE values. Taking ξ = 0.1 as an example, because the values of u and p are different, there are seven combinations, based on the different values of s 1 and s 2 , a total of seven sets of RMSEs are calculated. Specifically, when F n ( u ) is 0.98, p takes 0.99, 0.999, and 0.9999; when F n ( u ) is 0.99, p takes 0.999 and 0.9999; and when F n ( u ) is 0.995, p takes 0.999 and 0.9999. The results are shown in Table 1.
Table 1 shows the different values of ξ , p, and F n ( u ) , as well as the corresponding actual number of samples used. The values in the table are estimated via the corresponding s 1 and s 2 values; then, the corresponding RMSE values are obtained, where min and max represent the minimum and maximum values of RMSE, respectively.
Second, for the given ξ and p, fix F n ( u ) as a certain value, and sort the RMSE values from smallest to largest, selecting the top 5% values and their corresponding s 1 and s 2 . For these s 1 and s 2 , they are weighted and averaged, with the weight being the reciprocal of the corresponding RMSE divided by the reciprocal sum of the RMSE. Taking ξ = 0.1 as an example, first sort the RMSE calculated by F n ( u ) = 0.98 and p = 0.99 and select the first 5% of the lower RMSE values and their corresponding s 1 and s 2 . Based on the selected RMSE value, the weighted averages of s 1 and s 2 are carried out to obtain the final weighted s 1 and s 2 values. The weights of s 1 and s 2 are the reciprocal of their corresponding RMSE divided by the reciprocal of the previous 5% RMSE. At this point, the s 1 and s 2 values corresponding to ξ = 0.1, and the F n ( u ) = 0.98 and p = 0.99 can be calculated. Following the above steps, the s 1 and s 2 values corresponding to F n ( u ) = 0.98, p = 0.999 and p = 0.9999 can also be calculated. This is the result based on F n ( u ) classification when ξ = 0.1. Similarly, we can obtain the result by fixing p as a certain value. By repeating the above steps, we can find the optimal values for s 1 and s 2 .
Third, the values of s 1 and s 2 based on fixing F n ( u ) significantly fluctuate through calculation and comparison. Thus, the results of s 1 and s 2 based on fixing p are considered. Figure 2 is plotted in accordance with the p score, in which the upper three solid lines represent the image of s 1 , the lower three dashed lines represent the image of s 2 , and the different colored lines represent different values of p corresponding to VaR p . From the image, it can be observed that the value of s 2 fluctuates very little and the basic value of s 2 is around 1, while the basic value of s 1 is negative and the change fluctuation is relatively large. Based on the above analysis, s 2 = 1 and s 1 are used to find the function relationship. According to the results of s 1 and ξ , p shown in Figure 2, we can establish a linear regression model for s 1 via the “lm” function in R. Through a regression analysis of the value s 1 and ξ , p , the following model is obtained:
s 1 = 0.4 ξ + 30 p 30 .
Equation (17) first requires using the existing method to estimate ξ , and then determining s 2 according to the different p and ξ values; it is relatively troublesome to determine s 1 . Therefore, RMSE is sorted from small to large. We select the first 30% of the data and their corresponding s 1 and value ranges, for a given p and F n ( u ) ; and then selects ξ to take the intersection of the value range of s 1 when taking different values, as the recommended value of s 1 , that is, s 1 = 1.15 . In the following simulation and application, the presented GWNLSM uses s 1 = 1.15 and s 2 = 1 for estimation parameters.

4. Simulation Studies

We only choose heavy-tailed distributions since our main interest is heavy-tailed data. The performance of the proposed methods is investigated via Monte Carlo simulations. The competing estimators in clude the LME estimator by Zhang [14], the WNLS by Park and Kim [17], the GPWME by Chen et al. [18], and our GWNLSM method. In addition, besides the GPD simulated samples, we also generate samples from the Cauchy and Pareto distribution. The purpose is to evaluate the robustness of the proposed method when the population distribution is misjudged. Our focus is on estimating VaR rather than the GPD parameter itself; thus, the parameter estimation results are not shown. The 98th, 99th, and 99.5th sample quantiles are selected as the thresholds u for each sample, and the estimated extreme quantiles are VaR at 99%, 99.9%, and 99.99%, respectively. We summarize the simulation procedure as follows:
(1)
Generate a sample from the given distribution, where the sample size is 10,000.
(2)
Given 0 F n ( u ) < p < 1 , then u = x F n ( u ) .
(3)
Estimate the parameters ( ξ , σ ) and the VaR (given in Equation (3)).
(4)
Repeat steps (1)–(3) 2000 times.
(5)
Compute the RMSE and the absolute relative bias (ARB) of each VaR estimator.
Note that there are 50–200 observations with different thresholds for the GPD fit, with a total sample size of 10,000.
It can be seen that there is a reasonable number of exceedances for researching extreme tails. Table 2 and Table 3 show the simulation results. Table 2 is based on the simulation data generated via the GPD, using different estimation methods to estimate VaR and calculate the corresponding index values. Table 3 uses the simulation data generated from the Cauchy distribution and Pareto distribution to estimate VaR. To highlight the objective under the condition of satisfying heavy-tailed distribution characteristics, our method proves to be effective even when the specific distribution form is incorrectly determined.
Example 1.
GPD( ξ , σ ) with different ξ values and σ = 1 .
As can be seen from Table 2, by comparison, the LME and WNLS perform better for estimating VaR at 99% in terms of RMSE and ARB. However, with an increasing p in the estimated quantile, the proposed GWNLSM’s performance improves. Overall, the GWNLSM is relatively stable. In all cases of estimation for VaR at 99.99%, the GWNLSM outperforms all other estimators in terms of the RMSE, regardless of the used threshold value. Notably, it can be seen that when ξ is relatively small, the LME estimate is relatively accurate. As ξ increases, the GWNLSM estimation error decreases, resulting in more accurate estimates compared to the other methods in most cases, as assessed via RMSE and ARB. In addition, as the threshold increases, the GWNLSM exhibits a smaller ARB for VaR at 99.99% compared to the other methods.
In order to facilitate the comparison and illustrate the effect of method improvement, we use F n ( u ) = 0.995, p = 0.9999, and p = 0.999 as examples to draw comparisons, as shown in Figure 3, Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8. The abscissa of Figure 3, Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8 is the value of ξ , where the ordinates in Figure 3, Figure 4, Figure 6 and Figure 7 represent the RMSE of VaR and the ordinates in Figure 5 and Figure 8 represent the ARB of VaR. We generate 10,000 samples from the GPD with σ = 1 and ξ ranging from 0.1 to 1. This process is repeated 2000 times to calculate the RMSE and ARB of VaR at 99.99% and 99.9%. In terms of the RMSE and ARB of VaR at 99.99% and 99.9%, to better illustrate the advantages of the different estimation methods and show a discernible difference among curves in Figure 3 and Figure 6, we provide Figure 4 and Figure 7 as exploded views of Figure 3 and Figure 6, respectively. Here, in order to save space, we have only drawn a partial trend chart. In Figure 3, Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8, we can see that the GWNLSM method performs better overall. For the RMSE evaluation criteria, GWNLSM estimation performed well in most cases. In terms of the ARB evaluation criteria, when ξ is lower, the LME estimates VaR more accurately, and when ξ is larger, the GWNLSM estimates VaR more accurately.
Example 2.
Cauchy (μ, σ). The cdf of the Cauchy distribution is defined as
F μ , σ ( x ) = 1 π arctan x μ σ , < x < + ,
where μ and σ are location and scale parameters.
We generated samples from the standard Cauchy distribution ( μ = 0 , σ = 1 ) and presented the results in Table 3 with different thresholds. Table 3 illustrates that the GWNLSM outperforms other methods at 99.9% and 99.99% in terms of the RMSE and ARB with the 99th and 99.5th sample quantiles as thresholds.
Example 3.
Pareto ( σ , μ , α ).
The cdf of the Pareto (type I) distribution is given by
F ( x ) = 1 x μ σ α , x μ + σ ,
where μ, σ, and α are the location, scale, and shape parameters.
The Pareto distribution is also another heavy-tailed distribution. We generated Pareto samples with σ = 1 and α = 1 , and the results are presented in Table 3. As can be seen in Table 3, in most cases, our proposed method performs best again.
As can be seen from Table 3, for the Cauchy and the Pareto distributions, the GWNLSM parameter estimation method performs relatively well, regardless of a high- or low-threshold u, and the accuracy requirements of the estimated VaR are evidently superior. The higher the accuracy of the estimate, the more obvious the contrast error of various methods appears, and it is evident that the GWNLSM methods exhibit superiority over the LME, WNLS and GPWME methods in terms of performance. Whether it is a Cauchy or Pareto distribution, GWNLSM is optimal in most cases for VaR at 99.9% and VaR at 99.99%.

5. Actual Data Processing and Analysis

To verify the performance of different estimators, we chose PM2.5 data for our experimental analysis. PM2.5 data were obtained from the China National Environmental Monitoring Center. The center is a national technical institution and is directly affiliated with the Ministry of Ecology and Environment. It operates as the center of technology, network, information, quality control, and training for national environmental monitoring. In addition, it collects at least 100 million environmental monitoring data annually, which provides a strong guarantee for scientifically and accurately assessing national environmental quality. The PM2.5 monitoring results are also published online (https://www.cnemc.cn/ (accessed on 19 February 2023) We use the Beijing PM2.5 dataset from 2015 to 2022. The sample size is 2737. In order to preserve the original nature of the data and reduce the influence of scale parameter σ of the GPD, making it as small as possible, we changed the data separately, with the PM2.5 dataset reduced to 1% of the original. The following results are calculated based on transformed data.
We select the threshold and the corresponding analysis calculation using the following steps. First, when the exponential QQ plot is convex, indicating that the empirical quantile is growing faster than the theoretical quantile, the distribution is heavy-tailed. Conversely, the explanation is a short-tailed distribution. The tail distribution characteristics of PM2.5 are examined. The exponential QQ plot is given in Figure 9. It is worth noting that, in the exponential QQ plot of this study, the x axes are theoretical quantiles, and the y axes are empirical quantiles. Contrary to the aforementioned theory QQ, Figure 9 shows a concave shape; it can be observed that the data used are subject to a heavy-tailed distribution.
For the heavy-tailed features of the diagnostic data, we also plotted the empirical mean excess function (EMEF), given in Figure 10. In general, if EMEF has a significant linear change after exceeding a certain threshold and the slope is positive, it indicates that the excess threshold data follow the GPD and the shape parameter is greater than 0. For heavy-tailed data, this threshold can be selected as the threshold for analysis; if EMEF has a significant linear change after a certain threshold, but the slope is negative, it indicates that the observed data are short-tailed. As can be seen from Figure 10, the dataset satisfied the heavy-tailed distribution.
Secondly, a suitable threshold is selected. The preliminary value of the threshold can be roughly observed in the mean residual life plot. Another way to initially select a threshold is through the Hill plot in Figure 10. The trend stabilization point of the Hill plot is generally determined as a threshold. By observing the EMEF and Hill plots in Figure 10, it can be intuitively judged that the u value is about 1.
In order to find a more suitable and objective threshold u, we use goodness-of-fit tests to select the threshold. The main objective is to find the lowest threshold such that the highest number of exceedances above the threshold follows the GPD. We use the AD test statistic [22].
The details of the threshold selection method, instance calculation, and simulation steps are as follows.
(1)
Threshold selection. For the PM2.5 data, we combine the AD statistic and the Raw Down method to select the threshold u = 0.83 , exceeding the number n n u = 473 , F n ( u ) = 82.72 % , where Raw Down means that the test begins at the largest threshold and choose the first threshold until the test is rejected, then the threshold before the rejection is chosen [24].
(2)
Based on the threshold u using LME, WNLS, and GPWME, the GWNLSM method is used to calculate the estimates of ξ ^ , σ ^ and VaR t (given in Equation (3)). The specific results are shown in Table 4.
(3)
Based on the above parameter estimation results, n n u random samples following GPD ( u , ξ ^ , σ ^ ) and ξ ˜ , σ ˜ , VaR s are calculated.
(4)
Repeat (3) 1000 times; RMSE and ARB for VaR are calculated. Some results are shown in Table 5.
Table 5 compares the index values of the VaR based on four parameter estimation methods for the PM2.5 data. From Table 5, we can see that the LME and proposed GWNLSM exhibit better performances for VaR at 99.9% and VaR at 99.99% in terms of RMSE and ARB. Specifically, it can be observed that the proposed method works better for VaR at 99.99% in most cases.

6. Conclusions

In this study, based on the GPWME method within the POT framework, we present a novel GPD parameter estimation GWNLSM. The proposed method is applicable to the heavy-tailed GPD when the shape parameter ξ is greater than 0. This method utilizes the concepts of nonlinear least squares estimation and moment estimation while applying GPWME to the POT framework. In terms of the RMSE and ARB, extensive simulation studies and real-world datasets have shown that the advantage of the newly proposed estimator when VaRs are at high confidence levels. For example, when the estimated extreme quantile VaR is 99.99%, GWNLSM is optimal in most cases. Furthermore, when a set of data satisfies other heavy-tailed distributions, we can still use the asymmetrical GPD to obtain approximate extreme quantile estimates. The resulting extreme quantile estimates also exhibit relatively small errors. The new method performs well for heavy-tailed Cauchy and Pareto distributions and the actual dataset we present. A future research direction involves exploring more accurate and simpler methods to estimate the GPD parameters within the POT framework and optimizing the selection of s 1 and s 2 .

Author Contributions

Formal analysis, W.C. (Wenru Chen) and X.Z.; funding acquisition, X.Z., H.C., Q.J. and W.C. (Weihu Cheng); supervision, X.Z., H.C. and W.C. (Weihu Cheng); validation, X.Z., H.C. and W.C. (Weihu Cheng); writing—original draft, W.C. (Wenru Chen), X.Z., M.Z. and Q.J.; writing—review and editing, W.C. (Wenru Chen) and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to thank the National Office for Philosophy and Social Sciences (grant number: 20BTJ054) for their support.

Data Availability Statement

The data are published in https://www.cnemc.cn/ (accessed on 19 February 2023).

Acknowledgments

The authors would like to thank the referees and editors for their very helpful and constructive comments, which have significantly improved the quality of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

This section gives the calculation process of M g i , 0 , s i ( ξ , σ ) , i = 1 , 2 , 3 in detail.
Suppose that we have a sample X 1 , X 2 , , X n of size n. For convenience, we write F ( X i : n ) as F i , where F i follows Beta ( i , n i + 1 ) . Hence, we have
E [ log ( 1 F i ) ] = Γ ( n + 1 ) Γ ( i ) Γ ( n + 1 i ) 0 1 x i 1 ( 1 x ) n i log ( 1 x ) d x = j = 1 i 1 n + 1 j .
and
E ( 1 F i ) a = Γ ( n + 1 ) Γ ( i ) Γ ( n + 1 i ) 0 1 x i 1 ( 1 x ) n + a i d x = Γ ( n + 1 ) Γ ( n + a + 1 i ) Γ ( n + 1 i ) Γ ( n + a + 1 ) h ( a ) .
Hence,
M g 1 , 0 , s 1 ( ξ , σ ) = 1 1 F ( u ) E ( 1 F i ) s 1 + 1 = Γ ( n + 1 ) Γ ( n + s 1 + 2 i ) Γ ( n i + 1 ) Γ ( n + s 1 + 2 ) ( 1 F ( u ) ) .
At the same time, we can also obtain the following results
E ( 1 F i ) s 2 ( log ( 1 F i ) ) = Γ ( n + 1 ) Γ ( i ) Γ ( n + 1 i ) 0 1 x i 1 ( 1 x ) n + s 2 i log ( 1 x ) d x = h ( s 2 ) j = 1 i 1 n + s 2 + 1 j .
Therefore,
M g 2 , 0 , s 2 ( ξ , σ ) = ξ log ( 1 F ( u ) ) E ( 1 F i ) s 2 ξ E ( 1 F i ) s 2 ( log ( 1 F i ) ) = ξ log ( 1 F ( u ) ) h ( s 2 ) ξ h ( s 2 ) j = 1 i 1 n + s 2 + 1 j = ξ h ( s 2 ) log ( 1 F ( u ) ) + j = 1 i 1 n + s 2 + 1 j ,
and
M g 3 , 0 , 1 ( ξ , σ ) = E ( 1 F i ) = h ( 1 ) = 1 i n + 1 .

References

  1. Tawn, J.A. Bivariate extreme value theory: Models and estimation. Biometrika 1988, 75, 397–415. [Google Scholar] [CrossRef]
  2. Tawn, J.A. Modelling multivariate extreme value distributions. Biometrika 1988, 77, 245–253. [Google Scholar] [CrossRef]
  3. Embrechts, P.; Resnick, S.I.; Gennady, S. Extreme Value Theory as a Risk Management Tool. N. Am. Actuar. J. 1999, 3, 30–41. [Google Scholar] [CrossRef]
  4. Pickands, J. Statistical inference using extreme order statistics. Ann. Stat. 1975, 3, 119–131. [Google Scholar]
  5. Hosking, J.R.; Wallis, J.R. Parameter and quantile estimation for the generalized pareto distribution. Technometrics 1987, 29, 339–349. [Google Scholar] [CrossRef]
  6. Greenwood, J.A.; Landwehr, J.M.; Matalas, N.C.; Wallis, J.R. Probability weighted moments: Definition and relation to parameters of several distributions expressable in inverse form. Water Resour. Res. 1979, 15, 1049–1054. [Google Scholar] [CrossRef]
  7. Smith, R.L. Maximum likelihood estimation in a class of nonregular cases. Biometrika 1985, 72, 67–90. [Google Scholar] [CrossRef]
  8. Moharram, S.; Gosain, A.; Kapoor, P. A comparative study for the estimators of the generalized pareto distribution. J. Hydrol. 1993, 150, 169–185. [Google Scholar] [CrossRef]
  9. Hosking, J.R. L-moments: Analysis and estimation of distributions using linear combinations of order statistics. J. R. Stat. Soc. Ser. B (Methodol.) 1990, 52, 105–124. [Google Scholar] [CrossRef]
  10. Ashkar, F.; Ouarda, T.B. On some methods of fitting the generalized pareto distributions. J. Hydrol. 1996, 177, 117–141. [Google Scholar] [CrossRef]
  11. Castillo, E.; Hadi, A.S. Fitting the generalized pareto distribution to data. J. Am. Stat. Assoc. 1997, 92, 1609–1620. [Google Scholar] [CrossRef]
  12. From, S.G.; Ratnasingam, S. Some efficient closed-form estimators of the parameters of the generalized pareto distribution. Environ. Ecol. Stat. 2022, 29, 827–847. [Google Scholar] [CrossRef]
  13. Rasmussen, P.F. Generalized probability weighted moments: Application to the generalized pareto distribution. Water Resour. Res. 2001, 37, 1745–1751. [Google Scholar] [CrossRef]
  14. Zhang, J. Likelihood moment estimation for the generalized pareto distribution. Aust. N. Z. J. Stat. 2007, 49, 69–77. [Google Scholar] [CrossRef]
  15. Zhang, J.; Stephens, M.A. A new and efficient estimation method for the generalized pareto distribution. Technometrics 2009, 51, 316–325. [Google Scholar] [CrossRef]
  16. Song, J.; Song, S. A quantile estimation for massive data with generalized pareto distribution. Comput. Stat. Data Anal. 2012, 56, 143–150. [Google Scholar] [CrossRef]
  17. Park, M.H.; Kim, J.H. Estimating extreme tail risk measures with generalized pareto distribution. Comput. Stat. Data Anal. 2016, 98, 91–104. [Google Scholar] [CrossRef]
  18. Chen, H.; Cheng, W.; Zhao, J.; Zhao, X. Parameter estimation for generalized Pareto distribution by generalized probability weighted moment-equations. Commun.-Stat.-Simul. Comput. 2017, 46, 7761–7776. [Google Scholar] [CrossRef]
  19. Chen, P.; Ye, Z.S.; Zhao, X. Minimum distance estimation for the generalized pareto distribution. Technometrics 2017, 59, 528–541. [Google Scholar] [CrossRef]
  20. Martín, J.; Parra, M.I.; Pizarro, M.M.; Sanjuán, E.L. Baseline methods for the parameter estimation of the generalized pareto distribution. Entropy 2022, 24, 178. [Google Scholar] [CrossRef] [PubMed]
  21. Langousis, A.; Mamalakis, A.; Puliga, M.; Deidda, R. Threshold detection for the generalized pareto distribution: Review of representative methods and application to the noaa ncdc daily rainfall database. Water Resour. Res. 2016, 52, 2659–2681. [Google Scholar] [CrossRef]
  22. Choulakian, V.; Stephens, M.A. Goodness-of-fit tests for the generalized pareto distribution. Technometricsh 2001, 43, 478–484. [Google Scholar] [CrossRef]
  23. G’Sell, M.G.; Wager, S.; Chouldechova, A. Sequential selection procedures and false discovery rate control. J. R. Stat. Ser. B Stat. Methodol. 2016, 78, 423–444. [Google Scholar] [CrossRef]
  24. Bader, B.; Yan, J.; Zhang, X. Automated threshold selection for extreme value analysis via ordered goodness-of-fit tests with adjustment for false discovery rate. Ann. Appl. Stat. 2018, 12, 310–329. [Google Scholar] [CrossRef]
  25. Yang, X.; Zhang, J.; Ren, W.X. Threshold selection for extreme value estimation of vehicle load effect on bridges. Int. J. Distrib. Sens. Netw. 2018, 14, 1–12. [Google Scholar] [CrossRef]
  26. Annabestani, M.; Saadatmand-Tarzjan, M. A new threshold selection method based on fuzzy expert systems for separating text from the background of document images. Iran. J. Sci. Technol. Trans. Electrical Eng. 2019, 43, 219–231. [Google Scholar] [CrossRef]
  27. Curceac, S.; Atkinson, P.M.; Milne, A.; Wu, L.; Harris, P. An evaluation of automated GPD threshold selection methods for hydrological extremes across different scales. J. Hydrol. 2020, 585, 124845. [Google Scholar] [CrossRef]
  28. Silva Lomba, J.; Fraga Alves, M.I. L-moments for automatic threshold selection in extreme value analysis. Stoch. Environ. Res. Risk Assess. 2020, 34, 465–491. [Google Scholar] [CrossRef]
  29. Balkema, A.A.; De Haan, L.I. Residual life time at great age. Ann. Probab. 1974, 2, 792–804. [Google Scholar] [CrossRef]
  30. De Zea Bermudez, P.; Kotz, S. Parameter estimation of the generalized Pareto distribution—Part I. J. Stat. Plan. Inference 2010, 140, 1353–1373. [Google Scholar] [CrossRef]
  31. Landwehr, J.M.; Matalas, N.; Wallis, J. Estimation of parameters and quantiles of wakeby distributions. Water Resour. Res. 1979, 15, 1361–1372. [Google Scholar] [CrossRef]
Figure 1. pdfs of GPD ( μ , ξ , σ ) , μ = 0 , σ = 1 .
Figure 1. pdfs of GPD ( μ , ξ , σ ) , μ = 0 , σ = 1 .
Symmetry 16 00365 g001
Figure 2. Plot of s 1 and s 2 with different ξ values (solid and dashed lines represent values of s 1 and s 2 , respectively, while p = 0.99 , 0.999 , 0.9999 ).
Figure 2. Plot of s 1 and s 2 with different ξ values (solid and dashed lines represent values of s 1 and s 2 , respectively, while p = 0.99 , 0.999 , 0.9999 ).
Symmetry 16 00365 g002
Figure 3. RMSE of VaR at 99.99% with σ = 1 and 98th quantile as threshold.
Figure 3. RMSE of VaR at 99.99% with σ = 1 and 98th quantile as threshold.
Symmetry 16 00365 g003
Figure 4. RMSE of VaR at 99.99% with σ = 1 and 98th quantile as threshold (exploded view of Figure 3).
Figure 4. RMSE of VaR at 99.99% with σ = 1 and 98th quantile as threshold (exploded view of Figure 3).
Symmetry 16 00365 g004
Figure 5. ARB of VaR at 99.99% with σ = 1 and 98th quantile as the threshold.
Figure 5. ARB of VaR at 99.99% with σ = 1 and 98th quantile as the threshold.
Symmetry 16 00365 g005
Figure 6. RMSE of VaR at 99.9% with σ = 1 and 98th quantile as threshold.
Figure 6. RMSE of VaR at 99.9% with σ = 1 and 98th quantile as threshold.
Symmetry 16 00365 g006
Figure 7. RMSE of VaR at 99.9% with σ = 1 and 98th quantile as the threshold (exploded view of Figure 6).
Figure 7. RMSE of VaR at 99.9% with σ = 1 and 98th quantile as the threshold (exploded view of Figure 6).
Symmetry 16 00365 g007
Figure 8. ARB of VaR at 99.9% with σ = 1 and 98th quantile as threshold.
Figure 8. ARB of VaR at 99.9% with σ = 1 and 98th quantile as threshold.
Symmetry 16 00365 g008
Figure 9. The QQ diagram of the PM2.5 data; the ordinate x represents order data.
Figure 9. The QQ diagram of the PM2.5 data; the ordinate x represents order data.
Symmetry 16 00365 g009
Figure 10. The EMEF and Hill diagrams of the PM2.5 data. In (a), the black and grey lines represent the empirical mean residual life and its confidence intervals with confidence level 0.95. In (b), the black and blue lines represent the Hill estimator of the GPD tail index and its confidence intervals with confidence level 0.95.
Figure 10. The EMEF and Hill diagrams of the PM2.5 data. In (a), the black and grey lines represent the empirical mean residual life and its confidence intervals with confidence level 0.95. In (b), the black and blue lines represent the Hill estimator of the GPD tail index and its confidence intervals with confidence level 0.95.
Symmetry 16 00365 g010
Table 1. s 1 and s 2 correspond to the RMSE values of VaR p , where s 1 and s 2 are used in (13) and (14).
Table 1. s 1 and s 2 correspond to the RMSE values of VaR p , where s 1 and s 2 are used in (13) and (14).
ξ p = 0.99p = 0.999p = 0.9999
Min Max Min Max Min Max
n n u = 200 ( F n ( u ) = 0.98)
0.10.15050.15070.52410.52451.89201.8943
0.20.24170.38801.04681.98654.53505.9320
0.30.38740.57272.09013.328810.93112.778
0.40.62030.63424.17394.263526.57426.820
0.50.99161.01798.32188.483364.59265.264
0.61.58791.646916.62917.084160.04161.07
0.72.53592.619333.19333.911386.40398.62
0.84.05514.262666.22668.616987.721012.01
0.96.46086.8859132.31136.162489.052655.14
n n u = 200 ( F n ( u ) = 0.99)
0.1 0.52060.54652.00612.0466
0.2 1.03712.70494.87878.0178
0.3 2.06644.310911.91216.020
0.4 4.11894.315329.428729.761
0.5 8.22358.665673.15074.088
0.6 16.41916.982182.86185.17
0.7 32.70834.350461.78476.22
0.8 65.73174.5421184.331604.15
0.9 131.40173.902853.843542.59
n n u = 200 ( F n ( u ) = 0.995)
0.1 0.55440.63662.05072.2331
0.2 1.10212.58295.05438.8306
0.3 2.19124.201812.53417.377
0.4 4.35984.596631.22831.827
0.5 8.67299.176478.28281.745
0.6 17.15619.610198.75216.080
0.7 34.49948.666508.52548.680
0.8 68.833125.121301.211548.14
0.9 137.790293.193388.585448.34
Table 2. RMSE and ARB are calculated for each VaR estimation from the GPD with different ξ values.
Table 2. RMSE and ARB are calculated for each VaR estimation from the GPD with different ξ values.
RMSEARB
VaR 99 % VaR 99 . 9 % VaR 99 . 99 % VaR 99 % VaR 99 . 9 % VaR 99 . 99 %
GPD (0.2, 1)
F n ( u ) = 0.98LME0.23581.02364.69170.02490.05480.1364
WNLS0.23631.10685.12400.02500.06000.1528
GPWME0.24061.04645.12780.02550.05580.1459
GWNLSM0.24231.04124.52170.02530.05640.1388
F n ( u ) = 0.99LME 1.02745.2043 0.05500.1511
WNLS 1.09025.6780 0.05900.1715
GPWME 1.03885.9179 0.05550.1661
GWNLSM 1.03504.8295 0.05590.1484
F n ( u ) = 0.995LME 1.08895.4096 0.05810.1559
WNLS 1.09735.6234 0.05920.1718
GPWME 1.09286.0105 0.05820.1675
GWNLSM 1.09595.0130 0.05890.1534
GPD (0.4, 1)
F n ( u ) = 0.98LME0.59604.158128.5260.03590.08890.2208
WNLS0.59634.348129.2820.03590.09450.2346
GPWME0.60434.211130.0110.03640.08970.2286
GWNLSM0.62104.144326.5430.03690.09050.2209
F n ( u ) = 0.99LME 4.174933.105 0.08930.2527
WNLS 4.321833.957 0.09390.2705
GPWME 4.219436.377 0.09010.2708
GWNLSM 4.102029.095 0.08920.2411
F n ( u ) = 0.995LME 4.355935.972 0.09270.2684
WNLS 4.303634.655 0.09350.2800
GPWME 4.362438.747 0.09280.2800
GWNLSM 4.325030.981 0.09320.2542
GPD (0.8, 1)
F n ( u ) = 0.98LME3.802168.5811157.570.06240.17040.4027
WNLS3.782067.2921054.260.06230.17200.3890
GPWME3.812868.0951142.270.06270.16900.3976
GWNLSM4.048865.579989.230.06560.16970.3828
F n ( u ) = 0.99LME 69.8051480.19 0.17320.4870
WNLS 68.4721349.27 0.17530.4726
GPWME 70.1291520.94 0.17390.4998
GWNLSM 65.0321155.32 0.16740.4314
F n ( u ) = 0.995LME 71.1131812.26 0.17580.5483
WNLS 67.3751436.30 0.17260.5027
GPWME 71.0831799.72 0.17580.4694
GWNLSM 67.9831292.96 0.17200.4694
GPD (2, 1)
F n ( u ) = 0.98LME981.8343.28 × 10 6 1.32 × 10 9 0.15350.44901.1581
WNLS958.7562.78 × 10 5 8.93 × 10 7 0.15110.40280.8835
GPWME961.9813.11 × 10 5 1.11 × 10 8 0.15080.43111.0458
GWNLSM3257.374.49 × 10 5 5.82 × 10 7 0.59390.84140.9610
F n ( u ) = 0.99LME 3.58 × 10 6 2.65 × 10 9 0.47861.72649
WNLS 3.02 × 10 6 1.56 × 10 9 0.43391.2574
GPWME 3.57 × 10 5 2.29 × 10 8 0.47931.6624
GWNLSM 4.89 × 10 5 5.23 × 10 7 0.96841.0007
F n ( u ) = 0.995LME 3.59 × 10 5 6.07 × 10 8 0.47772.5958
WNLS 2.96 × 10 5 2.45 × 10 8 0.42681.5203
GPWME 4.26 × 10 5 7.44 × 10 8 0.54104.5462
GWNLSM 4.79 × 10 5 4.99 × 10 7 0.95680.9976
Table 3. RMSE and ARB for different heavy-tailed distributions.
Table 3. RMSE and ARB for different heavy-tailed distributions.
RMSEARB
VaR 99 % VaR 99.9 % VaR 99.99 % VaR 99 % VaR 99.9 % VaR 99.99 %
Cauchy (0, 1)
F n ( u ) = 0.98LME3.109288.4432318.320.07750.21200.4773
WNLS3.086385.6802033.4460.07720.21210.4527
GPWME3.086687.6502230.660.07700.20950.4681
GWNLSM3.382982.2351949.250.08390.20780.4417
F n ( u ) = 0.99LME 90.6503202.64 0.21660.5904
WNLS 85.8802707.33 0.21380.5293
GPWME 90.2693123.24 0.21580.5799
GWNLSM 82.3702462.95 0.20660.5153
F n ( u ) = 0.995LME 91.4164325.15 0.21780.7113
WNLS 84.3343070.36 0.21110.5997
GPWME 91.2094022.12 0.21750.6959
GWNLSM 85.4473379.91 0.21060.5925
Pareto (1, 1)
F n ( u ) = 0.98LME9.3271280.8017496.160.07560.217430.5073
WNLS9.3032270.1506702.710.07540.215010.4791
GPWME9.3260278.7767322.520.07550.21630.5041
GWNLSM10.079263.0216286.660.08140.21260.4683
F n ( u ) = 0.99LME 286.7699886.04 0.22080.6097
WNLS 271.8478537.50 0.21680.5585
GPWME 286.7549904.80 0.220840.6094
GWNLSM 262.6707461.39 0.21100.5345
F n ( u ) = 0.995LME 287.94614914.1 0.22120.7202
WNLS 265.75710433.2 0.21330.6269
GPWME 287.63613634.9 0.22100.7170
GWNLSM 274.0819534.50 0.21610.6060
Table 4. Parameter estimation for real data using different methods.
Table 4. Parameter estimation for real data using different methods.
ξ ^ σ ^
LME0.07820.5185
WNLS0.11690.4953
GPWME0.08130.5169
GWNLSM0.04810.5363
u0.83
Table 5. VaR estimation of the PM2.5 data.
Table 5. VaR estimation of the PM2.5 data.
RMSEARB
VaR at 99 % VaR at 99.9 % VaR at 99.99 % VaR at 99 % VaR at 99.9 % VaR at 99.99 %
LME
LME0.9396423.5674759.3593410.0299510.0690500.140593
WNLS1.1147304.34789611.124570.0495610.1028060.165263
GPWME1.0333654.19551211.239570.0456600.0966720.161397
GWNLSM1.0211223.7811479.3569010.0456340.0836340.140933
WNLS
LME1.0686714.28840711.870060.0329680.0774290.136665
WNLS1.1942444.96372513.537520.0371380.0914900.159159
GPWME1.1106514.80094913.695990.0343030.0861940.155536
GWNLSM1.0970044.36199711.530570.0343090.0800410.136313
GPWME
LME0.7290662.945358.1932690.0259590.0613760.1145927
WNLS0.7876713.4328989.4332410.0281970.0728600.132312
GPWME0.7394363.2871489.4793130.0264140.0682390.128704
GWNLSM0.7455563.0172418.0558310.0267760.0642570.114585
GWNLSM
LME1.5969074.87552910.942570.0412400.0834990.134014
WNLS1.8890575.81344612.880810.0494640.1013600.160320
GPWME1.7399655.61629212.956790.0450030.0952990.155665
GWNLSM1.6514004.85643310.349250.0433700.0832530.132417
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chen, W.; Zhao, X.; Zhou, M.; Chen, H.; Ji, Q.; Cheng, W. Statistical Inference and Application of Asymmetrical Generalized Pareto Distribution Based on Peaks-Over-Threshold Model. Symmetry 2024, 16, 365. https://doi.org/10.3390/sym16030365

AMA Style

Chen W, Zhao X, Zhou M, Chen H, Ji Q, Cheng W. Statistical Inference and Application of Asymmetrical Generalized Pareto Distribution Based on Peaks-Over-Threshold Model. Symmetry. 2024; 16(3):365. https://doi.org/10.3390/sym16030365

Chicago/Turabian Style

Chen, Wenru, Xu Zhao, Mi Zhou, Haiqing Chen, Qingqing Ji, and Weihu Cheng. 2024. "Statistical Inference and Application of Asymmetrical Generalized Pareto Distribution Based on Peaks-Over-Threshold Model" Symmetry 16, no. 3: 365. https://doi.org/10.3390/sym16030365

APA Style

Chen, W., Zhao, X., Zhou, M., Chen, H., Ji, Q., & Cheng, W. (2024). Statistical Inference and Application of Asymmetrical Generalized Pareto Distribution Based on Peaks-Over-Threshold Model. Symmetry, 16(3), 365. https://doi.org/10.3390/sym16030365

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop