Article

The Representative Points of Generalized Alpha Skew-t Distribution and Applications

1 School of Mathematics, Renmin University of China, No. 59, Zhongguancun Street, Haidian District, Beijing 100872, China
2 Research Center for Frontier Fundamental Studies, Zhejiang Lab, Kechuang Avenue, Zhongtai Sub-District, Yuhang District, Hangzhou 311121, China
3 Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College, Zhuhai 519087, China
4 Department of Statistics and Data Science, Faculty of Science and Technology, BNU-HKBU United International College, 2000 Jintong Road, Tangjiawan, Zhuhai 519087, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Entropy 2024, 26(11), 889; https://doi.org/10.3390/e26110889
Submission received: 9 September 2024 / Revised: 20 October 2024 / Accepted: 21 October 2024 / Published: 22 October 2024
(This article belongs to the Special Issue Number Theoretic Methods in Statistics: Theory and Applications)

Abstract
Assuming an underlying statistical distribution for the data is critical in information theory, as it affects the accuracy and efficiency of communication and the definition of entropy. Real-world data are widely assumed to follow the normal distribution. To better capture the skewness of data, many models more flexible than the normal distribution have been proposed, such as the generalized alpha skew-t (GAST) distribution. This paper studies some properties of the GAST distribution, including the calculation of its moments and the relationship between the number of peaks and the GAST parameters, with some proofs. For complex probability distributions, representative points (RPs) are useful due to their convenience for manipulation, computation and analysis. The relative entropy of two probability distributions would be a good criterion for generating RPs of a specific distribution, but it is not popularly used due to its computational complexity. Hence, this paper considers three ways to obtain RPs of the GAST distribution: Monte Carlo (MC), quasi-Monte Carlo (QMC), and mean square error (MSE). The three types of RPs are utilized in estimating moments and densities of the GAST distribution with known and unknown parameters. The MSE representative points perform the best in all case studies. For the unknown-parameter cases, a revised maximum likelihood estimation (MLE) method of parameter estimation is compared with the plain MLE method. It indicates that the revised MLE method is suitable for GAST distributions with a unimodal or unobvious bimodal pattern. This paper includes two real-data applications in which the GAST model appears adaptable to various types of data.

1. Introduction

Statistical distributions play a crucial role in information theory, since they describe the probability characteristics of data or signals and hence directly affect the accuracy and efficiency of the representation, transmission, compression, and reconstruction of information. Entropy, the most important measure in the field of information theory, depends on the statistical distribution of the random variable. Many applications of information theory therefore require an assumption about the statistical distribution of the data. Although data are assumed to follow the normal distribution in most statistical analyses, due to mathematical convenience and generality, real-world data frequently exhibit skewness, creating demand for more flexible models. The geometric Brownian motion (GBM), a popular model of stochastic processes, assumes that its solutions follow the log-normal distribution. Gupta et al. (2024) [1] indicated that the GBM yields trajectories that deviate significantly from the reference distribution when the data do not meet the log-normal assumption. To deal with the limitations in such a scenario, one may consider correcting the model as in [1]. Constructing alternatives to the normal distribution has thus been a common concern.
The skew-normal (SN) distribution, first introduced by Azzalini (1985) [2], is an extension of the normal distribution that allows for skewness and is capable of modeling asymmetric data. If a random variable Z has a probability density function (pdf) given by
\phi(z; s) = 2\,\phi(z)\,\Phi(sz), \quad z \in \mathbb{R},\ s \in \mathbb{R}, \tag{1}
where ϕ(·) and Φ(·) are the pdf and cumulative distribution function (cdf) of the standard normal distribution, then Z follows the SN distribution, denoted as Z ∼ SN(s). The parameter s controls the skewness of the distribution. When s = 0, the SN distribution reduces to the standard normal distribution. With s > 0, the SN distribution is right-skewed, while s < 0 implies left skewness.
The skew-t (ST) distribution is an intriguing example among scale mixtures of SN distributions. It was first formulated by Branco and Dey (2001) [3] and later extensively studied by Azzalini and Capitanio (2003) [4]. An ST random variable, Y ∼ ST(s, ν), can be represented as
Y = \frac{Z}{\sqrt{V/\nu}}, \tag{2}
where Z ∼ SN(s) and V ∼ χ²_ν (the chi-square distribution with ν degrees of freedom) are independent of each other. The moment of Y exists only when its order is less than ν, the same condition required for the Student's t-distribution with ν degrees of freedom, denoted by t_ν. The construction from the SN distribution to the ST distribution parallels the derivation of the Student's t-distribution from the normal distribution. The pdf of the ST distribution is given by
f(y; s, \nu) = 2\, t(y; \nu)\, T\!\left(\sqrt{\frac{1+\nu}{y^2+\nu}}\; s y;\; \nu+1\right), \quad y \in \mathbb{R},\ s \in \mathbb{R}, \tag{3}
where t(·) is the pdf of t_ν, and T(·) is the cdf of t_{ν+1}. The parameter ν controls the tail heaviness. As ν approaches infinity, the ST distribution approaches the SN distribution. Lower values of ν result in heavier tails, providing robustness against outliers. Similar to the SN distribution, the parameter s controls the skewness. When s = 0, the ST distribution reduces to the Student's t-distribution. Azzalini and Genton (2008) [5] conducted a quite extensive numerical exploration, demonstrating that the ST distribution can adapt well to various empirical problems. They utilized an autoregressive model of order one, Y(t) = β₀ + β₁Y(t − 1) + ϵ(t) with β₀ ∈ ℝ and |β₁| ≤ 1, to fit 91 monthly interest rates of an Austrian bank. Their results clearly showed that the error components ϵ(t) follow an ST distribution, where the small degrees-of-freedom parameter signifies heavy tails in the error distribution, allowing the ST model to manage outliers better than the normal distribution. The ST distribution, which combines the characteristics of the Student's t-distribution and the SN distribution, is particularly suitable for applications in finance that need to model returns with skewness and excess kurtosis, as well as in environmental studies focused on modeling extreme events. Martínez-Flórez et al. (2020) [6] also mentioned other kinds of skew distributions, such as the skew-Student-t, skew-Cauchy, skew-logistic and skew-Laplace distributions. They summarized these as skew-elliptical distributions, since they share a unified expression for the density function:
h_Y(y; s) = 2\, f(y)\, F(sy), \quad y, s \in \mathbb{R},
where f ( · ) is a symmetric pdf, and F ( · ) is the corresponding cdf.
Another type of skew distribution is obtained by adding a coefficient function with an α argument to the density function. Elal-Olivero (2010) [7] proposed the alpha-skew-normal (ASN) distribution, with pdf defined as
f(x; \alpha) = \frac{(1-\alpha x)^2 + 1}{2 + \alpha^2}\;\phi(x), \quad x \in \mathbb{R},\ \alpha \in \mathbb{R}. \tag{4}
If a random variable X has the pdf (4), we denote it as X ∼ ASN(α). This distribution is more flexible than the SN and ST distributions since it can be unimodal or bimodal, depending on the adjustment of the α parameter. When α = 0, the ASN distribution reduces to the standard normal distribution, X ∼ N(0, 1).
Although the ASN distribution is able to model both skew and bimodal data, it has limitations when data have tails thinner or thicker than the normal distribution. In order to fit stock data more accurately, Altun et al. (2018) [8] introduced a new generalized alpha skew-t (GAST) distribution combining the approaches of [4,7]. They combined the GAST distribution with the generalized autoregressive conditional heteroskedasticity (GARCH) model to build a new Value-at-Risk (VaR) prediction model for forecasting daily log returns over three years. They compared the failure rates of the GARCH models under different distribution assumptions, including normal, Student's t, ST and GAST. The results showed that the GAST distribution performs best in the backtesting. The definition of the GAST distribution and its properties, with proofs, are elaborated in the next section.
For an unknown continuous statistical distribution, the empirical distribution of a random sample is a traditional way to approximate the target distribution. However, it often leads to low accuracy, and hence support points for a discrete approximation, also known as representative points (RPs), are explored in order to preserve as much information of the target distribution as possible. Representative points have great potential for applications in statistical simulation and inference; see Fang and Pan (2023) [9] for a comprehensive review. Various kinds of representative points of different statistical distributions have been explored in the literature; for complex distributions especially, the study of representative points is necessary. The idea of representative points is to simplify complex probability distributions with discrete points that are easier to manipulate, facilitating efficient computation and analysis. These points serve as a finite set that approximates the distribution of a random variable, which can be either discrete or continuous and either univariate or multivariate. In this paper, we focus on the representative points of the GAST distribution and their applications. We first introduce the concepts of three kinds of RPs here, while the specific construction procedures are included in Section 4, together with their applications to the estimation of moments and densities.
There are many existing criteria for choosing the RPs of a distribution, such as Monte Carlo RPs (MC-RPs), quasi-Monte Carlo RPs (QMC-RPs) and mean square error RPs (MSE-RPs), which are introduced below. In fact, the Kullback–Leibler (KL) divergence, or relative entropy, of two probability distributions is a good criterion for this purpose; entropy has also been utilized as a measure in experimental design, for example, in Lin et al. (2022) [10]. Due to its computational complexity, however, entropy is not popularly used for generating RPs in applications. Therefore, in this article, we study only the MC-RPs, QMC-RPs, and MSE-RPs of the generalized alpha skew-t distribution.

1.1. Monte Carlo Representative Points

Let X be the population random variable with cdf F(x) = P{X ≤ x}, x ∈ ℝ. Various Monte Carlo methods provide ways to generate independent identically distributed (i.i.d.) samples {x₁, …, xₙ} from the population, with p(xᵢ) = 1/n, i = 1, …, n. The empirical distribution of the random sample is defined as follows:
F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I\{x_i \le x\},
where I_A is the indicator function of the set A. The empirical distribution F_n(x) is close to F(x) in the sense of consistency; hence, F_n(x) can be regarded as an approximation of F(x). We denote the empirical distribution of random samples generated by the Monte Carlo method as F_MC. Traditional statistical inference is based on the empirical distribution. Efron (1979) [11] proposed a resampling technique, the bootstrap method, with which we can take a set of random samples from F_MC instead of F. Combined with the bootstrap, MC-RPs have proven useful in statistical inference, such as parameter estimation, density estimation and hypothesis testing. However, the MC method has many limitations, since the convergence rate of F_n(x) to F(x) as n → ∞, which is O(n^{−1/2}), is slow. The following two kinds of RPs improve the convergence rate nicely.

1.2. Quasi-Monte Carlo Representative Points

For a high-dimensional integration problem:
I(f) = \int_0^1 \cdots \int_0^1 f(y_1, \dots, y_d)\, dy_1 \cdots dy_d = \int_{C^d} f(\mathbf{y})\, d\mathbf{y},
where f is a continuous function on C^d = [0, 1]^d. Suppose that 𝒴 = {y₁, …, yₙ} is a set of n points uniformly scattered in C^d; we can estimate I(f) by
\overline{f(\mathbf{y})} = \frac{1}{n}\sum_{i=1}^{n} f(\mathbf{y}_i), \quad \mathbf{y}_i \in \mathcal{Y}.
If we generate 𝒴 by the MC method, the convergence rate of f̄(y) to I(f) is O(1/√n) as n → ∞. The quasi-Monte Carlo (QMC) method provides many ways to construct 𝒴 so as to increase the convergence rate, which can reach O(n^{−1} log^d n) according to Fang et al. (1994) [12]. For further theoretical studies, readers may refer to Hua and Wang (1981) [13] and Niederreiter (1992) [14]. In [12], the F-discrepancy is used to measure the uniformity of 𝒴 in C^d, defined by
D(F, F_n) = \sup_{\mathbf{x} \in \mathbb{R}^d} \left| F(\mathbf{x}) - F_n(\mathbf{x}) \right|,
where F(x) is the cdf of the uniform distribution U(C^d) and F_n(x) is the empirical distribution of 𝒴. The set 𝒴 that minimizes D(F, F_n) gives the QMC-RPs, which have equal probability 1/n.
For the univariate distributions of this paper, the QMC method samples points that are uniformly distributed on the interval [0, 1]. If the inverse function of F exists, then the set of n points
b_j = F^{-1}\!\left(\frac{2j-1}{2n}\right), \quad j = 1, \dots, n, \tag{6}
has been proved to have the minimal F-discrepancy, 1/(2n), from F(x) [12]. Therefore, the set of points B = {b₁, …, bₙ} is called the set of QMC-RPs of F(x). Fang et al. (1994) [12] gave a comprehensive study of QMC methods and their applications in statistical inference, experimental design, geometric probability, and optimization.
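As an illustration, the construction in (6) can be carried out numerically whenever the cdf of F can be evaluated; the minimal R sketch below inverts a given cdf by root-finding. The bracketing interval is an assumption and must contain all the quantiles of interest.

# QMC-RPs of a univariate distribution: b_j = F^{-1}((2j - 1) / (2n)).
# `cdf` is any vectorized cdf; widen (lower, upper) for heavy-tailed targets.
qmc_rps <- function(cdf, n, lower = -100, upper = 100) {
  p <- (2 * seq_len(n) - 1) / (2 * n)
  sapply(p, function(pp)
    uniroot(function(x) cdf(x) - pp, c(lower, upper))$root)
}

qmc_rps(pnorm, 10)  # 10 QMC-RPs of N(0, 1); agrees with qnorm((2 * 1:10 - 1) / 20)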

1.3. Mean Square Error Representative Points

The concept of MSE-RPs was independently proposed by Cox (1957) [15], Flury (1990) [16] and many others. In the literature, MSE-RPs have been called by different names, such as "quantizers" and "principal points". Let X ∼ F(x) be a random variable with finite mean μ and variance σ². To provide the best representation of F for a given number n, we select the set of n representative points having the least mean square error from F(x), forming a discrete distribution F_MSE^(n). Denote Y_MSE^(n) ∼ F_MSE^(n), defined as
F_{MSE}^{(n)}(y) = \sum_{i=1}^{n} p_i^{(n)}\, I\{b_i^{(n)} \le y\},
with the probability mass function
P\left(Y_{MSE}^{(n)} = b_i^{(n)}\right) = p_i^{(n)}, \quad i = 1, \dots, n,
where −∞ < b₁^(n) < b₂^(n) < ⋯ < bₙ^(n) < ∞ are the MSE-RPs of X, and p₁^(n), …, pₙ^(n) are the corresponding probabilities, with respect to
MSE\left(Y_{MSE}\right) = MSE\left(b_1^{(n)}, \dots, b_n^{(n)}\right) = \int_{-\infty}^{\infty} \min_{i=1,\dots,n}\left(x - b_i^{(n)}\right)^2 f(x)\, dx,
and
p_1^{(n)} = \int_{-\infty}^{(b_1^{(n)}+b_2^{(n)})/2} f(x)\, dx, \qquad p_i^{(n)} = \int_{(b_{i-1}^{(n)}+b_i^{(n)})/2}^{(b_i^{(n)}+b_{i+1}^{(n)})/2} f(x)\, dx,\ i = 2, \dots, n-1, \qquad p_n^{(n)} = \int_{(b_{n-1}^{(n)}+b_n^{(n)})/2}^{+\infty} f(x)\, dx.
The MSE-RPs have many useful properties. Graf and Luschgy (2007) [17], and Fei (1991) [18] proved that
E\left[Y_{MSE}^{(n)}\right] = E[X], \qquad \lim_{n\to\infty} E\left(X - Y_{MSE}^{(n)}\right)^{2} = \lim_{n\to\infty}\left[\operatorname{Var}(X) - \operatorname{Var}\left(Y_{MSE}^{(n)}\right)\right] = 0. \tag{9}
Hence, Y M S E ( n ) converges to X in distribution.
In this paper, Section 2 begins by reviewing the definition and properties of the GAST distribution. To explore the relationship between the classification of the GAST distribution and the three parameters α, s, ν, we apply the uniform design (Wang and Fang 1981 [19]) to arrange the values of the parameter combinations, and then depict the corresponding density plots. Section 2 also classifies the GAST distribution according to the number of peaks in the density function, with some proofs. The first four moments and a stochastic representation of the GAST distribution are given in this section. Section 3 mainly introduces a maximum likelihood estimation (MLE) method with a distribution-free quantile estimator: QMC-MLE (Li and Fang 2024 [20]). In the QMC-MLE method, the estimated quantiles of the sample replace the original sample, and the MLE is then performed on the estimated quantiles to obtain the parameter estimates. We explore the effectiveness of QMC-MLE for small samples by simulation in this section. In order to cover both unimodal and bimodal cases, we choose GAST distributions with different parameter settings as the underlying distributions. We find that the effectiveness of QMC-MLE in parameter estimation is influenced by the number of peaks of the sample. Section 4 calculates the three types of RPs, MC-RPs, QMC-RPs, and MSE-RPs, of the GAST distribution for different sample sizes n. For MSE-RPs, the calculation requires a parametric k-means algorithm (Stampfer and Stadlober 2002 [21]). We compare the estimates of four statistics (mean, variance, skewness and kurtosis) based on the three types of RPs of the underlying distributions. Another application of RPs is density estimation: Section 4 combines the kernel density method (Rosenblatt 1956 [22]) with the three types of RPs to estimate the densities of the underlying GAST distributions. Section 5 applies the RPs to real data samples to show the outstanding performance of MSE-RPs under the assumption of a GAST model.

2. Generalized Alpha Skew-t Distribution

In this section, we give the definition of the density function of the GAST distribution (Altun et al., 2018 [8]) and list some of its commonly used subdistributions. We set the parameter values by the uniform design method (Wang and Fang 1981 [19]) to fully demonstrate the influence of the parameters on the shape of the density function. Section 2.2 discusses how the parameters influence the number of peaks of the density under four conditions. Section 2.3 and Section 2.4 give the moments and a stochastic representation of the GAST distribution, respectively.

2.1. Definition of the GAST Distribution

Definition 1.
(GAST distribution). A random variable X is said to follow the GAST distribution, denoted as X ∼ GAST(α, s, ν), if it has the following pdf:
f(x; \alpha, s, \nu) = \frac{(1-\alpha x)^2 + 1}{c(\alpha, s, \nu)}\; t(x; \nu)\; T\!\left(\sqrt{\frac{1+\nu}{x^2+\nu}}\; s x;\; \nu+1\right), \quad \nu > 2,\ x \in \mathbb{R}, \tag{10}
where
c(\alpha, s, \nu) = 1 - \alpha\,\frac{s}{\sqrt{1+s^2}}\left(\frac{\nu}{\pi}\right)^{1/2} \frac{\Gamma\!\left(\frac{\nu-1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)} + \frac{\alpha^2}{2}\,\frac{\nu}{\nu-2}. \tag{11}
Proposition 1.
If a random variable Y ∼ ST(s, ν), the c(α, s, ν) in (11) can be written as
c(\alpha, s, \nu) = 1 - \alpha\, E[Y] + \frac{\alpha^2}{2}\, E[Y^2]. \tag{12}
Proof. 
We set a random variable Z ∼ SN(s). From Equation (2), the moments of the ST distribution are given by
E[Y^m] = (\nu/2)^{m/2}\, \frac{\Gamma\!\left(\frac{\nu-m}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}\, E[Z^m]. \tag{13}
Henze (1986) [23] gave the general expression for the odd moments of Z, which is
E\left[Z^{2k+1}\right] = \sqrt{\frac{2}{\pi}}\;\delta\,(1+s^2)^{-k}\;\frac{(2k+1)!}{2^k}\;\sum_{i=0}^{k}\frac{i!\,(2s)^{2i}}{(2i+1)!\,(k-i)!}, \quad k = 0, 1, 2, \dots, \tag{14}
where δ = s/√(1+s²). The even moments coincide with those of the standard normal distribution, because Z² ∼ χ₁². Hence, the first two moments of the ST distribution are, respectively, given by
E[Y] = \left(\frac{\nu}{\pi}\right)^{1/2} \frac{\Gamma\!\left(\frac{\nu-1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}\;\delta, \qquad E[Y^2] = \frac{\nu}{\nu-2}.
Then Equation (12) is proved.    □
The GAST distribution involves several popular distributions as special cases:
  • If α = 0, the GAST distribution reduces to the skew-t (ST) distribution.
  • If s = 0, the GAST distribution reduces to the alpha-skew-t (AST) distribution.
  • If α = 0 and s = 0, the GAST distribution reduces to the Student's t-distribution.
  • If ν → ∞, the GAST distribution reduces to the alpha-skew-normal (ASN) distribution.
  • If ν → ∞ and α = 0, the GAST distribution reduces to the skew-normal (SN) distribution.
  • If ν → ∞, α = 0 and s = 0, the GAST distribution reduces to the normal distribution.
In order to depict the GAST densities, especially their unimodal or multimodal characteristics under different combinations of parameters, experimental design is used to arrange the parameter values. The uniform design is a number-theoretic method, proposed by Wang and Fang (1981) [19]. As a robust experimental design method, the uniform design has been widely applied in various fields. A uniform design table provides a scientific arrangement of experiments by tabulating the level combinations of the factors of interest. Let U_n(q^s) denote a uniform design with n experimental runs and s factors, each having q levels. The uniform design table U₁₆(16³) adopted in this paper is derived from the website Uniform-Design-Tables (https://fst.uic.edu.cn/isci/research/Uniform_Design_Tables.htm (accessed on 15 September 2024)). In uniform design tables, the levels of factors are labeled by positive integers. For a unit hypercube experimental region [0, 1]^s, the levels {1, 2, …, q} usually take the values {1/(2q), 3/(2q), …, (2q − 1)/(2q)}. For a hyperrectangle experimental region, the linear transformation a + (b − a)·i/(2q), i = 1, 3, …, 2q − 1, is applied to each factor range [a, b]. Table 1 lists the arrangement of the uniform design table U₁₆(16³) for the parametric region α × s × ν: [−2.6, 3.8] × [−3, 3.4] × [2.5, 18.5], indicating the 16 parameter settings.
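For reference, this level transformation is easy to reproduce; the short R snippet below maps the 16 levels onto the three parameter ranges used for Table 1.

# Map the q levels of a uniform design factor onto [a, b] via
# a + (b - a) * i / (2q), i = 1, 3, ..., 2q - 1.
ud_levels <- function(a, b, q) a + (b - a) * seq(1, 2 * q - 1, by = 2) / (2 * q)

ud_levels(-2.6, 3.8, 16)   # alpha levels: -2.4, -2.0, ..., 3.6
ud_levels(-3.0, 3.4, 16)   # s levels:     -2.8, -2.4, ..., 3.2
ud_levels(2.5, 18.5, 16)   # nu levels:     3, 4, ..., 18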
Figure 1 shows the density plots corresponding to eight parameter settings in Table 1, which are enough to represent the typical shapes of the GAST density. As shown in Figure 1, there are four cases in which the pdfs are bimodal, and the No. XII and XIV GAST distributions are AST and ST distributions, respectively.
In Section 2.2, we show how the parameters α, s and ν affect the number of peaks of two special types of the GAST distribution, the AST and ST distributions, leading to two categories: unimodal and bimodal. The number of peaks in the distribution may affect parameter estimation. For instance, if the sample size is small and the density function is bimodal, then the sample is likely to miss the turning points, which affects the parameter estimation to a certain extent. In addition, the calculation of representative points will also be affected, and the accuracy of the resulting density estimation may be reduced.

2.2. Unimodal and Bimodal Properties

Since the density plots of the GAST distribution are varied, we divide the GAST distribution into two categories: unimodal and bimodal. The number of peaks is determined by the number of zeros of the first derivative of (10): if it has one zero, the density function is unimodal; if it has three zeros, the density function is bimodal. To simplify the analysis, we consider four parameter combinations of α and s: (α = 0, s = 0), (α = 0, s ≠ 0), (α ≠ 0, s = 0) and (α ≠ 0, s ≠ 0). The discussion is as follows:
(1)
α = 0, s = 0 and X ∼ t_ν
The Student’s t-distribution is a well-known unimodal distribution.
(2)
α = 0, s ≠ 0 and X ∼ ST(s, ν)
The pdf of X ∼ ST(s, ν), f(x; s, ν), is given by (3) with ν > 2.
Proposition 2.
The skew-t distribution is always unimodal.
Proof. 
We derive the first derivative of (3) as follows:
f'(x; s, \nu) = 2\, t'(x;\nu)\, T\!\left(sx\sqrt{\tfrac{\nu+1}{\nu+x^2}};\; \nu+1\right) + 2\, t(x;\nu)\, T'\!\left(sx\sqrt{\tfrac{\nu+1}{\nu+x^2}};\; \nu+1\right), \tag{15}
where
t(x;\nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\!\left(\frac{\nu}{2}\right)}\left(1+\frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}} \equiv c\left(1+\frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \tag{16}
t'(x;\nu) = c\,\frac{2x}{\nu}\left(-\frac{\nu+1}{2}\right)\left(1+\frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}-1} = t(x;\nu)\,\frac{(-x)(\nu+1)}{\nu+x^2}, \tag{17}
T'(\cdot) = s\sqrt{\frac{\nu+1}{\nu+x^2}}\,\frac{\nu}{\nu+x^2}\; t\!\left(sx\sqrt{\tfrac{\nu+1}{\nu+x^2}};\; \nu+1\right). \tag{18}
Substituting (17) and (18) into Equation (15), we obtain
f'(x; s, \nu) = 2(-x)\,t(x;\nu)\,\frac{\nu+1}{\nu}\,\frac{\nu}{\nu+x^2}\,T(\cdot) + 2\,t(x;\nu)\, s\sqrt{\frac{\nu+1}{\nu+x^2}}\,\frac{\nu}{\nu+x^2}\; t\!\left(sx\sqrt{\tfrac{\nu+1}{\nu+x^2}};\; \nu+1\right) = 2\,t(x;\nu)\,\frac{\nu}{\nu+x^2}\left[(-x)\,\frac{\nu+1}{\nu}\,T(\cdot) + s\sqrt{\frac{\nu+1}{\nu+x^2}}\; t\!\left(sx\sqrt{\tfrac{\nu+1}{\nu+x^2}};\; \nu+1\right)\right].
As 2 t(x; ν) ν/(ν + x²) > 0, the solutions of f'(x; s, ν) = 0 can be found by solving the equation
g(x) = (-x)\,\frac{\nu+1}{\nu}\,T(\cdot) + s\sqrt{\frac{\nu+1}{\nu+x^2}}\; t\!\left(sx\sqrt{\tfrac{\nu+1}{\nu+x^2}};\; \nu+1\right) = 0. \tag{19}
Since f(x; s, ν) is symmetric with respect to s:
f(x; s, \nu) = f(-x; -s, \nu).
The number of peaks is therefore not affected by the sign of s, so we assume s > 0. From the expression of (19), we can see that g(x) > 0 when x ∈ (−∞, 0]. When x ∈ (0, +∞), we have
\frac{d}{dx}\left[sx\sqrt{\frac{\nu+1}{\nu+x^2}}\right] = s\sqrt{\frac{\nu+1}{\nu+x^2}}\,\frac{\nu}{\nu+x^2} > 0.
Hence, T(·) is monotonically increasing, while t(·) is monotonically decreasing. Therefore, we can deduce that g(x) is decreasing on (0, +∞), with g(x) → −∞ as x → +∞. Hence, there is exactly one solution x₁ such that g(x₁) = 0, and the ST distribution must be unimodal.    □
(3)
α ≠ 0, s = 0 and X ∼ AST(α, ν)
The pdf of X ∼ AST(α, ν) is given by
f(x; \alpha, \nu) = \frac{(1-\alpha x)^2 + 1}{2\, c(\alpha, 0, \nu)}\; t(x; \nu), \quad \nu > 2. \tag{20}
Proposition 3.
The pdf of the AST distribution, f(x; α, ν) in (20), is bimodal if and only if g(x₁) < 0 < g(x₂) and α lies outside the interval in (22) below, where
g(x) = (1-\nu)\alpha^2 x^3 + 2\alpha\nu x^2 + (2\alpha^2\nu - 2\nu - 2)x - 2\alpha\nu, \qquad x_{1,2} = \frac{-4\alpha\nu \pm \sqrt{\Delta}}{2\alpha^2(3-3\nu)}\ (x_1 < x_2), \qquad \Delta = 8\alpha^2\left[(3\nu^2-3\nu)\alpha^2 + 3 - \nu^2\right].
Otherwise, it is unimodal. It is worth mentioning that a sufficient condition for f(x; α, ν) to be unimodal is
\alpha \in \left[-\sqrt{\frac{\nu^2-3}{3\nu^2-3\nu}},\; \sqrt{\frac{\nu^2-3}{3\nu^2-3\nu}}\right]. \tag{22}
Proof. 
Differentiating (20), we obtain
f'(x; \alpha, \nu) = \frac{1}{2\, c(\alpha, 0, \nu)}\left[(-2\alpha)(1-\alpha x)\, t(x;\nu) - \frac{(\nu+1)x}{\nu+x^2}\, t(x;\nu)\left((1-\alpha x)^2 + 1\right)\right].
Since 1/(2c(α, 0, ν)) is a positive constant, and t(x; ν) > 0, we obtain the equivalent relations
f'(x;\alpha,\nu) = 0 \iff (-2\alpha)(1-\alpha x)(\nu + x^2) - (\nu+1)x\left((1-\alpha x)^2+1\right) = 0 \iff g(x) \equiv (1-\nu)\alpha^2 x^3 + 2\alpha\nu x^2 + (2\alpha^2\nu - 2\nu - 2)x - 2\alpha\nu = 0. \tag{23}
Now our problem is transformed into studying the number of zeros of the function g(x). The first derivative of (23) is
g'(x) = (3-3\nu)\alpha^2 x^2 + 4\alpha\nu x + 2\alpha^2\nu - 2\nu - 2, \quad \nu > 2.
This is a downward-opening quadratic function, since 3 − 3ν < 0 for ν > 2. The discriminant Δ of g'(x) is
\Delta = 8\alpha^2\left[(3\nu^2 - 3\nu)\alpha^2 + 3 - \nu^2\right].
If Δ < 0 , namely
-\sqrt{\frac{\nu^2-3}{3\nu^2-3\nu}} < \alpha < \sqrt{\frac{\nu^2-3}{3\nu^2-3\nu}},
then g'(x) < 0 everywhere, and g(x) is monotonically decreasing. Since lim_{x→−∞} g(x) = +∞ and lim_{x→+∞} g(x) = −∞, there is exactly one root of g(x) = 0, i.e., f(x; α, ν) is unimodal. It is worth mentioning that the parameter setting of No. XII in Table 1 satisfies this condition.
If Δ > 0, then there exist
x_1 < x_2 \ \text{such that}\ g'(x_1) = g'(x_2) = 0, \quad \text{where}\ x_{1,2} = \frac{-4\alpha\nu \pm \sqrt{\Delta}}{2\alpha^2(3-3\nu)}.
We can obtain that
g'(x) > 0,\ x \in (x_1, x_2); \qquad g'(x) < 0,\ x \in (-\infty, x_1) \cup (x_2, \infty).
Hence, g(x) is monotonically increasing on (x₁, x₂) and monotonically decreasing on (−∞, x₁) ∪ (x₂, ∞). If g(x₁) < 0 < g(x₂), then g(x) has three zeros, and f(x; α, ν) is bimodal. If condition (22) is met, f(x; α, ν) is unimodal. To sum up, f(x; α, ν) is bimodal if and only if Δ > 0, i.e., condition (22) is not satisfied, and g(x₁) < 0 < g(x₂); otherwise, it is unimodal.    □
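Proposition 3 can be checked numerically for any pair (α, ν); the minimal R sketch below evaluates Δ and, when Δ > 0, the sign pattern g(x₁) < 0 < g(x₂) from the proof.

# Classify an AST(alpha, nu) density as unimodal or bimodal via Proposition 3.
ast_modality <- function(alpha, nu) {
  g <- function(x) (1 - nu) * alpha^2 * x^3 + 2 * alpha * nu * x^2 +
    (2 * alpha^2 * nu - 2 * nu - 2) * x - 2 * alpha * nu
  Delta <- 8 * alpha^2 * ((3 * nu^2 - 3 * nu) * alpha^2 + 3 - nu^2)
  if (alpha == 0 || Delta <= 0) return("unimodal")  # g is monotone decreasing
  x12 <- sort((-4 * alpha * nu + c(-1, 1) * sqrt(Delta)) / (2 * alpha^2 * (3 - 3 * nu)))
  if (g(x12[1]) < 0 && g(x12[2]) > 0) "bimodal" else "unimodal"
}

ast_modality(0.4, 18)  # the No. XII setting: unimodal (Delta < 0)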
(4)
α ≠ 0, s ≠ 0 and X ∼ GAST(α, s, ν)
Differentiating the pdf of the GAST distribution as (10), we obtain
f'(x; \alpha, s, \nu) = \frac{t(x;\nu)}{c(\alpha,s,\nu)} \left\{ (-2\alpha)(1-\alpha x)\, T(\cdot) - \frac{(\nu+1)x}{\nu+x^2}\left[(1-\alpha x)^2+1\right] T(\cdot) + \left[(1-\alpha x)^2+1\right] s \sqrt{\frac{\nu+1}{\nu+x^2}}\,\frac{\nu}{\nu+x^2}\; t\!\left(sx\sqrt{\tfrac{\nu+1}{\nu+x^2}};\; \nu+1\right) \right\}.
Let
g(x) = (-2\alpha)(1-\alpha x)\, T(\cdot) - \frac{(\nu+1)x}{\nu+x^2}\left[(1-\alpha x)^2+1\right] T(\cdot) + \left[(1-\alpha x)^2+1\right] s \sqrt{\frac{\nu+1}{\nu+x^2}}\,\frac{\nu}{\nu+x^2}\; t\!\left(sx\sqrt{\tfrac{\nu+1}{\nu+x^2}};\; \nu+1\right).
Then we have f'(x; α, s, ν) = t(x; ν)/c(α, s, ν) · g(x), and we obtain
f'(x; \alpha, s, \nu) = 0 \iff g(x) = 0.
Due to the complexity of g(x), it is difficult to study its zeros analytically. The discussion of this situation remains open for future study.

2.3. Moments of the GAST Distribution

From the pdf of the GAST distribution in (10), the kth moment of X ∼ GAST(α, s, ν) is given by
E[X^k] = \frac{E[Y^k] - \alpha\, E[Y^{k+1}] + \frac{\alpha^2}{2}\, E[Y^{k+2}]}{c(\alpha, s, \nu)} \equiv \frac{M_k(\alpha, s, \nu)}{c(\alpha, s, \nu)},
where Y ∼ ST(s, ν). Combining (13) and (14), we have the first four moments of X, in which M_k(α, s, ν) is given as follows:
M_1(\alpha,s,\nu) = \delta\left(\frac{\nu}{\pi}\right)^{1/2}\frac{\Gamma\!\left(\frac{\nu-1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)} - \alpha\,\frac{\nu}{\nu-2} + \frac{\alpha^2}{2}\left(\frac{\nu}{2}\right)^{3/2}\frac{\Gamma\!\left(\frac{\nu-3}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}\sqrt{\frac{2}{\pi}}\left[\frac{3\,\delta}{1+s^2} + 2\delta^3\right],
M_2(\alpha,s,\nu) = \frac{\nu}{\nu-2} - \alpha\left(\frac{\nu}{2}\right)^{3/2}\frac{\Gamma\!\left(\frac{\nu-3}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}\sqrt{\frac{2}{\pi}}\left[\frac{3\,\delta}{1+s^2} + 2\delta^3\right] + \frac{3\alpha^2}{2}\,\frac{\nu^2}{(\nu-2)(\nu-4)},
M_3(\alpha,s,\nu) = \left(\frac{\nu}{2}\right)^{3/2}\frac{\Gamma\!\left(\frac{\nu-3}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}\sqrt{\frac{2}{\pi}}\left[\frac{3\,\delta}{1+s^2} + 2\delta^3\right] - \frac{3\alpha\,\nu^2}{(\nu-2)(\nu-4)} + \frac{\alpha^2}{2}\left(\frac{\nu}{2}\right)^{5/2}\frac{\Gamma\!\left(\frac{\nu-5}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}\sqrt{\frac{2}{\pi}}\left[\frac{15\,\delta}{(1+s^2)^2} + \frac{20\,\delta^3}{1+s^2} + 8\delta^5\right],
M_4(\alpha,s,\nu) = \frac{3\nu^2}{(\nu-2)(\nu-4)} - \alpha\left(\frac{\nu}{2}\right)^{5/2}\frac{\Gamma\!\left(\frac{\nu-5}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}\sqrt{\frac{2}{\pi}}\left[\frac{15\,\delta}{(1+s^2)^2} + \frac{20\,\delta^3}{1+s^2} + 8\delta^5\right] + \frac{15\alpha^2}{2}\,\frac{\nu^3}{(\nu-2)(\nu-4)(\nu-6)}.

2.4. Stochastic Representation of the GAST Distribution

Altun et al. (2018) [8] provided a stochastic representation of X ∼ GAST(α, s, ν) as follows.
Theorem 1.
If the random variables W ∼ AST(α, ν) and Z ∼ t(ν + 1) are independent, then we have
\left(W \;\middle|\; \sqrt{\frac{1+\nu}{W^2+\nu}}\; sW > Z\right) \sim GAST(\alpha, s, \nu). \tag{26}
According to (26), given by [8], we can generate random samples from the GAST distribution by the following procedure (a code sketch is given after the steps):
Step 1. 
Generate W ∼ AST(α, ν) and Z ∼ t(ν + 1).
Step 2. 
If √((1 + ν)/(W² + ν)) · sW > Z, then keep W. Otherwise, go back to Step 1.
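The acceptance step is straightforward to implement. Below is a minimal R sketch, assuming a hypothetical AST sampler rAST(n, alpha, nu) is available (for example, one based on numerically inverting the AST cdf); it is passed in as an argument and is not part of base R.

# Rejection sampling from GAST(alpha, s, nu) based on Theorem 1.
# `rAST` is a hypothetical AST(alpha, nu) sampler supplied by the user.
rgast <- function(n, alpha, s, nu, rAST) {
  out <- numeric(0)
  while (length(out) < n) {
    W <- rAST(n, alpha, nu)                            # Step 1: candidate draws
    Z <- rt(n, df = nu + 1)                            # Step 1: independent t(nu + 1) draws
    keep <- sqrt((1 + nu) / (W^2 + nu)) * s * W > Z    # Step 2: acceptance test
    out <- c(out, W[keep])
  }
  out[seq_len(n)]
}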

3. Parameter Estimation

In parameter estimation, maximum likelihood estimation is widely utilized because of its invariance property. Let x = {x₁, x₂, …, xₙ} be a random sample from the GAST(α, s, ν) distribution. The log-likelihood function is given by
\ell(\alpha, s, \nu \mid x) = \sum_{i=1}^{n}\log\frac{(1-\alpha x_i)^2+1}{c(\alpha,s,\nu)} + \sum_{i=1}^{n}\log t(x_i;\nu) + \sum_{i=1}^{n}\log T\!\left(\sqrt{\frac{1+\nu}{x_i^2+\nu}}\; s x_i;\; \nu+1\right). \tag{27}
By taking the partial derivatives with respect to α , s and ν , we have
\frac{\partial \ell}{\partial \alpha} = \sum_{i=1}^{n} \frac{(-2x_i)(1-\alpha x_i)\, c(\alpha,s,\nu) - c_{\alpha}(\alpha,s,\nu)\left[(1-\alpha x_i)^2+1\right]}{\left[(1-\alpha x_i)^2+1\right]\, c(\alpha,s,\nu)},
\frac{\partial \ell}{\partial s} = -n\,\frac{c_{s}(\alpha,s,\nu)}{c(\alpha,s,\nu)} + \sum_{i=1}^{n} x_i \sqrt{\frac{\nu+1}{x_i^2+\nu}}\; \omega_i^{*},
\frac{\partial \ell}{\partial \nu} = -n\,\frac{c_{\nu}(\alpha,s,\nu)}{c(\alpha,s,\nu)} + \sum_{i=1}^{n} \tau_i^{*} + \sum_{i=1}^{n} \frac{s\, x_i\,(x_i^2-1)}{2\sqrt{\frac{\nu+1}{x_i^2+\nu}}\,(x_i^2+\nu)^2}\; \omega_i^{*}, \tag{28}
where
\omega_i^{*} = \frac{t\!\left(\sqrt{\frac{1+\nu}{x_i^2+\nu}}\, s x_i;\; \nu+1\right)}{T\!\left(\sqrt{\frac{1+\nu}{x_i^2+\nu}}\, s x_i;\; \nu+1\right)}, \qquad \tau_i^{*} = \frac{t_{\nu}(x_i;\nu)}{t(x_i;\nu)}.
Remark that c_α(α, s, ν), c_s(α, s, ν), c_ν(α, s, ν) and t_ν(x_i; ν) denote the partial derivatives of c(α, s, ν) and t(x_i; ν) with respect to the indicated parameters. The solution (α̂, ŝ, ν̂) satisfying ∂ℓ/∂α = 0, ∂ℓ/∂s = 0 and ∂ℓ/∂ν = 0 simultaneously is the MLE of (α, s, ν). To solve the system of nonlinear equations in (28), a numerical method is required. In the following subsections, we introduce the algorithm used to solve the MLE, L-BFGS-B (Byrd et al., 1995 [24]), in Section 3.1. In order to improve estimation accuracy by enhancing sample representativeness, we incorporate a non-parametric quantile estimation method (Harrell and Davis 1982 [25]), introduced in Section 3.2. In Section 3.3, we evaluate the effectiveness of the algorithm and the quantile estimation method by simulation. In our study, we use R software version 4.4.1 to conduct the simulations.

3.1. L-BFGS-B

L-BFGS-B (Byrd et al., 1995 [24]) is a limited-memory algorithm for solving large nonlinear optimization problems subject to simple bounds on the variables. The essence of the algorithm is a quasi-Newton method. At each iteration, a limited-memory BFGS approximation to the Hessian matrix is updated. This limited-memory matrix is used to define a quadratic model of the objective function, which in our study is the log-likelihood (27). Given a set of samples x = {x_i}_{i=1}^n, the optimization problem can be formulated as follows:
\max_{\alpha, s, \nu}\ \ell(\alpha, s, \nu \mid x).
We summarize the procedure of L-BFGS-B in Algorithm 1.
Algorithm 1 L-BFGS-B for MLE
1: Input: initial guesses for the parameters α₀, s₀, ν₀; tolerance ϵ; maximum number of iterations N; bounds (α_min, s_min, ν_min) and (α_max, s_max, ν_max)
2: Output: estimated parameters α̂, ŝ, ν̂
3: Initialize k ← 0
4: Initialize the parameters θ^(0) ← (α₀, s₀, ν₀)
5: repeat
6:   Compute the gradient ∇ℓ(θ^(k))
7:   Compute the search direction p^(k) using a two-stage approach [24]
8:   Project the search direction p^(k) to satisfy the bounds
9:   Line search: find a step size λ^(k) that maximizes ℓ(θ^(k) + λ^(k) p^(k))
10:  Update the parameters: θ^(k+1) ← θ^(k) + λ^(k) p^(k)
11:  k ← k + 1
12: until ‖∇ℓ(θ^(k))‖ < ϵ or k ≥ N
13: (α̂, ŝ, ν̂) ← θ^(k)
We chose the L-BFGS-B algorithm because the degrees of freedom ν must be greater than 2 for the GAST distribution; if an unconstrained optimization method were used, missing values would be likely to appear during the optimization.
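To make the procedure concrete, the following R sketch maximizes the log-likelihood (27) for the standardized GAST(α, s, ν) model with optim and method "L-BFGS-B". The pdf follows (10) and (11); the starting values and box bounds (including the upper bound 100 on ν, which reappears in Section 5.2) are our assumptions rather than part of the method.

# GAST(alpha, s, nu) pdf, following (10) with normalizing constant (11).
dgast <- function(x, alpha, s, nu) {
  cst <- 1 - alpha * s / sqrt(1 + s^2) * sqrt(nu / pi) *
    gamma((nu - 1) / 2) / gamma(nu / 2) + alpha^2 / 2 * nu / (nu - 2)
  ((1 - alpha * x)^2 + 1) / cst * dt(x, nu) *
    pt(sqrt((1 + nu) / (x^2 + nu)) * s * x, nu + 1)
}

# Plain MLE via L-BFGS-B; the bounds keep nu > 2 so the constant (11) exists.
gast_mle <- function(x, start = c(alpha = 0, s = 0, nu = 5)) {
  negll <- function(th) -sum(log(dgast(x, th[1], th[2], th[3])))
  optim(start, negll, method = "L-BFGS-B",
        lower = c(-10, -10, 2.1), upper = c(10, 10, 100))$par
}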

3.2. QMC-MLE

In this subsection, we introduce a method for improving the accuracy of the MLE. It is well known that the accuracy of the MLE depends on the sample size to a certain extent. If the sample misses the turning points of the population density, it is less representative, which may lead to lower estimation accuracy. This situation is prone to occur in small samples, especially in bimodal cases. Fang et al. (1994) [12] pointed out that the set of equal quantiles {p_i = (2i − 1)/(2n), i = 1, …, n} has the best representativeness in the sense of F-discrepancy. In Section 1, we introduced a QMC method to generate the RPs of a distribution with known parameters. However, for a distribution with unknown parameters, how can we obtain the pth quantile of the distribution F? Harrell and Davis (1982) [25] proposed a distribution-free method: the Harrell–Davis (HD) quantile estimator. We use this estimator to calculate the set of equal quantiles of F, and then substitute these n quantiles into the likelihood function ℓ(θ | x) for calculation. Li and Fang (2024) [20] called the MLE method with the HD quantile estimator QMC-MLE; it is presented below, followed by a short code sketch.
Let x = {x₁, …, xₙ} be a random sample of size n from the GAST distribution. Denote X_(i) as the ith order statistic of x and F^{−1}(p) as the pth population quantile.
Step 1: 
Generate a set of points uniformly scattered on ( 0 , 1 ) through
p_i = \frac{2i-1}{2n}, \quad i = 1, \dots, n.
Step 2: 
Use the Harrell–Davis quantile estimator to process the sample:
Q(p_i) = \sum_{j=1}^{n} W_{n,j}\, X_{(j)},
where
W_{n,j} = \frac{1}{\beta\{(n+1)p_i,\ (n+1)(1-p_i)\}} \int_{(j-1)/n}^{j/n} y^{(n+1)p_i - 1}(1-y)^{(n+1)(1-p_i)-1}\, dy = I_{j/n}\{p_i(n+1),\ (1-p_i)(n+1)\} - I_{(j-1)/n}\{p_i(n+1),\ (1-p_i)(n+1)\},
where β{a, b} is the beta function and I_x{a, b} denotes the regularized incomplete beta function.
Step 3: 
Let z_i = Q(p_i) for i = 1, …, n. The sample x = (x₁, …, xₙ) in the log-likelihood function is thus replaced by z = (z₁, …, zₙ), so that the objective function based on the revised sample is
\ell(\theta \mid z) = \sum_{i=1}^{n} \ln f(z_i; \alpha, s, \nu). \tag{29}
Step 4: 
Use the L-BFGS-B algorithm to find the MLE of θ by maximizing (29).
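A minimal R implementation of Steps 1–3 follows; since pbeta() is the regularized incomplete beta function I_x{a, b}, the weights W_{n,j} reduce to differences of pbeta values.

# Harrell-Davis estimate of the p-th quantile: a weighted sum of the order
# statistics with weights from the Beta((n+1)p, (n+1)(1-p)) distribution.
hd_quantile <- function(x, p) {
  n <- length(x)
  xs <- sort(x)   # order statistics X_(1) <= ... <= X_(n)
  w <- pbeta(seq_len(n) / n, (n + 1) * p, (n + 1) * (1 - p)) -
       pbeta((seq_len(n) - 1) / n, (n + 1) * p, (n + 1) * (1 - p))
  sum(w * xs)
}

# Revised sample for QMC-MLE: z_i = Q((2i - 1) / (2n)), i = 1, ..., n.
qmc_sample <- function(x) {
  n <- length(x)
  sapply((2 * seq_len(n) - 1) / (2 * n), function(p) hd_quantile(x, p))
}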

3.3. Simulation

Before the simulation, we introduce four measures of estimation accuracy: L2.pdf, L2.cdf, the absolute bias index (ABI) and the Kullback–Leibler (KL) divergence. Denote the true underlying distribution by F (cdf) or f (pdf), and the estimated distribution by F̂ or f̂. The four measures are defined as follows (a numerical sketch is given after the list):
  • L2.pdf between two densities is defined as
    L_2\left(f, \hat{f}\right) = \left[\int_{-\infty}^{\infty}\left(f(x) - \hat{f}(x)\right)^2 dx\right]^{1/2};
  • L2.cdf between two cdf’s is defined as
    L_2\left(F, \hat{F}\right) = \left[\int_{-\infty}^{\infty}\left(F(x) - \hat{F}(x)\right)^2 dx\right]^{1/2};
  • The absolute bias index (ABI) is used to evaluate the overall estimation bias in the parameters, where μ̂ and σ̂ denote the estimated expectation and standard deviation of the GAST distribution:
    ABI = \frac{1}{2}\left(\left|\frac{\mu - \hat{\mu}}{\mu}\right| + \left|\frac{\sigma - \hat{\sigma}}{\sigma}\right|\right);
  • Kullback–Leibler (KL) divergence or the so-called relative entropy is used to measure the difference from one probability distribution to another, defined as follows:
    D_{KL}\left(F \,\|\, \hat{F}\right) = \int_{-\infty}^{\infty} f(x)\,\log\frac{f(x)}{\hat{f}(x)}\, dx.
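As a sketch, the four measures can be evaluated by one-dimensional numerical integration in R; the integration limits below are an assumption and should cover the effective support of both distributions.

# f, fhat are densities; cdf, cdf_hat are cdfs; all are assumed vectorized.
l2_pdf <- function(f, fhat, lo = -20, hi = 20)
  sqrt(integrate(function(x) (f(x) - fhat(x))^2, lo, hi)$value)

l2_cdf <- function(cdf, cdf_hat, lo = -20, hi = 20)
  sqrt(integrate(function(x) (cdf(x) - cdf_hat(x))^2, lo, hi)$value)

abi <- function(mu, mu_hat, sigma, sigma_hat)
  0.5 * (abs((mu - mu_hat) / mu) + abs((sigma - sigma_hat) / sigma))

kl_div <- function(f, fhat, lo = -20, hi = 20)
  integrate(function(x) {
    fx <- f(x)
    ifelse(fx > 0, fx * log(fx / fhat(x)), 0)  # treat 0 * log(0) as 0
  }, lo, hi)$value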
In the simulation, we generate samples by the inverse transformation method and focus mainly on the small-sample case. To study both unimodal and bimodal cases, we choose five parameter settings of the GAST distribution from Figure 1, No.VII, VIII, IX, X and XI, as the underlying distributions, among which the No.VII, VIII and XI distributions are bimodal. The sample size n is set to 25, 50, 100 and 300. After N = 100 repetitions, the average of (α̂, ŝ, ν̂) is taken as the parameters of the estimated GAST distribution. The precision of the estimates is evaluated by L2.pdf, L2.cdf, ABI and KL, summarized in Table 2, in which "plain" indicates the MLE resulting from the original sample x = (x₁, …, xₙ), and "qmc" uses the revised sample z = (z₁, …, zₙ).
The best performance in the sense of each measure for each pair of distribution type and sample size is highlighted in bold in Table 2. The QMC-MLE method performs better than the plain MLE in most cases, especially for the No.VIII, IX and X distributions. However, for the No.VII and XI distributions, the QMC-MLE has no obvious advantage. The No.IX and X distributions are unimodal, while the No.VIII is bimodal. From the pdf plot of the No.VIII distribution, we can see that although it is bimodal, its first peak is not as obvious as the peaks of the No.VII and XI distributions. In the pdf plots of the No.VII and XI distributions, as x increases, the density experiences a steep decline after the first peak, while for the No.VIII distribution, the decline lasts only a short distance before the density begins to rise again. Therefore, we have reason to believe that the QMC-MLE method is more suitable for unimodal densities, or bimodal densities in which one peak is not obvious.
In addition, for the No.XI GAST distribution, in the sense of KL divergence, the plain MLE is better than the QMC-MLE for all sample sizes. As for the No.XI case under the other measures, although the QMC-MLE performs better when n = 25 and 50, it becomes less effective for n = 100 and 300, which may be explained by the consistency of the MLE. According to the discussion above, when we conduct the case studies in Section 5, the QMC-MLE is used only for unimodal samples in parameter estimation, while for bimodal samples we use the plain MLE. Nevertheless, this simulation study reveals that the MLE method (both plain and QMC) is appropriate for estimating the GAST parameters, given the small values of the four bias measures.

4. RPs of the GAST Distribution

Recall that in Section 1, we introduced three types of representative points: MC-RPs, QMC-RPs, and MSE-RPs. In this section, we will find these three types of RPs of the GAST distribution for different sample sizes n, and use them to estimate moments and densities in Section 4.1 and Section 4.2, respectively.

4.1. Moment Estimation

For a given n, MC-RPs are generated by the inverse transformation method. QMC-RPs can easily be obtained by (6), while MSE-RPs are calculated through a parametric k-means algorithm proposed by Stampfer and Stadlober (2002) [21]. We summarize the computation procedure of the k-means algorithm for approximating the MSE-RPs of the GAST distribution as follows; an R sketch of the iteration is given after the steps.
Step 1: 
For a given pdf f(x; α, s, ν), a number of RPs n, and t = 0, input a set of initial points b₁^(t) < b₂^(t) < ⋯ < bₙ^(t); here we take the n QMC-RPs as the initial values. Determine a partition of ℝ as
I_i^{(t)} = \left(a_{i-1}^{(t)},\ a_i^{(t)}\right],\ i = 1, \dots, n-1, \qquad I_n^{(t)} = \left(a_{n-1}^{(t)},\ a_n^{(t)}\right),
where
a_0^{(t)} = -\infty, \qquad a_i^{(t)} = \left(b_i^{(t)} + b_{i+1}^{(t)}\right)/2,\ i = 1, \dots, n-1, \qquad a_n^{(t)} = +\infty.
Step 2: 
Calculate the probabilities
p_j^{(t)} = \int_{I_j^{(t)}} f(x; \alpha, s, \nu)\, dx, \quad j = 1, \dots, n,
and the conditional means
b_j^{(t+1)} = \frac{\int_{I_j^{(t)}} x\, f(x;\alpha,s,\nu)\, dx}{\int_{I_j^{(t)}} f(x;\alpha,s,\nu)\, dx} = \frac{\int_{I_j^{(t)}} x\, f(x;\alpha,s,\nu)\, dx}{p_j^{(t)}}.
Step 3: 
If the two sets {b_j^(t)} and {b_j^(t+1)} are identical, the process stops and outputs {b_j^(t)} as the MSE-RPs of the distribution, with probabilities {p_j^(t)}. Otherwise, let t := t + 1 and go back to Step 1.
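The following R sketch transcribes Steps 1–3 directly; integrate() computes the cell probabilities and conditional means, and exact identity of successive point sets is relaxed to a small numerical tolerance.

# Parametric k-means iteration for MSE-RPs of a density f, started from a
# sorted vector b of initial points (e.g., the QMC-RPs).
mse_rps <- function(f, b, tol = 1e-10, maxit = 1000) {
  n <- length(b)
  p <- numeric(n)
  for (it in seq_len(maxit)) {
    a <- c(-Inf, (b[-n] + b[-1]) / 2, Inf)     # Step 1: partition cut points
    b_new <- numeric(n)
    for (j in seq_len(n)) {
      p[j] <- integrate(f, a[j], a[j + 1])$value               # Step 2: probabilities
      b_new[j] <- integrate(function(x) x * f(x),
                            a[j], a[j + 1])$value / p[j]       # conditional means
    }
    if (max(abs(b_new - b)) < tol) break       # Step 3: stop at a (numerical) fixpoint
    b <- b_new
  }
  list(points = b, probs = p)
}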
Let Y be a discrete random variable with probability mass function P(Y = b_j) = p_j, j = 1, …, n, which is an approximate distribution of the GAST distribution. Then the estimates of the mean, variance, skewness and kurtosis can be calculated by
E[Y] = \sum_{j=1}^{n} b_j p_j = \mu_Y, \quad \operatorname{Var}[Y] = \sum_{j=1}^{n} (b_j - \mu_Y)^2 p_j = \sigma_Y^2, \quad \operatorname{Sk}[Y] = \frac{1}{\sigma_Y^3}\sum_{j=1}^{n} (b_j - \mu_Y)^3 p_j, \quad \operatorname{Ku}[Y] = \frac{1}{\sigma_Y^4}\sum_{j=1}^{n} (b_j - \mu_Y)^4 p_j - 3.
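The following small R helper computes these four estimates from any set of RPs; for MC-RPs and QMC-RPs the probabilities are simply p_j = 1/n.

# Moment estimates of the discrete distribution P(Y = b_j) = p_j.
rp_stats <- function(b, p = rep(1 / length(b), length(b))) {
  mu <- sum(b * p)
  v  <- sum((b - mu)^2 * p)
  c(mean     = mu,
    variance = v,
    skewness = sum((b - mu)^3 * p) / v^1.5,
    kurtosis = sum((b - mu)^4 * p) / v^2 - 3)
}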
We use the No.IX, X and XI settings as the underlying distributions and consider n = 10, 20, 30. Note that MC-RPs are simply random samples of size n. For fair comparisons, we generate N samples of size n and take the average of the estimated statistics as the result of the MC(N) method; in our study, we choose N = 10, 100. The true parameters and the four statistics of the three underlying distributions are listed in Table 3. The biases of the estimates are summarized in Table 4, Table 5 and Table 6.
The results indicate that the estimates based on MSE-RPs perform the best for all underlying distributions and sample sizes. The performance of MC-RPs is unstable: sometimes the average moment estimates based on MC-RPs are more accurate than those based on QMC-RPs, but in general they appear less effective. In addition, we observe that as the number n increases, the overall estimation accuracy improves. The estimates of the higher-order moments (skewness and kurtosis) are worse than those of the lower-order moments (mean and variance).

4.2. Kernel Density Estimation

Another application of representative points is density estimation. In the field of signal transmission, the input signal is often converted into discrete data in the transmitter and then reconstructed in the receiver. For a distribution with unknown parameters, how do we use a set of data to construct its overall density function? Here, we introduce a kernel estimation method proposed by Rosenblatt (1956) [22] and Parzen (1962) [26]. Given a fixed number of points { x 1 , , x n } from the original signal, the density estimation of f ( x ) is given by
\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} k_h(x - x_i) = \frac{1}{nh}\sum_{i=1}^{n} k\!\left(\frac{x - x_i}{h}\right),
where k(·) is the kernel function, h > 0 is the bandwidth, and k_h(y) = \frac{1}{h}\, k\!\left(\frac{y}{h}\right). The most popular kernel is the standard normal density function
k(x) = \phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}.
In our study, we employ the representative points {b₁, …, bₙ} of the GAST distribution as the sample, with their corresponding probabilities p_i, i = 1, …, n. The density estimate of f(x) is then extended to
\hat{f}_h(x) = \sum_{i=1}^{n} k_h(x - b_i)\, p_i = \frac{1}{h}\sum_{i=1}^{n} k\!\left(\frac{x - b_i}{h}\right) p_i.
The choice of the bandwidth h is very important. Here, we set a search range {0.05, 0.06, …, 1} for h. In the following comparisons, we utilize the three types of RPs with sizes n = 10, 20, 30 for the kernel density estimation of the No.IX, X and XI distributions, and evaluate the performance by the minimum L2.pdf between f̂_h(x) and f(x; α, s, ν); a short sketch of this computation follows.
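A short R sketch of the weighted kernel estimate and the bandwidth grid search is given below; the target density f is assumed known, as in the comparisons of this subsection.

# Weighted Gaussian-kernel density estimate built from RPs (b_i, p_i).
kde_rp <- function(x, b, p, h)
  sapply(x, function(xx) sum(dnorm((xx - b) / h) * p) / h)

# Grid search for the bandwidth h minimizing L2.pdf against a known density f.
best_bandwidth <- function(b, p, f, hs = seq(0.05, 1, by = 0.01),
                           lo = -20, hi = 20) {
  l2 <- sapply(hs, function(h)
    sqrt(integrate(function(x) (kde_rp(x, b, p, h) - f(x))^2, lo, hi)$value))
  c(h = hs[which.min(l2)], L2.pdf = min(l2))
}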
Table 7, Table 8 and Table 9 show that the kernel density estimation based on MSE-RPs always has the minimum L2.pdf, which decreases as n increases. For the underlying distribution No.IX, we notice that the minimum L2.pdf based on the MSE-RPs of size 10 is only 0.0306, which is even smaller than that based on the QMC-RPs of size 30 (0.0341). Figure 2, Figure 3 and Figure 4 compare the fitted densities for the different sets of representative points. It is obvious that the fitting quality increases with n, and the MSE-RPs-based kernel estimate fits best, followed by the QMC-RPs-based estimate. It is worth mentioning that for the MC-RPs-based density estimation, due to the randomness of the Monte Carlo method, the fitted density curve differs greatly from run to run, and in many cases it is not sufficient to reconstruct the original density function.

5. Case Studies

In this section, we will utilize three types of RPs to study real data samples. Before calculating the RPs, we incorporate two additional parameters in the GAST distribution, the location parameter μ and the scale parameter σ , to fit the samples. The pdf is given by
f(x; \alpha, s, \nu, \mu, \sigma) = \frac{\left(1-\alpha\,\frac{x-\mu}{\sigma}\right)^2 + 1}{\sigma\, c(\alpha, s, \nu)}\; t\!\left(\frac{x-\mu}{\sigma};\; \nu\right) T\!\left(\sqrt{\frac{1+\nu}{\left(\frac{x-\mu}{\sigma}\right)^2+\nu}}\; s\,\frac{x-\mu}{\sigma};\; \nu+1\right),
where c(α, s, ν) is the same as that in Formula (11). For the sample data, we choose both unimodal and bimodal types: the O₃ data and the Faithful Geyser data.

5.1. O 3 Data

These data are from the UCI Machine Learning Repository (https://archive.ics.uci.edu/dataset/360/air+quality (accessed on 15 September 2024)), which contains hourly averaged responses from an air quality chemical multisensor device in an Italian city. We selected the "PT08.S5(O₃)" variable (denoted "O₃" in this article) as the study object. After setting the interception time from September 1 to November 30, 2004, and removing the missing values, we obtain 90 observations. We summarize the parameter estimation results of GAST(α, s, ν, μ, σ) obtained by the QMC-MLE in Table 10, providing the estimated GAST model as follows:
GAST(0.1518,\ 0.2030,\ 16.9607,\ 1219.228,\ 385.0162). \tag{31}
We present the histogram with the fitted density for the O₃ data in Figure 5a. After calculating the {p_i = (2i − 1)/(2n), i = 1, …, n} quantiles of these data by the HD quantile estimator introduced in Section 3.2, we obtain the associated QQ plot given in Figure 5b. Figure 5 shows the good fit of the GAST model to this unimodal dataset.
The mean, variance, skewness and kurtosis of the distribution (31) are as follows:
(1209.544, 168359.9, 0.0362, 0.601).
We generate MC-RPs, QMC-RPs and MSE-RPs of size 30 from the GAST model (31), using the methods discussed in Section 4.1. Table 11 summarizes the biases of the estimates of the four statistics based on the MC, QMC, and MSE methods.
Although the bias of the estimated variance in Table 11 is large in absolute value, it is relatively small compared to the true variance of the model, which is 168,359.9. As shown in Table 11, the MSE-RPs estimate the moments of the model more accurately than the other two types of RPs.
The comparisons of the kernel density estimates based on MC, QMC, and MSE RPs are presented in Figure 6.
The corresponding minimum L2.pdf’s between the kernel estimates and the density of the model (31) are 0.00237 for MC, 0.00026 for QMC and 0.00025 for MSE. As shown in Figure 6, although the estimated kernel density based on the QMC method is well-fitted, it is not as good as that based on the MSE method at the beginning and at the peak.

5.2. Faithful Geyser Data

The Faithful Geyser data, a commonly used dataset in R software, record the waiting time between eruptions and the duration of these eruptions for the Old Faithful Geyser in Yellowstone National Park, Wyoming, USA. In this study, we use the waiting-time sample, which includes 299 observations.
Since these data are bimodal, we use the plain MLE to estimate the parameters. The results are given in Table 12, providing the GAST model
GAST(2.4016,\ 0.2322,\ 100,\ 70.6301,\ 8.7872). \tag{32}
The histogram with the fitted density for the Faithful Geyser data is given in Figure 7a. Denoting X_{(1)} < ⋯ < X_{(n)} as the order statistics of this bimodal sample, we calculate its quantiles by the traditional estimator
Q_p = (1-g)\, X_{(j)} + g\, X_{(j+1)},
where ( n + 1 ) p = j + g and j is the integral part of ( n + 1 ) p . The associated QQ plot is given in Figure 7b.
Table 12 shows that the estimated ν is 100, which is the upper bound we set. From this point of view, we can assume that ν → ∞ in this fitted model. As described in Section 2.1, this model is then actually a subdistribution of the GAST, the ASN, indicating that the GAST model is flexible, since it can adapt to different types of data. From the QQ plot in Figure 7, we notice that when the data are less than 50, the scatter points deviate far from the line, which can also be observed in Figure 7a. The fitted curve rises slowly at the beginning, so the sample quantiles are larger than the GAST quantiles. When the data are greater than 50, where more of the sample is located, the distribution fits the data well. Hence, the estimated GAST model (32) is still acceptable.
The mean, variance, skewness, and kurtosis of the distribution (32) are
(72.3205, 192.9088, 0.4496, 0.8031).
We generate MC-RPs, QMC-RPs and MSE-RPs of size 50 from the model (32). The estimation biases of the four statistics are summarized in Table 13.
We observe that the MSE-RPs have the same mean as the population expectation, as described in (9). Compared to MC-RPs and QMC-RPs, the MSE method estimates the moments of the model more accurately. The comparison of the kernel density plots is presented in Figure 8.
The corresponding minimum L2.pdf’s between the kernel estimates and the density of the model (32) are 0.0041 for MC, 0.0029 for QMC and 0.0027 for MSE. The MSE-RPs still perform the best.

6. Conclusions

This paper mainly studies different types of representative points of the GAST distribution and their applications. The comparative analyses across various sample sizes and both unimodal and bimodal GAST distributions reveal that the RPs obtained by the MSE method consistently outperform the others in the applications of estimating moments and densities. However, the performance in estimating higher-order moments, such as skewness and kurtosis, shows the limitations of RPs in capturing higher-order statistical properties; a larger number of RPs n is therefore needed to reduce the bias of higher-order moment estimation. This paper also incorporates the QMC-MLE for parameter estimation of the GAST distribution. For unimodal data, or bimodal data with an unclear peak, the QMC-MLE method improves parameter estimation accuracy; however, for clearly bimodal cases, the plain MLE is more effective. Combining these properties, we can model different types of data accordingly.

Author Contributions

Conceptualization, K.-T.F.; Funding acquisition, Y.-X.L.; Methodology, Y.-F.Z. and K.-T.F.; Software, Y.-F.Z.; Writing—original draft, Y.-F.Z.; Writing—review and editing, Y.-X.L., K.-T.F. and H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by China Postdoctoral Science Foundation grant number 2023TQ0326.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

A part of the dataset utilized for case studies in this paper is openly available in UCI Machine Learning Repository at https://archive.ics.uci.edu/dataset/360/air+quality (accessed on 15 September 2024). Another data set for analysis in this paper is obtained from R package datasets, named faithful: Old Faithful Geyser Data.

Acknowledgments

Our work was supported in part by Research Center for Frontier Fundamental Studies, Zhejiang Lab, and the Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Gupta, R.; Drzazga-Szczȩśniak, E.; Kais, S.; Szczȩśniak, D. The entropy corrected geometric Brownian motion. arXiv 2024, arXiv:2403.06253. [Google Scholar]
  2. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178. [Google Scholar]
  3. Branco, M.D.; Dey, D.K. A general class of multivariate skew-elliptical distributions. J. Multivar. Anal. 2001, 79, 99–113. [Google Scholar] [CrossRef]
  4. Azzalini, A.; Capitanio, A. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. R. Stat. Soc. 2003, 65, 367–389. [Google Scholar] [CrossRef]
  5. Azzalini, A.; Genton, M.G. Robust likelihood methods based on the skew-t and related distributions. Int. Stat. Rev. 2008, 76, 106–129. [Google Scholar] [CrossRef]
  6. Martínez-Flórez, G.; Tovar-Falón, R.; Gómez, H. Bivariate Power-Skew-Elliptical Distribution. Symmetry 2020, 12, 1327. [Google Scholar] [CrossRef]
  7. Elal-Olivero, D. Alpha-skew-normal distribution. Proyecciones 2010, 29, 224–240. [Google Scholar] [CrossRef]
  8. Altun, E.; Tatlidil, H.; Ozel, G.; Nadarajah, S. A new generalization of skew-t distribution with volatility models. J. Stat. Comput. Simul. 2018, 88, 1252–1272. [Google Scholar] [CrossRef]
  9. Fang, K.T.; Pan, J. A Review of Representative Points of Statistical Distributions and Their Applications. Mathematics 2023, 11, 2930. [Google Scholar] [CrossRef]
  10. Lin, Y.X.; Tang, Y.H.; Zhang, J.H.; Fang, K.T. Detecting non-isomorphic orthogonal design. J. Stat. Plan. Inference 2022, 221, 299–312. [Google Scholar] [CrossRef]
  11. Efron, B. Bootstrap methods: Another look at the jackknife. Ann. Stat. 1979, 7, 1–26. [Google Scholar] [CrossRef]
  12. Fang, K.T.; Wang, Y.; Bentler, P.M. Some applications of number-theoretic methods in statistics. Stat. Sci. 1994, 9, 416–428. [Google Scholar] [CrossRef]
  13. Hua, L.K.; Wang, Y. Applications of Number Theory to Numerical Analysis; Springer: Berlin/Heidelberg, Germany; Science Press: Beijing, China, 1981. [Google Scholar]
  14. Niederreiter, H. Random Number Generation and Quasi-Monte Carlo Methods; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 1992. [Google Scholar]
  15. Cox, D.R. Note on grouping. J. Am. Stat. Assoc. 1957, 52, 543–547. [Google Scholar] [CrossRef]
  16. Flury, B.A. Principal points. Biometrika 1990, 77, 33–41. [Google Scholar] [CrossRef]
  17. Graf, S.; Luschgy, H. Foundations of Quantization for Probability Distributions; Springer: Berlin/Heidelberg, Germany, 2007. [Google Scholar]
  18. Fei, R. Statistical relationship between the representative point and the population. J. Wuxi Inst. Light Ind. 1991, 10, 78–81. [Google Scholar]
  19. Wang, Y.; Fang, K.T. A note on uniform distribution and experimental design. Kexue Tongbao 1981, 6, 485–489. [Google Scholar]
  20. Li, Y.N.; Fang, K.T. A new approach to parameter estimation of mixture of two normal distributions. Commun. Stat.-Simul. Comput. 2024, 53, 1161–1187. [Google Scholar] [CrossRef]
  21. Stampfer, E.; Stadlober, E. Methods for estimating principal points. Commun. Stat.-Simul. Comput. 2002, 31, 261–277. [Google Scholar] [CrossRef]
  22. Rosenblatt, M. Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 1956, 27, 832–837. [Google Scholar] [CrossRef]
  23. Henze, N. A probabilistic representation of the ’skew-normal’ distribution. Scand. J. Stat. 1986, 13, 271–275. [Google Scholar]
  24. Byrd, R.H.; Lu, P.; Nocedal, J.; Zhu, C. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 1995, 16, 1190–1208. [Google Scholar] [CrossRef]
  25. Harrell, F.E.; Davis, C.E. A new distribution-free quantile estimator. Biometrika 1982, 69, 635–640. [Google Scholar] [CrossRef]
  26. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076. [Google Scholar] [CrossRef]
Figure 1. Some plots of GAST densities with parameters in Table 1.
Figure 2. Comparing plots of the fitted densities (in solid lines) by kernel density estimation and the true densities (in dashed lines) for the No.IX distribution.
Figure 3. Comparing plots of the fitted densities (in solid lines) by kernel density estimation and the true densities (in dashed lines) for the No.X distribution.
Figure 4. Comparing plots of the fitted densities (in solid lines) by kernel density estimation and the true densities (in dashed lines) for the No.XI distribution.
Figure 5. (a) Histogram of the O3 data with the fitted GAST density curve. (b) The associated QQ plot based on the Harrell-Davis (HD) quantile estimator.
Figure 6. Comparison of the fitted densities (solid lines) from kernel density estimation with the density of distribution (31).
Figure 7. (a) Histogram of the Faithful Geyser data with the fitted GAST density curve. (b) The associated QQ plot based on the traditional quantile estimator.
Figure 8. Comparison of the fitted densities (solid lines) from kernel density estimation with the density of distribution (32).
Table 1. The parameter settings according to a uniform design table U16(16^3).

No.     α      s      ν       No.     α      s      ν
I      −2.4    1.6     9      IX      3.2   −2      16
II     −0.8    0.4     3      X       1.2    1.2    15
III    −1.6   −1.6     6      XI      3.6    0.8    11
IV      1.6   −2.4     4      XII     0.4    0      18
V      −1.2    2.4    17      XIII    0.8   −1.2    10
VI      2.4   −0.4     8      XIV     0      3.2     7
VII    −2     −0.8    14      XV     −0.4   −2.8    12
VIII    2.8    2       5      XVI     2      2.8    13
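The settings above follow an equally spaced grid in each parameter. As a minimal sketch of how such a grid arises from a uniform design (the helper levels_to_params and the ranges below are illustrative assumptions, not code from the paper), the 16 runs of a U16(16^3) table with levels 1-16 can be mapped to (α, s, ν) triples as follows:

```python
import numpy as np

def levels_to_params(design, ranges):
    # design: (16, 3) array of levels 1..16 from a U16(16^3) uniform design
    # ranges: one (lo, hi) pair per factor; level k maps onto the equally
    #         spaced grid lo + (k - 1) * (hi - lo) / 15
    design = np.asarray(design, dtype=float)
    out = np.empty_like(design)
    for j, (lo, hi) in enumerate(ranges):
        out[:, j] = lo + (design[:, j] - 1.0) * (hi - lo) / 15.0
    return out

# With these ranges the level spacings are 0.4, 0.4 and 1, which
# reproduces the (alpha, s, nu) grid seen in Table 1.
ranges = [(-2.4, 3.6), (-2.8, 3.2), (3.0, 18.0)]
```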
Table 2. The comparisons between the plain MLE and QMC-MLE in four measures (n = 25, 50, 100, 300; N = 100).

                          n = 25                                 n = 50
No.     Method    L2.pdf   L2.cdf   ABI      KL        L2.pdf   L2.cdf   ABI      KL
VII     plain     0.0326   0.0469   0.0730   0.0039    0.0133   0.0153   0.0207   0.0012
VII     qmc       0.0446   0.0440   0.0552   0.0079    0.0201   0.0218   0.0307   0.0019
VIII    plain     0.0896   0.1161   0.1822   0.0897    0.0575   0.0721   0.1026   0.0256
VIII    qmc       0.0670   0.0923   0.1590   0.0830    0.0359   0.0496   0.0749   0.0181
IX      plain     0.0317   0.0365   0.0321   0.0073    0.0185   0.0214   0.0230   0.0034
IX      qmc       0.0302   0.0363   0.0306   0.0074    0.0108   0.0150   0.0160   0.0026
X       plain     0.0691   0.0762   0.1443   0.0097    0.0453   0.0525   0.0967   0.0056
X       qmc       0.0570   0.0663   0.1293   0.0104    0.0381   0.0476   0.0886   0.0041
XI      plain     0.0689   0.0883   0.1073   0.0194    0.0398   0.0487   0.0576   0.0031
XI      qmc       0.0563   0.0552   0.1043   0.0263    0.0390   0.0342   0.0525   0.0104

                          n = 100                                n = 300
No.     Method    L2.pdf   L2.cdf   ABI      KL        L2.pdf   L2.cdf   ABI      KL
VII     plain     0.0064   0.0131   0.0260   0.0001    0.0041   0.0056   0.0098   0.0001
VII     qmc       0.0163   0.0224   0.0326   0.0008    0.0076   0.0071   0.0078   0.0002
VIII    plain     0.0178   0.0358   0.0809   0.0024    0.0077   0.0186   0.0436   0.0005
VIII    qmc       0.0127   0.0286   0.0650   0.0019    0.0058   0.0160   0.0346   0.0004
IX      plain     0.0110   0.0129   0.0172   0.0021    0.0047   0.0049   0.0092   0.0003
IX      qmc       0.0056   0.0082   0.0101   0.0015    0.0029   0.0035   0.0057   0.0004
X       plain     0.0172   0.0261   0.0525   0.0010    0.0066   0.0125   0.0272   0.0003
X       qmc       0.0112   0.0207   0.0372   0.0003    0.0058   0.0113   0.0203   0.0002
XI      plain     0.0134   0.0151   0.0265   0.0067    0.0058   0.0063   0.0087   0.0005
XI      qmc       0.0279   0.0265   0.0196   0.0104    0.0041   0.0127   0.0069   0.0010
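For readers who wish to reproduce measures of this kind, the sketch below computes two of the four criteria by numerical integration. It assumes that L2.pdf (and, analogously, L2.cdf) is the L2 distance between the fitted and true function and that KL is the Kullback-Leibler divergence; the callables f_true and f_hat are placeholders for the true and fitted GAST densities, and the integration limits are illustrative, not the paper's settings.

```python
import numpy as np
from scipy import integrate

def l2_distance(f_hat, f_true, lo=-10.0, hi=10.0):
    # L2 distance between two functions; applies to both the pdf and
    # the cdf versions of the criterion, given the matching callables
    val, _ = integrate.quad(lambda x: (f_hat(x) - f_true(x)) ** 2, lo, hi, limit=200)
    return np.sqrt(val)

def kl_divergence(f_true, f_hat, lo=-10.0, hi=10.0):
    # Kullback-Leibler divergence D(f_true || f_hat), guarding zero densities
    def integrand(x):
        p, q = f_true(x), f_hat(x)
        return p * np.log(p / q) if p > 1e-300 and q > 1e-300 else 0.0
    val, _ = integrate.quad(integrand, lo, hi, limit=200)
    return val
```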
Table 3. True parameters and statistics of the underlying distributions.

No.   α     s     ν    E(X)      Var(X)   Sk(X)     Ku(X)
IX    3.2   −2    16   −1.5997   0.7971   −0.7117   1.4857
X     1.2   1.2   15    0.6272   1.7074    0.6258   0.3101
XI    3.6   0.8   11    1.2301   3.0124   −0.2293   0.0633
Table 4. Estimation bias of four statistics for the No. IX distribution GAST(3.2, −2, 16).

Statistic   Category   n = 10    n = 20    n = 30
Mean        MC(10)     −0.1180   −0.1562   −0.0149
            MC(100)    −0.0291    0.0214    0.0132
            QMC         0.0090    0.0031    0.0014
            MSE         0.0000    0.0000    0.0000
Variance    MC(10)      0.1194    0.3288    0.0769
            MC(100)    −0.0024   −0.0212   −0.0044
            QMC        −0.1455   −0.0889   −0.0668
            MSE        −0.0231   −0.0063   −0.0029
Skewness    MC(10)      0.3486    0.1735    0.1472
            MC(100)     0.3667    0.2676    0.2479
            QMC         0.2740    0.1677    0.1227
            MSE         0.0259    0.0045    0.0027
Kurtosis    MC(10)     −2.4314   −1.3967   −0.7850
            MC(100)    −2.3704   −1.6857   −1.3343
            QMC        −2.0406   −1.6037   −1.3769
            MSE        −0.5136   −0.1855   −0.0942
The best performance within each statistic per sample size is highlighted in bold.
Table 5. Estimation bias of four statistics for the No. X distribution GAST(1.2, 1.2, 15).

Statistic   Category   n = 10    n = 20    n = 30
Mean        MC(10)     −0.0472    0.0170    0.1001
            MC(100)     0.0736    0.0219   −0.0217
            QMC        −0.0102   −0.0051   −0.0035
            MSE         0.0000    0.0000    0.0000
Variance    MC(10)     −0.0684   −0.1739   −0.1644
            MC(100)     0.0757    0.1530    0.0124
            QMC        −0.2179   −0.1213   −0.0861
            MSE        −0.0394   −0.0109   −0.0051
Skewness    MC(10)     −0.6435   −0.1809   −0.0803
            MC(100)    −0.2718   −0.1878   −0.1476
            QMC        −0.1634   −0.1104   −0.0873
            MSE        −0.0106   −0.0034   −0.0017
Kurtosis    MC(10)     −1.4404   −0.5846   −0.5425
            MC(100)    −1.3711   −0.8694   −0.6661
            QMC        −1.0604   −0.7846   −0.6516
            MSE        −0.2881   −0.1005   −0.0514
The best performance within each statistic per sample size is highlighted in bold.
Table 6. Estimation bias of four statistics for the No. XI distribution GAST(3.6, 0.8, 11).

Statistic   Category   n = 10    n = 20    n = 30
Mean        MC(10)      0.0529   −0.1639    0.0381
            MC(100)    −0.0581   −0.0778    0.0276
            QMC        −0.0189   −0.0082   −0.0032
            MSE         0.0000    0.0000    0.0000
Variance    MC(10)      0.4662    0.1453   −0.3346
            MC(100)     0.2882   −0.0598   −0.0723
            QMC        −0.3569   −0.2171   −0.1635
            MSE        −0.0705   −0.0200   −0.0093
Skewness    MC(10)     −0.2932   −0.1673   −0.0959
            MC(100)    −0.0544   −0.0884   −0.1192
            QMC        −0.1259   −0.0954   −0.0829
            MSE        −0.0024   −0.0051   −0.0031
Kurtosis    MC(10)     −0.9306   −0.7944   −0.6527
            MC(100)    −0.9915   −0.7274   −0.3822
            QMC        −0.9529   −0.7366   −0.6273
            MSE        −0.3395   −0.1294   −0.0702
The best performance within each statistic per sample size is highlighted in bold.
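The biases in Tables 4-6 compare moments computed from a finite set of RPs with the true values in Table 3. Below is a minimal sketch under the assumption of equally weighted points and excess kurtosis; MSE representative points are often weighted by their cell probabilities instead, so the optional weights argument can carry those. The helper name rp_moment_bias is illustrative, not from the paper.

```python
import numpy as np

def rp_moment_bias(points, true_stats, weights=None):
    # Mean, variance, skewness and (excess) kurtosis computed from a set
    # of representative points, minus the true values (cf. Table 3).
    x = np.asarray(points, dtype=float)
    w = np.full(x.size, 1.0 / x.size) if weights is None else np.asarray(weights)
    mean = np.sum(w * x)
    var = np.sum(w * (x - mean) ** 2)
    skew = np.sum(w * (x - mean) ** 3) / var ** 1.5
    kurt = np.sum(w * (x - mean) ** 4) / var ** 2 - 3.0  # excess kurtosis
    return np.array([mean, var, skew, kurt]) - np.asarray(true_stats)

# e.g., for the No. IX distribution (Table 3):
# bias = rp_moment_bias(points, [-1.5997, 0.7971, -0.7117, 1.4857])
```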
Table 7. The minimum L2.pdf and the corresponding bandwidth h of the kernel density estimation for the No. IX distribution.

Method   n    h      min L2.pdf
MC       10   0.50   0.2108
QMC      10   0.30   0.0583
MSE      10   0.24   0.0306
MC       20   0.60   0.1300
QMC      20   0.23   0.0418
MSE      20   0.15   0.0127
MC       30   0.48   0.1471
QMC      30   0.21   0.0341
MSE      30   0.13   0.0098
The best performance within each sample size is highlighted in bold.
Table 8. The minimum L2.pdf and the corresponding bandwidth h of the kernel density estimation for the No. X distribution.

Method   n    h      min L2.pdf
MC       10   0.90   0.2067
QMC      10   0.36   0.0563
MSE      10   0.29   0.0305
MC       20   0.46   0.1504
QMC      20   0.28   0.0379
MSE      20   0.18   0.0119
MC       30   0.53   0.1072
QMC      30   0.24   0.0295
MSE      30   0.15   0.0094
The best performance within each sample size is highlighted in bold.
Table 9. The minimum L2.pdf and the corresponding bandwidth h of the kernel density estimation for the No. XI distribution.

Method   n    h      min L2.pdf
MC       10   0.71   0.1607
QMC      10   0.38   0.0628
MSE      10   0.35   0.0441
MC       20   0.57   0.1592
QMC      20   0.29   0.0481
MSE      20   0.22   0.0212
MC       30   0.76   0.1392
QMC      30   0.25   0.0382
MSE      30   0.17   0.0137
The best performance within each sample size is highlighted in bold.
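Tables 7-9 report, for each RP set, the bandwidth h that minimizes the L2.pdf criterion of a kernel density estimate built on those points. A sketch of that scan follows, assuming a Gaussian kernel, a user-supplied true density f_true, and an illustrative bandwidth grid and integration range; none of these choices are taken from the paper.

```python
import numpy as np
from scipy import integrate

def gauss_kde(points, h):
    # Gaussian kernel density estimate built on the representative points
    pts = np.asarray(points, dtype=float)
    def f_hat(x):
        return np.mean(np.exp(-0.5 * ((x - pts) / h) ** 2)) / (h * np.sqrt(2.0 * np.pi))
    return f_hat

def best_bandwidth(points, f_true, h_grid, lo=-10.0, hi=10.0):
    # Scan the grid and keep the h that minimizes the L2.pdf criterion
    best_h, best_l2 = None, np.inf
    for h in h_grid:
        f_hat = gauss_kde(points, h)
        val, _ = integrate.quad(lambda x: (f_hat(x) - f_true(x)) ** 2, lo, hi, limit=200)
        l2 = np.sqrt(val)
        if l2 < best_l2:
            best_h, best_l2 = h, l2
    return best_h, best_l2

# e.g., best_bandwidth(points, f_true, np.arange(0.05, 1.0, 0.01))
```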
Table 10. Parameter estimates of the GAST model based on O3 data.

Parameters   α̂         ŝ         ν̂         μ̂           σ̂
QMC-MLE      −0.1518   −0.2030   16.9607   1219.2283   85.0162
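The location-scale fits behind Tables 10 and 12 maximize the GAST likelihood over (α, s, ν, μ, σ). A hedged sketch of such a fit with SciPy's bound-constrained L-BFGS-B method (the algorithm of ref. [24]) is given below; gast_logpdf is a placeholder for the standardized GAST log-density, and the box constraints are illustrative assumptions, not the paper's settings. Under bounds like these, the plain-MLE estimate ν̂ = 100 in Table 12 would sit exactly on the upper bound of ν, which may indicate that the optimizer terminated at the boundary with a nearly normal tail.

```python
import numpy as np
from scipy.optimize import minimize

def fit_gast_mle(data, gast_logpdf, theta0):
    # Negative log-likelihood of a location-scale GAST model,
    # with z = (x - mu) / sigma standardizing the observations
    def nll(theta):
        alpha, s, nu, mu, sigma = theta
        z = (np.asarray(data) - mu) / sigma
        return -np.sum(gast_logpdf(z, alpha, s, nu)) + len(data) * np.log(sigma)
    # Illustrative box constraints; nu is kept above 2 so the variance exists
    bounds = [(-5.0, 5.0), (-5.0, 5.0), (2.0, 100.0), (None, None), (1e-6, None)]
    return minimize(nll, theta0, method="L-BFGS-B", bounds=bounds)
```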
Table 11. The estimation biases of the four statistics for the fitted GAST model based on three types of RPs with size 30.

Method   Mean      Variance        Skewness   Kurtosis
MC       39.6251   −46,385.8384    −0.1111    −1.3819
QMC      −0.1250   −10,554.1427    −0.0121    −0.7845
MSE       0.0048      −579.8522     0.0005    −0.0641
Table 12. Parameter estimates of the GAST model based on Faithful Geyser data.

Parameters   α̂         ŝ         ν̂     μ̂         σ̂
Plain-MLE    −2.4016   −0.2322   100   70.6301   8.7872
Table 13. The estimation biases of the four statistics for the fitted GAST model based on three types of RPs with size 50.

Method   Mean     Variance   Skewness   Kurtosis
MC       2.8239   −11.0749    0.02917    0.5070
QMC      0.0010    −2.5761    0.0010    −0.0907
MSE      0.0000    −0.1364   −0.0001    −0.0051