1. Introduction
Statistical distributions play a crucial role in information theory since they describe the probabilistic characteristics of data or signals, and hence directly affect the accuracy and efficiency of the representation, transmission, compression, and reconstruction of information. Entropy, the most important measure in information theory, depends on the statistical distribution of the random variable. Many applications of information theory require an assumption about the statistical distribution of the data. Although the normal distribution is assumed in most statistical analyses because of its mathematical convenience and generality, real-world data frequently exhibit skewness, creating demand for more flexible models. The geometric Brownian motion (GBM), a popular stochastic process model, assumes that its solutions follow the log-normal distribution. Gupta et al. (2024) [1] indicated that the GBM yields trajectories that deviate significantly from the reference distribution when the data do not meet the log-normal assumption. To deal with the limitations in such a scenario, one may consider correcting the model as in [1]. Constructing alternatives to the normal distribution has therefore been a common concern.
The skew-normal (SN) distribution is an extension of the normal distribution that allows for skewness and is capable of modeling asymmetric data. It was first introduced by Azzalini (1985) [2]. If a random variable Z has a probability density function (pdf) given by
$$f(z; s) = 2\,\phi(z)\,\Phi(sz), \quad z \in \mathbb{R},$$
where $\phi(\cdot)$ and $\Phi(\cdot)$ are the pdf and cumulative distribution function (cdf) of the standard normal distribution, then Z follows the SN distribution, denoted as $Z \sim SN(s)$. The parameter s controls the skewness of the distribution. When $s = 0$, the SN distribution reduces to the standard normal distribution. With $s > 0$, the SN distribution is right-skewed, while $s < 0$ implies left skewness.
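For illustration, the SN density can be evaluated with base R routines alone; the following is a minimal sketch (the function name dsn_sketch is ours, not from the paper):

```r
# Density of the skew-normal SN(s): f(z; s) = 2 * phi(z) * Phi(s * z)
dsn_sketch <- function(z, s) 2 * dnorm(z) * pnorm(s * z)

# s = 0 recovers the standard normal density; s > 0 skews to the right
curve(dsn_sketch(x, s = 0), -4, 4, ylab = "density")
curve(dsn_sketch(x, s = 3), -4, 4, add = TRUE, lty = 2)
```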
The skew-t (ST) distribution is an intriguing example among scale mixtures of SN distributions. It was first formulated by Branco and Dey (2001) [3] and later studied extensively by Azzalini and Capitanio (2003) [4]. An ST random variable, Y, can be represented as
$$Y = \frac{Z}{\sqrt{W/\nu}},$$
where $Z \sim SN(s)$ and $W \sim \chi^2_\nu$, i.e., the chi-square distribution with $\nu$ degrees of freedom, are independent of each other. The moments of Y exist only when the order is less than $\nu$, the same condition required by the Student's t-distribution with $\nu$ degrees of freedom, denoted by $t_\nu$. The construction of the ST distribution from the SN distribution parallels the derivation of the Student's t-distribution from the normal distribution. The pdf of the ST distribution is given by
$$f_{ST}(y; s, \nu) = 2\, t_\nu(y)\, T_{\nu+1}\!\left(s\, y \sqrt{\frac{\nu+1}{\nu+y^2}}\right),$$
where $t_\nu(\cdot)$ is the pdf of $t_\nu$, and $T_{\nu+1}(\cdot)$ is the cdf of $t_{\nu+1}$. The parameter $\nu$ controls the tail heaviness. As $\nu$ approaches infinity, the ST distribution approaches the SN distribution. Lower values of $\nu$ result in heavier tails, providing robustness against outliers. As in the SN distribution, the parameter s controls the skewness. When $s = 0$, the ST distribution reduces to the Student's t-distribution. Azzalini and Genton (2008) [5] conducted an extensive numerical exploration demonstrating that the ST distribution adapts well to a variety of empirical problems. They utilized an autoregressive model of order one, $x_t = \rho\, x_{t-1} + \varepsilon_t$, with autoregressive coefficient $\rho$ and i.i.d. error components $\varepsilon_t$, to fit the 91 monthly interest rates of an Austrian bank. Their results clearly showed that the error components $\varepsilon_t$ follow an ST distribution, where the small degrees-of-freedom parameter signifies heavy tails in the error distribution, allowing the ST model to handle outliers better than the normal distribution. The ST distribution, which combines the characteristics of the Student's t-distribution and the SN distribution, is particularly suitable for applications in finance that need to model returns with skewness and excess kurtosis, as well as in environmental studies focused on modeling extreme events. Martínez-Flórez et al. (2020) [6] also discussed other skew distributions, such as the skew-Student-t, skew-Cauchy, skew-logistic and skew-Laplace distributions. They summarized these as skew-elliptical distributions, since they share a unified form of the density function,
$$f(x; s) = 2\, g(x)\, G(sx),$$
where $g(\cdot)$ is a symmetric pdf and $G(\cdot)$ is the corresponding cdf.
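Under the Azzalini–Capitanio form above, the ST density is likewise a one-liner in base R; a minimal sketch, with dst_sketch an illustrative name and (s, nu) the skewness and degrees-of-freedom parameters:

```r
# Density of the skew-t ST(s, nu):
# f(y; s, nu) = 2 * t_nu(y) * T_{nu+1}( s * y * sqrt((nu + 1) / (nu + y^2)) )
dst_sketch <- function(y, s, nu) {
  2 * dt(y, df = nu) * pt(s * y * sqrt((nu + 1) / (nu + y^2)), df = nu + 1)
}

# Small nu gives heavy tails; s = 0 recovers the Student's t density
curve(dst_sketch(x, s = 2, nu = 3), -4, 6, ylab = "density")
```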
Another type of skew distribution is obtained by attaching a coefficient function with an $\alpha$ argument to the density function. Elal-Olivero (2010) [7] proposed a distribution called alpha-skew-normal (ASN), with a pdf defined as
$$f(x; \alpha) = \frac{(1 - \alpha x)^2 + 1}{2 + \alpha^2}\,\phi(x), \quad x \in \mathbb{R}. \qquad (4)$$
If a random variable X has the pdf in (4), we denote it as $X \sim ASN(\alpha)$. This distribution is more flexible than the SN and ST distributions since it can be unimodal or bimodal depending on the $\alpha$ parameter. When $\alpha = 0$, the ASN distribution reduces to the standard normal distribution, $N(0, 1)$.
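A small sketch of the ASN density in R, assuming the pdf (4) above (dasn_sketch is our illustrative name), makes the unimodal-to-bimodal transition easy to inspect:

```r
# Density of the alpha-skew-normal ASN(alpha):
# f(x; alpha) = ((1 - alpha * x)^2 + 1) / (2 + alpha^2) * phi(x)
dasn_sketch <- function(x, alpha) {
  ((1 - alpha * x)^2 + 1) / (2 + alpha^2) * dnorm(x)
}

# alpha = 0 recovers N(0,1); larger |alpha| can produce a second mode
curve(dasn_sketch(x, alpha = 0), -4, 4, ylab = "density")
curve(dasn_sketch(x, alpha = 2), -4, 4, add = TRUE, lty = 2)
```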
Although the ASN distribution is able to model both skewed and bimodal data, it has limitations when the data have tails thinner or thicker than the normal distribution. In order to fit stock data more accurately, Altun et al. (2018) [8] introduced a new generalized alpha skew-t (GAST) distribution combining the approaches of [4,7]. They paired the GAST distribution with the generalized autoregressive conditional heteroskedasticity (GARCH) model to build a new Value-at-Risk (VaR) prediction model for forecasting daily log returns over three years. They compared the failure rates of the GARCH models under different distributional assumptions, including normal, Student's t, ST and GAST. The results showed that the GAST distribution performs best in the backtesting. The definition of the GAST distribution and its properties, with proofs, will be elaborated in the next section.
For an unknown continuous statistical distribution, the empirical distribution of a random sample is the traditional way to approximate the target distribution. However, it often yields low accuracy, and hence support points for a discrete approximation, also known as representative points (RPs), are explored in order to preserve as much information of the target distribution as possible. Representative points have great potential for applications in statistical simulation and inference; see Fang and Pan (2023) [9] for a comprehensive review. Various kinds of representative points of different statistical distributions have been explored in the literature, and for complex distributions in particular, the study of representative points is necessary. The idea of representative points is to simplify a complex probability distribution with discrete points that are easier to manipulate, facilitating efficient computation and analysis. These points serve as a finite set that approximates the distribution of a random variable, which can be either discrete or continuous and either univariate or multivariate. In this paper, we focus on the representative points of the GAST distribution and their applications. We first introduce the concepts of three kinds of RPs here; the specific construction procedures are included in Section 4, together with their applications to the estimation of moments and densities.
There are many existing criteria for choosing the RPs of a distribution, leading to, for example, Monte Carlo RPs (MC-RPs), quasi-Monte Carlo RPs (QMC-RPs) and mean square error RPs (MSE-RPs), which are introduced below. In fact, the Kullback–Leibler (KL) divergence, or relative entropy, of two probability distributions is also a good criterion for this purpose, and entropy has been utilized as a measure in experimental design, for example, by Lin et al. (2022) [10]. Due to its computational complexity, however, entropy is not popular for generating RPs in applications. Therefore, in this article, we study only the MC-RPs, QMC-RPs, and MSE-RPs of the generalized alpha skew-t distribution.
1.1. Monte Carlo Representative Points
Let X be the population random variable with the cdf $F(x)$. Various Monte Carlo methods provide ways to generate independent identically distributed (i.i.d.) samples $\{x_1, \ldots, x_n\}$ from the population, with $x_i \sim F$, $i = 1, \ldots, n$. The empirical distribution of the random sample is defined as follows:
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{(-\infty,\, x]}(x_i),$$
where $I_A(\cdot)$ is the indicator function of the set A. The empirical distribution $F_n$ should be close to $F$ in the sense of consistency. Hence, $F_n$ can be regarded as an approximation of $F$. We denote this empirical distribution of random samples generated by the Monte Carlo method as $F_{MC}$. Traditional statistical inference is based on the empirical distribution. Efron (1979) [11] proposed a resampling technique, the bootstrap method, with which we can take a set of random samples from $F_n$ instead of F. Combined with the bootstrap, MC-RPs have proven useful in statistical inference, such as parameter estimation, density estimation and hypothesis testing. However, the MC method has many limitations, since the convergence rate of $F_n$ to $F$ in distribution as $n \to \infty$, given by $O_p(n^{-1/2})$, is too slow. The following two kinds of RPs improve the convergence rate considerably.
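As a concrete illustration of $F_n$ and the bootstrap step, the following R sketch uses a normal sample in place of a GAST sample (purely for illustration):

```r
set.seed(1)
x <- rnorm(100)            # MC sample standing in for a GAST sample

# Empirical cdf F_n(x) = (1/n) * sum_i I(x_i <= x)
Fn <- ecdf(x)
Fn(0)                      # approximates F(0)

# Bootstrap (Efron 1979): resample from F_n instead of F
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
sd(boot_means)             # bootstrap estimate of the standard error of the mean
```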
1.2. Quasi-Monte Carlo Representative Points
For a high-dimensional integration problem
$$I(f) = \int_{[0,1]^d} f(\mathbf{x})\, d\mathbf{x},$$
where f is a continuous function on $[0,1]^d$, suppose that $\mathcal{P} = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ is a set of n points uniformly scattered in $[0,1]^d$; we can then estimate $I(f)$ by
$$\hat{I}(f) = \frac{1}{n} \sum_{i=1}^{n} f(\mathbf{x}_i).$$
If we generate $\mathcal{P}$ by the MC method, the convergence rate of $\hat{I}(f)$ is $O_p(n^{-1/2})$ as $n \to \infty$. The quasi-Monte Carlo (QMC) method provides many ways of constructing $\mathcal{P}$ that increase the convergence rate. Through the QMC method, the convergence rate can reach $O(n^{-1}(\log n)^d)$ according to Fang and Wang (1994) [12]. For further theoretical studies, readers can refer to Hua and Wang (1981) [13] and Niederreiter (1992) [14]. In the study of [12], the F-discrepancy is used to measure the uniformity of $\mathcal{P}$ in $[0,1]^d$; it is defined by
$$D_F(\mathcal{P}) = \sup_{\mathbf{x}} \left| F_n(\mathbf{x}) - F(\mathbf{x}) \right|,$$
where $F$ is the cdf of the uniform distribution on $[0,1]^d$ and $F_n$ is the empirical distribution of $\mathcal{P}$. The set $\mathcal{P}$ that minimizes $D_F(\mathcal{P})$ is called the set of QMC-RPs, in which each point has equal probability $1/n$.

For the univariate distributions of this paper, the QMC method is designed to sample points that are uniformly distributed on the interval $[0, 1]$. If the inverse function of F exists, then the set of n points
$$\left\{ F^{-1}\!\left(\frac{2i-1}{2n}\right), \ i = 1, \ldots, n \right\}$$
has been proved to have the minimal F-discrepancy, $\frac{1}{2n}$, from $F$ [12]. Therefore, this set of points is called the set of QMC-RPs of $F$. Fang and Wang (1994) [12] gave a comprehensive treatment of QMC methods and their applications in statistical inference, experimental design, geometric probability, and optimization.
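In R, the univariate QMC-RPs follow directly from the quantile function; the sketch below uses the Student's t quantile function qt() as a stand-in, since the GAST inverse cdf is not available in base R (qmc_rps is our illustrative name):

```r
# QMC-RPs of a univariate F: x_i = F^{-1}((2i - 1) / (2n)), each with mass 1/n
qmc_rps <- function(n, qf, ...) qf((2 * (1:n) - 1) / (2 * n), ...)

rps <- qmc_rps(10, qt, df = 5)   # 10 QMC-RPs of the t_5 distribution
mean(rps)                        # moment estimate with equal weights 1/n
```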
1.3. Mean Square Error Representative Points
The concept of MSE-RPs was independently proposed by Cox (1957) [15], Flury (1990) [16] and many others. In the literature, MSE-RPs have also been called by other names, such as "quantizers" and "principal points". Let X be a random variable with cdf $F$, finite mean $\mu$ and variance $\sigma^2$. To provide the best representation of F for a given number n, we select a set of n representative points $b_1 < b_2 < \cdots < b_n$ having the least mean square error from X,
$$MSE(b_1, \ldots, b_n) = E\left[\min_{1 \le i \le n} (X - b_i)^2\right],$$
and form a discrete distribution. Denote the discrete random variable Y, defined as
$$Y = b_i, \quad \text{if } X \in \left(\frac{b_{i-1} + b_i}{2}, \frac{b_i + b_{i+1}}{2}\right], \quad i = 1, \ldots, n,$$
with the probability mass function
$$p_i = P(Y = b_i) = F\left(\frac{b_i + b_{i+1}}{2}\right) - F\left(\frac{b_{i-1} + b_i}{2}\right),$$
where $b_1, \ldots, b_n$ are the MSE-RPs of X, $p_1, \ldots, p_n$ are the corresponding probabilities, and we use the conventions $b_0 = -\infty$ and $b_{n+1} = +\infty$. The MSE-RPs have many useful properties. Graf and Luschgy (2007) [17] and Fei (1991) [18] proved that
$$MSE(b_1, \ldots, b_n) \to 0 \quad \text{as } n \to \infty.$$
Hence, Y converges to X in distribution.
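To make the construction concrete, the following R sketch computes MSE-RPs of the standard normal by a Lloyd-type fixed-point iteration; this is a simplified stand-in for the parametric k-means algorithm used later (Section 4), and the closed-form conditional mean below holds only for the normal case:

```r
# MSE-RPs of N(0,1) by a Lloyd-type fixed-point iteration (a sketch; the paper
# uses the parametric k-means algorithm of Stampfer and Stadlober 2002)
mse_rps_normal <- function(n, iters = 200) {
  b <- qnorm((2 * (1:n) - 1) / (2 * n))     # QMC-RPs as starting values
  for (k in 1:iters) {
    m <- c(-Inf, (b[-n] + b[-1]) / 2, Inf)  # Voronoi cell boundaries
    lo <- m[1:n]; hi <- m[2:(n + 1)]
    # Lloyd step: move each point to the conditional mean of N(0,1) on its cell,
    # E[X | lo < X <= hi] = (phi(lo) - phi(hi)) / (Phi(hi) - Phi(lo))
    b <- (dnorm(lo) - dnorm(hi)) / (pnorm(hi) - pnorm(lo))
  }
  m <- c(-Inf, (b[-n] + b[-1]) / 2, Inf)
  list(points = b, probs = diff(pnorm(m)))
}

mse_rps_normal(5)$points   # compare with published principal points of N(0,1)
```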
In this paper, Section 2 begins by reviewing the definition and properties of the GAST distribution. To explore the relationship between the classification of the GAST distribution and its three parameters $(\alpha, s, \nu)$, we apply the uniform design (Wang and Fang 1981 [19]) to arrange the values of the parameter combinations, and then depict the corresponding density plots. Section 2 also classifies the GAST distribution according to the number of peaks of the density function, with proofs. The first four moments and a stochastic representation of the GAST distribution are also given in that section. Section 3 mainly introduces a maximum likelihood estimation (MLE) method with a distribution-free quantile estimator: QMC-MLE (Li and Fang 2024 [20]). In the QMC-MLE method, the estimated quantiles of the sample replace the original sample, and the MLE is then performed on the estimated quantiles to obtain the parameter estimates. We explore the effectiveness of QMC-MLE in parameter estimation for small samples by simulation; to cover both unimodal and bimodal cases, we choose GAST distributions with different parameter settings as the underlying distributions. We find that the effectiveness of QMC-MLE is influenced by the number of peaks of the sample. Section 4 calculates the three types of RPs, MC-RPs, QMC-RPs, and MSE-RPs, of the GAST distribution for different sample sizes n. For MSE-RPs, the calculation requires a parametric k-means algorithm (Stampfer and Stadlober 2002 [21]). We compare the estimates of four statistics (mean, variance, skewness and kurtosis) obtained from the three types of RPs of the underlying distributions. Another application of RPs is density estimation: Section 4 combines the kernel density method (Rosenblatt 1956 [22]) with the three types of RPs to estimate the densities of the underlying GAST distributions. Section 5 applies the RPs to real data samples to show the outstanding performance of MSE-RPs under the assumption of a GAST model.
3. Parameter Estimation
In parameter estimation, maximum likelihood estimation has been widely utilized because of its invariance property. Let $x_1, \ldots, x_m$ be a random sample from the $GAST(\alpha, s, \nu)$ distribution. The log-likelihood function is given by
$$\ell(\alpha, s, \nu) = \sum_{i=1}^{m} \log f_{GAST}(x_i; \alpha, s, \nu). \qquad (27)$$
Taking the partial derivatives of (27) with respect to $\alpha$, $s$ and $\nu$ and setting them to zero yields the system of score equations (28). The solution $(\hat{\alpha}, \hat{s}, \hat{\nu})$ satisfying all three equations simultaneously is the MLE of $(\alpha, s, \nu)$. To solve the system of nonlinear equations in (28), a numerical method is required. In the following subsections, we introduce the algorithm used for solving the MLE, L-BFGS-B (Byrd et al., 1995 [24]), in Section 3.1. In order to improve the estimation accuracy by enhancing sample representativeness, we incorporate a non-parametric quantile estimation method (Harrell and Davis 1982 [25]), introduced in Section 3.2. In Section 3.3, we evaluate the effectiveness of the algorithm and the quantile estimation method by simulation. In our study, we use R software version 4.4.1 to conduct the simulations.
3.1. L-BFGS-B
L-BFGS-B (Byrd et al., 1995 [24]) is a limited-memory algorithm for solving large nonlinear optimization problems subject to simple bounds on the variables. The essence of the algorithm is a quasi-Newton method: at each iteration, a limited-memory BFGS approximation to the Hessian matrix is updated, and this limited-memory matrix is used to define a quadratic model of the objective function, which in our study is the log-likelihood (27). Given a set of samples $\{x_1, \ldots, x_m\}$, the optimization problem can be formulated as follows:
$$\max_{(\alpha,\, s,\, \nu)} \ \ell(\alpha, s, \nu) \quad \text{subject to} \quad \nu > 2.$$
We summarize the procedure of L-BFGS-B in Algorithm 1.
Algorithm 1 L-BFGS-B for MLE
1: Input: initial guesses for the parameters $\theta_0 = (\alpha_0, s_0, \nu_0)$, tolerance $\epsilon$, maximum number of iterations N, bounds $l$ and $u$
2: Output: estimated parameters $\hat{\theta}$
3: Initialize $k \leftarrow 0$
4: Initialize the parameters $\theta_k \leftarrow \theta_0$
5: repeat
6:  Compute the gradient $\nabla \ell(\theta_k)$
7:  Compute the search direction $d_k$ using a two-stage approach [24]
8:  Project the search direction to satisfy the bounds $l \le \theta \le u$
9:  Line search: find a step size $\lambda_k$ that maximizes $\ell(\theta_k + \lambda_k d_k)$
10: Update the parameters: $\theta_{k+1} \leftarrow \theta_k + \lambda_k d_k$
11: $k \leftarrow k + 1$
12: until $\|\nabla \ell(\theta_k)\| < \epsilon$ or $k \ge N$
13: return $\hat{\theta} \leftarrow \theta_k$
We chose the L-BFGS-B algorithm because the degrees of freedom must be greater than 2 for the GAST distribution; if an unconstrained optimization method is used, missing values are likely to appear during the optimization process.
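In R, the algorithm is available through optim(method = "L-BFGS-B"), which accepts box constraints directly. The sketch below fits the two-parameter ST density from Section 1 as a stand-in for the GAST density (whose expression we do not reproduce here), imposing the bound $\nu > 2$; the data-generating step is purely illustrative:

```r
# Negative log-likelihood of the ST(s, nu) density (stand-in for the GAST case)
dst_sketch <- function(y, s, nu) {
  2 * dt(y, df = nu) * pt(s * y * sqrt((nu + 1) / (nu + y^2)), df = nu + 1)
}
negloglik <- function(par, y) -sum(log(dst_sketch(y, par[1], par[2])))

set.seed(2)
y <- rt(200, df = 4) + abs(rnorm(200))   # right-skewed, heavy-tailed toy data

# L-BFGS-B with the box constraint nu > 2 (lower bound slightly above 2)
fit <- optim(par = c(0, 5), fn = negloglik, y = y,
             method = "L-BFGS-B",
             lower = c(-Inf, 2.001), upper = c(Inf, Inf))
fit$par                                  # MLE of (s, nu)
```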
3.2. QMC-MLE
In this subsection, we introduce a method for improving the accuracy of the MLE. It is well known that the accuracy of the MLE depends on the sample size to a certain extent. If the sample misses the turning points of the population density, it is less representative, which may lead to lower estimation accuracy. This situation is prone to occur in small samples, especially in bimodal cases. Fang and Wang (1994) [12] pointed out that the set of equal quantiles $\{F^{-1}((2i-1)/(2n)),\ i = 1, \ldots, n\}$ has the best representativeness in the sense of F-discrepancy. In Section 1, we introduced a QMC method to generate the RPs of a distribution with known parameters. However, for a distribution with unknown parameters, how can we obtain the quantiles of the distribution F? Harrell and Davis (1982) [25] proposed a distribution-free method: the Harrell–Davis (HD) quantile estimator. We use this estimator to calculate the set of equal quantiles of F, and then substitute these n quantile estimates into the log-likelihood function (27). Li and Fang (2024) [20] called the MLE method with the HD quantile estimator QMC-MLE, presented below.
Let $x_1, \ldots, x_n$ be a random sample of size n from the GAST distribution. Denote $x_{(1)} \le \cdots \le x_{(n)}$ as the order statistics of the sample and $Q(p) = F^{-1}(p)$ as the $p$th population quantile.
- Step 1: Generate a set of points uniformly scattered on $[0, 1]$ through
$$p_i = \frac{2i - 1}{2n}, \quad i = 1, \ldots, n.$$
- Step 2: Use the Harrell–Davis quantile estimator to process the sample:
$$\hat{Q}(p_i) = \sum_{j=1}^{n} W_{n,j}\, x_{(j)},$$
where
$$W_{n,j} = I_{j/n}\{p_i(n+1),\ (1 - p_i)(n+1)\} - I_{(j-1)/n}\{p_i(n+1),\ (1 - p_i)(n+1)\}$$
and $I_x(a, b)$ denotes the regularized incomplete beta function.
- Step 3: Let $y_i = \hat{Q}(p_i)$, for $i = 1, \ldots, n$. The sample $\{x_i\}$ in the log-likelihood function is therefore replaced by $\{y_i\}$, so that the objective function based on the revised sample is
$$\ell^*(\alpha, s, \nu) = \sum_{i=1}^{n} \log f_{GAST}(y_i; \alpha, s, \nu). \qquad (29)$$
- Step 4: Use the L-BFGS-B algorithm to find the MLE of $(\alpha, s, \nu)$ by maximizing (29).
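A compact R sketch of Steps 1–3, with hd_quantile and qmc_revise as illustrative names; pbeta() is base R's regularized incomplete beta function:

```r
# Harrell–Davis estimate of the p-th quantile (Step 2)
hd_quantile <- function(x, p) {
  n <- length(x)
  xs <- sort(x)                                  # order statistics x_(1..n)
  a <- p * (n + 1); b <- (1 - p) * (n + 1)
  # weights W_{n,j} = I_{j/n}(a, b) - I_{(j-1)/n}(a, b)
  w <- pbeta((1:n) / n, a, b) - pbeta((0:(n - 1)) / n, a, b)
  sum(w * xs)
}

# Steps 1 and 3: replace the sample by HD estimates at levels (2i - 1) / (2n)
qmc_revise <- function(x) {
  n <- length(x)
  sapply((2 * (1:n) - 1) / (2 * n), function(p) hd_quantile(x, p))
}
```

The revised sample qmc_revise(x) is then passed to the L-BFGS-B maximization of Section 3.1 in place of x.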
3.3. Simulation
Before the simulation, we introduce four measures of estimation accuracy: L2.pdf, L2.cdf, the absolute bias index (ABI) and the Kullback–Leibler (KL) divergence. Denote the true underlying distribution as F in cdf or f in pdf, and the estimated distribution as $\hat{F}$ or $\hat{f}$. The four measures are defined as follows. The L2.pdf between two densities is defined as
$$\text{L2.pdf} = \left( \int_{-\infty}^{\infty} \left( f(x) - \hat{f}(x) \right)^2 dx \right)^{1/2}.$$
The L2.cdf between two cdfs is defined as
$$\text{L2.cdf} = \left( \int_{-\infty}^{\infty} \left( F(x) - \hat{F}(x) \right)^2 dx \right)^{1/2}.$$
The absolute bias index (ABI) is used to evaluate the overall estimation bias in the parameters; it is defined through the deviations of the estimated expectation $\hat{\mu}$ and standard deviation $\hat{\sigma}$ of the GAST distribution from their true values. The Kullback–Leibler (KL) divergence, or so-called relative entropy, measures the difference from one probability distribution to another and is defined as
$$D_{KL}(f \,\|\, \hat{f}) = \int_{-\infty}^{\infty} f(x) \log \frac{f(x)}{\hat{f}(x)}\, dx.$$
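The L2 and KL measures can be evaluated numerically in R with integrate(); the sketch below truncates the integration range for stability and omits the ABI, whose exact formula we do not reproduce here:

```r
# Numerical versions of the accuracy measures for densities f, fhat and cdfs
# Fcdf, Fhat supplied as vectorized R functions (integration range truncated)
l2_pdf <- function(f, fhat, lower = -20, upper = 20)
  sqrt(integrate(function(x) (f(x) - fhat(x))^2, lower, upper)$value)

l2_cdf <- function(Fcdf, Fhat, lower = -20, upper = 20)
  sqrt(integrate(function(x) (Fcdf(x) - Fhat(x))^2, lower, upper)$value)

kl_div <- function(f, fhat, lower = -20, upper = 20)
  integrate(function(x) {
    fx <- f(x)
    ifelse(fx > 0, fx * log(fx / fhat(x)), 0)   # integrand of KL(f || fhat)
  }, lower, upper)$value

# Example: distance between N(0,1) and a mis-scaled estimate N(0, 1.2^2)
l2_pdf(dnorm, function(x) dnorm(x, sd = 1.2))
kl_div(dnorm, function(x) dnorm(x, sd = 1.2))
```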
In the simulation, we generate samples by the inverse transformation method and mainly focus on the small-sample case. To study both unimodal and bimodal cases, we choose five parameter settings of the GAST distribution from Figure 1, No. VII, VIII, IX, X and XI, as the underlying distributions, among which the No. VII, VIII, and XI distributions are bimodal. The sample size n ranges from small values up to 300. After repeated simulations, the averages of the parameter estimates are taken as the parameters of the estimated GAST distribution. The precision of the estimates is evaluated by L2.pdf, L2.cdf, ABI and KL, summarized in Table 2, in which "plain" indicates the MLE resulting from the original sample $\{x_i\}$, and "qmc" uses the revised sample $\{y_i\}$.
The best performance in the sense of each measure, for each pair of distribution type and sample size, is highlighted in bold in Table 2. The QMC-MLE method performs better than the plain MLE in most cases, especially for the No. VIII, IX and X distributions. However, for the No. VII and XI distributions, the QMC-MLE has no obvious advantage. The No. IX and X distributions are unimodal, while No. VIII is bimodal. From the pdf plot of the No. VIII distribution, we can see that although it is bimodal, its first peak is not as pronounced as the peaks of the No. VII and XI distributions. In the pdf plots of the No. VII and XI distributions, as x increases, the density experiences a steep decline after the first peak, while for the No. VIII distribution the decline lasts only a short distance before the density begins to rise again. Therefore, we have reason to believe that the QMC-MLE method is more suitable for unimodal densities, or bimodal densities in which one peak is not pronounced.
In addition, for the No. XI GAST distribution, in the sense of the KL divergence, the plain MLE is better than the QMC-MLE for all sample sizes. Under the other measures in the No. XI case, although the QMC-MLE performs better for the smaller sample sizes (up to 50), it becomes less effective for the larger ones (up to 300), which may be explained by the consistency of the MLE. According to the discussion above, when we conduct the case studies in Section 5, the QMC-MLE will be used only for unimodal samples in parameter estimation, while for bimodal samples we will use the plain MLE. Nevertheless, this simulation study reveals that the MLE method (both plain and QMC) is appropriate for estimating the GAST parameters, given the small values of the four bias measures.