1. Introduction
Statistics is the art and science of data collection and data analysis, in which statistical modeling relies on various types of statistical distributions. Such distributions are either discrete or continuous, and either univariate or multivariate. For an unknown continuous distribution $F$ in $\mathbb{R}^d$, the conventional approach is to approximate $F$ using the empirical distribution of a random sample. The empirical distribution is discrete, consisting of support points from the random sample, with each point contributing equally to the approximation. Because of the accuracy problem resulting from the empirical distribution, we want to construct a discrete distribution $F_k$ that approximates the distribution $F$ while preserving the distribution information as much as possible. Consider a random vector $X$ following a continuous distribution $F$, characterized by a probability density function (pdf) $p(x)$. In contrast, a discrete random vector $Y$ is characterized by a probability mass function (pmf)
$$P(Y = y_i) = p_i, \quad i = 1, \dots, k,$$
where $y_1, \dots, y_k$ are support points of $Y$ and $p_i > 0$ with $\sum_{i=1}^{k} p_i = 1$. An approximation distribution $F_k$ to $F$ should satisfy:
- (i)
$F_k$ is a function of $F$;
- (ii)
a pre-decided distance between $F$ and $F_k$ is small;
- (iii)
$F_k \to F$ in distribution as $k \to \infty$, where $k$ is the number of support points of $Y$.

In this case, the support points are called representative points (RPs). There are several ways to choose an approximation distribution $F_k$.
1.1. Monte Carlo—RPs
Let $X \sim F(x; \theta)$ be a random vector, where $\theta$ represents the parameters. For instance, for the normal distribution $N(\mu, \sigma^2)$, the parameters are denoted as $\theta = (\mu, \sigma^2)$. In traditional statistics, random samples are utilized to make inferences about the population. Specifically, a collection of independently and identically distributed (iid) random samples, denoted as $\{x_1, \dots, x_n\}$, is drawn from the population distribution $F$. The empirical distribution of the random sample is defined as follows:
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{\{x_i \le x\}},$$
where $I_A$ is the indicator function of the event $A$, and the inequality $x_i \le x$ means that $x_{ij} \le x_j$ ($j = 1, \dots, d$), where $x_i = (x_{i1}, \dots, x_{id})'$ and $x = (x_1, \dots, x_d)'$. Many statistical inferences rely on the empirical distribution $F_n$, which includes various methods such as:
- (1)
Parameter estimation (point estimation and confidence interval estimation);
- (2)
Density estimation;
- (3)
Hypothesis testing, and so on.
The empirical distribution $F_n$ is a discrete distribution with support points $\{x_1, \dots, x_n\}$, each having the sampling probability $1/n$, and can be considered as an approximation of $F$ in the sense of consistency, i.e., $F_n \to F$ in distribution as $n \to \infty$. In statistical simulation, a set of random samples can be generated by computer software under the Monte Carlo (MC) method; the corresponding support points are therefore called MC-RPs, and in this paper we denote the associated random variable and distribution by $X^{\mathrm{MC}}$ and $F^{\mathrm{MC}}$. The MC methods have been commonly used. For instance, in the case of a normal population with unknown parameters $\mu$ and $\sigma^2$, one can utilize the sample mean $\bar{x}$ and the sample variance $s^2$ to estimate $\mu$ and $\sigma^2$, respectively.
As the empirical distribution $F_n$ can be regarded as an approximation distribution to $F$, one can therefore take a set of random samples from $F_n$ instead of from $F$, as suggested by Efron [1]; this is called the bootstrap method. The bootstrap method is a resampling technique, where the random sample is taken from an approximation distribution $F_n$. Later, Efron gave a comprehensive study on the theory and application of the bootstrap method.
The MC method has proven to be useful in statistical theory and applications. However, its efficiency is not always good due to the convergence rate of $F_n \to F$ in distribution, which is $O(n^{-1/2})$ as $n \to \infty$. The slow convergence leads to unsatisfactory approximations when performing numerical integration using the MC method. While the empirical distribution serves as one approximation to the true distribution $F$, alternative approaches have been proposed in the literature to address this issue.
1.2. Number-Theoretic RPs or Quasi-Monte Carlo RPs
Let us consider the numerical calculation of a high-dimensional integral in the canonical form
$$I(f) = \int_{C^d} f(x)\, dx,$$
where $f$ is a continuous function on the unit cube $C^d = [0, 1]^d$. Let $\mathcal{P}_k = \{x_1, \dots, x_k\}$ be a set of $k$ points uniformly scattered on $C^d$. One can use the mean of $f$ over $\mathcal{P}_k$, denoted by $\bar{f}(\mathcal{P}_k) = \frac{1}{k}\sum_{i=1}^{k} f(x_i)$, to approximate $I(f)$. By the MC method, we can employ a random sample from the uniform distribution $U(C^d)$. The rate of convergence of $\bar{f}(\mathcal{P}_k)$ is then $O_p(k^{-1/2})$, which is relatively slow but does not depend on the dimensionality $d$. How to increase the convergence rate is an important subject in applications. The number-theoretic methods (NTM) or quasi-Monte Carlo (QMC) methods provide many ways to construct $\mathcal{P}_k$ such that its points are uniformly scattered on $C^d$, by which the rate of convergence can be increased to $O(k^{-1}(\log k)^d)$. For the theory and methodology of NTM/QMC, one can refer to Hua and Wang [2] and Niederreiter [3]. In earlier studies on NTM, many authors employed the star discrepancy as a measure of the uniformity of $\mathcal{P}_k$ in $C^d$. The star discrepancy is defined by
$$D^*(\mathcal{P}_k) = \sup_{x \in C^d} |F_k(x) - F(x)|,$$
where $F$ is the cdf of $U(C^d)$ and $F_k$ is the empirical distribution of $\mathcal{P}_k$.
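The following minimal Python sketch illustrates the convergence gap between MC and QMC point sets for a toy integrand; the test function, point count, and all variable names are illustrative assumptions, not from the paper.

```python
import numpy as np
from scipy.stats import qmc

d = 4
f = lambda x: np.prod(1.0 + 0.5 * (x - 0.5), axis=1)   # I(f) = 1 over [0,1]^4

k = 2 ** 12                                             # number of points
x_mc = np.random.default_rng(0).random((k, d))          # iid U(C^d) sample
x_qmc = qmc.Sobol(d, scramble=True, seed=0).random(k)   # Sobol' QMC points

print("MC  error:", abs(f(x_mc).mean() - 1.0))          # ~ O(k^{-1/2})
print("QMC error:", abs(f(x_qmc).mean() - 1.0))         # ~ O(k^{-1} (log k)^d)
```

On a typical run, the QMC error is one to two orders of magnitude smaller than the MC error for a smooth integrand like this one.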
An optimal $\mathcal{P}_k$ has the minimum star discrepancy $D^*(\mathcal{P}_k)$. In this case the points in $\mathcal{P}_k$ are called QMC-RPs; they are support points of an approximation distribution, each having the equal probability $1/k$. In this paper we denote this distribution by $F^{\mathrm{QMC}}$ and the corresponding random vector by $X^{\mathrm{QMC}}$. Another popular measure is the $L_p$-distance between $F_k$ and $F$:
$$D_p(\mathcal{P}_k) = \left\{ \int_{C^d} |F_k(x) - F(x)|^p\, dx \right\}^{1/p}.$$
When $F$ is the uniform distribution on $C^d$, the $L_p$-distance is called the $L_p$-discrepancy. The star discrepancy is the $L_p$-discrepancy as $p \to \infty$. In the literature, a set $\mathcal{P}_k$ under a certain structure is regarded as a set of quasirandom $F$-numbers if its discrepancy has the order $O(k^{-1}(\log k)^d)$ under the given discrepancy. When $F$ is the uniform distribution $U(C^d)$, the quasirandom $F$-numbers are called quasirandom numbers. The reader can refer to Fang and Wang [4] for details. In contrast to the heavy numerical computation of the $L_p$-discrepancy for general $p$, the $L_2$-discrepancy has a simple computational formula. There are more uniformity measures in experimental design, such as the centered $L_2$-discrepancy, wrap-around $L_2$-discrepancy and mixture $L_2$-discrepancy (refer to Fang et al. [5]). Fang and Wang [4] and Fang et al. [6] gave a comprehensive study on NTM and its applications in statistical inference, experimental design, geometric probability, and optimization. Pagès [7] gave a detailed study on applications of QMC to financial mathematics.
Section 6 will introduce some algorithms for the generation of QMC-RPs.
1.3. Mean Square Error—RPs
Another measure for choosing a discrete approximation distribution to a given continuous distribution is the mean square error (MSE), and the corresponding support points are called MSE-RPs.
Definition 1. Suppose that a random vector $X$ in $\mathbb{R}^d$ has a density function with finite mean vector and covariance matrix. A set of points $\{y_1, \dots, y_k\}$ of $\mathbb{R}^d$ is called a set of MSE-RPs if it minimizes the mean square error (MSE)
$$\mathrm{MSE}(y_1, \dots, y_k) = E\Big[\min_{1 \le i \le k} \|X - y_i\|^2\Big], \tag{5}$$
where $\|\cdot\|$ denotes the $L_2$-norm on $\mathbb{R}^d$.

Given a set of $k$ points $\{y_1, \dots, y_k\}$ in $\mathbb{R}^d$, a set of regions defined by
$$V_i = \{x \in \mathbb{R}^d : \|x - y_i\| \le \|x - y_j\| \ \text{for all } j \ne i\}, \quad i = 1, \dots, k,$$
are called Voronoi regions, where $V_i$ is the attraction domain of $y_i$.
For a univariate distribution $F$ with pdf $p(x)$, mean $\mu$ and variance $\sigma^2$, its MSE-RPs can be sorted as $y_1 < y_2 < \cdots < y_k$, and its MSE can be expressed as
$$\mathrm{MSE}(y_1, \dots, y_k) = \sum_{i=1}^{k} \int_{a_{i-1}}^{a_i} (x - y_i)^2 p(x)\, dx,$$
where
$$a_0 = -\infty, \quad a_i = \frac{y_i + y_{i+1}}{2}, \ i = 1, \dots, k-1, \quad a_k = \infty.$$
The corresponding $F_k$ has support points $\{y_1, \dots, y_k\}$ with probabilities
$$p_i = \int_{a_{i-1}}^{a_i} p(x)\, dx, \quad i = 1, \dots, k.$$
Its loss function (LF) is defined by
$$\mathrm{LF} = \frac{\mathrm{MSE}}{\sigma^2}.$$
It is known that $\mathrm{MSE} \le \sigma^2$, so $0 \le \mathrm{LF} \le 1$. The loss function shows what percentage of $\sigma^2$ is lost by using $Y$ to replace $X$.
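As a quick illustration of these formulas, the following Python sketch evaluates the probabilities $p_i$, the MSE, and the loss function for a candidate set of sorted points of $N(0,1)$; the function name is an illustrative assumption.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

def probs_and_mse(y, pdf=norm.pdf, cdf=norm.cdf):
    """Cell probabilities p_i and MSE for sorted points y of a univariate pdf."""
    y = np.sort(np.asarray(y, dtype=float))
    a = np.concatenate(([-np.inf], (y[:-1] + y[1:]) / 2, [np.inf]))  # a_0..a_k
    p = cdf(a[1:]) - cdf(a[:-1])                                     # p_i
    mse = sum(integrate.quad(lambda x, yi=yi: (x - yi) ** 2 * pdf(x), lo, hi)[0]
              for yi, lo, hi in zip(y, a[:-1], a[1:]))
    return p, mse

# The two MSE-RPs of N(0,1) are +-sqrt(2/pi); LF should be 1 - 2/pi ~ 0.3634.
p, mse = probs_and_mse([-np.sqrt(2 / np.pi), np.sqrt(2 / np.pi)])
print(p, mse, mse / 1.0)   # probabilities, MSE, and LF (sigma^2 = 1)
```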
The concept of MSE-RPs has been motivated by various problems. In the context of the grouping problem, Cox [8] considered the task of condensing observations of a variate into a limited number of groups, where the grouping intervals are selected to retain maximum information. He introduced the concept of mean squared error (MSE) and provided several sets of MSE-RPs for the standard normal distribution. The concept of MSE-RPs is also relevant in data transmission systems, where analog input signals are converted to digital form, transmitted, and then reconstituted as analog signals at the receiver. The problem of optimal quantization of a continuous random variable with a fixed number of levels was precisely defined by Max [9]. In IEEE journals, MSE-RPs are also referred to as "quantizers".
Fang and He [10] proposed the mathematical problem based on the national Chinese garment standard (refer to Fang [11]). Iyengar and Solomon [12] considered the mathematical problems that arise in the theory of representing a distribution by a few optimally chosen points. Flury [13,14] studied a project of the Swiss army to replace existing protection masks with newly designed ones. He used the term "principal points" for MSE-RPs due to a link between the principal components and MSE-RPs. MSE-RPs have also been applied to select a few "representative" curves from a large collection of curves, which is useful for kernel density estimation (see Flury and Tarpey [15]) and for psychiatric studies by Tarpey and Petkova [16]. Furthermore, MSE-RPs can be applied to problems related to the numerical computation of conditional expectations, stochastic differential equations, and stochastic partial differential equations. These applications are often motivated by challenges encountered in the field of finance [7]. There was a special issue of "IEEE Transactions on Information Theory" on vector quantizers in 1982, and a very detailed review on quantization was given by Gray and Neuhoff [17]. There are several monographs on the theory and applications of RPs, for example, Graf and Luschgy [18], "Foundations of Quantization for Probability Distributions", and Pagès [7], "Numerical Probability: An Introduction with Applications to Finance".
The use of different types of representative points (RPs) allows for the construction of diverse approximation distributions to represent the underlying population distribution. By utilizing these approximation distributions, researchers can make more reliable and precise statistical inferences. The objective of this paper is to provide a comprehensive review of various types of RPs and their associated theory, algorithms, and applications. The focus of this review extends to the examination of recent advancements in the field, highlighting the latest developments and emerging trends. This paper aims to offer valuable insights into the current state of the art and provide researchers and practitioners with a deeper understanding of the potential applications and implications of RPs in statistical science. In
Section 2, we present a comprehensive list of properties associated with MSE-RPs for univariate distributions.
Section 3 focuses on reviewing various algorithms used for generating MSE-RPs for univariate distributions. In
Section 4, we compare various types of RPs in terms of their performance in stochastic simulation and resampling. Additionally, we show the consistency of resampling when MSE-RPs are used. Properties of MSE-RPs for multivariate distributions are reviewed in
Section 5, and algorithms for generating QMC-RPs and MSE-RPs for multivariate distributions are introduced in
Section 6. QMC-RPs and MSE-RPs have found numerous applications across various domains. In this paper, we focus on selected applications in statistical inference and geometric probability due to space limitations.
2. Properties of MSE-RPs for Univariate Distributions
We collect some properties of MSE-RPs in the literature in this section. These properties can be grouped into different issues. Some properties are only for univariate distributions, and some are for multivariate ones. The following results are from many articles, including Fei [19], under the notation of the previous section.
Theorem 1. Let X be a continuous random variable with pdf $p(x)$, finite mean μ and variance $\sigma^2$, and let $Y$ be the discrete random variable supported on a set of k MSE-RPs of X. Then we have
- (A)
$E(Y) = E(X) = \mu$;
- (B)
$E(Y^2) = E(XY)$, i.e., $\mathrm{Cov}(X, Y) = \mathrm{Var}(Y)$;
- (C)
$\sigma^2 = \mathrm{Var}(Y) + \mathrm{MSE}$.

The property (A) can be regarded as an "unbiased mean". The property (C) gives a decomposition of the variance of X as $\sigma^2 = \mathrm{Var}(Y) + \mathrm{MSE}$.
The concept of self-consistency has been used in clustering analysis and has a close relation with MSE-RPs.
Definition 2. The set of k points $\{y_1, \dots, y_k\}$ in $\mathbb{R}^d$ is called self-consistent with respect to the d-variate random vector X and the partition $\{V_1, \dots, V_k\}$ of $\mathbb{R}^d$ if
$$y_i = E(X \mid X \in V_i), \quad i = 1, \dots, k,$$
where the region $V_i$ is the domain of attraction of $y_i$.

Tarpey and Flury [20] gave a comprehensive study on self-consistency, and they pointed out that
- (1)
MSE-RPs are self-consistent with respect to X;
- (2)
MSE-RPs have the minimum mean square error among all sets of points that are self-consistent to X.
2.1. Existence and Uniqueness of MSE-RPs
The existence of MSE-RPs is not a problem for any continuous distribution with finite first and second moments. For the case of $k = 1$, the MSE-RP is the mean of X. This fact indicates that MSE-RPs can be regarded as an extension of the mean. There is no analytic formula for MSE-RPs in most cases with $k > 1$, but there are some discoveries for symmetric distributions. In this paper, the notation $X \overset{d}{=} Y$ means that the two random vectors X and Y have the same distribution.
Definition 3. A random vector X is symmetric about $a$ if $X - a \overset{d}{=} a - X$, and X is symmetric about its mean vector if $X - \mu \overset{d}{=} \mu - X$.
Theorem 2. Let $\{y_1, \dots, y_k\}$ be a set of MSE-RPs for a distribution symmetric about $\mu$; then $\{2\mu - y_1, \dots, 2\mu - y_k\}$ is also a set of MSE-RPs. Furthermore, if the set of MSE-RPs is unique and its MSE-RPs are sorted as $y_1 < \cdots < y_k$, then
$$y_i + y_{k+1-i} = 2\mu, \quad i = 1, \dots, \lfloor k/2 \rfloor,$$
where $\lfloor a \rfloor$ is the largest integer not exceeding $a$.

The following review is for a univariate distribution $F$ with mean $\mu$ and variance $\sigma^2$. Sharma [21] pointed out that the MSE-RPs of a distribution symmetric about zero need not be symmetric if the set of MSE-RPs is not unique.
Theorem 3. Let X be a continuous random variable with pdf $p(x)$, finite mean μ and variance $\sigma^2$, and suppose that the distribution of X is symmetric about μ. Let $\delta = E|X - \mu|$. The two MSE-RPs of X are
$$y_1 = \mu - \delta, \quad y_2 = \mu + \delta,$$
and the corresponding MSE is $\sigma^2 - \delta^2$, if and only if
$$2\delta\, p(\mu) < 1. \tag{12}$$

This theorem was presented in Flury [13]. If the condition (12) does not hold, Gu and Mathew [22] gave a detailed study on some characterizations of symmetric two-point MSE-RPs. Their results are listed below.
Let X be a random variable with density $p$ symmetric about the mean $\mu$ and continuous at $\mu$; then
- (a)
If $2\delta p(\mu) < 1$, then $\mu - \delta$ and $\mu + \delta$ are 2 MSE-RPs of X;
- (b)
If $2\delta p(\mu) > 1$, it implies that the above points do not provide a local minimum of the MSE.

They pointed out that Theorem 3 needs to be modified and gave a counterexample based on the standard symmetric exponential (Laplace) distribution with pdf $p(x) = \frac{1}{2} e^{-|x|}$ and mean 0. It is easy to find that $2\delta p(0) = 1$ in this case, so condition (12) fails, but $\pm 1$ are MSE-RPs.
More examples are discussed in their article. If two random variables Z and X have the relationship $Z = \sigma X + \mu$ and the MSE-RPs of X are known, then the MSE-RPs of Z can be easily obtained (Fang and He [10] and Zoppè [23]).
Theorem 4. Let $\{y_1, \dots, y_k\}$ be MSE-RPs of X; then $Z = \sigma X + \mu$ has MSE-RPs $\{\sigma y_1 + \mu, \dots, \sigma y_k + \mu\}$, with MSE equal to $\sigma^2\, \mathrm{MSE}(y_1, \dots, y_k)$.

There are three special families that satisfy the above relationship: the location-scale family ($Z = \sigma X + \mu$), the location family ($Z = X + \mu$) and the scale family ($Z = \sigma X$).
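Theorem 4 is easy to exercise numerically; the short sketch below maps the known $k = 2$ MSE-RPs of $N(0,1)$ to those of $N(\mu, \sigma^2)$. The chosen $\mu$ and $\sigma$ are arbitrary illustrative values.

```python
import numpy as np

y_std = np.array([-np.sqrt(2 / np.pi), np.sqrt(2 / np.pi)])  # MSE-RPs of N(0,1)
mu, sigma = 10.0, 2.0
y_z = sigma * y_std + mu                 # MSE-RPs of N(10, 4) by Theorem 4
mse_z = sigma ** 2 * (1 - 2 / np.pi)     # the MSE scales by sigma^2
print(y_z, mse_z)                        # [8.404, 11.596], ~1.454
```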
The study of the uniqueness of MSE-RPs is a challenging problem. Fleischer [24] gave a sufficient condition, "log-concavity", for the uniqueness of the MSE-RPs, and Trushkin [25] proved that a log-concave probability density function has a unique set of MSE-RPs.
Definition 4. A continuous random variable X is said to have a log-concave density $p$ if it satisfies
$$p(\lambda x + (1 - \lambda) y) \ge p(x)^{\lambda}\, p(y)^{1 - \lambda}$$
for all $\lambda \in (0, 1)$ and all $x, y$ in the support of X.

Log-concavity of the density is a well-known property, which is satisfied by a large number of remarkable distributions, including the normal distribution.
Table 1 lists some log-concave densities, where the kernel of $p(x)$ ignores constant factors in $p(x)$ so that the condition for log-concavity of $p(x)$ remains the same. The exponential distribution is the case of the gamma distribution with shape parameter 1, and the uniform distribution on $(0, 1)$ is the special case of the beta distribution with both shape parameters equal to 1.
Example 1. A finite mixture of distributions allows for great flexibility in capturing a variety of density shapes. Research into mixture models has a long history. The most cited early publication is Pearson [26], as he used a two-component normal mixture model for a biometric data set. The density of a mixture of two normal distributions, denoted by $\alpha N(\mu_1, \sigma_1^2) + (1 - \alpha) N(\mu_2, \sigma_2^2)$ with $0 < \alpha < 1$, is
$$p(x) = \frac{\alpha}{\sigma_1} \phi\Big(\frac{x - \mu_1}{\sigma_1}\Big) + \frac{1 - \alpha}{\sigma_2} \phi\Big(\frac{x - \mu_2}{\sigma_2}\Big),$$
where $\phi$ denotes the standard normal density. Li et al. [27] gave a detailed study on several aspects of this distribution: "unimodal or bimodal", "measure of disparity of two normals", and "uniqueness of MSE-RPs". Generally, the uniqueness of MSE-RPs is not always true, but it holds under some conditions. For example, for a location mixture of two normal densities $\alpha N(\mu_1, \sigma^2) + (1 - \alpha) N(\mu_2, \sigma^2)$, the set of MSE-RPs is unique if $|\mu_1 - \mu_2| \le 2\sigma$, for all $\alpha \in (0, 1)$.
2.2. Asymptotic Behavior of MSE-RPs
There are a lot of studies on the asymptotic behavior of MSE-RPs; for example, see Zador [28], Su [29], Graf and Luschgy [18], and Pagès [7]. It is well known that the distribution tail has a strong influence on statistical inference. According to different standards, there are many classification methods for statistical distributions. Embrechts et al. [30] classified distributions based on the convergence rate of the pdf $p(x) \to 0$ as $x \to \infty$, and they defined the so-called heavy-tailed distribution and light-tailed distribution, in which the exponential distribution is used as the standard for comparison. The following formal definitions are from Foss et al. [31].
Definition 5. The univariate random variable X with the distribution function F is said to have a heavy tail if
$$\int_{-\infty}^{\infty} e^{\lambda x}\, dF(x) = \infty \quad \text{for all } \lambda > 0.$$
Otherwise, F is said to have a light tail.

Obviously, any univariate random variable supported on a bounded interval is light-tailed. In fact, this definition can intuitively reflect that the tail of a heavy-tailed distribution is heavier than the tail of the exponential distribution. Moreover, the long-tailed distributions form an important subclass of the heavy-tailed distributions and are more commonly used in applications. The formal definition of a long-tailed distribution was given by Foss et al. [31] as follows.
Definition 6. The univariate random variable X with distribution function F is said to be long-tailed if
$$\lim_{x \to \infty} \frac{\bar{F}(x + y)}{\bar{F}(x)} = 1 \quad \text{for all } y > 0,$$
or equivalently $\bar{F}(x + y) \sim \bar{F}(x)$ as $x \to \infty$, where $\bar{F}(x) = 1 - F(x)$.

Xu et al. [32] studied the limiting behavior of the gap between the largest two representative points of a statistical distribution and obtained another kind of classification for the most useful univariate distributions. They illustrated the relationship between RPs and the concepts of doubly truncated mean residual life (DMRL) and mean residual life (MRL), which are widely used in survival analysis. Denote by
$$d_k = y_{k,k} - y_{k-1,k}$$
the gap between the two largest points in a set of k MSE-RPs $\{y_{1,k} < \cdots < y_{k,k}\}$. They considered three kinds of distributions according to the domain of the distribution, i.e., $(-\infty, \infty)$, $(0, \infty)$, and a finite interval.
Table 2 shows the limiting value of $d_k$ for the normal, t, and logistic distributions. Their density functions are
$$p(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad p(x) = \frac{\Gamma\big(\frac{m+1}{2}\big)}{\sqrt{m\pi}\, \Gamma\big(\frac{m}{2}\big)} \Big(1 + \frac{x^2}{m}\Big)^{-\frac{m+1}{2}}, \qquad p(x) = \frac{e^{-x}}{(1 + e^{-x})^2},$$
respectively. It is surprising that the normal distribution and t distribution have such different behavior, although the normal distribution is the limiting distribution of Student's t distribution as $m \to \infty$.
Table 3 presents the limiting value of $d_k$ for many useful distributions on $(0, \infty)$. These distributions include the Weibull distribution with density
$$p(x) = \frac{\beta}{\eta} \Big(\frac{x}{\eta}\Big)^{\beta - 1} e^{-(x/\eta)^{\beta}}, \quad x > 0,$$
the gamma and exponential distributions with respective densities
$$p(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x} \quad \text{and} \quad p(x) = \lambda e^{-\lambda x}, \quad x > 0,$$
the density of the F-distribution with degrees of freedom $m$ and $n$,
$$p(x) = \frac{\Gamma\big(\frac{m+n}{2}\big)}{\Gamma\big(\frac{m}{2}\big) \Gamma\big(\frac{n}{2}\big)} \Big(\frac{m}{n}\Big)^{m/2} x^{m/2 - 1} \Big(1 + \frac{m}{n} x\Big)^{-\frac{m+n}{2}}, \quad x > 0,$$
the Beta prime distribution with density
$$p(x) = \frac{x^{\alpha - 1} (1 + x)^{-\alpha - \beta}}{B(\alpha, \beta)}, \quad x > 0,$$
the lognormal distribution with density
$$p(x) = \frac{1}{\sqrt{2\pi}\, \sigma x} \exp\Big\{-\frac{(\ln x - \mu)^2}{2\sigma^2}\Big\}, \quad x > 0,$$
and the inverse Gaussian distribution with density
$$p(x) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\Big\{-\frac{\lambda (x - \mu)^2}{2\mu^2 x}\Big\}, \quad x > 0.$$
Observing these results, Xu et al. [32] gave Theorem 5.
Theorem 5. If the univariate random variable X supported on $(0, \infty)$ is long-tailed, then $d_k \to \infty$ as $k \to \infty$.

For the distributions on a finite interval $[a, b]$, Xu et al. [32] gave a systematic study including the following result.
Theorem 6. Suppose that a random variable X has a continuous probability density function $p(x)$ on $[a, b]$ and $p(b) > 0$. Let $\{y_{1,k}, \dots, y_{k,k}\}$ be the k MSE-RPs of X. If the empirical distribution of the MSE-RPs converges uniformly as $k \to \infty$, then
$$\lim_{k \to \infty} k\, d_k = \frac{\int_a^b p(x)^{1/3}\, dx}{p(b)^{1/3}},$$
provided that the above limit exists.

3. Algorithms for Generation of MSE-RPs of Univariate Distributions
Generation of MSE-RPs is very important for applications. This section reviews algorithms for the generation of MSE-RPs of univariate distributions. Minimizing the mean square error (5) is an optimization problem that involves some difficulties:
- The objective function is a multivariate function on the region $\{y_1 < y_2 < \cdots < y_k\}$;
- The objective function might not be differentiable on the whole domain;
- The minimum of the objective function is not always unique, and the objective function may have several local minima on the domain.

This kind of problem cannot be directly solved by the classical optimization methods (such as the downhill simplex method, quasi-Newton methods, and conjugate gradient methods) for most distributions.
There are three main approaches for the generation of RPs:
- (a)
Theoretic approach or combining the theoretic approach and computational calculation;
- (b)
Applying the k-means method to find approximate RPs; this approach can be applied to all univariate and multivariate distributions;
- (c)
To solve a system of nonlinear equations.
Approach (a) can be used for very few distributions, such as the uniform distribution on a finite interval. The authors of [33] proposed a method for finding MSE-RPs of the exponential and Laplace distributions by combining the theoretic approach with computational calculation.
Approach (b) applies the k-means method to any continuous univariate or multivariate distribution. The traditional k-means algorithm needs a set of n observations from the underlying distribution, and the user needs to cluster those observations into k groups under a loss function. The k-means algorithm begins with k arbitrary centers. Each observation is then assigned to the nearest center, and each center is recomputed as the center of mass of all points assigned to it. These steps (assignment and center calculation) are repeated until the process stabilizes. One can check that the total error is monotonically decreasing, which ensures that no clustering is repeated during the course of the algorithm. The mean square error (MSE), see Definition 1, has been popularly used as the loss function; a minimal implementation is sketched below.
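The following numpy sketch implements this sample-based (nonparametric) k-means iteration in the univariate case; the function name, initialization, and sample size are illustrative assumptions.

```python
import numpy as np

def kmeans_rps(sample, k, max_iter=200, seed=0):
    """Approximate MSE-RPs as the k cluster centers of a training sample."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(sample, size=k, replace=False)
    for _ in range(max_iter):
        # assignment step: each observation goes to its nearest center
        labels = np.argmin(np.abs(sample[:, None] - centers[None, :]), axis=1)
        # update step: each center becomes the mean of its cluster
        # (assumes no cluster goes empty; a production version must guard this)
        new = np.array([sample[labels == j].mean() for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return np.sort(centers)

sample = np.random.default_rng(1).normal(size=100_000)
print(kmeans_rps(sample, k=2))   # close to +-sqrt(2/pi) ~ +-0.798
```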
It seems that Pollard [34] was the first to propose this approach. Along this line, Lloyd [35] proposed two trial-and-error methods. This approach is easy to implement, but it needs a good-quality initial set and a large number of training samples. There are two kinds of k-means algorithms: nonparametric and parametric k-means algorithms. If the population distribution is known, the training samples are drawn from the known population distribution and the corresponding k-means algorithm is parametric; otherwise, the underlying distribution is unknown and the corresponding k-means algorithm is nonparametric. Usually, the parametric k-means algorithm is more accurate for most univariate distributions.
The parametric k-means algorithm
- (1)
For a given pdf $p(x)$ and number of RPs $k$, input a set of initial points $y_1^{(0)} < \cdots < y_k^{(0)}$. Determine a partition of $\mathbb{R}$ as $\{V_1, \dots, V_k\}$, where $V_i = (a_{i-1}, a_i]$ with
$$a_0 = -\infty, \quad a_i = \frac{y_i^{(0)} + y_{i+1}^{(0)}}{2}, \ i = 1, \dots, k-1, \quad a_k = \infty.$$
- (2)
Calculate the probabilities
$$p_i = \int_{a_{i-1}}^{a_i} p(x)\, dx$$
and the conditional means
$$y_i^{(1)} = E(X \mid X \in V_i) = \frac{1}{p_i} \int_{a_{i-1}}^{a_i} x\, p(x)\, dx, \quad i = 1, \dots, k.$$
- (3)
If the two sets $\{y_i^{(0)}\}$ and $\{y_i^{(1)}\}$ are identical (up to a prescribed tolerance), the process stops and delivers $\{y_i^{(1)}\}$ as MSE-RPs of the distribution with probabilities $\{p_i\}$; otherwise, let $y_i^{(0)} = y_i^{(1)}$, $i = 1, \dots, k$, and go back to Step (1).
Stampfer and Stadlober [
36] called this algorithm the self-consistency algorithm as the output set of RPs is self-consistent (not necessarily MSE-RPs).
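A minimal Python sketch of this self-consistency iteration for $N(0,1)$ follows; the closed-form conditional mean of the truncated normal is used in Step (2), and the initial points and tolerance are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def parametric_kmeans_normal(k, tol=1e-12, max_iter=10_000):
    y = np.linspace(-2.0, 2.0, k)                   # Step (1): initial points
    for _ in range(max_iter):
        a = np.concatenate(([-np.inf], (y[:-1] + y[1:]) / 2, [np.inf]))
        p = norm.cdf(a[1:]) - norm.cdf(a[:-1])      # Step (2): probabilities
        # E(X | a_{i-1} < X <= a_i) = (phi(a_{i-1}) - phi(a_i)) / p_i for N(0,1)
        y_new = (norm.pdf(a[:-1]) - norm.pdf(a[1:])) / p
        if np.max(np.abs(y_new - y)) < tol:         # Step (3): stopping rule
            return y_new, p
        y = y_new
    return y, p

y, p = parametric_kmeans_normal(k=5)
print(np.round(y, 4))   # approximately -1.7241, -0.7646, 0, 0.7646, 1.7241
```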
Approach (c) was proposed by Max [9] and Fang and He [10] based on traditional optimization for minimizing the mean square error function with respect to $(y_1, \dots, y_k)$: where the objective function (5) is differentiable, taking partial derivatives with respect to $y_1, \dots, y_k$ constructs a system of equations. Its solutions, denoted by $(y_1^*, \dots, y_k^*)$, might be the global minimum of the objective function, i.e., MSE-RPs. There are three kinds of equations, given in (13), (14), and (15), respectively. Fang and He [10] gave the conditions for the solution to be unique under the normal distribution.
Theorem 7. Taking partial derivatives of the objective function with respect to $y_1, \dots, y_k$, we have three kinds of equations:
- 1.
For the first point, the equation
$$y_1 = E(X \mid X \le a_1) \tag{13}$$
has a solution $a_1$ if and only if $y_1 < \mu$.
- 2.
For given $a_{i-1}$ and $y_i$, the equation
$$y_i = E(X \mid a_{i-1} < X \le a_i) \tag{14}$$
has a solution $a_i$ when $a_{i-1} < y_i < E(X \mid X > a_{i-1})$.
- 3.
For the last point, the equation
$$y_k = E(X \mid X > a_{k-1}) \tag{15}$$
has a solution $a_{k-1}$ for any $y_k > \mu$.
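Equivalently, the first-order conditions can be handed to a general nonlinear solver. The sketch below does this for $N(0,1)$ with scipy's fsolve; the residual form (each point minus the conditional mean of its Voronoi cell) and the initial guess are assumptions consistent with the system above.

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import norm

def stationarity(y):
    """Residuals y_i - E(X | X in V_i) for N(0,1); zero at a stationary point."""
    y = np.asarray(y)
    a = np.concatenate(([-np.inf], (y[:-1] + y[1:]) / 2, [np.inf]))
    p = norm.cdf(a[1:]) - norm.cdf(a[:-1])
    return y - (norm.pdf(a[:-1]) - norm.pdf(a[1:])) / p

k = 7
y = fsolve(stationarity, np.linspace(-2.0, 2.0, k))
print(np.round(np.sort(y), 4))   # a stationary point; for N(0,1), the MSE-RPs
```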
The Fang–He algorithm has been applied to many univariate distributions. Max [9] and Fang and He [10] obtained sets of MSE-RPs of $N(0,1)$ for a range of values of k. Fu [37] applied the Fang–He algorithm to the gamma distribution and obtained its MSE-RPs. Ke et al. [38] gave a more advanced study on MSE-RPs of the gamma distribution. Zhou and Wang [39] studied the t distribution with 10 degrees of freedom and gave its MSE-RPs. Fei [40] proposed an algorithm for generating MSE-RPs by the Newton optimization algorithm. Li et al. [27] gave a detailed study on MSE-RPs of mixture normal distributions. Fei [41] studied the class of Pearson distributions, where the pdf of X has the form
$$p(x) = c \exp\Big\{-\int \frac{a + x}{b_0 + b_1 x + b_2 x^2}\, dx\Big\},$$
where c is the normalizing constant and the parameters $a, b_0, b_1, b_2$ satisfy the differential equation
$$\frac{d p(x)}{d x} = \frac{-(a + x)\, p(x)}{b_0 + b_1 x + b_2 x^2}.$$
The class of Pearson distributions includes many useful distributions. For example, type I is the beta distribution; type II is the symmetrical U-shaped curve; type III is the shifted gamma distribution; type V is the shifted inverse gamma distribution; type VI is the inverse beta distribution; type VII is the t distribution; type VIII is the power function distribution; type X is the exponential distribution; and type XI is the normal distribution. Fei [41] gave some sufficient conditions for the uniqueness of the solution.
Comparing the three approaches for the generation of MSE-RPs: approach (a) is obviously the best, but it works only for a few distributions. Approach (b) can be applied to any continuous univariate or multivariate distribution; for the generation of univariate MSE-RPs, the parametric k-means algorithm does not need a training sample, and many authors have used this algorithm with a good initial set of points. Approach (c) can find the most accurate MSE-RPs of univariate distributions, but it requires heavy computation when k is large.
5. Properties of MSE-RPs of Multivariate Distributions
Let $X = (X_1, \dots, X_d)'$ be a random vector with cdf $F(x)$ and pdf $p(x)$. Assume that X has a finite mean vector $\mu$ and covariance matrix $\Sigma$. A set of MSE-RPs of X is denoted by $\{y_1, \dots, y_k\}$, which minimizes the mean square error (MSE); the corresponding discrete random vector is $Y$, with Voronoi regions $\{V_1, \dots, V_k\}$ and probabilities $\{p_1, \dots, p_k\}$ (refer to Definition 1). The following results are from Flury [13,14].
Theorem 11. Under the above assumptions on X we have
- 1.
When $k = 1$, the MSE-RP is given by $y_1 = \mu$;
- 2.
$E(Y) = \sum_{i=1}^{k} p_i y_i = \mu$, i.e., $\mu$ is in the convex hull of $\{y_1, \dots, y_k\}$;
- 3.
MSE-RPs are self-consistent and
$$\Sigma = \Sigma_Y + E\big[(X - Y)(X - Y)'\big],$$
where $\Sigma_Y$ is the covariance matrix of $Y$;
- 4.
The rank of $\Sigma_Y$ is at most $k - 1$.
Theorems 2 and 4 can be easily extended to the multivariate case, but Theorem 4 needs some change in the linear relation for the extension below.
Theorem 12. Let $X$ and $Z$ be two random vectors in $\mathbb{R}^d$ with the relation $Z = \sigma A X + b$, where $\sigma > 0$, $b \in \mathbb{R}^d$, and $A$ is an orthogonal matrix of order d. We have
- (a)
If $\{y_1, \dots, y_k\}$ is a set of self-consistent points of X, then $\{\sigma A y_1 + b, \dots, \sigma A y_k + b\}$ is a set of self-consistent points of Z;
- (b)
If $\{y_1, \dots, y_k\}$ is a set of MSE-RPs of X, then $\{\sigma A y_1 + b, \dots, \sigma A y_k + b\}$ is a set of MSE-RPs of Z.
There are various kinds of symmetry in multivariate distributions, among which the class of elliptically symmetric distributions is an extension of the multivariate normal distribution and includes many useful distributions. For a comprehensive study, refer to Fang et al. [46].
Definition 9 (Spherically and elliptically symmetric distributions). A d-dimensional random vector X is said to have an elliptically symmetric distribution (ESD), or elliptical distribution for short, if X has the following stochastic representation (SR):
$$X \overset{d}{=} \mu + R\, \Psi^{1/2} U^{(d)}, \tag{24}$$
where the random variable $R \ge 0$ is independent of $U^{(d)}$, which is uniformly distributed on the unit sphere in $\mathbb{R}^d$, $\mu \in \mathbb{R}^d$, Ψ is a positive definite matrix of order d (not necessarily equal to the covariance matrix), and $\Psi^{1/2}$ is the positive definite square root of Ψ. We write $X \sim ECD_d(\mu, \Psi, g)$ if X has a density of the form
$$p(x) = |\Psi|^{-1/2}\, g\big((x - \mu)' \Psi^{-1} (x - \mu)\big),$$
where g is called the density generator. When $\mu = 0$ and $\Psi = I_d$, X has a spherical distribution with the stochastic representation
$$X \overset{d}{=} R\, U^{(d)}, \tag{25}$$
and we write $X \sim SD_d(g)$, where g determines the density of X.

In general, an elliptical/spherical distribution does not necessarily have a density. For example, $U^{(d)}$ does not have a density in $\mathbb{R}^d$. If the distribution of X is spherical and $P(X = 0) = 0$, then $\|X\|$ and $X / \|X\|$ are independent. It is known that X defined in (24) has a density if and only if R has a density $f_R(r)$. The relationship between $f_R$ and $g$ is given by
$$f_R(r) = \frac{2 \pi^{d/2}}{\Gamma(d/2)}\, r^{d-1} g(r^2), \quad r \ge 0.$$
Table 5 lists some useful subclasses of the elliptical distributions.
Flury [13] was the first to find a relationship between the principal components and the MSE-RPs of an elliptical distribution, in the following theorems.
Theorem 13. Suppose $X \sim ECD_d(\mu, \Psi, g)$ with mean vector $\mu$, covariance matrix Σ that is proportional to Ψ, and density generator g. Then, the two MSE-RPs of X have the form
$$y_{1,2} = \mu \pm \beta\, v_1,$$
where $v_1$ is the normalized characteristic vector associated with the largest eigenvalue of Σ, and $\pm \beta$ are the two MSE-RPs of the univariate random variable $v_1' (X - \mu)$. If the MSE-RPs are not unique, they can be chosen in the given form.

Tarpey et al. [47] established a theorem, called the principal subspace theorem, which shows that k principal points of an elliptically symmetric distribution lie in the linear subspace spanned by the first several principal components.
Theorem 14. Let $X \sim ECD_d(\mu, \Psi, g)$. If a set of k MSE-RPs of X spans a subspace $\mathcal{V}$ of dimension $q < d$, then Σ has a set of eigenvectors $\{v_1, \dots, v_q\}$ with associated ordered eigenvalues $\lambda_1 \ge \cdots \ge \lambda_q$ such that $\mathcal{V}$ is spanned by $\{v_1, \dots, v_q\}$.
The principal subspace theorem shows that the set of MSE-RPs of an elliptical distribution has a close relationship with its principal components; this is why Flury [13] called the MSE-RPs principal points. Tarpey [48] and Yang et al. [43] proposed ways to generate a set of MSE-RPs in several subclasses of elliptical distributions and explored more relationships between the principal components and MSE-RPs. Their studies need algorithms for producing MSE-RPs.
Yang et al. [43] considered numerical simulation for the estimation of the mean vector and covariance matrix of elliptical distributions and showed that both QMC-RPs and MSE-RPs have better performance than MC-RPs. They also studied the distribution of the MSE of MC-RPs for univariate distributions and elliptical distributions and pointed out that the MSE of MC-RPs can be fitted by an extreme value distribution. A random sample with a poor MSE value cannot be expected to give good results.
6. Algorithms for the Generation of RPs of Multivariate Distributions
There are a lot of methods for generating a random sample from a given multivariate distribution $F$. Johnson [49] gave a good introduction to various methods. There are two useful methods: conditional decomposition and stochastic representation.
6.1. Conditional Decomposition
The conditional distribution method changes the generation for a multivariate distribution into generation for several univariate conditional distributions. Suppose the random vector $X = (X_1, \dots, X_d)'$ has the cdf $F(x)$. Let $F_1(x_1)$ be the cdf of $X_1$ and let $F_j(x_j \mid x_1, \dots, x_{j-1})$ be the conditional distribution of $X_j$ given $X_1 = x_1, \dots, X_{j-1} = x_{j-1}$. It is known in the theory of probability that
$$F(x) = F_1(x_1)\, F_2(x_2 \mid x_1) \cdots F_d(x_d \mid x_1, \dots, x_{d-1}).$$
Note that each of $F_1$ and $F_j(\cdot \mid \cdot)$ is a univariate (conditional) distribution. We can apply some methods, including the inverse transformation method, to generate a random sample from these distributions. Denote a set of random samples from these univariate (conditional) distributions by $x_1, \dots, x_d$; then $x = (x_1, \dots, x_d)'$ is a random sample from X. In particular, when $X_1, \dots, X_d$ are independent, $F(x) = \prod_{j=1}^{d} F_j(x_j)$, where $F_j$ is the cdf of $X_j$.
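The following sketch applies the conditional decomposition to a bivariate normal with correlation $\rho$, where both $F_1$ and $F_2(\cdot \mid x_1)$ are univariate normal cdfs inverted with `norm.ppf`; the parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
rho, n = 0.8, 10_000
u = rng.random((n, 2))                           # driving U(C^2) variates

x1 = norm.ppf(u[:, 0])                           # X1 ~ N(0, 1) via F_1^{-1}
x2 = norm.ppf(u[:, 1],                           # X2 | X1 = x1
              loc=rho * x1,                      #   ~ N(rho * x1, 1 - rho^2)
              scale=np.sqrt(1 - rho ** 2))

print(np.corrcoef(x1, x2)[0, 1])                 # sample correlation ~ 0.8
```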
6.2. Stochastic Representation
Let $X \sim F$. Suppose that X has a stochastic representation
$$X \overset{d}{=} h(Y),$$
where $h$ is a set of continuous functions on $C^m$ and Y follows the uniform distribution on $C^m$. The Monte Carlo method can produce a random sample $\{y_1, \dots, y_n\}$ from $Y$. Then, $\{h(y_1), \dots, h(y_n)\}$ is a random sample from $X$.
The SR method can be extended to generate a set of QMC-RPs and MSE-RPs. The QMC method employs a set of quasirandom numbers on $C^m$, denoted by $\{c_1, \dots, c_k\}$. Set $y_i = h(c_i)$, $i = 1, \dots, k$. Then, the set $\{y_1, \dots, y_k\}$ is called a set of quasirandom F-numbers, which can be regarded as another kind of RPs of $F$, i.e., NTM-RPs or QMC-RPs.
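For a distribution with independent margins, the map $h$ can be taken coordinatewise as the inverse cdf, so quasirandom numbers on $C^d$ become quasirandom $F$-numbers directly. The sketch below does this for the bivariate standard normal $N_2(0, I_2)$; the point count and seed are illustrative.

```python
import numpy as np
from scipy.stats import norm, qmc

k, d = 1024, 2
c = qmc.Sobol(d, scramble=True, seed=1).random(k)  # quasirandom numbers on C^2
y = norm.ppf(c)                      # quasirandom F-numbers: h = F^{-1} per coordinate

print(y.mean(axis=0))                # ~ (0, 0)
print(np.cov(y.T))                   # ~ identity matrix
```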
Generating a set of MSE-RPs directly is not possible for most multivariate distributions. If we focus on classes of multivariate distributions that are easily generated by MC or QMC, then the generation of MSE-RPs becomes much easier. One method is the LBG algorithm, based on the k-means method, proposed by Linde et al. [50]. The LBG algorithm requires a training sequence $\{z_1, \dots, z_N\}$ from the given distribution $F$, generated by a Monte Carlo method, where N is much larger than k, and k is the number of RPs for $F$. The next step chooses a set of initial vectors using the same Monte Carlo method and finds the associated Voronoi partition by assigning each $z_i$ to the nearest region of the partition. Then one follows the procedure of the k-means algorithm and iterates until the stopping rule is reached.
Although the LBG algorithm can reach a locally optimal output with a non-increasing MSE, Fang et al. [51] pointed out two problems when applying this algorithm:
- (a)
The algorithm gives a local optimum, and the results depend on the initial points;
- (b)
The generation of the training samples and the calculation of the MSE are based on the Monte Carlo method, which is less efficient, with the convergence rate $O(N^{-1/2})$.
Fang et al. [51] revised the LBG algorithm by using quasirandom F-numbers to produce the set of training samples and the set of initial vectors. They proposed the so-called NTLBG algorithm for the generation of QMC-RPs of an elliptical distribution.
Recall that a spherical distribution $X \sim SD_d(g)$ has the SR $X \overset{d}{=} R\, U^{(d)}$ in (25). If we can find a set of quasirandom numbers of the uniform distribution on the unit sphere and a set of quasirandom numbers of R, their product can produce a set of QMC-RPs of X. An effective algorithm for generating a set of QMC-RPs on the unit sphere is given by Fang and Wang [4]; in the literature this is called the TFWW algorithm. It is easy to see that if $X \sim SD_d(g)$, then $A X \overset{d}{=} X$ for any orthogonal matrix $A$ of order d. Therefore, if $\{y_1, \dots, y_k\}$ is a set of MSE-RPs of X, then $\{A y_1, \dots, A y_k\}$ is also a set of MSE-RPs of X. That means that the set of MSE-RPs for spherical distributions is far from unique.
6.3. The NTSR Algorithm for the Generation of a Spherical Distribution
Step 1. Generate a set of quasirandom numbers $\{c_i = (c_{i1}, \dots, c_{id}),\ i = 1, \dots, k\}$ on $C^d$.
Step 2. Denote the cdf of R by $F_R(r)$ and let $F_R^{-1}$ be its inverse function. Compute $r_i = F_R^{-1}(c_{id})$, $i = 1, \dots, k$.
Step 3. Generate a set of quasirandom numbers $\{u_1, \dots, u_k\}$ of the uniform distribution on the unit sphere with the first $(d-1)$ components of the $c_i$'s via the TFWW algorithm.
Step 4. Then $\{x_i = r_i u_i,\ i = 1, \dots, k\}$ is a set of quasirandom F-numbers or QMC-RPs of the given spherical distribution $SD_d(g)$.
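A minimal $d = 2$ illustration of the NTSR idea for the spherical $N_2(0, I_2)$ follows: here $R^2 \sim \chi_2^2$, so $F_R^{-1}$ has a closed form, and in two dimensions the uniform direction reduces to a single angle (the simplest case of the TFWW construction). All names and sizes are illustrative.

```python
import numpy as np
from scipy.stats import qmc

k = 1024
c = qmc.Sobol(2, scramble=True, seed=2).random(k)    # quasirandom numbers on C^2

r = np.sqrt(-2.0 * np.log1p(-c[:, 0]))   # F_R^{-1}: R^2 ~ chi^2_2 for N_2(0, I)
theta = 2.0 * np.pi * c[:, 1]            # uniform direction on the unit circle
x = r[:, None] * np.column_stack((np.cos(theta), np.sin(theta)))

print(x.mean(axis=0))                    # ~ (0, 0)
print(np.cov(x.T))                       # ~ identity matrix
```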
This algorithm can be easily extended to the generation of quasirandom F-numbers or QMC-RPs for elliptical distributions. The NTLBG algorithm has the following steps:
Step 1. For a given $F$, generate a set of quasirandom F-numbers $\{z_1, \dots, z_N\}$ as a training sequence by the NTSR algorithm with a large N.
Step 2. Set t = 0. For a given k, generate a set of quasirandom F-numbers $\{y_1^{(0)}, \dots, y_k^{(0)}\}$ of $F$ as an initial set of output vectors.
Step 3. Form a partition $\{V_1^{(t)}, \dots, V_k^{(t)}\}$ of the training sequence such that each $z_i$ is assigned to the nearest region of the partition, i.e., $z_i \in V_j^{(t)}$ if $\|z_i - y_j^{(t)}\| \le \|z_i - y_l^{(t)}\|$ for all $l \ne j$.
Step 4. Calculate the sample conditional means
$$y_j^{(t+1)} = \frac{1}{N_j} \sum_{z_i \in V_j^{(t)}} z_i, \quad j = 1, \dots, k,$$
and form a new set of output vectors $\{y_1^{(t+1)}, \dots, y_k^{(t+1)}\}$, where $N_j$ is the number of $z_i$ falling in $V_j^{(t)}$. If $\{y_j^{(t+1)}\}$ and $\{y_j^{(t)}\}$ are identical (up to a prescribed tolerance), deliver $\{y_j^{(t+1)}\}$ as MSE-RPs with $\hat{p}_j = N_j / N$ as the estimated probability of $V_j$, and go to Step 6; otherwise go to the next step.
Step 5. Let $t = t + 1$ and go to Step 3.
Step 6. Calculate and deliver the MSE
$$\widehat{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \min_{1 \le j \le k} \|z_i - y_j\|^2,$$
its estimate based on the training sequence.
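The core of Steps 3-6 is an ordinary multivariate Lloyd iteration on the training sequence; a compact numpy sketch is given below. The function name and stopping tolerance are illustrative, and a production version would guard against empty Voronoi cells.

```python
import numpy as np

def ntlbg_core(train, init_centers, max_iter=100, tol=1e-8):
    """Lloyd iteration of Steps 3-6 on a (quasi)random training sequence."""
    centers = init_centers.copy()
    for _ in range(max_iter):
        d2 = ((train[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)                      # Step 3: nearest region
        new = np.array([train[labels == j].mean(axis=0)
                        for j in range(len(centers))])  # Step 4: cell means
        if np.max(np.abs(new - centers)) < tol:         # Step 4: stopping rule
            break
        centers = new                                   # Step 5: iterate
    probs = np.bincount(labels, minlength=len(centers)) / len(train)
    mse = d2.min(axis=1).mean()                         # Step 6: estimated MSE
    return centers, probs, mse

# e.g., with the NTSR output x from the previous sketch as the training data:
# centers, probs, mse = ntlbg_core(x, x[:4].copy())
```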
The NTLBG algorithm has been used in the generation of QMC-RPs and MSE-RPs for elliptical distributions [43,52,53] and for the skew-normal distribution in Yang et al. [43].
8. Concluding Remarks
The bootstrap method, originally proposed by Efron [1], has found wide applications in statistical theory and practice. This method involves drawing random samples from the empirical distribution, which serves as an approximation to the population distribution $F$. However, due to the inherent randomness of these samples, the bootstrap method has certain limitations. To overcome this, a natural solution is to construct support points, called RPs, that offer a more representative characterization of the distribution $F$ compared to random samples.
This paper discusses three types of RPs: MC-RPs, QMC-RPs, and MSE-RPs, along with their respective approximations. Theoretical foundations and practical applications demonstrate that all of these RPs can be effectively and efficiently utilized for statistical inferences, including estimation and hypothesis testing. In many case studies, MSE-RPs and/or QMC-RPs have shown better performance compared to MC-RPs. QMC-RPs have been widely applied in various fields, including numerical integration in high dimensions, financial mathematics, experimental design, and geometric probability. This paper provides a comprehensive review of the theory and applications of MSE-RPs, with particular emphasis on recent developments. MSE-RPs exhibit significant potential for applications in statistics, financial mathematics, and big data analysis.
However, in the theory of MSE- and QMC-RPs, several open questions remain. For instance, although several new RP construction methods have been proposed, these methods still lack solid theoretical justifications and practical applications. Further research is needed to address these gaps and advance the field.
We are creating a website (https://fst.uic.edu.cn/isci_en/index.htm, accessed on 19 June 2023) where readers can access fundamental knowledge about RPs and MSE-RPs for various univariate distributions in the near future. Additionally, we are in the process of incorporating R software that generates MSE-RPs into the website, which will be available soon. While there are existing monographs such as "Foundations of Quantization for Probability Distributions" by Graf and Luschgy [18] and "Numerical Probability: An Introduction with Applications to Finance" by Pagès [7], these works do not specifically focus on applications in statistical inference. Therefore, there is a need for a new monograph that covers recent advancements in both theory and applications. This review article can serve as a valuable resource, providing relevant content and establishing connections for a potential new book in this area.