1. Introduction
Statistical inference is commonly divided into three types: Bayesian inference, frequentist inference, and fiducial inference. Fiducial inference was originally proposed by Fisher [1] to overcome a deficiency of the Bayesian framework when the prior distribution carries little or no information about the parameter. From Fisher’s point of view, fiducial inference simply changes the logical identity of the parameter. A fiducial confidence interval is a confidence interval based on fiducial theory, which also treats the unknown population parameter as a random variable. Based on the fiducial inference principle, Li et al. [2] illustrated the usefulness of the fiducial inference method; Wang et al. [3] gave a construction method for prediction intervals for the normal, exponential, and gamma distributions; Veronese and Melilli [4] developed a simple and direct method to define the fiducial distribution for real exponential families; Krishnamoorthy and Wang [5] obtained fiducial confidence limits and prediction limits for the gamma distribution; Hoang-Nguyen-Thuy et al. [6] gave a fiducial estimation method for the location-scale family and analyzed several distributions.
Confidence intervals describe parameters that carry uncertainty due to sampling error. There are many methods for constructing confidence intervals. The pivotal quantity approach is commonly used because it yields exact tests or exact confidence intervals for the parameter, and many scholars have used it. For example, Chen [7] adjusted the pivotal quantity used for constructing confidence intervals to improve their performance. Seo [8] provided exact confidence intervals for the unknown parameters and exact predictive intervals for future upper record values by providing pivotal quantities for the two-parameter Rayleigh distribution. Johnson [9] noted that a prediction interval covers a future observation from a random process in repeated sampling, and is typically constructed by identifying a pivotal quantity that is also an ancillary statistic.
In many real-world problems, the data are not symmetric and the assumption of normality is violated. The class of skew normal distributions extends the normal distribution by allowing for skewness; see Azzalini [10]. Since then, skew normal distributions have been studied in a number of important areas; see Azzalini [11] for details. Scholars have studied confidence intervals for parameters of the skew normal distribution: Mameli [12] analyzed the approximate large-sample confidence interval for the skewness parameter using Fisher’s transformation; Wang et al. [13] gave three confidence intervals for the location parameter in the skew normal family with known coefficient of variation and skewness; Wang et al. [14] studied the confidence interval for the skewness parameter of the skew normal distribution.
To our knowledge, the confidence interval for the mean, the prediction interval for a future sample mean, the tolerance interval for a quantile, and the fiducial distribution for the skew normal distribution have seldom been studied. In this paper, we use the pivotal approach to construct these intervals for the skew normal distribution. All experiments are implemented in R. The rest of the paper is organized as follows. Some basic properties and pivotal quantities of the skew normal distribution are introduced in Section 2. The confidence interval for the mean of the skew normal distribution is constructed by the pivotal quantity method, and a simulation experiment is carried out, in Section 3. The prediction interval for the future sample mean and the one-sided tolerance limits are studied in Section 4 and Section 5. The fiducial distribution for probabilities of the skew normal distribution is discussed in Section 6. All proposed intervals are illustrated using a real data set in Section 7. Some conclusions are given in Section 8.
2. Point Estimates and Pivotal Quantities
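According to Azzalini [15], the probability density function (pdf) of the skew normal distribution is given as follows

$$f(x;\mu,\sigma,\lambda)=\frac{2}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)\Phi\!\left(\lambda\,\frac{x-\mu}{\sigma}\right),\qquad x\in\mathbb{R},$$

where $\mu$ is the location parameter, $\sigma$ is the scale parameter, $\lambda$ is the skewness parameter, and $\phi$ and $\Phi$ are the probability density function and cumulative distribution function of the standard normal distribution, respectively. We denote it by $SN(\mu,\sigma^2,\lambda)$. The effect of $\lambda$ on the skew normal distribution is shown graphically in Figure 1.

The expectation, variance and skewness of $SN(\mu,\sigma^2,\lambda)$ are

$$E(X)=\mu+\sigma\delta\sqrt{\frac{2}{\pi}},\qquad \mathrm{Var}(X)=\sigma^2\left(1-\frac{2\delta^2}{\pi}\right),\qquad \gamma_1=\frac{4-\pi}{2}\,\frac{\left(\delta\sqrt{2/\pi}\right)^3}{\left(1-2\delta^2/\pi\right)^{3/2}},$$

where $\delta=\lambda/\sqrt{1+\lambda^2}$.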
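As a quick numerical check of the density and moment formulas, the following Python sketch may help. It assumes the standard Azzalini parameterization with location `mu`, scale `sigma`, and skewness `lam`, and draws samples through the well-known representation $Z=\delta|Z_0|+\sqrt{1-\delta^2}\,Z_1$ with independent standard normal $Z_0, Z_1$. The function names are illustrative and are not taken from the paper, whose experiments are run in R.

```python
import math
import random

def sn_pdf(x, mu=0.0, sigma=1.0, lam=0.0):
    """Skew normal density: (2/sigma) * phi(z) * Phi(lam * z), z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(lam * z / math.sqrt(2.0)))
    return 2.0 / sigma * phi * big_phi

def sn_moments(mu, sigma, lam):
    """Closed-form mean, variance, and skewness of the skew normal."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    b = delta * math.sqrt(2.0 / math.pi)
    mean = mu + sigma * b
    var = sigma ** 2 * (1.0 - b * b)  # b*b equals 2*delta^2/pi
    skew = 0.5 * (4.0 - math.pi) * b ** 3 / (1.0 - b * b) ** 1.5
    return mean, var, skew

def sn_sample(n, mu, sigma, lam, rng):
    """Draw n values via Z = delta*|Z0| + sqrt(1 - delta^2)*Z1."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    root = math.sqrt(1.0 - delta * delta)
    return [mu + sigma * (delta * abs(rng.gauss(0.0, 1.0)) + root * rng.gauss(0.0, 1.0))
            for _ in range(n)]

rng = random.Random(1)
xs = sn_sample(100_000, 0.0, 1.0, -2.0, rng)
emp_mean = sum(xs) / len(xs)
emp_var = sum((x - emp_mean) ** 2 for x in xs) / (len(xs) - 1)
mean, var, skew = sn_moments(0.0, 1.0, -2.0)
print("theoretical mean/var/skewness:", mean, var, skew)
print("empirical   mean/var         :", emp_mean, emp_var)
```

With a large simulated sample, the empirical mean and variance should sit close to the closed-form values, which gives a simple sanity check on the parameterization before building intervals on top of it.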
According to the properties of the skew normal distribution,
can be obtained from the skewness of the sample, which does not depend on
and
. Let
be a sample from
, and denote
to be the skewness of the sample, we have
Meanwhile, let
and
be the maximum likelihood estimators (MLEs) of
and
based on the sample, which can be obtained as follows
More details on the MLEs of
,
and
can be found in Figueiredo and Gomes [
16].
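The idea of recovering the skewness parameter from the sample skewness alone can be sketched as follows. The paper's own estimating equation is not reproduced in this text, so the sketch uses the standard moment inversion of the skew normal skewness formula; the helper names and the clipping constant `MAX_ABS_SKEW` are assumptions made for illustration.

```python
import math

MAX_ABS_SKEW = 0.99  # assumed clip; the skew normal cannot exceed about 0.9953 in absolute skewness

def sample_skewness(xs):
    """Moment-based sample skewness m3 / m2^(3/2)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def sn_skewness(lam):
    """Theoretical skewness of a skew normal with skewness parameter lam."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    b = delta * math.sqrt(2.0 / math.pi)
    return 0.5 * (4.0 - math.pi) * b ** 3 / (1.0 - b * b) ** 1.5

def lam_from_skewness(g):
    """Invert the skewness formula: skewness -> delta -> lam."""
    g = max(-MAX_ABS_SKEW, min(MAX_ABS_SKEW, g))  # keep delta^2 strictly below 1
    r = abs(g) ** (2.0 / 3.0)
    c = (0.5 * (4.0 - math.pi)) ** (2.0 / 3.0)
    delta = math.copysign(math.sqrt(0.5 * math.pi * r / (r + c)), g)
    return delta / math.sqrt(1.0 - delta * delta)

# Round trip on the three skewness values used in the simulations.
for lam in (-2.0, 1.0, 0.05):
    g = sn_skewness(lam)
    print(lam, g, lam_from_skewness(g))
```

Because the skewness formula involves neither the location nor the scale, this step can be carried out before the other two parameters are estimated. The theoretical skewness is very flat around zero, so small sampling errors in the sample skewness translate into large errors in the recovered parameter there.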
Based on Equation (
1), we know that
, and
According to the arguments from Lawless [
17] and Krishnamoorthy et al. [
18], we have
where the notation ∼ means distributed as, and
and
are equivariant estimators for
and
of
.
3. Confidence Interval of the Mean by Pivotal Quantity
In this section, we study the confidence intervals of the mean of
through the pivotal quantity approach. According to Equations (
2) and (
3), we have
Let
, and
denotes the 100
percentile of
, then the
confidence interval for the mean is
Without loss of generality, we choose
,
, and the values of
are chosen as −2, 1, 0.05, respectively. The percentiles for calculating
confidence intervals based on different sample sizes are given in
Table A1, obtained by Monte Carlo simulation.
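The Monte Carlo step behind such percentile tables can be sketched in Python. Since the paper's exact pivotal quantity is not reproduced in this text, the sketch uses a stand-in pivot, the studentized mean $(\bar{X}-E(X))/S$, whose distribution does not depend on the location or scale when the skewness parameter is fixed. The skewness parameter `lam` is treated as known, and all function names are illustrative.

```python
import math
import random

def sn_sample(n, lam, rng, mu=0.0, sigma=1.0):
    """Draw n skew normal values via delta*|Z0| + sqrt(1 - delta^2)*Z1."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    root = math.sqrt(1.0 - delta * delta)
    return [mu + sigma * (delta * abs(rng.gauss(0.0, 1.0)) + root * rng.gauss(0.0, 1.0))
            for _ in range(n)]

def mean_sd(xs):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    n = len(xs)
    m = sum(xs) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

def pivot_percentiles(n, lam, probs, reps=10_000, seed=0):
    """Percentiles of T = (Xbar - E[X]) / S, simulated under mu = 0, sigma = 1."""
    rng = random.Random(seed)
    delta = lam / math.sqrt(1.0 + lam * lam)
    theta = delta * math.sqrt(2.0 / math.pi)  # mean of SN(0, 1, lam)
    draws = []
    for _ in range(reps):
        m, s = mean_sd(sn_sample(n, lam, rng))
        draws.append((m - theta) / s)
    draws.sort()
    return [draws[min(reps - 1, int(p * reps))] for p in probs]

def mean_ci(xs, lam, level=0.95, reps=10_000, seed=0):
    """Invert q_lo <= T <= q_hi into an interval for the mean."""
    alpha = 1.0 - level
    q_lo, q_hi = pivot_percentiles(len(xs), lam, [alpha / 2.0, 1.0 - alpha / 2.0], reps, seed)
    m, s = mean_sd(xs)
    return m - q_hi * s, m - q_lo * s

data = sn_sample(20, 1.0, random.Random(42), mu=5.0, sigma=2.0)
low, high = mean_ci(data, lam=1.0)
print("95% interval for the mean:", low, high)
```

Note that the upper percentile of the pivot produces the lower endpoint and vice versa, because the pivot is subtracted when solving for the mean.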
Figure 2 shows the percentiles of computing
,
,
,
,
and
confidence intervals for the mean based on the results obtained from
Table A1.
It is observed from
Figure 2 that, regardless of the confidence level, the percentiles tend to zero as the sample size increases.
From
Table A1, we can find that the distance between the upper and lower percentiles of the
decreases as the sample size increases. To evaluate the performance of the proposed confidence interval constructed by the pivotal quantity approach, we carried out simulation studies with the same values of
given in
Table A1. The coverage probabilities (CP), average lengths (AL), and associated standard deviations (SD) were calculated in R. For each parameter setting, we used R code with
10,000 runs to compute confidence intervals. The percentage of these 10,000 confidence intervals that include the true mean is an estimate of the CP. The AL and SD are estimated similarly. The corresponding results for sample sizes of
n ranging from 5 to 100 and different values of
are displayed in
Table A2 in the Appendix. The values of AL in
Table A2 are displayed in
Figure 3.
From
Table A2, we can see that all the CPs reach the corresponding confidence levels. As the sample size increases, both the average length and the standard deviation of the interval decrease.
Figure 3 illustrates this conclusion visually: the convergence in n for different
can be seen more clearly.
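The coverage experiment described above can be sketched as a two-step simulation: calibrate the pivot percentiles under the standard setting, then apply them to fresh samples and count how often the interval contains the true mean. The sketch below is written in Python rather than the R code used in the paper, treats the skewness parameter as known, and again uses the studentized mean as a stand-in pivot; the values `mu = 3.0` and `sigma = 2.0` are arbitrary choices made to show that the result does not depend on them.

```python
import math
import random

def sn_sample(n, lam, rng, mu=0.0, sigma=1.0):
    """Draw n skew normal values via delta*|Z0| + sqrt(1 - delta^2)*Z1."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    root = math.sqrt(1.0 - delta * delta)
    return [mu + sigma * (delta * abs(rng.gauss(0.0, 1.0)) + root * rng.gauss(0.0, 1.0))
            for _ in range(n)]

def mean_sd(xs):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    n = len(xs)
    m = sum(xs) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

def coverage(n, lam, level=0.95, reps=5_000, runs=2_000, seed=0):
    """Return (CP, AL, SD) of the pivot-based interval for the mean."""
    rng = random.Random(seed)
    delta = lam / math.sqrt(1.0 + lam * lam)
    shift = delta * math.sqrt(2.0 / math.pi)  # mean of SN(0, 1, lam)
    # Step 1: calibrate the pivot percentiles under mu = 0, sigma = 1.
    piv = []
    for _ in range(reps):
        m, s = mean_sd(sn_sample(n, lam, rng))
        piv.append((m - shift) / s)
    piv.sort()
    alpha = 1.0 - level
    q_lo = piv[int(alpha / 2.0 * reps)]
    q_hi = piv[min(reps - 1, int((1.0 - alpha / 2.0) * reps))]
    # Step 2: apply the interval to fresh samples with arbitrary mu and sigma.
    mu, sigma = 3.0, 2.0
    true_mean = mu + sigma * shift
    hits, lengths = 0, []
    for _ in range(runs):
        m, s = mean_sd(sn_sample(n, lam, rng, mu, sigma))
        lo, hi = m - q_hi * s, m - q_lo * s
        hits += lo <= true_mean <= hi
        lengths.append(hi - lo)
    al, sd = mean_sd(lengths)
    return hits / runs, al, sd

for n in (5, 20, 50):
    print(n, coverage(n, lam=-2.0))
```

The estimated CP carries Monte Carlo error from both steps, so values slightly above or below the nominal level are expected for a finite number of runs.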
4. Prediction Intervals for the Mean of a Future Sample
A prediction interval is a statistical interval that contains a future random variable with a specified probability; it estimates the range of future observations from past or present samples. Hahn [19] and Kaminsky [20] described prediction intervals for the normal distribution and the exponential distribution, respectively. In the following, we aim to find a prediction interval for the mean of a future sample of size
m, from
.
Let
denote the mean of future sample of size
m from
. To find a prediction interval for
, we denote the quantity
that
where
is the mean of a sample of size
m from the
. Therefore, the
prediction interval for a future sample mean
is
where
is 100
percentile of
. In real life, future data are not easy to collect for various reasons, so here we only consider the case
m less than
n, and the
prediction intervals based on Monte Carlo simulation are given in
Table A3.
As can be seen from
Table A3, the length of the prediction interval decreases as
m and
n increase.
Figure 4 illustrates this conclusion visually. We can also conclude that the
of the mean of the future data is
, when the current sample size is 20 and the future sample size is 5, where
and
can be estimated from the sample of size 20.
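A prediction interval for a future sample mean can be simulated in the same spirit. The following Python sketch uses a stand-in pivot $(\bar{Y}_m-\bar{X}_n)/S_n$, which is free of the location and scale when the skewness parameter `lam` is fixed and treated as known; the paper's own pivot is not reproduced in this text, and the function names are illustrative. The demo uses n = 20 and m = 5, the case mentioned above.

```python
import math
import random

def sn_sample(n, lam, rng, mu=0.0, sigma=1.0):
    """Draw n skew normal values via delta*|Z0| + sqrt(1 - delta^2)*Z1."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    root = math.sqrt(1.0 - delta * delta)
    return [mu + sigma * (delta * abs(rng.gauss(0.0, 1.0)) + root * rng.gauss(0.0, 1.0))
            for _ in range(n)]

def mean_sd(xs):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    n = len(xs)
    m = sum(xs) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

def prediction_percentiles(n, m, lam, level=0.95, reps=10_000, seed=0):
    """Lower and upper percentiles of (Ybar_m - Xbar_n) / S_n under mu = 0, sigma = 1."""
    rng = random.Random(seed)
    piv = []
    for _ in range(reps):
        xbar, s = mean_sd(sn_sample(n, lam, rng))
        ybar = sum(sn_sample(m, lam, rng)) / m
        piv.append((ybar - xbar) / s)
    piv.sort()
    alpha = 1.0 - level
    return piv[int(alpha / 2.0 * reps)], piv[min(reps - 1, int((1.0 - alpha / 2.0) * reps))]

def prediction_interval(xs, m, lam, level=0.95, reps=10_000, seed=0):
    """Interval expected to contain the mean of m future observations."""
    q_lo, q_hi = prediction_percentiles(len(xs), m, lam, level, reps, seed)
    xbar, s = mean_sd(xs)
    return xbar + q_lo * s, xbar + q_hi * s

current = sn_sample(20, 1.0, random.Random(11), mu=5.0, sigma=2.0)
print("95% prediction interval, n = 20, m = 5:", prediction_interval(current, 5, lam=1.0))
```

Unlike the interval for the mean, the percentiles are added to the current sample mean here, because the quantity being bracketed is the future mean itself rather than a parameter.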
5. One-Sided Tolerance Interval Limits
In many practical applications, such as medicine, environmental science, and engineering, one wishes to find an interval estimate based on the sample that captures at least a proportion
p of the sampled population with confidence
. Such an interval is called a tolerance interval, or more precisely a
p content-
coverage tolerance interval or
tolerance interval for short. Proschan [21] studied tolerance intervals for the normal distribution. Krishnamoorthy et al. [18] discussed prediction and tolerance intervals for the two-parameter Rayleigh distribution. Hoang-Nguyen-Thuy et al. [22] gave a method for calculating tolerance intervals for the location-scale family. Because many real data sets show some skewness relative to the normal distribution, it is worthwhile to study tolerance intervals for the skew normal distribution.
Let
and
be the
percentile of the distribution
, then
where
can be used to set confidence bound on
. If
is the 100
percentile of
, the one-sided tolerance interval is
In the following, we calculated the
and
for
with different sample sizes of
n and values of
p. The simulations are based on
and results are shown in
Table A4. The values of
and
are described in
Figure 5.
From
Table A4 and
Figure 5, we can see that the lower one-sided tolerance limit increases, the upper one-sided tolerance limit decreases, and the distance between them shrinks as the sample size increases; that is, the larger the sample size, the tighter and more precise the limits.
Table A5 gives the CP and AL of the tolerance interval for several sample sizes. The simulation is repeated 10,000 times to obtain the SD of the interval length. The values of AL in
Table A5 are displayed in
Figure 6.
From
Table A5, we can see that the coverage of the tolerance interval reaches
, and that its AL and SD decrease as the sample size increases.
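One-sided tolerance limits can also be obtained by simulating a pivot. The Python sketch below is only an illustration under stated assumptions: the skewness parameter `lam` is treated as known, the population quantile of the standardized distribution is approximated from a large simulated sample (argument `mc_size`) instead of a closed-form cdf, and the stand-in pivot is $(z_p-\bar{X})/S$, where $z_p$ is the quantile under location 0 and scale 1. The function names are hypothetical and do not come from the paper.

```python
import math
import random

def sn_sample(n, lam, rng, mu=0.0, sigma=1.0):
    """Draw n skew normal values via delta*|Z0| + sqrt(1 - delta^2)*Z1."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    root = math.sqrt(1.0 - delta * delta)
    return [mu + sigma * (delta * abs(rng.gauss(0.0, 1.0)) + root * rng.gauss(0.0, 1.0))
            for _ in range(n)]

def mean_sd(xs):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    n = len(xs)
    m = sum(xs) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

def tolerance_factors(n, lam, p=0.90, level=0.95, reps=10_000, mc_size=100_000, seed=0):
    """Factors (k_low, k_up) such that Xbar + k * S is a one-sided tolerance limit."""
    rng = random.Random(seed)
    zs = sorted(sn_sample(mc_size, lam, rng))       # Monte Carlo stand-in for the cdf
    z_hi = zs[min(mc_size - 1, int(p * mc_size))]   # approximate p-quantile
    z_lo = zs[int((1.0 - p) * mc_size)]             # approximate (1 - p)-quantile
    up, low = [], []
    for _ in range(reps):
        m, s = mean_sd(sn_sample(n, lam, rng))
        up.append((z_hi - m) / s)
        low.append((z_lo - m) / s)
    up.sort()
    low.sort()
    k_up = up[min(reps - 1, int(level * reps))]     # P(Xbar + k_up * S >= q_p) ~ level
    k_low = low[int((1.0 - level) * reps)]          # P(Xbar + k_low * S <= q_{1-p}) ~ level
    return k_low, k_up

def tolerance_limits(xs, lam, p=0.90, level=0.95, reps=10_000, mc_size=100_000, seed=0):
    """Lower and upper one-sided (p, level) tolerance limits for the sample xs."""
    k_low, k_up = tolerance_factors(len(xs), lam, p, level, reps, mc_size, seed)
    m, s = mean_sd(xs)
    return m + k_low * s, m + k_up * s

data = sn_sample(30, 1.0, random.Random(7), mu=10.0, sigma=2.0)
print("lower/upper (0.90, 0.95) tolerance limits:",
      tolerance_limits(data, lam=1.0, reps=5_000, mc_size=50_000))
```

The two limits are one-sided statements and should be read separately: with the stated confidence, at least a proportion p of the population lies below the upper limit, and at least a proportion p lies above the lower limit.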
6. Fiducial Distribution of Skew Normal Distribution
In practical applications, it is often necessary to know the probability that an observation exceeds a certain critical value. For example, when analyzing survival data, we need the probability that a patient’s survival time after illness is greater than a certain value
t, that is
. Similarly, in mechanical manufacturing, it is necessary to know the probability that a manufactured part falls within the tolerance range. Such probabilities can be studied using fiducial inference. Krishnamoorthy [23] has shown that the fiducial method is a useful tool for addressing the frequentist properties of many complex problems. The application of the fiducial method to specific distributions has also been studied by many scholars. O’Reilly [24] studied the fiducial distribution of the exponential distribution, and Hoang-Nguyen-Thuy et al. [6] obtained the fiducial distribution of the location-scale family. In this section, we study the fiducial distribution for the probabilities of
.
Given
and
, let
be the cumulative distribution function (cdf) of
. Consider testing
where
is a specified value in (0,1). The hypotheses above are equivalent to
and
For given level
and observed value (
) of (
), the
is rejected if,
where
is the CDF of
.
Therefore, the fiducial distribution of
is given by
and the
fiducial confidence interval for
is formed by the lower and upper
percentiles of
.
Furthermore, let
and replace the parameters with their fiducial quantities. We can obtain a fiducial quantity of
as
Therefore, the confidence interval for is , where is the percentile of .
To verify the validity of the constructed fiducial confidence interval, the following simulation is performed. Let
and
denote the left- and right-tail error probabilities, such that
. We choose
in the study.
Table 1 shows the CP and AL of the fiducial confidence intervals simulated by the Monte Carlo method. The SD of the interval length was obtained by repeating the simulation experiment 10,000 times. Without loss of generality, the following simulations are based on
.
As can be seen from the simulation in
Table 1, the coverage probability increases, and the AL and SD decrease, as the sample size increases.
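The fiducial construction for a probability can be sketched with the usual location-scale recipe: simulate standardized samples, solve for the location and scale they imply given the observed statistics, and push each draw through the cdf. The Python sketch below is only an illustration under stated assumptions: it uses the sample mean and standard deviation as stand-in equivariant statistics (the paper's fiducial quantity is not reproduced in this text), treats the skewness parameter `lam` as known, and evaluates probabilities by Simpson's rule. The demo data are simulated, not the data of Table 2.

```python
import math
import random

def sn_sample(n, lam, rng, mu=0.0, sigma=1.0):
    """Draw n skew normal values via delta*|Z0| + sqrt(1 - delta^2)*Z1."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    root = math.sqrt(1.0 - delta * delta)
    return [mu + sigma * (delta * abs(rng.gauss(0.0, 1.0)) + root * rng.gauss(0.0, 1.0))
            for _ in range(n)]

def mean_sd(xs):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    n = len(xs)
    m = sum(xs) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

def sn_prob(a, b, lam, steps=200):
    """P(a < Z < b) for a standardized skew normal Z, by composite Simpson's rule."""
    a, b = max(a, -12.0), min(b, 12.0)  # mass outside [-12, 12] is negligible
    if b <= a:
        return 0.0
    def pdf(z):
        phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        return 2.0 * phi * 0.5 * (1.0 + math.erf(lam * z / math.sqrt(2.0)))
    h = (b - a) / steps                 # steps must be even for Simpson's rule
    total = pdf(a) + pdf(b)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * pdf(a + i * h)
    return total * h / 3.0

def fiducial_prob_interval(xs, a, b, lam, level=0.95, draws=2_000, seed=0):
    """Fiducial interval for P(a < X < b) from the observed sample xs."""
    rng = random.Random(seed)
    xbar0, s0 = mean_sd(xs)
    vals = []
    for _ in range(draws):
        zbar, zs = mean_sd(sn_sample(len(xs), lam, rng))  # standardized sample
        g_sigma = s0 / zs                 # from s = sigma * s_z
        g_mu = xbar0 - zbar * g_sigma     # from xbar = mu + sigma * zbar
        vals.append(sn_prob((a - g_mu) / g_sigma, (b - g_mu) / g_sigma, lam))
    vals.sort()
    alpha = 1.0 - level
    return vals[int(alpha / 2.0 * draws)], vals[min(draws - 1, int((1.0 - alpha / 2.0) * draws))]

data = sn_sample(40, 1.0, random.Random(3), mu=10.0, sigma=2.0)
print("95% fiducial interval for P(9 < X < 13):", fiducial_prob_interval(data, 9.0, 13.0, lam=1.0))
```

The interval endpoints are simply the lower and upper percentiles of the simulated fiducial draws, which mirrors how the fiducial confidence interval is described above.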
7. Application
Corn seed quality is an important factor in determining corn yield, and seeds are easily damaged mechanically during threshing. Mancera-Rico et al. [25] conducted an experiment to measure the mechanical damage suffered by maize seeds, in which seeds with different moisture levels and endosperm types were compressed until rupture occurred. We choose one of the variables, strain, defined as in Mancera-Rico et al. [25]. The data set presented in Table 2 contains 90 observations of strain (mm) measured on maize seeds containing flour endosperm and
water.
Using the sn package in R, the maximum likelihood estimates of the parameters of the skew normal distribution are
,
, and
, respectively. Next, we use these data to illustrate the proposed methods. Based on the equations in Section 2, the corresponding parameter estimates for the skew normal distribution are
,
, and
, respectively.
Furthermore, the Kolmogorov-Smirnov (K-S) and Anderson-Darling (A-D) goodness-of-fit statistics, together with their
p-values (
pval) are reported in
Table 3. The K-S statistic (based on the MLEs of the parameters
,
, and
) is 0.0531 and the corresponding
p-value is 0.9613. The K-S statistic (based on our estimates of the parameters
,
, and
) is 0.0424 and the corresponding
p-value is 0.9969. Therefore, the skew normal distribution fits the data set reasonably well. The fitted density curves from the two methods are displayed in
Figure 7.
Furthermore, we use the strain data in
Table 2 to construct the proposed statistical intervals. The 95% confidence interval for the mean strain (MCI) of corn, the
prediction interval for the mean strain (MPI) in a future sample of size
, and the one-sided tolerance limits with
(TL) are given in
Table 4.
From
Table 4, we find that the
confidence interval for the mean strain of corn is (0.2276, 0.2621). In other words, we are 95% confident that the average strain of corn seed after extrusion lies between 0.2276 mm and 0.2621 mm. We also note that the
prediction interval for the mean of 20 future observations is (0.2320, 0.2665). This means that there is a
chance that the average strain of these future corn seeds will be between 0.2320 mm and 0.2665 mm. In addition, we study the one-sided tolerance limits for the strain of the corn seed and find that the upper and lower tolerance limits are 0.4989 mm and 0.0688 mm, respectively. This means that at least
of the corn seeds have a strain of at most 0.4989 mm, with confidence
.
Finally, we study the probability that the strain of a corn seed falls in a given range, such as
. Based on Equation (
4), we know
where
and
are estimates of
. Following the procedure described in
Section 6, the lower and upper 2.5th percentiles of
are calculated as 0.5833 and 0.7263. Thus, the 95% confidence interval for
is (0.5833, 0.7263), which means that, with 95% confidence, between 58.33% and 72.63% of corn seeds have a strain between 0.1 mm and 0.45 mm.
8. Concluding Remarks
In this paper, we propose the confidence interval for the mean, the prediction interval for the future sample mean, and the one-sided tolerance limits for the skew normal distribution based on the pivotal quantity approach. We show that the estimator of the skewness parameter can be obtained without depending on the location and scale parameters, and we obtain a parameter estimation method that is simpler than the traditional MLE method. Monte Carlo simulation experiments are carried out for all the proposed intervals. The simulations show that the CPs of the confidence intervals reach the corresponding confidence levels. Moreover, the average lengths and standard deviations of the intervals decrease as the sample size increases, and the lengths of the prediction intervals decrease as m and n increase. In addition, we study the fiducial distribution of the skew normal distribution, and the pivotal approach provides a good way to study the mean of one sample. Finally, we apply the proposed methods to real data, which shows that they can provide effective and useful information. They can be used as an extension of traditional methods to better solve specific problems in practice.
In fact, the proposed estimation method has some limitations, especially for the
, which are also discussed by Azzalini [
11]. In the future, estimation methods for the case where
is close to 0 can be studied. Meanwhile, we will continue to study these intervals with the fiducial approach and compare the skew normal distribution with other skewed distributions, such as the lognormal, skew-t, and skew-Cauchy distributions. Furthermore, the three intervals will be constructed by the pivotal quantity approach for the skew slash distribution, which was proposed by Tian et al. [26], to enrich research on asymmetric data.