1. Introduction
Climate change significantly impacts weather patterns worldwide, including in Thailand, where changes in rainfall dispersion have become particularly noticeable [1]. Thailand has a tropical climate characterized by a rainy season from May to October and a dry season from November to April [2]. In recent years, however, irregular and extreme weather has disrupted traditional rainfall patterns. Several researchers have investigated rainfall dispersion in Thailand, including Kumphon et al. [3], Szyniszewska and Waylen [4], and Thodsan et al. [5]. Additionally, studies on statistical inference for rainfall distribution in Thailand have been reported by Maneerat et al. [6], Khooriphan et al. [7], Yosboonruang et al. [8,9], and Thangjai et al. [10]. Previous research on statistical inference has primarily measured rainfall dispersion using the variance and the coefficient of variation. However, when rainfall data are highly skewed, the coefficient of quartile variation (CQV) is a more appropriate measure of dispersion.
The CQV, also known as the quartile coefficient of dispersion, is a statistical measure of the relative dispersion or variability within a dataset. It quantifies the spread of the data relative to its central tendency, represented by the median. A higher coefficient indicates greater variability, meaning that the values are more spread out from the median. Conversely, a lower coefficient indicates less variability, meaning that the values are more tightly clustered around the median. When the data follow a non-normal distribution, the CQV performs better than the coefficient of variation in measuring relative dispersion. Additionally, when outliers are present, the CQV is a more appropriate measure for quantifying data dispersion [11]. The CQV has been applied in various subject areas. For instance, Hussein and Morgan [12] used the CQV to measure intravertebral density heterogeneity. Marcoulaki et al. [13] employed the CQV to assess dispersion in computer simulation data for designing central pipeline systems. Chatterjee et al. [14] compared land surface temperature and radiant temperature images at notable coal fire locations using the CQV. Antonetti et al. [15] incorporated water temperature simulations into a fish habitat model and measured thermal heterogeneity using the CQV. Furthermore, researchers have examined statistical inference for the CQV. Bonett [16] constructed confidence intervals for the CQV that are applicable to both normal and non-normal distributions. Ambati et al. [17] introduced ratio- and regression-type estimators for estimating the CQV in finite populations. Javed et al. [18] proposed a class of ratio estimators for estimating population variance, utilizing the CQV of an auxiliary variable. Altunkaynak and Gamgam [19] recommended the bootstrap method to establish confidence intervals for the CQV in non-normal distributions. Singh et al. [20] and Ahmed and Shabbir [21] identified an error in the mean squared error of Ambati et al. [17] and rectified it using auxiliary information. Ahmed and Shabbir [21] also presented the Rao regression-type estimator for estimating the CQV with an auxiliary variable. In 2022, Eppen et al. [22] proposed naïve, Rao, and regression estimators for estimating the CQV with an auxiliary variable. Singh and Usman [23] extended the methods introduced by Ambati et al. [17] to estimate the CQV for missing data. Furthermore, Yosboonruang et al. [24] developed a confidence interval for the CQV of a zero-inflated lognormal distribution.
The lognormal distribution plays a significant role in climate change studies due to its ability to model skewed data, analyze extreme events, quantify uncertainty, and facilitate econometric analyses. For rainfall data, several researchers have reported that the data follow a lognormal distribution with zero values, also known as a delta-lognormal distribution [8,9,25,26,27,28]. The delta-lognormal distribution consists of positive values following a lognormal distribution and true zero values whose count follows a binomial distribution. The lognormal distribution is asymmetric. Nevertheless, the logarithm of a lognormal random variable follows a symmetric distribution, namely the normal distribution. The delta-lognormal distribution has attracted significant interest from researchers studying statistical inference. For example, Li et al. [29] presented generalized and fiducial inference approaches for estimating the mean of a lognormal distribution with excess zeros, with the fiducial approach demonstrating superior performance. Wu and Hsieh [30] constructed a generalized confidence interval (GCI) for the mean of the delta-lognormal distribution using an asymptotic generalized pivotal quantity (GPQ), and their method showed excellent performance. Hasan and Krishnamoorthy [31] introduced fiducial confidence intervals and the method of variance estimate recovery (MOVER) to estimate the mean of a delta-lognormal distribution, and their proposed methods were recognized as effective. In 2022, Zhang et al. [32] proposed the fiducial generalized pivotal quantity (FGPQ) and employed MOVER with FGPQ to construct simultaneous confidence intervals for ratios of means in zero-inflated lognormal distributions, which were also recognized as effective. Furthermore, Yosboonruang et al. [9] introduced various methods to estimate the ratio of coefficients of variation of lognormal distributions with excess zeros, including the fiducial generalized confidence interval (FGCI), Bayesian methods, and the Wald and Fieller log-likelihood methods, with the Bayesian method proving to be the most effective. In 2023, Thangjai et al. [10] employed the FGCI, Bayesian, and bootstrap methods to establish confidence intervals for the ratio of percentiles of delta-lognormal distributions, with the Bayesian method demonstrating superior performance. Thus, various studies have addressed parameter estimation for lognormal distributions with excess zeros. In this article, we focus on estimating the dispersion of data that follow a delta-lognormal distribution. One effective measure of this dispersion is the CQV, and we build upon the methodology proposed by Yosboonruang and Niwitpong [24] to examine and compare the dispersion between two datasets. Specifically, we aim to construct the highest posterior density (HPD) and confidence intervals for the difference between the CQVs of two delta-lognormal distributions.
The following section presents Bayesian approaches utilizing multiple priors, GCI, and FGCI to construct HPD and confidence intervals. Section 3 provides the simulation results and an empirical study. The final section encompasses the discussion and conclusions of the study.
2. Materials and Methods
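Let $X_{ij}$, $i = 1, 2$, $j = 1, 2, \ldots, n_i$, be random variables from $n_i$ observations of delta-lognormal distributions denoted by $\Delta(\mu_i, \sigma_i^2, \delta_i)$, where $\mu_i$, $\sigma_i^2$, and $\delta_i$ represent the mean, variance, and probability of zero values, respectively. For $X_{ij} > 0$, $X_{ij}$ follows a lognormal distribution, so that $\mu_i$ and $\sigma_i^2$ are the mean and variance of $\ln X_{ij}$, while the number of zero values follows a binomial distribution. Let $n_{i(0)}$ and $n_{i(1)}$ be the numbers of zero and positive values, respectively, such that $n_i = n_{i(0)} + n_{i(1)}$. Aitchison [33] derived the mean and variance of $X_{ij}$ as $E(X_{ij}) = \delta_i' \exp(\mu_i + \sigma_i^2/2)$ and $Var(X_{ij}) = \delta_i' \exp(2\mu_i + \sigma_i^2)\,[\exp(\sigma_i^2) - \delta_i']$, where $\delta_i' = 1 - \delta_i$ is the probability of positive values.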
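As an illustration of the model described above, the following Python sketch draws a delta-lognormal sample and evaluates Aitchison's moment formulas. The function names (`rdelta_lognormal`, `dln_mean`, `dln_variance`) and the parameterization by the zero probability `delta` are illustrative assumptions and are not taken from the original study.

```python
import random
from math import exp

def rdelta_lognormal(n, mu, sigma, delta, rng=random):
    """Draw n values: 0 with probability delta, otherwise lognormal(mu, sigma)."""
    return [0.0 if rng.random() < delta else rng.lognormvariate(mu, sigma)
            for _ in range(n)]

def dln_mean(mu, sigma, delta):
    """Aitchison's mean: (1 - delta) * exp(mu + sigma^2 / 2)."""
    return (1.0 - delta) * exp(mu + sigma ** 2 / 2.0)

def dln_variance(mu, sigma, delta):
    """Aitchison's variance: p * exp(2 mu + sigma^2) * (exp(sigma^2) - p), p = 1 - delta."""
    p = 1.0 - delta
    return p * exp(2.0 * mu + sigma ** 2) * (exp(sigma ** 2) - p)

if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the illustration is reproducible
    x = rdelta_lognormal(10_000, mu=0.0, sigma=0.5, delta=0.1, rng=rng)
    print("number of zeros:", sum(v == 0.0 for v in x))
    print("sample mean:", sum(x) / len(x))
    print("theoretical mean:", dln_mean(0.0, 0.5, 0.1))
```

With a large sample, the sample mean should be close to the theoretical value, which gives a quick sanity check on the parameterization.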
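The CQV is a descriptive statistic used to compare the dispersion of data sets that are measured in different units or that have different central values. The CQV is defined by the first and third quartiles as follows:
$$CQV_i = \frac{Q_{3i} - Q_{1i}}{Q_{3i} + Q_{1i}}, \tag{1}$$
where $Q_{1i}$ and $Q_{3i}$ denote the first and third quartiles of $X_{ij}$, respectively. The quartiles are determined according to Hasan and Krishnamoorthy [31] as
$$Q_{1i} = \exp\left\{\mu_i + \sigma_i \Phi^{-1}\left(\frac{0.25 - \delta_i}{1 - \delta_i}\right)\right\}, \qquad Q_{3i} = \exp\left\{\mu_i + \sigma_i \Phi^{-1}\left(\frac{0.75 - \delta_i}{1 - \delta_i}\right)\right\}, \tag{2}$$
where $\Phi$ is the cumulative standard normal distribution and $\Phi^{-1}$ is its inverse. These expressions require $\delta_i < 0.25$, so that the first quartile is positive. Since this study focuses on the difference between CQVs, it is defined as
$$\theta = CQV_1 - CQV_2. \tag{3}$$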
This study introduces Bayesian and GCI methods to establish HPD and confidence intervals for the difference between CQVs.
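To make these definitions concrete, the following is a minimal Python sketch of the quartiles, the CQV, and the difference between two CQVs. It assumes the zero-adjusted quantile formula $Q_p = \exp\{\mu + \sigma \Phi^{-1}((p - \delta)/(1 - \delta))\}$ for $p > \delta$, in the spirit of Hasan and Krishnamoorthy [31]; the function names are hypothetical.

```python
from math import exp
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the inverse CDF

def dln_quantile(p, mu, sigma, delta):
    """p-th quantile of a delta-lognormal distribution with P(X = 0) = delta."""
    if p <= delta:
        return 0.0  # the point mass at zero covers this part of the distribution
    return exp(mu + sigma * STD_NORMAL.inv_cdf((p - delta) / (1.0 - delta)))

def cqv(mu, sigma, delta):
    """Coefficient of quartile variation, (Q3 - Q1) / (Q3 + Q1)."""
    q1 = dln_quantile(0.25, mu, sigma, delta)
    q3 = dln_quantile(0.75, mu, sigma, delta)
    if q3 == 0.0:
        raise ValueError("CQV is undefined when delta >= 0.75")
    return (q3 - q1) / (q3 + q1)

def cqv_difference(params1, params2):
    """Difference CQV_1 - CQV_2 for two (mu, sigma, delta) triples."""
    return cqv(*params1) - cqv(*params2)

if __name__ == "__main__":
    print(cqv(0.0, 1.0, 0.1))
    print(cqv_difference((0.0, 1.0, 0.1), (0.0, 0.5, 0.2)))
```

Under the assumed quantile formula, $\mu$ cancels in the ratio, so the CQV depends only on $\sigma$ and $\delta$: for $\delta < 0.25$ it equals $\tanh\{\sigma(z_3 - z_1)/2\}$, where $z_1$ and $z_3$ are the two normal quantiles appearing in the quartile expressions.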
2.1. Bayesian Method
Research on statistical inference and its applications increasingly adopts the Bayesian approach, which combines prior information with the assumed population distribution to estimate the parameter of interest [34]. In Bayesian inference, the parameters of interest are treated as random variables and are described directly by probability distributions [35].
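Regarding the unknown parameters of the delta-lognormal distributions, namely $\mu_i$, $\sigma_i^2$, and $\delta_i$, where $i = 1, 2$, the joint likelihood function can be defined as
$$L(\mu_i, \sigma_i^2, \delta_i \mid x_{ij}) \propto \prod_{i=1}^{2} \delta_i^{n_{i(0)}} (1 - \delta_i)^{n_{i(1)}} (\sigma_i^2)^{-n_{i(1)}/2} \exp\left\{-\frac{1}{2\sigma_i^2} \sum_{j:\, x_{ij} > 0} (\ln x_{ij} - \mu_i)^2\right\}. \tag{4}$$
Using Equation (4), the Fisher information matrix of parameters $(\delta_i, \mu_i, \sigma_i^2)$ can be derived by taking the negative expectation of the second-order derivatives of the log-likelihood function:
$$I(\delta_i, \mu_i, \sigma_i^2) = \mathrm{diag}\left[\frac{n_i}{\delta_i (1 - \delta_i)},\; \frac{n_i (1 - \delta_i)}{\sigma_i^2},\; \frac{n_i (1 - \delta_i)}{2\sigma_i^4}\right]. \tag{5}$$
To estimate the difference between CQVs, HPD intervals are constructed based on the posterior distribution, which is obtained from Bayes’ theorem as $p(\mu_i, \sigma_i^2, \delta_i \mid x_{ij}) \propto L(\mu_i, \sigma_i^2, \delta_i \mid x_{ij})\, p(\mu_i, \sigma_i^2, \delta_i)$.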
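Since the parameters of interest in this study are $\delta_i$, $\mu_i$, and $\sigma_i^2$, the posterior of these parameters is computed by combining the likelihood function in Equation (4) with the prior density function for a delta-lognormal distribution, $p(\mu_i, \sigma_i^2, \delta_i)$, and integrating out the remaining parameters. Therefore, the posterior density of $\delta_i$, $\mu_i$ and $\sigma_i^2$ can be derived as follows:
$$p(\delta_i \mid x_{ij}) \propto \iint L(\mu_i, \sigma_i^2, \delta_i \mid x_{ij})\, p(\mu_i, \sigma_i^2, \delta_i)\, d\mu_i\, d\sigma_i^2, \tag{6}$$
$$p(\mu_i \mid x_{ij}) \propto \iint L(\mu_i, \sigma_i^2, \delta_i \mid x_{ij})\, p(\mu_i, \sigma_i^2, \delta_i)\, d\sigma_i^2\, d\delta_i, \tag{7}$$
and
$$p(\sigma_i^2 \mid x_{ij}) \propto \iint L(\mu_i, \sigma_i^2, \delta_i \mid x_{ij})\, p(\mu_i, \sigma_i^2, \delta_i)\, d\mu_i\, d\delta_i, \tag{8}$$
respectively.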
Furthermore, Bayesian deep learning can also serve as an alternative approach for generating posterior distributions. By incorporating Bayesian inference techniques, this approach provides several advantages, including robust uncertainty estimation and a principled approach to mitigating overfitting. Estimating posterior distributions allows a more comprehensive understanding of the uncertainty associated with the model’s parameters, given the observed data. For a more in-depth exploration of the Bayesian deep learning method, we refer the reader to Zhuang et al. [36].
This article selected three prior distributions for the Bayesian method, namely the normal-gamma prior, Jeffreys’ prior, and the uniform prior, because they yield closed-form posterior distributions. With other priors, the posterior distributions may not have closed-form solutions or follow standard distributions. In such cases, alternative inference methods, such as Bayesian variable sampling or Markov chain Monte Carlo (MCMC) sampling, are necessary [37].
2.1.1. The Normal-Gamma Prior
Choosing hyperparameters in the normal-gamma prior involves determining appropriate values for the mean and precision parameters of the normal distribution and the shape and rate parameters of the gamma distribution. These hyperparameters play a crucial role in shaping the prior distribution and subsequently influence the posterior distribution in Bayesian inference.
Maneerat et al. [
6] utilized the conjugate families proposed by DeGroot [
38] for a normal random sample to derive the posterior distribution of parameters in the normal-gamma prior. The posterior distributions of
,
, and
are as follows:
, where
;
; and
, respectively.
2.1.2. Jeffreys’ Prior
According to the delta-lognormal distribution, which is a mixture of the lognormal and binomial distributions, the parameters of interest are
,
, and
. Following the concept of Jeffreys [
39], the prior distributions for these parameters can be obtained by taking the square root of the determinant of the Fisher information matrix (Equation (
5)), resulting in
. Specifically, the Jeffreys prior for
is
. For the lognormal distribution, the prior distributions for
and
are
and
[
40]. Consequently, the posterior distributions of these parameters can be computed using Equations (
6) - (
8) as follows:
,
, and
.
2.1.3. The Uniform Prior
The uniform prior is a constant function that assigns equal prior probability to all possible parameter values [41,42]. Therefore, the uniform priors for the parameters of the binomial and lognormal distributions are proportional to 1. By combining these priors with the likelihood of the delta-lognormal distribution, the posterior distributions of
,
, and
can be determined as follows:
,
, and
.
Using the posteriors of $\mu_i$, $\sigma_i^2$, and $\delta_i$ obtained under each prior, we substitute draws from these posterior distributions into Equation (2) to obtain the quartiles, and then use Equations (1) and (3) to calculate the difference between CQVs. Subsequently, we construct the HPD intervals for each prior using the HDInterval package in the R statistical program, following Algorithm 1 below.
Algorithm 1. Steps to construct HPD intervals for the Bayesian method.
Step 1. Generate $x_{ij}$, where $i = 1, 2$ and $j = 1, 2, \ldots, n_i$, from the delta-lognormal distributions.
Step 2. Compute $\hat{\mu}_i$, $\hat{\sigma}_i^2$, and $\hat{\delta}_i$.
Step 3. Generate the posterior densities of $\mu_i$, $\sigma_i^2$, and $\delta_i$ using each prior.
Step 4. Compute the quartiles $Q_{1i}$ and $Q_{3i}$ using Equation (2).
Step 5. Compute $CQV_i$ using Equation (1).
Step 6. Compute the difference between CQVs, $\theta = CQV_1 - CQV_2$, using Equation (3).
Step 7. Repeat Steps 3–6 a total of 2000 times.
Step 8. Construct HPD intervals for $\theta$ using each prior.
Step 9. Repeat Steps 1–8 a total of 10,000 times.
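Steps 2–8 of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the study. It assumes Jeffreys-type posteriors: $\delta \mid x \sim \mathrm{Beta}(n_{(0)} + 1/2, n_{(1)} + 1/2)$, $\sigma^2 \mid x \sim (n_{(1)} - 1)s^2/\chi^2_{n_{(1)}-1}$, and $\mu \mid \sigma^2, x \sim N(\hat{\mu}, \sigma^2/n_{(1)})$. The hand-written `hpd_interval` (the shortest interval containing the requested share of draws) stands in for the HDInterval package, and all function names are hypothetical.

```python
import random
from math import ceil, exp, log, sqrt
from statistics import NormalDist, mean, variance

STD_NORMAL = NormalDist()

def cqv(mu, sigma2, delta):
    """CQV from zero-adjusted quartiles; a quartile is 0 when p <= delta."""
    def quartile(p):
        if p <= delta:
            return 0.0
        return exp(mu + sqrt(sigma2) * STD_NORMAL.inv_cdf((p - delta) / (1.0 - delta)))
    q1, q3 = quartile(0.25), quartile(0.75)
    if q3 == 0.0:
        raise ValueError("CQV is undefined when delta >= 0.75")
    return (q3 - q1) / (q3 + q1)

def summarize(sample):
    """Return (n0, n1, mean of logs, sample variance of logs) for one sample."""
    logs = [log(x) for x in sample if x > 0]
    return len(sample) - len(logs), len(logs), mean(logs), variance(logs)

def posterior_draw(stats, rng):
    """One draw of (mu, sigma^2, delta) under the ASSUMED Jeffreys-type posteriors."""
    n0, n1, mu_hat, s2 = stats
    delta = rng.betavariate(n0 + 0.5, n1 + 0.5)
    # sigma^2 ~ (n1 - 1) s^2 / chi-square(n1 - 1); chi-square(k) = 2 * Gamma(k / 2, 1)
    sigma2 = (n1 - 1) * s2 / (2.0 * rng.gammavariate((n1 - 1) / 2.0, 1.0))
    mu = rng.gauss(mu_hat, sqrt(sigma2 / n1))
    return mu, sigma2, delta

def hpd_interval(draws, cred=0.95):
    """Shortest interval containing a fraction `cred` of the draws."""
    xs = sorted(draws)
    n = len(xs)
    k = ceil(cred * n)  # number of draws the interval must contain
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

def hpd_cqv_difference(sample1, sample2, draws=2000, cred=0.95, seed=None):
    """HPD interval for CQV_1 - CQV_2 from `draws` posterior draws (Steps 2-8)."""
    rng = random.Random(seed)
    s1, s2 = summarize(sample1), summarize(sample2)
    diffs = [cqv(*posterior_draw(s1, rng)) - cqv(*posterior_draw(s2, rng))
             for _ in range(draws)]
    return hpd_interval(diffs, cred)
```

Steps 1 and 9 (repeated data generation for the simulation study) are omitted; calling `hpd_cqv_difference` inside an outer loop over simulated samples would reproduce that structure.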
2.2. Generalized Confidence Interval
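The concept of the GCI was introduced by Weerahandi [43]. It is based on the GPQs of the parameters of interest. Furthermore, the construction of confidence intervals for model parameters using the generalized inference method is discussed in [44]. In this context, the random variables $X_{ij}$, where $i = 1, 2$ and $j = 1, 2, \ldots, n_i$, follow delta-lognormal distributions. Referring to Equations (1)–(3), the parameters of interest are $\mu_i$, $\sigma_i^2$, and $\delta_i$. Let $\hat{\mu}_i = n_{i(1)}^{-1} \sum_{j:\, x_{ij} > 0} \ln x_{ij}$, $\hat{\sigma}_i^2 = (n_{i(1)} - 1)^{-1} \sum_{j:\, x_{ij} > 0} (\ln x_{ij} - \hat{\mu}_i)^2$, and $\hat{\delta}_i = n_{i(0)}/n_i$, where $n_{i(0)}$ and $n_{i(1)}$ are the numbers of zero and positive values among the observed values $x_{ij}$ of the random variables $X_{ij}$. The GPQs for these parameters possess two important properties: (1) the distribution of GPQs is free from all unknown parameters, and (2) the observed values of GPQs do not depend on the nuisance parameter. Wu and Hsieh [30] applied the variance-stabilizing transformation of a binomial distribution, following DasGupta [45]. The GPQ for $\delta_i$ is given by
$$R_{\delta_i} = \sin^2\left[\arcsin\sqrt{\hat{\delta}_i} - \frac{T_i}{2\sqrt{n_i}}\right], \tag{9}$$
where $T_i = 2\sqrt{n_i}\left(\arcsin\sqrt{\hat{\delta}_i} - \arcsin\sqrt{\delta_i}\right)$ is asymptotically distributed as $N(0, 1)$. Moreover, they used the idea of Krishnamoorthy and Mathew [46] to compute the GPQs for $\mu_i$ and $\sigma_i^2$ as follows:
$$R_{\mu_i} = \hat{\mu}_i - \frac{Z_i}{\sqrt{U_i}} \sqrt{\frac{(n_{i(1)} - 1)\hat{\sigma}_i^2}{n_{i(1)}}}, \tag{10}$$
where $Z_i \sim N(0, 1)$ and $U_i \sim \chi^2_{n_{i(1)} - 1}$ are independent, and
$$R_{\sigma_i^2} = \frac{(n_{i(1)} - 1)\hat{\sigma}_i^2}{U_i}. \tag{11}$$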
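The pivotal quantities $R_{\mu_i}$, $R_{\sigma_i^2}$, and $R_{\delta_i}$ are consistent with the properties of GPQs. Therefore, we can express the pivotal quantities for the quartiles, $R_{Q_{1i}}$ and $R_{Q_{3i}}$, as follows:
$$R_{Q_{ki}} = \exp\left\{R_{\mu_i} + \sqrt{R_{\sigma_i^2}}\; \Phi^{-1}\left(\frac{p_k - R_{\delta_i}}{1 - R_{\delta_i}}\right)\right\}, \quad k = 1, 3, \tag{12}$$
where $p_1 = 0.25$ and $p_3 = 0.75$. By substituting the pivotal quantity from Equation (12) into Equation (1), we obtain
$$R_{CQV_i} = \frac{R_{Q_{3i}} - R_{Q_{1i}}}{R_{Q_{3i}} + R_{Q_{1i}}}. \tag{13}$$
Hence, the pivotal quantity for the difference between CQVs is given by
$$R_{\theta} = R_{CQV_1} - R_{CQV_2}. \tag{14}$$
Consequently, the $100(1 - \alpha)\%$ confidence interval for the difference between CQVs, $\theta = CQV_1 - CQV_2$, can be expressed as
$$CI_{GCI} = \left[R_{\theta}(\alpha/2),\; R_{\theta}(1 - \alpha/2)\right], \tag{15}$$
where $R_{\theta}(\alpha/2)$ and $R_{\theta}(1 - \alpha/2)$ represent the $100(\alpha/2)$-th and $100(1 - \alpha/2)$-th percentiles of $R_{\theta}$, respectively. The steps for constructing confidence intervals using the GCI method are presented in Algorithm 2.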
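Algorithm 2. Steps to construct the confidence interval for the GCI method.
Step 1. Generate $x_{ij}$, where $i = 1, 2$ and $j = 1, 2, \ldots, n_i$, from the delta-lognormal distributions.
Step 2. Compute the estimates $\hat{\mu}_i$, $\hat{\sigma}_i^2$, and $\hat{\delta}_i$.
Step 3. Generate the standard normal and chi-square random variables required by the GPQs.
Step 4. Compute the GPQs of $\mu_i$, $\sigma_i^2$, and $\delta_i$ and, from them, the pivotal quantities for the quartiles, the CQVs, and the difference between CQVs.
Step 5. Repeat Steps 3–4 a total of 2000 times.
Step 6. Construct the confidence interval for the difference between CQVs.
Step 7. Repeat Steps 1–6 a total of 10,000 times.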
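Steps 2–6 of Algorithm 2 can be sketched in Python as follows. The sketch assumes an arcsine-based GPQ for $\delta$ (variance-stabilizing transformation with a standard normal variate) and chi-square and normal based GPQs for $\sigma^2$ and $\mu$; these specific forms, the nearest-rank `percentile` helper, and all function names are assumptions made for illustration, not a reproduction of the original implementation.

```python
import random
from math import asin, ceil, exp, log, sin, sqrt
from statistics import NormalDist, mean, variance

STD_NORMAL = NormalDist()

def cqv(mu, sigma2, delta):
    """CQV from zero-adjusted quartiles; a quartile is 0 when p <= delta."""
    def quartile(p):
        if p <= delta:
            return 0.0
        return exp(mu + sqrt(sigma2) * STD_NORMAL.inv_cdf((p - delta) / (1.0 - delta)))
    q1, q3 = quartile(0.25), quartile(0.75)
    if q3 == 0.0:
        raise ValueError("CQV is undefined when delta >= 0.75")
    return (q3 - q1) / (q3 + q1)

def summarize(sample):
    """Return (n, n1, mean of logs, sample variance of logs) for one sample."""
    logs = [log(x) for x in sample if x > 0]
    return len(sample), len(logs), mean(logs), variance(logs)

def gpq_draw(stats, rng):
    """One realization of the ASSUMED GPQs (R_mu, R_sigma2, R_delta)."""
    n, n1, mu_hat, s2 = stats
    delta_hat = (n - n1) / n
    t, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    u = 2.0 * rng.gammavariate((n1 - 1) / 2.0, 1.0)  # chi-square(n1 - 1)
    r_delta = sin(asin(sqrt(delta_hat)) - t / (2.0 * sqrt(n))) ** 2
    r_sigma2 = (n1 - 1) * s2 / u
    r_mu = mu_hat - z * sqrt(r_sigma2 / n1)
    return r_mu, r_sigma2, r_delta

def percentile(sorted_xs, q):
    """Nearest-rank percentile of an already sorted list, for 0 < q < 1."""
    idx = min(len(sorted_xs) - 1, max(0, ceil(q * len(sorted_xs)) - 1))
    return sorted_xs[idx]

def gci_cqv_difference(sample1, sample2, draws=2000, alpha=0.05, seed=None):
    """Equal-tailed GCI for CQV_1 - CQV_2 from `draws` pivotal-quantity values."""
    rng = random.Random(seed)
    s1, s2 = summarize(sample1), summarize(sample2)
    r_theta = sorted(cqv(*gpq_draw(s1, rng)) - cqv(*gpq_draw(s2, rng))
                     for _ in range(draws))
    return percentile(r_theta, alpha / 2.0), percentile(r_theta, 1.0 - alpha / 2.0)
```

Unlike the HPD interval of the Bayesian method, this interval is equal-tailed: it simply reads off the lower and upper percentiles of the simulated pivotal quantity.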
2.3. Fiducial Generalized Confidence Interval
We extended the FGCI method proposed by Yosboonruang et al. [24] to construct the confidence interval for the difference between CQVs of delta-lognormal distributions. Consequently, the parameters of interest, based on Equation (2), are $\mu_i$, $\sigma_i^2$, and $\delta_i$. The fiducial quantities for these parameters can be expressed as
, while
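the fiducial quantities $G_{\mu_i}$ and $G_{\sigma_i^2}$ for $\mu_i$ and $\sigma_i^2$ are represented by Equations (10) and (11), respectively. Furthermore, by utilizing Equation (2), we can determine the fiducial quantity for the quartiles, $G_{Q_{1i}}$ and $G_{Q_{3i}}$, as follows:
$$G_{Q_{ki}} = \exp\left\{G_{\mu_i} + \sqrt{G_{\sigma_i^2}}\; \Phi^{-1}\left(\frac{p_k - G_{\delta_i}}{1 - G_{\delta_i}}\right)\right\}, \quad k = 1, 3, \tag{16}$$
where $G_{\delta_i}$ denotes the fiducial quantity for $\delta_i$, $p_1 = 0.25$, and $p_3 = 0.75$. Substituting $G_{Q_{ki}}$ into Equation (1), we can obtain the fiducial quantity for $CQV_i$:
$$G_{CQV_i} = \frac{G_{Q_{3i}} - G_{Q_{1i}}}{G_{Q_{3i}} + G_{Q_{1i}}}. \tag{17}$$
Accordingly, the fiducial quantity for the difference between CQVs, denoted as $G_{\theta}$, can be represented as
$$G_{\theta} = G_{CQV_1} - G_{CQV_2}. \tag{18}$$
Consequently, the $100(1 - \alpha)\%$ confidence interval for the difference between CQVs, $\theta = CQV_1 - CQV_2$, can be expressed as
$$CI_{FGCI} = \left[G_{\theta}(\alpha/2),\; G_{\theta}(1 - \alpha/2)\right], \tag{19}$$
where $G_{\theta}(\alpha/2)$ and $G_{\theta}(1 - \alpha/2)$ represent the $100(\alpha/2)$-th and $100(1 - \alpha/2)$-th percentiles of $G_{\theta}$, respectively. Algorithm 3 outlines the procedure for constructing confidence intervals using the FGCI method.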
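Algorithm 3. Steps to construct the confidence interval for the FGCI method.
Step 1. Generate $x_{ij}$, where $i = 1, 2$ and $j = 1, 2, \ldots, n_i$, from the delta-lognormal distributions.
Step 2. Compute the estimates $\hat{\mu}_i$, $\hat{\sigma}_i^2$, and $\hat{\delta}_i$.
Step 3. Generate the random variables required by the fiducial quantities.
Step 4. Compute the fiducial quantities of $\mu_i$, $\sigma_i^2$, and $\delta_i$ and, from them, the fiducial quantities for the quartiles, the CQVs, and the difference between CQVs.
Step 5. Repeat Steps 3–4 a total of 2000 times.
Step 6. Construct the confidence interval for the difference between CQVs.
Step 7. Repeat Steps 1–6 a total of 10,000 times.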
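Steps 2–6 of Algorithm 3 can be sketched in Python as follows. The fiducial quantity for $\delta$ used here, an equal mixture of $\mathrm{Beta}(n_{(0)}, n_{(1)} + 1)$ and $\mathrm{Beta}(n_{(0)} + 1, n_{(1)})$, is the standard fiducial distribution for a binomial proportion and is an assumption of this sketch; the expression used in the study may differ. The quantities for $\mu$ and $\sigma^2$ take the usual normal and chi-square forms, and all function names are hypothetical.

```python
import random
from math import ceil, exp, log, sqrt
from statistics import NormalDist, mean, variance

STD_NORMAL = NormalDist()

def cqv(mu, sigma2, delta):
    """CQV from zero-adjusted quartiles; a quartile is 0 when p <= delta."""
    def quartile(p):
        if p <= delta:
            return 0.0
        return exp(mu + sqrt(sigma2) * STD_NORMAL.inv_cdf((p - delta) / (1.0 - delta)))
    q1, q3 = quartile(0.25), quartile(0.75)
    if q3 == 0.0:
        raise ValueError("CQV is undefined when delta >= 0.75")
    return (q3 - q1) / (q3 + q1)

def fiducial_delta(n0, n1, rng):
    """ASSUMED fiducial draw for delta: 1/2 Beta(n0, n1 + 1) + 1/2 Beta(n0 + 1, n1)."""
    if rng.random() < 0.5:
        # Beta(0, .) is degenerate at zero, so return 0 when no zeros were observed
        return rng.betavariate(n0, n1 + 1) if n0 > 0 else 0.0
    return rng.betavariate(n0 + 1, n1)

def fiducial_draw(stats, rng):
    """One realization of (G_mu, G_sigma2, G_delta) for a summarized sample."""
    n0, n1, mu_hat, s2 = stats
    u = 2.0 * rng.gammavariate((n1 - 1) / 2.0, 1.0)  # chi-square(n1 - 1)
    g_sigma2 = (n1 - 1) * s2 / u
    g_mu = mu_hat - rng.gauss(0.0, 1.0) * sqrt(g_sigma2 / n1)
    return g_mu, g_sigma2, fiducial_delta(n0, n1, rng)

def summarize(sample):
    """Return (n0, n1, mean of logs, sample variance of logs) for one sample."""
    logs = [log(x) for x in sample if x > 0]
    return len(sample) - len(logs), len(logs), mean(logs), variance(logs)

def fgci_cqv_difference(sample1, sample2, draws=2000, alpha=0.05, seed=None):
    """Equal-tailed FGCI for CQV_1 - CQV_2 using nearest-rank percentiles."""
    rng = random.Random(seed)
    s1, s2 = summarize(sample1), summarize(sample2)
    g = sorted(cqv(*fiducial_draw(s1, rng)) - cqv(*fiducial_draw(s2, rng))
               for _ in range(draws))
    lower = g[max(0, ceil(alpha / 2.0 * draws) - 1)]
    upper = g[min(draws - 1, ceil((1.0 - alpha / 2.0) * draws) - 1)]
    return lower, upper
```

The only structural difference from the GCI sketch is how the quantity for $\delta$ is generated; the quartile, CQV, and percentile steps are identical.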
4. Discussion and Conclusions
This study aimed to construct HPD and confidence intervals for the difference between CQVs of delta-lognormal distributions. We proposed the Bayesian method based on three priors (the normal-gamma, Jeffreys’, and uniform priors), along with the GCI and FGCI methods. To evaluate the performance of these methods, we assessed their coverage probabilities and average lengths under various simulation scenarios.
The findings indicate that the Bayesian approach based on Jeffreys’ prior is suitable for cases with small variances or small sample sizes. Conversely, as the variance increases, the GCI method outperforms the others. However, it is essential to note that this study focuses on quartiles and applies only when the probability of positive values is 0.8 or higher. Furthermore, we computed the HPD and confidence intervals for the difference between the CQVs of rainfall data from two areas, which followed delta-lognormal distributions. The empirical results align with the findings from the simulation study, demonstrating that the interval lengths are similar across all methods. In conclusion, we recommend using the Bayesian approach based on Jeffreys’ prior and the GCI method to construct HPD and confidence intervals for the difference between CQVs of delta-lognormal distributions.