Assessing Multinomial Distributions with a Bayesian Approach

Al-Labadi, Luai; Ciur, Petru; Dimovic, Milutin; Lim, Kyuson

doi:10.3390/math11133007

¹ Department of Mathematical & Computational Sciences, University of Toronto Mississauga, Toronto, ON L5L 1C6, Canada

² Department of Mathematics & Statistics, McMaster University, 1280 Main Street West, Hamilton, ON L8S 4L8, Canada

* Author to whom correspondence should be addressed.

Mathematics 2023, 11(13), 3007; https://doi.org/10.3390/math11133007

Submission received: 10 June 2023 / Revised: 2 July 2023 / Accepted: 4 July 2023 / Published: 6 July 2023

(This article belongs to the Special Issue Research Progress and Application of Bayesian Statistics)

Abstract

This paper introduces a unified Bayesian approach for testing various hypotheses related to multinomial distributions. The method calculates the Kullback–Leibler divergence between two specified multinomial distributions, followed by comparing the change in distance from the prior to the posterior through the relative belief ratio. A prior elicitation algorithm is used to specify the prior distributions. To demonstrate the effectiveness and practical application of this approach, it has been applied to several examples.

Keywords: Dirichlet distribution; hypothesis testing; Kullback–Leibler divergence; multinomial distribution; relative belief inferences

MSC: 62F15; 62F03

1. Introduction

Multinomial distribution tests are a crucial statistical tool in many fields, especially when data can be categorized into multiple groups. These tests were first proposed by Karl Pearson in 1900 and have since been widely used to analyze and make inferences about the probabilities or proportions associated with each category in the multinomial distribution [1].

Let the sample space

A

of a random experiment be the union of a finite number k of mutually disjoint sets (categories)

A_{1}, \dots, A_{k}

. Assume that

P (A_{j}) = θ_{j}

,

j = 1, \dots, k

, where

\sum_{j = 1}^{k} θ_{j} = 1

. Here

θ_{j}

represents the probability that the outcome is an element of the set

A_{j}

. The random experiment is to be repeated n independent times. Define the random variables

Y_{j}

to be the number of times the outcome is an element of set

A_{j}

,

j = 1, \dots, k

. That is,

Y_{1}, \dots, Y_{k} = n - Y_{1} - Y_{2} - \dots - Y_{k - 1}

denote the frequencies with which the outcome belongs to

A_{1}, \dots, A_{k}

, respectively. Then the joint probability mass function (pmf) of

Y_{1}, \dots, Y_{k}

is the multinomial with parameters

n, θ_{1}, \dots, θ_{k}

[2]. It is desired to test the null hypothesis:

\begin{matrix} H_{0}^{1} : θ_{j} = θ_{j 0}, for j = 1, \dots, k \end{matrix}

(1)

against all alternatives, where

θ_{j 0}

are known constants. Within the classical frequentist framework, to test

H_{0}^{1}

, it is common to use the test statistic [3]:

\begin{matrix} χ^{2} = \sum_{j = 1}^{k} \frac{{(Y_{j} - n θ_{j 0})}^{2}}{n θ_{j 0}} . \end{matrix}

(2)

It is known that, under

H_{0}^{1}

, the limiting distribution of

χ^{2}

is chi-squared with

k - 1

degrees of freedom. When

H_{0}^{1}

is true,

n θ_{j 0}

represents the expected value of

Y_{j}

. This implies the observed value

χ^{2}

should not be too large if

H_{0}^{1}

is true. For a given significance level

α

, an approximate test of size

α

is to reject

H_{0}^{1}

if the observed

χ^{2} > χ_{k - 1}^{2} (α)

, where the

χ_{k - 1}^{2} (α)

is the

1 - α

quantile of the chi-squared distribution with

k - 1

degrees of freedom; otherwise, fail to reject

H_{0}^{1}

. Other possible tests for

H_{0}^{1}

include Fisher’s exact test and likelihood ratio tests [4].
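
The classical test described above can be sketched in a few lines. This is a minimal illustration only: the counts and hypothesized probabilities are hypothetical, the function name `pearson_chi2` is ours, and the critical value uses the closed-form quantile that holds only for 2 degrees of freedom (k = 3 categories).

```python
import math

def pearson_chi2(counts, theta0):
    """Pearson statistic (2) for observed counts and hypothesized probabilities."""
    n = sum(counts)
    return sum((y - n * t) ** 2 / (n * t) for y, t in zip(counts, theta0))

# Hypothetical data: n = 100 trials over k = 3 categories.
counts = [48, 35, 17]
theta0 = [0.5, 0.3, 0.2]
chi2_obs = pearson_chi2(counts, theta0)

# With k - 1 = 2 degrees of freedom the chi-squared survival function is
# exp(-x / 2), so the (1 - alpha) quantile is -2 * log(alpha).
alpha = 0.05
critical = -2.0 * math.log(alpha)
reject = chi2_obs > critical
print(f"chi2 = {chi2_obs:.4f}, critical = {critical:.4f}, reject H0: {reject}")
```

For other values of k the quantile has no such closed form and would have to come from a statistical library or table.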

If there are r independent samples, interest lies in testing whether the r samples come from the same multinomial population or from different multinomial populations. Let

A_{1}, A_{2}, \dots, A_{k}

denote k possible types of categories in the ith sample,

i = 1, \dots, r

. Let the probability that an outcome of category

A_{j}

will occur for the ith population (or ith sample) be denoted by

θ_{j | i}

. Note that,

\sum_{j = 1}^{k} θ_{j | i} = 1

for each

i = 1, \dots, r

. Moreover, let

Y_{j | i}

be the number of times the outcome is an element of

A_{j}

in sample i. Consider the completely specified hypothesis:

\begin{matrix} H_{0}^{2} : θ_{j | i} = θ_{j 0 | i}, for j = 1, 2, \dots, k . \end{matrix}

(3)

Under

H_{0}^{2}

, the test statistic in (2) can be extended to

\begin{matrix} χ^{2} = \sum_{i = 1}^{r} χ_{i}^{2} = \sum_{i = 1}^{r} \sum_{j = 1}^{k} \frac{{(Y_{j | i} - n_{i} θ_{j 0 | i})}^{2}}{n_{i} θ_{j 0 | i}} . \end{matrix}

(4)

If

H_{0}^{2}

is true, then

χ^{2}

in (4) has an approximately chi-squared distribution with

r (k - 1)

degrees of freedom. Likewise, for a given significance level

α

, an approximate test of size

α

is to reject

H_{0}^{2}

if the observed

χ^{2}

is bigger than

χ_{r (k - 1)}^{2} (α)

; otherwise, fail to reject

H_{0}^{2}

[5].

A third and more common hypothesis is to test whether the r multinomial populations are the same without specifying the values of the

θ_{j | i}

. That is, we consider the null hypothesis:

\begin{matrix} H_{0}^{3} : θ_{j | 1} = θ_{j | 2} = \dots = θ_{j | r} = θ_{j}, for j = 1, 2, \dots, k . \end{matrix}

(5)

The test statistic for

H_{0}^{3}

is given by

\begin{matrix} χ^{2} = \sum_{i = 1}^{r} χ_{i}^{2} = \sum_{i = 1}^{r} \sum_{j = 1}^{k} \frac{{(Y_{j | i} - n_{i} {\hat{θ}}_{j | i})}^{2}}{n_{i} {\hat{θ}}_{j | i}}, \end{matrix}

(6)

where

{\hat{θ}}_{j | i} = \frac{\sum_{i = 1}^{r} Y_{j | i}}{\sum_{i = 1}^{r} n_{i}} = \frac{\sum_{i = 1}^{r} Y_{j | i}}{n}

. Here,

n_{i}

denotes the sample size of sample i and

\sum_{i = 1}^{r} Y_{j | i}

represents the total in category

A_{j}

. Note that

{\hat{θ}}_{j | i}

represents the pooled maximum likelihood estimator (MLE) of

θ_{j}

under

H_{0}^{3}

. It is known that the limiting distribution of

χ^{2}

in (6) is a chi-squared distribution with

(r - 1) (k - 1)

degrees of freedom. So, for a given significance level

α

, an approximate test of size

α

is to reject

H_{0}^{3}

if the observed

χ^{2} > χ_{(r - 1) (k - 1)}^{2} (α)

; otherwise, fail to reject

H_{0}^{3}

[5]. It is worth mentioning that several other frequentist methods for testing the multinomial distribution have been proposed, utilizing different distance measures. These methods include the Euclidean distance proposed by [6], the smooth total variation distance introduced by [7], and

ϕ

-divergences discussed by [8]. These approaches provide alternative ways to assess the goodness-of-fit of the multinomial distribution using distance metrics.
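
As an illustration of the homogeneity statistic in (6), the following sketch computes the pooled estimates and the statistic for a small table. The table values are hypothetical and the function name `homogeneity_chi2` is ours; rows are samples and columns are categories.

```python
def homogeneity_chi2(table):
    """Statistic (6) and its degrees of freedom for an r x k table of counts."""
    row_totals = [sum(row) for row in table]  # n_i, the size of sample i
    n = sum(row_totals)
    k = len(table[0])
    # Pooled MLE of theta_j under H_0^3: category total divided by n.
    pooled = [sum(row[j] for row in table) / n for j in range(k)]
    stat = 0.0
    for row, n_i in zip(table, row_totals):
        for y, t in zip(row, pooled):
            stat += (y - n_i * t) ** 2 / (n_i * t)
    df = (len(table) - 1) * (k - 1)
    return stat, df

# Hypothetical data: r = 2 samples of size 100 over k = 3 categories.
table = [[20, 30, 50], [30, 30, 40]]
stat, df = homogeneity_chi2(table)
print(f"chi2 = {stat:.4f} on {df} degrees of freedom")
```

When all rows of the table are proportional to each other the observed and expected counts coincide and the statistic is exactly zero.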

Refs. [9,10,11,12,13] made early advances in Bayesian methods for analyzing categorical data, focusing on smoothing proportions in contingency tables and inference about odds ratios. These methods typically employed conjugate beta and Dirichlet priors. Refs. [14,15] extended these methods to develop Bayesian analogs of small-sample frequentist tests for

2 \times 2

tables, also using such priors. Ref. [16] recommended the use of the uniform prior for predictive inference, but other priors were also suggested by discussants of his paper. The Jeffreys prior is the most commonly used prior for binomial inference, partially due to its invariance to the scale of measurement for the parameter. Reference priors (see [17]), such as the Jeffreys prior for the binomial parameter (see [18]), are viable options, but their specification can be computationally complex. Ref. [10] may have been the first to utilize an empirical Bayesian approach with contingency tables, estimating parameters in gamma and log-normal priors for association factors. Empirical Bayes involves estimating the prior distribution from the observed data itself and is particularly useful when dealing with large amounts of data. Refs. [19,20] derived integral expressions for the posterior distributions of the difference, ratio, and odds ratio under independent beta priors. Ref. [19] introduced Bayesian highest posterior density (HPD) confidence intervals for these measures. The HPD approach ensures that the posterior probability matches the desired confidence level, and the posterior density is higher inside the interval than outside. Ref. [21] discussed Bayesian confidence intervals for association parameters in

2 \times 2

tables. They argued that to achieve good coverage performance in the frequentist sense across the entire parameter space, it is advisable to use relatively diffuse priors. Even uniform priors are often too informative, and they recommended the use of the Jeffreys prior. Bayesian methods for analyzing categorical data have been extensively surveyed in the literature, including comprehensive reviews by [22,23] with a focus on contingency table analysis. Refs. [24,25,26] proposed tests based on Bayesian nonparametric methods using Dirichlet process priors.

We build on the recent work of [27] by extending their Bayesian approach for hypothesis testing on one-sample proportions based on Kullback–Leibler divergence and relative belief ratio, using a uniform (0, 1) prior on binomial proportions, to multinomial distributions. Our goal is to provide a comprehensive Bayesian approach for testing hypotheses

H_{0}^{1}

,

H_{0}^{2}

, and

H_{0}^{3}

. We derive the required distance formulas and place a Dirichlet prior on the category probabilities. To choose suitable values of the prior’s hyperparameters, we employ the elicitation algorithm developed by [28]. The proposed approach offers several advantages: it is computationally simple and easy to interpret, it can provide evidence in favor of the null hypothesis, and it does not require specifying a significance level.

The paper is structured as follows. Section 2 provides an overview of the relative belief ratio inference and KL divergence. Section 3 details the proposed approach, including the formulas and computational algorithms. In Section 4, several examples are presented to illustrate the approach. Finally, Section 5 contains concluding remarks and discussions.

2. Relevant Background

2.1. Inferences Using Relative Belief

Ref. [29] introduced the relative belief ratio, which has become a popular tool in statistical hypothesis testing theory. Several works have employed this approach, including [30,31,32,33,34,35].

Suppose we have a statistical model with a density function

\{f_{θ} (y) : θ \in Θ\}

with respect to the Lebesgue measure on the parameter space

Θ

. Let

π (θ)

be a prior on

Θ

. After observing the data

y

, the posterior distribution of

θ

can be expressed as

π (θ | y) = \frac{f_{θ} (y) π (θ)}{m (y)},

where

m (y) = \int_{Θ} f_{θ} (y) π (θ) d θ

.

Assume that the goal is to draw inferences about the parameter

θ

. If the prior

π (\cdot)

and the posterior

π (\cdot | y)

are continuous at

θ_{0}

, then the relative belief ratio for a hypothesized value

θ_{0}

of

θ

can be expressed as follows:

R B (θ_{0} | y) = \frac{π (θ_{0} | y)}{π (θ_{0})},

the ratio of the posterior density to the prior density at

θ_{0}

. In other words,

R B (θ_{0} | y)

quantifies how the belief in

θ_{0}

being the true value has changed from prior to posterior. It is worth noting that when

π (\cdot)

and

π (\cdot | y)

are discrete, the relative belief ratio is defined through limits, and further details can be found in [29].

The relative belief ratio

R B (θ_{0} | y)

provides a measure of evidence for

θ_{0}

being the true value. A value of

R B (θ_{0} | y) > 1

indicates evidence in favor of

θ_{0}

being the true value, whereas

R B (θ_{0} | y) < 1

indicates evidence against

θ_{0}

being the true value. If

R B (θ_{0} | y) = 1

, there is no evidence in either direction.

Once the relative belief ratio is calculated, it is important to determine the strength of the evidence in favor of or against

H_{0} : θ = θ_{0}

. A common way to quantify this is by computing the tail probability [29]:

Str (θ_{0} | y) = Π (RB (θ | y) \leq RB (θ_{0} | y) | y) = \int_{\{θ \in Θ : RB (θ | y) \leq RB (θ_{0} | y)\}} π (θ | y) d θ,

(7)

where

Π (\cdot | y)

in (7) is the posterior probability measure with density

π (\cdot | y)

. Therefore, equation (7) represents the posterior probability that the true value of

θ

has a relative belief ratio no greater than that of the hypothesized value

θ_{0}

. When

R B (θ_{0} | y) < 1

, there is evidence against

θ_{0}

. A small value of

S t r (θ_{0} | y)

indicates a high posterior probability that the true value has a relative belief ratio greater than

R B (θ_{0} | y)

, indicating strong evidence against

θ_{0}

. Conversely, when

R B (θ_{0} | y) > 1

, there is evidence in favor of

θ_{0}

. A large value of

S t r (θ_{0} | y)

indicates a low posterior probability that the true value has a relative belief ratio greater than

R B (θ_{0} | y)

, indicating strong evidence in favor of

θ_{0}

. A small value of

S t r (θ_{0} | y)

indicates weak evidence in favor of

θ_{0}

.
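
To make the relative belief ratio and the strength in (7) concrete, here is a minimal sketch for a single binomial proportion. Everything in it is an illustrative assumption rather than part of the proposed method: a uniform Beta(1, 1) prior, hypothetical data (14 successes in 20 trials), the hypothesized value 0.5, and a midpoint rule on a grid to approximate the integral in (7).

```python
import math

def beta_pdf(theta, a, b):
    """Density of the Beta(a, b) distribution at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta)
                    + (b - 1) * math.log(1 - theta))

n, y = 20, 14          # hypothetical data: y successes in n trials
a0, b0 = 1.0, 1.0      # uniform prior on theta (assumption)
theta0 = 0.5           # hypothesized value

def rb(theta):
    """Relative belief ratio: posterior density over prior density."""
    return beta_pdf(theta, a0 + y, b0 + n - y) / beta_pdf(theta, a0, b0)

rb0 = rb(theta0)

# Strength: posterior probability of {theta : RB(theta | y) <= RB(theta0 | y)},
# approximated with a midpoint rule on N grid cells.
N = 20000
h = 1.0 / N
strength = 0.0
for m in range(N):
    t = (m + 0.5) * h
    if rb(t) <= rb0:
        strength += beta_pdf(t, a0 + y, b0 + n - y) * h

print(f"RB(theta0 | y) = {rb0:.4f}, strength = {strength:.4f}")
```

In this setup the ratio falls below 1, so the data provide evidence against the hypothesized value, and the strength indicates how strong that evidence is.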

2.2. KL Divergence

The KL divergence, also referred to as relative entropy, is a measure of dissimilarity between two probability distributions that quantifies how far apart they are from each other. It was introduced by Solomon Kullback and Richard Leibler in 1951. Let P and Q be two discrete cumulative distribution functions (cdf’s) on the same probability space

φ

, with corresponding probability mass functions (pmf’s) p and q (with respect to the counting measure). The KL divergence between p and q is given by:

\begin{matrix} d (p, q) = \sum_{x \in φ} p (x) \log (\frac{p (x)}{q (x)}) . \end{matrix}

The KL divergence is always non-negative, and it attains its minimum value when

p = q

almost surely. This property makes it a useful tool in many areas of machine learning and information theory, such as hypothesis testing, model selection, and clustering. One interpretation of the KL divergence is that it measures how much information is lost when using Q to approximate P. It is worth noting that the KL divergence is not symmetric:

d (p, q)

and

d (q, p)

are generally not equal. Therefore, it is important to specify which distribution is the “true” or “target” distribution and which is the “approximating” or “predicted” distribution for some applications when using KL divergence in practice, as noted by [36].
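
A short numerical illustration of the definition and of the asymmetry follows. The two pmfs are hypothetical and the helper name `kl` is ours; terms with p(x) = 0 are skipped, following the usual convention 0 log 0 = 0.

```python
import math

def kl(p, q):
    """KL divergence d(p, q) between two pmfs given as lists on a common support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

d_pq = kl(p, q)
d_qp = kl(q, p)
print(f"d(p, q) = {d_pq:.5f}, d(q, p) = {d_qp:.5f}, d(p, p) = {kl(p, p):.5f}")
```

Both divergences are positive, they differ from each other, and the divergence of a distribution from itself is zero.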

The following lemma is essential to the proposed approach.

Lemma 1.

Let

p (y_{1}, y_{2}, \dots, y_{r}) = p_{1} (y_{1}) p_{2} (y_{2}) \dots p_{r} (y_{r})

and

q (y_{1}, y_{2}, \dots, y_{r}) =

q_{1} (y_{1}) q_{2} (y_{2}) \dots q_{r} (y_{r})

, where

p_{i} (y_{i})

and

q_{i} (y_{i})

are probability mass functions with supports

y_{i} = 0, 1, \dots, n_{i}, i = 1, \dots, r

. Then

d (p, q) = \sum_{i = 1}^{r} d (p_{i}, q_{i}) .

Proof.

We have

\begin{matrix} d (p, q) & = & \sum_{y_{1} = 0}^{n_{1}} \dots \sum_{y_{r} = 0}^{n_{r}} p (y_{1}, y_{2}, \dots, y_{r}) \log \frac{p (y_{1}, y_{2}, \dots, y_{r})}{q (y_{1}, y_{2}, \dots, y_{r})} \\ & = & \sum_{y_{1} = 0}^{n_{1}} \dots \sum_{y_{r} = 0}^{n_{r}} p_{1} (y_{1}) \dots p_{r} (y_{r}) \log \frac{p_{1} (y_{1}) \dots p_{r} (y_{r})}{q_{1} (y_{1}) \dots q_{r} (y_{r})} \\ & = & \sum_{y_{1} = 0}^{n_{1}} \dots \sum_{y_{r} = 0}^{n_{r}} p_{1} (y_{1}) \dots p_{r} (y_{r}) [\log p_{1} (y_{1}) + \dots + \log p_{r} (y_{r}) - \log q_{1} (y_{1}) - \dots - \log q_{r} (y_{r})] \\ & = & \sum_{y_{1} = 0}^{n_{1}} \dots \sum_{y_{r} = 0}^{n_{r}} p_{1} (y_{1}) \dots p_{r} (y_{r}) \log p_{1} (y_{1}) + \dots + \sum_{y_{1} = 0}^{n_{1}} \dots \sum_{y_{r} = 0}^{n_{r}} p_{1} (y_{1}) \dots p_{r} (y_{r}) \log p_{r} (y_{r}) \\ & & - \sum_{y_{1} = 0}^{n_{1}} \dots \sum_{y_{r} = 0}^{n_{r}} p_{1} (y_{1}) \dots p_{r} (y_{r}) \log q_{1} (y_{1}) - \dots - \sum_{y_{1} = 0}^{n_{1}} \dots \sum_{y_{r} = 0}^{n_{r}} p_{1} (y_{1}) \dots p_{r} (y_{r}) \log q_{r} (y_{r}) . \end{matrix}

Since, for

i = 1, \dots, r

,

\sum_{y_{i} = 0}^{n_{i}} p_{i} (y_{i}) = \sum_{y_{i} = 0}^{n_{i}} q_{i} (y_{i}) = 1

, we have

\begin{matrix} d (p, q) & = & \sum_{y_{1} = 0}^{n_{1}} p_{1} (y_{1}) \log p_{1} (y_{1}) + \dots + \sum_{y_{r} = 0}^{n_{r}} p_{r} (y_{r}) \log p_{r} (y_{r}) \\ & & - \sum_{y_{1} = 0}^{n_{1}} p_{1} (y_{1}) \log q_{1} (y_{1}) - \dots - \sum_{y_{r} = 0}^{n_{r}} p_{r} (y_{r}) \log q_{r} (y_{r}) \\ & = & \sum_{y_{1} = 0}^{n_{1}} p_{1} (y_{1}) \log \frac{p_{1} (y_{1})}{q_{1} (y_{1})} + \dots + \sum_{y_{r} = 0}^{n_{r}} p_{r} (y_{r}) \log \frac{p_{r} (y_{r})}{q_{r} (y_{r})} \\ & = & d (p_{1}, q_{1}) + \dots + d (p_{r}, q_{r}) . \end{matrix}

□
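
The additivity stated in Lemma 1 can be checked numerically on a small case. The sketch below uses r = 2 and hypothetical marginal pmfs; the joint pmfs are formed as products of the marginals, and the helper name `kl` is ours.

```python
import itertools
import math

def kl(p, q):
    """KL divergence between two pmfs given as lists on a common support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical marginal pmfs for two independent components.
p1, q1 = [0.2, 0.8], [0.5, 0.5]
p2, q2 = [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]

# Joint pmfs under independence: products over all pairs of outcomes.
joint_p = [a * b for a, b in itertools.product(p1, p2)]
joint_q = [a * b for a, b in itertools.product(q1, q2)]

lhs = kl(joint_p, joint_q)           # d(p, q) on the joint distribution
rhs = kl(p1, q1) + kl(p2, q2)        # sum of the marginal divergences
print(f"joint: {lhs:.10f}, sum of marginals: {rhs:.10f}")
```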

3. The Approach

3.1. Bayesian One-Sample Multinomial

Let

Y = (Y_{1}, \dots, Y_{k}) \sim \mathrm{multinomial} (n, θ_{1}, \dots, θ_{k})

. The joint pmf of

Y_{1}, \dots, Y_{k}

is given by

\begin{matrix} p (y_{1}, \dots, y_{k}) = (\binom{n}{y_{1}, y_{2}, \dots, y_{k}}) \prod_{j = 1}^{k} θ_{j}^{y_{j}}, \end{matrix}

(8)

where

(\binom{n}{y_{1}, y_{2}, \dots, y_{k}}) = \frac{n!}{y_{1}! \dots y_{k}!}

,

\sum_{j = 1}^{k} θ_{j} = 1

, and

\sum_{j = 1}^{k} y_{j} = n .

To test the null hypothesis

H_{0}^{1}

as defined in (1), we first compute the Kullback–Leibler (KL) divergence between

p (y_{1}, \dots, y_{k})

and the pmf under

H_{0}^{1}

represented by

\begin{matrix} q (y_{1}, \dots, y_{k}) & = & (\binom{n}{y_{1}, y_{2}, \dots, y_{k}}) \prod_{j = 1}^{k} θ_{j 0}^{y_{j}} . \end{matrix}

(9)

Here,

θ_{j 0}

denotes the null hypothesis value for

θ_{j}

. The following proposition provides the formula for the KL divergence between p and q.

Proposition 1.

Let

p (y_{1}, \dots, y_{k})

and

q (y_{1}, \dots, y_{k})

be two joint probability mass functions as defined in (8) and (9), respectively. We have,

\begin{matrix} d (p, q) & = & n \sum_{j = 1}^{k} [θ_{j} \log (\frac{θ_{j}}{θ_{j 0}})] . \end{matrix}

Proof.

Let the support of

Y_{j}

be

0, 1, \dots, n_{j}

,

j = 1, \dots, k

. We have

\begin{matrix} d (p, q) & = & \sum_{y_{1} = 0}^{n_{1}} \dots \sum_{y_{k} = 0}^{n_{k}} p (y_{1}, \dots, y_{k}) \log \frac{p (y_{1}, \dots, y_{k})}{q (y_{1}, \dots, y_{k})} \\ & = & \sum_{y_{1} = 0}^{n_{1}} \dots \sum_{y_{k} = 0}^{n_{k}} p (y_{1}, \dots, y_{k}) \log \frac{(\binom{n}{y_{1}, y_{2}, \dots, y_{k}}) \prod_{j = 1}^{k} θ_{j}^{y_{j}}}{(\binom{n}{y_{1}, y_{2}, \dots, y_{k}}) \prod_{j = 1}^{k} θ_{j 0}^{y_{j}}} \\ & = & \sum_{y_{1} = 0}^{n_{1}} \dots \sum_{y_{k} = 0}^{n_{k}} p (y_{1}, \dots, y_{k}) \log \prod_{j = 1}^{k} {[\frac{θ_{j}}{θ_{j 0}}]}^{y_{j}} . \end{matrix}

Using the properties of the logarithm, we get

\begin{matrix} d (p, q) & = & \sum_{y_{1} = 0}^{n_{1}} \dots \sum_{y_{k} = 0}^{n_{k}} p (y_{1}, \dots, y_{k}) \sum_{j = 1}^{k} y_{j} \log [\frac{θ_{j}}{θ_{j 0}}] \\ & = & \sum_{y_{1} = 0}^{n_{1}} \dots \sum_{y_{k} = 0}^{n_{k}} p (y_{1}, \dots, y_{k}) \times y_{1} \times \log [\frac{θ_{1}}{θ_{1 0}}] + \dots + \sum_{y_{1} = 0}^{n_{1}} \dots \sum_{y_{k} = 0}^{n_{k}} p (y_{1}, \dots, y_{k}) \times y_{k} \times \log [\frac{θ_{k}}{θ_{k 0}}] \\ & = & E [Y_{1}] \times \log [\frac{θ_{1}}{θ_{1 0}}] + \dots + E [Y_{k}] \times \log [\frac{θ_{k}}{θ_{k 0}}] \\ & = & \sum_{j = 1}^{k} E [Y_{j}] \log [\frac{θ_{j}}{θ_{j 0}}] . \end{matrix}

Since the marginal probability mass function of

Y_{j}, j = 1, \dots, k

, is the binomial with parameters n and

θ_{j}

, we get

\begin{matrix} d (p, q) & = & \sum_{j = 1}^{k} n θ_{j} \log [\frac{θ_{j}}{θ_{j 0}}] = n \sum_{j = 1}^{k} [θ_{j} \log (\frac{θ_{j}}{θ_{j 0}})] . \end{matrix}

□
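
As a sanity check on Proposition 1, the closed form can be compared with a brute-force evaluation of the KL divergence over the full multinomial support. This is only a sketch for a small hypothetical case (n = 6, k = 3); the function names are ours.

```python
import math
from itertools import product

def multinomial_pmf(y, theta):
    """Multinomial pmf (8) at the count vector y with probabilities theta."""
    n = sum(y)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(v + 1) for v in y)
    return math.exp(log_coef + sum(v * math.log(t) for v, t in zip(y, theta)))

def kl_brute(n, theta, theta0):
    """Sum p log(p / q) over every count vector with total n."""
    k = len(theta)
    total = 0.0
    for head in product(range(n + 1), repeat=k - 1):
        last = n - sum(head)
        if last < 0:
            continue  # not a valid count vector
        y = head + (last,)
        p = multinomial_pmf(y, theta)
        q = multinomial_pmf(y, theta0)
        total += p * math.log(p / q)
    return total

def kl_closed(n, theta, theta0):
    """Closed form from Proposition 1."""
    return n * sum(t * math.log(t / t0) for t, t0 in zip(theta, theta0))

n = 6
theta = [0.2, 0.5, 0.3]
theta0 = [1 / 3, 1 / 3, 1 / 3]
brute = kl_brute(n, theta, theta0)
closed = kl_closed(n, theta, theta0)
print(f"brute force: {brute:.10f}, closed form: {closed:.10f}")
```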

To connect the distance formula presented in Proposition 1 with the test statistic

χ^{2}

in (2), we use the Taylor series expansion of the function

f (x) = x \log \frac{x}{x_{0}}

about

x_{0}

. This gives us

\begin{matrix} f (x) = (x - x_{0}) + 0.5 {(x - x_{0})}^{2} \frac{1}{x_{0}} + \dots \end{matrix}

If

H_{0}^{1}

is true and n is large, then we can approximate the distance

d (p, q)

as

\begin{matrix} d (p, q) & \approx & n \sum_{j = 1}^{k} (θ_{j} - θ_{j 0}) + 0.5 n \sum_{j = 1}^{k} \frac{{(θ_{j} - θ_{j 0})}^{2}}{θ_{j 0}} . \end{matrix}

(10)

Since the probabilities sum to 1, the first term in (10) equals 0. The second term in (10) can be expressed as

\begin{matrix} 0.5 \sum_{j = 1}^{k} \frac{{(n θ_{j} - n θ_{j 0})}^{2}}{n θ_{j 0}} = 0.5 \sum_{j = 1}^{k} \frac{{(E (Y_{j}) - n θ_{j 0})}^{2}}{n θ_{j 0}} . \end{matrix}

This shows a direct connection between the KL divergence and

χ^{2}

.
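
The quality of the quadratic approximation in (10) can be illustrated numerically. In the sketch below the values of n, the hypothesized probabilities, and the nearby probabilities are all hypothetical; the point is only that the exact distance and half the chi-squared-type sum are close when the two probability vectors are close.

```python
import math

n = 200
theta0 = [0.50, 0.30, 0.20]   # hypothesized probabilities (hypothetical)
theta = [0.52, 0.29, 0.19]    # nearby probabilities (hypothetical)

# Exact distance from Proposition 1.
d_exact = n * sum(t * math.log(t / t0) for t, t0 in zip(theta, theta0))

# Second-order term of (10): half of a chi-squared-type sum with E(Y_j) = n * theta_j.
d_quad = 0.5 * sum((n * t - n * t0) ** 2 / (n * t0)
                   for t, t0 in zip(theta, theta0))

print(f"exact: {d_exact:.5f}, quadratic approximation: {d_quad:.5f}")
```

The agreement degrades as the probabilities move away from the hypothesized values, since the omitted higher-order terms of the expansion then matter.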

For the proposed Bayesian test, the probabilities

θ_{1}, \dots, θ_{k}

are now random. The suggested prior for the joint probabilities

(θ_{1}, \dots, θ_{k})

is the Dirichlet distribution with parameters

α_{1}, \dots, α_{k}

. That is,

\begin{matrix} (θ_{1}, θ_{2}, \dots, θ_{k}) \sim Dirichlet (α_{1}, α_{2}, \dots, α_{k}) . \end{matrix}

(11)

To elicit the prior, we use the elicitation algorithm developed by [28], which requires some domain knowledge to provide a lower bound for each

θ_{j}

. For convenience, we have made this algorithm available on Shiny at the following link: https://bayesian-chi-square-test.shinyapps.io/DirichletprocessKyusonlim/ (accessed on 23 May 2023). For comparison purposes, we also considered the non-informative (uniform) prior, and Jeffreys prior; see Section 4. For the proposed Bayesian approach, when

(θ_{1}, \dots, θ_{k})

has the prior defined in (11), we put

\begin{matrix} D = n \sum_{j = 1}^{k} [θ_{j} \log (\frac{θ_{j}}{θ_{j 0}})] . \end{matrix}

(12)

We also have that the posterior distribution of

(θ_{1}, \dots, θ_{k})

given the observed data

y = (y_{1}, \dots, y_{k})

is

Dirichlet (α_{1} + y_{1}, α_{2} + y_{2}, \dots, α_{k} + y_{k})

. We write

\begin{matrix} D_{y} = n \sum_{j = 1}^{k} [θ_{j} \log (\frac{θ_{j}}{θ_{j 0}})] . \end{matrix}

(13)

Note that,

E (θ_{j} | y) = \frac{α_{j} + y_{j}}{\sum_{j = 1}^{k} (α_{j} + y_{j})} = \frac{α_{j} + y_{j}}{n + \sum_{j = 1}^{k} α_{j}} = \frac{n}{n + \sum_{j = 1}^{k} α_{j}} \frac{y_{j}}{n} + (1 - \frac{n}{n + \sum_{j = 1}^{k} α_{j}}) \frac{α_{j}}{\sum_{j = 1}^{k} α_{j}},

which is a convex combination between the sample proportion (MLE) and the prior mean. As

n \to \infty

, the weak law of large numbers ensures that

E (θ_{j} | y)

converges in probability to the true value of

θ_{j}

. Hence, if

H_{0}^{1}

is true, then

D_{y} \overset{a . s .}{\to} 0

. Conversely, if

H_{0}^{1}

is false, then

D_{y} \overset{a . s .}{\to} c

, where

c > 0

. Proposition 1 establishes that

d (p, q) = 0

if and only if

θ_{j} = θ_{j 0}

. Therefore, testing

H_{0}^{1} : θ_{j} = θ_{j 0}

is equivalent to testing

d (p, q) = 0

. It follows that when

H_{0}^{1}

is true, the distribution of

D_{y}

should be more concentrated around 0 than that of D. So, the proposed test involves comparing the distributions of D and

D_{y}

around 0 using the relative belief ratio:

\begin{matrix} R B_{D} (0 | y) = \frac{π_{D} (0 | y)}{π_{D} (0)}, \end{matrix}

(14)

where

π_{D} (0)

and

π_{D} (0 | y)

represent the probability density functions of D and

D_{y}

, respectively. If

RB_{D} (0 | y) > 1

, it provides evidence in favor of

H_{0}^{1}

(since the distribution of

D_{y}

is more concentrated around 0 than that of D). If

R B_{D} (0 | y) < 1

, there is evidence against

H_{0}^{1}

(as the distribution of

D_{y}

is less concentrated around 0 than that of D). Additionally, we compute the strength of evidence

Str_{D} (0 | y) = Π_{D} (RB_{D} (d | y) \leq RB_{D} (0 | y) | y)

, where

Π_{D} (\cdot | y)

is the cumulative distribution function of

D_{y}

. As

π_{D} (\cdot | y)

and

π_{D} (\cdot)

in (14) have no closed forms,

R B_{D} (0 | y)

and

S t r_{D} (0 | y)

need to be approximated. The following Algorithm 1 summarizes the steps required to test

H_{0}^{1}

.

Algorithm 1 RB test for

H_{0}^{1}

(i)

Generate

(θ_{1}, θ_{2}, \dots, θ_{k})

from

Dirichlet (α_{1}, α_{2}, \dots, α_{k})

based on the algorithm of [28] and compute D as defined in (12).

(ii)

Repeat step (i) to obtain a sample of

r_{1}

values of D.

(iii)

Generate

(θ_{1}, θ_{2}, \dots, θ_{k})

given the observed data

y = (y_{1}, \dots, y_{k})

from

Dirichlet (α_{1} + y_{1}, α_{2} + y_{2}, \dots, α_{k} + y_{k})

and compute

D_{y}

as defined in (13).

(iv)

Repeat step (iii) to obtain a sample of

r_{2}

values of

D_{y}

.

(v)

Compute the relative belief ratio and the strength as follows:

(a): Let L be a positive integer. Let ${\hat{F}}_{D}$ denote the empirical cdf of D based on the prior sample from step (ii), and for $i = 0, \dots, L,$ let ${\hat{d}}_{i / L}$ be the estimate of $d_{i / L},$ the $(i / L)$-th prior quantile of D. Here ${\hat{d}}_{0} = 0$, and ${\hat{d}}_{1}$ is the largest value of D in the sample. Let ${\hat{F}}_{D} (\cdot | y)$ denote the empirical cdf based on the sample of $D_{y}$ from step (iv). For $d \in [{\hat{d}}_{i / L}, {\hat{d}}_{(i + 1) / L})$, estimate $RB_{D} (d | y) = π_{D} (d | y) / π_{D} (d)$ by

${\hat{RB}}_{D} (d | y) = L \{{\hat{F}}_{D} ({\hat{d}}_{(i + 1) / L} | y) - {\hat{F}}_{D} ({\hat{d}}_{i / L} | y)\},$

(15)

the ratio of the estimates of the posterior and prior contents of $[{\hat{d}}_{i / L}, {\hat{d}}_{(i + 1) / L}) .$ Thus, we estimate $R B_{D} (0 | y) = π_{D} (0 | y) / π_{D} (0)$ by ${\hat{R B}}_{D} (0 | y) =$ $L {\hat{F}}_{D} ({\hat{d}}_{p_{0}} | y)$ where $p_{0} = i_{0} / L$ and $i_{0}$ are chosen so that $i_{0} / L$ is not too small (typically $i_{0} / L \approx 0.05)$ .
(b): Estimate the strength $Π_{D} (R B_{D} (d | y) \leq R B_{D} (0 | y) | y)$ by the finite sum

$\sum_{\{i \geq i_{0} : {\hat{RB}}_{D} ({\hat{d}}_{i / L} | y) \leq {\hat{RB}}_{D} (0 | y)\}} ({\hat{F}}_{D} ({\hat{d}}_{(i + 1) / L} | y) - {\hat{F}}_{D} ({\hat{d}}_{i / L} | y)) .$

(16)

For fixed $L,$ as $r_{1} \to \infty$ and $r_{2} \to \infty,$ ${\hat{d}}_{i / L}$ converges almost surely to $d_{i / L}$, and (15) and (16) converge almost surely to $RB_{D} (d | y)$ and $Π_{D} (RB_{D} (d | y) \leq RB_{D} (0 | y) | y)$, respectively. See [34] for the details.
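
The steps of Algorithm 1 can be sketched as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: it uses a uniform Dirichlet(1, 1, 1) prior in place of hyperparameters elicited with the algorithm of [28], hypothetical data, Dirichlet draws obtained by normalizing independent gamma variates, and the estimate of the relative belief ratio at 0 written as the posterior content of the first i0 prior-quantile cells divided by their prior content i0 / L (which coincides with L times the posterior content when i0 = 1). The function names are ours.

```python
import bisect
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [v / s for v in g]

def kl_distance(n, theta, theta0):
    """D as in (12)/(13); clamped at 0 to absorb floating-point error."""
    d = n * sum(t * math.log(t / t0) for t, t0 in zip(theta, theta0) if t > 0)
    return max(0.0, d)

def rb_test(y, theta0, alpha, r1=5000, r2=5000, L=20, i0=1, seed=0):
    rng = random.Random(seed)
    n = sum(y)
    # Steps (i)-(ii): prior sample of D.
    prior = sorted(kl_distance(n, sample_dirichlet(alpha, rng), theta0)
                   for _ in range(r1))
    # Steps (iii)-(iv): posterior sample of D_y.
    post_alpha = [a + c for a, c in zip(alpha, y)]
    post = sorted(kl_distance(n, sample_dirichlet(post_alpha, rng), theta0)
                  for _ in range(r2))
    # Step (v)(a): prior quantiles d_hat[i] for i / L, with d_hat[0] = 0.
    d_hat = [0.0] + [prior[math.ceil(i * r1 / L) - 1] for i in range(1, L + 1)]

    def f_post(d):
        """Empirical cdf of the posterior sample."""
        return bisect.bisect_right(post, d) / r2

    content = [f_post(d_hat[i + 1]) - f_post(d_hat[i]) for i in range(L)]
    rb_hat = [L * c for c in content]          # estimate (15) on each cell
    rb0 = f_post(d_hat[i0]) / (i0 / L)         # estimate of RB_D(0 | y)
    # Step (v)(b): strength as the finite sum (16).
    strength = sum(content[i] for i in range(i0, L) if rb_hat[i] <= rb0)
    return rb0, strength

theta0 = [0.5, 0.3, 0.2]
alpha = [1.0, 1.0, 1.0]
rb_h0, str_h0 = rb_test([50, 30, 20], theta0, alpha)    # data close to theta0
rb_alt, str_alt = rb_test([10, 10, 80], theta0, alpha)  # data far from theta0
print(f"near H0: RB = {rb_h0:.2f}, strength = {str_h0:.3f}")
print(f"far from H0: RB = {rb_alt:.2f}, strength = {str_alt:.3f}")
```

With counts that match the hypothesized probabilities the estimated ratio exceeds 1, and with counts far from them it falls below 1, in line with the interpretation of (14).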

3.2. Bayesian r-Sample Multinomial Test with a Completely Specified Null Hypothesis

Consider r independent samples

Y_{1}, Y_{2}, \dots, Y_{r}

where each

Y_{i} = (Y_{1 | i}, \dots, Y_{k | i})

follows a multinomial distribution with parameters

n_{i}

and

θ_{i} = (θ_{1 | i}, \dots, θ_{k | i})

, where

\sum_{j = 1}^{k} θ_{j | i} = 1

for

i = 1, \dots, r

. Here,

θ_{j | i}

denotes the probability of an outcome falling in category j for the ith sample, and

Y_{j | i}

represents the number of times the outcome falls in category j in the ith sample. The null hypothesis to be tested is

H_{0}^{2} : θ_{j | i} = θ_{j 0 | i}

for

j = 1, 2, \dots, k

, where

θ_{j 0 | i}

are known constants.

Let the joint distribution of

Y_{1}, Y_{2}, \dots, Y_{r}

be

\begin{matrix} p (y_{1}, y_{2}, \dots, y_{r}) & = & \prod_{i = 1}^{r} p (y_{i}) = \prod_{i = 1}^{r} {(\binom{n_{i}}{y_{1 | i}, y_{2 | i}, \dots, y_{k | i}}) \prod_{j = 1}^{k} θ_{j | i}^{y_{j | i}}} . \end{matrix}

(17)

The proposed test is based on measuring the KL divergence between p and

\begin{matrix} q (y_{1}, y_{2}, \dots, y_{r}) & = & \prod_{i = 1}^{r} q (y_{i}) = \prod_{i = 1}^{r} {(\binom{n_{i}}{y_{1 | i}, y_{2 | i}, \dots, y_{k | i}}) \prod_{j = 1}^{k} θ_{j 0 | i}^{y_{j | i}}} . \end{matrix}

(18)

The following proposition provides the expression for the KL divergence between p and q. The proof follows directly from Lemma 1 and Proposition 1.

Proposition 2.

Let

p (y_{1}, y_{2}, \dots, y_{r})

and

q (y_{1}, y_{2}, \dots, y_{r})

be the two joint probability mass functions as defined in (17) and (18), respectively. Then

\begin{matrix} d (p, q) & = & \sum_{i = 1}^{r} {n_{i} \sum_{j = 1}^{k} [θ_{j | i} \log (\frac{θ_{j | i}}{θ_{j 0 | i}})]} . \end{matrix}

Proposition 2 provides a connection between the KL divergence formula and the test statistic

χ^{2}

in (4). Using a Taylor series expansion, we can approximate the distance

d (p, q)

as follows:

\begin{matrix} d (p, q) & \approx & 0.5 \sum_{i = 1}^{r} \sum_{j = 1}^{k} \frac{{(n_{i} θ_{j | i} - n_{i} θ_{j 0 | i})}^{2}}{n_{i} θ_{j 0 | i}} = 0.5 \sum_{i = 1}^{r} \sum_{j = 1}^{k} \frac{{(E (Y_{j | i}) - n_{i} θ_{j 0 | i})}^{2}}{n_{i} θ_{j 0 | i}}, \end{matrix}

which reveals a close connection to

χ^{2}

.

For the proposed Bayesian test of

H_{0}^{2}

, we adopt the prior

(θ_{1 | i}, \dots, θ_{k | i}) \sim Dirichlet (α_{1 | i}, α_{2 | i}, \dots,

α_{k | i})

, and use the algorithm developed in [28] to elicit the hyperparameters

α_{j | i}

for

j = 1, \dots, k

and

i = 1, \dots, r

. In this case, we define the divergence measure as

\begin{matrix} D = \sum_{i = 1}^{r} {n_{i} \sum_{j = 1}^{k} [θ_{j | i} \log (\frac{θ_{j | i}}{θ_{j 0 | i}})]}, \end{matrix}

(19)

where

θ_{j 0 | i}

is the hypothesized value of

θ_{j | i}

under the null hypothesis.

The posterior distribution of

(θ_{1 | i}, \dots, θ_{k | i})

given the observed data

y_{i} = (y_{1 | i}, \dots, y_{k | i})

is then

Dirichlet (α_{1 | i} + y_{1 | i}, α_{2 | i} + y_{2 | i}, \dots, α_{k | i} + y_{k | i})

. In this case, the divergence measure becomes

\begin{matrix} D_{y} = \sum_{i = 1}^{r} {n_{i} \sum_{j = 1}^{k} [θ_{j | i} \log (\frac{θ_{j | i}}{θ_{j 0 | i}})]} . \end{matrix}

(20)

The following algorithm outlines the steps required to test

H_{0}^{2}

using the proposed Bayesian test (Algorithm 2):

Algorithm 2 RB test for

H_{0}^{2}

(i): For $i = 1, \dots, r$ , generate $(θ_{1 | i}, θ_{2 | i}, \dots, θ_{k | i})$ from $Dirichlet (α_{1 | i}, α_{2 | i}, \dots, α_{k | i})$ based on the algorithm of [28] and compute D as defined in (19).
(ii): Repeat step (i) to obtain a sample of $r_{1}$ values of D.
(iii): For $i = 1, \dots, r$ , generate $(θ_{1 | i}, θ_{2 | i}, \dots, θ_{k | i})$ given the observed data $y_{i} = (y_{1 | i}, \dots, y_{k | i})$ from $Dirichlet (α_{1 | i} + y_{1 | i}, α_{2 | i} + y_{2 | i}, \dots, α_{k | i} + y_{k | i})$ and compute $D_{y}$ as defined in (20).
(iv): Repeat step (iii) to obtain a sample of $r_{2}$ values of $D_{y}$ .
(v): Compute the relative belief ratio and strength as described in Algorithm 1.
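
Steps (i) and (iii) of Algorithm 2 differ from Algorithm 1 only in that one Dirichlet vector is drawn per sample and the distances are summed as in (19) and (20). A minimal sketch of a single prior draw and a single posterior draw follows; the data, hypothesized values, and uniform hyperparameters are hypothetical, and the function names are ours.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [v / s for v in g]

def divergence_r_samples(ns, thetas, theta0s):
    """Sum over samples i of n_i * sum_j theta_{j|i} log(theta_{j|i} / theta_{j0|i})."""
    total = 0.0
    for n_i, th, th0 in zip(ns, thetas, theta0s):
        total += n_i * sum(t * math.log(t / t0)
                           for t, t0 in zip(th, th0) if t > 0)
    return total

rng = random.Random(1)
ys = [[30, 50, 20], [45, 35, 20]]                 # hypothetical counts, r = 2
theta0s = [[0.3, 0.5, 0.2], [0.4, 0.4, 0.2]]      # hypothesized values
alphas = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]       # uniform priors (assumption)
ns = [sum(y) for y in ys]

# One prior draw of D as in (19).
D_prior = divergence_r_samples(
    ns, [sample_dirichlet(a, rng) for a in alphas], theta0s)

# One posterior draw of D_y as in (20).
post_alphas = [[a + c for a, c in zip(al, y)] for al, y in zip(alphas, ys)]
D_post = divergence_r_samples(
    ns, [sample_dirichlet(a, rng) for a in post_alphas], theta0s)

print(f"prior draw D = {D_prior:.3f}, posterior draw D_y = {D_post:.3f}")
```

Repeating the two draws many times gives the prior and posterior samples that step (v) compares.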

3.3. Bayesian Test for Homogeneity in r-Sample Multinomial Data

Consider r independent samples

Y_{1}, Y_{2}, \dots, Y_{r}

, where for

i = 1, \dots, r

,

Y_{i} = (Y_{1 | i}, \dots, Y_{k | i})

\sim multinomial (n_{i}, θ_{1 | i}, \dots, θ_{k | i})

with

\sum_{j = 1}^{k} θ_{j | i} = 1

. To test the null hypothesis

H_{0}^{3}

as defined in (5), it is required to measure the KL divergence between p and q as defined in (17) and (18) with

θ_{j 0 | i}

replaced by

θ_{j}

. The following proposition gives this divergence and the value of the common probabilities that minimizes it.

Proposition 3.

Consider the probability mass functions p and q as defined in (17) and (18) with

θ_{j 0 | i}

replaced by

θ_{j}

. Then

\begin{matrix} d (p, q) & = & \sum_{i = 1}^{r} {n_{i} \sum_{j = 1}^{k} [θ_{j | i} \log (\frac{θ_{j | i}}{θ_{j}})]} \end{matrix}

(21)

and

\begin{matrix} θ_{j}^{★} = \arg \min_{θ_{j}} d (p, q) = \frac{\sum_{i = 1}^{r} n_{i} θ_{j | i}}{\sum_{i = 1}^{r} n_{i}} = \frac{\sum_{i = 1}^{r} n_{i} θ_{j | i}}{n} . \end{matrix}

(22)

Proof.

Equation (21) follows directly from Proposition 2 by setting

θ_{j 0 | i} = θ_{j}

. To prove (22), we use a Lagrange multiplier with the constraint

\sum_{j = 1}^{k} θ_{j} = 1

:

\begin{matrix} L = L (θ_{j}, λ) = \sum_{i = 1}^{r} {n_{i} \sum_{j = 1}^{k} [θ_{j | i} \log (\frac{θ_{j | i}}{θ_{j}})]} + λ (\sum_{j = 1}^{k} θ_{j} - 1) . \end{matrix}

Now,

\begin{matrix} \frac{\partial L}{\partial θ_{j}} = - \sum_{i = 1}^{r} n_{i} \frac{θ_{j | i}}{θ_{j}} + λ, j = 1, \dots, k . \end{matrix}

Setting

\frac{\partial L}{\partial θ_{j}} = 0

gives

\begin{matrix} θ_{j} = \frac{\sum_{i = 1}^{r} n_{i} θ_{j | i}}{λ}, j = 1, \dots, k . \end{matrix}

Summing over both sides and applying the constraint gives

λ = \sum_{j = 1}^{k} \sum_{i = 1}^{r} n_{i} θ_{j | i} = \sum_{i = 1}^{r} n_{i} \sum_{j = 1}^{k} θ_{j | i} = \sum_{i = 1}^{r} n_{i} = n .

Hence,

\begin{matrix} θ_{j}^{★} = \frac{\sum_{i = 1}^{r} n_{i} θ_{j | i}}{n} . \end{matrix}

□
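Proposition 3 can be checked numerically. The following is a minimal Python sketch, not the authors' R implementation: the function names `kl_distance` and `pooled` and the parameter values are made up for illustration. It evaluates (21) at the weighted average in (22) and at randomly chosen points of the simplex, none of which should give a smaller distance.

```python
import math
import random

def kl_distance(theta, n, theta0):
    """d(p, q) in (21): theta[i][j] = theta_{j|i}, n[i] = n_i, theta0[j] = theta_j."""
    return sum(
        n_i * sum(t * math.log(t / t0) for t, t0 in zip(row, theta0) if t > 0)
        for n_i, row in zip(n, theta)
    )

def pooled(theta, n):
    """theta_j^* in (22): average of theta_{j|i} over i, weighted by n_i."""
    total = sum(n)
    k = len(theta[0])
    return [sum(n_i * row[j] for n_i, row in zip(n, theta)) / total for j in range(k)]

# Illustrative (made-up) parameters: r = 3 samples, k = 3 categories.
theta = [[0.5, 0.2, 0.3], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]]
n = [100, 50, 150]

theta_star = pooled(theta, n)
d_min = kl_distance(theta, n, theta_star)

# Random candidates on the simplex (normalized Gamma draws) never beat theta_star.
rng = random.Random(0)
for _ in range(1000):
    g = [rng.gammavariate(1.0, 1.0) for _ in theta_star]
    candidate = [x / sum(g) for x in g]
    assert kl_distance(theta, n, candidate) >= d_min

print(theta_star, d_min)
```

A random search of this kind does not prove the result; it only guards against an algebra slip in the Lagrange multiplier argument.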

Note that

θ_{j}^{★}

is the sample-size-weighted average of

θ_{j | i}

over the r samples. Substituting

θ_{j}^{★}

into (21), we get

\begin{matrix} \hat{d} (p, q) & = & \sum_{i = 1}^{r} {n_{i} \sum_{j = 1}^{k} [θ_{j | i} \log (\frac{n θ_{j | i}}{\sum_{i = 1}^{r} n_{i} θ_{j | i}})]}, \end{matrix}

(23)

which is equal to 0 under

H_{0}^{3}

. We can also establish a connection between (23) and the test statistic

χ^{2}

in (6). By Taylor series expansion, when n is large and under

H_{0}^{3}

, we have

\begin{matrix} \hat{d} (p, q) & \approx & 0.5 \sum_{i = 1}^{r} n_{i} \sum_{j = 1}^{k} \frac{{(θ_{j | i} - \frac{\sum_{i = 1}^{r} n_{i} θ_{j | i}}{n})}^{2}}{\frac{\sum_{i = 1}^{r} n_{i} θ_{j | i}}{n}} \\ & = & 0.5 \sum_{i = 1}^{r} \sum_{j = 1}^{k} \frac{{(n_{i} θ_{j | i} - n_{i} \frac{\sum_{i = 1}^{r} n_{i} θ_{j | i}}{n})}^{2}}{n_{i} \frac{\sum_{i = 1}^{r} n_{i} θ_{j | i}}{n}} \\ & = & 0.5 \sum_{i = 1}^{r} \sum_{j = 1}^{k} \frac{{(E (Y_{j | i}) - n_{i} \frac{\sum_{i = 1}^{r} E (Y_{j | i})}{n})}^{2}}{n_{i} \frac{\sum_{i = 1}^{r} E (Y_{j | i})}{n}}, \end{matrix}

which is closely linked to

χ^{2}

. The proposed Bayesian test for

H_{0}^{3}

uses the prior described in Section 3.1. For parameters drawn from the prior distribution, we write

\begin{matrix} \hat{D} = \sum_{i = 1}^{r} {n_{i} \sum_{j = 1}^{k} [θ_{j | i} \log (\frac{n θ_{j | i}}{\sum_{i = 1}^{r} n_{i} θ_{j | i}})]} . \end{matrix}

(24)

Moreover, for the posterior distribution of

(θ_{1 | i}, \dots, θ_{k | i})

given the observed data

y_{i} = (y_{1 | i}, \dots, y_{k | i})

, we write

\begin{matrix} {\hat{D}}_{y} = \sum_{i = 1}^{r} {n_{i} \sum_{j = 1}^{k} [θ_{j | i} \log (\frac{n θ_{j | i}}{\sum_{i = 1}^{r} n_{i} θ_{j | i}})]} . \end{matrix}

(25)
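The quadratic (Taylor) approximation to the distance in (23) given earlier in this section can be checked numerically. The sketch below is an illustration in Python rather than the authors' R code; the function names and the parameter values, chosen close to homogeneity, are assumptions made for the example.

```python
import math

def pooled(theta, n):
    """Weighted average theta_j^* of theta_{j|i} over the samples, as in (22)."""
    total = sum(n)
    return [sum(n_i * row[j] for n_i, row in zip(n, theta)) / total
            for j in range(len(theta[0]))]

def d_hat(theta, n):
    """Exact distance (23): sum_i n_i sum_j theta_{j|i} log(theta_{j|i} / theta_j^*)."""
    star = pooled(theta, n)
    return sum(n_i * sum(t * math.log(t / s) for t, s in zip(row, star))
               for n_i, row in zip(n, theta))

def quadratic(theta, n):
    """Second-order approximation: 0.5 sum_i n_i sum_j (theta_{j|i} - theta_j^*)^2 / theta_j^*."""
    star = pooled(theta, n)
    return 0.5 * sum(n_i * sum((t - s) ** 2 / s for t, s in zip(row, star))
                     for n_i, row in zip(n, theta))

# Made-up parameters that deviate only slightly from homogeneity.
theta = [[0.31, 0.34, 0.35], [0.30, 0.35, 0.35], [0.29, 0.36, 0.35]]
n = [100, 100, 100]

exact = d_hat(theta, n)
approx = quadratic(theta, n)
print(exact, approx)
```

The two values agree closely when the rows of `theta` are near their weighted average, and the agreement degrades as the samples become more heterogeneous.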

The following algorithm is used to test

H_{0}^{3}

(Algorithm 3).

Algorithm 3 RB test for

H_{0}^{3}

(i): For $i = 1, \dots, r$ , generate $(θ_{1 | i}, θ_{2 | i}, \dots, θ_{k | i})$ from $Dirichlet (α_{1 | i}, α_{2 | i}, \dots, α_{k | i})$ based on the algorithm of [28] and compute $\hat{D}$ as defined in (24).
(ii): Repeat step (i) to obtain a sample of $r_{1}$ values of $\hat{D}$ .
(iii): For $i = 1, \dots, r$ , generate $(θ_{1 | i}, θ_{2 | i}, \dots, θ_{k | i})$ given the observed data $y_{i} = (y_{1 | i}, \dots, y_{k | i})$ from $Dirichlet (α_{1 | i} + y_{1 | i}, α_{2 | i} + y_{2 | i}, \dots, α_{k | i} + y_{k | i})$ and compute ${\hat{D}}_{y}$ as defined in (25).
(iv): Repeat step (iii) to obtain a sample of $r_{2}$ values of ${\hat{D}}_{y}$ .
(v): Compute the relative belief ratio and strength using Algorithm 1, but replace D and $D_{y}$ with $\hat{D}$ and ${\hat{D}}_{y}$ , respectively.
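The sampling steps of Algorithm 3 can be sketched as follows. This is an illustrative Python version using only the standard library, not the authors' R implementation. Several parts are assumptions: the Dirichlet draws use the standard normalized-Gamma construction, the function names are hypothetical, and step (v) is reduced to one assumed estimate of the relative belief ratio at zero, namely L/i_0 times the posterior probability that the distance falls below the prior i_0/L quantile. The strength is omitted.

```python
import math
import random

def dirichlet(alpha, rng):
    """One Dirichlet(alpha) draw via normalized Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def d_hat(theta, n):
    """Distance in (24)/(25); terms with theta_{j|i} = 0 contribute 0."""
    total = sum(n)
    k = len(theta[0])
    star = [sum(n_i * row[j] for n_i, row in zip(n, theta)) / total for j in range(k)]
    return sum(
        n_i * sum(t * math.log(t / s) for t, s in zip(row, star) if t > 0)
        for n_i, row in zip(n, theta)
    )

def rb_test(y, alpha, r1=10_000, r2=10_000, L=20, i0=1, seed=0):
    """Return an estimate of the relative belief ratio at zero (assumed form)."""
    rng = random.Random(seed)
    n = [sum(row) for row in y]
    # Steps (i)-(ii): r1 prior draws of D-hat.
    prior = sorted(d_hat([dirichlet(a, rng) for a in alpha], n) for _ in range(r1))
    # Steps (iii)-(iv): r2 posterior draws; the posterior is Dirichlet(alpha + y).
    post_alpha = [[a + c for a, c in zip(a_row, y_row)]
                  for a_row, y_row in zip(alpha, y)]
    post = [d_hat([dirichlet(a, rng) for a in post_alpha], n) for _ in range(r2)]
    # Step (v), assumed: compare posterior mass below the prior i0/L quantile
    # with the prior mass i0/L.
    cut = prior[max(0, math.ceil(r1 * i0 / L) - 1)]
    return (L / i0) * sum(d <= cut for d in post) / r2

# Counts from Table 5 (Example 3) with the uniform prior Dirichlet(1, 1, 1).
y = [[53, 17, 30], [10, 67, 23], [30, 30, 40]]
alpha = [[1.0, 1.0, 1.0]] * 3
rb = rb_test(y, alpha)
print(rb)
```

Values above 1 indicate evidence in favor of the null hypothesis and values below 1 evidence against it. Because the estimator and the random number stream differ from the authors' code, the printed value should not be expected to match Table 6 exactly.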

4. Examples

This section presents three examples that demonstrate the effectiveness of our approach in evaluating

H_{0}^{1}, H_{0}^{2},

and

H_{0}^{3}

. We use Algorithms 1–3, with fixed values of

L = 20

,

i_{0} = 1

, and

r_{1} = r_{2} = 10^{4}

. To further investigate the efficacy of our approach, we consider three different prior distributions: the uniform prior, Jeffreys prior, and an elicited prior based on the algorithm of [28]. Additionally, we compute the p-values using the test statistics discussed in Section 1 of this paper. The approach was implemented using R (version 4.2.1), and the code is available upon request from the corresponding author.

Example 1

(Rolling Die; [5]). We roll a die 60 times and seek to test whether it is unbiased, that is, whether

H_{0}^{1} : θ_{j} = 1 / 6

for

j = 1, \dots, k

. Table 1 presents the recorded data.

We will use a Bayesian approach to address this problem. We employ three priors: the uniform prior represented by

Dirichlet (1, 1, 1, 1, 1, 1)

, Jeffreys prior represented by

Dirichlet (0.5, 0.5, 0.5, 0.5, 0.5, 0.5)

, and the elicited prior Dirichlet (5.83, 5.83, 5.83, 5.83, 5.83, 5.83) obtained using the algorithm proposed by [28], with a lower bound of 0.05 applied to all probabilities. Note that setting the lower bound in [28] to 0 yields the uniform prior. For reference, we also report the p-value of the corresponding frequentist test. The results of our analysis are presented in Table 2. The proposed Bayesian approach, under all three priors, and the frequentist approach lead to the same conclusion. Under the uniform prior and Jeffreys prior, the distribution of the distance is more spread out and less concentrated around zero than under the elicited prior. As a result, these two priors yield higher relative belief ratios in this example. This difference is not practically significant here because we calibrate the relative belief ratio through the strength. See Figure 1.

Example 2

(Operation Trial; [5]). In a system consisting of four independent components, let

θ_{1 | i}

denote the probability of successful operation of the ith component and

θ_{2 | i} = 1 - θ_{1 | i}

its probability of failure,

i = 1, 2, 3, 4

. We will test the null hypothesis

H_{0}^{2} : θ_{1 | 1} = 0.9, θ_{2 | 1} = 0.1, θ_{1 | 2} = 0.9, θ_{2 | 2} = 0.1, θ_{1 | 3} = 0.8

,

θ_{2 | 3} = 0.2

,

θ_{1 | 4} = 0.8

,

θ_{2 | 4} = 0.2

, given that in 50 trials of each component, the components operated as shown in Table 3.

We use the priors:

Dirichlet (1, 1)

,

Dirichlet (0.5, 0.5)

, and

Dirichlet (33.38, 5.62)

. We obtain the latter prior using the algorithm of [28], with lower bounds of

θ_{1 | i} = 0.7

and

θ_{2 | i} = 0.1

for all

i = 1, 2, 3, 4

. Table 4 displays the results of our analysis. As in Example 1, the uniform prior and Jeffreys prior are less concentrated around zero than the elicited prior. As a result, they lead to a notably different conclusion from that of the elicited prior and of the p-value calculated using the chi-square test, both of which indicate evidence against the null hypothesis. See also Figure 2.
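The frequentist p-value for this example can be reproduced approximately with a short calculation. The sketch below is in Python rather than the authors' R, and it assumes the Pearson chi-square statistic with 4 degrees of freedom (one per component, since the null hypothesis is completely specified). The helper name `chi2_sf_df4` is hypothetical; it uses the closed form of the chi-square upper tail for 4 degrees of freedom.

```python
import math

def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

# Table 3: (successes, failures) for each of the four components.
observed = [(40, 10), (48, 2), (45, 5), (40, 10)]
# Null values (theta_{1|i}, theta_{2|i}) for each component.
null = [(0.9, 0.1), (0.9, 0.1), (0.8, 0.2), (0.8, 0.2)]

stat = 0.0
for counts, probs in zip(observed, null):
    n_i = sum(counts)
    for y, p in zip(counts, probs):
        expected = n_i * p
        stat += (y - expected) ** 2 / expected

p_value = chi2_sf_df4(stat)
print(round(stat, 2), round(p_value, 3))
```

Under these assumptions the p-value rounds to 0.030, in agreement with the value reported in Table 4.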

Example 3

(Clinical Trial; [37]). A study was performed to determine whether the type of cancer differed between blue-collar, white-collar, and unemployed workers. A sample of 100 of each type of worker diagnosed as having cancer was categorized into one of three types of cancer. The results are shown in Table 5. See also Table 12.6 of [37]. The hypothesis to be tested is that the proportions of the three cancer types are the same for all three occupation groups. That is,

H_{0}^{3} : θ_{j | 1} = θ_{j | 2} = θ_{j | 3}

for all j (types of cancer), where

θ_{j | i}

is the probability that a cancer patient in occupation group i has cancer type j.

Similar to the previous two examples, we utilize the uniform prior

Dirichlet (1, 1, 1)

, Jeffreys prior

Dirichlet (0.5, 0.5, 0.5)

, and the elicited prior

Dirichlet (3, 3, 3)

. We obtained the elicited prior using the algorithm of [28] by setting a lower bound of

0.05

for all probabilities. Table 6 summarizes the results of our analysis. Similar to the previous examples, Jeffreys prior is not sufficiently concentrated around zero, which makes it inefficient when there is evidence against

H_{0}^{3}

. See Figure 3.
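For comparison, the frequentist homogeneity test for the data in Table 5 can be sketched as follows. This is an illustrative Python calculation, not the authors' R code. It assumes the usual Pearson chi-square statistic with expected counts based on the pooled proportions and (3 - 1)(3 - 1) = 4 degrees of freedom; the helper name `chi2_sf_df4` is hypothetical.

```python
import math

def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

# Table 5: rows are occupations, columns are cancer types (lung, stomach, other).
y = [[53, 17, 30], [10, 67, 23], [30, 30, 40]]

row_totals = [sum(row) for row in y]
col_totals = [sum(col) for col in zip(*y)]
n = sum(row_totals)

stat = 0.0
for i, row in enumerate(y):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n  # n_i times pooled proportion
        stat += (count - expected) ** 2 / expected

p_value = chi2_sf_df4(stat)
print(stat, p_value)
```

The resulting p-value is far below 0.001, which is consistent with the entry 0.000 in Table 6.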

5. Concluding Remarks

This study presents a Bayesian method for testing hypotheses related to multinomial distributions. Our approach involves calculating the Kullback–Leibler divergence between two multinomial distributions and comparing the change in distance from the prior to the posterior through the relative belief ratio. To specify the prior distributions, we employ a prior elicitation algorithm. We recommend avoiding the use of Jeffreys prior or the uniform prior unless there is a valid reason to use them. Through several examples, we demonstrated the effectiveness of our approach. Future research may expand our approach to include testing for independence and other related cases.

Author Contributions

Methodology, L.A.-L. and P.C.; Software, P.C., M.D. and K.L.; Resources, K.L.; Writing—review & editing, M.D.; Supervision, L.A.-L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

There are no data associated with this paper.

Acknowledgments

We would like to express our sincere gratitude to the Editor and the anonymous referees for their valuable and constructive comments. Their insightful feedback and suggestions have greatly contributed to the improvement of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Agresti, A. An Introduction to Categorical Data Analysis, 2nd ed.; Wiley: Hoboken, NJ, USA, 2007.
2. Hogg, R.V.; McKean, J.W.; Craig, A.T. Introduction to Mathematical Statistics, 8th ed.; Pearson: Boston, MA, USA, 2019.
3. Pearson, K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos. Mag. 1900, 50, 157–175.
4. Rice, J.A. Mathematical Statistics and Data Analysis, 3rd ed.; Brooks/Cole: Belmont, CA, USA, 2007.
5. Bain, L.J.; Engelhardt, M. Introduction to Probability and Mathematical Statistics; Duxbury Press: North Scituate, MA, USA, 1992.
6. Frey, J. An exact multinomial test for equivalence. Can. J. Stat. 2009, 37, 47–59.
7. Ostrovski, V. Testing equivalence of multinomial distributions. Stat. Probab. Lett. 2017, 124, 77–82.
8. Alba-Fernández, M.V.; Jiménez-Gamero, M.D. Equivalence Tests for Multinomial Data Based on ϕ-Divergences. In Trends in Mathematical, Information and Data Sciences. Studies in Systems, Decision and Control; Balakrishnan, N., Gil, M.Á., Martín, N., Morales, D., Pardo, M.d.C., Eds.; Springer: Cham, Switzerland, 2023; Volume 445.
9. Good, I.J. The population frequencies of species and the estimation of population parameters. Biometrika 1953, 40, 237–264.
10. Good, I.J. On the estimation of small frequencies in contingency tables. J. R. Stat. Soc. Ser. B 1956, 18, 113–124.
11. Good, I.J. The Estimation of Probabilities: An Essay on Modern Bayesian Methods; MIT Press: Cambridge, MA, USA, 1965.
12. Good, I.J. A Bayesian significance test for multinomial distributions (with Discussion). J. R. Stat. Soc. Ser. B 1967, 29, 399–431.
13. Lindley, D.V. The Bayesian analysis of contingency tables. Ann. Math. Stat. 1964, 35, 1622–1643.
14. Altham, P.M.E. Exact Bayesian analysis of a 2 × 2 contingency table, and Fisher’s “exact” significance test. J. R. Stat. Soc. Ser. B 1969, 31, 261–269.
15. Altham, P.M.E. The analysis of matched proportions. Biometrika 1971, 58, 561–576.
16. Geisser, S. On prior distributions for binary trials. Am. Stat. 1984, 38, 244–247.
17. Bernardo, J.M.; Ramón, J.M. An introduction to Bayesian reference analysis: Inference on the ratio of multinomial parameters. Statistician 1998, 47, 101–135.
18. Bernardo, J.M.; Smith, A.F.M. Bayesian Theory; Wiley: Hoboken, NJ, USA, 1994.
19. Hashemi, L.; Nandram, B.; Goldberg, R. Bayesian analysis for a single 2 × 2 table. Stat. Med. 1997, 16, 1311–1328.
20. Nurminen, M.; Mutanen, P. Exact Bayesian analysis of two proportions. Scand. J. Stat. 1987, 14, 67–77.
21. Agresti, A.; Min, Y. Frequentist performance of Bayesian confidence intervals for comparing proportions in 2 × 2 contingency tables. Biometrics 2005, 61, 515–523.
22. Agresti, A.; Hitchcock, D.B. Bayesian inference for categorical data analysis. Stat. Methods Appl. 2005, 14, 297–330.
23. Leonard, T.; Hsu, J.S.J. The Bayesian Analysis of Categorical Data—A Selective Review. In Aspects of Uncertainty; Freeman, P.R., Smith, A.F.M., Eds.; A Tribute to D. V. Lindley; Wiley: New York, NY, USA, 1994; pp. 283–310.
24. Carota, C. A family of power-divergence diagnostics for goodness-of-fit. Can. J. Stat. 2007, 35, 549–561.
25. Kim, M.; Nandram, B.; Kim, D.H. Nonparametric Bayesian test of homogeneity using a discretization approach. J. Korean Data Inf. Sci. Soc. 2018, 29, 303–311.
26. Quintana, F.A. Nonparametric Bayesian analysis for assessing homogeneity in k × l contingency tables with fixed right margin totals. J. Am. Stat. Assoc. 1998, 93, 1140–1149.
27. Al-Labadi, L.; Cheng, Y.; Fazeli-Asl, F.; Lim, K.; Weng, Y. A Bayesian one-sample test for proportion. Stats 2022, 5, 1242–1253.
28. Evans, M.; Guttman, I.; Li, P. Prior elicitation, assessment and inference with a Dirichlet prior. Entropy 2017, 19, 564.
29. Evans, M. Measuring Statistical Evidence Using Relative Belief; Monographs on Statistics and Applied Probability 144; Taylor & Francis Group, CRC Press: Boca Raton, FL, USA, 2015.
30. Abdelrazeq, I.; Al-Labadi, L.; Alzaatreh, A. On one-sample Bayesian tests for the mean. Statistics 2020, 54, 424–440.
31. Al-Labadi, L. The two-sample problem via relative belief ratio. Comput. Stat. 2021, 36, 1791–1808.
32. Al-Labadi, L.; Berry, S. Bayesian estimation of extropy and goodness of fit tests. J. Appl. Stat. 2020, 49, 357–370.
33. Al-Labadi, L.; Evans, M. Optimal robustness results for relative belief inferences and the relationship to prior-data conflict. Bayesian Anal. 2017, 12, 705–728.
34. Al-Labadi, L.; Evans, M. Prior-based model checking. Can. J. Stat. 2018, 46, 380–398.
35. Al-Labadi, L.; Patel, V.; Vakiloroayaei, K.; Wan, C. Kullback–Leibler divergence for Bayesian nonparametric model checking. J. Korean Stat. Soc. 2020, 50, 272–289.
36. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: Hoboken, NJ, USA, 2006.
37. Freund, R.J.; Wilson, W.J.; Mohr, D.L. Statistical Methods, 3rd ed.; Academic Press: Cambridge, MA, USA, 2010.

Figure 1. Density plot of distances in Example 1.

Figure 2. Density plot of distances in Example 2.

Figure 3. Density plot of distances in Example 3.

Table 1. Data of Example 1.

Face	1	2	3	4	5	6	Total
Observed	8	11	5	12	15	6	60

Table 2. The RB and its strength (Str) for Example 1.

Prior	RB (Strength)	Decision
Uniform	15.125 (1)	Strong evidence in favor of $H_{0}^{1}$
Jeffreys	19.824 (1)	Strong evidence in favor of $H_{0}^{1}$
Evans et al.	1.900 (1)	Strong evidence in favor of $H_{0}^{1}$
p-value	0.3027	Fail to reject $H_{0}^{1}$ at $α = 0.05$

Table 3. Data of Example 2.

Component	Successful	Failure
1	40	10
2	48	2
3	45	5
4	40	10

Table 4. The RB and its strength (Str) for Example 2.

Prior	RB (Strength)	Decision
Uniform	20 (1)	Strong evidence in favor of $H_{0}^{2}$
Jeffreys	19.998 (0.000)	Weak evidence in favor of $H_{0}^{2}$
Evans et al.	0.592 (0.047)	Strong evidence against $H_{0}^{2}$
p-value	0.030	Reject $H_{0}^{2}$ at $α = 0.05$

Table 5. Data of Example 3.

Occupation	Type of cancer: Lung	Type of cancer: Stomach	Type of cancer: Other	Total
Blue collar	53	17	30	100
White collar	10	67	23	100
Unemployed	30	30	40	100

Table 6. The RB and its strength (Str) for Example 3.

Prior	RB (Strength)	Decision
Uniform	0.102 (0.010)	Strong evidence against $H_{0}^{3}$
Jeffreys	1.874 (0.129)	Weak evidence in favor of $H_{0}^{3}$
Evans et al.	0.012 (0.000)	Strong evidence against $H_{0}^{3}$
p-value	0.000	Reject $H_{0}^{3}$ at $α = 0.05$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
