1. Introduction
A contingency table with identical categories for rows and columns can be produced when a categorical variable is repeatedly measured. Observations in this type of table tend to concentrate on the cells along the main diagonal. Our research focuses on applying symmetry instead of assuming independence between row and column categories. Several studies have addressed symmetry issues, such as [1,2,3,4,5,6,7,8,9].
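Let $X$ and $Y$ represent the row and column variables for an $r \times r$ contingency table with ordinal categories. Additionally, let $\pi_{ij}$ represent the probability of an observation falling into the $(i,j)$th cell, where $i = 1, \ldots, r$ and $j = 1, \ldots, r$. The diagonals–parameter symmetry (DPS) model proposed by Goodman [10] is defined as follows.
$$\pi_{ij} = \begin{cases} \delta_{j-i}\,\psi_{ij} & (i < j), \\ \psi_{ij} & (i \geq j), \end{cases} \tag{1}$$
where $\psi_{ij} = \psi_{ji}$ and $\delta_k > 0$. The parameter $\delta_k$ in the DPS model represents the odds of an observation falling into cells $(i,j)$ where $j - i = k$ and $i < j$, rather than cells $(j,i)$ for $k = 1, \ldots, r-1$. Moreover, the ratio between $\pi_{ij}$ and $\pi_{ji}$ can be expressed as the constant $\delta_k$ for $i < j$ and $j - i = k$. This ratio depends solely on the distance from the main diagonal cells.
When $\delta_1 = \cdots = \delta_{r-1} = 1$ in Equation (1), the DPS model reduces to the symmetry (S) model proposed by Bowker [1]. When $\delta_{j-i}$ is independent of $i$ and $j$ in Equation (1), with $\delta_1 = \cdots = \delta_{r-1} = \delta$, the DPS model reduces to the conditional symmetry (CS) model proposed by McCullagh [11].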
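As a minimal illustration of the DPS structure, the following Python sketch builds a probability table from a symmetric array `psi` and a list of diagonal parameters `delta`, where `delta[k - 1]` multiplies the cells at distance $k$ above the main diagonal. The function name `dps_table` and the 3 × 3 input values are hypothetical and chosen only for illustration.

```python
def dps_table(psi, delta):
    """Return pi_ij = delta_{j-i} * psi_ij (i < j) and psi_ij (i >= j), normalized to sum to 1."""
    r = len(psi)
    if len(delta) != r - 1:
        raise ValueError("delta must have r - 1 entries")
    pi = [[0.0] * r for _ in range(r)]
    for i in range(r):
        for j in range(r):
            if abs(psi[i][j] - psi[j][i]) > 1e-12:
                raise ValueError("psi must be symmetric")
            # Only cells above the main diagonal are rescaled.
            scale = delta[j - i - 1] if i < j else 1.0
            pi[i][j] = scale * psi[i][j]
    total = sum(map(sum, pi))
    return [[value / total for value in row] for row in pi]


# Hypothetical inputs (not taken from any dataset).
psi = [[4.0, 2.0, 1.0],
       [2.0, 5.0, 2.0],
       [1.0, 2.0, 3.0]]
delta = [1.5, 0.5]
pi = dps_table(psi, delta)

# The ratio pi_ij / pi_ji depends only on the distance j - i.
ratios = {(i, j): pi[i][j] / pi[j][i] for i in range(3) for j in range(i + 1, 3)}
```

Setting every entry of `delta` to 1 returns a symmetric table (the S model), and setting all entries to a common value gives the CS model.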
Using the $f$-divergence, Kateri and Papaioannou [2] proposed the generalized DPS (DPS[$f$]) model, defined as
where
,
,
and
. It should be noted that the function
f is twice-differentiable and strictly convex. Additionally,
,
,
,
, and
. The model derivation is not included in their paper. They noted that, under some conditions, the DPS model is the closest to symmetry in terms of the Kullback–Leibler distance and that the DPS[$f$] model is equivalent to the DPS model. In this study, we derive the DPS[$f$] model and prove the relation between the two models. This result yields various interpretations of the DPS model. We also discuss a necessary and sufficient condition for the S model and a property of the goodness-of-fit test statistics.
The paper is organized as follows: Section 2 derives Equation (2), interprets the model from an information theory viewpoint, and proves that the DPS[$f$] model is equivalent to the DPS model regardless of the function $f$. Section 3 considers a new model and proves that the proposed model and the DPS[$f$] model are separable. A numerical example is provided in Section 4. Finally, Section 5 summarizes the paper.
2. Properties of the DPS[f] Model
Kateri and Papaioannou [2] noted that the DPS[$f$] model is the closest model to the S model in terms of the $f$-divergence under the conditions where $\sum\sum_{j-i=k}\pi_{ij}$ (and $\sum\sum_{j-i=k}\pi_{ji}$) for $k = 1, \ldots, r-1$ and the sums $\pi_{ij} + \pi_{ji}$ for $i < j$ are given. Similar research has been conducted in, for example, Ireland et al. [12], Kateri and Agresti [3], and Tahata [5]. This section derives the DPS[$f$] model and describes its properties.
We obtain the following theorem; its proof is given in Appendix A.1.
Theorem 1. In the class of models with given , , and , the modelwith , and , is the model closest to the complete symmetry model in terms of the f-divergence. The DPS
model can be expressed as
where
,
and
. It should be noted that $\pi^c_{ij} = \pi_{ij}/(\pi_{ij} + \pi_{ji})$ represents the conditional probability of an observation falling in the $(i,j)$ cell, given that it falls in either the $(i,j)$ cell or the $(j,i)$ cell. Namely, the DPS[$f$] model indicates that
When , the DPS[f] model is reduced to the S model.
If $f(x) = x \log x$, $x > 0$, then the
f-divergence is reduced to the KL divergence. When we set
, Equation (
3) is reduced to
where
and
. We shall refer to this model as the DPS$_{\mathrm{KL}}$ model. Under the DPS$_{\mathrm{KL}}$ model, the ratios of $\pi_{ij}$ to $\pi_{ji}$ for $i < j$ are expressed as
where
and
. Since Equation (5) indicates that the ratio of $\pi_{ij}$ to $\pi_{ji}$ depends only on the distance $j - i$, the DPS$_{\mathrm{KL}}$ model is equivalent to the DPS model proposed by Goodman [10]. Namely, the DPS model is the closest model to the S model in terms of the KL divergence under the conditions where
,
, and the sums
for
are given. This is a special case of Theorem 1.
If $f(x) = -\log x$, $x > 0$, then the $f$-divergence is reduced to the reverse KL divergence. Then, the DPS[$f$] model is reduced to
where
and
. We shall refer to this model as the DPS$_{\mathrm{RKL}}$ model. This model is the closest to the S model when the divergence is measured by the reverse KL divergence and can be expressed as
where
and
. This model indicates that the difference between the inverse conditional probabilities $1/\pi^c_{ij}$ and $1/\pi^c_{ji}$ depends only on the distance $j - i$.
If
, then the
$f$-divergence is reduced to the $\chi^2$-divergence (Pearsonian distance). Then, the DPS[$f$] model is reduced to
where
and
. We shall refer to this model as the DPS$_{\mathrm{P}}$ model. This model is the closest to the S model when the divergence is measured by the $\chi^2$-divergence and can be expressed as
where
and
. This model indicates that the difference between the conditional probabilities $\pi^c_{ij}$ and $\pi^c_{ji}$ depends only on the distance $j - i$.
Moreover, if $f(x) = (x^{\lambda+1} - x)/\{\lambda(\lambda+1)\}$, $x > 0$, where $\lambda$ is a real-valued parameter, then the $f$-divergence is reduced to the power-divergence [13]. Then, the DPS[$f$] model is reduced to
where
and
. We shall refer to this model as the DPS$_{\mathrm{PD}(\lambda)}$ model. This model is the closest to the S model when the divergence is measured by the power-divergence and can be expressed as
where
and
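. This model indicates that the difference between the symmetric conditional probabilities $\pi^c_{ij}$ and $\pi^c_{ji}$, each raised to the power of $\lambda$, depends only on the distance $j - i$. When we apply the DPS$_{\mathrm{PD}(\lambda)}$ model, the value of $\lambda$ must be specified in advance.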
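The divergences named in this section can be compared numerically with a short Python sketch. It assumes the $f$-divergence convention $I(p : q) = \sum_i q_i f(p_i / q_i)$ and measures how far a hypothetical 3 × 3 table lies from its symmetrized version. The generator functions below are the standard textbook choices; in particular, the scaling $f(x) = (x-1)^2/2$ for the Pearsonian case is an assumption, and all function names are illustrative.

```python
import math


def f_divergence(p, q, f):
    """Return sum_i q_i * f(p_i / q_i) for strictly positive p_i and q_i."""
    return sum(q_i * f(p_i / q_i) for p_i, q_i in zip(p, q))


def f_kl(x):
    return x * math.log(x)        # Kullback-Leibler divergence


def f_reverse_kl(x):
    return -math.log(x)           # reverse Kullback-Leibler divergence


def f_pearson(x):
    return 0.5 * (x - 1.0) ** 2   # Pearsonian distance (assumed scaling)


def f_power(lam):
    """Power-divergence generator for lam not in {0, -1}."""
    def f(x):
        return (x ** (lam + 1.0) - x) / (lam * (lam + 1.0))
    return f


# Hypothetical 3 x 3 probability table, stored row by row.
r = 3
pi = [0.20, 0.15, 0.05,
      0.05, 0.20, 0.15,
      0.05, 0.05, 0.10]
# Symmetrized table pi^S_ij = (pi_ij + pi_ji) / 2.
pi_s = [(pi[i * r + j] + pi[j * r + i]) / 2.0 for i in range(r) for j in range(r)]

divergences = {
    "KL": f_divergence(pi, pi_s, f_kl),
    "reverse KL": f_divergence(pi, pi_s, f_reverse_kl),
    "Pearson": f_divergence(pi, pi_s, f_pearson),
    "power (lambda = 3)": f_divergence(pi, pi_s, f_power(3.0)),
}
```

Each value is zero when the table is symmetric and positive otherwise, so any of them can serve as the distance from symmetry that is minimized under the corresponding model.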
Kateri and Papaioannou [2] reported that the DPS[$f$] model is equivalent to the DPS model regardless of $f$. That is, perhaps surprisingly, all the models described above (i.e., DPS$_{\mathrm{KL}}$, DPS$_{\mathrm{RKL}}$, DPS$_{\mathrm{P}}$, and DPS$_{\mathrm{PD}(\lambda)}$) are equivalent to the DPS model. However, the proof was not given. We prove the following theorem.
Theorem 2. The DPS[f] model is equivalent to the DPS model regardless of f.
The proof is given in Appendix A.2. Theorem 2 states that the DPS model holds if and only if the DPS[$f$] model holds. Hence, if the DPS model fits a given dataset, the data admit each of the interpretations described above.
When
, the DPS[$f$] model is reduced to the conditional symmetry model based on the $f$-divergence (the CS[$f$] model). The CS[$f$] model was described previously by Kateri and Papaioannou [2]. Additionally, Fujisawa and Tahata [14] proposed a generalization of the CS[$f$] model. Similarly, when $\delta_1 = \cdots = \delta_{r-1}$, the DPS model is reduced to the CS model proposed by McCullagh [11]. The CS[$f$] model is equivalent to the CS model regardless of $f$ (Kateri and Papaioannou [2]). Hence, Theorem 2 leads to the following result.
Corollary 1. The CS[f] model is equivalent to the CS model regardless of f.
3. Equivalence Conditions for Symmetry
Here, the equivalence conditions of the S model are discussed. If the S model holds, then the DPS[
f] model with
holds. Conversely, if the DPS[
$f$] model holds, then the S model does not generally hold. Therefore, we seek an additional condition under which the DPS[$f$] model implies the S model. Other studies have discussed such conditions; see Read [15] and Tahata et al. [16].
We consider the distance global symmetry (DGS) model defined as
$$\sum\sum_{j-i=k}\pi_{ij} = \sum\sum_{j-i=k}\pi_{ji} \quad (k = 1, \ldots, r-1).$$
For each $k$, this model indicates that the sum of the probabilities in the cells at distance $k$ above the main diagonal is equal to the sum of the probabilities in the cells at distance $k$ below the main diagonal. We obtain the following theorem. (The proof is given in Appendix A.3.)
Theorem 3. The S model holds if and only if both the DPS[f] and DGS models hold.
Next, we consider the global symmetry (GS) model, which is defined as
$$\sum\sum_{i<j}\pi_{ij} = \sum\sum_{i>j}\pi_{ij}.$$
It should be noted that the DGS model implies the GS model. Read [15] noted that the S model holds if and only if both the CS and GS models hold. Fujisawa and Tahata [14] proved that the S model holds if and only if both the CS[$f$] and GS models hold. By Corollary 1, these two statements are equivalent. In addition, a refined estimator for measures associated with the S, CS, and GS models was introduced in [17]. That result is closely connected to decomposing the S model and partitioning its goodness-of-fit test statistic. According to Corollary 1, the refined estimator for the measure of CS can also be used to gauge the extent of deviation from the CS[$f$] model.
This section proves the separation of the test statistic for the S model into those for the DPS[$f$] model and the DGS model. Consider an $r \times r$ square contingency table in which $n_{ij}$ denotes the observed frequency in the $(i,j)$ cell. Assume that the table follows a multinomial distribution. Let $m_{ij}$ represent the expected frequency in the $(i,j)$ cell, and let $\hat{m}_{ij}$ be its maximum likelihood estimate under a specified model. To test each model's goodness of fit, we can employ the likelihood ratio chi-square statistic, denoted by $G^2$, which is computed as
$$G^2 = 2\sum_{i=1}^{r}\sum_{j=1}^{r} n_{ij}\log\left(\frac{n_{ij}}{\hat{m}_{ij}}\right).$$
Under the model, this statistic asymptotically follows a chi-square distribution with the corresponding degrees of freedom (df).
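The likelihood ratio chi-square statistic can be sketched in Python as follows. The example fits the S model, whose fitted values are the averages of symmetric cell pairs, to a hypothetical 3 × 3 table of counts. The function names `g2` and `fit_symmetry` and the counts are illustrative assumptions, and cells with zero observed frequency are taken to contribute zero to the sum.

```python
import math


def g2(n, m_hat):
    """Likelihood ratio chi-square statistic 2 * sum n_ij * log(n_ij / m_hat_ij)."""
    total = 0.0
    for row_n, row_m in zip(n, m_hat):
        for n_ij, m_ij in zip(row_n, row_m):
            if n_ij > 0:  # a cell with n_ij = 0 contributes nothing
                total += n_ij * math.log(n_ij / m_ij)
    return 2.0 * total


def fit_symmetry(n):
    """Fitted values under the S model: m_hat_ij = (n_ij + n_ji) / 2."""
    r = len(n)
    return [[(n[i][j] + n[j][i]) / 2.0 for j in range(r)] for i in range(r)]


# Hypothetical counts (not the data analyzed in this paper).
n = [[10, 6, 2],
     [3, 12, 5],
     [1, 4, 8]]
g2_s = g2(n, fit_symmetry(n))
df_s = len(n) * (len(n) - 1) // 2  # r(r - 1)/2 degrees of freedom for the S model
```

The statistic `g2_s` would then be compared with a chi-square distribution with `df_s` degrees of freedom.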
Suppose that model M$_3$ holds if and only if both models M$_1$ and M$_2$ hold. In this case, if the analyst finds hypothesis M$_3$ unacceptable, attention moves to examining the components M$_1$ and M$_2$. For these three models, Aitchison [18] discussed the properties of the Wald test statistics, and Darroch and Silvey [19] described the properties of the likelihood ratio chi-square statistics. Assume that the following equivalence holds:
$$T(\mathrm{M}_3) = T(\mathrm{M}_1) + T(\mathrm{M}_2), \tag{6}$$
where $T$ is the goodness-of-fit test statistic and the number of df for M$_3$ is equal to the sum of the numbers of df for M$_1$ and M$_2$. If both M$_1$ and M$_2$ are accepted with a high probability (at the $\alpha$ significance level), then M$_3$ is accepted. However, when Equation (6) does not hold, an incompatible situation may arise in which both M$_1$ and M$_2$ are accepted with a high probability but M$_3$ is rejected. In fact, Darroch and Silvey [19] gave such an example. The partitions of chi-squared test statistics are also discussed in, for example, [20,21].
From Theorem 3, the S model holds if and only if both the DPS[$f$] model and the DGS model hold. In addition, the df for the DPS[$f$] model is $(r-1)(r-2)/2$ and that for the DGS model is $r-1$. Their sum, $r(r-1)/2$, equals the df for the S model. Thus, we consider partitioning the test statistics.
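Theorem 2 confirms that the DPS[$f$] model is equivalent to the DPS model. Therefore, the maximum likelihood estimates (MLEs) under the DPS[$f$] model are given by
$$\hat{m}_{ij} = \begin{cases} \dfrac{n^{U}_{k}}{n^{U}_{k} + n^{L}_{k}}\,(n_{ij} + n_{ji}) & (i < j), \\ n_{ii} & (i = j), \\ \dfrac{n^{L}_{k}}{n^{U}_{k} + n^{L}_{k}}\,(n_{ij} + n_{ji}) & (i > j), \end{cases} \tag{7}$$
where $k = |j - i|$, $n^{U}_{k} = \sum\sum_{t-s=k} n_{st}$, and $n^{L}_{k} = \sum\sum_{s-t=k} n_{st}$ (Goodman [10]).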
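The closed-form estimates under the DPS model can be sketched in Python. The sketch assumes the standard closed form for Goodman's model: each symmetric pair total $n_{ij} + n_{ji}$ is split in proportion to the observed totals on the corresponding upper and lower diagonals, and $\hat{\delta}_k$ is the ratio of those two totals. The function name `fit_dps` and the counts are hypothetical, and every lower-diagonal total is assumed to be positive.

```python
def fit_dps(n):
    """Return (m_hat, delta_hat) for the DPS model fitted to a square table of counts."""
    r = len(n)
    m_hat = [[float(n[i][j]) for j in range(r)] for i in range(r)]  # diagonal cells fit exactly
    delta_hat = []
    for k in range(1, r):
        upper = sum(n[i][i + k] for i in range(r - k))  # total at distance k above the diagonal
        lower = sum(n[i + k][i] for i in range(r - k))  # total at distance k below the diagonal
        delta_hat.append(upper / lower)
        for i in range(r - k):
            pair = n[i][i + k] + n[i + k][i]
            m_hat[i][i + k] = pair * upper / (upper + lower)
            m_hat[i + k][i] = pair * lower / (upper + lower)
    return m_hat, delta_hat


# Hypothetical counts (not the data analyzed in this paper).
n = [[10, 6, 2],
     [3, 12, 5],
     [1, 4, 8]]
m_hat, delta_hat = fit_dps(n)
```

By construction, the fitted ratio `m_hat[i][j] / m_hat[j][i]` equals `delta_hat[j - i - 1]` for every pair with `i < j`, and the fit preserves both the pair totals and the diagonal-band totals.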
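Next, we consider the MLEs under the DGS model using the Lagrange function. Since the kernel of the log likelihood is $\sum\sum n_{ij}\log\pi_{ij}$, the Lagrange function $L$ is written as
$$L = \sum_{i=1}^{r}\sum_{j=1}^{r} n_{ij}\log\pi_{ij} - \mu\left(\sum_{i=1}^{r}\sum_{j=1}^{r}\pi_{ij} - 1\right) - \sum_{k=1}^{r-1}\phi_k\left(\sum\sum_{j-i=k}\pi_{ij} - \sum\sum_{j-i=k}\pi_{ji}\right).$$
Equating the derivatives of $L$ with respect to $\pi_{ij}$, $\mu$, and $\phi_k$ to 0 gives
$$\hat{m}_{ij} = \begin{cases} \dfrac{n^{U}_{k} + n^{L}_{k}}{2 n^{U}_{k}}\, n_{ij} & (i < j), \\ n_{ii} & (i = j), \\ \dfrac{n^{U}_{k} + n^{L}_{k}}{2 n^{L}_{k}}\, n_{ij} & (i > j), \end{cases} \tag{8}$$
where $k = |j - i|$, and $n^{U}_{k}$ and $n^{L}_{k}$ denote the sums of the observed frequencies at distance $k$ above and below the main diagonal, respectively. It is important to note that the DPS and DGS models are not invariant under permutations of the row and column categories. Therefore, these models should be used only with ordinal categorical data.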
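A sketch of the DGS fit in Python follows. It assumes that the DGS constraint equates, for each distance $k$, the total probability above the main diagonal with the total below it, in which case solving the Lagrange conditions rescales each band of counts to half of the combined band total. The function name `fit_dgs` and the counts are hypothetical, and all band totals are assumed to be positive.

```python
def fit_dgs(n):
    """Fitted values under the DGS model for a square table of counts."""
    r = len(n)
    m_hat = [[float(n[i][j]) for j in range(r)] for i in range(r)]  # diagonal cells fit exactly
    for k in range(1, r):
        upper = sum(n[i][i + k] for i in range(r - k))
        lower = sum(n[i + k][i] for i in range(r - k))
        half = (upper + lower) / 2.0  # common fitted total for both bands
        for i in range(r - k):
            m_hat[i][i + k] = n[i][i + k] * half / upper
            m_hat[i + k][i] = n[i + k][i] * half / lower
    return m_hat


# Hypothetical counts (not the data analyzed in this paper).
n = [[10, 6, 2],
     [3, 12, 5],
     [1, 4, 8]]
m_dgs = fit_dgs(n)
```

Within each band the fitted values keep the observed proportions, so only the totals of the two bands at each distance are equalized.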
We obtain the following equivalence from Equations (7) and (8):
$$G^2(\mathrm{S}) = G^2(\mathrm{DPS}[f]) + G^2(\mathrm{DGS}),$$
because the MLEs under the S model are $\hat{m}_{ij} = (n_{ij} + n_{ji})/2$. Therefore, the DPS[$f$] model and the DGS model are separable and exhibit independence.
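The partition of the likelihood ratio statistic can be checked numerically. The Python sketch below computes $G^2$ for the S, DPS, and DGS models directly from the diagonal-band totals, using the closed-form fitted values of each model (pair averages for S, pair totals split by band proportions for DPS, and bands rescaled to a common total for DGS). The 4 × 4 counts are hypothetical and are not the data of Table 1; all band totals are assumed to be positive.

```python
import math


def _term(observed, fitted):
    """Contribution 2 * n * log(n / m_hat) of a single cell (zero when n = 0)."""
    return 2.0 * observed * math.log(observed / fitted) if observed > 0 else 0.0


def g2_partition(n):
    """Return G^2 for the S, DPS, and DGS models fitted to a square table of counts."""
    r = len(n)
    g2 = {"S": 0.0, "DPS": 0.0, "DGS": 0.0}
    # Diagonal cells are fitted exactly by all three models and contribute zero.
    for k in range(1, r):
        cells = [(i, i + k) for i in range(r - k)]
        upper = sum(n[i][j] for i, j in cells)
        lower = sum(n[j][i] for i, j in cells)
        band = upper + lower
        for i, j in cells:
            pair = n[i][j] + n[j][i]
            fitted = {
                "S": (pair / 2.0, pair / 2.0),
                "DPS": (pair * upper / band, pair * lower / band),
                "DGS": (n[i][j] * band / (2.0 * upper), n[j][i] * band / (2.0 * lower)),
            }
            for model, (m_up, m_low) in fitted.items():
                g2[model] += _term(n[i][j], m_up) + _term(n[j][i], m_low)
    return g2


# Hypothetical 4 x 4 counts.
n = [[20, 9, 4, 2],
     [5, 25, 8, 3],
     [2, 6, 30, 7],
     [1, 2, 4, 22]]
stats = g2_partition(n)
gap = stats["S"] - (stats["DPS"] + stats["DGS"])  # zero up to rounding error
```

The corresponding degrees of freedom for this table size are 6 for the S model, 3 for the DPS model, and 3 for the DGS model, so the partition holds for the df as well.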
Let $W(\mathrm{M})$ denote the Wald statistic for model M. We obtain the following theorem, which is proved in Appendix A.4.
Theorem 4. $W(\mathrm{S})$ is equal to the sum of $W(\mathrm{DPS}[f])$ and $W(\mathrm{DGS})$.
4. Numerical Example
Table 1, which is taken from Smith et al. [22], summarizes the opinions of 871 respondents on how much influence religious leaders and medical leaders should have on government funding decisions for stem cell research. The influence levels are divided into four categories: (1) Great influence, (2) Some influence, (3) A little influence, and (4) No influence.
The values of the likelihood ratio chi-square statistics $G^2$ and the corresponding $p$ values for the models applied to these data are shown in Table 2.
Table 2 indicates that the sum of the test statistics for the DPS (i.e., DPS[$f$]) model and the DGS model is equal to that for the S model. The S model fits the data very poorly. We can infer that the marginal distribution for religious leaders is not equal to that for medical leaders. On the other hand, the DPS model fits the data very well. The likelihood-ratio test for the null hypothesis $H_0$: $\delta_1 = \delta_2 = \delta_3 = 1$ uses a test statistic which is the difference between $G^2$ for the S model and the DPS model. The resulting test statistic is
with three degrees of freedom. This indicates strong evidence that at least one $\delta_k$ differs from 1. Additionally, the DGS model fits the data poorly. From Theorem 3, the poor fit of the S model is therefore attributable to the poor fit of the DGS model rather than to the DPS model.
The values of MLEs of
in Equation (
1) are
. It should be noted that
is equal to
in the DPS
KL model. Let
$(i, j)$ denote a pair in which the influence assigned to religious leaders is at the $i$th level and that assigned to medical leaders is at the $j$th level. When
(
), a pair
is
times as likely as a pair
on condition that a pair is
or
. From
(
), the probability distribution for religious leaders is
stochastically higher than the probability distribution of medical leaders. That is, the medical leaders rather than the religious leaders should have influence in government funding for decisions on stem cell research.
Moreover, from Theorem 2, we can obtain various interpretations. Since the DPS model holds, the DPS$_{\mathrm{RKL}}$, DPS$_{\mathrm{P}}$, and DPS$_{\mathrm{PD}(\lambda)}$ models also hold. For example, we obtain
and for
,
When (), we can infer that (i) the difference between the reciprocal of conditional probability that a pair is and the reciprocal of conditional probability that a pair is is on condition that the pair is or from the DPSRKL model, (ii) the difference between the conditional probability that a pair is and the conditional probability that a pair is is under the same condition from the DPSP model, and (iii) the difference between the conditional probability that a pair is to the third power and the conditional probability that a pair is to the third power is under the same condition from the DPSPD(3) model.