Diagonals–Parameter Symmetry Model and Its Property for Square Contingency Tables with Ordinal Categories

Tahata, Kouji; Matsuda, Kohei

doi:10.3390/sym16060768


by Kouji Tahata * and Kohei Matsuda

Department of Information Sciences, Tokyo University of Science, Chiba 278-8510, Japan

* Author to whom correspondence should be addressed.

Symmetry 2024, 16(6), 768; https://doi.org/10.3390/sym16060768

Submission received: 2 May 2024 / Revised: 11 June 2024 / Accepted: 14 June 2024 / Published: 19 June 2024

(This article belongs to the Special Issue Symmetry in Mathematical Models)


Abstract

The diagonals–parameter symmetry (DPS) model is a proposed method for analyzing square contingency tables with ordinal categories. Previously, it was stated that the generalized DPS (DPS[f]) model was equivalent to the DPS model for any function f, but the proof was not provided. This paper presents the derivation of the DPS[f] model and the proof of the relationship between the two models. The findings offer various interpretations of the DPS model. Additionally, a new model is considered, and it is shown that the proposed model and the DPS[f] model are separable.

Keywords:

conditional symmetry; f-divergence; global symmetry; partial global symmetry

1. Introduction

A contingency table with identical categories for rows and columns can be produced when a categorical variable is repeatedly measured. Observations in this type of table tend to concentrate on the cells along the main diagonal. Our research focuses on applying symmetry instead of assuming independence between row and column categories. Several studies have addressed symmetry issues, such as [1,2,3,4,5,6,7,8,9].

Let X and Y represent the row and column variables for an $r \times r$ contingency table with ordinal categories. Additionally, let $π_{ij}$ represent the probability of an observation falling into the $(i, j)$th cell, where $i = 1, \dots, r$ and $j = 1, \dots, r$. The diagonals–parameter symmetry (DPS) model proposed by Goodman [10] is defined as follows.

$$ π_{ij} = \begin{cases} d_{k} ψ_{ij} & (i < j), \\ ψ_{ij} & (i \geq j), \end{cases} \tag{1} $$

where $ψ_{ij} = ψ_{ji}$ and $k = j - i$. The parameter $d_{k}$ in the DPS model represents the odds of an observation falling into cell $(i, j)$, where $j - i = k$ and $i < j$, rather than into cell $(j, i)$, for $k = 1, \dots, r - 1$. Moreover, the ratio of $π_{ij}$ to $π_{ji}$ equals the constant $d_{k}$ for $j - i = k$ and $i < j$. This ratio depends solely on the distance from the main diagonal cells.

When $d_{1} = d_{2} = \dots = d_{r - 1} = 1$ in Equation (1), the DPS model reduces to the symmetry (S) model proposed by Bowker [1]. When $d_{k}$ in Equation (1) does not depend on $k$, that is, $d_{1} = \dots = d_{r - 1}$, the DPS model reduces to the conditional symmetry (CS) model proposed by McCullagh [11].
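The structure of Equation (1) can be illustrated with a minimal sketch. The symmetric weights `psi`, the odds `d`, and the helper name `dps_table` are hypothetical choices made only for illustration; they are not taken from the paper. The sketch builds a 3 × 3 probability table and confirms that the ratio of mirrored cells depends only on the distance from the main diagonal.

```python
def dps_table(psi, d):
    """Build cell probabilities pi[i][j] following Equation (1).

    psi: symmetric r x r list of positive weights (psi[i][j] == psi[j][i]).
    d:   list of r-1 odds; d[k-1] multiplies the cells with j - i = k.
    """
    r = len(psi)
    raw = [[d[j - i - 1] * psi[i][j] if i < j else psi[i][j]
            for j in range(r)] for i in range(r)]
    total = sum(map(sum, raw))
    # Rescaling by a constant keeps the DPS form (it only rescales psi).
    return [[v / total for v in row] for row in raw]

# Hypothetical inputs, chosen only to illustrate the structure.
psi = [[4.0, 2.0, 1.0],
       [2.0, 5.0, 3.0],
       [1.0, 3.0, 6.0]]
d = [0.5, 0.25]

pi = dps_table(psi, d)
ratios = {(i, j): pi[i][j] / pi[j][i]
          for i in range(3) for j in range(i + 1, 3)}
for (i, j), ratio in sorted(ratios.items()):
    print(f"cells ({i + 1},{j + 1}) vs ({j + 1},{i + 1}): ratio = {ratio:.3f}")

# Setting every d_k to 1 gives a symmetric table (the S model).
sym = dps_table(psi, [1.0, 1.0])
```

Passing a constant list such as `[0.5, 0.5]` would give a table of the CS form in the same way.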

Using the f-divergence, Kateri and Papaioannou [2] proposed the generalized DPS (DPS[f]) model, defined as

$$ π_{ij} = π_{ij}^{S} F^{-1}(Δ_{k} + ζ_{ij}) \quad (i = 1, \dots, r;\ j = 1, \dots, r), \tag{2} $$

where $k = i - j$, $π_{ij}^{S} = (π_{ij} + π_{ji}) / 2$, $ζ_{ij} = ζ_{ji}$, and $Δ_{k} + Δ_{-k} = 0$. The function f is twice-differentiable and strictly convex. Additionally, $F(t) = f'(t)$, $f(1) = 0$, $f(0) = \lim_{t \to 0} f(t)$, $0 \cdot f(0/0) = 0$, and $0 \cdot f(a/0) = a \lim_{t \to \infty} [f(t)/t]$. The model derivation is not included in their paper. They did mention that the DPS model is the closest to symmetry regarding the Kullback–Leibler distance under some conditions and that the DPS[f] model is equivalent to the DPS model. In this study, we derive the DPS[f] model and prove the relation between the two models. This result yields various interpretations of the DPS model. We also discuss a necessary and sufficient condition for the S model and a partition property of the goodness-of-fit test statistics.

The paper is organized as follows: Section 2 derives Equation (2) and interprets the model from an information theory viewpoint. The proof is given that the DPS[f] model is equivalent to the DPS model regardless of the function f. Section 3 considers a new model and proves that the proposed model and the DPS[f] model are separable. A numerical example is provided in Section 4. Finally, Section 5 summarizes the paper.

2. Properties of the DPS[f] Model

Kateri and Papaioannou [2] noted that the DPS[f] model is the closest model to the S model in terms of the f-divergence under the conditions where $\sum\sum_{j - i = k} π_{ij}$ (and $\sum\sum_{i - j = k} π_{ij}$) for $k = 1, \dots, r - 1$ and the sums $π_{ij} + π_{ji}$ for $i, j = 1, \dots, r$ are given. Similar research has been conducted in, for example, Ireland et al. [12], Kateri and Agresti [3], and Tahata [5]. This section derives the DPS[f] model and describes its properties.

We obtain the following theorem; its proof is given in Appendix A.1.

Theorem 1.

In the class of models with given $\sum\sum_{i - j = k} π_{ij}$, $k \neq 0$, and $π_{ij} + π_{ji}$ $(i = 1, \dots, r;\ j = 1, \dots, r)$, the model

$$ π_{ij} = π_{ij}^{S} F^{-1}(Δ_{k} + ζ_{ij}) \quad (i = 1, \dots, r;\ j = 1, \dots, r) $$

with $k = i - j$, $ζ_{ij} = ζ_{ji}$ and $Δ_{k} + Δ_{-k} = 0$, is the model closest to the complete symmetry model in terms of the f-divergence.

The DPS[f] model can be expressed as

$$ F(2 π_{ij}^{c}) = \begin{cases} γ_{ij} + a_{k} & (i < j), \\ γ_{ij} & (i \geq j), \end{cases} \tag{3} $$

where $k = j - i$, $γ_{ij} = γ_{ji}$ and $π_{ij}^{c} = π_{ij} / (π_{ij} + π_{ji})$. It should be noted that $π_{ij}^{c}$ represents the conditional probability of an observation falling in the $(i, j)$ cell, given that it falls in either the $(i, j)$ cell or the $(j, i)$ cell. Namely, the DPS[f] model indicates that

$$ F(2 π_{ij}^{c}) - F(2 π_{ji}^{c}) = a_{k} \quad (i < j). \tag{4} $$

When $a_{1} = \dots = a_{r - 1} = 0$, the DPS[f] model reduces to the S model.

If $f(x) = x \log(x)$, $x > 0$, then the f-divergence reduces to the Kullback–Leibler (KL) divergence. When we set $f(x) = x \log(x)$, Equation (3) reduces to

$$ π_{ij} = \begin{cases} π_{ij}^{S} \exp(γ_{ij} + a_{k} - 1) & (i < j), \\ π_{ij}^{S} \exp(γ_{ij} - 1) & (i \geq j), \end{cases} $$

where $k = j - i$ and $γ_{ij} = γ_{ji}$. We shall refer to this model as the DPS_KL model. Under the DPS_KL model, the ratios of $π_{ij}$ to $π_{ji}$ for $i < j$ are expressed as

$$ \frac{π_{ij}}{π_{ji}} = d_{k}^{KL} \quad (i < j), \tag{5} $$

where $d_{k}^{KL} = \exp(a_{k})$ and $k = j - i$. Since Equation (5) indicates that the ratio of $π_{ij}$ to $π_{ji}$ depends only on the distance $k = j - i$, the DPS_KL model is equivalent to the DPS model proposed by Goodman [10]. Namely, the DPS model is the closest model to the S model in terms of the KL divergence under the conditions where $\sum\sum_{i - j = k} π_{ij}$, $k \neq 0$, and the sums $π_{ij} + π_{ji}$ for $i = 1, \dots, r;\ j = 1, \dots, r$ are given. This is a special case of Theorem 1.

If $f(x) = -\log(x)$, $x > 0$, then the f-divergence reduces to the reverse KL divergence. Then, the DPS[f] model reduces to

$$ π_{ij} = \begin{cases} π_{ij}^{S} \left(-\dfrac{1}{γ_{ij} + a_{k}}\right) & (i < j), \\ π_{ij}^{S} \left(-\dfrac{1}{γ_{ij}}\right) & (i \geq j), \end{cases} $$

where $k = j - i$ and $γ_{ij} = γ_{ji}$. We shall refer to this model as the DPS_RKL model. This model is the closest to the S model when the divergence is measured by the reverse KL divergence and can be expressed as

$$ \frac{1}{π_{ij}^{c}} - \frac{1}{π_{ji}^{c}} = d_{k}^{RKL} \quad (i < j), $$

where $d_{k}^{RKL} = -2 a_{k}$ and $k = j - i$. This model indicates that the difference between the inverse probabilities $1 / π_{ij}^{c}$ and $1 / π_{ji}^{c}$ depends only on the distance $k = j - i$.

If $f(x) = (1 - x)^{2}$, then the f-divergence reduces to the $χ^{2}$-divergence (Pearsonian distance). Then, the DPS[f] model reduces to

$$ π_{ij} = \begin{cases} π_{ij}^{S} \left(\dfrac{γ_{ij} + a_{k}}{2} + 1\right) & (i < j), \\ π_{ij}^{S} \left(\dfrac{γ_{ij}}{2} + 1\right) & (i \geq j), \end{cases} $$

where $k = j - i$ and $γ_{ij} = γ_{ji}$. We shall refer to this model as the DPS_P model. This model is the closest to the S model when the divergence is measured by the $χ^{2}$-divergence and can be expressed as

$$ π_{ij}^{c} - π_{ji}^{c} = d_{k}^{P} \quad (i < j), $$

where $d_{k}^{P} = a_{k} / 4$ and $k = j - i$. This model indicates that the difference between $π_{ij}^{c}$ and $π_{ji}^{c}$ depends only on the distance $k = j - i$.

Moreover, if $f(x) = (λ(λ + 1))^{-1}(x^{λ + 1} - x)$, $x > 0$, where $λ$ is a real-valued parameter, then the f-divergence reduces to the power-divergence [13]. Then, the DPS[f] model reduces to

$$ π_{ij} = \begin{cases} π_{ij}^{S} \left(λ(γ_{ij} + a_{k}) + \dfrac{1}{λ + 1}\right)^{1/λ} & (i < j), \\ π_{ij}^{S} \left(λ γ_{ij} + \dfrac{1}{λ + 1}\right)^{1/λ} & (i \geq j), \end{cases} $$

where $k = j - i$ and $γ_{ij} = γ_{ji}$. We shall refer to this model as the DPS_PD(λ) model. This model is the closest to the S model when the divergence is measured by the power-divergence and can be expressed as

$$ (π_{ij}^{c})^{λ} - (π_{ji}^{c})^{λ} = d_{k}^{PD(λ)} \quad (i < j), $$

where $d_{k}^{PD(λ)} = (λ a_{k}) / 2^{λ}$ and $k = j - i$. This model indicates that the difference between the symmetric conditional probabilities raised to the power $λ$ depends only on the distance $k = j - i$. To apply the DPS_PD(λ) model, the value of $λ$ must be fixed in advance.

Kateri and Papaioannou [2] reported that the DPS[f] model is equivalent to the DPS model regardless of f. That is, perhaps surprisingly, all the models described above (i.e., DPS_KL, DPS_RKL, DPS_P, and DPS_PD(λ)) are equivalent to the DPS model. However, no proof was given. We prove the following theorem.

Theorem 2.

The DPS[f] model is equivalent to the DPS model regardless of f.

The proof is given in Appendix A.2. Theorem 2 states that the DPS model holds if and only if the DPS[f] model holds. Hence, if the DPS model fits a given dataset, the data can be interpreted in several equivalent ways, one for each choice of f.
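The equivalence in Theorem 2 rests on a one-to-one map between the odds $d_{k}$ and the parameter $a_{k}$, namely $a_{k} = F(2 d_{k} / (1 + d_{k})) - F(2 / (1 + d_{k}))$. The following sketch illustrates this numerically for three of the generators above; the names `F_CHOICES`, `G`, and `G_inverse` and the bisection routine are illustrative assumptions, not part of the paper.

```python
import math

# F = f' for three generators discussed in the text.
F_CHOICES = {
    "KL": lambda t: math.log(t) + 1.0,      # f(x) = x log x
    "RKL": lambda t: -1.0 / t,              # f(x) = -log x
    "Pearson": lambda t: 2.0 * (t - 1.0),   # f(x) = (1 - x)^2
}

def G(F, x):
    """Map the odds x = pi_ij / pi_ji to a_k = F(2x/(1+x)) - F(2/(1+x))."""
    return F(2.0 * x / (1.0 + x)) - F(2.0 / (1.0 + x))

def G_inverse(F, a, lo=1e-9, hi=1e9, iters=200):
    """Recover the odds from a_k by bisection; valid because G is increasing."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint suits a ratio scale
        if G(F, mid) < a:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

d_k = 0.15  # hypothetical odds used only for this illustration
recovered = {name: G_inverse(F, G(F, d_k)) for name, F in F_CHOICES.items()}
for name, value in recovered.items():
    print(f"{name}: a_k = {G(F_CHOICES[name], d_k):.4f}, recovered d_k = {value:.6f}")
```

Each generator assigns a different value of $a_{k}$ to the same odds, yet the odds are recovered in every case, which is the sense in which the models coincide.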

When $a_{1} = \dots = a_{r - 1}$, the DPS[f] model reduces to the conditional symmetry model based on the f-divergence (CS[f] model). The CS[f] model was described previously by Kateri and Papaioannou [2]. Additionally, Fujisawa and Tahata [14] proposed a generalization of the CS[f] model. Similarly, when $d_{1} = \dots = d_{r - 1}$, the DPS model reduces to the CS model proposed by McCullagh [11]. The CS[f] model is equivalent to the CS model regardless of f (Kateri and Papaioannou [2]). Hence, Theorem 2 leads to the following result.

Corollary 1.

The CS[f] model is equivalent to the CS model regardless of f.

3. Equivalence Conditions for Symmetry

Here, the equivalence conditions of the S model are discussed. If the S model holds, then the DPS[f] model with $a_{1} = \dots = a_{r - 1} = 0$ holds. Conversely, if the DPS[f] model holds, the S model does not necessarily hold. Therefore, we are interested in an additional condition under which the S model is obtained when the DPS[f] model holds. Other studies have discussed such conditions; see Read [15] and Tahata et al. [16].

We consider the distance global symmetry (DGS) model defined as

$$ δ_{k}^{U} = δ_{k}^{L} \quad (k = 1, \dots, r - 1), $$

where $δ_{k}^{U} = \sum\sum_{j - i = k} π_{ij}$ and $δ_{k}^{L} = \sum\sum_{i - j = k} π_{ij}$. For $k = 1, \dots, r - 1$, this model indicates that the sum of the probabilities in the cells at distance $k$ above the main diagonal ($j - i = k$) is equal to the sum of the probabilities in the cells at distance $k$ below the main diagonal ($i - j = k$). We obtain the following theorem. (The proof is given in Appendix A.3.)

Theorem 3.

The S model holds if and only if both the DPS[f] and DGS models hold.

Next, we consider the global symmetry (GS) model, which is defined as

$$ \underset{i < j}{\sum\sum} π_{ij} = \underset{i < j}{\sum\sum} π_{ji}. $$

It should be noted that the DGS model implies the GS model. Read [15] noted that the S model holds if and only if both the CS and GS models hold. Fujisawa and Tahata [14] proved that the S model holds if and only if the CS[f] and GS models hold. By Corollary 1, these two statements are equivalent. In addition, a refined estimator for measures associated with the S, CS, and GS models was introduced by Tahata et al. [17]. The result has a significant connection to decomposing the S model and separating the goodness-of-fit test statistic of the S model. According to Corollary 1, the refined estimator for the measure of CS can be used to gauge the extent of deviation from the CS[f] model.

This section proves the separation of the test statistics for the S model into those for the DPS[f] model and the DGS model. Consider a square contingency table of size $r \times r$, where $n_{ij}$ denotes the observed frequency in the $(i, j)$ cell. Assume that the observed frequencies follow a multinomial distribution. In this context, let $m_{ij}$ represent the expected frequency in the $(i, j)$ cell, and let $\hat{m}_{ij}$ be its maximum likelihood estimate under a specified model. To test each model's goodness of fit, we can employ the likelihood ratio chi-square statistic, denoted by $G^{2}(M)$. This statistic is computed using the following formula:

$$ G^{2}(M) = 2 \sum_{i = 1}^{r} \sum_{j = 1}^{r} n_{ij} \log\left(\frac{n_{ij}}{\hat{m}_{ij}}\right). $$

Under model M, this statistic asymptotically follows a chi-square distribution with the corresponding degrees of freedom (df).

Suppose that model M₃ holds if and only if both models M₁ and M₂ hold. In this case, if the analyst has found hypothesis M₃ unacceptable, their attention will move to examining the components M₁ and M₂. For these three models, Aitchison [18] discussed the properties of the Wald test statistics, and Darroch and Silvey [19] described the properties of the likelihood ratio chi-square statistics. Assume that the following equality holds:

$$ T(M_{3}) = T(M_{1}) + T(M_{2}), \tag{6} $$

where T is the goodness-of-fit test statistic and the number of df for M₃ is equal to the sum of the numbers of df for M₁ and M₂. If both M₁ and M₂ are accepted with a high probability (at the $α$ significance level), then M₃ is accepted. However, when (6) does not hold, an incompatible situation may arise in which both M₁ and M₂ are accepted with a high probability but M₃ is rejected. In fact, Darroch and Silvey [19] gave such an example. The partitions of chi-squared test statistics are also discussed in, for example, [20,21].

From Theorem 3, the S model holds if and only if the DPS[f] model and the DGS model hold. In addition, the df for the DPS[f] model is $(r - 1)(r - 2) / 2$ and that for the DGS model is $r - 1$. The df for the S model can be obtained by adding the degrees of freedom for the DPS[f] model and the DGS model. Thus, we consider partitioning test statistics.
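As a quick check of the degrees-of-freedom statement above, the sum can be written out directly. The total $r(r - 1)/2$ is the value given for $d_{3}$ in Appendix A.4, and the case $r = 4$ is included only as a worked instance.

```latex
\frac{(r-1)(r-2)}{2} + (r-1)
  = \frac{(r-1)(r-2) + 2(r-1)}{2}
  = \frac{r(r-1)}{2},
\qquad
r = 4:\quad 3 + 3 = 6 .
```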

Theorem 2 confirms that the DPS[f] model is equivalent to the DPS model. Therefore, the maximum likelihood estimates (MLEs) under the DPS[f] model are given by

$$ \hat{m}_{ij} = \begin{cases} \dfrac{n_{k}^{U}}{n_{k}^{U} + n_{k}^{L}} (n_{ij} + n_{ji}) & (i < j), \\ n_{ij} & (i = j), \\ \dfrac{n_{k}^{L}}{n_{k}^{U} + n_{k}^{L}} (n_{ij} + n_{ji}) & (i > j), \end{cases} \tag{7} $$

where $k = |j - i|$, $n_{k}^{U} = \sum\sum_{j - i = k} n_{ij}$, and $n_{k}^{L} = \sum\sum_{j - i = k} n_{ji}$ (Goodman [10]).

Next, we consider the MLEs under the DGS model using the Lagrange function. Since the kernel of the log likelihood is $\sum_{i = 1}^{r} \sum_{j = 1}^{r} n_{ij} \log π_{ij}$, the Lagrange function L is written as

$$ L = \sum_{i = 1}^{r} \sum_{j = 1}^{r} n_{ij} \log π_{ij} + λ \left(\sum_{i = 1}^{r} \sum_{j = 1}^{r} π_{ij} - 1\right) + \sum_{k = 1}^{r - 1} λ_{k} \left(\underset{j - i = k}{\sum\sum} (π_{ij} - π_{ji})\right). $$

Setting the partial derivatives of L with respect to $π_{ij}$, $λ$, and $λ_{k}$ equal to zero gives

$$ \hat{m}_{ij} = \begin{cases} \dfrac{(n_{k}^{U} + n_{k}^{L})\, n_{ij}}{2 n_{k}^{U}} & (i < j), \\ n_{ij} & (i = j), \\ \dfrac{(n_{k}^{U} + n_{k}^{L})\, n_{ij}}{2 n_{k}^{L}} & (i > j), \end{cases} \tag{8} $$

where $k = |j - i|$. It is important to note that the DPS and DGS models are not invariant under permutations of the row and column categories. Therefore, these models should be used for data with ordinal categories.

We obtain the following equality from Equations (7) and (8):

$$ G^{2}(\mathrm{S}) = G^{2}(\mathrm{DPS}[f]) + G^{2}(\mathrm{DGS}), $$

because the MLEs under the S model are $\hat{m}_{ij} = (n_{ij} + n_{ji}) / 2$. Therefore, the DPS[f] model and the DGS model are separable and exhibit independence.
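The partition can be checked numerically with a short sketch that implements the fitted values in Equations (7) and (8) and the statistic $G^{2}$. The 4 × 4 table `counts` is made up for illustration (it is not the data in Table 1), the function names are hypothetical, and the sketch assumes that every off-diagonal sum $n_{k}^{U}$ and $n_{k}^{L}$ is positive.

```python
import math

def diagonal_sums(n):
    """Return (n_k^U, n_k^L) as dicts keyed by k = 1, ..., r-1."""
    r = len(n)
    upper = {k: sum(n[i][i + k] for i in range(r - k)) for k in range(1, r)}
    lower = {k: sum(n[i + k][i] for i in range(r - k)) for k in range(1, r)}
    return upper, lower

def fit_s(n):
    """Fitted values under symmetry: (n_ij + n_ji) / 2."""
    r = len(n)
    return [[(n[i][j] + n[j][i]) / 2.0 for j in range(r)] for i in range(r)]

def fit_off_diagonal(n, rule):
    """Apply rule(n_ij, n_ji, own-side sum, other-side sum) off the diagonal."""
    r = len(n)
    upper, lower = diagonal_sums(n)
    m = [[float(v) for v in row] for row in n]
    for i in range(r):
        for j in range(r):
            if i != j:
                k = abs(j - i)
                own, other = (upper[k], lower[k]) if i < j else (lower[k], upper[k])
                m[i][j] = rule(n[i][j], n[j][i], own, other)
    return m

def fit_dps(n):
    """Fitted values from Equation (7)."""
    return fit_off_diagonal(n, lambda a, b, own, other: own / (own + other) * (a + b))

def fit_dgs(n):
    """Fitted values from Equation (8)."""
    return fit_off_diagonal(n, lambda a, b, own, other: (own + other) * a / (2.0 * own))

def g2(n, m):
    """Likelihood ratio chi-square statistic; empty cells contribute zero."""
    r = len(n)
    return 2.0 * sum(n[i][j] * math.log(n[i][j] / m[i][j])
                     for i in range(r) for j in range(r) if n[i][j] > 0)

# Hypothetical counts, for illustration only.
counts = [[20, 10, 5, 2],
          [4, 30, 12, 6],
          [3, 7, 25, 9],
          [1, 2, 5, 40]]

g2_s = g2(counts, fit_s(counts))
g2_dps = g2(counts, fit_dps(counts))
g2_dgs = g2(counts, fit_dgs(counts))
print(f"G2(S) = {g2_s:.6f}, G2(DPS) + G2(DGS) = {g2_dps + g2_dgs:.6f}")
```

The two printed numbers agree up to floating-point error, because for each off-diagonal cell the logarithms of the DPS and DGS ratios add up to the logarithm of the S ratio.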

Let $W(M)$ denote the Wald statistic for model M. We obtain the following theorem and prove it in Appendix A.4.

Theorem 4.

$W(\mathrm{S})$ is equal to the sum of $W(\mathrm{DPS}[f])$ and $W(\mathrm{DGS})$.

4. Numerical Example

Table 1, which is taken from Smith et al. [22], cross-classifies the responses of 871 people on how much influence religious leaders and medical leaders should have in government funding decisions on stem cell research. The influence levels are divided into four categories: (1) Great influence, (2) Some influence, (3) A little influence, and (4) No influence.

The values of the likelihood ratio chi-square statistics $G^{2}$ and the corresponding p values for the models applied to these data are shown in Table 2. Table 2 indicates that the sum of the test statistics for the DPS (i.e., DPS[f]) model and the DGS model is equal to that for the S model. The S model fits the data very poorly. We can infer that the marginal distribution for religious leaders is not equal to that for medical leaders. On the other hand, the DPS model fits the data very well. The likelihood-ratio test for the null hypothesis $H_{0}: d_{1} = d_{2} = d_{3} = 1$ uses a test statistic which is the difference between $G^{2}$ for the S model and that for the DPS model. The resulting test statistic is $545.15 - 2.45 = 542.70$ with three degrees of freedom. This provides strong evidence that at least one $d_{k}$ differs from 1. Additionally, the DGS model fits the data poorly. From Theorem 3, the poor fit of the S model is caused by the poor fit of the DGS model rather than that of the DPS model.

The MLEs of $(d_{1}, d_{2}, d_{3})$ in Equation (1) are $(0.15, 0.05, 0.06)$. It should be noted that $(d_{1}, d_{2}, d_{3})$ is equal to $(d_{1}^{KL}, d_{2}^{KL}, d_{3}^{KL})$ in the DPS_KL model. Let $(i, j)$ denote a response pair in which the influence assigned to religious leaders is at the ith level and that assigned to medical leaders is at the jth level. When $k = j - i$ ($k = 1, 2, 3$), a pair $(i, j)$ is $\hat{d}_{k}$ times as likely as a pair $(j, i)$, on condition that the pair is $(i, j)$ or $(j, i)$. Since $\hat{d}_{k} < 1$ ($k = 1, 2, 3$), the probability distribution for religious leaders is stochastically higher than that for medical leaders. That is, respondents tend to think that medical leaders, rather than religious leaders, should have influence in government funding decisions on stem cell research.

Moreover, from Theorem 2, we can obtain various interpretations. Since the DPS model holds, the DPS_RKL, DPS_P, and DPS_PD(λ) models also hold. For example, we obtain

$$ \begin{aligned} (\hat{d}_{1}^{RKL}, \hat{d}_{2}^{RKL}, \hat{d}_{3}^{RKL}) &= (6.37, 19.22, 18.09), \\ (\hat{d}_{1}^{P}, \hat{d}_{2}^{P}, \hat{d}_{3}^{P}) &= (-0.73, -0.90, -0.90), \end{aligned} $$

and for $λ = 3$,

$$ (\hat{d}_{1}^{PD(3)}, \hat{d}_{2}^{PD(3)}, \hat{d}_{3}^{PD(3)}) = (-0.65, -0.86, -0.85). $$

When $k = j - i$ ($k = 1, 2, 3$), we can infer the following, in each case on condition that the pair is $(i, j)$ or $(j, i)$: (i) from the DPS_RKL model, the difference between the reciprocal of the conditional probability that a pair is $(i, j)$ and the reciprocal of the conditional probability that a pair is $(j, i)$ is $\hat{d}_{k}^{RKL}$; (ii) from the DPS_P model, the difference between the conditional probability that a pair is $(i, j)$ and the conditional probability that a pair is $(j, i)$ is $\hat{d}_{k}^{P}$; and (iii) from the DPS_PD(3) model, the difference between the third power of the conditional probability that a pair is $(i, j)$ and the third power of the conditional probability that a pair is $(j, i)$ is $\hat{d}_{k}^{PD(3)}$.
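The alternative parameters can be computed from the odds $d_{k}$ alone, because under the DPS model the conditional probabilities are $π_{ij}^{c} = d_{k} / (1 + d_{k})$ and $π_{ji}^{c} = 1 / (1 + d_{k})$. The sketch below applies the three definitions from Section 2; the function names are hypothetical. It uses the two-decimal estimates quoted above as inputs, so its output need not reproduce the reported values exactly, which were presumably computed from unrounded estimates.

```python
def conditional_probs(d):
    """Return (pi^c_ij, pi^c_ji) for a cell pair whose odds pi_ij / pi_ji equal d."""
    return d / (1.0 + d), 1.0 / (1.0 + d)

def d_rkl(d):
    """DPS_RKL parameter: difference of reciprocal conditional probabilities."""
    p, q = conditional_probs(d)
    return 1.0 / p - 1.0 / q

def d_pearson(d):
    """DPS_P parameter: difference of conditional probabilities."""
    p, q = conditional_probs(d)
    return p - q

def d_pd(d, lam):
    """DPS_PD(lambda) parameter: difference of lambda-th powers."""
    p, q = conditional_probs(d)
    return p ** lam - q ** lam

d_hat = (0.15, 0.05, 0.06)  # rounded MLEs quoted in the text
for k, d in enumerate(d_hat, start=1):
    print(f"k={k}: RKL={d_rkl(d):.2f}, P={d_pearson(d):.2f}, PD(3)={d_pd(d, 3):.2f}")
```

All three quantities are monotone functions of $d_{k}$, so they carry the same information on different scales.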

5. Concluding Remarks

This paper proves that the DPS[f] model is equivalent to the DPS model proposed by Goodman [10]. This result provides various interpretations of the DPS model. The separation of the test statistic for the S model is also considered: the DPS[f] and DGS models are separable and exhibit independence. Kateri and Papaioannou [2], Kateri and Agresti [3], Tahata [5] and Fujisawa and Tahata [14] considered models based on the f-divergence for the analysis of square contingency tables with ordinal categories. Future work should examine whether other models based on the f-divergence are equivalent to their conventional counterparts.

Author Contributions

Conceptualization, K.T.; methodology, K.T.; software, K.M.; validation, K.T. and K.M.; formal analysis, K.T.; investigation, K.M.; resources, K.T.; data curation, K.M.; writing – original draft preparation, K.M.; writing – review and editing, K.T.; visualization, K.M.; supervision, K.T.; project administration, K.T.; funding acquisition, K.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by JSPS KAKENHI (Grant Number 20K03756).

Data Availability Statement

“The General Social Survey” at https://gss.norc.org/ (accessed on 1 June 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

This section provides the proofs of theorems.

Appendix A.1

In a similar manner to Tahata [5], we prove Theorem 1. Let $I^{C}(π : π^{S})$ denote the f-divergence between $(π_{ij})$ and $(π_{ij}^{S})$. That is,

$$ I^{C}(π : π^{S}) = \sum_{i = 1}^{r} \sum_{j = 1}^{r} π_{ij}^{S} f\left(\frac{π_{ij}}{π_{ij}^{S}}\right), \tag{A1} $$

where f satisfies the conditions described in Section 1. Now minimize (A1) under the conditions where the restraints

$$ π_{ij} + π_{ji} = t_{ij} = t_{ji} \quad (i = 1, \dots, r;\ j = 1, \dots, r) \tag{A2} $$

and

$$ δ_{-k}^{U} = \underset{i - j = -k}{\sum\sum} π_{ij}, \quad δ_{k}^{L} = \underset{i - j = k}{\sum\sum} π_{ij} \quad (k = 1, \dots, r - 1) \tag{A3} $$

are given. The Lagrange function is written as

$$ \begin{aligned} L = {} & I^{C}(π : π^{S}) + \sum_{i = 1}^{r} \sum_{j = 1}^{r} λ_{ij} (π_{ij} + π_{ji} - t_{ij}) \\ & + \sum_{k = 1}^{r - 1} \left( \bar{Δ}_{-k} \left(\underset{i - j = -k}{\sum\sum} π_{ij} - δ_{-k}^{U}\right) + \bar{Δ}_{k} \left(\underset{i - j = k}{\sum\sum} π_{ij} - δ_{k}^{L}\right) \right). \end{aligned} $$

By taking the partial derivative of L with respect to $π_{ij}$ and setting it to zero, we obtain the following equations:

$$ \begin{cases} f'\left(\dfrac{π_{ij}}{π_{ij}^{S}}\right) + \bar{Δ}_{-k} + λ_{ij} + λ_{ji} = 0 & (i < j), \\ f'\left(\dfrac{π_{ij}}{π_{ij}^{S}}\right) + λ_{ij} + λ_{ji} = 0 & (i = j), \\ f'\left(\dfrac{π_{ij}}{π_{ij}^{S}}\right) + \bar{Δ}_{k} + λ_{ij} + λ_{ji} = 0 & (i > j). \end{cases} \tag{A4} $$

Let $f'$ be denoted by F, and let $π_{ij}^{*}$ denote the solution satisfying (A2), (A3), and (A4). Given that f is a strictly convex function, it follows that $F'(x) = f''(x) > 0$ for all x. Thus, F is strictly increasing, ensuring the existence of $F^{-1}$. We write $ζ_{ij}$ for $-(λ_{ij} + λ_{ji})$ and $Δ_{l}$ for $-\bar{Δ}_{l}$. From Equation (A4), we obtain

$$ π_{ij}^{*} = \begin{cases} π_{ij}^{S} F^{-1}(Δ_{-k} + ζ_{ij}) & (i < j), \\ π_{ij}^{S} F^{-1}(ζ_{ij}) & (i = j), \\ π_{ij}^{S} F^{-1}(Δ_{k} + ζ_{ij}) & (i > j), \end{cases} $$

where $ζ_{ij} = ζ_{ji}$ and $Δ_{k} + Δ_{-k} = 0$. The minimum value of $I^{C}(π : π^{S})$ is attained at $π_{ij}^{*}$, where $ζ_{ij}$ and $Δ_{l}$ are chosen so that $π_{ij}^{*}$ satisfies the constraints (A2) and (A3). Thus, the DPS[f] model is the closest model to the S model in terms of the f-divergence under these conditions.

Appendix A.2

Let the function G be defined as

$$ G(x) = F\left(\frac{2x}{1 + x}\right) - F\left(\frac{2}{1 + x}\right) \quad (x > 0), $$

where $F = f'$. Then, the derivative of G is

$$ G'(x) = \frac{2}{(1 + x)^{2}} \left( F'\left(\frac{2x}{1 + x}\right) + F'\left(\frac{2}{1 + x}\right) \right). $$

Since the function f is twice-differentiable and strictly convex, $G'(x) > 0$ for $x > 0$. Hence, G is a strictly increasing function, and $G^{-1}$ exists.

If the DPS model holds, $π_{ij} / π_{ji} = d_{k}$ holds for $i < j$ from Equation (1), where $k = j - i$. In that case, $π_{ij}^{c} = d_{k} / (1 + d_{k})$ and $π_{ji}^{c} = 1 / (1 + d_{k})$, so that for $i < j$,

$$ \begin{aligned} G(d_{k}) &= F\left(\frac{2 d_{k}}{1 + d_{k}}\right) - F\left(\frac{2}{1 + d_{k}}\right) \\ &= F(2 π_{ij}^{c}) - F(2 π_{ji}^{c}). \end{aligned} $$

This is equivalent to Equation (4) with $a_{k} = G(d_{k})$. Namely, the DPS[f] model holds.

On the other hand, if the DPS[f] model holds, Equation (4) holds. We can see that for $i < j$,

$$ G\left(\frac{π_{ij}}{π_{ji}}\right) = a_{k}. $$

Since $G^{-1}$ exists, we obtain

$$ \frac{π_{ij}}{π_{ji}} = G^{-1}(a_{k}). $$

Namely, the DPS model holds. The proof is complete.

Appendix A.3

It is obvious that if the S model holds, the DPS[f] model and the DGS model simultaneously hold. Assuming that both the DPS[f] and the DGS models hold, we show that the S model holds. From Theorem 2, the DPS[f] model is equivalent to $π_{ij} / π_{ji} = d_{k}$ for $i < j$ with $k = j - i$. Since the DGS model holds, we obtain

$$ \underset{j - i = k}{\sum\sum} (d_{k} - 1) π_{ji} = 0 \quad (k = 1, \dots, r - 1). $$

Since $π_{ji} > 0$, we get $d_{k} = 1$ ($k = 1, \dots, r - 1$). Namely, the S model holds.

Appendix A.4

Theorem 2 shows that the DPS[f] model is equivalent to the DPS model. Let

$$ \begin{aligned} π &= (π_{11}, \dots, π_{1r}, π_{21}, \dots, π_{2r}, \dots, π_{r1}, \dots, π_{rr})^{t}, \\ β &= (ρ_{1}, \dots, ρ_{r - 1}, ε)^{t}, \end{aligned} $$

where $ε = (ε_{11}, \dots, ε_{1r}, ε_{22}, \dots, ε_{2r}, \dots, ε_{rr})$. Then, from Equation (1), the DPS model is expressed as

$$ \log π = X β = (x_{1}, \dots, x_{r - 1}, x_{11}, \dots, x_{1r}, x_{22}, \dots, x_{2r}, \dots, x_{rr}) β, $$

where $x_{l} = (w_{l + 1}, \dots, w_{r}, 0, \dots, 0)^{t}$ is an $r^{2} \times 1$ vector $(l = 1, \dots, r - 1)$. Here, $w_{h}$ (a $1 \times r$ vector) has 1 as its hth element and 0 elsewhere. For example, when $r = 4$,

$$ x_{1} = (w_{2}, w_{3}, w_{4}, 0, \dots, 0)^{t} = (0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0)^{t}. $$

Additionally, $x_{ij}$ ($i \leq j$) is the $r^{2} \times 1$ vector corresponding to $ε_{ij}$. Note that the $r^{2} \times K$ matrix $X$ has full column rank, where $K = (r - 1) + r(r + 1) / 2$.

We define the linear space spanned by the columns of the matrix $X$ as $S(X)$, which has dimension K. This space, $S(X)$, is a subspace of $R^{r^{2}}$. Consider an $r^{2} \times d_{1}$ matrix $U$ with full column rank, such that the linear space $S(U)$, spanned by the columns of $U$, serves as the orthogonal complement of $S(X)$. Here $d_{1}$ denotes a dimension, not the DPS parameter in Equation (1), and is calculated as $d_{1} = r^{2} - ((r - 1) + r(r + 1) / 2) = (r - 1)(r - 2) / 2$. Given that $U^{t} X = O_{d_{1}, K}$, where $O_{d_{1}, K}$ denotes the $d_{1} \times K$ zero matrix, the DPS model can be expressed as $h_{1}(π) = U^{t} \log π = 0_{d_{1}}$, with $0_{s}$ representing the $s \times 1$ zero vector.

Additionally, the DGS model can be expressed as $h_{2}(π) = M π = 0_{d_{2}}$, where

$$ M = (g_{1}, \dots, g_{r - 1})^{t}, $$

and $d_{2} = r - 1$. Here, $g_{l} = 2 x_{l} - \sum\sum_{j - i = l} x_{ij}$. Note that the columns of $M^{t}$ belong to the space $S(X)$. That is, $S(M^{t}) \subset S(X)$.
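The vectors used above can be made concrete with a small sketch. It assumes that $π$ is stacked row by row, as in its definition, and that $x_{ij}$ is the indicator of the two cells $(i, j)$ and $(j, i)$, which is our reading of the symmetric term $ε_{ij}$; the helper names `cell`, `x_diag`, `x_sym`, and `g_vec` are hypothetical.

```python
def cell(i, j, r):
    """Position of cell (i, j), with 1-based indices, in the row-stacked vector pi."""
    return (i - 1) * r + (j - 1)

def x_diag(l, r):
    """x_l: indicator of the cells with j - i = l (the column multiplying rho_l)."""
    v = [0] * (r * r)
    for i in range(1, r - l + 1):
        v[cell(i, i + l, r)] = 1
    return v

def x_sym(i, j, r):
    """x_ij (i <= j): indicator of cells (i, j) and (j, i) (assumed reading)."""
    v = [0] * (r * r)
    v[cell(i, j, r)] = 1
    v[cell(j, i, r)] = 1
    return v

def g_vec(l, r):
    """g_l = 2 x_l - sum over j - i = l of x_ij."""
    v = [2 * a for a in x_diag(l, r)]
    for i in range(1, r - l + 1):
        for pos, b in enumerate(x_sym(i, i + l, r)):
            v[pos] -= b
    return v

r = 4
x1 = x_diag(1, r)
g1 = g_vec(1, r)
K = (r - 1) + r * (r + 1) // 2
print("x_1 =", x1)
print("g_1 =", g1)
print("K =", K, " d_1 =", r * r - K)
```

Under these assumptions, `g1` has +1 on the first upper diagonal and -1 on the first lower diagonal, so its inner product with $π$ is $δ_{1}^{U} - δ_{1}^{L}$, the quantity set to zero by the DGS model.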

Let $p$ denote $π$ with $π_{ij}$ replaced by $p_{ij}$, where $p_{ij} = n_{ij} / n$ with $n = \sum\sum n_{ij}$. From Theorem 3, the S model is equivalent to $h_{3}(π) = 0_{d_{3}}$, where $h_{3} = (h_{1}^{t}, h_{2}^{t})^{t}$ and $d_{3} = d_{1} + d_{2} = r(r - 1) / 2$. In an analogous manner to Tahata [5], we obtain that $\sqrt{n}(h_{3}(p) - h_{3}(π))$ has an asymptotically normal distribution with mean $0_{d_{3}}$ and covariance matrix

$$ H_{3}(π) Σ(π) H_{3}^{t}(π) = \begin{bmatrix} H_{1}(π) Σ(π) H_{1}^{t}(π) & O_{d_{1}, d_{2}} \\ O_{d_{2}, d_{1}} & H_{2}(π) Σ(π) H_{2}^{t}(π) \end{bmatrix}, $$

where $H_{s}(π) = \partial h_{s}(π) / \partial π^{t}$ and $Σ(π) = \mathrm{diag}(π) - π π^{t}$. Here, $\mathrm{diag}(π)$ denotes a diagonal matrix with the ith component of $π$ as the ith diagonal component. Therefore, $W_{3} = W_{1} + W_{2}$ holds, where

$$ W_{s} = n\, h_{s}^{t}(p) \left(H_{s}(p) Σ(p) H_{s}^{t}(p)\right)^{-1} h_{s}(p). $$

The Wald statistic for the DPS[f] model (i.e., $W(\mathrm{DPS}[f])$) is $W_{1}$, that for the DGS model (i.e., $W(\mathrm{DGS})$) is $W_{2}$, and that for the S model (i.e., $W(\mathrm{S})$) is $W_{3}$. The proof is complete.

References

1. Bowker, A.H. A test for symmetry in contingency tables. J. Am. Stat. Assoc. 1948, 43, 572–574.
2. Kateri, M.; Papaioannou, T. Asymmetry models for contingency tables. J. Am. Stat. Assoc. 1997, 92, 1124–1131.
3. Kateri, M.; Agresti, A. A class of ordinal quasi-symmetry models for square contingency tables. Stat. Probab. Lett. 2007, 77, 598–603.
4. Tahata, K.; Tomizawa, S. Generalized linear asymmetry model and decomposition of symmetry for multiway contingency tables. J. Biom. Biostat. 2011, 2, 1–6.
5. Tahata, K. Separation of symmetry for square tables with ordinal categorical data. Jpn. J. Stat. Data Sci. 2020, 3, 469–484.
6. Tahata, K. Advances in Quasi-Symmetry for Square Contingency Tables. Symmetry 2022, 14, 1051.
7. Beh, E.J.; Lombardo, R. Visualising Departures from Symmetry and Bowker’s X² Statistic. Symmetry 2022, 14, 1103.
8. Altun, G.; Saraçbaşı, T. Determination of model fitting with power-divergence-type measure of departure from symmetry for sparse and non-sparse square contingency tables. Commun. Stat.-Simul. Comput. 2022, 51, 4087–4111.
9. Ando, S. Generalized Sum-Asymmetry Model and Orthogonality of Test Statistic for Square Contingency Tables. Austrian J. Stat. 2024, 53, 99–108.
10. Goodman, L.A. Multiplicative Models for Square Contingency Tables with Ordered Categories. Biometrika 1979, 66, 413–418.
11. McCullagh, P. A class of parametric models for the analysis of square contingency tables with ordered categories. Biometrika 1978, 65, 413–418.
12. Ireland, C.T.; Ku, H.H.; Kullback, S. Symmetry and Marginal Homogeneity of an r × r Contingency Table. J. Am. Stat. Assoc. 1969, 64, 1323–1341.
13. Read, C.B.; Cressie, N. Goodness-of-Fit Statistics for Discrete Multivariate Data; Springer: New York, NY, USA, 1988.
14. Fujisawa, K.; Tahata, K. Asymmetry model based on f-divergence and orthogonal decomposition of symmetry for square contingency tables with ordinal categories. SUT J. Math. 2020, 56, 39–53.
15. Read, C.B. Partitioning chi-squape in contingency tables: A teaching approach. Commun. Stat.-Theory Methods 1977, 6, 553–562.
16. Tahata, K.; Naganawa, M.; Tomizawa, S. Extended linear asymmetry model and separation of symmetry for square contingency tables. J. Jpn. Stat. Soc. 2016, 46, 189–202.
17. Tahata, K.; Auchi, R.; Ando, S.; Tomizawa, S. Separation of the refined estimator of the measure for symmetry in square contingency tables. Commun. Stat.-Simul. Comput. 2023, 1–17.
18. Aitchison, J. Large-sample restricted parametric tests. J. R. Stat. Soc. Ser. B-Stat. Methodol. 1962, 24, 234–250.
19. Darroch, J.N.; Silvey, S.D. On testing more than one hypothesis. Ann. Math. Stat. 1963, 34, 555–567.
20. Lang, J.B.; Agresti, A. Simultaneously modeling joint and marginal distributions of multivariate categorical responses. J. Am. Stat. Assoc. 1994, 89, 625–632.
21. Lang, J.B. On the partitioning of goodness-of-fit statistics for multivariate categorical response models. J. Am. Stat. Assoc. 1996, 91, 1017–1023.
22. Smith, T.W.; Marsden, P.; Hout, M.; Kim, J. General Social Surveys, 1972–2014 [Machine-Readable Data File]; NORC at the University of Chicago: Chicago, IL, USA, 2006.

Table 1. How much influence should religious leaders and medical leaders have in government funding for decisions on stem cell research? [22].

Religious Leaders (rows) \ Medical Leaders (columns)	Great (1)	Fair (2)	Little (3)	None (4)	Total
Great (1)	36	16	7	7	66
	(36.00) ^a	(11.96) ^a	(6.22) ^a	(7.00) ^a
	(36.00) ^b	(60.19) ^b	(70.95) ^b	(67.00) ^b
Fair (2)	74	96	22	4	196
	(78.04) ^a	(96.00) ^a	(26.05) ^a	(4.78) ^a
	(42.67) ^b	(96.00) ^b	(82.76) ^b	(40.55) ^b
Little (3)	119	174	48	4	345
	(119.78) ^a	(169.95) ^a	(48.00) ^a	(3.99) ^a
	(62.59) ^b	(100.34) ^b	(48.00) ^b	(15.05) ^b
None (4)	127	93	26	18	264
	(127.00) ^a	(92.22) ^a	(26.01) ^a	(18.00) ^a
	(67.00) ^b	(48.91) ^b	(14.99) ^b	(18.00) ^b
Total	356	379	103	33	871

^a MLEs under the DPS model; ^b MLEs under the DGS model.

Table 2. Likelihood ratio chi-square values $G^{2}$ for the models applied to Table 1.

Models	df	$G^{2}$	p-Value
S	6	545.15	<0.0001
DPS	3	2.45	0.4847
DGS	3	542.70	<0.0001
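
As a rough check on the S row of Table 2, the following sketch recomputes $G^{2}$ for the symmetry model directly from the observed counts in Table 1. It assumes the standard closed-form MLE under S, namely (n_ij + n_ji)/2 for each off-diagonal cell, and r(r − 1)/2 degrees of freedom. The function names `g2_symmetry` and `symmetry_df` are illustrative and do not come from the paper. The DPS and DGS fits are not reproduced here; only the additivity of their reported values is checked.

```python
import math

# Observed counts from Table 1
# (rows: religious leaders, columns: medical leaders).
counts = [
    [36, 16, 7, 7],
    [74, 96, 22, 4],
    [119, 174, 48, 4],
    [127, 93, 26, 18],
]


def g2_symmetry(table):
    """Likelihood ratio statistic G^2 for the symmetry (S) model.

    Assumes the fitted value for cell (i, j) is (n_ij + n_ji) / 2;
    diagonal cells are fitted exactly and contribute nothing.
    """
    r = len(table)
    g2 = 0.0
    for i in range(r):
        for j in range(r):
            if i == j:
                continue
            n_ij = table[i][j]
            fitted = (table[i][j] + table[j][i]) / 2.0
            if n_ij > 0:  # a zero count contributes zero to G^2
                g2 += 2.0 * n_ij * math.log(n_ij / fitted)
    return g2


def symmetry_df(r):
    """Degrees of freedom of the S model for an r x r table."""
    return r * (r - 1) // 2


g2_s = g2_symmetry(counts)
print(f"G^2(S) = {g2_s:.2f}, df = {symmetry_df(len(counts))}")

# Reported values from Table 2: G^2(S) should equal G^2(DPS) + G^2(DGS).
reported = {"S": 545.15, "DPS": 2.45, "DGS": 542.70}
partition_gap = reported["S"] - (reported["DPS"] + reported["DGS"])
print(f"partition gap = {partition_gap:.2f}")
```

The recomputed statistic can be compared with the S row of Table 2, and the partition gap shows whether the reported DPS and DGS values add up to the S value, as the separability property implies.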

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
