1. Introduction
A categorical variable distinguishes a set of categories. It is employed in diverse fields such as social sciences, medical sciences, engineering, and education. Here, we consider a categorical variable with $r$ categories and another one with $c$ categories. The outcome for the two variables has $r \times c$ possible combinations, which can be displayed in a rectangular table with $r$ rows and $c$ columns, where the cells represent the $r \times c$ possible outcomes. This is called a contingency table (for more details, see [1,2]). A contingency table shows the joint frequencies for each combination of the two categorical variables. When analyzing a contingency table, only the observed frequencies are seen, but the true distribution is unknown. One of the aims of analyzing a contingency table is to estimate the unknown probability distribution from the observed frequencies. The estimated distribution is generally more reliable when fewer parameters are used to describe the data, so a parsimonious model is often desirable. Traditionally, a contingency table is used to evaluate whether classifications are associated. That is, the analysis determines whether two variables are statistically independent.
If two variables take the same categorical values, the table is called a “square” contingency table. When the observed frequencies are concentrated in the main diagonal cells, the two variables are dependent. Even if the observations are not concentrated on the main diagonal, one large frequency and several small frequencies in each row and each column indicate a strong association between the categories of one variable and those of the other, and hence a strong dependence. This is a common situation in real-world data and, since the case of independence is infrequent and unrealistic, a suitable model for representing dependent data is important. Consequently, many statisticians consider various statistical models instead of an independence model and study the methods of estimation and hypothesis testing based on a statistical model.
When statistical independence between two variables does not hold, association models, which indicate the structure of odds ratios, have been considered to analyze contingency tables. On the other hand, symmetry or asymmetry models, which indicate the structure of ratios for cell probabilities in symmetric positions, are often used to analyze square contingency tables.
This study proposes a model with characteristics of both an association model and an asymmetry model. Our model is more parsimonious than many association or asymmetry models. Hence, our model may estimate the distribution better than conventional association models and asymmetry models.
This paper is organized as follows.
Section 2 introduces previous research and proposes an asymmetry plus association model.
Section 3 describes the necessary and sufficient condition to use our model.
Section 4 provides the methods to evaluate model-fitting based on goodness-of-fit.
Section 5 concludes this paper.
2. Models
For an $r \times r$ square contingency table with ordinal categories, let $p_{ij}$ denote the probability that an observation will fall in the $i$th row and $j$th column of the contingency table ($i = 1, \dots, r$; $j = 1, \dots, r$). Goodman [3,4,5] considered many association models in a contingency table. For example, the quasi-uniform association (QU) model is defined as
$$p_{ij} = \alpha_i \beta_j \theta^{ij} \quad (i \neq j).$$
Without loss of generality, we impose
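. The odds ratio for rows $i$ and $j$ ($> i$) and columns $s$ and $t$ ($> s$) is denoted by $\theta_{(i<j;s<t)}$. That is,
$$\theta_{(i<j;s<t)} = \frac{p_{is}\,p_{jt}}{p_{js}\,p_{it}}.$$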
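Using the odds ratios, the QU model can be expressed as
$$\theta_{(i<j;s<t)} = \theta^{(j-i)(t-s)} \quad (i \neq s,\ i \neq t,\ j \neq s,\ j \neq t).$$
The QU model with $\theta = 1$ is the quasi-independence (QI) model (see p. 426 in Agresti [6]). That is,
$$p_{ij} = \alpha_i \beta_j \quad (i \neq j).$$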
On the other hand, many statisticians have analyzed square contingency tables using a symmetric structure or an asymmetric structure for cell probabilities. Bowker [7] proposed the symmetry (S) model, which is defined as
$$p_{ij} = \psi_{ij} \quad (i = 1, \dots, r;\ j = 1, \dots, r),$$
where $\psi_{ij} = \psi_{ji}$. This model indicates the symmetric structure for cell probabilities.
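Stuart [8] proposed the marginal homogeneity (MH) model, which is defined as
$$p_{i\cdot} = p_{\cdot i} \quad (i = 1, \dots, r),$$
where $p_{i\cdot} = \sum_{t=1}^{r} p_{it}$ and $p_{\cdot i} = \sum_{s=1}^{r} p_{si}$. The MH model indicates that the row marginal distribution is equivalent to the column marginal distribution.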
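The relation between the S model and the MH model can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names and both example tables are hypothetical. It illustrates that a symmetric table always has equal row and column marginal distributions, while the converse need not hold.

```python
def is_symmetric(p, tol=1e-9):
    """S model check: p[i][j] equals p[j][i] for every pair of cells."""
    r = len(p)
    return all(abs(p[i][j] - p[j][i]) <= tol for i in range(r) for j in range(r))


def marginals(p):
    """Return the row and column marginal distributions of a square table."""
    r = len(p)
    rows = [sum(p[i][j] for j in range(r)) for i in range(r)]
    cols = [sum(p[i][j] for i in range(r)) for j in range(r)]
    return rows, cols


def is_marginally_homogeneous(p, tol=1e-9):
    """MH model check: each row marginal equals the matching column marginal."""
    rows, cols = marginals(p)
    return all(abs(a - b) <= tol for a, b in zip(rows, cols))


# Hypothetical symmetric table of cell probabilities (sums to 1).
sym = [[0.20, 0.05, 0.10],
       [0.05, 0.15, 0.10],
       [0.10, 0.10, 0.15]]

# Hypothetical table in which MH holds although the cells are not symmetric.
mh_only = [[0.20, 0.10, 0.00],
           [0.00, 0.20, 0.10],
           [0.10, 0.00, 0.30]]

print(is_symmetric(sym), is_marginally_homogeneous(sym))
print(is_symmetric(mh_only), is_marginally_homogeneous(mh_only))
```

The second table shows why the MH model is a weaker restriction than the S model: its marginal distributions agree even though the probabilities in symmetric positions differ.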
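Caussinus [9] proposed the quasi-symmetry (QS) model, which is defined as
$$p_{ij} = \alpha_i \beta_j \psi_{ij} \quad (i = 1, \dots, r;\ j = 1, \dots, r),$$
where $\psi_{ij} = \psi_{ji}$. This model is identical to the S model when $\alpha_i = \beta_i$ ($i = 1, \dots, r$). The QS model can be expressed as
$$\theta_{(i<j;s<t)} = \theta_{(s<t;i<j)}.$$
The QS model indicates the symmetric structure of the odds ratios. The QU model implies the QS model. That is, the QS model holds when the QU model holds.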
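The statement that the QU model implies the QS model can be illustrated numerically. The sketch below is illustrative only and rests on stated assumptions: the QU model is taken in the off-diagonal form $\alpha_i \beta_j \theta^{ij}$ with integer scores $1, \dots, r$ and free diagonal cells, the odds ratio is the cross-product ratio of the four cells defined by two rows and two columns, and all function names and parameter values are hypothetical. The table is left unnormalised because odds ratios do not depend on the overall scale.

```python
import math


def odds_ratio(p, i, j, s, t):
    """Odds ratio for rows i < j and columns s < t (0-based indices)."""
    return (p[i][s] * p[j][t]) / (p[j][s] * p[i][t])


def qu_table(alpha, beta, theta, diag):
    """Unnormalised QU-type table with scores 1..r and free diagonal cells."""
    r = len(alpha)
    return [[diag[i] if i == j
             else alpha[i] * beta[j] * theta ** ((i + 1) * (j + 1))
             for j in range(r)] for i in range(r)]


def has_symmetric_odds_ratios(p, rel_tol=1e-9):
    """QS-type check: the odds ratio for (rows i<j, columns s<t) equals
    the odds ratio for (rows s<t, columns i<j), for all index pairs."""
    r = len(p)
    for i in range(r):
        for j in range(i + 1, r):
            for s in range(r):
                for t in range(s + 1, r):
                    a = odds_ratio(p, i, j, s, t)
                    b = odds_ratio(p, s, t, i, j)
                    if not math.isclose(a, b, rel_tol=rel_tol):
                        return False
    return True


# Hypothetical parameter values for a 4 x 4 table.
p = qu_table([1.0, 0.8, 1.3, 0.6], [0.5, 1.2, 0.9, 1.1], 1.2,
             [3.0, 2.0, 4.0, 1.0])

# Off-diagonal cells only: adjacent rows and adjacent columns give theta.
print(odds_ratio(p, 0, 1, 2, 3))
# Symmetric structure of the odds ratios, diagonal cells included.
print(has_symmetric_odds_ratios(p))
```

An arbitrary positive table generally fails this check, which is why the symmetric odds-ratio structure is a genuine restriction.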
When the S model does not hold, asymmetry models, with a weaker restriction than the S model, have been proposed. For example, Tahata and Tomizawa [10] proposed the $k$th linear asymmetry (LS$_k$) model, which is defined for a fixed $k$ as
$$p_{ij} = \left(\prod_{h=1}^{k} \alpha_h^{\,i^h}\right) \psi_{ij} \quad (i = 1, \dots, r;\ j = 1, \dots, r),$$
where $\psi_{ij} = \psi_{ji}$. Note that when $\alpha_1 = \cdots = \alpha_k = 1$, this model is the S model. As $k$ increases, the LS$_k$ model is less restrictive, and the LS$_{r-1}$ model is the QS model. Namely, the LS$_k$ model is the intermediate model between the S model and QS model. The LS$_k$ model can be expressed as
$$\frac{p_{ij}}{p_{ji}} = \prod_{h=1}^{k} \alpha_h^{\,i^h - j^h} \quad (i < j).$$
The LS$_k$ model includes the linear diagonals-parameter symmetry model [11] and the extended linear diagonals-parameter symmetry model [12].
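Goodman [4] introduced the symmetry plus quasi-independence (SQI) model, which is defined as
$$p_{ij} = \alpha_i \alpha_j \quad (i \neq j).$$
This model is a special case of the S model and the QI model when $\psi_{ij} = \alpha_i \alpha_j$ ($i \neq j$) and $\alpha_i = \beta_i$ for $i = 1, \dots, r$, respectively.
Yamamoto and Tomizawa [13] proposed the symmetry plus quasi-uniform association (SQU) model, which is defined as
$$p_{ij} = \alpha_i \alpha_j \theta^{ij} \quad (i \neq j).$$
The SQU model implies the S model and QU model. Note that the SQU model is identical to the SQI model when $\theta = 1$.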
Association models and asymmetry models have been proposed independently. However, an asymmetry plus association model, which considers both the structure of asymmetry for cell probabilities and odds ratios, is rarely considered.
Here, we propose a new model defined for a fixed
k as
Without loss of generality, we set . This model is called the kth linear asymmetry plus quasi-uniform association (LSQU) model. When , it is called the kth linear asymmetry plus quasi-independence (LSQI) model.
If the LSQU
model holds, then
The LS
model holds by
in Equation (
14). Additionally,
Therefore, the LSQU model shows characteristics of both the LS model and the QU model.
This model with
for
is the SQU model. When
, the LSQU
model implies
On the other hand, the QU model implies
where
. Setting
provides a one-to-one relation between
and
. This means that the LSQU$_{r-1}$ model is equivalent to the QU model. The LSQU$_k$
(
) model is a special case of the QU model since the LSQU
model with
for
is the LSQU
model. Hence, the LSQU$_k$ model is an intermediate model between the SQU and QU models. Similarly, the LSQI$_{r-1}$ model is equivalent to the QI model. That is, the LSQI$_k$ model is an intermediate model between the SQI and QI models (for more details, see Figure 1).
3. Necessary and Sufficient Condition for the SQU Model
Caussinus [9] introduced the necessary and sufficient condition for the S model. This condition separates the S model into multiple models with a weaker restriction than the S model. Assuming that model M$_3$ holds if and only if both models M$_1$ and M$_2$ hold, analyzing models M$_1$ and M$_2$ should elucidate a more detailed structure of the cell probabilities. Here, we are interested in deriving a necessary and sufficient condition for the SQU model using the LSQU$_k$ model.
Yamamoto and Tomizawa [
13] provided the following necessary and sufficient condition for the SQU model.
Theorem 1. The SQU model holds if and only if both the QU model and the MH model hold.
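Let $X$ and $Y$ denote the row and column variables, respectively, and consider a model defined for a fixed $k$ ($k = 1, \dots, r-1$), which is given as
$$E(X^h) = E(Y^h) \quad (h = 1, \dots, k),$$
where $E(X^h) = \sum_{i=1}^{r} i^h p_{i\cdot}$ and $E(Y^h) = \sum_{i=1}^{r} i^h p_{\cdot i}$. This model can be referred to as the marginal $k$th moment equality (ME$_k$) model. This leads to the following theorem.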
Theorem 2. For any $k$ ($k = 1, \dots, r-1$), the SQU model holds if and only if both the LSQU$_k$ model and the ME$_k$ model hold.
Proof. If the SQU model holds, the LSQU
model holds because the LSQU
model with
is the SQU model. Since the SQU model implies the S model, we can see that
The ME model also holds. The necessity is proved.
Conversely, if both the LSQU
model and the ME
model hold, we can prove that the SQU model holds. If the LSQU
model holds, from Equation (
14), we obtain
The ME
model can also be expressed as
From the LSQU
model and the ME
model, we obtain
Since the logarithmic function is strictly increasing, then for any
Equation (
22) with
holds, that is, the S model holds. When the S model holds, the MH model holds. Additionally, the LSQU
model is a special case of the QU model. From Theorem 1, the SQU model holds. The proof is complete. □
Theorem 2 is a generalization of Yamamoto and Tomizawa’s result because the ME$_{r-1}$ model is equivalent to the MH model (see [14]). This leads to the following corollary.
Corollary 1. For any $k$ ($k = 1, \dots, r-1$), the SQI model holds if and only if both the LSQI$_k$ model and the ME$_k$ model hold.
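The marginal moment equality condition used in this section can be checked directly on a table of cell probabilities. The following is a minimal sketch under a stated assumption: the $h$th marginal moments are computed with the integer scores $1, \dots, r$ assigned to the ordered categories, and the ME model for a given $k$ is read as equality of the first $k$ row and column moments. The function names and the example table are hypothetical.

```python
def marginal_moments(p, k):
    """Return ([E(X^h)], [E(Y^h)]) for h = 1..k using integer scores 1..r."""
    r = len(p)
    row = [sum(p[i][j] for j in range(r)) for i in range(r)]
    col = [sum(p[i][j] for i in range(r)) for j in range(r)]
    ex = [sum((i + 1) ** h * row[i] for i in range(r)) for h in range(1, k + 1)]
    ey = [sum((j + 1) ** h * col[j] for j in range(r)) for h in range(1, k + 1)]
    return ex, ey


def moments_equal(p, k, tol=1e-9):
    """True when the first k row and column marginal moments agree."""
    ex, ey = marginal_moments(p, k)
    return all(abs(a - b) <= tol for a, b in zip(ex, ey))


# Hypothetical table built as an outer product of two marginal distributions
# that share the same mean (2.0) but have different second moments.
a = [0.25, 0.50, 0.25]
b = [0.40, 0.20, 0.40]
p = [[a[i] * b[j] for j in range(3)] for i in range(3)]

print(moments_equal(p, 1))  # the means agree
print(moments_equal(p, 2))  # the second moments differ
```

In this example the condition holds for $k = 1$ but fails for $k = 2$, which illustrates that the restriction becomes stronger as $k$ increases.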
4. Partition of Test Statistics
Here, we describe a method to evaluate the model fitting. We consider a test of hypothesis, where the null hypothesis is that model M holds, and the alternative hypothesis is that model M does not hold. Let $n_{ij}$ denote the observed frequency in the $(i, j)$th cell of the table and $m_{ij}$ indicate the corresponding expected frequency, with $n = \sum_{i=1}^{r} \sum_{j=1}^{r} n_{ij}$ ($i = 1, \dots, r$; $j = 1, \dots, r$). Assume that $\{n_{ij}\}$ has a multinomial distribution. Let $\hat{m}_{ij}$ denote the maximum likelihood estimate (MLE) of $m_{ij}$ under a model. The likelihood ratio chi-squared statistic for the goodness-of-fit of the model M is defined as
$$G^2(\mathrm{M}) = 2 \sum_{i=1}^{r} \sum_{j=1}^{r} n_{ij} \log\left(\frac{n_{ij}}{\hat{m}_{ij}}\right).$$
The numbers of degrees of freedom (df) for testing the goodness-of-fit under the SQU, LSQU$_k$, and ME$_k$ models are $r^2 - 2r - 1$, $r^2 - 2r - 1 - k$, and $k$, respectively. The number of df for the SQU model is equal to the sum of those for the LSQU$_k$ and ME$_k$ models.
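As an illustration of how the likelihood ratio chi-squared statistic is computed, the sketch below evaluates it for the S model, chosen only because its maximum likelihood estimates have the well-known closed form $\hat{m}_{ij} = (n_{ij} + n_{ji})/2$ with $r(r-1)/2$ degrees of freedom. The models proposed in this paper generally require iterative fitting, which is not shown. The function names and the observed counts are hypothetical.

```python
import math


def g_squared(observed, expected):
    """Likelihood ratio chi-squared statistic 2 * sum n * log(n / m).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    total = 0.0
    for n_row, m_row in zip(observed, expected):
        for n, m in zip(n_row, m_row):
            if n > 0:
                total += n * math.log(n / m)
    return 2.0 * total


def symmetry_fit(observed):
    """Closed-form expected frequencies under the S model."""
    r = len(observed)
    return [[(observed[i][j] + observed[j][i]) / 2.0 for j in range(r)]
            for i in range(r)]


# Hypothetical 3 x 3 table of observed frequencies.
counts = [[10, 4, 2],
          [8, 12, 5],
          [6, 3, 9]]

fitted = symmetry_fit(counts)
r = len(counts)
print(g_squared(counts, fitted), r * (r - 1) // 2)  # statistic and its df
```

The statistic is zero when the observed table is exactly symmetric and grows as the counts in symmetric positions diverge.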
Previous studies have discussed the separability of a model [15,16,17,18,19]. Separability means that a test statistic for the goodness-of-fit of model M$_3$ is asymptotically equivalent to the sum of the test statistics for model M$_1$ and model M$_2$ when model M$_3$ can be separated into model M$_1$ and model M$_2$. If it holds, the incompatible situation, where both model M$_1$ and model M$_2$ are accepted but model M$_3$ is rejected, would not arise. This leads to the following theorem.
Theorem 3. For any $k$ ($k = 1, \dots, r-1$), the test statistic $G^2(\mathrm{SQU})$ is asymptotically equivalent to the sum of $G^2(\mathrm{LSQU}_k)$ and $G^2(\mathrm{ME}_k)$.
Proof. For a fixed
k (
), the LSQU
model can be expressed as
Without loss of generality, we can impose
. Let
and
where “T” denotes the transpose,
and
The LSQU
model can also be expressed as
where
,
X is the
matrix, and
is the
vector of the 1 element. Additionally,
where
and
is the
matrix determined from Equation (
25). Note that
is the
zero matrix,
is the
zero vector,
, and “⊗” represents the Kronecker product. The matrix
X has a full column rank, which is
.
We denote the linear space spanned by the columns of the matrix X by with dimension K. Let U be an full column rank matrix, where , such that is the orthogonal complement of space . Hence, .
Let be a vector of functions defined by . Moreover, let be a vector of functions defined by , and note that where because belongs to space .
From Equation (
25), the LSQU
model is equivalent to the hypothesis
. Additionally, the ME
model is equivalent to the hypothesis
. From Theorem 2, the SQU model is equivalent to the hypothesis
where
and
.
We derive the Wald statistic for the SQU model in a manner analogous to Bhapkar [20]. Let
denote the
matrix of partial derivatives of
with respect to
. Namely,
. Let
, where
denotes a diagonal matrix with the
ith component of
as the
ith diagonal component. Additionally, let
denote a sample proportion of the (
) cell. That is,
, and
. The central limit theorem indicates that
has an asymptotic normal distribution with mean
and covariance matrix
. Using the delta method,
has an asymptotic normal distribution with mean
and covariance matrix
Since
,
, and
, we obtain
Under each hypothesis,
, we see
where
The Wald statistic
has an asymptotic chi-squared distribution with
df. That is, (i)
is the Wald statistic for the LSQU
model, (ii)
is that for the ME
model, and (iii)
is that for the SQU model. The proof is completed using the asymptotic equivalence of the Wald statistic and the likelihood ratio statistic as proved by Rao [
21]. □
Theorem 3 is also a generalization of Yamamoto and Tomizawa’s result since this theorem is identical to Yamamoto and Tomizawa’s result when $k = r-1$. Moreover, we obtain the following corollary.
Corollary 2. For any $k$ ($k = 1, \dots, r-1$), the test statistic $G^2(\mathrm{SQI})$ is asymptotically equivalent to the sum of $G^2(\mathrm{LSQI}_k)$ and $G^2(\mathrm{ME}_k)$.
5. Example
Table 1 shows the data cited by [22]. These data describe 59 matched pairs using 4 dose levels of conjugated estrogen. The models described herein are used to analyze these data.
Table 2 shows the value of $G^2$ for each model applied to the data in Table 1. That is, for model M, the null hypothesis is that model M holds, and the alternative hypothesis is that model M does not hold. From
Table 2, the SQI, SQU, S, and ME
models do not fit well, and the LSQI
, LSQU
, and LS
models are accepted at the 0.05 significance level. We choose the most appropriate model among these models. If model M$_1$ is a special case of model M$_2$, a test based on the difference between the likelihood ratio chi-squared statistics can compare the model fitting of the two nested models. Let $\mathrm{df}_1$ and $\mathrm{df}_2$ denote the degrees of freedom for the models M$_1$ and M$_2$, respectively. Assuming that model M$_2$ holds, a likelihood ratio chi-squared statistic for testing that model M$_1$ holds is given as $G^2(\mathrm{M}_1 \mid \mathrm{M}_2) = G^2(\mathrm{M}_1) - G^2(\mathrm{M}_2)$. This statistic has an asymptotic chi-squared distribution with $\mathrm{df}_1 - \mathrm{df}_2$ degrees of freedom. When we use it at the 0.05 significance level, the LSQI
model is the most appropriate model.
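The comparison of two nested models described above can be sketched as follows. This is an illustrative example only: the function name and the numerical values are hypothetical and are not taken from Table 2. The 0.05 critical values of the chi-squared distribution are hard-coded for small degrees of freedom so that the sketch has no external dependencies.

```python
# Upper 0.05 critical values of the chi-squared distribution for df = 1..5.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def conditional_test(g2_restricted, df_restricted, g2_general, df_general):
    """Compare a restricted model with a more general model that contains it.

    Returns the difference of the statistics, the difference of the degrees
    of freedom, and whether the restricted model is rejected at the 0.05
    level given that the general model holds.
    """
    diff = g2_restricted - g2_general
    df = df_restricted - df_general
    if df not in CRITICAL_05:
        raise ValueError("critical value not tabulated for df = %d" % df)
    return diff, df, diff > CRITICAL_05[df]


# Hypothetical values: restricted model (G2 = 9.0, df = 8) nested in a
# more general model (G2 = 4.5, df = 7).
result = conditional_test(9.0, 8, 4.5, 7)
print(result)
```

When the restricted model is not rejected, it is preferred because it describes the data with fewer parameters. A p-value could be obtained instead of a tabulated critical value, for example with `scipy.stats.chi2.sf`, assuming SciPy is available.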
Table 3 shows the estimated expected frequencies from the LSQI
model for the data in
Table 1. The maximum likelihood estimate of
for the LSQI
model is 0.71. We estimate the ratio between two probabilities as
for
. Therefore, the probability distribution for the average dose for a case tends to be stochastically higher than the probability distribution for the average dose for control because
.
Finally, we are interested in inferring the reason for the poor fit of the SQI model. According to Corollary 1, the SQI model is separated into the LSQI model and the ME model. Since the LSQI model fits very well, but the ME model fits very poorly, we deduce that the lack of structure of the ME model is responsible for the poor fit of the SQI model.