2.1. A Simple Model with the Stratified Random Sampling
In this section, we consider the application of the stratified sampling design to a simple model for estimating two sensitive attributes when the stratum sizes are known. We assume a population of size N composed of $L$ mutually exclusive strata, where stratum $h$ ($h = 1, \dots, L$) has a known size $N_h$. From each stratum $h$, a sample of size $n_h$ is drawn using simple random sampling with replacement (SRSWR). The $i$th respondent from stratum $h$ answers the survey using the two randomization devices described in Table 1 and Table 2, answering either "yes" or "no" to the question selected by each device, with selection probabilities $P_1$ and $P_2$.
Therefore, the probability $\theta_{11h}$ that a respondent in stratum $h$ answers (yes, yes) is given by
The probability $\theta_{10h}$ that a respondent in stratum $h$ answers (yes, no) is given by
The probability $\theta_{01h}$ that a respondent in stratum $h$ answers (no, yes) is given by
Finally, the probability $\theta_{00h}$ that a respondent in stratum $h$ answers (no, no) is given by
where $\pi_{Ah}$ and $\pi_{Bh}$ are the population proportions of sensitive attributes A and B, respectively, and $\pi_{ABh}$ is the population proportion of individuals having both A and B, all for stratum $h$.
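Let $n_{11h}$, $n_{10h}$, $n_{01h}$ and $n_{00h}$ denote the numbers of the $n_h$ respondents sampled from stratum $h$ who answered (yes, yes), (yes, no), (no, yes) and (no, no), respectively. Then the sample proportions are
$\hat\theta_{11h} = n_{11h}/n_h$, $\hat\theta_{10h} = n_{10h}/n_h$, $\hat\theta_{01h} = n_{01h}/n_h$ and $\hat\theta_{00h} = n_{00h}/n_h$.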
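As a small illustration of the definition above, the four sample proportions of one stratum are just the answer counts divided by $n_h$. The function name `sample_proportions` and the counts below are hypothetical.

```python
def sample_proportions(n11, n10, n01, n00):
    """Return (theta11, theta10, theta01, theta00) = counts / n_h for one stratum.

    Order of answers: (yes, yes), (yes, no), (no, yes), (no, no).
    """
    n_h = n11 + n10 + n01 + n00
    if n_h == 0:
        raise ValueError("stratum sample is empty")
    return tuple(count / n_h for count in (n11, n10, n01, n00))


# Hypothetical counts for a stratum with n_h = 200 respondents.
print(sample_proportions(30, 50, 40, 80))
```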
We define the minimum distance function between the population proportions $\theta_{ijh}$ of the four answer pairs and the observed sample proportions $\hat\theta_{ijh}$ for stratum $h$ as follows
Minimizing this distance function and then using the method of moments, we obtain the estimators $\hat\pi_{Ah}$, $\hat\pi_{Bh}$ and $\hat\pi_{ABh}$ of the population proportions $\pi_{Ah}$, $\pi_{Bh}$ and $\pi_{ABh}$ for stratum $h$ as follows
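The explicit device probabilities are in Tables 1 and 2, which are not reproduced here, so the sketch below is hypothetical. It assumes that the four cell probabilities are affine in $\pi = (\pi_{Ah}, \pi_{Bh}, \pi_{ABh})$, that is $\theta_h = c + M\pi$, and that the distance is the squared Euclidean distance. Minimizing $\sum (\hat\theta_h - c - M\pi)^2$ then gives the normal equations $M^\top M \pi = M^\top(\hat\theta_h - c)$. Because both $\hat\theta_h$ and $c + M\pi$ sum to one, this system has an exact solution, so the minimum distance estimate coincides with the method of moments estimate. The helper `cell_probs`, with its Warner-type devices and parameters `P1`, `P2`, is a made-up stand-in used only to generate some $c$ and $M$; it is not the paper's model.

```python
def cell_probs(pi, P1, P2):
    """HYPOTHETICAL device pair (not the paper's Tables 1-2).

    Device 1 asks "Are you in A?" with prob. P1, else "Are you not in A?";
    device 2 does the same for B with prob. P2. Returns cell probabilities
    in the order (yes,yes), (yes,no), (no,yes), (no,no).
    """
    pi_A, pi_B, pi_AB = pi
    # Joint distribution of the true memberships (a, b) in A and B.
    joint = {(1, 1): pi_AB, (1, 0): pi_A - pi_AB,
             (0, 1): pi_B - pi_AB, (0, 0): 1 - pi_A - pi_B + pi_AB}

    def p_yes(P, x):
        # Probability of answering "yes" given true membership x.
        return P if x == 1 else 1 - P

    theta = []
    for r1, r2 in ((1, 1), (1, 0), (0, 1), (0, 0)):
        total = 0.0
        for (a, b), p_ab in joint.items():
            y1, y2 = p_yes(P1, a), p_yes(P2, b)
            total += p_ab * (y1 if r1 else 1 - y1) * (y2 if r2 else 1 - y2)
        theta.append(total)
    return theta


def solve(A, y):
    """Solve the square system A x = y (Gaussian elimination, partial pivoting)."""
    n = len(y)
    aug = [list(row) + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x


def min_distance_estimate(theta_hat, P1, P2):
    """Least-squares estimate of (pi_A, pi_B, pi_AB) for one stratum."""
    c = cell_probs((0, 0, 0), P1, P2)  # intercept of the affine map
    cols = []  # columns of M, recovered from unit vectors
    for e in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
        t = cell_probs(e, P1, P2)
        cols.append([t[i] - c[i] for i in range(4)])
    r = [theta_hat[i] - c[i] for i in range(4)]
    MtM = [[sum(cols[j][i] * cols[k][i] for i in range(4)) for k in range(3)]
           for j in range(3)]
    Mtr = [sum(cols[j][i] * r[i] for i in range(4)) for j in range(3)]
    return solve(MtM, Mtr)


# Feeding in exact cell probabilities should recover the true proportions.
theta = cell_probs((0.30, 0.20, 0.10), P1=0.7, P2=0.8)
print(min_distance_estimate(theta, P1=0.7, P2=0.8))
```

Note that a Warner-type device is identifiable only when `P1` and `P2` differ from 0.5; otherwise $M^\top M$ is singular.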
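The stratified estimators of the population proportions of the sensitive attributes are, respectively,
$\hat\pi_{A,st} = \sum_{h=1}^{L} W_h \hat\pi_{Ah}$, $\hat\pi_{B,st} = \sum_{h=1}^{L} W_h \hat\pi_{Bh}$ and $\hat\pi_{AB,st} = \sum_{h=1}^{L} W_h \hat\pi_{ABh}$,
where $W_h = N_h/N$.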
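A minimal sketch of the stratified combination $\sum_h W_h \hat\pi_h$ with $W_h = N_h/N$; the same function is applied separately to the A, B and AB estimates. The function name and the numbers are illustrative only.

```python
def stratified_estimate(N_h, est_h):
    """Weighted sum of stratum estimates with weights W_h = N_h / N."""
    N = sum(N_h)
    return sum(Nh / N * e for Nh, e in zip(N_h, est_h))


# Hypothetical stratum sizes and stratum-level estimates of pi_A.
print(stratified_estimate([5000, 3000, 2000], [0.10, 0.20, 0.30]))
```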
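Theorem 1. The stratified estimators $\hat\pi_{A,st}$, $\hat\pi_{B,st}$ and $\hat\pi_{AB,st}$ are unbiased estimators of the population proportions of the sensitive attributes, $\pi_A$, $\pi_B$ and $\pi_{AB}$, respectively. Proof. Because $E(\hat\theta_{11h}) = \theta_{11h}$, $E(\hat\theta_{10h}) = \theta_{10h}$, $E(\hat\theta_{01h}) = \theta_{01h}$ and $E(\hat\theta_{00h}) = \theta_{00h}$, it follows that $E(\hat\pi_{Ah}) = \pi_{Ah}$, and hence
$E(\hat\pi_{A,st}) = \sum_{h=1}^{L} W_h E(\hat\pi_{Ah}) = \sum_{h=1}^{L} W_h \pi_{Ah} = \pi_A$.
In the same way, it can be shown that $E(\hat\pi_{B,st}) = \pi_B$ and $E(\hat\pi_{AB,st}) = \pi_{AB}$. □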
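Theorem 2. The variances of the stratified estimators $\hat\pi_{A,st}$, $\hat\pi_{B,st}$ and $\hat\pi_{AB,st}$ are given by
The covariances of the stratified estimators $\hat\pi_{A,st}$, $\hat\pi_{B,st}$ and $\hat\pi_{AB,st}$ are given by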
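Theorem 2's explicit expressions are not reproduced here, but their structure can be sketched. Since the strata are sampled independently, the variance (or covariance) of a stratified estimator $\sum_h W_h \hat\pi_h$ is $\sum_h W_h^2$ times the within-stratum variance (or covariance). Assuming, as in the earlier hypothetical sketch, that each stratum estimator is linear in the sample proportions, $\hat\pi_{Ah} = a^\top\hat\theta_h + \text{const}$, the within-stratum terms follow from the multinomial covariance $(\mathrm{diag}(\theta_h) - \theta_h\theta_h^\top)/n_h$ under SRSWR. The coefficient vectors `a`, `b` and the function names are hypothetical.

```python
def linear_cov(a, b, theta, n_h):
    """Cov(a . theta_hat, b . theta_hat) for multinomial proportions from n_h draws.

    Uses (diag(theta) - theta theta^T) / n_h; with a == b this is a variance.
    """
    ea = sum(ai * ti for ai, ti in zip(a, theta))
    eb = sum(bi * ti for bi, ti in zip(b, theta))
    eab = sum(ai * bi * ti for ai, bi, ti in zip(a, b, theta))
    return (eab - ea * eb) / n_h


def stratified_cov(W_h, cov_h):
    """Var/Cov of sum_h W_h * est_h when strata are sampled independently."""
    return sum(w * w * c for w, c in zip(W_h, cov_h))


# Variance of theta_hat_11 in one stratum: theta11 (1 - theta11) / n_h.
v = linear_cov((1, 0, 0, 0), (1, 0, 0, 0), (0.2, 0.3, 0.1, 0.4), 100)
print(v, stratified_cov([0.5, 0.5], [v, v]))
```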
Next, we consider the sample allocation problem for the stratified sampling design, namely proportional allocation and optimum allocation. Under proportional allocation, the sample size of stratum $h$ is $n_h = nN_h/N$, where $n$ is the total sample size, and the variances of the stratified estimators are then given by
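In practice $nN_h/N$ is rarely an integer. The sketch below rounds the proportional allocation with the largest-remainder rule so that the stratum sizes still add up to $n$; this rounding rule is our own choice, not something stated in the text, and the function name is hypothetical.

```python
def proportional_allocation(n, N_h):
    """Integer n_h close to n * N_h / N, summing exactly to n."""
    N = sum(N_h)
    raw = [n * Nh / N for Nh in N_h]
    alloc = [int(x) for x in raw]  # floor, since raw >= 0
    # Give the leftover units to the strata with the largest fractional parts.
    leftover = n - sum(alloc)
    order = sorted(range(len(raw)), key=lambda h: raw[h] - alloc[h], reverse=True)
    for h in order[:leftover]:
        alloc[h] += 1
    return alloc


print(proportional_allocation(100, [5000, 3000, 2000]))
print(proportional_allocation(10, [1, 1, 1]))
```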
Under optimum allocation, the stratum sample sizes $n_h$ are chosen, for a fixed cost, to minimize the variances $V(\hat\pi_{A,st})$, $V(\hat\pi_{B,st})$ and $V(\hat\pi_{AB,st})$ of the respective estimators. Let the cost function be defined as
$C = c_0 + \sum_{h=1}^{L} c_h n_h$,
where $c_0$ is a fixed cost and $c_h$ is the survey cost per unit in stratum $h$.
Using the Cauchy-Schwarz inequality, we find the optimum sample size $n_h$ for stratum $h$ that minimizes each of the variances $V(\hat\pi_{A,st})$, $V(\hat\pi_{B,st})$ and $V(\hat\pi_{AB,st})$ as follows.
Thus, the variances under the optimum allocation are given by
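A sketch of the standard Cauchy-Schwarz solution, assuming each stratum variance can be written as $V(\hat\pi_{Ah}) = S_h^2/n_h$ with $S_h^2$ not depending on $n_h$ (true for estimators that are linear in the sample proportions under SRSWR). With $V = \sum_h W_h^2 S_h^2/n_h$ and $\sum_h c_h n_h = C - c_0$, the optimum is $n_h \propto W_h S_h/\sqrt{c_h}$ and the minimum variance is $(\sum_h W_h S_h\sqrt{c_h})^2/(C - c_0)$. The symbols $S_h$ and the function name are our assumptions; a separate allocation is obtained for each of the three estimators.

```python
import math


def optimum_allocation(budget, c0, W_h, S_h, c_h):
    """Cost-optimal n_h (real-valued) and the resulting minimum variance.

    Assumes V = sum_h W_h^2 S_h^2 / n_h and cost C = c0 + sum_h c_h n_h.
    """
    avail = budget - c0
    if avail <= 0:
        raise ValueError("budget must exceed the fixed cost c0")
    denom = sum(w * s * math.sqrt(c) for w, s, c in zip(W_h, S_h, c_h))
    n_h = [avail * w * s / math.sqrt(c) / denom for w, s, c in zip(W_h, S_h, c_h)]
    v_min = denom ** 2 / avail
    return n_h, v_min


n_h, v_min = optimum_allocation(budget=350, c0=50, W_h=[0.5, 0.5],
                                S_h=[0.4, 0.4], c_h=[1, 4])
print(n_h, v_min)
```

The cheaper stratum receives the larger sample; the returned sizes should be rounded before fielding the survey.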
2.2. A Simple Model with the Stratified Double Sampling
In Section 2.1, we applied the stratified random sampling method to the simple model, assuming that the sizes of all strata in the population are known exactly. However, when information about the stratification variables is not available, a large initial sample must first be taken to obtain that information. Stratified double sampling is a method in which this first-stage sample is stratified and a subsample is then selected from each stratum to obtain information about the sensitive attributes.
In this section, we apply stratified double sampling to the simple model to estimate the proportions of sensitive attributes in sensitive surveys. We also deal with the problem of allocating the sample to each stratum.
When the stratum sizes are unknown, respondents in the first stage are asked directly about the stratification criteria so that they can be classified into strata. In the second stage, the simple model is applied to the sensitive questions. The first-stage respondents, who are questioned directly, are selected at random with equal probability from a population of size N composed of L strata.
In the first stage, the sample of size $n'$ is classified into the $L$ strata, and the number of individuals in stratum $h$ is denoted by $n'_h$. Then $W_h$ and $w_h$ are defined as follows.
$W_h = N_h/N$: the population proportion of stratum $h$,
$w_h = n'_h/n'$: the sample proportion of stratum $h$,
where $w_h$ is an unbiased estimator of $W_h$.
In the second stage, $n_h$ respondents are selected by simple random sampling with replacement from the $n'_h$ first-stage individuals in stratum $h$, and they respond using the randomization devices of the simple model of [7].
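Among the $n_h$ respondents sampled from stratum $h$, let $n_{11h}$, $n_{10h}$, $n_{01h}$ and $n_{00h}$ be the numbers who answered (yes, yes), (yes, no), (no, yes) and (no, no), respectively. Then the sample proportions $\hat\theta_{11h}$, $\hat\theta_{10h}$, $\hat\theta_{01h}$ and $\hat\theta_{00h}$ are
$\hat\theta_{11h} = n_{11h}/n_h$, $\hat\theta_{10h} = n_{10h}/n_h$, $\hat\theta_{01h} = n_{01h}/n_h$ and $\hat\theta_{00h} = n_{00h}/n_h$.
The estimators of the population proportions of the sensitive attributes $\pi_A$, $\pi_B$ and $\pi_{AB}$ are given by
$\hat\pi_{A,ds} = \sum_{h=1}^{L} w_h \hat\pi_{Ah}$, $\hat\pi_{B,ds} = \sum_{h=1}^{L} w_h \hat\pi_{Bh}$ and $\hat\pi_{AB,ds} = \sum_{h=1}^{L} w_h \hat\pi_{ABh}$,
respectively.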
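A minimal sketch of the double sampling estimator $\sum_h w_h \hat\pi_h$, where the weights $w_h = n'_h/n'$ come from the first-stage classification rather than from known stratum sizes. The function name and the numbers are hypothetical.

```python
def double_sampling_estimate(first_stage_counts, est_h):
    """Combine stratum estimates with first-stage weights w_h = n'_h / n'."""
    n1 = sum(first_stage_counts)  # first-stage sample size n'
    w_h = [c / n1 for c in first_stage_counts]
    return sum(w * e for w, e in zip(w_h, est_h))


# Hypothetical first stage: n' = 1000 respondents classified into 3 strata.
print(double_sampling_estimate([600, 250, 150], [0.10, 0.20, 0.30]))
```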
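Theorem 3. If the estimators $\hat\pi_{Ah}$, $\hat\pi_{Bh}$ and $\hat\pi_{ABh}$ are unbiased for $\pi_{Ah}$, $\pi_{Bh}$ and $\pi_{ABh}$ in stratum $h$, then the stratified double sampling estimators $\hat\pi_{A,ds}$, $\hat\pi_{B,ds}$ and $\hat\pi_{AB,ds}$ are unbiased for $\pi_A$, $\pi_B$ and $\pi_{AB}$, respectively. Proof. Since $\hat\pi_{Ah}$, $\hat\pi_{Bh}$ and $\hat\pi_{ABh}$ are unbiased estimators of $\pi_{Ah}$, $\pi_{Bh}$ and $\pi_{ABh}$, respectively, conditioning on the first-stage sample gives
$E(\hat\pi_{A,ds}) = E\left\{\sum_{h=1}^{L} w_h E(\hat\pi_{Ah} \mid w_h)\right\} = \sum_{h=1}^{L} E(w_h)\,\pi_{Ah} = \sum_{h=1}^{L} W_h \pi_{Ah} = \pi_A$,
and likewise $E(\hat\pi_{B,ds}) = \pi_B$ and $E(\hat\pi_{AB,ds}) = \pi_{AB}$. □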
Theorem 4. If the respondents are selected by simple random sampling within each stratum, the variances of the stratified double sampling estimators $\hat\pi_{A,ds}$, $\hat\pi_{B,ds}$ and $\hat\pi_{AB,ds}$ are given, respectively, by
and
where
.
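Proof. By [8], we can rewrite the estimator $\hat\pi_{A,ds}$ as
$\hat\pi_{A,ds} = \sum_{h=1}^{L} w_h \pi_{Ah} + \sum_{h=1}^{L} w_h (\hat\pi_{Ah} - \pi_{Ah})$.
The variance of the first term on the right-hand side of this equation is
and the variance of the second term on the right-hand side is
Since $E(w_h) = W_h$, this becomes
Combining these results gives (27); the variances (28) and (29) of $\hat\pi_{B,ds}$ and $\hat\pi_{AB,ds}$ are obtained in the same way. □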
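Equations (27)-(29) are not reproduced here. Under simplifying assumptions of our own, the two terms of the decomposition in the proof lead to the familiar form $V(\hat\pi_{A,ds}) = \frac{1}{n'}\left[\sum_h W_h(\pi_{Ah} - \pi_A)^2 + \sum_h W_h B_h/\nu_h\right]$. The assumptions are: the first stage behaves like sampling with replacement (or N is large, so the finite population correction is ignored), $n_h = \nu_h n'_h$ with fixed rates $\nu_h$, and $V(\hat\pi_{Ah} \mid n_h) = B_h/n_h$. The symbols $\nu_h$, $B_h$ and the function name are hypothetical, and this sketch should be checked against (27) before use.

```python
def double_sampling_variance(n1, W_h, pi_h, B_h, nu_h):
    """Assumed form: V = [sum W_h (pi_h - pi)^2 + sum W_h B_h / nu_h] / n1.

    n1: first-stage size n'; B_h: per-respondent variance of the stratum-h
    estimator; nu_h: second-stage sampling rate n_h / n'_h (all assumptions).
    """
    pi = sum(w * p for w, p in zip(W_h, pi_h))
    between = sum(w * (p - pi) ** 2 for w, p in zip(W_h, pi_h))  # first term
    within = sum(w * b / nu for w, b, nu in zip(W_h, B_h, nu_h))  # second term
    return (between + within) / n1


print(double_sampling_variance(1000, [0.5, 0.5], [0.1, 0.3], [0.2, 0.2], [0.5, 0.5]))
```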
Next, we consider proportional and optimum allocation as methods of allocating the overall second-stage sample of size $n$ to the $L$ strata, and examine the variances in each case. If the first-stage quantities $n'$ and $n'_h$ are used in place of the population sizes $N$ and $N_h$, the sample size of stratum $h$ under proportional allocation is $n_h = n\,n'_h/n'$, so the variances of $\hat\pi_{A,ds}$, $\hat\pi_{B,ds}$ and $\hat\pi_{AB,ds}$ are, respectively, as follows.
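A short sketch of proportional allocation in the second stage. Since $n_h = n\,n'_h/n'$, every stratum is subsampled at the same rate $n_h/n'_h = n/n'$. The function name and numbers are hypothetical, and the sizes are left real-valued.

```python
def proportional_second_stage(n, first_stage_counts):
    """n_h = n * n'_h / n' (real-valued; round before fielding the survey)."""
    n1 = sum(first_stage_counts)
    return [n * c / n1 for c in first_stage_counts]


counts = [600, 250, 150]  # hypothetical n'_h, so n' = 1000
alloc = proportional_second_stage(200, counts)
rates = [a / c for a, c in zip(alloc, counts)]  # each equals n / n' = 0.2
print(alloc, rates)
```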
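We can find the optimum values of the first-stage sample size $n'$ and of the second-stage sample sizes $n_h$ that minimize $V(\hat\pi_{A,ds})$, $V(\hat\pi_{B,ds})$ and $V(\hat\pi_{AB,ds})$ for a specified cost. The cost function is
$C = c'n' + \sum_{h=1}^{L} c_h n_h$,
where $c'$ is the classification cost per unit and $c_h$ is the cost per unit in stratum $h$.
Since $n_h$ depends on the random first-stage counts $n'_h$, it is a random variable, so we minimize the expected cost $E(C)$ to obtain the optimum values of $n'$ and $n_h$.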
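A sketch of the expected cost, assuming (our assumption, as in Cochran-style double sampling for stratification) that $n_h = \nu_h n'_h$ with a fixed rate $\nu_h$. Then $E(n'_h) = n'W_h$ gives $E(C) = n'\,(c' + \sum_h c_h \nu_h W_h)$. Parameter names are hypothetical.

```python
def expected_cost(n1, c1, nu_h, W_h, c_h):
    """E(C) = c' n' + sum_h c_h E(n_h), with E(n_h) = nu_h * n' * W_h (assumed)."""
    return n1 * (c1 + sum(c * nu * w for c, nu, w in zip(c_h, nu_h, W_h)))


print(expected_cost(n1=1000, c1=0.5, nu_h=[0.2, 0.4], W_h=[0.6, 0.4], c_h=[10, 20]))
```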
The optimum second-stage allocations that minimize the products of the expected cost (34) and the variances (27)-(29) are obtained, respectively, using the Cauchy-Schwarz inequality.
The optimum first-stage sample sizes for $\hat\pi_{A,ds}$, $\hat\pi_{B,ds}$ and $\hat\pi_{AB,ds}$ can then be obtained as follows by substituting these optimum allocations into the expected cost (34).
The minimum variances of $\hat\pi_{A,ds}$, $\hat\pi_{B,ds}$ and $\hat\pi_{AB,ds}$ are obtained by substituting these optimum first-stage sample sizes and second-stage allocations into (27), (28) and (29), respectively.
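A sketch of the Cauchy-Schwarz step, under the same hypothetical assumptions as the earlier variance and cost sketches. Write $V = (A + \sum_h W_h B_h/\nu_h)/n'$ with $A = \sum_h W_h(\pi_h - \pi)^2$, and $E(C) = n'(c' + \sum_h c_h\nu_h W_h)$. Then $V \cdot E(C) \ge (\sqrt{Ac'} + \sum_h W_h\sqrt{B_h c_h})^2$, with equality at $\nu_h = \sqrt{c'B_h/(c_h A)}$. The first-stage size follows from the budget, $n' = C/(c' + \sum_h c_h \nu_h W_h)$. This illustrates the technique and is not the paper's own formula.

```python
import math


def optimum_double_sampling(budget, c1, W_h, pi_h, B_h, c_h):
    """Return (nu_h, n1, v_min) under the assumed variance and cost forms."""
    pi = sum(w * p for w, p in zip(W_h, pi_h))
    A = sum(w * (p - pi) ** 2 for w, p in zip(W_h, pi_h))  # between-strata term
    if A <= 0:
        raise ValueError("strata must differ (A > 0) for an interior optimum")
    # Equality condition of Cauchy-Schwarz for V * E(C).
    nu_h = [math.sqrt(c1 * b / (c * A)) for b, c in zip(B_h, c_h)]
    cost_per_unit = c1 + sum(c * nu * w for c, nu, w in zip(c_h, nu_h, W_h))
    n1 = budget / cost_per_unit  # spend the whole expected budget
    v_min = (A + sum(w * b / nu for w, b, nu in zip(W_h, B_h, nu_h))) / n1
    return nu_h, n1, v_min


nu_h, n1, v_min = optimum_double_sampling(budget=2375, c1=0.25, W_h=[0.5, 0.5],
                                          pi_h=[0.0, 0.4], B_h=[0.16, 0.09],
                                          c_h=[4, 9])
print(nu_h, n1, v_min)
```

If some $\nu_h$ exceeds 1, a practical choice is to set it to 1 and re-optimize the remaining strata; that adjustment is not part of the Cauchy-Schwarz solution above.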