4.1. Blackwell Sufficiency
Consider two experiments, X and Y, that depend on θ. One usually wants to choose between X and Y for inferences about θ based solely on the conditional distributions of X given θ and Y given θ. In this section we review the concept of Blackwell Sufficiency (Blackwell [8]) and show that it is a generalization of the Sufficiency Principle for the comparison of experiments.
A statistic T is sufficient for an experiment X if X and θ are conditionally independent given T. Consequently, T is sufficient iff P(X = x | T = t, θ) = P(X = x | T = t), that is, iff the conditional distribution of X given T does not depend on θ. The conditional distribution of X given θ can then be generated by observing T and sampling from P(X = x | T = t).
Let X and Y, taking values in 𝒳 and 𝒴, be two statistical experiments depending on θ. X is Blackwell Sufficient for Y if there exists a map q: 𝒴 × 𝒳 → [0, 1], a transition function, satisfying the following properties:
For any y ∈ 𝒴, q(y, ·) is measurable on the σ-algebra induced by X, σ(X).
For any x ∈ 𝒳, q(·, x) is a probability (density) function defined on 𝒴.
For any y ∈ 𝒴, E[q(y, X) | θ] = P(Y = y | θ), the conditional expectation of q(y, X) given θ.
Let 𝒳 and 𝒴 be countable sets and define, for all x ∈ 𝒳, Y_x as a trivial experiment (one whose distribution does not depend on θ) such that P(Y_x = y) = q(y, x). From the definition of Blackwell Sufficiency, the quantities Y and Y_X are equally distributed given θ: X is Blackwell Sufficient for Y if and only if one can obtain an experiment with the same distribution as Y by observing X = x and, after that, performing the "randomization" Y_x.
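For concreteness, the following minimal numerical sketch checks the defining property of a transition function on finite spaces; the spaces and matrices below are illustrative assumptions, not taken from the text.

```python
# Minimal sketch of the definition on finite spaces; the matrices below are
# illustrative assumptions, not taken from the text.
import numpy as np

# P_X[t, i] = P(X = x_i | theta_t): conditional distribution of X given theta.
P_X = np.array([[0.7, 0.3],
                [0.2, 0.8]])

# Transition function as a matrix: q[j, i] = q(y_j, x_i).
# Each column q(., x_i) is a probability function on the values of Y.
q = np.array([[0.9, 0.1],
              [0.1, 0.9]])
assert np.allclose(q.sum(axis=0), 1.0)

# Third property of the definition: E[q(y, X) | theta] = P(Y = y | theta).
# Observing X and then randomizing through q yields the law of Y given theta.
P_Y = P_X @ q.T
print(P_Y)   # P_Y[t, j] = P(Y = y_j | theta_t)
```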
Next, we provide two examples of Blackwell Sufficiency that address the question at the end of Section 3.2. Example 1 is a version of that in Basu and Pereira [9]. Example 2 is new and shows that sampling without replacement is Blackwell Sufficient for sampling with replacement. Other examples of Blackwell Sufficiency can be found, for example, in Goel and Ginebra [10] and Torgersen [2].
Example 1 Let X and Y be two experiments, π a quantity of interest in [0, 1], and q and p known constants in [0, 1]. Representing the Bernoulli distribution with parameter p by Ber(p), consider also that the conditional distributions of X and Y given π are, respectively, X | π ∼ Ber(π) and Y | π ∼ Ber(pπ + (1 − p)q). X is Blackwell Sufficient for Y regarding π.
Proof. Let U ∼ Ber(p) and V ∼ Ber(q), both independent of all other variables; then, defining Z = UX + (1 − U)V, Z and Y are equally distributed given π, since P(Z = 1 | π) = pπ + (1 − p)q = P(Y = 1 | π). Therefore, X is Blackwell Sufficient for Y.
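A quick Monte Carlo check of this proof, assuming the reconstruction above; the values of p, q and π are arbitrary. The randomized variable Z is built from X and independent noise alone, yet reproduces the law of Y given π.

```python
# Monte Carlo check of Example 1; p, q and pi below are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
p, q_, pi = 0.6, 0.3, 0.45
n = 200_000

X = rng.binomial(1, pi, n)      # the Blackwell Sufficient experiment
U = rng.binomial(1, p, n)       # randomization devices, independent of X
V = rng.binomial(1, q_, n)
Z = U * X + (1 - U) * V         # randomized version of X

# Z reproduces Y | pi ~ Ber(p*pi + (1 - p)*q) without ever observing Y.
print(Z.mean(), p * pi + (1 - p) * q_)   # both approximately 0.39
```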
Example 2 Next, we generalize the example of Section 3.2. Consider an urn with N balls. θ of these balls are black and N − θ are white. n ≤ N balls are drawn from the urn. By stating that X = (X_1, …, X_n) is a sample with replacement from the urn, we mean:
Conditionally on θ, X_1 ∼ Ber(θ/N);
Conditionally on θ, X_1, …, X_n are identically distributed;
X_{i+1} is conditionally independent of (X_1, …, X_i) given θ, i ∈ {1, …, n − 1}.
Analogously, Y = (Y_1, …, Y_n) corresponds to a sample without replacement, that is:
Conditionally on θ, Y_1 ∼ Ber(θ/N);
P(Y_{i+1} = 1 | Y_1 = y_1, …, Y_i = y_i, θ) = (θ − Σ_{j ≤ i} y_j)/(N − i),
i ∈ {1, …, n − 1}, (y_1, …, y_i) ∈ {0, 1}^i.
Y is Blackwell Sufficient for X regarding θ.
Proof. Define W_0 = 0, W_i = Σ_{j ≤ i} Y_j, i ∈ {1, …, n}, and two quantities U_i and V_i, i ∈ {1, …, n}. These two quantities are such that:
U_i ∼ Ber((i − 1)/N), and U_i is independent of all other variables;
V_1 = 0 and, for i > 1, V_i, conditionally on W_{i−1} = w, is distributed as Ber(w/(i − 1));
V_i, conditionally on W_{i−1}, is jointly independent of (U_1, …, U_n), (Y_1, …, Y_n) and θ.
Define Z_i = U_iV_i + (1 − U_i)Y_i, i ∈ {1, …, n}. Conditionally on θ,
P(Z_i = 1 | θ, Y_1, …, Y_{i−1}) = ((i − 1)/N)·(W_{i−1}/(i − 1)) + ((N − i + 1)/N)·((θ − W_{i−1})/(N − i + 1)) = θ/N.
Since this probability does not depend on (Y_1, …, Y_{i−1}), conditionally on θ, Z_i ∼ Ber(θ/N) and Z_i is independent of (Z_1, …, Z_{i−1}). By the previous conclusions, Z = (Z_1, …, Z_n) is identically distributed to X. Also, since Z is a function of (Y, U, V) alone, by construction Z given Y = y is trivial: P(Z = z | Y = y, θ) = P(Z = z | Y = y). Hence, sampling without replacement is Blackwell Sufficient for sampling with replacement.
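The construction in this proof can be simulated. The sketch below assumes the reconstructed definitions of W_i, U_i, V_i and Z_i above; the values of N, θ and n are illustrative. It generates Z from a sample without replacement and compares its joint law with that of a sample with replacement.

```python
# Simulation of the proof's construction; N, theta and n are illustrative.
import numpy as np

rng = np.random.default_rng(1)
N, theta, n, reps = 6, 4, 3, 50_000

counts = {}
for _ in range(reps):
    urn = np.array([1] * theta + [0] * (N - theta))
    Y = rng.permutation(urn)[:n]                 # sample without replacement
    W = np.concatenate(([0], np.cumsum(Y)))      # W_i = Y_1 + ... + Y_i
    Z = np.empty(n, dtype=int)
    for i in range(1, n + 1):
        U = rng.binomial(1, (i - 1) / N)         # U_i ~ Ber((i - 1)/N)
        # V_i ~ Ber(W_{i-1}/(i - 1)): the color of a uniformly chosen past draw.
        V = rng.binomial(1, W[i - 1] / (i - 1)) if i > 1 else 0
        Z[i - 1] = U * V + (1 - U) * Y[i - 1]    # Z_i = U_i V_i + (1 - U_i) Y_i
    key = tuple(Z)
    counts[key] = counts.get(key, 0) + 1

# With replacement, the Z_i should be i.i.d. Ber(theta/N) given theta:
for z, c in sorted(counts.items()):
    s = sum(z)
    exact = (theta / N) ** s * (1 - theta / N) ** (n - s)
    print(z, round(c / reps, 4), round(exact, 4))
```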
Hence, in Section 3.2, Experiment 3 is Blackwell Sufficient for Experiment 2. Similarly, Basu and Pereira [11] show that Experiment 3 is Blackwell Sufficient for Experiment 1. One expects that the information gained about θ by performing Experiment 3 is at least as much as one would obtain by performing Experiments 1 or 2. Are Experiments 1 or 2 also Blackwell Sufficient for Experiment 3? In that case, the experiments would be equally informative. In the next subsection we present a theorem that characterizes when two experiments are equally informative in Blackwell's sense and, thus, also settles the comparison of the experiments in Section 3.2.
4.2. Equivalence Relation in Experiment Information
In this section, the experiments can assume values in a countable set. For an experiment X taking values in 𝒳, we assume that X is measurable on the power set of 𝒳 and that P(X = x) > 0, for all x ∈ 𝒳. No assumption is required of Θ.
Using Blackwell Sufficiency, it is possible to define an equivalence relation between experiments: X and Y are Blackwell Equivalent if each one is Blackwell Sufficient for the other. This equivalence relates to the Likelihood Principle in Section 3.1 through:
Theorem 2 Let X and Y be two experiments taking values in the countable sets 𝒳 and 𝒴. X and Y are Blackwell Equivalent iff, for every likelihood function f and every θ ∈ Θ, the probability that X yields a likelihood proportional to f given θ equals the probability that Y yields a likelihood proportional to f given θ.
The following notation reduces the algebra involved. Since all sets are countable, consider them to be ordered. Let f: 𝒳 → [0, 1] be a probability function; then we define f̄ as the vector such that the value assumed in its i-th position is f(x_i); x_i is the i-th element of the ordering assumed in the set of values of X. Consider F to be an arbitrary map from 𝒴 × 𝒳 into [0, 1]. We also use the symbol F for the countably infinite matrix that has in its j-th row and i-th column position the value F(y_j, x_i); x_i is the i-th element of the ordering in 𝒳 and y_j is the j-th element of the ordering in 𝒴. Finally, a (transposed) transition matrix is such that all of its elements are greater than or equal to 0 and, for any column, the sum of its elements is equal to 1.
Proof. (⇐) Let S = N_X(X) and T = N_Y(Y), where N_X(x) and N_Y(y) are likelihood nuclei of x and y (a likelihood nucleus is a chosen likelihood among all of those that are proportional). Recall from Basu [4] that S and T are, respectively, minimal sufficient statistics for X and Y. Therefore, X is Blackwell Equivalent to S and Y is Blackwell Equivalent to T: as seen in Section 4.1, an experiment can be recovered from a sufficient statistic by a randomization that does not depend on θ. By the hypothesis, S and T are identically distributed given θ; therefore they are Blackwell Equivalent (take the identity as transition function). By transitivity of Blackwell Equivalence, X is Blackwell Equivalent to Y, since X is equivalent to S, S to T, and T to Y.
(⇒) Consider the above statistics S and T. For simplicity, we call 𝒮 = N_X(𝒳) (for an arbitrary function f and set A, we define f(A) as the image of A through f) and 𝒯 = N_Y(𝒴). We also call s̄_θ the vector whose i-th entry is P(S = s_i | θ) and t̄_θ the vector whose j-th entry is P(T = t_j | θ). Clearly, by construction, for every two points in 𝒮 or in 𝒯, if their likelihood functions are proportional, then they are the same point. Since S and T are minimal sufficient statistics, X is Blackwell Equivalent to S and Y is Blackwell Equivalent to T and, therefore, S is Blackwell Equivalent to T.
Since S is Blackwell Sufficient for T, there exists a map A: 𝒯 × 𝒮 → [0, 1] such that A is a transition matrix and:
t̄_θ = A s̄_θ, for all θ ∈ Θ.
On the other hand, T is also Blackwell Sufficient for S and, similarly, there exists a map B: 𝒮 × 𝒯 → [0, 1] such that B is a transition matrix and:
s̄_θ = B t̄_θ, for all θ ∈ Θ.
From these two equations, there exist two other transition matrices, M = BA and N = AB, such that:
s̄_θ = M s̄_θ and t̄_θ = N t̄_θ, for all θ ∈ Θ.
Since M and N are transition matrices, respectively, from 𝒮 to 𝒮 and from 𝒯 to 𝒯, we consider the Markov Chains associated to them. All probability functions in the family {s̄_θ : θ ∈ Θ} are invariant measures for M. Note that there are no transient states in M. If there were, let s be a transient state in M; consequently, every invariant measure of M would assign probability 0 to s, that is, P(S = s | θ) = 0, for all θ ∈ Θ. This is a contradiction with the assumption that P(X = x) > 0 for all x ∈ 𝒳; conclude that there is no transient state in M.
Next, we use the following result, found in Ferrari and Galves [12]:
Lemma 1 Consider a Markov Chain on a countable space with a transition matrix M and no transient states. Let M have irreducible components C_1, …, C_i, …. Then, there exists a unique set of probability functions {μ_i}_{i ∈ ℕ}, with μ_i defined in C_i, such that all invariant measures (μ) of M can be written as the following:
μ(s) = q(i)·μ_i(s), whenever s ∈ C_i,
where q is a probability function in ℕ.
Recall that if a Markov Chain is irreducible, it admits a unique ergodic measure. This lemma states that any invariant measure of an arbitrary countable Markov Chain without transient states is a mixture of the unique ergodic measures on each of the irreducible components.
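The following sketch illustrates Lemma 1 on a small finite chain; the block-diagonal matrix is an illustrative assumption, written in the column-stochastic convention adopted above. The chain has two irreducible components and every mixture of their ergodic measures is invariant.

```python
# Illustration of Lemma 1; the block-diagonal matrix below is an assumption.
# Columns sum to 1, matching the (transposed) transition matrices of the text.
import numpy as np

# Two irreducible components: states {0, 1} and {2, 3}; no transient states.
M = np.array([[0.3, 0.6, 0.0, 0.0],
              [0.7, 0.4, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.2],
              [0.0, 0.0, 0.5, 0.8]])

def ergodic(block):
    """Unique invariant probability of an irreducible column-stochastic block."""
    w, v = np.linalg.eig(block)
    vec = np.real(v[:, np.argmax(np.real(w))])   # eigenvector for eigenvalue 1
    return vec / vec.sum()

mu_1, mu_2 = ergodic(M[:2, :2]), ergodic(M[2:, 2:])

# Any invariant measure of M is a mixture q(1) mu_1 + q(2) mu_2 (Lemma 1):
q = np.array([0.25, 0.75])
mu = np.concatenate([q[0] * mu_1, q[1] * mu_2])
print(np.allclose(M @ mu, mu))   # True: mu is invariant for M
```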
Using the lemma, since C_1, C_2, … are the irreducible components of M and s_{j,i} is the element of number i of C_j, for every θ ∈ Θ there exists a probability function q_θ such that P(S = s_{j,i} | θ) = q_θ(j)·μ_j(s_{j,i}). Consequently, for any two states s_{j,i} and s_{j,k} of the same component C_j,
P(S = s_{j,i} | θ)/P(S = s_{j,k} | θ) = μ_j(s_{j,i})/μ_j(s_{j,k}), for all θ ∈ Θ.
If two states are in the same irreducible component, then their likelihood functions are proportional. The same proof holds for matrix N.
The i-th element of 𝒮 is said to connect to the j-th element of 𝒯 if A_{j,i} > 0. Similarly, the i-th element of 𝒯 is said to connect to the j-th element of 𝒮 if B_{j,i} > 0. Note that every state in 𝒮 connects to at least one state in 𝒯 and vice-versa. This is true because A and B are transition matrices, so every column of A and of B has at least one positive entry.
For all x ∈ 𝒮, if x connects to y ∈ 𝒯, then y only connects to x. If there were a state x* ∈ 𝒮 such that y connected to x*, then x and x* would be on the same irreducible component of M (since M_{x*,x} ≥ B_{x*,y}A_{y,x} > 0 and M has no transient states). Therefore x and x* would yield proportional likelihood functions and, by the definition of S, x = x*. Similarly, if a state y ∈ 𝒯 connects to a state x ∈ 𝒮, then x connects solely to y.
Finally, we conclude that every state in 𝒮 only connects to one state in 𝒯 and vice-versa. Also, if x ∈ 𝒮 connects to y ∈ 𝒯, then y connects to x and vice-versa. This implies that if x connects to y, then A_{y,x} = 1 and, therefore, P(S = x | θ) = P(T = y | θ), for all θ ∈ Θ. Since S and T are sufficient statistics and are identically distributed given θ, the Theorem is proved.
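The criterion of Theorem 2 can be checked mechanically on finite spaces by computing, for each outcome, a normalized likelihood (a nucleus) and comparing the induced conditional laws. The experiments in the sketch below are illustrative assumptions, not those of Section 3.2; in this pair, Y merges two outcomes of X that yield proportional likelihoods, so the two experiments come out Blackwell Equivalent.

```python
# Checking the criterion of Theorem 2 on finite spaces; the experiments are
# illustrative assumptions, not those of Section 3.2.
import numpy as np

# Rows: theta values; columns: outcomes. P[t, i] = P(outcome i | theta_t).
P_X = np.array([[0.5, 0.5, 0.0],
                [0.2, 0.2, 0.6]])
P_Y = np.array([[1.0, 0.0],
                [0.4, 0.6]])

def nucleus(P, i):
    """Likelihood nucleus of outcome i: its likelihood normalized to sum 1,
    so that proportional likelihoods map to the same point."""
    col = P[:, i]
    return tuple(np.round(col / col.sum(), 12))

def nucleus_law(P):
    """Conditional distribution of the likelihood nucleus given each theta."""
    law = {}
    for i in range(P.shape[1]):
        key = nucleus(P, i)
        law[key] = law.get(key, 0) + P[:, i]
    return {k: tuple(np.round(v, 12)) for k, v in law.items()}

# By Theorem 2, X and Y are Blackwell Equivalent iff these laws coincide.
print(nucleus_law(P_X) == nucleus_law(P_Y))   # True for this pair
```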
Applying the above Theorem and the Likelihood Principle, one obtains the following result: if X is Blackwell Equivalent to Y, then P(I(X) = e | θ) = P(I(Y) = e | θ), for all possible values e of the information. More precisely, X is Blackwell Equivalent to Y if and only if, for every information function I satisfying the Likelihood Principle (that is, such that if x and y yield proportional likelihood functions, then I(x) = I(y)), the distributions of I under X and Y given θ are the same.
Also, since the likelihood nuclei are not equally distributed across the experiments in Section 3.2, conclude that no pair of them is Blackwell Equivalent. Hence, from the conclusions in Section 4.1, Experiment 3 is strictly more informative than Experiments 1 and 2.
4.3. Experiment Information Function
In the last section, we defined properties an information function should satisfy and reviewed Blackwell Sufficiency as a general rule for comparing experiments. Nevertheless, not every two experiments are comparable through this criterion. Next, we explicitly consider functions capable of describing the information of an experiment. A possible approach to this problem is considering that the information gained is a utility function (DeGroot [13]) that the scientist wants to maximize. This way, it follows from DeGroot [13] that the expected gain of information is non-negative for every experiment if and only if the utility function is concave. Since we consider the data information function as non-negative, the utility function is concave; see DeGroot [14] for instance.
Proceeding with this approach, we compare the different information functions presented in Section 3.2. In this example, the maximum information is obtained when the posterior distribution is degenerate, that is, when the posterior assigns probability 1 to a single value of θ. Therefore, to compare those information functions, we divide all of them by these maxima.
First, we consider the Euclidean distance as the information function. In the first experiment, the gain of information is a fixed fraction of the maximum, attained with probability 1: a small gain with a small risk. In the second experiment, two different gains are possible: a moderate gain with a moderate risk. In the third experiment, one can attain the maximum possible information: a maximum gain with a great risk. In conclusion, if one uses the Euclidean "utility", then one would have no preference among the three experiments, since the expected information gain, as a fraction of the maximum, is the same for all of them. This is surprising, as the third experiment is Blackwell Sufficient for both of the others.
Next, consider the squared Euclidean distance between the posterior and the prior as the information function. Under this metric, the expected information gains of the three experiments are strictly increasing: the third experiment is more informative than the second, which in turn is more informative than the first.
Similarly, considering the Kullback–Leibler divergence, the expected gains of information again increase from the first to the third experiment. Once more, the ordering induced by the expected information gain agrees with the ordering induced by Blackwell Sufficiency. Moreover, the difference of information between Experiments 3 and 2, relative to that between Experiments 2 and 1, is much higher when using the Kullback–Leibler divergence than when using the squared Euclidean distance.
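These comparisons can be reproduced mechanically. The sketch below uses an illustrative setup (a uniform prior over θ for an urn with N = 2 balls and two simple designs, not the exact experiments of Section 3.2) to compute the expected information gain of an experiment under the Euclidean, squared Euclidean and Kullback–Leibler information functions.

```python
# Expected information gains under different information functions; the prior
# and the two experiments below are illustrative, not those of Section 3.2.
import numpy as np

N = 2
prior = np.full(N + 1, 1.0 / (N + 1))     # uniform prior over theta in {0,...,N}

def expected_gain(lik, info):
    """lik[t, i] = P(outcome i | theta_t); info(posterior, prior) -> gain."""
    joint = prior[:, None] * lik
    marg = joint.sum(axis=0)              # marginal law of the outcomes
    post = joint / marg                   # post[:, i] = P(theta | outcome i)
    return sum(m * info(post[:, i], prior) for i, m in enumerate(marg))

euclid = lambda p, q: np.linalg.norm(p - q)
sq_euclid = lambda p, q: np.sum((p - q) ** 2)
kl = lambda p, q: np.sum(p[p > 0] * np.log(p[p > 0] / q[p > 0]))

# Experiment A: one draw (outcomes: white, black); Experiment B: both balls
# drawn without replacement, summarized by the number of black balls observed.
lik_A = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
lik_B = np.eye(3)

for name, info in [("Euclidean", euclid), ("squared Euclidean", sq_euclid),
                   ("Kullback-Leibler", kl)]:
    print(name, expected_gain(lik_A, info), expected_gain(lik_B, info))
```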