6.1. When Losses Cannot Be Minimised
We shall now develop an example in which the minimax theorem fails:
as we shall see in Proposition 27. However, the entropy identity,
, does hold (Proposition 25 and Proposition 26). The connection with optimal loss fails to obtain since minloss
(Proposition 30). Thus, there is no belief function with an optimal loss profile in this sort of example. Nevertheless, certain equivocal functions
derived from the maximal entropy function come arbitrarily close to having the best loss profile (Proposition 29 and Proposition 31). So, while there is no unique function with the best loss profile, the functions
have a very good loss profile.
In the following discussion we shall focus on the most simple possible language,
, which contains only one relation symbol, U, which is unary. We focus on this simple language since the minimax results already fail here and considering more expressive languages does not lead to new insights while creating more notational issues. As a technical convenience, we extend the notion of a loss profile to arbitrary functions
, not merely normalised belief functions.
The example that we shall consider is generated by the following evidence:
Let
be the
k-th
n-state of
,
i.e.,
. The set of calibrated probability functions can be characterized in various ways:
The last two characterisations employ quantifiers; adding quantifiers to the language enables a finite representation of what is essentially an infinitely generated evidence set. Hence in Definition 5, we specified that an evidence set is finitely generated just if it generated by quantifier-free sentences of some finite sublanguage.
We now begin our analysis of this example:
Proposition 25. If g = gΩ or if g is symmetric and inclusive, then and is not open-minded.
Then, by Landes and Williamson [
4] (Corollary 6, p. 3574) for symmetric and inclusive
gand so for all
and all 1 ≤
i ≤ 2
n−1For all
and
i ∈ {1, 2
n−1+1,…,2
n}
The result for g = gΩ follows in the same way as above. □
We shall note for later reference that for all
n ≥ 2
Proposition 26. If g =
gΩ or if g is regular, then Proof. First note that
.
We shall show that for all
there exists an
such that for all n ≥ N we have
and
.
Since
there exists a minimal
and a k-state ν ∈ Ωk such that Q(ν) > PΩ†(ν) ≥ 0.
Case 1.
To simplify notation let
α :=
Pk†(
ν) =
Pk†()
Let us now define a function
. Note that since we want
to be a member of
we need to let
. Now let for all
The restriction operator
applied to some belief function B continuous to refer to the restriction of B to
, rather than to the restriction to
.
Note that for all
n ≥ 1
since entropy maximisers assign
n-states the same degree of belief whenever possible [
4] (Corollary 7, p. 3577). Thus,
.
It follows that for all large enough n ∈
that
.
For regular
g we now find
So, as long as
goes to zero quickly enough it follows that
for large enough n. Corollary 6 shows that this is indeed the case for regular g.
Case 2 Since Q is assumed to be calibrated,
, this case cannot occur. Case 3.
Case 3A.
Then
But for all
n ∈ ℕ
Since Q ≠ PΩ† it follows that there exists some N ∈ ℕ such that
. But then
.
Case B.
Then
. Proceed as in Case 1. □
Proposition 27. If g = gΩ or if g is regular, then.
Proof. We here show that there exists an
such that for all n ∈ ℕ it holds that
and that there exists an open-minded
such that for all n ∈ ℕ we have
.
Note that the probability function
with
. Then
.
We shall now construct an open-minded
as advertised. For all
n ∈ ℕ let
Thus, Q is open-minded and hence
for all n∈ ℕ. □
Note that Condition 1 of Definition 21 is solely responsible for the fact that
. Condition 2 has played no role here.
So far, we have established that
does not have the best loss profile. The question arises whether there exists a belief function
which is a minimal element of ≺, i.e.,
.
Proposition 28. If g =
gΩ,
then Initially, one might suspect that
would be somehow due to the fact that the
do not take beliefs in all sentences into account. This is not the case. As we will see,
holds. That is, even when restricting attention to probability functions, whose values on the n-states completely determine degrees of beliefs in all other sentences, we cannot find a function with an optimal loss profile.
Proof. Suppose for contradiction that
.
If Q is not open-minded, then there exists an N ∈ ℕ, an F ⊆ ΩN and an
such that °P (F ) > 0 and °Q(F ) = 0. But then there has to exists some ω ∈ ΩN with ω ∈ F such that P (ω) > 0 = Q(ω) since Q and P are probability functions. Thus, for all n ≥ N there exists some ν ∈ Ωn such that ν = ω with P (ν) > 0 = Q(ν). But then
for all n ≥ N.
In the proof of Proposition 27 we constructed an open-minded function Q+ ∈ E. For Q+ we have for all n that
. So, any
has to be open-minded.
Case 1and Q ∉ E
Since
there has to exist a minimal k ≥ 2 such that
.
We next define a probability function
with the following construction for all
n ≥ 2
It follows that for all
n ≥ k and all
and all
such that
P (
ω) > 0 it holds that.
For all large enough
n ∈ N we then find
Hence, there has to exists a
.
Case 2 Q.
Thus,
Let
N ≥ 3 be such that
For
n ≥ N let
We now find for all fixed
n ≥ N that
We shall now define a function
by letting for all
n ≥ 2:
That is,
.
For large enough
M ∈ ℕ it holds for all
n ≥ M that
Furthermore, for all
n ≥ max{
M, N} it holds that
and hence for all large enough fixed
n ∈ ℕ and all
Thus, R has a better loss profile than Q. Hence,
.
Finally, let us consider loss profiles for
.
Case 3..
For all
, the expression
only depends on the degrees of belief B assigns to sentences which represent an n-state. So, the degree of belief in a sentence φ ∈ S which does not n-represent an n-state are ignored by
for all n and all
. If B agrees with some probability function
on all sentences of S∄ which n-represent an n-state, then B and P are equally preferable according to ≺. As we saw above, for all
there exists some
with Q ≺ P. Thus, B cannot be a minimal element of ≺.
We can hence assume that for all
there exists some sentence
which n-represents an n-state such that B(φ) 6= P(φ). Since no
is dominated, it follows that B(ϕ) < P(φ).
First define a function
B0 as follows:
B0, which does not agree with any probability function on
has been constructed in such a way that B and B0 are equally preferred according to ≺.
Next define a function
B+ by first letting for all fixed
N ∈ ℕ
for all sentences
which are logically equivalent to an
N-state. Put
B+(
ψ) := 0 for all other
.
Since
B+ dominates
B0 the loss profile of
B+ cannot be worse than that of
B0. Furthermore, note that for all
N ∈ ℕ, all
ω ∈ Ω
N and all
n >
N it holds that
Let
. For α = 0 it follows by the usual reasoning that B+ cannot have an ideal loss profile. This leads to a contradiction in the usual way.
For 1
≥ α > 0 define a function
B∞ by first letting for all sentences
which are logically equivalent to some
n-state
ωFor all other sentences
let B∞(φ) := 0.
Observe that for all
k ∈ ℕ and all
ω ∈ Ω
kFinally, we note that B∞ agrees with some
on all sentences in
which represent a state. Then B cannot have a better loss profile than P. As we saw in Case1 and Case2, for all
there exists a
which has a strictly better loss profile than P. This contradicts B ∈ minloss
. □
Denote by
the unique probability function in
satisfying for all
n ∈ ℕ
That is,
agrees with
on
and equivocates beyond
as much as possible while satisfying
Proposition 29. For all ϵ > 0
there exists an N ∈
such that for all n ≥ NProof. For all large enough
N ∈
and even larger
n ∈
we find
For
ϵ > 0 let
N > 2 be such that
. Then for all
n ≥ N it holds that
. For
n ≥ N large enough we now obtain
□
Having considered loss for
we now investigate loss for regular
.
Proposition 30. If is regular, then minloss
.
Proof. We will show that ≺ has no minimal element. Suppose for contradiction that B ∈
is such a minimal element.
Define a function
by
B′ and B are equally preferable according to ≺ since P (φ) = 0 for all P ∈
and all such φ.
For all
φ ∈
let
nφ be the minimal
n such that
φ ∈
. Now define a function
Binf by first letting
Put Binf(φ) := B′(φ) for all other φ ∈
. For all φ ∈
it holds that Binf(φ) ≤ B(φ). Furthermore, Binf is equally preferable to B′ according to ≺. We now consider cases to show that there is a function with a strictly better loss profile than Binf, which contradicts our assumption that B ∈ minloss
.
Case A There exists some N ∈
such that for all n ≥ N, Binf and
agree on all n-states. Since
it holds that
and hence
. Thus, for all n ≥ Nand
agree on all n-states. But then for all n ≥ N all F ⊆ Ωn and all ρ ∈ ϱn. Hence, for all P ∈
it holds that
.
From the above we have that for all n ≥ N there exists an F ⊆ Ωn such that
and such that
for some ρ. Thus, there exists some P ∈
with °P(F) > 0. Then
for this P ∈
and all n ≥ N.
Thus,
by Condition 2 of Definition 21.
Case B There exist infinitely many n ∈
where Binf and
agree on all n-states and infinitely many n ∈
many where they do not agree on all n-states.
Since
is a probability function it follows that for all n ∈
, all F ⊆ Ωn and all ρ ∈ ϱn has to hold. Now proceed as in Case A.
Case C The number of n ∈
for which Binf and
agree on all n-states is finite (possibly zero).
Case C1 There exists an infinite set J ⊆
, J = {j1, j2, … }, such that limi−→∞.
If
dominates Binf, we are done.
If
does not dominate
Binf, then define a function
B1 ∈
by letting for all
n ∈
and all
F ⊆ Ω
nand requiring that
B1 satisfies logical equivalence on
L∄. For all
φ ∈
use Gaifman’s condition to ensure that
B1 is a probability function.
Since we assumed that
does not dominate Binf holds. Furthermore, B1 dominates Binf.So, the loss profile of B1 ∈
is at least equally good as that of B.
We complete this proof by showing that
∩ minloss
.
Now suppose for contradiction that there exists a function Q ∈
∩ minloss
such that
for some n ≥ 2, i.e., Q ∉
. It needs to hold that
for all n ∈
(open-mindedness).
Let
k ≥ 2 be minimal such that
. Now define a function
R ∈
by letting for all
n > kThat is, R is the arithmetic mean of Q and
on
. Beyond
, R equivocates under the k-states which imply Ut1. For such n-states
holds. Beyond
, there are only two n-states which imply ¬Ut1 which are assigned non-zero probability,
and
.
We now show that R has a strictly better loss profile than Q what contradicts Q ∈ minloss
.
Let
∈ arg
. Trivially,
. Next note that for all
n ≥ k which are large enough it holds that
and that
We now find for all large enough
n > k that
Whenever °P (F) > 0 with F ⊆ Ωn, then °R(F) is bounded from below by
. Hence, the last term in the above sum converges to zero, since g is regular.
We now obtain the contradiction as follows: there exists some ϵ > 0 such that for all large enough
n ≥ k it holds that
We have thus shown that if
∩ minloss
, then there exists some Q ∈
∩ minloss
.
Case C1A. Then Q has infinite worst-case expected loss for all n ∈
and we are done.
Case C1B.
By open-mindedness,
has to hold.
For all n ∈
let
∈ arg
From Q ∈
we now obtain that for all large enough n there exists a probability function R ∈ arg
such that
.
Next, define a probability function Q′ ∈
where
and Q′ equivocates over Ut1,
for all n ∈
and for all 2n−1 + 1 ≤ i ≤ 2n. Assume for contradiction that Q ≠ Q′.
We next show that
Q′
≺ Q. This contradicts
Q ∈ minloss
. To this end let us note that for all large enough
nSince whenever °P (F) > 0, then °Q′(F) is bounded from below by
.
Thus, for all large enough
n we have
g is regular, hence, this last term converges to zero. We thus obtain
Since Q ≠ Q′, Q, Q′ ∈
and
, there has to exist some minimal k ∈ ℕ a minimal
i ≥ 2
k−1 + 1 such that
. We now find for all large enough
n that
Recall that there exists 0 <
a ≤ b such that for all
n ∈ ℕ
a ≤ g(
πn) ≤
b holds. Hence, there exists some constant
c > 0 such that
. From
(17) we conclude that for all large enough
nholds. Thus,
Q′ ≺
Q. So,
Q ∉ minloss
.
To complete the proof of Case C1B we show that there exists some N ∈ ℕ such that
has a strictly better loss profile than Q′.
Let
N ∈ ℕ be such that
. Analogous to the above it holds that
It hence suffices to show that there exists some ε > 0 such that for large enough
N ∈ ℕ and all
n ≥ NWe now recall that
. The required inequality follows for large enough
n ∈ ℕ
Hence,
.
Case C2 There exist an α > 0 and an minimal N1 such that for all
holds.
We may assume that
Binf is open-minded on
∄. Thus there has to exist some minimal
N ≥ N1 such that
for all
n ≥ N. For all large enough
n ≥ N we now find
Using
(18) we find for all large enough
n ∈ ℕ
□
Proposition 31. For all regular g and all ϵ > 0
there exists an N ∈ ℕ
such that for all n ≥ N Proof. Let
ϵ > 0 be fixed. By
(18) it suffices to show that there exists some
N ∈ ℕ such that for all
n ≥ N it holds that
Now simply note that we have proved this already in Proposition 29. □
Hence, for all
ϵ > 0 there exists some
N ∈ ℕ such that for all
n ≥ N and all
Q ∈
Although,
is not a minimal element of ≺, the losses incurred by adopting any other B ∈
can only be marginally better, eventually.
Thus, for fixed k and δ > 0 there exists an N ∈ ℕ such that for all
. Hence, belief functions with an arbitrarily good loss can be found within an (Euclidean) neighbourhood of
.
Since the
are probability functions, there does not exist a B ∈
which dominates
on ∃ or on ∄. Furthermore, the
are optimal according to (∀*). The
thus are almost optimal in all the senses we here considered.
In essence, the phenomenon of minloss
arises from
having a strictly better loss profile than
but the limit of the sequence
is
, which is not open-minded. This phenomenon is reminiscent of min{x ∈ ℝ : 0 < x < 1} = ∅, where it is possible to get ever closer to zero but it is impossible to reach it.
6.2. When Losses Can Be Minimised
The analysis of Section 6.1, shows that there can be no general minimax theorem which covers any evidence that is not finitely generated. On the other hand, we shall see in this section that for certain natural cases evidence which cannot be finitely generated, minimax theorems do obtain.
Let
contain only one
m-ary relation symbol,
U, and
c ∈ [0, 1]. Let
and let
be an enumeration of the remaining
n-states. We shall consider the following example:
Slightly less general versions of
have attracted recent interest in the literature [
18] (Example 3, p. 95), [
19] (Example 3.5, p. 172) and [
1] (Example 5.7, p. 99). We here consider relations symbols
U of arbitrary arity, while previously
U was taken to be unary.
First of all, if c = 0 and g is symmetric and inclusive, then P= ∈
and we immediately obtain that
and
.
We shall assume from now on that c > 0.
Proposition 32. For symmetric and inclusive g it holds that and and for all n ∈ ℕ and all 1 ≤ I ≤ |Ωn|.
Proof. For all
n ≥ 2 and symmetric and inclusive
gn it holds that
for all 1 ≤
i ≤ |Ω
n| − 2 by [
4] (Corollary 7, p. 3577). Thus, there exists some
λn ≥ 0 such that
and
for all 2 ≤
k ≤ |Ω
n|.
For all n ∈ ℕ, now define a function P1 ∈
by
. Then, define a convex combination of the equivocator on
and P1 by
.. Recall that gn is equivocator-preserving (Proposition 7) and that
is strictly concave on ℙn (Lemma 1). Thus,
for all
.
On the one hand g-entropy strictly increases with decreasing λn on the other hand
imposes the constraint
.. Let N ∈ ℕ be minimal with
Then for all n ≥ N it holds that
. and
for all 1 ≤ i ≤ |Ωn| −2.
For all
r ≥ N it follows that
Thus, for all
r ≥ N we find
Thus, for all n ∈ ℕ
and
.
We now show that
P† is indeed a probability function. We need to show that
for all
n ∈ ℕ and all
ω ∈ Ω
n:
Finally, observe that
. Hence,
. □
Proposition 33. If g = gΩ or if g is regular, then maxent
.
Proof. Let Q ∈
. For regular g, it suffices to show that there exists an N ∈ ℕ such that for all
holds.
Since
there has to exist a minimal N ∈ ℕ and an N-state ω′ ∈
such that
.
Now define a function
Q′:
S → [0, 1] by requiring that
Q′ respects logical equivalence,
Q and
Q′ agree on
SN,
Q(ω0) for all n > N all ν′ ∈ Ωn with ν′ ⊨ ω′,
for all n > N and
for all n > N and all ν ∈ Ωn \
with ν ⊭ ω′
In general,
Q′ is not a probability function because
Note that for all
holds.
We now show that for all large enough
n holds. Let us first compute
Since
we now find with
that
Since
there exists some
ϵ > 0 such that for all large enough
nThis establishes the result for g = gΩ.
We now turn to regular
g.
The last sum goes to zero since
g is regular, Corollary 6. Eventually,
is greater some
ϵ > 0 as we established in the first part of the proof. Thus, for all large enough
n ∈ ℕ and all
we have
□
Lemma 12. The following three conditions are equivalent for all large enough n ∈ ℕ
and inclusive and symmetric g
Proof. Note that for all
P ∈ ℙ
The term between the last set of brackets () does not depend on i. So,
only depends on P(ν1) but not on how P distributes probabilities among the other n-states.
For large enough N ∈ ℕ it holds that
for all 3 ≤ i ≤ |Ωn|.
Since g is symmetric, γn(F) is only a function of the size of F, |F|, it follows that every
assigns as little probability as possible to ν1. Since we require that
it follows that P′(ν1) = c.
The result for
follows as above by noting that for g = gΩ it holds that γn(ν) = 1 for all n-states ν ∈ Ωn and γn(F) = 0 otherwise. □
Adapting Joyce’s notion of truth-directedness [
14] we define:
Definition 26 (Chance-directed scoring rule). A function Ff: [0, 1] × [0, 1] → [0, +∞]
of the form Ff (
x, y) =
x · f(
y) + (1
− x)
· f(1
− y)
is called chance-directed,
if and only if for all x ∈ [0, 1],
all 0 ≤
λ < 1
and all y ∈ [0, 1]
\ {
x}
holds. For a scoring rule Ff this formalises the idea that beliefs which are closer to the chances on two mutually exclusive and exhaustive events are strictly better scored.
In particular, Ff(x, y) = −x log y − (1 − x) log(1 − y) is chance-directed. The score improves by simultaneously moving y closer to x and 1 − y closer to 1 − x.
Proposition 34. If g is regular, then all B ∈ minloss
agree with on ∄.
Proof. If c = 1, then
and maxent
follows trivially. By Theorem 5 we have that for every function
it holds that
. Thus, all B ∈ minloss
agree with
on ∄.
We now focus on 0 < c < 1.
From the above lemma we obtain
We now follow the structure of the proof of Proposition 16 for fixed 0 < c < 1. Let B ∈ minloss
.
Case1.
Case1A.
If there exists an
n ∈ ℕ such that
, then
. If there exists an
m ∈ ℕ such that
, then there has to exist some
k > m such that
Since
either such an n ∈ ℕ or such a k ∈ ℕ has to exist, possibly both exist. Overall, there has to exist some N ∈ ℕ, a
and an ϵ > 0 such that
.
For large enough
n ∈ ℕ, depending on
B,
and
c, it holds that
Since we may assume that
converges in
n to
c we now find
Whether this limit exists or not, we have thus established that for large enough
n ∈ ℕ there exists a lower bound of the sequence
which is strictly positive, since we take
N ∈ ℕ to be fixed here.
For all fixed n ∈ ℕ let
be such that
and
. Note that
for all large enough n and
for all large enough n, Lemma 12.
To simplify notation let
. With this notation we have for all large enough
n ∈ ℕ
By our standing assumption on
g (regularity), we obtain that
Rn converges to zero. We now find
Because
g(
πn) is bounded and
Rn converges to zero, we obtain for all large enough
n ∈ ℕ that
Case1B.
Case1Bi
.
Let us first note that this limit has to exist, because
is a (not necessarily strictly) decreasing sequence bounded from below by c. Let
.
Note that there has to exist some
N ∈ ℕ such that for all
n ≥
N it holds that
. For all
n ≥ N there has to exist some
such that
. Then, for all
n ≥
Nwhere the strict inequality follows from chance-directedness. We now find
where the last line follows from the fact that the standard logarithmic scoring rules is strictly proper,
i.e.,
Equation (11) holds.
Case1Bii.
Let
,
b2 exists for the same reasons
b1 exists. Note that there has to exist some
N ∈ ℕ such that for all
n ≥ N it holds that
. Using chance-directedness we find for all
n ≥ NNow proceed as in Case1Bi.
Case2\ and B respects logical equivalence on
.
Case2A There exists a
such that for all n ∈ ℕ and all F ⊆ Ωn it holds that °B(F) ≤ °PB(F).
Since
there has to exists an N ∈ ℕ and an F′ ∈ ΩN such that °B(F′) < °PB(F′).
Case2Ai and no other
is such that °B(F) ≤ °P (F) for all n and all F ⊆ Ωn. Follows as does Case2Ai in Proposition 16.
Case2Aii There exists a
such that
.
Then for all
n ≥ N and all
P ∈ [
] it holds that
. For all large enough
n ∈ ℕ it holds by Case1 that
Thus,
Case2B There does not exist a
such that for all n ∈ ℕ and all F ⊆ Ωn it holds that °B(F) ≤ °PB(F).
As in Case2B in Proposition 16 we obtain that there has to exist an α > 0 and a N ∈ ℕ such that for all n ≥ N it holds that
.
To complete the proof we will now show that there exists some β > 0, which depends on
and g but does not depend on the particular n ≥ N, such that
. Since g(πn) is bounded, we then obtain that
for all large enough n ∈ ℕ.
We show that for all large enough
n ∈ ℕ that
for all functions
f : Ω
n→ [0, 1] such that
.
The minimum obtains, if and only if
for all
ω ∈ Ω
n as we saw in Proposition 16. Thus, the minimum obtains for
and
for all other
. Let us now compute
For
n approaching infinity we find
which is strictly greater some
β > 0 as required.
Case3 and B does not respect logical equivalence on
.
Simply proceed as in Case2 in Theorem 17. □
Proof. Since all
agree with
on
, all
agree with
on
; as we noted in Proposition 20.
Recall that Theorem 8 does not depend on the particular probability function, as we stated on Page 2508. We can thus apply Theorem 8 to infer that
□