Constrained versus Unconstrained Rational Inattention

Azrieli, Yaron

doi:10.3390/g12010003

by

Yaron Azrieli

Department of Economics, The Ohio State University, Columbus, OH 43210, USA

Games 2021, 12(1), 3; https://doi.org/10.3390/g12010003

Submission received: 30 October 2020 / Revised: 10 December 2020 / Accepted: 29 December 2020 / Published: 5 January 2021

(This article belongs to the Special Issue Limited Attention)

Abstract

The rational inattention literature is split between two versions of the model: in one, the mutual information of states and signals is bounded by a hard constraint, while, in the other, it appears as an additive term in the decision maker’s utility function. The resulting constrained and unconstrained maximization problems are closely related, but, nevertheless, their solutions differ in certain aspects. In particular, movements in the decision maker’s prior belief and utility function lead to opposite comparative statics conclusions.

Keywords:

costly information; rational inattention; Shannon entropy

1. Introduction

The Rational Inattention (RI) model was introduced to economics by Sims [1,2], and it has been widely applied since then in a variety of fields. It is based on the premise that attention is a scarce resource for decision makers, and that these decision makers optimally allocate their attention, given the environment that they face. Sims suggested that a useful way for capturing the scarcity of attention is to impose a constraint on the quantity of information that the agent can process. Specifically, the constraint is that the average reduction of entropy, from the agent’s prior belief about the state of the world to her posterior, cannot exceed a given threshold.1

The follow-up literature continued for the most part to use entropy reduction in order to measure informativeness, but two different versions of the model emerged: the first, which we call the ′constrained version′, continues as in Sims to study problems of the form

{max}_{x} {f (x)}

subject to

g (x) \leq c

. Here, x is the information choice of the agent,2 the objective f maps each choice to the expected utility that it generates,

g (x)

is the expected reduction of entropy induced by x, and c is the bound on the agent’s capacity to process information. The second ′unconstrained version′ instead analyzes maximization problems of the form

{max}_{x} {f (x) - λ g (x)}

, where

x, f, g

are the same as before and

λ

captures the marginal cost of attention.

The purpose of this note is to point out that, while the two versions of the problem are obviously closely related, their solutions differ in several important aspects. Thus, the conclusions reached when using one of these versions do not automatically transfer to the other and tests of the validity of the RI model may reach different conclusions, depending on which of the two versions is tested.

The connection between the two versions is as follows: the Lagrangian of the constrained version is given by

f (x) - λ [g (x) - c]

, where

λ

is the multiplier of the RI ′budget constraint′.3 Therefore, the first-order conditions with respect to x are the same as in the unconstrained version and, since these programs are convex, the conditions are also sufficient. Furthermore, as long as c is not too large the budget constraint binds. Therefore, x solves the constrained version if and only if (i) x solves the unconstrained version with some

λ > 0

and (ii) the constraint binds at x. In the other direction, if x solves the unconstrained version, then it also solves the constrained version with parameter

c = g (x)

.

Despite this apparent equivalence, note that, in the constrained version, the Lagrange multiplier

λ

is determined endogenously, while, in the unconstrained version, it is part of the description of the problem. This is the reason underlying the differences between the solutions. First, for a fixed decision problem, the mapping from the parameter c in the constrained version to the corresponding multiplier

λ

need not be one-to-one, i.e., there may be an interval of c values mapped to the same multiplier, say

λ^{*}

. These critical values

λ^{*}

are associated with ′regime changes′ in the unconstrained version, where the set of actions considered by the agent shifts. When analyzing the unconstrained problem, these cases appear to be knife-edge and negligible, but, for the constrained problem, this is exactly where much of the "action" takes place. We demonstrate this phenomenon with a simple example (Section 3), and then show that it always happens in two families of decision problems (Propositions 2 and 3).

A byproduct of this observation is that some properties of the solution of the unconstrained version that have been emphasized in the literature fail to hold in the constrained version. For example, Caplin and Dean [4] show that, for the unconstrained problem, there is always a solution in which the number of posteriors (and, hence, the number of actions) chosen by the agent is, at most, the number of states. This is no longer true for the constrained problem: there may be intervals of c values at which any solution uses more actions than there are states.4 Another example is the dependence of the optimal set of posteriors on the parameter. It is easy to see that any two different values of

λ

lead to different sets of posteriors in the solution to the unconstrained version. In the constrained version, on the other hand, there may be intervals of c values where the optimal posteriors stay fixed and only the allocation of mass between them changes as c varies.

Second, as the decision problem changes, the mapping from c to

λ

changes with it, which leads to a reversal of known comparative statics results for the unconstrained version. Specifically, one important property of the unconstrained version is that changes in the prior do not affect the optimal set of posteriors, so long as the prior remains within the convex-hull of these posteriors. This property was termed ″locally invariant posteriors″ (LIP) by Caplin and Dean [4], and it has been experimentally tested by Dean and Neligh [6]. For the constrained version, quite the opposite is true: if the set of optimal posteriors is affinely independent, which is often the case, then changes in the prior almost always lead to changes in the optimal posteriors. See Proposition 5 for details.

On the other hand, scaling up or down the stakes of the decision problem works in the opposite way: while the solution to the unconstrained version is sensitive to such changes, for the constrained version, the solution stays the same. Indeed, scaling up the utility function has exactly the same effect as scaling down the marginal cost of attention

λ

in the unconstrained version. When

λ

changes, the solution to the unconstrained problem changes with it, as already mentioned above. However, in the constrained problem, a rescaling of utility is accompanied by a corresponding rescaling of the multiplier

λ

, and the two cancel each other.

These differences between the two versions have simple testable implications and can help to guide the modeling of rationally inattentive agents.5 For instance, Propositions 2 and 3 describe classes of decision problems in which the two versions significantly differ in their predictions, offering a direct way to distinguish between them. Similarly, Proposition 5 on the failure of LIP in the constrained version can be used to refute the validity of this model using experimental or empirical data.

Related Literature

This paper makes a theoretical contribution to the growing body of literature on RI, see Maćkowiak et al. [7] for a recent survey.

We work in a finite environment and make extensive use of the characterization of the solution of the unconstrained version in Matějka and McKay [8] (MM, henceforth) and Caplin et al. [9] (CDL, henceforth). For most of the analysis, we view the agent as choosing a distribution over posterior beliefs, rather than state-dependent distributions over actions, see e.g., Caplin and Dean [4] and Caplin et al. [10] for previous works using this approach.

The constrained and unconstrained versions are both extensively used in the literature. Roughly speaking, static models tend to adopt the unconstrained version, while dynamic models the constrained one, although there are exceptions in both directions.6 De Oliveira [13] axiomatizes the unconstrained version of the RI model and comments that, due to the Lagrangian connection, the constrained version behaves similarly for small variations in the menu of available acts. Matějka [11] points out that, in his model, the multiplier

λ

decreases as c increases, and Fulton ([14], Theorem 2) makes a similar observation in a continuous Gaussian framework; as we show below, in the discrete case, the relationship is monotonic, but sometimes only weakly so.

Le Treust and Tomala [5] analyze the interaction between a sender and a receiver, who communicate through a noisy channel. The receiver faces a sequence of n identical decision problems and the sender sends k messages through the channel. The main result is that, as

k, n

grow, the payoff of the sender converges to the value of the constrained version of the RI problem, with c being determined by the channel’s capacity. They then show that the number of posteriors in the solution to the constrained problem can always be chosen to be, at most, one more than the number of states, and they give an example showing that this bound is tight.7 Their example is similar to the one that is given below in Section 3. Relative to that paper, the contribution of our Propositions 2 and 3 below is to show that more actions than states in the solution is a ′robust′ phenomenon that holds for intervals of c values and in general classes of decision problems.

Our results may be relevant for the experimental tests of the RI model and for its estimation. Dean and Neligh [6] use the testable implications that were identified by Caplin and Dean [17] and by Caplin et al. [10] in order to study whether subjects’ choices are consistent with models of costly information acquisition in general, and with the RI unconstrained model in particular. One of their findings is that subjects pay more attention (they are more likely to make the right choice) when the stakes are higher, which is consistent with the unconstrained version of the RI model, but not with the constrained version (see Proposition 6). Dewan and Neligh [18] observe a similar kind of behavior by most subjects (60%) in their experiment; however, note that many subjects were non-responsive to scaling up of the incentives, which suggests that the constrained model may better fit a significant fraction of the population.8

Dean and Neligh [6] also test the LIP property and find that it generally holds. In view of Proposition 5, this is another indication that the unconstrained model does a better job in explaining the data. It would be interesting to see whether LIP holds more generally in other kinds of decision problems and with other implementations of attention costs.

Finally, Cheremukhin et al. [19] use laboratory data in order to estimate a hybrid model that includes the two versions considered in this paper as special cases. The behavior of approximately 70% of their subjects is better described by an additive cost term than by a capacity constraint. This further suggests that the unconstrained model is a better fit for most decision makers, but, at the same time, that one should not dismiss the constrained version as irrelevant.

2. Two Versions of the RI Problem

For the most part, we follow the notation of CDL in order to facilitate an easy comparison. There is a finite set

Ω

of states, with

ω \in Ω

denoting a typical state. The prior belief of the decision maker (DM) is

μ \in Δ (Ω)

, where

Δ (X)

is the set of probability distributions over any finite set X. We assume that

μ

assigns positive probability to every

ω \in Ω

. The finite set of available actions is

A

. For each pair

(a, ω) \in A \times Ω

, the utility of the DM when she chooses action a and the realized state is

ω

is denoted by

u (a, ω)

. A decision problem is described by the triplet

(μ, A, u)

.

Throughout, we restrict attention to decision problems that satisfy two assumptions: first, actions are not duplicates, i.e.,

u (a, ω) \neq u (a^{'}, ω)

for some

ω

whenever

a \neq a^{'}

. Second, different actions are optimal in different states, i.e., if

ω \neq ω^{'}

, then

{arg max}_{a} u (a, ω) \cap {arg max}_{a} u (a, ω^{'}) = \emptyset

. The first assumption is purely for expositional reasons. As for the second, all of our results still hold without this assumption, but the upper end of the range of the cost parameter c for which they hold may decrease.

The DM chooses an information structure, i.e., a mapping from states to distributions over some set of signals, as well as which action to play after observing each signal. However, it is without loss of generality (see, e.g., [8], Lemma 1) to restrict attention to information structures with at most one signal per action in

A

and to identify signals with the actions that they induce. Therefore, the choice variable is a mapping

P : Ω \to Δ (A)

, where

P (a | ω)

is the probability of action a conditional on state

ω

. With slight abuse of notation, P also denotes the unconditional probability of actions, which is

P (a) = \sum_{ω} μ (ω) P (a | ω)

. Following CDL, the consideration set of P is

B (P) = {a \in A : P (a) > 0}

.

In order to state the problem, it is useful to introduce one more piece of notation. If

p \in Δ (X)

for some finite set X, then

H (p) = - \sum_{x \in X} p (x) ln p (x)

is the entropy of p. We will use H for distributions over

A

as well as over

Ω

.

In their papers, MM and CDL consider the following maximization problem, where

λ > 0

is an exogenous parameter:

max_{P} \sum_{ω} μ (ω) \sum_{a} P (a | ω) u (a, ω) - λ [H (P (\cdot)) - \sum_{ω} μ (ω) H (P (\cdot | ω))] .

(*)

The first term is the expected utility that the DM obtains by conditioning her choice on the observed signal, while the bracketed expression in the second term is the expected reduction of entropy from the marginal distribution to the state-contingent distribution of actions, which captures the cost of attention.
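As a rough illustration of how the two terms of (*) can be evaluated, the following minimal sketch computes the expected utility and the entropy reduction for a given mapping P. The function names, the list-based data layout (`P[w][a]` for P(a | ω), `u[a][w]` for u(a, ω)), and the two-state numbers are assumptions made for this illustration only.

```python
import math

def entropy(p):
    """Shannon entropy with natural logarithm; terms with p(x) = 0 contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def mutual_information(mu, P):
    """H(P(.)) - sum_w mu(w) H(P(.|w)), where P[w][a] stands for P(a | w)."""
    states = range(len(mu))
    actions = range(len(P[0]))
    marginal = [sum(mu[w] * P[w][a] for w in states) for a in actions]
    return entropy(marginal) - sum(mu[w] * entropy(P[w]) for w in states)

def objective_star(mu, P, u, lam):
    """Objective of (*): expected utility minus lam times the entropy reduction.
    u[a][w] is the utility of action a in state w."""
    states = range(len(mu))
    actions = range(len(u))
    expected_utility = sum(mu[w] * P[w][a] * u[a][w] for w in states for a in actions)
    return expected_utility - lam * mutual_information(mu, P)

# Hypothetical two-state, two-action illustration (not taken from the paper).
mu = [0.3, 0.7]
u = [[1.0, 0.0], [0.0, 1.0]]
revealing = [[1.0, 0.0], [0.0, 1.0]]       # the action always matches the state
uninformative = [[0.4, 0.6], [0.4, 0.6]]   # same action distribution in every state
print(mutual_information(mu, revealing), entropy(mu))  # full revelation costs H(mu)
print(mutual_information(mu, uninformative))           # zero up to rounding
```

In this sketch a fully revealing P has entropy reduction H(μ) and an uninformative P has entropy reduction zero, which is why values of the attention budget between 0 and H(μ) are the interesting ones.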

We compare program (*) with the following constrained maximization problem with parameter

0 < c < H (μ)

:9

max_{P} \sum_{ω} μ (ω) \sum_{a} P (a | ω) u (a, ω)

(**)

s . t . H (P (\cdot)) - \sum_{ω} μ (ω) H (P (\cdot | ω)) \leq c .

(1)

Here the objective only includes the benefit of receiving information, and the constraint requires that the expected reduction in entropy does not exceed the ′budget of attention′ c. As mentioned above, this second formulation corresponds to the original RI problem that was introduced by Sims [2].

The relationship between programs of the form (*) and (**) is well-understood in general. While

λ

is part of the input in the former, it is the Lagrange multiplier of the constraint in the latter, and, therefore, is endogenously determined. The following proposition formalizes this connection.

Proposition 1.

The mapping P solves program (**) if and only if

(i): P satisfies the budget constraint (1) with equality; and,
(ii): P solves (*) with some $λ > 0$ .

The fact that the budget constraint necessarily binds for any

c \in (0, H (μ))

is a consequence of our assumption that different actions are optimal at different states; a proof is provided in Appendix A. Once this is established, and given that these are convex programs, the rest of the proof easily follows from the KKT theorem (see, e.g., [20], Corollary 28.3.1), and, therefore, is omitted.

Let

Z (a, ω, λ) = exp (\frac{u (a, ω)}{λ})

. MM and CDL prove that P solves (*) if and only if it satisfies

P (a | ω) = \frac{P (a) Z (a, ω, λ)}{\sum_{b \in A} P (b) Z (b, ω, λ)}

(2)

for every a and

ω

, and, in addition

\sum_{ω} μ (ω) (\frac{Z (a, ω, λ)}{\sum_{b \in A} P (b) Z (b, ω, λ)}) \leq 1

(3)

for every a.
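Conditions (2) and (3) suggest a simple numerical procedure. The sketch below uses a standard Blahut-Arimoto-type fixed-point iteration on the unconditional probabilities; this is a swapped-in textbook technique, not a method described in this paper, and the function names, the iteration count, and the two-state example are assumptions for illustration.

```python
import math

def condition_3_lhs(mu, u, lam, P):
    """Left-hand side of (3) for every action a, given unconditional probabilities P."""
    n_a, n_w = len(u), len(mu)
    Z = [[math.exp(u[a][w] / lam) for w in range(n_w)] for a in range(n_a)]
    denom = [sum(P[b] * Z[b][w] for b in range(n_a)) for w in range(n_w)]
    return [sum(mu[w] * Z[a][w] / denom[w] for w in range(n_w)) for a in range(n_a)]

def solve_unconstrained(mu, u, lam, iters=2000):
    """Iterate P(a) <- P(a) * (left-hand side of (3)), starting from the uniform distribution.
    Returns (P, cond), where cond[w][a] is P(a | w) computed from formula (2)."""
    n_a, n_w = len(u), len(mu)
    P = [1.0 / n_a] * n_a
    for _ in range(iters):
        lhs = condition_3_lhs(mu, u, lam, P)
        P = [P[a] * lhs[a] for a in range(n_a)]
    Z = [[math.exp(u[a][w] / lam) for w in range(n_w)] for a in range(n_a)]
    denom = [sum(P[b] * Z[b][w] for b in range(n_a)) for w in range(n_w)]
    cond = [[P[a] * Z[a][w] / denom[w] for a in range(n_a)] for w in range(n_w)]
    return P, cond

# Hypothetical problem: guess the state, payoff 1 if correct and 0 otherwise.
mu = [0.3, 0.7]
u = [[1.0, 0.0], [0.0, 1.0]]
P, cond = solve_unconstrained(mu, u, lam=1.0)
posterior = mu[1] * cond[1][1] / P[1]   # Pr(state 1 | action 1) by Bayes' rule
print(P, posterior)
```

For this hypothetical problem, solving the posterior condition by hand gives Pr(state 1 | action 1) = e / (1 + e), which the test compares against the iterate.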

It is often more convenient to work with the distribution over posteriors that are induced by P than with P itself. Namely, instead of choosing

P : Ω \to Δ (A)

, we can equivalently think of the DM as choosing the unconditional probabilities of actions

{P (a)}_{a \in A}

and the posteriors

{γ^{a} \in Δ (Ω)}_{a \in B (P)}

subject to

\sum_{a} P (a) γ^{a} (ω) = μ (ω)

for every

ω \in Ω

.10 CDL show that conditions (2) and (3) can be rewritten as

\frac{γ^{a} (ω)}{Z (a, ω, λ)} = \frac{γ^{b} (ω)}{Z (b, ω, λ)}

(4)

for every

a, b \in B (P)

,

ω \in Ω

, and

\sum_{ω} γ^{a} (ω) \frac{Z (b, ω, λ)}{Z (a, ω, λ)} \leq 1

(5)

for every

a \in B (P)

and

b \in A

. Furthermore, the budget constraint (1), which must bind at the optimum, can be rewritten as

H (μ) - \sum_{a \in B (P)} P (a) H (γ^{a}) \leq c .

(6)

3. An Example

We now illustrate the differences between the solutions of the two problems with an example; in the next section, we show that these differences hold more generally. Let

Ω = {ω_{0}, ω_{1}}

. Because

Ω

only has two elements, we identify

Δ (Ω)

with the

[0, 1]

interval and describe its elements by the probability that the state is

ω_{1}

. The set of actions is

A = {l, m, r}

standing for left, middle, and right. The following table provides the utility function:

	l	m	r
$ω_{0}$	1	0	−2
$ω_{1}$	−2	0	1

Thus, m is a safe action with a sure payoff of zero; l and r are risky actions, where l gives a high payoff at the ′left′ state

ω_{0}

and r gives a high payoff at the ′right′ state

ω_{1}

. Note that l is optimal for beliefs

γ \in [0, 1 / 3]

, m is optimal for

γ \in [1 / 3, 2 / 3]

, and r is optimal for

γ \in [2 / 3, 1]

. See Figure 1.

We use the distribution over posteriors

\{(P (l), P (m), P (r)), {γ^{a}}_{a \in B (P)}\}

for the analysis, as this makes it easier to visualize the solution. We break the analysis into three cases, depending on the location of the prior

μ \in (0, 0.5]

, and, for each case, compare the solutions of (**) and (*).11 Figure 2, Figure 3 and Figure 4 illustrate the solutions of the three cases, while proofs of the claims can be found in Appendix A.

In order to describe the solution, it is useful to introduce additional notation. First, let

λ^{*} = \frac{1}{ln (\frac{1 + \sqrt{5}}{2})} \approx 2.078

and

γ^{*} = {[1 + exp (\frac{3}{λ^{*}})]}^{- 1} = \frac{1}{3 + \sqrt{5}} \approx 0.191

. Second, define the function

\bar{λ} (μ)

by

\bar{λ} (μ) = \begin{cases} 3 {[ln (\frac{1 - μ}{μ})]}^{- 1} & if 0 < μ < γ^{*}, \\ {[ln (\frac{- μ + \sqrt{μ (4 - 3 μ)}}{2 μ})]}^{- 1} & if γ^{*} \leq μ < \frac{1}{3}, \\ + \infty & if μ = \frac{1}{3}, \\ {[ln (\frac{μ + \sqrt{μ (4 - 3 μ)}}{2 (1 - μ)})]}^{- 1} & if \frac{1}{3} < μ \leq \frac{1}{2} . \end{cases}
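The constants and the piecewise threshold can be evaluated directly. The following is a small numerical sanity check under the definitions given in the text; the identifier names and the guard for arguments that round to 1 are choices made for this sketch.

```python
import math

PHI = (1 + math.sqrt(5)) / 2                       # golden ratio
LAMBDA_STAR = 1 / math.log(PHI)                    # threshold cost lambda*
GAMMA_STAR = 1 / (1 + math.exp(3 / LAMBDA_STAR))   # extreme posterior gamma*

def _inverse_log(arg):
    """1 / ln(arg), returning +infinity when arg is (numerically) at most 1."""
    return math.inf if arg <= 1 else 1 / math.log(arg)

def lambda_bar(mu):
    """Piecewise threshold lambda-bar(mu) for a prior mu in (0, 0.5]."""
    if not 0 < mu <= 0.5:
        raise ValueError("mu must lie in (0, 0.5]")
    root = math.sqrt(mu * (4 - 3 * mu))
    if mu < GAMMA_STAR:
        return 3 / math.log((1 - mu) / mu)
    if mu < 1 / 3:
        return _inverse_log((-mu + root) / (2 * mu))
    if mu == 1 / 3:
        return math.inf
    return _inverse_log((mu + root) / (2 * (1 - mu)))

print(LAMBDA_STAR, GAMMA_STAR, lambda_bar(0.5))
```

Evaluating the sketch suggests that the branches join continuously at μ = γ* and that the threshold at μ = 0.5 coincides with λ*, consistent with Case 1 below.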

Case 1: $μ = 0.5$

For problem (*), the solution is as follows. If

λ > λ^{*}

then

P (m) = 1

and

γ^{m} = 0.5

. If

0 < λ < λ^{*}

then

P (l) = P (r) = 0.5

and

γ^{l} = 1 - γ^{r} = {[1 + exp (\frac{3}{λ})]}^{- 1}

. If

λ = λ^{*}

, then any mixture (including degenerate) of the former two solutions is optimal. Thus, if the cost parameter

λ

is high, then the DM chooses m for sure without seeking any information; once

λ

falls below the threshold level

λ^{*}

, the DM chooses sufficiently informative symmetric signals, so that she ends up choosing either l or r for sure; only at the cutoff

λ^{*}

is there the possibility that all three actions are considered, and even in this case

B (P) = {m}

and

B (P) = {l, r}

are still optimal. The value of

λ^{*}

is determined by the condition that the ′net utilities′ (see [9]), i.e., the difference between expected utility and cost, of the three associated posteriors

0.5, γ^{*}, 1 - γ^{*}

are equal.
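The equal-net-utility condition that pins down the threshold can be checked numerically. In the sketch below, net utility is taken to be expected utility minus λ[H(μ) − H(γ)] with μ = 0.5; this normalization and the identifier names are assumptions of the illustration (subtracting the common constant λH(μ) does not affect whether the three values coincide).

```python
import math

LAMBDA_STAR = 1 / math.log((1 + math.sqrt(5)) / 2)
GAMMA_STAR = 1 / (3 + math.sqrt(5))
PAYOFF = {"l": (1, -2), "m": (0, 0), "r": (-2, 1)}  # (payoff at omega_0, payoff at omega_1)

def H(g):
    """Binary entropy (natural log) of a belief g = Pr(state is omega_1)."""
    if g <= 0 or g >= 1:
        return 0.0
    return -g * math.log(g) - (1 - g) * math.log(1 - g)

def net_utility(action, gamma, lam, mu=0.5):
    """Expected utility of `action` at belief gamma minus the cost lam * [H(mu) - H(gamma)]."""
    expected = (1 - gamma) * PAYOFF[action][0] + gamma * PAYOFF[action][1]
    return expected - lam * (H(mu) - H(gamma))

values = [
    net_utility("l", GAMMA_STAR, LAMBDA_STAR),
    net_utility("m", 0.5, LAMBDA_STAR),
    net_utility("r", 1 - GAMMA_STAR, LAMBDA_STAR),
]
print(values)  # three numerically equal values at lambda*
```

At other values of the cost parameter the equality breaks, which is one way to see why three-action solutions of (*) are confined to a single point.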

Moving on to problem (**), first note that, for any

c > 0

, it is never optimal to choose no information, since, by Proposition 1, the budget constraint must bind. When c is small, specifically

0 < c < H (0.5) - H (γ^{*})

, the solution is given by

P (l) = P (r) = \frac{c}{2 [H (0.5) - H (γ^{*})]}

,

P (m) = 1 - \frac{c}{H (0.5) - H (γ^{*})}

,

γ^{m} = 0.5

, and

γ^{l} = 1 - γ^{r} = γ^{*}

. That is, for c in this interval, we have

B (P) = {l, m, r}

and, as c increases, the posteriors stay constant while more probability is shifted from the middle

γ^{m} = 0.5

to the extremes

γ^{l} = γ^{*}

and

γ^{r} = 1 - γ^{*}

. In light of Proposition 1, this is possible, because, for all c in this range, the corresponding value of the Lagrange multiplier is

λ^{*}

. However, in contrast to the previous paragraph, it is strictly beneficial here to the DM to move as much mass as possible to the extreme posteriors, since the cost does not directly enter the objective. For larger values of c, specifically,

H (0.5) - H (γ^{*}) \leq c < H (0.5)

, the solution is given by

P (l) = P (r) = 0.5

with the posteriors

γ^{l} = 1 - γ^{r}

determined by the equation

H (0.5) - H (γ^{l}) = c

.
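The Case 1 solution of (**) described above can be written as a short routine. This is a minimal sketch assuming the closed forms stated in the text; the function names and the bisection used to solve the entropy equation are implementation choices for the illustration.

```python
import math

LN2 = math.log(2)                      # H(0.5)
GAMMA_STAR = 1 / (3 + math.sqrt(5))

def H(g):
    """Binary entropy (natural log) of a belief g = Pr(state is omega_1)."""
    if g <= 0 or g >= 1:
        return 0.0
    return -g * math.log(g) - (1 - g) * math.log(1 - g)

def solve_constrained_case1(c):
    """Solution of (**) for mu = 0.5; returns (P, gamma), dicts keyed by 'l', 'm', 'r'."""
    if not 0 < c < LN2:
        raise ValueError("c must lie in (0, H(0.5))")
    c_bar = LN2 - H(GAMMA_STAR)
    if c < c_bar:
        share = c / c_bar  # total mass placed on the two extreme posteriors
        P = {"l": share / 2, "m": 1 - share, "r": share / 2}
        return P, {"l": GAMMA_STAR, "m": 0.5, "r": 1 - GAMMA_STAR}
    # Solve H(0.5) - H(g) = c for g in (0, gamma*] by bisection (H is increasing there).
    lo, hi = 0.0, GAMMA_STAR
    for _ in range(200):
        mid = (lo + hi) / 2
        if LN2 - H(mid) > c:
            lo = mid   # cost above budget: move the posterior toward the prior
        else:
            hi = mid
    g = (lo + hi) / 2
    # Action m gets zero mass here; its posterior entry is a placeholder.
    return {"l": 0.5, "m": 0.0, "r": 0.5}, {"l": g, "m": 0.5, "r": 1 - g}

def cost(P, gamma, mu=0.5):
    """Entropy reduction H(mu) - sum_a P(a) H(gamma^a)."""
    return H(mu) - sum(P[a] * H(gamma[a]) for a in P)

for c in (0.05, 0.15, 0.3, 0.6):
    P, gamma = solve_constrained_case1(c)
    print(c, P, gamma["l"], cost(P, gamma))
```

In both branches the returned distribution averages back to the prior 0.5 and its cost equals c, so the budget constraint binds as Proposition 1 requires.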

Figure 2 illustrates the consideration sets of the two solutions. The figure also shows the mapping from c to the corresponding value of

λ

at the optimum of (**). While this mapping is weakly decreasing (a higher c implies weakly lower

λ

), it is neither one-to-one nor onto the entire range of

λ

’s.

Case 2: $γ^{*} < μ < 0.5$

Figure 3 illustrates the solution for this case. In (*), when

λ < λ^{*}

, the solution is similar to the previous case

μ = 0.5

: the consideration set is

{l, r}

, the posteriors are

γ^{l} = 1 - γ^{r} = {[1 + exp (\frac{3}{λ})]}^{- 1}

, and the probabilities are set to satisfy

P (l) γ^{l} + P (r) γ^{r} = μ

. At

λ = λ^{*}

, again, the solution is not unique, with

B (P) = {l, r}

,

B (P) = {l, m}

, and

B (P) = {l, m, r}

all possible. When

λ^{*} < λ < \bar{λ} (μ)

,

B (P) = {l, m}

, the posteriors are

γ^{l} = {[1 + exp (\frac{1}{λ}) + exp (\frac{2}{λ})]}^{- 1}

,

γ^{m} = exp (\frac{2}{λ}) γ^{l}

, and the probabilities are adjusted, so that

P (l) γ^{l} + P (m) γ^{m} = μ

. Finally, for

λ \geq \bar{λ} (μ)

, it is optimal to obtain no information, and the consideration set is either

{l}

or

{m}

when

μ

is below or above

1 / 3

, respectively.12

In (**), when c is small, the consideration set is

{l, m}

. The posteriors

γ^{l}, γ^{m}

satisfy

γ^{l} {(1 - γ^{l})}^{2} = γ^{m} {(1 - γ^{m})}^{2}

(this follows from (4)) and, in addition, we need that

P (l) γ^{l} + P (m) γ^{m} = μ

and that

H (μ) - [P (l) H (γ^{l}) + P (m) H (γ^{m})] = c

. These three equations, combined, pin down the solution. Once c is sufficiently large, so that all three actions can be considered,13 the consideration set becomes

{l, m, r}

, the posteriors are fixed at

γ^{m} = 0.5, γ^{l} = 1 - γ^{r} = γ^{*}

, and the probabilities

(P (l), P (m), P (r))

adjust, so that the expected posterior is equal to

μ

and the cost is equal to c. For even larger c, namely

c \geq H (μ) - H (γ^{*})

, the consideration set is

{l, r}

and the posteriors satisfy

γ^{l} = 1 - γ^{r}

.
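In the three-action regime of Case 2, the posteriors are fixed, so the probabilities solve three linear equations: they sum to one, the expected posterior equals μ, and the cost equals c. The sketch below solves this system in closed form; the function name and the numbers μ = 0.4, c = 0.15 are assumptions for illustration, and a candidate is admissible only when all three probabilities are nonnegative.

```python
import math

GAMMA_STAR = 1 / (3 + math.sqrt(5))

def H(g):
    """Binary entropy (natural log) of a belief g = Pr(state is omega_1)."""
    if g <= 0 or g >= 1:
        return 0.0
    return -g * math.log(g) - (1 - g) * math.log(1 - g)

def three_action_probabilities(mu, c):
    """Solve for (P(l), P(m), P(r)) with posteriors fixed at gamma*, 0.5, 1 - gamma*."""
    h_star = H(GAMMA_STAR)  # equals H(1 - gamma*) by symmetry of H
    # Cost equation: H(mu) - [(1 - P(m)) * h_star + P(m) * H(0.5)] = c.
    p_m = (H(mu) - h_star - c) / (H(0.5) - h_star)
    # Mean equation: gamma* P(l) + 0.5 P(m) + (1 - gamma*) P(r) = mu, with P(l) = 1 - P(m) - P(r).
    p_r = (mu - 0.5 * p_m - GAMMA_STAR * (1 - p_m)) / (1 - 2 * GAMMA_STAR)
    p_l = 1 - p_m - p_r
    return p_l, p_m, p_r

mu, c = 0.4, 0.15   # hypothetical values inside the three-action regime
p_l, p_m, p_r = three_action_probabilities(mu, c)
print(p_l, p_m, p_r)
```

As c rises toward H(μ) − H(γ*), the mass on the middle posterior shrinks to zero, which matches the switch to the two-action regime described in the text.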

Case 3: $0 < μ \leq γ^{*}$

In (*), for

λ < \bar{λ} (μ)

, we have

B (P) = {l, r}

,

γ^{l} = 1 - γ^{r} = {[1 + exp (\frac{3}{λ})]}^{- 1}

, and

(P (l), P (r))

determined by

P (l) γ^{l} + P (r) γ^{r} = μ

. For

λ > \bar{λ} (μ)

, the solution is

B (P) = {l}

, i.e., no information.

In (**), for any

0 < c < H (μ)

, the consideration set is

B (P) = {l, r}

and

γ^{l} = 1 - γ^{r}

. These posteriors are determined by the equation

H (μ) - H (γ^{l}) = c

, and the probabilities

P (l), P (r)

are then determined by the equation

P (l) γ^{l} + P (r) γ^{r} = μ

. See Figure 4.

4. Model Comparison

In this section, we state several results regarding the relationship and differences between the two versions of the RI problem. These results generalize the insights that are gained from the above example. All of the proofs not in the main text can be found in Appendix A.

We start with the following lemma that describes the correspondence between c and

λ

. This correspondence is key for the subsequent analysis.

Lemma 1.

Fix a decision problem

(μ, A, u)

. For every

c \in (0, H (μ))

, there is a unique

λ > 0

, denoted

λ (c)

, such that every solution to (**) with c solves (*) only with

λ (c)

. The mapping

c \to λ (c)

is continuous and (weakly) decreasing on

(0, H (μ))

, and

{lim}_{c \to H (μ) -} λ (c) = 0

.

Lemma 1 implies that the choice consistent with optimization in problem (**) with c can be rationalized as optimal behavior in (*) only for one value of

λ

, namely

λ (c)

. The continuity and monotonicity of

λ (c)

imply that its image contains the entire interval

({lim}_{c \to H (μ) -} λ (c), {lim}_{c \to 0 +} λ (c)) = (0, {lim}_{c \to 0 +} λ (c))

. Therefore, for every

λ

in this interval, the optimal behavior in (*) can be rationalized as optimal behavior in (**) with some c. However, as illustrated in the example, for a given

λ

, there may be multiple solutions that correspond to different values of c, so a solution to (*) with

λ (c)

need not be optimal (or even feasible) for (**) with c.
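For the example of Section 3 with μ = 0.5, the mapping from c to the multiplier can be traced numerically. The sketch below combines the Case 1 description of (**) with the formula for the symmetric posteriors in (*); inverting that formula to recover the multiplier is an inference made for this illustration, and the function names are hypothetical.

```python
import math

LN2 = math.log(2)                      # H(0.5)
LAMBDA_STAR = 1 / math.log((1 + math.sqrt(5)) / 2)
GAMMA_STAR = 1 / (3 + math.sqrt(5))

def H(g):
    """Binary entropy (natural log) of a belief g = Pr(state is omega_1)."""
    if g <= 0 or g >= 1:
        return 0.0
    return -g * math.log(g) - (1 - g) * math.log(1 - g)

def lambda_of_c(c):
    """Multiplier associated with budget c in the example with prior 0.5."""
    if not 0 < c < LN2:
        raise ValueError("c must lie in (0, H(0.5))")
    if c <= LN2 - H(GAMMA_STAR):
        return LAMBDA_STAR   # flat segment: a whole interval of c maps to lambda*
    # Two-action regime: find g with H(0.5) - H(g) = c by bisection on (0, gamma*].
    lo, hi = 0.0, GAMMA_STAR
    for _ in range(200):
        mid = (lo + hi) / 2
        if LN2 - H(mid) > c:
            lo = mid
        else:
            hi = mid
    g = (lo + hi) / 2
    # Invert g = 1 / (1 + exp(3 / lam)) to recover the multiplier.
    return 3 / math.log((1 - g) / g)

grid = [0.01 * k for k in range(1, 69)]
values = [lambda_of_c(c) for c in grid]
print(values[0], values[-1])
```

On this grid the computed multiplier is constant up to H(0.5) − H(γ*) and decreasing afterwards, in line with the weak monotonicity in Lemma 1.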

4.1. Consideration Sets and Optimal Posteriors

Perhaps the most apparent difference between the solutions of (*) and (**) in the example is that there is an interval of c values for which the solution of (**) has all three actions

{l, m, r}

considered, while, in (*), this can only happen at a single point

λ^{*}

, and, even at

λ^{*}

, there are other solutions that only involve subsets of actions. It is well-known [4] that this feature of the solution of (*) is true in general: in every decision problem, there is always a solution in which the size of the consideration set is, at most,

| Ω |

. We now show that, in several cases of interest, the solution of (**) behaves quite differently. Thus, the example of Section 3 is in no way special.

The first result generalizes the example to any decision problem with a binary state-space, at least three actions, and a not-too-extreme prior.

Proposition 2.

Let

Ω = {ω_{0}, ω_{1}}

and denote, by

a_{i}

, the (unique) optimal action at

ω_{i}

(i = 0, 1)

. If neither

a_{0}

nor

a_{1}

is optimal at the prior μ, then there are

λ^{*} > 0

and

0 < \underline{c} < \bar{c} < H (μ)

, such that, for every

c \in (\underline{c}, \bar{c})

(i)

λ (c) = λ^{*}

and (ii)

| B (P) | \geq 3

for every P that solves (**).

In the next proposition, we consider a class of decision problems similar to the one analyzed in (Caplin et al. [9], Section 3.1), but with an additional action corresponding to the outside option of the DM. Let

Ω = {ω_{1}, \dots, ω_{m}}

. Consider the decision problem in which

A = {a_{1}, \dots, a_{m}, o}

, and the utility function is given by

u (a_{i}, ω_{j}) = \begin{cases} 1 & if i = j, \\ 0 & if i \neq j, \end{cases}

and

u (o, ω_{i}) = t

for every

1 \leq i \leq m

, where

\frac{m - 1}{m} < t < 1

. Thus, if the DM correctly guesses the state, then her payoff is 1, while any wrong guess yields a payoff of 0. In addition, the safe choice o guarantees a payoff of t.14 For this class of problems, we obtain a similar result to that of Proposition 2.

Proposition 3.

Consider a decision problem, as described above, and suppose that the prior μ satisfies

1 - t < μ (ω_{i})

for every i. Then there is

λ^{*} > 0

and

0 < \underline{c} < \bar{c} < H (μ)

, such that, for every

c \in (\underline{c}, \bar{c})

(i)

λ (c) = λ^{*}

and (ii)

B (P) = A

for every P that solves (**).

The assumption that

1 - t < μ (ω_{i})

for each i guarantees that the optimal action at the prior

μ

is the outside option o, and that

μ

is centrally located, in the sense that it is not in the convex-hull of any collection of

m - 1

posteriors at which the

a_{i}

actions are optimal.
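The role of the prior condition can be illustrated with a few lines of code. At the prior, guessing state i yields expected utility μ(ω_i), while the outside option yields t. The sketch below checks the condition of Proposition 3 for hypothetical numbers; the function names and the example priors are assumptions of the illustration.

```python
def satisfies_prior_condition(mu, t):
    """Condition of Proposition 3: 1 - t < mu(omega_i) for every state i."""
    return all(1 - t < p for p in mu)

def outside_option_optimal_at_prior(mu, t):
    """At the prior, guess a_i yields mu(omega_i) and the outside option o yields t."""
    return t >= max(mu)

# Hypothetical three-state examples with t = 0.8, so (m - 1) / m = 2 / 3 < t < 1.
central_prior = [0.3, 0.3, 0.4]
skewed_prior = [0.1, 0.4, 0.5]
print(satisfies_prior_condition(central_prior, 0.8),
      outside_option_optimal_at_prior(central_prior, 0.8))
print(satisfies_prior_condition(skewed_prior, 0.8))
```

The first prior satisfies the condition and the outside option is optimal there; the second prior puts too little weight on one state and falls outside the class covered by the proposition.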

The intuition for the last two propositions is similar to the one in the example: when

λ

is small, the DM obtains precise information guaranteeing that one of the ′extreme′ actions (the

a_{i}

’s) will be selected. When

λ

is relatively large, the DM may seek some information, but it will often end up at a posterior in which a safer action is optimal (o in Proposition 3). The transition between these two regimes happens at

λ^{*}

. Moreover, at

λ^{*}

, mixtures of these two types of solutions are also optimal, so there is a range of c values mapped to

λ^{*}

and both types of actions are considered.

Remark 1.

Another known property of (*) is that the same set of posteriors cannot be a solution with two different values of λ, assuming that some information is obtained (this immediately follows from condition (4) above). As shown in the example, this is not true for the parameter c in (**): there is an interval of c values, such that the chosen posteriors are fixed, and only the allocation of mass over these posteriors changes as c varies. The proofs of Propositions 2 and 3 make it clear that the same is also true in these families of decision problems.

An ′Anything Goes′ Result

We end this section by arguing that the testable implications of the constrained model are limited if the analyst does not know the DM’s utility function and prior. Namely, any finite set of posteriors that is not convex independent can arise as the solution to problem (**) for an interval of c values in some decision problem. Note that the set of posteriors can be arbitrarily large.

Proposition 4.

Fix Ω, let

n \geq 3

be an arbitrary integer, and consider a collection

Γ = {{\tilde{γ}}^{i}}_{i = 1}^{n}

of distinct elements in the relative interior of

Δ (Ω)

. If Γ is not convex independent (i.e., if there is

{\tilde{γ}}^{i}

in the convex-hull of

Γ ∖ {{\tilde{γ}}^{i}}

), then there is a decision problem

(μ, A, u)

and

0 < \underline{c} < \bar{c} < H (μ)

such that for every

c \in (\underline{c}, \bar{c})

(i) there is a solution of (**), in which the set of posteriors is Γ; and (ii) if

γ \notin Γ

, then γ is not part of any solution of (**).

A couple of comments are in order. First, property (ii) of the proposition guarantees that optimal posteriors must be in

Γ

; in particular, the decision problem is non-trivial in the sense that not every P is optimal. Second, while we know that the set

Γ

is also a solution to the unconstrained problem for some

λ

, Remark 1 above implies that it is not robustly so in the sense that arbitrarily small changes in

λ

would change the optimal set of posteriors. This is in contrast to the constrained version, in which

Γ

remains optimal for all

c \in (\underline{c}, \bar{c})

.

4.2. Comparative Statics

4.2.1. Locally Variant Posteriors

The "locally invariant posteriors" (LIP) property [4] states that changes in the prior do not affect the optimal set of posteriors for (*) whenever this set is still feasible. Dean and Neligh [6] experimentally test this property and find that it is generally satisfied.

In problem (**), on the other hand, arbitrarily small changes in the prior typically induce different sets of optimal posteriors. Consider the example of Section 3 with some given prior

μ

, and suppose that c is large enough, so that the optimal consideration set is

{l, r}

. The optimal posteriors satisfy

γ^{l} = 1 - γ^{r}

in this case. Therefore, by the symmetry of H, the cost of an optimal P is given by

H (μ) - [P (l) H (γ^{l}) + P (r) H (γ^{r})] = H (μ) - H (γ^{l})

. Because the budget constraint binds, we must have

H (μ) - H (γ^{l}) = c

. Therefore, if c is fixed and the prior

μ

changes, then the optimal posteriors

γ^{l}, γ^{r}

must also change.
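The dependence of the optimal posteriors on the prior can be made concrete. The sketch below solves the binding budget equation for the left posterior at two priors with the same c; the value c = 0.3 and the priors 0.5 and 0.4 are hypothetical choices, picked so that c is at least H(μ) − H(γ*) for both priors and the two-action regime applies.

```python
import math

GAMMA_STAR = 1 / (3 + math.sqrt(5))

def H(g):
    """Binary entropy (natural log) of a belief g = Pr(state is omega_1)."""
    if g <= 0 or g >= 1:
        return 0.0
    return -g * math.log(g) - (1 - g) * math.log(1 - g)

def left_posterior(mu, c):
    """Solve H(mu) - H(g) = c for g in (0, mu), assuming mu <= 0.5 and 0 < c < H(mu)."""
    target = H(mu) - c
    if not 0 < target < H(mu):
        raise ValueError("c must lie in (0, H(mu))")
    lo, hi = 0.0, mu
    for _ in range(200):
        mid = (lo + hi) / 2
        if H(mid) < target:
            lo = mid   # entropy too low: posterior too extreme
        else:
            hi = mid
    return (lo + hi) / 2

c = 0.3
g_half = left_posterior(0.5, c)
g_shifted = left_posterior(0.4, c)
print(g_half, g_shifted)   # the left posterior moves when the prior moves
```

With the budget held fixed, a lower H(μ) leaves less entropy for the posteriors, so the posteriors in the shifted problem are more extreme.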

The reason for the failure of LIP is clear: suppose that we fix a set of affinely independent posteriors

\{γ^{a}\}

. For

μ

in the convex-hull of this set, say

μ = \sum_{a} P (a) γ^{a}

, the cost associated with the choice of these posteriors is

H (μ) - \sum_{a} P (a) H (γ^{a})

. The first term is strictly concave in

μ

, while the second is linear in

μ

(since the vector

{P (a)}

changes linearly with

μ

). Therefore, with a fixed set of posteriors, the cost is a strictly concave function of the prior

μ

, which implies that changes in the prior typically lead to variation in the cost. Because the budget constraint binds, the set of posteriors associated with an optimal solution must adjust to keep the cost unchanged. Therefore, we have the following.
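The concavity argument can be checked numerically in the two-state case. The sketch below fixes two posteriors, varies the prior between them, and evaluates the entropy reduction. The posterior values 0.2 and 0.8 and the function names are assumptions chosen for illustration only.

```python
import math

def H(p):
    """Binary Shannon entropy (natural log)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def cost_fixed_posteriors(mu, g_low, g_high):
    """Entropy reduction when prior mu is split into fixed posteriors g_low < g_high."""
    if not g_low <= mu <= g_high:
        raise ValueError("prior must lie between the two posteriors")
    p_high = (mu - g_low) / (g_high - g_low)  # weights are linear in mu
    return H(mu) - ((1 - p_high) * H(g_low) + p_high * H(g_high))

costs = {mu: cost_fixed_posteriors(mu, 0.2, 0.8) for mu in (0.4, 0.5, 0.6)}
print(costs)
```

The cost at the midpoint prior exceeds the average of the costs at the two outer priors, as strict concavity requires, so holding the posteriors fixed while the prior moves generally violates a binding budget constraint.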

Proposition 5.

Consider a decision problem

(μ, A, u)

and parameter c, such that (**) has a unique solution P with posteriors

{\{γ^{a}\}}_{a \in B (P)}

that are affinely independent. Then the set of priors

μ^{'}

for which the same set of posteriors

{\{γ^{a}\}}_{a \in B (P)}

is optimal for decision problem

(μ', A, u)

with parameter c is a nowhere dense subset of

Δ (Ω)

.15

Another way to think about the failure of LIP is that, under the conditions of the proposition,

λ (c)

for decision problem

(μ, A, u)

is usually different than

λ (c)

for

(μ^{'}, A, u)

. When

λ

changes, the optimal set of posteriors in (*) changes with it, as mentioned in the previous subsection.

We note that the assumption of affinely independent posteriors can not be dispensed with. Indeed, going back to the example, consider the case

μ = 0.5

and c not too large, so that the solution to (**) has

B (P) = {l, m, r}

. For

μ^{'}

close to

μ

and the same c, the solution to (**) still has

B (P) = {l, m, r}

. Additionally, when all three actions are considered, the posteriors must be

γ^{l} = γ^{*}

,

γ^{m} = 0.5

, and

γ^{r} = 1 - γ^{*}

.

Finally, the difference between the unconstrained and constrained versions of the RI model that is exhibited in Proposition 5 is much more general than the case where cost is measured by a reduction of entropy. Indeed, both the LIP property in the unconstrained version and its failure in the constrained version continue to hold so long as cost is measured by the expected reduction in the value of a strictly concave function of posteriors.

4.2.2. Utility Scaling

While the solution of problem (**) is sensitive to changes in the prior, it does not change when the stakes of the decision problem are scaled up or down.

Proposition 6.

Consider a decision problem

(μ, A, u)

and suppose that P solves (**) with parameter c. Then P is also a solution of (**) with parameter c in the decision problem

(μ, A, ρ \cdot u)

for every

ρ > 0

.

The argument is straightforward: since P is optimal in

(μ, A, u)

, it follows from Proposition 1 that P solves (*) for this decision problem with some

λ

. Therefore, P also solves (*) for the scaled decision problem

(μ, A, ρ \cdot u)

with

λ^{'} = ρ \cdot λ

. Because the cost of P does not change with the problem, Proposition 1 implies that P also solves (**) for

(μ, A, ρ \cdot u)

with the same c.

Notice that stakes do matter in the unconstrained version: scaling up the utility has the exact same effect as scaling down the marginal cost of information

λ

, which, as already discussed, typically changes the solution.
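To see the equivalence between scaling utility and scaling λ in the unconstrained problem, consider the posterior formula of Lemma A1 (i) for the example. The sketch below assumes that scaling u by ρ replaces the payoff difference 3 by 3ρ in that formula (which follows from Z depending on u only through u/λ) and that {l, r} remains the consideration set. The function name is illustrative.

```python
import math

def gamma_l_unconstrained(lam, rho=1.0):
    """Posterior attached to action l under B(P) = {l, r} when utilities are rho * u."""
    return 1.0 / (1.0 + math.exp(3.0 * rho / lam))

baseline = gamma_l_unconstrained(lam=1.0, rho=1.0)
doubled_stakes = gamma_l_unconstrained(lam=1.0, rho=2.0)
halved_cost = gamma_l_unconstrained(lam=0.5, rho=1.0)
print(baseline, doubled_stakes, halved_cost)
```

Doubling the stakes and halving λ produce the same posterior, and both differ from the baseline. In the constrained problem, by contrast, the posterior is pinned down by the budget equation, which does not involve ρ.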

4.3. Optimality of ′No Information′

For large values of

λ

, the solution of (*) often involves the DM choosing not to be informed at all, as demonstrated by the example. Indeed, that was the case for any prior

μ

, except for

μ = \frac{1}{3}

. On the other hand, since the budget constraint must bind, choosing no information can not be optimal in (**) for any

c > 0

.

In light of Lemma 1, this gap between the two versions occurs if and only if the limit

\bar{λ} : = {lim}_{c \to 0 +} λ (c)

is finite. Indeed, there is a zero-cost (i.e., uninformative) solution to (*) with

λ

if and only if

λ \geq \bar{λ}

. In the next proposition, we characterize those decision problems for which

\bar{λ}

is finite and show that this is the ′typical′ case.

Formally, we say that P is uninformative if

P (\cdot | ω)

is the same for all

ω \in Ω

, or, equivalently, if the posterior is equal to the prior with probability 1. Additionally, given

(A, u)

, say that the prior

μ

is an indifference point if there are two different actions

a, a^{'} \in A

, such that both a and

a^{'}

are optimal at belief

μ

, i.e.,

{a, a^{'}} \subseteq \arg \max_{b} \sum_{ω} μ (ω) u (b, ω)

. Note that, for a given

(A, u)

, the set of indifference points is "small" in

Δ (Ω)

, e.g., it is nowhere dense and has Lebesgue measure zero in

Δ (Ω)

viewed as a subset of

R^{| Ω | - 1}

.

Proposition 7.

Suppose that μ is not an indifference point of

(A, u)

and let

a^{*}

be the (unique) optimal action given belief μ. Then the limit

\bar{λ} = {lim}_{c \to 0 +} λ (c)

is finite and, for every

λ > \bar{λ}

, the unique solution to (*) is given by

P (a^{*} | ω) = 1

for all ω. If μ is an indifference point, then

\bar{λ} = + \infty

and uninformative P’s are never optimal.

Intuitively, when

λ

is large, condition (4), which characterizes the solution to (*), requires that the posteriors be close to each other and, therefore, close to the prior

μ

. Thus, if

μ

is not an indifference point, then, for large enough

λ

, the same action that is optimal at

μ

is also optimal at all posteriors. But then obtaining no information yields the same expected utility at a lower cost. Conversely, when

μ

is an indifference point, the marginal value of a little information is positive, since it allows the DM to learn which of the a priori optimal actions is better. The marginal cost of a little information is zero due to the smoothness of the entropy function. Therefore, it is not optimal to obtain no information.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Proofs

Appendix A.1. Notation

The following notation and remarks will be used in several of the proofs below. We use P to denote the state-dependent stochastic choice of the DM, i.e.,

P : Ω \to Δ (A)

. The distribution over posteriors induced by P is denoted by

Γ (P)

. Recall that the support of

Γ (P)

is the collection

{γ^{a}}_{a \in B (P)}

, where

γ^{a} (ω) = \frac{μ (ω) P (a | ω)}{P (a)}

, and

P (a) = \sum_{ω} μ (ω) P (a | ω)

is the probability assigned to posterior

γ^{a}

. Given

P, P^{'}

and

α \in [0, 1]

, the mixture of the induced distributions over posteriors

α Γ (P) + (1 - α) Γ (P^{'})

is defined as usual: The support of the mixture is the union of the supports of

Γ (P)

and

Γ (P^{'})

, and the probability of each

γ

in the support is the corresponding average of probabilities of

γ

in the two distributions.

Remark A1.

Suppose that P solves either (*) or (**). If

a \in B (P)

, then choosing a must be optimal given belief

γ^{a}

, that is

a \in {arg max}_{b} \sum_{ω} γ^{a} (ω) u (b, ω)

. Indeed, if that was not the case then choosing the same distribution over posteriors but playing some action from the arg max would increase the utility of the DM without changing the cost of information. Furthermore, a must be the unique optimal action given belief

γ^{a}

. Indeed, if there are multiple optimal actions at some induced posterior, then this is inconsistent with optimality, as shown in the proof of Proposition 7.

Remark A2.

Given P and

P^{'}

, if

a \in B (P) \cap B (P^{'})

and the posteriors

γ^{a}

induced by P and

γ^{' a}

induced by

P^{'}

are not equal, then the mixture

α Γ (P) + (1 - α) Γ (P^{'})

has two different posteriors associated with the same action a. In particular, if P and

P^{'}

are both optimal in (*), and if

a \in B (P) \cap B (P^{'})

, then

γ^{a} = γ^{' a}

. Indeed, if P and

P^{'}

are optimal, then so is any mixture of

Γ (P)

and

Γ (P^{'})

.

Given a decision problem

(μ, A, u)

, we write

V (P) = \sum_{ω} μ (ω) \sum_{a} P (a | ω) u (a, ω)

for the expected utility that the DM obtains with choice P. Notice that this can also be expressed using

Γ (P)

as

V (P) = \sum_{a} P (a) \sum_{ω} γ^{a} (ω) u (a, ω) .

Similarly, we write

C (P) = H (P (\cdot)) - \sum_{ω} μ (ω) H (P (\cdot | ω)) = H (μ) - \sum_{a} P (a) H (γ^{a})

for the cost of P. Thus, problem (*) can be written as

{max}_{P} {V (P) - λ C (P)}

, and problem (**) as

{max}_{P} {V (P)}

subject to

C (P) \leq c

.
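The two expressions for C(P) (in terms of the choice probabilities and in terms of the induced posteriors) can be compared numerically. The sketch below is illustrative: the payoff matrix is inferred from the exponents appearing in Lemma A1 rather than taken from the example's own payoff table, the choice rule P is hypothetical, and entropy uses natural logs.

```python
import math

def entropy(dist):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def value_and_costs(mu, P, u):
    """mu[w]: prior, P[w][a]: Pr(a | w), u[a][w]: utility of action a in state w."""
    states, actions = range(len(mu)), range(len(u))
    V = sum(mu[w] * P[w][a] * u[a][w] for w in states for a in actions)
    marginal = [sum(mu[w] * P[w][a] for w in states) for a in actions]
    # First form: H(P(.)) - sum_w mu(w) H(P(.|w))
    cost_choice = entropy(marginal) - sum(mu[w] * entropy(P[w]) for w in states)
    # Second form: H(mu) - sum_a P(a) H(gamma^a)
    cost_posterior = entropy(mu)
    for a in actions:
        if marginal[a] > 0:
            posterior = [mu[w] * P[w][a] / marginal[a] for w in states]
            cost_posterior -= marginal[a] * entropy(posterior)
    return V, cost_choice, cost_posterior

mu = [0.5, 0.5]                              # states ordered (omega_0, omega_1)
u = [[1.0, -2.0], [0.0, 0.0], [-2.0, 1.0]]   # actions ordered (l, m, r); inferred payoffs
P = [[0.8, 0.0, 0.2], [0.2, 0.0, 0.8]]       # hypothetical choice rule
V, cost_choice, cost_posterior = value_and_costs(mu, P, u)
print(V, cost_choice, cost_posterior)
```

Both cost expressions return the same number because mutual information is symmetric in states and actions.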

Appendix A.2. Proofs of Propositions

Proof of Proposition 1.

Fix a decision problem

(μ, A, u)

and

c \in (0, H (μ))

. We prove here that if P is optimal for (**) then the budget constraint (1) binds. As explained in the text, the rest of the argument is a standard application of the KKT theorem.

For each

ω \in Ω

let

A^{*} (ω) = {arg max}_{a} u (a, ω)

. Denote

\bar{V} = \sum_{ω} μ (ω) u (a^{*} (ω), ω)

, where

a^{*} (ω) \in A^{*} (ω)

. Then a choice P gives the DM expected utility of

\bar{V}

if and only if

\sum_{a \in A^{*} (ω)} P (a | ω) = 1

for every

ω

. Our assumption that different actions are optimal at different states means that

A^{*} (ω) \cap A^{*} (ω^{'}) = \emptyset

whenever

ω \neq ω^{'}

. It follows that if P achieves

\bar{V}

then P reveals the realized state with probability one, and therefore that

C (P) = H (μ)

.

Now, suppose that P is such that the constraint is slack,

C (P) < c

. Because

c < H (μ)

it follows from the previous paragraph that

V (P) < \bar{V}

. For

ϵ > 0

define

P (ϵ) = ϵ \bar{P} + (1 - ϵ) P

, where

\bar{P}

satisfies

V (\bar{P}) = \bar{V}

. Then

V (P (ϵ)) = ϵ \bar{V} + (1 - ϵ) V (P) > V (P)

, and since C is continuous in P we have that

C (P (ϵ)) < c

for

ϵ

small enough, so

P (ϵ)

is feasible. This shows that P is not a solution for (**). □

Proof of Lemma 1.

Fix c and let P be a solution to (**). By Proposition 1 the budget constraint binds, implying that P must be informative. In particular, there must be two actions

a \neq b

such that

a, b \in B (P)

. Moreover, Proposition 1 implies that there is

λ > 0

, such that conditions (4) and (5) are satisfied. From (4) we have that for every

ω

\frac{γ^{a} (ω)}{Z (a, ω, λ)} = \frac{γ^{b} (ω)}{Z (b, ω, λ)},

or, after rearranging,

\frac{γ^{a} (ω)}{γ^{b} (ω)} = \frac{Z (a, ω, λ)}{Z (b, ω, λ)} = exp (\frac{u (a, ω) - u (b, ω)}{λ}) .

Because we assumed that actions are not duplicates, there exists some

ω

at which

u (a, ω) - u (b, ω) \neq 0

, which implies that the right-hand side of the last equation is strictly monotone in

λ

. It follows that

λ

is pinned down uniquely by P. Denote this

λ

by

λ (c)

.
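The uniqueness step amounts to inverting the ratio formula above at a state where the two actions' payoffs differ. A minimal sketch, with an illustrative function name and using the posteriors of Lemma A1 (i) together with payoffs inferred from that lemma's exponents:

```python
import math

def implied_lambda(gamma_a, gamma_b, u_a, u_b):
    """Invert gamma^a(w) / gamma^b(w) = exp((u(a, w) - u(b, w)) / lambda) at one state w."""
    if u_a == u_b or gamma_a == gamma_b:
        raise ValueError("need a state where the two actions' payoffs differ")
    return (u_a - u_b) / math.log(gamma_a / gamma_b)

true_lambda = 1.5
gamma_l = 1.0 / (1.0 + math.exp(3.0 / true_lambda))  # Lemma A1 (i)
gamma_r = 1.0 - gamma_l
# In state omega_1 the inferred payoffs are u(l) = -2 and u(r) = 1.
recovered = implied_lambda(gamma_l, gamma_r, -2.0, 1.0)
print(recovered)
```

Because the right-hand side of the ratio equation is strictly monotone in λ, the inversion has a single solution, which is what allows λ(c) to be defined.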

Now, suppose that

P^{'}

also solves (**) with the same c. By Proposition 1

P^{'}

solves (*) with some

λ^{'} > 0

. We claim that it must be the case that

λ^{'} = λ (c)

. Indeed

V (P^{'}) = V (P)

since they are both optimal in (**), and, since the budget constraint binds,

C (P^{'}) = c = C (P)

. This implies that P is also optimal for (*) with

λ^{'}

, so by the previous paragraph

λ^{'} = λ (c)

.

Next, we prove that

λ (c)

is (weakly) decreasing. Let

c > c^{'}

and suppose per absurdum that

λ (c) > λ (c^{'})

. Let P be optimal for (**) with c and

P^{'}

optimal for (**) with

c^{'}

. Then

\begin{matrix} V (P) - V (P^{'}) \geq λ (c) [C (P) - C (P^{'})] = λ (c) [c - c^{'}] > λ (c^{'}) [c - c^{'}] = λ (c^{'}) [C (P) - C (P^{'})], \end{matrix}

where the first inequality follows from P being optimal for (*) with

λ (c)

, the next equality holds since the budget constraint in (**) binds, the strict inequality is by the assumptions that

c > c^{'}

and

λ (c) > λ (c^{'})

, and the last equality is again by the binding budget constraint. Rearranging gives

V (P) - λ (c^{'}) C (P) > V (P^{'}) - λ (c^{'}) C (P^{'}),

contradicting the optimality of

P^{'}

for (*) with

λ (c^{'})

.

Finally, we show that the image of

λ (c)

contains the entire open interval

(0, {lim}_{c \to 0 +} λ (c))

. Combining this with the monotonicity proved above implies both continuity and

{lim}_{c \to H (μ) -} λ (c) = 0

. Suppose that

\tilde{λ}

is in this interval. Let P be optimal for (*) with

\tilde{λ}

. Notice that

C (P) = 0

is impossible, since that would imply that any

λ > \tilde{λ}

also has an uninformative solution, contradicting the assumption that

\tilde{λ} < {lim}_{c \to 0 +} λ (c)

. It also can not be that

C (P) = H (μ)

since that would require the posteriors to be at the vertices of the simplex, contradicting (4). Thus,

0 < C (P) < H (μ)

and it follows from Proposition 1 that P is also optimal for (**) with

c = C (P)

. Therefore,

\tilde{λ} = λ (C (P))

, i.e.,

\tilde{λ}

is in the image. This completes the proof. □

Proof of Proposition 2.

Because there are only two states, we identify distributions over

Ω

with the probability they assign to

ω_{1}

. Define

Λ

to be the set of all

λ > 0

, such that there exists a solution P to (*) satisfying

B (P) = {a_{0}, a_{1}}

. We break the proof into several claims.

Claim 1.

The set

Λ

is non-empty and bounded from above.

Proof.

If P is optimal and

a, b \in B (P)

then by (4) we have

\frac{γ^{a}}{γ^{b}} = exp (\frac{u (a, ω_{1}) - u (b, ω_{1})}{λ})

and also

\frac{1 - γ^{a}}{1 - γ^{b}} = exp (\frac{u (a, ω_{0}) - u (b, ω_{0})}{λ})

. As

λ \to 0

we either get

γ^{b} \to 0

and

γ^{a} \to 1

, or vice versa.16 From Remark A1, a and b must be optimal given beliefs

γ^{a}

and

γ^{b}

, respectively. This is only possible if

{a, b} = {a_{0}, a_{1}}

when

λ

is sufficiently small. It is also clear that choosing ′no information′ is not optimal for

λ

small enough, since it is not optimal at

λ = 0

. Thus, every sufficiently small

λ > 0

is in

Λ

.

On the other hand, as

λ \to + \infty

the ratio

\frac{γ^{a}}{γ^{b}}

converges to 1. Because

μ

must be in the convex-hull of the induced posteriors, all the induced posteriors necessarily converge to

μ

as

λ \to + \infty

(see the proof of Proposition 7 for details). Additionally, since we assumed that neither

a_{0}

nor

a_{1}

is optimal at

μ

, this implies that these actions are not considered for large enough

λ

. Thus,

Λ

is bounded from above. □

Claim 2.

Let

λ^{*} = sup Λ

. Then in problem (*) with

λ^{*}

there are two solutions,

P^{*}

and

P^{* *}

, such that

B (P^{*}) = {a_{0}, a_{1}}

and

B (P^{* *}) \neq {a_{0}, a_{1}}

.

Proof.

First, consider a sequence

{λ_{n}}

converging to

λ^{*}

from below, such that for each n there is a solution

P_{n}

to (*) with

λ_{n}

satisfying

B (P_{n}) = {a_{0}, a_{1}}

. Such a sequence exists by the definition of

λ^{*}

. By taking a subsequence if needed we may assume that

P_{n}

converges. By the theorem of the maximum the limit

P^{*}

is optimal at

λ^{*}

. In addition, we must have

B (P^{*}) = {a_{0}, a_{1}}

: For every

a \neq a_{0}, a_{1}

we have

P_{n} (a) = 0

for all n, implying

P^{*} (a) = 0

. Additionally, it is impossible that

B (P^{*}) = {a_{0}}

or

B (P^{*}) = {a_{1}}

since these actions are not optimal at

μ

.

Second, let

{λ_{n}^{'}}

be a sequence converging to

λ^{*}

, but this time from above. Let

P_{n}^{'}

be a corresponding solution sequence such that

| B (P_{n}^{'}) | \leq 2

for each n.17 Then the limit

P^{* *} = {lim}_{n} P_{n}^{'}

is optimal at

λ^{*}

, and by the definition of

λ^{*}

we have

B (P_{n}^{'}) \neq {a_{0}, a_{1}}

for every n, also implying that

B (P^{* *}) \neq {a_{0}, a_{1}}

. □

Claim 3.

For

P^{*}

and

P^{* *}

constructed in Claim 2,

C (P^{*}) > C (P^{* *})

.

Proof.

Because H is strictly concave over

Δ (Ω)

, it is sufficient to show that

Γ (P^{*})

is a mean-preserving spread of

Γ (P^{* *})

. We consider two different cases. Suppose first that

B (P^{* *}) \cap B (P^{*}) = \emptyset

, i.e., neither of the two extreme actions is in the consideration set of

P^{* *}

. Then, by Remark A1, every posterior induced by

P^{* *}

is in-between the two posteriors induced by

P^{*}

. This implies that

Γ (P^{*})

is a mean-preserving spread of

Γ (P^{* *})

as needed.

The other case is when

B (P^{* *}) \cap B (P^{*}) \neq \emptyset

. Since

B (P^{* *})

has at most two elements, the intersection contains exactly one element, say

B (P^{* *}) \cap B (P^{*}) = {a_{0}}

. Denote the support of

Γ (P^{*})

by

{γ_{*}^{a_{0}}, γ_{*}^{a_{1}}}

and the support of

Γ (P^{* *})

by

{γ_{* *}^{a_{0}}, γ_{* *}^{a}}

. Remark A1 implies that

γ_{* *}^{a} < γ_{*}^{a_{1}}

. In addition, since both

P^{*}

and

P^{* *}

are optimal, Remark A2 implies that

γ_{*}^{a_{0}} = γ_{* *}^{a_{0}}

. Therefore, we again get that

Γ (P^{*})

is a mean-preserving spread of

Γ (P^{* *})

and the claim is proved. If

B (P^{* *}) \cap B (P^{*}) = {a_{1}}

the argument is symmetric. □

So far, we have shown that

λ (C (P^{*})) = λ (C (P^{* *})) = λ^{*}

, and that

C (P^{*}) > C (P^{* *})

. By monotonicity proved in Lemma 1 this implies that

λ (c) = λ^{*}

for every

c \in (C (P^{* *}), C (P^{*}))

. The next claim completes the proof of the proposition.

Claim 4.

Denote

\bar{c} = C (P^{*})

. There is

\underline{c} < \bar{c}

such that if P is optimal for (**) with

c \in (\underline{c}, \bar{c})

then

| B (P) | \geq 3

.

Proof.

Assume, contrary to the claim, that there is a sequence

c_{n}

converging to

\bar{c}

from below and a corresponding sequence

P_{n}

, such that

P_{n}

solves (**) with

c_{n}

and

| B (P_{n}) | \leq 2

for every n. Because the budget constraint binds, we have

C (P_{n}) = c_{n}

for all n, so for n large enough

P_{n}

also solves (*) with

λ^{*}

. Denote

B (P_{n}) = {a (n), b (n)}

and the corresponding posteriors by

γ^{a (n)}, γ^{b (n)}

(it can not be that

| B (P_{n}) | = 1

since then

C (P_{n}) = 0

). Additionally, recall that

γ_{*}^{a_{0}}, γ_{*}^{a_{1}}

are the posteriors induced by

P^{*}

. There are three cases to consider:

First, it can not be that

{a (n), b (n)} = {a_{0}, a_{1}}

, since by Remark A2 this would imply

γ^{a (n)} = γ_{*}^{a_{0}}

and

γ^{b (n)} = γ_{*}^{a_{1}}

, contradicting the assumption that

c_{n} < \bar{c}

.

Second, suppose that

{a (n), b (n)} = {a_{0}, a}

for some action

a \neq a_{1}

. Then again by Remark A2 this would imply

γ^{a (n)} = γ_{*}^{a_{0}}

. Additionally by Remark A1 the other posterior

γ^{a}

is smaller and bounded away from

γ_{*}^{a_{1}}

, contradicting the assumption that

c_{n} \to \bar{c}

. The argument for

{a (n), b (n)} = {a_{1}, a}

is analogous.

Third, if

{a (n), b (n)} \cap {a_{0}, a_{1}} = \emptyset

, then both

γ^{a (n)}

and

γ^{b (n)}

are between and bounded away from

γ_{*}^{a_{0}}, γ_{*}^{a_{1}}

, so

C (P_{n})

can not converge to

\bar{c}

. □

Proof of Proposition 3.

For the most part the proof follows the footsteps of the proof of Proposition 2. We only provide details when a different argument is needed.

Claim 5.

For any

λ > 0

, if P solves (*) then either

B (P) = {a_{1}, \dots, a_{m}}

or

o \in B (P)

.

Proof.

Suppose by contradiction that P is optimal and

B (P) ⊊ {a_{1}, \dots, a_{m}}

. From Remark A1, if

a_{j} \in B (P)

then the associated posterior

γ^{a_{j}}

must satisfy

γ^{a_{j}} (ω_{j}) > t

, implying that

γ^{a_{j}} (ω_{i}) < 1 - t

for any

i \neq j

. Let i be such that

a_{i} \notin B (P)

. Then

μ (ω_{i}) = \sum_{a_{j} \in B (P)} P (a_{j}) γ^{a_{j}} (ω_{i}) < \sum_{a_{j} \in B (P)} P (a_{j}) (1 - t) = (1 - t),

contradicting the assumption in the proposition. □

Define

Λ

to be the set of all

λ > 0

, such that there exists a solution P to (*), satisfying

B (P) = {a_{1}, \dots, a_{m}}

.

Claim 6.

The set

Λ

is non-empty and bounded from above.

Proof.

We first prove that, if

λ

is small enough and P solves (*), then

o \notin B (P)

. By Claim 5, this implies that

B (P) = {a_{1}, \dots, a_{m}}

, so

Λ \neq \emptyset

.

For o to be considered, Condition (7) requires that for every

1 \leq i \leq m

1 \geq \sum_{j = 1}^{m} γ^{o} (ω_{j}) exp (\frac{u (a_{i}, ω_{j}) - t}{λ}) = γ^{o} (ω_{i}) exp (\frac{1 - t}{λ}) + (1 - γ^{o} (ω_{i})) exp (\frac{- t}{λ}) \geq γ^{o} (ω_{i}) exp (\frac{1 - t}{λ}) .

Summing up over all i gives

m \geq \sum_{i = 1}^{m} γ^{o} (ω_{i}) exp (\frac{1 - t}{λ}) = exp (\frac{1 - t}{λ}) .

Since

t < 1

this inequality clearly can not hold for

λ > 0

small enough.
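The inequality m ≥ exp((1 - t)/λ) can be turned into an explicit threshold: it fails whenever λ < (1 - t)/ln m. This bound is a rearrangement added here for illustration and is only a sufficient condition for the safe action o to be excluded. The function name and the sample values of m and t are hypothetical.

```python
import math

def lambda_bound(m, t):
    """Value of lambda below which m >= exp((1 - t) / lambda) must fail."""
    if m < 2 or not 0 < t < 1:
        raise ValueError("need m >= 2 and 0 < t < 1")
    return (1 - t) / math.log(m)

m, t = 3, 0.6
bound = lambda_bound(m, t)
lam = 0.9 * bound                          # any lambda strictly below the bound
violated = m < math.exp((1 - t) / lam)     # necessary condition for o fails
print(bound, violated)
```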

To show that

Λ

is bounded, note that the assumption of the proposition implies that o is the unique optimal action at

μ

. By Proposition 7, this implies that obtaining no information is the unique optimal choice for all

λ

large enough. □

Claim 7.

Let

λ^{*} = sup Λ

. Then in problem (*) with

λ^{*}

there are two solutions,

P^{*}

and

P^{* *}

, such that

B (P^{*}) = {a_{1}, \dots, a_{m}}

and

B (P^{* *}) \neq {a_{1}, \dots, a_{m}}

.

Proof.

The proof is identical to that in Claim 2, except that to argue that

B (P^{*}) = {a_{1}, \dots, a_{m}}

we need to use Claim 5. □

Claim 8.

For

P^{*}

and

P^{* *}

constructed in the previous claim,

C (P^{*}) > C (P^{* *})

.

Proof.

We denote by

γ_{*}^{i}

the posterior corresponding to

a_{i}

induced by

P^{*}

and by

γ_{* *}^{i}

the one induced by

P^{* *}

(if

a_{i} \in B (P^{* *})

).

Because

o \notin B (P^{*})

,

P^{*}

must coincide with the solution given in ([9], Theorem 1) when all of the

a_{i}

’s are considered. In particular, the induced posteriors are symmetric in the sense that

γ_{*}^{i} (ω_{i})

is the same for all i. It follows that

V (P^{*}) = γ_{*}^{1} (ω_{1})

. On the other hand, by Remark A2 we must have

γ_{*}^{i} = γ_{* *}^{i}

whenever

a_{i} \in B (P^{* *})

, implying that

\begin{aligned} V (P^{* *}) & = P^{* *} (o) t + \sum_{a_{i} \in B (P^{* *})} P^{* *} (a_{i}) γ_{* *}^{i} (ω_{i}) \\ & = P^{* *} (o) t + \sum_{a_{i} \in B (P^{* *})} P^{* *} (a_{i}) γ_{*}^{i} (ω_{i}) \\ & = P^{* *} (o) t + (1 - P^{* *} (o)) γ_{*}^{1} (ω_{1}) < γ_{*}^{1} (ω_{1}) = V (P^{*}), \end{aligned}

where the strict inequality follows from

o \in B (P^{* *})

(recall Claim 5) and

t < γ_{*}^{1} (ω_{1})

. Therefore, we get that

V (P^{* *}) < V (P^{*})

, and since both are optimal for (*) with

λ^{*}

it must be that

C (P^{*}) > C (P^{* *})

. □

Claim 9.

Denote

\bar{c} = C (P^{*})

. There is

\underline{c} < \bar{c}

such that if P is optimal for (**) with

c \in (\underline{c}, \bar{c})

then

B (P) = A

.

Proof.

Suppose by contradiction that

c_{n} ↑ \bar{c}

, and that for each n

P_{n}

solves (**) with

c_{n}

but

B (P_{n}) \neq A

. Then

C (P_{n}) = c_{n}

, so for n large enough

P_{n}

also solves (*) with

λ^{*}

. Since

c_{n} < \bar{c}

we can not have

B (P_{n}) = {a_{1}, \dots, a_{m}}

, so by Claim 5

o \in B (P_{n})

. The exact same argument as in the previous claim gives

V (P_{n}) = P_{n} (o) t + (1 - P_{n} (o)) γ_{*}^{1} (ω_{1})

. Moreover,

P_{n} (o)

is bounded away from zero, since

μ

is not in the convex-hull of any strict subset of

{γ_{*}^{i}}_{i = 1}^{m}

. This implies that there is

δ > 0

such that

V (P^{*}) - V (P_{n}) > δ

for every n. However,

P^{*}

and

P_{n}

are both optimal for (*) with

λ^{*}

, so

V (P^{*}) - λ^{*} C (P^{*}) = V (P_{n}) - λ^{*} C (P_{n})

. This contradicts the convergence of

c_{n} = C (P_{n})

to

\bar{c} = C (P^{*})

. □

Proof of Proposition 4.

Fix a collection

Γ

as in the proposition. Define

f : Δ (Ω) \to R

by

f (γ) = H (μ) - H (γ)

. Because f is strictly convex, for each

1 \leq i \leq n

there is an affine function

f_{i} : Δ (Ω) \to R

such that

f_{i} ({\tilde{γ}}^{i}) = f ({\tilde{γ}}^{i})

and

f_{i} (γ) < f (γ)

for every

γ \in Δ (Ω) \setminus \{{\tilde{γ}}^{i}\}

. Let the set of actions be

A = {1, \dots, n}

and the utility function be

u (i, ω) = f_{i} (γ_{ω})

, where

γ_{ω} \in Δ (Ω)

is the Dirac measure on state

ω

.

To complete the description of the decision problem we need to choose the prior

μ

. By assumption, one of the elements of

Γ

, say

{\tilde{γ}}^{1}

, is in the convex hull of the others. Define

μ = (1 - α) {\tilde{γ}}^{1} + \frac{α}{n} \sum_{i = 1}^{n} {\tilde{γ}}^{i}

for some

0 < α < 1

.

Consider the set

C = \{c = \sum_{i} p_{i} H ({\tilde{γ}}^{i}) : \sum_{i} p_{i} {\tilde{γ}}^{i} = μ, p_{i} > 0 \forall i\} .

We claim that this set contains a non-degenerate interval of c values. Indeed, one element of this set is

[(1 - α) + \frac{α}{n}] H ({\tilde{γ}}^{1}) + \frac{α}{n} \sum_{i = 2}^{n} H ({\tilde{γ}}^{i})

. Additionally, we have

{\tilde{γ}}^{1} = \sum_{i = 2}^{n} {\tilde{p}}_{i} {\tilde{γ}}^{i}

for some probability vector

\tilde{p} = {{\tilde{p}}_{i}}_{i \geq 2}

, so

\frac{α}{n} H ({\tilde{γ}}^{1}) + \sum_{i = 2}^{n} [(1 - α) {\tilde{p}}_{i} + \frac{α}{n}] H ({\tilde{γ}}^{i})

is also in this set. The strict concavity of H implies that the former is strictly larger than the latter. By taking convex combinations of these two representations of

μ

we can get any c in between. Denote

\underline{c} = H (μ) - sup C

and

\bar{c} = H (μ) - inf C

. Note that

0 < \underline{c} < \bar{c} < H (μ)

.

Finally, consider problem (**) with

c \in (\underline{c}, \bar{c})

. Then, by the definition of

\underline{c}

and

\bar{c}

, there are strictly positive

{\tilde{P} (i)}_{i = 1}^{n}

satisfying

\sum_{i} \tilde{P} (i) {\tilde{γ}}^{i} = μ

and

H (μ) - \sum_{i} \tilde{P} (i) H ({\tilde{γ}}^{i}) = c

. We claim that this distribution over posteriors is optimal. Indeed, this gives an expected utility of

\sum_{i} \tilde{P} (i) max_{1 \leq j \leq n} \sum_{ω} {\tilde{γ}}^{i} (ω) u (j, ω) = \sum_{i} \tilde{P} (i) max_{1 \leq j \leq n} f_{j} ({\tilde{γ}}^{i}) \geq \sum_{i} \tilde{P} (i) f_{i} ({\tilde{γ}}^{i}) = H (μ) - \sum_{i} \tilde{P} (i) H ({\tilde{γ}}^{i}) = c,

where the first equality is by the definition of u and the affinity of

f_{j}

, the inequality is obvious, the next equality follows from

f_{i} ({\tilde{γ}}^{i}) = f ({\tilde{γ}}^{i})

, and the last equality is by construction of

{\tilde{P} (i)}

. On the other hand, for any feasible distribution over posteriors

{\{P (i), γ^{i}\}}_{i = 1}^{l}

we have

\sum_{i} P (i) max_{1 \leq j \leq n} \sum_{ω} γ^{i} (ω) u (j, ω) = \sum_{i} P (i) max_{1 \leq j \leq n} f_{j} (γ^{i}) \leq \sum_{i} P (i) f (γ^{i}) = H (μ) - \sum_{i} P (i) H (γ^{i}) \leq c,

where the first equality is again by the definition of u and the affinity of

f_{j}

, the inequality follows from

f_{j} \leq f

for all j, and the last inequality follows from feasibility (the reduction of entropy must be at most c). Moreover, if

γ^{i} \notin Γ

for some i, then

f_{j} (γ^{i}) < f (γ^{i})

for every j, so the first inequality is strict. This completes the proof. □

Proof of Proposition 7.

From Proposition 1 and Lemma 1 it immediately follows that

\bar{λ}

is equal to the infimum of the set of

λ

’s for which an uninformative P is optimal in problem (*). Thus, to prove the proposition, it is enough to show that

μ

is an indifference point if and only if the set of such

λ

’s is empty.

We start by showing that if

μ

is not an indifference point then for

λ

large enough the optimal solution to (*) is uninformative. For this proof, we view

Δ (Ω)

as the unit simplex of

R^{Ω}

endowed with the metric

d (γ, γ^{'}) = {max}_{ω} | γ (ω) - γ^{'} (ω) |

. Let

K = \{γ \in Δ (Ω) : {a^{*}} = \underset{a}{arg max} \sum_{ω} γ (ω) u (a, ω)\}

be the set of beliefs at which

a^{*}

is the unique optimal action. Because K is relatively open in

Δ (Ω)

and

μ \in K

, there is

δ > 0

such that

γ \in K

whenever

d (γ, μ) < δ

.

Let

M = {max}_{a, b} {max}_{ω} {u (a, ω) - u (b, ω)} > 0

. Suppose that

λ

is large enough, so that

exp (\frac{M}{λ}) < 1 + δ

. If P is optimal for (*) with

λ

, then, for every

a, b \in B (P)

and every

ω

, we have

\frac{γ^{a} (ω)}{γ^{b} (ω)} = exp (\frac{u (a, ω) - u (b, ω)}{λ}) \leq exp (\frac{M}{λ}) < 1 + δ,

(A1)

where the first equality is by (4), and the next inequality is by the definition of M. We thus get

d (γ^{a}, γ^{b}) = max_{ω} \{max \{γ^{b} (ω) (\frac{γ^{a} (ω)}{γ^{b} (ω)} - 1), γ^{a} (ω) (\frac{γ^{b} (ω)}{γ^{a} (ω)} - 1)\}\} \leq max_{ω} \{max \{γ^{b} (ω) δ, γ^{a} (ω) δ\}\} < δ,

(A2)

where in the first equality we used the fact that optimal posteriors are never on the boundary of

Δ (Ω)

(implied by condition (4)), and the following inequality is by (A1).

Now, since

μ = \sum_{b \in B (P)} P (b) γ^{b}

, we also have for every

a \in B (P)

and every

ω

|μ (ω) - γ^{a} (ω)| \leq \sum_{b \in B (P)} P (b) |γ^{b} (ω) - γ^{a} (ω)| .

Taking the maximum over

ω

gives

d (μ, γ^{a}) \leq \sum_{b \in B (P)} P (b) d (γ^{a}, γ^{b}) < δ,

where the last inequality is from (A2). By the construction of

δ

this implies that

γ^{a} \in K

for every

a \in B (P)

. Thus, by Remark A1, P induces a unique posterior and is therefore uninformative.

In the other direction, we now claim that, if

μ

is an indifference point, then an uninformative P can not be a solution of (*) with any

λ

. Indeed, let

a, a^{'}

be two optimal actions given belief

μ

. Consider

\tilde{P}

, as given by

\tilde{P} (a | ω) = \tilde{P} (a^{'} | ω) = 0.5

for every

ω

. Note that

\tilde{P}

is optimal among the set of uninformative P’s. However, we have

B (\tilde{P}) = {a, a^{'}}

and

γ^{a} = γ^{a^{'}} = μ

. Because

a, a^{'}

are not duplicates, condition (4) can not hold for any

λ > 0

, implying that

\tilde{P}

is not optimal.18 □

Appendix A.3. Proofs of Claims in the Example

We start with the following lemma.

Lemma A1.

Fix a consideration set

B (P) \subseteq {l, m, r}

. The posteriors

{γ^{a}}_{a \in B (P)}

satisfy (4) with some

λ > 0

if and only if:

(i): For $B (P) = {l, r}$ : $γ^{l} = 1 - γ^{r} = \frac{1}{1 + exp (\frac{3}{λ})}$ .
(ii): For $B (P) = {l, m}$ : $γ^{l} = \frac{1}{1 + exp (\frac{1}{λ}) + exp (\frac{2}{λ})}$ and $γ^{m} = \frac{exp (\frac{2}{λ})}{1 + exp (\frac{1}{λ}) + exp (\frac{2}{λ})}$ .
(iii): For $B (P) = {l, m, r}$ : $γ^{m} = 0.5$ , $γ^{l} = 1 - γ^{r} = γ^{*}$ and $λ = λ^{*}$ .

Proof

(i): Condition (4) with $a = l$ and $b = r$ requires

$\frac{γ^{l}}{exp (\frac{- 2}{λ})} = \frac{γ^{r}}{exp (\frac{1}{λ})}$

in state $ω_{1}$ and

$\frac{1 - γ^{l}}{exp (\frac{1}{λ})} = \frac{1 - γ^{r}}{exp (\frac{- 2}{λ})}$

in $ω_{0}$ . It is immediate to verify that these two equations are equivalent to $γ^{l} = 1 - γ^{r} = \frac{1}{1 + exp (\frac{3}{λ})}$ .
(ii): Similarly, condition (4) with $a = l$ and $b = m$ requires

$\frac{γ^{l}}{exp (\frac{- 2}{λ})} = \frac{γ^{m}}{1}$

in $ω_{1}$ and

$\frac{1 - γ^{l}}{exp (\frac{1}{λ})} = \frac{1 - γ^{m}}{1}$

in $ω_{0}$ , and these two equations are satisfied if and only if the posteriors are as in the lemma.
(iii): When all three actions are considered, condition (4) is equivalent to the equations in both previous cases holding simultaneously. In particular, it implies that both $γ^{l} = \frac{1}{1 + exp (\frac{3}{λ})}$ and $γ^{l} = \frac{1}{1 + exp (\frac{1}{λ}) + exp (\frac{2}{λ})}$ . This pins down $λ$ at $λ^{*}$ , and consequently the posteriors $γ^{l}, γ^{m}, γ^{r}$ , as above.

□
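The value of λ* implied by part (iii) can be computed in closed form. Writing x = exp(1/λ), equating the two expressions for γ^l in (i) and (ii) gives x^3 = x + x^2, that is, x^2 = x + 1, whose positive root is the golden ratio. This reduction is a derivation added here rather than a statement from the text, and the variable names are illustrative.

```python
import math

phi = (1 + math.sqrt(5)) / 2           # positive root of x**2 = x + 1
lam_star = 1 / math.log(phi)           # lambda* solves exp(1 / lambda) = phi
x = math.exp(1 / lam_star)

gamma_l_case_i = 1 / (1 + x**3)        # Lemma A1 (i)
gamma_l_case_ii = 1 / (1 + x + x**2)   # Lemma A1 (ii)
gamma_m = x**2 / (1 + x + x**2)        # Lemma A1 (ii)
gamma_star = gamma_l_case_i

print(lam_star, gamma_star, gamma_m)
```

Under these assumptions the two formulas for γ^l coincide at λ*, and the posterior attached to m equals 0.5, in line with part (iii).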

Case 1: $μ = 0.5$

Claim 10.

Solution of (*): If

0 < λ < λ^{*}

then

P (l) = P (r) = 0.5

and

γ^{l} = 1 - γ^{r} = \frac{1}{1 + exp (\frac{3}{λ})}

. If

λ > λ^{*}

then

P (m) = 1

and

γ^{m} = 0.5

. If

λ = λ^{*}

then any mixture (including degenerate) of the former two solutions is optimal.

Proof.

Suppose

λ < λ^{*}

, and we will show that

B (P) = {l, r}

is optimal. From Lemma A1 (i), condition (4) is equivalent to

γ^{l} = 1 - γ^{r} = \frac{1}{1 + exp (\frac{3}{λ})}

. Condition (5) with

a = l

and

b = m

requires

\frac{γ^{l}}{exp (\frac{- 2}{λ})} \cdot 1 + \frac{1 - γ^{l}}{exp (\frac{1}{λ})} \cdot 1 \leq 1 .

Plugging in

γ^{l} = \frac{1}{1 + exp (\frac{3}{λ})}

and rearranging, this condition becomes

λ \leq λ^{*}

, which holds by assumption. Condition (5), with

a = r

and

b = m

, is identical, given that

γ^{l} = 1 - γ^{r}

.

Suppose now that $λ > λ^{*}$. We need to show that obtaining no information is optimal. Condition (4) is trivially satisfied, so we only need to check (5) for $a = m$ and $b = l, r$. For $b = l$, this gives

$\frac{0.5}{1} exp (\frac{1}{λ}) + \frac{0.5}{1} exp (\frac{- 2}{λ}) \leq 1,$

which is equivalent to $λ \geq λ^{*}$. For $b = r$, we get the exact same condition.

Finally, when $λ = λ^{*}$, it follows from the last two paragraphs that both $B (P) = \{m\}$ and $B (P) = \{l, r\}$ are still optimal. Because the set of optimal distributions over posteriors is convex, it follows that any mixture of these solutions is optimal as well. Note that at $λ = λ^{*}$ the posteriors of the risky actions in the solution are given by $γ^{l} = 1 - γ^{r} = γ^{*}$. □
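The two inequalities used in this proof can be checked numerically. The sketch below evaluates the left-hand sides of condition (5) on either side of the threshold; the closed form $λ^{*} = 1 / \ln \frac{1 + \sqrt{5}}{2}$ is an assumption obtained by equating the two expressions for $γ^{l}$ in Lemma A1 (iii), and the function names are illustrative.

```python
import math

# Assumed closed form of lambda^* (see the lead-in): exp(1/lambda^*) is the golden ratio.
LAM_STAR = 1 / math.log((1 + math.sqrt(5)) / 2)


def lhs_l_against_m(lam):
    """Left-hand side of (5) for a = l, b = m at the {l, r} posteriors."""
    g_l = 1 / (1 + math.exp(3 / lam))
    return g_l / math.exp(-2 / lam) + (1 - g_l) / math.exp(1 / lam)


def lhs_m_against_l(lam):
    """Left-hand side of (5) for a = m, b = l at the no-information posterior 0.5."""
    return 0.5 * math.exp(1 / lam) + 0.5 * math.exp(-2 / lam)


for lam in (1.0, LAM_STAR, 3.0):
    print(lam, lhs_l_against_m(lam), lhs_m_against_l(lam))
```

For a value of $λ$ below the threshold only the first expression stays below 1, for a value above it only the second does, and at $λ^{*}$ both equal 1 up to rounding, which matches the three cases of the claim.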

Claim 11.

Solution of (**): If $0 < c < H (0.5) - H (γ^{*})$, then the solution is given by $P (l) = P (r) = \frac{c}{2 (H (0.5) - H (γ^{*}))}$, $P (m) = 1 - \frac{c}{(H (0.5) - H (γ^{*}))}$, $γ^{m} = 0.5$, and $γ^{l} = 1 - γ^{r} = γ^{*}$. If $H (0.5) - H (γ^{*}) \leq c < H (0.5)$, then $P (l) = P (r) = 0.5$ and $γ^{l} = 1 - γ^{r}$ is determined by the equation $H (0.5) - H (γ^{l}) = c$.

Proof.

Suppose $0 < c < H (0.5) - H (γ^{*})$. We show that $B (P) = \{l, m, r\}$ is optimal. From Lemma A1 (iii) it must be that $γ^{m} = 0.5$, $γ^{l} = 1 - γ^{r} = γ^{*}$, and that the value of the Lagrange multiplier is $λ = λ^{*}$. Setting $P (l) = P (r) = \frac{c}{2 (H (0.5) - H (γ^{*}))}$ and $P (m) = 1 - \frac{c}{(H (0.5) - H (γ^{*}))}$, we get that $P (l) γ^{l} + P (m) γ^{m} + P (r) γ^{r} = 0.5$. In addition, the expected reduction of entropy from the prior to the posteriors is given by

$H (0.5) - [\frac{c}{2 (H (0.5) - H (γ^{*}))} H (γ^{*}) + \frac{c}{2 (H (0.5) - H (γ^{*}))} H (1 - γ^{*}) + (1 - \frac{c}{(H (0.5) - H (γ^{*}))}) H (0.5)] = c,$

so the budget constraint (6) binds. By Proposition 1, we are done.

Once $H (0.5) - H (γ^{*}) \leq c$, it is no longer possible that all three actions are considered, since, given the location of the posteriors, the budget constraint cannot bind. We show that $B (P) = \{l, r\}$ is optimal. Define $γ^{l}$ by the equation $H (0.5) - H (γ^{l}) = c$, and set $γ^{r} = 1 - γ^{l}$. Let $P (l) = P (r) = 0.5$. Then $P (l) γ^{l} + P (r) γ^{r} = μ$ and the budget constraint binds. Set $λ$ to solve $γ^{l} = \frac{1}{1 + exp (\frac{3}{λ})}$. Then condition (4) is satisfied by Lemma A1 (i). Furthermore, since $H (0.5) - H (γ^{l}) = c \geq H (0.5) - H (γ^{*})$, we know that $γ^{*} \geq γ^{l}$, which implies that $λ \leq λ^{*}$. Thus, (5) holds for $a = l, r$ and $b = m$ in the same way as in the case $λ < λ^{*}$ of the previous claim. □
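The solution described in this claim can be computed directly. The following Python sketch does so for $μ = 0.5$ under two assumptions that are not taken from the text: entropy is measured in nats, and $γ^{*} = 1 / (1 + φ^{3})$ with $φ$ the golden ratio (the value implied by Lemma A1 (iii)). The helper names are hypothetical.

```python
import math

# Assumed closed form of gamma^* (golden-ratio solution of Lemma A1 (iii)).
GAMMA_STAR = 1 / (1 + ((1 + math.sqrt(5)) / 2) ** 3)


def H(p):
    """Binary Shannon entropy in nats (the log base is an assumption)."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def invert_entropy(target, hi=0.5):
    """Return gamma in [0, hi] with H(gamma) = target (H is increasing on [0, 0.5])."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if H(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def solve_constrained_symmetric(c):
    """Probabilities and posteriors of the claim for mu = 0.5, keyed by action."""
    if not 0 < c < H(0.5):
        raise ValueError("c must lie in (0, H(0.5))")
    threshold = H(0.5) - H(GAMMA_STAR)
    if c < threshold:
        p = c / (2 * threshold)
        probs = {"l": p, "m": 1 - 2 * p, "r": p}
        posts = {"l": GAMMA_STAR, "m": 0.5, "r": 1 - GAMMA_STAR}
    else:
        g_l = invert_entropy(H(0.5) - c)
        probs = {"l": 0.5, "r": 0.5}
        posts = {"l": g_l, "r": 1 - g_l}
    return probs, posts


def entropy_reduction(probs, posts, mu=0.5):
    """Expected reduction of entropy from the prior to the posteriors."""
    return H(mu) - sum(probs[a] * H(posts[a]) for a in probs)


for c in (0.1, 0.4):
    probs, posts = solve_constrained_symmetric(c)
    print(c, sorted(probs), entropy_reduction(probs, posts))
```

In both regimes the computed entropy reduction equals $c$, so the budget constraint binds, and the expected posterior equals the prior.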

Case 2: $γ^{*} < μ < 0.5$

Claim 12.

Solution of (*): When $λ < λ^{*}$, we have $B (P) = \{l, r\}$, $γ^{l} = 1 - γ^{r} = \frac{1}{1 + exp (\frac{3}{λ})}$, and the probabilities are set to satisfy $P (l) γ^{l} + P (r) γ^{r} = μ$. At $λ = λ^{*}$ the solution is not unique, with $B (P) = \{l, r\}$, $B (P) = \{l, m\}$, and $B (P) = \{l, m, r\}$ all possible. When $λ^{*} < λ < \bar{λ} (μ)$, $B (P) = \{l, m\}$, the posteriors are $γ^{l} = \frac{1}{1 + exp (\frac{1}{λ}) + exp (\frac{2}{λ})}$, $γ^{m} = \frac{exp (\frac{2}{λ})}{1 + exp (\frac{1}{λ}) + exp (\frac{2}{λ})}$, and the probabilities are adjusted so that $P (l) γ^{l} + P (m) γ^{m} = μ$. Finally, for $λ \geq \bar{λ} (μ)$, it is optimal to obtain no information, and the consideration set is either $\{l\}$ or $\{m\}$ when $μ$ is below or above $1 / 3$, respectively.

Proof.

The proof for $λ < λ^{*}$ is identical to the $μ = 0.5$ case. Suppose that $λ^{*} < λ < \bar{λ} (μ)$; we prove that $B (P) = \{l, m\}$ is optimal. By Lemma A1 (ii), condition (4) is equivalent to $γ^{l} = \frac{1}{1 + exp (\frac{1}{λ}) + exp (\frac{2}{λ})}$ and $γ^{m} = \frac{exp (\frac{2}{λ})}{1 + exp (\frac{1}{λ}) + exp (\frac{2}{λ})}$. We need to check condition (5) with $a = l, m$ and $b = r$. For $a = l$, this requires

$\frac{γ^{l}}{exp (\frac{- 2}{λ})} exp (\frac{1}{λ}) + \frac{1 - γ^{l}}{exp (\frac{1}{λ})} exp (\frac{- 2}{λ}) \leq 1,$

which boils down to $λ^{*} \leq λ$. For $a = m$, the condition is

$\frac{γ^{m}}{1} exp (\frac{1}{λ}) + \frac{1 - γ^{m}}{1} exp (\frac{- 2}{λ}) \leq 1,$

which, again, is equivalent to $λ^{*} \leq λ$. Finally, we need that $μ$ is in the interior of the convex hull of $γ^{l}$ and $γ^{m}$. It is not hard to check that this is equivalent to $λ < \bar{λ} (μ)$.

Note that at $λ = λ^{*}$ both solutions with $B (P) = \{l, r\}$ and with $B (P) = \{l, m\}$ are optimal, and therefore any mixture is optimal as well. This implies that $B (P) = \{l, m, r\}$ is also optimal.

Suppose now that $λ \geq \bar{λ} (μ)$. We consider the case where $μ > \frac{1}{3}$, so obtaining no information implies that $B (P) = \{m\}$ and $γ^{m} = μ$ (the case $μ < \frac{1}{3}$, in which $B (P) = \{l\}$, is similar). We need to check (5) with $a = m$ and $b = l, r$. These conditions are given by

$\frac{μ}{1} exp (\frac{- 2}{λ}) + \frac{1 - μ}{1} exp (\frac{1}{λ}) \leq 1$

and

$\frac{μ}{1} exp (\frac{1}{λ}) + \frac{1 - μ}{1} exp (\frac{- 2}{λ}) \leq 1,$

respectively. For every $μ > \frac{1}{3}$ in the current range, the first inequality implies the second (because $μ < 0.5$ puts the larger weight on $exp (\frac{1}{λ})$ in the first), and the first is satisfied if and only if $λ \geq \bar{λ} (μ)$. This completes the proof. □
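The no-information threshold can be illustrated numerically for $\frac{1}{3} < μ < 0.5$. The sketch below assumes that $\bar{λ} (μ)$ is the value of $λ$ at which $γ^{m} (λ) = μ$, which is how the convex-hull requirement in the proof reads for $μ > \frac{1}{3}$; the closed form and the function names are illustrative and not quoted from the text.

```python
import math


def lam_bar(mu):
    """Assumed threshold for 1/3 < mu < 0.5: the lambda at which gamma^m(lambda) = mu.

    With x = exp(1/lambda), gamma^m = x^2 / (1 + x + x^2) = mu is the quadratic
    (1 - mu) x^2 - mu x - mu = 0; its positive root is used.
    """
    x = (mu + math.sqrt(mu * mu + 4 * mu * (1 - mu))) / (2 * (1 - mu))
    return 1 / math.log(x)


def condition5_lhs(mu, lam):
    """Left-hand sides of condition (5) for a = m against b = l and b = r."""
    against_l = mu * math.exp(-2 / lam) + (1 - mu) * math.exp(1 / lam)
    against_r = mu * math.exp(1 / lam) + (1 - mu) * math.exp(-2 / lam)
    return against_l, against_r


mu = 0.4
threshold = lam_bar(mu)
for lam in (0.9 * threshold, threshold, 1.1 * threshold):
    print(lam, condition5_lhs(mu, lam))
```

For $μ = 0.4$ the larger of the two left-hand sides equals 1 at the computed threshold, exceeds 1 just below it, and falls below 1 just above it, so both conditions hold exactly when $λ \geq \bar{λ} (μ)$ in this example.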

Claim 13.

Solution of (**): For

$0 < c \leq H (μ) - [\frac{1 - 2 μ}{1 - 2 γ^{*}} H (γ^{*}) + \frac{2 μ - 2 γ^{*}}{1 - 2 γ^{*}} H (0.5)]$

the consideration set is $B (P) = \{l, m\}$, and the distribution over posteriors is determined by the equations $γ^{l} {(1 - γ^{l})}^{2} = γ^{m} {(1 - γ^{m})}^{2}$, $P (l) γ^{l} + P (m) γ^{m} = μ$ and $H (μ) - [P (l) H (γ^{l}) + P (m) H (γ^{m})] = c$. For

$H (μ) - [\frac{1 - 2 μ}{1 - 2 γ^{*}} H (γ^{*}) + \frac{2 μ - 2 γ^{*}}{1 - 2 γ^{*}} H (0.5)] < c < H (μ) - H (γ^{*})$

the consideration set is $B (P) = \{l, m, r\}$, the posteriors are $γ^{m} = 0.5, γ^{l} = 1 - γ^{r} = γ^{*}$, and the probabilities $(P (l), P (m), P (r))$ adjust so that the expected posterior is equal to $μ$ and the cost is equal to c. For $H (μ) - H (γ^{*}) \leq c < H (μ)$, the consideration set is $B (P) = \{l, r\}$, the posteriors are determined by $H (μ) - H (γ^{l}) = c$ and $γ^{l} = 1 - γ^{r}$, and their probabilities by $P (l) γ^{l} + P (r) γ^{r} = μ$.

Proof.

Fix c in the first range. The function $f (x) = x {(1 - x)}^{2}$ is strictly increasing on $[γ^{*}, \frac{1}{3})$, strictly decreasing on $(\frac{1}{3}, 0.5]$, and is equal to $\frac{1}{8}$ at both $γ^{*}$ and $0.5$. It follows that, for each $γ^{l} \in [γ^{*}, \frac{1}{3})$, there is a unique $γ^{m} \in (\frac{1}{3}, 0.5]$ such that $γ^{l} {(1 - γ^{l})}^{2} = γ^{m} {(1 - γ^{m})}^{2}$, and that as $γ^{l}$ increases the corresponding $γ^{m}$ decreases. Therefore, given $μ$, there is a unique way to choose the posteriors $γ^{l}, γ^{m}$ and the probabilities $P (l), P (m) = 1 - P (l)$ such that (1) $γ^{l} {(1 - γ^{l})}^{2} = γ^{m} {(1 - γ^{m})}^{2}$; (2) $P (l) γ^{l} + P (m) γ^{m} = μ$; and (3) $H (μ) - [P (l) H (γ^{l}) + P (m) H (γ^{m})] = c$.19 Now, define $λ$ as the (unique) solution to $γ^{l} = \frac{1}{1 + exp (\frac{1}{λ}) + exp (\frac{2}{λ})}$, and note that this implies $γ^{m} = \frac{exp (\frac{2}{λ})}{1 + exp (\frac{1}{λ}) + exp (\frac{2}{λ})}$. By Lemma A1 (ii), condition (4) holds. Finally, $γ^{l} \geq γ^{*}$ implies that $λ \geq λ^{*}$, so condition (5) holds in the same way as in the previous claim.

Moving on to

$H (μ) - [\frac{1 - 2 μ}{1 - 2 γ^{*}} H (γ^{*}) + \frac{2 μ - 2 γ^{*}}{1 - 2 γ^{*}} H (0.5)] < c < H (μ) - H (γ^{*}),$

the distribution in the claim is optimal by Proposition 1 with $λ = λ^{*}$. The lower end of the range of c is obtained by the distribution with support $γ^{*}$ and $0.5$ only, while the upper end is obtained by the distribution with support $γ^{*}$ and $1 - γ^{*}$ only.

When $H (μ) - H (γ^{*}) \leq c < H (μ)$, the proof is as in the previous claims. □
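The three equations that pin down the $\{l, m\}$ solution in the first range of c have no closed form, but they can be solved by nested bisection. The Python sketch below is one way to do it, assuming entropy in nats and the golden-ratio value of $γ^{*}$ implied by Lemma A1 (iii); the function names and the example values $μ = 0.3$, $c = 0.02$ are illustrative choices, not taken from the text.

```python
import math

# Assumed closed form of gamma^* (golden-ratio solution of Lemma A1 (iii)).
GAMMA_STAR = 1 / (1 + ((1 + math.sqrt(5)) / 2) ** 3)


def H(p):
    """Binary Shannon entropy in nats (the log base is an assumption)."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def f(x):
    return x * (1 - x) ** 2


def bisect(fun, lo, hi, iters=200):
    """Locate a sign change of fun on [lo, hi] by bisection."""
    sign_lo = fun(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (fun(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def partner_above(g_l):
    """The gamma^m in [1/3, 0.5] with f(gamma^m) = f(gamma^l)."""
    return bisect(lambda g: f(g) - f(g_l), 1 / 3, 0.5)


def partner_below(g_m):
    """The gamma^l in [gamma^*, 1/3] with f(gamma^l) = f(gamma^m)."""
    return bisect(lambda g: f(g) - f(g_m), GAMMA_STAR, 1 / 3)


def cost(g_l, mu):
    """Entropy reduction of the {l, m} distribution with posterior g_l and mean mu."""
    g_m = partner_above(g_l)
    p_l = (g_m - mu) / (g_m - g_l)
    return H(mu) - (p_l * H(g_l) + (1 - p_l) * H(g_m))


def solve_lm(mu, c):
    """Return (gamma^l, gamma^m, P(l)) for c in the first range of the claim."""
    upper = mu if mu <= 1 / 3 else partner_below(mu)
    g_l = bisect(lambda g: cost(g, mu) - c, GAMMA_STAR, upper - 1e-12)
    g_m = partner_above(g_l)
    return g_l, g_m, (g_m - mu) / (g_m - g_l)


g_l, g_m, p_l = solve_lm(0.3, 0.02)
print(g_l, g_m, p_l, cost(g_l, 0.3))
```

The outer search starts at $γ^{l} = γ^{*}$, where the cost equals the upper bound of the first range, and moves $γ^{l}$ toward the prior until the entropy reduction falls to $c$.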

Case 3: $0 < μ \leq γ^{*}$

Claim 14.

The solution of (*): for $λ < \bar{λ} (μ)$ we have $B (P) = \{l, r\}$, $γ^{l} = 1 - γ^{r} = \frac{1}{1 + exp (\frac{3}{λ})}$, and $(P (l), P (r))$ determined by $P (l) γ^{l} + P (r) γ^{r} = μ$. For $λ \geq \bar{λ} (μ)$, the solution is $B (P) = \{l\}$, i.e., no information.

Proof.

Fix $λ < \bar{λ} (μ)$ and set the posteriors as in the claim. Then (4) holds by Lemma A1 (i). As in the previous cases above, condition (5) with $a = l, r$ and $b = m$ is equivalent to $λ \leq λ^{*}$; for $μ$ in the current range it is easy to check that $\bar{λ} (μ) \leq λ^{*}$, so the condition holds by assumption. Finally, we need that $μ \in (γ^{l}, γ^{r})$, which is equivalent to $λ < \bar{λ} (μ)$.

Assume now $λ \geq \bar{λ} (μ)$. Set $B (P) = \{l\}$ and $γ^{l} = μ$. Condition (5) with $a = l$ and $b = m, r$ requires

$\frac{μ}{exp (\frac{- 2}{λ})} \cdot 1 + \frac{1 - μ}{exp (\frac{1}{λ})} \cdot 1 \leq 1$

and

$\frac{μ}{exp (\frac{- 2}{λ})} exp (\frac{1}{λ}) + \frac{1 - μ}{exp (\frac{1}{λ})} exp (\frac{- 2}{λ}) \leq 1,$

respectively. For any $μ$ in the range, the second inequality implies the first and it is satisfied if and only if $λ \geq \bar{λ} (μ)$. □
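For this case the solution of (*) is available in closed form. The sketch below assumes that, for $0 < μ \leq γ^{*}$, the threshold $\bar{λ} (μ)$ is the value of $λ$ at which $\frac{1}{1 + exp (\frac{3}{λ})} = μ$, as the requirement $μ \in (γ^{l}, γ^{r})$ in the proof suggests; the resulting formula and the function names are illustrative.

```python
import math


def lam_bar_low(mu):
    """Assumed threshold for 0 < mu <= gamma^*: solves 1 / (1 + exp(3/lambda)) = mu."""
    return 3 / math.log((1 - mu) / mu)


def solve_unconstrained_low_prior(mu, lam):
    """Return (probabilities, posteriors) keyed by action for the claim's two regimes."""
    if lam >= lam_bar_low(mu):
        # No information: the only posterior is the prior and l is chosen.
        return {"l": 1.0}, {"l": mu}
    g_l = 1 / (1 + math.exp(3 / lam))
    g_r = 1 - g_l
    p_l = (g_r - mu) / (g_r - g_l)  # makes the expected posterior equal mu
    return {"l": p_l, "r": 1 - p_l}, {"l": g_l, "r": g_r}


print(lam_bar_low(0.1))
print(solve_unconstrained_low_prior(0.1, 1.0))
print(solve_unconstrained_low_prior(0.1, 2.0))
```

With $μ = 0.1$ the first call acquires information and the second does not, since $λ = 2$ lies above the computed threshold.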

Claim 15.

The solution of (**): For any $0 < c < H (μ)$ the consideration set is $B (P) = \{l, r\}$ and $γ^{l} = 1 - γ^{r}$. The posterior $γ^{l}$ is determined by the equation $H (μ) - H (γ^{l}) = c$, and the probability $P (l)$ is then determined by the equation $P (l) γ^{l} + P (r) γ^{r} = μ$.

Proof.

Fix c and define $γ^{l}$ by $H (μ) - H (γ^{l}) = c$. Let $γ^{r} = 1 - γ^{l}$ and set $λ$ to solve $γ^{l} = \frac{1}{1 + exp (\frac{3}{λ})}$. The fact that $H (μ) - H (γ^{l}) = c > 0$ implies that $γ^{l} < μ$, which in turn implies that $μ \in (γ^{l}, γ^{r})$ as well as that $λ < \bar{λ} (μ)$. The optimality of these posteriors now follows in the same way as in the previous claim. □
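The construction in this proof translates into a short computation. The Python sketch below inverts the entropy by bisection, assuming entropy in nats and taking the root $γ^{l} < μ$; the implied multiplier is recovered from $γ^{l} = \frac{1}{1 + exp (\frac{3}{λ})}$. Function names and the example values are hypothetical.

```python
import math


def H(p):
    """Binary Shannon entropy in nats (the log base is an assumption)."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def solve_constrained_low_prior(mu, c):
    """Return (gamma^l, gamma^r, P(l), lambda) for 0 < c < H(mu) and mu < 0.5."""
    if not 0 < c < H(mu):
        raise ValueError("c must lie in (0, H(mu))")
    target = H(mu) - c
    lo, hi = 0.0, mu  # H is increasing on [0, 0.5], so the root lies below mu
    for _ in range(200):
        mid = (lo + hi) / 2
        if H(mid) < target:
            lo = mid
        else:
            hi = mid
    g_l = (lo + hi) / 2
    g_r = 1 - g_l
    p_l = (g_r - mu) / (g_r - g_l)  # makes the expected posterior equal mu
    lam = 3 / math.log((1 - g_l) / g_l)  # inverts gamma^l = 1 / (1 + exp(3/lambda))
    return g_l, g_r, p_l, lam


g_l, g_r, p_l, lam = solve_constrained_low_prior(0.1, 0.1)
print(g_l, g_r, p_l, lam)
```

Because $H (γ^{r}) = H (γ^{l})$, the expected posterior entropy is $H (γ^{l})$ regardless of $P (l)$, which is why a single equation determines $γ^{l}$.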

References

1. Sims, C.A. Stickiness. Carnegie Rochester Conf. Ser. Public Policy 1998, 49, 317–356.
2. Sims, C.A. Implications of rational inattention. J. Monet. Econ. 2003, 50, 665–690.
3. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
4. Caplin, A.; Dean, M. Behavioral Implications of Rational Inattention with Shannon Entropy; National Bureau of Economic Research: Cambridge, MA, USA, 2013.
5. Le Treust, M.; Tomala, T. Persuasion with limited communication capacity. J. Econ. Theory 2019, 184, 104940.
6. Dean, M.; Neligh, N. Experimental Tests of Rational Inattention. Working Paper. 2019. Available online: http://www.columbia.edu/~md3405/Working_Paper_21.pdf (accessed on 29 December 2020).
7. Maćkowiak, B.; Matějka, F.; Wiederholt, M. Rational Inattention: A Review. Working Paper. 2020. Available online: http://home.cerge-ei.cz/matejka/RIsurvey.pdf (accessed on 29 December 2020).
8. Matějka, F.; McKay, A. Rational inattention to discrete choices: A new foundation for the multinomial logit model. Am. Econ. Rev. 2015, 105, 272–298.
9. Caplin, A.; Dean, M.; Leahy, J. Rational inattention, optimal consideration sets, and stochastic choice. Rev. Econ. Stud. 2019, 86, 1061–1094.
10. Caplin, A.; Dean, M.; Leahy, J. Rationally Inattentive Behavior: Characterizing and Generalizing Shannon Entropy; National Bureau of Economic Research: Cambridge, MA, USA, 2017.
11. Matějka, F. Rigid pricing and rationally inattentive consumer. J. Econ. Theory 2015, 158, 656–678.
12. Steiner, J.; Stewart, C.; Matějka, F. Rational inattention dynamics: Inertia and delay in decision-making. Econometrica 2017, 85, 521–553.
13. De Oliveira, H. Axiomatic Foundations for Entropic Costs of Attention. Working Paper. 2014. Available online: https://6964c30c-a4a4-4464-a435-9021be4b7ccd.filesusr.com/ugd/21e9a6_dab3d6bc526d40fa86894de99e3a48ba.pdf (accessed on 29 December 2020).
14. Fulton, C. Mechanics of Linear Quadratic Gaussian Rational Inattention Tracking Problems. Working Paper. 2017. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3065488# (accessed on 29 December 2020).
15. Doval, L.; Skreta, V. Constrained Information Design: Toolkit. Working Paper. 2018. Available online: https://arxiv.org/abs/1811.03588 (accessed on 29 December 2020).
16. Boleslavsky, R.; Kim, K. Bayesian Persuasion and Moral Hazard. Working Paper. 2020. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2913669 (accessed on 29 December 2020).
17. Caplin, A.; Dean, M. Revealed preference, rational inattention, and costly information acquisition. Am. Econ. Rev. 2015, 105, 2183–2203.
18. Dewan, A.; Neligh, N. Estimating information cost functions in models of rational inattention. J. Econ. Theory 2020, 187, 105011.
19. Cheremukhin, A.; Popova, A.; Tutino, A. A theory of discrete choice with information costs. J. Econ. Behav. Organ. 2015, 113, 34–50.
20. Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1970.

1.	The use of entropy for measuring quantity of information has its origins in Shannon’s [3] classical work on the capacity of a communication channel.
2.	More precisely, the agent chooses both the information structure and the action to take conditional on the realized signal. See the next section.
3.	The Lagrangian contains additional terms due to constraints associated with x being a collection of probability distributions. These constraints are common to the two versions and are not relevant for the current discussion.
4.	Le Treust and Tomala [5] give an example that is similar to ours, where the number of actions considered is larger than the number of states in the constrained problem. They prove that there is always a solution with at most the number of states plus one actions. We discuss the relation to that paper below. The first draft of the current paper was written without awareness of this existing result.
5.	See the related literature section below for a discussion of experimental works that test some of these implications.
6.	For example, Matějka [11] uses the constrained version in a static application, while Steiner et al. [12] analyze a dynamic problem with the unconstrained version.
7.	Also see [15,16] for related results.
8.	Dewan and Neligh [18] also compare the performance of several cost functions for information in explaining their data. They find that most of the subjects fit well to one of four functional forms, including Shannon entropy; however, they do not explicitly consider the constrained model.
9.	If $c \geq H (μ)$ , then full information is feasible and the problem becomes trivial, similarly to the case $λ = 0$ in (*).
10.	Indeed, starting from $P (a \mid ω)$ , define $P (a) = \sum_{ω} μ (ω) P (a \mid ω)$ and let $γ^{a} (ω) = \frac{μ (ω) P (a \mid ω)}{P (a)}$ for any $a \in B (P)$ . In the other direction, set $P (a \mid ω) = \frac{P (a) γ^{a} (ω)}{μ (ω)}$ .
11.	Because the setup is symmetric the analysis is similar for $μ > 0.5$ .
12.	Note that, for $μ = 1 / 3$ , we have $\bar{λ} (μ) = + \infty$ , so the latter case is empty.
13.	This happens when $c > H (μ) - [\frac{1 - 2 μ}{1 - 2 γ^{*}} H (γ^{*}) + \frac{2 μ - 2 γ^{*}}{1 - 2 γ^{*}} H (0.5)]$ .
14.	In Caplin et al. [9], the available actions are just $\{a_{1}, \dots, a_{m}\}$ . Setting the payoffs to 1 and 0 is just for convenience; the same result (with the obvious changes) applies with similar payoff structures.
15.	We view $Δ (Ω)$ as a subset of the Euclidean space $R^{Ω}$ and endow it with the topology that it inherits from that space. A set in a topological space is nowhere dense if its closure has an empty interior. The proposition immediately follows from the fact that the level sets of a strictly concave and continuous function are closed and contain no open set.
16.	If one of the actions (weakly) dominates the other, then the latter cannot be part of an optimal solution.
17.	Recall that there is always a solution in which the size of the consideration set is at most the number of states [4].
18.	The same argument also implies the stronger result that if P is optimal for (*) with some $λ > 0$ , then none of the posteriors $\{γ^{a}\}_{a \in B (P)}$ is an indifference point.
19.	Note that the bound on c is obtained by setting $γ^{l} = γ^{*}$ and $γ^{m} = 0.5$ .

Figure 1. The dashed lines show the expected utility for each of the actions $\{l, m, r\}$ in the example as a function of the decision maker’s (DM’s) belief. The upper envelope of these functions is the solid black line. l is optimal for posteriors in $[0, 1 / 3]$, m is optimal in $[1 / 3, 2 / 3]$, and r is optimal in $[2 / 3, 1]$.


Figure 2. The solutions of problems (**) (upper line) and (*) (lower line) from the example with prior $μ = 0.5$. The arrows between the lines illustrate the mapping from c to the value of the Lagrange multiplier $λ$ at the optimum.


Figure 3. The consideration sets in the solutions of (*) and (**) for the case $γ^{*} < μ < 0.5$, $μ \neq \frac{1}{3}$. When $μ = 1 / 3$, the consideration set in (*) is $\{l, m\}$ for every $λ > λ^{*}$.


Figure 4. The solution of the example for $0 < μ \leq γ^{*}$.


Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
