1. Introduction
A channel is defined as a conditional distribution, modelling the probability of outputs that an adversary can observe given secret inputs. Important examples of channels are side-channels in computer security where an attacker, for example by observing the running time of an encryption program, can reconstruct the encryption keys.
At a high level, the problem of optimal channel design is the following: given a prior on the secret and some operational constraints, design a channel that minimises the leakage of information about the secret. In simple terms, an optimal channel can be seen as an optimal countermeasure to information leakage.
To explore this design problem, one needs to specify what constraints should be considered and how the leakage of information is quantified. In the cryptographic example above, one may want, for example, to design a channel of minimal leakage (in terms of the number of key bits that can be reconstructed by an adversary) under the constraint that the average encryption per block should take less time than some given duration. This work will consider two general classes of constraints, which we refer to as "hard" and "soft". Hard constraints are the ones establishing which outputs are allowed given each input. These constraints must be satisfied for each realisation of input–output pairs. Soft constraints, on the other hand, must be satisfied in the expected-value sense, as they relate to the expected utility of the channel.
Leakage of information is defined as the difference between the adversary’s prior and posterior uncertainty, i.e., the uncertainty before and after observing the outputs of the channel. The leakage quantifies how much the attacker can learn about the secret input from observing the outputs. Therefore, any entropy function, seen as a measure of uncertainty, can induce a candidate function for quantifying leakage. For Shannon entropy, the leakage is just the mutual information. To capture the widest class of entropies, and hence leakages, this work uses core-concavity [1,2], a generalisation of concavity which allows for the capture of entropies that are not concave (like Rényi entropies with order $\alpha > 1$).
Once the optimal channel design problem is formally set, it is possible to address some basic questions. The first question regards how difficult it is to solve this problem. Based on Reference [2], it can be shown that, for any choice of the entropy measure, this problem is solved via convex programming with zero duality gap, for which the Karush–Kuhn–Tucker (KKT) conditions can be used to solve for the optimal channel.
1.1. Literature Review
The problem of information leakage outside of the communication setting has been studied in the quantitative information flow (QIF) literature [3,4,5,6], works on private information retrieval (PIR) [7] and private search queries [8,9], as well as research on privacy-utility trade-offs [10,11,12]. Particularly important from the field of QIF are advances on fundamental security guarantees of leakage measures (what security can be achieved) and robust techniques and results (how far a technique or result remains valid across different notions of leakage). However, most of the theoretical effort has been focused on analysing a given system as opposed to a design problem.
Information leakage in the context of game theory has been studied in Reference [13]. Their work focuses on modelling the interplay between attacker and defender with regard to the information leakage of given channels, and on reasoning about their optimal strategies. In contrast, our focus is on the design of optimal channels within operational constraints.
The authors in Reference [14] also use a zero-sum game between a forecaster and Nature to show that the celebrated maximum entropy principle in statistics, i.e., that without any further knowledge one should choose the distribution with the highest entropy from a family, is the dual of solving a robust Bayes decision problem. This was the inspiration for our duality connection in Section 5.2.
This work builds on and extends our two conference papers [1,2]. However, there are several differences compared to those papers. For example, we have now simplified the definition of core-concavity without loss of generality. In addition, the games in Reference [1] are different; e.g., they do not include soft constraints. Moreover, the connection of convex optimisation to a two-person game for "any" core-concave entropic leakage was not explored in either work. Finally, the relation of the dual problem, that of the adversary, to a robust information extraction problem is unique to this manuscript.
1.2. Contributions
The main contribution of this paper is to present the problem of designing optimal channels for minimum information leakage in a game-theoretical framework for a generalised class of leakage measures. In this way, the optimal channel design can be studied both from the defender's (the channel designer's) and the adversary's (the inference maker's, or the information extractor's) point of view. The main technical contribution is Theorem 1, which shows that the convex programming solutions as in Reference [2] correspond to the defender's optimal strategies in these games. Moreover, this game-theoretical framework reveals that there is a tight duality relationship between the problem of designing a minimal-leakage channel and choosing a "robust inference extraction" strategy. In particular, knowing only the specification of a channel given by some constraints and a prior distribution, one must find the optimal strategy to extract the maximum amount of information about the input from the output of the channel, where the exact realisation of the channel is unknown. Hence, the strategy should be robust to any realisation of the channel within its constraints. When the game is finite, efficient solutions for both the defender's and the adversary's strategies can be found using linear programming.
This work also establishes a result to deal with uncertainty about the prior. By Theorem 2, it follows that, when the prior is not unique, but is known to depend on a hidden “context”, the Nash equilibrium is not given by customising with respect to the context, but rather by treating the multi-prior problem as a single-prior one, where the prior is the average prior over all contexts.
1.3. Roadmap
After introducing notation and the information-theoretical background, including the important definitions of core-concavity and gain functions, the optimal channel design problem is presented in Section 3. It is then shown, in Section 4, that the problem is solved by convex programming for any entropy belonging to this generalised class.
The main contribution of the paper, i.e., the game-theoretical framework, is presented in Section 5. The games under study here are two-person sequential zero-sum games with asymmetric information. A notion of utility is introduced based on gain functions and soft constraints, and the saddle-point equilibria are defined. The main result of this section, Theorem 1, shows the correspondence between equilibria and the convex optimisation from Section 4. The section concludes with a discussion of the problem from the adversary's point of view and its relation to robust inference. In Section 6, our framework is extended to the case of uncertainty about the prior. It is first analysed as a convex optimisation problem, culminating in Theorem 2, which is followed by a discussion of the game-theoretical implications of that result.
2. Notational Conventions and Preliminaries
We will denote sets, random variables, and realisations with calligraphic, capital, and small letters, respectively, e.g., $\mathcal{X}$, $X$, and $x$. We will denote the cardinality of a set $\mathcal{X}$ by $|\mathcal{X}|$. For a vector $p$, we use $p_{(i)}$ to denote the $i$-th largest element of $p$, where ties are broken arbitrarily. In addition, we will use the notation $\|p\|_{\rho}$ for the $\rho$-norm of vector $p$, that is, $\|p\|_{\rho} = \big(\sum_i |p_i|^{\rho}\big)^{1/\rho}$. The limit case of the $\infty$-norm is $\|p\|_{\infty} = \max_i |p_i|$.
Let $X$ represent the secret as a discrete random variable that can take one of the $n$ possibilities from $\mathcal{X} = \{x_1, \ldots, x_n\}$ with the (categorical) distribution $p_X \in \Delta_n$, where $\Delta_n$ is the probability simplex in $\mathbb{R}^n$. For the rest of the paper, as is the convention, we may omit the subscript $X$ whenever it is not ambiguous and simply use $p$ to refer to $p_X$. Without loss of generality, assume that every secret has a strictly positive probability of realisation and that the $p_i$'s are sorted in non-increasing order; that is, $p_1 \ge p_2 \ge \cdots \ge p_n > 0$.
A system that generates an observable $Y$ from the discrete set $\mathcal{Y}$ that can probabilistically depend on a secret can be modelled as a probabilistic discrete channel (henceforth referred to simply as a "channel") denoted by the triplet $(\mathcal{X}, \mathcal{Y}, W)$. Specifically, $\mathcal{X}$ and $\mathcal{Y}$ are the input and output alphabets, respectively, and $W$ denotes the conditional probability distribution, also known as the transition matrix. That is, $W(y|x)$ is the probability with which the channel produces the output (the observable) $y$ given that its input (the secret) is $x$. In particular, these probabilities satisfy the following:
$$W(y|x) \ge 0 \quad \forall x \in \mathcal{X},\, y \in \mathcal{Y}, \qquad \sum_{y \in \mathcal{Y}} W(y|x) = 1 \quad \forall x \in \mathcal{X}. \tag{1}$$
In other words, the transition matrix is "row-stochastic". In the rest of the paper, we will use the terms secret and input, as well as observable and output, interchangeably.
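As a concrete illustration, a channel can be represented as a row-stochastic matrix and Equation (1) checked mechanically; the following sketch (with an arbitrary toy matrix, assuming NumPy) is not from the paper but simply instantiates the definition:

```python
import numpy as np

# A toy channel with three secrets (rows) and two observables (columns);
# entry W[i, j] is the conditional probability W(y_j | x_i).
W = np.array([[0.7, 0.3],
              [0.2, 0.8],
              [0.5, 0.5]])

def is_row_stochastic(W, tol=1e-9):
    """Check Equation (1): non-negative entries and every row summing to one."""
    return bool(np.all(W >= 0) and np.allclose(W.sum(axis=1), 1.0, atol=tol))
```

For the matrix above, `is_row_stochastic(W)` returns `True`; it returns `False` for any matrix with a negative entry or a row not summing to one.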
Central to this work is the notion of leakage of information. In order to define leakage formally we will start by defining entropy and posterior (conditional) entropy in a general context.
2.1. Entropy
The classical choice for entropy and posterior entropy are (Gibbs)–Shannon's:
$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x), \qquad H(X|Y) = -\sum_{y \in \mathcal{Y}^{+}} p(y) \sum_{x \in \mathcal{X}} p(x|y) \log p(x|y),$$
where $\mathcal{Y}^{+}$ is the set of outputs that have a strictly positive probability of realisation, that is, $\mathcal{Y}^{+} = \{y \in \mathcal{Y} : p(y) > 0\}$. In addition, $p(y)$ is the (total) probability that $y$ is observed by the adversary, i.e., $p(y) = \sum_{x \in \mathcal{X}} p(x) W(y|x)$, and $p(x|y)$ is the posterior probability of the secret $x$ given that $y$ is observed, as given by Bayes' rule: $p(x|y) = p(x) W(y|x) / p(y)$.
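For instance, the Shannon leakage of a small deterministic channel can be computed directly from these definitions; the following sketch (prior and channel chosen arbitrarily for illustration) computes the prior entropy, the posterior entropy, and their difference:

```python
import numpy as np

def shannon(q):
    """(Gibbs)-Shannon entropy in bits."""
    q = np.asarray(q, dtype=float)
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

def posterior_shannon(p, W):
    """H(X|Y) = sum_{y in Y+} p(y) H(X | Y = y), with p(x|y) by Bayes' rule."""
    p, W = np.asarray(p, float), np.asarray(W, float)
    py = p @ W                       # p(y) = sum_x p(x) W(y|x)
    return sum(pyj * shannon(p * W[:, j] / pyj)
               for j, pyj in enumerate(py) if pyj > 0)

p = np.array([0.5, 0.3, 0.2])
W = np.array([[1.0, 0.0],            # x1 always maps to y1
              [0.0, 1.0],            # x2 and x3 always map to y2
              [0.0, 1.0]])
leakage = shannon(p) - posterior_shannon(p, W)
```

For this deterministic channel the leakage equals $H(Y)$, here exactly one bit.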
However, as we mentioned in the introduction, there are many candidates for entropy. Some are more fitting for specific operational scenarios, such as Min-entropy and guesswork. A generalisation of Shannon and Min-entropy is the Rényi family, which itself is a special case of the Kolmogorov–Nagumo family. Rather than taking a specific entropy, we construct a general entropy from an axiomatic description.
Consider a random variable $X$ whose distribution depends on the realisation of a "context" $C$, which is a binary random variable. In particular, $\Pr(C = c_1) = \lambda$ and $\Pr(C = c_2) = 1 - \lambda$, with $\lambda \in (0, 1)$; moreover, $X \sim p^{1}$ given $C = c_1$ and $X \sim p^{2}$ given $C = c_2$. Compare the following two scenarios: (1) we observe the realisation of the context and (2) we cannot see the realisation of the context. Intuitively, our uncertainty about $X$ in the first scenario should be lower than that in the second. In particular, if we measure the uncertainty of a random variable with distribution $p$ by a function $F$, we should have $\lambda F(p^{1}) + (1 - \lambda) F(p^{2}) \le F\big(\lambda p^{1} + (1 - \lambda) p^{2}\big)$; that is, $F$ should be a concave function. However, we note that this intuitive inequality still holds even if an increasing function $\varphi$ is applied to both sides; that is,
$$\varphi\big(\lambda F(p^{1}) + (1 - \lambda) F(p^{2})\big) \le \varphi\Big(F\big(\lambda p^{1} + (1 - \lambda) p^{2}\big)\Big).$$
The function $\varphi$ can be thought of as capturing our risk attitude. This motivates the following definitions.
Definition 1. Let $H$ be a function from probability distributions to $\mathbb{R}$. Then we call $H$ core-concave if we can write $H = \varphi \circ F$, where $\varphi$ is strictly increasing and $F$ is concave.
Throughout the paper, we will consider concave functions to also be continuous; specifically, their values on the boundaries are their limit values. Note that any concave function is also core-concave, by simply taking $\varphi$ to be the identity. However, the converse is not true. A notable example is the Rényi entropies:
$$H_{\alpha}(p) = \frac{1}{1-\alpha} \log\Big(\sum_{i} p_i^{\alpha}\Big), \qquad \alpha \in (0,1) \cup (1,\infty).$$
For $\alpha > 1$, this function is neither concave nor convex (it is only pseudo-concave). However, it is core-concave. This can be shown as follows: take $F(p) = -\sum_i p_i^{\alpha}$, which is concave, and $\varphi(t) = \frac{1}{1-\alpha}\log(-t)$, which is strictly increasing on $t < 0$ when $\alpha > 1$. For $\alpha < 1$, core-concavity can be shown by $F(p) = \sum_i p_i^{\alpha}$ and $\varphi(t) = \frac{1}{1-\alpha}\log(t)$. As another example, consider Sharma–Mittal entropies [15], defined as
$$H_{\alpha,\beta}(p) = \frac{1}{1-\beta}\bigg[\Big(\sum_i p_i^{\alpha}\Big)^{\frac{1-\beta}{1-\alpha}} - 1\bigg].$$
This family generalises Rényi ($\beta \to 1$), Shannon ($\alpha, \beta \to 1$), and Havrda–Tsallis entropies [16,17] ($\beta = \alpha$). $H_{\alpha,\beta}$ is also core-concave. This can be seen by taking $F(p) = \operatorname{sgn}(1-\alpha)\sum_i p_i^{\alpha}$ and $\varphi(t) = \frac{1}{1-\beta}\big[(\operatorname{sgn}(1-\alpha)\, t)^{\frac{1-\beta}{1-\alpha}} - 1\big]$.
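The Rényi decomposition for $\alpha > 1$ can be checked numerically; in the following sketch (an arbitrary distribution, not from the paper), `F` and `phi` follow the construction described in the text, and the identity $H_\alpha = \varphi(F(\cdot))$ is verified against the direct formula:

```python
import numpy as np

def renyi(q, alpha):
    """Direct formula: H_alpha(q) = log(sum_i q_i^alpha) / (1 - alpha), natural log."""
    return np.log(np.sum(np.asarray(q, float) ** alpha)) / (1.0 - alpha)

# Core-concave decomposition for alpha > 1:
#   F(q) = -sum_i q_i^alpha is concave, and
#   phi(t) = log(-t) / (1 - alpha) is strictly increasing on t < 0.
def F(q, alpha):
    return -np.sum(np.asarray(q, float) ** alpha)

def phi(t, alpha):
    return np.log(-t) / (1.0 - alpha)

q = np.array([0.6, 0.3, 0.1])
alpha = 2.0
assert np.isclose(phi(F(q, alpha), alpha), renyi(q, alpha))
```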
In this paper, we take any function that is core-concave as a candidate for entropy.
2.2. Posterior Entropy
Motivated by the equivalence of our core-concave entropies with generalised induced entropies, we define the posterior entropy to take the following form:
$$H(X|Y) = \varphi\Big(\sum_{y \in \mathcal{Y}^{+}} p(y)\, F(p_{X|y})\Big). \tag{4}$$
Note that the above definition is deliberately different from $\sum_{y \in \mathcal{Y}^{+}} p(y)\, \varphi\big(F(p_{X|y})\big)$. In particular, $\varphi$ is outside of the expectation. Now, the (information) leakage can be defined as
$$\mathcal{L}(p, W) = H(X) - H(X|Y).$$
The above structure of the posterior entropy is strongly motivated by the following result:
Proposition 1. For any core-concave H, leakage is non-negative.
Proof. Replacing from the definitions, we have
$$H(X) - H(X|Y) = \varphi\big(F(p)\big) - \varphi\Big(\sum_{y \in \mathcal{Y}^{+}} p(y)\, F(p_{X|y})\Big).$$
For a core-concave $H$, $F$ is concave; hence, since $p = \sum_{y \in \mathcal{Y}^{+}} p(y)\, p_{X|y}$, Jensen's inequality gives $F(p) \ge \sum_{y \in \mathcal{Y}^{+}} p(y)\, F(p_{X|y})$. Therefore, since $\varphi$ is a monotonically increasing function, we have $H(X) \ge H(X|Y)$, i.e., leakage is non-negative. ☐
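Proposition 1 can also be checked empirically; the following sketch (not from the paper) draws random priors and channels and verifies that the core-concave Rényi leakage with $\alpha = 2$, with $\varphi$ applied outside the expectation as in Equation (4), is never negative:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 2.0

def F(q):
    """Concave core of the Renyi entropy for alpha > 1."""
    return -np.sum(np.asarray(q, float) ** alpha)

def phi(t):
    """Strictly increasing on t < 0 for alpha > 1."""
    return np.log(-t) / (1.0 - alpha)

def leakage(p, W):
    """H(X) - H(X|Y), with phi applied OUTSIDE the expectation (Equation (4))."""
    py = p @ W
    inner = sum(pyj * F(p * W[:, j] / pyj)
                for j, pyj in enumerate(py) if pyj > 1e-12)
    return phi(F(p)) - phi(inner)

for _ in range(100):
    p = rng.dirichlet(np.ones(4))
    W = rng.dirichlet(np.ones(3), size=4)   # random 4x3 row-stochastic channel
    assert leakage(p, W) >= -1e-9
```

A channel with identical rows conflates all secrets, and its leakage is exactly zero, matching the equality case of Jensen's inequality.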
In fact, our leakages satisfy a stronger property:
Proposition 2. The conditional entropy defined in Equation (4) satisfies the data-processing inequality (DPI).
Proof. Reference [1] (Lemma 1). ☐
2.3. Gain Functions and g-Leakage
An alternative foundational approach to information leakage is in terms of gain functions. As we will use gain functions in our results, we give here a primer on this approach.
A classical interpretation of Shannon entropy is in terms of guessing a secret by asking set-membership questions ("is the secret in set $\mathcal{S}$?"). Often in the security community, another guessing model is more appropriate, namely individual guesses: "is the secret $x$?" Information-theoretically, the individual-guesses scenario is modelled by Min-entropy. This guessing scenario is, however, an all-or-nothing scenario: the attacker either guesses the secret or does not, and right guesses always yield the same reward. In many real-world scenarios, however, even guessing part of the secret may be valuable, or guessing different secrets may yield different rewards. These scenarios have motivated the introduction of gain functions and $g$-vulnerability [18].
A gain function is a real-valued function $g$ whose arguments are an attacker's guess and the secret: $g(a, x)$ quantifies the gain of the attacker for guessing $a$ when the secret is $x$.
The $g$-vulnerability is defined as the attacker's expected gain for an optimal guess:
$$V_g(p) = \sup_{a \in \mathcal{A}} \sum_{x \in \mathcal{X}} p(x)\, g(a, x),$$
where $\mathcal{A}$ is a countable set (the attacker's guesses). From $g$-vulnerability, one can define posterior $g$-vulnerability by considering the average vulnerability over all possible outputs, i.e.,
$$V_g(p, W) = \sum_{y \in \mathcal{Y}^{+}} p(y)\, V_g(p_{X|y}).$$
Further derived notions are $g$-entropy and $g$-leakage. The $g$-entropy is defined as the negative log of the vulnerability: $H_g(p) = -\log V_g(p)$. Similarly, posterior $g$-entropy is defined as the negative log of the posterior vulnerability: $H_g(p, W) = -\log V_g(p, W)$. The $g$-leakage is the difference between the $g$-entropy and the posterior $g$-entropy. An important property of gain functions, which we use in the game-theoretical analysis, is that any convex function can be expressed via gain functions ([19] (Theorem 5)).
3. Optimal Channel Design
The general setting in our paper is the following: given a prior distribution p on the input (secret) variable X, we (the defender) would like to design a channel, within some operational constraints, that leaks minimal information about the secret X through its output Y.
Let $\mathcal{D} \subseteq \mathcal{X} \times \mathcal{Y}$ define the permissible outputs (observables) for each input (secret). Specifically, if $(x, y) \notin \mathcal{D}$, then, for input $x$, the designer cannot produce output $y$. This can represent the "hard" operational constraints on the channel. Hence, the channel, along with Equation (1), should satisfy:
$$W(y|x) = 0 \qquad \forall (x, y) \notin \mathcal{D}. \tag{6}$$
We will refer to Equation (6) as "hard" constraints, as they strictly forbid some input–output pairs "path-wise", that is, for each realisation of the input. As a consequence, an adversary can eliminate the forbidden inputs for an observable when making an inference. For ease of notation, for any given $\mathcal{D}$, we will denote the space of channels that satisfy Equations (1) and (6) by $\mathcal{W}_{\mathcal{D}}$. That is,
$$\mathcal{W}_{\mathcal{D}} = \Big\{ W :\ W(y|x) \ge 0,\ \sum_{y \in \mathcal{Y}} W(y|x) = 1 \ \ \forall x \in \mathcal{X},\ \text{and}\ W(y|x) = 0 \ \ \forall (x, y) \notin \mathcal{D} \Big\}.$$
The design requirement for a legitimate channel that satisfies the hard constraints can now be expressly represented as $W \in \mathcal{W}_{\mathcal{D}}$.
The naming of hard constraints is to contrast with the "soft" constraints, which are expressed in terms of an expected value. In particular, there are many interesting cases where it may be "feasible" to assign the same observable to all secrets, but such a move may result in a huge deterioration in the system's quality of service (QoS). In such cases, the goal is to strike an optimal "balance" between information leakage and QoS. This is, for instance, the setting in the geo-location privacy-utility trade-off [10,11,20] and the secrecy-delay trade-off in bucketing as a defence against timing attacks [21,22].
In its most basic form, the QoS can be captured as an expected value of a "payoff" (desirability) function. In particular, let $u : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$, where $u(x, y)$ represents how good the realised output $y$ is for a particular input $x$. Then the expected value of the payoff is simply $U(W) = \sum_{x, y} p(x)\, W(y|x)\, u(x, y)$, which can be a metric for the QoS of the channel. The channel design problem then becomes a "two-objective" optimisation: (a) minimising leakage and (b) maximising the QoS. The solution concept for multi-objective optimisations is that of "Pareto-efficiency" (Pareto-optimality), i.e., the solutions with a guarantee that no alternative can simultaneously improve all of the objectives (at least one of them strictly). One of the standard methods of converting a multi-objective optimisation (MOO) to (a series of) single-objective optimisations (SOOs) is to present all but one of the objectives as inequality constraints. Specifically, we can introduce a lower threshold $u_{\min}$ on the QoS by imposing $U(W) \ge u_{\min}$. Then, by varying the value of $u_{\min}$ and solving the resulting SOOs, the Pareto frontier (the set of Pareto-optimal solutions) will be found (see, e.g., [23]). Hence, with this in mind, for the rest of the paper, we will be dealing with SOOs. We will refer to the constraint $U(W) \ge u_{\min}$ as the "soft" constraint, since it is expressed in terms of the expected value, distinguishing it from the "hard" constraints represented by $\mathcal{D}$ (or, equivalently, $W \in \mathcal{W}_{\mathcal{D}}$), for each realisation of the secret.
As we argued before, the aim is to design channels that have the lowest leakage of information about the input while satisfying a set of operational constraints, where the leakage is defined as the difference between the prior and posterior entropies. The first point to note is that the choice of the channel cannot change the prior entropy, as the prior entropy of the input is entirely governed by its prior distribution, which we assume is a "given" parameter that the defender cannot control. Therefore, the problem of minimising the leakage becomes equivalent to maximising the posterior entropy (equivocation).
Putting things together, the optimal channel design problem in its most general form becomes
$$\max_{W \in \mathcal{W}_{\mathcal{D}}} H(X|Y) \qquad \text{subject to} \qquad U(W) \ge u_{\min}, \tag{7}$$
where the main notations are described in Table 1.
Before we get to our analysis, we present two minimalistic examples to instantiate the constraints. Note that each of these contexts of course has its idiosyncrasies that are abstracted away for the purpose of this paper. The first toy example is motivated by geo-location privacy. Figure 1 depicts four locations $x_1$ to $x_4$, where the configuration is a representation of their relative positions. The defender is in one of these four locations and generates an observable, which can be its reported coordinates, based on which it receives a location-based service (LBS). Suppose, in particular, that $x_1$ and $x_2$ are near enough that the same observable can be reported for both of them, but $x_1$ is too far from $x_3$ and $x_4$, such that reporting the same coordinates as them is either infeasible (e.g., it will not get any network connectivity from an access point) or unacceptable (the quality of the received utility will be too poor). Moreover, $x_2$, $x_3$, and $x_4$ are close enough to produce the same observable. If we label the observables simply by the subset of the secrets that can produce them, then the set of admissible secret–observable pairs, i.e., $\mathcal{D}$, is
$$\mathcal{D} = \big\{(x_1, \{x_1, x_2\}),\ (x_2, \{x_1, x_2\}),\ (x_2, \{x_2, x_3, x_4\}),\ (x_3, \{x_2, x_3, x_4\}),\ (x_4, \{x_2, x_3, x_4\})\big\}.$$
This $\mathcal{D}$ determines the hard constraints on the problem, e.g., we must have $W(\{x_2, x_3, x_4\} \mid x_1) = 0$ because $(x_1, \{x_2, x_3, x_4\}) \notin \mathcal{D}$.
As another example, consider a minimalistic bucketing example depicted in Figure 2. The axis denotes time duration, and $t_1$ to $t_4$ represent the distinct execution times of four distinct (encryption or decryption) processes, i.e., Process 1 takes $t_1$ time to finish, and so on. If the result of each process is released immediately upon finishing, then the processes can be uniquely identified just by the timing "side channel". The result of a finished process can instead be deferred and released at a later time, to become indistinguishable from other processes that take longer to finish. This superset duration time constitutes a bucket. In the figure, the arrows represent whether a secret can be deferred till the finishing time of a longer process. Specifically, suppose that the delay limitation for Process 1 does not allow it to be released as late as $t_3$ or $t_4$. Therefore, the hard constraints can be identically represented as in the previous toy example.
5. Game-Theoretical Interpretation
We now present a game-theoretical framework for the general optimal channel design problem. The problem's solution is shown to be a Nash equilibrium in a sequential zero-sum game. The main result proved in this section is a correspondence between any defender Nash equilibrium in these games and the convex programming problems from Proposition 3. Moreover, when the game is finite, the solution can be found with linear programming and, hence, in a more efficient way than in the general case. An important property of the game interpretation is that it provides not only the optimal channel design but also the attacker's optimal strategy.
Consider the following two-player zero-sum game between a defender and an adversary: "Nature" chooses a realisation of a random variable $X$ from the finite set $\mathcal{X}$ according to the publicly known probability distribution $p$. The defender observes the realisation $x$ and chooses an action from the finite set $\mathcal{Y}$. Hence, the space of the pure strategies of the defender is the set of all functions from $\mathcal{X}$ to $\mathcal{Y}$, i.e., $\mathcal{Y}^{\mathcal{X}}$. Each pure strategy of the defender corresponds to a deterministic channel. Similarly, a behavioural strategy of the defender corresponds to a probabilistic channel, $W$, whose space is $\mathcal{W}$. The adversary, after observing $y$, makes a guess $a$ from the countable (but potentially infinitely-sized) set $\mathcal{A}$. Hence, the space of the adversary's pure strategies (deterministic plans of action) is $\mathcal{A}^{\mathcal{Y}}$. A behavioural strategy of the adversary, designated by $\alpha$, assigns a potentially probabilistic guess $\alpha(\cdot \mid y)$ to each output. Hence, the space of the adversary's behavioural strategies is the set of conditional distributions $\alpha(a|y)$. A pure and a behavioural strategy profile of the game are, respectively, the pairs $(s_D, s_A)$ and $(W, \alpha)$.
The payoff of the game can in general be represented by the (bounded) function $\pi : \mathcal{X} \times \mathcal{Y} \times \mathcal{A} \to \mathbb{R}$. That is, the outcome of each instance of the game is that the adversary wins, and the defender loses, $\pi(x, y, a)$ units if the realisations of the channel input, the channel output, and the adversary's guess are $x$, $y$, and $a$, respectively. Let $V$ represent the expected payoff of the game. The expectation is taken with respect to the random realisation of the input according to the prior $p$ as well as any randomisation present in the strategies of the two players. Specifically,
$$V(W, \alpha) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \sum_{a \in \mathcal{A}} p(x)\, W(y|x)\, \alpha(a|y)\, \pi(x, y, a). \tag{9}$$
The defender wants to minimise $V$ while the adversary wants to maximise it. Unlike the defender, the adversary does not observe the realisation of $X$; for this reason, this is a game of asymmetric information.
5.1. Nash Equilibria and Saddle-Point Strategies
A Nash equilibrium (NE) is a standard solution concept in game theory, which states that each strategy should be the best response assuming the strategy of the other player(s) is fixed. For two-player zero-sum games (2PZSGs), the set of NEs has a stronger interpretation—that of a saddle point. We first briefly describe this solution concept.
The defender may adopt the following worst-case scenario argument: assuming that any strategy adopted by the defender is going to be revealed to the adversary, who will best respond to it, the "robust" optimisation of the defender (the minimiser) becomes the following:
$$\overline{V} = \min_{W \in \mathcal{W}_{\mathcal{D}}}\ \max_{\alpha}\ V(W, \alpha).$$
We denote the value of the above optimisation by $\overline{V}$ to indicate that this is the highest expected payoff to the adversary. On the other hand, the best-case scenario of the defender is derived from the following argument: suppose the strategy of the adversary is given and the defender can design their strategy accordingly. Then this optimistic scenario for the defender (which is the worst case for the adversary) leads to the following problem:
$$\underline{V} = \max_{\alpha}\ \min_{W \in \mathcal{W}_{\mathcal{D}}}\ V(W, \alpha).$$
Clearly, we have $\underline{V} \le \overline{V}$. If $\underline{V} = \overline{V}$, we say the game has a value $V^{*} = \underline{V} = \overline{V}$. Further, a saddle-point strategy pair $(W^{*}, \alpha^{*})$ is a strategy pair that satisfies the following:
$$V(W^{*}, \alpha) \ \le\ V(W^{*}, \alpha^{*}) \ \le\ V(W, \alpha^{*}) \qquad \forall\, W \in \mathcal{W}_{\mathcal{D}},\ \forall\, \alpha.$$
That is, a saddle-point strategy attains the value of the game: $V(W^{*}, \alpha^{*}) = V^{*}$. Then the argument for saddle-point strategies as the solution concept of the 2PZSG is strong: a saddle-point strategy gives each player a guarantee on the utility no matter what the other player's strategy is. In what follows, we derive the conditions for the saddle-point strategies of the defender and the adversary, respectively.
For the defender, a saddle-point strategy solves $\min_{W \in \mathcal{W}_{\mathcal{D}}} \max_{\alpha} V(W, \alpha)$. As before, let $\mathcal{Y}^{+} = \{y : p(y) > 0\}$ be the set of outputs with a strictly positive probability of realisation. Since only these "on-path" outputs contribute to the expected payoff, we can rewrite Equation (9) as
$$V(W, \alpha) = \sum_{y \in \mathcal{Y}^{+}} p(y) \sum_{a \in \mathcal{A}} \alpha(a|y) \sum_{x \in \mathcal{X}} p(x|y)\, \pi(x, y, a).$$
Hence,
$$\max_{\alpha} V(W, \alpha) = \sum_{y \in \mathcal{Y}^{+}} p(y)\ \max_{a \in \mathcal{A}} \sum_{x \in \mathcal{X}} p(x|y)\, \pi(x, y, a).$$
In particular, for each $y$, the adversary can put all the probability weight on an action that maximises the expected value of $\pi$ with respect to $p_{X|y}$, where $p(x|y) = p(x) W(y|x) / p(y)$ follows Bayes' rule. Note that, although we started from an agnostic stance, Bayes' rule turns out to be indeed the optimal belief update of the adversary. The saddle-point strategy of the defender hence solves the following optimisation:
$$\min_{W \in \mathcal{W}_{\mathcal{D}}}\ \sum_{y \in \mathcal{Y}^{+}} p(y)\ \max_{a \in \mathcal{A}} \sum_{x \in \mathcal{X}} p(x|y)\, \pi(x, y, a). \tag{10}$$
For the saddle point of the adversary, we can rewrite Equation (9) as
$$V(W, \alpha) = \sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} W(y|x) \sum_{a \in \mathcal{A}} \alpha(a|y)\, \pi(x, y, a).$$
Therefore, the best strategy of the defender for a given $x$ is to put all the probability weight of $W(\cdot \mid x)$ on the $y$ that achieves the smallest $\sum_{a} \alpha(a|y)\, \pi(x, y, a)$ across all feasible $y$'s for that $x$, i.e., $y \in \arg\min_{y : (x, y) \in \mathcal{D}} \sum_{a} \alpha(a|y)\, \pi(x, y, a)$. Hence, the saddle-point strategy of the adversary comes from solving the following optimisation:
$$\max_{\alpha}\ \sum_{x \in \mathcal{X}} p(x)\ \min_{y : (x, y) \in \mathcal{D}}\ \sum_{a \in \mathcal{A}} \alpha(a|y)\, \pi(x, y, a). \tag{11}$$
We will consider the following payoff function:
$$\pi(x, y, a) = g(a, x) - \lambda\, u(x, y),$$
where $g : \mathcal{A} \times \mathcal{X} \to \mathbb{R}$ and $u : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ are real-valued functions and $\lambda \ge 0$. This payoff function can be understood as a weighted difference between the gain of the attacker in guessing the secret and the utility of the channel. We will refer to such a zero-sum game between a defender and an adversary as $G_{g,\lambda}$ (also the $G$-game), which is specified by the tuple $(\mathcal{X}, \mathcal{Y}, \mathcal{A}, \mathcal{D}, p, g, u, \lambda)$. For such a game, the optimisation problem for the saddle-point strategy of the defender in Equation (10) becomes
$$\min_{W \in \mathcal{W}_{\mathcal{D}}}\ \Big[\sum_{y \in \mathcal{Y}^{+}} p(y)\ \max_{a \in \mathcal{A}} \sum_{x \in \mathcal{X}} p(x|y)\, g(a, x)\Big] - \lambda\, U(W). \tag{12}$$
Theorem 1. For any optimal channel design problem in Equation (7), there is an induced game $G_{g,\lambda}$ where the optimal channel is a saddle-point strategy of the defender. Conversely, for any game $G_{g,\lambda}$, the saddle-point strategy of the defender is a solution to an induced optimal channel design problem.
Proof. We showed in Proposition 3 that the optimal channel design problem for any core-concave $H$ is a convex optimisation. Since $\varphi$ is an increasing function, it can be removed from the optimisation without any effect. Now, from convex optimisation theory, we know that there exists a Lagrange multiplier $\lambda \ge 0$ such that the solutions of the original optimisation match those of the following Lagrange relaxation problem:
$$\max_{W \in \mathcal{W}_{\mathcal{D}}}\ \sum_{y \in \mathcal{Y}^{+}} p(y)\, F(p_{X|y}) + \lambda\big(U(W) - u_{\min}\big).$$
Or, equivalently,
$$\min_{W \in \mathcal{W}_{\mathcal{D}}}\ \sum_{y \in \mathcal{Y}^{+}} p(y)\, \big({-F(p_{X|y})}\big) - \lambda\, U(W).$$
Now, since $-F$ is a convex function of $p_{X|y}$, there is a countable set $\mathcal{A}$ and a function $g : \mathcal{A} \times \mathcal{X} \to \mathbb{R}$ such that
$$-F(q) = \sup_{a \in \mathcal{A}} \sum_{x \in \mathcal{X}} q(x)\, g(a, x) \qquad \text{for every distribution } q.$$
In particular, $g$ can be constructed from the supporting hyperplanes of $-F$ and a limit argument, as presented, e.g., in [19] (Theorem 5). Therefore, the optimisation can be written as
$$\min_{W \in \mathcal{W}_{\mathcal{D}}}\ \Big[\sum_{y \in \mathcal{Y}^{+}} p(y)\ \sup_{a \in \mathcal{A}} \sum_{x \in \mathcal{X}} p(x|y)\, g(a, x)\Big] - \lambda\, U(W).$$
Now, note that this minimisation is defining exactly the saddle-point strategy of the defender in a game $G_{g,\lambda}$ as given in Equation (12).
Now, for the reverse direction, consider the game $G_{g,\lambda}$. The saddle-point strategy of the defender is a solution of the optimisation in Equation (12). Note that $g$ characterises a convex function, or the negative of a concave function, which we call $F$; i.e., let
$$F(q) = -\sup_{a \in \mathcal{A}} \sum_{x \in \mathcal{X}} q(x)\, g(a, x).$$
With this notation, the saddle-point strategy of the defender solves
$$\max_{W \in \mathcal{W}_{\mathcal{D}}}\ \sum_{y \in \mathcal{Y}^{+}} p(y)\, F(p_{X|y}) + \lambda\, U(W). \tag{13}$$
Let a saddle-point strategy of the defender be denoted by $W^{*}$. Now consider the following convex optimisation:
$$\max_{W \in \mathcal{W}_{\mathcal{D}}}\ \sum_{y \in \mathcal{Y}^{+}} p(y)\, F(p_{X|y}) \qquad \text{subject to} \qquad U(W) \ge u_{\min}, \tag{14}$$
where $u_{\min} = U(W^{*})$. We claim that these two convex optimisations are equivalent. To see this, note that the KKT conditions are necessary and sufficient for optimality in both optimisations. Moreover, if we take the $\lambda$ in Equation (13) to be the Lagrange multiplier of the minimum utility constraint in Equation (14), these KKT conditions are exactly identical, except that Equation (14) has an additional complementary slackness condition: $\lambda\big(U(W) - u_{\min}\big) = 0$. Since $\lambda \ge 0$, we should have, for an optimum of Equation (14), $U(W) = u_{\min}$ whenever $\lambda > 0$, which holds for the saddle-point strategy by our specific choice of $u_{\min}$. ☐
When the action space of the adversary is finite, the saddle-point strategies can be computed using linear programming. Specifically:
Proposition 5. If the game has a finite number of pure strategies, then the saddle-point strategies expressed by Equations (10) and (11) can be computed as the solution to the following linear program (LP) and its dual:
$$\min_{W,\, z}\ \sum_{y \in \mathcal{Y}} z_y - \lambda\, U(W) \qquad \text{s.t.} \qquad z_y \ \ge\ \sum_{x \in \mathcal{X}} p(x)\, W(y|x)\, g(a, x) \quad \forall\, y \in \mathcal{Y},\ a \in \mathcal{A}; \qquad W \in \mathcal{W}_{\mathcal{D}}.$$
Introducing variables $w_x$ for $\min_{y : (x, y) \in \mathcal{D}} \sum_{a} \alpha(a|y)\, \pi(x, y, a)$, the dual of the above LP is
$$\max_{\alpha,\, w}\ \sum_{x \in \mathcal{X}} p(x)\, w_x \qquad \text{s.t.} \qquad w_x \ \le\ \sum_{a \in \mathcal{A}} \alpha(a|y)\, \pi(x, y, a) \quad \forall\, (x, y) \in \mathcal{D}; \qquad \sum_{a} \alpha(a|y) = 1,\ \alpha(a|y) \ge 0 \quad \forall\, y, a.$$
Proof. In the first LP, the constraints $z_y \ge \sum_{x} p(x) W(y|x) g(a, x)$ and the minimisation of $\sum_{y} z_y$ guarantee that, for each $y$, the optimisation chooses $z_y = \max_{a} \sum_{x} p(x) W(y|x) g(a, x)$; hence, the objective function becomes exactly as in Equation (10).
Similarly, for the second LP, the constraints $w_x \le \sum_{a} \alpha(a|y)\, \pi(x, y, a)$ and the maximisation of $\sum_{x} p(x) w_x$ guarantee that the optimisation chooses $w_x = \min_{y : (x, y) \in \mathcal{D}} \sum_{a} \alpha(a|y)\, \pi(x, y, a)$, which is exactly the optimisation problem of the adversary as in Equation (11). ☐
5.2. The Adversary’s Problem: Robust Inference
One important advantage of the game-theoretical analysis is that it connects the problems of the defender and the attacker. Here, we provide a practical interpretation of the adversary's problem: Suppose we would like to extract information about (i.e., infer) X by observing Y. We know the prior over X, but we do not know the transition matrix, i.e., the channel. All we know is that the channel has to respect some hard and/or soft operational constraints. What is the best inference about X in the absence of knowledge of the channel? One approach is to consider the worst case among all possible channels that satisfy the constraints. The resulting "robust" strategy will have the minimum inference guarantee for any feasible realisation of the channel. The game-theoretical analysis reveals that the optimal channel design problem and the robust inference problem are equivalent; i.e., they are duals of each other.
5.3. Measure-Invariant Optimality
Notice that in all cases seen so far the optimal solution depends on the choice of entropy. There is, however, a particular case studied in Reference [1] where the optimiser is universal, i.e., it is the same for all entropies:
Proposition 6. (Theorem 1 in Reference [1]) When there are no soft constraints and the hard constraints are equivalent to just a size cap of k on the pre-images of the outputs, there is a closed-form solution for the Nash equilibrium. Moreover, this solution is universally optimal, i.e., it is optimal for any choice of entropy.
6. Uncertainty about the Prior
We have assumed that the input is realised according to a single distribution p that is known to the adversary. We now analyse the setting where the prior distribution of the input can be one of a number of possibilities, each occurring with a known probability (a distribution over distributions). That is, the distribution of the input itself depends on a hidden random variable, which we refer to as the context. The adversary knows the joint statistics of the hidden context and the input, but does not get to observe the realisation of the context.
At a high level, the main result of this section is the following: the best strategy for the defender is not to “customise” its strategy with respect to the context depending on the particular prior given each context, but rather to build an “averaged prior”, and design the best strategy over this averaged prior and play it irrespective of the contexts. This result implies that the context-dependent optimal channel design problem reduces to an equivalent context-independent channel design problem over the mixed prior.
This result may not be immediately intuitive, as there can be a counterargument as follows: Among the available priors, there are some particularly “good” ones, in the sense that they are very conducive to hide the secret (e.g., they are very close to uniform in a symmetric constraint setting). Then shouldn’t the defender adopt the optimal channel for such priors in those contexts—especially if they have a high probability weight of occurrence? Our result refutes this intuitive argument.
To formalise the setting, let the (finite) space of the discrete context random variable be $\mathcal{C}$. Without loss of generality, we assume that the context has full support, i.e., $q(c) > 0$ for all $c \in \mathcal{C}$, where $q(c)$ denotes the probability of context $c$. The channel designer (the defender) knows the true distribution of the secret; technically speaking, the defender "observes" the realisation of the context. The adversary, on the other hand, does not directly observe the context, but knows the probability $q(c)$ of each context, as well as the (conditional) probability distribution $\pi(\cdot \mid c)$ of the secret given each context. Note that knowledge of $q$ and $\pi$ is equivalent to knowledge of the "joint" probability distribution $q(c)\,\pi(x \mid c)$ of the context and the secret.
The adversary only sees the output $Y$ and wants to "infer" the input $X$. In the worst case, one can assume that the adversary knows the strategy of the defender and hence, using his knowledge of $q$ and $\pi$, can use Bayes' rule to update his best belief about the secret after observing $Y$, i.e., constructing his posterior:
$$\mathbb{P}(x \mid y) \;=\; \frac{\sum_{c} q(c)\,\pi(x \mid c)\,p(y \mid x, c)}{\sum_{c}\sum_{x'} q(c)\,\pi(x' \mid c)\,p(y \mid x', c)}.$$
Note that the defender is not directly interested in hiding information about the context and only cares about $X$, but should be wary of how the adversary can use their knowledge of the joint distribution of the context and the input to infer the input from the observation. In addition, for clarity, we repeat that the adversary does not "observe" the context nor the secret. (For the scenario where the adversary can directly observe the context, the problem reduces to designing $|\mathcal{C}|$ separate optimal channels according to optimisations as in Equation (7), with prior $\pi(\cdot \mid c)$ for each $c \in \mathcal{C}$.)
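As a concrete illustration of this Bayesian update, the following sketch computes the adversary's posterior for a small assumed instance (the toy numbers and the names `q`, `prior`, `p` are ours, not from the text):

```python
import numpy as np

# Toy instance: two contexts, two secrets, two observables.
q = np.array([0.6, 0.4])                    # q(c): probability of each context
prior = np.array([[0.9, 0.1],               # pi(x | c), one row per context
                  [0.3, 0.7]])
# Context-dependent strategy p(y | x, c), indexed [c][x][y].
p = np.array([[[1.0, 0.0], [0.5, 0.5]],
              [[1.0, 0.0], [0.2, 0.8]]])

# Joint P(x, y), marginalising over the hidden context:
#   P(x, y) = sum_c q(c) pi(x|c) p(y|x,c)
joint = np.einsum('c,cx,cxy->xy', q, prior, p)

# Bayes' rule: posterior P(x | y), one column per observable y.
posterior = joint / joint.sum(axis=0, keepdims=True)
print(posterior)
```

Each column of `posterior` is the adversary's belief about the secret after seeing the corresponding output, exactly the quantity entering the leakage computation.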
The defender decides which observable to produce for each secret in each context, potentially using randomisation and benefiting from the ambiguity it can inject; we denote this (context-dependent) strategy by $p(y \mid x, c)$. As before, the strategy has to satisfy some operational constraints. We may have hard constraints prescribing which secrets can produce which observables, which in part determine which subsets of secrets can be conflated with each other. In the previous sections, we expressed these "hard" operational constraints through $\mathcal{D}$, the set of permissible secret–observable pairs. In the presence of contexts, in the most general form, the permissible observables for a secret may depend on the context as well; thus, $\mathcal{D}$ should now be a subset of $\mathcal{C} \times \mathcal{X} \times \mathcal{Y}$. However, for the result of this section, we assume that these constraints are context-independent, i.e., the same subset of observables is permissible for a secret irrespective of the context, so we keep $\mathcal{D}$ a subset of $\mathcal{X} \times \mathcal{Y}$.
Likewise, there can be soft operational constraints in the form of a minimum expected utility. The expectation is now taken with respect to the context as well; that is, the expectation of the payoff with respect to the joint distribution of the context, the secret and the observable must be no less than a threshold $u_0$. However, for the result of this section, we assume that the payoff function $u$, i.e., the measure of "goodness" of each observable for each secret, does not depend on the context. Hence,
$$\sum_{c,x,y} q(c)\,\pi(x \mid c)\,p(y \mid x, c)\,u(x, y) \;\ge\; u_0.$$
As before, without loss of generality, assume that we are dealing with core-concave entropy functions, i.e., entropies of the form $\eta(F(\cdot))$ where $F$ is concave and $\eta$ is increasing. Moreover, note that, again, the choice of the strategy cannot affect the prior entropy of the secret. Hence, the problem of designing for minimum leakage is again equivalent to maximising the posterior entropy. Ignoring $\eta$, since it is just an increasing scalar function, the posterior entropy (as the objective of the maximisation) can hence be written as
$$\max_{p}\;\sum_{y}\Big(\sum_{c,x} q(c)\,\pi(x \mid c)\,p(y \mid x, c)\Big)\, F\big(\mathbb{P}(\cdot \mid y)\big), \tag{15}$$
where $\mathbb{P}(\cdot \mid y)$ is the posterior distribution of the secret given output $y$. The constraints of the optimisation are:
$$p(y \mid x, c) \;\ge\; 0 \quad \forall\, c, x, y; \tag{16a}$$
$$\sum_{y} p(y \mid x, c) \;=\; 1 \quad \forall\, c, x; \tag{16b}$$
$$p(y \mid x, c) \;=\; 0 \quad \forall\, c,\ \forall\,(x, y) \notin \mathcal{D}; \tag{16c}$$
$$\sum_{c,x,y} q(c)\,\pi(x \mid c)\,p(y \mid x, c)\,u(x, y) \;\ge\; u_0. \tag{16d}$$
Given any "context-dependent" strategy $p$, we define a corresponding "context-independent" strategy $\bar{p}$ as follows:
$$\bar{p}(y \mid x) \;:=\; \sum_{c} \mathbb{P}(c \mid x)\, p(y \mid x, c), \qquad \mathbb{P}(c \mid x) \;=\; \frac{q(c)\,\pi(x \mid c)}{\sum_{c'} q(c')\,\pi(x \mid c')}.$$
To be precise, the strategy is $\bar p$ lifted back to contexts, such that for any $c \in \mathcal{C}$, $\bar p(y \mid x, c) = \bar p(y \mid x)$, i.e., $\bar p$ represents playing the same randomised strategy irrespective of the context. This context-free strategy is a mixture of the context-dependent strategies with weights equal to the conditional probability of the context given the secret. In other words, $\bar p$ "marginalises away" the dependence of $p$ on the context. Note, however, that we cannot marginalise away the dependence on $X$, because of the input-dependent constraints: these input-dependent constraints are exactly why trivial solutions such as an input-independent channel $p(y)$ are not acceptable.
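This marginalisation can be sketched directly in code; the toy numbers below are an assumed instance of our own choosing:

```python
import numpy as np

q = np.array([0.6, 0.4])                    # q(c): context probabilities
prior = np.array([[0.9, 0.1],               # pi(x | c), one row per context
                  [0.3, 0.7]])
p = np.array([[[1.0, 0.0], [0.5, 0.5]],     # p(y | x, c), indexed [c][x][y]
              [[1.0, 0.0], [0.2, 0.8]]])

# Mixing weights P(c | x) = q(c) pi(x|c) / sum_c' q(c') pi(x|c').
w = q[:, None] * prior                      # q(c) pi(x|c), shape (c, x)
c_given_x = w / w.sum(axis=0, keepdims=True)

# bar_p(y | x) = sum_c P(c | x) p(y | x, c): mix the per-context rows.
bar_p = np.einsum('cx,cxy->xy', c_given_x, p)
print(bar_p)   # each row is a distribution over outputs
```

The rows of `bar_p` sum to one, which is exactly the legitimacy check carried out in the text.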
First, we show that $\bar p$ is itself a legitimate strategy:
$\bar p(y \mid x) \ge 0$: trivially (a sum of products of non-negative terms).
$\sum_y \bar p(y \mid x) = 1$; this is because
$$\sum_{y}\sum_{c} \mathbb{P}(c \mid x)\,p(y \mid x, c) \;=\; \sum_{c} \mathbb{P}(c \mid x)\sum_{y} p(y \mid x, c) \;=\; \sum_{c} \mathbb{P}(c \mid x) \;=\; 1,$$
where we first exchanged the order of the summations, and then respectively used the facts that $p(\cdot \mid x, c)$ and $\mathbb{P}(\cdot \mid x)$ are conditional distributions.
We now show that the expected payoff under strategy $\bar p$ is the same as the expected payoff under strategy $p$. Therefore, if $p$ satisfies the minimum expected payoff constraint, so does $\bar p$. For this purpose, we establish the following lemma, which we will use later:
Lemma 1. Let $\mathbb{P}$ and $\bar{\mathbb{P}}$ denote the induced (joint) distributions on $\mathcal{X} \times \mathcal{Y}$ where, respectively, strategies $p$ and $\bar p$ are employed. Then we have $\mathbb{P}(x, y) = \bar{\mathbb{P}}(x, y)$ for all $x$ and $y$.
Proof. Writing $\bar\pi(x) := \sum_c q(c)\,\pi(x \mid c)$ for the marginal distribution of the secret, we have $\bar{\mathbb{P}}(x, y) = \bar\pi(x)\,\bar p(y \mid x) = \bar\pi(x)\sum_c \mathbb{P}(c \mid x)\,p(y \mid x, c) = \sum_c q(c)\,\pi(x \mid c)\,p(y \mid x, c) = \mathbb{P}(x, y)$. ☐
Now, the equality of the expected payoff under these two strategies follows as a simple corollary:
$$\mathbb{E}_{\bar p}[u] \;=\; \sum_{c,x,y} q(c)\,\pi(x \mid c)\,\bar p(y \mid x, c)\,u(x, y) \;=\; \sum_{x,y} \bar{\mathbb{P}}(x, y)\,u(x, y) \;=\; \sum_{x,y} \mathbb{P}(x, y)\,u(x, y) \;=\; \mathbb{E}_{p}[u].$$
The second equality holds because $\bar p(y \mid x, c)$ does not depend on $c$, and $\sum_c q(c)\,\pi(x \mid c) = \bar\pi(x)$. The third equality is due to Lemma 1.
The hard constraints are also satisfied by $\bar p$, trivially: for any $(x, y) \notin \mathcal{D}$, every $p(y \mid x, c)$ is zero, and hence so is their mixture $\bar p(y \mid x)$. Note that we made the assumption that the hard constraints do not depend on the context, but only on the input.
Next, we show that replacing any context-dependent strategy with its context-independent counterpart leads to the same leakage (irrespective of the choice of the entropy).
Lemma 2. Let $H_{p}$ and $H_{\bar p}$ denote the posterior entropies where strategy $p$ and its corresponding $\bar p$ are used, respectively. Then we have $H_{p} = H_{\bar p}$.
Proof. This is also a direct consequence of Lemma 1, once we notice that the posterior entropy is completely determined by the joint distribution of the secret and the output. ☐
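Both lemmas are easy to check numerically. The sketch below (toy numbers of our own choosing; Shannon entropy stands in for a generic core-concave entropy) verifies that a context-dependent strategy and its marginalisation induce the same joint distribution, and hence the same posterior entropy:

```python
import numpy as np

def shannon(v):
    """Shannon entropy (bits) of a probability vector."""
    v = v[v > 1e-12]
    return float(-(v * np.log2(v)).sum())

def post_entropy(joint):
    """H(X | Y) computed from a joint P(x, y) array."""
    return sum(joint[:, y].sum() * shannon(joint[:, y] / joint[:, y].sum())
               for y in range(joint.shape[1]) if joint[:, y].sum() > 1e-12)

q = np.array([0.6, 0.4])                    # q(c)
prior = np.array([[0.9, 0.1], [0.3, 0.7]])  # pi(x | c)
p = np.array([[[1.0, 0.0], [0.5, 0.5]],     # p(y | x, c), indexed [c][x][y]
              [[1.0, 0.0], [0.2, 0.8]]])

# Marginalise away the context: bar_p(y|x) = sum_c P(c|x) p(y|x,c).
w = q[:, None] * prior
bar_p = np.einsum('cx,cxy->xy', w / w.sum(axis=0, keepdims=True), p)

joint_p = np.einsum('c,cx,cxy->xy', q, prior, p)      # under p
joint_bar = np.einsum('c,cx,xy->xy', q, prior, bar_p)  # under bar_p

print(np.allclose(joint_p, joint_bar))                 # Lemma 1
print(post_entropy(joint_p), post_entropy(joint_bar))  # Lemma 2: equal
```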
This in turn implies that the search for optimal channels can be restricted to the context-independent ones. We are now ready for the main result of this section: the optimal channel design problem with uncertainty about the prior can be reduced to the classical case of Equation (7):
Theorem 2. The optimisation in Equation (15) subject to Equation (16) can be simplified to an instance of Equation (7) in which the prior distribution is the context-average prior, i.e., $\bar\pi(x) = \sum_c q(c)\,\pi(x \mid c)$. In particular, if $p^*$ is an optimal solution of Equation (7) with the average prior, then an optimal solution of Equation (15) subject to Equation (16) is to play $p(y \mid x, c) = p^*(y \mid x)$ for all $c \in \mathcal{C}$.
Proof. This proof follows a similar argument as above. In particular, if we let $p(y \mid x, c) = p^*(y \mid x)$ for all $c \in \mathcal{C}$, the constraints of Equations (16a)–(16c) follow directly from feasibility of $p^*$ for Equation (7). Now, let the joint probability on $\mathcal{X} \times \mathcal{Y}$ induced by $p$ and $p^*$ be respectively denoted by $\mathbb{P}$ and $\mathbb{P}^*$. Then $\mathbb{P}(x, y) = \sum_c q(c)\,\pi(x \mid c)\,p^*(y \mid x) = \bar\pi(x)\,p^*(y \mid x)$. On the other hand, $\mathbb{P}^*(x, y) = \pi(x)\,p^*(y \mid x)$, where $\pi$ is the prior used in Equation (7). Hence, by taking this prior to be the average prior $\bar\pi$, we ensure that $\mathbb{P} = \mathbb{P}^*$. This in turn implies that $p$ satisfies Equation (16d) and, further, has the same posterior entropy as that of $p^*$, which by construction has the highest value. Finally, from Lemma 2, this is also the highest value across all (potentially context-dependent) channels. ☐
6.1. Game-Theoretical Interpretation
Let us now consider the implications of Theorem 2 for our game-theoretical interpretation. Notice first that we can cast the uncertainty on the prior as a Bayesian game over the games defined in Section 5: Nature chooses one of the possible priors, and the players then play the game corresponding to that prior. Theorem 2 says that the defender's optimal strategy in this Bayesian game is to play the Nash equilibrium strategy from the game corresponding to the average prior.
The adversary has to best respond to the defender's move (as the defender plays first), and, since the adversary does not know the subgame chosen by Nature but only sees the move played by the defender (all subgames in the Bayesian game have the same set of moves), he can only best respond over the average prior. Hence, the Nash equilibrium of the Bayesian game over a set of priors is given by the Nash equilibrium of the game over the average prior, as specified in Theorem 2.
6.2. Discussion
As mentioned at the beginning of this section, an alternative heuristic is to play the best channel for each context. One can argue that, if the "good" priors that lead to a particularly strong channel have a high probability, it may be better to play this heuristic. However, as we established in Theorem 2, this heuristic is suboptimal. For a numerical depiction, in Figure 3, we have plotted the posterior entropy achieved by the optimal strategy per Theorem 2 against that of the heuristic strategy of playing the best channel for each prior. As we can see, for any weight of the two priors (except trivially when the weight is either 0 or 1, where the two strategies become the same), the optimal strategy strictly outperforms the heuristic.
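While we cannot reproduce Figure 3 here, the following self-contained sketch illustrates the comparison on an assumed toy instance of our own (three secrets, two contexts, hard constraints and priors chosen by us, Shannon entropy): it grid-searches a one-parameter channel family and confirms that optimising once for the average prior is never worse than the per-context heuristic, as Theorem 2 guarantees:

```python
import numpy as np

def shannon(v):
    v = v[v > 1e-12]
    return float(-(v * np.log2(v)).sum())

def posterior_entropy(q, priors, channels):
    """H(X | Y) for an adversary who sees Y but not the context C."""
    joint = np.einsum('c,cx,cxy->xy', q, priors, channels)  # P(x, y)
    return sum(joint[:, y].sum() * shannon(joint[:, y] / joint[:, y].sum())
               for y in range(joint.shape[1]) if joint[:, y].sum() > 1e-12)

# Hard constraints: x0 -> y0 only, x2 -> y1 only, x1 -> either output.
def channel(t):
    return np.array([[1.0, 0.0], [t, 1.0 - t], [0.0, 1.0]])

def best_t(prior, grid=2001):
    """Grid-search the feasible family for the entropy-maximising channel."""
    ts = np.linspace(0.0, 1.0, grid)
    hs = [posterior_entropy(np.ones(1), prior[None], channel(t)[None])
          for t in ts]
    return ts[int(np.argmax(hs))]

q = np.array([0.5, 0.5])                       # context probabilities
priors = np.array([[0.7, 0.2, 0.1],            # pi(x | c), one row per context
                   [0.1, 0.3, 0.6]])

# Theorem 2: optimise once, for the context-averaged prior.
t_avg = best_t(q @ priors)
h_avg = posterior_entropy(q, priors, np.stack([channel(t_avg)] * 2))

# Heuristic: optimise separately for each context's own prior.
t_per = [best_t(pr) for pr in priors]
h_heur = posterior_entropy(q, priors, np.stack([channel(t) for t in t_per]))

print(t_avg, t_per, h_avg, h_heur)
```

Depending on the instance, the gap can be strict or vanish (e.g., under symmetric priors the two strategies coincide); what always holds is `h_avg >= h_heur` up to grid resolution.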