Article

Variable-Length Coding with Zero and Non-Zero Privacy Leakage †

Division of Information Science and Engineering, KTH Royal Institute of Technology, 100-44 Stockholm, Sweden
* Author to whom correspondence should be addressed.
† This work was presented in part at the 2023 IEEE Workshop on Information Forensics and Security (WIFS), Nuremberg, Germany, 4–7 December 2023.
Entropy 2025, 27(2), 124; https://doi.org/10.3390/e27020124
Submission received: 13 December 2024 / Revised: 3 January 2025 / Accepted: 7 January 2025 / Published: 24 January 2025
(This article belongs to the Special Issue Information-Theoretic Security and Privacy)

Abstract: A private compression design problem is studied, where an encoder observes useful data Y, wishes to compress them using a variable-length code, and communicates them through an unsecured channel. Since Y is correlated with the private attribute X, the encoder uses a private compression mechanism to design an encoded message C and sends it over the channel. An adversary is assumed to have access to the output of the encoder, i.e., C, and tries to estimate X. Furthermore, it is assumed that both the encoder and decoder have access to a shared secret key W. In this work, the design goal is to encode the message C with the minimum possible average length that satisfies certain privacy constraints. We consider two scenarios: 1. zero privacy leakage, i.e., perfect privacy (secrecy); 2. non-zero privacy leakage, i.e., a non-perfect privacy constraint. Considering the perfect privacy scenario, we first study two different privacy mechanism design problems and find upper bounds on the entropy of the optimizers by solving a linear program. We use the obtained optimizers to design C. We strengthen the existing bounds in two cases: 1. |X| ≥ |Y|; 2. the joint distribution of (X, Y) belongs to a specific class. In particular, considering the second case, we use two-part construction coding to achieve the upper bounds. Furthermore, in a numerical example, we study the obtained bounds and show that they can improve existing results. Finally, we strengthen the obtained bounds using the minimum entropy coupling concept and a greedy entropy-based algorithm. Considering the non-perfect privacy scenario, we find upper and lower bounds on the average length of the encoded message using different privacy metrics and study them in special cases. For achievability, we use two-part construction coding and extended versions of the functional representation lemma. Lastly, in an example, we show that the bounds can be asymptotically tight.

1. Introduction

In this work, the random variable (RV) Y denotes the useful data and is correlated with the private data denoted by RV X. An encoder wishes to compress Y and communicates it to a user over an unsecured channel. The encoded message is denoted by the RV C . As shown in Figure 1, it is assumed that an adversary has access to the encoded message C , and wants to extract information about X. Moreover, it is assumed that the encoder and decoder have access to a shared secret key denoted by the RV W with size M. The goal is to design an encoded message C , which compresses Y, using a variable-length code with the minimum possible average length that satisfies certain privacy constraints. We utilize techniques used in privacy mechanism and compression design problems and combine them to build such a code C . In this paper, we introduce an approach and apply it to a lossless compression problem.
In this paper, we consider two scenarios, where the privacy leakage is either zero or a bounded leakage is allowed. We refer to these as the perfect and non-perfect privacy scenarios. In the perfect privacy scenario, considering two different cases, we extend previously existing results [1,2]. In the non-perfect privacy scenario, we extend the results of [1,2] by generalizing the perfect privacy constraint and allowing non-zero leakage. We use different privacy leakage constraints, e.g., requiring the mutual information between X and C to equal ϵ, a strong privacy constraint, and a bounded per-letter privacy criterion.

1.1. Related Works

Recently, the compression and privacy mechanism design problems have received increased attention [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]. Specifically, in [3], the notion of perfect secrecy was introduced by Shannon, where the public data and private data are statistically independent. Equivocation as a measure of information leakage for information theoretic security was used in [4,5,6]. A rate-distortion approach to information theoretic secrecy was studied in [11]. Maximal leakage was introduced in [12] and used in [13] for the Shannon cipher system. Furthermore, bounds on the privacy–utility trade-off have been derived. The fundamental limits of the privacy–utility trade-off when measuring privacy leakage using estimation-theoretic guarantees were studied in [7]. A privacy mechanism design with total variation as a privacy measure was studied in [8]. A source coding problem with secrecy was studied in [10]. The concept of a privacy funnel was introduced in [14], where the privacy–utility trade-off was studied considering the log-loss as privacy measure and a distortion measure for utility. In both [6,10], privacy–utility trade-offs considering equivocation and expected distortion as measures of privacy and utility were studied.
In [15], the problem of privacy–utility trade-off considering mutual information as measures of both privacy and utility was studied. It was shown that, under perfect privacy, the privacy mechanism design problem can be obtained using a linear program. This was generalized in [25] considering a privacy–utility trade-off with a rate constraint. Moreover, in [15], it was shown that non-zero information about useful data Y can be revealed if the kernel (leakage matrix) between private and useful data is not invertible. In [16], the work of [15] was extended by relaxing the perfect privacy assumption, allowing some small bounded privacy leakage. Specifically, privacy mechanisms with a per-letter (point-wise) privacy criterion considering an invertible kernel were designed, allowing a small leakage. This result was extended to a non-invertible leakage matrix in [17].
In [1], an approach to partial secrecy called secrecy by design was introduced and applied to two information processing problems: privacy mechanism design and lossless compression. For the privacy design problem, bounds on the privacy–utility trade-off were derived using the functional representation lemma. These results were obtained under a perfect privacy (secrecy) assumption. In [18], the privacy problems considered in [1] were extended by relaxing the perfect secrecy constraint and allowing some leakages. In [20], a privacy–utility trade-off with two different per-letter (point-wise) privacy constraints was studied. The fundamental limits of private data disclosure were studied in [26], where the goal was to minimize the leakage under the utility constraints with non-specific tasks, and it was shown that, under the assumption that the private data are an element-wise deterministic function of useful data, the main problem can be reduced to multiple privacy funnel (PF) problems. Furthermore, the exact solution to each problem was obtained. The concept of lift was studied in [27], which represents the likelihood ratio between the posterior and prior beliefs concerning sensitive features within a dataset. Further privacy leakage measures, such as the differential privacy introduced in [28], ( ϵ , δ ) -differential privacy [29], local differential privacy [30,31,32,33], maximal leakage [12], and lift [27], have been used in the literature.
Moreover, in [1], the problems of fixed-length and variable-length compression were studied, and upper and lower bounds on the average length of an encoded message were derived. These results were obtained under the assumption that the private data were independent of the encoded message, which corresponds to perfect privacy. The minimum entropy coupling concept and a greedy entropy-based algorithm were studied in [34,35,36]. More specifically, in [36], an information spectrum converse was derived and used to find a lower bound on the minimum entropy problem.

1.2. Our Contributions

In the perfect privacy scenario, we consider the problem in [1], where for the problem of lossless data compression, strong information theoretic guarantees are provided and fundamental limits are characterized when the private data are a deterministic function of the useful data. In this paper, we improve the bounds obtained in [1].
To this end, we combine the privacy design techniques used in [18], which are based on extended versions of the functional representation lemma (FRL) and strong functional representation lemma (SFRL), with the lossless data compression design in [1]. We find lower and upper bounds on the average length of the encoded message C and study them in different scenarios. Considering a specific set of joint distributions for (X, Y), we propose an algorithm with low complexity to find bounds on the optimizer of the privacy problems in [18] with zero leakage. We use the obtained results to design C, which achieves tighter bounds compared to [1]. Furthermore, when |X| ≥ |Y|, we propose a new code that improves on the bounds in [1]. We propose a new design and strengthen the obtained bounds using the minimum entropy coupling concept and a greedy entropy-based algorithm. Finally, in a numerical example, we study the bounds and compare them with [1].
In the non-perfect privacy scenario, our problem is closely related to [1], where we generalize the variable-length lossless compression problem considered in [1] by removing the assumption that X and C are independent, i.e., I ( X ; C ( Y , W ) ) = 0 , and therefore allowing a small leakage.
We find the lower and upper bounds on the average length of the encoded message C and study them in different scenarios. For achievability, we use two-part construction coding. In an example, we show that the obtained bounds can be asymptotically tight. Furthermore, in the case of perfect privacy, we improve the existing bounds of [1] for the case where X = (X_1, X_2) and X_2 is a deterministic function of X_1. The conference versions of this paper can be found in [22,23].
Our contribution can be summarized as follows:
(i) In Section 2, we define private variable-length coding problems with zero and non-zero leakage. We propose a method to deal with the privacy concerns and present privacy–utility trade-offs considering the zero and non-zero leakage scenarios.
(ii) In Section 3, we provide a brief overview of the privacy problems in the literature, two essential lemmas, and the minimum entropy coupling concept.
(iii) In Section 4, we find the upper and lower bounds on the trade-offs proposed in Section 2 considering the zero and non-zero leakage scenarios. In both scenarios, to find upper bounds, we use the two-part construction coding introduced in [1], which is based on extended versions of the FRL and the SFRL. We obtain a simple private compression mechanism design that can be optimal in specific scenarios. Moreover, we discuss and compare the obtained bounds, with each other and with the literature, and investigate their tightness. The results regarding the tightness of the bounds are extended using the concept of the minimum information coupling problem in the zero leakage scenario.
(iv) In Section 5, we present an application in which our method can be applied. Specifically, we consider a cache-aided network and discuss how our results in this paper can be used.
(v) In Section 6, we present an experiment considering the MNIST dataset and we compare our method with previous results in the literature.
Finally, the paper is concluded in Section 7.

2. System Model and Problem Formulation

Let P X Y denote the joint distribution of discrete random variables X and Y defined on alphabets X and Y . We assume that the cardinality | X | is finite and | Y | is finite or countably infinite. We represent P X Y by a matrix defined on R | X | × | Y | and marginal distributions of X and Y by vectors P X and P Y defined on R | X | and R | Y | given by the row and column sums of P X Y . The relation between X and Y is given by the leakage matrix P X | Y defined on R | X | × | Y | . The shared secret key is denoted by the discrete RV W defined on { 1 , , M } and is assumed to be accessible by both the encoder and decoder. Furthermore, we assume that W is uniformly distributed and is independent of X and Y. A prefix-free code with variable-length and shared secret key of size M is a pair of mappings:
(encoder) C : X × Y × {1, …, M} → {0, 1}*,   (decoder) D : {0, 1}* × {1, …, M} → Y.
The output of the encoder C ( Y , W ) describes the encoded message. The variable-length code ( C , D ) is lossless if
P ( D ( C ( X , Y , W ) , W ) = Y ) = 1 .

2.1. Non-Zero Privacy Leakage (Non-Perfect Privacy)

Similarly to [1], we define ϵ -private, strongly ϵ -private, and point-wise ϵ -private codes. The code ( C , D ) is ϵ-private if
I ( C ( X , Y , W ) ; X ) = ϵ .
Moreover, the code ( C , D ) is bounded ϵ-private if
I(C(X, Y, W); X) ≤ ϵ.
Let ξ be the support of C(X, W, Y). For any c ∈ ξ, let L(c) be the length of the codeword. The code (C, D) is of (α, M)-variable-length if
E(L(C(X, Y, w))) ≤ α, ∀ w ∈ {1, …, M}.
Finally, let us define the sets H_ϵ(α, M) and H_ϵ^b(α, M) as follows: H_ϵ(α, M) ≜ {(C, D): (C, D) is ϵ-private and of (α, M)-variable-length}, and H_ϵ^b(α, M) ≜ {(C, D): (C, D) is bounded ϵ-private and of (α, M)-variable-length}. The private compression design problems can then be stated as follows:
L(P_XY, M, ϵ) = inf_{(C, D): (C, D) ∈ H_ϵ(α, M)} α,
L_b(P_XY, M, ϵ) = inf_{(C, D): (C, D) ∈ H_ϵ^b(α, M)} α.
Clearly, we have L_b(P_XY, M, ϵ) ≤ L(P_XY, M, ϵ) and L_b(P_XY, M, ϵ) ≤ L(P_XY, M, 0).
Remark 1.
Considering ϵ = 0 , both (5) and (6) lead to the privacy–compression rate trade-off studied in [1]. In this paper, we generalize this trade-off by considering a non-zero ϵ .

2.2. Zero Privacy Leakage (Perfect Privacy)

We define a perfectly-private code. The code ( C , D ) is perfectly-private if
I ( C ( X , Y , W ) ; X ) = 0 .
Let ξ be the support of C(X, W, Y), and for any c ∈ ξ, let L(c) be the length of the codeword. The code (C, D) is (α, M)-variable-length if (4) holds. Finally, let us define the set H(α, M) as follows:
H(α, M) ≜ {(C, D): (C, D) is perfectly-private and (α, M)-variable-length}. The private compression design problem can be stated as follows:
L_0(P_XY, M) = inf_{(C, D): (C, D) ∈ H(α, M)} α.
Furthermore, let us recall the privacy mechanism design problems considered in [19] with zero leakage as follows:
g_0(P_XY) ≜ max_{P_{U|Y}: X − Y − U, I(U;X) = 0} I(Y;U),
h_0(P_XY) ≜ max_{P_{U|Y,X}: I(U;X) = 0} I(Y;U).
Finally, we define a set of joint distributions P̂_XY as follows:
P̂_XY ≜ {P_XY: g_0(P_XY) = h_0(P_XY)}.
Remark 2.
The problem (8) was studied in [1]. Here, we improve the bounds in [1] in two different cases and obtain more bounds using the concept of the minimum information coupling problem.

3. Preliminaries and Overview of the Privacy Problems

In this section, we provide a brief overview of a few related privacy problems. We then provide essential lemmas and definitions corresponding to the extended functional representation lemma, the extended strong functional representation lemma, and the minimum information coupling problem. Mutual information has been widely used as a privacy leakage measure [1,2,14,15,18,21,26]. Specifically, in [21], the following problem was addressed
g_ϵ(P_XY) ≜ sup_{P_{U|Y}: X − Y − U, I(U;X) ≤ ϵ} I(Y;U),
where in ([21], Lemma 1), lower and upper bounds on g_ϵ(P_XY) were derived. Considering perfect privacy, i.e., ϵ = 0, g_0(P_XY) can be obtained by solving a linear program ([15], Theorem 1). This implies that the optimal mapping is the solution to a linear program. Furthermore, it has been shown that, to find the optimal privacy-preserving mapping, it is sufficient to consider U such that |U| ≤ null(P_X|Y) + 1. However, solving the linear program suggested by [15] can become challenging when the size of Y or X increases. Necessary and sufficient conditions for achieving non-zero utility when the private data are hidden, i.e., g_0(P_XY) > 0, were derived in [37,38]. It was shown that g_0(P_XY) > 0 if and only if the rows of P_X|Y are linearly dependent. This result was demonstrated and generalized in [15], e.g., see ([15], Proposition 1). Moreover, considering an observable private data scenario, necessary and sufficient conditions for attaining non-zero utility, i.e., h_0(P_XY) > 0, were derived in ([1], Theorem 5), where
h_ϵ(P_XY) ≜ sup_{P_{U|Y,X}: I(U;X) ≤ ϵ} I(Y;U).
It has been shown that h 0 ( P X Y ) > 0 if and only if Y is not a deterministic function of X, i.e., H ( Y | X ) > 0 .
Next, we present a brief overview of the linear program proposed by [15] that attains g_0(P_XY). To do so, let β be a vector in R^{|Y|}. It is shown that β ∈ Null(P_X|Y) if and only if β ∈ Null(M), where M ∈ R^{|X| × |Y|} is constructed as follows. Let V be the matrix of right singular vectors of P_X|Y, i.e., P_X|Y = U Σ V^T and V = [v_1, v_2, …, v_{|Y|}]; then M is defined as
M ≜ [v_1, v_2, …, v_{rank(P_X|Y)}]^T.
Furthermore, if β ∈ Null(P_X|Y), then 1^T β = 0. As shown in [15], for every u ∈ U, if the Markov chain X − Y − U holds and I(X;U) = 0, then the vector P_{Y|U=u} belongs to the following convex polytope S:
S = {y ∈ R^{|Y|} : M y = M P_Y, y ≥ 0}.
Using this, we have
X − Y − U and I(X;U) = 0 ⟺ P_{Y|U=u} ∈ S, ∀u,
which results in the following equivalency
min_{P_{U|Y}: X − Y − U, I(X;U) = 0} H(Y|U) = min_{P_U, P_{Y|U=u} ∈ S, ∀u ∈ U: Σ_u P_U(u) P_{Y|U=u} = P_Y} H(Y|U),
where P_U, defined on R^{|U|}, is the marginal distribution of U. Finally, let P_{Y|U=u}, u ∈ U, be the minimizer of H(Y|U) over the set {P_{Y|U=u} ∈ S, u ∈ U : Σ_u P_U(u) P_{Y|U=u} = P_Y}; then, for every u ∈ U, P_{Y|U=u} must belong to the extreme points of S. Furthermore, it is sufficient to consider null(P_X|Y) + 1 extreme points of S. Using the previous result, the solution to the minimization in (15) can be obtained in two phases: in phase one, the extreme points of the set S are identified, while in phase two, proper weights over these extreme points are obtained to minimize the objective function. As argued in [15], the extreme points of S are the basic feasible solutions of S. Basic feasible solutions of S can be found in the following manner. Let Ω be the set of indices which correspond to rank(P_X|Y) linearly independent columns of M, i.e., |Ω| = rank(P_X|Y) and Ω ⊆ {1, …, |Y|}. Let M_Ω ∈ R^{rank(P_X|Y) × rank(P_X|Y)} be the sub-matrix of M with columns indexed by the set Ω. Then, if all elements of the vector M_Ω^{-1} M P_Y are non-negative, the vector V_Ω ∈ R^{|Y|}, which is defined in the following, is a basic feasible solution of S. Assume that Ω = {ω_1, …, ω_{rank(P_X|Y)}}, where ω_i ∈ {1, …, |Y|} and all elements are arranged in increasing order. The ω_i-th element of V_Ω is defined as the i-th element of M_Ω^{-1} M P_Y, i.e., for 1 ≤ i ≤ rank(P_X|Y), we have
V_Ω(ω_i) = [M_Ω^{-1} M P_Y](i).
The other elements of V Ω are set to zero. Thus, by using the same arguments as in [15], we obtain the following corollary.
Corollary 1.
Let S* denote the set of extreme points of S. If all elements of the vector M_Ω^{-1} M P_Y are non-negative, then V_Ω ∈ S* ⊆ S.
In the next proposition, we show two properties of each vector inside S .
Proposition 1.
Let Ω ⊆ {1, …, |Y|} with |Ω| = rank(P_X|Y). For every such Ω, we have 1^T M_Ω^{-1} M P_Y = 1.
Proof. 
Consider the set S_1 = {y ∈ R^{|Y|} : M y = M P_Y}. The elements of any y ∈ S_1 sum to one: since M(y − P_Y) = 0 implies y − P_Y ∈ Null(P_X|Y), we obtain 1^T(y − P_Y) = 0, i.e., 1^T y = 1. The basic solutions of S_1 are the vectors W_Ω defined as follows: let Ω = {ω_1, …, ω_{rank(P_X|Y)}}, where ω_i ∈ {1, …, |Y|}; then W_Ω(ω_i) = [M_Ω^{-1} M P_Y](i), and the other elements of W_Ω are zero. Thus, the sum of all elements of M_Ω^{-1} M P_Y equals one, since each element of S_1 sums to one.  □
After finding all extreme points of the set S, for the second phase, we proceed as follows. Assume that all extreme points of the set S are denoted by α_1, α_2, …, α_K, which were obtained in the first phase. Then, using the equivalency in (15), we have
g_0(P_XY) = H(Y) − min_{β ∈ R^K, β ≥ 0, [α_1 α_2 ⋯ α_K] β = P_Y} [H(α_1), H(α_2), …, H(α_K)] β,
where H ( α i ) is the entropy of the distribution vector α i .
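To make the two-phase procedure above concrete, the following Python sketch enumerates the basic feasible solutions of S and then solves the weighting linear program of the second phase. It is only an illustrative implementation with our own naming: it uses P_X|Y directly in place of M, which is equivalent here since the two matrices share the same null space, and the test input is the distribution of Example 2 below. Note that the LP may admit several optimal weightings β, so the entropy of the returned optimizer can differ from the particular solution reported in Example 2.

import itertools
import numpy as np
from scipy.optimize import linprog

def entropy(p):
    """Shannon entropy in bits of a pmf vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

def g0_via_linear_program(P_XgY, P_Y):
    """Two-phase solution of g_0(P_XY) following [15]:
    phase 1 enumerates extreme points of S = {y >= 0 : P_X|Y y = P_X},
    phase 2 finds non-negative weights beta with sum_k beta_k alpha_k = P_Y
    minimizing sum_k beta_k H(alpha_k)."""
    P_XgY = np.asarray(P_XgY, dtype=float)
    P_Y = np.asarray(P_Y, dtype=float)
    P_X = P_XgY @ P_Y                      # marginal of X
    r = np.linalg.matrix_rank(P_XgY)
    q = P_XgY.shape[1]

    # Phase 1: basic feasible solutions (candidate extreme points of S).
    extreme = []
    for cols in itertools.combinations(range(q), r):
        sub = P_XgY[:, cols]
        if np.linalg.matrix_rank(sub) < r:
            continue
        sol, _, *_ = np.linalg.lstsq(sub, P_X, rcond=None)
        if np.linalg.norm(sub @ sol - P_X) > 1e-9 or np.any(sol < -1e-9):
            continue
        v = np.zeros(q)
        v[list(cols)] = np.clip(sol, 0.0, None)
        if not any(np.allclose(v, e) for e in extreme):
            extreme.append(v)

    # Phase 2: LP over the weights beta placed on the extreme points.
    A_eq = np.column_stack(extreme)        # columns are alpha_1, ..., alpha_K
    c = np.array([entropy(a) for a in extreme])
    lp = linprog(c, A_eq=A_eq, b_eq=P_Y, bounds=(0, None), method="highs")
    beta = lp.x
    g0 = entropy(P_Y) - lp.fun             # g_0 = H(Y) - min H(Y|U)
    HU = entropy(beta[beta > 1e-9])        # entropy of the returned optimizer U
    return g0, HU

if __name__ == "__main__":
    # Test input: the joint distribution of Example 2 below (used only for illustration).
    P_XgY = [[1, 1, 1, 0, 0, 0],
             [0, 0, 0, 1, 1, 1]]
    P_Y = [1/8, 2/8, 3/8, 1/8, 1/16, 1/16]
    g0, HU = g0_via_linear_program(P_XgY, P_Y)
    print(f"g_0(P_XY) = {g0:.4f} bits, H(U) of the returned optimizer = {HU:.4f} bits")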
In this work, considering the zero leakage scenario, to design the code, we use the properties of the privacy problems defined in (12) and (13).
Next, we recall two important lemmas from [18].
Lemma 1.
(Extended Functional Representation Lemma (EFRL)): For a pair of RVs (X, Y) distributed according to P_XY supported on alphabets X and Y, respectively, where |X| is finite and |Y| is finite or countably infinite, and any 0 ≤ ϵ < I(X;Y), there exists a RV U defined on U such that the leakage between X and U is equal to ϵ, i.e., we have
I ( U ; X ) = ϵ ,
Y is a deterministic function of ( U , X ) , i.e., we have
H ( Y | U , X ) = 0 ,
and |U| ≤ [|X|(|Y| − 1) + 1][|X| + 1].
Lemma 2.
(Extended Strong Functional Representation Lemma (ESFRL)): For a pair of RVs (X, Y) distributed according to P_XY supported on alphabets X and Y, respectively, where |X| is finite and |Y| is finite or countably infinite with I(X;Y) < ∞, and any 0 ≤ ϵ < I(X;Y), there exists a RV U defined on U such that the leakage between X and U is equal to ϵ, i.e., we have
I ( U ; X ) = ϵ ,
Y is a deterministic function of ( U , X ) , i.e., we have
H ( Y | U , X ) = 0 ,
I ( X ; U | Y ) can be upper bounded as follows:
I(X;U|Y) ≤ α H(X|Y) + (1 − α) log(I(X;Y) + 1) + 4,
and |U| ≤ [|X|(|Y| − 1) + 2][|X| + 1], where α = ϵ/H(X).
The main idea in constructing U, which satisfies the conditions in Lemmas 1 and 2, is to add a randomized response to the output of the FRL and SFRL, respectively. The output refers to a random variable which is produced by the FRL or the SFRL and satisfies the corresponding constraints. In this work, considering the non-zero leakage scenario, we utilize the techniques used in Lemmas 1 and 2 to design the code. In [39], randomization techniques similar to those in Lemmas 1 and 2 were used to find fair and private representations of a dataset, leading to lower bounds. Furthermore, this randomization technique has also been applied in the design of differentially private mechanisms [40,41,42,43]. A similar randomization technique was used in [22,23] to address private compression design problems. Next, we recall the important results regarding the upper and lower bounds on the minimum entropy coupling obtained in [34,35,36]. First, we recall the definition of the minimum entropy coupling problem using ([36], Def. 2).
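The leakage effect of this randomized response can be checked numerically. The following sketch (toy distributions and helper names are our own) takes an auxiliary variable Ubar independent of X, reveals X alongside Ubar with probability α = ϵ/H(X) and an erasure symbol otherwise, and verifies that U = (Ubar, Z) satisfies I(U;X) = αH(X) = ϵ. It only illustrates how the randomization pins the leakage at ϵ; it is not the full EFRL construction of Lemma 1, where Ubar is the FRL output and therefore also guarantees H(Y|U,X) = 0.

import numpy as np

def mutual_information(P_AB):
    """I(A;B) in bits from a joint pmf matrix P_AB."""
    P_AB = np.asarray(P_AB, dtype=float)
    P_A = P_AB.sum(axis=1, keepdims=True)
    P_B = P_AB.sum(axis=0, keepdims=True)
    mask = P_AB > 0
    return float((P_AB[mask] * np.log2(P_AB[mask] / (P_A @ P_B)[mask])).sum())

# Toy private data X and an auxiliary Ubar independent of X (hypothetical example).
P_X = np.array([0.5, 0.3, 0.2])
P_Ubar = np.array([0.6, 0.4])
H_X = -(P_X * np.log2(P_X)).sum()
eps = 0.4                       # target leakage, must satisfy 0 <= eps <= H(X)
alpha = eps / H_X

# U = (Ubar, Z) with Z = X w.p. alpha and Z = erasure symbol w.p. 1 - alpha.
# Joint pmf over (X, Ubar, Z); Z takes values {0, 1, 2, erasure} -> indices 0..3.
P = np.zeros((3, 2, 4))
for x in range(3):
    for u in range(2):
        P[x, u, x] += P_X[x] * P_Ubar[u] * alpha        # Z reveals X
        P[x, u, 3] += P_X[x] * P_Ubar[u] * (1 - alpha)  # Z is the erasure symbol

# Leakage I(U;X) with U = (Ubar, Z): flatten the (Ubar, Z) pair into one index.
P_X_U = P.reshape(3, 8)
print(f"I(U;X) = {mutual_information(P_X_U):.4f} bits, target eps = {eps:.4f}")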
Definition 1.
(minimum entropy coupling) For a set of PMFs {P_1, …, P_M}, the minimum Shannon entropy coupling is
H(P_1, …, P_M) ≜ inf_{P_{Y_1,…,Y_M}: Y_i ∼ P_i, ∀i} H(Y_1, …, Y_M),
where the infimum is taken over all joint distributions P_{Y_1,…,Y_M} whose marginal distribution P_{Y_i} equals P_i for all i ∈ {1, …, M}.
Similarly to [36], for a given joint distribution P_XY, let the minimum entropy of a functional representation of (X, Y) be defined as
H(P_XY) ≜ inf_{U: H(Y|X,U) = 0, I(X;U) = 0} H(U).
Remark 3.
As shown in ([36], Lemma 1), the minimum entropy functional representation and the minimum entropy coupling are related functions. In more detail, H ( P X Y ) equals the minimum entropy coupling of the set of PMFs { P Y | X = x 1 , , P Y | X = x n } , where X = { x 1 , , x n } .
Let G_S be the output of the greedy entropy-based algorithm proposed in ([34], Section 3), i.e., H(P_XY) ≤ H(G_S). More specifically, the corresponding algorithm aims to solve (19) but does not achieve the optimal solution in general. Next, we recall a result obtained in [35], which shows that G_S is optimal within log e/e ≈ 0.53 bits when |X| = 2 and within (1 + log e)/2 ≈ 1.22 bits when |X| > 2. Let U achieve the optimal solution of (19), i.e., H(U) = H(P_XY).
Theorem 1
([35], Th. 3.4, Th. 4.1, Th. 4.2). Let (X, Y) ∼ P_XY and have finite alphabets. When X is binary, we have
H(Profile) ≤ H(U) ≤ H(G_S) ≤ H(Profile) + log e/e ≈ H(Profile) + 0.53.
Moreover, for |X| > 2, we have
H(Profile) ≤ H(U) ≤ H(G_S) ≤ H(Profile) + (1 + log e)/2 ≈ H(Profile) + 1.22.
Here, Profile corresponds to the profile method proposed in ([35], Section 3).
Since finding the optimal solution for (19) is challenging, Theorem 1 asserts that G S can be a good approximation for U that achieves (19). Next, we present the results on lower bounds on H ( P X Y ) obtained in a parallel work [36]. The lower bounds were derived using information spectrum and majorization concepts.
Theorem 2
([36], Corollary 2, Th. 2). Let (X, Y) ∼ P_XY and have finite alphabets. Let α = 1 in ([36], Corollary 2, Th. 2). We have
H(∧_{x∈X} P_{Y|x}) ≤ H(Q) ≤ H(U),
where ∧ corresponds to the greatest lower bound with respect to majorization and Q is defined in ([36], Lemma 3).
Remark 4.
In contrast with [35], the lower bounds in [36] were obtained using Rényi entropy in (19). In this work, we consider Shannon entropy, which is a special case of Rényi entropy.
Remark 5.
As argued in ([36], Remark 1), for α = 1 the (largest) lower bounds obtained in [35,36] match. Thus, using Theorem 1, for binary X, we have
H(Q) ≤ H(U) ≤ H(G_S) ≤ H(Q) + log e/e,
and for | X | > 2 ,
H(Q) ≤ H(U) ≤ H(G_S) ≤ H(Q) + (1 + log e)/2.
Moreover, in some cases, the lower bound H ( Q ) is tight, e.g., see ([36], Example 2).
As discussed in [36], Q can be obtained by a greedy construction. To do so, let Q = (q_1, q_2, …) with q_1 ≥ q_2 ≥ …, where q_i = P(Q = q_i). Let P_Y|X be a matrix with columns P_{Y|X=x}, where each column is a conditional distribution vector, and re-order each column so that its entries are in descending order. Let q_1 = min_{x∈X} {max_{y∈Y} P_{Y|X}(y|x)}; in other words, we choose the smallest number in the first row of the matrix P_Y|X. Next, we subtract q_1 from the first row, reorder each column, and update the matrix. We then choose the smallest number from the first row of the updated matrix and denote it by q_2. We continue this procedure until the summation of the q_i reaches one. For an example, see ([36], Example 1).
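The greedy construction of Q takes only a few lines of code. The sketch below (function names are our own) implements the procedure just described; as a check, for the conditional distributions of Example 3 in Section 4.3 it returns P_Q = [1/2, 1/4, 1/6, 1/12] and H(Q) ≈ 1.73 bits.

import numpy as np

def greedy_Q(P_YgX, tol=1e-12):
    """Greedy construction of Q from [36]: the columns of P_YgX are the
    conditional pmfs P_{Y|X=x}; at each step take the smallest of the
    per-column maxima, record it, and subtract it from each column's maximum."""
    cols = [np.sort(np.asarray(c, dtype=float))[::-1]
            for c in np.asarray(P_YgX, dtype=float).T]
    q = []
    while sum(q) < 1.0 - tol:
        qi = min(c[0] for c in cols)              # smallest entry of the (sorted) first row
        q.append(qi)
        for c in cols:                            # subtract it from each column's largest entry ...
            c[0] -= qi
        cols = [np.sort(c)[::-1] for c in cols]   # ... and re-sort every column
    return np.array(q)

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

if __name__ == "__main__":
    # P_{Y|X} of Example 3 below (columns indexed by x); used only for illustration.
    P_YgX = np.array([[1/6, 0],
                      [1/3, 0],
                      [1/2, 0],
                      [0, 1/2],
                      [0, 1/4],
                      [0, 1/4]])
    Q = greedy_Q(P_YgX)
    print("P_Q =", np.round(Q, 4), " H(Q) =", round(entropy(Q), 4), "bits")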
Finally, we recall the two-part construction coding used in [1]. Consider the scenario illustrated in Figure 1, where an encoder wishes to compress Y, using a variable-length code with the minimum possible average length that satisfies a zero leakage constraint, and communicates it to a user over an unsecured channel; an adversary has access to the encoded message C and wants to extract information about X. The encoder and decoder have access to a shared secret key, denoted by the discrete RV W defined on {1, …, M}, which is uniformly distributed and independent of X and Y.
Theorem 3
([1], Theorem 8). Let the pair of RVs (X, Y) be distributed according to P_XY and the shared secret key size be |X|, i.e., M = |X|. Then, we have
L_0(P_XY, |X|) ≤ Σ_{x∈X} H(Y|X=x) + 1 + log(|X|),
and if |Y| is finite, we have
L_0(P_XY, |X|) ≤ log(|X|(|Y| − 1) + 1) + log(|X|).
Finally, if X is a deterministic function of Y, we have
L_0(P_XY, |X|) ≤ log(|Y| − |X| + 1) + log(|X|).
Proof. 
Let W be the shared secret key with key size M = |X|, which is uniformly distributed over {1, …, M} = {1, …, |X|} and independent of (X, Y). As shown in Figure 2, first, the private data X are encoded using the shared secret key. Thus, we have
X̃ = X + W mod |X|.
Next, we show that X̃ has a uniform distribution over {1, …, |X|} and I(X; X̃) = 0. We have
H(X̃|X) = H(X + W|X) = H(W|X) = H(W) = log(|X|).
Furthermore, H(X̃|X) ≤ H(X̃), and combining this with (28), we obtain H(X̃|X) = H(X̃) = log(|X|). For encoding X̃, we use log(|X|) bits. Let C_1 denote the encoded X̃. Next, by using the construction in ([1], Lemma 2), let U be the output of the FRL. Thus, we have
(29) I(U;X) = 0,  (30) H(Y|U,X) = 0,  (31) H(U) ≤ Σ_x H(Y|X=x) + 1.
We encode U using any lossless code which uses at most H(U) + 1 bits and let C_2 be the encoded message. We send C = (C_1, C_2) over the channel. Thus, (25) can be obtained. Note that, as shown in Figure 3, on the decoder side, we first decode X using W, by adding |X| − W to X̃ (modulo |X|); then, Y is decoded using the fact that U satisfies H(Y|U,X) = 0. Next, we show that if W is independent of (X, U), we obtain I(C;X) = 0. We have
I(C;X) = I(C_1, C_2; X) = I(U, X̃; X) = I(U;X) + I(X̃;X|U) =^{(a)} I(X̃;X|U) = H(X̃|U) − H(X̃|X,U) = H(X̃|U) − H(X + W|X,U) =^{(b)} H(X̃|U) − H(W) =^{(c)} H(X̃) − H(W) = log(|X|) − log(|X|) = 0,
where (a) follows from I(U;X) = 0; (b) follows, since W is independent of (X, U); and (c) from the independence of U and X̃. The latter follows, since we have
0 ≤ I(X̃;X|U) = H(X̃|U) − H(W) =^{(i)} H(X̃|U) − H(X̃) ≤ 0.
Thus, X̃ and U are independent. Step (i) above follows from the fact that W and X̃ are uniformly distributed over {1, …, |X|}, i.e., H(W) = H(X̃). In summary, if we choose W independently of (X, U), the leakage to the adversary is zero.  □
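The first part of this construction is easy to verify numerically: with W uniform and independent of X, the padded symbol X̃ = X + W mod |X| is uniformly distributed and reveals nothing about X. The sketch below (toy distribution and helper names are our own) checks this; the FRL-based second part C_2 is not constructed here.

import numpy as np

def mutual_information(P_AB):
    """I(A;B) in bits from a joint pmf matrix."""
    P_AB = np.asarray(P_AB, dtype=float)
    P_A = P_AB.sum(axis=1, keepdims=True)
    P_B = P_AB.sum(axis=0, keepdims=True)
    mask = P_AB > 0
    return float((P_AB[mask] * np.log2(P_AB[mask] / (P_A @ P_B)[mask])).sum())

# Toy private data X on {0, ..., K-1} (0-indexed instead of {1, ..., |X|}).
P_X = np.array([0.7, 0.2, 0.1])
K = len(P_X)

# One-time pad: Xtilde = (X + W) mod K, with W uniform on {0, ..., K-1} and independent of X.
P_X_Xtilde = np.zeros((K, K))
for x in range(K):
    for w in range(K):
        P_X_Xtilde[x, (x + w) % K] += P_X[x] * (1.0 / K)

print("P_Xtilde =", P_X_Xtilde.sum(axis=0))                      # uniform: [1/3, 1/3, 1/3]
print("I(X; Xtilde) =", round(mutual_information(P_X_Xtilde), 12), "bits")   # 0

# Decoder side: knowing W, X is recovered as (Xtilde - W) mod K, i.e., by adding K - W.
x, w = 2, 1
xtilde = (x + w) % K
assert (xtilde + (K - w)) % K == x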

4. Main Results

Here, we find upper and lower bounds on (5), (6) and (8), and study them in different scenarios.

4.1. Zero Leakage Results

In this section, we derive upper and lower bounds on L_0(P_XY, M) defined in (8). We first study the properties of P̂_XY and provide a sufficient condition on P_XY so that it belongs to the corresponding set. We then find upper bounds on the entropy of the optimizers of g_0(P_XY) and h_0(P_XY). The obtained bounds help us to find upper bounds on L_0(P_XY, M). In other words, for P_XY ∈ P̂_XY, we use the solutions of g_0(P_XY) and h_0(P_XY) to design C; however, in [1], the design of the code was based on the functional representation lemma. Finally, we provide lower bounds on L_0(P_XY, M), study the bounds in a numerical example, and compare them with [1].
In the next lemma, let C ( X , Y ) denote the common information between X and Y, where the common information corresponds to the Wyner [44] or Gács-Körner [45] notions of common information.
Lemma 3.
If C(X, Y) = I(X;Y), then P_XY ∈ P̂_XY.
Proof. 
The proof is provided in ([18], Proposition 6) with ϵ = 0 and follows similar arguments as the proof of ([46], Theorem 2).  □
Corollary 2.
If X is a deterministic function of Y, then P_XY ∈ P̂_XY, since, in this case, we have C(X, Y) = I(X;Y). Furthermore, this result was also shown in ([18], Proposition 6).
In the next result, we provide the properties of the optimizers for g 0 ( P X Y ) and h 0 ( P X Y ) .
Lemma 4.
If g 0 ( P X Y ) = h 0 ( P X Y ) , then
g 0 ( P X Y ) = h 0 ( P X Y ) = H ( Y | X ) ,
and both g 0 ( P X Y ) and h 0 ( P X Y ) have the same set of optimizers. Furthermore, for any optimizer U , we have
(33) H ( Y | U , X ) = 0 , (34) I ( X ; U | Y ) = 0 , (35) I ( X ; U ) = 0 .
Finally, if U satisfies (33)–(35), then it is an optimizer for both g 0 ( P X Y ) and h 0 ( P X Y ) .
Proof. 
All the statements can be shown using ([1], Theorem 7) and the key equation as follows:
I(U;Y) = I(X;U) + H(Y|X) − I(X;U|Y) − H(Y|X,U).
 □
Remark 6.
The optimization problems in (9) and (10) do not have unique optimizers. For instance, let X = f ( Y ) , by using the constructions used in [1,47], we can attain the optimum value.
Next, we show that if P_XY ∈ P̂_XY, without loss of optimality, we can assume that |U| ≤ null(P_X|Y) + 1, where null(P_X|Y) corresponds to the dimension of the null space of P_X|Y.
Lemma 5.
For any P X Y , we have
g_0(P_XY) = max_{P_{U|Y}: X − Y − U, I(X;U) = 0, |U| ≤ null(P_X|Y) + 1} I(Y;U).
Furthermore, let P_XY ∈ P̂_XY. Then,
h_0(P_XY) = g_0(P_XY) (38) = max_{P_{U|Y}: X − Y − U, I(X;U) = 0, |U| ≤ null(P_X|Y) + 1} I(Y;U) (39) = max_{P_{U|Y,X}: I(X;U) = 0, |U| ≤ null(P_X|Y) + 1} I(Y;U).
Proof. 
The proof of (37) is provided in ([15], Theorem 1). It only remains to show the equality (38) = (39). Let U be an optimizer of (38). Using (37), it is also an optimizer of g_0(P_XY) and hence h_0(P_XY). Thus, (38) = (39). In other words, for P_XY ∈ P̂_XY, we can assume |U| ≤ null(P_X|Y) + 1 in both problems g_0(P_XY) and h_0(P_XY).  □
Before stating the next result, let us define a set U 1 ( P X Y ) and a function K ( P X Y ) as follows
(40) U_1(P_XY) ≜ {U: U satisfies (33), (34), (35)},  (41) K(P_XY) ≜ min_{U ∈ U_1(P_XY)} H(U).
Note that U_1(P_XY) can be empty for some joint distributions P_XY; however, by Lemma 4, when P_XY ∈ P̂_XY it is the set that contains all the optimizers satisfying g_0(P_XY) = h_0(P_XY). Furthermore, the function K(P_XY) gives the minimum entropy among all such optimizers.
Lemma 6.
Let P_XY ∈ P̂_XY. We have
K(P_XY) ≤ log(null(P_X|Y) + 1).
Proof. 
The proof follows from Lemma 5. Let U be any optimizer of g_0(P_XY) that satisfies |U| ≤ null(P_X|Y) + 1. We have K(P_XY) ≤ H(U) ≤ log(null(P_X|Y) + 1).  □
Remark 7.
Lemma 6 asserts that when P_XY ∈ P̂_XY, there exists a U that satisfies (33)–(35) and
H(U) ≤ log(null(P_X|Y) + 1).
Remark 8.
The upper bound derived in Lemma 6 can be significantly smaller than the upper bound obtained in ([1], Lemma 2). This helps us to find codes that have a smaller average length compared to [1]. Note that ([1], Lemma 2) asserts that there exists a U which satisfies (33) and (35), and
|U| ≤ |X|(|Y| − 1) + 1,
which results in
H(U) ≤ log(|X|(|Y| − 1) + 1).
In the next example, we compare (44) and (43).
Example 1.
Let Y be a deterministic function of X, i.e., Y = f(X), which yields P_XY ∈ P̂_XY and |Y| ≤ |X|. In this case, null(P_X|Y) ≤ |X| − 1. We have
null(P_X|Y) ≤ |X| − 1 < |X|(|Y| − 1),
which results in
( 43 ) < ( 44 ) .
Furthermore, as |X| and |Y| grow, the gap between (43) and (44) becomes larger. To see another example, let X and Y be represented by (X′, V) and (Y′, V), where X′ and Y′ are conditionally independent given V; then, by using [48], we have I(X;Y) = C(X;Y), resulting in P_XY ∈ P̂_XY, where C(X;Y) corresponds to the common information between X and Y. In this case, we have null(P_X|Y) ≤ max{|X|, |Y|} − 1. For |X| > 2, we have
null(P_X|Y) ≤ max{|X|, |Y|} − 1 < |X|(|Y| − 1),
which results in
( 43 ) < ( 44 ) .
Intuitively, when |X| and |Y| are large, (43) can be significantly less than (44), since null(P_X|Y) ≤ max{|X|, |Y|} − 1 and (44) involves the product of |X| and |Y|.
Before stating the next result, we define
(45) A_XY ≜ [P_{y_j} − P_{y_j|x_i}]_{i,j} ∈ R^{t×q}, i.e., the entry of A_XY in row i and column j equals P_{y_j} − P_{y_j|x_i},  (46) b_XY ≜ (H(Y|x_1) − H(Y|X), …, H(Y|x_t) − H(Y|X))^T ∈ R^t,  a ≜ (a_1, …, a_q)^T ∈ R^q,
where t = | X | and q = | Y | .
Theorem 4.
Let P_XY ∈ P̂_XY. For any U ∈ U_1, we have
(47) H(Y|X) + min_{a: A_XY a = b_XY, a ≥ 0} Σ_{i=1}^q P_{y_i} a_i ≤ K(P_XY) (48) ≤ H(U) ≤ H(Y|X) + max_{a: A_XY a = b_XY, a ≥ 0} Σ_{i=1}^q P_{y_i} a_i.
The upper bound can be strengthened and we have
(49) K(P_XY) ≤ H(Y|X) + max_{a: A_XY a = b_XY, a ≥ 0, Σ_{i=1}^q P_{y_i} a_i ≤ log(null(P_X|Y)+1) − H(Y|X)} Σ_{i=1}^q P_{y_i} a_i (50) ≤ log(null(P_X|Y) + 1).
Moreover, when rank(A_XY) = |Y| and Y ≠ f(X), we have |U_1| = 1, and for the unique optimizer we have
K(P_XY) = H(Y|X) + Σ_i P_{y_i} a_i,
where A_XY a = b_XY, a ≥ 0, and Σ_{i=1}^q P_{y_i} a_i ≤ log(null(P_X|Y) + 1) − H(Y|X).
Proof. 
The proof is provided in Appendix A.  □
Remark 9.
As we outlined earlier, it is shown in [15] that g_0(P_XY) can be obtained by a linear program, which is presented in Section 3. As we can see in Section 3, each extreme point α_i is a vector of size |Y| in which at most rank(P_X|Y) elements are non-zero. We recall that β corresponds to the weighting elements in the marginal distribution of U, i.e., β = P_U. The number of extreme points is at most (|Y| choose rank(P_X|Y)). Hence, if we consider the linear equations [α_1 α_2 ⋯ α_K] β = P_Y, the size of the matrix in the system of linear equations, i.e., [α_1 α_2 ⋯ α_K], is at most |Y| × (|Y| choose rank(P_X|Y)), and we have at most (|Y| choose rank(P_X|Y)) variables in the system. By solving the linear program proposed in [15], we can find the exact value of K(P_XY) and the conditional distribution P_{U|YX} that attains it. The complexity of the linear program in [15] can grow faster than exponentially with respect to |Y|; however, the complexity of our proposed method grows linearly with |Y|. The latter follows since, in the system of linear equations of our proposed method, A_XY a = b_XY, the size of the matrix is |X| × |Y| and we do not need to find all possible extreme points of the set S presented in Section 3. Thus, our proposed upper bound has less complexity compared to the solution in [15]. Furthermore, in the special case where rank(A_XY) = |Y|, we can find the exact value of K(P_XY) using our method. One necessary condition for rank(A_XY) = |Y| is to have |X| ≥ |Y| + 1, since the rows of A_XY are linearly dependent.
Remark 10.
The optimizer of g 0 ( P X Y ) does not help us to build C , since the constraint H ( Y | U , X ) = 0 does not hold in general; however, the optimizer of h 0 ( P X Y ) satisfies it. Hence, we consider the cases where the equality g 0 ( P X Y ) = h 0 ( P X Y ) holds. In this case, H ( Y | U , X ) = 0 and we have tighter bounds on H ( U ) compared to [1].
Remark 11.
The upper bound in (48) holds for any U ∈ U_1; however, the upper bound in (49) asserts that there exists a U ∈ U_1 such that the bound holds. The lower bound in (47) can be used to find a lower bound on L_0(P_XY, M). Similarly to Remark 9, the linear program attaining the lower bound in (47) has less complexity than [15].
Next, we obtain upper and lower bounds on L 0 ( P X Y , M ) . See Figure 4 and Figure 5.
Theorem 5.
For the pair of RVs (X, Y), let P_XY ∈ P̂_XY and the shared secret key size be |X|, i.e., M = |X|. Furthermore, let q = |Y| and β = log(null(P_X|Y) + 1). Then, we obtain
(52) L_0(P_XY, |X|) ≤ K(P_XY) + 1 + log(|X|) (53) ≤ H(Y|X) + max_{a: A_XY a = b_XY, a ≥ 0, Σ_{i=1}^q P_{y_i} a_i ≤ β − H(Y|X)} Σ_{i=1}^q P_{y_i} a_i + 1 + log(|X|) (54) ≤ β + 1 + log(|X|).
For any P X Y , we have
(55) L_0(P_XY, |X|) ≤ min_{P_{U|Y,X}: I(U;X) = 0, H(Y|X,U) = 0} H(U) + 1 + log(|X|) (56) ≤ 1 + min(Σ_x H(Y|X=x), log(|X|(|Y| − 1) + 1) − 1) + log(|X|),
and if X = f ( Y ) ,
L_0(P_XY, |X|) ≤ log(|Y| − |X| + 1) + log(|X|).
Finally, for any P_XY with |Y| ≤ |X|, we have
L_0(P_XY, |Y|) ≤ log|Y|.
Proof. 
All upper bounds can be obtained using two-part construction coding [1] and the complete proof is provided in Appendix A.  □
Next, we obtain lower bounds on L 0 ( P X Y , M ) .
Theorem 6.
For any pair of RVs (X, Y) distributed according to P_XY and shared key size M ≥ 1, we have
L_0(P_XY, M) ≥ max_x H(Y|X=x),
if X = f(Y),
L_0(P_XY, |X|) ≥ log(|X|),
and the code C does not exist when M < |X| and X = f(Y). Furthermore, considering P_XY ∈ P̂_XY, if we assume the received code C satisfies I(X;C) = 0, H(Y|X,C) = 0, and X − Y − C, we have
L_0(P_XY, M) ≥ H(Y|X) + min_{a: A_XY a = b_XY, a ≥ 0} Σ_{i=1}^q P_{y_i} a_i.
Proof. 
The proofs of (59) and (60) are provided in ([1], Theorem 9) and (61) follows from Theorem 4.  □
Remark 12.
The constraint X − Y − C in the converse bound (61) can be motivated by considering the cases where the encoder has no direct access to the private data X. Furthermore, the constraint H(Y|X,C) = 0 is stronger compared to the H(Y|X,C,W) = 0 that is used to prove (59). Specifically, the constraint needed to prove (59) is H(Y|C,W) = 0, which results in H(Y|X,C,W) = 0.

4.2. Comparison

In this part, we study the bounds obtained in Theorem 5 and compare them with the existing results in [1].
• Case 1: |Y| ≤ |X|
Clearly, for any P_XY with |Y| ≤ |X|, the upper bound obtained in (58) improves the bounds in (56) and (55), where (56) was derived in [1]. The latter follows, since in this case we have log(|Y|) ≤ log(|X|).
• Case 2: P_XY ∈ P̂_XY
In this part, let X = f(Y), which results in P_XY ∈ P̂_XY; see Corollary 2. The upper bound attained by [1] is log(|Y| − |X| + 1) + log(|X|). Furthermore, we can assume that all elements of the probability vector P_X are non-zero; otherwise, we can remove them. In this case, we have
null(P_X|Y) = |Y| − |X|.
This follows, since each column in P_X|Y contains exactly one non-zero element, which equals 1. Moreover, all rows of P_X|Y are linearly independent, since each row contains a non-zero element that the other rows do not have. Thus, rank(P_X|Y) = |X|, which results in null(P_X|Y) = |Y| − |X|. Thus, the upper bound in (54) can be modified to obtain the same bound as (57). Furthermore, the upper bounds (53) and (52) attain smaller values than (57), i.e., they can improve the bounds in [1]. Next, in a numerical example, we show that the upper bounds (53) and (52) improve (57).
Example 2.
Let P_X|Y = [1 1 1 0 0 0; 0 0 0 1 1 1] and P_Y = [1/8, 2/8, 3/8, 1/8, 1/16, 1/16]. Clearly, in this case, X is a deterministic function of Y. Using the linear program proposed in [15], we obtain a solution as P_{Y|u_1} = [0.75, 0, 0, 0.25, 0, 0], P_{Y|u_2} = [0, 0.75, 0, 0.25, 0, 0], P_{Y|u_3} = [0, 0, 0.75, 0, 0.25, 0], P_{Y|u_4} = [0, 0, 0.75, 0, 0, 0.25], and P_U = [1/6, 1/3, 1/4, 1/4], which results in H(U) = 1.9591 bits. We have K(P_XY) + 1 ≤ 2.9591 and ⌈log(|Y| − |X| + 1)⌉ = ⌈log 5⌉ = 3. Hence, the upper bound (52) is strictly lower than (57), i.e., the U that achieves g_0(P_XY) has less entropy than the RV U constructed in ([1], Lemma 1).
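The numbers in Example 2 are straightforward to recompute; the short script below (our own helper) evaluates H(U), null(P_X|Y), and the two quantities being compared.

import numpy as np

P_XgY = np.array([[1, 1, 1, 0, 0, 0],
                  [0, 0, 0, 1, 1, 1]], dtype=float)
P_U = np.array([1/6, 1/3, 1/4, 1/4])

H_U = -(P_U * np.log2(P_U)).sum()
null_dim = P_XgY.shape[1] - np.linalg.matrix_rank(P_XgY)   # null(P_X|Y) = |Y| - rank

print(f"H(U) = {H_U:.4f} bits")                              # 1.9591
print(f"upper bound (52): K(P_XY) + 1 <= {H_U + 1:.4f}")     # 2.9591
print(f"null(P_X|Y) = {null_dim}, ceil(log(|Y|-|X|+1)) = {int(np.ceil(np.log2(6 - 2 + 1)))}")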

4.3. Improving the Bounds Using Greedy Entropy-Based Algorithm

In this part, we use the greedy entropy-based algorithm to generalize the bounds obtained in Theorem 5. In more detail, we use the output of the greedy entropy-based algorithm in Theorem 1 instead of the solution to g_0(P_XY) = h_0(P_XY). In the next theorem, Q = (q_1, q_2, …), with q_1 ≥ q_2 ≥ … and q_i = P(Q = q_i), is obtained by the greedy construction described at the end of Section 3.
Theorem 7.
Let the RVs (X, Y) be distributed according to P_XY supported on alphabets X and Y, where |X| and |Y| are finite, and let the shared secret key size be |X|, i.e., M = |X|. If |X| = 2, we have
L_0(P_XY, 2) ≤ H(Q) + log e/e + 2,
where Q is defined in ([36], Lemma 3). When | X | > 2 , we have
L_0(P_XY, |X|) ≤ H(Q) + (1 + log e)/2 + 1 + log(|X|).
Proof. 
The proof is similar to Theorem 5 and the main difference is to use the minimum entropy output of (19) instead of the solution to g 0 ( P X Y ) = h 0 ( P X Y ) that is used in two-part construction coding. Similarly to Theorem 5, we use two-part code construction to achieve the upper bounds. As shown in Figure 6, we first encode the private data X using one-time pad coding ([2], Lemma 1), which uses log ( | X | ) bits. Next, we produce U based on the greedy entropy-based algorithm proposed in [34], which solves the minimum entropy problem in (19). Thus, we have
(65) H ( Y | X , U ) = 0 , (66) I ( U ; X ) = 0 ,
and
H(U) ≤ H(Q) + log e/e,
and for |X| > 2,
H(U) ≤ H(Q) + (1 + log e)/2.
Thus, we obtain (63) and (64). Moreover, for the leakage constraint, we note that the randomness of one-time-pad coding is independent of X and the output of the greedy entropy-based algorithm U.
On the user side, we can decode Y using (65).  □
Remark 13.
We emphasize that (63) and (64) can improve (52), since they achieve the minimum in (56) within a gap, i.e., they achieve min_{P_{U|Y,X}: I(U;X) = 0, H(Y|X,U) = 0} H(U) up to a constant gap that depends on |X|. Furthermore, in contrast with the bounds in Theorem 5, (63) and (64) can be achieved for any joint distribution P_XY. However, in the following numerical example, we show that the bounds in Theorem 5 can be tighter than those in Theorem 7.
Example 3.
Let P_X|Y = [1 1 1 0 0 0; 0 0 0 1 1 1] and P_Y = [1/8, 2/8, 3/8, 1/8, 1/16, 1/16]. Clearly, in this case, X is a deterministic function of Y. Using the linear program proposed in [15], we obtain a solution as P_{Y|u_1} = [0.75, 0, 0, 0.25, 0, 0], P_{Y|u_2} = [0, 0.75, 0, 0.25, 0, 0], P_{Y|u_3} = [0, 0, 0.75, 0, 0.25, 0], P_{Y|u_4} = [0, 0, 0.75, 0, 0, 0.25], and P_U = [1/6, 1/3, 1/4, 1/4], which results in H(U) = 1.9591 bits. We have H(U) = K(P_XY) = 1.9591. Moreover, we have
P_Y|X = [1/6 0; 1/3 0; 1/2 0; 0 1/2; 0 1/4; 0 1/4].
Using the greedy search algorithm, we have P_Q = [1/2, 1/4, 1/6, 1/12]; hence, H(Q) = 1.7296. Thus,
H(Q) + log e/e = 2.2596 ≥ K(P_XY) = 1.9591.
In the next example, we show that the bounds in Theorem 7 can be tighter than the bounds obtained in Theorem 5.
Example 4.
Let P_X|Y = [1 1 0 0 0 0; 0 0 1 1 1 1] and P_Y = [1/8, 2/8, 3/8, 1/8, 0.0925, 0.0325]. Similarly to the previous example, X is a deterministic function of Y. Using the linear program proposed in [15], we obtain a solution as P_U = [0.185, 0.065, 0.315, 0.25, 0.185], which results in H(U) = 2.182 bits. We have H(U) = K(P_XY) = 2.182. Moreover, we have
P_Y|X = [1/3 0; 2/3 0; 0 0.6; 0 0.2; 0 0.148; 0 0.052].
Using the greedy search algorithm, we have P_Q = [0.6, 0.2, 0.1333, 0.052, 0.0147]; hence, H(Q) = 1.6053. Thus,
H(Q) + log e/e = 2.136 ≤ K(P_XY) = 2.182.

4.4. Non-Zero Leakage Results

In this section, we derive upper and lower bounds on L ( P X Y , M , ϵ ) and L b ( P X Y , M , ϵ ) , defined in (5) and (6). Next, we study the bounds in different scenarios; moreover, we provide an example that shows that the bounds can be tight. The following lemma helps us to find lower bounds on the entropy of an RV that satisfies I ( U ; X ) = ϵ and H ( Y | U , X ) = 0 considering the shared key.
Lemma 7.
Let the pair (X, Y) be jointly distributed and the RV W be independent of X and Y. If the RV U satisfies I(U;X) = ϵ and H(Y|U,X,W) = 0, then we have
H(U) ≥ max{L_1(ϵ, P_XY), L_2(ϵ, P_XY), L_3(ϵ, P_XY)},
where
(70) L_1(ϵ, P_XY) = H(Y|X),  (71) L_2(ϵ, P_XY) = min_{x∈X} H(Y|X=x) + ϵ,  (72) L_3(ϵ, P_XY) = H(Y|X) − H(X|Y) + ϵ,
and min_{x∈X} H(Y|X=x) denotes the minimum over the non-zero conditional entropies H(Y|X=x).
Proof. 
The proof is provided in Appendix B.  □
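The three lower bounds of Lemma 7 are simple functionals of P_XY and ϵ. The sketch below (helper names and the toy joint distribution are our own) evaluates L_1, L_2, and L_3 for a given joint pmf.

import numpy as np

def lower_bounds(P_XY, eps):
    """L1 = H(Y|X), L2 = min over x of the non-zero H(Y|X=x) plus eps,
    L3 = H(Y|X) - H(X|Y) + eps (all in bits)."""
    P_XY = np.asarray(P_XY, dtype=float)
    P_X = P_XY.sum(axis=1)
    P_Y = P_XY.sum(axis=0)

    def cond_entropy(P_joint, P_marg, axis):
        # H(A|B), where B is the marginal along `axis` of the joint pmf.
        H = 0.0
        for i, pb in enumerate(P_marg):
            p = (P_joint[i, :] if axis == 0 else P_joint[:, i]) / pb
            p = p[p > 0]
            H += pb * -(p * np.log2(p)).sum()
        return H

    H_Y_given_X = cond_entropy(P_XY, P_X, axis=0)
    H_X_given_Y = cond_entropy(P_XY, P_Y, axis=1)
    per_x = []
    for i, px in enumerate(P_X):
        p = P_XY[i, :] / px
        p = p[p > 0]
        h = -(p * np.log2(p)).sum()
        if h > 0:
            per_x.append(h)
    return H_Y_given_X, min(per_x) + eps, H_Y_given_X - H_X_given_Y + eps

# Toy joint pmf (rows: x, columns: y); the values are an assumption for illustration.
P_XY = np.array([[0.2, 0.2, 0.1],
                 [0.05, 0.05, 0.4]])
print(lower_bounds(P_XY, eps=0.1))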
In the next two theorems, we provide upper and lower bounds on L ( P X Y , M , ϵ ) . The next theorem is a generalization of ([1], Theorem 8) for correlated C and X.
Theorem 8.
Let 0 ≤ ϵ ≤ H(X), the pair of RVs (X, Y) be distributed according to P_XY, and the shared secret key size be |X|, i.e., M = |X|. Then, we have
L(P_XY, |X|, ϵ) ≤ Σ_{x∈X} H(Y|X=x) + ϵ + h(α) + 1 + log(|X|),
where α = ϵ/H(X), and if |Y| is finite, we have
L(P_XY, |X|, ϵ) ≤ log([|X|(|Y| − 1) + 1][|X| + 1]) + log(|X|).
Finally, if X is a deterministic function of Y, we have
L(P_XY, |X|, ϵ) ≤ log((|Y| − |X| + 1)(|X| + 1)) + log(|X|).
Proof. 
The proof is based on ([19], Lemma 5) and the two-part construction coding and is provided in Appendix B.  □
In the next theorem, we provide lower bounds on L ( P X Y , M , ϵ ) .
Theorem 9.
Let 0 ≤ ϵ ≤ H(X) and the pair of RVs (X, Y) be distributed according to P_XY supported on alphabets X and Y, where |X| is finite and |Y| is finite or countably infinite. For any shared secret key size M ≥ 1, we have
L(P_XY, M, ϵ) ≥ max{L_1(ϵ, P_XY), L_2(ϵ, P_XY), L_3(ϵ, P_XY)},
where L_1(ϵ, P_XY), L_2(ϵ, P_XY), and L_3(ϵ, P_XY) are defined in (70), (71), and (72). Furthermore, if X is a deterministic function of Y, then we have
L(P_XY, M, ϵ) ≥ log(1/max_x P_X(x)).
Proof. 
The proof is based on Lemma 7 and is provided in Appendix B.  □
Next, we provide an example where the upper and lower bounds obtained in Theorems 8 and 9 are studied.
Example 5.
Let Y = (Y_1, …, Y_N) be an i.i.d. Bernoulli(0.5) sequence and X = Σ_{i=1}^N Y_i. Clearly, X is a function of Y, |X| = N + 1, and |Y| = 2^N. Using ([1], (72)), we have
h_0(P_XY) =^{(a)} H(Y|X) = Σ_{i=1}^{N} (1/2)^N (N choose i) log(N choose i),
where (a) follows from ([1], (70)). Let the shared key size be N + 1; by using Theorems 8 and 9, we have
ϵ + Σ_{i=1}^{N} (1/2)^N (N choose i) log(N choose i) ≤ L(P_XY, N + 1, ϵ) ≤ log((N + 2)(2^N − N)) + log(N + 1),
where (75) is used for the upper bound and L 3 ( ϵ , P X Y ) is used for the lower bound. Using ([1], (73)), we have
lim_{N→∞} h_0(P_XY)/N = lim_{N→∞} H(Y|X)/N = h(0.5) = 1,
where h ( · ) is the binary entropy function. Now, by using (79) and (80), we obtain
lim_{N→∞} L(P_XY, N + 1, ϵ)/N = 1.
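The convergence in (81) can also be observed numerically. The following sketch (our own script; the choice ϵ = 0.5 is arbitrary) evaluates the normalized lower and upper bounds of (79) for increasing N.

import numpy as np
from scipy.special import comb

def bounds_example5(N, eps=0.5):
    """Lower bound eps + H(Y|X) and upper bound of (79), in bits, for
    Y an i.i.d. Bernoulli(0.5)^N sequence and X the number of ones in Y."""
    p_i = comb(N, np.arange(N + 1)) * 0.5 ** N                           # P(X = i)
    H_Y_given_X = float((p_i * np.log2(np.maximum(comb(N, np.arange(N + 1)), 1))).sum())
    lower = eps + H_Y_given_X
    upper = np.log2((N + 2) * (2.0 ** N - N)) + np.log2(N + 1)
    return lower, upper

for N in (10, 20, 40, 80):
    lo, up = bounds_example5(N)
    print(f"N = {N:3d}:  lower/N = {lo / N:.3f},  upper/N = {up / N:.3f}")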
  • Bounds for Bounded Leakage Constraint
Here, we assume that X = (X_1, X_2), where X = X_1 × X_2, 1 < |X_1| ≤ |X_2|, and |X_1||X_2| = |X|. In the next theorem, we provide upper bounds for L_b(P_XY, M, ϵ). We show that, when the leakage is more than a threshold, we are able to communicate the message over the channel using a shared key size less than |X|, and the receiver can decode the message without any loss.
Theorem 10.
Let ϵ ≥ H(X_1), the RVs (X_1, X_2, Y) be distributed according to P_{X_1 X_2 Y}, and the shared secret key size be |X_2|, i.e., M = |X_2|. Then, we have
L_b(P_XY, |X_2|, ϵ) ≤ Σ_{x∈X} H(Y|X=x) + H(X_1) + 2 + log(|X_2|);
if |Y| is finite, then
L_b(P_XY, |X_2|, ϵ) ≤ log(|X|(|Y| − 1) + 1) + log(|X_2|) + 2 + H(X_1);
and if X = (X_1, X_2) is a deterministic function of Y, we obtain
L_b(P_XY, |X_2|, ϵ) ≤ log(|Y| − |X| + 1) + log(|X_2|) + 2 + H(X_1).
Furthermore, let ϵ ≥ H(X_2) and the shared secret key size be |X_1|. We have
L_b(P_XY, |X_1|, ϵ) ≤ Σ_{x∈X} H(Y|X=x) + H(X_2) + 2 + log(|X_1|).
Proof. 
The proof is similar to the proof of Theorem 8 and is provided in Appendix B.  □
Remark 14.
As argued in ([1], Theorem 9), when the leakage is zero and X is a deterministic function of Y, if the shared key size is less than the size of X, i.e., M < |X|, lossless variable-length codes do not exist. However, as shown in Theorem 10, when the leakage is more than a threshold, such codes exist for key sizes less than |X|.
Remark 15.
The upper bounds (82) and (85) can be less than the upper bound found in ([1], Theorem 9), which is obtained under the perfect privacy assumption. For instance, if H(X_1) + 2 ≤ log(|X_1|), then (82) is less than ([1], (95)). The latter follows, since we have H(X_1) + 1 + ⌈log(|X_2|)⌉ ≤ log(|X_1|) − 1 + ⌈log(|X|) − log(|X_1|)⌉ ≤ ⌈log(|X_1|)⌉ + ⌈log(|X|) − log(|X_1|)⌉ − 1 ≤ ⌈log(|X|)⌉, where the last inequality follows from ⌈a⌉ + ⌈b⌉ ≤ ⌈a + b⌉ + 1.
In the next theorem, we find lower bounds for L b ( P X Y , M , ϵ ) .
Theorem 11.
Let 0 ≤ ϵ ≤ H(X) and the RVs (X_1, X_2, Y) be distributed according to P_{X_1 X_2 Y}. For any shared secret key size M ≥ 1, we have
L_b(P_XY, M, ϵ) ≥ H(Y|X) − H(X|Y).
Furthermore, if X is a deterministic function of Y, then we have
L_b(P_XY, M, ϵ) ≥ log(1/max_x P_X(x)).
Proof. 
The proof is provided in Appendix B.  □
  • A General Approach for X
Next, by using the same construction as in Theorem 10, we find upper bounds on L_b(P_XY, M, ϵ) for any joint distribution P_XY. We first present a simple observation, which we call the separation technique.
Observation 1.
(Separation technique) Any discrete RV X supported on X = { 1 , , | X | } with | X | > 2 can be represented by two RVs ( X 1 , X 2 ) , where 1 < | X 1 | < | X | and 1 < | X 2 | < | X | .
If |X| is not a prime number, then we can represent X by (X_1, X_2), where X = X_1 × X_2, 1 < |X_1| ≤ |X_2|, |X_1||X_2| = |X|, and P_X(x) = P_{X_1,X_2}(x_1, x_2). Furthermore, if |X| is a prime number, then we can represent X by (X_1, X_2), where |X_1||X_2| = |X| + 1 and P_{X_1,X_2}(|X_1|, |X_2|) = 0. Let S_X be the set of all possible representations of X, where X = (X_1, X_2). For a fixed ϵ, we define S_{X_1}^ϵ ≜ {(X_1, X_2): (X_1, X_2) ∈ S_X, H(X_1) ≤ ϵ} and S_{X_2}^ϵ ≜ {(X_1, X_2): (X_1, X_2) ∈ S_X, H(X_2) ≤ ϵ}.
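As a simple illustration of the separation technique, the sketch below (our own helpers) relabels X ∈ {0, …, |X| − 1} as a pair (X_1, X_2) via mixed-radix indexing, scans the factorizations |X_1||X_2| = |X|, and reports the one minimizing H(X_1) + log|X_2|, as in Theorem 12 below. It only considers these canonical splits for one fixed labeling of X, whereas S_X contains all representations, and it omits the padding step used when |X| is prime.

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def best_canonical_separation(P_X):
    """Scan factorizations |X1|*|X2| = |X| with x <-> (x // |X2|, x % |X2|)
    and return the split minimizing H(X1) + log|X2| (cf. Theorem 12)."""
    P_X = np.asarray(P_X, dtype=float)
    n = len(P_X)
    best = None
    for n1 in range(2, n):
        if n % n1:
            continue
        n2 = n // n1
        if n2 < 2:
            continue
        P_X1 = P_X.reshape(n1, n2).sum(axis=1)      # marginal of X1 under this relabeling
        score = entropy(P_X1) + np.log2(n2)
        if best is None or score < best[0]:
            best = (score, n1, n2, entropy(P_X1))
    return best

# Test pmf on |X| = 12 symbols (resembling Example 6 below); an assumption for illustration.
P_X = np.array([0.005] * 10 + [0.475] * 2)
score, n1, n2, H_X1 = best_canonical_separation(P_X)
print(f"|X1| = {n1}, |X2| = {n2}, H(X1) = {H_X1:.4f}, H(X1) + log|X2| = {score:.4f}")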
Theorem 12.
For any ϵ ≥ 0, pair of RVs (X, Y) distributed according to P_XY, and shared key size M ≥ α, if S_{X_1}^ϵ ≠ ∅, we have
L_b(P_XY, M, ϵ) ≤ Σ_{x∈X} H(Y|X=x) + 2 + min_{(X_1,X_2) ∈ S_{X_1}^ϵ} [H(X_1) + log(|X_2|)],
where α = |arg min_{X_2: (X_1,X_2) ∈ S_{X_1}^ϵ} H(X_1) + log(|X_2|)| < |X| and |S| denotes the size of the RV S. If S_{X_2}^ϵ ≠ ∅, for a shared key size M ≥ β, we have
L_b(P_XY, M, ϵ) ≤ Σ_{x∈X} H(Y|X=x) + 2 + min_{(X_1,X_2) ∈ S_{X_2}^ϵ} [H(X_2) + log(|X_1|)],
where β = |arg min_{X_1: (X_1,X_2) ∈ S_{X_2}^ϵ} H(X_2) + log(|X_1|)| < |X|.
Proof. 
To prove (88), let (X_1, X_2) ∈ S_{X_1}^ϵ. Then, by using Theorem 10, we have
L_b(P_XY, |X_2|, ϵ) ≤ Σ_{x∈X} H(Y|X=x) + H(X_1) + 2 + log(|X_2|).
Since (90) holds for any (X_1, X_2) ∈ S_{X_1}^ϵ, we can take the minimum over (X_1, X_2), which results in (88). Furthermore, the key size is chosen as |arg min_{X_2: (X_1,X_2) ∈ S_{X_1}^ϵ} H(X_1) + log(|X_2|)| < |X|, and note that the first term Σ_{x∈X} H(Y|X=x) remains the same, since it is a function of P_{Y|X_1 X_2}, i.e., the distributions involving the pair (X_1, X_2) do not change, since P_{X_1,X_2}(x_1, x_2) = P_X(x). Using (85) and following the same approach, (89) is obtained.  □
Remark 16.
The upper bounds (88) and (89) generalize (82) and (85). The upper bounds (82) and (85) are obtained for a fixed (X_1, X_2). Thus, by taking the minimum over (X_1, X_2) ∈ S_{X_1}^ϵ and (X_1, X_2) ∈ S_{X_2}^ϵ, we obtain tighter upper bounds.
Remark 17.
The same lower bounds as obtained in Theorem 11 can be used here, since the separation technique does not improve the bounds in (86) and (87).
Remark 18.
Using Example 5, we can see that the bounds obtained in Theorems 10–13 can be asymptotically tight.
Next, we provide an example to clarify the construction in Theorem 12 and compare the bounds with the perfect privacy case.
Example 6.
Let the RV X ∈ {1, ..., 12} with distribution P_X(x) = 0.005, x ∈ {1, ..., 10}, and P_X(x) = 0.475, x ∈ {11, 12}, be correlated with Y, where Y|X = x is a BSC(0.5). To calculate the bound in Theorem 12, one possible separation of X is (X_1, X_2), where the distributions of X_1 and X_2 are P_{X_1}(x_1) = 0.01, x_1 ∈ {1, ..., 5}, P_{X_1}(x_1) = 0.95, x_1 = 6, and P_{X_2}(x_2) = 0.5, x_2 ∈ {1, 2}. In this case, H(X_1) = 0.4025 and log(|X_2|) = 1. Using the bound (82), for all ϵ ≥ 0.4025 and key size M > 1, we have
L_b(P_XY, M, ϵ) ≤ ∑_{x ∈ X} H(Y|X = x) + H(X_1) + 2 + ⌈log(|X_2|)⌉ = 15.4025 bits.
By using the bound ([1], (95)), for the key size M ≥ 12, we have
L_b(P_XY, M, 0) ≤ 17 bits.
Note that, since |Y| = 2, we can also use the following bound instead of (88):
L_b(P_XY, M, ϵ) ≤ ⌈log(|X|(|Y| − 1) + 1)⌉ + 2 + min_{(X_1, X_2) ∈ S_{X_1}^ϵ} [H(X_1) + ⌈log(|X_2|)⌉].
By checking all possible separations, the minimum of H(X_1) + ⌈log(|X_2|)⌉ occurs at |X_1| = 6 and |X_2| = 2, where X_1 and X_2 have the distributions mentioned before. Thus, for all ϵ ≥ 0.4025 and key size M > 1, we have L_b(P_XY, M, ϵ) ≤ 7.4025 bits. By using the bound ([1], (96)), for the key size M ≥ 12, we have L_b(P_XY, M, 0) ≤ 9 bits. We can see that, by allowing a small amount of leakage and using a shorter secret key, we are able to send shorter codewords.
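The numbers in this example can be checked with the short script below (our own verification; the expressions used for ([1], (95)) and ([1], (96)) are inferred from the values reported here, so they should be read as assumptions rather than as the exact statements in [1]).

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Example 6: |X| = 12, |Y| = 2, separation with |X_1| = 6, |X_2| = 2.
p_x1 = np.array([0.01] * 5 + [0.95])
card_X, card_X2, card_Y = 12, 2, 2
sum_HY_given_X = card_X * 1.0           # Y|X = x is a BSC(0.5): H(Y|X = x) = 1 bit

bound_82 = sum_HY_given_X + entropy(p_x1) + 2 + np.ceil(np.log2(card_X2))
bound_95 = sum_HY_given_X + 1 + np.ceil(np.log2(card_X))           # ([1], (95)), inferred form
bound_88 = np.ceil(np.log2(card_X * (card_Y - 1) + 1)) + 2 \
           + entropy(p_x1) + np.ceil(np.log2(card_X2))             # cardinality-based form of (88)
bound_96 = np.ceil(np.log2(card_X * (card_Y - 1) + 1)) + 1 \
           + np.ceil(np.log2(card_X))                              # ([1], (96)), inferred form
print(bound_82, bound_95)   # ~15.4025 vs 17 bits
print(bound_88, bound_96)   # ~7.4025 vs 9 bits
```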
In the following, we study the bounds obtained in Theorem 12 considering a perfect privacy scenario and we show that the previous bounds can be improved by using the separation technique.
  • Perfect Privacy ( ϵ = 0 )
In this part, we let ϵ = 0 and consider the problem as in [1]. We show that when X = ( X 1 , X 2 ) , where X 2 is a deterministic function of X 1 , the upper bounds on L b ( P X Y , M , 0 ) can be improved by using a shorter shared key size, i.e., M < | X | . In the next theorem, we provide upper bounds on L b ( P X Y , M , 0 ) .
Theorem 13.
Let X = ( X 1 , X 2 ) , where X 2 = f ( X 1 ) , and the shared key size be | X 1 | . We have
L_b(P_XY, |X_1|, 0) ≤ ∑_{x_1 ∈ X_1} H(Y|X_1 = x_1) + 1 + ⌈log(|X_1|)⌉,
if | Y | is finite, then
L_b(P_XY, |X_1|, 0) ≤ ⌈log(|X|(|Y| − 1) + 1)⌉ + ⌈log(|X_1|)⌉ + 1,
and if X = (X_1, X_2) is a deterministic function of Y, we obtain
L_b(P_XY, |X_1|, 0) ≤ ⌈log(|Y| − |X| + 1)⌉ + ⌈log(|X_1|)⌉ + 1.
Proof. 
The proof is provided in Appendix B.  □
Remark 19.
Clearly, when X = (X_1, X_2) with X_2 = f(X_1), the upper bounds (91), (92), and (93) improve the bounds in ([1], Theorem 8).
  • Special Case: X = ( X 1 , X 2 ) and X 2 = f ( X 1 )
In this part, we use approaches similar to those used in Theorems 12 and 13 to find upper bounds. We encode X_1 using one-time-pad coding and use the same U as in Theorem 13. As shown in Theorem 13, X_1 is sufficient to decode Y, since X_2 = f(X_1); however, this scheme does not exploit the fact that we are allowed to leak information about X. Similarly to Theorem 13, we obtain
L_b(P_XY, |X_1|, ϵ) ≤ ∑_{x ∈ X} H(Y|X = x) + 1 + ⌈log(|X_1|)⌉ = ∑_{x_1 ∈ X_1} H(Y|X_1 = x_1) + 1 + ⌈log(|X_1|)⌉.
Next, we use the separation technique as in Theorem 12. For any ϵ ≥ 0, if a separation X_1 = (X_1', X_1'') exists such that H(X_1') ≤ ϵ, we obtain
L_b(P_XY, |X_1''|, ϵ) ≤ ∑_{x_1 ∈ X_1} H(Y|X_1 = x_1) + H(X_1') + 2 + ⌈log(|X_1''|)⌉.
Furthermore, we can take the minimum over all possible separations to improve the upper bound. Let S_{X_1'}^ϵ denote the set of all separations (X_1', X_1'') of X_1 with H(X_1') ≤ ϵ. We have
L_b(P_XY, α, ϵ) ≤ ∑_{x_1 ∈ X_1} H(Y|X_1 = x_1) + 2 + min_{(X_1', X_1'') ∈ S_{X_1'}^ϵ} [H(X_1') + ⌈log(|X_1''|)⌉],
where α = |arg min_{X_1'': (X_1', X_1'') ∈ S_{X_1'}^ϵ} H(X_1') + ⌈log(|X_1''|)⌉| < |X_1|.
We can see that the upper bound on L_b(P_XY, α, ϵ) can be less than the bound in (91). For instance, let H(X_1') + 2 ≤ log(|X_1'|). Using the same arguments as in Remark 15, we have H(X_1') + 1 + ⌈log(|X_1''|)⌉ ≤ ⌈log(|X_1|)⌉. Hence, if the allowed leakage exceeds a certain threshold, we can improve the upper bound and send shorter codewords compared to the perfect privacy case. Moreover, this allows us to use a shorter shared secret key.

5. Applications: Cache-Aided Networks

In this part, we present an application where the results of this paper can be used. The complete study of the application can be found in [49]. To this end, consider the scenario illustrated in Figure 7. A server has access to a database consisting of N files Y_1, ..., Y_N, where each file, of size F bits, is sampled from the joint distribution P_{X Y_1 ⋯ Y_N}. The random variable X denotes the private latent variable, and we assume that the server knows the realization of X as well. The server is connected to K users over a shared link, where user i has access to a local cache memory of size MF bits. Furthermore, we assume that the server and the users have access to a shared secret key denoted by W, of size T. The system works in two phases: the placement phase and the delivery phase [50]. In the placement phase, the server fills the local caches using the database. Let Z_k denote the content of the local cache memory of user k, k ∈ [K] ≜ {1, ..., K}, after the placement phase. In the delivery phase, the users first send their demands to the server, where d_k ∈ [N] denotes the demand of user k. The server then sends a response, denoted by C, over the shared link to satisfy all the demands simultaneously. We assume that an adversary has access to the shared link as well and uses C to extract information about X; however, the adversary does not have access to the local cache contents or the secret key. Since the files in the database are all correlated with the private latent variable X, the coded caching and delivery techniques introduced in [50] do not satisfy the privacy requirement. The goal of the cache-aided private delivery problem is to find a response C with the minimum possible average length that satisfies a certain privacy constraint and the zero-error decodability constraint of the users. Here, we consider the worst-case demand combination d = (d_1, ..., d_K) to construct C, and the average length is taken over the randomness in the database. In this study, we first consider a perfect privacy constraint, i.e., we require C to be independent of X. Let Ŷ_{d_k} denote the decoded message of user k using W, C, and Z_k. User k should be able to recover Y_{d_k} reliably, i.e., P{Ŷ_{d_k} ≠ Y_{d_k}} = 0, k ∈ [K]. We remark that this is a cache-aided variable-length compression problem with a privacy constraint. Hence, to solve this problem and design the code C, we utilized techniques from privacy mechanisms, data compression, and cache design and coded delivery problems, and combined them to build such a code. In particular, we used the data compression techniques employed in this paper and the caching design techniques in [50]. Finally, we extended the results to correlated X and C, i.e., when non-zero leakage is allowed; to do so, we used the results of the non-perfect privacy part of this paper. For more details, see [49].

6. Experiment

In this section, we provide an experiment to evaluate the bounds obtained in Theorem 4. To do so, we used the MNIST dataset, which contains 60,000 images illustrating handwritten digits from 0 to 9. Let the RV S (information source) denote the images in the dataset, i.e., | S | = 60,000. Let Y represent the useful data that is a feature extracted from the dataset. In this example, the useful data are defined as a “quantized histogram”. To determine the quantized histogram of the dataset, we first converted the images into black and white by quantizing them. The histogram of any image in the dataset is then computed as the ratio of the number of white pixels to the total number of pixels. Consequently, the histogram of any image can be represented by a value within the interval [ 0 , 1 ] . For quantizing the histogram of any image, we used the following intervals: Interval 1 = [ 0 , 0.1 ) , Interval 2 = [ 0.1 , 0.15 ) , Interval 3 = [ 0.15 , 0.2 ) , Interval 4 = [ 0.2 , 0.25 ) , Interval 5 = [ 0.25 , 0.3 ) , Interval 6 = [ 0.3 , 0.35 ) and Interval 7 = [ 0.35 , 1 ] . Each image in the dataset was therefore labeled with a number between 1 and 7, indicating the interval to which its histogram belongs. Let Y denote the interval sample S belongs to. Clearly, the labels of the intervals are deterministic functions of the information source S. For instance, in Figure 8, the number of white pixels is 130, hence the quantized histogram is equal to 0.1658 , which corresponds to the third interval.
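As an illustration of the labeling step described above, the following sketch (our own; the binarization threshold of 128 is an assumption, since the paper does not specify how the images were quantized) maps a 28 × 28 grayscale image to its quantized-histogram label.

```python
import numpy as np

def histogram_label(image, threshold=128):
    """Return the quantized-histogram label (1..7) of a 28 x 28 grayscale image."""
    bw = np.asarray(image) >= threshold            # quantize to black and white
    hist = bw.sum() / bw.size                      # ratio of white pixels, in [0, 1]
    edges = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35]      # interval boundaries from the text
    return int(np.digitize(hist, edges)) + 1       # Interval 1 = [0, 0.1), ..., Interval 7 = [0.35, 1]

# Mirroring Figure 8: 130 white pixels out of 28 * 28 = 784 gives a histogram
# of about 0.1658, which falls into the third interval.
img = np.zeros((28, 28))
img.flat[:130] = 255
print(histogram_label(img))   # 3
```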
Let the RV Z denote the label of each image, i.e., a number from 0 to 9 indicating the digit which the image represents, so Z ∈ {0, 1, ..., 9}. Clearly, Z is a deterministic function of S, i.e., Z = f(S). The private attribute X is defined as the presence of digit 0, which is a binary RV, i.e., X ∈ {0, 1} with X = 1 if Z = 0 and X = 0 otherwise. Here, we use an empirical distribution for Z, and it can be seen that X is a deterministic function of Z, which yields the Markov chain X − Z − Y. Using this Markov chain, the kernel P_{X|Y} can be obtained as P_{X|Z} P_{Z|Y}, where
P_{X|Z} =
1 0 0 0 0 0 0 0 0 0
0 1 1 1 1 1 1 1 1 1 ,
and P_{Z|Y} can be found from the dataset. We used the empirical distribution to find the kernel P_{Y|Z}, e.g., P_{Y=1|Z=1} is the number of images illustrating digit 1 whose quantized histogram belongs to the first interval (label 1), divided by the total number of images of digit 1; combining P_{Y|Z} with the empirical distribution of Z then gives P_{Z|Y}. By multiplying P_{X|Z} by P_{Z|Y} derived in (94), we obtain P_{X|Y} as in (95).
P_{Z|Y} =
0.0064 0.0441 0.1445 0.3070 0.4678 0.5968 0.7500
0.4922 0.0497 0.0037 0.0004 0 0 0
0.0262 0.0854 0.1440 0.1599 0.1044 0.1129 0
0.0416 0.1017 0.1339 0.1265 0.0922 0.0323 0
0.0876 0.1330 0.0796 0.0332 0.0111 0 0
0.0712 0.1058 0.0907 0.0676 0.0478 0.0161 0
0.0479 0.1086 0.1181 0.0973 0.0711 0.0645 0
0.1330 0.1447 0.0635 0.0201 0.0078 0 0
0.0162 0.0872 0.1409 0.1523 0.1700 0.1774 0.2500
0.0778 0.1396 0.0812 0.0357 0.0278 0 0 , (94)
P_{X|Y} =
0.0064 0.0441 0.1445 0.3070 0.4678 0.5968 0.7500
0.9936 0.9559 0.8555 0.6930 0.5322 0.4032 0.2500 . (95)
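As a sanity check (our own, not part of the original experiment), the snippet below verifies that multiplying P_{X|Z} by the kernel in (94) reproduces (95) up to the reported rounding, using the Markov chain X − Z − Y.

```python
import numpy as np

# P(X = x | Z = z): X = 1 indicates digit 0, so the row for X = 1 picks out Z = 0.
P_X_given_Z = np.zeros((2, 10))
P_X_given_Z[0, 0] = 1.0        # X = 1 when Z = 0
P_X_given_Z[1, 1:] = 1.0       # X = 0 when Z in {1, ..., 9}

# P(Z = z | Y = y) as reported in (94): rows z = 0..9, columns y = 1..7.
P_Z_given_Y = np.array([
    [0.0064, 0.0441, 0.1445, 0.3070, 0.4678, 0.5968, 0.7500],
    [0.4922, 0.0497, 0.0037, 0.0004, 0.0,    0.0,    0.0   ],
    [0.0262, 0.0854, 0.1440, 0.1599, 0.1044, 0.1129, 0.0   ],
    [0.0416, 0.1017, 0.1339, 0.1265, 0.0922, 0.0323, 0.0   ],
    [0.0876, 0.1330, 0.0796, 0.0332, 0.0111, 0.0,    0.0   ],
    [0.0712, 0.1058, 0.0907, 0.0676, 0.0478, 0.0161, 0.0   ],
    [0.0479, 0.1086, 0.1181, 0.0973, 0.0711, 0.0645, 0.0   ],
    [0.1330, 0.1447, 0.0635, 0.0201, 0.0078, 0.0,    0.0   ],
    [0.0162, 0.0872, 0.1409, 0.1523, 0.1700, 0.1774, 0.2500],
    [0.0778, 0.1396, 0.0812, 0.0357, 0.0278, 0.0,    0.0   ],
])

assert np.allclose(P_Z_given_Y.sum(axis=0), 1.0, atol=5e-3)   # columns are pmfs (up to rounding)
P_X_given_Y = P_X_given_Z @ P_Z_given_Y                       # 2 x 7 kernel of the Markov chain
print(np.round(P_X_given_Y, 4))                               # matches (95) up to rounding
```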
Moreover, the distribution of the useful data Y is given in Table 1.
The goal was to compress Y and send it to a user, and we considered a perfect privacy case where the privacy leakage was zero. Since X is a deterministic function of Y, we have h_0(P_XY) = g_0(P_XY) and P_XY ∈ P̂_XY. Hence, using the linear program in [15], we found a U that achieves g_0(P_XY), and K(P_XY) was obtained. Using Theorem 4, the average length of C was 2.88 bits; however, using the method in [1], 4.16 bits were required to send the message C over the channel.

7. Conclusions

We have studied a compression problem with a privacy constraint in two scenarios, where the information delivered over the channel is either independent of X or correlated with X subject to a bounded leakage constraint. Considering the perfect privacy constraint, we proposed new upper bounds using a two-part construction coding that benefits from the solution of g_0(P_XY) = h_0(P_XY) to encode the second part of the code; for the first part of the code, we hid the private data using one-time-pad coding. Furthermore, in the case of |Y| ≤ |X|, we proposed a new achievable scheme. We have shown that the new upper bounds can improve the existing ones. Moreover, we strengthened the bounds using the greedy entropy-based algorithm. Considering the second scenario, we obtained upper and lower bounds and showed that the upper bounds can be asymptotically tight. To achieve the upper bounds, we used two-part construction coding. Furthermore, in the case of perfect privacy, we showed that, by using the separation technique, the previous upper bounds can be improved.

Author Contributions

Methodology, A.Z. and M.S.; Validation, M.S.; Formal analysis, A.Z.; Investigation, A.Z.; Writing–original draft A.Z.; Writing–review & editing A.Z. and M.S.; Supervision, M.S.; Funding acquisition, M.S. All authors have read and agreed to the published version of the manuscript.

Funding

Swedish Research Council (VR) under grant 2019-03606.

Data Availability Statement

The data presented in this study are openly available in https://doi.org/10.1109/WIFS58808.2023.10374604.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Proof of Theorem 4

Let U ∈ U_1. In the following, we expand H(Y, U|X = x) using the chain rule:
H ( Y , U | X = x ) = H ( U | X = x ) + H ( Y | U , X = x ) = ( a ) H ( U ) ,
where, in (a), we used
I(X; U) = 0 ⟹ H(U) = H(U|X) = H(U|X = x),
and
H(Y|X, U) = 0 ⟹ H(Y|U, X = x) = 0.
Furthermore,
H(Y, U|X = x) = H(Y|X = x) + H(U|Y, X = x) = H(Y|X = x) + ∑_y P_{y|x} H(U|Y = y, X = x) =^{(b)} H(Y|X = x) + ∑_y P_{y|x} H(U|Y = y),
where (b) follows from the Markov chain X − Y − U. Combining (A1) and (A2), for any x ∈ X, we have
H(U) = H(Y|X = x) + ∑_y P_{y|x} H(U|Y = y).
Furthermore, by using Lemma 4, U is an optimizer of both g 0 ( P X Y ) and h 0 ( P X Y ) . Thus,
H(U) = H(Y|X) + H(U|Y) = H(Y|X) + ∑_y P_y H(U|Y = y).
Combining (A4) and (A3), we have
H(Y|X) + ∑_y P_y H(U|Y = y) = H(Y|X = x) + ∑_y P_{y|x} H(U|Y = y).
Thus, for any x ∈ X,
∑_y (P_{y|x} − P_y) H(U|Y = y) = H(Y|X) − H(Y|X = x).
For any optimizer U ∈ U_1, (A6) holds; hence, taking the maximum over the H(U|Y = y) and letting a_i = H(U|Y = y_i), we obtain
H(U) ≤ max H(U) = H(Y|X) + max ∑_i P_{y_i} a_i
subject to
A_XY a = b_XY,  a ≥ 0.
To prove (49), note that, since we consider all optimizers of g_0(P_XY), there exists at least one U that satisfies |U| ≤ null(P_{X|Y}) + 1, which results in H(U) ≤ log(null(P_{X|Y}) + 1). Since (A6) must hold for any optimizer, this U satisfies the inequality in (49). Moreover, (50) follows from Lemma 6 and the constraint we added to have H(U) ≤ log(null(P_{X|Y}) + 1). To prove the last statement, note that when Y = f(X), using the key equation in (36), the upper bound on h_0(P_XY) equals zero, which results in h_0(P_XY) = g_0(P_XY) = 0. Furthermore, as shown in ([1], Theorem 5), when Y is not a deterministic function of X, we have h_0(P_XY) > 0; thus, the system of linear equations in (A6) has at least one solution. Moreover, since rank(A_XY) = |Y|, the solution is unique, as the number of independent equations is at least the number of variables. Finally, to prove the lower bound in (47), note that (A6) holds for any U that satisfies (33), (34), and (35); the bound follows by taking the minimum of H(U) = H(Y|X) + ∑_i P_{y_i} H(U|Y = y_i) subject to (A6), and the inequality holds in particular for the U that achieves the minimum entropy.

Appendix A.2. Proof of Theorem 5

Let W be the shared secret key with key size M = | X | , which is uniformly distributed over { 1 , , M } = { 1 , , | X | } and independent of ( X , Y ) . As shown in Figure 4, first, the private data X are encoded using the shared secret key ([2], Lemma 1). Thus, we have
X ˜ = X + W mod | X | .
Next, we show that X ˜ has a uniform distribution over { 1 , , | X | } and I ( X ; X ˜ ) = 0 . We have
H ( X ˜ | X ) = H ( X + W | X ) = H ( W | X ) = H ( W ) = log ( | X | ) .
Furthermore, H(X̃|X) ≤ H(X̃), and combining this with (A9), we obtain H(X̃|X) = H(X̃) = log(|X|). For encoding X̃, we use ⌈log(|X|)⌉ bits. Let C_1 denote the encoded X̃. Next, by using P_XY ∈ P̂_XY, let U ∈ U_1 be an optimizer of h_0(P_XY) = g_0(P_XY) that achieves K(P_XY), i.e., has the minimum entropy. We encode U using any lossless code, which uses at most H(U) + 1 = K(P_XY) + 1 bits, and let C_2 be the encoded U. We send C = (C_1, C_2) over the channel. Thus, (52) can be obtained. Note that, as shown in Figure 5, on the decoder side, we first decode X using W by adding |X| − W to X̃ modulo |X|; then, Y is decoded using the fact that U satisfies H(Y|U, X) = 0. Next, we show that if W is independent of (X, U), we obtain I(C; X) = 0. We have
I(C; X) = I(C_1, C_2; X) = I(U, X̃; X) = I(U; X) + I(X̃; X|U) =^{(a)} I(X̃; X|U) = H(X̃|U) − H(X̃|X, U) = H(X̃|U) − H(X + W|X, U) =^{(b)} H(X̃|U) − H(W) =^{(c)} H(X̃) − H(W) = log(|X|) − log(|X|) = 0,
where (a) follows from I ( U ; X ) = 0 ; (b) follows, since W is independent of ( X , U ) ; and (c) from the independence of U and X ˜ . The latter follows, since we have
0 ≤ I(X̃; X|U) = H(X̃|U) − H(W) =^{(i)} H(X̃|U) − H(X̃) ≤ 0.
Thus, X ˜ and U are independent. Step (i) above follows from the fact that W and X ˜ are uniformly distributed over { 1 , , | X | } , i.e., H ( W ) = H ( X ˜ ) . As a summary, if we choose W independent of ( X , U ) , the leakage to the adversary is zero. Thus, (52) is achievable. The upper bounds (53) and (54) can be obtained using the same coding and chain of inequalities stated in Theorem 4. The upper bounds in (56) and (57) were derived in ([1], Theorem 8). The upper bound in (55) can be obtained using the same coding. Note that the inequality between the upper bounds follows from the fact that the RV U that achieves the upper bound in (56) does not necessarily minimize the entropy H ( U ) ; however, it satisfies the constraints I ( X ; U ) = 0 and H ( Y | X , U ) = 0 . Finally, to achieve (58), let the shared key W be independent of ( X , Y ) and have a uniform distribution. We construct Y ˜ using one-time pad coding. We have
Y ˜ = Y + W mod | Y | .
Then, Ỹ is encoded using any lossless code, which uses at most ⌈log(|Y|)⌉ bits. The latter follows since Ỹ has a uniform distribution. It only remains to show that I(Ỹ; X) = 0. We have
I(Ỹ; X, Y) = I(Y + W; X, Y) = H(Y + W) − H(Y + W|X, Y) = log(|Y|) − H(W|X, Y) = log(|Y|) − H(W) = log(|Y|) − log(|Y|) = 0.
Hence, I ( Y ˜ ; X ) = 0 . Furthermore, on the decoder side, we can decode Y using W.
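The one-time-pad step used in this proof is easy to sketch in code (our own illustration; symbols are indexed from 0 instead of {1, ..., |X|} for convenience): the padded symbol is uniformly distributed regardless of the source distribution, and it can be inverted only with the shared key.

```python
import numpy as np

def otp_encode(x, w, K):
    """One-time pad over an alphabet of size K: x_tilde = x + w mod K."""
    return (x + w) % K

def otp_decode(x_tilde, w, K):
    """Invert the pad; equivalently, add K - w modulo K, as described above."""
    return (x_tilde - w) % K

K = 5
rng = np.random.default_rng(1)
x = rng.choice(K, size=100_000, p=[0.5, 0.2, 0.15, 0.1, 0.05])   # arbitrary source pmf
w = rng.integers(K, size=x.size)                                 # uniform key, independent of x
x_tilde = otp_encode(x, w, K)

assert np.all(otp_decode(x_tilde, w, K) == x)                    # lossless with the shared key
print(np.bincount(x_tilde, minlength=K) / x.size)                # ~uniform: x_tilde reveals nothing about x
```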

Appendix B

Appendix B.1. Proof of Lemma 7

First, we obtain L_3(ϵ, P_XY). Using the key equation ([19], (7)) with the assignments X ← X and Y ← (Y, W), for any RVs X, Y, W, and U, we have
I(U; Y, W) = I(U; X) + H(Y, W|X) − H(Y, W|X, U) − I(X; U|Y, W).
Using (A10), if U satisfies I ( U ; X ) = ϵ and H ( Y | U , X , W ) = 0 and W is independent of ( X , Y ) , we have
H(U) ≥ I(U; Y, W) = I(U; X) + H(Y|X) + H(W|X, Y) − H(W|X, U) − H(Y|X, U, W) − H(X|Y, W) + H(X|W, Y, U) =^{(a)} ϵ + H(Y|X) + H(W) − H(W|X, U) − H(X|Y) + H(X|Y, U, W) ≥^{(b)} ϵ + H(Y|X) − H(X|Y),
where in (a) we used I(X; U) = ϵ, the independence of W and (X, Y), and H(Y|X, U, W) = 0. Step (b) follows from the non-negativity of H(W) − H(W|X, U) = I(W; X, U) and of H(X|Y, U, W).
To prove L 2 ( ϵ , P X Y ) , note that we have
H(U) ≥ min_x H(U|X = x) + ϵ,
since
H(U) − min_x H(U|X = x) ≥ H(U) − ∑_x P_X(x) H(U|X = x) = ϵ.
Furthermore,
H(U|X = x) ≥ H(U|W, X = x)   (A12)
=^{(a)} H(U|W, X = x) + H(Y|U, W, X = x) = H(U, Y|W, X = x) = H(Y|W, X = x) + H(U|Y, W, X = x)
≥^{(b)} H(Y|W, X = x)   (A13)
= H(Y|X = x),   (A14)
where (a) uses the fact that H(Y|U, W, X = x) = 0, since H(Y|U, W, X) = 0, and (b) follows from the non-negativity of H(U|Y, W, X = x); the last equality holds since W is independent of (X, Y). Taking the minimum over x on both sides, we obtain min_x H(U|X = x) ≥ min_x H(Y|X = x). Combining this with (A11), we obtain L_2(ϵ, P_XY). To obtain L_1(ϵ, P_XY), we take the average over x on both sides of (A12)–(A14). This gives us
H(U) ≥ H(U|X) ≥ H(Y|X),
which completes the proof.
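For reference, the three lower bounds implied by the derivations above can be evaluated numerically with the following sketch (our own; the closed forms L_1 = H(Y|X), L_2 = min_x H(Y|X = x) + ϵ, and L_3 = ϵ + H(Y|X) − H(X|Y) are read off from the steps of this proof, and the toy joint pmf is arbitrary).

```python
import numpy as np

def cond_entropy(P, axis):
    """Entropy of the variable along `axis` conditioned on the other variable of a 2D joint pmf."""
    P_cond_on = P.sum(axis=axis, keepdims=True)
    ratio = np.divide(P, P_cond_on, out=np.zeros_like(P), where=P_cond_on > 0)
    return float(-np.sum(np.where(P > 0, P * np.log2(ratio), 0.0)))

def lower_bounds(P_XY, eps):
    """Evaluate L1, L2, L3 for a joint pmf with rows indexed by x and columns by y."""
    H_Y_given_X = cond_entropy(P_XY, axis=1)
    H_X_given_Y = cond_entropy(P_XY, axis=0)
    P_Y_given_x = P_XY / P_XY.sum(axis=1, keepdims=True)
    H_Y_given_x = -np.sum(np.where(P_Y_given_x > 0,
                                   P_Y_given_x * np.log2(P_Y_given_x), 0.0), axis=1)
    return H_Y_given_X, H_Y_given_x.min() + eps, eps + H_Y_given_X - H_X_given_Y

P_XY = np.array([[0.20, 0.15, 0.05],
                 [0.10, 0.30, 0.20]])
print(lower_bounds(P_XY, eps=0.1))   # (L1, L2, L3) in bits
```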

Appendix B.2. Proof of Theorem 8

To prove (73), we use the two-part construction coding scheme as used in [1]. As shown in Figure A1, first the private data X are encoded by one-time-pad coding ([2], Lemma 1), which uses log ( | X | ) bits. Next, we produce U based on EFRL ([19], Lemma 3) and encode it using any traditional code which uses at most H ( U ) + 1 bits. By using the upper bound on H ( U ) found in ([19], Lemma 5), we obtain
L(P_XY, |X|, ϵ) ≤ ∑_{x ∈ X} H(Y|X = x) + ϵ + h(α) + 1 + ⌈log(|X|)⌉.
For proving (74) and (75), we use the same coding and note that, when |Y| is finite, we use the bound |U| ≤ (|X|(|Y| − 1) + 1)(|X| + 1), which results in (74). Furthermore, when X is a deterministic function of Y, we can use the bound |U| ≤ (|Y| − |X| + 1)(|X| + 1), which leads to (75). Moreover, for the leakage constraint, we note that the randomness of the one-time-pad coding is independent of X and of the output of the EFRL. Let Z be the compressed output of the EFRL and X̃ be the output of the one-time-pad coding. Then, we have
I(X; X̃, Z) = I(X; X̃) + I(X; Z|X̃) = I(X; Z) = ϵ,
where we used the independence between X̃ and (Z, X). Moreover, as shown in Figure A2, on the receiver side, we first decode X using the shared key W; then, by using the fact that U is produced by the EFRL, we can decode Y based on X and U.
Figure A1. We use a two-part construction coding strategy to send codewords over the channels. As discussed in Theorem 8, on the receiver side using C and the shared key, we decode X, and using X, the shared key, and C , we decode Y. The leakage from X to C equals ϵ .
Figure A2. On the receiver side, we first decode X using the shared key W, then by using the fact that U is produced by EFRL, we have H ( Y | X , U ) = 0 . Thus, we can decode Y based on X and U.

Appendix B.3. Proof of Theorem 9

Let Ũ = C(Y, W) be the output of the encoder. Since the code is ϵ-private, it satisfies I(Ũ; X) = ϵ. Furthermore, due to recoverability, the useful data can be recovered from the code, the shared secret key, and the private data, so Ũ also satisfies H(Y|Ũ, X, W) = 0. Thus, by using Lemma 7, we have
H(Ũ) ≥ max{L_1(ϵ, P_XY), L_2(ϵ, P_XY), L_3(ϵ, P_XY)},
which results in (76). The proof of (77) is similar to ([1], Theorem 9). For any ũ ∈ ξ, we have
P_Ũ(ũ) = ∑_x P_X(x) P_{Ũ|X}(ũ|x) ≤ max_x P_X(x) ∑_x P_{Ũ|X}(ũ|x) = max_x P_X(x) ∑_x ∑_w P_{Ũ|W X}(ũ|w, x) P_W(w) = max_x P_X(x) ∑_w P_W(w) ∑_x P_{Ũ|W X}(ũ|w, x) ≤^{(a)} max_x P_X(x) ∑_w P_W(w) = max_x P_X(x),
where (a) follows from the lossless condition, since, as argued in ([1], Theorem 9), for a lossless code P U ˜ | W X ( u ˜ | w , x ) > 0 for at most one x X . Hence, we have
H(Ũ) ≥ log(1 / max_x P_X(x)).

Appendix B.4. Proof of Theorem 10

To prove (82), X_2 is encoded by one-time-pad coding ([2], Lemma 1), which uses ⌈log(|X_2|)⌉ bits. Let the output of the one-time-pad coding be X̃_2. Next, we encode X_1 using any traditional code, which uses at most H(X_1) + 1 bits. We then produce U based on the FRL ([1], Lemma 1) with the assignments X ← (X_1, X_2) and Y ← Y, and encode it using any traditional code, which uses at most H(U) + 1 bits. We send the encoded X_1, X̃_2, and U over the channel. Since U satisfies I(U; X_1, X_2) = 0 and H(Y|U, X_1, X_2) = 0, by using the upper bound on H(U) found in ([1], Lemma 2), we obtain
H(U) ≤ ∑_x H(Y|X = x).
Finally, we obtain
L_b(P_XY, |X_2|, ϵ) ≤ ∑_{x ∈ X} H(Y|X = x) + H(X_1) + 2 + ⌈log(|X_2|)⌉.
Note that, on the receiver side, using the shared key and X ˜ 2 we can decode X 2 , and by using X 1 , X 2 and U we can decode Y without loss. For the leakage constraint, we note that the one-time-pad coding can be used such that X ˜ 2 is independent of ( X 1 , X 2 , U ) . Thus, we have
I(X_1, X̃_2, U; X_1, X_2) = H(X_1) + H(X̃_2) + H(U) − H(X_1|X_1, X_2) − H(X̃_2|X_1, X_2) − H(U|X_1, X_2, X̃_2) =^{(a)} H(X_1) ≤ ϵ,
where (a) follows from the independence of X̃_2 and (U, X_1, X_2), and the independence of U and (X_1, X_2). The proofs of (83) and (84) are based on ([1], (31)) and ([1], (41)). The proof of (85) is similar to that of (82): we use one-time-pad coding to encode X_1 instead of X_2. In this case, we send the compressed X_2, X̃_1, and U over the channel, where X̃_1 is the output of the one-time-pad coding. We then have
L_b(P_XY, |X_1|, ϵ) ≤ ∑_{x ∈ X} H(Y|X = x) + H(X_2) + 2 + ⌈log(|X_1|)⌉.
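The leakage computation above can also be verified numerically. The sketch below (our own, with arbitrary toy alphabet sizes and pmfs) checks that I(X_1, X̃_2, U; X_1, X_2) = H(X_1) whenever U is independent of (X_1, X_2) and the one-time-pad output X̃_2 is independent of (X_1, X_2, U).

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
P_x1x2 = rng.random((3, 2)); P_x1x2 /= P_x1x2.sum()   # joint pmf of (X_1, X_2)
P_u = rng.random(4); P_u /= P_u.sum()                 # U independent of (X_1, X_2)
P_xt2 = np.full(2, 0.5)                               # one-time-pad output: uniform and independent

# Joint pmf over (X_1, X_2, U, X_tilde_2); axes ordered as x1, x2, u, xt2.
P = P_x1x2[:, :, None, None] * P_u[None, None, :, None] * P_xt2[None, None, None, :]

H_all     = entropy(P)                  # H(X_1, X_2, U, X_tilde_2)
H_private = entropy(P_x1x2)             # H(X_1, X_2)
H_sent    = entropy(P.sum(axis=1))      # H(X_1, U, X_tilde_2): marginalize out X_2
leakage   = H_sent + H_private - H_all  # I(X_1, X_tilde_2, U ; X_1, X_2)
print(np.isclose(leakage, entropy(P_x1x2.sum(axis=1))))   # equals H(X_1): True
```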

Appendix B.5. Proof of Theorem 11

Let U = C(Y, W) be the output of the encoder. The code is bounded ϵ-private; thus, I(U; X_1, X_2) ≤ ϵ. Due to recoverability (given the code, the shared secret key, and the private data, we can recover the useful data), U also satisfies H(Y|W, U, X_1, X_2) = 0. Using (A10), we have
H(U) ≥ I(U; Y, W) = I(U; X) + H(Y|X) + H(W|X, Y) − H(W|X, U) − H(Y|X, U, W) − H(X|Y, W) + H(X|W, Y, U) ≥ H(Y|X) + H(W) − H(W|X, U) − H(X|Y) + H(X|Y, U, W) ≥ H(Y|X) − H(X|Y),
where we used I(U; X) ≥ 0, H(X|Y, U, W) ≥ 0, and the independence of W and (X, Y). The proof of (87) is similar to that of (77).

Appendix B.6. Proof of Theorem 13

To prove (91), let U be produced based on the FRL ([1], Lemma 1) with the assignments X ← (X_1, X_2) and Y ← Y, and encode it using any traditional code, which uses at most H(U) + 1 bits. Thus, I(U; X_1, X_2) = 0 and H(Y|X_1, X_2, U) = 0. Since X_2 = f(X_1), we have
H(Y|X_1, X_2, U) = H(Y|X_1, U) = 0.
Using ([1], Lemma 2), we have
H(U) ≤ ∑_x H(Y|X = x).
We encode X_1 by one-time-pad coding, which uses ⌈log(|X_1|)⌉ bits. Let the output of the one-time-pad coding be X̃_1. We send the compressed X̃_1 and U over the channel. Using the upper bound on H(U) and X̃_1, we have
L_b(P_XY, |X_1|, 0) ≤ ∑_{x ∈ X} H(Y|X = x) + 1 + ⌈log(|X_1|)⌉.
On the receiver side, using the shared key, first X_1 is decoded, and we then use (A15) to decode Y from X_1 and U. For the leakage constraint, the independence between X̃_1 and X_1 implies I(X_1, X_2; X̃_1) = 0, where the latter follows since X_2 = f(X_1). Furthermore, we choose X̃_1 to be independent of U. Thus,
I ( X ˜ 1 , U ; X 1 , X 2 ) = I ( X ˜ 1 , U ; X 1 ) = I ( X ˜ 1 ; X 1 ) = 0 .
The bounds (92) and (93) can be obtained using ([1], (31)) and ([1], (41)).

References

  1. Shkel, Y.Y.; Blum, R.S.; Poor, H.V. Secrecy by Design With Applications to Privacy and Compression. IEEE Trans. Inf. Theory 2021, 67, 824–843. [Google Scholar] [CrossRef]
  2. Shkel, Y.Y.; Poor, H.V. A Compression Perspective on Secrecy Measures. IEEE J. Sel. Areas Inf. Theory 2021, 2, 163–176. [Google Scholar] [CrossRef]
  3. Shannon, C.E. Communication theory of secrecy systems. Bell Syst. Tech. J. 1949, 28, 656–715. [Google Scholar] [CrossRef]
  4. Gündüz, D.; Erkip, E.; Poor, H.V. Source coding under secrecy constraints. In Securing Wireless Communications at the Physical Layer; Springer: Boston, MA, USA, 2009; pp. 173–199. [Google Scholar]
  5. Schaefer, R.F.; Boche, H.; Khisti, A.; Poor, H.V. Information Theoretic Security and Privacy of Information Systems; Cambridge University Press: Cambridge, UK, 2017. [Google Scholar]
  6. Sankar, L.; Rajagopalan, S.R.; Poor, H.V. Utility-privacy tradeoffs in databases: An information-theoretic approach. IEEE Trans. Inf. Forensics Secur. 2013, 8, 838–852. [Google Scholar] [CrossRef]
  7. Wang, H.; Vo, L.; Calmon, F.P.; Médard, M.; Duffy, K.R.; Varia, M. Privacy with Estimation Guarantees. IEEE Trans. Inf. Theory 2019, 65, 8025–8042. [Google Scholar] [CrossRef]
  8. Rassouli, B.; Gündüz, D. Optimal Utility-Privacy Trade-Off With Total Variation Distance as a Privacy Measure. IEEE Trans. Inf. Forensics Secur. 2020, 15, 594–603. [Google Scholar] [CrossRef]
  9. Rassouli, B.; Rosas, F.E.; Gündüz, D. Data Disclosure under Perfect Sample Privacy. IEEE Trans. Inf. Forensics Secur. 2020, 15, 2012–2025. [Google Scholar] [CrossRef]
  10. Yamamoto, H. A source coding problem for sources with additional outputs to keep secret from the receiver or wiretappers (corresp.). IEEE Trans. Inf. Theory 1983, 29, 918–923. [Google Scholar] [CrossRef]
  11. Yamamoto, H. A rate-distortion problem for a communication system with a secondary decoder to be hindered. IEEE Trans. Inf. Theory 1988, 34, 835–842. [Google Scholar] [CrossRef]
  12. Issa, I.; Kamath, S.; Wagner, A.B. An operational measure of information leakage. In Proceedings of the 2016 Annual Conference on Information Science and Systems, Princeton, NJ, USA, 16–18 March 2016; pp. 234–239. [Google Scholar] [CrossRef]
  13. Issa, I.; Kamath, S.; Wagner, A.B. Maximal leakage minimization for the Shannon cipher system. In Proceedings of the 2016 IEEE International Symposium on Information Theory, Barcelona, Spain, 10–15 July 2016; pp. 520–524. [Google Scholar]
  14. Makhdoumi, A.; Salamatian, S.; Fawaz, N.; Médard, M. From the information bottleneck to the privacy funnel. In Proceedings of the 2014 IEEE Information Theory Workshop, Hobart, Australia, 2–5 November 2014; pp. 501–505. [Google Scholar]
  15. Rassouli, B.; Gündüz, D. On Perfect Privacy. IEEE J. Sel. Areas Inf. Theory 2021, 2, 177–191. [Google Scholar] [CrossRef]
  16. Zamani, A.; Oechtering, T.J.; Skoglund, M. A Design Framework for Strongly χ2-Private Data Disclosure. IEEE Trans. Inf. Forensics Secur. 2021, 16, 2312–2325. [Google Scholar] [CrossRef]
  17. Zamani, A.; Oechtering, T.J.; Skoglund, M. Data Disclosure With Non-Zero Leakage and Non-Invertible Leakage Matrix. IEEE Trans. Inf. Forensics Secur. 2022, 17, 165–179. [Google Scholar] [CrossRef]
  18. Zamani, A.; Oechtering, T.J.; Skoglund, M. On the Privacy-Utility Trade-Off With and Without Direct Access to the Private Data. IEEE Trans. Inf. Theory 2024, 70, 2177–2200. [Google Scholar] [CrossRef]
  19. Zamani, A.; Oechtering, T.J.; Skoglund, M. Bounds for Privacy-Utility Trade-off with Non-zero Leakage. In Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022; pp. 620–625. [Google Scholar] [CrossRef]
  20. Zamani, A.; Oechtering, T.J.; Skoglund, M. Bounds for Privacy-Utility Trade-off with Per-letter Privacy Constraints and Non-zero Leakage. In Proceedings of the 2022 IEEE Information Theory Workshop (ITW), Mumbai, India, 1–9 November 2022; pp. 13–18. [Google Scholar] [CrossRef]
  21. Asoodeh, S.; Diaz, M.; Alajaji, F.; Linder, T. Information extraction under privacy constraints. Information 2016, 7, 15. [Google Scholar] [CrossRef]
  22. Zamani, A.; Oechtering, T.J.; Gündüz, D.; Skoglund, M. Private Variable-Length Coding with Zero Leakage. In Proceedings of the 2023 IEEE International Workshop on Information Forensics and Security (WIFS), Nurnberg, Germany, 4–7 December 2023; pp. 1–6. [Google Scholar] [CrossRef]
  23. Zamani, A.; Oechtering, T.J.; Skoglund, M. Private Variable-Length Coding with Non-Zero Leakage. In Proceedings of the 2023 IEEE International Workshop on Information Forensics and Security (WIFS), Nurnberg, Germany, 4–7 December 2023; pp. 1–6. [Google Scholar]
  24. Wang, S.; Lin, Z.; Fanti, G. Statistic Maximal Leakage. In Proceedings of the 2024 IEEE International Symposium on Information Theory (ISIT), Athens, Greece, 7–12 July 2024; pp. 2742–2747. [Google Scholar]
  25. Sreekumar, S.; Gündüz, D. Optimal Privacy-Utility Trade-off under a Rate Constraint. In Proceedings of the 2019 IEEE International Symposium on Information Theory, Paris, France, 7–12 July 2019; pp. 2159–2163. [Google Scholar] [CrossRef]
  26. Liu, T.Y.; Wang, I.H. Robust Privatization With Multiple Tasks and the Optimal Privacy-Utility Tradeoff. IEEE Trans. Inf. Theory 2024, 70, 8164–8179. [Google Scholar] [CrossRef]
  27. Zarrabian, M.A.; Ding, N.; Sadeghi, P. On the Lift, Related Privacy Measures, and Applications to Privacy–Utility Trade-Offs. Entropy 2023, 25, 679. [Google Scholar] [CrossRef]
  28. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Theory of Cryptography Conference, TCC 2006, New York, NY, USA, 4–7 March 2006; pp. 265–284. [Google Scholar]
  29. Dwork, C.; Kenthapadi, K.; McSherry, F.; Mironov, I.; Naor, M. Our data, ourselves: Privacy via distributed noise generation. In Proceedings of the Advances in Cryptology-EUROCRYPT 2006: 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, 28 May–1 June 2006; pp. 486–503. [Google Scholar]
  30. Kairouz, P.; Oh, S.; Viswanath, P. The composition theorem for differential privacy. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015; pp. 1376–1385. [Google Scholar]
  31. Cuff, P.; Yu, L. Differential privacy as a mutual information constraint. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 43–54. [Google Scholar]
  32. Barthe, G.; Olmedo, F. Beyond differential privacy: Composition theorems and relational logic for f-divergences between probabilistic programs. In Proceedings of the International Colloquium on Automata, Languages, and Programming, Riga, Latvia, 8–12 July 2013; pp. 49–60. [Google Scholar]
  33. Feldman, V.; Mironov, I.; Talwar, K.; Thakurta, A. Privacy amplification by iteration. In Proceedings of the 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), Paris, France, 7–9 October 2018; pp. 521–532. [Google Scholar]
  34. Kocaoglu, M.; Dimakis, A.; Vishwanath, S.; Hassibi, B. Entropic causal inference. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
  35. Compton, S.; Katz, D.; Qi, B.; Greenewald, K.; Kocaoglu, M. Minimum-entropy coupling approximation guarantees beyond the majorization barrier. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Valencia, Spain, 25–27 April 2023; pp. 10445–10469. [Google Scholar]
  36. Shkel, Y.Y.; Yadav, A.K. Information spectrum converse for minimum entropy couplings and functional representations. In Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, 25–30 June 2023; pp. 66–71. [Google Scholar]
  37. Calmon, F.P.; Makhdoumi, A.; Medard, M.; Varia, M.; Christiansen, M.; Duffy, K.R. Principal Inertia Components and Applications. IEEE Trans. Inf. Theory 2017, 63, 5011–5038. [Google Scholar] [CrossRef]
  38. Berger, T.; Yeung, R.W. Multiterminal source encoding with encoder breakdown. IEEE Trans. Inf. Theory 1989, 35, 237–244. [Google Scholar] [CrossRef]
  39. Zamani, A.; Rodríguez-Gálvez, B.; Skoglund, M. On information theoretic fairness: Compressed representations with perfect demographic parity. arXiv 2024, arXiv:2408.13168. [Google Scholar]
  40. Erlingsson, Ú.; Pihur, V.; Korolova, A. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, 3–7 November 2014; pp. 1054–1067. [Google Scholar]
  41. Asoodeh, S.; Liao, J.; Calmon, F.P.; Kosut, O.; Sankar, L. Three variants of differential privacy: Lossless conversion and applications. IEEE J. Sel. Areas Inf. Theory 2021, 2, 208–222. [Google Scholar] [CrossRef]
  42. Alghamdi, W.; Gomez, J.F.; Asoodeh, S.; Calmon, F.; Kosut, O.; Sankar, L. The saddle-point method in differential privacy. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 508–528. [Google Scholar]
  43. Dwork, C.; Rothblum, G.N.; Vadhan, S. Boosting and differential privacy. In Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, Las Vegas, NV, USA, 23–26 October 2010; pp. 51–60. [Google Scholar]
  44. Wyner, A.D. The wire-tap channel. Bell Syst. Tech. J. 1975, 54, 1355–1387. [Google Scholar] [CrossRef]
  45. Gács, P.; Körner, J. Common information is far less than mutual information. Probl. Control Inf. Theory 1973, 2, 149–162. [Google Scholar]
  46. Basciftci, Y.O.; Wang, Y.; Ishwar, P. On privacy-utility tradeoffs for constrained data release mechanisms. In Proceedings of the 2016 Information Theory and Applications Workshop, La Jolla, CA, USA, 31 January–5 February 2016; pp. 1–6. [Google Scholar] [CrossRef]
  47. Li, C.T.; El Gamal, A. Strong Functional Representation Lemma and Applications to Coding Theorems. IEEE Trans. Inf. Theory 2018, 64, 6967–6978. [Google Scholar] [CrossRef]
  48. Wyner, A. The common information of two dependent random variables. IEEE Trans. Inf. Theory 1975, 21, 163–179. [Google Scholar] [CrossRef]
  49. Zamani, A.; Oechtering, T.J.; Gündüz, D.; Skoglund, M. Cache-Aided Private Variable-Length Coding with Zero and Non-Zero Leakage. In Proceedings of the 2023 21st International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), Singapore, 24–27 August 2023; pp. 247–254. [Google Scholar] [CrossRef]
  50. Maddah-Ali, M.A.; Niesen, U. Fundamental limits of caching. IEEE Trans. Inf. Theory 2014, 60, 2856–2867. [Google Scholar] [CrossRef]
Figure 1. In this work, an encoder wants to compress Y, which is correlated with X under certain privacy leakage constraints and send it over a channel where an eavesdropper has access to the output of the encoder.
Figure 2. Two-part construction coding: A strategy to send codewords over the channels. We hide the information of X using one-time-pad coding, and we then use the FRL to produce U.
Figure 3. At the receiver side, we first decode X using the shared key W, then by using the fact that U satisfies H ( Y | X , U ) = 0 , we can decode Y based on X and U.
Figure 4. In this work, we use two-part construction coding to send codewords over the channels. The encoder takes X, Y, and W as inputs. In the first step, it hides the information of X using one-time-pad coding by adding the secret key W to X to produce X ˜ , then, X ˜ is encoded to C 1 using any lossless code. In the second step, the encoder uses the solution of g 0 ( P X Y ) = h 0 ( P X Y ) to construct U, which is independent of X. Moreover, U satisfies H ( Y | U , X ) = 0 , which ensures that Y can be recovered from X and U. Then, U is encoded to C 2 using any lossless code. Finally, C = ( C 1 , C 2 ) is sent over the channel.
Figure 5. On the receiver side, we first decode X using the shared key W, then by using H ( Y | X , U ) = 0 , we can decode Y based on X and U.
Figure 6. Encoder design: illustration of the achievability scheme of Theorem 7. Two-part code construction is used to produce the response of the server, C . The server sends C over the channel, which is independent of X.
Figure 7. A server wants to send a response over a shared link to satisfy the users' demands, but since the database is correlated with the private data, the existing schemes are not applicable. In the delivery phase, we hide the information about X using one-time-pad coding and send the rest of the response using the functional representation lemma (FRL).
Figure 8. Quantized image of digit 3 with resolution 28 × 28 .
Table 1. Marginal distribution of the useful data Y.
y: 1, 2, 3, 4, 5, 6, 7
Pr(Y = y): 0.1851, 0.4069, 0.2981, 0.0936, 0.0151, 0.0010, 0.0002
