Article

Utility–Privacy Trade-Offs with Limited Leakage for Encoder

Department of Computer and Network Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu 182-8585, Tokyo, Japan
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Entropy 2023, 25(6), 921; https://doi.org/10.3390/e25060921
Submission received: 10 April 2023 / Revised: 28 May 2023 / Accepted: 6 June 2023 / Published: 11 June 2023
(This article belongs to the Special Issue Advances in Information and Coding Theory)

Abstract: The utilization of databases, such as those collected by IoT devices, has progressed, and understanding how to protect the privacy of data is an important issue. As pioneering work, in 1983, Yamamoto assumed a source (database) consisting of public information and private information, and found theoretical limits (first-order rate analysis) among the coding rate, utility, and privacy for the decoder in two special cases. In this paper, we consider a more general case based on the work by Shinohara and Yagi in 2022. Introducing a measure of privacy for the encoder, we investigate the following two problems: The first problem is the first-order rate analysis among the coding rate, utility, privacy for the decoder, and privacy for the encoder, in which utility is measured by the expected distortion or the excess-distortion probability. The second problem is establishing the strong converse theorem for utility–privacy trade-offs, in which utility is measured by the excess-distortion probability. These results may lead to a more refined analysis, such as second-order rate analysis.

1. Introduction

1.1. Background

The utilization of databases has progressed in our society, with applications including autonomous cars and congestion data services over the Internet. At the same time, the risk of accidental or intentional leakage of private information has also increased rapidly. To protect private information, coding with a privacy constraint has been analyzed via an information-theoretic approach. In 1983, Yamamoto [1] introduced a framework to quantify the utility of databases and the privacy of personal information and analyzed the trade-offs between them. Decades later, in 2013, Sankar et al. [2] claimed the necessity of converting databases to protect privacy while maintaining the utility of data. Then, Yamamoto’s framework [1] was re-recognized by Sankar et al. and other researchers. Using rate-distortion theory, Yamamoto revealed the optimal relationships (theoretical limits) among coding rate, utility, and privacy in two cases: (i) both public information, which can be open to the public, and private information, which should be protected from a third party, are encoded, and (ii) only public information is encoded. However, the more general case in which (iii) public information and part of the private information are encoded had not been clarified, so Shinohara and Yagi [3] derived the theoretical limits in this case (see Figure 1). The resulting characterization of the achievable region gives a “unified expression” because it includes the characterizations given in [1] for cases (i) and (ii) as special cases.

1.2. Motivation and Contributions

By investigating case (iii), one can compare the theoretical limits corresponding to a variety of patterns of encoded information. One can see that the achievable region in case (i) is the largest among all patterns. However, this may not hold if the privacy leakage for the encoder is constrained. Motivated by this observation, in this paper we characterize the optimal trade-offs among coding rate, utility, privacy for the decoder, and privacy for the encoder in Section 3. The addressed problem corresponds to the case where there are aggregators between the source and the encoder that control the data (source sequence) passed to the encoder. The obtained results indeed suggest that the best choice of encoded information can fall under case (iii) if some restriction is imposed on the privacy leakage for the encoder.
One of the most important tasks in information-theoretic analysis of utility–privacy trade-offs is second-order rate analysis (e.g., [4,5,6]). In second-order rate analysis, the excess-distortion probability is generally used as the measure of utility [4,5,6]. However, in the first-order rate analysis in [3], utility is measured by the expected distortion, so before second-order rate analysis we first need a first-order rate analysis that replaces the expected distortion with the excess-distortion probability as the measure of utility. In Section 4, we show that the resulting theoretical limits coincide with those obtained when utility is measured by the expected distortion.
There is one more problem to solve before tackling second-order rate analysis: we need to clarify whether the boundary of the achievable region varies depending on the permitted value of the excess-distortion probability. In Section 5, we establish the strong converse theorem, provided that utility is measured by the excess-distortion probability. For the sake of simplicity, we focus on the achievable region of utility and privacy for the decoder or a third party, which reveals one aspect of utility–privacy trade-offs. In the proof, we adopt the change-of-measure argument developed by Tyagi and Watanabe [7]. In contrast to the standard rate-distortion problem, the alphabets of the encoder’s input and the decoder’s output are different, so we extend the argument to incorporate this discrepancy. Although the strong converse theorem is shown for the rate region of utility and privacy, the same result can also be derived when the privacy of the encoder is involved.
For readers’ convenience, Figure 2 shows the road map to the most important task: the second-order rate analysis. In summary, three contributions of this paper are as follows:
  • The rate analysis among the coding rate, utility, privacy for the decoder, and privacy for the encoder in which utility is measured using the expected distortion (Section 3).
  • The rate analysis among the coding rate, utility, privacy for the decoder, and privacy for the encoder in which utility is measured using the excess-distortion probability (Section 4).
  • The strong converse theorem for utility–privacy trade-offs in which utility is measured using the excess-distortion probability (Section 5).

1.3. Related Work

The analysis of utility–privacy trade-offs using an information-theoretic approach was initiated by [2], which translates the rate-distortion problem with an equivocation constraint in [1] into the privacy–utility trade-off problem. In information-theoretic studies on coding with privacy and utility constraints, several measures for privacy and utility are adopted. One of the strongest measures for privacy is differential privacy [8,9], and extensions and relaxations of differential privacy have been proposed in [10,11]. A weaker but useful privacy measure is the mutual information between the codeword and private information [1,2,12,13,14], which bounds the average amount of leaked private information. Other examples of well-known privacy measures are maximal leakage [15], maximal α-leakage [16,17,18], and total variation [19]. Relationships among several measures for privacy have been revealed in [20]. On the other hand, well-known utility measures are average distortion [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25], hard distortion [16,17], and log-loss distortion [26].
Coding systems in the utility–privacy problem are extended to the ones with the encoder’s side information [2] and with the decoder’s side information [25]. In [14], a related coding problem has been investigated, where both the encoder and the decoder can access a uniform secret key and the decoder can also access side information. Utility–privacy trade-off schemes are applied, for example, to the Internet of Energy [23] and to a system with informational self-determination [24].
A closely related study to this paper was given by Basciftci et al. [13], in which several release mechanisms of encoded information from the database were discussed. In particular, utility–privacy trade-offs (without the coding rate) were compared when the encoded information was (i) both private and public information, (ii) only public information, and (iv) only private information (see also the three cases described in Section 1.1). A sufficient condition under which the utility–privacy trade-offs coincide for cases (i) and (ii) was given.

1.4. Organization

This paper is organized as follows: In Section 2, we begin by introducing the notation and system model that are used in this paper. In Section 3, we give the first-order rate analysis among the coding rate, utility, privacy for the decoder, and privacy for the encoder in which utility is measured by the expected distortion. In Section 4, we tackle the first-order rate analysis among the coding rate, utility, privacy for the decoder, and privacy for the encoder in which utility is measured by the excess-distortion probability. Section 5 focuses on the strong converse theorem for utility–privacy trade-offs in which utility is measured by the excess-distortion probability. In Section 6, we discuss the significance of the encoded information with limited leakage for the encoder. Finally, in Section 7, the conclusion and future work are stated.

2. Notation and System Model

2.1. Information Source

Database $d$ is described by a $K \times n$ matrix whose rows represent $K$ attributes and whose columns represent $n$ entries of data. Let $\mathcal{K} = \{1, 2, \ldots, K\}$ be the set of indexes of the $K$ attributes. The random variable for the $l$th attribute is denoted by $X_l$, which takes a value in a finite alphabet $\mathcal{X}_l$. For any subset $\mathcal{B} \subseteq \mathcal{K}$, the tuple of random variables $(X_l)_{l \in \mathcal{B}}$ is abbreviated as $X_\mathcal{B}$. Similarly, the Cartesian product of alphabets $\prod_{l \in \mathcal{B}} \mathcal{X}_l$ is abbreviated as $\mathcal{X}_\mathcal{B}$.
The $K$ attributes can be divided into two groups: one may be open to the public and the other should be kept secret from a third party. Accordingly, the set $\mathcal{K}$ is divided into disjoint sets $\mathcal{R}$ and $\mathcal{H}$. That is,
$\mathcal{K} = \mathcal{R} \cup \mathcal{H}, \quad \mathcal{R} \cap \mathcal{H} = \emptyset, \quad \mathcal{X}_\mathcal{K} = \mathcal{X}_\mathcal{R} \times \mathcal{X}_\mathcal{H},$
where $\mathcal{X}_\mathcal{R}$ is the set of values that the public (revealed) source symbols $X_\mathcal{R}$ take and $\mathcal{X}_\mathcal{H}$ is the set of values that the private (hidden) source symbols $X_\mathcal{H}$ take.
We assume that the source sequence $X_\mathcal{K}^n = (X_{\mathcal{K},1}, X_{\mathcal{K},2}, \ldots, X_{\mathcal{K},n})$ is generated from a stationary and memoryless source $P_{X_\mathcal{K}}$. That is,
$P_{X_\mathcal{K}^n}(x_\mathcal{K}^n) = \Pr\{X_\mathcal{K}^n = x_\mathcal{K}^n\} = \prod_{i=1}^{n} P_{X_\mathcal{K}}(x_{\mathcal{K},i}),$
where $x_\mathcal{K}^n = (x_{\mathcal{K},1}, \ldots, x_{\mathcal{K},n}) \in \mathcal{X}_\mathcal{K}^n$. Taking the partition of attributes in (1) into account, the source sequence $X_\mathcal{K}^n$ is described as
$X_\mathcal{K}^n = (X_\mathcal{R}^n, X_\mathcal{H}^n),$
where
$X_\mathcal{R}^n = (X_{\mathcal{R},1}, X_{\mathcal{R},2}, \ldots, X_{\mathcal{R},n}) \in \mathcal{X}_\mathcal{R}^n,$
$X_\mathcal{H}^n = (X_{\mathcal{H},1}, X_{\mathcal{H},2}, \ldots, X_{\mathcal{H},n}) \in \mathcal{X}_\mathcal{H}^n$
are referred to as the revealed source sequence and the hidden source sequence, respectively. In the addressed coding system introduced in [22], the revealed symbols and a part of the hidden symbols are input to the encoder, and thus the set of encoded attributes $\mathcal{E}$ satisfies $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$. Similar to (3), $X_\mathcal{K}^n$ is sometimes described as
$X_\mathcal{K}^n = (X_\mathcal{E}^n, X_{\mathcal{E}^c}^n),$
where $X_\mathcal{E}^n$ is the source sequence observed by the encoder and $\mathcal{E}^c = \mathcal{K} \setminus \mathcal{E}$.
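As a small illustration of this source model, the following Python sketch samples an i.i.d. sequence of attribute tuples and splits it into revealed and hidden parts. The two-attribute binary distribution is an illustrative assumption (here $\mathcal{R} = \{1\}$, $\mathcal{H} = \{2\}$), not a distribution from the paper.

```python
# A minimal sketch of the stationary memoryless source model: draw X_K^n
# i.i.d. and split each tuple into revealed (X_R) and hidden (X_H) parts.
import random

random.seed(0)
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # P_{X_K} (assumed)

n = 10
x_k = random.choices(list(p), weights=list(p.values()), k=n)  # X_K^n, i.i.d.
x_r = [t[0] for t in x_k]   # revealed source sequence X_R^n
x_h = [t[1] for t in x_k]   # hidden source sequence X_H^n
print(x_r, x_h)
```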

2.2. Encoder and Decoder

The coding system consists of encoder $f_n$ and decoder $g_n$ as in Figure 1. When the source sequence $X_\mathcal{K}^n = (X_\mathcal{E}^n, X_{\mathcal{E}^c}^n)$ is generated from the stationary and memoryless source $P_{X_\mathcal{K}}$, the codeword $J_n = f_n(X_\mathcal{E}^n)$ is generated by the encoder
$f_n: \mathcal{X}_\mathcal{E}^n \to \{1, 2, \ldots, M_n\}$
and the reproduced sequence $\hat{X}_\mathcal{R}^n = g_n(J_n)$ is produced by the decoder
$g_n: \{1, 2, \ldots, M_n\} \to \hat{\mathcal{X}}_\mathcal{R}^n,$
where $M_n$ denotes the number of codewords.

3. First-Order Rate Analysis with Expected Distortion

3.1. Performance Measures

In this section, we introduce the measures of the coding rate, utility, privacy for the decoder, and privacy for the encoder. Hereafter, let a pair of an encoder and a decoder $(f_n, g_n)$ be fixed.
For a given $M_n$, the coding rate is defined as
$r_n \triangleq \frac{1}{n} \log M_n.$
Let $d: \mathcal{X}_\mathcal{R} \times \hat{\mathcal{X}}_\mathcal{R} \to [0, \infty)$ be a distortion function between $x_\mathcal{R} \in \mathcal{X}_\mathcal{R}$ and $\hat{x}_\mathcal{R} \in \hat{\mathcal{X}}_\mathcal{R}$. The distortion between sequences $x_\mathcal{R}^n \in \mathcal{X}_\mathcal{R}^n$ and $\hat{x}_\mathcal{R}^n \in \hat{\mathcal{X}}_\mathcal{R}^n$ is defined as
$d(x_\mathcal{R}^n, \hat{x}_\mathcal{R}^n) \triangleq \sum_{i=1}^{n} d(x_{\mathcal{R},i}, \hat{x}_{\mathcal{R},i}).$
Then, the measure of utility is defined as
$u_n \triangleq E\left[\frac{1}{n} d(X_\mathcal{R}^n, \hat{X}_\mathcal{R}^n)\right],$
where $E[\cdot]$ represents the expectation with respect to the joint distribution of $(X_\mathcal{R}^n, \hat{X}_\mathcal{R}^n)$.
In this system, the privacy of the hidden source sequence $X_\mathcal{H}^n$ should be protected when the codeword $J_n$ is observed by decoder $g_n$. The measure of privacy for the decoder is defined as
$l_n \triangleq \frac{1}{n} I(X_\mathcal{H}^n; J_n),$
where $I(X_\mathcal{H}^n; J_n)$ is the mutual information between $X_\mathcal{H}^n$ and $J_n$.
Likewise, the privacy of the hidden source sequence $X_\mathcal{H}^n$ should be protected when the encoded information $X_\mathcal{E}^n$ is observed by encoder $f_n$. The measure of privacy for the encoder is defined as
$e_n \triangleq \frac{1}{n} I(X_\mathcal{H}^n; X_\mathcal{E}^n),$
where $I(X_\mathcal{H}^n; X_\mathcal{E}^n)$ is the mutual information between $X_\mathcal{H}^n$ and $X_\mathcal{E}^n$.
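To make these definitions concrete, the following minimal sketch evaluates the four measures for a toy blocklength-1 code. The binary source, the encoder that transmits only the revealed bit, and the Hamming distortion are all illustrative assumptions, not the paper's construction.

```python
# A minimal sketch of the measures r_n, u_n, l_n, e_n for a toy n = 1 code.
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits, from a joint pmf given as a 2-D array."""
    p_xy = np.asarray(p_xy, dtype=float)
    px = p_xy.sum(axis=1, keepdims=True)   # marginal of X
    py = p_xy.sum(axis=0, keepdims=True)   # marginal of Y
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (px * py)[mask])))

# Joint pmf P_{X_R, X_H} of a correlated (revealed, hidden) bit pair (assumed).
p_rh = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

# Toy code with E = K: the encoder observes (X_R, X_H) but transmits only the
# revealed bit, J = X_R; the decoder outputs X_R_hat = J.
M = 2
r = np.log2(M)                       # coding rate r_n with n = 1
u = 0.0                              # expected Hamming distortion: X_R_hat = X_R
l = mutual_information(p_rh.T)       # l_n = I(X_H; J) = I(X_H; X_R)
p_h = p_rh.sum(axis=0)
e = float(-np.sum(p_h * np.log2(p_h)))  # e_n = I(X_H; X_E) = H(X_H) since E = K
print(f"r={r:.3f}, u={u:.3f}, l={l:.3f}, e={e:.3f}")
```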

3.2. Achievable Region and Theorem

We define the achievable region for the first-order rate analysis with the expected distortion and state the obtained results.
Definition 1.
A tuple $(R, D, L, E)$ is said to be ϵ-achievable (with respect to the expected distortion measure) if, for any given $\epsilon > 0$, there exists a sequence of codes $(f_n, g_n)$ satisfying
$r_n \le R + \epsilon,$
$u_n \le D + \epsilon,$
$l_n \le L + \epsilon,$
$e_n \le E + \epsilon$
for all sufficiently large $n$.
The technical meaning of each constraint in Definition 1 can be interpreted as follows: Equation (14) evaluates how much the source sequence is compressed, so this rate should be kept small. Equation (15) is the constraint that the expected distortion be less than $D + \epsilon$; the smaller the distortion, the better the utility, so this quantity should also be kept small. Equation (16) constrains the amount of private information leaked to the decoder. Since private information should be kept secret from the receiver, this quantity should be kept small as well. Equation (17) constrains the amount of private information leaked to the encoder. For the same reason as (16), this quantity should also be kept small.
Remark 1.
The minimum coding rate $R$ for a fixed $D$ corresponds to the rate-distortion function ([27], Section 10). Thus, in the proof of achievability, we evaluate the coding rate and the distortion using arguments from rate-distortion theory. This view is also important for correctly understanding the numerical results in Section 6.1.
Definition 2.
The closure of the set of ϵ-achievable tuples $(R, D, L, E)$ is referred to as the ϵ-achievable region and is denoted by $\mathcal{C}_\mathcal{E}(\epsilon | P_{X_\mathcal{K}})$. We also define
$\mathcal{C}_\mathcal{E}(P_{X_\mathcal{K}}) \triangleq \bigcap_{0 < \epsilon < 1} \mathcal{C}_\mathcal{E}(\epsilon | P_{X_\mathcal{K}}).$
To characterize the achievable region, we define the following informational region.
Definition 3.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, $\mathcal{S}_\mathcal{E}(P_{X_\mathcal{K}})$ is defined as
$\mathcal{S}_\mathcal{E}(P_{X_\mathcal{K}}) = \{(R, D, L, E) : R \ge I(X_\mathcal{E}; \hat{X}_\mathcal{R}),\ D \ge E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})],\ L \ge I(X_\mathcal{H}; \hat{X}_\mathcal{R}),\ E \ge I(X_\mathcal{H}; X_\mathcal{E}) \text{ for some } P_{X_\mathcal{E}, X_{\mathcal{E}^c}} \cdot P_{\hat{X}_\mathcal{R} | X_\mathcal{E}}\}.$
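As an illustration of Definition 3, the following sketch evaluates the four single-letter quantities for one fixed test channel. The binary source, the choice $\mathcal{E} = \mathcal{K}$, and the binary symmetric test channel are illustrative assumptions; sweeping over test channels would trace out an inner approximation of $\mathcal{S}_\mathcal{E}(P_{X_\mathcal{K}})$.

```python
# A minimal sketch: one corner point of S_E(P_XK) for an assumed test channel.
import numpy as np

def mi(p_xy):
    """I(X;Y) in bits from a joint pmf (2-D numpy array)."""
    px = p_xy.sum(1, keepdims=True)
    py = p_xy.sum(0, keepdims=True)
    m = p_xy > 0
    return float(np.sum(p_xy[m] * np.log2(p_xy[m] / (px * py)[m])))

p_rh = np.array([[0.4, 0.1],        # joint pmf of (X_R, X_H); rows: x_R,
                 [0.1, 0.4]])       # columns: x_H (illustrative assumption)

q = 0.1                             # BSC crossover acting on x_R (assumption)
chan = np.zeros((2, 2, 2))          # chan[x_r, x_h, x_r_hat] = P(x_r_hat | x_E)
for xr in range(2):
    for xh in range(2):
        chan[xr, xh, xr] = 1 - q
        chan[xr, xh, 1 - xr] = q

joint = p_rh[:, :, None] * chan     # P(x_r, x_h, x_r_hat)

R_min = mi(joint.reshape(4, 2))     # I(X_E; X_R_hat), with X_E = (X_R, X_H)
D_min = joint[0, :, 1].sum() + joint[1, :, 0].sum()  # E[d], Hamming distortion
L_min = mi(joint.sum(axis=0))       # I(X_H; X_R_hat)
p_h = p_rh.sum(axis=0)
E_min = float(-np.sum(p_h * np.log2(p_h)))  # I(X_H; X_E) = H(X_H) since E = K

print(R_min, D_min, L_min, E_min)   # one achievable corner of S_E(P_XK)
```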
We establish the next theorem. For the proof of this theorem, please refer to Section 3.3, Section 3.4 and Section 3.5.
Theorem 1.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, the achievable region of the coding system is given by
$\mathcal{C}_\mathcal{E}(P_{X_\mathcal{K}}) = \mathcal{S}_\mathcal{E}(P_{X_\mathcal{K}}).$
To clarify the relationship with the conventional result of Shinohara and Yagi [3], we state the achievable region among the coding rate, utility, and privacy, derived by projecting the result of Theorem 1 onto the $R$-$D$-$L$ hyperplane.
Definition 4.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, we define
$\mathcal{C}_\mathcal{E}^{RDL}(\epsilon | P_{X_\mathcal{K}}) \triangleq \{(R, D, L) : (R, D, L, E) \in \mathcal{C}_\mathcal{E}(\epsilon | P_{X_\mathcal{K}}) \text{ for some } E\}$
and
$\mathcal{C}_\mathcal{E}^{RDL}(P_{X_\mathcal{K}}) \triangleq \bigcap_{0 < \epsilon < 1} \mathcal{C}_\mathcal{E}^{RDL}(\epsilon | P_{X_\mathcal{K}}).$
Definition 5.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, we define
$\mathcal{S}_\mathcal{E}^{RDL}(P_{X_\mathcal{K}}) = \{(R, D, L) : R \ge I(X_\mathcal{E}; \hat{X}_\mathcal{R}),\ D \ge E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})],\ L \ge I(X_\mathcal{H}; \hat{X}_\mathcal{R}) \text{ for some } P_{X_\mathcal{E}, X_{\mathcal{E}^c}} \cdot P_{\hat{X}_\mathcal{R} | X_\mathcal{E}}\}.$
Corollary 1.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, the region $\mathcal{C}_\mathcal{E}^{RDL}(P_{X_\mathcal{K}})$ is given by
$\mathcal{C}_\mathcal{E}^{RDL}(P_{X_\mathcal{K}}) = \mathcal{S}_\mathcal{E}^{RDL}(P_{X_\mathcal{K}}).$
Remark 2.
Corollary 1 suggests that the conventional result [3] can be obtained from $\mathcal{C}_\mathcal{E}(P_{X_\mathcal{K}})$.
Remark 3.
The derived characterization in (24) reduces to the characterization given in [1] when the encoded attribute set $\mathcal{E}$ is either $\mathcal{K}$ or $\mathcal{R}$. Thus, (24) gives its generalization for $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$.
Examples to illustrate this result are shown in Section 6.1.

3.3. Proof Preliminaries for First-Order Rate Analysis

As preliminaries for the coding theorems established by first-order rate analysis, we define the strongly typical sequences that are necessary for the proofs and show some of their properties. These preliminaries are also used in Section 4.
Definition 6
(Definition 2.1, [28]). The type of a sequence $x^n \in \mathcal{X}^n$ of length $n$ is the distribution $P_{x^n}$ on $\mathcal{X}$ defined by
$P_{x^n}(a) \triangleq \frac{1}{n} N(a | x^n), \quad a \in \mathcal{X},$
where $N(a | x^n)$ represents the number of occurrences of symbol $a \in \mathcal{X}$ in $x^n$. Likewise, the joint type of $x^n \in \mathcal{X}^n$ and $y^n \in \mathcal{Y}^n$ is the distribution $P_{x^n y^n}$ on $\mathcal{X} \times \mathcal{Y}$ defined by
$P_{x^n y^n}(a, b) \triangleq \frac{1}{n} N(a, b | x^n, y^n), \quad (a, b) \in \mathcal{X} \times \mathcal{Y},$
where $N(a, b | x^n, y^n)$ represents the number of occurrences of $(a, b) \in \mathcal{X} \times \mathcal{Y}$ in the pair of sequences $(x^n, y^n)$.
Definition 7
((Conditional Type), [28], Definition 2.2). We define the conditional type of $y^n$ given $x^n$ as a stochastic matrix $V: \mathcal{X} \to \mathcal{Y}$ satisfying
$N(a, b | x^n, y^n) = N(a | x^n) V(b | a), \quad (a, b) \in \mathcal{X} \times \mathcal{Y}.$
In particular, the conditional type of $y^n$ given $x^n$ is uniquely determined and given by
$V(b | a) = \frac{N(a, b | x^n, y^n)}{N(a | x^n)}$
if $N(a | x^n) > 0$ for every $a \in \mathcal{X}$.
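The following minimal sketch computes the type and conditional type of concrete sequences as in Definitions 6 and 7; the sequences themselves are illustrative.

```python
# A minimal sketch of Definitions 6 and 7, tallying symbol occurrences.
from collections import Counter

def type_of(xs):
    """Empirical distribution P_{x^n}(a) = N(a|x^n)/n."""
    n = len(xs)
    return {a: c / n for a, c in Counter(xs).items()}

def conditional_type(xs, ys):
    """V(b|a) = N(a,b|x^n,y^n) / N(a|x^n) wherever N(a|x^n) > 0."""
    n_a = Counter(xs)
    n_ab = Counter(zip(xs, ys))
    return {(a, b): c / n_a[a] for (a, b), c in n_ab.items()}

x = ['0', '1', '0', '0', '1', '0']
y = ['a', 'b', 'a', 'b', 'b', 'a']
print(type_of(x))              # {'0': 2/3, '1': 1/3}
print(conditional_type(x, y))  # e.g. V('a'|'0') = 3/4
```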
Definition 8
((Strongly Typical Sequences), [29], Definition 1.2.8). For any distribution $P$ on $\mathcal{X}$, a sequence $x^n \in \mathcal{X}^n$ is said to be $P$-typical with constant $\delta > 0$ if
$\left| \frac{1}{n} N(a | x^n) - P(a) \right| \le \delta \quad \text{for every } a \in \mathcal{X}$
and, in addition, no $a \in \mathcal{X}$ with $P(a) = 0$ occurs in $x^n$. The set of such sequences is denoted by $T_\delta^n(P)$. If $X$ is a random variable with values in $\mathcal{X}$, we also refer to $P$-typical sequences as $X$-typical sequences and write $T_\delta^n(X)$.
Definition 9
((Conditionally Strongly Typical Sequences), [29], Definition 1.2.9). For a stochastic matrix $W: \mathcal{X} \to \mathcal{Y}$, a sequence $y^n \in \mathcal{Y}^n$ is said to be $W$-typical given $x^n \in \mathcal{X}^n$ with constant $\delta > 0$ if
$\left| \frac{1}{n} N(a, b | x^n, y^n) - \frac{1}{n} N(a | x^n) W(b | a) \right| \le \delta \quad \text{for every } a \in \mathcal{X},\ b \in \mathcal{Y},$
and, in addition, $N(a, b | x^n, y^n) = 0$ whenever $W(b | a) = 0$. The set of such sequences $y^n$ is denoted by $T_\delta^n(W | x^n)$. Further, if $X$ and $Y$ are random variables with values in $\mathcal{X}$ and $\mathcal{Y}$, respectively, and $P_{Y|X} = W$, then such sequences are also said to be $Y|X$-typical, written $T_\delta^n(Y | X | x^n)$.
Hereafter, the set of conditionally strongly typical sequences $T_\delta^n(Y | X | x^n)$ is abbreviated as $T_\delta^n(Y | x^n)$.
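A direct way to test membership in $T_\delta^n(P)$ is to compare empirical frequencies with $P$, as in the following sketch; the alphabet and distribution are illustrative assumptions.

```python
# A minimal sketch of Definition 8: strong typicality test for x^n w.r.t. P.
from collections import Counter

def is_typical(xs, p, delta):
    """True iff |N(a|x^n)/n - P(a)| <= delta for all a, and no symbol of
    probability zero occurs in x^n (strong typicality, Definition 8)."""
    n = len(xs)
    counts = Counter(xs)
    if any(p.get(a, 0.0) == 0.0 for a in counts):   # forbidden symbols
        return False
    return all(abs(counts.get(a, 0) / n - pa) <= delta for a, pa in p.items())

p = {'0': 0.7, '1': 0.3}
print(is_typical(['0', '0', '1', '0', '0', '1', '0', '0', '0', '1'], p, 0.05))
```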
We now state some lemmas that are used in the proofs.
Lemma 1
([29], Lemma 1.2.13). For any positive sequences $\{\delta_n\}_{n=1}^\infty$ and $\{\delta'_n\}_{n=1}^\infty$ such that $\delta_n \to 0$ and $\delta'_n \to 0$ as $n \to \infty$, there exists a sequence $\epsilon_n = \epsilon_n(|\mathcal{X}|, |\mathcal{Y}|, \delta_n, \delta'_n) \to 0$ ($n \to \infty$) such that for every distribution $P$ on $\mathcal{X}$ and stochastic matrix $W: \mathcal{X} \to \mathcal{Y}$,
$\left| \frac{1}{n} \log |T_{\delta_n}^n(P)| - H(P) \right| \le \epsilon_n,$
$\left| \frac{1}{n} \log |T_{\delta'_n}^n(W | x^n)| - H(W | P) \right| \le \epsilon_n.$
Lemma 2
([29], Lemma 1.2.7). Let the variational distance between two distributions $P$ and $Q$ on $\mathcal{X}$ be defined as
$d_v(P, Q) \triangleq \sum_{x \in \mathcal{X}} |P(x) - Q(x)|.$
If $d_v(P, Q) < \frac{1}{2}$, then
$|H(P) - H(Q)| \le -d_v(P, Q) \cdot \log \frac{d_v(P, Q)}{|\mathcal{X}|}.$
Lemma 3
([29], Lemma 1.2.10). If $x^n \in T_\delta^n(X)$ and $y^n \in T_{\delta'}^n(Y | x^n)$, then $(x^n, y^n) \in T_{\delta + \delta'}^n(X, Y)$ and, consequently, $y^n \in T_{\delta''}^n(Y)$ for $\delta'' \triangleq (\delta + \delta') \cdot |\mathcal{X}|$.
Lemma 4.
If $(x^n, y^n) \in T_\delta^n(X, Y)$, then $x^n \in T_{\delta_1}^n(X)$ and, consequently, $y^n \in T_{\delta_2}^n(Y | x^n)$ for $\delta_1 \triangleq |\mathcal{Y}| \cdot \delta$ and $\delta_2 \triangleq (|\mathcal{Y}| + 1) \cdot \delta$.
Lemma 5.
If $y^n \in T_\delta^n(Y)$ and $(x^n, y^n) \in T_{2\delta}^n(X, Y)$, then $x^n \in T_\delta^n(X | y^n)$.
Lemma 6
([29], Lemma 1.2.12 and Remark). For arbitrarily fixed $\delta > 0$, every distribution $P$ on $\mathcal{X}$, and every stochastic matrix $W: \mathcal{X} \to \mathcal{Y}$,
$\Pr\{X^n \in T_\delta^n(P)\} \ge 1 - 2|\mathcal{X}| e^{-2\delta^2 n},$
$\Pr\{Y^n \in T_\delta^n(W | x^n) \mid X^n = x^n\} \ge 1 - 2|\mathcal{X}| \cdot |\mathcal{Y}| e^{-2\delta^2 n} \quad \text{for every } x^n \in \mathcal{X}^n.$

3.4. Proof of Converse Part

In this part, we shall prove $\mathcal{C}_\mathcal{E}(P_{X_\mathcal{K}}) \subseteq \mathcal{S}_\mathcal{E}(P_{X_\mathcal{K}})$.
Let a tuple $(R, D, L, E) \in \mathcal{C}_\mathcal{E}(P_{X_\mathcal{K}})$ be arbitrarily fixed. Then, there exists an $(n, 2^{n(R+\epsilon)}, D + \epsilon, L + \epsilon, E + \epsilon)$ code that satisfies (14)–(17). Let $Q$ be a uniform random variable over $\{1, 2, \ldots, n\}$ and let $p_i(x_{\mathcal{E},i}, x_{\mathcal{E}^c,i}, \hat{x}_{\mathcal{R},i})$ be the conditional joint distribution given $Q = i$. Evaluating the inequalities for $R$, we obtain
$R + \epsilon \overset{(a)}{\ge} \frac{1}{n} \log M_n \overset{(b)}{\ge} \frac{1}{n} H(J_n) \ge \frac{1}{n} I(J_n; X_\mathcal{E}^n) \overset{(c)}{=} \frac{1}{n} \{H(X_\mathcal{E}^n) - H(X_\mathcal{E}^n | J_n, \hat{X}_\mathcal{R}^n)\} \overset{(d)}{=} \frac{1}{n} \sum_{i=1}^n H(X_{\mathcal{E},i}) - \frac{1}{n} \sum_{i=1}^n H(X_{\mathcal{E},i} | X_\mathcal{E}^{i-1}, J_n, \hat{X}_\mathcal{R}^n) \overset{(e)}{\ge} \frac{1}{n} \sum_{i=1}^n H(X_{\mathcal{E},i}) - \frac{1}{n} \sum_{i=1}^n H(X_{\mathcal{E},i} | \hat{X}_{\mathcal{R},i}) \overset{(f)}{=} \sum_{i=1}^n \Pr\{Q = i\} H(X_{\mathcal{E},i} | Q = i) - \sum_{i=1}^n \Pr\{Q = i\} H(X_{\mathcal{E},i} | \hat{X}_{\mathcal{R},i}, Q = i) = H(X_{\mathcal{E},Q} | Q) - H(X_{\mathcal{E},Q} | \hat{X}_{\mathcal{R},Q}, Q) \overset{(g)}{=} H(X_\mathcal{E}) - H(X_{\mathcal{E},Q} | \hat{X}_{\mathcal{R},Q}, Q) \overset{(h)}{\ge} H(X_\mathcal{E}) - H(X_\mathcal{E} | \hat{X}_\mathcal{R}) = I(X_\mathcal{E}; \hat{X}_\mathcal{R}),$
where
(a) follows from (14),
(b) follows because $H(J_n) \le \log |\mathcal{J}_n| = \log M_n$,
(c) is due to the fact that $\hat{X}_\mathcal{R}^n = g_n(J_n)$,
(d) follows because each $X_{\mathcal{K},i}$ is independent and $\hat{X}_\mathcal{R}^n$ is a function of $J_n$,
(e) follows because conditioning reduces entropy,
(f) is due to the definition of $Q$,
(g) follows because $X_{\mathcal{E},Q}$ is independent of $Q$, and
(h) follows because conditioning reduces entropy, where $(X_\mathcal{E}, \hat{X}_\mathcal{R}) \sim \sum_{i=1}^n \Pr\{Q = i\}\, p_i(x_{\mathcal{E},i}, \hat{x}_{\mathcal{R},i}) = p(x_\mathcal{E}, \hat{x}_\mathcal{R})$.
Similarly, evaluating D, L, and E, respectively, we obtain
$D + \epsilon \overset{(i)}{\ge} E\left[\frac{1}{n} \sum_{i=1}^n d(X_{\mathcal{R},i}, \hat{X}_{\mathcal{R},i})\right] = \frac{1}{n} \sum_{i=1}^n E[d(X_{\mathcal{R},i}, \hat{X}_{\mathcal{R},i})] \overset{(j)}{=} E_Q[E[d(X_{\mathcal{R},Q}, \hat{X}_{\mathcal{R},Q}) | Q]] \overset{(k)}{=} E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})],$
$L + \epsilon \overset{(l)}{\ge} \frac{1}{n} I(X_\mathcal{H}^n; J_n) = \frac{1}{n} H(X_\mathcal{H}^n) - \frac{1}{n} H(X_\mathcal{H}^n | J_n) \overset{(m)}{=} H(X_\mathcal{H}) - \frac{1}{n} \sum_{i=1}^n H(X_{\mathcal{H},i} | X_\mathcal{H}^{i-1}, J_n) \overset{(n)}{=} H(X_\mathcal{H}) - \frac{1}{n} \sum_{i=1}^n H(X_{\mathcal{H},i} | X_\mathcal{H}^{i-1}, J_n, \hat{X}_{\mathcal{R},i}) \overset{(o)}{\ge} H(X_\mathcal{H}) - \frac{1}{n} \sum_{i=1}^n H(X_{\mathcal{H},i} | \hat{X}_{\mathcal{R},i}) \overset{(p)}{=} H(X_\mathcal{H}) - \sum_{i=1}^n \Pr\{Q = i\} H(X_{\mathcal{H},i} | \hat{X}_{\mathcal{R},i}, Q = i) = H(X_\mathcal{H}) - H(X_{\mathcal{H},Q} | \hat{X}_{\mathcal{R},Q}, Q) \overset{(q)}{\ge} H(X_\mathcal{H}) - H(X_\mathcal{H} | \hat{X}_\mathcal{R}) = I(X_\mathcal{H}; \hat{X}_\mathcal{R}),$
$E + \epsilon \ge \frac{1}{n} I(X_\mathcal{H}^n; X_\mathcal{E}^n) \overset{(r)}{=} \frac{1}{n} \sum_{i=1}^n I(X_{\mathcal{H},i}; X_\mathcal{E}^n | X_\mathcal{H}^{i-1}) \overset{(s)}{=} \frac{1}{n} \sum_{i=1}^n I(X_{\mathcal{H},i}; X_{\mathcal{E},i}) \overset{(t)}{=} I(X_\mathcal{H}; X_\mathcal{E}),$
where
(i) is due to (15),
(j) is derived from the definition of $Q$,
(k) follows because $(X_\mathcal{R}, \hat{X}_\mathcal{R}) \sim \sum_{i=1}^n \Pr\{Q = i\}\, p_i(x_{\mathcal{R},i}, \hat{x}_{\mathcal{R},i}) = p(x_\mathcal{R}, \hat{x}_\mathcal{R})$,
(l) is due to (16),
(m) follows because the source is i.i.d. with common distribution $P_{X_\mathcal{K}}$,
(n) follows because $\hat{X}_\mathcal{R}^n = g_n(J_n)$,
(o) follows from the fact that conditioning reduces entropy,
(p) is derived from the definition of $Q$,
(q) follows because conditioning reduces entropy, where $(X_\mathcal{H}, \hat{X}_\mathcal{R}) \sim \sum_{i=1}^n \Pr\{Q = i\}\, p_i(x_{\mathcal{H},i}, \hat{x}_{\mathcal{R},i}) = p(x_\mathcal{H}, \hat{x}_\mathcal{R})$,
(r) is due to the chain rule for mutual information, and
(s), (t) follow because the source is i.i.d. with common distribution $P_{X_\mathcal{K}}$.
It is readily shown that the Markov chain $X_{\mathcal{E}^c} \leftrightarrow X_\mathcal{E} \leftrightarrow \hat{X}_\mathcal{R}$ holds (cf. Appendix A). This completes the proof of the converse part.

3.5. Proof of Direct Part

In this part, we provide a sketch of the proof of $\mathcal{S}_\mathcal{E}(P_{X_\mathcal{K}}) \subseteq \mathcal{C}_\mathcal{E}(P_{X_\mathcal{K}})$.
Under an arbitrarily fixed distribution $P_{X_\mathcal{E}, X_{\mathcal{E}^c}} \cdot P_{\hat{X}_\mathcal{R} | X_\mathcal{E}}$, choose any tuple $(R, D, L, E) \in \mathcal{S}_\mathcal{E}(P_{X_\mathcal{K}})$ such that
$R > I(X_\mathcal{E}; \hat{X}_\mathcal{R}),$
$D > E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})],$
$L > I(X_\mathcal{H}; \hat{X}_\mathcal{R}),$
$E > I(X_\mathcal{H}; X_\mathcal{E}).$
From (42) and (43), we can choose a sufficiently small $\epsilon > 0$ such that
$D > E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})] + \epsilon,$
$L > I(X_\mathcal{H}; \hat{X}_\mathcal{R}) + \epsilon.$
In addition, with this $\epsilon$, some constant $0 < \tau < \frac{1}{2}$ is fixed such that
$\tau (\log |\mathcal{X}_\mathcal{H}| + 5) + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau} < \epsilon.$
We can also choose positive numbers $\delta$ ($= \delta(n)$) such that
$(\delta(n) + \delta_1(n)) |\mathcal{X}_\mathcal{R}| \cdot |\hat{\mathcal{X}}_\mathcal{R}| D_{\max} + \tau < \epsilon,$
$2\delta^2(n) \le \left(R - I(X_\mathcal{E}; \hat{X}_\mathcal{R}) - \frac{1}{n}\right) \tau,$
$\delta(n) \to 0,$
$n \cdot \delta(n) \to \infty$
as $n \to \infty$, where $\delta_1 \triangleq |\mathcal{X}_\mathcal{E}| \cdot |\mathcal{X}_\mathcal{R}| \cdot \delta$ and $D_{\max} \triangleq \max_{a \in \mathcal{X}_\mathcal{R}, b \in \hat{\mathcal{X}}_\mathcal{R}} d(a, b)$. For example, $\delta(n) = c \sqrt{\frac{\log n}{n}}$ with a constant $c$ clearly satisfies (50) and (51).
Generation of codebook: Randomly generate $\hat{x}_\mathcal{R}^n(j)$ from the set of strongly typical sequences $T_\delta^n(\hat{X}_\mathcal{R})$ for $j = 1, 2, \ldots, M_n \triangleq 2^{nR}$. Reveal the codebook $\mathcal{C} = \{\hat{x}_\mathcal{R}^n(1), \ldots, \hat{x}_\mathcal{R}^n(M_n)\}$ to the encoder and decoder.
Encoding: If a sequence $x_\mathcal{E}^n \in \mathcal{X}_\mathcal{E}^n$ satisfies $x_\mathcal{K}^n = (x_\mathcal{E}^n, x_{\mathcal{E}^c}^n)$ with some $x_{\mathcal{E}^c}^n \in \mathcal{X}_{\mathcal{E}^c}^n$, we write $x_\mathcal{E}^n \preceq x_\mathcal{K}^n$. Given $x_\mathcal{K}^n$, the encoder finds $j$ such that $x_\mathcal{E}^n \in T_\delta^n(X_\mathcal{E} | \hat{x}_\mathcal{R}^n(j))$, the set of conditionally strongly typical sequences, and sets $f_n(x_\mathcal{E}^n) = j$. If there exist multiple such $j$, $f_n(x_\mathcal{E}^n)$ is set to the minimum one. If there is no such $j$, then $f_n(x_\mathcal{E}^n) = M_n$.
Decoding: When $j$ is observed, the decoder sets the reproduced sequence as $\hat{X}_\mathcal{R}^n = \hat{x}_\mathcal{R}^n(j)$.
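The following sketch mimics this random-coding scheme on a small, illustrative scale: codewords are drawn by rejection sampling as a stand-in for uniform draws from $T_\delta^n(\hat{X}_\mathcal{R})$, and the encoder returns the smallest conditionally typical index (or the error index $M_n$). The alphabet, test channel, and parameters are assumptions for illustration only.

```python
# A minimal sketch of the typicality-based random coding scheme above.
import random
from collections import Counter

random.seed(0)
n, delta = 20, 0.15
M = 8                                   # small stand-in for M_n = 2**(n*R)

p_hat = {'0': 0.5, '1': 0.5}            # marginal P_{X_R_hat} (assumption)
w = {('0', '0'): 0.9, ('0', '1'): 0.1,  # test channel P_{X_E | X_R_hat},
     ('1', '0'): 0.1, ('1', '1'): 0.9}  # keyed by (x_r_hat, x_e) (assumption)

def typical(xs, p, d):
    """Strong typicality of xs w.r.t. pmf p with constant d (Definition 8)."""
    c = Counter(xs)
    if any(p.get(a, 0.0) == 0.0 for a in c):
        return False
    return all(abs(c.get(a, 0) / len(xs) - pa) <= d for a, pa in p.items())

def cond_typical(ys, xs, w, d):
    """ys is W-typical given xs: |N(a,b)/n - (N(a)/n) W(b|a)| <= d (Def. 9)."""
    m = len(xs)
    na, nab = Counter(xs), Counter(zip(xs, ys))
    bs = {b for (_, b) in w}
    return all(abs(nab.get((a, b), 0) / m - na.get(a, 0) / m * w[(a, b)]) <= d
               for a in na for b in bs)

def draw_typical(p, n, d):
    """Rejection sampling as a stand-in for a uniform draw from T_d^n."""
    while True:
        seq = random.choices(list(p), weights=list(p.values()), k=n)
        if typical(seq, p, d):
            return seq

codebook = [draw_typical(p_hat, n, delta) for _ in range(M - 1)]

def encode(x_e):
    for j, cw in enumerate(codebook):   # smallest conditionally typical index
        if cond_typical(x_e, cw, w, delta):
            return j
    return M - 1                        # error index M_n

# Drive X_E^n through the test channel from codeword 0; encoding then
# typically succeeds with a small index (often j = 0).
x_e = [random.choices(['0', '1'], weights=[w[(a, '0')], w[(a, '1')]])[0]
       for a in codebook[0]]
j = encode(x_e)
x_hat = codebook[j] if j < M - 1 else None  # decoder's reproduced sequence
print("index:", j)
```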
Evaluation: We define $A(j)$, $B(j)$, and $\tilde{A}(j)$ as
$A(j) \triangleq \{x_\mathcal{E}^n : f_n(x_\mathcal{E}^n) = j\},$
$B(j) \triangleq \{x_\mathcal{K}^n : x_\mathcal{E}^n \preceq x_\mathcal{K}^n,\ f_n(x_\mathcal{E}^n) = j\},$
$\tilde{A}(j) \triangleq \{x_\mathcal{K}^n : x_\mathcal{E}^n \preceq x_\mathcal{K}^n,\ f_n(x_\mathcal{E}^n) = j,\ x_\mathcal{K}^n \in T_{2\delta}^n(X_\mathcal{K} | \hat{x}_\mathcal{R}^n(j))\}$ for $j = 1, 2, \ldots, M_n - 1$, and $\tilde{A}(M_n) \triangleq \mathcal{X}_\mathcal{K}^n \setminus \bigcup_{j=1}^{M_n - 1} \tilde{A}(j)$.
It is easily verified that the sets $A(j)$, $j = 1, 2, \ldots, M_n$ (and likewise $B(j)$ and $\tilde{A}(j)$), are pairwise disjoint. From the definitions of $J_n$, $A(j)$, and $B(j)$,
$\Pr\{J_n = j\} = \Pr\{X_\mathcal{E}^n \in A(j)\} = \Pr\{X_\mathcal{K}^n \in B(j)\} \quad \text{for } j = 1, 2, \ldots, M_n.$
For sufficiently large n, we can prove (cf. Appendix B)
| Pr { X K n B ( j ) } Pr { X K n A ˜ ( j ) } | 2 | X K | · | X ^ R | e 2 δ 2 n for j = 1 , 2 , , M n 1 .
For sufficiently large n, we can show that there exists a code ( f n , g n ) such that (cf. Appendix C)
r n R ,
u n E [ d ( X R , X ^ R ) ] + ( δ + δ 1 ) | X R | · | X ^ R | D max + τ ,
e n I ( X H ; X E ) ,
Pr X E n j = 1 M n 1 A ( j ) ( 2 | X E | + 1 ) e 2 δ 2 n ,
Pr X K n j = 1 M n 1 A ˜ ( j ) τ ,
| A ˜ ( j ) | 2 n { H ( X K | X ^ R ) τ } .
For this code $(f_n, g_n)$, we evaluate the privacy leakage against the decoder as
$l_n \triangleq \frac{1}{n} I(X_\mathcal{H}^n; J_n) = \frac{1}{n} H(X_\mathcal{H}^n) - \frac{1}{n} H(X_\mathcal{H}^n | J_n) \overset{(a)}{=} H(X_\mathcal{H}) - \frac{1}{n} \sum_{j=1}^{M_n} H(X_\mathcal{H}^n | X_\mathcal{K}^n \in B(j)) \Pr\{X_\mathcal{K}^n \in B(j)\} \overset{(b)}{\le} H(X_\mathcal{H}) - \frac{1}{n} \sum_{j=1}^{M_n} H(X_\mathcal{H}^n | X_\mathcal{K}^n \in \tilde{A}(j)) \Pr\{X_\mathcal{K}^n \in \tilde{A}(j)\} + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau} \overset{(c)}{\le} H(X_\mathcal{H}) - \frac{1}{n} \sum_{j=1}^{M_n - 1} H(X_\mathcal{H}^n | X_\mathcal{K}^n \in \tilde{A}(j)) \Pr\{X_\mathcal{K}^n \in \tilde{A}(j)\} + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau} = H(X_\mathcal{H}) + \frac{1}{n} \sum_{j=1}^{M_n - 1} \left[\sum_{x_\mathcal{H}^n} \Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n | X_\mathcal{K}^n \in \tilde{A}(j)\} \log \Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n | X_\mathcal{K}^n \in \tilde{A}(j)\}\right] \Pr\{X_\mathcal{K}^n \in \tilde{A}(j)\} + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau},$
where
(a) follows because the source is i.i.d. with common distribution $P_{X_\mathcal{K}}$,
(b) is due to the inequality proved in Appendix D, and
(c) follows by removing the term for $j = M_n$.
Here, for any $x_\mathcal{H}^n$ satisfying $x_\mathcal{K}^n = (x_\mathcal{R}^n, x_\mathcal{H}^n) \in \tilde{A}(j)$ with some $x_\mathcal{R}^n$, we can show that
$\Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n | X_\mathcal{K}^n \in \tilde{A}(j)\} = \frac{\Pr\{X_\mathcal{K}^n \in \tilde{A}(j) | X_\mathcal{H}^n = x_\mathcal{H}^n\} \Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n\}}{\Pr\{X_\mathcal{K}^n \in \tilde{A}(j)\}} = \frac{\sum_{x_\mathcal{R}^n : (x_\mathcal{R}^n, x_\mathcal{H}^n) \in \tilde{A}(j)} \Pr\{X_\mathcal{R}^n = x_\mathcal{R}^n, X_\mathcal{H}^n = x_\mathcal{H}^n | X_\mathcal{H}^n = x_\mathcal{H}^n\}}{\sum_{(\tilde{x}_\mathcal{R}^n, \tilde{x}_\mathcal{H}^n) \in \tilde{A}(j)} \Pr\{X_\mathcal{R}^n = \tilde{x}_\mathcal{R}^n, X_\mathcal{H}^n = \tilde{x}_\mathcal{H}^n\}} \cdot \Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n\} \overset{(d)}{=} \frac{\sum_{x_\mathcal{R}^n : (x_\mathcal{R}^n, x_\mathcal{H}^n) \in \tilde{A}(j)} \Pr\{X_\mathcal{R}^n = x_\mathcal{R}^n | X_\mathcal{H}^n = x_\mathcal{H}^n\}}{\sum_{(\tilde{x}_\mathcal{R}^n, \tilde{x}_\mathcal{H}^n) \in \tilde{A}(j)} \Pr\{X_\mathcal{R}^n = \tilde{x}_\mathcal{R}^n, X_\mathcal{H}^n = \tilde{x}_\mathcal{H}^n\}} \cdot \Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n\} \overset{(e)}{\le} \frac{\sum_{x_\mathcal{R}^n \in T_{\delta_3}^n(X_\mathcal{R} | x_\mathcal{H}^n, \hat{x}_\mathcal{R}^n(j))} \Pr\{X_\mathcal{R}^n = x_\mathcal{R}^n | X_\mathcal{H}^n = x_\mathcal{H}^n\}}{\sum_{(\tilde{x}_\mathcal{R}^n, \tilde{x}_\mathcal{H}^n) \in \tilde{A}(j)} \Pr\{X_\mathcal{R}^n = \tilde{x}_\mathcal{R}^n, X_\mathcal{H}^n = \tilde{x}_\mathcal{H}^n\}} \cdot \Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n\} \overset{(f)}{\le} \frac{2^{n \{H(X_\mathcal{R} | X_\mathcal{H}, \hat{X}_\mathcal{R}) + \tau\}} \cdot 2^{-n \{H(X_\mathcal{R} | X_\mathcal{H}) - \tau\}} \cdot 2^{-n \{H(X_\mathcal{H}) - \tau\}}}{2^{n \{H(X_\mathcal{K} | \hat{X}_\mathcal{R}) - \tau\}} \cdot 2^{-n \{H(X_\mathcal{K}) + \tau\}}} = 2^{-n \{H(X_\mathcal{H} | \hat{X}_\mathcal{R}) - 5\tau\}},$
where
(d) follows from the fact that
$\Pr\{X_\mathcal{R}^n = x_\mathcal{R}^n, X_\mathcal{H}^n = x_\mathcal{H}^n | X_\mathcal{H}^n = x_\mathcal{H}^n\} = \Pr\{X_\mathcal{R}^n = x_\mathcal{R}^n | X_\mathcal{H}^n = x_\mathcal{H}^n\},$
(e) is due to the inequality proved in Appendix E, and
(f) follows from the bounds on the number of strongly typical sequences.
Therefore, from Equations (61), (64), and (66), we obtain
$l_n \le H(X_\mathcal{H}) - \frac{1}{n} \sum_{j=1}^{M_n - 1} \left[n \sum_{x_\mathcal{H}^n} \Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n | X_\mathcal{K}^n \in \tilde{A}(j)\} \{H(X_\mathcal{H} | \hat{X}_\mathcal{R}) - 5\tau\}\right] \Pr\{X_\mathcal{K}^n \in \tilde{A}(j)\} + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau} = H(X_\mathcal{H}) - \Pr\left\{X_\mathcal{K}^n \in \bigcup_{j=1}^{M_n - 1} \tilde{A}(j)\right\} \{H(X_\mathcal{H} | \hat{X}_\mathcal{R}) - 5\tau\} + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau} \le H(X_\mathcal{H}) - (1 - \tau) \{H(X_\mathcal{H} | \hat{X}_\mathcal{R}) - 5\tau\} + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau} \le I(X_\mathcal{H}; \hat{X}_\mathcal{R}) + \tau (\log |\mathcal{X}_\mathcal{H}| + 5) + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau}.$
Since the constants $\epsilon$, $\tau$, and $\delta$ are fixed so as to satisfy (45)–(48), from (44), (57)–(59), and (67), we obtain
$r_n \le R,$
$u_n \le E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})] + \epsilon < D,$
$l_n < I(X_\mathcal{H}; \hat{X}_\mathcal{R}) + \epsilon < L,$
$e_n \le I(X_\mathcal{H}; X_\mathcal{E}) < E.$
Therefore, for the fixed distribution $P_{X_\mathcal{E}, X_{\mathcal{E}^c}} \cdot P_{\hat{X}_\mathcal{R} | X_\mathcal{E}}$, any tuple
$(R, D, L, E) \in \{(R, D, L, E) : R > I(X_\mathcal{E}; \hat{X}_\mathcal{R}),\ D > E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})],\ L > I(X_\mathcal{H}; \hat{X}_\mathcal{R}),\ E > I(X_\mathcal{H}; X_\mathcal{E})\} \eqqcolon \mathcal{S}_\mathcal{E}^*(P_{X_\mathcal{K}})$
is achievable. Consequently, $\mathcal{S}_\mathcal{E}^*(P_{X_\mathcal{K}}) \subseteq \mathcal{C}_\mathcal{E}(P_{X_\mathcal{K}})$. Taking the closure of the left-hand side (l.h.s.), we obtain $\mathrm{Cl}(\mathcal{S}_\mathcal{E}^*(P_{X_\mathcal{K}})) \subseteq \mathcal{C}_\mathcal{E}(P_{X_\mathcal{K}})$ because $\mathcal{C}_\mathcal{E}(P_{X_\mathcal{K}})$ is a closed set. Since the distribution $P_{X_\mathcal{K}, \hat{X}_\mathcal{R}} = P_{X_\mathcal{E}, X_{\mathcal{E}^c}} \cdot P_{\hat{X}_\mathcal{R} | X_\mathcal{E}}$ was fixed arbitrarily, we conclude that $\mathcal{S}_\mathcal{E}(P_{X_\mathcal{K}}) = \bigcup_p \mathrm{Cl}(\mathcal{S}_\mathcal{E}^*(P_{X_\mathcal{K}})) \subseteq \mathcal{C}_\mathcal{E}(P_{X_\mathcal{K}})$, where the union is over the choices of this distribution. This completes the proof of the direct part.

4. First-Order Rate Analysis with Excess-Distortion Probability

4.1. Performance Measures

Hereafter, let a pair of an encoder and a decoder $(f_n, g_n)$ be fixed.
For a given $M_n$, the coding rate is defined as
$r_n \triangleq \frac{1}{n} \log M_n.$
Let $d: \mathcal{X}_\mathcal{R} \times \hat{\mathcal{X}}_\mathcal{R} \to [0, \infty)$ be a distortion function between $x_\mathcal{R} \in \mathcal{X}_\mathcal{R}$ and $\hat{x}_\mathcal{R} \in \hat{\mathcal{X}}_\mathcal{R}$. The distortion between sequences $x_\mathcal{R}^n \in \mathcal{X}_\mathcal{R}^n$ and $\hat{x}_\mathcal{R}^n \in \hat{\mathcal{X}}_\mathcal{R}^n$ is defined as
$d(x_\mathcal{R}^n, \hat{x}_\mathcal{R}^n) \triangleq \sum_{i=1}^{n} d(x_{\mathcal{R},i}, \hat{x}_{\mathcal{R},i}).$
Then, the measure of utility is defined as
$u_n \triangleq \Pr\left\{\frac{1}{n} d(X_\mathcal{R}^n, \hat{X}_\mathcal{R}^n) > D\right\}.$
This measure is called the excess-distortion probability for $D \ge 0$.
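The following minimal Monte Carlo sketch contrasts the two utility measures; the per-letter error probability and the threshold $D$ are illustrative assumptions.

```python
# A minimal sketch: expected distortion vs. excess-distortion probability
# Pr{ d(X_R^n, X_R_hat^n)/n > D }, estimated by Monte Carlo.
import random

random.seed(1)
n, D, trials = 100, 0.12, 2000
flip = 0.1   # per-letter probability that the reproduction differs (assumed)

exceed = 0
total_dist = 0.0
for _ in range(trials):
    d = sum(random.random() < flip for _ in range(n))  # Hamming distortion
    total_dist += d / n
    exceed += (d / n > D)

print("expected distortion ~", total_dist / trials)    # close to 0.10
print("excess-distortion prob ~", exceed / trials)     # Pr{ distortion > D }
```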
In this system, the privacy of the hidden source sequence $X_\mathcal{H}^n$ should be protected when the codeword $J_n$ is observed by decoder $g_n$. The measure of privacy for the decoder is defined as
$l_n \triangleq \frac{1}{n} I(X_\mathcal{H}^n; J_n),$
where $I(X_\mathcal{H}^n; J_n)$ is the mutual information between $X_\mathcal{H}^n$ and $J_n$.
Likewise, the privacy of the hidden source sequence $X_\mathcal{H}^n$ should be protected when the encoded information $X_\mathcal{E}^n$ is observed by encoder $f_n$. The measure of privacy for the encoder is defined as
$e_n \triangleq \frac{1}{n} I(X_\mathcal{H}^n; X_\mathcal{E}^n),$
where $I(X_\mathcal{H}^n; X_\mathcal{E}^n)$ is the mutual information between $X_\mathcal{H}^n$ and $X_\mathcal{E}^n$.

4.2. Achievable Region and Theorem

We define the achievable region for the first-order rate analysis with the excess-distortion probability and state the obtained results.
Definition 10.
A tuple $(R, D, L, E)$ is said to be ϵ-achievable (with respect to the excess-distortion probability) if, for any given $\epsilon > 0$, there exists a sequence of codes $(f_n, g_n)$ satisfying
$r_n \le R + \epsilon,$
$u_n \le \epsilon,$
$l_n \le L + \epsilon,$
$e_n \le E + \epsilon$
for all sufficiently large $n$.
The technical meaning of each constraint in Definition 10 can be interpreted as follows: Equation (78) evaluates how much the source sequence is compressed, so this rate should be kept small. Equation (79) is the constraint that the excess-distortion probability be less than $\epsilon$, so this quantity should also be kept small. Equation (80) constrains the amount of private information leaked to the decoder. Since private information should be kept secret from the receiver, this quantity should be kept small as well. Equation (81) constrains the amount of private information leaked to the encoder. For the same reason as (80), this quantity should also be kept small.
Definition 11.
The closure of the set of ϵ-achievable tuples $(R, D, L, E)$ is referred to as the ϵ-achievable region and is denoted by $\mathcal{L}_\mathcal{E}(\epsilon | P_{X_\mathcal{K}})$. We also define
$\mathcal{L}_\mathcal{E}(P_{X_\mathcal{K}}) \triangleq \bigcap_{0 < \epsilon < 1} \mathcal{L}_\mathcal{E}(\epsilon | P_{X_\mathcal{K}}).$
We establish the following theorem. For the proof of this theorem, please refer to Section 4.3 and Section 4.4.
Theorem 2.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, the achievable region of the coding system is given by
$\mathcal{L}_\mathcal{E}(P_{X_\mathcal{K}}) = \mathcal{S}_\mathcal{E}(P_{X_\mathcal{K}}).$
Remark 4.
From Theorems 1 and 2, we find that the achievable region in which utility is measured by the expected distortion is equal to the one in which utility is measured by the excess-distortion probability.
Because we discuss the achievable region among coding rate, utility, and privacy in Section 6, a characterization of that region is derived by projecting the characterization in Theorem 2 onto the $R$-$D$-$L$ hyperplane.
Definition 12.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, we define
$\mathcal{L}_\mathcal{E}^{RDL}(\epsilon | P_{X_\mathcal{K}}) \triangleq \{(R, D, L) : (R, D, L, E) \in \mathcal{L}_\mathcal{E}(\epsilon | P_{X_\mathcal{K}}) \text{ for some } E\}$
and
$\mathcal{L}_\mathcal{E}^{RDL}(P_{X_\mathcal{K}}) \triangleq \bigcap_{0 < \epsilon < 1} \mathcal{L}_\mathcal{E}^{RDL}(\epsilon | P_{X_\mathcal{K}}).$
Definition 13.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, we define
$\mathcal{S}_\mathcal{E}^{RDL}(P_{X_\mathcal{K}}) = \{(R, D, L) : R \ge I(X_\mathcal{E}; \hat{X}_\mathcal{R}),\ D \ge E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})],\ L \ge I(X_\mathcal{H}; \hat{X}_\mathcal{R}) \text{ for some } P_{X_\mathcal{E}, X_{\mathcal{E}^c}} \cdot P_{\hat{X}_\mathcal{R} | X_\mathcal{E}}\}.$
Corollary 2.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, the region $\mathcal{L}_\mathcal{E}^{RDL}(P_{X_\mathcal{K}})$ is given by
$\mathcal{L}_\mathcal{E}^{RDL}(P_{X_\mathcal{K}}) = \mathcal{S}_\mathcal{E}^{RDL}(P_{X_\mathcal{K}}).$
Examples of numerical calculation of this result are shown in Section 6.1.
Since we focus on the achievable region between utility and privacy in the next section, a characterization of that region is derived by further projecting the result of Theorem 2 onto the $D$-$L$ plane.
Definition 14.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, we define
$\mathcal{L}_\mathcal{E}^{DL}(\epsilon | P_{X_\mathcal{K}}) \triangleq \{(D, L) : (R, D, L, E) \in \mathcal{L}_\mathcal{E}(\epsilon | P_{X_\mathcal{K}}) \text{ for some } R, E\}$
and
$\mathcal{L}_\mathcal{E}^{DL}(P_{X_\mathcal{K}}) \triangleq \bigcap_{0 < \epsilon < 1} \mathcal{L}_\mathcal{E}^{DL}(\epsilon | P_{X_\mathcal{K}}).$
Definition 15.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, we define
$\mathcal{S}_\mathcal{E}^{DL}(P_{X_\mathcal{K}}) = \{(D, L) : D \ge E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})],\ L \ge I(X_\mathcal{H}; \hat{X}_\mathcal{R}) \text{ for some } P_{X_\mathcal{E}, X_{\mathcal{E}^c}} \cdot P_{\hat{X}_\mathcal{R} | X_\mathcal{E}}\}.$
Corollary 3.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, the region $\mathcal{L}_\mathcal{E}^{DL}(P_{X_\mathcal{K}})$ is given by
$\mathcal{L}_\mathcal{E}^{DL}(P_{X_\mathcal{K}}) = \mathcal{S}_\mathcal{E}^{DL}(P_{X_\mathcal{K}}).$

4.3. Proof of Converse Part

From the proof of the converse part in Section 3.4, we have
$\mathcal{C}_\mathcal{E}(P_{X_\mathcal{K}}) \subseteq \mathcal{S}_\mathcal{E}(P_{X_\mathcal{K}}).$
Let a tuple $(R, D, L, E) \in \mathcal{L}_\mathcal{E}(P_{X_\mathcal{K}})$ be arbitrarily fixed and let $\epsilon > 0$ and $\epsilon' > 0$ be given. By the argument of the method of types, the sequences $x_\mathcal{R}^n$ are divided into two categories: distortion-typical or non-distortion-typical with some $\hat{x}_\mathcal{R}^n$. The sequences of the former category satisfy $\frac{1}{n} d(x_\mathcal{R}^n, \hat{x}_\mathcal{R}^n) < D + \epsilon'$ and the sequences of the latter satisfy $\frac{1}{n} d(x_\mathcal{R}^n, \hat{x}_\mathcal{R}^n) \le d_{\max}$, where $d_{\max} \triangleq \max_{x_\mathcal{R} \in \mathcal{X}_\mathcal{R}, \hat{x}_\mathcal{R} \in \hat{\mathcal{X}}_\mathcal{R}} d(x_\mathcal{R}, \hat{x}_\mathcal{R})$. Then, the expected distortion is bounded from above as
$E\left[\frac{1}{n} d(X_\mathcal{R}^n, \hat{X}_\mathcal{R}^n)\right] \le D + \epsilon' + \Pr\left\{\frac{1}{n} d(X_\mathcal{R}^n, \hat{X}_\mathcal{R}^n) > D + \epsilon'\right\} d_{\max} \le D + \epsilon' + \Pr\left\{\frac{1}{n} d(X_\mathcal{R}^n, \hat{X}_\mathcal{R}^n) > D\right\} d_{\max} \overset{(a)}{\le} D + \epsilon' + \epsilon\, d_{\max},$
where (a) follows from constraint (79) of ϵ-achievability when utility is measured by the excess-distortion probability. Since $\epsilon' + \epsilon\, d_{\max}$ can be made arbitrarily small with proper choices of $\epsilon$ and $\epsilon'$, (15) can be derived. This means
$\mathcal{L}_\mathcal{E}(P_{X_\mathcal{K}}) \subseteq \mathcal{C}_\mathcal{E}(P_{X_\mathcal{K}}).$
From both inclusion relations,
$\mathcal{L}_\mathcal{E}(P_{X_\mathcal{K}}) \subseteq \mathcal{C}_\mathcal{E}(P_{X_\mathcal{K}}) \subseteq \mathcal{S}_\mathcal{E}(P_{X_\mathcal{K}})$
is evidently satisfied.

4.4. Proof of the Direct Part

In this part, we provide a sketch of the proof of $\mathcal{S}_\mathcal{E}(P_{X_\mathcal{K}}) \subseteq \mathcal{L}_\mathcal{E}(\epsilon | P_{X_\mathcal{K}})$.
Under an arbitrarily fixed distribution $P_{X_\mathcal{E}, X_{\mathcal{E}^c}} \cdot P_{\hat{X}_\mathcal{R} | X_\mathcal{E}}$, choose any tuple $(R, D, L, E) \in \mathcal{S}_\mathcal{E}(P_{X_\mathcal{K}})$ such that
$R > I(X_\mathcal{E}; \hat{X}_\mathcal{R}),$
$D > E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})],$
$L > I(X_\mathcal{H}; \hat{X}_\mathcal{R}),$
$E > I(X_\mathcal{H}; X_\mathcal{E}).$
From (97) and (98), we can choose a sufficiently small $\epsilon > 0$ such that
$D > E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})] + \epsilon,$
$L > I(X_\mathcal{H}; \hat{X}_\mathcal{R}) + \epsilon.$
In addition, with this $\epsilon$, some constant $0 < \tau < \frac{1}{2}$ is fixed such that
$\tau (\log |\mathcal{X}_\mathcal{H}| + 5) + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau} < \epsilon.$
We can also choose positive numbers $\delta$ ($= \delta(n)$) such that
$2\delta^2(n) \le \left(R - I(X_\mathcal{E}; \hat{X}_\mathcal{R}) - \frac{1}{n}\right) \tau,$
$\delta(n) \to 0,$
$n \cdot \delta(n) \to \infty$
as $n \to \infty$. For example, $\delta(n) = c \sqrt{\frac{\log n}{n}}$ with a constant $c$ clearly satisfies (104) and (105).
Generation of codebook: Randomly generate $\hat{x}_\mathcal{R}^n(j)$ from the set of strongly typical sequences $T_\delta^n(\hat{X}_\mathcal{R})$ for $j = 1, 2, \ldots, M_n \triangleq 2^{nR}$. Reveal the codebook $\mathcal{C} = \{\hat{x}_\mathcal{R}^n(1), \ldots, \hat{x}_\mathcal{R}^n(M_n)\}$ to the encoder and decoder.
Encoding: If a sequence $x_\mathcal{E}^n \in \mathcal{X}_\mathcal{E}^n$ satisfies $x_\mathcal{K}^n = (x_\mathcal{E}^n, x_{\mathcal{E}^c}^n)$ with some $x_{\mathcal{E}^c}^n \in \mathcal{X}_{\mathcal{E}^c}^n$, we write $x_\mathcal{E}^n \preceq x_\mathcal{K}^n$. Given $x_\mathcal{K}^n$, the encoder finds $j$ such that $x_\mathcal{E}^n \in T_\delta^n(X_\mathcal{E} | \hat{x}_\mathcal{R}^n(j))$, the set of conditionally strongly typical sequences, and sets $f_n(x_\mathcal{E}^n) = j$. If there exist multiple such $j$, $f_n(x_\mathcal{E}^n)$ is set to the minimum one. If there is no such $j$, then $f_n(x_\mathcal{E}^n) = M_n$.
Decoding: When $j$ is observed, the decoder sets the reproduced sequence as $\hat{X}_\mathcal{R}^n = \hat{x}_\mathcal{R}^n(j)$.
Evaluation: We define $A(j)$, $B(j)$, and $\tilde{A}(j)$ as
$A(j) \triangleq \{x_\mathcal{E}^n : f_n(x_\mathcal{E}^n) = j\},$
$B(j) \triangleq \{x_\mathcal{K}^n : x_\mathcal{E}^n \preceq x_\mathcal{K}^n,\ f_n(x_\mathcal{E}^n) = j\},$
$\tilde{A}(j) \triangleq \{x_\mathcal{K}^n : x_\mathcal{E}^n \preceq x_\mathcal{K}^n,\ f_n(x_\mathcal{E}^n) = j,\ x_\mathcal{K}^n \in T_{2\delta}^n(X_\mathcal{K} | \hat{x}_\mathcal{R}^n(j))\}$ for $j = 1, 2, \ldots, M_n - 1$, and $\tilde{A}(M_n) \triangleq \mathcal{X}_\mathcal{K}^n \setminus \bigcup_{j=1}^{M_n - 1} \tilde{A}(j)$.
It is easily verified that the sets $A(j)$, $j = 1, 2, \ldots, M_n$ (and likewise $B(j)$ and $\tilde{A}(j)$), are pairwise disjoint. From the definitions of $J_n$, $A(j)$, and $B(j)$,
$\Pr\{J_n = j\} = \Pr\{X_\mathcal{E}^n \in A(j)\} = \Pr\{X_\mathcal{K}^n \in B(j)\} \quad \text{for } j = 1, 2, \ldots, M_n.$
For sufficiently large $n$, we can prove (cf. Appendix B)
$|\Pr\{X_\mathcal{K}^n \in B(j)\} - \Pr\{X_\mathcal{K}^n \in \tilde{A}(j)\}| \le 2 |\mathcal{X}_\mathcal{K}| \cdot |\hat{\mathcal{X}}_\mathcal{R}| e^{-2\delta^2 n} \quad \text{for } j = 1, 2, \ldots, M_n - 1.$
For sufficiently large n, we can show that there exists a code ( f n , g n ) such that (cf. Appendix F)
r n R ,
Pr X E n j = 1 M n 1 A ( j ) ( 2 | X E | + 1 ) e 2 δ 2 n ,
u n ( 2 | X E | + 1 ) e 2 δ 2 n ,
e n I ( X H ; X E ) ,
Pr X K n j = 1 M n 1 A ˜ ( j ) τ ,
| A ˜ ( j ) | 2 n { H ( X K | X ^ R ) τ } .
For this code $(f_n, g_n)$, we evaluate the privacy leakage against the decoder as
$l_n \triangleq \frac{1}{n} I(X_\mathcal{H}^n; J_n) = \frac{1}{n} H(X_\mathcal{H}^n) - \frac{1}{n} H(X_\mathcal{H}^n | J_n) \overset{(a)}{=} H(X_\mathcal{H}) - \frac{1}{n} H(X_\mathcal{H}^n | J_n) = H(X_\mathcal{H}) - \frac{1}{n} \sum_{j=1}^{M_n} H(X_\mathcal{H}^n | X_\mathcal{K}^n \in B(j)) \Pr\{X_\mathcal{K}^n \in B(j)\} \overset{(b)}{\le} H(X_\mathcal{H}) - \frac{1}{n} \sum_{j=1}^{M_n} H(X_\mathcal{H}^n | X_\mathcal{K}^n \in \tilde{A}(j)) \Pr\{X_\mathcal{K}^n \in \tilde{A}(j)\} + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau} \overset{(c)}{\le} H(X_\mathcal{H}) - \frac{1}{n} \sum_{j=1}^{M_n - 1} H(X_\mathcal{H}^n | X_\mathcal{K}^n \in \tilde{A}(j)) \Pr\{X_\mathcal{K}^n \in \tilde{A}(j)\} + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau} = H(X_\mathcal{H}) + \frac{1}{n} \sum_{j=1}^{M_n - 1} \left[\sum_{x_\mathcal{H}^n} \Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n | X_\mathcal{K}^n \in \tilde{A}(j)\} \log \Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n | X_\mathcal{K}^n \in \tilde{A}(j)\}\right] \Pr\{X_\mathcal{K}^n \in \tilde{A}(j)\} + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau},$
where
(a) follows because the source is i.i.d. with common distribution $P_{X_\mathcal{K}}$,
(b) is due to the inequality proved in Appendix D, and
(c) follows by removing the term for $j = M_n$.
Here, for any $x_\mathcal{H}^n$ satisfying $x_\mathcal{K}^n = (x_\mathcal{R}^n, x_\mathcal{H}^n) \in \tilde{A}(j)$ with some $x_\mathcal{R}^n$, we can show that
$\Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n | X_\mathcal{K}^n \in \tilde{A}(j)\} = \frac{\Pr\{X_\mathcal{K}^n \in \tilde{A}(j) | X_\mathcal{H}^n = x_\mathcal{H}^n\} \Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n\}}{\Pr\{X_\mathcal{K}^n \in \tilde{A}(j)\}} = \frac{\sum_{x_\mathcal{R}^n : (x_\mathcal{R}^n, x_\mathcal{H}^n) \in \tilde{A}(j)} \Pr\{X_\mathcal{R}^n = x_\mathcal{R}^n, X_\mathcal{H}^n = x_\mathcal{H}^n | X_\mathcal{H}^n = x_\mathcal{H}^n\}}{\sum_{(\tilde{x}_\mathcal{R}^n, \tilde{x}_\mathcal{H}^n) \in \tilde{A}(j)} \Pr\{X_\mathcal{R}^n = \tilde{x}_\mathcal{R}^n, X_\mathcal{H}^n = \tilde{x}_\mathcal{H}^n\}} \cdot \Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n\} \overset{(d)}{=} \frac{\sum_{x_\mathcal{R}^n : (x_\mathcal{R}^n, x_\mathcal{H}^n) \in \tilde{A}(j)} \Pr\{X_\mathcal{R}^n = x_\mathcal{R}^n | X_\mathcal{H}^n = x_\mathcal{H}^n\}}{\sum_{(\tilde{x}_\mathcal{R}^n, \tilde{x}_\mathcal{H}^n) \in \tilde{A}(j)} \Pr\{X_\mathcal{R}^n = \tilde{x}_\mathcal{R}^n, X_\mathcal{H}^n = \tilde{x}_\mathcal{H}^n\}} \cdot \Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n\} \overset{(e)}{\le} \frac{\sum_{x_\mathcal{R}^n \in T_{\delta_3}^n(X_\mathcal{R} | x_\mathcal{H}^n, \hat{x}_\mathcal{R}^n(j))} \Pr\{X_\mathcal{R}^n = x_\mathcal{R}^n | X_\mathcal{H}^n = x_\mathcal{H}^n\}}{\sum_{(\tilde{x}_\mathcal{R}^n, \tilde{x}_\mathcal{H}^n) \in \tilde{A}(j)} \Pr\{X_\mathcal{R}^n = \tilde{x}_\mathcal{R}^n, X_\mathcal{H}^n = \tilde{x}_\mathcal{H}^n\}} \cdot \Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n\} \overset{(f)}{\le} \frac{2^{n \{H(X_\mathcal{R} | X_\mathcal{H}, \hat{X}_\mathcal{R}) + \tau\}} \cdot 2^{-n \{H(X_\mathcal{R} | X_\mathcal{H}) - \tau\}} \cdot 2^{-n \{H(X_\mathcal{H}) - \tau\}}}{2^{n \{H(X_\mathcal{K} | \hat{X}_\mathcal{R}) - \tau\}} \cdot 2^{-n \{H(X_\mathcal{K}) + \tau\}}} = 2^{-n \{H(X_\mathcal{H} | \hat{X}_\mathcal{R}) - 5\tau\}},$
where
(d) follows from the fact that
$\Pr\{X_\mathcal{R}^n = x_\mathcal{R}^n, X_\mathcal{H}^n = x_\mathcal{H}^n | X_\mathcal{H}^n = x_\mathcal{H}^n\} = \Pr\{X_\mathcal{R}^n = x_\mathcal{R}^n | X_\mathcal{H}^n = x_\mathcal{H}^n\},$
(e) is due to the inequality proved in Appendix E, and
(f) follows from the bounds on the number of strongly typical sequences.
Therefore, from Equations (115), (119), and (121), we can obtain
$l_n \le H(X_\mathcal{H}) - \frac{1}{n} \sum_{j=1}^{M_n - 1} \left[n \sum_{x_\mathcal{H}^n} \Pr\{X_\mathcal{H}^n = x_\mathcal{H}^n | X_\mathcal{K}^n \in \tilde{A}(j)\} \{H(X_\mathcal{H} | \hat{X}_\mathcal{R}) - 5\tau\}\right] \Pr\{X_\mathcal{K}^n \in \tilde{A}(j)\} + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau} = H(X_\mathcal{H}) - \Pr\left\{X_\mathcal{K}^n \in \bigcup_{j=1}^{M_n - 1} \tilde{A}(j)\right\} \{H(X_\mathcal{H} | \hat{X}_\mathcal{R}) - 5\tau\} + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau} \le H(X_\mathcal{H}) - (1 - \tau) \{H(X_\mathcal{H} | \hat{X}_\mathcal{R}) - 5\tau\} + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau} \le I(X_\mathcal{H}; \hat{X}_\mathcal{R}) + \tau \{H(X_\mathcal{H} | \hat{X}_\mathcal{R}) + 5\} + 4\tau \log \frac{|\mathcal{X}_\mathcal{H}| \cdot 2^R}{2\tau}.$
Since the constants $\epsilon$, $\tau$, and $\delta$ are fixed so as to satisfy (100)–(102), from (111), (113), and (122), we obtain
$r_n \le R,$
$u_n \le \epsilon,$
$l_n < I(X_\mathcal{H}; \hat{X}_\mathcal{R}) + \epsilon < L,$
$e_n \le I(X_\mathcal{H}; X_\mathcal{E}) < E.$
Therefore, for the fixed distribution $P_{X_\mathcal{E}, X_{\mathcal{E}^c}} \cdot P_{\hat{X}_\mathcal{R} | X_\mathcal{E}}$, any tuple
$(R, D, L, E) \in \{(R, D, L, E) : R > I(X_\mathcal{E}; \hat{X}_\mathcal{R}),\ D > E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})],\ L > I(X_\mathcal{H}; \hat{X}_\mathcal{R}),\ E > I(X_\mathcal{H}; X_\mathcal{E})\} \eqqcolon \mathcal{S}_\mathcal{E}^*(P_{X_\mathcal{K}})$
is achievable. Consequently, $\mathcal{S}_\mathcal{E}^*(P_{X_\mathcal{K}}) \subseteq \mathcal{L}_\mathcal{E}(\epsilon | P_{X_\mathcal{K}})$. Taking the closure of the l.h.s., we obtain $\mathrm{Cl}(\mathcal{S}_\mathcal{E}^*(P_{X_\mathcal{K}})) \subseteq \mathcal{L}_\mathcal{E}(\epsilon | P_{X_\mathcal{K}})$ because $\mathcal{L}_\mathcal{E}(\epsilon | P_{X_\mathcal{K}})$ is a closed set. Since the distribution $P_{X_\mathcal{K}, \hat{X}_\mathcal{R}} = P_{X_\mathcal{E}, X_{\mathcal{E}^c}} \cdot P_{\hat{X}_\mathcal{R} | X_\mathcal{E}}$ was fixed arbitrarily, we conclude that $\mathcal{S}_\mathcal{E}(P_{X_\mathcal{K}}) = \bigcup_p \mathrm{Cl}(\mathcal{S}_\mathcal{E}^*(P_{X_\mathcal{K}})) \subseteq \mathcal{L}_\mathcal{E}(\epsilon | P_{X_\mathcal{K}})$, where the union is over the choices of this distribution. This completes the proof of the direct part.

5. Strong Converse Theorem for Utility–Privacy Trade-Offs

5.1. Another Expression of the Achievable Region

In this subsection, we clarify that the achievable region $\mathcal{L}_\mathcal{E}^{DL}(P_{X_\mathcal{K}})$ defined in (89) coincides with a region expressed with tangent planes.
Definition 16.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, the quantity $T_\mathcal{E}^\mu(P_{X_\mathcal{K}})$ and the region $\mathcal{T}_\mathcal{E}^{DL}(P_{X_\mathcal{K}})$ are defined as
$T_\mathcal{E}^\mu(P_{X_\mathcal{K}}) \triangleq \min \{I(X_\mathcal{H}; \hat{X}_\mathcal{R}) + \mu E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})] \text{ for some } P_{\hat{X}_\mathcal{R} | X_\mathcal{E}} \cdot P_{X_{\mathcal{E}^c} X_\mathcal{E}}\},$
$\mathcal{T}_\mathcal{E}^{DL}(P_{X_\mathcal{K}}) \triangleq \bigcap_{\mu \ge 0} \{(L, D) : L + \mu D \ge T_\mathcal{E}^\mu(P_{X_\mathcal{K}})\}.$
Theorem 3.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, the region $\mathcal{S}_\mathcal{E}^{DL}(P_{X_\mathcal{K}})$ defined in (90) is given by
$\mathcal{S}_\mathcal{E}^{DL}(P_{X_\mathcal{K}}) = \mathcal{T}_\mathcal{E}^{DL}(P_{X_\mathcal{K}}),$
and the achievable region $\mathcal{L}_\mathcal{E}^{DL}(P_{X_\mathcal{K}})$, which is the projection of the achievable region $\mathcal{L}_\mathcal{E}(P_{X_\mathcal{K}})$ onto the $D$-$L$ plane, is given by
$\mathcal{L}_\mathcal{E}^{DL}(P_{X_\mathcal{K}}) = \mathcal{T}_\mathcal{E}^{DL}(P_{X_\mathcal{K}}).$
Proof. 
Figure 3 illustrates the idea of the proof. Let a constant $\mu \ge 0$ be fixed arbitrarily. As in Figure 3, there exists a boundary point $(D_\mu, L_\mu)$ of $\mathcal{S}_\mathcal{E}^{DL}$ at which a line of slope $-\mu$ (i.e., of the form $L + \mu D = \text{const}$) is tangent. The intercept of this tangent line is $L_\mu + \mu D_\mu$.
The minimum of $I(X_\mathcal{H}; \hat{X}_\mathcal{R}) + \mu E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})]$ over the admissible distributions coincides with $L_\mu + \mu D_\mu$. Therefore,
$L_\mu + \mu D_\mu = \min \{I(X_\mathcal{H}; \hat{X}_\mathcal{R}) + \mu E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})] \text{ for some } P_{\hat{X}_\mathcal{R} | X_\mathcal{E}} \cdot P_{X_{\mathcal{E}^c} X_\mathcal{E}}\}.$
From (130), we obtain
$\{(L, D) : L + \mu D \ge L_\mu + \mu D_\mu\} = \{(L, D) : L + \mu D \ge \min \{I(X_\mathcal{H}; \hat{X}_\mathcal{R}) + \mu E[d(X_\mathcal{R}, \hat{X}_\mathcal{R})] \text{ for some } P_{\hat{X}_\mathcal{R} | X_\mathcal{E}} \cdot P_{X_{\mathcal{E}^c} X_\mathcal{E}}\}\}.$
Taking the intersection over $\mu \ge 0$ on both sides of (131),
$\bigcap_{\mu \ge 0} \{(L, D) : L + \mu D \ge L_\mu + \mu D_\mu\} = \bigcap_{\mu \ge 0} \{(L, D) : L + \mu D \ge T_\mathcal{E}^\mu(P_{X_\mathcal{K}})\}.$
The l.h.s. of (131) is the upper-right region in the first quadrant bounded below by the tangent line of slope $-\mu$ for $\mathcal{S}_\mathcal{E}^{DL}(P_{X_\mathcal{K}})$. Since the l.h.s. of (132) is the intersection of the sets on the l.h.s. of (131), the l.h.s. of (132) represents $\mathcal{S}_\mathcal{E}^{DL}(P_{X_\mathcal{K}})$. From Definition 16, the right-hand side (r.h.s.) of (132) is $\mathcal{T}_\mathcal{E}^{DL}(P_{X_\mathcal{K}})$. As a result, (128) holds. Since $\mathcal{L}_\mathcal{E}^{DL}(P_{X_\mathcal{K}}) = \mathcal{S}_\mathcal{E}^{DL}(P_{X_\mathcal{K}})$ from Corollary 3, (129) likewise holds. □
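The tangent-plane characterization suggests a simple numerical procedure: approximate $T_\mathcal{E}^\mu(P_{X_\mathcal{K}})$ by a grid search over test channels and intersect the resulting half-planes. The following sketch does this for the illustrative binary source used earlier (with $\mathcal{E} = \mathcal{K}$ and Hamming distortion); the grid resolution is an assumption, so the output only approximates $T_\mathcal{E}^\mu$ from above.

```python
# A minimal sketch of Definition 16: approximate T_E^mu by grid search over
# test channels P_{X_R_hat | X_R, X_H} for an assumed binary source.
import itertools
import numpy as np

p_rh = np.array([[0.4, 0.1], [0.1, 0.4]])   # joint pmf of (X_R, X_H)

def mi(p_xy):
    px = p_xy.sum(1, keepdims=True)
    py = p_xy.sum(0, keepdims=True)
    m = p_xy > 0
    return float(np.sum(p_xy[m] * np.log2(p_xy[m] / (px * py)[m])))

def corner(chan):
    """(E[d], I(X_H; X_R_hat)) for a test channel chan[x_r, x_h, x_r_hat]."""
    joint = p_rh[:, :, None] * chan
    dist = joint[0, :, 1].sum() + joint[1, :, 0].sum()  # Hamming E[d]
    leak = mi(joint.sum(axis=0))                        # joint of (X_H, X_R_hat)
    return dist, leak

def T_mu(mu, grid=np.linspace(0, 1, 11)):
    """min over gridded channels of I(X_H; X_R_hat) + mu * E[d]."""
    best = np.inf
    for probs in itertools.product(grid, repeat=4):  # P(X_R_hat=1 | x_r, x_h)
        chan = np.zeros((2, 2, 2))
        for k, (xr, xh) in enumerate(itertools.product(range(2), range(2))):
            chan[xr, xh, 1] = probs[k]
            chan[xr, xh, 0] = 1 - probs[k]
        d, l = corner(chan)
        best = min(best, l + mu * d)
    return best

# The region is the intersection over mu >= 0 of { (L, D) : L + mu*D >= T^mu }.
for mu in (0.0, 0.5, 1.0, 2.0):
    print(mu, T_mu(mu))
```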

5.2. Proof Preliminaries

In this subsection, we derive two fundamental properties, a minimization identity and inequalities relating entropy and divergence, that are needed to prove the strong converse theorem. In Proposition 1, we convert the objective function $T_\mathcal{E}^\mu(P_{X_\mathcal{K}})$ of the tangent-plane expression introduced in Section 5.1 into an expression involving divergence.
Proposition 1.
Let $\mu \ge 0$ be fixed arbitrarily. For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$,
$T_\mathcal{E}^\mu(P_{X_\mathcal{K}}) = \sup_{\alpha > 0} T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}}),$
where
$T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}}) \triangleq \min_{P_{\tilde{X}_{\mathcal{E}^c} \tilde{X}_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}}} \left[I(\tilde{X}_\mathcal{H}; \hat{\tilde{X}}_\mathcal{R}) + \mu E[d(\tilde{X}_\mathcal{R}, \hat{\tilde{X}}_\mathcal{R})] + \alpha D(P_{\tilde{X}_{\mathcal{E}^c} \tilde{X}_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}} \| Q_{X_{\mathcal{E}^c} X_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}}) + D(P_{\tilde{X}_{\mathcal{E}^c} \tilde{X}_\mathcal{E}} \| P_{X_{\mathcal{E}^c} X_\mathcal{E}})\right] = \min_{P_{\tilde{X}_{\mathcal{E}^c} \tilde{X}_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}}} \left[I(\tilde{X}_\mathcal{H}; \hat{\tilde{X}}_\mathcal{R}) + \mu E[d(\tilde{X}_\mathcal{R}, \hat{\tilde{X}}_\mathcal{R})] + (\alpha + 1) D(P_{\tilde{X}_{\mathcal{E}^c} \tilde{X}_\mathcal{E}} \| P_{X_{\mathcal{E}^c} X_\mathcal{E}}) + \alpha I(\tilde{X}_{\mathcal{E}^c}; \hat{\tilde{X}}_\mathcal{R} | \tilde{X}_\mathcal{E})\right],$
and $Q_{X_{\mathcal{E}^c} X_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}} = P_{\hat{\tilde{X}}_\mathcal{R} | \tilde{X}_\mathcal{E}} P_{X_{\mathcal{E}^c} X_\mathcal{E}}$ is the distribution induced from each $P_{\tilde{X}_{\mathcal{E}^c} \tilde{X}_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}}$.
Proof. 
First, it is clear that $T_\mathcal{E}^\mu(P_{X_\mathcal{K}}) \ge T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}})$ for all $\alpha > 0$. To prove the opposite direction in the limit $\alpha \to \infty$, for each $\alpha > 0$, let $P_{\tilde{X}_{\mathcal{E}^c} \tilde{X}_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}}^\alpha$ be the distribution that attains the minimum on the r.h.s. of (134) and let $Q_{X_{\mathcal{E}^c} X_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}}^\alpha = P_{\hat{\tilde{X}}_\mathcal{R} | \tilde{X}_\mathcal{E}} P_{X_{\mathcal{E}^c} X_\mathcal{E}}$ be the induced distribution. Since $G(P_{\tilde{X}_{\mathcal{E}^c} \tilde{X}_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}}^\alpha) \triangleq I(\tilde{X}_\mathcal{H}; \hat{\tilde{X}}_\mathcal{R}) + \mu E[d(\tilde{X}_\mathcal{R}, \hat{\tilde{X}}_\mathcal{R})]$ is non-negative and bounded above, by setting $a = \log |\mathcal{X}_\mathcal{H}| + \mu D_{\max}$, it must hold that
$\alpha D(P_{\tilde{X}_{\mathcal{E}^c} \tilde{X}_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}}^\alpha \| Q_{X_{\mathcal{E}^c} X_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}}^\alpha) \le a$
and thus
$D(P_{\tilde{X}_{\mathcal{E}^c} \tilde{X}_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}}^\alpha \| Q_{X_{\mathcal{E}^c} X_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}}^\alpha) \le a / \alpha.$
Notice that any set of probability distributions on a finite alphabet forms a compact set. Because $G$ is a continuous function over a compact set, it is also uniformly continuous. Then, there exists a function $\Delta(t)$ satisfying $\Delta(t) \to 0$ as $t \to 0$ such that
$T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}}) \ge G(P_{\tilde{X}_{\mathcal{E}^c} \tilde{X}_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}}^\alpha) \ge G(Q_{X_{\mathcal{E}^c} X_\mathcal{E} \hat{\tilde{X}}_\mathcal{R}}^\alpha) - \Delta(a / \alpha) \ge T_\mathcal{E}^\mu(P_{X_\mathcal{K}}) - \Delta(a / \alpha).$
Consequently, we obtain the desired inequality $T_\mathcal{E}^\mu(P_{X_\mathcal{K}}) \le \lim_{\alpha \to \infty} T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}})$ by taking $\alpha \to \infty$. □
In the following proposition, we show inequalities relating an i.i.d. source $P_{X_{\mathcal{E}^c}^n X_\mathcal{E}^n}$ and an arbitrary source $P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n}$.
Proposition 2.
For an i.i.d. source $P_{X_{\mathcal{E}^c}^n X_\mathcal{E}^n}$ with common distribution $P_{X_{\mathcal{E}^c} X_\mathcal{E}}$ and an arbitrary distribution $P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n}$, it holds that
$H(\tilde{X}_{\mathcal{E}^c}^n | \tilde{X}_\mathcal{E}^n) + D(P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n} \| P_{X_{\mathcal{E}^c}^n X_\mathcal{E}^n}) \ge n [H(\tilde{X}_{\mathcal{E}^c, J} | \tilde{X}_{\mathcal{E}, J}) + D(P_{\tilde{X}_{\mathcal{E}^c, J} \tilde{X}_{\mathcal{E}, J}} \| P_{X_{\mathcal{E}^c} X_\mathcal{E}})],$
$H(\tilde{X}_\mathcal{H}^n) + D(P_{\tilde{X}_\mathcal{H}^n \tilde{X}_\mathcal{R}^n} \| P_{X_\mathcal{H}^n X_\mathcal{R}^n}) \ge n [H(\tilde{X}_{\mathcal{H}, J}) + D(P_{\tilde{X}_{\mathcal{H}, J} \tilde{X}_{\mathcal{R}, J}} \| P_{X_\mathcal{H} X_\mathcal{R}})],$
where $J \sim \mathrm{unif}\{1, \ldots, n\}$ is a uniform random variable over the set $\{1, 2, \ldots, n\}$ used for time-sharing and is assumed to be independent of all the other random variables involved.
Proof. 
The l.h.s. of (135) can be represented as
$H(\tilde{X}_{\mathcal{E}^c}^n | \tilde{X}_\mathcal{E}^n) + D(P_{\tilde{X}_{\mathcal{E}^c}^n | \tilde{X}_\mathcal{E}^n} \| P_{X_{\mathcal{E}^c}^n | X_\mathcal{E}^n} | P_{\tilde{X}_\mathcal{E}^n}) + D(P_{\tilde{X}_\mathcal{E}^n} \| P_{X_\mathcal{E}^n}).$
The sum of the first and second terms satisfies the following equation:
$H(\tilde{X}_{\mathcal{E}^c}^n | \tilde{X}_\mathcal{E}^n) + D(P_{\tilde{X}_{\mathcal{E}^c}^n | \tilde{X}_\mathcal{E}^n} \| P_{X_{\mathcal{E}^c}^n | X_\mathcal{E}^n} | P_{\tilde{X}_\mathcal{E}^n}) = \sum_{x_{\mathcal{E}^c}^n, x_\mathcal{E}^n} P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n}(x_{\mathcal{E}^c}^n, x_\mathcal{E}^n) \left[\log \frac{1}{P_{\tilde{X}_{\mathcal{E}^c}^n | \tilde{X}_\mathcal{E}^n}(x_{\mathcal{E}^c}^n | x_\mathcal{E}^n)} + \log \frac{P_{\tilde{X}_{\mathcal{E}^c}^n | \tilde{X}_\mathcal{E}^n}(x_{\mathcal{E}^c}^n | x_\mathcal{E}^n)}{P_{X_{\mathcal{E}^c}^n | X_\mathcal{E}^n}(x_{\mathcal{E}^c}^n | x_\mathcal{E}^n)}\right] = \sum_{x_{\mathcal{E}^c}^n, x_\mathcal{E}^n} P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n}(x_{\mathcal{E}^c}^n, x_\mathcal{E}^n) \log \frac{1}{P_{X_{\mathcal{E}^c}^n | X_\mathcal{E}^n}(x_{\mathcal{E}^c}^n | x_\mathcal{E}^n)} \overset{(a)}{=} \sum_{x_{\mathcal{E}^c}^n, x_\mathcal{E}^n} P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n}(x_{\mathcal{E}^c}^n, x_\mathcal{E}^n) \sum_{j=1}^n \log \frac{1}{P_{X_{\mathcal{E}^c} | X_\mathcal{E}}(x_{\mathcal{E}^c, j} | x_{\mathcal{E}, j})} \overset{(b)}{=} n \sum_{x_{\mathcal{E}^c}, x_\mathcal{E}} P_{\tilde{X}_{\mathcal{E}^c, J} \tilde{X}_{\mathcal{E}, J}}(x_{\mathcal{E}^c}, x_\mathcal{E}) \log \frac{1}{P_{X_{\mathcal{E}^c} | X_\mathcal{E}}(x_{\mathcal{E}^c} | x_\mathcal{E})} = n \sum_{x_{\mathcal{E}^c}, x_\mathcal{E}} P_{\tilde{X}_{\mathcal{E}^c, J} \tilde{X}_{\mathcal{E}, J}}(x_{\mathcal{E}^c}, x_\mathcal{E}) \left[\log \frac{1}{P_{\tilde{X}_{\mathcal{E}^c, J} | \tilde{X}_{\mathcal{E}, J}}(x_{\mathcal{E}^c} | x_\mathcal{E})} + \log \frac{P_{\tilde{X}_{\mathcal{E}^c, J} | \tilde{X}_{\mathcal{E}, J}}(x_{\mathcal{E}^c} | x_\mathcal{E})}{P_{X_{\mathcal{E}^c} | X_\mathcal{E}}(x_{\mathcal{E}^c} | x_\mathcal{E})}\right] = n \{H(\tilde{X}_{\mathcal{E}^c, J} | \tilde{X}_{\mathcal{E}, J}) + D(P_{\tilde{X}_{\mathcal{E}^c, J} | \tilde{X}_{\mathcal{E}, J}} \| P_{X_{\mathcal{E}^c} | X_\mathcal{E}} | P_{\tilde{X}_{\mathcal{E}, J}})\},$
where
(a) follows from the memoryless property of the i.i.d. source $P_{X_{\mathcal{E}^c}^n X_\mathcal{E}^n}$, and
(b) holds because $\frac{1}{n} \sum_{j=1}^n P_{\tilde{X}_{\mathcal{E}^c, j} \tilde{X}_{\mathcal{E}, j}}(x_{\mathcal{E}^c}, x_\mathcal{E}) = P_{\tilde{X}_{\mathcal{E}^c, J} \tilde{X}_{\mathcal{E}, J}}(x_{\mathcal{E}^c}, x_\mathcal{E})$.
The third term can be bounded from below as
$D(P_{\tilde{X}_\mathcal{E}^n} \| P_{X_\mathcal{E}^n}) = \sum_{j=1}^n D(P_{\tilde{X}_{\mathcal{E}, j} | \tilde{X}_\mathcal{E}^{j-1}} \| P_{X_\mathcal{E}} | P_{\tilde{X}_\mathcal{E}^{j-1}}) \overset{(c)}{\ge} \sum_{j=1}^n D(P_{\tilde{X}_{\mathcal{E}, j}} \| P_{X_\mathcal{E}}) \overset{(d)}{\ge} n D(P_{\tilde{X}_{\mathcal{E}, J}} \| P_{X_\mathcal{E}}),$
where
(c) follows from the data processing inequality and
(d) holds because of Jensen's inequality (convexity of divergence).
From (137) and (138), (135) can be derived.
Likewise, the l.h.s. of (136) can be represented as
$H(\tilde{X}_\mathcal{H}^n) + D(P_{\tilde{X}_\mathcal{H}^n} \| P_{X_\mathcal{H}^n}) + D(P_{\tilde{X}_\mathcal{R}^n | \tilde{X}_\mathcal{H}^n} \| P_{X_\mathcal{R}^n | X_\mathcal{H}^n} | P_{\tilde{X}_\mathcal{H}^n}).$
The sum of the first and second terms satisfies
$H(\tilde{X}_\mathcal{H}^n) + D(P_{\tilde{X}_\mathcal{H}^n} \| P_{X_\mathcal{H}^n}) = \sum_{x_\mathcal{H}^n} P_{\tilde{X}_\mathcal{H}^n}(x_\mathcal{H}^n) \left[\log \frac{1}{P_{\tilde{X}_\mathcal{H}^n}(x_\mathcal{H}^n)} + \log \frac{P_{\tilde{X}_\mathcal{H}^n}(x_\mathcal{H}^n)}{P_{X_\mathcal{H}^n}(x_\mathcal{H}^n)}\right] = \sum_{x_\mathcal{H}^n} P_{\tilde{X}_\mathcal{H}^n}(x_\mathcal{H}^n) \log \frac{1}{P_{X_\mathcal{H}^n}(x_\mathcal{H}^n)} = \sum_{x_\mathcal{H}^n} P_{\tilde{X}_\mathcal{H}^n}(x_\mathcal{H}^n) \sum_{j=1}^n \log \frac{1}{P_{X_\mathcal{H}}(x_{\mathcal{H}, j})} \overset{(e)}{=} n \sum_{x_\mathcal{H}} P_{\tilde{X}_{\mathcal{H}, J}}(x_\mathcal{H}) \log \frac{1}{P_{X_\mathcal{H}}(x_\mathcal{H})} = n \sum_{x_\mathcal{H}} P_{\tilde{X}_{\mathcal{H}, J}}(x_\mathcal{H}) \left[\log \frac{1}{P_{\tilde{X}_{\mathcal{H}, J}}(x_\mathcal{H})} + \log \frac{P_{\tilde{X}_{\mathcal{H}, J}}(x_\mathcal{H})}{P_{X_\mathcal{H}}(x_\mathcal{H})}\right] = n \{H(\tilde{X}_{\mathcal{H}, J}) + D(P_{\tilde{X}_{\mathcal{H}, J}} \| P_{X_\mathcal{H}})\},$
where
(e) holds because $\frac{1}{n} \sum_{j=1}^n P_{\tilde{X}_{\mathcal{H}, j}}(x_\mathcal{H}) = P_{\tilde{X}_{\mathcal{H}, J}}(x_\mathcal{H})$.
For the third term, it holds that
$D(P_{\tilde{X}_\mathcal{R}^n | \tilde{X}_\mathcal{H}^n} \| P_{X_\mathcal{R}^n | X_\mathcal{H}^n} | P_{\tilde{X}_\mathcal{H}^n}) = \sum_{j=1}^n D(P_{\tilde{X}_{\mathcal{R}, j} | \tilde{X}_\mathcal{H}^n \tilde{X}_\mathcal{R}^{j-1}} \| P_{X_\mathcal{R} | X_\mathcal{H}} | P_{\tilde{X}_\mathcal{H}^n \tilde{X}_\mathcal{R}^{j-1}}) \overset{(f)}{\ge} \sum_{j=1}^n D(P_{\tilde{X}_{\mathcal{R}, j} | \tilde{X}_{\mathcal{H}, j}} \| P_{X_\mathcal{R} | X_\mathcal{H}} | P_{\tilde{X}_{\mathcal{H}, j}}) \ge n D(P_{\tilde{X}_{\mathcal{R}, J} | \tilde{X}_{\mathcal{H}, J}} \| P_{X_\mathcal{R} | X_\mathcal{H}} | P_{\tilde{X}_{\mathcal{H}, J}}),$
where
(f) follows from the log sum inequality.
From (139) and (140), we obtain (136). □

5.3. Strong Converse Theorem

We shall establish the strong converse theorem, which is the main result of this section. Before proving the theorem, we state a key lemma relating the single-letter quantity $T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}})$ and its multi-letter counterpart $T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}}^n)$, both introduced in Proposition 1.
Lemma 7.
For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$, all $n \in \mathbb{N}$, $\mu \ge 0$, and $\alpha > 0$, it holds that
$T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}}^n) \ge n\, T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}}).$
As the main theorem of this section, we show the strong converse theorem for the utility–privacy trade-offs.
Theorem 4.
(Strong converse theorem). For any $\mathcal{E}$ such that $\mathcal{R} \subseteq \mathcal{E} \subseteq \mathcal{K}$ and all $0 < \epsilon < 1$, it holds that
$\mathcal{L}_\mathcal{E}^{DL}(\epsilon | P_{X_\mathcal{K}}) = \mathcal{L}_\mathcal{E}^{DL}(P_{X_\mathcal{K}}).$
Remark 5.
Theorem 4 states that, regardless of the value of ϵ, the region $\mathcal{L}_\mathcal{E}^{DL}(\epsilon | P_{X_\mathcal{K}})$ is equal to $\mathcal{L}_\mathcal{E}^{DL}(P_{X_\mathcal{K}})$.

5.4. Proof of Lemma 7

Lemma 7 indicates that the function $T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}}^n)$, whose argument $P_{X_\mathcal{K}}^n$ is a probability distribution over $\mathcal{X}_\mathcal{K}^n$, can be lower-bounded by $n$ times the single-letterized function $T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}})$. Before describing the detailed proof, we state its outline: (i) We first express the function $T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}}^n)$ as the minimum of the difference of two functions, denoted by $G_1$ and $G_2$, as in (142). (ii) Then, we show that the first function $G_1$ can be lower-bounded by $n$ times its single-letterized version as in (143), while the second function $G_2$ can be upper-bounded by $n$ times its single-letterized version as in (147). This outline is similar to that of the proof of Theorem 4 in [16], with a slight modification of the function $G_2$.
For a given distribution $P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n \hat{\tilde{X}}_\mathcal{R}^n}$, let the functions $G_1(P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n})$ and $G_2(P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n \hat{\tilde{X}}_\mathcal{R}^n})$ be defined as
$G_1(P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n}) \triangleq H(\tilde{X}_\mathcal{H}^n) + \alpha H(\tilde{X}_{\mathcal{E}^c}^n | \tilde{X}_\mathcal{E}^n) + (\alpha + 1) D(P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n} \| P_{X_{\mathcal{E}^c} X_\mathcal{E}}^n),$
$G_2(P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n \hat{\tilde{X}}_\mathcal{R}^n}) \triangleq H(\tilde{X}_\mathcal{H}^n | \hat{\tilde{X}}_\mathcal{R}^n) - \mu E[d(\tilde{X}_\mathcal{R}^n, \hat{\tilde{X}}_\mathcal{R}^n)] + \alpha H(\tilde{X}_{\mathcal{E}^c}^n | \tilde{X}_\mathcal{E}^n, \hat{\tilde{X}}_\mathcal{R}^n).$
Using these functions, and in view of (134), $T_\mathcal{E}^{\mu, \alpha}(P_{X_{\mathcal{E}^c} X_\mathcal{E}}^n)$ can be written as
$T_\mathcal{E}^{\mu, \alpha}(P_{X_{\mathcal{E}^c} X_\mathcal{E}}^n) = \min_{P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n \hat{\tilde{X}}_\mathcal{R}^n}} \left[G_1(P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n}) - G_2(P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n \hat{\tilde{X}}_\mathcal{R}^n})\right].$
For fixed $P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n \hat{\tilde{X}}_\mathcal{R}^n}$, from Proposition 2, it holds that
$G_1(P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n}) \ge n\, G_1(P_{\tilde{X}_{\mathcal{E}^c, J} \tilde{X}_{\mathcal{E}, J}}).$
Next, we consider the function $G_2(P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n \hat{\tilde{X}}_\mathcal{R}^n})$. For the first term on the r.h.s. of (141), it holds that
$H(\tilde{X}_\mathcal{H}^n | \hat{\tilde{X}}_\mathcal{R}^n) = \sum_{j=1}^n H(\tilde{X}_{\mathcal{H}, j} | \tilde{X}_\mathcal{H}^{j-1}, \hat{\tilde{X}}_\mathcal{R}^n) \le \sum_{j=1}^n H(\tilde{X}_{\mathcal{H}, j} | \hat{\tilde{X}}_{\mathcal{R}, j}) = n \cdot \frac{1}{n} \sum_{j=1}^n H(\tilde{X}_{\mathcal{H}, j} | \hat{\tilde{X}}_{\mathcal{R}, j}) = n H(\tilde{X}_{\mathcal{H}, J} | \hat{\tilde{X}}_{\mathcal{R}, J}, J) \le n H(\tilde{X}_{\mathcal{H}, J} | \hat{\tilde{X}}_{\mathcal{R}, J}).$
The second term of (141) can be expressed as follows:
$E[d(\tilde{X}_\mathcal{R}^n, \hat{\tilde{X}}_\mathcal{R}^n)] = \sum_{x_\mathcal{R}^n, \hat{x}_\mathcal{R}^n} P_{\tilde{X}_\mathcal{R}^n \hat{\tilde{X}}_\mathcal{R}^n}(x_\mathcal{R}^n, \hat{x}_\mathcal{R}^n) \sum_{j=1}^n d(x_{\mathcal{R}, j}, \hat{x}_{\mathcal{R}, j}) = \sum_{j=1}^n \sum_{x_\mathcal{R}, \hat{x}_\mathcal{R}} P_{\tilde{X}_{\mathcal{R}, j} \hat{\tilde{X}}_{\mathcal{R}, j}}(x_\mathcal{R}, \hat{x}_\mathcal{R}) d(x_\mathcal{R}, \hat{x}_\mathcal{R}) \overset{(a)}{=} n \sum_{x_\mathcal{R}, \hat{x}_\mathcal{R}} P_{\tilde{X}_{\mathcal{R}, J} \hat{\tilde{X}}_{\mathcal{R}, J}}(x_\mathcal{R}, \hat{x}_\mathcal{R}) d(x_\mathcal{R}, \hat{x}_\mathcal{R}) = n E[d(\tilde{X}_{\mathcal{R}, J}, \hat{\tilde{X}}_{\mathcal{R}, J})],$
where
(a) follows from $\frac{1}{n} \sum_{j=1}^n P_{\tilde{X}_{\mathcal{R}, j} \hat{\tilde{X}}_{\mathcal{R}, j}}(x_\mathcal{R}, \hat{x}_\mathcal{R}) = P_{\tilde{X}_{\mathcal{R}, J} \hat{\tilde{X}}_{\mathcal{R}, J}}(x_\mathcal{R}, \hat{x}_\mathcal{R})$.
Moreover, for the third term of (141), it holds that
$H(\tilde{X}_{\mathcal{E}^c}^n | \tilde{X}_\mathcal{E}^n, \hat{\tilde{X}}_\mathcal{R}^n) = \sum_{j=1}^n H(\tilde{X}_{\mathcal{E}^c, j} | \tilde{X}_{\mathcal{E}^c}^{j-1}, \tilde{X}_\mathcal{E}^n, \hat{\tilde{X}}_\mathcal{R}^n) \le \sum_{j=1}^n H(\tilde{X}_{\mathcal{E}^c, j} | \tilde{X}_{\mathcal{E}, j}, \hat{\tilde{X}}_{\mathcal{R}, j}) = n \cdot \frac{1}{n} \sum_{j=1}^n H(\tilde{X}_{\mathcal{E}^c, j} | \tilde{X}_{\mathcal{E}, j}, \hat{\tilde{X}}_{\mathcal{R}, j}) = n H(\tilde{X}_{\mathcal{E}^c, J} | \tilde{X}_{\mathcal{E}, J}, \hat{\tilde{X}}_{\mathcal{R}, J}, J) \le n H(\tilde{X}_{\mathcal{E}^c, J} | \tilde{X}_{\mathcal{E}, J}, \hat{\tilde{X}}_{\mathcal{R}, J}).$
From (144)–(146), we obtain
$G_2(P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n \hat{\tilde{X}}_\mathcal{R}^n}) \le n\, G_2(P_{\tilde{X}_{\mathcal{E}^c, J} \tilde{X}_{\mathcal{E}, J} \hat{\tilde{X}}_{\mathcal{R}, J}}).$
Consequently, since (143) and (147) hold for an arbitrary $P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n \hat{\tilde{X}}_\mathcal{R}^n}$, the proof is completed. □

5.5. Proof of Strong Converse Theorem

For any given $\epsilon > 0$, fix the pair $(D, L) \in \mathcal{L}_\mathcal{E}^{DL}(\epsilon | P_{X_\mathcal{K}})$ arbitrarily. Then, by definition, there exists a code $(f_n, g_n)$ satisfying (79) and (80). For this code $(f_n, g_n)$, a set $\mathcal{D}$ is defined as
$\mathcal{D} \triangleq \{(x_{\mathcal{E}^c}^n, x_\mathcal{E}^n) : d(x_\mathcal{R}^n, g_n(f_n(x_\mathcal{E}^n))) \le n D\}.$
We derive a distribution $P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n}$ as
$P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n}(x_{\mathcal{E}^c}^n, x_\mathcal{E}^n) \triangleq \frac{P_{X_{\mathcal{E}^c} X_\mathcal{E}}^n(x_{\mathcal{E}^c}^n, x_\mathcal{E}^n)\, \mathbb{1}[(x_{\mathcal{E}^c}^n, x_\mathcal{E}^n) \in \mathcal{D}]}{P_{X_{\mathcal{E}^c} X_\mathcal{E}}^n(\mathcal{D})}.$
It is obvious that the excess-distortion probability measured by $P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n}$ is 0; that is, $\tilde{X}_\mathcal{R}^n$ and $\hat{\tilde{X}}_\mathcal{R}^n = g_n(f_n(\tilde{X}_\mathcal{E}^n))$ satisfy $E[d(\tilde{X}_\mathcal{R}^n, \hat{\tilde{X}}_\mathcal{R}^n)] \le n D$. Thus, by imitating the proof approach of the standard weak converse theorem, it holds that
$n(L + \mu D) \ge I(\tilde{X}_\mathcal{H}^n; \hat{\tilde{X}}_\mathcal{R}^n) + \mu E[d(\tilde{X}_\mathcal{R}^n, \hat{\tilde{X}}_\mathcal{R}^n)],$
$D(P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n} \| P_{X_{\mathcal{E}^c} X_\mathcal{E}}^n) = \log \frac{1}{P_{X_{\mathcal{E}^c} X_\mathcal{E}}^n(\mathcal{D})} \le \log \frac{1}{1 - \epsilon}.$
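As a numerical aside, the following sketch constructs the change-of-measure distribution by restricting a product pmf to a "no excess distortion" set and renormalizing, and checks the divergence identity $D(P_{\tilde{X}} \| P) = \log(1 / P(\mathcal{D}))$ used above; the small i.i.d. model and the set $\mathcal{D}$ are illustrative stand-ins.

```python
# A minimal sketch of the change-of-measure step: condition a product pmf on
# the set D of sequences with per-letter distortion at most Dlevel.
import itertools
import math

n, flip, Dlevel = 6, 0.2, 0.34
p1 = {0: 1 - flip, 1: flip}   # per-letter pmf of a distortion indicator (assumed)

def pmf(seq):
    """i.i.d. product probability of a 0/1 sequence."""
    return math.prod(p1[s] for s in seq)

good = [s for s in itertools.product((0, 1), repeat=n)
        if sum(s) / n <= Dlevel]               # the set D (distortion <= n*D)
pD = sum(pmf(s) for s in good)

tilted = {s: pmf(s) / pD for s in good}        # P_tilde, supported on D only
kl = sum(q * math.log2(q / pmf(s)) for s, q in tilted.items())
print(kl, math.log2(1 / pD))                   # equal: D(P_tilde||P) = log(1/P(D))
```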
From (148), the following is obtained:
$n(L + \mu D) \overset{(a)}{\ge} I(\tilde{X}_\mathcal{H}^n; \hat{\tilde{X}}_\mathcal{R}^n) + \mu E[d(\tilde{X}_\mathcal{R}^n, \hat{\tilde{X}}_\mathcal{R}^n)] + \left((\alpha + 1) D(P_{\tilde{X}_{\mathcal{E}^c}^n \tilde{X}_\mathcal{E}^n} \| P_{X_{\mathcal{E}^c} X_\mathcal{E}}^n) + \alpha I(\tilde{X}_{\mathcal{E}^c}^n; \hat{\tilde{X}}_\mathcal{R}^n | \tilde{X}_\mathcal{E}^n)\right) - (\alpha + 1) \log \frac{1}{1 - \epsilon} \overset{(b)}{\ge} T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}}^n) - (\alpha + 1) \log \frac{1}{1 - \epsilon},$
where
(a) follows from (149) and $I(\tilde{X}_{\mathcal{E}^c}^n; \hat{\tilde{X}}_\mathcal{R}^n | \tilde{X}_\mathcal{E}^n) = 0$, and
(b) is due to (134).
Since $T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}}^n) \ge n\, T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}})$ from Lemma 7, we have
$L + \mu D \ge T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}}) - \frac{\alpha + 1}{n} \log \frac{1}{1 - \epsilon},$
and therefore
$\sup_{\alpha > 0} (L + \mu D) \ge \sup_{\alpha > 0} \left[T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}}) - \frac{\alpha + 1}{n} \log \frac{1}{1 - \epsilon}\right].$
Because $T_\mathcal{E}^\mu(P_{X_\mathcal{K}}) = \sup_{\alpha > 0} T_\mathcal{E}^{\mu, \alpha}(P_{X_\mathcal{K}})$ from Proposition 1, it holds that for an arbitrary $\alpha > 0$,
$L + \mu D \ge T_\mathcal{E}^\mu(P_{X_\mathcal{K}}) - \frac{\alpha + 1}{n} \log \frac{1}{1 - \epsilon}.$
Hence, it holds that
$L + \mu D \ge \lim_{n \to \infty} \left[T_\mathcal{E}^\mu(P_{X_\mathcal{K}}) - \frac{\alpha + 1}{n} \log \frac{1}{1 - \epsilon}\right] = T_\mathcal{E}^\mu(P_{X_\mathcal{K}}) \quad \text{for every } \mu \ge 0.$
For the set of $(D, L)$ satisfying (150), varying $\mu \ge 0$ arbitrarily and taking the intersection, we have
$(D, L) \in \bigcap_{\mu \ge 0} \{(D, L) : L + \mu D \ge T_\mathcal{E}^\mu(P_{X_\mathcal{K}})\}.$
From Theorem 3, the r.h.s. of (151) is equal to $\mathcal{L}_\mathcal{E}^{DL}(P_{X_\mathcal{K}})$. This completes the proof.

6. Discussion

6.1. Numerical Calculation of Coding Rate, Utility, and Privacy for Decoder

In this subsection, we show some numerical calculations of the achievable regions $\mathcal{C}_\mathcal{E}^{RDL}(P_{X_\mathcal{K}})$ and $\mathcal{L}_\mathcal{E}^{RDL}(P_{X_\mathcal{K}})$ in Corollaries 1 and 2, respectively. In general, it is difficult to compute these achievable regions. Nevertheless, to obtain some insight, let us consider three tractable but essential cases. In these calculations, the number of public attributes is one ($|\mathcal{R}| = 1$) and the number of private attributes is two ($|\mathcal{H}| = 2$). We assume that each attribute is binary. Note again that the coding rate $R$ acts like the rate-distortion function in rate-distortion theory (cf. [27], Section 10). For fixed $D$ and $L$, a smaller coding rate is better.
In the first example, we calculated the $L$-$D$ graph of theoretical limits in case (i) $\mathcal{E} = \mathcal{K}$, case (ii) $\mathcal{E} = \mathcal{R}$, and case (iii) $\mathcal{R} \subsetneq \mathcal{E} \subsetneq \mathcal{K}$ (Figure 4). As a result, the achievable privacy leakage $L$ becomes smaller as $D$ becomes larger if we do not impose any restriction on the value of $R$. For a given $D$, the privacy leakage for the decoder in case (i) $\mathcal{E} = \mathcal{K}$ is the smallest, and that in case (ii) $\mathcal{E} = \mathcal{R}$ is the largest among all cases. The second example calculated the $R$-$D$ graph of theoretical limits in cases (i), (ii), and (iii) (Figure 5). We can see that the minimum coding rates for a given $D$ coincide in all cases if we do not impose any restriction on the value of $L$. In the third example, we calculated the optimal privacy leakage $L$ for fixed $D$ and the corresponding coding rates $R$ in cases (i), (ii), and (iii) (Table 1, Table 2 and Table 3). As a result, the optimal privacy leakage in cases (i) and (iii) is smaller than that in case (ii), whereas for the optimal privacy leakage, the achievable coding rates in cases (i) and (iii) are larger than that in case (ii).
Next, we discuss these results. In Figure 4, comparing the cases, we can verify that, for a given $D$, the more private information is encoded, the smaller the achievable minimum privacy leakage becomes. Figure 5 suggests that if only the coding rate is to be minimized, it suffices to encode the public attributes alone. This result is evident from Corollaries 1 and 2 because the condition on the choice of the test channel $P_{\hat{X}_R|X_E}$ in case (i) is weaker than that in case (ii), and if an appropriate test channel is taken in case (i), it is also appropriate in case (ii). This indicates that the achievable region in case (ii) is contained in those of cases (i) and (iii); the opposite is not the case. From Table 1, Table 2 and Table 3, we can confirm the trade-off between the optimal privacy leakage $L$ for a fixed $D$ and the corresponding coding rate $R$ in comparison with each case.
Summarizing the foregoing arguments, we have discussed the relationship between utility and privacy in Figure 4, that between utility and coding rate in Figure 5, and that between privacy and coding rate in Table 1, Table 2 and Table 3. From the discussion of Figure 5, some readers may suspect that case (i) gives the best choice of encoded information because the achievable regions in cases (ii) and (iii) are contained in that of case (i). This is true if we do not consider the leakage for the encoder. However, it is not true once we take the leakage for the encoder into account, that is, the measure of privacy for the encoder (see (12) or (76)). In the next subsection, we discuss this point in detail.

6.2. Significance of Limited Leakage for Encoder

In this section, we discuss the significance of evaluating the leakage for the encoder. The goal of this discussion is to show that the best choice of encoded information may be case (iii) $R \subsetneq E \subsetneq K$ if we take the limited leakage for the encoder into consideration.
The first issue is the amount of encoded information. Some readers may think that it is better to feed more information into the encoder. However, there are pros and cons:
Pros: 
The achievable regions $\mathcal{C}_E^{RDL}(P_{X_K})$ and $\mathcal{L}_E^{RDL}(P_{X_K})$ become larger.
Cons: 
The leakage for the encoder increases.
From this point of view, we can come up with the idea that the best choice of encoded information exists in case (iii) $R \subsetneq E \subsetneq K$ if we impose some constraint on the leakage for the encoder. This idea is the key point of this paper.
The second issue is the significance of the limited leakage for the encoder. Figure 6 shows the Hasse diagram that represents the inclusion relation among the index sets of attributes. A Hasse diagram is often used to represent inclusion relations such as $R \subseteq E_2 \subseteq E_1 \subseteq K$.
We can also regard Figure 6 as a Hasse diagram representing the inclusion relation for the achievable regions $\mathcal{C}_E^{RDL}(P_{X_K})$ and $\mathcal{L}_E^{RDL}(P_{X_K})$, because the index sets of attributes ($R \subseteq E \subseteq K$) correspond to the encoded information ($X_E$), and the encoded information corresponds to the achievable regions ($\mathcal{C}_E^{RDL}(P_{X_K})$ and $\mathcal{L}_E^{RDL}(P_{X_K})$). In addition, the diagram in Figure 6 has another property: a superordinate set incurs a larger privacy leakage for the encoder than its subordinate sets, since the index sets of attributes determine the privacy leakage for the encoder.
Let us consider a practical application. We assume that the data aggregator, that is, the encoder, tries to gather encoded information from some application users and hopes to improve the utility of the application while limiting the amount of leakage of $X_H^n$ by $E \ge 0$, that is, $e_n \le E$. More precisely, for a given $E$, we want to find which subsets of $K$ are sufficient to characterize
$$\mathcal{C}^{RDL}(P_{X_K}\,|\,E) \triangleq \bigcup_{R \subseteq E \subseteq K} \big\{(R, D, L) : (R, D, L, E) \in \mathcal{C}_E(P_{X_K})\big\}, \qquad \mathcal{L}^{RDL}(P_{X_K}\,|\,E) \triangleq \bigcup_{R \subseteq E \subseteq K} \big\{(R, D, L) : (R, D, L, E) \in \mathcal{L}_E(P_{X_K})\big\},$$
where C E ( P X K ) and L E ( P X K ) are defined in Definitions 2 and 11, respectively. The process is as follows.
Step 1:
Check the user’s requirements and impose the restriction on the privacy leakage for the encoder.
Figure 7 shows the Hasse diagram for Step 1. The blue dotted line indicates the boundary imposed by the restriction on the privacy leakage for the encoder. Therefore, the index sets $E_1$ and $K$ are not suitable as index sets of encoded information.
Step 2:
Check the inclusion relation between index sets.
Figure 8 shows the Hasse diagram for Step 2. From Figure 6, we can find that
$$R \subseteq E_2, \quad R \subseteq E_3, \quad R \subseteq E_5, \quad E_3 \subseteq E_4, \quad E_5 \subseteq E_4.$$
Therefore, the index sets R , E 3 , and E 5 are not suitable as the index sets of encoded information.
Figure 9 shows the Hasse diagram obtained after Step 2. From Figure 9, the remaining index sets are $E_2$ and $E_4$. Therefore, if we impose a restriction on the privacy leakage for the encoder, the index sets $E_2$ and $E_4$ form the Pareto set of this multi-objective optimization problem. In other words, there exists a system that satisfies the user's requirement $E$ on the maximum amount of leakage to the encoder, and the achievable regions are given by $\mathcal{C}^{RDL}(P_{X_K}\,|\,E) = \mathcal{C}_{E_2}^{RDL}(P_{X_K}) \cup \mathcal{C}_{E_4}^{RDL}(P_{X_K})$ and $\mathcal{L}^{RDL}(P_{X_K}\,|\,E) = \mathcal{L}_{E_2}^{RDL}(P_{X_K}) \cup \mathcal{L}_{E_4}^{RDL}(P_{X_K})$.
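The two-step selection above is easy to mechanize. The following sketch works on a hypothetical family of index sets with made-up encoder-leakage values (playing the roles of $R, E_1, \ldots, E_5, K$ in Figures 6–9): Step 1 drops the sets that violate the leakage budget, and Step 2 keeps only the maximal remaining sets under inclusion, which form the Pareto set.

```python
R = frozenset({'r'})

# Hypothetical index sets with made-up encoder-leakage values I(X_H; X_E),
# playing the roles of R, E1, ..., E5, K in Figures 6-9.
candidates = {
    frozenset({'r'}):                    0.00,  # R
    frozenset({'r', 'h1'}):              0.30,  # E2
    frozenset({'r', 'h2'}):              0.25,  # E3
    frozenset({'r', 'h3'}):              0.20,  # E5
    frozenset({'r', 'h2', 'h3'}):        0.45,  # E4
    frozenset({'r', 'h1', 'h2'}):        0.70,  # E1
    frozenset({'r', 'h1', 'h2', 'h3'}):  0.90,  # K
}
budget = 0.50   # the user's requirement E on the leakage for the encoder

# Step 1: discard index sets whose encoder leakage exceeds the budget.
feasible = {S: e for S, e in candidates.items() if e <= budget and S >= R}

# Step 2: a strict superset never shrinks the achievable region, so keep
# only the maximal feasible sets under inclusion (the Pareto set).
pareto = [S for S in feasible if not any(S < T for T in feasible)]

print([sorted(S) for S in pareto])   # -> the analogues of E2 and E4
```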
From the discussion above, we conclude that the best choice of encoded information is case (iii) $R \subsetneq E \subsetneq K$ if we take the limited leakage for the encoder into account. This concept is one of the most important novelties of this paper.
If $E$ satisfies some condition, then $\mathcal{C}^{RDL}(P_{X_K}\,|\,E)$ can be characterized by the expressions given by Yamamoto [1] (cf. Remark 3). More specifically, the region $\mathcal{C}^{RDL}(P_{X_K}\,|\,E)$ is given by
$$\mathcal{C}^{RDL}(P_{X_K}\,|\,E) = \mathcal{S}_K^{RDL}(P_{X_K})$$
if $E \ge H(X_K)$, and
$$\mathcal{C}^{RDL}(P_{X_K}\,|\,E) = \mathcal{S}_R^{RDL}(P_{X_K})$$
if $H(X_R) \le E < H(X_E)$ for any index set $E$ with $R \subsetneq E$, where the regions $\mathcal{S}_K^{RDL}(P_{X_K})$ and $\mathcal{S}_R^{RDL}(P_{X_K})$ are given in [1] (cf. Remark 3).

6.3. Discussion on Measures for Privacy Leakage

This paper adopts mutual information as the measure of privacy leakage, as in (12), (13), (76), and (77). However, some unlikely data can be leaked even though the database satisfies the theoretical limit of privacy leakage. For example, let $(X, Y)$ be a pair of correlated random variables whose mutual information $I(X; Y)$ is very small. There may nevertheless exist a pair $(x_1, y_1)$ such that $Y = y_1$ implies $X = x_1$ with high probability; to put it differently, the receiver can tell the value of $X$ whenever it observes $Y = y_1$. The theoretical limit evaluated with mutual information cannot prevent such a scenario. To circumvent it, we suggest other measures adopted in related studies: promising candidates are Rényi information of higher orders [30], maximal leakage [15], and maximal α-leakage [16,17,18,21].
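This phenomenon is easy to reproduce numerically. In the hypothetical joint distribution below, $I(X; Y)$ is tiny, yet observing $Y = y_1$ reveals $X = x_1$ with certainty; maximal leakage [15], which is driven by the most revealing columns of $P_{Y|X}$, comes out roughly three times larger than the mutual information in this example.

```python
import numpy as np

# Hypothetical joint pmf of (X, Y): I(X;Y) is tiny, but Y = y1 pins down X = x1.
#                 y0      y1
P = np.array([[0.495, 0.000],    # x0
              [0.495, 0.010]])   # x1

p_x, p_y = P.sum(axis=1), P.sum(axis=0)
m = P > 0
I = np.sum(P[m] * np.log2(P[m] / np.outer(p_x, p_y)[m]))   # mutual information (bits)

# Maximal leakage: log2 of the sum over y of max_x P(y | x).
P_y_given_x = P / p_x[:, None]
L_max = np.log2(P_y_given_x.max(axis=0).sum())

posterior_x1 = P[1, 1] / p_y[1]              # Pr(X = x1 | Y = y1)
print(f"I(X;Y) = {I:.4f} bits, Pr(X=x1|Y=y1) = {posterior_x1:.2f}, "
      f"maximal leakage = {L_max:.4f} bits")
```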

7. Conclusions

In this paper, we strengthened the results in [3] mainly by establishing three coding theorems in a privacy-constrained source coding problem. In Section 3 and Section 4, we established two theorems on the first-order rate analysis in which utility is measured by the expected distortion or the excess-distortion probability for case (iii), $R \subsetneq E \subsetneq K$. The novelty is the introduction of the measure of privacy for the encoder along with the use of the excess-distortion probability. The obtained characterization reduces to the one given in [3], derived based on the expected distortion, when the leakage for the encoder is not limited, and the result shows that employing the excess-distortion probability does not change the achievable region from the one with the expected distortion. In Section 5, we established the strong converse theorem for utility–privacy trade-offs. Although the described result is for the projected plane of utility and privacy for the decoder for simplicity, we can also incorporate the measure of privacy for the encoder. Finally, we discussed the significance of the choice of encoded information considering limited leakage for the encoder. The argument suggests that the best choice of encoded information can be case (iii) $R \subsetneq E \subsetneq K$ if some constraint is imposed on the privacy leakage for the encoder.
As future work, the second-order rate analysis for utility–privacy trade-offs is an interesting research topic [4,5,6]. Moreover, the strong converse theorem and the second-order rate analysis for the four-dimensional region of coding rate, utility, privacy for the decoder, and privacy for the encoder are more challenging tasks. It is also worth analyzing the achievable region with other privacy measures such as Rényi information [30], maximal leakage [15], and maximal α-leakage [16,17,18,21]. This paper analyzed the theoretical limits of coding, but how to achieve these limits remains open; the construction of good codes is also an important subject. Extensions of this paper's scenario to coding with side information [2,25] are also of interest.

Author Contributions

N.S. contributed to the conceptualization of the research goals and aims, the visualization, the formal analysis of the results, and the review and editing. H.Y. contributed to the conceptualization of the ideas, the validation of the results, and the supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by JSPS KAKENHI Grant Numbers JP20K04462 and JP18H01438.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of the Markov Chain $X_{E^c} - X_E - \hat{X}_R$ in the Converse Part of Theorem 1

Let $p_i(x_{E,i}, x_{E^c,i}, \hat{x}_{R,i})$ be the conditional distribution given $Q = i$:
$$\begin{aligned} p_i(x_{E,i}, x_{E^c,i}, \hat{x}_{R,i}) &= \sum_{x_{E,k}:\,k\ne i}\; \sum_{x_{E^c,k}:\,k\ne i}\; \sum_{\hat{x}_{R,k}:\,k\ne i} p(x_E^n, x_{E^c}^n, \hat{x}_R^n) = \sum_{x_{E,k}:\,k\ne i}\; \sum_{x_{E^c,k}:\,k\ne i} p(x_E^n, x_{E^c}^n, \hat{x}_{R,i}) \\ &\stackrel{(a)}{=} \sum_{x_{E,k}:\,k\ne i}\; \sum_{x_{E^c,k}:\,k\ne i} p_i(x_E^n, \hat{x}_{R,i})\, p(x_{E^c}^n \mid x_E^n) = \sum_{x_{E,k}:\,k\ne i} p_i(x_E^n, \hat{x}_{R,i}) \sum_{x_{E^c,k}:\,k\ne i} p(x_{E^c}^n \mid x_E^n) \\ &\stackrel{(b)}{=} \sum_{x_{E,k}:\,k\ne i} p_i(x_E^n, \hat{x}_{R,i}) \sum_{x_{E^c,k}:\,k\ne i} \prod_{l=1}^{n} p(x_{E^c,l} \mid x_{E,l}) = p_i(x_{E,i}, \hat{x}_{R,i})\, p(x_{E^c,i} \mid x_{E,i}) \\ &= p(x_{E,i})\, p(x_{E^c,i} \mid x_{E,i})\, p_i(\hat{x}_{R,i} \mid x_{E,i}) = p(x_{E,i}, x_{E^c,i})\, p_i(\hat{x}_{R,i} \mid x_{E,i}), \end{aligned}$$
where
(a)
is due to the Markov chain $X_{E^c}^n - X_E^n - \hat{X}_{R,i}$ and
(b)
follows since the source is stationary and memoryless.
Therefore, we can obtain the Markov chain $X_{E^c,i} - X_{E,i} - \hat{X}_{R,i}$. For the marginal distribution, we can show that
$$\begin{aligned} p(x_E, x_{E^c}, \hat{x}_R) &\stackrel{(c)}{=} \frac{1}{n}\sum_{i=1}^{n} p_i(x_E, x_{E^c}, \hat{x}_R) \stackrel{(d)}{=} \frac{1}{n}\sum_{i=1}^{n} p_i(x_E, x_{E^c})\, p_i(\hat{x}_R \mid x_E) \\ &\stackrel{(e)}{=} p(x_E, x_{E^c}) \cdot \frac{1}{n}\sum_{i=1}^{n} p_i(\hat{x}_R \mid x_E) \stackrel{(f)}{=} p(x_E, x_{E^c})\, p(\hat{x}_R \mid x_E), \end{aligned}$$
where
(c)
follows because
$$p(x_E, x_{E^c}, \hat{x}_R) = \sum_{i=1}^{n} \Pr\{Q = i\}\, p_i(x_E, x_{E^c}, \hat{x}_R),$$
(d)
is due to the Markov chain $X_{E^c,i} - X_{E,i} - \hat{X}_{R,i}$,
(e)
follows since the source is stationary and memoryless, and
(f)
follows because
$$p(\hat{x}_R \mid x_E) = \sum_{i=1}^{n} \Pr\{Q = i\}\, p_i(\hat{x}_R \mid x_E).$$
Therefore, we can obtain the Markov chain $X_{E^c} - X_E - \hat{X}_R$. This completes the proof.
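The crux of the argument is that every per-index distribution shares the same $p(x_{E^c} \mid x_E)$, so the Markov structure survives the averaging over $Q$. The following sketch checks this numerically for randomly drawn binary test channels (toy alphabet sizes and random distributions, purely illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed source pmf p(x_E, x_Ec) shared by all indices; per-index test
# channels p_i(x_hat | x_E), as in steps (c)-(f) above (binary alphabets).
p_joint = rng.dirichlet(np.ones(4)).reshape(2, 2)                 # p(x_E, x_Ec)
channels = [rng.dirichlet(np.ones(2), size=2) for _ in range(5)]  # p_i(x_hat | x_E)

# Time-shared joint: the average over i of p(x_E, x_Ec) * p_i(x_hat | x_E).
mix = np.mean([p_joint[:, :, None] * w[:, None, :] for w in channels], axis=0)

# Does the average factorize as p(x_E, x_Ec) * p_bar(x_hat | x_E)?
p_E = p_joint.sum(axis=1)
p_hat_given_E = mix.sum(axis=1) / p_E[:, None]
reconstruction = p_joint[:, :, None] * p_hat_given_E[:, None, :]
print(np.allclose(mix, reconstruction))   # True: the Markov chain is preserved
```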

Appendix B. Proof of Equation (56)

From $\tilde{A}(j) \subseteq B(j)$ for $j = 1, 2, \ldots, M_n-1$,
$$\Pr\{X_K^n \in B(j)\} = \Pr\{X_K^n \in \tilde{A}(j)\} + \Pr\{X_K^n \in B(j)\setminus\tilde{A}(j)\}.$$
If $x_K^n \in B(j)\setminus\tilde{A}(j)$, then $x_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j))$ and $(x_E^n, x_{E^c}^n) \notin T_{2\delta}^n(X_K \mid \hat{x}_R^n(j))$, and thus we have $x_{E^c}^n \notin T_\delta^n(X_{E^c} \mid x_E^n, \hat{x}_R^n(j))$ from Lemma 5. Then,
$$x_K^n \in B(j)\setminus\tilde{A}(j) \implies x_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j)),\; x_{E^c}^n \notin T_\delta^n(X_{E^c} \mid x_E^n, \hat{x}_R^n(j)).$$
We can prove that
$$\begin{aligned} \Pr\{X_K^n \in B(j)\setminus\tilde{A}(j)\} &\le \Pr\{X_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j)),\, X_{E^c}^n \notin T_\delta^n(X_{E^c} \mid X_E^n, \hat{x}_R^n(j))\} \\ &= \sum_{x_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j))} \Pr\{X_E^n = x_E^n\} \cdot \Pr\{X_{E^c}^n \notin T_\delta^n(X_{E^c} \mid x_E^n, \hat{x}_R^n(j)) \mid X_E^n = x_E^n\} \\ &\stackrel{(a)}{=} \sum_{x_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j))} \Pr\{X_E^n = x_E^n\} \cdot \Pr\{X_{E^c}^n \notin T_\delta^n(X_{E^c} \mid x_E^n, \hat{x}_R^n(j)) \mid X_E^n = x_E^n, \hat{X}_R^n = \hat{x}_R^n(j)\} \\ &\stackrel{(b)}{\le} \sum_{x_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j))} \Pr\{X_E^n = x_E^n\} \cdot 2\,|\mathcal{X}_{E^c}|\,|\mathcal{X}_E|\,|\hat{\mathcal{X}}_R|\, e^{-2\delta^2 n} \\ &\le 2\,|\mathcal{X}_K|\,|\hat{\mathcal{X}}_R|\, e^{-2\delta^2 n}, \end{aligned}$$
where
(a)
is due to the Markov chain $X_{E^c}^n - X_E^n - \hat{X}_R^n$ and
(b)
follows from Lemma 6.
From Equations (A5) and (A7), we can obtain
$$\big|\Pr\{X_K^n \in B(j)\} - \Pr\{X_K^n \in \tilde{A}(j)\}\big| \le 2\,|\mathcal{X}_K|\,|\hat{\mathcal{X}}_R|\, e^{-2\delta^2 n}.$$
This completes the proof of (56).

Appendix C. Proof of Existence of Code Satisfying Equations (57)–(62)

We first set $M_n \triangleq 2^{nR}$ and $r_n \triangleq \frac{1}{n}\log M_n$. Then, we obviously have (57).
From the union bound,
$$\Pr\Big\{X_E^n \notin \bigcup_{j=1}^{M_n-1} A(j)\Big\} \le \Pr\{X_E^n \notin T_\delta^n(X_E)\} + \Pr\{X_E^n \in T_\delta^n(X_E),\ X_E^n \notin T_\delta^n(X_E \mid \hat{x}_R^n(j)) \text{ for all } j = 1, 2, \ldots, M_n-1\}.$$
From Lemma 6, the first term in (A9) is bounded as
$$\Pr\{X_E^n \notin T_\delta^n(X_E)\} \le 2\,|\mathcal{X}_E|\, e^{-2\delta^2 n}.$$
We consider the expectation of the second term in (A9) under random coding. Hereafter, we denote the random variable corresponding to the reproduced sequence $\hat{x}_R^n(j)$ as $\hat{X}_R^n(j)$. For notational simplicity, we use the abbreviation
$$\Pr\{X_E^n \notin T_\delta^n(X_E \mid \hat{X}_R^n(j)) \text{ for all } j = 1, 2, \ldots, M_n-1 \mid X_E^n = x_E^n\} = \Pr\{x_E^n \notin T_\delta^n(X_E \mid \hat{X}_R^n(j)) \text{ for all } j = 1, 2, \ldots, M_n-1\},$$
and then
$$\begin{aligned} &\mathbb{E}\big[\Pr\{X_E^n \in T_\delta^n(X_E),\ X_E^n \notin T_\delta^n(X_E \mid \hat{X}_R^n(j)) \text{ for all } j = 1, 2, \ldots, M_n-1\}\big] \\ &\quad= \sum_{x_E^n \in T_\delta^n(X_E)} p(x_E^n)\, \mathbb{E}\big[\Pr\{X_E^n \notin T_\delta^n(X_E \mid \hat{X}_R^n(j)) \text{ for all } j \mid X_E^n = x_E^n\}\big] \\ &\quad\stackrel{(a)}{=} \sum_{x_E^n \in T_\delta^n(X_E)} p(x_E^n)\, \mathbb{E}\big[\Pr\{x_E^n \notin T_\delta^n(X_E \mid \hat{X}_R^n(j)) \text{ for all } j\}\big] \\ &\quad= \sum_{x_E^n \in T_\delta^n(X_E)} p(x_E^n) \prod_{j=1}^{M_n-1} \mathbb{E}\big[\Pr\{x_E^n \notin T_\delta^n(X_E \mid \hat{X}_R^n(j))\}\big] \\ &\quad\stackrel{(b)}{=} \sum_{x_E^n \in T_\delta^n(X_E)} p(x_E^n)\, \mathbb{E}\big[\Pr\{x_E^n \notin T_\delta^n(X_E \mid \hat{X}_R^n(1))\}\big]^{M_n-1} \\ &\quad\stackrel{(c)}{\le} \exp\Big(-2^{\,n(R - I(X_E;\hat{X}_R) - \frac{1}{n}\tau)}\Big) \stackrel{(d)}{\le} \exp\big(-2^{2\delta^2 n}\big), \end{aligned}$$
where
(a)
is owing to (A11),
(b)
is due to the symmetry among the indices of random coding,
(c)
follows in the same way as in ([31], Section 3.6.3), and
(d)
holds because $\delta$ is fixed to satisfy (49).
From (A10) and (A12), we obtain
$$\mathbb{E}\left[\Pr\Big\{X_E^n \notin \bigcup_{j=1}^{M_n-1} A(j)\Big\}\right] \le (2\,|\mathcal{X}_E|+1)\, e^{-2\delta^2 n}.$$
Therefore, there exists at least one codebook satisfying (60) in the ensemble obtained by random coding.
Hereafter, codebook $\mathcal{C}$ is fixed to satisfy (60). That is, codebook $\mathcal{C}$ satisfies
$$\Pr\Big\{X_E^n \notin \bigcup_{j=1}^{M_n-1} A(j)\Big\} \le (2\,|\mathcal{X}_E|+1)\, e^{-2\delta^2 n}.$$
We evaluate the distortion function for each j.
(i)
$j = 1, 2, \ldots, M_n-1$:
$$\begin{aligned} d(x_R^n, \hat{x}_R^n(j)) &= \frac{1}{n}\sum_{a\in\mathcal{X}_R}\sum_{b\in\hat{\mathcal{X}}_R} N(a, b \mid x_R^n, \hat{x}_R^n(j))\, d(a, b) \\ &\stackrel{(e)}{\le} \sum_{a\in\mathcal{X}_R}\sum_{b\in\hat{\mathcal{X}}_R} P_{X_R \hat{X}_R}(a, b)\, d(a, b) + (\delta + \delta_1)\,|\mathcal{X}_R|\,|\hat{\mathcal{X}}_R|\, D_{\max} \\ &= \mathbb{E}[d(X_R, \hat{X}_R)] + (\delta + \delta_1)\,|\mathcal{X}_R|\,|\hat{\mathcal{X}}_R|\, D_{\max}, \end{aligned}$$
where
(e)
follows because, from Lemma 4, if $x_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j))$, then $x_R^n \in T_{\delta_1}^n(X_R \mid \hat{x}_R^n(j))$, and, from Lemma 3, if $\hat{x}_R^n(j) \in T_\delta^n(\hat{X}_R)$ and $x_R^n \in T_{\delta_1}^n(X_R \mid \hat{x}_R^n(j))$, then $(x_R^n, \hat{x}_R^n(j)) \in T_{\delta+\delta_1}^n(X_R, \hat{X}_R)$.
(ii)
$j = M_n$:
$$d(x_R^n, \hat{x}_R^n(M_n)) = \frac{1}{n}\sum_{i=1}^{n} d(x_{R,i}, \hat{x}_{R,i}) \stackrel{(f)}{\le} D_{\max},$$
where
(f)
is due to the definition $D_{\max} \triangleq \max_{a\in\mathcal{X}_R,\, b\in\hat{\mathcal{X}}_R} d(a, b)$.
We consider $\Pr\{J_n = M_n\}$. From (A14),
$$\Pr\{J_n = M_n\} = \Pr\{X_E^n \in A(M_n)\} = \Pr\Big\{X_E^n \notin \bigcup_{j=1}^{M_n-1} A(j)\Big\} \le (2\,|\mathcal{X}_E|+1)\, e^{-2\delta^2 n}.$$
Therefore, we can confirm
$$\lim_{n\to\infty} \Pr\{J_n = M_n\} = 0.$$
From (i) and (ii), we can evaluate the utility $u_n$ as follows:
$$\begin{aligned} u_n &= \mathbb{E}\big[d(X_R^n, \hat{X}_R^n)\big] \\ &\le \sum_{j=1}^{M_n-1} \Pr\{J_n = j\} \cdot \big(\mathbb{E}[d(X_R, \hat{X}_R)] + (\delta + \delta_1)\,|\mathcal{X}_R|\,|\hat{\mathcal{X}}_R|\, D_{\max}\big) + \Pr\{J_n = M_n\} \cdot D_{\max} \\ &\stackrel{(g)}{\le} \mathbb{E}[d(X_R, \hat{X}_R)] + (\delta + \delta_1)\,|\mathcal{X}_R|\,|\hat{\mathcal{X}}_R|\, D_{\max} + \tau \end{aligned}$$
for all sufficiently large $n$, where
(g)
follows from (A18).
Thus, we obtain (58).
We can evaluate the privacy leakage against the encoder as follows:
$$e_n = \frac{1}{n} I(X_H^n; X_E^n) \stackrel{(h)}{=} \frac{1}{n}\sum_{i=1}^{n} I(X_{H,i}; X_E^n \mid X_H^{i-1}) \stackrel{(i)}{=} \frac{1}{n}\sum_{i=1}^{n} I(X_{H,i}; X_{E,i}) \stackrel{(j)}{=} I(X_H; X_E),$$
where
(h)
is due to the chain rule for mutual information and
(i), (j)
follow because $X_K^n$ is i.i.d. according to $P_{X_K}$.
Thus, we have (59).
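Steps (h)–(j) are the standard single-letterization for an i.i.d. source. As a quick numerical confirmation that mutual information is additive over independent coordinates, the sketch below compares $I(X_H^2; X_E^2)$ with $2\,I(X_H; X_E)$ for a hypothetical single-letter distribution.

```python
import numpy as np

def mutual_info(J):
    """Mutual information (nats) of a joint pmf with rows X and columns Y."""
    px, py = J.sum(axis=1), J.sum(axis=0)
    m = J > 0
    return np.sum(J[m] * np.log(J[m] / np.outer(px, py)[m]))

# Hypothetical single-letter joint pmf p(x_H, x_E).
P = np.array([[0.4, 0.1],
              [0.2, 0.3]])

# Product pmf of two i.i.d. copies: rows index (x_H1, x_H2), columns (x_E1, x_E2).
P2 = np.einsum('ab,cd->acbd', P, P).reshape(4, 4)

print(mutual_info(P2), 2 * mutual_info(P))   # equal: I(X_H^2; X_E^2) = 2 I(X_H; X_E)
```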
Next, we show that the probability that the random vector $X_K^n$ is not included in the set $\bigcup_{j=1}^{M_n-1}\tilde{A}(j)$ is sufficiently small. First, notice that
$$x_K^n \notin \bigcup_{j=1}^{M_n-1}\tilde{A}(j) \iff x_E^n \notin \bigcup_{j=1}^{M_n-1} A(j) \ \text{ or }\ \Big(x_E^n \in A(j_0),\ (x_E^n, x_{E^c}^n) \notin T_{2\delta}^n(X_K \mid \hat{x}_R^n(j_0)) \text{ for } j_0 = f_n(x_E^n)\Big),$$
where $j_0$ is the index such that $f_n(x_E^n) = j_0$ for $1 \le j_0 \le M_n-1$. Therefore, by the union bound,
$$\Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1}\tilde{A}(j)\Big\} \le \Pr\Big\{X_E^n \notin \bigcup_{j=1}^{M_n-1} A(j)\Big\} + \Pr\big\{X_E^n \in A(j_0),\ (X_E^n, X_{E^c}^n) \notin T_{2\delta}^n(X_K \mid \hat{x}_R^n(j_0)) \text{ for } j_0 = f_n(X_E^n)\big\}.$$
We evaluate each term in (A22).
(i)
The first term:
$$\Pr\Big\{X_E^n \notin \bigcup_{j=1}^{M_n-1} A(j)\Big\} \stackrel{(k)}{\le} (2\,|\mathcal{X}_E|+1)\, e^{-2\delta^2 n},$$
where
(k)
is because of (A14).
(ii)
The second term:
If the event in the second term occurs, then $x_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j_0))$ and $(x_E^n, x_{E^c}^n) \notin T_{2\delta}^n(X_K \mid \hat{x}_R^n(j_0))$. Therefore, from Lemma 5, $x_{E^c}^n \notin T_\delta^n(X_{E^c} \mid x_E^n, \hat{x}_R^n(j_0))$ holds. Hence,
$$\begin{aligned} &\Pr\big\{X_E^n \in A(j_0),\ (X_E^n, X_{E^c}^n) \notin T_{2\delta}^n(X_K \mid \hat{x}_R^n(j_0)) \text{ for } j_0 = f_n(X_E^n)\big\} \\ &\quad\le \Pr\big\{X_E^n \in A(j_0),\ X_{E^c}^n \notin T_\delta^n(X_{E^c} \mid X_E^n, \hat{x}_R^n(j_0))\big\} \\ &\quad\le \sum_{j=1}^{M_n-1}\sum_{x_E^n \in A(j)} \Pr\{X_E^n = x_E^n\} \cdot \Pr\{X_{E^c}^n \notin T_\delta^n(X_{E^c} \mid x_E^n, \hat{x}_R^n(j)) \mid X_E^n = x_E^n\} \\ &\quad\stackrel{(l)}{=} \sum_{j=1}^{M_n-1}\sum_{x_E^n \in A(j)} \Pr\{X_E^n = x_E^n\} \cdot \Pr\{X_{E^c}^n \notin T_\delta^n(X_{E^c} \mid x_E^n, \hat{x}_R^n(j)) \mid X_E^n = x_E^n, \hat{X}_R^n = \hat{x}_R^n(j)\} \\ &\quad\stackrel{(m)}{\le} \sum_{j=1}^{M_n-1}\sum_{x_E^n \in A(j)} \Pr\{X_E^n = x_E^n\} \cdot 2\,|\mathcal{X}_{E^c}|\,|\mathcal{X}_E|\,|\hat{\mathcal{X}}_R|\, e^{-2\delta^2 n} \\ &\quad\stackrel{(n)}{\le} 2\,|\mathcal{X}_K|\,|\hat{\mathcal{X}}_R|\, e^{-2\delta^2 n}, \end{aligned}$$
where
(l)
is due to the Markov chain $X_{E^c}^n - X_E^n - \hat{X}_R^n$,
(m)
follows from Lemma 6 since $x_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j))$, and
(n)
follows because the sets $A(j)$ are disjoint for different $j$.
From (A22)–(A24),
$$\Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1}\tilde{A}(j)\Big\} \le 4\,|\mathcal{X}_K|\,|\hat{\mathcal{X}}_R|\, e^{-2\delta^2 n}.$$
Therefore, for sufficiently large $n$,
$$\Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1}\tilde{A}(j)\Big\} \le \tau,$$
and we obtain (61).
From Lemma 1, for sufficiently large $n$, for the stochastic matrix $W : \hat{\mathcal{X}}_R \to \mathcal{X}_K$ and $\hat{x}_R^n(j) \in T_\delta^n(\hat{X}_R)$, we can show that
$$\left|\frac{1}{n}\log\big|T_{\delta_2}^n(X_K \mid \hat{x}_R^n(j))\big| - H(X_K \mid \hat{X}_R)\right| \le \tau, \qquad \delta_2 \triangleq \frac{\delta}{|\mathcal{X}_{E^c}|}.$$
We can also show from (A27) that
$$2^{n\{H(X_K \mid \hat{X}_R) - \tau\}} \le \big|T_{\delta_2}^n(X_K \mid \hat{x}_R^n(j))\big| \le 2^{n\{H(X_K \mid \hat{X}_R) + \tau\}}.$$
From the definitions of $\tilde{A}(j)$ and $T_{\delta_2}^n(X_K \mid \hat{x}_R^n(j))$ and Lemma 3, for $j = 1, 2, \ldots, M_n-1$, we have
$$x_K^n \in \tilde{A}(j) \iff x_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j)) \ \text{and}\ x_K^n \in T_{2\delta}^n(X_K \mid \hat{x}_R^n(j)),$$
$$x_K^n \in T_{\delta_2}^n(X_K \mid \hat{x}_R^n(j)) \implies x_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j)) \ \text{and}\ x_K^n \in T_{2\delta}^n(X_K \mid \hat{x}_R^n(j)).$$
This means
$$T_{\delta_2}^n(X_K \mid \hat{x}_R^n(j)) \subseteq \tilde{A}(j), \quad \text{and hence} \quad \big|T_{\delta_2}^n(X_K \mid \hat{x}_R^n(j))\big| \le \big|\tilde{A}(j)\big|.$$
Therefore, from (A28) and (A31),
$$\big|\tilde{A}(j)\big| \ge 2^{n\{H(X_K \mid \hat{X}_R) - \tau\}},$$
and we obtain (62).

Appendix D. Derivation of Inequality in Equation (63)

We derive the inequality in (63). For notational brevity, for every $x_H^n \in \mathcal{X}_H^n$ and each $j = 1, 2, \ldots, M_n$, we define $P_n(j)$, $Q_n(j)$, $\tilde{P}_n(x_H^n, j)$, and $\tilde{Q}_n(x_H^n, j)$ as follows:
$$P_n(j) \triangleq \Pr\{X_K^n \in B(j)\},$$
$$Q_n(j) \triangleq \Pr\{X_K^n \in \tilde{A}(j)\},$$
$$\tilde{P}_n(x_H^n, j) \triangleq \Pr\{X_H^n = x_H^n,\ X_K^n \in B(j)\},$$
$$\tilde{Q}_n(x_H^n, j) \triangleq \Pr\{X_H^n = x_H^n,\ X_K^n \in \tilde{A}(j)\}.$$
Then, using the notation in [5], we can write each entropy as
$$H(X_K^n \in B(J_n)) = H(P_n),$$
$$H(X_K^n \in \tilde{A}(J_n)) = H(Q_n),$$
$$H(X_H^n, X_K^n \in B(J_n)) = H(\tilde{P}_n),$$
$$H(X_H^n, X_K^n \in \tilde{A}(J_n)) = H(\tilde{Q}_n).$$
The variational distance between the distributions $P_n$ and $Q_n$ is
$$d_v(P_n, Q_n) = \sum_{j=1}^{M_n} |P_n(j) - Q_n(j)| = \sum_{j=1}^{M_n-1} |P_n(j) - Q_n(j)| + |P_n(M_n) - Q_n(M_n)|.$$
We evaluate each term in (A41).
(i)
The first term:
$$\begin{aligned} \sum_{j=1}^{M_n-1} |P_n(j) - Q_n(j)| &= \sum_{j=1}^{M_n-1} \Pr\{X_K^n \in B(j)\setminus\tilde{A}(j)\} \stackrel{(a)}{=} \Pr\Big\{X_K^n \in \bigcup_{j=1}^{M_n-1} \big(B(j)\setminus\tilde{A}(j)\big)\Big\} \\ &= \Pr\Big\{X_K^n \in \bigcup_{j=1}^{M_n-1} B(j)\Big\} - \Pr\Big\{X_K^n \in \bigcup_{j=1}^{M_n-1} \tilde{A}(j)\Big\} \\ &= \Big(1 - \Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1} B(j)\Big\}\Big) - \Big(1 - \Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1} \tilde{A}(j)\Big\}\Big) \\ &= \Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1} \tilde{A}(j)\Big\} - \Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1} B(j)\Big\} \le \Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1} \tilde{A}(j)\Big\} \stackrel{(b)}{\le} \tau, \end{aligned}$$
where
(a)
follows because the sets $B(j)\setminus\tilde{A}(j)$ are disjoint for $j = 1, 2, \ldots, M_n-1$, and
(b)
is owing to (61).
(ii)
The second term:
$$|P_n(M_n) - Q_n(M_n)| \stackrel{(c)}{=} Q_n(M_n) - P_n(M_n) \le Q_n(M_n) = \Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1}\tilde{A}(j)\Big\} \stackrel{(d)}{\le} \tau,$$
where
(c)
follows because $B(M_n) \subseteq \tilde{A}(M_n)$ and
(d)
follows from (61).
From (A42) and (A43), the variational distance between $P_n$ and $Q_n$ is bounded from above as
$$d_v(P_n, Q_n) \le \tau + \tau = 2\tau.$$
Next, the variational distance between the distributions $\tilde{P}_n$ and $\tilde{Q}_n$ is
$$d_v(\tilde{P}_n, \tilde{Q}_n) = \sum_{j=1}^{M_n}\sum_{x_H^n \in \mathcal{X}_H^n} \big|\tilde{P}_n(x_H^n, j) - \tilde{Q}_n(x_H^n, j)\big| = \sum_{j=1}^{M_n-1}\sum_{x_H^n \in \mathcal{X}_H^n} \big|\tilde{P}_n(x_H^n, j) - \tilde{Q}_n(x_H^n, j)\big| + \sum_{x_H^n \in \mathcal{X}_H^n} \big|\tilde{P}_n(x_H^n, M_n) - \tilde{Q}_n(x_H^n, M_n)\big|.$$
We evaluate each term in (A45).
(i)
The first term:
$$\begin{aligned} \sum_{j=1}^{M_n-1}\sum_{x_H^n \in \mathcal{X}_H^n} \big|\tilde{P}_n(x_H^n, j) - \tilde{Q}_n(x_H^n, j)\big| &= \sum_{j=1}^{M_n-1}\sum_{x_H^n \in \mathcal{X}_H^n} \Pr\{X_H^n = x_H^n,\ X_K^n \in B(j)\setminus\tilde{A}(j)\} \\ &\stackrel{(e)}{=} \sum_{x_H^n \in \mathcal{X}_H^n} \Pr\Big\{X_H^n = x_H^n,\ X_K^n \in \bigcup_{j=1}^{M_n-1} \big(B(j)\setminus\tilde{A}(j)\big)\Big\} = \Pr\Big\{X_K^n \in \bigcup_{j=1}^{M_n-1} \big(B(j)\setminus\tilde{A}(j)\big)\Big\} \\ &= \Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1}\tilde{A}(j)\Big\} - \Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1} B(j)\Big\} \le \Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1}\tilde{A}(j)\Big\} \stackrel{(f)}{\le} \tau, \end{aligned}$$
where
(e)
follows since the sets $B(j)\setminus\tilde{A}(j)$ are disjoint for $j = 1, 2, \ldots, M_n-1$, and
(f)
is due to (61).
(ii)
The second term:
$$\sum_{x_H^n \in \mathcal{X}_H^n} \big|\tilde{P}_n(x_H^n, M_n) - \tilde{Q}_n(x_H^n, M_n)\big| \stackrel{(g)}{=} \sum_{x_H^n \in \mathcal{X}_H^n} \big(\tilde{Q}_n(x_H^n, M_n) - \tilde{P}_n(x_H^n, M_n)\big) \le \sum_{x_H^n \in \mathcal{X}_H^n} \tilde{Q}_n(x_H^n, M_n) = Q_n(M_n) = \Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1}\tilde{A}(j)\Big\} \stackrel{(h)}{\le} \tau,$$
where
(g)
follows because $B(M_n) \subseteq \tilde{A}(M_n)$ and
(h)
is due to (61).
From (A46) and (A47), the variational distance between $\tilde{P}_n$ and $\tilde{Q}_n$ is bounded from above as
$$d_v(\tilde{P}_n, \tilde{Q}_n) \le \tau + \tau = 2\tau.$$
As a result, from Lemma 2 and the relation of each entropy,
$$\big|H(X_K^n \in B(J_n)) - H(X_K^n \in \tilde{A}(J_n))\big| \le 2\tau \log\frac{M_n}{2\tau},$$
$$\big|H(X_H^n, X_K^n \in B(J_n)) - H(X_H^n, X_K^n \in \tilde{A}(J_n))\big| \le 2\tau \log\frac{|\mathcal{X}_H|^n \cdot M_n}{2\tau}.$$
From (A49), (A50), and the chain rule of entropy,
$$\begin{aligned} &\big|H(X_H^n \mid X_K^n \in B(J_n)) - H(X_H^n \mid X_K^n \in \tilde{A}(J_n))\big| \\ &\quad= \big|\{H(X_H^n, X_K^n \in B(J_n)) - H(X_K^n \in B(J_n))\} - \{H(X_H^n, X_K^n \in \tilde{A}(J_n)) - H(X_K^n \in \tilde{A}(J_n))\}\big| \\ &\quad= \big|\{H(X_H^n, X_K^n \in B(J_n)) - H(X_H^n, X_K^n \in \tilde{A}(J_n))\} + \{H(X_K^n \in \tilde{A}(J_n)) - H(X_K^n \in B(J_n))\}\big| \\ &\quad\stackrel{(i)}{\le} \big|H(X_H^n, X_K^n \in B(J_n)) - H(X_H^n, X_K^n \in \tilde{A}(J_n))\big| + \big|H(X_K^n \in \tilde{A}(J_n)) - H(X_K^n \in B(J_n))\big| \\ &\quad\le 2\tau \log\frac{M_n}{2\tau} + 2\tau \log\frac{|\mathcal{X}_H|^n \cdot M_n}{2\tau} \le 4\tau \log\frac{|\mathcal{X}_H|^n \cdot M_n}{2\tau}, \end{aligned}$$
where
(i)
is because of the triangle inequality.
Therefore, we obtain
$$\begin{aligned} \frac{1}{n} H(X_H^n \mid J_n) &= \frac{1}{n} H(X_H^n \mid X_K^n \in B(J_n)) \\ &\ge \frac{1}{n} H(X_H^n \mid X_K^n \in \tilde{A}(J_n)) - \frac{4\tau}{n}\log\frac{|\mathcal{X}_H|^n \cdot M_n}{2\tau} \\ &\stackrel{(j)}{>} \frac{1}{n} H(X_H^n \mid X_K^n \in \tilde{A}(J_n)) - \frac{4\tau}{n}\log\frac{|\mathcal{X}_H|^n \cdot 2^{nR}}{(2\tau)^n} \\ &= \frac{1}{n} H(X_H^n \mid X_K^n \in \tilde{A}(J_n)) - 4\tau \log\frac{|\mathcal{X}_H| \cdot 2^{R}}{2\tau}, \end{aligned}$$
where
(j)
follows from $M_n = 2^{nR}$ and $2\tau < 1$.
This completes the derivation of (63).
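The appeal to Lemma 2 above rests on the continuity of entropy in variational distance: if $d_v(P, Q) \le \Theta \le 1/2$ on an alphabet of size $M$, then $|H(P) - H(Q)| \le \Theta \log(M/\Theta)$. A small randomized check of this bound (toy random distributions, natural logarithm) is sketched below.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 16   # alphabet size

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))    # entropy in nats

worst = -np.inf
for _ in range(10_000):
    P, P2 = rng.dirichlet(np.ones(M)), rng.dirichlet(np.ones(M))
    Q = 0.8 * P + 0.2 * P2           # mix so that Q stays close to P
    t = np.abs(P - Q).sum()          # variational distance, as in (A41)
    if 0 < t <= 0.5:
        worst = max(worst, abs(H(P) - H(Q)) - t * np.log(M / t))

print(worst)   # remains negative: no violation of the bound in the samples
```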

Appendix E. Proof of Equation (65)

First of all, we shall show that
$$x_K^n \in \tilde{A}(j) \implies x_R^n \in T_{\delta_3}^n(X_R \mid x_H^n, \hat{x}_R^n(j)), \qquad \delta_3 \triangleq (|\mathcal{X}_H| + 1)\cdot 2\delta.$$
By the definition of $\tilde{A}(j)$,
$$\tilde{A}(j) \subseteq T_{2\delta}^n(X_K \mid \hat{x}_R^n(j)) \quad \text{for } j = 1, 2, \ldots, M_n-1.$$
Thus, from Lemma 4, any $x_R^n$ such that $(x_R^n, x_H^n) \in \tilde{A}(j)$ satisfies
$$x_R^n \in T_{\delta_3}^n(X_R \mid x_H^n, \hat{x}_R^n(j)).$$
That is, given $x_H^n \in \mathcal{X}_H^n$ and $\hat{x}_R^n(j) \in \hat{\mathcal{X}}_R^n$, the sequences $x_R^n \in \mathcal{X}_R^n$ with $x_K^n = (x_R^n, x_H^n) \in \tilde{A}(j)$ are conditionally strongly typical. Then, we obtain (A53), and
$$\sum_{x_R^n : (x_R^n, x_H^n) \in \tilde{A}(j)} \Pr\{X_R^n = x_R^n \mid X_H^n = x_H^n\}\, \Pr\{X_H^n = x_H^n\} \le \sum_{x_R^n \in T_{\delta_3}^n(X_R \mid x_H^n, \hat{x}_R^n(j))} \Pr\{X_R^n = x_R^n \mid X_H^n = x_H^n\}\, \Pr\{X_H^n = x_H^n\}.$$
Therefore, we obtain (65).

Appendix F. Proof of the Existence of Code Satisfying Equations (111)–(116)

We first set $M_n \triangleq 2^{nR}$ and $r_n \triangleq \frac{1}{n}\log M_n$. Then, we obviously have (111).
From the union bound,
$$\Pr\Big\{X_E^n \notin \bigcup_{j=1}^{M_n-1} A(j)\Big\} \le \Pr\{X_E^n \notin T_\delta^n(X_E)\} + \Pr\{X_E^n \in T_\delta^n(X_E),\ X_E^n \notin T_\delta^n(X_E \mid \hat{x}_R^n(j)) \text{ for all } j = 1, 2, \ldots, M_n-1\}.$$
From Lemma 6, the first term in (A57) is bounded as
$$\Pr\{X_E^n \notin T_\delta^n(X_E)\} \le 2\,|\mathcal{X}_E|\, e^{-2\delta^2 n}.$$
We consider the expectation of the second term in (A57) under random coding. Hereafter, we denote the random variable corresponding to the reproduced sequence $\hat{x}_R^n(j)$ as $\hat{X}_R^n(j)$. For notational simplicity, we use the abbreviation
$$\Pr\{X_E^n \notin T_\delta^n(X_E \mid \hat{X}_R^n(j)) \text{ for all } j = 1, 2, \ldots, M_n-1 \mid X_E^n = x_E^n\} = \Pr\{x_E^n \notin T_\delta^n(X_E \mid \hat{X}_R^n(j)) \text{ for all } j = 1, 2, \ldots, M_n-1\},$$
and then
$$\begin{aligned} &\mathbb{E}\big[\Pr\{X_E^n \in T_\delta^n(X_E),\ X_E^n \notin T_\delta^n(X_E \mid \hat{X}_R^n(j)) \text{ for all } j = 1, 2, \ldots, M_n-1\}\big] \\ &\quad= \sum_{x_E^n \in T_\delta^n(X_E)} p(x_E^n)\, \mathbb{E}\big[\Pr\{X_E^n \notin T_\delta^n(X_E \mid \hat{X}_R^n(j)) \text{ for all } j \mid X_E^n = x_E^n\}\big] \\ &\quad\stackrel{(a)}{=} \sum_{x_E^n \in T_\delta^n(X_E)} p(x_E^n)\, \mathbb{E}\big[\Pr\{x_E^n \notin T_\delta^n(X_E \mid \hat{X}_R^n(j)) \text{ for all } j\}\big] \\ &\quad= \sum_{x_E^n \in T_\delta^n(X_E)} p(x_E^n) \prod_{j=1}^{M_n-1} \mathbb{E}\big[\Pr\{x_E^n \notin T_\delta^n(X_E \mid \hat{X}_R^n(j))\}\big] \\ &\quad\stackrel{(b)}{=} \sum_{x_E^n \in T_\delta^n(X_E)} p(x_E^n)\, \mathbb{E}\big[\Pr\{x_E^n \notin T_\delta^n(X_E \mid \hat{X}_R^n(1))\}\big]^{M_n-1} \\ &\quad\stackrel{(c)}{\le} \exp\Big(-2^{\,n(R - I(X_E;\hat{X}_R) - \frac{1}{n}\tau)}\Big) \stackrel{(d)}{\le} \exp\big(-2^{2\delta^2 n}\big), \end{aligned}$$
where
(a)
is owing to (A59),
(b)
is due to the symmetry among the indices of random coding,
(c)
follows in the same way as in ([31], Section 3.6.3), and
(d)
holds because $\delta$ is fixed to satisfy (103).
From (A58) and (A60), we obtain
$$\mathbb{E}\left[\Pr\Big\{X_E^n \notin \bigcup_{j=1}^{M_n-1} A(j)\Big\}\right] \le (2\,|\mathcal{X}_E|+1)\, e^{-2\delta^2 n}.$$
Therefore, there exists at least one codebook satisfying (112) in the ensemble obtained by random coding.
Hereafter, codebook $\mathcal{C}$ is fixed to satisfy (112). That is, codebook $\mathcal{C}$ satisfies
$$\Pr\Big\{X_E^n \notin \bigcup_{j=1}^{M_n-1} A(j)\Big\} \le (2\,|\mathcal{X}_E|+1)\, e^{-2\delta^2 n}.$$
For a fixed codebook $\mathcal{C}$, we divide the sequences $x_E^n \in \mathcal{X}_E^n$ into three categories.
  • Strongly typical sequences $x_E^n \in T_\delta^n(X_E)$ such that there exists a codeword $\hat{X}_R^n(j_o)$ for some $j_o = 1, 2, \ldots, M_n-1$ that is conditionally strongly typical with $x_E^n$. In this case, from Lemma 3, $(x_E^n, \hat{x}_R^n(j_o)) \in T_{2\delta}^n(X_E, \hat{X}_R)$. Since the codeword is jointly strongly typical with $x_E^n$, the continuity of the distortion as a function of the joint distribution ensures that the distortion is also typical (see [27], Sections 10.5 and 10.6). Hence, the distortion between these $x_E^n$ and their codewords is bounded by $D + \delta$, where $\delta$ goes to 0 as $n \to \infty$. In the first-order analysis, that is, $n \to \infty$, we can regard $D + \delta$ as $D$.
  • Strongly typical sequences $x_E^n \in T_\delta^n(X_E)$ such that $f_n(x_E^n) = M_n$.
  • Non-strongly-typical sequences $x_E^n \notin T_\delta^n(X_E)$.
The sequences in the second and third categories are encoded as $f_n(x_E^n) = M_n$; for such sequences, the distortion can only be bounded by $D_{\max}$ and may be in excess of $D$. Then, the excess-distortion probability is evaluated as
$$\Pr\Big\{\frac{1}{n}\, d(X_R^n, \hat{X}_R^n) > D\Big\} \le \Pr\{X_E^n \in A(M_n)\} = \Pr\Big\{X_E^n \notin \bigcup_{j=1}^{M_n-1} A(j)\Big\} \le (2\,|\mathcal{X}_E|+1)\, e^{-2\delta^2 n}.$$
Hence, for an appropriate choice of $\epsilon$ and $n$, we can make the excess-distortion probability of all badly represented sequences as small as we want. We obtain (113).
We can evaluate the privacy leakage against the encoder as follows:
$$e_n = \frac{1}{n} I(X_H^n; X_E^n) \stackrel{(e)}{=} \frac{1}{n}\sum_{i=1}^{n} I(X_{H,i}; X_E^n \mid X_H^{i-1}) \stackrel{(f)}{=} \frac{1}{n}\sum_{i=1}^{n} I(X_{H,i}; X_{E,i}) \stackrel{(g)}{=} I(X_H; X_E),$$
where
(e)
is due to the chain rule for mutual information and
(f), (g)
follow because $X_K^n$ is i.i.d. according to $P_{X_K}$.
Thus, we have (114).
Next, we show that the probability that the random vector $X_K^n$ is not included in the set $\bigcup_{j=1}^{M_n-1}\tilde{A}(j)$ is sufficiently small. First, notice that
$$x_K^n \notin \bigcup_{j=1}^{M_n-1}\tilde{A}(j) \iff x_E^n \notin \bigcup_{j=1}^{M_n-1} A(j) \ \text{ or }\ \Big(x_E^n \in A(j_0),\ (x_E^n, x_{E^c}^n) \notin T_{2\delta}^n(X_K \mid \hat{x}_R^n(j_0)) \text{ for } j_0 = f_n(x_E^n)\Big),$$
where $j_0$ is the index such that $f_n(x_E^n) = j_0$ for $1 \le j_0 \le M_n-1$. Therefore, by the union bound,
$$\Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1}\tilde{A}(j)\Big\} \le \Pr\Big\{X_E^n \notin \bigcup_{j=1}^{M_n-1} A(j)\Big\} + \Pr\big\{X_E^n \in A(j_0),\ (X_E^n, X_{E^c}^n) \notin T_{2\delta}^n(X_K \mid \hat{x}_R^n(j_0)) \text{ for } j_0 = f_n(X_E^n)\big\}.$$
We evaluate each term in (A68).
(i)
The first term:
$$\Pr\Big\{X_E^n \notin \bigcup_{j=1}^{M_n-1} A(j)\Big\} \stackrel{(h)}{\le} (2\,|\mathcal{X}_E|+1)\, e^{-2\delta^2 n},$$
where
(h)
is because of (A62).
(ii)
The second term:
If the event in the second term occurs, then $x_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j_0))$ and $(x_E^n, x_{E^c}^n) \notin T_{2\delta}^n(X_K \mid \hat{x}_R^n(j_0))$. Therefore, from Lemma 5, $x_{E^c}^n \notin T_\delta^n(X_{E^c} \mid x_E^n, \hat{x}_R^n(j_0))$ holds. Hence,
$$\begin{aligned} &\Pr\big\{X_E^n \in A(j_0),\ (X_E^n, X_{E^c}^n) \notin T_{2\delta}^n(X_K \mid \hat{x}_R^n(j_0)) \text{ for } j_0 = f_n(X_E^n)\big\} \\ &\quad\le \Pr\big\{X_E^n \in A(j_0),\ X_{E^c}^n \notin T_\delta^n(X_{E^c} \mid X_E^n, \hat{x}_R^n(j_0))\big\} \\ &\quad\le \sum_{j=1}^{M_n-1}\sum_{x_E^n \in A(j)} \Pr\{X_E^n = x_E^n\} \cdot \Pr\{X_{E^c}^n \notin T_\delta^n(X_{E^c} \mid x_E^n, \hat{x}_R^n(j)) \mid X_E^n = x_E^n\} \\ &\quad\stackrel{(i)}{=} \sum_{j=1}^{M_n-1}\sum_{x_E^n \in A(j)} \Pr\{X_E^n = x_E^n\} \cdot \Pr\{X_{E^c}^n \notin T_\delta^n(X_{E^c} \mid x_E^n, \hat{x}_R^n(j)) \mid X_E^n = x_E^n, \hat{X}_R^n = \hat{x}_R^n(j)\} \\ &\quad\stackrel{(j)}{\le} \sum_{j=1}^{M_n-1}\sum_{x_E^n \in A(j)} \Pr\{X_E^n = x_E^n\} \cdot 2\,|\mathcal{X}_{E^c}|\,|\mathcal{X}_E|\,|\hat{\mathcal{X}}_R|\, e^{-2\delta^2 n} \\ &\quad\stackrel{(k)}{\le} 2\,|\mathcal{X}_K|\,|\hat{\mathcal{X}}_R|\, e^{-2\delta^2 n}, \end{aligned}$$
where
(i)
is due to the Markov chain $X_{E^c}^n - X_E^n - \hat{X}_R^n$,
(j)
follows from Lemma 6 since $x_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j))$, and
(k)
follows because the sets $A(j)$ are disjoint for different $j$.
From (A68)–(A70),
$$\Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1}\tilde{A}(j)\Big\} \le 4\,|\mathcal{X}_K|\,|\hat{\mathcal{X}}_R|\, e^{-2\delta^2 n}.$$
Therefore, for sufficiently large $n$,
$$\Pr\Big\{X_K^n \notin \bigcup_{j=1}^{M_n-1}\tilde{A}(j)\Big\} \le \tau,$$
and we obtain (115).
From Lemma 1, for sufficiently large $n$, for the stochastic matrix $W : \hat{\mathcal{X}}_R \to \mathcal{X}_K$ and $\hat{x}_R^n(j) \in T_\delta^n(\hat{X}_R)$, we can show that
$$\left|\frac{1}{n}\log\big|T_{\delta_2}^n(X_K \mid \hat{x}_R^n(j))\big| - H(X_K \mid \hat{X}_R)\right| \le \tau, \qquad \delta_2 \triangleq \frac{\delta}{|\mathcal{X}_{E^c}|}.$$
We can also show from (A73) that
$$2^{n\{H(X_K \mid \hat{X}_R) - \tau\}} \le \big|T_{\delta_2}^n(X_K \mid \hat{x}_R^n(j))\big| \le 2^{n\{H(X_K \mid \hat{X}_R) + \tau\}}.$$
From the definitions of $\tilde{A}(j)$ and $T_{\delta_2}^n(X_K \mid \hat{x}_R^n(j))$ and Lemma 4, for $j = 1, 2, \ldots, M_n-1$, we have
$$x_K^n \in \tilde{A}(j) \iff x_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j)) \ \text{and}\ x_K^n \in T_{2\delta}^n(X_K \mid \hat{x}_R^n(j)),$$
$$x_K^n \in T_{\delta_2}^n(X_K \mid \hat{x}_R^n(j)) \implies x_E^n \in T_\delta^n(X_E \mid \hat{x}_R^n(j)) \ \text{and}\ x_K^n \in T_{2\delta}^n(X_K \mid \hat{x}_R^n(j)).$$
This means
$$T_{\delta_2}^n(X_K \mid \hat{x}_R^n(j)) \subseteq \tilde{A}(j), \quad \text{and hence} \quad \big|T_{\delta_2}^n(X_K \mid \hat{x}_R^n(j))\big| \le \big|\tilde{A}(j)\big|.$$
Therefore, from (A74) and (A77),
$$\big|\tilde{A}(j)\big| \ge 2^{n\{H(X_K \mid \hat{X}_R) - \tau\}},$$
and we obtain (116).

References

1. Yamamoto, H. A source coding problem for sources with additional outputs to keep secret from the receiver or wiretappers. IEEE Trans. Inf. Theory 1983, 29, 918–923.
2. Sankar, L.; Rajagopalan, S.R.; Poor, H.V. Utility–privacy tradeoff in databases: An information-theoretic approach. IEEE Trans. Inf. Forensics Secur. 2013, 8, 838–852.
3. Shinohara, N.; Yagi, H. Unified expression of utility–privacy trade-off in privacy-constrained source coding. In Proceedings of the 2022 International Symposium on Information Theory and Its Applications (ISITA2022), Tsukuba, Japan, 17–19 October 2022; pp. 198–202.
4. Ingber, A.; Kochman, Y. The dispersion of lossy source coding. In Proceedings of the 2011 Data Compression Conference, Snowbird, UT, USA, 29–31 March 2011; pp. 53–62.
5. Kostina, V.; Verdú, S. Fixed-length lossy compression in the finite blocklength regime: Discrete memoryless sources. IEEE Trans. Inf. Theory 2012, 58, 3309–3338.
6. Watanabe, S. Second-order region for Gray–Wyner network. IEEE Trans. Inf. Theory 2017, 63, 1006–1018.
7. Tyagi, H.; Watanabe, S. Strong converse using change of measure arguments. IEEE Trans. Inf. Theory 2020, 66, 689–703.
8. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Conference on Theory of Cryptography (TCC), New York, NY, USA, 4–7 March 2006; pp. 265–284.
9. Dwork, C. Differential privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP), Venice, Italy, 10–14 July 2006; pp. 1–12.
10. Soria-Comas, J.; Domingo-Ferrer, J.; Sánchez, D.; Megías, D. Individual differential privacy: A utility-preserving formulation of differential privacy guarantees. IEEE Trans. Inf. Forensics Secur. 2017, 12, 1418–1429.
11. Kalantari, K.; Sankar, L.; Sarwate, A.D. Robust privacy-utility tradeoffs under differential privacy and Hamming distortion. IEEE Trans. Inf. Forensics Secur. 2018, 13, 2816–2830.
12. Makhdoumi, A.; Fawaz, N. Privacy-utility tradeoff under statistical uncertainty. In Proceedings of the 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 2–4 October 2013; pp. 1627–1634.
13. Basciftci, Y.O.; Wang, Y.; Ishwar, P. On privacy-utility tradeoffs for constrained data release mechanisms. In Proceedings of the 2016 Information Theory and Applications Workshop (ITA), La Jolla, CA, USA, 31 January–5 February 2016; pp. 1–6.
14. Günlü, O.; Schaefer, R.F.; Boche, H.; Poor, H.V. Secure and private source coding with private key and decoder side information. In Proceedings of the 2022 IEEE Information Theory Workshop (ITW), Mumbai, India, 6–9 November 2022; pp. 226–231.
15. Issa, I.; Wagner, A.B.; Kamath, S. An operational approach to information leakage. IEEE Trans. Inf. Theory 2020, 66, 1625–1657.
16. Liao, J.; Kosut, O.; Sankar, L.; Calmon, F.P. Privacy under hard distortion constraints. In Proceedings of the 2018 IEEE Information Theory Workshop (ITW2018), Guangzhou, China, 25–29 November 2018; pp. 1–5.
17. Liao, J.; Kosut, O.; Sankar, L.; Calmon, F.P. Tunable measures for information leakage and applications to privacy-utility tradeoffs. IEEE Trans. Inf. Theory 2019, 65, 8043–8066.
18. Saeidian, S.; Cervia, G.; Oechtering, T.J.; Skoglund, M. Quantifying membership privacy via information leakage. IEEE Trans. Inf. Forensics Secur. 2020, 16, 3096–3108.
19. Rassouli, B.; Gündüz, D. Optimal utility–privacy trade-off with total variation distance as a privacy measure. IEEE Trans. Inf. Forensics Secur. 2019, 15, 594–603.
20. Wang, W.; Ying, L.; Zhang, J. On the relation between identifiability, differential privacy, and mutual-information privacy. IEEE Trans. Inf. Theory 2016, 62, 5018–5029.
21. Liao, J.; Sankar, L.; Kosut, O.; Calmon, F.P. Maximal α-leakage and its properties. In Proceedings of the 2020 IEEE Conference on Communications and Network Security (CNS), Virtual, 29 June–1 July 2020; pp. 1–6.
22. Shinohara, N.; Yagi, H. Strong converse theorem for utility–privacy trade-offs. In Proceedings of the 45th Symposium on Information Theory and Its Applications (SITA2022), Noboribetsu, Japan, 29 November–2 December 2022; pp. 338–343.
23. Guan, Z.; Si, G.; Wu, J.; Zhu, L.; Zhang, Z.; Ma, Y. Utility–privacy tradeoff based on random data obfuscation in internet of energy. IEEE Access 2017, 5, 3250–3262.
24. Asikis, T.; Pournaras, E. Optimization of privacy-utility trade-offs under informational self-determination. Future Gener. Comput. Syst. 2020, 109, 488–499.
25. Lu, J.; Xu, Y.; Zhu, Z. On scalable source coding problem with side information privacy. In Proceedings of the 2022 14th International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, 1–3 November 2022; pp. 415–420.
26. Makhdoumi, A.; Salamatian, S.; Fawaz, N.; Médard, M. From the information bottleneck to the privacy funnel. In Proceedings of the 2014 IEEE Information Theory Workshop (ITW), Hobart, Australia, 2–5 November 2014; pp. 501–505.
27. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2006.
28. Uyematsu, T. Gendai Shannon Riron (Modern Shannon Theory), 1st ed.; Baifukan: Tokyo, Japan, 1998. (In Japanese)
29. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd ed.; Cambridge University Press: Cambridge, UK, 2011.
30. Sason, I.; Verdú, S. Improved bounds on lossless source coding and guessing moments via Rényi measures. IEEE Trans. Inf. Theory 2018, 64, 4323–4346.
31. El Gamal, A.; Kim, Y.H. Network Information Theory, 1st ed.; Cambridge University Press: Cambridge, UK, 2011.
Figure 1. Privacy-constrained coding system.
Figure 2. Road map for second-order rate analysis [1,3,8].
Figure 3. The region expressed with a tangent plane using the Legendre transformation.
Figure 4. Utility–privacy trade-off region in cases (i), (ii), and (iii).
Figure 5. Utility–coding-rate trade-off region in cases (i), (ii), and (iii). The curves coincide in all cases.
Figure 6. Hasse diagram that represents the inclusion relation for the index sets of attributes.
Figure 7. Hasse diagram for Step 1.
Figure 8. Hasse diagram for Step 2.
Figure 9. Hasse diagram obtained after Step 2.
Table 1. Minimum L and its corresponding R for D = 0.0500.

Cases      | Leakage L | Coding Rate R
-----------|-----------|--------------
case (ii)  | 0.019512  | 0.494629
case (iii) | 0.008298  | 0.527700
case (i)   | 0.005107  | 0.539478

Table 2. Minimum L and its corresponding R for D = 0.1000.

Cases      | Leakage L | Coding Rate R
-----------|-----------|--------------
case (ii)  | 0.015378  | 0.368062
case (iii) | 0.002656  | 0.418826
case (i)   | 0.000000  | 0.429490

Table 3. Minimum L and its corresponding R for D = 0.1500.

Cases      | Leakage L | Coding Rate R
-----------|-----------|--------------
case (ii)  | 0.011748  | 0.270436
case (iii) | 0.002032  | 0.294424
case (i)   | 0.000000  | 0.382211
