Article

Channel Coding and Source Coding With Increased Partial Side Information

Department of Electrical and Computer Engineering at the Ben Gurion University of the Negev, Beer Sheva 84105, Israel
* Authors to whom correspondence should be addressed.
Entropy 2017, 19(9), 467; https://doi.org/10.3390/e19090467
Submission received: 30 June 2017 / Revised: 25 August 2017 / Accepted: 30 August 2017 / Published: 2 September 2017
(This article belongs to the Special Issue Multiuser Information Theory)

Abstract

Let $(S_{1,i}, S_{2,i}) \sim$ i.i.d. $p(s_1,s_2)$, $i = 1, 2, \ldots$, be a memoryless, correlated partial side information sequence. In this work, we study channel coding and source coding problems where the partial side information $(S_1, S_2)$ is available at the encoder and the decoder, respectively, and, additionally, either the encoder's or the decoder's side information is increased by a limited-rate description of the other's partial side information. We derive six special cases of channel coding and source coding problems and we characterize the capacity and the rate-distortion functions for the different cases. We present a duality between the channel capacity and the rate-distortion cases we study. In order to find numerical solutions for our channel capacity and rate-distortion problems, we use the Blahut-Arimoto algorithm and convex optimization tools. Finally, we provide several examples corresponding to the channel capacity and the rate-distortion cases we presented.

1. Introduction

In this paper, we investigate point-to-point channel models and rate-distortion problem models where both users have different and correlated partial side information and where, in addition, a rate-limited description of one of the user’s side information is delivered to the other user. We then show the duality between the channel models and the rate-distortion models we investigate.
For the convenience of the reader, we refer to the state information as the side information, to the partial side information that is available to the encoder as the encoder's side information (ESI) and to the partial side information that is available to the decoder as the decoder's side information (DSI). We refer to the rate-limited description of the other user's side information as the increase in the side information. For example, if the decoder is informed with its DSI and, in addition, with a rate-limited description of the ESI, then we would say that the decoder is informed with increased DSI.
To make the motivation for this paper clear, let us look at the simple example depicted in Figure 1. In this setup, the communication between the Tx-Rx pair (the encoder-decoder) is interrupted by an undesired signal, S. The encoder and the decoder do not know S perfectly, but they each possess a version of S; the encoder knows $S_1$ (the ESI) and the decoder knows $S_2$ (the DSI). For this example, let us assume that the source of the interruption is physically located in close proximity to the encoder (potentially, both signal sources are co-located). Thus, we assume that the encoder "knows more about S" than the decoder; i.e., $H(S|S_1) < H(S|S_2)$. We assume also that the transmitter can provide a rate-limited description of the ESI, $S_1$, to the decoder, thus increasing its DSI. In these circumstances, we pose the question: what is the capacity of the channel between the encoder and the decoder? This question is of practical importance. Knowing the channel capacity allows one to analyze a communication system better, answering questions such as "how close is the communication system's performance to the capacity?" and "how important is the quality of the side information to the throughput?". Moreover, it allows one to design better practical codes, such as polar codes and LDPC codes.

1.1. Channel Capacity in the Presence of State Information

The three problems of channel capacity in the presence of state information that we address in this paper are presented in Figure 2a. We make the assumption that the encoder is informed with partial state information, the ESI ( S 1 ), and the decoder is informed with different, but correlated, partial state information, which is the DSI ( S 2 ). The channel capacity problem cases are:
  • Case 1: The decoder is provided with increased DSI; i.e., in addition to the DSI, the decoder is also informed with a rate-limited description of the ESI.
  • Case 2: The encoder is informed with increased ESI.
  • Case 2 C : Similar to Case 2, with the exception that the ESI is known to the encoder in a causal manner. Notice that the rate-limited description of the DSI is still known to the encoder noncausally.
We will subsequently provide the capacity of Case 1 and Case 2 C and characterize the lower and the upper bounds on Case 2, which differ only by a Markov relation. The results for the first case under discussion, Case 1, can be concluded from Steinberg’s problem [1]. In [1], Steinberg introduced and solved the case in which the encoder is fully informed with the ESI and the decoder is informed with a rate-limited description of the ESI. Therefore, the innovation in Case 1 is that the decoder is also informed with the DSI. The solution for this problem can be derived by considering the DSI to be a part of the channel’s output in Steinberg’s solution. In the proof of the converse in his paper, Steinberg uses a new technique that involves using the Csiszár sum twice in order to get to a single-letter bound on the rate. We shall use this technique to present a duality in the converse of the Gelfand–Pinsker [2] and the Wyner-Ziv [3] problems, which, by themselves, constitute the basis for most of the results in this paper. In [3], Wyner and Ziv presented the rate-distortion function for data compression problems with side information at the decoder. We make use of their coding scheme in the achievability proof of the lower bound of Case 2 for describing the ESI with a limited rate at the decoder. In [2], Gelfand and Pinsker presented the capacity for a channel with noncausal channel state information (CSI) at the encoder. We use their coding scheme in the achievability proof of Case 1 and the lower bound of Case 2 for transmitting information over a channel where the ESI is the state information at the encoder. Therefore, we combine in our problems the Gelfand–Pinsker and the Wyner-Ziv problems. Another related paper is [4], in which Shannon presented the capacity of a channel with causal CSI at the transmitter. We make use of Shannon’s result in the achievability proof of Case 2 C for communicating over a channel with causal ESI at the encoder. We also use Shannon’s strategies [4], for developing an iterative algorithm to calculate the capacity of the cases we present in this paper.
Several related papers can be found in the literature. Heegard and El Gamal [5] presented a model of a state-dependent channel, where the transmitter is informed with the CSI at a rate limited to $R_e$ and the receiver is informed with the CSI at a rate limited to $R_d$. This result relates to Case 1, Case 2 and Case 2$_C$ since we consider the rate-limited description of the ESI or the DSI as side information known at both the encoder and the decoder. Cover and Chiang [6] extended the Gelfand–Pinsker problem and the Wyner-Ziv problem to the case where both the encoder and the decoder are provided with different, but correlated, partial side information. They also showed a duality between the two cases, which is a topic that will be discussed later in this paper. Rozenzweig et al. [7] and Cemal and Steinberg [8] studied channels with partial state information at the transmitter. A detailed subject review on channel coding with state information was given by Keshet et al. in [9].
In addition to these three cases, we also present a more general case, where both the encoder and the decoder are informed with increased partial side information; i.e., the encoder and the decoder are each informed with partial side information and, in addition, with a rate-limited description of the other's side information. We provide a lower bound on the capacity for this case; however, this bound does not necessarily coincide with the capacity and, therefore, this problem remains open.

1.2. Rate-Distortion with Side Information

In this paper, we address three problems of rate-distortion with side information, as presented in Figure 2b. In common with the channel capacity problems, we assume that the encoder is informed with the ESI ( S 1 ) and the decoder is informed with the DSI ( S 2 ), where the source, X, the ESI and the DSI are correlated. The rate-distortion problem cases we investigate in this paper are:
  • Case 1: The decoder is provided with increased DSI.
  • Case 1 C : Similar to Case 1, with the exception that the ESI is known to the encoder in a causal manner. The rate-limited description of the ESI is still known to the decoder noncausally.
  • Case 2: The encoder is informed with increased ESI.
Case 2 is a special case of Kaspi's [10] two-way source coding for $K = 1$. In [10], Kaspi introduced a model of multistage communication between two users, where each user may transmit up to K messages to the other user, dependent on the source and the previously received messages. For Case 2, we can consider sending the rate-limited description of the DSI as the first transmission and then sending a function of the source, the ESI and the rate-limited description of the DSI as the second transmission. This fits into Kaspi's problem for $K = 1$ and thus Kaspi's theorem also applies to Case 2. Kaspi's problem was later extended by Permuter et al. [11] to the case where a common rate-limited side information message is conveyed to both users. Another strongly related paper is Wyner and Ziv's paper [3]. In the achievability of Case 1, we use the Wyner-Ziv coding scheme twice; once for describing the ESI at the decoder where the DSI is the side information and once for the main source and the ESI where the DSI is the side information. The rate-limited description of the ESI is the side information provided to both the encoder and the decoder. In [6], the Wyner-Ziv problem is extended to the case where both the encoder and the decoder are provided with correlated partial side information. Weissman and El Gamal [12] and Weissman and Merhav [13] presented source coding with causal side information at the decoder, which relates to Case 1$_C$. In addition, we present a generalized case of rate-distortion with two-sided increased partial side information. In this problem setup, the encoder and the decoder are each informed with partial side information and, in addition, with a rate-limited description of the other's side information. We present an upper bound on the optimal rate; however, this bound does not necessarily coincide with the optimal rate and, therefore, this problem remains open.

1.3. Duality

Within the scope of this work, we point out a duality relation between the channel capacity and the rate-distortion cases we discuss. The operational duality between channel coding and source coding was first mentioned by Shannon [14]. Pradhan et al. [15] and Pradhan and Ramchandran [16] studied the functional duality between some cases of channel coding and source coding, including the duality between the Gelfand–Pinsker problem and the Wyner-Ziv problem. This duality was also described by Cover and Chiang in [6], where they provided a transformation that makes duality between channel coding and source coding with two-sided state information apparent. Zamir et al. [17] and Su et al. [18] utilized the duality between channel coding and source coding with side information to develop coding schemes for the dual problems. Goldfeld, Permuter and Kramer [19] studied the duality between a two-encoder source coding with one-sided, rate-limited coordination and a semi-deterministic broadcast channel with one-sided decoder cooperation. More related works on the topic of duality can be found in the papers of Asnani et al. [20] and Gupta and Verdu [21].
In our paper, we show that the channel capacity cases and the rate-distortion cases we discuss are operational duals in a way that strongly relates to the Wyner-Ziv and Gelfand–Pinsker duality. We also provide a transformation scheme that shows this duality in a clear way. Moreover, we show a duality relation between Kaspi’s problem and Steinberg’s [1] problem by showing a duality relation between Case 2 source coding and Case 1 channel coding. Also, we show duality in the converse parts of the Gelfand–Pinsker and the Wyner-Ziv problems. We show that both converse parts can be proven in a perfectly dual way by using the Csiszár sum twice.

1.4. Computational Algorithms

Calculating channel capacity and rate-distortion functions, in general, and the Gelfand–Pinsker and the Wyner-Ziv problems, in particular, is not straightforward. Blahut [22] and Arimoto [23] suggested an iterative algorithm (to be referred to as the B-A algorithm) for numerically computing the channel capacity and the rate-distortion functions. Willems [24] and Dupuis et al. [25] presented iterative algorithms based on the B-A algorithm for computing the Gelfand–Pinsker and the Wyner-Ziv functions. We use principles from Willems' algorithms to develop an algorithm to numerically calculate the capacity for the cases we presented. More B-A based iterative algorithms for computing channel capacity and rate-distortion with side information can be found in [26,27]. A Blahut-Arimoto based algorithm for maximizing the directed information can be found in [28].
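To make the B-A idea concrete, the following is a minimal sketch (our own Python/NumPy, not code from the paper) of the classical Blahut-Arimoto iteration for the capacity of a discrete memoryless channel without state; the algorithm of Section 5 builds on the same alternating-maximization principle. The function names and tolerance values are illustrative assumptions.

```python
import numpy as np

def kl_bits(p, q):
    """Kullback-Leibler divergence D(p||q) in bits, skipping zero-probability entries of p."""
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / q[m])))

def ba_capacity(P, tol=1e-10, max_iter=10000):
    """Classical Blahut-Arimoto iteration for the capacity (bits/use) of a DMC.
    P is the |X| x |Y| row-stochastic transition matrix p(y|x)."""
    n_x = P.shape[0]
    p_x = np.full(n_x, 1.0 / n_x)          # start from the uniform input PMF
    for _ in range(max_iter):
        q_y = p_x @ P                       # output PMF induced by the current input PMF
        d = np.array([kl_bits(P[x], q_y) for x in range(n_x)])
        p_new = p_x * 2.0 ** d              # multiplicative Blahut-Arimoto update
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p_x)) < tol:
            p_x = p_new
            break
        p_x = p_new
    q_y = p_x @ P
    return sum(p_x[x] * kl_bits(P[x], q_y) for x in range(n_x)), p_x

# Sanity check: a binary symmetric channel with crossover 0.1 has capacity 1 - H(0.1) ~ 0.531 bits
print(ba_capacity(np.array([[0.9, 0.1], [0.1, 0.9]])))
```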

1.5. Organization of the Paper and Main Contributions

To summarize, the main contributions of this paper are:
  • We characterize the capacity and the rate-distortion functions of new channel and source coding problems with increased partial side information. We quantify the gain in the rate that can be achieved by having the parties involved share their partial side information with each other over a rate-limited secondary channel.
  • We show a duality relationship between the channel capacity cases and the rate-distortion cases that we discuss.
  • We provide a B-A based algorithm to solve the channel capacity problems we describe.
  • We show a duality between the Gelfand–Pinsker capacity converse and the Wyner-Ziv rate-distortion converse.
The remainder of this paper is organized as follows. In Section 2 we introduce notation and provide the settings of three channel coding and three source coding cases with increased partial side information. In Section 3 we present the main results for coding with increased partial side information; we provide the capacity and the rate-distortion functions for the cases we introduced in Section 2 and we point out the duality between the cases we examined. Section 4 contains illuminating examples for the cases discussed in the paper. In Section 5 we describe the B-A based algorithm we used in order to solve the capacity examples. We conclude the paper in Section 6 and we highlight two open problems: channel capacity and rate-distortion with two-sided rate-limited partial side information. Appendix A contains the duality derivation for the converse proofs of the Gelfand–Pinsker and the Wyner-Ziv problems and Appendix B, Appendix C, Appendix D and Appendix E contain the proofs for our theorems and lemmas.

2. Problem Setting and Definitions

In this section, we describe and formally define three cases of channel coding problems and three cases of source coding problems. All six cases are presented in Figure 2a,b.
Notations. 
We use subscripts and superscripts to denote vectors in the following ways: $x^j = (x_1, \ldots, x_j)$ and $x_i^j = (x_i, \ldots, x_j)$ for $i \le j$. Moreover, we use the lower case $x$ to denote a sample value, the upper case $X$ to denote a random variable, the calligraphic letter $\mathcal{X}$ to denote the alphabet of $X$, $|\mathcal{X}|$ to denote the cardinality of the alphabet of $X$ and $p(x)$ to denote the probability $\Pr\{X = x\}$. We use the notation $T_\epsilon^{(n)}(X)$ to denote the strongly typical set of the random variable $X$, as defined in [29] (Chapter 11).

2.1. Definitions and Problem Formulation—Channel Coding with State Information

Definition 1.
A discrete channel is defined by the set $\{\mathcal{X}, \mathcal{S}_1, \mathcal{S}_2, p(s_1,s_2), p(y|x,s_1,s_2), \mathcal{Y}\}$. The channel's input sequence, $\{X_i \in \mathcal{X},\ i = 1, 2, \ldots\}$, the ESI sequence, $\{S_{1,i} \in \mathcal{S}_1,\ i = 1, 2, \ldots\}$, the DSI sequence, $\{S_{2,i} \in \mathcal{S}_2,\ i = 1, 2, \ldots\}$, and the channel's output sequence, $\{Y_i \in \mathcal{Y},\ i = 1, 2, \ldots\}$, are discrete random variables drawn from the finite alphabets $\mathcal{X}, \mathcal{S}_1, \mathcal{S}_2, \mathcal{Y}$, respectively. Denote the message and the message space as $W \in \{1, 2, \ldots, 2^{nR}\}$ and let $\hat{W}$ be the reconstruction of the message $W$. The random variables $(S_{1,i}, S_{2,i})$ are i.i.d. $\sim p(s_1,s_2)$ and the channel is memoryless, i.e., at time $i$, the output, $Y_i$, has a conditional distribution of
$$ p(y_i \mid x^i, s_1^i, s_2^i, y^{i-1}) = p(y_i \mid x_i, s_{1,i}, s_{2,i}). $$
In the remainder of the paper, unless specifically mentioned otherwise, we refer to the ESI and the DSI as if they are known to the encoder and the decoder, respectively, in a noncausal manner. Also, as noted before, we use the term increased side information to indicate that the user’s side information also includes a rate-limited description of the other user’s partial side information. For example, when the decoder is informed with the DSI and with a rate-limited description of the ESI we would say that the decoder is informed with increased DSI.
Problem Formulation.
For the channel p ( y | x , s 1 , s 2 ) , consider the following channel coding problem cases:
  • Case 1: The encoder is informed with ESI and the decoder is informed with increased DSI.
  • Case 2: The encoder is informed with increased ESI and the decoder is informed with DSI.
  • Case 2 C : The encoder is informed with increased causal ESI ( S 1 i at time i) and the decoder is informed with DSI. This case is the same as Case 2, except for the causal ESI.
All cases are presented in Figure 2a.
Definition 2.
A $(n, 2^{nR}, 2^{nR'_j})$ code, $j \in \{1, 2\}$, for a channel with increased partial side information, as illustrated in Figure 2a, consists of two encoders and one decoder. The encoders are $f$ and $f_v$, where $f$ is the encoder for the channel's input and $f_v$ is the encoder for the side information, and the decoder is $g$, as described for each case:
Case 1: Two encoders
$$ f_v: \mathcal{S}_1^n \to \{1, 2, \ldots, 2^{nR'_1}\}, \qquad f: \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_1^n \times \{1, 2, \ldots, 2^{nR'_1}\} \to \mathcal{X}^n, $$
and a decoder
$$ g: \mathcal{Y}^n \times \mathcal{S}_2^n \times \{1, 2, \ldots, 2^{nR'_1}\} \to \{1, 2, \ldots, 2^{nR}\}. $$
Case 2: Two encoders
$$ f_v: \mathcal{S}_2^n \to \{1, 2, \ldots, 2^{nR'_2}\}, \qquad f: \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_1^n \times \{1, 2, \ldots, 2^{nR'_2}\} \to \mathcal{X}^n, $$
and a decoder
$$ g: \mathcal{Y}^n \times \mathcal{S}_2^n \times \{1, 2, \ldots, 2^{nR'_2}\} \to \{1, 2, \ldots, 2^{nR}\}. $$
Case 2$_C$: Two encoders
$$ f_v: \mathcal{S}_2^n \to \{1, 2, \ldots, 2^{nR'_2}\}, \qquad f_i: \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_1^i \times \{1, 2, \ldots, 2^{nR'_2}\} \to \mathcal{X}_i, $$
and a decoder
$$ g: \mathcal{Y}^n \times \mathcal{S}_2^n \times \{1, 2, \ldots, 2^{nR'_2}\} \to \{1, 2, \ldots, 2^{nR}\}. $$
The average probability of error, $P_e^{(n)}$, for a $(n, 2^{nR}, 2^{nR'_j})$ code is defined as
$$ P_e^{(n)} = \frac{1}{2^{nR}} \sum_{w=1}^{2^{nR}} \Pr\big\{\hat{W} \ne W \mid W = w\big\}, $$
where the index $W$ is chosen according to a uniform distribution over the set $\{1, 2, \ldots, 2^{nR}\}$. A rate pair $(R, R')$ is said to be achievable if there exists a sequence of $(n, 2^{nR}, 2^{nR'})$ codes such that the average probability of error $P_e^{(n)} \to 0$ as $n \to \infty$.
Definition 3.
The capacity of the channel, $C(R')$, is the supremum of all $R$ such that the rate pair $(R, R')$ is achievable.

2.2. Definitions and Problem Formulation—Source Coding with Side Information

Throughout this article we use the common definitions of rate-distortion as presented in [29].
Definition 4.
The source sequence $\{X_i \in \mathcal{X},\ i = 1, 2, \ldots\}$, the ESI sequence $\{S_{1,i} \in \mathcal{S}_1,\ i = 1, 2, \ldots\}$ and the DSI sequence $\{S_{2,i} \in \mathcal{S}_2,\ i = 1, 2, \ldots\}$ are discrete random variables drawn from the finite alphabets $\mathcal{X}$, $\mathcal{S}_1$ and $\mathcal{S}_2$, respectively. The random variables $(X_i, S_{1,i}, S_{2,i})$ are i.i.d. $\sim p(x,s_1,s_2)$. Let $\hat{\mathcal{X}}$ be the reconstruction alphabet and $d: \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$ be the distortion measure. The distortion between sequences is defined in the usual way:
$$ d(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat{x}_i). $$
Problem Formulation.
For the source, X, the ESI, S 1 , and the DSI, S 2 , consider the following source coding problem cases:
  • Case 1: The encoder is informed with ESI and the decoder is informed with increased DSI.
  • Case 2: The encoder is informed with increased ESI and the decoder is informed with DSI.
  • Case 1 C : The encoder is informed with ESI and the decoder is informed with increased causal DSI ( S 2 i at time i). This case is the same as Case 1, except for the causal DSI.
All cases are presented in Figure 2b.
Definition 5.
A $(n, 2^{nR}, 2^{nR'_j}, D)$ code, $j \in \{1, 2\}$, for the source $X$ with increased partial side information, as illustrated in Figure 2b, consists of two encoders, one decoder and a distortion constraint. The encoders are $f$ and $f_v$, where $f$ is the encoder for the source and $f_v$ is the encoder for the side information, and the decoder is $g$, as described for each case:
Case 1: Two encoders
$$ f_v: \mathcal{S}_1^n \to \{1, 2, \ldots, 2^{nR'_1}\}, \qquad f: \mathcal{X}^n \times \mathcal{S}_1^n \times \{1, 2, \ldots, 2^{nR'_1}\} \to \{1, 2, \ldots, 2^{nR}\}, $$
and a decoder
$$ g: \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_2^n \times \{1, 2, \ldots, 2^{nR'_1}\} \to \hat{\mathcal{X}}^n. $$
Case 2: Two encoders
$$ f_v: \mathcal{S}_2^n \to \{1, 2, \ldots, 2^{nR'_2}\}, \qquad f: \mathcal{X}^n \times \mathcal{S}_1^n \times \{1, 2, \ldots, 2^{nR'_2}\} \to \{1, 2, \ldots, 2^{nR}\}, $$
and a decoder
$$ g: \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_2^n \times \{1, 2, \ldots, 2^{nR'_2}\} \to \hat{\mathcal{X}}^n. $$
Case 1$_C$: Two encoders
$$ f_v: \mathcal{S}_1^n \to \{1, 2, \ldots, 2^{nR'_1}\}, \qquad f: \mathcal{X}^n \times \mathcal{S}_1^n \times \{1, 2, \ldots, 2^{nR'_1}\} \to \{1, 2, \ldots, 2^{nR}\}, $$
and a decoder
$$ g_i: \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_2^i \times \{1, 2, \ldots, 2^{nR'_1}\} \to \hat{\mathcal{X}}_i. $$
The distortion constraint for all three cases is:
$$ E\left[\frac{1}{n}\sum_{i=1}^{n} d(X_i, \hat{X}_i)\right] \le D. $$
For a given distortion, $D$, and for any $\epsilon > 0$, the rate pair $(R, R')$ is said to be achievable if there exists a $(n, 2^{nR}, 2^{nR'}, D + \epsilon)$ code for the rate-distortion problem.
Definition 6.
For a given $R'$ and distortion $D$, the operational rate $R^*(R', D)$ is the infimum of all $R$ such that the rate pair $(R, R')$ is achievable.

3. Results

In this section, we present the main results of this paper. We will first present the results for the channel coding cases, then the main results for the source coding cases and, finally, we will present the duality between them.

3.1. Channel Coding with Side Information

For a channel with two-sided state information as presented in Figure 2a, where $(S_{1,i}, S_{2,i}) \sim$ i.i.d. $p(s_1,s_2)$, the capacity is as follows:
Theorem 1 (The capacity for the cases in Figure 2a).
For the memoryless channel $p(y|x,s_1,s_2)$, where $S_1$ is the ESI, $S_2$ is the DSI and the side information $(S_{1,i}, S_{2,i}) \sim$ i.i.d. $p(s_1,s_2)$, the channel capacity is:
Case 1: The encoder is informed with ESI and the decoder is informed with increased DSI,
$$ C_1^* = \max_{\substack{p(v_1|s_1)\,p(u|s_1,v_1)\,p(x|u,s_1,v_1) \\ \text{s.t. } R' \ge I(V_1;S_1) - I(V_1;Y,S_2)}} \Big[ I(U;Y,S_2|V_1) - I(U;S_1|V_1) \Big]. $$
Case 2: The encoder is informed with increased ESI and the decoder is informed with DSI;
Lower bounded by
$$ C_2^{lb*} = \max_{\substack{p(v_2|s_2)\,p(u|s_1,v_2)\,p(x|u,s_1,v_2) \\ \text{s.t. } R' \ge I(V_2;S_2|S_1)}} \Big[ I(U;Y,S_2|V_2) - I(U;S_1|V_2) \Big]. $$
Upper bounded by
$$ C_2^{ub_1*} = \max_{\substack{p(v_2|s_1,s_2)\,p(u|s_1,v_2)\,p(x|u,s_1,v_2) \\ \text{s.t. } R' \ge I(V_2;S_2) - I(V_2;S_1)}} \Big[ I(U;Y,S_2|V_2) - I(U;S_1|V_2) \Big] $$
and by
$$ C_2^{ub_2*} = \max_{\substack{p(v_2|s_2)\,p(u|s_1,s_2,v_2)\,p(x|u,s_1,v_2) \\ \text{s.t. } R' \ge I(V_2;S_2|S_1)}} \Big[ I(U;Y,S_2|V_2) - I(U;S_1|V_2) \Big]. $$
Case 2$_C$: The encoder is informed with increased causal ESI ($S_1^i$ at time $i$) and the decoder is informed with DSI,
$$ C_{2_C}^* = \max_{\substack{p(v_2|s_2)\,p(u|v_2)\,p(x|u,s_1,v_2) \\ \text{s.t. } R' \ge I(V_2;S_2)}} I(U;Y,S_2|V_2). $$
Here, for case $j$, $j \in \{1,2\}$, the optimization is over some joint distribution $p(s_1,s_2,v_j,u,x,y)$, and $(U, V_j)$ are auxiliary random variables with bounded cardinality.
Appendix B contains the proof.
Lemma 1.
For all three channel coding cases described in this section and for j { 1 , 2 } , the following statements hold,
  • The function $C_j(R')$ is a concave function of $R'$.
  • It is enough to take $X$ to be a deterministic function of $(U, S_1, V_j)$ to evaluate $C_j$.
  • The auxiliary alphabets $\mathcal{U}$ and $\mathcal{V}_j$ satisfy
$$ \text{for Case 1: } |\mathcal{V}_1| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| + 1 \ \text{ and } \ |\mathcal{U}| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2|\big(|\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| + 1\big), $$
$$ \text{for Case 2: } |\mathcal{V}_2| \le |\mathcal{S}_1||\mathcal{S}_2| + 1 \ \text{ and } \ |\mathcal{U}| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2|\big(|\mathcal{S}_1||\mathcal{S}_2| + 1\big), $$
$$ \text{for Case } 2_C\text{: } |\mathcal{V}_2| \le |\mathcal{S}_2| + 1 \ \text{ and } \ |\mathcal{U}| \le |\mathcal{X}||\mathcal{S}_2|\big(|\mathcal{S}_2| + 1\big). $$
Appendix D contains the proof for the above lemma.
Remark 1.
Please notice that in Equation (14), the rate of the side information, $I(V_2;S_2|S_1)$, can be written as $I(V_2;S_2) - I(V_2;S_1)$. This is true since the Markov relation $V_2 - S_2 - S_1$ holds. Therefore, the only difference between the two upper bounds of Case 2, $C_2^{ub_1*}$ (13) and $C_2^{ub_2*}$ (14), is the distribution over which we maximize. While for $C_2^{ub_1*}$ we restrict the maximization to distributions that maintain the Markov chain $U - (S_1, V_2) - S_2$, for the second upper bound, $C_2^{ub_2*}$, we restrict the maximization to distributions that maintain $V_2 - S_2 - S_1$. We should note that we cannot state with certainty that one of the bounds is tighter than the other for all distributions $p(y,x,s_1,s_2)$ and for all values of $R'$. Notwithstanding, one bound may be tighter than the other for all distributions.

3.2. Source Coding with Side Information

For the problem of source coding with side information as presented in Figure 2b, the rate-distortion function is as follows:
Theorem 2 (The rate-distortion function for the cases in Figure 2b).
For a bounded distortion measure $d(x,\hat{x})$, a source, $X$, and side information, $S_1, S_2$, where $(X_i, S_{1,i}, S_{2,i}) \sim$ i.i.d. $p(x,s_1,s_2)$, the rate-distortion function is:
Case 1: The encoder is informed with ESI and the decoder is informed with increased DSI,
$$ R_1^*(D) = \min_{\substack{p(v_1|s_1)\,p(u|x,s_1,v_1)\,p(\hat{x}|u,s_2,v_1) \\ \text{s.t. } R' \ge I(V_1;S_1|S_2)}} \Big[ I(U;X,S_1|V_1) - I(U;S_2|V_1) \Big]. $$
Case 1$_C$: The encoder is informed with ESI and the decoder is informed with increased causal DSI ($S_2^i$ at time $i$),
$$ R_{1_C}^*(D) = \min_{\substack{p(v_1|s_1)\,p(u|x,s_1,v_1)\,p(\hat{x}|u,s_2,v_1) \\ \text{s.t. } R' \ge I(V_1;S_1)}} I(U;X,S_1|V_1). $$
Case 2: The encoder is informed with increased ESI and the decoder is informed with DSI,
$$ R_2^*(D) = \min_{\substack{p(v_2|s_2)\,p(u|x,s_1,v_2)\,p(\hat{x}|u,s_2,v_2) \\ \text{s.t. } R' \ge I(V_2;S_2) - I(V_2;X,S_1)}} \Big[ I(U;X,S_1|V_2) - I(U;S_2|V_2) \Big]. $$
Here, for case $j$, $j \in \{1,2\}$, the minimization is over some joint distribution $p(x,s_1,s_2,v_j,u,\hat{x})$ satisfying $E\big[\frac{1}{n}\sum_{i=1}^{n} d(X_i,\hat{X}_i)\big] \le D$, and $(U, V_j)$ are auxiliary random variables with bounded cardinality.
Appendix C contains the proof.
Lemma 2.
For all cases of rate-distortion problems in this section and for j { 1 , 2 } , the following statements hold.
  • The function $R_j(R', D)$ is a convex function of $R'$ and $D$.
  • It is enough to take $\hat{X}$ to be a deterministic function of $(U, S_2, V_j)$ to evaluate $R_j$.
  • The auxiliary alphabets $\mathcal{U}$ and $\mathcal{V}_j$ satisfy
$$ \text{for Case 1: } |\mathcal{V}_1| \le |\mathcal{S}_1||\mathcal{S}_2| + 1 \ \text{ and } \ |\mathcal{U}| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2|\big(|\mathcal{S}_1||\mathcal{S}_2| + 1\big), $$
$$ \text{for Case } 1_C\text{: } |\mathcal{V}_1| \le |\mathcal{S}_1| + 1 \ \text{ and } \ |\mathcal{U}| \le |\mathcal{X}||\mathcal{S}_1|\big(|\mathcal{S}_1| + 1\big), $$
$$ \text{for Case 2: } |\mathcal{V}_2| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| + 1 \ \text{ and } \ |\mathcal{U}| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2|\big(|\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| + 1\big). $$
Appendix D contains the proof for the above lemma.

3.3. Duality

We now investigate the duality between the channel coding and the source coding for the cases in Figure 2a,b. The following transformation makes the duality between the channel coding cases 1, 2, 2 C and the source coding cases 2, 1, 1 C , respectively, evident. The left column corresponds to channel coding and the right column to source coding. For cases j and j ¯ , where j , j ¯ { 1 , 2 } and j ¯ j , consider the transformation:
$$ \begin{array}{c|c} \text{channel coding} & \text{source coding} \\ \hline C & R(D) \\ \text{maximization} & \text{minimization} \\ C_j & R_{\bar{j}}(D) \\ X & \hat{X} \\ Y & X \\ S_j & S_{\bar{j}} \\ V_j & V_{\bar{j}} \\ U & U \\ R' & R' \end{array} \qquad (19) $$
This transformation is an extension of the transformation provided in [6,15]. Note that while the channel capacity formula in Case $j$ and the rate-distortion function in Case $\bar{j}$ are dual to one another in the sense of maximization-minimization, the corresponding rates $R'$ are not dual to each other in this sense; i.e., one would expect to see an opposite inequality for dual cases, whereas the inequality is in the same direction in both $R'$ formulas. The duality in the side information rates, $R'$, is then in the sense that the arguments in the formulas for the dual $R'$ are dual. This exception is due to the fact that while the Gelfand–Pinsker and the Wyner-Ziv problems for the main channel or the main rate-distortion problems are dual, the Wyner-Ziv problem for the side information stays the same; the only difference is the input and the output.

4. Examples

In this section, we provide examples for Case 2 of the channel coding theorem and for Case 1 of the source coding theorem. The semi-iterative algorithm that we used to numerically calculate the lower bound, $C_2^{lb}$, is provided in the next section.
Example 1 (Case 2 channel coding for a binary channel).
Consider the binary channel illustrated in Figure 3. The alphabets of the input, the output and the two states are binary, $\mathcal{X} = \mathcal{Y} = \mathcal{S}_1 = \mathcal{S}_2 = \{0,1\}$, with $(S_1, S_2) \sim P_{S_1 S_2}$, a joint probability mass function (PMF) matrix. The channel depends on the states $S_1$ and $S_2$, where the encoder is fully informed with $S_1$ and informed with $S_2$ at a rate limited to $R'$, and the decoder is fully informed with $S_2$. The dependence of the channel on the states is illustrated in Figure 3. If $(S_1 = 1, S_2 = 0)$ then the channel is the Z-channel with transition probability $\epsilon$, if $(S_1 = 1, S_2 = 1)$ then the channel has no error, if $(S_1 = 0, S_2 = 0)$ then the channel is the X-channel and if $(S_1 = 0, S_2 = 1)$ then the channel is the S-channel with transition probability $\epsilon$. The side information's joint PMF is
$$ P_{S_1 S_2} = \begin{bmatrix} 0.1 & 0.4 \\ 0.4 & 0.1 \end{bmatrix}. $$
The expressions for the lower bound on the capacity, $C_2^{lb}(R')$, and for $R'$ are given in Case 2 of Theorem 1.
In Figure 4, we present the computed lower bound on the capacity for the binary channel we are testing; the graph shows the lower bound, $C_2^{lb}(R')$, as a function of $R'$. We also provide the Cover and Chiang [6] capacity (where $R' = 0$) and the Gelfand–Pinsker [2] capacity (where $R' = 0$ and the decoder is not informed with $S_2$).
Discussion:
  • The algorithm that we used to calculate $C_2^{lb}(R')$ and $R'$ combines a grid search and a Blahut-Arimoto-like algorithm. We first construct a grid of probabilities of the random variable $V_2$ given $S_2$, namely, $w(v_2|s_2)$. Then, for every probability $w(v_2|s_2)$ such that $I(V_2;S_2|S_1)$ is close enough to $R'$, we calculate the maximum of $I(U;Y,S_2|V_2) - I(U;S_1|V_2)$ using the iterative algorithm described in the next section. We then choose the maximum over those maxima and declare it to be $C_2^{lb}$. By taking a fine grid of the probabilities $w(v_2|s_2)$, the operation's result can be arbitrarily close to $C_2^{lb}$.
  • For a given joint PMF matrix $P_{S_1 S_2}$, we can see that $C_2^{lb}(R')$ is non-decreasing in $R'$. Furthermore, since the expression $I(V_2;S_2|S_1)$ is bounded by $R'_{\max} = \max_{p(v_2|s_2)} I(V_2;S_2|S_1) = H(S_2|S_1)$, allowing $R'$ to be greater than $R'_{\max}$ cannot improve $C_2^{lb}$ any further; i.e., $C_2^{lb}(R' = R'_{\max}) = C_2^{lb}(R' > R'_{\max})$. Therefore, it is enough to allow $R' = R'_{\max}$ to achieve $C_2^{lb}$, as if the encoder were fully informed with $S_2$ (a short numerical check of $R'_{\max}$ for this example is given after this list).
  • Although $C_2^{lb}$ is a lower bound on the capacity, it can be significantly greater than the Cover-Chiang and the Gelfand–Pinsker rates for some channel models, as can be seen in this example. Moreover, we can actually state that $C_2^{lb}$ is always greater than or equal to the Gelfand–Pinsker and the Cover-Chiang rates. This is due to the fact that when $R' = 0$, $C_2^{lb}$ coincides with the Cover-Chiang rate, which, in turn, is always greater than or equal to the Gelfand–Pinsker rate; since $C_2^{lb}$ is also non-decreasing in $R'$, our assertion holds.
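As a quick numerical check of the saturation point mentioned in the second bullet above, the short Python snippet below (ours, not part of the paper) evaluates $R'_{\max} = H(S_2|S_1)$ for the joint PMF of this example.

```python
import numpy as np

# Joint PMF P_{S1 S2} of Example 1 (rows indexed by s1, columns by s2).
P = np.array([[0.1, 0.4],
              [0.4, 0.1]])

# H(S2|S1) = sum_{s1} p(s1) H(S2|S1=s1); this is the rate R'_max beyond which
# increasing R' cannot improve the lower bound C_2^lb.
p_s1 = P.sum(axis=1)
H = 0.0
for s1 in range(P.shape[0]):
    cond = P[s1] / p_s1[s1]                      # p(s2|s1)
    cond = cond[cond > 0]
    H -= p_s1[s1] * np.sum(cond * np.log2(cond))

print(H)   # ~0.722 bits, i.e., the binary entropy of 0.2
```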
Example 2 (Source coding Case 1 for a binary-symmetric source and Hamming distortion).
Consider the source $X = S_1 \oplus S_2$, where $S_1, S_2 \sim$ i.i.d. Bernoulli(0.5), and consider the problem setting depicted in Case 1 of the source coding problems. It is sufficient for the decoder to reconstruct $S_1$ with distortion $E[d(S_1, \hat{S}_1)] \le D$ in order to reconstruct $X$ with the same distortion. Furthermore, the two rate-distortion problem settings illustrated in Figure 5 are equivalent.
For every achievable rate in Setting 1, $E[d(S_1, \hat{S}_1)] \le D$. Denote $\hat{X} \triangleq \hat{S}_1 \oplus S_2$; then $d(S_1, \hat{S}_1) = S_1 \oplus \hat{S}_1 = (S_1 \oplus S_2) \oplus (\hat{S}_1 \oplus S_2) = X \oplus \hat{X} = d(X, \hat{X})$ and, therefore, $E[d(S_1, \hat{S}_1)] \le D$ in Setting 1 implies $E[d(X, \hat{X})] \le D$ in Setting 2. In the same way, for Setting 2, denote $\hat{S}_1 \triangleq \hat{X} \oplus S_2$. Then, $d(X, \hat{X}) = X \oplus \hat{X} = S_1 \oplus \hat{S}_1$ and, therefore, $E[d(X, \hat{X})] \le D$ in Setting 2 implies $E[d(S_1, \hat{S}_1)] \le D$ in Setting 1. Hence, we can conclude that the two settings are equivalent and, for any given $D \ge 0$ and $R' \ge 0$, the rate-distortion function is
$$ R(D) = \begin{cases} 1 - H(D) - R' & \text{if } 1 - H(D) - R' \ge 0, \\ 0 & \text{if } 1 - H(D) - R' < 0. \end{cases} $$
In Figure 6 we present the resulting plot for this example. It is easy to verify that the Wyner-Ziv rate and the Cover-Chiang rate for this setting are $R_{WZ}(D) = R_{CC}(D) = \max\{1 - H(D), 0\}$.
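For completeness, here is a small plotting sketch (our own Python, directly evaluating the closed form in Equation (21)) that reproduces curves of the type shown in Figure 6; the particular values of $R'$ are chosen for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt

def h2(p):
    """Binary entropy function in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

D = np.linspace(0.0, 0.5, 256)
for R_prime in (0.0, 0.2, 0.4):
    # Equation (21): R(D) = max{1 - H(D) - R', 0}
    plt.plot(D, np.maximum(1 - h2(D) - R_prime, 0.0), label=f"R' = {R_prime}")

# For this setting the Wyner-Ziv and Cover-Chiang rates coincide with the R' = 0 curve.
plt.xlabel("D")
plt.ylabel("R(D) [bits]")
plt.legend()
plt.show()
```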

5. Semi-Iterative Algorithm

In this section, we provide algorithms that numerically calculate the lower bound on the capacity of Case 2 of the channel coding problems. The calculation of the Gelfand–Pinsker and the Wyner-Ziv problems has been addressed in many papers in the past, including [5,24,25,26]. All these algorithms are based on Arimoto's [23] and Blahut's [22] algorithms and on the fact that the Wyner-Ziv and the Gelfand–Pinsker problems can be presented as convex optimization problems. In contrast, our problems are not convex in all of their optimization variables and, therefore, cannot be presented as convex optimization problems. In order to solve our problems we devised a different approach, which combines a grid search and a Blahut-Arimoto-like algorithm. In this section, we provide the mathematical justification for these two algorithms. Other algorithms to numerically compute the channel capacity or the rate-distortion of the rest of the cases presented in this paper can be derived using the principles that we describe in this section.

5.1. An Algorithm for Computing the Lower Bound on the Capacity of Case 2

Consider the channel in Figure 7 described by $p(y|x,s_1,s_2)$ and consider the joint PMF $p(s_1,s_2)$. The capacity of this channel is lower bounded by $\max \big[ I(U;Y,S_2|V_2) - I(U;S_1|V_2) \big]$, where the maximization is over all PMFs of the form $p(s_1,s_2)\, w(v_2|s_2)\, p(u|s_1,v_2)\, p(x|s_1,v_2,u)\, p(y|x,s_1,s_2)$ such that $R' \ge I(V_2;S_2|S_1)$. Notice that the lower bound expression is not concave in $w(v_2|s_2)$, which is the main difficulty in computing it. We first present an outline of the semi-iterative algorithm we developed, then we present the mathematical background and justification for the algorithm and, finally, we present the detailed algorithm.
For any fixed PMF $w(v_2|s_2)$ denote
$$ R'_w \triangleq I(V_2;S_2|S_1), $$
$$ C_{2,w}^{lb} \triangleq \max_{p(u|s_1,v_2)\,p(x|u,s_1,v_2)} \Big[ I(U;Y,S_2|V_2) - I(U;S_1|V_2) \Big]. $$
Then, the lower bound on the capacity, $C_2^{lb}(R')$, can be expressed as
$$ C_2^{lb}(R') = \max_{\substack{w(v_2|s_2) \\ \text{s.t. } R' \ge R'_w}} \ \max_{p(u|s_1,v_2)\,p(x|u,s_1,v_2)} \Big[ I(U;Y,S_2|V_2) - I(U;S_1|V_2) \Big] = \max_{\substack{w(v_2|s_2) \\ \text{s.t. } R' \ge R'_w}} C_{2,w}^{lb}. $$
The outline of the algorithm is as follows: for any given rate $R' \le H(S_2|S_1)$, $\epsilon > 0$ and $\delta > 0$,
  • Establish a fine and uniformly spaced grid of legal PMFs, $w(v_2|s_2)$, and denote the set of all of those PMFs as $\mathcal{W}$.
  • Establish the set $\mathcal{W}^* := \{ w(v_2|s_2) \,:\, w(v_2|s_2) \in \mathcal{W} \text{ and } R' - \epsilon \le R'_w \le R' \}$. This is the set of all PMFs $w(v_2|s_2)$ such that $R'_w$ is $\epsilon$-close to $R'$ from below. If $\mathcal{W}^*$ is empty, go back to step 1 and make the grid finer. Otherwise, continue.
  • For every $w(v_2|s_2) \in \mathcal{W}^*$, perform a Blahut-Arimoto-like optimization to find $C_{2,w}^{lb}$ with accuracy $\delta$.
  • Declare $C_2^{lb(\epsilon,\delta,\mathcal{W})}(R') = \max_{w(v_2|s_2) \in \mathcal{W}^*} C_{2,w}^{lb}$.
Remark 2.
1. We considered only rates $R'$ such that $R' \le H(S_2|S_1)$, since $H(S_2|S_1)$ is the maximal value that $I(V_2;S_2|S_1)$ can take. The interpretation of this is that if the encoder is informed with $S_1$, we cannot increase its side information about $S_2$ by more than $H(S_2|S_1)$. Therefore, for any $R' \ge H(S_2|S_1)$, we can limit $R'$ to be equal to $H(S_2|S_1)$ in order to compute the capacity;
2. Since $C_{2,w}^{lb}(R')$ is continuous in $w(v_2|s_2)$ and bounded (for example, by $I(X;Y|S_1,S_2)$ from above and by $I(X;Y)$ from below), $C_2^{lb(\epsilon,\delta,\mathcal{W})}(R')$ can be arbitrarily close to $C_2^{lb}(R')$ for $\epsilon \to 0$, $\delta \to 0$ and $|\mathcal{W}| \to \infty$.

5.1.1. Mathematical Background and Justification

Here we focus on finding the lower bound on the capacity of the channel for a fixed distribution $w(v_2|s_2)$, i.e., finding $C_{2,w}^{lb}$. Note that the mutual information expression $I(U;Y,S_2|V_2) - I(U;S_1|V_2)$ is concave in $p(u|s_1,v_2)$ and convex in $p(x|u,s_1,v_2)$. Therefore, a standard convex maximization technique is not applicable for this problem. However, according to Dupuis, Yu and Willems [25], we can write the expression for the lower bound as $C_{2,w}^{lb} = \max_{q(t|s_1,v_2)} \big[ I(T;Y,S_2|V_2) - I(T;S_1|V_2) \big]$, where $q(t|s_1,v_2)$ is a probability distribution over the set of all possible strategies $t: \mathcal{S}_1 \times \mathcal{V}_2 \to \mathcal{X}$, the input symbol $X$ is selected using $x = t(s_1,v_2)$ and $p(y|x,s_1,s_2) = p(y|x,s_1,s_2,v_2) = p\big(y \mid t(s_1,v_2), s_1, s_2, v_2\big)$. Now, since $I(T;Y,S_2|V_2) - I(T;S_1|V_2)$ is concave in $q(t|s_1,v_2)$, we can use convex optimization methods to derive $C_{2,w}^{lb}$.
Denote the PMF
$$ p(s_1,s_2,v_2,t,y) \triangleq p(s_1,s_2)\, w(v_2|s_2)\, q(t|s_1,v_2)\, p(y|t,s_1,s_2,v_2), $$
and denote also
$$ J_w(q,Q) \triangleq \sum_{s_1,s_2,v_2,t,y} p(s_1,s_2,v_2,t,y) \log \frac{Q(t|y,s_2,v_2)}{q(t|s_1,v_2)}, $$
$$ Q^*(t|y,s_2,v_2) \triangleq \frac{\sum_{s_1} p(s_1,s_2,v_2,t,y)}{\sum_{s_1,t} p(s_1,s_2,v_2,t,y)}. $$
Notice that $Q^*(t|y,s_2,v_2)$ is a marginal distribution of $p(s_1,s_2,v_2,t,y)$ and that $J_w(q,Q^*) = I(T;Y,S_2|V_2) - I(T;S_1|V_2)$ for the joint PMF $p(s_1,s_2,v_2,t,y)$.
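The quantities above translate directly into array operations. The NumPy sketch below (ours; the tensor index ordering is an assumption, not taken from the authors' code) builds the joint PMF defined above and evaluates $Q^*$ and $J_w(q,Q)$.

```python
import numpy as np

def joint_pmf(P_s1s2, w, q, P_y):
    """p(s1,s2,v2,t,y) = p(s1,s2) w(v2|s2) q(t|s1,v2) p(y|t,s1,s2,v2).
    Index conventions (our own): P_s1s2[s1,s2], w[v2,s2], q[t,s1,v2], P_y[y,t,s1,s2,v2]."""
    return np.einsum('ab,cb,dac,edabc->abcde', P_s1s2, w, q, P_y)   # axes (s1,s2,v2,t,y)

def Q_star(p):
    """Q*(t|y,s2,v2): marginalize s1 out of p(s1,s2,v2,t,y), normalize over t.
    Returned with axes ordered (t, y, s2, v2)."""
    num = p.sum(axis=0).transpose(2, 3, 0, 1)        # (t, y, s2, v2)
    den = num.sum(axis=0, keepdims=True)             # sum over t
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def J_w(p, q, Q):
    """J_w(q,Q) = sum p(s1,s2,v2,t,y) log2[ Q(t|y,s2,v2) / q(t|s1,v2) ]  (in bits)."""
    total = 0.0
    for s1, s2, v2, t, y in np.argwhere(p > 0):
        total += p[s1, s2, v2, t, y] * np.log2(Q[t, y, s2, v2] / q[t, s1, v2])
    return total
```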
The following lemma is the key for the iterative algorithm.
Lemma 3.
$$ C_{2,w}^{lb} = \sup_{q(t|s_1,v_2)} \ \max_{Q(t|y,s_2,v_2)} J_w(q,Q). $$
The proof is given by Yeung in [30]. In addition, Yeung shows that the two-step alternating optimization procedure converges monotonically to the global optimum if the optimization function is concave. Hence, if we show that $J_w(q,Q)$ is concave, we can maximize it using an alternating maximization algorithm over $q$ and $Q$.
Lemma 4.
The function J w ( q , Q ) is concave in q and Q simultaneously.
We can now proceed to calculate the steps in the iterative algorithm.
Lemma 5.
For a fixed q , J w ( q , Q ) is maximized for Q = Q * .
Proof. 
The above follows from the fact that $Q^*$ is a marginal distribution of $p(s_1,s_2)\, w(v_2|s_2)\, q(t|s_1,v_2)\, p(y|t,s_1,s_2,v_2)$ and the property of the K-L divergence $D(Q^* \| Q) \ge 0$. ☐
Lemma 6.
For a fixed $Q$, $J_w(q,Q)$ is maximized for $q = q^*$, where $q^*$ is defined by
$$ q^*(t|s_1,v_2) = \frac{\prod_{s_2,y} Q(t|y,s_2,v_2)^{\,p(s_2|s_1,v_2)\,p(y|t,s_1,s_2,v_2)}}{\sum_{t'} \prod_{s_2,y} Q(t'|y,s_2,v_2)^{\,p(s_2|s_1,v_2)\,p(y|t',s_1,s_2,v_2)}}, $$
and
$$ p(s_2|s_1,v_2) = \frac{p(s_1,s_2)\, w(v_2|s_2)}{\sum_{\tilde{s}_2} p(s_1,\tilde{s}_2)\, w(v_2|\tilde{s}_2)}. $$
Define $U_w(q)$ in the following way:
$$ U_w(q) = \sum_{s_1,v_2} p(s_1,v_2) \max_{t} \sum_{s_2,y} p(s_2|s_1,v_2)\, p(y|t,s_1,s_2,v_2) \log \frac{Q^*(t|y,s_2,v_2)}{q(t|s_1,v_2)}, $$
where $Q^*$ is given in (26), and $p(s_1,v_2)$ and $p(s_2|s_1,v_2)$ are marginal distributions of the joint PMF $p(s_1,s_2,v_2,t,y) = p(s_1,s_2)\, w(v_2|s_2)\, q(t|s_1,v_2)\, p(y|t,s_1,s_2,v_2)$. The following lemma will help us to define a termination condition for the algorithm.
Lemma 7.
For every $q(t|s_1,v_2)$, the function $U_w(q)$ is an upper bound on $C_{2,w}^{lb}$ and converges to $C_{2,w}^{lb}$ after a large enough number of iterations.
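Continuing the sketch started after the definitions of $J_w$ and $Q^*$, the two remaining quantities, the $q$-update of Lemma 6 and the upper bound $U_w(q)$, can be written as follows. This is again our own NumPy with our array-layout assumptions; `q_star_update` implements the exponentiated-average form of $q^*$ given above.

```python
import numpy as np

def q_star_update(Q, p_s2_cond, P_y):
    """One q-update (Lemma 6): q*(t|s1,v2) proportional to
    prod_{s2,y} Q(t|y,s2,v2)^{p(s2|s1,v2) p(y|t,s1,s2,v2)}.
    Shapes (our convention): Q[t,y,s2,v2], p_s2_cond[s1,s2,v2], P_y[y,t,s1,s2,v2]."""
    logQ = np.log(np.maximum(Q, 1e-300))                     # avoid log(0)
    expo = np.einsum('asv,ytasv,tysv->tav', p_s2_cond, P_y, logQ)
    q = np.exp(expo - expo.max(axis=0, keepdims=True))       # stabilize before normalizing
    return q / q.sum(axis=0, keepdims=True)                  # normalize over t

def U_w(q, Q, p_s1v2, p_s2_cond, P_y):
    """Upper bound U_w(q): for each (s1,v2), take the best single strategy t."""
    log_ratio = (np.einsum('asv,ytasv,tysv->tav', p_s2_cond, P_y,
                           np.log2(np.maximum(Q, 1e-300)))
                 - np.log2(np.maximum(q, 1e-300)))
    return float(np.sum(p_s1v2 * log_ratio.max(axis=0)))
```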

5.2. Semi-Iterative Algorithm

The algorithm for finding $C_2^{lb}(R')$ is given in Algorithm 1. Notice that the result of this algorithm, $C_2^{lb(\epsilon,\delta,\mathcal{W})}(R')$, can be arbitrarily close to $C_2^{lb}(R')$ for $\epsilon \to 0$, $\delta \to 0$ and $|\mathcal{W}| \to \infty$.
Algorithm 1 Numerically calculating $C_2^{lb}(R')$
 1: Choose $\epsilon > 0$, $\delta > 0$
 2: Set $R' \leftarrow \min\{R', H(S_2|S_1)\}$ {the amount of information needed for the encoder to know $S_2$ given $S_1$}
 3: Set $C \leftarrow -\infty$
 4: Establish a fine and uniformly spaced grid of legal PMFs $w(v_2|s_2)$ and name it $\mathcal{W}$
 5: for all $w$ in $\mathcal{W}$ do
 6:  Compute $R'_w$ using
$$ R'_w = I(V_2;S_2) - I(V_2;S_1) $$
 7:  if $R' - \epsilon \le R'_w \le R'$ then
 8:   Set $Q(t|y,s_2,v_2)$ to be a uniform distribution over $\{1, 2, \ldots, |\mathcal{T}|\}$, where $\mathcal{T}$ is the alphabet of $t$; i.e., $Q(t|y,s_2,v_2) = \frac{1}{|\mathcal{T}|}$ for all $t, y, s_2, v_2$
 9:   repeat
10:    Set $q(t|s_1,v_2) \leftarrow q^*(t|s_1,v_2)$ using
$$ q^*(t|s_1,v_2) = \frac{\prod_{s_2,y} Q(t|y,s_2,v_2)^{\,p(s_2|s_1,v_2)\,p(y|t,s_1,s_2,v_2)}}{\sum_{t'} \prod_{s_2,y} Q(t'|y,s_2,v_2)^{\,p(s_2|s_1,v_2)\,p(y|t',s_1,s_2,v_2)}} $$
11:    Set $Q(t|y,s_2,v_2) \leftarrow Q^*(t|y,s_2,v_2)$ using
$$ Q^*(t|y,s_2,v_2) = \frac{\sum_{s_1} p(s_1,s_2,v_2,t,y)}{\sum_{s_1,t} p(s_1,s_2,v_2,t,y)} $$
12:    Compute $J_w(q,Q)$ using
$$ J_w(q,Q) = \sum_{s_1,s_2,v_2,t,y} p(s_1,s_2,v_2,t,y) \log \frac{Q(t|y,s_2,v_2)}{q(t|s_1,v_2)} $$
13:    Compute $U_w(q)$ using
$$ U_w(q) = \sum_{s_1,v_2} p(s_1,v_2) \max_{t} \sum_{s_2,y} p(s_2|s_1,v_2)\, p(y|t,s_1,s_2,v_2) \log \frac{Q^*(t|y,s_2,v_2)}{q(t|s_1,v_2)} $$
14:   until $U_w(q) - J_w(q,Q) < \delta$
15:   if $C \le J_w(q,Q)$ then
16:    Set $C \leftarrow J_w(q,Q)$
17:   end if
18:  end if
19: end for
20: if $C < 0$ then {there is no PMF $w(v_2|s_2) \in \mathcal{W}$ such that $R'_w$ is $\epsilon$-close to $R'$ from below}
21:  go to line 4 and make the grid finer
22: end if
23: Declare $C_2^{lb(\epsilon,\delta,\mathcal{W})}(R') = C$
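Putting the pieces together, the following self-contained Python sketch (ours, not the authors' implementation) runs Algorithm 1 end to end for a small channel: it enumerates Shannon strategies $t:(s_1,v_2)\to x$, grid-searches over $w(v_2|s_2)$, and runs the alternating $q$/$Q$ updates with the $U_w - J_w < \delta$ termination test. Array layouts, grid construction and numerical guards are our own choices.

```python
import numpy as np
from itertools import product

def mi(P):
    """Mutual information I(A;B) in bits from a joint PMF matrix P[a,b]."""
    pa, pb = P.sum(1, keepdims=True), P.sum(0, keepdims=True)
    m = P > 0
    return float(np.sum(P[m] * np.log2((P / (pa * pb))[m])))

def cond_ent(P):
    """Conditional entropy H(B|A) in bits from a joint PMF matrix P[a,b]."""
    pa = P.sum(1, keepdims=True)
    m = P > 0
    return float(-np.sum(P[m] * np.log2((P / pa)[m])))

def c2_lb(P_s1s2, P_ch, R_prime, grid, n_v2=2, eps=0.02, delta=1e-7, iters=5000):
    """Semi-iterative sketch of Algorithm 1.
    P_s1s2[s1,s2]: joint state PMF;  P_ch[y,x,s1,s2]: channel p(y|x,s1,s2);
    grid: iterable of candidate PMFs w[v2,s2] (each column sums to 1)."""
    nY, nX, nS1, nS2 = P_ch.shape
    strategies = list(product(range(nX), repeat=nS1 * n_v2))   # t : (s1,v2) -> x
    nT = len(strategies)
    p_s2 = P_s1s2.sum(axis=0)
    R_prime = min(R_prime, cond_ent(P_s1s2))                   # line 2: saturate at H(S2|S1)
    C = -np.inf                                                # line 3
    for w in grid:                                             # lines 5-19
        Rw = mi(w * p_s2) - mi(np.einsum('vb,ab->av', w, P_s1s2))   # I(V2;S2) - I(V2;S1)
        if not (R_prime - eps <= Rw <= R_prime):               # line 7
            continue
        P_ssv = P_s1s2[:, :, None] * w.T[None, :, :]           # p(s1,s2,v2)
        p_s1v2 = P_ssv.sum(axis=1)                             # p(s1,v2)
        p_s2_cond = P_ssv / np.maximum(p_s1v2[:, None, :], 1e-300)  # p(s2|s1,v2)
        P_y = np.empty((nY, nT, nS1, nS2, n_v2))               # p(y|t,s1,s2,v2)
        for ti, t in enumerate(strategies):
            for s1 in range(nS1):
                for v2 in range(n_v2):
                    P_y[:, ti, s1, :, v2] = P_ch[:, t[s1 * n_v2 + v2], s1, :]
        Q = np.full((nT, nY, nS2, n_v2), 1.0 / nT)             # line 8: uniform Q(t|y,s2,v2)
        for _ in range(iters):                                 # lines 9-14
            # line 10: q*(t|s1,v2) ~ exp( sum_{s2,y} p(s2|s1,v2) p(y|t,s1,s2,v2) ln Q )
            expo = np.einsum('asv,ytasv,tysv->tav', p_s2_cond, P_y,
                             np.log(np.maximum(Q, 1e-300)))
            q = np.exp(expo - expo.max(axis=0, keepdims=True))
            q /= q.sum(axis=0, keepdims=True)
            # line 11: Q*(t|y,s2,v2) from the joint p(s1,s2,v2,t,y)
            joint = np.einsum('asv,tav,ytasv->asvty', P_ssv, q, P_y)
            num = joint.sum(axis=0).transpose(2, 3, 0, 1)      # (t,y,s2,v2)
            den = np.maximum(num.sum(axis=0, keepdims=True), 1e-300)
            Q = num / den
            # lines 12-13: J_w(q,Q) and the upper bound U_w(q)
            log2Q = np.log2(np.maximum(Q, 1e-300))
            log2q = np.log2(np.maximum(q, 1e-300))
            J = (np.einsum('asvty,tysv->', joint, log2Q)
                 - np.einsum('asvty,tav->', joint, log2q))
            U = float(np.sum(p_s1v2 * (np.einsum('asv,ytasv,tysv->tav',
                                                 p_s2_cond, P_y, log2Q) - log2q).max(axis=0)))
            if U - J < delta:                                  # line 14
                break
        C = max(C, J)                                          # lines 15-17
    if C == -np.inf:
        raise RuntimeError("No w(v2|s2) in the grid has R'_w within eps of R'; refine the grid.")
    return C

# Example grid for binary V2 and S2; a finer grid tightens the result (line 21).
grid = [np.array([[a, b], [1 - a, 1 - b]])
        for a in np.linspace(0, 1, 21) for b in np.linspace(0, 1, 21)]
```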

6. Open Problems

In this section, we discuss the generalization of the channel capacity and the rate-distortion problems that we presented in Section 3. We now consider the cases where the encoder and the decoder are simultaneously informed with a rate-limited description of both the ESI and the DSI, as illustrated in Figure 8. A lower bound on the capacity and an upper bound on the rate-distortion are suggested. Achievability schemes for the presented bounds can be easily derived using the same techniques that we used in the proofs for Theorems 1 and 2, and, hence, are omitted. We were unable to prove that the suggested bounds are tight, nor did we encounter any other such proofs in the published literature; therefore, we believe these problems to be open.

6.1. A Lower Bound on the Capacity of a Channel with Two-Sided Increased Partial Side Information

Consider the channel illustrated in Figure 8, where $(S_{1,i}, S_{2,i}) \sim$ i.i.d. $p(s_1,s_2)$. The encoder is informed with the ESI ($S_1^n$) and rate-limited DSI and the decoder is informed with the DSI ($S_2^n$) and rate-limited ESI. An $(n, 2^{nR}, 2^{nR'_1}, 2^{nR'_2})$ code for the discussed channel consists of three encoding maps:
$$ f_{v_1}: \mathcal{S}_1^n \to \{1, 2, \ldots, 2^{nR'_1}\}, \quad f_{v_2}: \mathcal{S}_2^n \to \{1, 2, \ldots, 2^{nR'_2}\}, \quad f: \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_1^n \times \{1, 2, \ldots, 2^{nR'_2}\} \to \mathcal{X}^n, $$
and a decoding map:
$$ g: \mathcal{Y}^n \times \mathcal{S}_2^n \times \{1, 2, \ldots, 2^{nR'_1}\} \to \{1, 2, \ldots, 2^{nR}\}. $$
Fact 1: The channel capacity, $C_{12}^*$, of this channel coding setup is bounded from below as follows:
$$ C_{12}^* \ge \max_{\substack{p(v_1|s_1)\,p(v_2|s_2)\,p(u|s_1,v_1,v_2)\,p(x|u,s_1,v_1,v_2) \\ \text{s.t. } R'_1 \ge I(V_1;S_1) - I(V_1;Y,S_2,V_2) \\ \hphantom{\text{s.t. }} R'_2 \ge I(V_2;S_2) - I(V_2;S_1)}} \Big[ I(U;Y,S_2|V_1,V_2) - I(U;S_1|V_1,V_2) \Big], $$
for some joint distribution $p(s_1,s_2,v_1,v_2,u,x,y)$, where $U$, $V_1$ and $V_2$ are auxiliary random variables.
The proof for the achievability follows closely the proofs given in Appendix B and, therefore, is omitted.

6.2. An Upper Bound on the Rate-Distortion with Two-Sided Increased Partial Side Information

Consider the rate-distortion problem illustrated in Figure 9, where the source $X$ and the side information $S_1, S_2$ are distributed $(X_i, S_{1,i}, S_{2,i}) \sim$ i.i.d. $p(x,s_1,s_2)$. The encoder is informed with the ESI ($S_1^n$) and rate-limited DSI and the decoder is informed with the DSI ($S_2^n$) and rate-limited ESI. An $(n, 2^{nR}, 2^{nR'_1}, 2^{nR'_2}, D)$ code for the discussed rate-distortion problem consists of three encoding maps:
$$ f_{v_1}: \mathcal{S}_1^n \to \{1, 2, \ldots, 2^{nR'_1}\}, \quad f_{v_2}: \mathcal{S}_2^n \to \{1, 2, \ldots, 2^{nR'_2}\}, \quad f: \mathcal{X}^n \times \mathcal{S}_1^n \times \{1, 2, \ldots, 2^{nR'_2}\} \to \{1, 2, \ldots, 2^{nR}\}, $$
and a decoding map:
$$ g: \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_2^n \times \{1, 2, \ldots, 2^{nR'_1}\} \to \hat{\mathcal{X}}^n. $$
Fact 2: For a given distortion, $D$, and a given distortion measure, $d: \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}^+$, the rate-distortion function $R_{12}^*(D)$ of this setup is bounded from above as follows:
$$ R_{12}^*(D) \le \min_{\substack{p(v_1|s_1)\,p(v_2|s_2)\,p(u|x,s_1,v_1,v_2)\,p(\hat{x}|u,s_2,v_1,v_2) \\ \text{s.t. } R'_1 \ge I(V_1;S_1) - I(V_1;S_2,V_2) \\ \hphantom{\text{s.t. }} R'_2 \ge I(V_2;S_2) - I(V_2;X,S_1,V_1)}} \Big[ I(U;X,S_1|V_1,V_2) - I(U;S_2|V_1,V_2) \Big], $$
for some joint distribution $p(x,s_1,s_2,v_1,v_2,u,\hat{x})$ satisfying $E\big[\frac{1}{n}\sum_{i=1}^{n} d(X_i,\hat{X}_i)\big] \le D$, where $U$, $V_1$ and $V_2$ are auxiliary random variables.
The proof for the achievability follows closely the proofs given in Appendix C and, therefore, is omitted.

Acknowledgments

Avihay Sadeh-Shirazi, Uria Basher and Haim Permuter were supported in part by the Israel Science Foundation and in part by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC under Grant 337752.

Author Contributions

Avihay Sadeh-Shirazi and Haim Permuter conceived and designed the study; Avihay Sadeh-Shirazi conducted the research with the aid and supervision of Haim Permuter. Avihay Sadeh-Shirazi wrote the paper with the contribution of Uria Basher. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Duality of the Converse of the Gelfand–Pinsker Theorem and the Wyner-Ziv Theorem

In this appendix, we provide proofs of the converse of the Gelfand–Pinsker capacity and the converse of the Wyner-Ziv rate in a dual way.
[The side-by-side converse derivations for the Gelfand–Pinsker problem (left column) and the Wyner-Ziv problem (right column) appear here as an image in the original article.]
where
$$ \Delta = \sum_{i=1}^{n} I(Y^{i-1}; S_i \mid W, S_{i+1}^n), \qquad \Delta = \sum_{i=1}^{n} I(X^{i-1}; S_i \mid T, S_{i+1}^n), $$
$$ \Delta^* = \sum_{i=1}^{n} I(S_{i+1}^n; Y_i \mid W, Y^{i-1}), \qquad \Delta^* = \sum_{i=1}^{n} I(S_{i+1}^n; X_i \mid T, X^{i-1}), $$
with the left-hand quantities referring to the channel coding derivation and the right-hand quantities to the source coding derivation. For the channel coding derivation, (a) follows from Fano's inequality and from the fact that $W$ is independent of $S^n$, and (b) follows from the fact that $S_i$ is independent of $S_{i+1}^n$. For the source coding derivation, (a) follows from Fano's inequality and from the fact that $T$ is independent of $S^n$, and (b) follows from the fact that $S_i$ is independent of $S_{i+1}^n$ and that $X_i$ is independent of $X^{i-1}$.
By substituting the output Y and the input X in the channel capacity theorem with the input X and the output X ^ in the rate-distortion theorem, respectively, we can observe duality in the converse proofs of the two theorems.

Appendix B. Proof of Theorem 1

In this section, we provide the proofs for Theorem 1, Cases 2 and 2$_C$. The results for Case 1, where the encoder is informed with ESI and the decoder is informed with increased DSI, can be derived directly from [1] (Section IV). In [1], Steinberg considered the case where the encoder is fully informed with the ESI and the decoder is informed with a rate-limited description of the ESI. Therefore, by considering the DSI, $S_2^n$, to be a part of the channel's output, we can apply Steinberg's result to the channel depicted in Case 1. For this reason, the proof for this case is omitted.

Appendix B.1. Proof of Theorem 1, Case 2

Channel capacity Case 2 is presented in Figure A1. The proof of the lower bound, $C_2^{lb}$, is performed in the following way: for the description of the DSI, $S_2$, at a rate $R'$ we use a Wyner-Ziv coding scheme where the source is $S_2$ and the side information is $S_1$. Then, for the channel coding, we use a Gelfand–Pinsker coding scheme where the state information at the encoder is $S_1$, $S_2$ is a part of the channel's output and the rate-limited description of $S_2$ is side information at both the encoder and the decoder. Notice that $I(U;Y,S_2|V_2) - I(U;S_1|V_2) = I(U;Y,S_2,V_2) - I(U;S_1,V_2)$ and that, since the Markov chain $V_2 - S_2 - S_1$ holds, we can also write $R' \ge I(V_2;S_2) - I(V_2;S_1)$. We make use of these expressions in the following proof.
Figure A1. Channel capacity: Case 2. Lower bound: $C_2^{lb} = \max \big[ I(U;Y,S_2|V_2) - I(U;S_1|V_2) \big]$, where the maximization is over all joint PMFs $p(s_1,s_2,v_2,u,x,y)$ that maintain the Markov relations $U - (S_1,V_2) - S_2$ and $V_2 - S_2 - S_1$ and the constraint $R' \ge I(V_2;S_2|S_1)$. Upper bounds: $C_2^{ub_1}$ is the result of the same expressions as for the lower bound, except that the maximization is taken over all PMFs that maintain the Markov chain $U - (S_1,V_2) - S_2$, and $C_2^{ub_2}$ is the result of the same expressions as for the lower bound, except that this time the maximization is taken over all PMFs that maintain $V_2 - S_2 - S_1$.
Achievability: (Channel capacity Case 2—Lower bound). Given $(S_{1,i}, S_{2,i}) \sim$ i.i.d. $p(s_1,s_2)$ and the memoryless channel $p(y|x,s_1,s_2)$, fix $p(s_1,s_2,v_2,u,x,y) = p(s_1,s_2)\,p(v_2|s_2)\,p(u|s_1,v_2)\,p(x|u,s_1,v_2)\,p(y|x,s_1,s_2)$, where $x = f(u,s_1,v_2)$ (i.e., $p(x|u,s_1,v_2)$ can take only the values 0 or 1).
Codebook generation and random binning
  • Generate a codebook $C_v$ of $2^{n(I(V_2;S_2)+2\epsilon)}$ sequences $V_2^n$ independently using i.i.d. $\sim p(v_2)$. Label them $v_2^n(k)$, where $k \in \{1, 2, \ldots, 2^{n(I(V_2;S_2)+2\epsilon)}\}$, and randomly assign each sequence $v_2^n(k)$ a bin number $b_v\big(v_2^n(k)\big)$ in the set $\{1, 2, \ldots, 2^{nR'}\}$.
  • Generate a codebook $C_u$ of $2^{n(I(U;Y,S_2,V_2)-2\epsilon)}$ sequences $U^n$ independently using i.i.d. $\sim p(u)$. Label them $u^n(l)$, $l \in \{1, 2, \ldots, 2^{n(I(U;Y,S_2,V_2)-2\epsilon)}\}$, and randomly assign each sequence a bin number $b_u\big(u^n(l)\big)$ in the set $\{1, 2, \ldots, 2^{nR}\}$.
Reveal the codebooks and the content of the bins to all encoders and decoders.
Encoding
  • State Encoder: Given the sequence $S_2^n$, search the codebook $C_v$ and identify an index $k$ such that $\big(v_2^n(k), S_2^n\big) \in T_\epsilon^{(n)}(V_2,S_2)$. If such a $k$ is found, stop searching and send the bin number $j = b_v\big(v_2^n(k)\big)$. If no such $k$ is found, declare an error.
  • Encoder: Given the message $W$, the sequence $S_1^n$ and the index $j$, search the codebook $C_v$ and identify an index $k$ such that $b_v\big(v_2^n(k)\big) = j$ and $\big(v_2^n(k), S_1^n\big) \in T_\epsilon^{(n)}(V_2,S_1)$. If no such $k$ is found or there is more than one such index, declare an error. If a unique $k$, as defined, is found, search the codebook $C_u$ and identify an index $l$ such that $\big(u^n(l), S_1^n, v_2^n(k)\big) \in T_\epsilon^{(n)}(U,S_1,V_2)$ and $b_u\big(u^n(l)\big) = W$. If a unique $l$, as defined, is found, transmit $x_i = f\big(u_i(l), S_{1,i}, v_{2,i}(k)\big)$, $i = 1, 2, \ldots, n$. Otherwise, if there is no such $l$ or there is more than one, declare an error.
Decoding
Given the sequences $Y^n$, $S_2^n$ and the index $k$, search the codebook $C_u$ and identify an index $l$ such that $\big(u^n(l), Y^n, S_2^n, v_2^n(k)\big) \in T_\epsilon^{(n)}(U,Y,S_2,V_2)$. If a unique $l$, as defined, is found, declare the message $\hat{W}$ to be the bin index where $u^n(l)$ is located, i.e., $\hat{W} = b_u\big(u^n(l)\big)$. Otherwise, if no such $l$ is found or there is more than one, declare an error.
Analysis of the probability of error
Without loss of generality, let us assume that the message W = 1 was sent and the indexes that correspond with the given W = 1 , S 1 n , S 2 n are ( k = 1 , l = 1 and j = 1 ) ; i.e., v 2 n ( 1 ) corresponds with S 2 n , b v v 2 n ( 1 ) = 1 , u n ( 1 ) is chosen according to W = 1 , S 1 n , v 2 n ( 1 ) and b u u n ( 1 ) = 1 .
Define the following events:
$$ \begin{aligned} E_1 &:= \big\{ \nexists\, v_2^n(k) \in C_v :\ \big(v_2^n(k), S_2^n\big) \in T_\epsilon^{(n)}(V_2,S_2) \big\} \\ E_2 &:= \big\{ \big(v_2^n(1), S_1^n\big) \notin T_\epsilon^{(n)}(V_2,S_1) \big\} \\ E_3 &:= \big\{ \exists\, k \ne 1 \ \text{such that}\ b_v\big(v_2^n(k)\big) = 1 \ \text{and}\ \big(v_2^n(k), S_1^n\big) \in T_\epsilon^{(n)}(V_2,S_1) \big\} \\ E_4 &:= \big\{ \nexists\, u^n(l) \in C_u \ \text{such that}\ b_u\big(u^n(l)\big) = 1,\ \big(u^n(l), S_1^n, v_2^n(1)\big) \in T_\epsilon^{(n)}(U,S_1,V_2) \big\} \\ E_5 &:= \big\{ \big(u^n(1), Y^n, S_2^n, v_2^n(1)\big) \notin T_\epsilon^{(n)}(U,Y,S_2,V_2) \big\} \\ E_6 &:= \big\{ \exists\, l \ne 1 \ \text{such that}\ \big(u^n(l), Y^n, S_2^n, v_2^n(1)\big) \in T_\epsilon^{(n)}(U,Y,S_2,V_2) \big\} \end{aligned} $$
The probability of error $P_e^{(n)}$ is upper bounded by $P_e^{(n)} \le P(E_1) + P(E_2|E_1^c) + P(E_3|E_1^c,E_2^c) + P(E_4|E_1^c,E_2^c,E_3^c) + P(E_5|E_1^c,\ldots,E_4^c) + P(E_6|E_1^c,\ldots,E_5^c)$. Using standard arguments, and assuming that $(S_1^n, S_2^n) \in T_\epsilon^{(n)}(S_1,S_2)$ and that $n$ is large enough, we can state that
  •  
    $$ \begin{aligned} P(E_1) &= \Pr\big\{ \forall\, v_2^n(k) \in C_v :\ \big(v_2^n(k), S_2^n\big) \notin T_\epsilon^{(n)}(V_2,S_2) \big\} \\ &= \prod_{k=1}^{2^{n(I(V_2;S_2)+2\epsilon)}} \Pr\big\{ \big(v_2^n(k), S_2^n\big) \notin T_\epsilon^{(n)}(V_2,S_2) \big\} = \prod_{k=1}^{2^{n(I(V_2;S_2)+2\epsilon)}} \Big( 1 - \Pr\big\{ \big(v_2^n(k), S_2^n\big) \in T_\epsilon^{(n)}(V_2,S_2) \big\} \Big) \\ &\le \Big( 1 - 2^{-n(I(V_2;S_2)+\epsilon)} \Big)^{2^{n(I(V_2;S_2)+2\epsilon)}} \le e^{-2^{-n(I(V_2;S_2)+\epsilon)}\, 2^{n(I(V_2;S_2)+2\epsilon)}} = e^{-2^{n\epsilon}}. \qquad (A3) \end{aligned} $$
    The probability that there is no $v_2^n(k)$ in $C_v$ such that $\big(v_2^n(k), S_2^n\big)$ is strongly jointly typical is exponentially small provided that $|C_v| \ge 2^{n(I(V_2;S_2)+\epsilon)}$. This follows from the standard rate-distortion argument that $2^{nI(V_2;S_2)}$ $v_2^n$'s "cover" $\mathcal{S}_2^n$; therefore, $P(E_1) \to 0$.
  • By the Markov lemma [31], since $(S_1^n, S_2^n)$ are strongly jointly typical, $\big(S_2^n, v_2^n(1)\big)$ are strongly jointly typical and the Markov chain $S_1 - S_2 - V_2$ holds, then $\big(S_1^n, S_2^n, v_2^n(1)\big)$ are strongly jointly typical with high probability. Therefore, $P(E_2|E_1^c) \to 0$.
  •  
    ( A 4 ) P ( E 3 | E 1 c , E 2 c ) = Pr { v 2 n ( k 1 ) C v b v v 2 n ( k ) = 1 v 2 n ( k ) , S 1 n T ϵ ( n ) ( V 2 , S 1 ) } ( A 5 ) v 2 n ( k 1 ) C v b v v 2 n ( k ) = 1 Pr v 2 n ( k ) , S 1 n T ϵ ( n ) ( V 2 , S 1 ) ( A 6 ) v 2 n ( k 1 ) C v b v v 2 n ( k ) = 1 2 n ( I ( V 2 ; S 1 ) ϵ ) ( A 7 ) = 2 n ( I ( V 2 ; S 2 ) + 2 ϵ R ) 2 n ( I ( V 2 ; S 1 ) ϵ ) ( A 8 ) = 2 n ( I ( V 2 ; S 2 ) I ( V 2 ; S 1 ) + 3 ϵ R ) .
    The probability that there is another index k , k 1 , such that v 2 n ( k ) is in bin number 1 and is strongly jointly typical with S 1 n is bounded by the number of v 2 n ( k ) ’s in the bin times the probability of joint typicality. Therefore, if the binning rate satisfies R > I ( V 2 ; S 2 ) − I ( V 2 ; S 1 ) + 3 ϵ , then P ( E 3 | E 1 c , E 2 c ) 0 .
  • We use here the same argument we used for P ( E 1 ) ; by the covering lemma, we can state that the probability that there is no u n ( l ) in bin number 1 that is strongly jointly typical with S 1 n , v 2 n ( 1 ) tends to zero for large enough n if the average number of u n ( l ) ’s in each bin is greater than 2 n ( I ( U ; S 1 , V 2 ) + ϵ ) ; i.e., | C u | / 2 n R > 2 n ( I ( U ; S 1 , V 2 ) + ϵ ) . Combined with the bound on | C u | below, this implies that, to avoid an error, the message rate must satisfy R < I ( U ; Y , S 2 , V 2 ) − I ( U ; S 1 , V 2 ) − 3 ϵ , where the last expression also equals I ( U ; Y , S 2 | V 2 ) − I ( U ; S 1 | V 2 ) − 3 ϵ .
  • As we argued for P ( E 2 | E 1 c ) , since X n , u n ( 1 ) , S 1 n , v 2 n ( 1 ) is strongly jointly typical, Y n , X n , S 1 n , S 2 n is strongly jointly typical and the Markov chain ( U , V 2 ) ( X , S 1 , S 2 ) Y holds, then, by the Markov lemma, u n ( 1 ) , Y n , S 2 n , v 2 n ( 1 ) is strongly jointly typical with high probability, i.e., P ( E 5 | E 1 c , , E 4 c ) 0 .
  •  
    P ( E 6 | E 1 c , , E 5 c ) = Pr { u n ( l 1 ) C u u n ( l ) , Y n , S 2 n , v 2 n ( 1 ) T ϵ ( n ) ( U , Y , S 2 , V 2 ) } l = 2 2 n ( I ( U ; Y , S 2 , V 2 ) 2 ϵ ) Pr u n ( l ) , Y n , S 2 n , V 2 n T ϵ ( n ) ( U , Y , S 2 , V 2 ) l = 2 2 n ( I ( U ; Y , S 2 , V 2 ) 2 ϵ ) 2 n ( I ( U ; Y , S 2 , V 2 ) ϵ ) 2 n ( I ( U ; Y , S 2 , V 2 ) 2 ϵ ) 2 n ( I ( U ; Y , S 2 , V 2 ) ϵ ) ( A 9 ) = 2 n ϵ .
    The probability that there is another index l , l 1 , such that u n ( l ) is strongly jointly typical with Y n , S 2 n , v 2 n ( 1 ) is bounded by the total number of u n ’s times the probability of joint typicality. Therefore, taking | C u | 2 n ( I ( U ; Y , S 2 , V 2 ) 2 ϵ ) assures us that P ( E 6 | E 1 c , , E 5 c ) 0 . This follows the standard channel capacity argument that one can distinguish at most 2 n I ( U ; Y , S 2 , V 2 ) different u n ( l ) ’s given any typical member of Y n × S 2 n × V 2 n .
This shows that for rates R and R as described and for large enough n, the error events are of arbitrarily small probability. This concludes the proof of the achievability and the lower bound on the capacity of Case 2.
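The bound on P ( E 1 ) above decays double-exponentially in n. The following minimal numerical sketch is not part of the original proof; the values of I ( V 2 ; S 2 ) and ϵ are arbitrary assumptions, and the snippet simply evaluates the logarithm of the covering bound ( 1 − 2^( − n ( I + ϵ ) ) )^( 2^( n ( I + 2 ϵ ) ) ) ≤ e^( − 2^( n ϵ ) ).

```python
import math

# Minimal numerical sketch (not from the paper) of the covering bound used for P(E1):
#   P(E1) <= (1 - 2^{-n(I+eps)})^{2^{n(I+2eps)}} <= exp(-2^{n*eps}).
# I and eps are arbitrary assumed values; the bound decays double-exponentially in n.
I_v2_s2 = 0.3     # assumed value of I(V2; S2) in bits
eps = 0.1

for n in (20, 40, 60):
    p_single = 2.0 ** (-n * (I_v2_s2 + eps))          # P{(v2^n(k), S2^n) jointly typical}
    num_codewords = 2.0 ** (n * (I_v2_s2 + 2 * eps))  # |C_v|
    log_bound = num_codewords * math.log1p(-p_single) # log of the product bound
    print(f"n={n:3d}: log P(E1) <= {log_bound:.3e}   (compare with -2^(n*eps) = {-(2.0 ** (n * eps)):.3e})")
```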
Converse: (Channel capacity Case 2—Upper bound). We first prove that it is possible to bound the capacity from above by using two random variables, U and V, that maintain the Markov chain U ( S 1 , V 2 ) S 2 (that is C 2 u b 1 ). Then, we prove that it is also possible to upper-bound the capacity by using U and V that maintain the Markov relation V 2 S 2 S 1 (that is C 2 u b 2 ).
Fix the rates R and R and a sequence of codes ( n , 2 n R , 2 n R ) that achieve the capacity. By Fano’s inequality, H ( W | Y n , S 2 n ) n ϵ n , where ϵ n 0 as n . Let T 2 = f v ( S 2 n ) , and define V 2 , i = ( T 2 , Y i 1 , S 1 , i + 1 n , S 2 i 1 ) , U i = W ; hence, the Markov chain U i ( S 1 , i , V 2 , i ) S 2 , i is maintained. The proof for this follows.
p ( u i | s 1 , i , v 2 , i , s 2 , i ) = p ( w | s 1 , i , t 2 , y i 1 , s 1 , i + 1 n , s 2 i 1 , s 2 , i ) = x i 1 , s 1 i 1 p ( w , x i 1 , s 1 i 1 | s 1 , i , t 2 , y i 1 , s 1 , i + 1 n , s 2 i 1 , s 2 , i ) = x i 1 , s 1 i 1 p ( s 1 i 1 | t 2 , y i 1 , s 1 , i n , s 2 i 1 ) p ( x i 1 | t 2 , y i 1 , s 1 n , s 2 i 1 ) p ( w | x i 1 , t 2 , y i 1 , s 1 n , s 2 i 1 ) ( A 10 ) = p ( w | t 2 , y i 1 , s 1 , i + 1 n , s 2 i 1 , s 1 , i ) .
Next, consider
n R H ( T 2 ) H ( T 2 | S 1 n ) H ( T 2 | S 1 n , S 2 n ) = I ( T 2 ; S 2 n | S 1 n ) = H ( S 2 n | S 1 n ) H ( S 2 n | T 2 , S 1 n ) = i = 1 n H ( S 2 , i | S 1 n , S 2 i 1 ) H ( S 2 , i | T 2 , S 1 n , S 2 i 1 ) = ( a ) i = 1 n H ( S 2 , i | S 1 , i ) H ( S 2 , i | T 2 , S 1 n , S 2 i 1 , Y i 1 ) = ( b ) i = 1 n H ( S 2 , i | S 1 , i ) H ( S 2 , i | T 2 , S 1 , i + 1 n , S 2 i 1 , Y i 1 , S 1 , i ) = i = 1 n H ( S 2 , i | S 1 , i ) H ( S 2 , i | V 2 , i , S 1 , i ) ( A 11 ) = i = 1 n I ( S 2 , i ; V 2 , i | S 1 , i ) ,
where (a) follows from the fact that S 2 , i is independent of ( S 1 i 1 , S 1 , i + 1 n , S 2 i 1 ) given S 1 , i , and the fact that Y i 1 is independent of S 2 , i given ( T 2 , S 1 n , S 2 i 1 ) (the proof for this follows) and (b) follows from the fact that conditioning reduces entropy.
p ( y i 1 | t 2 , s 1 n , s 2 i 1 , s 2 , i ) = x n , w p ( y i 1 , x n , w | t 2 , s 1 n , s 2 i 1 , s 2 , i ) = x n , w p ( w ) p ( x n | w , t 2 , s 1 n ) p ( y i 1 | x i 1 , s 1 i 1 , s 2 i 1 ) ( A 12 ) = p ( y i 1 | t 2 , s 1 n , s 2 i 1 ) ,
where we used the facts that W is independent of ( T 2 , S 1 n , S 2 , i n ) , X n is a function of ( W , T 2 , S 1 n ) and that the channel is memoryless; i.e., Y i 1 is independent of ( W , T 2 , S 1 , i n , S 2 , i n ) given ( X i 1 , S 1 i 1 , S 2 i 1 ) . We continue the proof of the converse by considering the following set of inequalities:
n R = H ( W ) H ( W | T 2 ) H ( W | T 2 , Y n , S 2 n ) + n ϵ n = I ( W ; Y n , S 2 n | T 2 ) + n ϵ n = i = 1 n I ( W ; Y i , S 2 , i | T 2 , Y i 1 , S 2 i 1 ) + n ϵ n = ( b ) i = 1 n [ I ( W , S 1 , i + 1 n ; Y i , S 2 , i | T 2 , Y i 1 , S 2 i 1 ) I ( S 1 , i + 1 n ; Y i , S 2 , i | W , T 2 , Y i 1 , S 2 i 1 ) ] + n ϵ n = ( c ) i = 1 n [ I ( W , S 1 , i + 1 n ; Y i , S 2 , i | T 2 , Y i 1 , S 2 i 1 ) I ( S 1 , i ; Y i 1 , S 2 i 1 | W , T 2 , S 1 , i + 1 n ) ] + n ϵ n = i = 1 n [ I ( W ; Y i , S 2 , i | T 2 , Y i 1 , S 1 , i + 1 n , S 2 i 1 ) I ( S 1 , i ; W | T 2 , Y i 1 , S 1 , i + 1 n , S 2 i 1 ) ] ( A 13 ) + Δ Δ * + n ϵ n ,
where
Δ = i = 1 n I ( S 1 , i + 1 n ; Y i , S 2 , i | T 2 , Y i 1 , S 2 i 1 ) ,
Δ * = i = 1 n I ( S 1 , i ; Y i 1 , S 2 i 1 | T 2 , S 1 , i + 1 n ) ,
(b) follows from the chain rule for mutual information and (c) follows from the Csiszár sum identity.
By using the Csiszár sum on (A14) and (A15), we get
Δ = Δ * ,
and, therefore, from (A11) and (A13)
R 1 n i = 1 n I ( S 2 , i ; V 2 , i | S 1 , i )
R ϵ n 1 n i = 1 n I ( U i ; Y i , S 2 , i | V 2 , i ) I ( U i ; S 1 , i | V 2 , i ) .
Using the convexity of R and Jensen’s inequality, the standard time sharing argument for R and the fact that ϵ n 0 as n , we can conclude that
R I ( V 2 ; S 2 | S 1 ) ,
R I ( U ; Y , S 2 | V 2 ) I ( U ; S 1 | V 2 ) ,
where U and V maintain the Markov chain U ( S 1 , V 2 ) S 2 .
We now proceed to prove that it is possible to upper-bound the capacity of Case 2 by using two random variables, U and V, that maintain the Markov chain V 2 S 2 S 1 . Fix the rates R and R and a sequence of codes ( n , 2 n R , 2 n R ) that achieve the capacity. By Fano’s inequality, H ( W | Y n , S 2 n ) n ϵ n , where ϵ n 0 as n . Let T 2 = f v ( S 2 n ) and define V 2 , i = ( T 2 , S 2 i 1 ) , U i = ( W , Y i 1 , S 1 , i + 1 n ) . The Markov chain V 2 , i S 2 , i S 1 , i is maintained. Then,
n R H ( T 2 ) = ( a ) i = 1 n H ( S 2 , i | S 1 n , S 2 i 1 ) H ( S 2 , i | T 2 , S 1 n , S 2 i 1 ) = ( b ) i = 1 n H ( S 2 , i | S 1 , i ) H ( S 2 , i | T 2 , S 1 , i , S 1 , i + 1 n , S 2 i 1 ) i = 1 n H ( S 2 , i | S 1 , i ) H ( S 2 , i | T 2 , S 1 , i , S 2 i 1 ) = i = 1 n H ( S 2 , i | S 1 , i ) H ( S 2 , i | V 2 , i , S 1 , i ) ( A 21 ) = i = 1 n I ( S 2 , i ; V 2 , i | S 1 , i ) ,
where ( a ) follows from the same reasoning as in (A11), and ( b ) follows from the fact that S 2 , i is independent of ( S 1 i 1 , S 1 , i + 1 n , S 2 i 1 ) given S 1 , i , and the fact that ( Y i 1 , S 1 i 1 ) is independent of S 2 , i given ( T 2 , S 1 , i n , S 2 i 1 ) ; the proof for this follows.
p ( y i 1 , s 1 i 1 | t 2 , s 1 , i n , s 2 i 1 , s 2 , i ) = x n , w p ( y i 1 , s 1 i 1 , x n , w | t 2 , s 1 , i n , s 2 i 1 , s 2 , i ) = x n , w p ( w ) p ( s 1 i 1 | s 2 i 1 ) p ( x n | w , t 2 , s 1 n ) p ( y i 1 | x i 1 , s 1 i 1 , s 2 i 1 ) ( A 22 ) = p ( y i 1 , s 1 i 1 | t 2 , s 1 , i n , s 2 i 1 ) ,
where we used the facts that W is independent of ( T 2 , S 1 , i n , S 2 , i n ) , S 1 i 1 is independent of ( T 2 , S 1 , i n , S 2 , i n ) given S 2 i 1 , X n is a function of ( W , T 2 , S 1 n ) and that the channel is memoryless; i.e., Y i 1 is independent of ( W , T 2 , S 1 , i n , S 2 , i n ) given ( X i 1 , S 1 i 1 , S 2 i 1 ) .
In order to complete our proof, we need the following lemma.
Lemma A1.
The following inequality holds:
i = 1 n I ( S 1 , i ; W , Y i 1 , S 1 , i + 1 n | T 2 , S 2 i 1 ) ≤ i = 1 n I ( S 1 , i ; W , Y i 1 , S 2 i 1 | T 2 , S 1 , i + 1 n ) .
Proof. 
Notice that
i = 1 n I ( S 1 , i ; W , Y i 1 , S 1 , i + 1 n | T 2 , S 2 i 1 ) = i = 1 n I ( S 1 , i ; W , Y i 1 , S 1 , i + 1 n , S 2 i 1 | T 2 ) I ( S 1 , i ; S 2 i 1 | T 2 )
and that
i = 1 n I ( S 1 , i ; W , Y i 1 , S 2 i 1 | T 2 , S 1 , i + 1 n ) = i = 1 n I ( S 1 , i ; W , Y i 1 , S 1 , i + 1 n , S 2 i 1 | T 2 ) I ( S 1 , i ; S 1 , i + 1 n | T 2 ) .
Therefore, it is enough to show that i = 1 n I ( S 1 , i ; S 2 i 1 | T 2 ) ≥ i = 1 n I ( S 1 , i ; S 1 , i + 1 n | T 2 ) holds in order to prove the lemma. Consider
i = 1 n I ( S 1 , i ; S 1 , i + 1 n | T 2 ) i = 1 n I ( S 1 , i ; S 2 i 1 | T 2 ) = i = 1 n H ( S 1 , i | T 2 , S 1 , i + 1 n ) H ( S 1 , i | T 2 , S 2 i 1 ) = i = 1 n H ( S 1 n | T 2 ) H ( S 1 , i | T 2 , S 2 i 1 ) = i = 1 n H ( S 1 , i | T 2 , S 1 i 1 ) H ( S 1 , i | T 2 , S 2 i 1 ) ( A 26 ) ( a ) 0 ,
where ( a ) follows from the fact that the Markov chain S 1 , i ( T 2 , S 2 i 1 ) ( T 2 , S 1 i 1 ) holds and from the data processing inequality. This completes the proof of the lemma. ☐
We continue the proof of the converse by considering the following set of inequalities:
n R = H ( W ) H ( W | T 2 ) H ( W | T 2 , Y n , S 2 n ) + n ϵ n = I ( W ; Y n , S 2 n | T 2 ) + n ϵ n = i = 1 n I ( W ; Y i , S 2 , i | T 2 , Y i 1 , S 2 i 1 ) + n ϵ n = ( a ) i = 1 n [ I ( W , S 1 , i + 1 n ; Y i , S 2 , i | T 2 , Y i 1 , S 2 i 1 ) I ( S 1 , i + 1 n ; Y i , S 2 , i | W , T 2 , Y i 1 , S 2 i 1 ) ] + n ϵ n = ( b ) i = 1 n [ I ( W , S 1 , i + 1 n ; Y i , S 2 , i | T 2 , Y i 1 , S 2 i 1 ) I ( S 1 , i ; Y i 1 , S 2 i 1 | W , T 2 , S 1 , i + 1 n ) ] + n ϵ n = i = 1 n [ I ( W , S 1 , i + 1 n ; Y i , S 2 , i | T 2 , Y i 1 , S 2 i 1 ) I ( S 1 , i ; W , Y i 1 , S 2 i 1 | T 2 , S 1 , i + 1 n ) ] + n ϵ n ( c ) i = 1 n [ I ( W , S 1 , i + 1 n ; Y i , S 2 , i | T 2 , Y i 1 , S 2 i 1 ) I ( S 1 , i ; W , Y i 1 , S 1 , i + 1 n | T 2 , S 2 i 1 ) ] + n ϵ n i = 1 n [ I ( W , Y i 1 , S 1 , i + 1 n ; Y i , S 2 , i | T 2 , S 2 i 1 ) I ( S 1 , i ; W , Y i 1 , S 1 , i + 1 n | T 2 , S 2 i 1 ) ] + n ϵ n ( A 27 ) = i = 1 n I ( U i ; Y i , S 2 , i | V 2 , i ) I ( U i ; S 1 , i | V 2 , i ) + n ϵ n ,
where ( a ) follows from the mutual information properties, ( b ) follows from the Csiszár sum identity and ( c ) follows from Lemma A1. Therefore,
R 1 n i = 1 n I ( S 2 , i ; V 2 , i | S 1 , i )
R ϵ n 1 n i = 1 n I ( U i ; Y i , S 2 , i | V 2 , i ) I ( U i ; S 1 , i | V 2 , i ) .
Using the convexity of R and Jensen’s inequality, the standard time sharing argument for R and the fact that ϵ n 0 as n , we can conclude that
R I ( V 2 ; S 2 | S 1 ) ,
R I ( U ; Y , S 2 | V 2 ) I ( U ; S 1 | V 2 ) ,
where the Markov chain V 2 S 2 S 1 holds. Therefore, we can conclude that the expression given in (12) is an upper-bound to any achievable rate. This concludes the proof of the upper-bound and the proof of Theorem 1 Case 2.

Appendix B.2. Proof of Theorem 1, Case 2C

Channel capacity Case 2 C is illustrated in Figure A2. For describing the DSI, S 2 , with a rate R we use the standard rate-distortion coding scheme. Then, for the channel coding we use the Shannon strategy [4] coding scheme where the channel’s causal state information at the encoder is S 1 , S 2 is a part of the channel’s output and the rate-limited description of S 2 is the side information at both the encoder and the decoder.
Figure A2. Channel capacity: Case 2 with causal ESI. C 2 C = max I ( U ; Y , S 2 | V 2 ) , where the maximization is over all PMFs p ( v 2 | s 2 ) p ( u | v 2 ) p ( x | u , s 1 , v 2 ) such that R I ( V 2 ; S 2 ) .
Achievability: (Channel capacity Case 2 C ). Given ( S 1 , i , S 2 , i ) i.i.d. p ( s 1 , s 2 ) , where the ESI is known in a causal way ( S 1 i at time i), and the memoryless channel p ( y | x , s 1 , s 2 ) , fix p ( s 1 , s 2 , v 2 , u , x , y ) = p ( s 1 , s 2 ) p ( v 2 | s 2 ) p ( u | v 2 ) p ( x | u , s 1 , v 2 ) p ( y | x , s 1 , s 2 ) , where x = f ( u , s 1 , v 2 ) (i.e., p ( x | u , s 1 , v 2 ) can get the values 0 or 1).
Codebook generation and random binning
  • Generate a codebook C v of 2 n I ( V 2 ; S 2 ) + 2 ϵ sequences V 2 n independently using i . i . d . p ( v 2 ) . Label them v 2 n ( k ) where k 1 , 2 , , 2 n ( I ( V 2 ; S 2 ) + 2 ϵ ) .
  • For each v 2 n ( k ) generate a codebook C u ( k ) of 2 n I ( U ; Y , S 2 | V 2 ) 2 ϵ sequences U n distributed independently according to i . i . d . p ( u | v 2 ) . Label them u n ( w , k ) , where w 1 , 2 , , 2 n ( I ( U ; Y , S 2 | V 2 ) 2 ϵ ) , and associate the sequences u n ( w , · ) with the message W = w .
Reveal the codebooks to all encoders and decoders.
Encoding
  • State Encoder: Given the sequence S 2 n , search the codebook C v and identify an index k such that v 2 n ( k ) , S 2 n T ϵ ( n ) ( V 2 , S 2 ) . If such a k is found, stop searching and send it. Otherwise, if no such k is found, declare an error.
  • Encoder: Given the message W 1 , 2 , , 2 n ( I ( U ; Y , S 2 | V 2 ) 2 ϵ ) , the index k and S 1 i at time i, identify u n ( W , k ) in the codebook C u ( k ) and transmit x i = f u i ( W , k ) , S 1 , i , v 2 , i ( k ) at any time i { 1 , 2 , , n } . The element x i is the result of a multiplexer with an input signal u i ( W , k ) , v 2 , i ( k ) and a control signal S 1 , i .
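As a side illustration of the encoding step above, the Shannon-strategy multiplexer can be viewed as follows: each strategy letter lists a channel input for every possible value of the causal state, and S 1 , i selects which entry is transmitted. The toy snippet below is a hypothetical sketch, not the paper's construction; the alphabets are binary, the sequences are arbitrary placeholders, and the dependence on v 2 , i ( k ) is suppressed for brevity.

```python
# Hypothetical toy illustration (binary alphabets) of the multiplexer described above.
# Each strategy letter u_i(W, k) is viewed as a tuple listing the channel input to use
# for every value of the causal state S_{1,i}; the state acts as the control signal.
# The dependence on v_{2,i}(k) is suppressed for brevity; all sequences are placeholders.
u_seq = [(0, 1), (1, 1), (0, 0), (1, 0)]   # u_i = (x to send if s1=0, x to send if s1=1)
s1_seq = [0, 1, 1, 0]                      # causal ESI revealed to the encoder at time i
x_seq = [u[s1] for u, s1 in zip(u_seq, s1_seq)]
print(x_seq)                               # transmitted channel inputs x_1, ..., x_4
```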
Decoding
Given Y n , S 2 n and k, look for a unique index W ^ , associated with the sequence u n ( W ^ , k ) C u ( k ) , such that Y n , S 2 n , u n ( W ^ , k ) T ϵ ( n ) ( Y , U , S 2 | v 2 n ( k ) ) . If a unique such W ^ is found, declare that the sent message was W ^ . Otherwise, if no unique index W ^ exists, declare an error.
Analysis of the probability of error
Without loss of generality, let us assume that the message W = 1 was sent and the index k that corresponds to S 2 n is k = 1 ; i.e., v 2 n ( 1 ) corresponds to S 2 n and u n ( 1 , 1 ) is chosen according to W = 1 , v 2 n ( 1 ) .
Define the following events:
E 1 : = v 2 n ( k ) C v , S 2 n , v 2 n ( k ) T ϵ ( n ) ( S 2 , V 2 ) E 2 : = ( u n ( 1 , 1 ) , Y n , S 2 n ) T ϵ ( n ) ( U , Y , S 2 | v 2 n ( 1 ) ) E 3 : = w 1 : u n ( w , 1 ) C u ( 1 ) and u n ( w , 1 ) , Y n , S 2 n T ϵ ( n ) ( U , Y , S 2 | v 2 n ( 1 ) ) .
The probability of error P e ( n ) is upper bounded by P e n P ( E 1 ) + P ( E 2 | E 1 c ) + P ( E 3 | E 1 c , E 2 c ) . Using standard arguments and assuming that ( S 1 n , S 2 n ) T ϵ ( n ) ( S 1 , S 2 ) and that n is large enough, we can state that
  • For each sequence v 2 n C v , the probability that v 2 n is not jointly typical with S 2 n is at most 1 2 n ( I ( V 2 ; S 2 ) + ϵ ) . Therefore, having 2 n ( I ( V 2 ; S 2 ) + 2 ϵ ) i . i . d . sequences in C v , the probability that none of those sequences is jointly typical with S 2 n is bounded by
    P ( E 1 ) 1 2 n ( I ( V 2 ; S 2 ) + ϵ ) 2 n ( I ( V 2 ; S 2 ) + 2 ϵ ) e 2 n ( I ( V 2 ; S 2 ) + 2 ϵ ) 2 n ( I ( V 2 ; S 2 ) + ϵ ) ( A 32 ) = e 2 n ϵ ,
    where, for every ϵ > 0 , the last line goes to zero as n goes to infinity.
  • The random variable Y n is distributed according to p ( y | x , s 1 , s 2 ) = p ( y | x , s 1 , s 2 , v 2 ) , therefore, having ( S 2 n , v 2 n ( 1 ) ) T ϵ ( n ) ( S 2 , V 2 ) implies that ( Y n , S 2 n , v 2 n ( 1 ) ) T ϵ ( n ) ( Y , S 2 , V 2 ) . Recall that x i = f u i ( 1 , 1 ) , S 1 , i , v 2 ( 1 ) and that U n is generated according to p ( u | v 2 ) ; therefore, ( X n , S 1 n , u n ( 1 , 1 ) , v 2 n ( 1 ) ) is jointly typical. Thus, by the Markov lemma [31], we can state that ( Y n , S 2 n , u n ( 1 , 1 ) , v 2 n ( 1 ) ) T ϵ ( n ) ( Y , S 2 , U , V 2 ) with high probability for a large enough n.
  • Now, the probability for a random U n , such that ( U n , v 2 n ( 1 ) ) T ϵ ( n ) ( U , V 2 ) , to be also jointly typical with ( Y n , S 2 n , v 2 n ( 1 ) ) is upper bounded by 2 n ( I ( U , Y , S 2 | V 2 ) ϵ ) , hence
    P ( E 3 | E 1 c , E 2 c ) 1 < w | C u ( 1 ) | Pr u n ( w , 1 ) , Y n , S 2 n T ϵ ( n ) ( U , Y , S 2 | v 2 n ( 1 ) ) 1 < w | C u ( 1 ) | 2 n ( I ( U , Y , S 2 | V 2 ) ϵ ) 2 n ( I ( U , Y , S 2 | V 2 ) 2 ϵ ) 2 n ( I ( U , Y , S 2 | V 2 ) ϵ ) ( A 33 ) = 2 n ϵ ,
    which goes to zero exponentially fast with n for every ϵ > 0 .
    Therefore, P e ( n ) = P ( W ^ W ) goes to zero as n .
Converse: (Channel capacity Case 2 C ). Fix the rates R and R and a sequence of codes ( n , 2 n R , 2 n R ) that achieve the capacity. By Fano’s inequality, H ( W | Y n , S 2 n ) n ϵ n , where ϵ n 0 as n . Let T 2 = f v ( S 2 n ) , and define V 2 , i = ( T 2 , Y i 1 , S 2 i 1 ) , U i = W . Then,
n R H ( T 2 ) H ( T 2 ) H ( T 2 | S 2 n ) = I ( T 2 ; S 2 n ) = H ( S 2 n ) H ( S 2 n | T 2 ) = i = 1 n H ( S 2 , i | S 2 i 1 ) H ( S 2 , i | T 2 , S 2 i 1 ) = ( a ) i = 1 n H ( S 2 , i ) H ( S 2 , i | T 2 , S 2 i 1 , Y i 1 ) = i = 1 n I ( S 2 , i ; T 2 , Y i 1 , S 2 i 1 ) ( A 34 ) = i = 1 n I ( S 2 , i ; V 2 , i ) ,
where ( a ) follows from the fact that S 2 , i is independent of S 2 i 1 and the fact that S 2 , i is independent of Y i 1 given ( T 2 , S 2 i 1 ) . The proof for this follows.
p ( y i 1 | t 2 , s 2 i 1 , s 2 , i ) = w , x i 1 , s 1 i 1 p ( y i 1 , w , x i 1 , s 1 i 1 | t 2 , s 2 i 1 , s 2 , i ) = w , x i 1 , s 1 i 1 p ( w ) p ( s 1 i 1 | s 2 i 1 ) p ( x i 1 | w , t 2 , s 1 i 1 ) p ( y i 1 | x i 1 , s 1 i 1 , s 2 i 1 ) ( A 35 ) = p ( y i 1 | t 2 , s 2 i 1 ) ,
where we used the fact that W is independent of ( T 2 , S 2 i 1 , S 2 , i ) , S 1 i 1 is independent of ( T 2 , S 2 , i ) given S 2 i 1 , X i 1 is a function of ( W , T 2 , S 1 i 1 ) and that Y i 1 is independent of ( W , T 2 , S 2 , i ) given ( X i 1 , S 1 i 1 , S 2 i 1 ) . We now continue with the proof of the converse.
n R H ( W ) H ( W | T 2 ) H ( W | T 2 , Y n , S 2 n ) + n ϵ n = I ( W ; Y n , S 2 n | T 2 ) + n ϵ n = i = 1 n I ( W ; Y i , S 2 , i | T 2 , Y i 1 , S 2 i 1 ) + n ϵ n ( A 36 ) = i = 1 n I ( U i ; Y i , S 2 , i | V 2 , i ) + n ϵ n
and therefore, from (A34) and (A36)
R 1 n i = 1 n I ( S 2 , i ; V 2 , i )
R ϵ n 1 n i = 1 n I ( U i ; Y i , S 2 , i | V 2 , i ) .
Using the convexity of R and Jensen’s inequality, the standard time-sharing argument for R and the fact that ϵ n 0 as n , we can conclude that
R I ( V 2 ; S 2 ) ,
R I ( U ; Y , S 2 | V 2 ) .
Notice that the Markov chain V 2 , i S 2 , i S 1 , i holds since ( Y i 1 , S 2 i 1 ) is independent of S 1 , i and T 2 ( S 2 n ) is dependent on S 1 , i only through S 2 , i . Notice also that the Markov chain U i V 2 , i ( S 1 , i , S 2 , i ) holds since
p ( w | t 2 , y i 1 , s 2 i 1 , s 1 , i , s 2 , i ) = x i 1 , s 1 i 1 p ( w , x i 1 , s 1 i 1 | t 2 , y i 1 , s 2 i 1 , s 1 , i , s 2 , i ) = x i 1 , s 1 i 1 p ( s 1 i 1 | t 2 , y i 1 , s 2 i 1 ) p ( x i 1 | t 2 , y i 1 , s 1 i 1 , s 2 i 1 ) p ( w | t 2 , x i 1 , s 1 i 1 ) ( A 41 ) = p ( w | t 2 , y i 1 , s 2 i 1 ) .
This concludes the converse, and the proof of Theorem 1 Case 2 C .

Appendix C. Proof of Theorem 2

In this section, we provide the proof of Theorem 2, Cases 1 and 1 C . Case 2, where the encoder is informed with increased ESI and the decoder is informed with DSI, is a special case of [10] for K = 1 and, therefore, the proof for this case is omitted. Following Kaspi’s scheme (Figure A3) for K = 1 , at the first stage, node W sends a description of W at a rate limited to R w ; then, after reconstructing W ^ , node Z sends a function of Z and W ^ back to node W at a rate limited to R z . Let S 2 play the role of W in Kaspi’s scheme and let ( X , S 1 ) play the role of Z. Consider the distortion d ( Z i , Z ^ i ) = d ( ( X i , S 1 , i ) , ( X ^ i , S ^ 1 , i ) ) = d ( X i , X ^ i ) , so that D z = D . Then, it is apparent that Case 2 of the rate-distortion problems is a special case of Kaspi’s two-way problem for K = 1 .
Figure A3. Kaspi’s two-way source coding scheme. The total rates are R w = k = 1 K R w k and R z = k = 1 K R z k and the expected per-letter distortions are D w = E 1 n i = 1 n d ( W i , W ^ i ) and D z = E 1 n i = 1 n d ( Z i , Z ^ i ) .

Appendix C.1. Proof of Theorem 2, Case 1

Rate-distortion Case 1 is presented in Figure A4. We use the Wyner-Ziv coding scheme for the description of the ESI, S 1 , at a rate R , where the source is S 1 and the side information at the decoder is S 2 . Then, to describe the main source, X, with distortion less than or equal to D we use the Wyner-Ziv coding scheme again, where this time, S 2 is the side information at the decoder, S 1 is a part of the source and the rate-limited description of S 1 is the side information at both the encoder and the decoder. Notice that I ( U ; X , S 1 | V 1 ) − I ( U ; S 2 | V 1 ) = I ( U ; X , S 1 , V 1 ) − I ( U ; S 2 , V 1 ) and that, since the Markov chain V 1 S 1 S 2 holds, the constraint can also be written as R ≥ I ( V 1 ; S 1 ) − I ( V 1 ; S 2 ) ; we use these expressions in the following proof.
Figure A4. Rate-distortion: Case 1. R 1 ( D ) = min I ( U ; X , S 1 | V 1 ) I ( U ; S 2 | V 1 ) , where the minimization is over all PMFs p ( v 1 | s 1 ) p ( u | x , s 1 , v 1 ) p ( x ^ | u , s 2 , v 1 ) such that R I ( V 1 ; S 1 | S 2 ) and E d ( X , X ^ ) D .
Achievability: (Rate-distortion Case 1). Given ( X i , S 1 , i , S 2 , i ) i . i . d . p ( x , s 1 , s 2 ) and the distortion measure D, fix p ( x , s 1 , s 2 , v 1 , u , x ^ ) = p ( x , s 1 , s 2 ) p ( v 1 | s 1 ) p ( u | x , s 1 , v 1 ) p ( x ^ | u , s 2 , v 1 ) that satisfies E d ( X , X ^ ) = D and x ^ = f ( u , s 2 , v 1 ) .
Codebook generation and random binning
  • Generate a codebook, C v , of 2 n I ( V 1 ; S 1 ) + 2 ϵ sequences, V 1 n , independently using i . i . d . p ( v 1 ) . Label them v 1 n ( k ) , where k 1 , 2 , , 2 n ( I ( V 1 ; S 1 ) + 2 ϵ ) and randomly assign each sequence v 1 n ( k ) a bin number b v v 1 n ( k ) in the set 1 , 2 , , 2 n R .
  • Generate a codebook C u of 2 n I ( U ; X , S 1 , V 1 ) + 2 ϵ sequences U n independently using i . i . d . p ( u ) . Label them u n ( l ) , where l 1 , 2 , , 2 n ( I ( U ; X , S 1 , V 1 ) + 2 ϵ ) , and randomly assign each u n ( l ) a bin number b u u n ( l ) in the set 1 , 2 , , 2 n R .
Reveal the codebooks and the content of the bins to all encoders and decoders.
Encoding
  • State Encoder: Given the sequence S 1 n , search the codebook C v and identify an index k such that S 1 n , v 1 n ( k ) T ϵ ( n ) ( S 1 , V 1 ) . If such a k is found, stop searching and send the bin number j = b v v 1 n ( k ) . If no such k is found, declare an error.
  • Encoder: Given the sequences X n , S 1 n and v 1 n ( k ) , search the codebook C u and identify an index l such that X n , S 1 n , v 1 n ( k ) , u n ( l ) T ϵ ( n ) ( X , S 1 , V 1 , U ) . If such an l is found, stop searching and send the bin number w = b u u n ( l ) . If no such l is found, declare an error.
Decoding
Given the bin indices w and j and the sequence S 2 n , search the codebook C v and identify an index k such that S 2 n , v 1 n ( k ) T ϵ ( n ) ( S 2 , V 1 ) and b v v 1 n ( k ) = j . If no such k is found or there is more than one such index, declare an error. If a unique k, as defined, is found, search the codebook C u and identify an index l such that S 2 n , v 1 n ( k ) , u n ( l ) T ϵ ( n ) ( S 2 , V 1 , U ) and b u u n ( l ) = w . If a unique l, as defined, is found, declare X ^ i = f i ( u i ( l ) , S 2 , i , v 1 , i ( k ) ) , i = 1 , 2 , , n . Otherwise, if there is no such l or there is more than one, declare an error.
Analysis of the probability of error
Without loss of generality, for the following events E 2 , E 3 , E 4 , E 5 and E 6 , assume that v 1 n ( k = 1 ) and b v v 1 n ( k = 1 ) = 1 correspond to the sequences ( X n , S 1 n , S 2 n ) and for the events E 5 and E 6 assume that u n ( l = 1 ) and b u u n ( l = 1 ) = 1 correspond to the same given sequences. Define the following events:
E 1 : = v 1 n ( k ) C v , S 1 n , v 1 n ( k ) T ϵ ( n ) ( S 1 , V 1 ) E 2 : = S 1 n , v 1 n ( 1 ) T ϵ ( n ) ( S 1 , V 1 )   but   S 2 n , v 1 n ( 1 ) T ϵ ( n ) ( S 2 , V 1 ) E 3 : = k 1   such   that   b v v 1 n ( k ) = 1   and   S 2 n , v 1 n ( k ) T ϵ ( n ) ( S 2 , V 1 ) E 4 : = u n ( l ) C u , X n , S 1 n , v 1 n ( 1 ) , u n ( l ) T ϵ ( n ) ( X , S 1 , V 1 , U ) E 5 : = X n , S 1 n , v 1 n ( 1 ) , u n ( 1 ) T ϵ ( n ) ( X , S 1 , V 1 , U )   but   S 2 n , v 1 n ( 1 ) , u n ( 1 ) T ϵ ( n ) ( S 2 , V 1 , U ) E 6 : = l 1   such   that   b u u n ( l ) = 1   and   S 2 n , v 1 n ( 1 ) , u n ( l ) T ϵ ( n ) ( S 2 , V 1 , U ) .
The probability of error P e ( n ) is upper bounded by P e n P ( E 1 ) + P ( E 2 | E 1 c ) + P ( E 3 | E 1 c , E 2 c ) + P ( E 4 | E 1 c , E 2 c , E 3 c ) + P ( E 5 | E 1 c , , E 4 c ) + P ( E 6 | E 1 c , , E 5 c ) . Using standard arguments and assuming that ( X n , S 1 n , S 2 n ) T ϵ ( n ) ( X , S 1 , S 2 ) and that n is large enough, we can state that
  •  
    P ( E 1 ) = Pr { v 1 n ( k ) C v S 1 n , v 1 n ( k ) T ϵ ( n ) ( S 1 , V 1 ) } k = 1 2 n I ( V 1 ; S 1 ) + ϵ Pr { S 1 n , V 1 n ( k ) T ϵ ( n ) ( S 1 , V 1 ) } e 2 n I ( V 1 ; S 1 ) + 2 ϵ 2 n I ( S 1 ; V 1 ) n ϵ ( A 42 ) = e 2 n ϵ .
    The probability that there is no v 1 n ( k ) in C v such that S 1 n , v 1 n ( k ) is strongly jointly typical is exponentially small provided that | C v | > 2 n I ( S 1 ; V 1 ) + ϵ . This follows from the standard rate-distortion argument that 2 n I ( S 1 ; V 1 ) v 1 n ( k ) s “cover” S 1 n , therefore P ( E 1 ) 0 .
  • By the Markov lemma, since ( S 1 n , S 2 n ) are strongly jointly typical and S 1 n , v 1 n ( 1 ) are strongly jointly typical and the Markov chain V 1 S 1 S 2 holds, then S 1 n , S 2 n , v 1 n ( 1 ) are also strongly jointly typical. Thus, P ( E 2 | E 1 c ) 0 .
  •  
    P ( E 3 | E 1 c , E 2 c ) = Pr { v 1 n ( k 1 ) b v v 1 n ( k ) = 1 S 2 n , v 1 n ( k ) T ϵ ( n ) ( S 2 , V 1 ) } v 1 n ( k 1 ) b v v 1 n ( k ) = 1 Pr S 2 n , v 1 n ( k ) T ϵ ( n ) ( S 2 , V 1 ) ( A 43 ) 2 n ( I ( V 1 ; S 1 ) + 2 ϵ R ) 2 n ( I ( V 1 ; S 2 ) ϵ ) .
    The probability that there is another index k , k 1 , such that v 1 n ( k ) is in bin number 1 and that it is strongly jointly typical with S 2 n is bounded by the number of v 1 n ( k ) ’s in the bin times the probability of joint typicality. Therefore, if R > I ( V 1 ; S 1 ) I ( V 1 ; S 2 ) + 3 ϵ then P ( E 3 | E 1 c , E 2 c ) 0 . Furthermore, using the Markov chain V 1 S 1 S 2 , we can see that the inequality can be presented as R > I ( V 1 ; S 1 | S 2 ) + 3 ϵ .
  • We use here the same argument we used for P ( E 1 ) . By the covering lemma we can state that the probability that there is no u n ( l ) in C u that is strongly jointly typical with X n , S 1 n , v 1 n ( k ) tends to 0 as n if R u > I ( U ; X , S 1 , V 1 ) + ϵ . Hence, P ( E 4 | E 1 c , E 2 c , E 3 c ) 0 .
  • We use here the same argument we used for P ( E 2 | E 1 c ) . Since ( U , X , S 1 , V 1 ) are strongly jointly typical, ( X , S 1 , S 2 ) are strongly jointly typical and the Markov chain ( U , V 1 ) ( X , S 1 ) S 2 holds, then ( U , X , S 1 , S 2 , V 1 ) are also strongly jointly typical with high probability. Hence, P ( E 5 | E 1 c , , E 4 c ) 0 .
  • The probability that there is another index l , l 1 such that u n ( l ) is in bin number 1 and that it is strongly jointly typical with S 2 n , v 1 n ( 1 ) is exponentially small provided that R I ( U ; X , S 1 , V 1 ) I ( U ; S 2 , V 1 ) + 3 ϵ = I ( U ; X , S 1 | V 1 ) I ( U ; S 2 | V 1 ) + 3 ϵ . Notice that 2 n ( I ( U ; X , S 1 , V 1 ) R ) stands for the average number of sequences u n ( l ) ’s in each bin indexed w for w { 1 , 2 , , 2 n R } .
This shows that for rates R and R as described, and for large enough n, the error events are of arbitrarily small probability. This concludes the proof of the achievability for the source coding Case 1.
Converse: (Rate-distortion Case 1). Fix a distortion measure D, the rates R , R R ( D ) = min I ( U ; X , S 1 | V 1 ) I ( U ; S 2 | V 1 ) = min I ( U ; X , S 1 | S 2 , V 1 ) and a sequence of codes ( n , 2 n R , 2 n R ) such that E 1 n i = 1 n d ( X i , X ^ i ) = D . Let T 1 = f v ( S 1 n ) , T = f ( X n , S 1 n , T 1 ) and define V 1 , i = ( T 1 , S 1 , i + 1 n , S 2 i 1 , S 2 , i + 1 n ) and U i = T . Notice that X ^ i = X ^ i ( T , T 1 , S 2 n ) and, therefore, X ^ i is a function of ( U i , V 1 , i , S 2 , i ) .
n R H ( T 1 ) H ( T 1 | S 2 n ) H ( T 1 | S 1 n , S 2 n ) = I ( T 1 ; S 1 n | S 2 n ) = H ( S 1 n | S 2 n ) H ( S 1 n | T 1 , S 2 n ) = i = 1 n H ( S 1 , i | S 1 , i + 1 n , S 2 n ) H ( S 1 , i | T 1 , S 1 , i + 1 n , S 2 n ) = ( a ) i = 1 n H ( S 1 , i | S 2 , i ) H ( S 1 , i | T 1 , S 1 , i + 1 n , S 2 i 1 , S 2 , i + 1 n , S 2 , i ) = i = 1 n H ( S 1 , i | S 2 , i ) H ( S 1 , i | V 1 , i , S 2 , i ) ( A 44 ) = i = 1 n I ( S 1 , i ; V 1 , i | S 2 , i ) ,
where ( a ) follows from the fact that S 1 , i is independent of ( S 1 , i + 1 n , S 2 i 1 , S 2 , i + 1 n ) given S 2 , i .
n R H ( T ) H ( T | T 1 , S 2 n ) H ( T | T 1 , X n , S 1 n , S 2 n ) = I ( T ; X n , S 1 n | T 1 , S 2 n ) = H ( X n , S 1 n | T 1 , S 2 n ) H ( X n , S 1 n | T , T 1 , S 2 n ) = i = 1 n H ( X i , S 1 , i | T 1 , S 2 n , X i + 1 n , S 1 , i + 1 n ) H ( X i , S 1 , i | T , T 1 , S 2 n , X i + 1 n , S 1 , i + 1 n ) = ( b ) i = 1 n H ( X i , S 1 , i | T 1 , S 1 , i + 1 n , S 2 n ) H ( X i , S 1 , i | T , T 1 , S 2 n , X i + 1 n , S 1 , i + 1 n ) ( c ) i = 1 n H ( X i , S 1 , i | T 1 , S 1 , i + 1 n , S 2 n ) H ( X i , S 1 , i | T , T 1 , S 1 , i + 1 n , S 2 n ) = i = 1 n I ( X i , S 1 , i ; T | T 1 , S 1 , i + 1 n , S 2 n ) = i = 1 n I ( X i , S 1 , i ; U i | V 1 , i , S 2 , i ) = i = 1 n R E d X i , X ^ i ( d ) n R E 1 n i = 1 n d X i , X ^ i ( A 45 ) = n R ( D ) ,
where ( b ) follows from the fact that ( X i , S 1 , i ) is independent of X i + 1 n given ( T 1 , S 1 , i + 1 n , S 2 n ) ; this is because X i + 1 n is independent of ( T 1 , X i , S 1 i ) given ( S 1 , i + 1 n , S 2 , i + 1 n ) , ( c ) follows from the fact that conditioning reduces entropy and ( d ) follows from the convexity of R ( D ) and Jensen’s inequality.
Using also the convexity of R and Jensen’s inequality, we can conclude that
R I ( V 1 ; S 1 | S 2 ) ,
R I ( U ; X , S 1 | V 1 , S 2 ) .
It is easy to verify that ( T 1 , S 1 , i + 1 n , S 2 i 1 , S 2 , i + 1 n ) S 1 , i S 2 , i forms a Markov chain, since T 1 ( S 1 n ) depends on S 2 , i only through S 1 , i . The structure T ( T 1 , S 1 , i + 1 n , S 2 i 1 , S 2 , i + 1 n , X i , S 1 , i ) S 2 , i also forms a Markov chain since S 2 , i contains no information about ( S 1 i 1 , X i 1 , X i + 1 n ) given ( T 1 , S 1 , i n , S 2 i 1 , S 2 , i + 1 n , X i ) and, therefore, contains no information about T ( X n , S 1 n , T 1 ) .
This concludes the converse, and the proof of Theorem 2 Case 1.

Appendix C.2. Proof of Theorem 2, Case 1C

Rate-distortion Case 1 C is illustrated in Figure A5. For describing the ESI, S 1 , with a rate R we use the standard rate-distortion coding scheme. Then, for the main source, X, we use a Weissman-El Gamal [12] coding scheme where the DSI, S 2 , is the causal side information at the decoder, S 1 is a part of the source and the rate-limited description of S 1 is the side information at both the encoder and decoder.
Figure A5. Rate-distortion: Case 1 with causal DSI. R 1 C ( D ) = min I ( U ; X , S 1 | V 1 ) , where the minimization is over all PMFs p ( v 1 | s 1 ) p ( u | x , s 1 , v 1 ) p ( x ^ | u , s 2 , v 1 ) such that R I ( V 1 ; S 1 ) and E d ( X , X ^ ) D .
Achievability: (Rate-distortion Case 1 C ). Given ( X i , S 1 , i , S 2 , i ) i . i . d . p ( x , s 1 , s 2 ) where the DSI is known in a causal way ( S 2 i at time i) and the distortion measure is D, fix p ( x , s 1 , s 2 , v 1 , u , x ^ ) = p ( x , s 1 , s 2 ) p ( v 1 | s 1 ) p ( u | x , s 1 , v 1 ) p ( x ^ | u , s 2 , v 1 ) that satisfies E d ( X , X ^ ) = D and x ^ = f ( u , s 2 , v 1 ) .
Codebook generation and random binning
  • Generate a codebook C v of 2 n I ( V 1 ; S 1 ) + 2 ϵ sequences V 1 n independently using i . i . d . p ( v 1 ) . Label them v 1 n ( k ) where k 1 , 2 , , 2 n ( I ( V 1 ; S 1 ) + 2 ϵ ) .
  • For each v 1 n ( k ) generate a codebook C u ( k ) of 2 n I ( U ; X , S 1 | V 1 ) + 2 ϵ sequences U n distributed independently according to i . i . d . p ( u | v 1 ) . Label them u n ( w , k ) , where w 1 , 2 , , 2 n ( I ( U ; X , S 1 | V 1 ) + 2 ϵ ) .
Reveal the codebooks to all encoders and decoders.
Encoding
  • State Encoder: Given the sequence S 1 n , search the codebook C v and identify an index k such that v 1 n ( k ) , S 1 n T ϵ ( n ) ( V 1 , S 1 ) . If such a k is found, stop searching and send it. Otherwise, if no such k is found, declare an error.
  • Encoder: Given X n , S 1 n and the index k, search the codebook C u ( k ) and identify an index w such that u n ( w , k ) , X n , S 1 n T ϵ ( n ) ( U , X , S 1 | v 1 n ( k ) ) . If such an index w is found, stop searching and send it. Otherwise, declare an error.
Decoding
Given the indices w , k and the sequence S 2 i at time i , declare x ^ i = f u i ( w , k ) , S 2 , i , v 1 , i ( k ) .
Analysis of the probability of error
Without loss of generality, let us assume that v 1 n ( 1 ) corresponds to S 1 n and that u n ( 1 , 1 ) corresponds to ( X n , S 1 n , v 1 n ( 1 ) ) .
Define the following events:
E 1 : = v 1 n ( k ) C v , v 1 n ( k ) , S 1 n T ϵ ( n ) ( S 1 , V 1 ) E 2 : = u n ( w , 1 ) C u ( 1 ) , X n , S 1 n , u n ( w , 1 ) T ϵ ( n ) ( X , S 1 , U )
The probability of error P e ( n ) is upper bounded by P e n P ( E 1 ) + P ( E 2 | E 1 c ) . Assuming that ( S 1 n , S 2 n ) T ϵ ( n ) ( S 1 , S 2 ) , we can state that by the standard rate-distortion argument, having more than 2 n ( I ( V 1 ; S 1 ) + ϵ ) sequences v 1 n ( k ) in C v and a large enough n assures us with probability arbitrarily close to 1 that we would find an index k such that v 1 n ( k ) , S 1 n T ϵ ( n ) ( V 1 , S 1 ) . Therefore, P ( E 1 ) 0 as n . Now, if v 1 n ( 1 ) , S 1 n T ϵ ( n ) ( V 1 , S 1 ) , using the same argument, we can also state that having more than 2 n ( I ( U ; X , S 1 | V 1 ) + ϵ ) sequences u n ( w , 1 ) in C u ( 1 ) assures us that P ( E 2 | E 1 c ) 0 as n . This concludes the proof of the achievability.
Converse: (Rate-distortion Case 1 C ). Fix a distortion measure D, the rates R , R R ( D ) = min I ( U ; X , S 1 | V 1 ) and a sequence of codes ( n , 2 n R , 2 n R ) such that E 1 n i = 1 n d ( X i , X ^ i ) = D . Let T 1 = f v ( S 1 n ) , T = f ( X n , S 1 n , T 1 ) and define V 1 , i = ( T 1 , S 1 , i + 1 n ) , U i = T . Notice that X ^ i = X ^ i ( T , T 1 , S 2 i ) , and, therefore, X ^ i is a function of ( U i , V 1 , i , S 2 i ) .
n R H ( T 1 ) H ( T 1 ) H ( T 1 | S 1 n ) = I ( T 1 ; S 1 n ) = H ( S 1 n ) H ( S 1 n | T 1 ) = i = 1 n H ( S 1 , i | S 1 , i + 1 n ) H ( S 1 , i | T 1 , S 1 , i + 1 n ) = ( a ) i = 1 n H ( S 1 , i ) H ( S 1 , i | T 1 , S 1 , i + 1 n ) = i = 1 n H ( S 1 , i ) H ( S 1 , i | V 1 , i ) ( A 48 ) = i = 1 n I ( S 1 , i ; V 1 , i ) ,
where ( a ) follows the fact that S 1 , i is independent of S 1 , i + 1 n .
n R H ( T ) H ( T | T 1 ) H ( T | T 1 , X n , S 1 n ) = I ( T ; X n , S 1 n | T 1 ) = H ( X n , S 1 n | T 1 ) H ( X n , S 1 n | T , T 1 ) = i = 1 n H ( X i , S 1 , i | T 1 , X i + 1 n , S 1 , i + 1 n ) H ( X i , S 1 , i | T , T 1 , X i + 1 n , S 1 , i + 1 n ) = ( b ) i = 1 n H ( X i , S 1 , i | T 1 , S 1 , i + 1 n ) H ( X i , S 1 , i | T , T 1 , X i + 1 n , S 1 , i + 1 n ) ( c ) i = 1 n H ( X i , S 1 , i | T 1 , S 1 , i + 1 n ) H ( X i , S 1 , i | T , T 1 , S 1 , i + 1 n ) = i = 1 n I ( X i , S 1 , i ; T | T 1 , S 1 , i + 1 n ) = i = 1 n I ( X i , S 1 , i ; U i | V 1 , i ) = i = 1 n R E d X i , X ^ i ( d ) n R E 1 n i = 1 n d X i , X ^ i ( A 49 ) = n R ( D )
where ( b ) follows from the fact that ( X i , S 1 , i ) is independent of X i + 1 n given ( T 1 , S 1 , i + 1 n ) , ( c ) follows from the fact that conditioning reduces entropy and ( d ) follows from the convexity of R ( D ) and Jensen’s inequality.
Using also the convexity of R and Jensen’s inequality, we can conclude that
R I ( V 1 ; S 1 ) ,
R I ( U ; X , S 1 | V 1 ) .
It is easy to verify that both Markov chains V 1 , i S 1 , i ( X i , S 2 , i ) and U i ( X i , S 1 , i , V 1 , i ) S 2 , i hold. This concludes the converse, and the proof of Theorem 2 Case 1 C .

Appendix C.3. Proof of Theorem 2, Case 2

Rate-distortion Case 2 (see Figure A6) is a special case of [10] for K = 1 , and hence, the proof is omitted.
Figure A6. Rate distortion: Case 2. R 2 ( D ) = min I ( U ; X , S 1 | V 2 ) I ( U ; S 2 | V 2 ) , where the minimization is over all PMFs p ( v 2 | s 2 ) p ( u | x , s 1 , v 2 ) p ( x ^ | u , s 2 , v 2 ) such that R I ( V 2 ; S 2 ) I ( V 2 ; X , S 1 ) and E d ( X , X ^ ) D .

Appendix D. Proof of Lemma 1

We provide here a partial proof of Lemma 1. In the first part we prove the concavity of C 2 l b ( R ) in R for Case 2, the second part contains the proof that it is enough to take X to be a deterministic function of ( S 1 , V 1 , U ) in order to achieve the capacity C 1 ( R ) for Case 1 and in the third part we prove the cardinality bound for Case 1. The proofs of these three parts for the rest of the cases can be derived using the same techniques and therefore are omitted. The proof of Lemma 2 can also be readily concluded using the techniques we use in this appendix and is omitted as well.
Part 1: We prove here that for Case 2 of the channel capacity problems, the lower bound on the capacity, C 2 l b ( R ) , is a concave function of the state information rate, R . Recall that the expression for C 2 l b is C 2 l b ( R ) = max I ( U ; Y , S 2 | V 2 ) I ( U ; S 1 | V 2 ) where the maximization is over all probabilities p ( s 1 , s 2 ) p ( v 2 | s 2 ) p ( u | s 1 , v 2 ) p ( x | u , s 1 , v 2 ) p ( y | x , s 1 , s 2 ) such that R I ( V 2 ; S 2 | S 1 ) . This means that we want to prove that for any two rates, R ( 1 ) and R ( 2 ) , and for any 0 α 1 and α ¯ = 1 α the lower bound satisfies C 2 l b ( α R ( 1 ) + α ¯ R ( 2 ) ) ≥ α C 2 l b ( R ( 1 ) ) + α ¯ C 2 l b ( R ( 2 ) ) . Let ( U ( 1 ) , V 2 ( 1 ) , X ( 1 ) , Y ( 1 ) ) and ( U ( 2 ) , V 2 ( 2 ) , X ( 2 ) , Y ( 2 ) ) be the random variables that meet the conditions on R ( 1 ) and on R ( 2 ) and also achieve C 2 l b ( R ( 1 ) ) and C 2 l b ( R ( 2 ) ) , respectively. Let us introduce the auxiliary random variable Q { 1 , 2 } , independent of S 1 , S 2 , V 2 , U , X and Y, and distributed according to Pr { Q = 1 } = α and Pr { Q = 2 } = α ¯ . Then, consider
α R ( 1 ) + α ¯ R ( 2 ) = α I ( V 2 ( 1 ) ; S 2 ) I ( V 2 ( 1 ) ; S 1 ) + α ¯ I ( V 2 ( 2 ) ; S 2 ) I ( V 2 ( 2 ) ; S 1 ) = ( a ) α I ( V 2 ( 1 ) ; S 2 | Q = 1 ) I ( V 2 ( 1 ) ; S 1 | Q = 1 ) + α ¯ I ( V 2 ( 2 ) ; S 2 | Q = 2 ) I ( V 2 ( 2 ) ; S 1 | Q = 2 ) = ( b ) I ( V 2 ( Q ) ; S 2 | Q ) I ( V 2 ( Q ) ; S 1 | Q ) ( A 52 ) = ( c ) I ( V 2 ( Q ) , Q ; S 2 ) I ( V 2 ( Q ) , Q ; S 1 ) ,
and
α C 2 l b ( R ( 1 ) ) + α ¯ C 2 l b ( R ( 2 ) ) = α I ( U ( 1 ) ; Y ( 1 ) , S 2 | V 2 ( 1 ) ) I ( U ( 1 ) ; S 1 | V 2 ( 1 ) ) + α ¯ I ( U ( 2 ) ; Y ( 2 ) , S 2 | V 2 ( 2 ) ) I ( U ( 2 ) ; S 1 | V 2 ( 2 ) ) ( A 53 ) = ( d ) I ( U ( Q ) ; Y ( Q ) , S 2 | V 2 ( Q ) , Q ) I ( U ( Q ) ; S 1 | V 2 ( Q ) , Q ) ,
where ( a ) , ( b ) , ( c ) and ( d ) all follow from the fact that Q is independent of ( S 1 , S 2 , V 2 , U , X , Y ) and from Q’s probability distribution. Now, let V 2 = ( V 2 ( Q ) , Q ) , U = U ( Q ) , Y = Y ( Q ) and X = X ( Q ) . Then, following from the equalities above, for any two rates R ( 1 ) and R ( 2 ) and for any 0 α 1 , there exists a set of random variables ( U , V 2 , X , Y ) that maintains
α R ( 1 ) + α ¯ R ( 2 ) = I ( V 2 ; S 2 ) I ( V 2 ; S 1 ) ,
and
C 2 l b α R ( 1 ) + α ¯ R ( 2 ) I ( U ; Y , S 2 | V 2 ) I ( U ; S 1 | V 2 ) ( A 55 ) = α C 2 l b ( R ( 1 ) ) + α ¯ C 2 l b ( R ( 2 ) ) .
This completes the proof of the concavity of C 2 l b ( R ) in R .
Part 2: We prove here that it is enough to take X to be a deterministic function of ( U , S 1 , V 1 ) in order to maximize I ( U ; Y , S 2 , V 1 ) I ( U ; S 1 , V 1 ) . Fix p ( u , v 1 | s 1 ) . Note that
p ( y , s 2 | u , v 1 ) = x , s 1 p ( s 1 | u , v 1 ) p ( s 2 | s 1 , v 1 , u ) p ( x | s 1 , s 2 , v 1 , u ) p ( y | x , s 1 , s 2 , v 1 , u ) ( A 56 ) = x , s 1 p ( s 1 | u , v 1 ) p ( s 2 | s 1 ) p ( x | s 1 , v 1 , u ) p ( y | x , s 1 , s 2 )
is linear in p ( x | u , v 1 , s 1 ) . This follows from the fact that fixing p ( u , v 1 | s 1 ) also defines p ( s 1 | u , v 1 ) and from the following Markov chains: S 2 S 1 ( V 1 , U ) , X ( S 1 , V 1 , U ) S 2 and Y ( X , S 1 , S 2 ) ( V 1 , U ) . Hence, since I ( U ; Y , S 2 | V 1 ) is convex in p ( y , s 2 | u , v 1 ) , it is also convex in p ( x | u , v 1 , s 1 ) . Noting also that I ( U ; S 1 | V 1 ) is constant given a fixed p ( u , v 1 | s 1 ) , we can conclude that I ( U ; Y , S 2 | V 1 ) I ( U ; S 1 | V 1 ) is convex in p ( x | u , v 1 , s 1 ) and, hence, it attains its maximum at an extreme point of the set of conditional PMFs p ( x | u , v 1 , s 1 ) , i.e., when each p ( x | u , v 1 , s 1 ) equals 0 or 1 . This implies that X can be expressed as a deterministic function of ( U , V 1 , S 1 ) .
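The convexity argument of Part 2 can also be checked numerically. The sketch below is an assumption-laden illustration, not taken from the paper: it draws random PMFs p ( s 1 , s 2 ) , p ( u , v 1 | s 1 ) and p ( y | x , s 1 , s 2 ) over binary alphabets, moves p ( x | u , v 1 , s 1 ) along a segment between two deterministic maps f 0 and f 1 , and evaluates I ( U ; Y , S 2 | V 1 ) − I ( U ; S 1 | V 1 ) ; convexity along the segment implies the maximum is attained at one of the two deterministic endpoints.

```python
import numpy as np
from itertools import product

# Assumption-laden numerical illustration (not from the paper) of Part 2: for a fixed
# p(u, v1 | s1), the objective I(U; Y, S2 | V1) - I(U; S1 | V1) is convex in
# p(x | u, v1, s1), hence it is maximized at a deterministic mapping x = f(u, s1, v1).
# All distributions below are random placeholders over binary alphabets.
rng = np.random.default_rng(2)
A = range(2)

p_s1s2 = rng.random((2, 2)); p_s1s2 /= p_s1s2.sum()        # p(s1, s2)
p_uv_s1 = rng.random((2, 2, 2))                            # p(u, v1 | s1), axes (s1, u, v1)
p_uv_s1 /= p_uv_s1.sum(axis=(1, 2), keepdims=True)
p_y = rng.random((2, 2, 2, 2))                             # p(y | x, s1, s2), axes (x, s1, s2, y)
p_y /= p_y.sum(axis=3, keepdims=True)

f0 = lambda u, s1, v1: u                                   # two deterministic maps x = f(u, s1, v1)
f1 = lambda u, s1, v1: (u + s1 + v1) % 2

def cond_mi(p_abc):
    """I(A; B | C) from a joint array with axes (a, b, c)."""
    pc = p_abc.sum(axis=(0, 1)); pac = p_abc.sum(axis=1); pbc = p_abc.sum(axis=0)
    val = 0.0
    for a, b, c in product(*map(range, p_abc.shape)):
        p = p_abc[a, b, c]
        if p > 0:
            val += p * np.log2(p * pc[c] / (pac[a, c] * pbc[b, c]))
    return val

def objective(lam):
    # p(x | u, v1, s1) = lam * 1{x = f0} + (1 - lam) * 1{x = f1}
    joint = np.zeros((2, 2, 2, 2, 2, 2))                   # axes (u, v1, s1, s2, x, y)
    for u, v1, s1, s2, x, y in product(A, A, A, A, A, A):
        px = lam * (x == f0(u, s1, v1)) + (1 - lam) * (x == f1(u, s1, v1))
        joint[u, v1, s1, s2, x, y] = (p_s1s2[s1, s2] * p_uv_s1[s1, u, v1]
                                      * px * p_y[x, s1, s2, y])
    # I(U; (Y, S2) | V1): merge (s2, y) into one axis, keep v1 as the conditioning axis
    p_u_ys2_v1 = joint.sum(axis=(2, 4)).transpose(0, 2, 3, 1).reshape(2, 4, 2)
    # I(U; S1 | V1)
    p_u_s1_v1 = joint.sum(axis=(3, 4, 5)).transpose(0, 2, 1)
    return cond_mi(p_u_ys2_v1) - cond_mi(p_u_s1_v1)

for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"lambda = {lam:4.2f}: objective = {objective(lam):.4f} bits")
# Convexity in lambda implies the maximum over the segment is attained at lambda = 0 or 1.
```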
Part 3: We prove now the cardinality bound for Theorem 1. First, let us recall the support lemma [32] (p. 310). Let P ( Z ) be the set of PMFs on the set Z , and let the set P ( Z | Q ) P ( Z ) be a collection of PMFs p ( z | q ) on Z indexed by q Q . Let g j , j = 1 , , k , be continuous functions on P ( Z | Q ) . Then, for any Q F Q ( q ) , there exists a finite random variable Q p ( q ) taking at most k values in Q such that
E g j ( p Z | Q ( z | Q ) ) = Q g j ( p Z | Q ( z | q ) ) d F ( q ) ( A 57 ) = q g j ( p Z | q ( z | q ) ) p ( q ) .
We first reduce the alphabet size of V 1 while considering the alphabet size of U to be constant and then we calculate the cardinality of U. Consider the following continuous functions of p ( x , s 1 , s 2 , u | v 1 )
g j = { P X S 1 S 2 | V 1 ( j | v 1 ) , for j = 1 , 2 , , | X | | S 1 | | S 2 | 1 ; I ( V 1 ; S 1 ) I ( V 1 ; Y , S 2 ) , for j = | X | | S 1 | | S 2 | ; I ( U ; Y , S 2 | V 1 = v 1 ) I ( U ; S 1 | V 1 = v 1 ) , for j = | X | | S 1 | | S 2 | + 1 .
Then, by the support lemma, there exists a random variable V 1 with | V 1 | | X | | S 1 | | S 2 | + 1 such that p ( x , s 1 , s 2 ) , I ( V 1 ; S 1 ) I ( V 1 ; Y , S 2 ) and I ( U ; Y , S 2 | V 1 ) I ( U ; S 1 | V 1 ) are preserved. Notice that the probability of U might have changed due to changing V 1 ; we denote the corresponding U as U . Next, for v 1 V 1 and the corresponding probability p ( v 1 ) that we found in the previous step, we consider | X | | S 1 | | S 2 | | V 1 | continuous functions of p ( x , s 1 , s 2 , v 1 | u )
f j = { P X S 1 S 2 V 1 | U ( j | u ) , for j = 1 , 2 , , | X | | S 1 | | S 2 | | V 1 | 1 ; I ( U ; Y , S 2 | V 1 ) I ( U ; S 1 | V 1 ) , for j = | X | | S 1 | | S 2 | | V 1 | .
Thus, there exists a random variable U with | U | | X | | S 1 | | S 2 | | V 1 | such that the mutual information expressions above and all the desired Markov conditions are preserved. Notice that the expression I ( V 1 ; S 1 ) I ( V 1 ; Y , S 2 ) is being preserved since p ( x , s 1 , s 2 , v 1 ) is being preserved.
To conclude, we can bound the cardinality of the auxiliary random variables of Theorem 1 Case 1 by | V 1 | | X | | S 1 | | S 2 | + 1 and | U | | X | | S 1 | | S 2 | | V 1 | | X | | S 1 | | S 2 | | X | | S 1 | | S 2 | + 1 without limiting the generality of the solution. □

Appendix E. Proofs for Section 5

Appendix E.1. Proof of Lemma 4

Proof. 
For 0 α 1 and α ¯ = 1 α
J w ( α q 1 + α ¯ q 2 , α Q 1 + α ¯ Q 2 ) = s 1 , s 2 , v 2 , t , y p ( s 1 , s 2 ) w ( v 2 | s 2 ) p ( y | t , s 1 , s 2 , v 2 ) α q 1 + α ¯ q 2 log α Q 1 + α ¯ Q 2 α q 1 + α ¯ q 2 ( a ) s 1 , s 2 , v 2 , t , y p ( s 1 , s 2 ) w ( v 2 | s 2 ) p ( y | t , s 1 , s 2 , v 2 ) α q 1 log Q 1 q 1 + α ¯ q 2 log Q 2 q 2 ( A 60 ) = α J w ( q 1 , Q 1 ) + α ¯ J w ( q 2 , Q 2 ) ,
where ( a ) follows from the log-sum inequality:
i a i log a i b i a log a b ,
for i a i = a and i b i = b . ☐
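A minimal numerical check of the log-sum inequality (A61) used in step ( a ) is given below; the random vectors are arbitrary placeholders.

```python
import math
import random

# Minimal numerical check (not from the paper) of the log-sum inequality (A61):
#   sum_i a_i log(a_i / b_i)  >=  (sum_i a_i) log( (sum_i a_i) / (sum_i b_i) ).
random.seed(0)
for _ in range(5):
    a = [random.random() for _ in range(4)]
    b = [random.random() for _ in range(4)]
    lhs = sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))
    rhs = sum(a) * math.log(sum(a) / sum(b))
    assert lhs >= rhs - 1e-12
    print(f"lhs = {lhs:.4f} >= rhs = {rhs:.4f}")
```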

Appendix E.2. Proof of Lemma 6

Proof. 
Let us calculate q * using the KKT conditions. We want to maximize J w ( q * , Q ) over q * , where for all t , s 1 and v 2 , 0 q * ( t | s 1 , v 2 ) 1 and t q * ( t | s 1 , v 2 ) = 1 .
For fixed s 1 and v 2 ,
( A 62 ) 0 = q * J w ( q * , Q ) + 1 t q * ( t | s 1 , v 2 ) ν s 1 , v 2 ( A 63 ) = s 2 , y p ( s 1 , s 2 ) w ( v 2 | s 2 ) p ( y | t , s 1 , s 2 , v 2 ) log Q ( t | y , s 2 , v 2 ) q * ( t | s 1 , v 2 ) 1 ν s 1 , v 2 ,
divide by p ( s 1 , v 2 ) ,
0 = log q * ( t | s 1 , v 2 ) + s 2 , y p ( s 1 , s 2 ) w ( v 2 | s 2 ) p ( y | t , s 1 , s 2 , v 2 ) p ( s 1 , v 2 ) log Q ( t | y , s 2 , v 2 ) 1 + ν s 1 v 2 p ( s 1 , v 2 ) ,
define 1 + ν s 1 v 2 p ( s 1 , v 2 ) = log ν s 1 , v 2 , hence
q * ( t | s 1 , v 2 ) = ν s 1 , v 2 ∏ s 2 , y Q ( t | y , s 2 , v 2 ) ^ { p ( s 2 | s 1 , v 2 ) p ( y | t , s 1 , s 2 , v 2 ) } ,
and from the constraint t q * ( t | s 1 , v 2 ) = 1 we get that
q * ( t | s 1 , v 2 ) = ∏ s 2 , y Q ( t | y , s 2 , v 2 ) ^ { p ( s 2 | s 1 , v 2 ) p ( y | t , s 1 , s 2 , v 2 ) } / ∑ t ′ ∏ s 2 , y Q ( t ′ | y , s 2 , v 2 ) ^ { p ( s 2 | s 1 , v 2 ) p ( y | t ′ , s 1 , s 2 , v 2 ) } .
 ☐
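A hedged implementation sketch of the alternating maximization suggested by Lemma 6 follows. It is not the authors' code: the distributions are random placeholders, the alphabet sizes are arbitrary, and the update uses the product form of (A66) derived above.

```python
import numpy as np
from itertools import product

# Hedged sketch (not the authors' implementation) of the alternating maximization
# suggested by Lemma 6.  For a fixed w(v2|s2), we iterate between the posterior
# Q(t|y,s2,v2) induced by the current q(t|s1,v2) and the KKT update of (A66),
#   q(t|s1,v2)  proportional to  prod_{s2,y} Q(t|y,s2,v2)^( p(s2|s1,v2) p(y|t,s1,s2,v2) ).
# All distributions below are random placeholders; the alphabet sizes are arbitrary.

rng = np.random.default_rng(1)
S1, S2, V2, T, Y = range(2), range(2), range(2), range(3), range(2)

p_s1s2 = rng.random((len(S1), len(S2))); p_s1s2 /= p_s1s2.sum()          # p(s1,s2)
w = rng.random((len(S2), len(V2))); w /= w.sum(axis=1, keepdims=True)    # w(v2|s2)
p_y = rng.random((len(T), len(S1), len(S2), len(V2), len(Y)))
p_y /= p_y.sum(axis=4, keepdims=True)                                    # p(y|t,s1,s2,v2)

# p(s2|s1,v2) is proportional to p(s1,s2) w(v2|s2)
p_s2c = np.zeros((len(S1), len(V2), len(S2)))
for s1, v2 in product(S1, V2):
    col = np.array([p_s1s2[s1, s2] * w[s2, v2] for s2 in S2])
    p_s2c[s1, v2] = col / col.sum()

def joint_of(q):
    """p(s1,s2,v2,t,y) induced by q(t|s1,v2)."""
    J = np.zeros((len(S1), len(S2), len(V2), len(T), len(Y)))
    for s1, s2, v2, t, y in product(S1, S2, V2, T, Y):
        J[s1, s2, v2, t, y] = p_s1s2[s1, s2] * w[s2, v2] * q[s1, v2, t] * p_y[t, s1, s2, v2, y]
    return J

q = np.full((len(S1), len(V2), len(T)), 1.0 / len(T))                    # q(t|s1,v2), uniform start
for _ in range(100):
    J = joint_of(q)
    num = J.sum(axis=0)                                                  # p(s2,v2,t,y)
    Q = num / np.maximum(num.sum(axis=2, keepdims=True), 1e-30)          # Q(t|y,s2,v2)
    for s1, v2 in product(S1, V2):
        for t in T:
            expo = sum(p_s2c[s1, v2, s2] * p_y[t, s1, s2, v2, y]
                       * np.log(max(Q[s2, v2, t, y], 1e-30))
                       for s2, y in product(S2, Y))
            q[s1, v2, t] = np.exp(expo)
        q[s1, v2] /= q[s1, v2].sum()

# J_w(q, Q) at the (approximate) fixed point; by Lemmas 5-7 this approaches C_{2,w}^{lb}.
J = joint_of(q)
num = J.sum(axis=0)
Q = num / np.maximum(num.sum(axis=2, keepdims=True), 1e-30)
Jw = sum(J[idx] * np.log(Q[idx[1], idx[2], idx[3], idx[4]] / q[idx[0], idx[2], idx[3]])
         for idx in product(S1, S2, V2, T, Y) if J[idx] > 0)
print("J_w at the fixed point (nats):", Jw)
```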

Appendix E.3. Proof of Lemma 7

The proof of this lemma consists of three steps. First, we prove that U w ( q 1 ) is greater than or equal to J w ( q 0 , Q 0 * ) for any two PMFs q 0 ( t | s 1 , v 2 ) and q 1 ( t | s 1 , v 2 ) . Second, we use Lemmas 3 and 5 to state that for the optimal PMF, q c ( t | s 1 , v 2 ) , C 2 , w l b = J w ( q c , Q c * ) ; therefore, U w ( q ) is an upper bound on C 2 , w l b for every q ( t | s 1 , v 2 ) . Third, we prove that U w ( q ) converges to C 2 , w l b .
Proof. 
Consider any two PMFs, q 0 ( t | s 1 , v 2 ) and q 1 ( t | s 1 , v 2 ) , their corresponding { p 0 ( s 1 , s 2 , v 2 , t , y ) , Q 0 * ( t | y , s 2 , v 2 ) } and { p 1 ( s 1 , s 2 , v 2 , t , y ) , Q 1 * ( t | y , s 2 , v 2 ) } , respectively, according to (24) and (26) and consider also the following inequalities:
s 1 , s 2 , v 2 , t , y p 0 ( s 1 , s 2 , v 2 , t , y ) log Q 1 * ( t | y , s 2 , v 2 ) q 1 ( t | s 1 , v 2 ) J w ( q 0 , Q 0 * ) = s 1 , s 2 , v 2 , t , y p 0 ( s 1 , s 2 , v 2 , t , y ) log Q 1 * ( t | y , s 2 , v 2 ) q 1 ( t | s 1 , v 2 ) log Q 0 * ( t | y , s 2 , v 2 ) q 0 ( t | s 1 , v 2 ) = s 1 , s 2 , v 2 , t , y p 0 ( s 1 , s 2 , v 2 , t , y ) log Q 1 * ( t | y , s 2 , v 2 ) Q 0 * ( t | y , s 2 , v 2 ) q 0 ( t | s 1 , v 2 ) q 1 ( t | s 1 , v 2 ) = D q 0 ( t | s 1 , v 2 ) q 1 ( t | s 1 , v 2 ) D Q 0 * ( t | y , s 2 , v 2 ) Q 1 * ( t | y , s 2 , v 2 ) = ( a ) D q 0 ( t | s 1 , s 2 , v 2 ) p ( y | t , s 1 , s 2 , v 2 ) p ( s 1 , s 2 ) w ( v 2 | s 2 ) q 1 ( t | s 1 , s 2 , v 2 ) p ( y | t , s 1 , s 2 , v 2 ) p ( s 1 , s 2 ) w ( v 2 | s 2 ) D Q 0 * ( t | y , s 2 , v 2 ) Q 1 * ( t | y , s 2 , v 2 ) = D p 0 ( s 1 , s 2 , v 2 , t , y ) p 1 ( s 1 , s 2 , v 2 , t , y ) D Q 0 * ( t | y , s 2 , v 2 ) Q 1 * ( t | y , s 2 , v 2 ) = ( b ) D p 0 ( s 2 , v 2 , y ) Q 0 * ( t | y , s 2 , v 2 ) p 0 ( s 1 | s 2 , v 2 , t , y ) p 1 ( s 2 , v 2 , y ) Q 1 * ( t | y , s 2 , v 2 ) p 1 ( s 1 | s 2 , v 2 , t , y ) D Q 0 * ( t | y , s 2 , v 2 ) Q 1 * ( t | y , s 2 , v 2 ) = D p 0 ( s 2 , v 2 , y ) p 1 ( s 2 , v 2 , y ) + D p 0 ( s 1 | s 2 , v 2 , t , y ) p 1 ( s 1 | s 2 , v 2 , t , y ) ( A 67 ) = ( c ) 0 ,
where D · · is the K-L divergence, p j ( s 2 , v 2 , y ) and p j ( s 1 | s 2 , v 2 , t , y ) are marginal distributions of p j ( s 1 , s 2 , v 2 , t , y ) for j = 0 , 1 , ( a ) follows from the fact that T is independent of S 2 given ( S 1 , V 2 ) and from the K-L divergence properties, ( b ) follows from the fact that Q j * ( t | y , s 2 , v 2 ) is a marginal distribution of p j ( s 1 , s 2 , v 2 , t , y ) for j = 0 , 1 and ( c ) follows from the fact that D · · 0 always.
Thus,
J ( q 0 , Q 0 * ) s 1 , s 2 , v 2 , t , y p 0 ( s 1 , s 2 , v 2 , t , y ) log Q 1 * ( t | y , s 2 , v 2 ) q 1 ( t | s 1 , v 2 ) = s 1 , s 2 , v 2 , t , y p ( s 1 , s 2 ) w ( v 2 | s 2 ) q 0 ( t | s 1 , v 2 ) p ( y | t , s 1 , s 2 , v 2 ) log Q 1 * ( t | y , s 2 , v 2 ) q 1 ( t | s 1 , v 2 ) = s 1 , v 2 p ( s 1 , v 2 ) t q 0 ( t | s 1 , v 2 ) s 2 p ( s 2 | s 1 , v 2 ) y p ( y | t , s 1 , s 2 , v 2 ) log Q 1 * ( t | y , s 2 , v 2 ) q 1 ( t | s 1 , v 2 ) s 1 , v 2 p ( s 1 , v 2 ) max t s 2 p ( s 2 | s 1 , v 2 ) y p ( y | t , s 1 , s 2 , v 2 ) log Q 1 * ( t | y , s 2 , v 2 ) q 1 ( t | s 1 , v 2 ) ( A 68 ) = U w ( q 1 ) .
We proved that U w ( q 1 ) is greater than or equal to J w ( q 0 , Q 0 * ) for any choice of q 0 ( t | s 1 , v 2 ) and q 1 ( t | s 1 , v 2 ) . Therefore, by taking q 0 ( t | s 1 , v 2 ) to be the distribution that achieves C 2 , w l b and by considering Lemmas 3 and 5, we conclude that U w ( q ) ≥ C 2 , w l b for any choice of q ( t | s 1 , v 2 ) .
In order to prove that U w ( q ) converges to C 2 , w l b let us rewrite Equation (A63) as
s 2 , y p ( s 2 | s 1 , v 2 ) p ( y | t , s 1 , s 2 , v 2 ) log Q ( t | y , s 2 , v 2 ) q * ( t | s 1 , v 2 ) = ν s 1 , v 2 .
We can see that for a fixed Q, the right hand side of the equation is independent of t. Considering also
J w ( q , Q ) = s 1 , s 2 , v 2 , t , y p ( s 1 , s 2 ) w ( v 2 | s 2 ) q ( t | s 1 , v 2 ) p ( y | t , s 1 , s 2 , v 2 ) log Q ( t | y , s 2 , v 2 ) q ( t | s 1 , v 2 ) ( A 70 ) s 1 , v 2 p ( s 1 , v 2 ) max t s 2 p ( s 2 | s 1 , v 2 ) y p ( y | t , s 1 , s 2 , v 2 ) log Q * ( t | y , s 2 , v 2 ) q ( t | s 1 , v 2 ) ,
we can conclude that the inequality in (A70) holds with equality when q is the PMF that achieves C 2 , w l b ; hence, at the optimum, U w ( q ) = J w ( q , Q * ) = C 2 , w l b , which establishes that the upper bound U w ( q ) converges to C 2 , w l b . ☐
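For completeness, a hypothetical helper (an assumption, not code from the paper) that evaluates the upper bound U w ( q ) of (A68) for the arrays used in the sketch of Appendix E.2 is given below; comparing U w ( q ) with J w ( q , Q * ) yields a natural stopping criterion, since U w ( q ) ≥ C 2 , w l b ≥ J w ( q , Q * ) by Lemmas 5 and 7.

```python
import numpy as np
from itertools import product

# Hypothetical helper (an assumption, not code from the paper): evaluates the upper
# bound U_w(q) of (A68) for arrays shaped like those in the Appendix E.2 sketch
# (p_s1s2[s1,s2], w[s2,v2], p_s2c[s1,v2,s2], p_y[t,s1,s2,v2,y], q[s1,v2,t], Q[s2,v2,t,y]).
def upper_bound_Uw(p_s1s2, w, p_s2c, p_y, q, Q):
    nS1, nS2 = p_s1s2.shape
    nV2, nT, nY = w.shape[1], q.shape[2], p_y.shape[4]
    Uw = 0.0
    for s1, v2 in product(range(nS1), range(nV2)):
        p_s1v2 = sum(p_s1s2[s1, s2] * w[s2, v2] for s2 in range(nS2))   # p(s1, v2)
        best = max(
            sum(p_s2c[s1, v2, s2] * p_y[t, s1, s2, v2, y]
                * np.log(max(Q[s2, v2, t, y], 1e-30) / q[s1, v2, t])
                for s2, y in product(range(nS2), range(nY)))
            for t in range(nT))
        Uw += p_s1v2 * best
    return Uw   # per Lemma 7, this value upper-bounds C_{2,w}^{lb} for any q
```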

References

  1. Steinberg, Y. Coding for Channels With Rate-Limited Side Information at the Decoder, with Applications. IEEE Trans. Inf. Theory 2008, 54, 4283–4295. [Google Scholar] [CrossRef]
  2. Gel’fand, S.I.; Pinsker, M.S. Coding for Channel with Random Parameters. Probl. Control Theory 1980, 9, 19–31. [Google Scholar]
  3. Wyner, A.; Ziv, J. The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inf. Theory 1976, 22, 1–10. [Google Scholar] [CrossRef]
  4. Shannon, C.E. Channels with side information at the transmitter. IBM J. Res. Dev. 1958, 2, 289–293. [Google Scholar] [CrossRef]
  5. Heegard, C.; Gamal, A.A.E. On the capacity of computer memory with defects. IEEE Trans. Inf. Theory 1983, 29, 731–739. [Google Scholar] [CrossRef]
  6. Cover, T.M.; Chiang, M. Duality between channel capacity and rate distortion with two-sided state information. IEEE Trans. Inf. Theory 2006, 48, 1629–1638. [Google Scholar] [CrossRef]
  7. Rosenzweig, A.; Steinberg, Y.; Shamai, S. On channels with partial channel state information at the transmitter. IEEE Trans. Inf. Theory 2005, 51, 1817–1830. [Google Scholar] [CrossRef]
  8. Cemal, Y.; Steinberg, Y. Coding Problems for Channels With Partial State Information at the Transmitter. IEEE Trans. Inf. Theory 2007, 53, 4521–4536. [Google Scholar] [CrossRef]
  9. Keshet, G.; Steinberg, Y.; Merhav, N. Channel Coding in the Presence of Side Information. Found. Trends Commun. Inf. Theory 2007, 4, 445–586. [Google Scholar] [CrossRef]
  10. Kaspi, A.H. Two-way source coding with a fidelity criterion. IEEE Trans. Inf. Theory 1985, 31, 735–740. [Google Scholar] [CrossRef]
  11. Permuter, H.; Steinberg, Y.; Weissman, T. Two-Way Source Coding With a Helper. IEEE Trans. Inf. Theory 2010, 56, 2905–2919. [Google Scholar] [CrossRef]
  12. Weissman, T.; Gamal, A.E. Source Coding With Limited-Look-Ahead Side Information at the Decoder. IEEE Trans. Inf. Theory 2006, 52, 5218–5239. [Google Scholar] [CrossRef]
  13. Weissman, T.; Merhav, N. On causal source codes with side information. IEEE Trans. Inf. Theory 2005, 51, 4003–4013. [Google Scholar] [CrossRef]
  14. Shannon, C.E. Coding Theorems for a Discrete Source with a Fidelity Criterion. IRE Nat. Conv. Rec. 1959, 4, 142–163. [Google Scholar]
  15. Pradhan, S.S.; Chou, J.; Ramchandran, K. Duality between source coding and channel coding and its extension to the side information case. IEEE Trans. Inf. Theory 2003, 49, 1181–1203. [Google Scholar] [CrossRef]
  16. Pradhan, S.S.; Ramchandran, K. On functional duality in multiuser source and channel coding problems with one-sided collaboration. IEEE Trans. Inf. Theory 2006, 52, 2986–3002. [Google Scholar] [CrossRef]
  17. Zamir, R.; Shamai, S.; Erez, U. Nested linear/lattice codes for structured multiterminal binning. IEEE Trans. Inf. Theory 2006, 48, 1250–1276. [Google Scholar] [CrossRef]
  18. Su, J.; Eggers, J.; Girod, B. Illustration of the duality between channel coding and rate distortion with side information. In Proceedings of the 2000 Conference Record of the Thirty-Fourth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 29 October–1 November 2000; Volume 2, pp. 1841–1845. [Google Scholar]
  19. Goldfeld, Z.; Permuter, H.H.; Kramer, G. Duality of a Source Coding Problem and the Semi-Deterministic Broadcast Channel With Rate-Limited Cooperation. IEEE Trans. Inf. Theory 2016, 62, 2285–2307. [Google Scholar] [CrossRef]
  20. Asnani, H.; Permuter, H.H.; Weissman, T. Successive Refinement With Decoder Cooperation and Its Channel Coding Duals. IEEE Trans. Inf. Theory 2013, 59, 5511–5533. [Google Scholar] [CrossRef]
  21. Gupta, A.; Verdu, S. Operational Duality Between Lossy Compression and Channel Coding. IEEE Trans. Inf. Theor. 2011, 57, 3171–3179. [Google Scholar] [CrossRef]
  22. Blahut, R.E. Computation of channel capacity and rate-distortion functions. IEEE Trans. Inform. Theory 1972, 18, 460–473. [Google Scholar] [CrossRef]
  23. Arimoto, S. An Algorithm for Computing the Capacity of Arbitrary Discrete Memoryless Channels. IEEE Trans. Inf. Theory 1972, 18, 14–20. [Google Scholar] [CrossRef]
  24. Willems, F.M.J. Computation of the Wyner-Ziv Rate-Distortion Function; Research Report; University of Technology: Hong Kong, China, 1983. [Google Scholar]
  25. Dupuis, F.; Yu, W.; Willems, F. Blahut-Arimoto algorithms for computing channel capacity and rate-distortion with side information. In Proceedings of the 2004 International Symposium on Information Theory, Chicago, IL, USA, 27 June–2 July 2004; p. 179. [Google Scholar]
  26. Cheng, S.; Stankovic, V.; Xiong, Z. Computing the channel capacity and rate-distortion function with two-sided state information. IEEE Trans. Inf. Theory 2005, 51, 4418–4425. [Google Scholar] [CrossRef]
  27. Sumszyk, O.; Steinberg, Y. Information embedding with reversible stegotext. In Proceedings of the 2009 IEEE International Symposium on Information Theory, Seoul, Korea, 28 June–3 July 2009; pp. 2728–2732. [Google Scholar]
  28. Naiss, I.; Permuter, H.H. Extension of the Blahut-Arimoto Algorithm for Maximizing Directed Information. IEEE Trans. Inf. Theory 2013, 59, 204–222. [Google Scholar] [CrossRef]
  29. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 1991. [Google Scholar]
  30. Yeung, R.W. Information Theory and Network Coding, 1 ed.; Springer: Berlin, Germany, 2008. [Google Scholar]
  31. Berger, T. Multiterminal source coding. In The Information Theory Approach to Communications; Longo, G., Ed.; CISM Courses and Lectures; Springer: Berlin, Germany, 1978; pp. 171–231. [Google Scholar]
  32. Csiszar, I.; Korner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems/Imre Csiszar and Janos Korner; Academic Press: Cambridge, MA, USA, 1981. [Google Scholar]
Figure 1. Increased partial side information example. The encoder wants to send a message to the decoder over an interrupted channel in the presence of side information. The encoder is provided with the ESI and the decoder is provided with increased DSI; i.e., the decoder is informed with a rate-limited description of the ESI, in addition to the DSI.
Figure 1. Increased partial side information example. The encoder wants to send a message to the decoder over an interrupted channel in the presence of side information. The encoder is provided with the ESI and the decoder is provided with increased DSI. i.e., the decoder is informed with a rate-limited description of the ESI, in addition to the DSI.
Entropy 19 00467 g001
Figure 2. Channel coding and source coding cases. (a) Channel coding with state information. Case 1: rate-limited ESI at the decoder. Case 2: rate-limited DSI at the encoder. Case 2C: causal ESI and rate-limited DSI at the encoder. (b) Source coding with side information. Case 2: rate-limited DSI at the encoder. Case 1: rate-limited ESI at the decoder. Case 1C: causal DSI and rate-limited ESI at the decoder. The cases are presented in this order so that each source coding case is parallel to its dual channel coding case.
Figure 3. Example 1, channel coding Case 2: channel topology.
Figure 4. Example 1, channel coding Case 2 for the channel depicted in Figure 3, where the side information is distributed $S_1 \sim \text{Bernoulli}(0.5)$ and $\Pr\{S_2 \neq S_1\} = 0.8$. $C_2^{lb}(R)$ is the lower bound on the capacity of this channel, the C-C rate is the Cover-Chiang rate ($R = 0$), and the G-P rate is the Gelfand-Pinsker rate ($R = 0$ and no side information available at the decoder at all). Notice that at the encoder the maximal uncertainty about $S_2$ is $H(S_2|S_1) = 0.7219$ bit; therefore, for any $R \geq 0.7219$, $C_2^{lb}$ attains its maximal value.
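As a quick check of the 0.7219-bit figure quoted in the caption (a worked step added here, using only the Bernoulli model stated above): $H(S_2|S_1) = H_b(0.8) = -0.8\log_2 0.8 - 0.2\log_2 0.2 \approx 0.7219$ bit, where $H_b(\cdot)$ denotes the binary entropy function; since $H_b$ is symmetric about $1/2$, the same value results for a crossover probability of $0.2$.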
Figure 5. The equivalent rate-distortion problem for Case 1 for the source $X = S_1 \oplus S_2$, where $S_1, S_2$ are i.i.d. $\text{Bernoulli}(0.5)$.
Figure 6. Example 2, source coding Case 1 for a binary symmetric source and Hamming distortion. The source is given by $X = S_1 \oplus S_2$, where $S_1, S_2 \sim \text{Bernoulli}(0.5)$. The graph shows the rate-distortion function for different values of $R$.
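Two extreme points of these curves can be read off directly (an added observation, relying only on the source model stated in the captions of Figures 5 and 6): because $S_1$ and $S_2$ are i.i.d. $\text{Bernoulli}(0.5)$, the source $X = S_1 \oplus S_2$ is itself $\text{Bernoulli}(0.5)$ and independent of $S_2$, so with no description of the ESI ($R = 0$) the DSI alone is of no help and the curve should coincide with the classical binary rate-distortion function $1 - H_b(D)$; conversely, once $R \geq H(S_1|S_2) = 1$ bit, the decoder can recover $S_1$ losslessly and hence $X = S_1 \oplus S_2$ exactly, so the required rate is zero at every distortion level.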
Figure 7. Channel coding: Case 2. $C_2^{lb} = \max \left[ I(U;Y,S_2|V_2) - I(U;S_1|V_2) \right]$, where the maximization is over all PMFs $w(v_2|s_2)\,p(u|s_1,v_2)\,p(x|s_1,v_2,u)$ such that $R \geq I(V_2;S_2|S_1)$.
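For readers who want to evaluate this bound numerically, the following is a minimal Python sketch (assuming NumPy; the function names and the particular auxiliary choices are illustrative and not taken from the paper). It computes the two mutual-information terms and the description-rate constraint for one fixed choice of $w(v_2|s_2)$, $p(u|s_1,v_2)$, $p(x|s_1,v_2,u)$ and a channel law $p(y|x,s_1,s_2)$; obtaining $C_2^{lb}$ itself would additionally require maximizing over these PMFs, e.g., with a Blahut-Arimoto-type iteration or a convex solver.

```python
import numpy as np

# Axes of the joint tensor: 0 = S1, 1 = S2, 2 = V2, 3 = U, 4 = X, 5 = Y.

def entropy(p):
    p = p[p > 1e-12]
    return -np.sum(p * np.log2(p))

def H(joint, keep):
    """Entropy of the marginal of `joint` over the axes listed in `keep`."""
    drop = tuple(i for i in range(joint.ndim) if i not in keep)
    return entropy(joint.sum(axis=drop).ravel())

def cond_mi(joint, A, B, C):
    """I(A; B | C) computed from the full joint tensor."""
    return H(joint, A + C) + H(joint, B + C) - H(joint, A + B + C) - H(joint, C)

def case2_point(p_s, w, p_u, p_x, p_y, R):
    """Evaluate the Case 2 bound for one fixed choice of auxiliaries.

    p_s[s1, s2], w[v2, s2], p_u[u, s1, v2], p_x[x, s1, v2, u], p_y[y, x, s1, s2].
    Returns I(U; Y, S2 | V2) - I(U; S1 | V2) if I(V2; S2 | S1) <= R, else None.
    """
    joint = np.einsum('ab,cb,dac,eacd,feab->abcdef', p_s, w, p_u, p_x, p_y)
    if cond_mi(joint, [2], [1], [0]) > R + 1e-9:      # description-rate constraint
        return None
    return cond_mi(joint, [3], [5, 1], [2]) - cond_mi(joint, [3], [0], [2])

# One illustrative point with binary alphabets and the side information of
# Figure 4: S1 ~ Bernoulli(0.5), Pr{S2 != S1} = 0.8.
p_s = np.array([[0.1, 0.4],
                [0.4, 0.1]])                  # p_s[s1, s2]
eye = np.eye(2)
w = eye                                       # V2 = S2: full description of the DSI
p_u = np.full((2, 2, 2), 0.5)                 # U ~ Bernoulli(0.5), independent of (S1, V2)
p_x = np.zeros((2, 2, 2, 2))
p_x[...] = eye[:, None, None, :]              # X = U
p_y = np.zeros((2, 2, 2, 2))                  # hypothetical stand-in channel: Y = X xor S2
for x in range(2):
    for s2 in range(2):
        p_y[x ^ s2, x, :, s2] = 1.0

print(case2_point(p_s, w, p_u, p_x, p_y, R=0.75))   # ~1.0 bit for this choice
```

The einsum string simply multiplies out the factorization given in the caption together with a generic channel law; the stand-in channel $Y = X \oplus S_2$ in the usage lines is a hypothetical choice (the channel of Figure 3 is not restated here), picked only so that the printed value has an obvious interpretation.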
Figure 8. Channel coding with two-sided increased partial side information.
Figure 9. Source coding with two-sided increased partial side information.
