Conditional Rényi Divergence Saddlepoint and the Maximization of α-Mutual Information
Abstract
1. Introduction
- As mentioned above, the straightforward generalization has not yet found wide applicability.
- In the discrete case and for $\alpha > 0$, Arimoto [28] proposed the definition of the nonnegative quantity $I_\alpha^{\mathrm{A}}(X;Y) = H_\alpha(X) - H_\alpha^{\mathrm{A}}(X|Y)$, where $H_\alpha$ denotes the Rényi entropy and $H_\alpha^{\mathrm{A}}(X|Y)$ is Arimoto's conditional Rényi entropy.
- Augustin [29] and, later, Csiszár [4] defined $I_\alpha^{\mathrm{c}}(X;Y) = \min_{Q_Y} \mathbb{E}\left[ D_\alpha\!\left(P_{Y|X}(\cdot|X)\,\|\,Q_Y\right)\right]$, where the expectation is with respect to $X \sim P_X$.
- For the purpose of coming up with a measure of the similarity among a finite collection of probability measures $P_1, \ldots, P_n$, weighted by $(w_1, \ldots, w_n)$ on $\{1, \ldots, n\}$, Sibson [31] proposed the information radius of order $\alpha$ as $\min_{Q} \frac{1}{\alpha-1}\log \sum_{i=1}^{n} w_i \exp\left((\alpha-1)\, D_\alpha(P_i\|Q)\right)$, where the minimum is over probability measures $Q$ on the same space.
- Independently, Lapidoth-Pfister [32] and Tomamichel-Hayashi [33] proposed $J_\alpha(X;Y) = \min_{Q_X, Q_Y} D_\alpha\!\left(P_{XY} \,\|\, Q_X \times Q_Y\right)$, where the minimization is over product distributions on the input and output spaces.
2. Notation, Definitions and Properties
- If $(\mathcal{A}, \mathscr{F}, P)$ is a probability space, $X \sim P$ indicates $\mathbb{P}[X \in F] = P(F)$ for all $F \in \mathscr{F}$.
- Let $(\mathcal{A}, \mathscr{F})$ and $(\mathcal{B}, \mathscr{G})$ be measurable spaces, which we refer to as the input and output spaces, respectively, with $\mathcal{A}$ and $\mathcal{B}$ referred to as the input and output alphabets respectively. $P_{Y|X}$ denotes a random transformation from $\mathcal{A}$ to $\mathcal{B}$, i.e., for any $x \in \mathcal{A}$, $P_{Y|X=x}(\cdot)$ is a probability measure on $(\mathcal{B}, \mathscr{G})$, and for any $B \in \mathscr{G}$, $P_{Y|X=\cdot}(B)$ is an $\mathscr{F}$-measurable function. For brevity, we will usually drop mention of the underlying $\sigma$-fields. If P is a probability measure on $\mathcal{A}$ and $P_{Y|X}$ is a random transformation, the corresponding joint probability measure on $\mathcal{A} \times \mathcal{B}$ is denoted by $P P_{Y|X}$ (or, interchangeably, $P_{XY}$). The notation $P \to P_{Y|X} \to Q$ indicates that the output marginal of the joint probability measure $P P_{Y|X}$ is denoted by Q.
- The relative information between two probability measures P and Q on the same measurable space such that $P \ll Q$ is defined as $\imath_{P\|Q}(x) = \log \frac{\mathrm{d}P}{\mathrm{d}Q}(x)$.
- Given $P_X \to P_{Y|X} \to P_Y$, the information density is defined as $\imath_{X;Y}(x;y) = \imath_{P_{Y|X=x}\|P_Y}(y)$.
- Fix $\alpha \in (0,1)\cup(1,\infty)$, $P_{Y|X}$, and a probability measure $P_X$ on the input space. Then, the output probability measure $Q_Y^{*}$ is called the $\alpha$-response to $P_X$ if its density with respect to a $\sigma$-finite measure $\mu$ dominating $\{P_{Y|X=x}\colon x\in\mathcal{A}\}$ satisfies $\frac{\mathrm{d}Q_Y^{*}}{\mathrm{d}\mu}(y) \propto \left(\mathbb{E}\left[\left(\frac{\mathrm{d}P_{Y|X=X}}{\mathrm{d}\mu}(y)\right)^{\alpha}\right]\right)^{1/\alpha}$, where $X \sim P_X$ and the proportionality constant normalizes $Q_Y^{*}$ to be a probability measure.
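In the discrete case, the $\alpha$-response reduces to an explicit normalization. The following is a minimal Python/NumPy sketch of that discrete formula, assuming finite alphabets; the channel matrix `W` and the function name `alpha_response` are illustrative conventions, not notation from the paper.

```python
import numpy as np

def alpha_response(PX, W, alpha):
    """alpha-response to PX for a discrete channel W, where W[x, y] = P_{Y|X}(y|x).
    Q*(y) is proportional to ( sum_x P_X(x) W(y|x)^alpha )^(1/alpha)."""
    q = ((PX[:, None] * W**alpha).sum(axis=0)) ** (1.0 / alpha)
    return q / q.sum()

PX = np.array([0.6, 0.4])
W = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.2, 0.6]])
print(alpha_response(PX, W, 0.5))   # alpha < 1
print(alpha_response(PX, W, 2.0))   # alpha > 1
print(PX @ W)                       # ordinary output marginal (the alpha -> 1 limit)
```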
- Given two probability measures P and Q on the same measurable space and a scalar $\alpha \in (0,1)\cup(1,\infty)$, the Rényi divergence of order $\alpha$ between P and Q is defined as [1] $D_\alpha(P\|Q) = \frac{1}{\alpha-1}\log\int \left(\frac{\mathrm{d}P}{\mathrm{d}\mu}\right)^{\alpha}\left(\frac{\mathrm{d}Q}{\mathrm{d}\mu}\right)^{1-\alpha}\mathrm{d}\mu$, where $\mu$ is any $\sigma$-finite measure that dominates both P and Q; the value does not depend on the choice of $\mu$.
- If $\alpha \in (0,1)\cup(1,\infty)$ and $(p,q) \in [0,1]^2$, then the binary Rényi divergence of order $\alpha$ is given by $d_\alpha(p\|q) = \frac{1}{\alpha-1}\log\left(p^{\alpha}q^{1-\alpha} + (1-p)^{\alpha}(1-q)^{1-\alpha}\right)$.
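For concreteness, here is a minimal sketch of the order-$\alpha$ Rényi divergence for finite alphabets (natural logarithms), together with the binary special case; the zero-mass conventions follow the definition above, and the function names are illustrative.

```python
import numpy as np

def renyi_divergence(P, Q, alpha):
    """Rényi divergence of order alpha (in nats) between discrete distributions
    P and Q on a common alphabet. Terms with zero mass contribute zero; the
    divergence is +inf when alpha > 1 and P is not dominated by Q."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must lie in (0,1) or (1,inf)")
    if alpha > 1 and np.any((P > 0) & (Q == 0)):
        return np.inf
    mask = (P > 0) & (Q > 0)
    s = np.sum(P[mask] ** alpha * Q[mask] ** (1 - alpha))
    if s == 0:          # P and Q orthogonal, alpha in (0,1)
        return np.inf
    return np.log(s) / (alpha - 1)

def binary_renyi(p, q, alpha):
    return renyi_divergence([p, 1 - p], [q, 1 - q], alpha)

P, Q = [0.5, 0.3, 0.2], [0.25, 0.25, 0.5]
print(renyi_divergence(P, Q, 0.5), renyi_divergence(P, Q, 2.0))
print(binary_renyi(0.1, 0.4, 0.5))
```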
- $D_\alpha(P\|Q) \geq 0$, with equality only if $P = Q$.
- $D_\alpha(P\|Q)$ is monotonically increasing with $\alpha$.
- While we may have $D_\alpha(P\|Q) = \infty$ and $D_\alpha(Q\|P) < \infty$ simultaneously, $D_\alpha(P\|Q) = \infty$ for any $\alpha \in (0,1)$ is equivalent to P and Q being orthogonal. Conversely, if $D_\alpha(P\|Q) < \infty$ for some $\alpha > 1$, then $P \ll Q$.
- The Rényi divergence satisfies the data-processing inequality. If $P_X \to P_{Y|X} \to P_Y$ and $Q_X \to P_{Y|X} \to Q_Y$, then $D_\alpha(P_Y\|Q_Y) \leq D_\alpha(P_X\|Q_X)$.
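A quick numerical sanity check of the data-processing inequality in the discrete case (a sketch, not the proof technique of the paper): push two input distributions through the same random transformation and compare divergences before and after.

```python
import numpy as np

def d_alpha(P, Q, a):
    if a > 1 and np.any((P > 0) & (Q == 0)):
        return np.inf
    m = (P > 0) & (Q > 0)
    return np.log(np.sum(P[m]**a * Q[m]**(1 - a))) / (a - 1)

rng = np.random.default_rng(0)
W = rng.random((3, 4)); W /= W.sum(axis=1, keepdims=True)   # random channel P_{Y|X}
PX, QX = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
PY, QY = PX @ W, QX @ W                                      # output marginals
for a in (0.3, 0.7, 2.0, 5.0):
    assert d_alpha(PY, QY, a) <= d_alpha(PX, QX, a) + 1e-12  # data-processing inequality
```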
- Gilardoni [38] gave a strengthened Pinsker's inequality, upper bounding the square of the total variation distance $|P - Q| = \int |\mathrm{d}P - \mathrm{d}Q|$ by $\frac{2}{\alpha}\,D_\alpha(P\|Q)$ for every $\alpha \in (0,1]$.
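Assuming the total variation convention stated above, the following sketch spot-checks the bound on randomly drawn pairs of discrete distributions for a few orders $\alpha \in (0,1)$; it is an illustration of the stated inequality, not Gilardoni's derivation.

```python
import numpy as np

def d_alpha(P, Q, a):
    m = (P > 0) & (Q > 0)
    s = np.sum(P[m]**a * Q[m]**(1 - a))
    return np.inf if s == 0 else np.log(s) / (a - 1)

rng = np.random.default_rng(1)
for _ in range(1000):
    P, Q = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
    tv = np.sum(np.abs(P - Q))                     # total variation |P - Q|
    for a in (0.25, 0.5, 0.9):
        assert tv**2 <= (2 / a) * d_alpha(P, Q, a) + 1e-12
```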
- The Rényi divergence is lower semicontinuous in the topology of setwise convergence, i.e., if for every event $A$, $P_n(A) \to P(A)$ and $Q_n(A) \to Q(A)$, then $D_\alpha(P\|Q) \leq \liminf_{n\to\infty} D_\alpha(P_n\|Q_n)$.
- In the theory of robust lossless source coding [22,25], the following scalar, called the $\alpha$-minimax redundancy of a collection $\{P_\theta\colon \theta\in\Theta\}$ of probability measures, is an important measure of the worst-case redundancy penalty that ensues when the encoder only knows that the data is generated according to one of the probability measures in the collection: $\min_{Q}\,\sup_{\theta\in\Theta} D_\alpha(P_\theta\|Q)$, where the minimization is over all probability measures on the same space.
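As a toy illustration of this min-max quantity (under the definition restored above), the sketch below brute-forces the minimizing reference measure over a grid for a small family of binary sources; the family and the grid-search approach are illustrative assumptions, not the construction used in the paper.

```python
import numpy as np

def d_alpha(P, Q, a):
    m = (P > 0) & (Q > 0)
    s = np.sum(P[m]**a * Q[m]**(1 - a))
    return np.inf if s == 0 else np.log(s) / (a - 1)

# Toy family of binary sources and a grid search over the reference measure Q.
family = [np.array([p, 1 - p]) for p in (0.2, 0.5, 0.8)]
alpha = 0.5
qs = np.linspace(1e-4, 1 - 1e-4, 5001)
worst = [max(d_alpha(P, np.array([q, 1 - q]), alpha) for P in family) for q in qs]
i = int(np.argmin(worst))
print(f"approx. alpha-minimax redundancy: {worst[i]:.6f} nats at Q = ({qs[i]:.4f}, {1 - qs[i]:.4f})")
```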
- Given an input distribution $P_X$ and random transformations $P_{Y|X}$, $Q_{Y|X}$, the conditional Rényi divergence of order $\alpha$ is $D_\alpha(P_{Y|X}\|Q_{Y|X}\,|\,P_X) = D_\alpha(P_X P_{Y|X}\,\|\,P_X Q_{Y|X})$. Although (38) also holds for the familiar $\alpha = 1$ case, in general the conditional Rényi divergence is not the arithmetic average of $D_\alpha(P_{Y|X=x}\|Q_{Y|X=x})$ with respect to $P_X$ if $\alpha \neq 1$. Instead, it is a generalized mean, or a scaled cumulant generating function evaluated at $\alpha - 1$. Specifically, if $X \sim P_X$, then $D_\alpha(P_{Y|X}\|Q_{Y|X}\,|\,P_X) = \frac{1}{\alpha-1}\log\mathbb{E}\left[\exp\left((\alpha-1)\,D_\alpha(P_{Y|X=X}\|Q_{Y|X=X})\right)\right]$. Regardless of whether $\alpha < 1$ or $\alpha > 1$, (39) implies that the conditional Rényi divergence is upper bounded by the essential supremum of $D_\alpha(P_{Y|X=x}\|Q_{Y|X=x})$ over $x$.
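The generalized-mean identity above is easy to verify numerically in the discrete case: compute the Rényi divergence between the two joint distributions directly and compare it with the scaled cumulant generating function of the per-input divergences. A minimal sketch with illustrative variable names and finite alphabets:

```python
import numpy as np

def d_alpha(P, Q, a):
    P, Q = np.ravel(P), np.ravel(Q)
    m = (P > 0) & (Q > 0)
    return np.log(np.sum(P[m]**a * Q[m]**(1 - a))) / (a - 1)

rng = np.random.default_rng(2)
PX = rng.dirichlet(np.ones(4))
W = rng.dirichlet(np.ones(5), size=4)   # P_{Y|X}: rows are conditional pmfs
V = rng.dirichlet(np.ones(5), size=4)   # Q_{Y|X}
alpha = 1.7

# Definition: Rényi divergence between the joints P_X P_{Y|X} and P_X Q_{Y|X}.
lhs = d_alpha(PX[:, None] * W, PX[:, None] * V, alpha)

# Generalized mean / scaled CGF of the per-x divergences D_alpha(P_{Y|X=x} || Q_{Y|X=x}).
per_x = np.array([d_alpha(W[x], V[x], alpha) for x in range(4)])
rhs = np.log(np.sum(PX * np.exp((alpha - 1) * per_x))) / (alpha - 1)

print(lhs, rhs)   # agree up to floating-point error
assert abs(lhs - rhs) < 1e-10
```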
- Given $\alpha \in (0,1)\cup(1,\infty)$, $P_X$ and $P_{Y|X}$, the $\alpha$-mutual information is [27,31] $I_\alpha(X;Y) = \min_{Q_Y} D_\alpha(P_{Y|X}\,\|\,Q_Y\,|\,P_X)$, where the minimum is over all probability measures on the output space. Note that $\alpha = 1$ recovers the conventional mutual information, $I_1(X;Y) = I(X;Y)$, but, in general, $I_\alpha(X;Y) \neq I_\alpha(Y;X)$.
- An alternative expression for $\alpha$-mutual information, which will come in handy in our analysis and which does not involve either $P_Y$ or the $\alpha$-response, is obtained by introducing an auxiliary probability measure $Q_Y$ dominating the collection $\{P_{Y|X=x}\colon x\in\mathcal{A}\}$ [27]: with $X \sim P_X$ and, independently, $\bar{Y} \sim Q_Y$, $I_\alpha(X;Y) = \frac{\alpha}{\alpha-1}\log\mathbb{E}\left[\mathbb{E}^{1/\alpha}\!\left[\exp\left(\alpha\,\imath_{P_{Y|X}\|Q_Y}(X;\bar{Y})\right)\mid\bar{Y}\right]\right]$, where $\imath_{P_{Y|X}\|Q_Y}(x;y) = \log\frac{\mathrm{d}P_{Y|X=x}}{\mathrm{d}Q_Y}(y)$. As usual, sometimes it is convenient to fix $\sigma$-finite measures $\mu_X$ and $\mu_Y$ on the input and output spaces which dominate $P_X$ and $\{P_{Y|X=x}\}$, respectively, and denote their densities with respect to the reference measures by $p_X = \frac{\mathrm{d}P_X}{\mathrm{d}\mu_X}$ and $p_{Y|X}(\cdot\,|\,x) = \frac{\mathrm{d}P_{Y|X=x}}{\mathrm{d}\mu_Y}$. Then, we can write $\alpha$-mutual information as $I_\alpha(X;Y) = \frac{\alpha}{\alpha-1}\log\int_{\mathcal{B}}\left(\int_{\mathcal{A}} p_X(x)\,p_{Y|X}^{\alpha}(y|x)\,\mathrm{d}\mu_X(x)\right)^{1/\alpha}\mathrm{d}\mu_Y(y)$.
- In the special case of discrete alphabets, $I_\alpha(X;Y) = \frac{\alpha}{\alpha-1}\log\sum_{y\in\mathcal{B}}\left(\sum_{x\in\mathcal{A}} P_X(x)\,P_{Y|X}^{\alpha}(y|x)\right)^{1/\alpha}$.
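The discrete formula lends itself to a direct numerical check that the $\alpha$-response attains the minimum defining the $\alpha$-mutual information. The sketch below assumes finite alphabets and natural logarithms; `alpha_mi`, `alpha_response`, and `cond_renyi_div` are illustrative names.

```python
import numpy as np

def alpha_mi(PX, W, a):
    """Closed-form alpha-mutual information (nats) for discrete alphabets:
    I_a(X;Y) = a/(a-1) * log sum_y ( sum_x P_X(x) W(y|x)^a )^(1/a)."""
    inner = (PX[:, None] * W**a).sum(axis=0)       # sum_x P_X(x) W(y|x)^a, for each y
    return (a / (a - 1)) * np.log(np.sum(inner ** (1 / a)))

def alpha_response(PX, W, a):
    """Discrete alpha-response: Q*(y) proportional to (sum_x P_X(x) W(y|x)^a)^(1/a)."""
    q = ((PX[:, None] * W**a).sum(axis=0)) ** (1 / a)
    return q / q.sum()

def cond_renyi_div(PX, W, Q, a):
    """Conditional Rényi divergence D_a(P_{Y|X} || Q | P_X) for an output measure Q."""
    per_x = np.log((W**a * Q**(1 - a)).sum(axis=1)) / (a - 1)
    return np.log(np.sum(PX * np.exp((a - 1) * per_x))) / (a - 1)

rng = np.random.default_rng(3)
PX = rng.dirichlet(np.ones(3))
W = rng.dirichlet(np.ones(4), size=3)
for a in (0.5, 2.0):
    Qstar = alpha_response(PX, W, a)
    print(a, alpha_mi(PX, W, a), cond_renyi_div(PX, W, Qstar, a))
```

For each order, the two printed values coincide, consistent with the $\alpha$-response attaining the minimum in the definition of $I_\alpha(X;Y)$.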
- Fix $\alpha \in (0,1)\cup(1,\infty)$, $P_{Y|X}$, and a collection $\mathcal{P}$ of probability measures on the input space. Then, we denote $C_\alpha(\mathcal{P}) = \sup_{P_X\in\mathcal{P}} I_\alpha(X;Y)$. When $\mathcal{P}$ is the set of all input measures we write simply $C_\alpha$, dubbed the Rényi capacity in [26]. $C_\alpha$ is a measure of the similarity of the family $\{P_{Y|X=x}\colon x\in\mathcal{A}\}$, which plays an important role in the analysis of the fundamental limits of information transmission through noisy channels, particularly in the regime of exponentially small error probability. For a long time (e.g., [39]) the cutoff rate $C_{1/2}$ was conjectured to be the maximal rate for which reliable codes with manageable decoding complexity can be found. The zero-error capacity of the discrete memoryless channel with feedback is equal to either zero or $C_0$ [40].
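For a binary-input channel, the Rényi capacity can be bracketed by brute force over the one-dimensional input simplex. The grid search below is purely illustrative (the test channel and the search method are assumptions made for this sketch).

```python
import numpy as np

def alpha_mi(PX, W, a):
    inner = (PX[:, None] * W**a).sum(axis=0)      # sum_x P_X(x) W(y|x)^a, for each y
    return (a / (a - 1)) * np.log(np.sum(inner ** (1 / a)))

# Brute-force C_alpha = sup_{P_X} I_alpha(X;Y) for a binary-input test channel.
W = np.array([[0.9, 0.1],
              [0.3, 0.7]])                        # rows: P_{Y|X=0}, P_{Y|X=1}
alpha = 0.5
ps = np.linspace(1e-6, 1 - 1e-6, 10001)
vals = np.array([alpha_mi(np.array([p, 1 - p]), W, alpha) for p in ps])
i = int(vals.argmax())
print(f"C_{alpha} is approximately {vals[i]:.6f} nats at P_X = ({ps[i]:.4f}, {1 - ps[i]:.4f})")
```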
- While the relative entropy $D(P\|Q)$ is convex in the pair $(P,Q)$, the picture for Rényi divergence is somewhat more nuanced:
- (a) If $\alpha \in [0,1]$, then $D_\alpha(P\|Q)$ is jointly convex in $(P,Q)$ (see the numerical check following this list).
- (b) For any fixed pair $(P,Q)$, $D_\alpha(P\|Q)$ is concave (resp. convex) in $\alpha$ in the respective regimes identified in [43].
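A quick numerical sanity check of item (a), under the joint-convexity reading restored above, for a fixed order $\alpha \in (0,1]$ and randomly drawn pairs of discrete distributions:

```python
import numpy as np

def d_alpha(P, Q, a):
    m = (P > 0) & (Q > 0)
    s = np.sum(P[m]**a * Q[m]**(1 - a))
    return np.inf if s == 0 else np.log(s) / (a - 1)

rng = np.random.default_rng(4)
alpha = 0.6
for _ in range(2000):
    P0, P1 = rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(4))
    Q0, Q1 = rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(4))
    t = rng.random()
    lhs = d_alpha(t * P0 + (1 - t) * P1, t * Q0 + (1 - t) * Q1, alpha)
    rhs = t * d_alpha(P0, Q0, alpha) + (1 - t) * d_alpha(P1, Q1, alpha)
    assert lhs <= rhs + 1e-10    # joint convexity for alpha in [0,1]
```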
3. Conditional Rényi Divergence Game
3.1. Saddle point
- Regardless of whether $\alpha < 1$ or $\alpha > 1$, we see from (45) that a saddle point is obtained if there exists some output probability measure $Q_Y$ satisfying the corresponding condition.
3.2. Minimax identity
4. Finding
5. Proofs
5.1. Proof of Theorem 1
5.2. Proof of Theorem 2
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Rényi, A. On measures of information and entropy. In Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability; Neyman, J., Ed.; University of California Press: Berkeley, CA, USA, 1961; pp. 547–561.
- Van Erven, T.; Harremoës, P. Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820.
- Blahut, R.E. Hypothesis testing and information theory. IEEE Trans. Inf. Theory 1974, 20, 405–417.
- Csiszár, I. Generalized cutoff rates and Rényi’s information measures. IEEE Trans. Inf. Theory 1995, 41, 26–34.
- Shayevitz, O. On Rényi measures and hypothesis testing. In Proceedings of the 2011 IEEE International Symposium on Information Theory, Saint Petersburg, Russia, 31 July–5 August 2011; pp. 894–898.
- Harremoës, P. Interpretations of Rényi entropies and divergences. Phys. A Stat. Mech. Its Appl. 2006, 365, 57–62.
- Sason, I.; Verdú, S. Improved bounds on lossless source coding and guessing moments via Rényi measures. IEEE Trans. Inf. Theory 2018, 64, 4323–4346.
- Haroutunian, E.A. Estimates of the exponent of the error probability for a semicontinuous memoryless channel. Probl. Inf. Transm. 1968, 4, 29–39.
- Haroutunian, E.A.; Haroutunian, M.E.; Harutyunyan, A.N. Reliability criteria in information theory and in statistical hypothesis testing. Found. Trends Commun. Inf. Theory 2007, 4, 97–263.
- Polyanskiy, Y.; Verdú, S. Arimoto channel coding converse and Rényi divergence. In Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computing, Allerton, IL, USA, 29 September–1 October 2010; pp. 1327–1333.
- Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
- Liese, F.; Vajda, I. Convex Statistical Distances; Number 95 in Teubner Texte zur Mathematik; Teubner: Leipzig, Germany, 1987.
- Tridenski, S.; Zamir, R.; Ingber, A. The Ziv-Zakai-Rényi bound for joint source-channel coding. IEEE Trans. Inf. Theory 2015, 61, 4293–4315.
- Arimoto, S. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Trans. Inf. Theory 1972, 18, 14–20.
- Blahut, R.E. Computation of channel capacity and rate-distortion functions. IEEE Trans. Inf. Theory 1972, 18, 460–473.
- Blachman, N.M. Communication as a game. Proc. IRE Wescon 1957, 2, 61–66.
- Borden, J.M.; Mason, D.M.; McEliece, R.J. Some information theoretic saddlepoints. SIAM J. Control Optim. 1985, 23, 129–143.
- Lapidoth, A.; Narayan, P. Reliable communication under channel uncertainty. IEEE Trans. Inf. Theory 1998, 44, 2148–2177.
- Kemperman, J. On the Shannon capacity of an arbitrary channel. In Koninklijke Nederlandse Akademie van Wetenschappen, Indagationes Mathematicae; Elsevier: Amsterdam, The Netherlands, 1974; Volume 77, pp. 101–115.
- Gallager, R.G. Source Coding With Side Information and Universal Coding; Technical Report LIDS-P-937; Lab. Information Decision Systems, Massachusetts Institute of Technology: Cambridge, MA, USA, 1979.
- Ryabko, B. Encoding a source with unknown but ordered probabilities. Probl. Inf. Transm. 1979, 15, 134–138.
- Davisson, L.; Leon-Garcia, A. A source matching approach to finding minimax codes. IEEE Trans. Inf. Theory 1980, 26, 166–174.
- Ryabko, B. Comments on “A Source Matching Approach to Finding Minimax Codes”. IEEE Trans. Inf. Theory 1981, 27, 780–781.
- Haussler, D. A general minimax result for relative entropy. IEEE Trans. Inf. Theory 1997, 43, 1276–1280.
- Yagli, S.; Altuğ, Y.; Verdú, S. Minimax Rényi redundancy. IEEE Trans. Inf. Theory 2018, 64, 3715–3733.
- Nakiboğlu, B. The Rényi capacity and center. IEEE Trans. Inf. Theory 2019, 65, 841–860.
- Verdú, S. α-mutual information. In Proceedings of the 2015 Information Theory and Applications Workshop, San Diego, CA, USA, 1–6 February 2015; pp. 1–6.
- Arimoto, S. Information Measures and Capacity of Order α for Discrete Memoryless Channels. In Topics in Information Theory, Proceedings of the Coll. Math. Soc. János Bolyai; Bolyai: Keszthely, Hungary, 1975; pp. 41–52.
- Augustin, U. Noisy Channels. Ph.D. Thesis, Universität Erlangen-Nürnberg, Erlangen, Germany, 1978.
- Nakiboğlu, B. The Augustin capacity and center. arXiv 2018, arXiv:1606.00304.
- Sibson, R. Information radius. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 1969, 14, 149–160.
- Lapidoth, A.; Pfister, C. Two measures of dependence. Entropy 2019, 21, 778.
- Tomamichel, M.; Hayashi, M. Operational interpretation of Rényi information measures via composite hypothesis testing against product and Markov distributions. IEEE Trans. Inf. Theory 2018, 64, 1064–1082.
- Aishwarya, G.; Madiman, M. Remarks on Rényi versions of conditional entropy and mutual information. In Proceedings of the 2019 IEEE International Symposium on Information Theory, Paris, France, 7–12 July 2019; pp. 1117–1121.
- Shannon, C.E.; Gallager, R.G.; Berlekamp, E. Lower bounds to error probability for coding on discrete memoryless channels, I. Inf. Control 1967, 10, 65–103.
- Gallager, R.G. A simple derivation of the coding theorem and some applications. IEEE Trans. Inf. Theory 1965, 11, 3–18.
- Yagli, S.; Cuff, P. Exact soft-covering exponent. IEEE Trans. Inf. Theory 2019, 65, 6234–6262.
- Gilardoni, G.L. On Pinsker’s and Vajda’s type inequalities for Csiszár’s f-divergences. IEEE Trans. Inf. Theory 2010, 56, 5377–5386.
- Massey, J.L. Coding and modulation in digital communications. In Proceedings of the 1974 International Zurich Seminar on Digital Communications, Zurich, Switzerland, 12–15 March 1974; pp. E2(1)–E2(4).
- Shannon, C.E. The zero error capacity of a noisy channel. IRE Trans. Inf. Theory 1956, 2, 8–19.
- Sundaresan, R. Guessing under source uncertainty. IEEE Trans. Inf. Theory 2007, 53, 269–287.
- Bunte, C.; Lapidoth, A. Encoding tasks and Rényi entropy. IEEE Trans. Inf. Theory 2014, 60, 5065–5076.
- Ho, S.W.; Verdú, S. Convexity/concavity of Rényi entropy and α-mutual information. In Proceedings of the 2015 IEEE International Symposium on Information Theory, Hong Kong, China, 14–19 June 2015; pp. 745–749.
- Gallager, R.G. Information Theory and Reliable Communication; John Wiley and Sons: New York, USA, 1968; Volume 2.
- Verdú, S. Channel Capacity. In The Electrical Engineering Handbook, 2nd ed.; IEEE Press: Piscataway, NJ, USA; CRC Press: Boca Raton, FL, USA, 1997; Chapter 73.5; pp. 1671–1678.
- Luenberger, D. Optimization by Vector Space Methods; John Wiley and Sons: Hoboken, NJ, USA, 1997.
- Polyanskiy, Y.; Wu, Y. Lecture Notes on Information Theory. 2017. Available online: http://people.lids.mit.edu/yp/homepage/data/itlectures_v5.pdf (accessed on 30 April 2019).