The Case for Shifting the Rényi Entropy
Abstract
1. Introduction
2. Preliminaries
2.1. The Generalized Power Means
- The (weighted) geometric mean when $r \to 0$.
- The weighted arithmetic mean when $r = 1$.
- The weighted harmonic mean for $r = -1$.
- The quadratic mean for $r = 2$.
- Finally, the max- and min-means appear as the limits $r \to \infty$ and $r \to -\infty$, respectively (these cases, together with the order-factorization property below, are checked in the numerical sketch that follows the property list).
- 1. (0- and 1-order homogeneity in weights and values) If $\lambda, \mu > 0$, then $M_r(\lambda \vec{w}, \mu \vec{x}) = \mu \, M_r(\vec{w}, \vec{x})$.
- 2. (Order factorization) If $r \neq 0$, then $M_{rs}(\vec{w}, \vec{x}) = \left[ M_s(\vec{w}, \vec{x}^{\,r}) \right]^{1/r}$.
- 3. (Reduction to the arithmetic mean) If $r \neq 0$, then $M_r(\vec{w}, \vec{x}) = \left[ M_1(\vec{w}, \vec{x}^{\,r}) \right]^{1/r}$.
- 4. (Reduction to the harmonic mean) If $r \neq 0$, then $M_{-r}(\vec{w}, \vec{x}) = \left[ M_{-1}(\vec{w}, \vec{x}^{\,r}) \right]^{1/r}$.
- 5. (Monotonicity in r) Furthermore, if $r < s$ and $\vec{w}$, $\vec{x}$ are fixed, then $M_r(\vec{w}, \vec{x}) \leq M_s(\vec{w}, \vec{x})$, with equality only when:
  - $\vec{x}$ is constant, in which case all the means coincide with that constant.
  - $s \leq 0$ and some $x_k = 0$, in which case both means are 0.
  - $0 \leq r$ and some $x_k = \infty$, in which case both means are $\infty$.
- 6. (Non-null derivative) Call . Then
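A minimal numerical sketch of the means just described, assuming the standard definition $M_r(\vec{w}, \vec{x}) = \left( \sum_i w_i x_i^r / \sum_i w_i \right)^{1/r}$; the code and its names are illustrative only and are not taken from the paper.

```python
# Weighted power (Hölder) mean and a check of its special cases and of the
# order-factorization property (Property 2). Illustrative sketch only.
import numpy as np

def power_mean(w, x, r):
    """Weighted power mean of order r; r = 0 is handled as the geometric mean."""
    w, x = np.asarray(w, float), np.asarray(x, float)
    w = w / w.sum()                                   # 0-order homogeneity in the weights
    if r == 0:
        return float(np.exp(np.sum(w * np.log(x))))   # limit r -> 0: geometric mean
    return float(np.sum(w * x**r) ** (1.0 / r))

w = [0.2, 0.5, 0.3]
x = [1.0, 4.0, 9.0]
print(power_mean(w, x, 1))     # arithmetic mean
print(power_mean(w, x, 0))     # geometric mean
print(power_mean(w, x, -1))    # harmonic mean
print(power_mean(w, x, 2))     # quadratic mean
print(power_mean(w, x, 50))    # approaches max(x) as r -> +infinity
print(power_mean(w, x, -50))   # approaches min(x) as r -> -infinity

# Order factorization: M_{rs}(w, x) == [ M_s(w, x**r) ]**(1/r) for r != 0.
r, s = 2.0, 3.0
lhs = power_mean(w, x, r * s)
rhs = power_mean(w, np.asarray(x) ** r, s) ** (1.0 / r)
assert abs(lhs - rhs) < 1e-9
```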
2.2. Rényi’s Entropy
2.2.1. Probability Spaces, Random Variables and Expectations
2.2.2. The Approach to Rényi’s Information Functions Based on Postulates
- The amount of information provided by a single random event should be a function of its probability, not of its actual value.
- This amount of information should be additive on independent events.
- The amount of information of a binary equiprobable decision is one bit.
- If different amounts of information occur with different probabilities the total amount of information is an average of the individual information amounts weighted by the probability of occurrence.
- Postulates 1 and 2 fix Hartley’s function $I(p) = -\log_b p$ as the only possible amount of information of a basic event.
- Postulate 3 fixes the base of the logarithm in Hartley’s formula to $b = 2$. Any other value of $b$ fixes a different base for the logarithm and changes the unit of information.
- Postulate 4 defines an average amount of information, or entropy, properly speaking. Its basic formula is a form of the Kolmogorov–Nagumo formula or f-mean (A2) applied to information. Thus the “entropy” of Information Theory is, by definition, synonymous with “aggregate amount of information”, which departs from its physical etymology, despite the numerous analogies between the two concepts. It has repeatedly been proven that only two forms of the function can actually be used in the Kolmogorov–Nagumo formula while respecting the previous postulates [12,26,27] (a small numerical check of both choices follows the list below):
- The linear function $\varphi(h) = h$, which generates Shannon’s entropy;
- The exponential function $\varphi(h) = b^{(1 - \alpha) h}$, the one originally used by Rényi himself.
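The following sketch illustrates the usual reading of these two choices: the Kolmogorov–Nagumo average of the Hartley information gives Shannon’s entropy for a linear function and Rényi’s entropy of order α for the exponential one. The code is an illustration under that reading, not the authors’ implementation.

```python
# Kolmogorov-Nagumo f-mean of the Hartley information I(p) = -log2(p).
# A linear phi yields Shannon's entropy; phi(h) = 2**((1-alpha)*h) yields
# Rényi's entropy of order alpha. Names are illustrative only.
import numpy as np

def kn_entropy(p, phi, phi_inv):
    p = np.asarray(p, float)
    info = -np.log2(p)                      # Hartley information of each event
    return phi_inv(np.sum(p * phi(info)))   # Kolmogorov-Nagumo average

p = [0.5, 0.25, 0.125, 0.125]

h_shannon = kn_entropy(p, lambda h: h, lambda y: y)     # linear phi

alpha = 2.0
phi = lambda h: 2.0 ** ((1.0 - alpha) * h)              # exponential phi
phi_inv = lambda y: np.log2(y) / (1.0 - alpha)
h_renyi2 = kn_entropy(p, phi, phi_inv)

print(h_shannon)                                        # 1.75 bits
print(h_renyi2, -np.log2(np.sum(np.square(p))))         # both equal H_2(p)
```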
3. Results
3.1. The Shifted Rényi Entropy and Divergence
- For $r \neq 0$ this is motivated by:
- For $r = 0$ we can use the Kolmogorov–Nagumo mean with a linear function and inverse, as per the standard definition, leading to Shannon’s entropy (a small numerical sketch of this definition follows).
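A hedged sketch of the shifted definition as read above, assuming the shifted entropy of order r is $-\log_2$ of the power mean of the pmf weighted by itself (so that the classical order is $\alpha = r + 1$); the code is only an illustration of that reading.

```python
# Shifted Rényi entropy via the power mean of the distribution weighted by
# itself: H~_r(P) = -log2 M_r(P, P). r = 0 recovers Shannon's entropy.
import numpy as np

def power_mean(w, x, r):
    w, x = np.asarray(w, float), np.asarray(x, float)
    if r == 0:
        return float(np.exp(np.sum(w * np.log(x))))
    return float(np.sum(w * x**r) ** (1.0 / r))

def shifted_renyi(p, r):
    """Shifted Rényi entropy of order r, in bits (r = 0 gives Shannon's entropy)."""
    return -np.log2(power_mean(p, p, r))

p = [0.5, 0.25, 0.125, 0.125]
print(shifted_renyi(p, 0))                       # Shannon: 1.75 bits
print(shifted_renyi(p, 1))                       # Rényi's quadratic (alpha = 2)
print(shifted_renyi(p, -1), np.log2(len(p)))     # Hartley: log2(4) = 2 bits
```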
3.1.1. The Case for Shifting the Rényi Entropy
- 1. (Monotonicity) The Rényi entropy is a non-increasing function of the order r.
- 2. (Boundedness) The Rényi spectrum is bounded by the limits $\tilde{H}_{\infty}(P_X) \leq \tilde{H}_{r}(P_X) \leq \tilde{H}_{-\infty}(P_X)$, that is, by the min-entropy and the max-entropy.
- 3. The entropy of the uniform pmf is constant over r.
- 4. The Hartley entropy ($r = -1$) is constant over the distribution simplex (see the numerical check after this list).
- 5. (Divergence from uniformity) The divergence of any distribution from the uniform can be written in terms of the entropies as:
- 6. (Derivative of the shifted entropy) The derivative in r of Rényi’s r-th order entropy is
- 7. (Relationship with the moments of the distribution) The shifted Rényi entropy of order r is the logarithm of the inverse r-th root of the r-th moment of $P_X(X)$, that is, $\tilde{H}_r(P_X) = -\tfrac{1}{r} \log E[P_X(X)^r]$.
- is the normalization of . In fact, if and only if we have .
- if and 0 otherwise.
- Furthermore, if the distribution has P maxima (M minima), then () is a distribution that is null everywhere except at the indices where the maxima (minima) are situated:
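As referenced above, a short numerical check of Properties 1–4, using the same illustrative reading $\tilde{H}_r(P_X) = -\log_2 M_r(P_X, P_X)$ as before; it is a sketch under that assumption, not the authors’ code.

```python
# Checks: the spectrum is non-increasing in r, bounded by the min- and
# max-entropies, constant over r for the uniform pmf, and the Hartley
# entropy (r = -1) depends only on the support size.
import numpy as np

def shifted_renyi(p, r):
    p = np.asarray(p, float)
    if r == 0:
        return float(-np.sum(p * np.log2(p)))          # Shannon
    return float(-np.log2(np.sum(p * p**r)) / r)       # -log2 M_r(P, P)

p = np.array([0.5, 0.25, 0.125, 0.125])
orders = [-4.0, -1.0, 0.0, 1.0, 4.0]
spectrum = [shifted_renyi(p, r) for r in orders]
assert all(a >= b - 1e-12 for a, b in zip(spectrum, spectrum[1:]))              # Property 1

min_entropy = -np.log2(p.max())        # limit r -> +infinity
max_entropy = -np.log2(p.min())        # limit r -> -infinity
assert all(min_entropy - 1e-12 <= h <= max_entropy + 1e-12 for h in spectrum)   # Property 2

u = np.full(4, 0.25)
assert all(abs(shifted_renyi(u, r) - 2.0) < 1e-12 for r in orders)              # Property 3

q = np.array([0.7, 0.1, 0.1, 0.1])
assert abs(shifted_renyi(q, -1.0) - 2.0) < 1e-12                                # Property 4
```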
3.1.2. Shifting Other Concepts Related to the Entropies
3.2. Writing Rényi Entropies in Terms of Each Other
3.3. Quantities Around the Shifted Rényi Entropy
3.3.1. The Equivalent Probability Function
- 1. For all , there holds that
- 2. If the distribution is the uniform over the same support, then .
- 3. If the distribution is the Kronecker delta centered on a single outcome, then , where is the step function. (A hedged numerical sketch of this function follows the list.)
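A hedged sketch of the preceding properties, under the assumption that the equivalent probability function of order r is the Hölder mean of the pmf weighted by itself, $M_r(P_X, P_X) = 2^{-\tilde{H}_r(P_X)}$, i.e., the single probability value a uniform pmf would need in order to have the same r-th order entropy. Both the function name and this reading are assumptions of the illustration.

```python
# Equivalent probability as the power mean M_r(P, P); -log2 of it recovers
# the shifted entropy, and for the uniform pmf it is 1/n at every order r.
import numpy as np

def equivalent_probability(p, r):
    p = np.asarray(p, float)
    if r == 0:
        return float(np.exp(np.sum(p * np.log(p))))    # geometric mean of P weighted by P
    return float(np.sum(p * p**r) ** (1.0 / r))

p = np.array([0.5, 0.25, 0.125, 0.125])
for r in (-1.0, 0.0, 1.0):
    q = equivalent_probability(p, r)
    print(r, q, -np.log2(q))          # -log2 of it is the shifted entropy H~_r(P)

u = np.full(5, 0.2)
print([round(equivalent_probability(u, r), 6) for r in (-3.0, 0.0, 3.0)])   # all 0.2 = 1/n
```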
3.3.2. The Information Potential
3.3.3. Summary
3.4. Discussion
3.4.1. Other Reparameterizations of the Rényi Entropy
3.4.2. Rényi Measures and the Means
3.4.3. Other Magnitudes around the Rényi Entropy
3.4.4. Redundancy of the Rényi Entropy
- That everything can be written in terms of $r = 0$, that is, in terms of Shannon’s entropy. This is made possible by the existence of estimators for Shannon’s entropy and divergence.
- That everything can be written in terms of a finite order, e.g., $r = 1$. This is possible by means of Properties 1.3 and 1.4 of the generalized power means. The work in [29] points this way (perhaps also including $r = -1$, aka Hartley’s), capitalizing on the fact that Rényi’s entropy for data is well estimated for $\alpha = 2$, equivalently $r = 1$ ([29], Section 2.6); a toy collision-based estimate of this order is sketched after this list.
- That everything can be written in terms of the extreme values of the entropy, that is, the min- and max-entropies. This is suggested by Properties 3.1 and 3.2. Supposing we had a way to estimate either of them, a divide-and-conquer type of approach would make it feasible to extract all the probabilities of a distribution out of its Rényi entropy function.
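As mentioned in the second item above, the $\alpha = 2$ (shifted $r = 1$) order is particularly easy to estimate from data. The sketch below uses a standard pair-collision estimator of $\sum_i p_i^2$; it is not taken from [29], but it illustrates why this order is a convenient anchor.

```python
# Estimate the Rényi quadratic entropy (alpha = 2, shifted r = 1) from an
# i.i.d. sample: sum_i p_i^2 is a collision probability, estimated by the
# fraction of coinciding pairs. Illustrative sketch, not the paper's method.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
p = np.array([0.5, 0.25, 0.125, 0.125])
sample = rng.choice(len(p), size=20000, p=p)

counts = np.array(list(Counter(sample.tolist()).values()), dtype=float)
n = sample.size
collision_hat = np.sum(counts * (counts - 1)) / (n * (n - 1))   # unbiased for sum_i p_i^2

print(-np.log2(collision_hat))      # estimated quadratic entropy
print(-np.log2(np.sum(p ** 2)))     # true value, about 1.54 bits
```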
3.4.5. The Algebra of Entropies
3.4.6. Shifted Rényi Entropies on Continuous Distributions
3.4.7. Pervasiveness of Rényi Entropies
4. Conclusions
- It aligns them with the power means and explains the appearance of the escort probabilities. Note that the importance of the escort probabilities is justified independently of their link to the means in the shifted version of the entropy [5].
- It highlights the Shannon entropy in the role of the “origin” of entropy orders, just as the geometric mean is a particular case of the weighted power means. This consideration is reinforced by the existence of a formula allowing us to rewrite every other order as a combination of Shannon entropies and cross-entropies of escort probabilities of the distribution.
- The shifting of the Rényi entropy aligns it with the moments of the distribution, thus enabling new insights into the moment problem.
- It makes the relation between the divergence and the entropy more “symmetrical”.
- It highlights the “information spectrum” quality of the Rényi entropy measure for a fixed distribution.
Author Contributions
Funding
Conflicts of Interest
Appendix A. The Kolmogorov-Mean and the Kolmogorov–Nagumo Formula
- 1. Continuity and strict monotonicity in all coordinates.
- 2. (Symmetry or permutation invariance) Let σ be a permutation, then .
- 3. (Reflexivity) The mean of a series of constants is the constant itself:
- 4. (Blocking) The computation of the mean can be split into computations of equal size sub-blocks.
- 5. (Associativity) Replacing a k-subset of the x with their partial mean in the same multiplicity does not change the overall mean (see the sketch after this list).
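A small sketch of the Kolmogorov–Nagumo f-mean $f^{-1}\big(\sum_i w_i f(x_i)\big)$ and of the reflexivity and associativity properties above, taking $f = \log$ (the choice that yields the geometric mean); the names and the specific choice of f are illustrative assumptions.

```python
# Kolmogorov-Nagumo f-mean with normalized weights, plus two of the listed
# properties: reflexivity and associativity (replacing a sub-block by its
# partial mean, kept with the same multiplicity).
import numpy as np

def kn_mean(w, x, f, f_inv):
    w, x = np.asarray(w, float), np.asarray(x, float)
    w = w / w.sum()
    return float(f_inv(np.sum(w * f(x))))

f, f_inv = np.log, np.exp                  # this choice yields the geometric mean
w = np.array([0.25, 0.25, 0.25, 0.25])
x = np.array([1.0, 2.0, 4.0, 8.0])

m = kn_mean(w, x, f, f_inv)
print(m)                                   # (1 * 2 * 4 * 8) ** 0.25 = 2.828...

# Reflexivity: the mean of a series of constants is the constant itself.
print(kn_mean(w, np.full(4, 3.0), f, f_inv))                     # 3.0

# Associativity: replace the first two values by their partial mean (twice).
m12 = kn_mean([0.5, 0.5], x[:2], f, f_inv)
print(kn_mean(w, np.array([m12, m12, 4.0, 8.0]), f, f_inv), m)   # both coincide
```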
Appendix B. The Approach to Shannon’s Information Functions Based on Postulates
- The amount of information of a sequence of n numbers is a symmetric function of this set of values; that is, it is invariant under any permutation of the n elements.
- is a continuous function of .
- .
- The following relation holds:
References
- Shannon, C.; Weaver, W. A Mathematical Model of Communication; The University of Illinois Press: Champaign, IL, USA, 1949.
- Shannon, C.E. A mathematical theory of communication. Parts I and II. Bell Syst. Tech. J. 1948, XXVII, 379–423.
- Shannon, C.E. A mathematical theory of communication. Part III. Bell Syst. Tech. J. 1948, XXVII, 623–656.
- Shannon, C. The bandwagon. IRE Trans. Inf. Theory 1956, 2, 3.
- Beck, C.; Schögl, F. Thermodynamics of Chaotic Systems: An Introduction; Cambridge University Press: Cambridge, UK, 1995.
- Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 1996.
- Mayoral, M.M. Rényi’s entropy as an index of diversity in simple-stage cluster sampling. Inf. Sci. 1998, 105, 101–114.
- MacKay, D.J.C. Information Theory, Inference and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003.
- Csiszár, I.; Shields, P.C. Information Theory and Statistics: A Tutorial. Found. Trends Commun. Inf. Theory 2004, 1, 417–528.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006.
- Sayood, K. Information Theory and Cognition: A Review. Entropy 2018, 20, 706.
- Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20–30 July 1960; University of California Press: Berkeley, CA, USA, 1961; pp. 547–561.
- Havrda, J.; Charvát, F. Quantification method of classification processes. Concept of structural a-entropy. Kybernetika 1967, 3, 30–35.
- Csiszár, I. Axiomatic Characterizations of Information Measures. Entropy 2008, 10, 261–273.
- Arndt, C. Information Measures, 1st ed.; Information and Its Description in Science and Engineering; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2004.
- Rényi, A. On the Foundations of Information Theory. Revue de l’Institut International de Statistique/Rev. Int. Stat. Inst. 1965, 33, 1–14.
- Aczél, J.; Daróczy, Z. On Measures of Information and Their Characterizations; Academic Press [Harcourt Brace Jovanovich, Publishers]: New York, NY, USA; London, UK, 1975.
- Kolmogorov, A.N. Sur la notion de la moyenne. Atti della Accademia Nazionale dei Lincei 1930, 12, 388–391.
- Nagumo, M. Über eine Klasse der Mittelwerte. Jpn. J. Math. Trans. Abstr. 1930, 7, 71–79.
- De Finetti, B. Sul concetto di media. Giornale dell’Istituto Italiano degli Attuari 1931, II, 369–396.
- Kolmogorov, A. On the Shannon theory of information transmission in the case of continuous signals. IRE Trans. Inf. Theory 1956, 2, 102–108.
- Muliere, P.; Parmigiani, G. Utility and means in the 1930s. Stat. Sci. 1993, 8, 421–432.
- Van Erven, T.; Harremoës, P. Rényi divergence and Kullback–Leibler divergence. IEEE Trans. Inf. Theory 2014.
- Hardy, G.H.; Littlewood, J.E.; Pólya, G. Inequalities; Cambridge University Press: Cambridge, UK, 1952.
- Kitagawa, T. On Some Class of Weighted Means. Proc. Phys.-Math. Soc. Jpn. 3rd Ser. 1934, 16, 117–126.
- Rényi, A. Probability Theory; Courier Dover Publications: Mineola, NY, USA, 1970.
- Jizba, P.; Arimitsu, T. The world according to Rényi: Thermodynamics of multifractal systems. Ann. Phys. 2004, 312, 17–59.
- Bickel, P.J.; Hammel, E.A.; O’Connell, J.W. Sex bias in graduate admissions: Data from Berkeley. Science 1975, 187, 398–403.
- Principe, J.C. Information Theoretic Learning; Information Science and Statistics; Springer: New York, NY, USA, 2010.
- Brillouin, L. Science and Information Theory, 2nd ed.; Academic Press, Inc.: New York, NY, USA, 1962.
- Harremoës, P. Interpretations of Rényi entropies and divergences. Phys. A Stat. Mech. Its Appl. 2005, 365, 57–62.
- Augustin, U. Noisy Channels. Ph.D. Thesis, Universität Erlangen, Erlangen, Germany, 1978.
- Nakiboglu, B. The Rényi capacity and center. IEEE Trans. Inf. Theory 2018.
- Gondran, M.; Minoux, M. Graphs, Dioids and Semirings: New Models and Algorithms; Operations Research/Computer Science Interfaces Series; Springer: New York, NY, USA, 2008.
- Moreau, J.J. Inf-convolution, sous-additivité, convexité des fonctions numériques. J. Math. Pures Appl. 1970, 49, 109–154.
- Valverde Albacete, F.J.; Peláez-Moreno, C. Entropy operates in Non-Linear Semifields. arXiv 2017, arXiv:1710.04728.
- Zhang, Z.; Grabchak, M. Entropic representation and estimation of diversity indices. J. Nonparametr. Stat. 2016, 28, 563–575.
| Mean Name | Mean | Shifted Entropy | Entropy Name | Order α | Shifted Order r |
|---|---|---|---|---|---|
| Maximum | $\max_i x_i$ | $-\log \max_i p_i$ | min-entropy | ∞ | ∞ |
| Arithmetic | $\sum_i w_i x_i$ | $-\log \sum_i p_i^2$ | Rényi’s quadratic | 2 | 1 |
| Geometric | $\prod_i x_i^{w_i}$ | $-\sum_i p_i \log p_i$ | Shannon’s | 1 | 0 |
| Harmonic | $\left(\sum_i w_i / x_i\right)^{-1}$ | $\log \lvert \mathrm{supp}(P_X) \rvert$ | Hartley’s | 0 | −1 |
| Minimum | $\min_i x_i$ | $-\log \min_i p_i$ | max-entropy | −∞ | −∞ |
| Quantity in Terms of… | Rényi Entropy $\tilde{H}_r(P_X)$ | Gen. Hölder Mean $M_r(P_X, P_X)$ | Information Potential $V_r(P_X)$ | Distribution $P_X$ |
|---|---|---|---|---|
| Rényi entropy | $\tilde{H}_r(P_X)$ | $-\log M_r(P_X, P_X)$ | $-\frac{1}{r} \log V_r(P_X)$ | $-\frac{1}{r} \log \sum_i p_i^{r+1}$ |
| Gen. Hölder mean | $b^{-\tilde{H}_r(P_X)}$ | $M_r(P_X, P_X)$ | $V_r(P_X)^{1/r}$ | $\left(\sum_i p_i^{r+1}\right)^{1/r}$ |
| Information potential | $b^{-r \tilde{H}_r(P_X)}$ | $M_r(P_X, P_X)^{r}$ | $V_r(P_X)$ | $\sum_i p_i^{r+1}$ |

(Here b denotes the base of the logarithm; expressions involving 1/r are understood as limits for r = 0.)
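A numerical check of the relations tabulated above, under the assumption that the shifted information potential is $V_r(P_X) = \sum_i p_i^{r+1} = M_r(P_X, P_X)^r$ (consistent with Property 7 on the moments) and that logarithms are taken in base 2; the symbols and code are illustrative, not the authors’.

```python
# Cross-check the identities relating the shifted entropy, the Hölder mean
# and the information potential for an arbitrary order r != 0.
import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])
r = 1.5

V = np.sum(p ** (r + 1))                  # information potential V_r(P)
M = np.sum(p * p ** r) ** (1.0 / r)       # generalized Hölder mean M_r(P, P)
H = -np.log2(M)                           # shifted Rényi entropy H~_r(P)

assert np.isclose(M, V ** (1.0 / r))      # mean from potential
assert np.isclose(V, 2.0 ** (-r * H))     # potential from entropy
assert np.isclose(H, -np.log2(V) / r)     # entropy from potential
print(H, M, V)
```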
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).