Nonparametric Information Geometry: From Divergence Function to Referential-Representational Biduality on Statistical Manifolds
Abstract
1. Introduction
1.1. Parametric Information Geometry Revisited
1.1.1. Riemannian Manifold, Fisher Metric and α-Connections
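For a one-parameter model $p(x;\theta)$ with log-likelihood $l = \log p$, the Fisher metric is $g(\theta) = E_\theta[(\partial_\theta l)^2]$, and Amari's α-connection has coefficients $\Gamma^{(\alpha)}(\theta) = E_\theta[(\partial_\theta^2 l + \tfrac{1-\alpha}{2}(\partial_\theta l)^2)\,\partial_\theta l]$. The Python sketch below evaluates both by exact expectation. It assumes a Bernoulli model with success probability θ as the example, and the function names are hypothetical. Neither choice comes from the text.

```python
# Fisher metric and alpha-connection of a Bernoulli model, by exact expectation.
# Illustrative sketch: the model and all function names are assumptions.


def score(x: int, theta: float) -> float:
    """d/dtheta log p(x; theta) for p(x; theta) = theta^x (1 - theta)^(1 - x)."""
    return x / theta - (1 - x) / (1 - theta)


def score_derivative(x: int, theta: float) -> float:
    """d^2/dtheta^2 log p(x; theta)."""
    return -x / theta ** 2 - (1 - x) / (1 - theta) ** 2


def expectation(fn, theta: float) -> float:
    """E_theta[fn(X)] for X ~ Bernoulli(theta)."""
    return (1 - theta) * fn(0) + theta * fn(1)


def fisher_metric(theta: float) -> float:
    """g(theta) = E[(d log p / dtheta)^2]."""
    return expectation(lambda x: score(x, theta) ** 2, theta)


def alpha_connection(theta: float, alpha: float) -> float:
    """Gamma^(alpha) = E[(l'' + (1 - alpha)/2 * l'^2) * l'] with l = log p."""
    return expectation(
        lambda x: (score_derivative(x, theta)
                   + (1 - alpha) / 2 * score(x, theta) ** 2) * score(x, theta),
        theta,
    )


if __name__ == "__main__":
    theta = 0.3
    print(fisher_metric(theta), 1 / (theta * (1 - theta)))  # both about 4.7619
    print(alpha_connection(theta, -1.0))  # about 0: mixture (alpha = -1) connection
    print(alpha_connection(theta, 1.0))   # exponential (alpha = 1) connection, nonzero
```

For this family, $g(\theta) = 1/(\theta(1-\theta))$. The α = −1 coefficient vanishes because θ is itself a mixture-affine coordinate, while the α = 1 coefficient does not vanish.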
1.1.2. Exponential Family, Mixture Family and Their Generalization
1.2. Divergence Function and Induced Statistical Manifold
1.2.1. Kullback-Leibler Divergence, Bregman Divergence and α-Divergence
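The three divergences named in this heading can be written out directly for positive vectors that are not necessarily normalized:
- the extended Kullback-Leibler divergence $\sum_i p_i\log(p_i/q_i) - p_i + q_i$;
- the Bregman divergence $B_\Phi(x,y) = \Phi(x) - \Phi(y) - \langle\nabla\Phi(y), x-y\rangle$;
- Amari's α-divergence $\frac{4}{1-\alpha^2}\sum_i\big[\frac{1-\alpha}{2}p_i + \frac{1+\alpha}{2}q_i - p_i^{(1-\alpha)/2}q_i^{(1+\alpha)/2}\big]$.

The Python sketch below checks two standard relations among them. First, the Bregman divergence of the negative entropy $\Phi(x) = \sum_i x_i\log x_i - x_i$ is the extended KL divergence. Second, the α-divergence tends to KL(p‖q) as α → −1 and to KL(q‖p) as α → 1. The test vectors and function names are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def kl(p: Vector, q: Vector) -> float:
    """Extended KL divergence for positive vectors (no normalization assumed)."""
    return sum(pi * math.log(pi / qi) - pi + qi for pi, qi in zip(p, q))


def bregman(phi: Callable[[Vector], float],
            grad_phi: Callable[[Vector], List[float]],
            x: Vector, y: Vector) -> float:
    """B_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))


def alpha_divergence(p: Vector, q: Vector, alpha: float) -> float:
    """Amari's alpha-divergence on positive vectors; defined for alpha != +-1."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4 / (1 - alpha ** 2) * sum(a * pi + b * qi - pi ** a * qi ** b
                                      for pi, qi in zip(p, q))


def neg_entropy(x: Vector) -> float:
    """Phi(x) = sum x log x - x, whose Bregman divergence is the extended KL."""
    return sum(xi * math.log(xi) - xi for xi in x)


def grad_neg_entropy(x: Vector) -> List[float]:
    return [math.log(xi) for xi in x]


if __name__ == "__main__":
    p, q = [0.2, 1.5, 0.7], [0.4, 1.0, 0.9]
    print(kl(p, q), bregman(neg_entropy, grad_neg_entropy, p, q))  # equal up to rounding
    print(alpha_divergence(p, q, -0.9999), kl(p, q))  # close: alpha -> -1 gives KL(p||q)
    print(alpha_divergence(p, q, 0.9999), kl(q, p))   # close: alpha -> 1 gives KL(q||p)
```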
1.2.2. Induced Dual Riemannian Geometry
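- (i) $D(x, y) \ge 0$ for all $x, y$, with equality holding iff $x = y$;
- (ii) $\partial_{x^i} D(x, y)\big|_{x=y} = \partial_{y^j} D(x, y)\big|_{x=y} = 0$;
- (iii) $-\partial_{x^i}\partial_{y^j} D(x, y)\big|_{x=y}$ is positive definite;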
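The three conditions can be checked numerically for a concrete divergence:
- (i) non-negativity, with equality only on the diagonal;
- (ii) vanishing first derivatives on the diagonal;
- (iii) a positive-definite negated mixed second derivative on the diagonal.

The matrix in (iii) is the Riemannian metric that the divergence induces. The Python sketch below makes several illustrative choices that are not taken from the text. It uses the extended Kullback-Leibler divergence on positive vectors as the test case, central finite differences with hand-picked step sizes, and hypothetical helper names. For this divergence the induced metric at $z$ should come out as $\mathrm{diag}(1/z_i)$.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def ext_kl(x: Vector, y: Vector) -> float:
    """Stand-in divergence: extended KL divergence on positive vectors."""
    return sum(xi * math.log(xi / yi) - xi + yi for xi, yi in zip(x, y))


def shifted(v: Vector, i: int, h: float) -> List[float]:
    """Copy of v with h added to coordinate i."""
    w = list(v)
    w[i] += h
    return w


def first_derivatives_on_diagonal(D: Callable, z: Vector, h: float = 1e-6):
    """Central-difference dD/dx_i and dD/dy_j at x = y = z (condition (ii))."""
    n = len(z)
    dx = [(D(shifted(z, i, h), z) - D(shifted(z, i, -h), z)) / (2 * h) for i in range(n)]
    dy = [(D(z, shifted(z, j, h)) - D(z, shifted(z, j, -h))) / (2 * h) for j in range(n)]
    return dx, dy


def induced_metric(D: Callable, z: Vector, h: float = 1e-4) -> List[List[float]]:
    """g_ij = -d^2 D / dx_i dy_j at x = y = z, by a four-point mixed difference."""
    n = len(z)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xp, xm = shifted(z, i, h), shifted(z, i, -h)
            yp, ym = shifted(z, j, h), shifted(z, j, -h)
            mixed = (D(xp, yp) - D(xp, ym) - D(xm, yp) + D(xm, ym)) / (4 * h * h)
            g[i][j] = -mixed
    return g


if __name__ == "__main__":
    z = [0.5, 2.0]
    print(ext_kl(z, z), ext_kl(z, [1.0, 1.0]))       # 0.0, then a positive number
    print(first_derivatives_on_diagonal(ext_kl, z))  # all entries near 0
    print(induced_metric(ext_kl, z))                 # near [[2.0, 0.0], [0.0, 0.5]]
```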
1.3. Goals and Approach
2. Information Geometry on Infinite-Dimensional Function Space
2.1. Differentiable Manifold in the Infinite-Dimensional Setting
2.2. -Divergence, a Family of Generalized Divergence Functionals
2.2.1. Fundamental Convex Inequality and Divergence
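For a strictly convex $f$ and $\alpha \in (-1, 1)$, the convex inequality $\frac{1-\alpha}{2} f(x) + \frac{1+\alpha}{2} f(y) \ge f\big(\frac{1-\alpha}{2} x + \frac{1+\alpha}{2} y\big)$ yields a divergence once the gap is rescaled. One common normalization, assumed here, is $D^{(\alpha)}_f(x,y) = \frac{4}{1-\alpha^2}\big[\frac{1-\alpha}{2} f(x) + \frac{1+\alpha}{2} f(y) - f\big(\frac{1-\alpha}{2} x + \frac{1+\alpha}{2} y\big)\big]$. The Python sketch below checks two consequences of that normalization. For the quadratic $f(x) = \|x\|^2$ it equals $\|x-y\|^2$ for every α. As α → 1 it approaches the Bregman divergence $B_f(x,y)$, and as α → −1 it approaches $B_f(y,x)$. The test functions and helper names are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def convex_alpha_divergence(f: Callable[[Vector], float],
                            x: Vector, y: Vector, alpha: float) -> float:
    """Rescaled gap of the convex inequality; requires -1 < alpha < 1."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    mix = [a * xi + b * yi for xi, yi in zip(x, y)]
    return 4 / (1 - alpha ** 2) * (a * f(x) + b * f(y) - f(mix))


def bregman(f: Callable[[Vector], float], grad_f: Callable[[Vector], List[float]],
            x: Vector, y: Vector) -> float:
    """B_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    return f(x) - f(y) - sum(g * (xi - yi) for g, xi, yi in zip(grad_f(y), x, y))


def squared_norm(v: Vector) -> float:
    return sum(t * t for t in v)


def sum_exp(v: Vector) -> float:
    return sum(math.exp(t) for t in v)


def grad_sum_exp(v: Vector) -> List[float]:
    return [math.exp(t) for t in v]


if __name__ == "__main__":
    x, y = [1.0, -2.0], [0.5, 1.0]
    for alpha in (-0.5, 0.0, 0.7):
        # Quadratic f: equals ||x - y||^2 = 9.25 for every alpha, up to rounding
        print(convex_alpha_divergence(squared_norm, x, y, alpha))
    print(convex_alpha_divergence(sum_exp, x, y, 0.99999),
          bregman(sum_exp, grad_sum_exp, x, y))  # close to each other
```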
2.2.2. Conjugate-Scaled Representations of Measurable Functions
2.2.3. Canonical Divergence
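In dually flat geometry, the canonical divergence pairs a point with a point in the conjugate (Legendre-dual) coordinates. It is the Fenchel-Young gap $A_f(x, s) = f(x) + f^*(s) - \langle x, s\rangle$, which is non-negative and vanishes exactly when $s = \nabla f(x)$. The Python sketch below assumes that form and checks that $A_f(x, \nabla f(y))$ reproduces the Bregman divergence $B_f(x, y)$. It uses $f(x) = \sum_i e^{x_i}$, with conjugate $f^*(s) = \sum_i s_i \log s_i - s_i$, purely as an illustrative choice.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def f(x: Vector) -> float:
    """Illustrative strictly convex potential: sum of exponentials."""
    return sum(math.exp(t) for t in x)


def grad_f(x: Vector) -> List[float]:
    return [math.exp(t) for t in x]


def f_conjugate(s: Vector) -> float:
    """Legendre-Fenchel conjugate of f, defined for s > 0: sum s log s - s."""
    return sum(t * math.log(t) - t for t in s)


def canonical_divergence(x: Vector, s: Vector) -> float:
    """Fenchel-Young gap f(x) + f*(s) - <x, s>, with s in conjugate coordinates."""
    return f(x) + f_conjugate(s) - sum(a * b for a, b in zip(x, s))


def bregman(x: Vector, y: Vector) -> float:
    """B_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    return f(x) - f(y) - sum(g * (a - b) for g, a, b in zip(grad_f(y), x, y))


if __name__ == "__main__":
    x, y = [0.3, -1.2], [1.0, 0.4]
    print(canonical_divergence(x, grad_f(y)), bregman(x, y))  # equal up to rounding
    print(canonical_divergence(x, grad_f(x)))                 # about 0
    print(canonical_divergence(x, [2.0, 0.5]))                # positive
```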
2.3. Geometry Induced by the -Divergence
- (i) the metric tensor field, , is given by:
- (ii) the family of covariant derivatives (connections) is given as:
- (iii) the family of conjugate covariant derivatives is:
- (i) the Riemann curvature tensor ;
- (ii) the torsion tensor .
2.4. Homogeneous -Divergence and the Induced Geometry
3. Parametric Statistical Manifold As Finite-Dimensional Embedding
3.1. Finite-Dimensional Parametric Models
3.1.1. Riemannian Geometry of Parametric Models
3.1.2. Example: The Parametric -Manifold
3.2. Affine Embedded Submanifold
3.2.1. Biorthogonality of Natural and Expectation Parameters
- (i) the function:
- (ii) the divergence functional, , takes the form of the divergence function:
- (iii) the metric tensor, affine connections and the Riemann curvature tensor take the forms:
- (i) define
- (ii) the pair of convex functions, , form a pair of “potentials” to induce :
- (iii) the expectation parameter, , and the natural parameter, , form biorthogonal coordinates:
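For an exponential family, a standard choice of the two potentials is the log-partition function ψ(θ) and its Legendre dual φ(η). The maps η = ψ′(θ) and θ = φ′(η) are mutually inverse, so their Jacobians (the Hessians of ψ and φ) are inverse matrices. This is what makes the natural and expectation coordinates biorthogonal. The one-dimensional Python sketch below checks the inverse maps, the Legendre identity ψ(θ) + φ(η) = θη, and ψ″(θ)·φ″(η) = 1. It assumes the Bernoulli family as the example, and the helper names are hypothetical; neither is taken from the text.

```python
import math


def psi(theta: float) -> float:
    """Log-partition (cumulant) function of the Bernoulli family."""
    return math.log1p(math.exp(theta))


def eta_of_theta(theta: float) -> float:
    """Expectation parameter eta = psi'(theta), the logistic function."""
    return 1.0 / (1.0 + math.exp(-theta))


def phi(eta: float) -> float:
    """Legendre dual of psi (negative entropy), for 0 < eta < 1."""
    return eta * math.log(eta) + (1 - eta) * math.log(1 - eta)


def theta_of_eta(eta: float) -> float:
    """Natural parameter theta = phi'(eta), the logit function."""
    return math.log(eta / (1 - eta))


def second_derivative(fn, t: float, h: float = 1e-4) -> float:
    """Central second difference of fn at t."""
    return (fn(t + h) - 2 * fn(t) + fn(t - h)) / (h * h)


if __name__ == "__main__":
    theta = 0.7
    eta = eta_of_theta(theta)
    print(theta_of_eta(eta))                    # recovers 0.7 up to rounding
    print(psi(theta) + phi(eta) - theta * eta)  # about 0 (Legendre duality)
    print(second_derivative(psi, theta) * second_derivative(phi, eta))  # about 1
```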
3.2.2. Dually Flat Affine Manifolds
4. Proofs
5. Discussions
6. Conclusions
Acknowledgments
Conflicts of Interest
© 2013 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
Zhang, J. Nonparametric Information Geometry: From Divergence Function to Referential-Representational Biduality on Statistical Manifolds. Entropy 2013, 15, 5384-5418. https://doi.org/10.3390/e15125384