Asymptotic Efficiency of Point Estimators in Bayesian Predictive Inference
Abstract
1. Introduction
- De Finetti’s vision of statistics rests on the irrefutable fact that the Bayesian standpoint (understood as the use of the basic tools of probability theory and, especially, of conditional distributions) becomes a necessity for those who regard statistical inference as the use of observed data to update their original beliefs about other quantities of interest that are not yet observed. See [5,6].
- Rigorous notions of point estimation and of optimality of an estimator can be achieved only within a decision-theoretic framework (see, e.g., [7]), at least if all estimators are admitted into competition and special restrictions such as unbiasedness or equivariance are set aside. In turn, decision theory proves to be genuinely Bayesian, thanks to a well-known result by Abraham Wald. See [8] (Chapter 4).
- At least from a mathematical standpoint, the existence of the prior distribution can be derived from various representation theorems which, since they pertain to the more basic act of modeling incoming information, come before the problem of point estimation. The most luminous example is the celebrated de Finetti representation theorem for exchangeable observations. See [6,9] and, for a predictive approach, [10,11].
1.1. Main Contributions and General Strategy
- (A) The loss function on is harmoniously coordinated with the original choice of the loss function on . This principle is closely aligned with de Finetti’s thought (see [18]), since it stresses the more concrete nature of the space compared with the space which is, in principle, only a set of labels. Hence, it is much more reasonable to first metrize the space and then the space accordingly (as in (6)), rather than to metrize directly, without even taking account of the original predictive aim.
- (B) The Bayesian risk function associated with both and can be bounded from above by the sum of two quantities: the former accounts for the error in estimating T, while the latter reflects the fact that we are estimating both and from an “estimated distribution”.
1.2. Organization of the Paper
- (i) Proposition 2, which shows how to bound from above the Bayesian risk of any estimator of by using the Wasserstein distance;
- (ii) Proposition 3, which explains how to use the Laplace method of the approximation of integrals to get asymptotic expansions of the Bayesian risk functions;
- (iii) the formulation of the compatibility Equations (43) and (44);
- (iv) the proof of the “asymptotic almost efficiency” of the estimator obtained in Step 2, via verification of identities (2) and (3);
- (v) the successful completion of Step 6, that is, the proof of the “asymptotic almost efficiency” of estimators obtained in Step 5, via verification of identity (10).
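As a reminder of what the Laplace method mentioned in item (ii) does, the following is a minimal one-dimensional sketch, not the expansion of Proposition 3 itself. It assumes the toy integrand exp(-n h(t)) on (0, ∞) with h(t) = t - log t, chosen only because its integral has the closed form Γ(n+1)/n^(n+1); the function names `log_laplace`, `log_exact` and `ratio` are hypothetical. Everything is computed on the log scale to avoid underflow for large n.

```python
import math


def log_laplace(n, h_min=1.0, h2_min=1.0):
    """Log of the leading-order Laplace approximation
    exp(-n * h(t*)) * sqrt(2 * pi / (n * h''(t*)))
    to the integral of exp(-n * h(t)), where t* minimizes h.
    Defaults are for the toy choice h(t) = t - log(t): t* = 1, h(1) = 1, h''(1) = 1.
    """
    return -n * h_min + 0.5 * math.log(2.0 * math.pi / (n * h2_min))


def log_exact(n):
    """Log of the exact integral for the toy choice h(t) = t - log(t):
    int_0^inf t**n * exp(-n * t) dt = Gamma(n + 1) / n**(n + 1).
    """
    return math.lgamma(n + 1) - (n + 1) * math.log(n)


def ratio(n):
    """Exact value divided by the Laplace approximation (tends to 1 as n grows)."""
    return math.exp(log_exact(n) - log_laplace(n))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, ratio(n))
```

In this toy case the ratio coincides with the classical Stirling correction, so it approaches 1 at rate 1/(12n); the higher-order terms of such expansions are what an asymptotic analysis of a risk function would need to track.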
2. Technical Preliminaries
2.1. The General Framework
2.2. The Simplified Framework
- (i) for all and ;
- (ii) for any fixed , belongs to ;
- (iii) there exists a separable Hilbert space for which for all , and such that, for any open whose closure is compact in Θ (, in symbols), the restriction operators are continuous from to ;
- (iv) for π-a.e. θ, and the Kullback-Leibler divergence
3. Main Results
- (A1) (26) holds uniformly with respect to some class of continuous functionals , in the sense that
- (A2) both the functionals and belong to , for all
4. Applications and Examples
4.1. The Gaussian Model
4.2. The Exponential Model
4.3. The Pareto Model
4.4. Robbins Approach to Empirical Bayes
4.5. An Example of Real Data Analysis
5. Proofs
5.1. Theorem 1
5.2. Proposition 1
5.3. Lemma 1
5.4. Proposition 2
5.5. Proposition 3
5.6. Theorem 2
5.7. Theorem 3
6. Conclusions and Future Developments
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Cifarelli, D.M.; Dolera, E.; Regazzini, E. Note on “Frequentist Approximations to Bayesian prevision of exchangeable random elements” [Int. J. Approx. Reason. 78 (2016) 138–152]. Int. J. Approx. Reason. 2017, 86, 26–27.
- Cifarelli, D.M.; Dolera, E.; Regazzini, E. Frequentist approximations to Bayesian prevision of exchangeable random elements. Int. J. Approx. Reason. 2016, 78, 138–152.
- Dolera, E. On an asymptotic property of posterior distributions. Boll. Dell’Unione Mat. Ital. 2013, 6, 741–748. (In Italian)
- Dolera, E.; Regazzini, E. Uniform rates of the Glivenko–Cantelli convergence and their use in approximating Bayesian inferences. Bernoulli 2019, 25, 2982–3015.
- de Finetti, B. Bayesianism: Its unifying role for both the foundations and applications of statistics. Int. Stat. Rev. 1974, 42, 117–130.
- de Finetti, B. La prévision: Ses lois logiques, ses sources subjectives. Ann. L’Inst. Henri Poincaré 1937, 7, 1–68.
- Ferguson, T.S. Mathematical Statistics: A Decision Theoretic Approach; Academic Press: Cambridge, MA, USA, 1967.
- Lehmann, E.L.; Casella, G. Theory of Point Estimation, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 1998.
- Aldous, D.J. Exchangeability and Related Topics; Ecole d’Eté de Probabilités de Saint-Flour XIII, Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1985; pp. 1–198.
- Berti, P.; Pratelli, L.; Rigo, P. Exchangeable sequences driven by an absolutely continuous random measure. Ann. Probab. 2013, 78, 138–152.
- Fortini, S.; Ladelli, L.; Regazzini, E. Exchangeability, predictive distributions and parametric models. Sankhya 2000, 62, 86–109.
- Rubin, D.B. Bayesianly justifiable and relevant frequency calculations for the applied statisticians. Ann. Stat. 1984, 12, 1151–1172.
- Lijoi, A.; Prünster, I. Models beyond the Dirichlet process. In Bayesian Nonparametrics; Hjort, N.L., Holmes, C.C., Müller, P., Walker, S.G., Eds.; Cambridge University Press: Cambridge, UK, 2010; pp. 80–136.
- Robbins, H. The empirical Bayes approach to statistical decision problems. Ann. Math. Stat. 1964, 35, 1–20.
- Ghosh, J.K.; Sinha, B.K.; Joshi, S.N. Expansions for posterior probability and integrated Bayes risk. In Statistical Decision Theory and Related Topics III; Gupta, S., Berger, J., Eds.; Academic Press: Cambridge, MA, USA, 1982; pp. 403–456.
- Favaro, S.; Nipoti, B.; Teh, Y.W. Rediscovery of Good-Turing estimators via Bayesian nonparametrics. Biometrics 2016, 72, 136–145.
- Lijoi, A.; Mena, R.H.; Prünster, I. Bayesian Nonparametric Estimation of the Probability of Discovering New Species. Biometrika 2009, 94, 769–786.
- de Finetti, B. Probabilità di una teoria e probabilità dei fatti. In Studi di Probabilità, Statistica e Ricerca Operativa in onore di Giuseppe Pompilj; Oderisi: Gubbio, Italy, 1971; pp. 86–101. (In Italian)
- Rao, C.R. Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 1945, 37, 81–91.
- Amari, S.-I. Information Geometry and Its Applications; Applied Mathematical Sciences; Springer: Berlin/Heidelberg, Germany, 2016; Volume 194.
- Oller, J.M.; Corcuera, J.M. Intrinsic analysis of statistical estimation. Ann. Stat. 1995, 23, 1562–1581.
- Zhang, C.-H. Estimation of sums of random variables: Example and information bounds. Ann. Stat. 2005, 33, 2022–2041.
- Robbins, H. An empirical Bayes approach to statistics. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability; Statistical Laboratory of the University of California: Davis, CA, USA, 1956; Volume I, pp. 157–163.
- Berezin, S.; Miftakhov, A. On barycenters of probability measures. Bull. Pol. Acad. Sci. Math. 2020, 68, 11–20.
- Karcher, H. Riemannian center of mass and mollifier smoothing. Commun. Pure Appl. Math. 1977, 30, 509–541.
- Kim, Y.-H.; Pass, B. Nonpositive curvature, the variance functional, and the Wasserstein barycenter. Proc. Am. Math. Soc. 2000, 148, 1745–1756.
- Ambrosio, L.; Gigli, N.; Savaré, G. Gradient Flows in Metric Spaces and in the Space of Probability Measures, 2nd ed.; Birkhäuser: Basel, Switzerland, 2008.
- Billingsley, P. Probability and Measure, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 1995.
- do Carmo, M.P. Riemannian Geometry; Birkhäuser: Basel, Switzerland, 2013.
- Heinonen, J.; Kilpeläinen, T.; Martio, O. Nonlinear Potential Theory of Degenerate Elliptic Equations; Oxford Science Publications: Oxford, UK, 2008.
- Kufner, A. Weighted Sobolev Spaces; John Wiley & Sons: Hoboken, NJ, USA, 1985.
- de Finetti, B. La legge dei grandi numeri nel caso dei numeri aleatori equivalenti. Rend. Della R. Accad. Naz. Lincei 1933, 18, 203–207. (In Italian)
- Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2017.
- Borwein, J.M.; Noll, D. Second order differentiability of convex functions in Banach spaces. Trans. Am. Math. Soc. 1994, 132, 43–81.
- Dolera, E.; Favaro, S. Rates of convergence in de Finetti’s representation theorem, and Hausdorff moment problem. Bernoulli 2020, 26, 1294–1322.
- Mijoule, G.; Peccati, G.; Swan, Y. On the rate of convergence in de Finetti’s representation theorem. Lat. Am. J. Probab. Math. Stat. 2016, 13, 1–23.
- Dolera, E. Estimates of the approximation of weighted sums of conditionally independent random variables by the normal law. J. Inequal. Appl. 2013, 2013, 320.
- Götze, F. On the rate of convergence in the central limit theorem in Banach Spaces. Ann. Probab. 1986, 14, 922–942.
- Tierney, L.; Kadane, J.B. Accurate approximations for posterior moments and marginal densities. J. Am. Stat. Assoc. 1986, 81, 82–86.
- DasGupta, A. Asymptotic Theory of Statistics and Probability; Springer: Berlin/Heidelberg, Germany, 2008.
- Dall’Aglio, G. Sugli estremi dei momenti delle funzioni di ripartizione doppia. Ann. Della Sc. Norm. Super. Pisa Cl. Sci. 1956, 10, 35–74. (In Italian)
- Dolera, E.; Mainini, E. On Uniform Continuity of Posterior Distributions. Stat. Probab. Lett. 2020, 157, 108627.
- Dolera, E.; Mainini, E. Lipschitz continuity of probability kernels in the optimal transport framework. arXiv 2020, arXiv:2010.08380.
- Dowson, D.C.; Landau, B.V. The Fréchet distance between multivariate normal distributions. J. Multivar. Anal. 1982, 12, 450–455.
- Olkin, I.; Pukelsheim, F. The distance between two random vectors with given dispersion matrices. Linear Algebra Its Appl. 1982, 48, 257–263.
- Malagó, L.; Montrucchio, L.; Pistone, G. Wasserstein Riemannian geometry of positive definite matrices. Inf. Geom. 2018, 1, 137–179.
- Efron, B.; Hastie, T. Computer Age Statistical Inference. Algorithms, Evidence, and Data Science; Cambridge University Press: Cambridge, UK, 2016.
- Thyrion, P. Contribution à l’étude du bonus pour non sinistre en assurance automobile. ASTIN Bull. J. IAA 1960, 1, 142–162. (In French)
- van Houwelingen, J.C. Monotonizing empirical Bayes estimators for a class of discrete distributions with monotone likelihood ratio. Stat. Neerl. 1977, 31, 95–104.
- Carlin, B.P.; Louis, T.A. Bayesian Methods for Data Analysis, 3rd ed.; Chapman and Hall: Boca Raton, FL, USA, 2009.
- Ledoux, M.; Talagrand, M. Probability in Banach Spaces; Springer: Berlin/Heidelberg, Germany, 1991.
- Wong, R. Asymptotic Approximations of Integrals; SIAM: Philadelphia, PA, USA, 2001.
- McClure, J.P.; Wong, R. Error bounds for multidimensional Laplace approximation. J. Approx. Theory 1983, 37, 372–390.
- Olver, F.W.J. Error bounds for the Laplace approximation for definite integrals. J. Approx. Theory 1968, 1, 293–313.
- Dolera, E.; Favaro, S. A Berry–Esseen theorem for Pitman’s α–diversity. Ann. Appl. Probab. 2020, 30, 847–869.
- Albeverio, S.; Steblovskaya, V. Asymptotics of infinite-dimensional integrals with respect to smooth measures. (I). Infin. Dimens. Anal. Quantum Probab. Relat. Top. 1999, 2, 529–556.
- Gigli, N. Second order analysis on (P2(M), W2). Mem. Am. Math. Soc. 2012, 216, xii+154.
- Gigli, N.; Ohta, S.I. First variation formula in Wasserstein spaces over compact Alexandrov spaces. Can. Math. Bull. 2010, 55, 723–735.
- Villani, C. Optimal Transport. Old and New; Springer: Berlin/Heidelberg, Germany, 2009.
- Cuturi, M.; Doucet, A. Fast Computation of Wasserstein Barycenters. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 21–26 June 2014; Volume 32, pp. 685–693.
- Smith, R.L. Maximum likelihood estimation in a class of nonregular cases. Biometrika 1985, 72, 67–90.
- Woodroofe, M. Maximum likelihood estimation of a translation parameter of a truncated distribution. Ann. Math. Stat. 1972, 43, 113–122.
- Woodroofe, M. Maximum likelihood estimation of a translation parameter of a truncated distribution (II). Ann. Stat. 1974, 2, 474–488.
- Giné, E.; Nickl, R. Mathematical Foundations of Infinite-Dimensional Statistical Models; Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: Cambridge, UK, 2016.
- Efron, B.; Thisted, R. Estimating the number of unseen species: How many words did Shakespeare know? Biometrika 1976, 63, 435–447.
- Good, I.J. The population frequencies of species and the estimation of population parameters. Biometrika 1953, 40, 237–264.
- Good, I.J.; Toulmin, G.H. The number of new species, and the increase in population coverage, when a sample is increased. Biometrika 1956, 43, 45–63.
- Orlitsky, A.; Suresh, A.T.; Wu, Y. Optimal prediction of the number of unseen species. Proc. Natl. Acad. Sci. USA 2016, 113, 13283–13288.
- Maritz, J.S.; Lwin, T. Empirical Bayes Methods with Applications; Chapman and Hall: Boca Raton, FL, USA, 1989.
- Fisher, R.A.; Corbet, A.S.; Williams, C.B. The relation between the number of species and the number of individuals in a random sample of an animal population. J. Anim. Ecol. 1943, 12, 42–58.
- Favaro, S.; Lijoi, A.; Mena, R.H.; Prünster, I. Bayesian nonparametric inference for species variety with a two parameter Poisson-Dirichlet process prior. J. Roy. Statist. Soc. Ser. B 2009, 71, 993–1008.
- Favaro, S.; Lijoi, A.; Prünster, I. A new estimator of the discovery probability. Biometrics 2012, 68, 1188–1196.
- Arbel, J.; Favaro, S.; Nipoti, B.; Teh, Y.W. Bayesian nonparametric inference for discovery probabilities: Credible intervals and large sample asymptotic. Stat. Sin. 2017, 27, 839–858.
- Dolera, E.; Favaro, S. A compound Poisson perspective of Ewens–Pitman sampling model. Mathematics 2021, 9, 2820.
- Pitman, J. Combinatorial Stochastic Processes; Ecole d’Eté de Probabilités de Saint-Flour XXXII, Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 2006.
- Sambasivan, R.; Das, S.; Sahu, S.K. A Bayesian perspective of statistical machine learning for big data. Comput. Stat. 2020, 35, 893–930.
- Cormode, G.; Yi, K. Small Summaries for Big Data; Cambridge University Press: Cambridge, UK, 2020.
- Dolera, E.; Favaro, S.; Peluchetti, S. Learning-augmented count-min sketches via Bayesian nonparametrics. arXiv 2021, arXiv:2102.04462.
| Claims | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| Counts | 7840 | 1317 | 239 | 42 | 14 | 4 | 4 | 1 |
| Robbins estimator | 0.168 | 0.363 | 0.527 | 1.33 | 1.43 | 6.00 | 1.25 | 0 |
| Gamma MLE | 0.164 | 0.398 | 0.633 | 0.87 | 1.10 | 1.34 | 1.57 | 0 |
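
The “Robbins estimator” row above can be approximately reproduced from the “Counts” row. The sketch below assumes the classical Poisson form of Robbins' formula, (x + 1) · y_{x+1} / y_x, where y_x is the number of units with x claims; that formula is not spelled out in this section, so it is an assumption here, and the function name `robbins_estimates` is hypothetical. The count beyond the last observed category is taken to be zero.

```python
# Counts of units with 0, 1, ..., 7 claims, copied from the table above.
counts = [7840, 1317, 239, 42, 14, 4, 4, 1]


def robbins_estimates(counts):
    """Return (x + 1) * y_{x+1} / y_x for x = 0, ..., len(counts) - 1.

    Assumption: Poisson form of Robbins' formula, with the count of the
    category following the last observed one set to zero.
    """
    estimates = []
    for x, y_x in enumerate(counts):
        y_next = counts[x + 1] if x + 1 < len(counts) else 0
        estimates.append((x + 1) * y_next / y_x)
    return estimates


if __name__ == "__main__":
    for x, est in enumerate(robbins_estimates(counts)):
        print(x, round(est, 3))
```

Under this assumption the values for 0 to 5 claims and for 7 claims agree with the table after rounding. For 6 claims the ratio evaluates to 7 · 1 / 4 = 1.75, whereas the table lists 1.25; the sketch simply reports what the formula yields. The erratic jump at 5 claims shows how unstable the ratio becomes when the counts are small.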

| Claims | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| Counts | 7840 | 1317 | 239 | 42 | 14 | 4 | 4 | 1 |
| Estimator | 0.176 | 0.353 | 0.53 | 0.706 | 0.882 | 1.06 | 1.23 | 1.41 |
© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dolera, E. Asymptotic Efficiency of Point Estimators in Bayesian Predictive Inference. Mathematics 2022, 10, 1136. https://doi.org/10.3390/math10071136