On Lower Bounds for Statistical Learning Theory
Abstract
1. Introduction
2. Statistical Estimation
2.1. Fano’s Method
2.2. Local Packings
2.3. Metric Entropy
- (i) f is ℓ times differentiable on its domain;
- (ii) |f^{(k)}(x)| ≤ C for all 0 ≤ k ≤ ℓ and all x in the domain, where C is a fixed constant;
- (iii) f^{(ℓ)} is 1-Lipschitz on the domain.
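For such a smoothness class, the classical Kolmogorov–Tikhomirov result gives the order of the metric entropy. Stated here as a hedged reminder, assuming the class consists of functions on a bounded interval with overall smoothness α = ℓ + 1 (ℓ bounded derivatives plus a Lipschitz top derivative):

```latex
\log N\bigl(\epsilon, \mathcal{F}, \|\cdot\|_\infty\bigr) \asymp \epsilon^{-1/\alpha},
\qquad \alpha = \ell + 1 .
```

Here N(ε, F, ‖·‖∞) denotes the ε-covering number of the class in the sup norm; the exponent becomes d/α for functions on a d-dimensional domain.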
3. Community Recovery
3.1. Weak Recovery
3.2. Exact Recovery
4. Online Learning
4.1. Stochastic Bandits
4.2. Adversarial Bandits
5. Discussion
Acknowledgments
Conflicts of Interest
References
© 2017 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Loh, P.-L. On Lower Bounds for Statistical Learning Theory. Entropy 2017, 19, 617. https://doi.org/10.3390/e19110617