Smooth Function Approximation by Deep Neural Networks with General Activation Functions
Abstract
1. Introduction
Notation
2. Deep Neural Networks
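As a rough illustration of the objects studied in this section, the sketch below implements the forward pass of a fully connected network with a configurable activation function. The names (`forward`, `weights`, `biases`, the layer widths) are illustrative choices, not the paper's notation, which characterizes networks by quantities such as depth, width, and the number and size of nonzero parameters.

```python
import numpy as np

def forward(x, weights, biases, activation):
    """Forward pass of a fully connected network: an affine map followed by the
    activation applied coordinate-wise after every hidden layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = activation(W @ h + b)
    W, b = weights[-1], biases[-1]
    return W @ h + b  # no activation on the output layer (illustrative convention)

# Example: a random network on [0, 1]^3 with two hidden layers of width 8.
rng = np.random.default_rng(0)
dims = [3, 8, 8, 1]
weights = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
biases = [rng.standard_normal(dims[i + 1]) for i in range(len(dims) - 1)]
print(forward(np.array([0.2, 0.5, 0.9]), weights, biases, lambda z: np.maximum(z, 0.0)))
```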
3. Classes of Activation Functions
3.1. Piecewise Linear Activation Functions
- ReLU: x ↦ max(x, 0).
- Leaky ReLU: x ↦ max(x, ax) for some a ∈ (0, 1).
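A minimal numerical sketch of these two piecewise linear activations (the slope parameter `a` of the leaky ReLU and its default value are illustrative choices):

```python
import numpy as np

def relu(x):
    # ReLU: x |-> max(x, 0)
    return np.maximum(x, 0.0)

def leaky_relu(x, a=0.01):
    # Leaky ReLU: x |-> max(x, a*x) for a slope a in (0, 1)
    return np.maximum(x, a * x)

x = np.linspace(-2.0, 2.0, 5)        # [-2, -1, 0, 1, 2]
print(relu(x))                       # [0. 0. 0. 1. 2.]
print(leaky_relu(x))                 # [-0.02 -0.01  0.    1.    2.  ]
```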
3.2. Locally Quadratic Activation Functions
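Roughly speaking, an activation in this class behaves like a non-degenerate quadratic in a neighborhood of some point; smooth activations such as the sigmoid are of this kind. The sketch below checks this numerically for the sigmoid (the expansion point `t = 1` and the finite-difference step are arbitrary illustrative choices, and the informal "nonzero second derivative at some point" criterion used here is only a proxy for the formal condition):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Central-difference estimates of the first and second derivatives at t.
t, h = 1.0, 1e-3
d1 = (sigmoid(t + h) - sigmoid(t - h)) / (2 * h)
d2 = (sigmoid(t + h) - 2 * sigmoid(t) + sigmoid(t - h)) / h**2

def taylor2(x):
    # Second-order Taylor polynomial of the sigmoid around t.
    return sigmoid(t) + d1 * (x - t) + 0.5 * d2 * (x - t) ** 2

xs = t + np.linspace(-0.1, 0.1, 5)
print(d2)                                          # nonzero: the quadratic term is non-degenerate
print(np.max(np.abs(sigmoid(xs) - taylor2(xs))))   # tiny: locally well-approximated by a quadratic
```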
4. Approximation of Smooth Functions by Deep Neural Networks
4.1. Hölder Smooth Functions
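For orientation, one standard convention for the Hölder norm and the Hölder ball of radius R on the unit cube is recalled below (the paper's exact normalization may differ; here ⌊α⌋ denotes the largest integer strictly smaller than α):

```latex
\|f\|_{\mathcal{H}^{\alpha}([0,1]^d)}
  := \max_{|\beta| \le \lfloor \alpha \rfloor} \, \sup_{x \in [0,1]^d} \bigl|\partial^{\beta} f(x)\bigr|
   + \max_{|\beta| = \lfloor \alpha \rfloor} \, \sup_{\substack{x, y \in [0,1]^d \\ x \ne y}}
     \frac{\bigl|\partial^{\beta} f(x) - \partial^{\beta} f(y)\bigr|}{\lVert x - y \rVert^{\alpha - \lfloor \alpha \rfloor}},
\qquad
\mathcal{H}^{\alpha, R}([0,1]^d) := \bigl\{ f : \|f\|_{\mathcal{H}^{\alpha}([0,1]^d)} \le R \bigr\}.
```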
4.2. Approximation of Hölder Smooth Functions
5. Application to Statistical Learning Theory
5.1. Application to Regression
5.2. Application to Binary Classification
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Appendix A. Proof of Theorem 1
Appendix A.1. Proof of Theorem 1 for Piecewise Linear Activation Functions
Appendix A.2. Proof of Theorem 1 for Locally Quadratic Activation Functions
- (a) There is a neural network with such that
- (b) Let . There is a neural network parameter with such that
- (c) Let α be a positive integer. For any multi-index with , there is a network parameter with such that
- (d) There is a network parameter with such that
- (e) There is a network parameter with such that
Appendix B. Proof of Proposition 1
Appendix C. Proof of Theorem 2
Appendix D. Proof of Theorem 3
- there exists a sequence of classes of functions with for some such that there is with for some universal constant ;
- for some universal constant .
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).