Adversarially Training MCMC with Non-Volume-Preserving Flows
Abstract
1. Introduction
2. Background
2.1. Markov Chain Monte Carlo and the Metropolis–Hastings Algorithm
2.2. Hamiltonian Monte Carlo and Exploration of the Total Energy Function
2.3. Real-Valued Non-Volume-Preserving Flows
3. The Proposed Method
3.1. Using Non-Volume-Preserving Flows as Generator
3.2. Loss Function and Training Procedure
Algorithm 1 Training NVP-MC
Input: energy function $U(x)$, batch size $M$, learning rate $\eta$, number of iterations $N$, empty buffer $B$, transition kernels $T_\theta$ and $T_\theta^{-1}$.
1: Initialize $B$ using HMC without the MH step. Initialize the parameters $\theta$ of the transition kernel and the parameters $\phi$ of the discriminator.
2: for $i = 1, \dots, N$ do
3: Sample a batch of $M$ Gaussian noise vectors as the start points.
4: for each transition step do
5: Randomly sample a number $u$ in the open interval $(0, 1)$.
6: Choose the transition kernel: $T_\theta$ if $u > 0.5$, otherwise $T_\theta^{-1}$.
7: Generate the new sample through the chosen kernel.
8: Accept the new sample with probability computed by Equation (4),
9: and replace the samples in $B$ with the accepted samples.
end for
10: Sample a batch from $B$ as the correct samples.
11: Update the discriminator:
12: $\phi \leftarrow \phi - \eta \nabla_\phi L_D(\theta, \phi)$
13: Update the transition kernel:
14: $\theta \leftarrow \theta - \eta \nabla_\theta L_G(\theta, \phi)$
end for
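To make the training loop concrete, the following is a minimal PyTorch sketch of Algorithm 1. It is a sketch under stated assumptions, not the authors' implementation: the names `CouplingKernel`, `mh_accept`, and the example Gaussian energy `U` are hypothetical; a single affine coupling layer stands in for the full NVP generator; and the critic losses follow the plain WGAN form of Arjovsky et al., since Equation (4) and the exact losses are not reproduced here. The acceptance test uses the standard MH rule for an invertible, non-volume-preserving proposal, where the Jacobian term enters the acceptance ratio.

```python
# Hypothetical sketch of the NVP-MC training loop (Algorithm 1), assuming
# PyTorch. Names and hyperparameters are illustrative, not the paper's.
import torch
import torch.nn as nn

def U(x):
    """Example target energy: standard Gaussian, U(x) = ||x||^2 / 2."""
    return 0.5 * (x ** 2).sum(dim=1)

class CouplingKernel(nn.Module):
    """One real-NVP affine coupling layer acting as the kernel T_theta."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.d = dim // 2
        self.net = nn.Sequential(nn.Linear(self.d, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * (dim - self.d)))

    def forward(self, x, inverse=False):
        """Map x -> x' and return log|det Jacobian| of the map."""
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                       # keep scales bounded
        if inverse:                             # T_theta^{-1}
            y2, log_det = (x2 - t) * torch.exp(-s), -s.sum(dim=1)
        else:                                   # T_theta
            y2, log_det = x2 * torch.exp(s) + t, s.sum(dim=1)
        return torch.cat([x1, y2], dim=1), log_det

def mh_accept(x, x_new, log_det):
    """MH test for a non-volume-preserving proposal: the log-acceptance
    ratio includes the Jacobian term (this plays the role of Equation (4))."""
    log_alpha = U(x) - U(x_new) + log_det
    accept = torch.log(torch.rand(x.shape[0])) < log_alpha
    return torch.where(accept.unsqueeze(1), x_new, x)

dim, M, eta = 4, 64, 1e-4
kernel = CouplingKernel(dim)
disc = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt_d = torch.optim.Adam(disc.parameters(), lr=eta)
opt_g = torch.optim.Adam(kernel.parameters(), lr=eta)
buffer = torch.randn(1024, dim)   # stand-in for the HMC-initialized buffer B

for it in range(1000):
    x = torch.randn(M, dim)                     # Gaussian start points
    for _ in range(5):                          # a few transition steps
        u = torch.rand(())                      # pick T_theta or its inverse
        x_new, log_det = kernel(x, inverse=(u < 0.5).item())
        x = mh_accept(x, x_new, log_det)
    idx = torch.randint(0, buffer.shape[0], (M,))
    buffer[idx] = x.detach()                    # store accepted samples in B
    real = buffer[torch.randint(0, buffer.shape[0], (M,))]

    # WGAN-style critic update: buffer samples vs. generated samples.
    loss_d = disc(x.detach()).mean() - disc(real).mean()
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator (transition-kernel) update: fool the critic.
    loss_g = -disc(x).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

A faithful implementation would additionally enforce the critic's Lipschitz constraint (weight clipping or a gradient penalty) and stack several coupling layers with alternating masks so that every coordinate gets transformed; both are omitted here for brevity.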
4. Related Work
4.1. Getting MCMC Transition Kernels through Adversarial Training
4.2. Parameterized Non-Volume-Preserving Transition Kernels
5. Experiments
5.1. Performance Indexes
5.2. Varieties of Challenging Energy Functions
5.3. Bayesian Logistic Regression
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Robert, C.; Casella, G. Monte Carlo Statistical Methods; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013.
- Neal, R.M. Probabilistic Inference Using Markov Chain Monte Carlo Methods; Technical Report; Department of Computer Science, University of Toronto: Toronto, ON, Canada, 1993.
- Martino, L.; Read, J. On the flexibility of the design of multiple try Metropolis schemes. Comput. Stat. 2013, 28, 2797–2823.
- Hastings, W.K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970, 57, 97–109.
- Wang, Z.; Mohamed, S.; de Freitas, N. Adaptive Hamiltonian and Riemann Manifold Monte Carlo. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013.
- Wang, J.; Sun, S. Decomposed slice sampling for factorized distributions. Pattern Recognit. 2020, 97, 107021.
- Duane, S.; Kennedy, A.D.; Pendleton, B.J.; Roweth, D. Hybrid Monte Carlo. Phys. Lett. B 1987, 195, 216–222.
- Simsekli, U.; Yildiz, C.; Nguyen, T.H.; Richard, G.; Cemgil, A.T. Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018.
- Betancourt, M. A conceptual introduction to Hamiltonian Monte Carlo. arXiv 2017, arXiv:1701.02434.
- Psutka, J.V.; Psutka, J. Sample size for maximum-likelihood estimates of Gaussian model depending on dimensionality of pattern space. Pattern Recognit. 2019, 91, 25–33.
- Betancourt, M.; Byrne, S.; Girolami, M. Optimizing the integrator step size for Hamiltonian Monte Carlo. arXiv 2014, arXiv:1411.6669.
- Zou, D.; Xu, P.; Gu, Q. Stochastic Variance-Reduced Hamilton Monte Carlo Methods. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018.
- Levy, D.; Hoffman, M.D.; Sohl-Dickstein, J. Generalizing Hamiltonian Monte Carlo with Neural Networks. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
- Liu, C.; Zhuo, J.; Zhu, J. Understanding MCMC Dynamics as Flows on the Wasserstein Space. arXiv 2019, arXiv:1902.00282.
- Song, J.; Zhao, S.; Ermon, S. A-NICE-MC: Adversarial training for MCMC. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
- Azadi, S.; Olsson, C.; Darrell, T.; Goodfellow, I.; Odena, A. Discriminator rejection sampling. arXiv 2018, arXiv:1810.06758.
- Dinh, L.; Krueger, D.; Bengio, Y. NICE: Non-linear independent components estimation. arXiv 2014, arXiv:1410.8516.
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. arXiv 2017, arXiv:1701.07875.
- Wei, G.; Luo, M.; Liu, H.; Zhang, D.; Zheng, Q. Progressive generative adversarial networks with reliable sample identification. Pattern Recognit. Lett. 2020, 130, 91–98.
- Pasarica, C.; Gelman, A. Adaptively scaling the Metropolis algorithm using expected squared jumped distance. Stat. Sin. 2010, 20, 343–364.
- Yang, J.; Roberts, G.O.; Rosenthal, J.S. Optimal scaling of Metropolis algorithms on general target distributions. arXiv 2019, arXiv:1904.12157.
- Green, P.J. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 1995, 82, 711–732.
- Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006.
- Cong, Y.; Chen, B.; Liu, H.; Zhou, M. Deep Latent Dirichlet Allocation with Topic-Layer-Adaptive Stochastic Gradient Riemannian MCMC. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
- Betancourt, M.; Byrne, S.; Livingstone, S.; Girolami, M. The geometric foundations of Hamiltonian Monte Carlo. Bernoulli 2017, 23, 2257–2298.
- Tripuraneni, N.; Rowland, M.; Ghahramani, Z.; Turner, R. Magnetic Hamiltonian Monte Carlo. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
- Huang, L.; Wang, L. Accelerated Monte Carlo simulations with restricted Boltzmann machines. Phys. Rev. B 2017, 95, 035105.
- Li, C.; Chen, C.; Carlson, D.; Carin, L. Preconditioned Stochastic Gradient Langevin Dynamics for deep neural networks. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016.
- Kingma, D.P.; Salimans, T.; Jozefowicz, R.; Chen, X.; Sutskever, I.; Welling, M. Improved variational inference with inverse autoregressive flow. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016.
- Dinh, L.; Sohl-Dickstein, J.; Bengio, S. Density estimation using Real NVP. arXiv 2016, arXiv:1605.08803.
- Ma, F.; Ayaz, U.; Karaman, S. Invertibility of convolutional generative networks from partial measurements. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018.
- Dinh, V.; Bilge, A.; Zhang, C.; Matsen, F.A., IV. Probabilistic path Hamiltonian Monte Carlo. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
- Zhang, Y.; Ghahramani, Z.; Storkey, A.J.; Sutton, C.A. Continuous relaxations for discrete Hamiltonian Monte Carlo. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012.
- Ichiki, A.; Ohzeki, M. Violation of detailed balance accelerates relaxation. arXiv 2013, arXiv:1306.6131.
- Neal, R.M. MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo; Chapman & Hall/CRC: Boca Raton, FL, USA, 2011.
- Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; CRC Press: Boca Raton, FL, USA, 1994.
- Rezende, D.J.; Mohamed, S. Variational Inference with Normalizing Flows. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015.
- Borgwardt, K.M.; Gretton, A.; Rasch, M.J.; Kriegel, H.; Schölkopf, B.; Smola, A.J. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 2006, 22, e49–e57.
- MacKay, D.J.C. The Evidence Framework Applied to Classification Networks. Neural Comput. 1992, 4, 720–736.
- Hoffman, M.D.; Gelman, A. The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 2014, 15, 1593–1623.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Freedman, D.A. Statistical Models: Theory and Practice; Cambridge University Press: Cambridge, UK, 2009.
- Tóth, J.; Tomán, H.; Hajdu, A. Efficient sampling-based energy function evaluation for ensemble optimization using simulated annealing. Pattern Recognit. 2020, 107, 107510.
- Dua, D.; Graff, C. UCI Machine Learning Repository. 2017. Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 13 March 2022).
- Hanley, J.A.; McNeil, B.J. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983, 148, 839–843.
Target | NVP-MC | A-NICE-MC | L2HMC | HMC
---|---|---|---|---
MoG6 | 320.0 | 311.2 | 1.0 | —
GF | 270.0 | 304.1 | 8.0 | —
SCG | 539.4 | 497.0 | 0.48 | —
Ring5 | 155.6 | 69.1 | 0.43 | —
50-d ICG | 0.29 | 0.78 | 0.02 | —
Dataset | LR | VBLR | HMC | NVP-MC
---|---|---|---|---
HA | 69.3 ± 0.2 | 69.3 ± 0.1 | 69.3 ± 0.2 | —
PI | 76.6 ± 0.2 | 76.2 ± 0.1 | 76.6 ± 0.1 | —
MA | 82.5 ± 0.3 | — | — | —
BL | 76.0 ± 0.2 | 76.0 ± 0.2 | 76.0 ± 0.3 | —
IM | 77.7 ± 0.3 | 77.8 ± 0.4 | 83.2 ± 0.2 | —
IN | — | 73.2 ± 0.2 | 73.2 ± 0.2 | 73.8 ± 0.2
HE | 75.9 ± 0.2 | 75.9 ± 0.2 | 75.9 ± 0.2 | —
GE | 71.5 ± 0.1 | 71.5 ± 0.1 | 72.5 ± 0.2 | —
AU | 86.9 ± 0.2 | 87.6 ± 0.2 | 87.6 ± 0.2 | —
Dataset | LR | VBLR | HMC | NVP-MC
---|---|---|---|---
HA | 62.7 ± 0.1 | 63.2 ± 0.1 | 63.0 ± 0.2 | —
PI | 79.2 ± 0.2 | 79.3 ± 0.1 | 79.3 ± 0.1 | —
MA | — | 89.8 ± 0.1 | — | —
BL | 73.5 ± 0.3 | 73.4 ± 0.3 | — | 73.6 ± 0.2
IM | 76.7 ± 0.3 | 78.5 ± 0.5 | 89.2 ± 0.2 | —
IN | 73.2 ± 0.3 | — | 72.4 ± 0.2 | 72.7 ± 0.2
HE | 80.1 ± 0.2 | 81.3 ± 0.2 | — | 81.9 ± 0.2
GE | 74.7 ± 0.2 | 75.5 ± 0.2 | 76.7 ± 0.3 | —
AU | 92.5 ± 0.2 | — | 93.9 ± 0.3 | —