Inference, Prediction, & Entropy-Rate Estimation of Continuous-Time, Discrete-Event Processes
Abstract
1. Introduction
2. Related Work
3. Background
4. Optimality
- Unifilarity: if the current hidden state and next emission are known, then the next hidden state is determined; and
- Minimality: minimal number of states (or generative complexity [29]) out of all unifilar generators consistent with the observed process.
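The unifilarity condition can be illustrated with a short check over a labeled transition structure. This is an illustrative sketch, not code from the paper; the dictionary encoding of the generator is a hypothetical representation chosen here for clarity.

```python
def is_unifilar(transitions):
    """Check unifilarity of a labeled transition structure (a sketch).

    `transitions` maps each hidden state to a list of (symbol, next_state)
    pairs. The generator is unifilar when each (state, symbol) pair
    determines at most one next hidden state.
    """
    for state, edges in transitions.items():
        seen = {}
        for symbol, nxt in edges:
            # setdefault records the first successor for this symbol;
            # a different successor later means non-unifilarity.
            if seen.setdefault(symbol, nxt) != nxt:
                return False
    return True
```

For example, a generator in which symbol "0" from state "A" can lead to either "A" or "B" fails the check, while one with a unique successor per (state, symbol) pair passes.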
5. Continuous-Time Bayesian Structural Inference (CT-BSI) and Comparison Algorithms
5.1. Inferring Optimal Models of Unifilar Hidden Semi-Markov Processes
- The second-to-last layer’s activation functions are ReLUs (and so have nonnegative output) and the weights to the last layer are constrained to be nonnegative; and
- The output is the last layer’s output divided by a numerical integration of the last layer’s output.
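The two constraints above can be sketched as follows. This is a minimal NumPy illustration under assumed shapes (one input, one ReLU hidden layer), not the authors' implementation: nonnegative hidden activations times nonnegative output weights give a nonnegative raw output, and dividing by a trapezoidal numerical integration over a grid yields a normalized density estimate.

```python
import numpy as np

def density_network(x_grid, x_query, w1, b1, w2):
    """Hypothetical sketch of a density network with the stated constraints.

    The hidden layer uses ReLU activations (nonnegative outputs), the
    last-layer weights are forced nonnegative (here via abs), and the
    final output is the raw output divided by a trapezoid-rule numerical
    integration of the raw output over x_grid.
    """
    def raw(x):
        h = np.maximum(0.0, np.outer(x, w1) + b1)  # ReLU hidden layer
        return h @ np.abs(w2)                      # nonnegative output weights
    y = raw(x_grid)
    z = np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x_grid))  # trapezoid rule
    return raw(x_query) / z
```

By construction the returned values are nonnegative and integrate (numerically, on the same grid) to one, which is what makes the network's output interpretable as a density.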
5.2. Improved Differential Entropy Rates
5.3. Improved Prediction with Causal States
6. Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Geller, R.J. Earthquake prediction: A critical review. Geophys. J. Int. 1997, 131, 425–450.
- Rieke, F.; Warland, D.; de Ruyter van Steveninck, R.; Bialek, W. Spikes: Exploring the Neural Code; Bradford Book: New York, NY, USA, 1999.
- Berman, G.J.; Bialek, W.; Shaevitz, J.W. Predictability and hierarchy in Drosophila behavior. Proc. Natl. Acad. Sci. USA 2016, 113, 11943–11948.
- Cavagna, A.; Giardina, I.; Ginelli, F.; Mora, T.; Piovani, D.; Tavarone, R.; Walczak, A.M. Dynamical maximum entropy approach to flocking. Phys. Rev. E 2014, 89, 042707.
- Shalizi, C.R.; Crutchfield, J.P. Computational mechanics: Pattern and prediction, structure and simplicity. J. Stat. Phys. 2001, 104, 817–879.
- Littman, M.L.; Sutton, R.S. Predictive representations of state. Adv. Neural Inf. Process. Syst. 2002, 14, 1555–1561.
- Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 1991, 4, 251–257.
- Strelioff, C.C.; Crutchfield, J.P. Bayesian structural inference for hidden processes. Phys. Rev. E 2014, 89, 042119.
- Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006.
- Marzen, S.E.; Crutchfield, J.P. Structure and randomness of continuous-time, discrete-event processes. J. Stat. Phys. 2017, 169, 303–315.
- Nelder, J.A.; Wedderburn, R.W.M. Generalized linear models. J. R. Stat. Soc. Ser. A 1972, 135, 370–384.
- Rabiner, L.R.; Juang, B.H. An introduction to hidden Markov models. IEEE ASSP Mag. 1986, 3, 4–16.
- Pfau, D.; Bartlett, N.; Wood, F. Probabilistic deterministic infinite automata. Adv. Neural Inf. Process. Syst. 2010, 23, 1930–1938.
- Grigoryeva, L.; Ortega, J.-P. Echo state networks are universal. Neural Netw. 2018, 108, 495–508.
- Werbos, P.J. Backpropagation through time: What it does and how to do it. Proc. IEEE 1990, 78, 1550–1560.
- El-Hay, T.; Friedman, N.; Koller, D.; Kupferman, R. Continuous time Markov networks. arXiv 2012, arXiv:1206.6838.
- Nodelman, U.; Shelton, C.R.; Koller, D. Continuous time Bayesian networks. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, Edmonton, AB, Canada, 1–4 August 2002; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2002; pp. 378–387.
- Yang, S.; Khot, T.; Kersting, K.; Natarajan, S. Learning continuous-time Bayesian networks in relational domains: A non-parametric approach. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016.
- Gomez-Rodriguez, M.; Balduzzi, D.; Schölkopf, B. Uncovering the temporal dynamics of diffusion networks. arXiv 2011, arXiv:1105.0697.
- Du, N.; Dai, H.; Trivedi, R.; Upadhyay, U.; Gomez-Rodriguez, M.; Song, L. Recurrent marked temporal point processes: Embedding event history to vector. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 1555–1564.
- Mei, H.; Eisner, J.M. The neural Hawkes process: A neurally self-modulating multivariate point process. Adv. Neural Inf. Process. Syst. 2017, 30, 6754–6764.
- Türkmen, A.C.; Wang, Y.; Smola, A.J. FastPoint: Scalable deep point processes. In Proceedings of ECML PKDD 2019: Machine Learning and Knowledge Discovery in Databases, Würzburg, Germany, 16–20 September 2019; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 11907, pp. 465–480.
- Mavroforakis, C.; Valera, I.; Gomez-Rodriguez, M. Modeling the dynamics of learning activity on the web. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; International World Wide Web Conferences Steering Committee: Geneva, Switzerland, 2017; pp. 1421–1430.
- Karimi, M.R.; Tavakoli, E.; Farajtabar, M.; Song, L.; Gomez-Rodriguez, M. Smart broadcasting: Do you want to be seen? In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 1635–1644.
- Marzen, S.; Crutchfield, J.P. Informational and causal architecture of continuous-time renewal processes. J. Stat. Phys. 2017, 168, 109–127.
- Travers, N.; Crutchfield, J.P. Equivalence of history and generator ϵ-machines. arXiv 2011, arXiv:1111.4500.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: New York, NY, USA, 2006.
- Gaspard, P.; Wang, X.-J. Noise, chaos, and (ϵ,τ)-entropy per unit time. Phys. Rep. 1993, 235, 291–343.
- Löhr, W. Models of Discrete-Time Stochastic Processes and Associated Complexity Measures. Ph.D. Thesis, University of Leipzig, Leipzig, Germany, May 2009.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Fukunaga, K.; Hostetler, L. Optimization of k nearest neighbor density estimates. IEEE Trans. Inf. Theory 1973, 19, 320–326.
- Marron, J.S. A comparison of cross-validation techniques in density estimation. Ann. Stat. 1987, 15, 152–162.
- Magdon-Ismail, M.; Atiya, A.F. Neural networks for density estimation. Adv. Neural Inf. Process. Syst. 1999, 11, 522–528.
- Trentin, E.; Freno, A. Unsupervised nonparametric density estimation: A neural network approach. In Proceedings of the 2009 International Joint Conference on Neural Networks, Atlanta, GA, USA, 14–19 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 3140–3147.
- Wang, S. A neural network method of density estimation for univariate unimodal data. Neural Comput. Appl. 1994, 2, 160–167.
- Kobyzev, I.; Prince, S.; Brubaker, M.A. Normalizing flows: Introduction and ideas. arXiv 2019, arXiv:1908.09257.
- Crutchfield, J.P.; Feldman, D.P. Regularities unseen, randomness observed: Levels of entropy convergence. Chaos 2003, 13, 25–54.
- Egner, S.; Balakirsky, V.B.; Tolhuizen, L.; Baggen, S.; Hollmann, H. On the entropy rate of a hidden Markov model. In Proceedings of the International Symposium on Information Theory, ISIT 2004, Chicago, IL, USA, 27 June–2 July 2004; IEEE: Piscataway, NJ, USA, 2004; p. 12.
- Arnold, D.; Loeliger, H.-A. On the information rate of binary-input channels with memory. In Proceedings of ICC 2001, IEEE International Conference on Communications (Cat. No. 01CH37240), Helsinki, Finland, 11–14 June 2001; IEEE: Piscataway, NJ, USA, 2001; Volume 9, pp. 2692–2695.
- Nemenman, I.; Shafee, F.; Bialek, W. Entropy and inference, revisited. Adv. Neural Inf. Process. Syst. 2002, 14, 471–478.
- Archer, E.; Park, I.M.; Pillow, J.W. Bayesian entropy estimation for countable discrete distributions. J. Mach. Learn. Res. 2014, 15, 2833–2868.
- Costa, M.; Goldberger, A.L.; Peng, C.-K. Multiscale entropy analysis of complex physiologic time series. Phys. Rev. Lett. 2002, 89, 068102.
- Victor, J.D. Binless strategies for estimation of information from neural data. Phys. Rev. E 2002, 66, 051903.
- Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69, 066138.
- Bialek, W.; Nemenman, I.; Tishby, N. Complexity through nonextensivity. Phys. A 2001, 302, 89–99.
- Bialek, W.; Nemenman, I.; Tishby, N. Predictability, complexity, and learning. Neural Comput. 2001, 13, 2409–2463.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Collins, J.; Sohl-Dickstein, J.; Sussillo, D. Capacity and trainability in recurrent neural networks. arXiv 2016, arXiv:1611.09913.
- Johnson, B.D.; Crutchfield, J.P.; Ellison, C.J.; McTague, C.S. Enumerating finitary processes. arXiv 2010, arXiv:1011.0036.
- James, R.G.; Ellison, C.J.; Crutchfield, J.P. Anatomy of a bit: Information in a time series observation. Chaos 2011, 21, 037109.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Marzen, S.E.; Crutchfield, J.P. Inference, Prediction, & Entropy-Rate Estimation of Continuous-Time, Discrete-Event Processes. Entropy 2022, 24, 1675. https://doi.org/10.3390/e24111675