Learning Effective Good Variables from Physical Data
Abstract
1. Introduction
2. Methods
2.1. Problem Statement
2.2. Searching for Good Variables by Regression Models
2.2.1. Single Invariant Group in Power Form
2.2.2. Multiple Concurrent Invariant Groups in Power Form
2.2.3. Further Generalization to Non Power Forms
2.2.4. Regression Model and Procedure Implementation
2.3. Searching for Good Variables by Classification Models
3. Numerical Examples and Discussion
3.1. Datasets Creation
3.2. Dittus–Boelter Equation
3.2.1. Use of Regression Models
3.2.2. Use of Classification Models
3.3. Gnielinski Correlation
3.3.1. Use of Regression Models
3.3.2. Use of Classification Models
3.4. Newton’s Law of Universal Gravitation
4. Conclusions and Final Remarks
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
ML | Machine Learning |
DNN | Deep Neural Network |
MAE | Mean Absolute Error |
RMSE | Root Mean Squared Error |
PDFs | Probability Density Functions |
GEV | Generalized Extreme Value |
Group | | | | | | | | | Found | Reliable | |
---|---|---|---|---|---|---|---|---|---|---|---|
0.71 | 0.71 | - | −0.71 | 0.05 | −0.70 | 0.05 | - | - | yes | yes | |
0.89 | −0.45 | - | 0.91 | 0.04 | −0.42 | 0.07 | - | - | yes | yes | |
0.89 | −0.45 | - | 0.89 | 0.01 | −0.45 | 0.03 | - | - | yes | yes | |
0.89 | −0.45 | - | 0.90 | 0.04 | −0.42 | 0.11 | - | - | yes | yes | |
0.89 | −0.45 | - | 0.87 | 0.02 | −0.50 | 0.04 | - | - | yes | yes | |
−0.71 | −0.71 | - | −0.67 | 0.10 | −0.73 | 0.10 | - | - | yes | yes | |
0.67 | 0.67 | −0.33 | 0.68 | 0.03 | 0.65 | 0.04 | −0.33 | 0.07 | yes | yes | |
0.67 | 0.67 | −0.33 | 0.66 | 0.03 | 0.64 | 0.03 | −0.37 | 0.03 | yes | yes | |
0.82 | −0.41 | −0.41 | 0.87 | 0.04 | −0.37 | 0.06 | −0.34 | 0.10 | yes | yes | |
0.82 | −0.41 | −0.41 | 0.83 | 0.06 | −0.35 | 0.11 | −0.41 | 0.05 | yes | yes |
Group/Set | | | | | | | | | | | | | | | Found | Reliable | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.707 | 0.707 | - | - | - | −0.673 | 0.067 | −0.734 | 0.056 | - | - | - | - | - | - | yes | yes | |
- | - | - | - | - | 0.801 | 0.022 | −0.597 | 0.030 | - | - | - | - | - | - | yes | yes | |
- | - | - | - | - | 0.949 | 0.010 | −0.313 | 0.032 | - | - | - | - | - | - | yes | no | |
- | - | - | - | - | −0.867 | 0.065 | −0.468 | 0.161 | - | - | - | - | - | - | yes | yes | |
- | - | - | - | - | 0.807 | 0.042 | −0.587 | 0.055 | - | - | - | - | - | - | yes | yes | |
- | - | - | - | - | 0.948 | 0.015 | −0.314 | 0.049 | - | - | - | - | - | - | yes | no | |
- | - | - | - | - | 0.868 | 0.068 | 0.479 | 0.111 | - | - | - | - | - | - | yes | yes | |
- | - | - | - | - | −0.918 | 0.035 | −0.389 | 0.076 | - | - | - | - | - | - | yes | no | |
- | - | - | - | - | 0.756 | 0.052 | −0.650 | 0.056 | - | - | - | - | - | - | yes | yes | |
- | - | - | - | - | 0.467 | 0.033 | −0.883 | 0.017 | - | - | - | - | - | - | yes | yes | |
0.707 | −0.707 | - | 0.707 | −0.707 | 0.730 | 0.047 | −0.680 | 0.054 | - | - | −0.656 | 0.011 | 0.755 | 0.010 | yes | yes | |
0.707 | −0.707 | - | 0.707 | −0.707 | 0.695 | 0.028 | −0.718 | 0.027 | - | - | −0.687 | 0.008 | 0.727 | 0.007 | yes | yes | |
- | - | - | - | - | 0.313 | 0.000 | −0.950 | 0.000 | - | - | −0.950 | 0.001 | 0.313 | 0.002 | yes | no | |
- | - | - | - | - | −0.576 | 0.056 | −0.814 | 0.043 | - | - | −0.593 | 0.017 | 0.805 | 0.013 | yes | yes | |
- | - | - | - | - | 0.872 | 0.045 | −0.482 | 0.074 | - | - | −0.224 | 0.058 | 0.973 | 0.011 | yes | no | |
- | - | - | - | - | −0.665 | 0.052 | −0.744 | 0.044 | - | - | −0.503 | 0.036 | 0.863 | 0.020 | yes | yes | |
- | - | - | - | - | 0.860 | 0.016 | −0.510 | 0.026 | - | - | −0.868 | 0.036 | 0.491 | 0.063 | yes | yes | |
0.577 | 0.577 | −0.577 | 0.707 | −0.707 | 0.602 | 0.043 | 0.552 | 0.065 | −0.570 | 0.053 | 0.698 | 0.011 | −0.716 | 0.011 | yes | yes | |
- | - | - | - | - | 0.580 | 0.002 | 0.660 | 0.014 | −0.476 | 0.023 | 0.817 | 0.004 | −0.576 | 0.006 | yes | yes | |
- | - | - | - | - | 0.492 | 0.012 | 0.703 | 0.017 | −0.512 | 0.026 | 0.625 | 0.012 | −0.780 | 0.009 | yes | yes |
Group | | | | | | Found | Reliable | |
---|---|---|---|---|---|---|---|---|
0.707 | −0.707 | −0.679 | 0.111 | 0.721 | 0.083 | yes | yes | |
0.707 | −0.707 | −0.711 | 0.023 | 0.702 | 0.023 | yes | yes | |
0.707 | −0.707 | −0.683 | 0.030 | 0.722 | 0.044 | yes | yes | |
- | - | 0.105 | 0.556 | 0.759 | 0.322 | no | - | |
- | - | 0.000 | 0.000 | 1.000 | 0.000 | no | - | |
- | - | −0.604 | 0.369 | 0.335 | 0.622 | no | - |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Barletta, G.; Trezza, G.; Chiavazzo, E. Learning Effective Good Variables from Physical Data. Mach. Learn. Knowl. Extr. 2024, 6, 1597-1618. https://doi.org/10.3390/make6030077