Application of Machine Learning in Predicting Formation Condition of Multi-Gas Hydrate
Abstract
1. Introduction
2. Methodologies and Experimental Data
2.1. Machine Learning Model Selection
2.1.1. Random Forest (RF)
2.1.2. Naive Bayes (NB)
2.1.3. Support Vector Regression (SVR)
2.2. Chen–Guo (C-G) Model
2.3. Data Preparation
2.4. Error Analysis
3. Results and Discussion
3.1. Computational Accuracy
3.1.1. Comparison of Machine Learning Models with Experimental Data
3.1.2. Comparison of Machine Learning Models with Chen-Guo Model
3.2. Comparison of Computing Time
4. Conclusions
- (1) After data augmentation, Support Vector Regression cannot directly achieve the desired prediction results. Augmentation also reduces the independence of the features, which indirectly degrades the prediction accuracy of Naive Bayes. Both models perform well when trained on small samples; in this prediction, however, the independence between features decreases as the number of samples increases, and eventually it is no longer sufficient to support accurate Naive Bayes predictions.
- (2) Comparisons of the predicted results indicate that Random Forest outperforms the other two machine learning models in both stability and accuracy. After data augmentation, the prediction accuracy of Random Forest improves greatly, so data augmentation can be used as a preprocessing step when Random Forest is applied in later forecasts.
- (3) In terms of computation time, Naive Bayes and Support Vector Regression take the least time, followed by Random Forest. The average time of Random Forest is 2.497 s, substantially less than that of the Chen–Guo model (189 s).

In this work, the machine learning models show better performance in predicting the formation conditions of hydrates formed by pure gases and gas mixtures. However, the performance of a machine learning model strongly depends on the training and testing steps, which require large amounts of data. Therefore, experimental data on hydrate formation are important to the performance of machine learning models.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Changyu, S.; Wenzhi, L.; Xin, Y.; Fengguang, L.; Qing, Y.; Liang, M.; Jun, C.; Bei, L.; Guangjin, C. Progress in research of gas hydrate. Chin. J. Chem. Eng. 2011, 19, 151–162. [Google Scholar]
- Sloan, E.D., Jr.; Koh, C.A. Clathrate Hydrates of Natural Gases; CRC Press: Boca Raton, FL, USA, 2007. [Google Scholar]
- Makogon, Y.F. Perspectives for the development of gas hydrate deposits, Gas hydrates and permafrost. In Proceedings of the 4th Canadian Permafrost Conference, Calgary, Alberta, 2–6 March 1982; pp. 299–304. [Google Scholar]
- Makogon, Y.F. Natural gas hydrates—A promising source of energy. J. Nat. Gas Sci. Eng. 2010, 2, 49–59. [Google Scholar] [CrossRef]
- Kvenvolden, K. A primer on the geological occurrence of gas hydrate. Geol. Soc. Lond. Spec. Publ. 1998, 137, 9–30. [Google Scholar] [CrossRef]
- Moridis, G.J.; Freeman, C.M. The RealGas and RealGasH2O options of the TOUGH+ code for the simulation of coupled fluid and heat flow in tight/shale gas systems. Comput. Geosci. 2014, 65, 56–71. [Google Scholar] [CrossRef] [Green Version]
- Eslamimanesh, A.; Mohammadi, A.H.; Richon, D.; Naidoo, P.; Ramjugernath, D. Application of gas hydrate formation in separation processes: A review of experimental studies. J. Chem. Thermodyn. 2012, 46, 62–71. [Google Scholar] [CrossRef]
- Collett, T.S.; Kuuskraa, V.A. Hydrates contain vast store of world gas resources. Oil Gas J. 1998, 96, 90–95. [Google Scholar]
- Khan, M.N.; Warrier, P.; Peters, C.J.; Koh, C.A. Review of vapor-liquid equilibria of gas hydrate formers and phase equilibria of hydrates. J. Nat. Gas Sci. Eng. 2016, 35, 1388–1404. [Google Scholar] [CrossRef]
- Platteeuw, J.; Van der Waals, J. Thermodynamic properties of gas hydrates. Mol. Phys. 1958, 1, 91–96. [Google Scholar] [CrossRef]
- Parrish, W.R.; Prausnitz, J.M. Dissociation pressures of gas hydrates formed by gas mixtures. Ind. Eng. Chem. Process Des. Dev. 1972, 11, 26–35. [Google Scholar] [CrossRef]
- Ng, H.-J.; Robinson, D.B. The measurement and prediction of hydrate formation in liquid hydrocarbon-water systems. Ind. Eng. Chem. Fundam. 1976, 15, 293–298. [Google Scholar] [CrossRef]
- Tohidi, B.; Danesh, A.; Todd, A. Modeling single and mixed electrolyte-solutions and its applications to gas hydrates. Chem. Eng. Res. Des. 1995, 73, 464–472. [Google Scholar]
- Chen, G.-J.; Guo, T.-M. Thermodynamic modeling of hydrate formation based on new concepts. Fluid Phase Equilibria 1996, 122, 43–65. [Google Scholar] [CrossRef]
- Klauda, J.B.; Sandler, S.I. A Fugacity Model for Gas Hydrate Phase Equilibria. Ind. Eng. Chem. Res. 2000, 39, 3377–3386. [Google Scholar] [CrossRef]
- Ballard, A.; Sloan, E., Jr. The next generation of hydrate prediction: I. Hydrate standard states and incorporation of spectroscopy. Fluid Phase Equilibria 2002, 194, 371–383. [Google Scholar] [CrossRef]
- Lee, S.Y.; Holder, G.D. Model for gas hydrate equilibria using a variable reference chemical potential: Part 1. AIChE J. 2002, 48, 161–167. [Google Scholar] [CrossRef]
- Alpaydin, E. Introduction to Machine Learning; MIT Press: Cambridge, MA, USA, 2020. [Google Scholar]
- El Naqa, I.; Murphy, M.J. What is machine learning? In Machine Learning in Radiation Oncology; Springer: Berlin/Heidelberg, Germany, 2015; pp. 3–11. [Google Scholar]
- Philbrick, K.A.; Weston, A.D.; Akkus, Z.; Kline, T.L.; Korfiatis, P.; Sakinis, T.; Kostandy, P.; Boonrod, A.; Zeinoddini, A.; Takahashi, N. RIL-contour: A medical imaging dataset annotation tool for and with deep learning. J. Digit. Imaging 2019, 32, 571–581. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Ribeiro, M.; Grolinger, K.; Capretz, M.A. Mlaas: Machine learning as a service. In Proceedings of the 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 9–11 December 2015; pp. 896–902. [Google Scholar]
- Baştanlar, Y.; Özuysal, M. Introduction to machine learning. In miRNomics: MicroRNA Biology and Computational Analysis; Springer: Berlin/Heidelberg, Germany, 2014; pp. 105–128. [Google Scholar]
- Zhu, X.; Goldberg, A.B. Introduction to semi-supervised learning. Synth. Lect. Artif. Intell. Mach. Learn. 2009, 3, 1–130. [Google Scholar] [CrossRef] [Green Version]
- Awad, M.; Khanna, R. Support vector regression. In Efficient Learning Machines; Springer: Berlin/Heidelberg, Germany, 2015; pp. 67–80. [Google Scholar]
- Simard, P.Y.; LeCun, Y.A.; Denker, J.S.; Victorri, B. Transformation invariance in pattern recognition—tangent distance and tangent propagation. In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 1998; pp. 239–274. [Google Scholar]
- Kobayashi, S. Contextual augmentation: Data augmentation by words with paradigmatic relations. arXiv 2018, arXiv:1805.06201. [Google Scholar]
- Tanner, M.A.; Wong, W.H. The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 1987, 82, 528–540. [Google Scholar] [CrossRef]
- Li, G.; Wen, C.; Huang, G.-B.; Chen, Y. Error tolerance based support vector machine for regression. Neurocomputing 2011, 74, 771–782. [Google Scholar] [CrossRef]
- Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef] [Green Version]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
- Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1999. [Google Scholar]
- Yan, J.; Li, D.; Guo, R.; Yan, H.; Wang, Y. Classification of tooth marks and tongue based on deep learning and random forest. Chin. J. Tradit. Chin. Med. 2022, 40, 19–22, 259–261. (In Chinese) [Google Scholar]
- Zhang, H. The optimality of naive Bayes. Aa 2004, 1, 3. [Google Scholar]
- Frank, E.; Trigg, L.; Holmes, G. Technical Note: Naive Bayes for Regression. Mach. Learn. 2000, 41, 5–25. [Google Scholar] [CrossRef] [Green Version]
- Lin, C.-J. Asymptotic convergence of an SMO algorithm without any assumptions. IEEE Trans. Neural Netw. 2002, 13, 248–250. [Google Scholar]
- Courant, R.; Hilbert, D. Methods of Mathematical Physics: Partial Differential Equations; John Wiley & Sons: Hoboken, NJ, USA, 2008. [Google Scholar]
- Vapnik, V.N.; Chervonenkis, A.J. Theory of Pattern Recognition; Nauka: Moscow, Russia, 1974. [Google Scholar]
- Vapnik, V.N. The Nature of Statistical Learning Theory; Springer: Berlin/Heidelberg, Germany, 1995. [Google Scholar]
- Vapnik, V. Estimation of Dependences Based on Empirical Data; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
- Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222. [Google Scholar] [CrossRef] [Green Version]
- Chen, G.; Sun, C.; Ma, Q. Gas Hydrate Science and Technology; Chemical Industry Press: Beijing, China, 2008. (In Chinese) [Google Scholar]
- Soave, G. Equilibrium constants from a modified Redlich-Kwong equation of state. Chem. Eng. Sci. 1972, 27, 1197–1203. [Google Scholar] [CrossRef]
- Nixdorf, J.; Oellrich, L.R. Experimental determination of hydrate equilibrium conditions for pure gases, binary and ternary mixtures and natural gases. Fluid Phase Equilibria 1997, 139, 325–333. [Google Scholar] [CrossRef]
Methods | RF | NB | SVR |
---|---|---|---|
Applicable issues | Multi-class classification, regression | Multi-class classification, regression | Two-class classification, regression |
Model characteristics | Build multiple decision trees | Joint probability distribution of features and categories, conditional independence assumption | Separating hyperplane |
Model type | Additive model | Generative model | Discriminant model |
Learning strategy | Integrate multiple decision trees | Maximum likelihood estimation | Maximize the margin from the misclassification points
Learned loss function | Square loss and exponential loss | Log likelihood loss | Hinge loss |
Learning algorithm | Integrated algorithm composed of decision trees | Probability calculation formula, EM algorithm | Sequential Minimal Optimization (SMO) algorithm
Advantage | 1. Training can be highly parallelized; 2. The trained model has small variance and strong generalization ability; 3. Insensitive to some missing data. | 1. Fast and easy to train; 2. Not very sensitive to missing data. | 1. Can make predictions on small sample data; 2. It can handle high-dimensional features; 3. The classification plane does not depend on all data, but is only related to a few support vectors. |
Disadvantage | 1. Easy to fall into overfitting; 2. Under special circumstances, the attribute weights produced by random forests are not credible. | If the input variables are related, problems will occur. | 1. Difficult to train; 2. Need to find a suitable kernel function; 3. Sensitive to missing data. |
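The conditional-independence assumption listed for Naive Bayes in the table can be made concrete: under it, the class-conditional likelihood factorizes into a product of per-feature (here Gaussian) densities. The following is a generic sketch, not tied to the models trained in this work; the class names and toy statistics are illustrative:

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def nb_posteriors(x, priors, class_stats):
    """Normalized Gaussian Naive Bayes posteriors.
    class_stats[c] = list of (mean, var) per feature; the product over
    features encodes the conditional-independence assumption."""
    scores = {}
    for c, prior in priors.items():
        likelihood = 1.0
        for xi, (mean, var) in zip(x, class_stats[c]):
            likelihood *= gaussian_pdf(xi, mean, var)
        scores[c] = prior * likelihood
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Two hypothetical classes, each described by two features
priors = {"A": 0.5, "B": 0.5}
stats = {"A": [(0.0, 1.0), (0.0, 1.0)], "B": [(3.0, 1.0), (3.0, 1.0)]}
post = nb_posteriors([2.5, 2.8], priors, stats)
print(max(post, key=post.get))  # prints "B": the point lies nearer B's means
```

When features are correlated, the product of marginals no longer approximates the joint density well, which is why augmentation that ties features together hurts Naive Bayes in this study.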
Component | N2 | CO2 | CH4 | C2H6 | C3H8
---|---|---|---|---|---
N2 | 0 | −0.0171 | 0.031199 | 0.031899 | 0.0886
CO2 | −0.0171 | 0 | 0.0956 | 0.1401 | 0.1368
CH4 | 0.031199 | 0.0956 | 0 | 0.002241 | 0.006829
C2H6 | 0.031899 | 0.1401 | 0.002241 | 0 | 0.001258
C3H8 | 0.0886 | 0.1368 | 0.006829 | 0.001258 | 0
Gas Name | Critical Temperature, Tc (°C) | Critical Pressure, Pc (kPa) | Acentric Factor, ω | Molar Weight, MW (g/mol)
---|---|---|---|---
N2 | −146.96 | 3394.37 | 0.04 | 28.01
CO2 | 30.95 | 7370 | 0.23894 | 44.01
CH4 | −82.45 | 4640.68 | 0.011498 | 16.04
C2H6 | 32.38 | 4883.85 | 0.0986 | 30.07
C3H8 | 96.75 | 4256.66 | 0.1524 | 44.10
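The Soave–Redlich–Kwong (SRK) equation of state (Soave, 1972, cited above) is the fugacity model paired with the Chen–Guo approach, and it consumes the critical properties in this table together with the binary interaction coefficients k_ij from the preceding table. A minimal sketch of the pure-component parameters and the van der Waals one-fluid mixing rules, assuming Tc in K and Pc in Pa (values below are converted from the table; the function and variable names are illustrative):

```python
import math

R = 8.314  # universal gas constant, J/(mol·K)

def srk_pure(T, Tc, Pc, omega):
    """SRK attraction parameter a(T) and co-volume b for one component
    (Soave, 1972): alpha = [1 + m(1 - sqrt(T/Tc))]^2."""
    m = 0.480 + 1.574 * omega - 0.176 * omega ** 2
    alpha = (1 + m * (1 - math.sqrt(T / Tc))) ** 2
    a = 0.42748 * R ** 2 * Tc ** 2 / Pc * alpha
    b = 0.08664 * R * Tc / Pc
    return a, b

def srk_mix(T, y, comps, kij):
    """Van der Waals one-fluid mixing rules with binary interaction
    coefficients kij[(i, j)] (symmetric, zero on the diagonal)."""
    pure = {i: srk_pure(T, *comps[i]) for i in comps}
    a_mix = sum(
        y[i] * y[j] * math.sqrt(pure[i][0] * pure[j][0]) * (1 - kij.get((i, j), 0.0))
        for i in comps for j in comps
    )
    b_mix = sum(y[i] * pure[i][1] for i in comps)
    return a_mix, b_mix

# (Tc [K], Pc [Pa], omega), converted from the table (°C -> K, kPa -> Pa)
comps = {"CH4": (190.70, 4.64068e6, 0.011498), "C2H6": (305.53, 4.88385e6, 0.0986)}
kij = {("CH4", "C2H6"): 0.002241, ("C2H6", "CH4"): 0.002241}
y = {"CH4": 0.9047, "C2H6": 0.0953}  # system B2 from the data table below
print(srk_mix(280.0, y, comps, kij))
```

This sketch stops at the mixture parameters; the full Chen–Guo calculation additionally solves the cubic EOS for the vapor compressibility and iterates on the hydrate-phase equilibrium, which is what makes it far slower than the trained machine learning models.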
System | T/K | Peq./MPa | T/K | Peq./MPa | T/K | Peq./MPa |
---|---|---|---|---|---|---|
P1: 100% CH4 | 273.49 | 2.716 | 281.12 | 5.822 | 285.99 | 9.874
 | 274.36 | 2.961 | 281.38 | 6 | 286.95 | 10.922
 | 275.11 | 3.18 | 282.07 | 6.428 | 287.85 | 12.314
 | 276.29 | 3.564 | 283.04 | 7.139 | 288.62 | 13.475
 | 277.46 | 3.998 | 283.39 | 7.43 | 289.44 | 15
 | 278 | 4.244 | 284.01 | 7.925 | 290.84 | 17.861
 | 278.25 | 4.348 | 284.17 | 7.972 | 291.57 | 19.165
 | 279.1 | 4.733 | 284.18 | 7.97 | 291.6 | 19.195
 | 280.16 | 5.304 | 285.08 | 8.928 | |
P2: 100% N2 | 273.67 | 16.9350 | 275.77 | 20.7480 | |
 | 274.07 | 17.6680 | 277.27 | 24.0920 | |
 | 275.11 | 19.5210 | | | |
B1: 10.74% N2 + 89.26% CH4 | 278.7 | 4.938 | 288.68 | 14.976 | |
 | 282.03 | 6.943 | 290.97 | 20.023 | |
 | 285.64 | 10.399 | 292.44 | 24.428 | |
B2: 9.53% C2H6 + 90.47% CH4 | 278.21 | 2.254 | 288.12 | 7.208 | 294.63 | 19.891
 | 279.6 | 2.628 | 290.44 | 9.992 | 295.52 | 23.198
 | 283.69 | 4.191 | 292.97 | 14.849 | |
B3: 0.03% N2 + 5% CO2 + 94.97% CH4 | 276.85 | 3.454 | 287.41 | 10.935 | |
 | 279.95 | 4.868 | 290.76 | 16.827 | |
 | 283.49 | 7.035 | 293.41 | 23.979 | |
T1: 4.18% N2 + 4.89% C2H6 + 90.93% CH4 | 276.85 | 3.454 | 287.41 | 10.935 | |
 | 279.95 | 4.868 | 290.76 | 16.827 | |
 | 283.49 | 7.035 | 293.41 | 23.979 | |
T2: 2.93% C3H8 + 12.55% C2H6 + 84.52% CH4 | 277.36 | 2.575 | 287.45 | 8.36 | 294.23 | 23.833
 | 280.91 | 3.803 | 290.93 | 13.806 | |
 | 284.9 | 6.096 | 292.82 | 18.82 | |
T3: 1.00% C3H8 + 3.98% C2H6 + 95.02% CH4 | 277.1 | 1.198 | 291.52 | 6.924 | 298.14 | 24.474
 | 281.56 | 1.982 | 294.32 | 11.207 | |
 | 287.42 | 3.999 | 296.14 | 17.034 | |
T4: 0.02% N2 + 5.13% C2H6 + 5.25% CO2 + 89.60% CH4 | 279.01 | 2.964 | 290.04 | 11.876 | |
 | 283.54 | 5.003 | 292.28 | 17.515 | |
 | 287.08 | 7.735 | 294.21 | 24.326 | |
Error Index | RF | NB | SVR
---|---|---|---
MAPE | 1.24% | 23.66% | 24.21% |
MAE | 0.16 | 0.25 | 0.31 |
RMSE | 0.23 | 0.32 | 0.45 |
MSE | 0.10 | 0.18 | 0.62 |
R2 | 1.0000 | 0.9993 | 0.9993 |
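The error indices in the table follow their standard definitions. A pure-Python sketch, where `y_true` would be the experimental equilibrium pressures and `y_pred` the model predictions (names are illustrative):

```python
import math

def error_indices(y_true, y_pred):
    """MAPE, MAE, RMSE, MSE and R2 as used in the comparison table."""
    n = len(y_true)
    residuals = [yt - yp for yt, yp in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n                    # mean absolute error
    mse = sum(r * r for r in residuals) / n                     # mean squared error
    rmse = math.sqrt(mse)                                       # root mean squared error
    mape = sum(abs(r / yt) for r, yt in zip(residuals, y_true)) / n  # fractional
    mean_y = sum(y_true) / n
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    r2 = 1 - sum(r * r for r in residuals) / ss_tot             # coefficient of determination
    return {"MAPE": mape, "MAE": mae, "RMSE": rmse, "MSE": mse, "R2": r2}

# First three CH4 points from the data table vs. hypothetical predictions
print(error_indices([2.716, 2.961, 3.18], [2.7, 3.0, 3.2]))
```

MAPE is returned as a fraction here; multiply by 100 to match the percentage form reported in the table.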
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yu, Z.; Tian, H. Application of Machine Learning in Predicting Formation Condition of Multi-Gas Hydrate. Energies 2022, 15, 4719. https://doi.org/10.3390/en15134719