Well-Logging-Based Lithology Classification Using Machine Learning Methods for High-Quality Reservoir Identification: A Case Study of Baikouquan Formation in Mahu Area of Junggar Basin, NW China
Abstract
1. Introduction
- Do nonlinear individual classifiers always achieve higher accuracy than linear individual classifiers for well-logging-based lithology classification?
- Do ensemble methods consistently outperform individual classification models, and if so, by what margin? Which ensemble method, if any, is superior?
- How well can the best-performing models distinguish the lithology classes in our study?
2. Study Area and Dataset
3. Machine Learning Models for Lithology Classification
3.1. Linear Models for Classification
3.1.1. Logistic Regression
3.1.2. Linear Discriminant Analysis
3.2. Nonlinear Models for Classification
3.2.1. k-Nearest Neighbor
3.2.2. Support Vector Machine
3.2.3. Decision Trees
3.3. Ensemble Models for Classification
3.3.1. Random Forest
3.3.2. Extreme Gradient Boosting Trees
3.4. Experiment Setting and Parameter Tuning
3.5. Model Evaluation
4. Results and Discussion
4.1. Hyperparameter Optimization
4.2. Overall Performances
4.3. Lithology Classification Evaluation
4.4. Feature Importance
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
AUC | Area under the receiver operating characteristic curve
XGBoost | Extreme Gradient Boosting
RF | Random Forest
kNN | k-Nearest Neighbor
SVM | Support Vector Machine
GBDT | Gradient-Boosted Decision Trees
DGF | Daniudi gas field
HGF | Hangjinqi gas field
GR | Gamma ray
SP | Self potential
CALI | Caliper log
RE | Resistivity log
RESS | Shallow reading resistivity measurement
RESM | Medium reading resistivity measurement
RESD | Deep reading resistivity measurement
PHIN | Neutron porosity log
RHOB | Bulk density log
M | Mudstone
S | Sandstone
SC | Sandy conglomerate
LR | Logistic Regression
LDA | Linear Discriminant Analysis
DT | Decision Trees
TP | True positive
FP | False positive
TN | True negative
FN | False negative
ROC | Receiver operating characteristic curve
TPR | True positive rate
FPR | False positive rate
TCSC | Tractive current sandy conglomerates
GFSC | Gravity flow sandy conglomerates
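For reference, the evaluation quantities abbreviated above follow their standard definitions; in the multi-class setting they are typically computed per lithology class and then averaged:

$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN},\qquad \mathrm{Precision}=\frac{TP}{TP+FP},\qquad \mathrm{Recall}=\mathrm{TPR}=\frac{TP}{TP+FN}$$

$$\mathrm{FPR}=\frac{FP}{FP+TN},\qquad F_1=\frac{2\cdot\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$$

The ROC curve plots TPR against FPR as the decision threshold varies, and AUC is the area under that curve.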
References
Statistics | DEPTH | GR | SP | CALI | RESS | RESM | RESD | PHIN | RHOB | DT
---|---|---|---|---|---|---|---|---|---|---
mean | 3820.63 | 55.44 | −1.02 | 8.72 | 20.54 | 24.51 | 23.53 | 19.28 | 2.50 | 72.52
std | 157.30 | 13.83 | 33.97 | 0.77 | 17.42 | 15.15 | 15.78 | 4.77 | 0.10 | 6.76
min | 3279.63 | 29.29 | −68.06 | 5.59 | 0.20 | 2.88 | −1.69 | 9.01 | 1.85 | 58.31
25% | 3806.88 | 45.43 | −14.45 | 8.41 | 7.69 | 13.15 | 11.92 | 16.41 | 2.47 | 68.31
50% | 3855.13 | 52.32 | −9.53 | 8.50 | 16.48 | 22.14 | 20.92 | 18.21 | 2.52 | 70.52
75% | 3882.38 | 62.63 | −1.97 | 8.69 | 28.81 | 32.56 | 32.09 | 20.36 | 2.56 | 74.36
max | 4350.25 | 109.38 | 102.36 | 16.82 | 126.39 | 114.34 | 104.93 | 47.40 | 2.68 | 118.61
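Such descriptive statistics are straightforward to reproduce with pandas; a minimal sketch, assuming the log curves are stored in a CSV with the column names above (the file name is hypothetical):

```python
import pandas as pd

# Load the well-log curves; "mahu_logs.csv" is a hypothetical file name,
# with one row per depth sample and one column per curve.
logs = pd.read_csv("mahu_logs.csv")

curves = ["DEPTH", "GR", "SP", "CALI", "RESS",
          "RESM", "RESD", "PHIN", "RHOB", "DT"]

# describe() yields count, mean, std, min, quartiles, and max per curve,
# matching the statistics reported in the table above.
summary = logs[curves].describe().round(2)
print(summary)
```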
Model | Hyperparameter | Symbol | Parameter Value
---|---|---|---
XGBoost | Boosting learning rate | learning_rate | 0.08
XGBoost | Subsample ratio of the training instances | subsample | 0.7
XGBoost | The maximum depth of a tree | max_depth | 9
XGBoost | The number of boosted trees | n_estimators | 600
XGBoost | L1 regularization term on weights | reg_alpha | 0.1
XGBoost | L2 regularization term on weights | reg_lambda | 1.2
RF | The minimum number of samples required at a leaf node | min_samples_leaf | 1
RF | The minimum number of samples required to split an internal node | min_samples_split | 2
RF | The number of trees in the forest | n_estimators | 400
kNN | The number of neighbors to inspect | n_neighbors | 3
SVM | Penalty parameter of the error term | C | 1000
SVM | Kernel coefficient for ‘RBF’ | gamma | 0.1
DT | The minimum number of samples required at a leaf node | min_samples_leaf | 1
DT | The minimum number of samples required to split an internal node | min_samples_split | 2
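The symbols in the table are scikit-learn and XGBoost parameter names, so the tuned configurations map directly onto those libraries. A minimal sketch of the model setup follows; the estimator classes and all unlisted defaults are assumptions, since the paper's code is not reproduced here:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Estimators configured with the tuned hyperparameters reported above;
# any parameter not listed in the table is left at its library default.
models = {
    "XGBoost": XGBClassifier(
        learning_rate=0.08, subsample=0.7, max_depth=9,
        n_estimators=600, reg_alpha=0.1, reg_lambda=1.2,
    ),
    "RF": RandomForestClassifier(
        min_samples_leaf=1, min_samples_split=2, n_estimators=400,
    ),
    "kNN": KNeighborsClassifier(n_neighbors=3),
    # probability=True enables class-probability estimates, needed for AUC.
    "SVM": SVC(C=1000, gamma=0.1, kernel="rbf", probability=True),
    "DT": DecisionTreeClassifier(min_samples_leaf=1, min_samples_split=2),
    # The two linear baselines have no tuned hyperparameters in the table.
    "LDA": LinearDiscriminantAnalysis(),
    "LR": LogisticRegression(),
}

# Each model would then be fit on the training logs, e.g.:
# models["XGBoost"].fit(X_train, y_train)
```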
Model | Train Accuracy | Train AUC | Train Recall | Train Precision | Train F1 | Test Accuracy | Test AUC | Test Recall | Test Precision | Test F1
---|---|---|---|---|---|---|---|---|---|---
XGBoost | 0.852 | 0.920 | 0.735 | 0.847 | 0.844 | 0.882 | 0.947 | 0.769 | 0.880 | 0.876
RF | 0.837 | 0.918 | 0.695 | 0.833 | 0.823 | 0.861 | 0.942 | 0.715 | 0.861 | 0.849
kNN | 0.801 | 0.861 | 0.689 | 0.794 | 0.794 | 0.839 | 0.892 | 0.753 | 0.836 | 0.837
SVM | 0.797 | 0.857 | 0.648 | 0.782 | 0.783 | 0.844 | 0.898 | 0.711 | 0.836 | 0.835
DT | 0.766 | 0.762 | 0.659 | 0.764 | 0.765 | 0.781 | 0.760 | 0.656 | 0.776 | 0.779
LDA | 0.708 | 0.741 | 0.423 | 0.637 | 0.636 | 0.748 | 0.793 | 0.451 | 0.692 | 0.685
LR | 0.705 | 0.744 | 0.411 | 0.620 | 0.627 | 0.745 | 0.796 | 0.446 | 0.669 | 0.677
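A sketch of how the reported metrics can be computed with scikit-learn; the macro averaging scheme and the one-vs-rest treatment of the multi-class AUC are assumptions, since this excerpt does not state them:

```python
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score,
    recall_score, roc_auc_score,
)

def evaluate(model, X, y):
    """Compute the five metrics reported in the table above.

    Macro averaging and one-vs-rest AUC are assumed here; the excerpt
    does not specify which scheme the authors used.
    """
    y_pred = model.predict(X)
    # Class-probability estimates are required for AUC
    # (e.g., SVC must be constructed with probability=True).
    y_proba = model.predict_proba(X)
    return {
        "Accuracy": accuracy_score(y, y_pred),
        "AUC": roc_auc_score(y, y_proba, multi_class="ovr", average="macro"),
        "Recall": recall_score(y, y_pred, average="macro"),
        "Precision": precision_score(y, y_pred, average="macro"),
        "F1": f1_score(y, y_pred, average="macro"),
    }

# Usage, given the fitted models and the train/test split:
# for name, model in models.items():
#     print(name, evaluate(model, X_test, y_test))
```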