Application of Interpretable Machine Learning for Production Feasibility Prediction of Gold Mine Project
Abstract
1. Introduction
2. Methods and Materials
2.1. Methods
2.1.1. Miceforest Interpolation
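- (1) The observed values of variable $A_s$, denoted by $y_{obs}^{(s)}$;
- (2) The missing values of variable $A_s$, denoted by $y_{mis}^{(s)}$;
- (3) The variables other than $A_s$ with observations at the indices $i_{obs}^{(s)} = \{1, \ldots, n\} \setminus i_{mis}^{(s)}$ (the rows where $A_s$ is observed), denoted by $x_{obs}^{(s)}$;
- (4) The variables other than $A_s$ with observations at the indices $i_{mis}^{(s)}$ (the rows where $A_s$ is missing), denoted by $x_{mis}^{(s)}$.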
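The four-part split above can be sketched in a few lines of Python. This is a minimal illustration, not the miceforest package's API: it assumes scikit-learn's `RandomForestRegressor` as the per-variable model, assumes that only one column contains missing values, and uses the hypothetical names `impute_one_variable`, `y_obs`, `x_obs` and `x_mis` for the function and the parts of the dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def impute_one_variable(X, s, random_state=0):
    """Fill the NaNs in column s using a forest fitted on the other columns."""
    X = np.asarray(X, dtype=float).copy()
    mis = np.isnan(X[:, s])            # rows where variable s is missing
    if not mis.any():
        return X                       # nothing to impute
    obs = ~mis                         # rows where variable s is observed
    others = np.delete(np.arange(X.shape[1]), s)

    y_obs = X[obs, s]                  # (1) observed values of variable s
    x_obs = X[np.ix_(obs, others)]     # (3) other variables, observed rows
    x_mis = X[np.ix_(mis, others)]     # (4) other variables, missing rows

    model = RandomForestRegressor(n_estimators=50, random_state=random_state)
    model.fit(x_obs, y_obs)
    X[mis, s] = model.predict(x_mis)   # fills (2), the missing values
    return X


# Synthetic demo data (illustrative only): column 2 depends on columns 0 and 1.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
data[:, 2] = data[:, 0] + data[:, 1] + 0.1 * rng.normal(size=200)
missing_rows = rng.choice(200, size=40, replace=False)
data_missing = data.copy()
data_missing[missing_rows, 2] = np.nan

imputed = impute_one_variable(data_missing, s=2)
```

A chained-equation imputer would repeat this step for every variable with missing values and iterate over the variables several times; the sketch shows a single pass for a single variable.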
2.1.2. Random Forest Model
- (1) Randomly sample N training samples, with replacement, from the training set to create a new training set;
- (2) Train a submodel on the new training set, and repeat steps (1) and (2) until M submodels are obtained;
- (3) For classification tasks, the final class is determined by voting: the class predicted most often across the submodels is chosen. For regression tasks, the prediction is the simple average of the submodels' predictions.
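The three steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the configuration used in the study: the submodels are scikit-learn decision trees, the data come from `make_classification`, and the helper names `fit_bagged_trees` and `predict_majority` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_models=25, seed=0):
    """Train n_models trees, each on its own bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # step 1: N draws with replacement
        tree = DecisionTreeClassifier(
            max_features="sqrt",           # random feature subset at each split
            random_state=int(rng.integers(1 << 31)),
        )
        models.append(tree.fit(X[idx], y[idx]))  # step 2: one submodel per sample
    return models


def predict_majority(models, X):
    """Step 3 (classification): return the most frequent class per sample."""
    votes = np.stack([m.predict(X) for m in models]).astype(int)  # (M, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])


# Synthetic two-class demo data (illustrative only).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
models = fit_bagged_trees(X_demo, y_demo)
preds = predict_majority(models, X_demo)
train_acc = float((preds == y_demo).mean())
```

For a regression task, the vote in `predict_majority` would be replaced by the mean of the submodels' predictions. In practice, `sklearn.ensemble.RandomForestClassifier` performs the same procedure internally.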
2.1.3. SHAP Value Evaluation of Interpretability
2.2. Materials
3. Modeling and Results
3.1. Dataset Building
- (1) Geologic Ore Body Type Zone
- (2) Max (Interval (feet))
- (3) Development Stage label division
3.2. Modeling Process
3.3. Results and Accuracy Verification
4. Discussion
4.1. Feature Importance Analysis by SHAP Value Summary Plot
4.2. Feature Correlation Analysis by Partial Dependence Plot
4.3. Project Validation by SHAP Value Force Plot
5. Conclusions
- (1) Imputing missing values with miceforest improved prediction accuracy on the validation dataset. For the Random Forest model built on the imputed dataset, prediction accuracy rose from 93.80% to 95.99%.
- (2) The Random Forest algorithm was used to construct a production feasibility model for global gold mine projects. The accuracy, recall rate, and false positive rate show that the model achieves high accuracy on both the training and test data, which supports the soundness of the model, and there was no obvious overfitting.
- (3) The feature importance ranking obtained from the SHAP value analysis identifies the estimated production life and the ore deposit grade as the most important factors affecting whether a gold mining project enters production.
- (4) This paper proposes a workflow for evaluating the production feasibility of global gold mining projects. The workflow achieves satisfactory results in application and offers new ideas for mining project assessment research and for improving mining development efficiency in the era of big data.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
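Category | Field Name | Category | Field Name |
---|---|---|---|
Basic information | Development Stage | Funding information | Funding Type |
Basic information | Activity Status | Funding information | Description |
Basic information | Mine Type | Funding information | Count of Capital Invested |
Basic information | Count of Commodities | Funding information | Capital Cost Type 1 |
Basic information | Country/Region | Funding information | Capital Cost Type 2 |
Basic information | Primary Commodity | Funding information | Capital Cost Type 3 |
Basic information | Study Year | Funding information | Study Price per oz |
Basic information | Country Risk Score Overall Current | Funding information | LOM Cash Costs |
Basic information | Country Risk Score Political Current | Funding information | Cash Costs per oz |
Basic information | Country Risk Score Economic Current | Funding information | Mine Total Cost |
Basic information | Country Risk Score Legal Current | Transaction Information | Deal Type |
Basic information | Country Risk Score Tax Current | Transaction Information | Deal Status |
Basic information | Country Risk Score Operation Current | Transaction Information | Deal Consideration |
Basic information | Country Risk Score Security Current | Transaction Information | Earn In Yes/No |
Basic information | Country Risk Outlook Overall | Transaction Information | Joint Venture Yes/No |
Basic information | Country Risk Outlook Political | Transaction Information | Deal Acquired (Announcement) |
Basic information | Country Risk Outlook Economic | Transaction Information | Total Deal Value (Announcement) |
Basic information | Country Risk Outlook Legal | Transaction Information | Total Deal Value (Completion) |
Basic information | Country Risk Outlook Tax | Geological drilling information | Average Depth of Geologic Deposit Zone 1 |
Basic information | Country Risk Outlook Operation | Geological drilling information | Average Depth of Geologic Deposit Zone 2 |
Basic information | Country Risk Outlook Security | Geological drilling information | Average Depth of Geologic Deposit Zone 3 |
Operator information | Operator Market Capitalization | Geological drilling information | Average Depth of Geologic Deposit Zone 4 |
Operator information | Operator Total Enterprise Value | Geological drilling information | Significant Interval Yes/No |
Operator information | Operator Total Debt | Geological drilling information | Ore Minerals Zone |
Operator information | Operator Working Capital | Geological drilling information | Interval (meters) Drill Result 1 |
Operator information | Operator Total Capitalization | Geological drilling information | Interval (meters) Drill Result 2 |
Owner information | Count of Project Owners | Geological drilling information | Interval (meters) Drill Result 3 |
Owner information | Count of Project Royalty Holders | Geological drilling information | Depth (meters) Drill Result 1 |
Owner information | Historical Equity Ownership Percent | Geological drilling information | Depth (meters) Drill Result 2 |
Owner information | Historical Controlling Ownership Percent | Geological drilling information | Depth (meters) Drill Result 3 |
Owner information | Current Equity Ownership Percent | Geological drilling information | Exploration Purpose Drill Result 1 |
Owner information | Current Controlling Ownership Percent | Geological drilling information | Exploration Purpose Drill Result 2 |
Owner information | Owner Market Capitalization | Geological drilling information | Exploration Purpose Drill Result 3 |
Owner information | Owner Total Enterprise Value | Geological drilling information | Interval Value Drill Result 1 |
Owner information | Owner Total Debt Total Capitalization | Geological drilling information | Interval Value Drill Result 2 |
Owner information | Owner Working Capital | Geological drilling information | Interval Value Drill Result 3 |
Owner information | Total Debt | Geological drilling information | Grade x Interval Drill Result 1 |
Owner information | Current Liabilities | Geological drilling information | Grade x Interval Drill Result 2 |
Owner information | Reported EBITDA | Geological drilling information | Grade x Interval Drill Result 3 |
Owner information | EBITDA | Geological drilling information | Interval Grade Equivalent Drill Result 1 |
Owner information | Royalty Type | Geological drilling information | Interval Grade Equivalent Drill Result 2 |
Production information | Mill Capacity | Geological drilling information | Interval Grade Equivalent Drill Result 3 |
Production information | Stripping Ratio | Geological drilling information | Max (Interval (feet)) |
Production information | Waste to Ore Ratio | Resource endowments | Reserves (Ore Tonnage) |
Production information | Count of Mining Methods | Resource endowments | Measured Indicated (Ore Tonnage Excl Reserves) |
Production information | Count of Processing Method | Resource endowments | Inferred Resources (Ore Tonnage) |
Production information | Mining Method 1 | Resource endowments | Total Resources (Ore Tonnage Excl Reserves) |
Production information | Processing Method 1 | Resource endowments | Geologic Ore Body Type Zone |
Production information | Recovery Rate | Resource endowments | In Situ Value (Measured Indicated Excl Reserves) |
Production information | Ore Processed Mass | Resource endowments | Grade Reserves |
Production information | LOM Yearly Production | Resource endowments | Contained Reserves |
Mine development information | Estimated Mine Life | Resource endowments | MillHead Grade |
Mine development information | Life of Mine Cash Flow (High Case) | Resource endowments | In Situ Value (Reserves and Resources) |
Mine development information | Payback Period (High Case) | | |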
Field Name | Unit | Missing Percentage |
---|---|---|
In Situ Value (Reserves and Resources) | dollar | 3.5% |
MillHead Grade | g/tonne | 64.2% |
Mine Total Cost | dollar | 71.0% |
Cash Costs per oz | dollar | 72.6% |
Data Type | Method |
---|---|
Single text type | Dummy-variable encoding is performed according to category |
Multitext type | Numeric mapping is performed after counting |
Discrete type (no size meaning) | One-hot encoding |
Discrete type (size meaning) | Value mapping |
Continuous | No processing |
Special fields | Attribute aggregation, addition, multiple overlay |
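Two of the treatments in the table, one-hot encoding for unordered discrete fields and value mapping for ordered ones, can be sketched with pandas. The column names (`mine_type`, `risk_level`, `recovery_rate`), the sample values, and the ordering in `risk_order` are all hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample rows; real field names and values will differ.
df = pd.DataFrame({
    "mine_type": ["Open Pit", "Underground", "Open Pit"],  # discrete, no size meaning
    "risk_level": ["Low", "High", "Medium"],               # discrete, size meaning
    "recovery_rate": [0.91, 0.85, 0.88],                   # continuous, left as-is
})

# Assumed ordering for the ordered field.
risk_order = {"Low": 0, "Medium": 1, "High": 2}

# One-hot encode the unordered field: one 0/1 column per category.
encoded = pd.get_dummies(df, columns=["mine_type"], dtype=int)

# Value mapping for the ordered field: categories become ranked integers.
encoded["risk_level"] = encoded["risk_level"].map(risk_order)
```

One-hot encoding avoids implying an order between categories such as mining methods, while value mapping keeps the order that ranked categories carry.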
Geologic Ore | Type |
---|---|
Saddle Reefs | Orogenic |
Mesothermal Lode Gold | Orogenic |
Vein Hosted | Orogenic |
Intrusive Related | Intrusive Related |
Disseminated | Intrusive Related |
Alkali Intrusion | Intrusive Related |
Granite Related | Granite Related |
Layered Mafic-Ultramafic Intrusion | Intrusive Related |
Skarn (Metasomatic) | Skarn |
Carbonate Replacement (incl Manto) | Skarn |
IOCG Breccia Complex | IOCG |
Iron Oxide Copper Gold (IOCG) | IOCG |
Replacement | Replacement |
Proterozoic Quartz Pebble Conglomerate | Placer |
Paleoplacer (Buried) | Placer |
Placer (Alluvial) | Placer |
Placer (Beach) | Placer |
Jasperoid Hosted | Epithermal |
Epithermal | Epithermal |
Epithermal Low Sulphidation | Epithermal |
Epithermal High Sulphidation | Epithermal |
Hot Spring Au-Ag | Epithermal |
Komatiitic Magmatic | Komatiitic Magmatic |
Carlin Style Carbonate Replacement | Carlin |
Flood Basalt (Dyke-Sill Complexes) | Flood Basalt |
Laterite (Generic) | Laterite |
Black Shale | Black Shale |
Breccia Pipes | Porphyry |
Collapse Breccia Pipes | Porphyry |
Breccia Fill | Porphyry |
Porphyry Deposit | Porphyry |
Sedimentary Exhalative (SEDEX) | Sediment |
Sediment Hosted (Reduced Facies) | Sediment |
Supergene | Supergene |
Volcanogenic Massive Sulfide (VMS) | VMS |
Carb-Hosted (Mississippi Valley Type) | MVT |
Banded Iron Formation (BIF) | BIF |
Label | Development Stage | Description |
---|---|---|
Already put into production | Mine-Stage | Project that has made a decision to move forward with production. |
Already put into production | Commissioning | The mine is commissioned and production has started. |
Already put into production | Production | Commercial production has been achieved. |
Already put into production | Operating | The mine is fully operational. |
Already put into production | Satellite | A satellite of a central processing complex. |
Already put into production | Limited | Some ore and/or commodity is being produced. Closure. |
Already put into production | Preproduction | A go-ahead decision has been made and the project is being readied for production. |
Not put into production | Late-Stage | Project with a defined resource that has not yet reached a production decision. |
Not put into production | Reserves Development | An initial reserve/resource has been calculated. |
Not put into production | Advanced | Drilling is being completed to add additional reserves/resources. |
Not put into production | Exploration | Project is in the exploration phase. |
Not put into production | Prefeasibility/Scoping | Usually an in-house assessment that includes mining and processing methods, capital costs, NPV, IRR. |
Not put into production | Feasibility | Bankable feasibility study is underway to determine economic viability. |
Not put into production | Feasibility Started | Feasibility report has commenced. |
Not put into production | Feasibility Complete | Feasibility report is complete. |
Not put into production | Construction Planned | Construction is planned for the property. |
Not put into production | Construction Started | Construction has begun at the property. |
Not put into production | Expansion | Operator is engaged in an active capital expansion of the facilities. |
Not put into production | Residual | The operator has stopped mining ore and is leaching the residual ore. |
Not put into production | Closed | Operation has stopped, in many cases because the ore has been exhausted. |
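The label division in the table amounts to a lookup from development stage to a binary target. The sketch below covers only a few stages from each group. The 1/0 coding, the set names, and the function `production_label` are assumptions made for illustration.

```python
# Subset of stages from the table; extend both sets to cover every stage in use.
IN_PRODUCTION = {"Commissioning", "Production", "Operating"}
NOT_IN_PRODUCTION = {"Exploration", "Reserves Development", "Feasibility"}


def production_label(stage):
    """Return 1 for 'already put into production', 0 for 'not put into production'."""
    stage = stage.strip()
    if stage in IN_PRODUCTION:
        return 1
    if stage in NOT_IN_PRODUCTION:
        return 0
    # Fail loudly instead of guessing a label for a stage that is not listed.
    raise ValueError(f"Unmapped development stage: {stage!r}")


labels = [production_label(s) for s in ["Operating", "Exploration", "Production"]]
```

Raising an error on unlisted stages keeps ambiguous or misspelled stage names from entering the training target unnoticed.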
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kang, K.; Chen, Q.; Wang, K.; Zhang, Y.; Zhang, D.; Zheng, G.; Xing, J.; Long, T.; Ren, X.; Shang, C.; Cui, B. Application of Interpretable Machine Learning for Production Feasibility Prediction of Gold Mine Project. Appl. Sci. 2023, 13, 8992. https://doi.org/10.3390/app13158992