Early Warning of Gas Concentration in Coal Mines Production Based on Probability Density Machine
Abstract
1. Introduction
2. Related Works
3. Data
3.1. Data Acquisition
3.2. Data Preprocessing
3.3. Instance Generation
4. Methodology
4.1. The Gaussian Naive Bayes Model and Why It Is Hurt by an Imbalanced Data Distribution
4.2. KNN-PDE-alike Probability Density Estimation Algorithm
Algorithm 1. KNN-PDE-alike algorithm
Input: a data set Φ = {xi | xi ∈ Rm, 1 ≤ i ≤ q}, the neighborhood parameter K.
Output: a 1 × q vector CPD recording the relative conditional probability density of all instances.
Procedure:
1. For each instance xi, find its Kth nearest neighbor and record the distance between them;
2. Calculate the normalization factor Z by Equation (10);
3. Calculate the relative conditional probability density P(xi) by Equation (9);
4. Record the relative conditional probability densities one by one into CPD and output it.
4.3. Probability Density Machine
Algorithm 2. PDM algorithm
Input: an imbalanced training set Φ = {(xi, yi) | xi ∈ Rm, 1 ≤ i ≤ q, yi ∈ {Y+, Y−}}, a test instance x’.
Output: the predicted class label for x’.
Training Procedure:
1. For the majority class Y− and the minority class Y+, extract their corresponding instances from Φ into Φ− and Φ+, and record their numbers of instances as q− and q+, respectively;
2. Calculate CIL, which equals |Φ−|/|Φ+|;
3. For the two classes Y− and Y+, set the corresponding neighborhood parameters as K− and K+, respectively;
4. For the two classes, call the KNN-PDE-alike algorithm to obtain and record the corresponding normalization factors Z− and Z+.
Testing Procedure:
1. For class Y−, feed x’ and the corresponding Φ−, K−, and Z− into the KNN-PDE-alike algorithm to obtain the relative conditional probability density P−(x’);
2. For class Y+, feed x’ and the corresponding Φ+, K+, and Z+ into the KNN-PDE-alike algorithm to obtain the relative conditional probability density P+(x’);
3. For class Y−, use CIL to adjust its relative conditional probability density and obtain the normalized probability density P−(x’) by Equation (11);
4. Compare P−(x’) and P+(x’): if P−(x’) > P+(x’), predict x’ as Y−; otherwise, predict x’ as Y+;
5. Output the class label of the test instance x’.
5. Experiments
5.1. Experimental Settings
5.2. Results and Discussions
- (1) On gas concentration data, the two traditional oversampling strategies, i.e., ROS and SMOTE, fail to improve the quality of the classification model, while RUS sometimes performs slightly better than GNB. We believe this is associated with the structural and distributional complexity of gas concentration monitoring data. On this kind of data, ROS causes the model to overfit severely, and SMOTE generates many synthetic instances in inappropriate positions. GL-GNB, which considers the data distribution during sampling, alleviates the problems of ROS and SMOTE to some extent. However, the performance gain from GL-GNB is limited: on D4 and D5 it performs worse than GNB. (A minimal sketch of how such resampling baselines are typically combined with GNB is given after this list.)
- (2) The class imbalance rate influences the performance of the various algorithms to some extent, including the proposed PDM algorithm. The worst F-measure values clearly occur on the two highly imbalanced data sets, namely D3 and D6, while on the other data sets the classification performance is obviously better. We believe this is due to the scarcity of minority training instances, which are insufficient to precisely reconstruct the probability distribution of the minority class.
- (3) On most data sets, the two algorithm-level methods, i.e., FSVM-CIL and ODOC-ELM, perform significantly better than the sampling-based strategies. Of course, they are also more sophisticated than the sampling algorithms: FSVM-CIL needs to explore the data distribution and assign an individual cost weight to each instance, while ODOC-ELM needs to adopt a stochastic optimization algorithm to iteratively search for the best decision threshold. We also note that on the highly imbalanced data sets, e.g., D3 and D6, FSVM-CIL outperforms ODOC-ELM, while on the other data sets ODOC-ELM performs better.
- (4) The proposed PDM algorithm outperforms all other solutions. In fact, it produced the best result at most of the predicted time points on each data set. Specifically, compared with GNB, the performance of PDM improved by 13.1~24.7%, while compared with the other algorithms, it improved by 0.4~40.7%. These results verify the effectiveness and superiority of the proposed PDM algorithm.
5.3. Statistical Significance Analysis
5.4. Discussion about the Parameters
6. Conclusions
- (1) We treat the early warning of gas concentration in coal mine production as a classification problem, and note its characteristics of class imbalance and sophisticated distribution;
- (2) In the context of Naive Bayes theory, we theoretically analyzed why an imbalanced data distribution can hurt predictive models;
- (3) A novel class imbalance learning algorithm, called the probability density machine, was proposed to improve the accuracy of early warning of gas concentration in coal mine production.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
References
- National Bureau of Statistics of the People’s Republic of China. Available online: https://data.stats.gov.cn/ (accessed on 8 May 2021).
- Ma, Q.; Zhang, Q.; Li, D.; Chen, J.; Ren, S.; Shen, S. Effects of premixed methane concentration on distribution of flame region and hazard effects in a tube and a tunnel gas explosion. J. Loss Prev. Process. Ind. 2015, 34, 30–38.
- Zhang, J.; Cliff, D.; Xu, K.; You, G. Focusing on the patterns and characteristics of extraordinarily severe gas explosion accidents in Chinese coal mines. Process. Saf. Environ. Prot. 2018, 117, 390–398.
- Zhu, Y.; Wang, D.; Shao, Z.; Xu, C.; Zhu, X.; Qi, X.; Liu, F. A statistical analysis of coalmine fires and explosions in China. Process. Saf. Environ. Prot. 2019, 121, 357–366.
- Song, Y.; Yang, S.; Hu, X.; Song, W.; Sang, N.; Cai, J.; Xu, Q. Prediction of gas and coal spontaneous combustion coexisting disaster through the chaotic characteristic analysis of gas indexes in goaf gas extraction. Process. Saf. Environ. Prot. 2019, 129, 8–16.
- Zhang, S.; Wang, B.; Li, X.; Chen, H. Research and Application of Improved Gas Concentration Prediction Model Based on Grey Theory and BP Neural Network in Digital Mine. Procedia CIRP 2016, 56, 471–475.
- Wang, F.; Liu, W. Prediction Strategy of Coal and Gas Outburst Based on Artificial Neural Network. J. Comput. 2013, 8, 240–247.
- Wu, Y.; Gao, R.; Yang, J. Prediction of coal and gas outburst: A method based on the BP neural network optimized by GASA. Process. Saf. Environ. Prot. 2020, 133, 64–72.
- Lyu, P.; Chen, N.; Mao, S.; Li, M. LSTM based encoder-decoder for short-term predictions of gas concentration using multi-sensor fusion. Process. Saf. Environ. Prot. 2020, 137, 93–105.
- Wu, H.; Shi, S.; Lu, Y.; Liu, Y.; Huang, W. Top corner gas concentration prediction using t-distributed Stochastic Neighbor Embedding and Support Vector Regression algorithms. Concurr. Comput. Pract. Exp. 2020, 32, e5705.
- Meng, Q.; Ma, X.; Zhou, Y. Prediction of Mine Gas Emission Rate using Support Vector Regression and Chaotic Particle Swarm Optimization Algorithm. J. Comput. 2013, 8, 2908–2915.
- Wu, X.; Qian, J.-S.; Huang, C.-H.; Zhang, L. Short-Term Coalmine Gas Concentration Prediction Based on Wavelet Transform and Extreme Learning Machine. Math. Probl. Eng. 2014, 2014, 858620.
- Dong, D.-W.; Li, S.-G.; Chang, X.-T.; Lin, H.-F. Prediction Model of Gas Concentration around Working Face Using Multivariate Time Series. J. Min. Saf. Eng. 2012, 29, 135–139.
- Yuan, B. Study on gas emission prediction of working face based on GM (1, 1) model. J. Phys. Conf. Ser. 2020, 1549, 042031.
- Li, D.; Cheng, Y.; Wang, L.; Wang, H.; Wang, L.; Zhou, H. Prediction method for risks of coal and gas outbursts based on spatial chaos theory using gas desorption index of drill cuttings. Min. Sci. Technol. 2011, 21, 439–443. (In Chinese)
- Wang, T.; Cai, L.-Q.; Fu, Y.; Zhu, T.-C. A Wavelet-Based Robust Relevance Vector Machine Based on Sensor Data Scheduling Control for Modeling Mine Gas Gushing Forecasting on Virtual Environment. Math. Probl. Eng. 2013, 2013, 579693.
- Ruta, D.; Cen, L. Self-Organized Predictor of Methane Concentration Warnings in Coal Mines. Lect. Notes Comput. Sci. 2015, 9437, 485–493.
- Zhang, J.; Ai, Z.; Guo, L.; Cui, X. Research of Synergy Warning System for Gas Outburst Based on Entropy-Weight Bayesian. Int. J. Comput. Intell. Syst. 2021, 14, 376–385.
- He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
- Guo, H.; Li, Y.; Shang, J.; Gu, M.; Huang, Y.; Gong, B. Learning from class-imbalance data: Review of methods and applications. Expert Syst. Appl. 2017, 73, 220–239.
- Yu, H.; Ni, J.; Zhao, J. ACOSampling: An ant colony optimization-based undersampling method for classifying imbalanced DNA microarray data. Neurocomputing 2013, 101, 309–318.
- Batista, G.E.A.P.A.; Prati, R.C.; Monard, M.C. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 2004, 6, 20–29.
- Chawla, N.; Bowyer, K.W.; Hall, L.O. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Xie, Y.; Peng, L.; Chen, Z.; Yang, B.; Zhang, H.; Zhang, H. Generative learning for imbalanced data using the Gaussian mixed model. Appl. Soft Comput. J. 2019, 79, 439–451.
- Imam, T.; Ting, K.M.; Kamruzzaman, J. z-SVM: An SVM for improved classification of imbalanced data. Proc. Aust. Conf. Artif. Intell. 2006, 264–273.
- Batuwita, R.; Palade, V. FSVM-CIL: Fuzzy support vector machines for class imbalance learning. IEEE Trans. Fuzzy Syst. 2010, 18, 558–571.
- Yu, H.; Mu, C.; Sun, C.; Yang, W.; Yang, X.; Zuo, X. Support vector machine-based optimized decision threshold adjustment strategy for classifying imbalanced data. Knowl.-Based Syst. 2015, 76, 67–78.
- Yu, H.; Sun, C.; Yang, X.; Yang, W.; Shen, J.; Qi, Y. ODOC-ELM: Optimal decision outputs compensation-based extreme learning machine for classifying imbalanced data. Knowl.-Based Syst. 2016, 92, 55–70.
- Yu, H.; Sun, C.; Yang, X.; Zheng, S.; Zou, H. Fuzzy Support Vector Machine with Relative Density Information for Classifying Imbalanced Data. IEEE Trans. Fuzzy Syst. 2019, 27, 2353–2367.
- Zhang, H.; Zhang, H.; Pirbhulal, S.; Wu, W.; Victor, H.C.D.A. Active Balancing Mechanism for Imbalanced Medical Data in Deep Learning-Based Classification Models. ACM Trans. Multimedia Comput. Commun. Appl. 2020, 16, 1–15.
- Griffis, J.C.; Allendorfer, J.B.; Szaflarski, J.P. Voxel-based Gaussian naive Bayes classification of ischemic stroke lesions in individual T1-weighted MRI scans. J. Neurosci. Methods 2016, 257, 97–108.
- Berrar, D. Bayes’ Theorem and Naive Bayes Classifier. In Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics; Elsevier Science Publisher: Amsterdam, The Netherlands, 2018; pp. 403–412.
- Kupervasser, O. Quantitative Structure-Activity Relationship Modeling and Bayesian Networks: Optimality of Naive Bayes Model. In Bayesian Networks—Advances and Novel Applications; IntechOpen Publisher: London, UK, 2019.
- Fukunaga, K.; Hostetler, L. Optimization of k nearest neighbor density estimates. IEEE Trans. Inf. Theory 1973, 19, 320–326.
- Wang, Q.; Kulkarni, S.R.; Verdú, S. Divergence estimation for multidimensional densities via k-nearest-neighbor distances. IEEE Trans. Inf. Theory 2009, 55, 2392–2405.
- Demsar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
- Garcia, S.; Fernandez, A.; Luengo, J.; Herrera, F. Advanced non-parametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf. Sci. 2010, 180, 2044–2064.
Data Set | Total | Loss | Loss Rate |
---|---|---|---|
D1 | 22,032 | 1354 | 6.1% |
D2 | 22,032 | 1845 | 8.4% |
D3 | 22,032 | 2237 | 10.2% |
D4 | 22,032 | 2532 | 11.5% |
D5 | 22,032 | 3155 | 14.3% |
D6 | 22,032 | 3423 | 15.5% |
Data Set | Algorithm | 10 min | 20 min | 30 min | 40 min | 50 min | 60 min | Average |
---|---|---|---|---|---|---|---|---|
D1 | PDM | 0.738 | 0.690 | 0.659 | 0.636 | 0.618 | 0.599 | 0.657 |
D1 | GNB | 0.560 | 0.537 | 0.522 | 0.511 | 0.503 | 0.494 | 0.521 |
D1 | RUS-GNB | 0.601 | 0.569 | 0.548 | 0.534 | 0.524 | 0.511 | 0.548 |
D1 | ROS-GNB | 0.550 | 0.529 | 0.513 | 0.502 | 0.492 | 0.484 | 0.512 |
D1 | SMOTE-GNB | 0.549 | 0.527 | 0.512 | 0.499 | 0.489 | 0.481 | 0.510 |
D1 | GL-GNB | 0.629 | 0.581 | 0.563 | 0.549 | 0.537 | 0.531 | 0.565 |
D1 | FSVM-CIL | 0.633 | 0.597 | 0.601 | 0.585 | 0.576 | 0.544 | 0.589 |
D1 | ODOC-ELM | 0.675 | 0.630 | 0.599 | 0.572 | 0.596 | 0.563 | 0.606 |
D2 | PDM | 0.747 | 0.715 | 0.687 | 0.665 | 0.641 | 0.618 | 0.679 |
D2 | GNB | 0.520 | 0.497 | 0.481 | 0.468 | 0.453 | 0.441 | 0.477 |
D2 | RUS-GNB | 0.434 | 0.417 | 0.403 | 0.391 | 0.381 | 0.372 | 0.400 |
D2 | ROS-GNB | 0.503 | 0.480 | 0.464 | 0.448 | 0.436 | 0.427 | 0.460 |
D2 | SMOTE-GNB | 0.507 | 0.483 | 0.465 | 0.448 | 0.436 | 0.425 | 0.461 |
D2 | GL-GNB | 0.528 | 0.525 | 0.510 | 0.533 | 0.479 | 0.456 | 0.505 |
D2 | FSVM-CIL | 0.499 | 0.486 | 0.472 | 0.481 | 0.466 | 0.452 | 0.476 |
D2 | ODOC-ELM | 0.691 | 0.652 | 0.639 | 0.601 | 0.598 | 0.573 | 0.626 |
D3 | PDM | 0.526 | 0.454 | 0.412 | 0.372 | 0.342 | 0.310 | 0.403 |
D3 | GNB | 0.237 | 0.217 | 0.206 | 0.197 | 0.191 | 0.183 | 0.205 |
D3 | RUS-GNB | 0.368 | 0.287 | 0.225 | 0.173 | 0.138 | 0.108 | 0.217 |
D3 | ROS-GNB | 0.222 | 0.203 | 0.192 | 0.182 | 0.174 | 0.167 | 0.190 |
D3 | SMOTE-GNB | 0.221 | 0.201 | 0.186 | 0.178 | 0.172 | 0.164 | 0.187 |
D3 | GL-GNB | 0.307 | 0.251 | 0.204 | 0.187 | 0.192 | 0.175 | 0.219 |
D3 | FSVM-CIL | 0.519 | 0.432 | 0.377 | 0.291 | 0.286 | 0.259 | 0.361 |
D3 | ODOC-ELM | 0.428 | 0.295 | 0.356 | 0.272 | 0.261 | 0.253 | 0.311 |
D4 | PDM | 0.617 | 0.549 | 0.505 | 0.473 | 0.452 | 0.435 | 0.505 |
D4 | GNB | 0.416 | 0.389 | 0.372 | 0.363 | 0.355 | 0.349 | 0.374 |
D4 | RUS-GNB | 0.215 | 0.209 | 0.204 | 0.201 | 0.198 | 0.196 | 0.204 |
D4 | ROS-GNB | 0.398 | 0.373 | 0.358 | 0.346 | 0.337 | 0.331 | 0.357 |
D4 | SMOTE-GNB | 0.399 | 0.370 | 0.354 | 0.342 | 0.333 | 0.327 | 0.354 |
D4 | GL-GNB | 0.419 | 0.381 | 0.374 | 0.350 | 0.353 | 0.332 | 0.368 |
D4 | FSVM-CIL | 0.514 | 0.453 | 0.446 | 0.429 | 0.398 | 0.401 | 0.440 |
D4 | ODOC-ELM | 0.555 | 0.507 | 0.481 | 0.486 | 0.440 | 0.439 | 0.485 |
D5 | PDM | 0.661 | 0.601 | 0.560 | 0.527 | 0.487 | 0.465 | 0.550 |
D5 | GNB | 0.360 | 0.330 | 0.307 | 0.292 | 0.276 | 0.263 | 0.305 |
D5 | RUS-GNB | 0.575 | 0.519 | 0.482 | 0.455 | 0.418 | 0.352 | 0.467 |
D5 | ROS-GNB | 0.333 | 0.306 | 0.286 | 0.271 | 0.258 | 0.247 | 0.284 |
D5 | SMOTE-GNB | 0.333 | 0.304 | 0.286 | 0.268 | 0.256 | 0.243 | 0.282 |
D5 | GL-GNB | 0.349 | 0.321 | 0.304 | 0.288 | 0.265 | 0.258 | 0.298 |
D5 | FSVM-CIL | 0.507 | 0.486 | 0.454 | 0.429 | 0.401 | 0.382 | 0.443 |
D5 | ODOC-ELM | 0.492 | 0.471 | 0.460 | 0.443 | 0.399 | 0.385 | 0.442 |
D6 | PDM | 0.588 | 0.548 | 0.527 | 0.510 | 0.496 | 0.478 | 0.525 |
D6 | GNB | 0.320 | 0.294 | 0.277 | 0.266 | 0.258 | 0.251 | 0.278 |
D6 | RUS-GNB | 0.139 | 0.130 | 0.125 | 0.120 | 0.116 | 0.113 | 0.124 |
D6 | ROS-GNB | 0.305 | 0.276 | 0.262 | 0.250 | 0.241 | 0.232 | 0.261 |
D6 | SMOTE-GNB | 0.321 | 0.284 | 0.267 | 0.255 | 0.244 | 0.235 | 0.268 |
D6 | GL-GNB | 0.356 | 0.332 | 0.309 | 0.311 | 0.289 | 0.273 | 0.312 |
D6 | FSVM-CIL | 0.576 | 0.553 | 0.515 | 0.508 | 0.502 | 0.484 | 0.523 |
D6 | ODOC-ELM | 0.389 | 0.304 | 0.327 | 0.299 | 0.271 | 0.254 | 0.307 |
Data Set | | | | | |
---|---|---|---|---|---|
D1 | 0.665 | 0.668 | 0.657 | 0.648 | 0.636 |
D2 | 0.683 | 0.685 | 0.679 | 0.669 | 0.657 |
D3 | 0.376 | 0.398 | 0.403 | 0.415 | 0.401 |
D4 | 0.479 | 0.497 | 0.505 | 0.502 | 0.495 |
D5 | 0.536 | 0.551 | 0.550 | 0.536 | 0.503 |
D6 | 0.501 | 0.524 | 0.525 | 0.516 | 0.503 |
Data Set | 6 | 12 | 24 | 48 | 72 |
---|---|---|---|---|---|
D1 | 0.642 | 0.684 | 0.657 | 0.639 | 0.614 |
D2 | 0.659 | 0.671 | 0.679 | 0.664 | 0.663 |
D3 | 0.395 | 0.417 | 0.403 | 0.357 | 0.344 |
D4 | 0.488 | 0.538 | 0.505 | 0.470 | 0.444 |
D5 | 0.531 | 0.547 | 0.550 | 0.521 | 0.515 |
D6 | 0.511 | 0.520 | 0.525 | 0.506 | 0.488 |