A Machine Learning Approach for Prediction of the Quantity of Mine Waste Rock Drainage in Areas with Spring Freshet

Zhang, Can; Ma, Liang; Liu, Wenying

doi:10.3390/min13030376

by Can Zhang ¹, Liang Ma ²,* and Wenying Liu ¹,*

¹ Department of Materials Engineering, University of British Columbia, 309-6350 Stores Road, Vancouver, BC V6T 1Z4, Canada

² Energy, Mining and Environment Research Center, National Research Council Canada, 4250 Wesbrook Mall, Vancouver, BC V6T 1W5, Canada

* Authors to whom correspondence should be addressed.

Minerals 2023, 13(3), 376; https://doi.org/10.3390/min13030376

Submission received: 25 January 2023 / Revised: 21 February 2023 / Accepted: 2 March 2023 / Published: 8 March 2023

(This article belongs to the Special Issue Mobility of Potentially Toxic Elements: Environmental Hazards)

Abstract:

A new machine learning approach was developed to predict the quantity of mine waste rock drainage using weather data as the inputs. The novelty of the approach is that it includes spring freshet (melting of snow/ice in spring) as an input to the drainage flow rate model. Specifically, the machine learning approach integrates the decision tree algorithm to classify the occurrence or absence of spring freshet and a long short-term memory (LSTM) algorithm to predict the flow rate of mine waste rock drainage. The two algorithms are integrated by using the classification result of spring freshet as an input to the flow rate model. The machine learning approach developed was applied to predict the drainage flow rate at a case study mine in Canada. The model developed was trained with the local weather data as the inputs and the historical monitoring data of drainage flow rate as the target (output). The results show that the decision tree algorithm is able to classify the occurrence or absence of spring freshet with an accuracy of 91%. The inclusion of spring freshet as an input to the flow rate model significantly improves the performance of the flow rate model. The sensitivity tests show that changes in temperature and atmospheric precipitation influence the drainage flow rate.

Keywords:

drainage flow model; spring freshet threshold; decision tree algorithm; LSTM algorithm

1. Introduction

Mining activities leave behind large volumes of waste rock that is either barren or contains valuable minerals at levels too low to warrant treatment. When exposed to water and air, waste rock undergoes weathering via complex biogeochemical reactions, such as sulfide oxidation, acid neutralization by carbonate and silicate minerals, and formation of a range of secondary minerals [1,2,3]. This weathering releases and mobilizes a diverse range of potentially harmful substances in waste rock drainage, such as acid, heavy metals, and metalloids. Extensive efforts have been made to understand the behavior of waste rock drainage using mathematical modelling together with laboratory and field testing [4,5].

Reactive transport modelling, either deterministic or stochastic, has been used as an essential tool to predict the quantity and chemistry of waste rock drainage [6,7,8,9,10,11,12,13]. These models couple unsaturated multiphase fluid flow, solute transport, heat transfer, and equilibrium and kinetics of biogeochemical reactions to predict the concentrations of various water species. The application of stochastic modeling allows the embedment of uncertainty in decision making based on model predictions [14]. It is well known that waste rock piles exhibit a high degree of physical and geochemical heterogeneities, making it extremely challenging for the reactive transport models to capture the influence of heterogeneities on the drainage quantity and chemistry [2,15,16]. These models often require site-specific data that are difficult and expensive to collect.

Machine learning has advantages over the traditional reactive transport modelling in that machine learning models are primarily algorithm-driven and therefore tend to be easier and faster to develop. Various models based on machine learning algorithms have been applied to predict mine drainage chemistry [17,18]. Researchers have evaluated the accuracy and uncertainty of five machine learning algorithms in predicting the chemistry of an acid mine drainage [19,20]. The authors concluded that machine learning techniques are promising tools for predicting drainage chemistry and that the prediction results could be used to assess potential environmental risks of acid mine drainage. Ma et al. applied two neural networks to predict the flow rate and acidity of the waste rock seepage at the Equity Silver Mine respectively using only the weather monitoring data as the inputs [21,22].

In this research, a machine learning approach was developed specifically for prediction of drainage quantity at mine sites located in areas with spring freshet. Spring freshet refers to the melting of snow/ice in spring that leads to a comparatively high rate of flow of fresh water of short duration. The novelty of the approach is that it includes spring freshet as an input to the drainage flow rate model. It is important to understand how the occurrence of spring freshet affects the drainage quantity in the context of climate change [23,24]. Using weather monitoring data as the inputs, we developed a new machine learning approach that integrates the decision tree algorithm to predict the occurrence of spring freshet and the long short-term memory (LSTM) algorithm to predict the drainage quantity. LSTM is a special kind of recurrent neural network that is suitable for processing time sequence data and is designed to mitigate the vanishing and exploding gradient problems of conventional recurrent networks [25]. The application of the machine learning approach to predict the drainage quantity was demonstrated using a Canadian mine site that experiences spring freshet as a case study.

2. Methodology

2.1. The General Framework of the Machine Learning Approach

Two supervised machine learning algorithms are applied to predict the drainage flow rate using weather monitoring data as the inputs: decision tree algorithm for the prediction of spring freshet and long short-term memory (LSTM) algorithm for the prediction of the drainage flow rate. Figure 1 shows a schematic of the general framework of the machine learning approach. The raw weather data are pre-processed by different methods in the data preparation step. Using the pre-processed weather data as the inputs, the decision tree algorithm classifies whether a drainage flow rate represents the occurrence or absence of spring freshet as the output. This output, together with the weather data pre-processed in a separate step, is then used as the inputs to the LSTM algorithm to predict the drainage flow rate as the target. The performance of the flow rate model with spring freshet prediction as an input is compared with that of the initial flow rate model, i.e., an LSTM model without spring freshet prediction as one of the inputs.

The hardware used in this research consisted of an AMD R5 5600X CPU and 16 GB of memory. The decision tree and the LSTM algorithms were constructed with scikit-learn version 0.24.2 and TensorFlow version 2.3.0, respectively, in a Python 3.7.10 environment. The code used in this research is available on GitHub at https://github.com/RichardZhang1997/Flow-rate (accessed on 3 March 2023).

2.2. Decision Tree Algorithm for the Prediction of Spring Freshet

In the present study, a decision tree algorithm is used to classify whether a drainage flow rate represents the occurrence or absence of spring freshet using the pre-processed weather data as the inputs. A decision tree starts with a single node, called the root node, and branches into possible outcomes that are available to the decision-maker. The training of a decision tree is to build the tree by determining the best split at each decision node. The metric used to measure the quality of the splitting is Gini index or entropy, which is one of the hyperparameters of the model. The smaller these metrics are, the less disordered the dataset is and the better the splitting is. Model overfitting is dealt with by pruning (removal of branches) or by limiting the maximum depth of the tree or the minimum size of a leaf. This requires careful tuning of the model hyperparameters.
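The split-quality metrics mentioned above can be illustrated with a minimal sketch. The toy temperatures and labels are hypothetical and are not taken from the case study; the helper names (`gini`, `entropy`, `split_impurity`) are illustrative only and do not reproduce the authors' code.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the classes k present in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_impurity(values, labels, threshold, metric=gini):
    """Size-weighted impurity of the two children of the split value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * metric(left) + len(right) / n * metric(right)

# Hypothetical toy data: daily mean temperature (deg C) and freshet label (1 = freshet)
temps = [-5.0, -1.0, 2.0, 6.0, 9.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]

# The best split is the candidate threshold with the lowest weighted impurity
best_threshold = min(temps, key=lambda t: split_impurity(temps, labels, t))
```

A lower weighted impurity means the two child nodes are less mixed, which is the sense in which a smaller metric indicates a better split.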

2.3. LSTM Algorithm for the Prediction of Drainage Flow Rate

Long short-term memory (LSTM) is an advanced recurrent neural network capable of handling time series data. The structure of an LSTM model is described in detail by Hochreiter and Schmidhuber [25]. In the present study, LSTM is used to predict the drainage flow rate over time given weather conditions and the spring freshet prediction as the input features. The LSTM algorithm consists of an input layer containing these features, a recurrent layer considering the sequence relationship of each feature, a fully connected layer rearranging the shape of the model output, and an output layer that produces the drainage flow rate over time as the target. The model hyperparameters include the number of hidden states, dropout rate, and neuron weight constraint. The number of hidden states is the number of neurons in the fully connected layer that reshapes the output of the LSTM. The dropout rate is the recurrent dropout rate inside the LSTM layer to prevent overfitting. The constraint is the maximum norm allowed for the incoming weights of a neuron, which prevents any single weight from growing so large that the model becomes unstable when units are dropped. In addition, training parameters, including the learning rate, the batch size, and the number of epochs, must also be tuned.
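The weight constraint described above can be sketched in isolation. This is a plain-Python illustration of a maximum-norm constraint, assuming the L2 norm of a neuron's incoming weight vector is the quantity being capped; the function name `apply_max_norm` and the example numbers are hypothetical.

```python
import math

def apply_max_norm(weights, max_norm):
    """Rescale an incoming weight vector so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0.0 or norm <= max_norm:
        return list(weights)  # already within the constraint, left unchanged
    scale = max_norm / norm
    return [w * scale for w in weights]

# Hypothetical incoming weights with L2 norm 5.0, constrained to a maximum norm of 3.0
incoming = [3.0, 4.0]
constrained = apply_max_norm(incoming, 3.0)
constrained_norm = math.sqrt(sum(w * w for w in constrained))
```

The direction of the weight vector is preserved and only its length is capped, so the constraint limits how strongly a single neuron can respond without changing which inputs it responds to.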

2.4. Steps Involved in the Application of the Machine Learning Approach

2.4.1. Step 1: Data Preparation

The purpose of data preparation is to transform the raw data by various data pre-processing methods to a form suitable for machine learning. In terms of weather dataset used in the present study, raw data typically include temperature, total precipitation, rain, snow, solar radiation, evaporation rate, etc. Two major tasks in the pre-processing of these raw data are handling missing values and data scaling. Two common methods are proposed to handle the missing values: deleting the data samples that contain missing values or generating new data to fill the missing values.

Data scaling is often required to ensure that data of a significantly large magnitude do not disproportionately impact the modelling process. Common methods include the min–max scaling that normalizes all values to a range of 0 to 1 and the standardization scaling that rescales all values to a standard normal distribution with a mean of 0 and a standard deviation of 1. Data scaling is not required by the decision tree algorithm but suggested for the LSTM algorithm.
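The two scaling methods can be compared with a short sketch. The precipitation values are hypothetical, and the population standard deviation is assumed for the standardization.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly to the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to a mean of 0 and a (population) standard deviation of 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical daily precipitation values in mm
precip_mm = [0.0, 5.0, 10.0, 20.0]
scaled = min_max_scale(precip_mm)
z_scores = standardize(precip_mm)
```

In practice the minimum, maximum, mean, and standard deviation are usually computed on the training set only and then reused for the validation and test sets, so that no information from held-out data leaks into training.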

2.4.2. Step 2: Feature Selection

Feature selection is intended to transform raw data into input variables that are believed to be the most efficient to predict the target variable. In the context of mine waste rock drainage, weather data and their monitoring times are suggested as the input variables (features) to predict the drainage flow rate (the target variable) [21]. Specifically, temperature and rainfall have been reported to be the main factors controlling acid mine drainage [26]. In addition, because the decision tree algorithm chooses only one feature to split each time, it is feasible to use all relevant weather features to predict spring freshet. Even though it increases the training time, the addition of features does not increase the complexity of the spring freshet model. For the drainage flow rate model, a stricter feature selection process is required because the redundant features could overcomplicate the neural network, compromising the prediction results. Furthermore, because daily weather data typically have a high degree of short-term fluctuations, it is suggested to use weekly average weather data as the inputs to the flow rate model. The weekly averages have been found to improve the prediction accuracy compared with the daily weather data [21,22].

2.4.3. Step 3: Data Partitioning and Hyperparameter Tuning

Data partitioning intends to divide the whole dataset into training, validation, and test sets. The training set is used to train the model; the validation set is used to validate the model performance and prevent over-fitting during training; the test set is used for the final evaluation of the performance of the chosen model. To assess the generalizability of the model, the test set is usually chosen as a continuous interval of data at the end of the time series data.

Hyperparameters exist in all machine learning models and control the structure and performance of a learning algorithm. Prior to model training and validation, the model architecture defined by a set of hyperparameters should be chosen. The grid search cross-validation method is often used to obtain a set of optimal hyperparameters for the construction of machine learning models [27]. Therefore, this method is applied to tune the hyperparameters of the decision tree and the LSTM algorithms in the present research.

2.4.4. Step 4: Training and Evaluation of the Spring Freshet Model

The spring freshet model (decision tree algorithm) is trained with the combination of training and validation datasets. The performance of the model is evaluated based on the test set that has never been used for training. The evaluation metric used is the confusion matrix, which is a straightforward way of visualizing the comparison between the actual data and the predicted results. Based on the confusion matrix, the accuracy, precision and recall of the model are calculated to assess the performance of the spring freshet model. The spring freshet model is then used to predict whether a flow rate indicates the occurrence (labelled positive) or the absence of spring freshet (labelled negative).

2.4.5. Step 5: Training and Evaluation of the Drainage Flow Rate Model

The drainage flow rate model (LSTM algorithm) is trained on the training set and validated on the validation set to detect overfitting caused by overtraining, as described by Tetko et al. [28]. Once the mismatch between the predicted and the observed values on the validation set starts to increase in several consecutive epochs, the training is early-stopped even though the mismatch on the training set is still decreasing. The root mean square error (RMSE) is used to quantify the mismatch between the predicted values and the flow rate monitoring data because RMSE has the same unit as flow rate (m³/s).
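The early-stopping rule described above can be sketched as follows. The number of consecutive non-improving epochs that triggers the stop (called `patience` here) is an assumption, since the text only says "several"; the validation RMSE values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Patience was never exhausted: training ran through the last epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation RMSE per epoch (training RMSE may still be decreasing)
val_rmse = [0.9, 0.7, 0.6, 0.62, 0.65, 0.7, 0.5]
stop_epoch, best_epoch = early_stopping_epoch(val_rmse, patience=3)
```

With these numbers the run stops at epoch 5 and the weights from epoch 2 would be kept. The later improvement at epoch 6 is never seen, which is the usual trade-off when choosing the patience value.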

The machine learning approach developed above is a general framework that can guide mine sites to predict drainage flow rate using weather data as the inputs, specifically for those sites located in areas where spring freshet occurs. However, adjustments may be required in the application of the general framework to specific sites due to unique characteristics of each site, such as data quality and availability and the intended use of the modelling results.

3. Application of the Machine Learning Approach to the Case Study Site

3.1. Data Sources for the Case Study

The machine learning approach developed in this study was applied to predict the drainage flow rate at a case study site located in Canada using weather data as the inputs. The weather data were obtained from the database of Environment Canada and were recorded by a weather station approximately 3–16 km from the three drainage monitoring stations at the mine. The weather monitoring data were considered to be representative of the weather conditions at the mine. The raw weather data used for preparation of the model features (inputs) include maximum, minimum and mean temperatures; total rainfall; total snowfall; and total precipitation on a daily basis. The drainage flow rates of three waste rock piles were monitored by their respective monitoring stations.

For ease of visualization of the climatic conditions at the mine, Figure 2 shows the average monthly precipitation and the average monthly temperature between 1990 and 2013. Two peaks were observed in the average monthly precipitation: one in June (77 mm) associated with rainfall and the other in November (76 mm) associated with snowfall. August had the lowest average monthly precipitation with only 34 mm rainfall. The two warmest months were July and August (~16 °C). The average monthly temperature was below 0 °C in January, February, March, November, and December.

The drainage flow rates were measured by three monitoring stations, named Station 1, Station 2, and Station 3. All three stations were active at the time of writing, but only data up to 2013 were available for this study. Station 1 had potential subsurface flow paths that might affect the flow rate measurements, especially during the low-flow season. It was assumed that the subsurface flow rate was either constant or correlated to the local weather conditions, so that the flow rate model could learn it. The monitoring data from Station 2 and Station 3 had no known issues. Table 1 summarizes the drainage flow data collected by the three monitoring stations. Station 2 had the fewest monitoring records. The flow rate fluctuated the most for Station 2, with the largest difference between the maximum and the minimum values. Station 2 also had a larger mean and median flow rate than the other two stations. All three stations had zero or near-zero flow during the dry season.

3.2. Application of the Machine Learning Approach

3.2.1. Step 1: Data Preparation

Because the machine learning models were trained separately for each station, the data preparation was done separately for each station. Data defects exist in both the weather and the flow rate monitoring data. The data defects are mainly caused by inconsistent monitoring frequencies and missing values. Flow rate was measured more frequently from March to July than the rest of the year for all three monitoring stations. The weather monitoring was interrupted between February 2008 and September 2009. Therefore, the drainage flow rate data without corresponding weather monitoring data were discarded for each station. Station 3 had no flow rate data from 1997 to 2000 due to equipment damage.

Two methods were used to handle missing values in the raw weather data: null-average-filling and null-deleting. The null-average-filling method fills the missing values of a weather feature with its average value. The null-deleting method keeps the missing values as nulls during data preparation but later removes all null-containing data pairs when the training dataset is constructed. The null-average-filling method performed better for the spring freshet model. Thus, the monthly average value was used to fill the missing values of a feature in a given month. This method of handling missing data ensures that more data are available for training the spring freshet model. In contrast, the null-deleting method performed better for the flow rate model, possibly because filling introduces noise into the data regardless of the type of interpolation algorithm used. This noise could compromise the performance of the flow rate model, especially in situations where the number of training samples is very limited.
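The two ways of handling missing values can be sketched side by side. This sketch assumes that the monthly average is taken over all available records sharing the same calendar month, which the text does not state explicitly; the record layout and the temperature values are hypothetical.

```python
from collections import defaultdict

def fill_with_monthly_average(records, feature):
    """Null-average-filling: replace a missing value (None) with the average of
    that feature over the records of the same calendar month."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        if r[feature] is not None:
            sums[r["month"]] += r[feature]
            counts[r["month"]] += 1
    filled = []
    for r in records:
        r2 = dict(r)  # copy so the input records are not modified
        if r2[feature] is None and counts[r2["month"]] > 0:
            r2[feature] = sums[r2["month"]] / counts[r2["month"]]
        filled.append(r2)
    return filled

def drop_missing(records, feature):
    """Null-deleting: discard every record whose feature value is missing."""
    return [r for r in records if r[feature] is not None]

# Hypothetical daily records: month number and mean temperature in deg C
records = [
    {"month": 5, "temp": 8.0},
    {"month": 5, "temp": None},
    {"month": 5, "temp": 12.0},
    {"month": 6, "temp": 15.0},
]
filled = fill_with_monthly_average(records, "temp")
kept = drop_missing(records, "temp")
```

Filling keeps all four records available for training, while deleting keeps only the three measured ones; this is the trade-off between dataset size and added noise discussed above.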

No data scaling is required for the decision tree algorithm. For the LSTM algorithm, the weather data and the flow rate monitoring data used were scaled to a range of 0 to 1 by the min–max method as shown by Equation (1):

X_{i, scaled}^{n} = \frac{X_{i}^{n} - X_{\min}^{n}}{X_{\max}^{n} - X_{\min}^{n}}    (1)

where X_{i}^{n} is any value of a variable n; X_{\max}^{n} and X_{\min}^{n} are the maximum and the minimum values of that variable; and X_{i, scaled}^{n} is the value after scaling.

A criterion must be defined for the spring freshet model to categorize the occurrence or absence of spring freshet. The onset of spring freshet is typically defined as the date when the increase in daily streamflow across four days is greater than the average from January to July in North America [29]. However, the drainage flow rate data available in this study are too sparse to obtain accurate daily flow rates. Thus, a threshold flow rate was defined, above which a flow rate was labelled positive, i.e., the occurrence of spring freshet, and below which a flow rate was labelled negative, i.e., the absence of spring freshet.

Figure 3 shows the average, maximum, and minimum monthly flow rates for the three monitoring stations. A higher average monthly flow rate occurred in May, June, and July than the rest of the year; thus, these three months were defined as the high-flow season and the rest of the year was defined as the low-flow season. This is consistent with the general pattern in North America that the high-flow season occurs from April to July [30]. The flow rates in May, June and July also showed large differences between the maximum and the minimum values, indicating that spring freshet occurred in these three months. The threshold value for a station was defined as the midpoint between the lowest average monthly flow rate in the high-flow season and the highest average monthly flow rate in the low-flow season. The threshold values for the three stations were thus determined to be 1.2 m³/s for Station 1, 1.8 m³/s for Station 2, and 0.7 m³/s for Station 3.
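The threshold definition above can be written as a short calculation. The monthly average flow rates below are hypothetical placeholders, not the case study data, and treating a flow rate exactly equal to the threshold as negative is an assumption because the text only defines "above" and "below".

```python
def freshet_threshold(monthly_avg_flow, high_flow_months=(5, 6, 7)):
    """Midpoint between the lowest monthly average in the high-flow season and
    the highest monthly average in the low-flow season."""
    high = [q for m, q in monthly_avg_flow.items() if m in high_flow_months]
    low = [q for m, q in monthly_avg_flow.items() if m not in high_flow_months]
    return (min(high) + max(low)) / 2.0

def label_freshet(flow_rate, threshold):
    """1 = occurrence of spring freshet (positive), 0 = absence (negative)."""
    return 1 if flow_rate > threshold else 0  # equality treated as negative (assumption)

# Hypothetical average monthly flow rates in m^3/s, keyed by month number
monthly_avg = {1: 0.2, 2: 0.2, 3: 0.3, 4: 0.5, 5: 1.5, 6: 2.5,
               7: 2.0, 8: 0.4, 9: 0.3, 10: 0.3, 11: 0.2, 12: 0.2}
threshold = freshet_threshold(monthly_avg)  # midpoint of 1.5 (May) and 0.5 (April)
example_labels = [label_freshet(q, threshold) for q in (0.4, 0.9, 1.4, 2.6)]
```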

3.2.2. Step 2: Feature Selection

In this step, the input features were selected for the development of the machine learning models. The relevant weather features selected as the inputs to the spring freshet model include daily mean temperature, daily rainfall, daily snowfall, and daily precipitation together with the corresponding month number when the weather data were collected. The use of the month number (1–12) as a feature enables the spring freshet model to better tolerate noise in the weather data.

Table 2 shows the contribution of each weather feature to the spring freshet model expressed as a percentage, with a higher percentage indicating a higher level of importance. The importance of a feature is determined by the relative drop in the model accuracy resulting from removal of that feature. The spring freshet model was found to rely on two features for prediction of spring freshet: the month number and the daily mean temperature. This is reasonable because spring freshet occurs within a 4-month window and the timing is controlled by temperature [30]. The features selected for the drainage flow rate model were temperature, total precipitation, and their measurement time. This feature selection approach is consistent with a previous study using an artificial neural network for prediction of seepage flow rates at the Equity Silver Mine [21].
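One way to turn accuracy drops into percentage contributions is sketched below. The normalization (each feature's drop divided by the sum of all drops) is an assumption about how the percentages could be formed, and the accuracy values are hypothetical rather than those behind Table 2.

```python
def importance_percentages(baseline_accuracy, accuracy_without):
    """Convert the accuracy drop caused by removing each feature into a share of
    the total drop, expressed as a percentage."""
    # A feature whose removal does not hurt accuracy gets a drop of zero
    drops = {f: max(baseline_accuracy - acc, 0.0) for f, acc in accuracy_without.items()}
    total = sum(drops.values())
    if total == 0:
        return {f: 0.0 for f in drops}
    return {f: 100.0 * d / total for f, d in drops.items()}

# Hypothetical accuracies: full model, then the model retrained without each feature
baseline = 0.90
without = {"month": 0.70, "mean_temp": 0.80, "rainfall": 0.90, "snowfall": 0.90}
importance = importance_percentages(baseline, without)
```

In this toy case the month number and the mean temperature share the entire contribution, and the two features whose removal leaves the accuracy unchanged receive zero.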

3.2.3. Step 3: Data Partitioning and Hyperparameter Tuning

The labelled dataset prepared in Steps 1 and 2 was split into a training and a test set. The data in 2013, which was the last monitoring year for the available data, were assigned to the test set. The remaining data were assigned to the training set. Table 3 and Table 4 show the size of the training and the test sets and their split ratio for each station. The size of the training and the test sets is slightly different for the two algorithms. More data were available for training and testing the spring freshet model than the drainage flow rate model due to the different methods applied to handle the missing values.
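The year-based split can be sketched as follows. The sample layout (a dictionary with a `year` key) and the flow values are hypothetical.

```python
def split_by_year(samples, test_year):
    """Assign all samples from test_year to the test set and the rest to the training set."""
    train = [s for s in samples if s["year"] != test_year]
    test = [s for s in samples if s["year"] == test_year]
    return train, test

# Hypothetical labelled samples
samples = [
    {"year": 2011, "flow": 0.4},
    {"year": 2012, "flow": 1.9},
    {"year": 2012, "flow": 0.3},
    {"year": 2013, "flow": 2.1},
    {"year": 2013, "flow": 0.5},
]
train_set, test_set = split_by_year(samples, test_year=2013)
test_fraction = len(test_set) / len(samples)
```

Because 2013 is the last year in the record, holding it out means the model is evaluated only on data that come after everything it was trained on, which avoids leaking future information into training.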

A five-fold cross-validation grid search was applied to tune the hyperparameters of the decision tree and the LSTM algorithms. The training data were evenly and randomly split into five groups. The cross-validation used one group for validation and the remaining four groups for training. This process was iterated five times, i.e., every group was used for validation. The model was scored for each iteration and the average of the five iterations represents the error score for a given hyperparameter setup. The optimal hyperparameter setup is the one that gives the lowest error score.
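The procedure above can be sketched without any machine learning library. The function `constant_model_error` is a hypothetical stand-in for "train a model with these hyperparameters on the training folds and score it on the validation fold", and the parameter name `c` is illustrative only.

```python
import itertools
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Randomly split the sample indices into k groups of (nearly) equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def grid_search_cv(samples, param_grid, error_fn, k=5, seed=0):
    """Return the hyperparameter setup with the lowest mean validation error."""
    folds = k_fold_indices(len(samples), k, seed)
    names = sorted(param_grid)
    best_params, best_error = None, float("inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        errors = []
        for i in range(k):  # every group serves as the validation set once
            val_idx = set(folds[i])
            val = [samples[j] for j in folds[i]]
            train = [samples[j] for j in range(len(samples)) if j not in val_idx]
            errors.append(error_fn(params, train, val))
        mean_error = sum(errors) / k
        if mean_error < best_error:
            best_params, best_error = params, mean_error
    return best_params, best_error

def constant_model_error(params, train, val):
    # Hypothetical stand-in: a "model" that always predicts params["c"]
    return sum(abs(params["c"] - y) for y in val) / len(val)

toy_samples = [2.0] * 10
best_params, best_error = grid_search_cv(
    toy_samples, {"c": [0.0, 2.0, 5.0]}, constant_model_error
)
```

The number of model fits grows as the product of the grid sizes times the number of folds, which is one reason to fix some training parameters while tuning the others.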

The hyperparameters for the spring freshet model are the split criterion and the minimum leaf weight fraction. Table 5 shows the hyperparameter setup that gives the lowest error score for the spring freshet model. The Gini index was used in this study as the criterion to measure the quality of a split in a decision node. The minimum leaf weight fraction, which is the minimum fraction of the input samples required to be at a leaf node, was fixed at 0.1 to prevent overfitting in the spring freshet model.

Table 6 shows the hyperparameter setup that gives the lowest error score for the LSTM algorithm. One important hyperparameter is the time step, which represents how past and current weather patterns affect the amount of water flowing through a waste rock pile. The time step chosen is 10, with each time step containing data averaged over the past 6 days. The use of average weather data removes daily fluctuations of temperature and precipitation, enhancing the stability of the LSTM algorithm. A time step of 10 means that the model considers the weather patterns in the past 60 days for the prediction of the current flow rate. Previous research on karst flood forecasting has reported that the rainfall event in the past two days had the strongest correlation with the flooding, and the memory effect lasted about 7 days [31]. However, flooding induced by melting of accumulated snow can be affected by precipitation events that have occurred much longer than the past 7 days. Weather patterns in the past 60 days have been suggested to affect the current flow rate measurement [22]. The learning rate, batch size, and epoch were fixed at 0.001, 8, and 80, respectively, during tuning of other hyperparameters. After the determination of other hyperparameters, the model was further tuned manually with different combinations of learning rate, batch size, and epochs. Their final values are shown in Table 6.
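The construction of one input sequence for a single feature can be sketched as follows. The sketch assumes that the ten time steps are consecutive, non-overlapping 6-day blocks ending on the day of the flow rate measurement, which is consistent with the 60-day look-back but is not stated explicitly; the daily values are hypothetical.

```python
def build_sequence(daily_values, end_index, time_steps=10, days_per_step=6):
    """Return time_steps block averages covering the time_steps * days_per_step
    days up to and including end_index, ordered from oldest to most recent."""
    window = time_steps * days_per_step
    start = end_index - window + 1
    if start < 0:
        raise ValueError("not enough history before end_index")
    days = daily_values[start:end_index + 1]
    return [
        sum(days[i:i + days_per_step]) / days_per_step
        for i in range(0, window, days_per_step)
    ]

# Hypothetical daily series of 60 values (day 0 to day 59)
daily_temp = [float(d) for d in range(60)]
sequence = build_sequence(daily_temp, end_index=59)
```

Each feature yields one such sequence of length 10, and stacking the sequences of all features gives the two-dimensional input (time steps by features) for one flow rate observation.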

3.2.4. Step 4: Training and Evaluation of the Spring Freshet Model

Table 7 shows the performance of the decision tree algorithm in predicting spring freshet for the three stations. The spring freshet model made a certain number of predictions, with positive representing the occurrence of spring freshet and negative representing the absence of spring freshet. The metrics used for evaluating the model performance consist of accuracy, precision, and recall. Accuracy was calculated by dividing the number of correct predictions by the total number of predictions made. Precision was calculated by true positive divided by the total predicted positive (true positive + false positive). Recall was calculated by true positive divided by the total actual positive (true positive + false negative).

For Station 1, the model made 23 predictions for the test set, of which 21 were correct, giving an accuracy of 0.91 (21/23); the model made a total of 9 positive predictions, of which 7 were true positives, giving a precision of 0.78 (7/9); the number of true positives predicted by the model was the same as the number of actual positives, meaning that the model captured all actual positives, as indicated by a recall of 1.00. The model is considered to be generally accurate in predicting the occurrence of spring freshet for all three stations. In addition, the precision and the recall were high, indicating that the spring freshet model performed well in that it largely avoided both type I errors (false positives) and type II errors (false negatives).
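The Station 1 figures can be reproduced from the confusion matrix counts. The four counts below are derived from the numbers quoted in the text (23 predictions, 21 correct, 9 predicted positives, 7 true positives, recall of 1.00), which imply 2 false positives, 0 false negatives, and 14 true negatives.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Counts derived from the Station 1 test set figures quoted in the text
station1 = classification_metrics(tp=7, fp=2, fn=0, tn=14)
rounded = {name: round(value, 2) for name, value in station1.items()}
```

The two errors made by the model were both false positives, which is why the recall is perfect while the precision is lower than the accuracy.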

3.2.5. Step 5: Training and Evaluation of the Drainage Flow Rate Model

The output of the spring freshet model was used as an input to the drainage flow rate model. Figure 4 shows the drainage flow rate predicted by the model in comparison with the observed values in the training and testing of the model. Generally, the flow rate model was able to capture well the trend of the drainage flow rate during training, although it occasionally underestimated extreme high flow events due to a limited number of training examples of extremely high flow rates. These high flow events might occur, for example, due to the occurrence of subsurface channeling induced by heavy rainfall and flooding [15]. The model was tested with the test set that was never used for training and, therefore, was never revealed to the model. The model could predict the flow rate reasonably well in the test set, indicating that the model has a good degree of generalizability.

There were no weather monitoring data between 2008 and 2010 due to damage to the station and, therefore, no model outputs were generated during this period. It should be noted that, in the model training and evaluation, the model only generated outputs when flow rate monitoring data were available. In actual application, the trained drainage flow rate model can make flow rate predictions on a daily basis given the required weather monitoring data.

3.3. Contribution of Spring Freshet to the Prediction of Drainage Quantity

To assess the influence of the spring freshet model on the prediction of drainage flow rate, another LSTM model was trained without including the spring freshet prediction as an input. The root mean square error (RMSE) and the correlation coefficient (R) were used as the model evaluation metrics: RMSE measures the magnitude of the differences between the model outputs and the monitoring values, and R measures how closely the two are linearly correlated. Table 8 shows that the values of the RMSE with spring freshet prediction were lower than those without for all three stations. This indicates that the performance of the flow rate model was significantly improved by including spring freshet prediction as a model input. Figure 5 shows the correlations between the predicted and the observed values in scatter plots. The diagonal line, with an R of 1, represents a perfect match between the predicted and the observed values. The data points with spring freshet are more concentrated along the diagonals with a larger R value than those without spring freshet, indicating a better model performance with the spring freshet prediction as an input to the flow rate model.
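The two evaluation metrics can be computed as follows. The sketch assumes that R is the Pearson correlation coefficient, and the flow rate values are hypothetical.

```python
import math

def rmse(predicted, observed):
    """Root mean square error, in the same unit as the flow rate."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

def pearson_r(predicted, observed):
    """Pearson correlation coefficient between predicted and observed values."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    return cov / (sd_p * sd_o)

# Hypothetical observed and predicted flow rates in m^3/s
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
error = rmse(predicted, observed)
correlation = pearson_r(predicted, observed)
```

The two metrics are complementary: a model whose predictions are consistently offset from the observations can still reach an R close to 1, whereas the RMSE would expose the offset.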

The flow rate model based on the LSTM algorithm developed in the present research performed as well as the one based on the artificial neural network in a previous study, even though the size of the training set used in the present research was only roughly 20% of that in the previous work (~300 compared with 1789 training examples) [21]. Prior to the prediction of flow rate, the spring freshet model has already considered the influence of a wide range of weather features on flow rate. Compared with using only weather features as the inputs to the flow rate model, the inclusion of spring freshet improves the data utilization efficiency. This is particularly advantageous in situations where waste rock drainage is not intensively monitored and there are limited data available.

3.4. Responses of Drainage Quantity to Changes in Temperature and Total Precipitation

The trained and tested flow rate model was used to assess how the drainage flow rate responds to variations in the weather inputs, specifically, temperature and precipitation. These sensitivity tests, shown in Figure 6, are particularly important in the context of assessing the potential impact of climate change on the drainage flow rate. It has been reported that changes in both precipitation and temperature associated with climate change are likely to affect the hydrology of Canadian rivers [29]. An earlier spring freshet and a shorter duration of ice cover have been reported in Canada due to global warming [30]. The sensitivity tests were carried out by adding 20% variation to either temperature or precipitation in the test dataset without changing the spring freshet prediction as an input to the flow rate model.
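The perturbation step of the sensitivity tests can be sketched as follows. The sketch assumes that "20% variation" means multiplying the feature by 1.2 or 0.8, which is an interpretation; for temperatures in °C a multiplicative change moves values away from or towards zero, so an additive offset would be a different but equally plausible reading. The sample values are hypothetical.

```python
def perturb_feature(samples, feature, fraction):
    """Return copies of the samples with one feature scaled by (1 + fraction).

    All other inputs, including the spring freshet prediction, are left unchanged.
    """
    return [{**s, feature: s[feature] * (1.0 + fraction)} for s in samples]

# Hypothetical test samples: temperature (deg C), precipitation (mm), freshet label
test_samples = [
    {"temp": 10.0, "precip": 5.0, "freshet": 1},
    {"temp": -4.0, "precip": 2.0, "freshet": 0},
]
wetter = perturb_feature(test_samples, "precip", 0.2)   # +20% precipitation
drier = perturb_feature(test_samples, "precip", -0.2)   # -20% precipitation
```

Feeding the perturbed samples to the trained flow rate model and comparing the outputs with the unperturbed predictions gives the response of the drainage flow rate to each weather input.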

The model predicts that the drainage flow rate is affected by changes in precipitation and temperature. According to the model prediction, a higher level of precipitation is associated with a higher flow rate during the high-flow season but has little effect on the flow rate in the low-flow season. In general, an increase in temperature is associated with decreased flow rates in the high-flow season and increased flow rates in the low-flow season. The decreased flow rates in the high-flow season are only predicted for Stations 1 and 3, whereas there is no distinct trend in flow rate for Station 2. Given that Station 2 has the smallest number of data pairs for training and validation, the flow rate model for this station may not be as stable as those for the other two stations. The increased flow rates with an elevated temperature in the low-flow season could be explained by an earlier melting of ice and snow [32,33].

4. Conclusions

Given the challenge for traditional modelling to capture the highly heterogeneous nature of mine waste rock piles, a machine learning approach was developed in this study to predict the quantity of waste rock drainage. The machine learning approach developed integrates two algorithms to make such predictions. Specifically, a decision tree algorithm is used to classify the occurrence or absence of spring freshet (melting of snow/ice in spring) using weather data as the inputs. A long short-term memory algorithm is used to predict the drainage flow rate using pre-processed weather data and the spring freshet classification results as the inputs. The novelty of the approach developed is that it includes spring freshet as an input to the drainage flow rate model.

The integrated machine learning approach developed was applied to predict the drainage flow rate at a mine site located in Canada as a case study. The results show that the decision tree algorithm is able to classify the occurrence or absence of spring freshet with an accuracy of 91%. The inclusion of spring freshet as an input to the flow rate model significantly improves the model performance, as supported by a higher correlation coefficient and a lower root mean square error. The sensitivity tests show that changes in temperature and atmospheric precipitation influence the drainage flow rate. The use of machine learning for prediction of drainage quantity may help with the design of remediation measures, such as drainage collection systems and drainage treatment.

The current study used the same weather data as the input to train the flow rate model individually for each station. However, the drainage flow rate of each waste rock pile responded differently to the same weather conditions. This is indicative of the differences in the properties of these waste rock piles, such as particle size and porosity, which greatly influence the hydraulic properties of a waste rock pile. The machine learning approach developed is capable of capturing these different responses and, therefore, has general applicability. To improve the robustness of the machine learning approach, our future work aims to build physics-informed machine learning models with different categories of factors as inputs, such as waste rock characteristics.

Author Contributions

Conceptualization, L.M. and W.L.; Methodology, C.Z., L.M. and W.L.; Software, C.Z.; Validation, C.Z.; Formal Analysis, C.Z.; Investigation, C.Z., L.M. and W.L.; Resources, L.M. and W.L.; Data Curation, C.Z.; Writing—Original Draft Preparation, C.Z.; Writing—Review & Editing, L.M. and W.L.; Visualization, C.Z.; Supervision, L.M. and W.L.; Project Administration, W.L.; Funding Acquisition, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Research Council Canada New Beginnings Initiative (Grant ID:000433-1) and the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant.

Data Availability Statement

The data presented in this study is contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ortiz-Castillo, J.E.; Mirazimi, M.; Mohammadi, M.; Dy, E.; Liu, W. The role of microorganisms in the formation, dissolution, and transformation of secondary minerals in mine rock and drainage: A review. Minerals 2021, 11, 1349.
2. Amos, R.T.; Blowes, D.W.; Bailey, B.L.; Sego, D.C.; Smith, L.; Ritchie, A.I.M. Waste-rock hydrogeology and geochemistry. Appl. Geochem. 2015, 57, 140–156.
3. Anawar, H.M. Sustainable rehabilitation of mining waste and acid mine drainage using geochemistry, mine type, mineralogy, texture, ore extraction and climate knowledge. J. Environ. Manag. 2015, 158, 111–121.
4. Fox, A.P.; Lottermoser, B.G. A critical review of acid rock drainage prediction methods and practices. Miner. Eng. 2015, 82, 107–124.
5. Vriens, B.; Plante, B.; Seigneur, N.; Jamieson, H. Mine waste rock: Insights for sustainable hydrogeochemical management. Minerals 2020, 10, 728.
6. Da Silva, J.C.; do Amaral Vargas, E.; Sracek, O. Modeling multiphase reactive transport in a waste rock pile with convective oxygen supply. Vadose Zone J. 2009, 8, 1038–1050.
7. Kuo, E.Y.; Ritchie, A.I.M. The impact of convection on the overall oxidation rate in sulfidic waste rock dumps. In Proceedings of the Mining and the Environment II, Sudbury, ON, Canada, 28 May–1 June 1999.
8. Linklater, C.M.; Sinclair, D.J.; Brown, P.L. Coupled chemistry and transport modelling of sulphidic waste rock dumps at the Aitik mine site, Sweden. Appl. Geochem. 2005, 20, 275–293.
9. Mayer, K.U.; Frind, E.O.; Blowes, D.W. Multicomponent reactive transport modeling in variably saturated porous media using a generalized formulation for kinetically controlled reactions. Water Resour. Res. 2002, 38, 13-1–13-21.
10. Molins, S.; Mayer, K.U. Coupling between geochemical reactions and multicomponent gas and solute transport in unsaturated media: A reactive transport modeling study. Water Resour. Res. 2007, 43.
11. Molson, J.W.; Fala, O.; Aubertin, M.; Bussière, B. Numerical simulations of pyrite oxidation and acid mine drainage in unsaturated waste rock piles. J. Contam. Hydrol. 2005, 78, 343–371.
12. Pantelis, G.; Ritchie, A.I.M.; Stepanyants, Y.A. A conceptual model for the description of oxidation and transport processes in sulphidic waste rock dumps. Appl. Math. Model. 2002, 26, 751–770.
13. Ma, L.; Huang, C.; Liu, Z.S.; Morin, K.A.; Aziz, M.; Meints, C. Prediction of acid rock drainage in waste rock piles part 1: Water film model for geochemical reactions and application to a full-scale case study. J. Contam. Hydrol. 2019, 220, 98–107.
14. Muniruzzaman, M.; Pedretti, D. Mechanistic models supporting uncertainty quantification of water quality predictions in heterogeneous mining waste rocks: A review. Stoch. Environ. Res. Risk Assess. 2021, 35, 985–1001.
15. Nichol, C.F. Transient Flow and Transport in Unsaturated Heterogeneous Media. Ph.D. Thesis, The University of British Columbia, Vancouver, BC, Canada, 2002.
16. Eriksson, N.; Destouni, G. Combined effects of dissolution kinetics, secondary mineral precipitation, and preferential flow on copper leaching from mining waste rock. Water Resour. Res. 1997, 33, 471–483.
17. Aryafar, A.; Gholami, R.; Rooki, R.; Doulati Ardejani, F. Heavy metal pollution assessment using support vector machine in the Shur River, Sarcheshmeh copper mine, Iran. Environ. Earth Sci. 2012, 67, 1191–1199.
18. Khandelwal, M.; Singh, T.N. Prediction of mine water quality by physical parameters. J. Sci. Ind. Res. 2005, 64, 564–570.
19. Betrie, G.D.; Tesfamariam, S.; Morin, K.A.; Sadiq, R. Predicting copper concentrations in acid mine drainage: A comparative analysis of five machine learning techniques. Environ. Monit. Assess. 2013, 185, 4171–4182.
20. Betrie, G.D.; Sadiq, R.; Morin, K.A.; Tesfamariam, S. Uncertainty quantification and integration of machine learning techniques for predicting acid rock drainage chemistry: A probability bounds approach. Sci. Total Environ. 2014, 490, 182–190.
21. Ma, L.; Huang, C.; Liu, Z.S.; Morin, K.A.; Aziz, M.; Meints, C. Artificial neural network for prediction of full-scale seepage flow rate at the equity silver mine. Water Air Soil Pollut. 2020, 231, 179.
22. Ma, L.; Huang, C.; Liu, Z.S.; Morin, K.A.; Aziz, M.; Meints, C. The correlation between drainage chemistry and weather for full-scale waste rock piles based on artificial neural network. J. Contam. Hydrol. 2021, 239, 103793.
23. Szmigielski, J.T.; Barbour, S.L.; Carey, S.K.; Kurylo, J.; McClymont, A.F.; Hendry, M.J. Hydrogeology of a montane headwater groundwater system downgradient of a coal-mine waste rock dump: Elk Valley, British Columbia, Canada. Hydrogeol. J. 2018, 26, 2341–2356.
24. Barbour, S.L.; Hendry, M.J.; Carey, S.K. High-resolution profiling of the stable isotopes of water in unsaturated coal waste rock. J. Hydrol. 2016, 534, 616–629.
25. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
26. Pearce, S.; Dobchuk, B.; Shurniak, R.; Song, J.; Christensen, D. Linking waste rock dump construction and design with seepage geochemistry: An integrated approach using quantitative tools. In Proceedings of the IMWA 2016—Mining meets water—Conflicts Solution, Freiberg, Germany, 11–15 July 2016.
27. Hsu, C.; Chang, C.; Lin, C. A practical guide to support vector classification. BJU Int. 2008, 101, 1–16.
28. Tetko, I.V.; Livingstone, D.J.; Luik, A.I. Neural network studies. 1. comparison of overfitting and overtraining. J. Chem. Inf. Comput. Sci. 1995, 35, 826–833.
29. Zhang, X.; Harvey, K.D.; Hogg, W.D.; Yuzyk, T.R. Trends in Canadian streamflow. Water Resour. Res. 2001, 37, 987–998.
30. Jones, N. High flows and freshet timing in Canada: Observed trends. In Climate Change Research Report; Ontario Ministry of Natural Resources and Forestry: Peterborough, ON, Canada, 2015.
31. Siou, L.K.A.; Johannet, A.; Borrell, V.; Pistre, S. Complexity selection of a neural network model for karst flood forecasting: The case of the Lez Basin (southern France). J. Hydrol. 2011, 403, 367–380.
32. Aguado, E.; Cayan, D.; Riddle, L.; Roos, M. Climatic fluctuations and the timing of west coast streamflow. J. Clim. 1992, 5, 1468–1483.
33. Stewart, I.T.; Cayan, D.R.; Dettinger, M.D. Changes toward earlier streamflow timing across western North America. J. Clim. 2005, 18, 1136–1155.

Figure 1. A schematic of the general framework of the machine learning approach to predict drainage flow rate using weather data as the inputs.

Figure 2. The average monthly precipitation and the average monthly temperature between 1990 and 2013 at the case study mine.

Figure 3. Average, maximum, and minimum monthly drainage flow rates and the spring freshet threshold for the three monitoring stations. (a) Station 1 with spring freshet threshold of 1.2 m³/s; (b) Station 2 with spring freshet threshold of 1.8 m³/s; (c) Station 3 with spring freshet threshold of 0.7 m³/s.

Figure 4. The performance of the drainage flow rate model during training and testing for the three stations. (a) Station 1 with training set from 1995–2013; (b) Station 2 with training set from 1996–2013; (c) Station 3 with training set from 1992–2013.

Figure 5. Correlations between the predicted and the observed flow rates (both standardized) with and without spring freshet as an input to the flow rate model. (a) Station 1; (b) Station 2; (c) Station 3.

Figure 6. Changes in drainage flow rates in response to variations in temperature and total precipitation. (a) Temperature sensitivity test for Station 1; (b) Precipitation sensitivity test for Station 1; (c) Temperature sensitivity test for Station 2; (d) Precipitation sensitivity test for Station 2; (e) Temperature sensitivity test for Station 3; (f) Precipitation sensitivity test for Station 3.

Table 1. Summary of the drainage flow rate data for the three monitoring stations at the mine.

	Station 1	Station 2	Station 3
No. of monitoring data	402	335	378
Monitoring period	1995–2013	1996–2013	1992–2013
Mean (m³/s)	1.30	2.13	0.77
Median (m³/s)	0.97	1.66	0.46
Minimum (m³/s)	0.01	0.00	0.00
Maximum (m³/s)	7.32	12.19	4.33

Table 2. The importance of various weather features to the spring freshet model for the three stations.

Station	Month	Temperature	Rainfall	Snowfall	Precipitation
Station 1	45%	54%	0.0%	0.0%	<0.1%
Station 2	61%	39%	<0.1%	0.0%	0.0%
Station 3	41%	59%	0.0%	0.0%	0.0%

Table 3. Splitting of the labelled dataset into training and test sets for development of the spring freshet model.

Station	Size of the Training Set	Size of the Test Set	Train/Test Split Ratio
Station 1	379	23	94:6
Station 2	313	22	93:7
Station 3	359	19	95:5

Table 4. Splitting of the labelled dataset into training and test sets for development of the LSTM model.

Station	Size of the Training Set	Size of the Test Set	Train/Test Split Ratio
Station 1	338	22	94:6
Station 2	275	21	93:7
Station 3	318	18	95:5

Table 5. Hyperparameters with the lowest error score for the decision tree algorithm selected by 5-fold cross validation grid search.

Station	Criterion	Minimum Leaf Weight Fraction
Station 1	Gini	0.1
Station 2	Gini	0.1
Station 3	Gini	0.1

Table 6. Hyperparameters with the lowest error score for the flow rate model selected by a 5-fold cross validation grid search.

Station	Hidden States	Dropout Rate	Constraint	Time Step	Batch Size	Learning Rate	Epochs
Station 1	100	0.3	3	10	4	0.001	80
Station 2	50	0.2	3	10	4	0.001	80
Station 3	100	0.2	99	10	4	0.001	100

Table 7. The confusion matrix and the evaluation metrics for the spring freshet model.

Station	Actual Monitoring	Predicted Positive	Predicted Negative	Accuracy	Precision	Recall
Station 1	Positive	7	0	0.91	0.78	1.00
Station 1	Negative	2	14	0.91	0.78	1.00
Station 2	Positive	10	2	0.91	1.00	0.83
Station 2	Negative	0	10	0.91	1.00	0.83
Station 3	Positive	8	1	0.89	0.89	0.89
Station 3	Negative	1	9	0.89	0.89	0.89
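
The evaluation metrics in Table 7 follow directly from the confusion matrix counts, as the sketch below illustrates. The function name `classification_metrics` and the dictionary `table7` are introduced here for illustration; the counts are read from Table 7 in the order true positives, false negatives, false positives, true negatives, taking "Positive" to mean that spring freshet occurred.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, and recall from binary confusion matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all months classified correctly
        "precision": tp / (tp + fp),     # share of predicted freshet that was real
        "recall": tp / (tp + fn),        # share of real freshet that was predicted
    }


# Counts read from Table 7 as (TP, FN, FP, TN).
table7 = {
    "Station 1": (7, 0, 2, 14),
    "Station 2": (10, 2, 0, 10),
    "Station 3": (8, 1, 1, 9),
}

metrics = {name: classification_metrics(*counts) for name, counts in table7.items()}
for name, values in metrics.items():
    print(name, {key: round(value, 2) for key, value in values.items()})
```

Rounded to two decimals, the computed values reproduce the accuracy, precision, and recall columns of Table 7, and the counts for each station sum to the test set sizes in Table 3.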

Table 8. The root mean square error as the metric in evaluation of the flow rate model.

Station	With Spring Freshet (m³/s)	Without Spring Freshet (m³/s)
Station 1	0.92	1.03
Station 2	1.01	1.61
Station 3	0.44	1.00
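
To put the RMSE values in Table 8 on a common scale, the relative reduction in RMSE can be computed for each station. This is a supplementary calculation sketched here for illustration, not a metric reported in the study; the names `table8` and `relative_reduction` are assumptions.

```python
# RMSE values from Table 8 in m^3/s: (with spring freshet, without spring freshet).
table8 = {
    "Station 1": (0.92, 1.03),
    "Station 2": (1.01, 1.61),
    "Station 3": (0.44, 1.00),
}


def relative_reduction(rmse_with, rmse_without):
    """Fractional decrease in RMSE when the spring freshet input is included."""
    return (rmse_without - rmse_with) / rmse_without


reductions = {
    name: relative_reduction(rmse_with, rmse_without)
    for name, (rmse_with, rmse_without) in table8.items()
}
for name, reduction in reductions.items():
    print(f"{name}: {reduction:.1%}")
```

The reductions are roughly 11%, 37%, and 56% for Stations 1, 2, and 3, respectively, so the benefit of the spring freshet input varies considerably between stations.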

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

