Article

Application of the Data-Driven Method and Hydrochemistry Analysis to Predict Groundwater Level Change Induced by the Mining Activities: A Case Study of Yili Coalfield in Xinjiang, Norwest China

1
Xi’an Research Institute Co., Ltd., China Coal Technology and Engineering Group Corp., Xi’an 710077, China
2
Shaanxi Key Laboratory of Prevention and Control Technology for Coal Mine Water Hazard, Xi’an 700077, China
3
MOE Key Laboratory of Groundwater Circulation and Environmental Evolution, China University of Geosciences, Beijing 100083, China
4
China Coal Research Institute, Beijing 100013, China
5
Chinese Academy of Geological Science, Beijing 100037, China
6
Inner Mongolia Key Laboratory of River and Lake Ecology, School of Ecology and Environment, Inner Mongolia University, Hohhot 010021, China
*
Author to whom correspondence should be addressed.
Water 2024, 16(11), 1611; https://doi.org/10.3390/w16111611
Submission received: 27 March 2024 / Revised: 9 May 2024 / Accepted: 14 May 2024 / Published: 5 June 2024

Abstract

As a carrier of geological information, groundwater provides an indirect means of addressing the secondary disasters of mining activities. Identifying the groundwater regime of overburden aquifers under mining disturbance is important for mining safety and geological environment protection. This study proposes a novel data-driven approach that combines machine learning methods with hydrochemical analyses to accurately predict anomalous changes in groundwater levels induced by mining activities within the mine and its neighboring areas. The hydrochemistry analysis reveals that the dissolution of carbonate and evaporite and cation exchange are the main hydrochemical processes controlling the groundwater environment. The anomalous change in the hydrochemical characteristics of different aquifers reveals that the hydraulic connection between them is enhanced by mining activities. Continuous wavelet coherence is used to reveal the nonlinear relationship between the groundwater level change and external influencing factors. Based on the above analysis, the groundwater level, precipitation, mine water inflow, and unit goal area are taken as the input variables of the hydrological model. Two data-driven algorithms, the Decision Tree and the Long Short-Term Memory (LSTM) neural network, are used to construct the hydrological prediction model. Four error metrics (MAPE, RMSE, NSE and R2) are applied to evaluate the performance of the hydrological model. In terms of NSE, the predictive accuracy of the hydrological model constructed using LSTM is 8% higher than that of the Decision Tree algorithm. Accurately predicting the anomalous change in groundwater level caused by mining activities could help ensure the safety of coal mining and prevent secondary disasters.

1. Introduction

During the past decades, mining activities have caused several geo-environmental problems [1,2], including water loss, the decline of groundwater levels and the deterioration of groundwater quality. Mining-induced secondary disasters [3], geological disasters [4,5,6] and land subsidence [7,8] threaten the safety of mining activities. Groundwater plays an important role during mining because it can amplify the microdynamics of the geological setting caused by mining activities. In addition, mine water inrush induced by mining threatens mine safety [9,10]. Thus, monitoring and predicting the dynamic change in groundwater level is one of the most effective ways to prevent the secondary disasters of mining activities.
Naturally, groundwater level is influenced by several external factors, including geological factors [11,12,13], meteorological factors, and anthropogenic factors. These factors cause highly nonlinear dynamics of groundwater level in the time and frequency domains, making it difficult to predict groundwater levels accurately. Traditionally, physically based models have provided a feasible way to predict the variation in groundwater level, but they require stratigraphic structures, aquifer parameters, and boundary conditions. Due to the heterogeneity and anisotropy of aquifer properties, it is difficult to obtain these hydrogeological parameters accurately [14,15]. Nowadays, data-driven methods provide an efficient way to build hydrological models and improve the accuracy of model prediction [16,17]. Their significant advantage is that there is no need to explicitly define physical relationships between input and output variables. In previous studies, several machine learning methods have been used in hydrological research, mainly for cluster analysis and for data prediction, including anomaly identification [18,19]. A self-organizing neural network is one of the typical algorithms of cluster analysis, which can be used to identify the source of groundwater pollution [20,21,22]. The Artificial Neural Network (ANN), Recurrent Neural Network (RNN), Decision Tree and Long Short-Term Memory (LSTM) are used to analyze and predict the variation in groundwater level [23,24,25,26]. Generally, neural network algorithms have a three-layer structure, including an input layer, a hidden layer, and an output layer. However, the composition of these layers differs greatly between algorithms. In ANN, both the hidden and output layers have a non-linear activation function, which helps to improve the prediction accuracy. The disadvantage of this algorithm is that it cannot take into account the relationship between different input variables. In RNN, the adjacent neurons in the hidden layer are connected together, which is the biggest difference from the ANN structure and allows information from earlier time steps to be passed forward; plain RNNs, however, remain prone to gradient disappearance and gradient explosion on long sequences. Unlike the previous two methods, the Decision Tree and LSTM algorithms take into account the relationships between different input variables during model training, and the LSTM additionally extracts valuable information from previous neurons and passes it to the current neuron for prediction. The Decision Tree has been applied to the anomalous identification of radon before seismic activities [27]. The LSTM has been used to predict the variation in groundwater level under the influence of meteorological factors. However, few studies focus on applying machine learning methods to predict anomalous changes in groundwater levels caused by mining activities.
In mine area hydrology, most previous work has focused on the anomalous change in hydrochemical components and aquifer parameters caused by mining activities, while few studies have applied different data-driven methods to predict the dynamic change in groundwater level. Long-term mining in the Yili Coalfield has had a significant impact on the regional groundwater regime. Both the deterioration of groundwater quality and the decline of the groundwater level threaten the local ecological environment and production safety. In this study, a hydrochemistry analysis and two different data-driven methods are used to predict the variation in the groundwater level induced by mining activities. Based on the monitoring dataset in the Yili Coalfield, the aims are to (1) reveal the genetic mechanism of the hydrochemistry characteristics, comparing hydrochemical data before and after mining activities to identify the effect of mining on the hydraulic connection between different aquifers; (2) identify the potential influencing factors, using continuous wavelet coherence to identify the external factors behind the anomalous change in groundwater level induced by mining and to select appropriate input variables for the data-driven prediction model; (3) construct the data-driven models using the Decision Tree and Long Short-Term Memory neural network algorithms; and (4) evaluate the model performance using four error metrics: R2, root mean square error (RMSE), mean absolute percentage error (MAPE), and Nash–Sutcliffe efficiency (NSE). The results of this study could provide a new data-driven method for predicting groundwater level change caused by mining activities, as well as a basis for coal mining safety and water resource management.

2. Geological Setting and Data Source

2.1. Regional Hydrogeological Setting

The Yili 1# mining area, covering about 208 km2, is located in the northwestern part of the Xinjiang Uygur Autonomous Region, northwestern China. It is the first modernized coalfield in Xinjiang, with a production capacity of 10 million t/a. The elevation of the study area ranges from 1000 m to 1300 m [28]. The main working areas are in the No. 3# and No. 5# coal seams, which are the main extracted seams. Longwall mining extracts large rectangular panels of coal, and the roof is temporarily supported using moveable hydraulic supports (Figure 1). The study area has a typical arid climate with low relative humidity and high evapotranspiration. The annual evapotranspiration ranges from 1259 mm to 2381 mm, the average annual temperature is 8 °C, and the annual average precipitation is 90 mm [29,30].
Tectonically, the study area is characterized by a monoclinal structure with an east–west orientation. According to the mineral XRD analysis results, the mineral species include dolomite, calcite, halite, gypsum, albite, quartz and anorthite [28]. The aquifers are classified into the Quaternary pore aquifer (phreatic aquifer) and the Jurassic fissure aquifer (confined aquifer). The aquitards are composed of mudstone and siltstone, which are interbedded with the aquifers in different thicknesses. The isotopic results indicate that both aquifers are recharged by atmospheric precipitation and meltwater from the Tianshan Mountains. The Quaternary aquifer is characterized by a good water yield, with a specific capacity of 0.041~0.128 L/s/m, while the Jurassic fissure aquifer is characterized by a poor water yield, with a specific capacity of 0.008~0.109 L/s/m [28].

2.2. Data Collection

2.2.1. Time Series Data

To ensure mining safety and reveal the impact of mining activities on the groundwater regime, the phreatic groundwater level (PGL), confined groundwater level (CGL), mine water inflow (MWI), and unit goal area (UGA) were monitored in the Yili 1# mining area during 2018~2023. The meteorological dataset was collected from the China Meteorological Administration (http://data.cma.cn/, accessed on 15 May 2024). Analyzing the dynamic change in monitored value could provide an accessible way to identify the impact of mining activities on the aquifer and its potential influencing factors.
In order to eliminate the effect of different monitoring intervals, the monitored values of the groundwater level, precipitation, mine water inflow, and unit goal area are converted to monthly averages for analysis. As shown in Figure 2, precipitation shows a significant seasonal variation, with higher levels during the summer wet season and lower levels during the drier winter season. The maximum precipitation is 45 mm. The mine water inflow (MWI) increases with the unit goal area (UGA), indicating that the expansion of mining promoted mine water inrush. The maximum values of the mine water inflow and the unit goal area are 311.5 m3/h and 54,300 m2, respectively. The phreatic groundwater level shows no obvious change associated with mining activities, only a step-like change in 2021. In contrast, the confined groundwater level shows a significant decline from 2018 to 2020, associated with increases in MWI and UGA, which indicates that it has been affected by mining activities.
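The conversion to monthly averages described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `monthly_mean` and the sample readings are hypothetical and are not the monitoring data of this study.

```python
from collections import defaultdict
from datetime import date


def monthly_mean(records):
    """Average (date, value) readings per calendar month.

    Returns a dict mapping (year, month) to the mean of that month's readings,
    so series sampled at different intervals end up on a common monthly grid.
    """
    buckets = defaultdict(list)
    for day, value in records:
        buckets[(day.year, day.month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


# Hypothetical readings, for illustration only.
readings = [
    (date(2019, 1, 5), 10.0),
    (date(2019, 1, 20), 12.0),
    (date(2019, 2, 11), 9.0),
]
monthly = monthly_mean(readings)
```

Applying the same function to each monitored series gives variables with a common time step, which is what the later coherence analysis and model training assume.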

2.2.2. Hydrochemistry Data

A total of 59 groundwater samples were collected from the study area. A total of 26 groundwater samples were collected during the period of 2004–2006 (before mining activities), including 6 samples from the phreatic aquifer and 20 samples from the confined aquifer. To identify the effect of mining activities on the dynamic variation in groundwater, 33 groundwater samples were collected from 2021 to 2023 (after mining activities), including 5 phreatic groundwater samples and 28 confined groundwater samples. All groundwater samples were analyzed in the Key Laboratory of Prevention and Control Technology for Coal Mine Water Hazard, Shaanxi Province. The cations (Na+, K+, Ca2+, Mg2+) were measured using ion chromatography. SO42− and Cl− were also measured using ion chromatography, while HCO3− was measured using traditional titration with HCl. To ensure the accuracy and reliability of the hydrochemical analysis results, the charge balance error ($\mathrm{CBE} = \frac{\mathrm{cations} - \mathrm{anions}}{\mathrm{cations} + \mathrm{anions}} \times 100$) was calculated [8]. The results show that the CBE is within ±10%, indicating that all measurement results are acceptable.
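As an illustration of the charge balance check, the sketch below converts ion concentrations from mg/L to meq/L and evaluates the CBE. It assumes that the CBE is computed on milliequivalent concentrations, which is the usual convention but is not stated explicitly in the text; the function names and the dictionary layout are hypothetical.

```python
# Ionic charge and molar mass (g/mol), used to convert mg/L to meq/L.
IONS = {
    "Na": (1, 22.99), "K": (1, 39.10), "Ca": (2, 40.08), "Mg": (2, 24.305),
    "Cl": (1, 35.45), "SO4": (2, 96.06), "HCO3": (1, 61.02),
}
CATIONS = ("Na", "K", "Ca", "Mg")
ANIONS = ("Cl", "SO4", "HCO3")


def to_meq(ion, mg_per_l):
    """Convert a concentration in mg/L to meq/L."""
    charge, molar_mass = IONS[ion]
    return mg_per_l * charge / molar_mass


def charge_balance_error(sample_mg_per_l):
    """CBE in percent; missing ions are treated as zero (an assumption)."""
    cations = sum(to_meq(i, sample_mg_per_l.get(i, 0.0)) for i in CATIONS)
    anions = sum(to_meq(i, sample_mg_per_l.get(i, 0.0)) for i in ANIONS)
    return (cations - anions) / (cations + anions) * 100.0


def is_acceptable(sample_mg_per_l, limit=10.0):
    """Apply the +/-10% acceptance criterion used in the text."""
    return abs(charge_balance_error(sample_mg_per_l)) <= limit
```

A sample with equal cation and anion equivalents gives a CBE of 0, and the sign of a non-zero CBE shows whether cations or anions are in excess.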

3. Methods

3.1. Hydrochemistry Analysis

During the process of the mining activities, several groundwater samples collected from the phreatic and confined aquifers underwent a hydrochemistry analysis. The results of the hydrochemistry analysis are summarized in Table 1.
Before the mining activities, 6 groundwater samples were collected from the phreatic aquifer. Their pH values range from 7.70 to 7.90, with an average of 7.83. The TDS values are 206–648 mg/L, with an average of 349 mg/L, so the water is classified as fresh water (TDS < 1000 mg/L). The order of cation abundance is Ca2+ > Na+ + K+ > Mg2+ and that of the anions is HCO3− + CO32− > SO42− > Cl−. The hydrochemistry type is characterized by HCO3-Ca and HCO3-Na. The 20 groundwater samples from the confined aquifer have pH and TDS ranges of 7.50–8.00 and 400–3132 mg/L, respectively. Of these samples, 80% are categorized as fresh water (TDS < 1000 mg/L) and 20% as brackish water (1000 mg/L < TDS < 3000 mg/L). The hydrochemistry type is characterized by HCO3-Ca/Na and SO4-Ca/Na.
After the mining activities, 5 samples and 28 samples were collected from the phreatic and confined aquifers, respectively. The orders of cation and anion abundance in the phreatic and confined aquifers differ from those before the mining activities. For the phreatic aquifer, the pH value is 7.30–8.10, with an average of 7.82. The TDS value ranges from 614 mg/L to 1719 mg/L, with an average of 1264 mg/L, which is about 3.6 times the pre-mining average of 349 mg/L. The order of cation abundance is Na+ + K+ > Ca2+ > Mg2+ and that of the anions is SO42− > HCO3− + CO32− > Cl−. The hydrochemistry type is SO4-Na and SO4-Ca. For the confined aquifer, the pH value ranges from 6.70 to 8.20, with an average of 7.45. The TDS ranges from 309 mg/L to 1019 mg/L. The order of cation abundance is Ca2+ > Na+ + K+ > Mg2+ and that of the anions is SO42− > HCO3− + CO32− > Cl−. The hydrochemistry type is HCO3-Ca/Na and SO4-Ca/Na. The genetic mechanisms of the hydrochemistry change caused by mining activities are discussed in Section 5.1.

3.2. Time Series Analysis (Continuous Wavelet Coherence)

A continuous wavelet transform is an efficient mathematical tool for identifying the correlation between different hydrological time series in the time and frequency domains [31,32]. In this study, it is applied to analyze the correlation between external influencing factors and groundwater level change. The Morlet function is selected to conduct the continuous wavelet coherence [33,34]. The mathematical equation is defined as follows:
$$R^2(x, y) = \frac{\left| S\left(s^{-1} W(x, y)\right) \right|^2}{S\left(s^{-1} \left|W(x)\right|^2\right) \, S\left(s^{-1} \left|W(y)\right|^2\right)} \tag{1}$$
where W and S represent the continuous wavelet transform and smoothing operator, respectively, W(x, y) is the cross-wavelet transform of the two series, and s is the wavelet scale. R2 represents the wavelet coherence, which acts as a localized correlation coefficient. The closer its value is to 1, the higher the correlation is.

3.3. Machine Learning

3.3.1. Long Short-Term Memory Neural Network

The Long Short-Term Memory (LSTM) algorithm is composed of three layers: an input layer, a hidden layer and an output layer (Figure 3). The hidden layer contains three gates (input, output, and forget gates; Figure 3), which preserve the valuable information in the dataset as it is passed along during information transfer and mitigate the problems of gradient explosion and gradient disappearance. In this study, the LSTM model is programmed using MATLAB.
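The role of the three gates can be illustrated with a single time step of a one-unit LSTM cell. The sketch below is written in plain Python for readability and is not the MATLAB implementation used in this study; the function names, the scalar input, and the parameter layout (`params[name] = (w_x, w_h, b)`) are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell with a scalar input.

    params maps "f", "i", "o", "g" to (input weight, recurrent weight, bias).
    Returns the new hidden state h and cell state c.
    """
    pre = {}
    for name in ("f", "i", "o", "g"):
        w_x, w_h, b = params[name]
        pre[name] = w_x * x + w_h * h_prev + b
    f = sigmoid(pre["f"])    # forget gate: how much of the old cell state to keep
    i = sigmoid(pre["i"])    # input gate: how much new information to admit
    o = sigmoid(pre["o"])    # output gate: how much of the cell state to expose
    g = math.tanh(pre["g"])  # candidate cell state
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def run_sequence(xs, params):
    """Feed a sequence through the cell, starting from zero states."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h, c
```

Because the cell state is updated additively (`f * c_prev + i * g`), information from earlier time steps can persist across many steps, which is the property the gates are meant to provide.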

3.3.2. Decision Trees

Classical regression methods fit curves to build a linear relationship between the independent variables and the dependent variable, but they cannot accurately capture highly nonlinear relationships. Decision Trees address this problem by partitioning the space of the independent variables into axis-parallel rectangles, each with its own prediction of the dependent variable, which provides an efficient way to build a nonlinear relationship. In this study, the WEKA (3.8.6) software is used to build the Decision Tree model with the M5 algorithm. Missing values in the dataset are filled by linear interpolation. Before training the hydrological model, the M5 algorithm conducts pre-pruning. The construction of the Decision Tree stops only when the class variance of all examples in a node is sufficiently small. The model in the selected leaf can then predict the value of the class.
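The axis-parallel partitioning can be illustrated with the split criterion commonly associated with M5-type model trees, the standard deviation reduction (SDR). The sketch below searches one input variable for the threshold that maximizes the SDR; it is a simplified illustration and not WEKA's implementation, and the function names are hypothetical.

```python
import statistics


def sdr(parent, left, right):
    """Standard deviation reduction achieved by splitting parent into left/right."""
    n = len(parent)
    return (statistics.pstdev(parent)
            - len(left) / n * statistics.pstdev(left)
            - len(right) / n * statistics.pstdev(right))


def best_split(xs, ys):
    """Return (threshold, sdr) of the best axis-parallel split on one feature.

    Returns (None, 0.0) when no split reduces the standard deviation.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, 0.0)
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # identical feature values cannot be separated
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        gain = sdr(ys, left, right)
        if gain > best[1]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (threshold, gain)
    return best
```

Applying the search recursively to each resulting subset, and stopping when the spread of the target within a node is small, gives the tree structure described in the text.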

4. Model Development

4.1. Splitting the Dataset into Different Subsets

Splitting the dataset is a basic step in constructing the data-driven model. If the training subset is too small, the model cannot identify the mathematical characteristics of the time series, which leads to poor prediction accuracy. In contrast, if the training subset is too large, the model may overfit. Because there is no fixed ratio between the training and prediction subsets, how to divide a dataset remains a matter of debate. According to previous studies, the training subset should account for more than 50% of the whole dataset. In this study, the dataset is split into a training subset and a prediction subset in a ratio of 7:3. The training subset is used to optimize the data-driven model parameters, and the prediction subset is used to evaluate the predictive performance.
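A 7:3 split of a time series can be sketched as below. The example assumes a chronological split (earlier records for training, later records for prediction), which is the usual choice for time series but is an assumption here; the function name `chronological_split` is hypothetical.

```python
def chronological_split(series, train_fraction=0.7):
    """Split a time-ordered sequence into (training, prediction) subsets.

    The first train_fraction of the records is used for training and the
    remainder for prediction, so no future value leaks into training.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


# Ten hypothetical monthly records, split 7:3.
train, test = chronological_split(list(range(10)))
```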

4.2. Data Normalization and Error Metric

The min–max normalization method is applied to eliminate the data dimensional influence, which normalizes the input variables into [0, 1]. It is defined as follows:
$$x_{\mathrm{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{2}$$
where x and xnorm represent the monitored value and the normalized value, respectively. xmax and xmin are the maximum monitored value and the minimum monitored value, respectively. After the model is trained, the predictive results are retransformed through the inverse transformation of Equation (2).
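The normalization and its inverse can be sketched as follows. The function names are hypothetical; the example also assumes that the minimum and maximum are taken from the data passed to `fit_min_max`, whereas the text does not state which subset they are computed from.

```python
def fit_min_max(values):
    """Return (x_min, x_max) of the monitored values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max normalized")
    return lo, hi


def normalize(values, lo, hi):
    """Map monitored values into [0, 1] with min-max normalization."""
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(values, lo, hi):
    """Inverse transformation: map normalized values back to original units."""
    return [v * (hi - lo) + lo for v in values]


# Hypothetical groundwater levels, for illustration only.
levels = [2.0, 4.0, 6.0]
lo, hi = fit_min_max(levels)
scaled = normalize(levels, lo, hi)
restored = denormalize(scaled, lo, hi)
```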
Four different error metrics are introduced to evaluate the predictive performance of the data-driven model, as follows:
The coefficient of determination [2,34,35]:
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - y_i^*)^2}{\sum_{i=1}^{N} y_i^2 - \frac{\sum_{i=1}^{N} {y_i^*}^2}{N}} \tag{3}$$
The root mean square error (RMSE) [2,34,35]:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (y_i - y_i^*)^2}{N}} \tag{4}$$
The mean absolute percentage error (MAPE) [2,34,35]:
$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - y_i^*}{y_i} \right| \times 100\% \tag{5}$$
The Nash–Sutcliffe efficiency (NSE) [2,34,35]:
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} (y_i - y_i^*)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{6}$$
where $y_i$ is the observed value, $y_i^*$ and $\bar{y}$ represent the simulated value and the mean of the observed values, respectively, and N is the number of observations. R2 measures the goodness of fit between monitored and simulated values [35]. RMSE evaluates the deviation between monitored and simulated values. MAPE expresses the prediction error of the data-driven model as a percentage. NSE is applied to evaluate the accuracy of the hydrological model.
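The error metrics can be computed as in the sketch below. RMSE, MAPE and NSE follow the definitions given above. For R2 the sketch uses the squared Pearson correlation, which is a commonly used form and is an assumption here; it may differ numerically from the expression adopted in this study. The sample values are hypothetical.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mape(obs, sim):
    """Mean absolute percentage error, in percent (observations must be non-zero)."""
    return sum(abs((o - s) / o) for o, s in zip(obs, sim)) / len(obs) * 100.0


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def r_squared(obs, sim):
    """Squared Pearson correlation (assumed form of R2, see the note above)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


# Hypothetical values for illustration only (not taken from the results of this study).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
```

With these values, the simulated series is offset from the observed one by a constant, so the correlation-based R2 equals 1 while the NSE is only 0.2. This is one reason for reporting several metrics rather than relying on a single one.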

4.3. Input Variable Selection

The prerequisite for improving the predicted accuracy of the hydrological model is the selection of proper input variables, as it could provide the basic hydrological information. However, no guideline exists on how to select the proper input variables for constructing the hydrologic model using a machine learning algorithm. In this study, the hydrochemistry analysis and continuous wavelet coherence method are combined to analyze the external influencing factors on the anomalous change in the groundwater level caused by mining activities, which provides an effective method for determining the input variables of the data-driven model.

5. Results and Discussion

5.1. Hydrochemistry Change Induced by the Mining Activities

The ionic ratios and saturation indices are introduced to identify the genetic mechanism of the hydrochemical characteristics. As shown in Figure 4, the groundwater samples are distributed in the middle part of the Gibbs diagram, which indicates that water–rock interaction is the main hydrochemical process controlling the ionic concentrations.
All groundwater samples are characterized by high Na+, Ca2+, HCO3− and SO42−, which indicates that these ions are primarily derived from the dissolution of evaporite (e.g., halite and gypsum) and carbonate (e.g., calcite and dolomite). The saturation indices of gypsum, halite, calcite and dolomite are calculated. SIgypsum and SIhalite are <0 (Figure 5a), indicating that gypsum and halite tend to dissolve, while SIcalcite and SIdolomite in most of the groundwater samples are >0 (Figure 5b), revealing that the groundwater tends to be saturated with respect to calcite and dolomite. If halite is the sole source of Na+, the Na+/Cl− value should be equal to 1 [8]. As shown in Figure 5c, the Na+/Cl− value in all the groundwater samples is >1, and the value of Ca2+/SO42− in some groundwater samples is <1 (Figure 5d), which indicates that another hydrochemical process (e.g., cation exchange) increases the Na+ concentration and decreases the Ca2+ concentration. The chloro-alkaline indices CAI 1 and CAI 2, and the ratio (Na+ + K+ − Cl−)/((Ca2+ + Mg2+) − (SO42− + HCO3−)), are introduced to evaluate cation exchange. The values of CAI 1 and CAI 2 in most groundwater samples are <0 (Figure 5e), and the values of (Na+ + K+ − Cl−)/((Ca2+ + Mg2+) − (SO42− + HCO3−)) in most groundwater samples are distributed along the 1:1 line (Figure 5f). Both results indicate that cation exchange regulates the concentrations of Na+ and Ca2+. The values of (Ca2+ + Mg2+)/(SO42− + HCO3−), Ca2+/HCO3− and Ca2+/SO42− in most groundwater samples are located near the 1:1 line (Figure 5d,g,h), which indicates that the dissolution of carbonate and gypsum is the main source of Ca2+ and Mg2+ [5]. Some groundwater samples are located above the 1:1 line in Figure 5g, which is attributed to the decrease in Ca2+ and Mg2+ and the increase in Na+ and K+ under cation exchange.
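The chloro-alkaline indices are not defined explicitly in the text. The sketch below assumes the widely used Schoeller definitions, CAI 1 = [Cl− − (Na+ + K+)]/Cl− and CAI 2 = [Cl− − (Na+ + K+)]/(SO42− + HCO3− + CO32− + NO3−), with all concentrations in meq/L. The function name and the sample values are hypothetical.

```python
def chloro_alkaline_indices(na, k, cl, so4, hco3, co3=0.0, no3=0.0):
    """Return (CAI 1, CAI 2) from ion concentrations in meq/L.

    Assumes the Schoeller definitions; negative values correspond to an
    excess of Na+ + K+ over Cl- in the water sample.
    """
    excess = cl - (na + k)
    cai1 = excess / cl
    cai2 = excess / (so4 + hco3 + co3 + no3)
    return cai1, cai2


# Hypothetical sample in meq/L, for illustration only.
cai1, cai2 = chloro_alkaline_indices(na=3.0, k=0.5, cl=2.0, so4=2.0, hco3=3.0)
```

Both indices are negative for this sample, which is the pattern reported for most groundwater samples in Figure 5e.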
As shown in Figure 6, the mining activities cause an anomalous change in the hydrochemical characteristics of the phreatic and confined aquifers. Before the mining activities, groundwater flow in the phreatic aquifer is fast, resulting in weak water–rock interaction and low mineralization. In contrast, the groundwater environment of the confined aquifer is relatively stable and the groundwater circulation is slow, which leads to strong water–rock interaction and high mineralization. Previous studies have revealed that mining activities change the aquifer parameters and enhance the hydraulic connection between different aquifers. In this study, the hydrochemical characteristics of the phreatic and confined aquifers have changed due to the increased hydraulic connection between them. After the mining activities, the hydrochemistry type of the phreatic aquifer changes from HCO3-Ca and HCO3-Na to SO4-Ca and SO4-Na with increasing TDS, which is attributed to the inflow of highly mineralized groundwater from the confined aquifer. In addition, the mining activities promote the mixing of low-mineralization and high-mineralization groundwater: the inflow of low-TDS groundwater from the phreatic aquifer decreases the mineralization of the high-TDS confined aquifer. After mining activities, a large number of water-conducting fissures are distributed in the overburden aquifer. In this case, the renewal time of the groundwater system is shortened, which promotes the active flushing of groundwater (i.e., groundwater cycling). The water-conducting fissures also bring more CO2 into the groundwater, which promotes the dissolution of calcite and dolomite (Equations (7) and (8)) [8] and increases the mineralization.
CaCO3 + CO2 + H2O → Ca2+ + 2HCO3− (7)
CaMg(CO3)2 + 2CO2 + 2H2O → Ca2+ + Mg2+ + 4HCO3− (8)
In summary, the hydrochemistry analysis reveals that the enhancement of the hydraulic connection between the phreatic aquifer and the confined aquifer induced by mining activities causes the anomalous change in the hydrochemical characteristics. Thus, the phreatic groundwater level can be considered as an input variable of the data-driven model for predicting the dynamic change in the groundwater level in the confined aquifer.

5.2. The External Influencing Factors

The wavelet coherence method is applied to analyze the relationship between the groundwater level and precipitation (P), mine water inflow (MWI), and the unit goal area (UGA). The analysis results of PGL and CGL are shown in Figure 7 and Figure 8, respectively.
As shown in Figure 7, the PGL is highly coherent with precipitation within the 128-day band in 2021~2022 and the 256-day band in 2019~2020. The coherence between the PGL and mine water inflow strengthens in the 64-day band during 2019~2020 and in the 128-day band in 2022. The PGL is highly coherent with the unit goal area within the 256-day band in 2020~2022. As shown in Figure 8, the CGL and precipitation show coherence within the 64~128-day band from 2019 to 2021. The CGL is highly coherent with mine water inflow within the 128-day band during 2019~2022. The coherence between the CGL and unit goal area strengthens in the 128-day and 256-day bands from 2020 to 2021.
To sum up, the dynamic change in the groundwater level is highly coherent with several external influencing factors in different frequency bands and time periods. Thus, P, MWI and UGA can be considered as input variables of the data-driven model for predicting the anomalous change in the groundwater level induced by mining activities.

5.3. Comparisons of Prediction Performance

Based on the different input variables, the PGL model and the CGL model are constructed for predicting the groundwater level change caused by mining activities. In each model, 70% of the dataset is used as the training subset and the remaining 30% as the prediction subset. The results predicted using the Decision Tree and LSTM are shown in Figure 9 and Figure 10. The errors of the model predictions are summarized in Table 2. Although the anomalous change in the groundwater level induced by mining activities can be predicted using both the Decision Tree and LSTM, the error metrics indicate that the prediction performance of the two algorithms differs.
For the PGL model, the MAPE and RMSE of the LSTM algorithm are smaller than those of the Decision Tree algorithm. The NSE value of LSTM is larger than that of the Decision Tree and closer to 1. Generally, the closer the MAPE and RMSE values are to zero and the closer the R2 and NSE values are to one, the better the predictive performance of the data-driven model. The error metrics indicate that the predictive performance of the PGL model using the LSTM algorithm is better than that of the Decision Tree algorithm. Similar conclusions can be drawn from the error results of the CGL models.
In order to further analyze the predictive results of the data-driven model, scatter plots of the simulated and monitored values in the training and prediction stages are shown in Figure 11 and Figure 12, respectively. If the data-driven model predicts the anomalous change in the groundwater level caused by mining activities accurately, the points should be distributed along the line X = Y. In the PGL model, the R2 value of the Decision Tree is 0.93 for the training stage and 0.88 for the predicting stage, and the corresponding values for the LSTM algorithm are 0.95 and 0.91. In the CGL model, the R2 value of the training stage is 0.91 in the Decision Tree algorithm and 0.85 in the LSTM algorithm, and these values in the predicting stage are 0.96 in the Decision Tree algorithm and 0.90 in the LSTM algorithm, respectively. The results reveal that the predictive performance of the LSTM algorithm is better than that of the Decision Tree algorithm. Compared with the prediction results calculated using the Decision Tree algorithm, the prediction accuracy of the hydrological model constructed using the LSTM algorithm is improved by 6%. Thus, the hydrological model based on the LSTM algorithm can accurately predict the anomalous change in the groundwater level caused by mining activities.
Unlike the Decision Tree algorithm, the LSTM structure consists of three different layers, an input layer, a hidden layer and an output layer, where the hidden layer is composed of three gates: input gates, output gates, and forget gates. The composite structure of the hidden layer can effectively limit gradient disappearance and gradient explosion during training. In addition, the LSTM algorithm has two significant advantages. First, it extracts valuable information from the previous neurons and transfers it to the current neurons, which can significantly improve prediction accuracy. Second, it improves the speed of computation through multithreaded parallel computing. Based on the above characteristics, the LSTM algorithm is more accurate than the Decision Tree algorithm.
Yan et al. [36] used the LSTM algorithm to build a hydrological model for predicting the anomalous changes in groundwater levels and hydrochemical components caused by seismic activities. The hydrological model was trained on the dynamic changes in the groundwater level and hydrochemical components in periods without earthquakes, and was then used to predict the abnormal changes in groundwater level before earthquakes and to identify pre-seismic anomalies. The MAPE, RMSE, NSE and R2 were also used to evaluate the prediction performance. The results show that the hydrological model based on the LSTM algorithm has the best performance and can effectively identify pre-seismic anomalies, which is consistent with our conclusions. Applying the LSTM algorithm to construct hydrological models can improve the forecast accuracy.

6. Conclusions

In this study, hydrochemistry analysis and time series analysis are used to identify the effect of mining activities on the groundwater regime. A novel data-driven algorithm is introduced to accurately predict the variation in the groundwater levels caused by the mining activities. The main conclusions are as follows:
(1) The hydrochemical characteristics of groundwater in the Yili Coalfield are affected by cation exchange and the dissolution of halite, gypsum, calcite and dolomite. The anomalous change in the hydrochemical characteristics of the phreatic aquifer is attributed to the enhancement of the hydraulic connection between the different aquifers induced by the mining activities. The groundwater level of the confined aquifer can therefore be considered as an input variable of the model to predict the phreatic groundwater level.
(2) Precipitation, mine water inflow, and unit goal area are highly coherent with the anomalous change in the groundwater level induced by the mining activities, and they are therefore used as the input variables for constructing the prediction model.
(3) According to the error metrics, the NSE and R2 values obtained with the LSTM algorithm are 8% and 6% higher, respectively, than those of the Decision Tree algorithm, and the RMSE and MAPE values are 11% and 6% lower, respectively. These errors indicate that the data-driven model based on the LSTM algorithm yields better prediction performance than the model based on the Decision Tree algorithm and can be used to predict the anomalous change in the groundwater level caused by the mining activities.
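The comparison between the two algorithms can be reproduced from Table 2 with a relative-change calculation. The text does not state which model or stage the quoted percentages refer to, so using the testing-stage values and the formula below is an assumption made for this sketch.

```python
def relative_change(reference, candidate):
    """Percentage change of candidate relative to reference."""
    return 100.0 * (candidate - reference) / reference

# Testing-stage values copied from Table 2 as (Decision Tree, LSTM) pairs.
table2_testing = {
    "PGL": {"MAPE": (0.1366, 0.0986), "RMSE": (0.1989, 0.1625),
            "NSE": (0.86, 0.93), "R2": (0.88, 0.91)},
    "CGL": {"MAPE": (0.1026, 0.0958), "RMSE": (0.1695, 0.1502),
            "NSE": (0.85, 0.91), "R2": (0.85, 0.90)},
}

for model, metrics in table2_testing.items():
    for name, (tree, lstm) in metrics.items():
        # Positive values mean the LSTM value is higher than the Decision Tree value.
        print(f"{model} {name}: {relative_change(tree, lstm):+.1f}%")
```

Under this reading, the computed differences vary between the PGL and CGL models, so the result depends on which rows of Table 2 are compared.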

Author Contributions

Conceptualization, A.L. and S.Z.; methodology, A.L., S.Z. and S.Q.; software, S.D., H.W. and H.C.; validation, T.W. and X.H.; formal analysis, S.D. and X.H.; investigation, S.D. and H.W.; data curation, A.L.; writing—original draft preparation, A.L. and S.Z.; writing—review and editing, S.Z.; visualization, C.W.; project administration, A.L.; funding acquisition, A.L. and S.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (42307092; 42307113), the Science, Technology and Innovation Fund Project of Xi’an Research Institute of China Coal Technology & Engineering Group Corp (2023XAYJS12), the Natural Science Basic Research Plan in Shaanxi Province of China (Grant No. 2023-JC-ZD-27), and the Tiandi Science and Technology Co. Ltd. Science and Technology Innovation Venture Capital Special Project (Grant No. 2022-2-TD-ZD005).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author.

Acknowledgments

We would like to thank the editor and three anonymous reviewers for very constructive comments and suggestions, which helped us improve the quality of the manuscript.

Conflicts of Interest

On behalf of all authors, the corresponding author states that there are no conflicts of interest.

References

  1. Hikouei, I.S.; Eshleman, K.N.; Saharjo, B.H.; Graham, L.L.B.; Applegate, G.; Cochrane, M.A. Using machine learning algorithms to predict groundwater levels in Indonesian tropical peatlands. Sci. Total Environ. 2023, 857, 159701.
  2. Zhang, S.; Shi, Z.; Wang, G.; Yan, R.; Zhang, Z. Groundwater radon precursor anomalies identification by decision tree method. Appl. Geochem. 2020, 121, 104696.
  3. Yu, S.; Ma, J. Deep Learning for Geophysics: Current and Future Trends. Rev. Geophys. 2021, 59, e2021RG000742.
  4. Hosono, T.; Yamada, C.; Shibata, T.; Tawara, Y.; Wang, C.Y.; Manga, M.; Rahman, A.T.M.S.; Shimada, J. Coseismic Groundwater Drawdown Along Crustal Ruptures During the 2016 Mw 7.0 Kumamoto Earthquake. Water Resour. Res. 2019, 55, 5891–5903.
  5. Miyakoshi, A.; Taniguchi, M.; Ide, K.; Kagabu, M.; Hosono, T.; Shimada, J. Identification of changes in subsurface temperature and groundwater flow after the 2016 Kumamoto earthquake using long-term well temperature-depth profiles. J. Hydrol. 2020, 582, 10.
  6. Qu, S.; Shi, Z.; Wang, G.; Xu, Q.; Han, J. Using water-level fluctuations in response to Earth-tide and barometric-pressure changes to measure the in-situ hydrogeological properties of an overburden aquifer in a coalfield. Hydrogeol. J. 2020, 595, 125673.
  7. Wang, C.Y.; Wang, L.P.; Manga, M.; Wang, C.H.; Chen, C.H. Basin-scale transport of heat and fluid induced by earthquakes. Geophys. Res. Lett. 2013, 40, 3893–3897.
  8. Xiao, Y.; Hao, Q.; Zhang, Y.; Zhu, Y.; Yin, S.; Qin, L.; Li, X. Investigating sources, driving forces and potential health risks of nitrate and fluoride in groundwater of a typical alluvial fan plain. Sci. Total Environ. 2022, 802, 149909.
  9. Wang, C.; Liao, F.; Wang, G.; Qu, S.; Mao, H.; Bai, Y. Hydrogeochemical evolution induced by long-term mining activities in a multi-aquifer system in the mining area. Sci. Total Environ. 2023, 854, 158806.
  10. Yang, Y.; Mei, A.; Gao, S.; Zhao, D. Both natural and anthropogenic factors control surface water and groundwater chemistry and quality in the Ningtiaota coalfield of Ordos Basin, Northwestern China. Environ. Sci. Pollut. Res. 2023, 30, 67227–67249.
  11. Barbieri, M.; Franchini, S.; Barberio, M.D.; Billi, A.; Boschetti, T.; Giansante, L.; Gori, F.; Jónsson, S.; Petitta, M.; Skelton, A.; et al. Changes in groundwater trace element concentrations before seismic and volcanic activities in Iceland during 2010–2018. Sci. Total Environ. 2021, 793, 148635.
  12. Schiavo, M. Probabilistic delineation of subsurface connected pathways in alluvial aquifers under geological uncertainty. J. Hydrol. 2022, 615, 128674.
  13. Zhang, S.; Shi, Z.; Wang, G.; Zhang, Z.; Guo, H. The origin of hydrological responses following earthquakes in a confined aquifer: Insight from water level, flow rate, and temperature observations. Hydrol. Earth Syst. Sci. 2023, 27, 401–415.
  14. Liu, K.; Zhang, Y.; He, Q.; Zhang, S.; Jia, W.; He, X.; Zhang, H.; Wang, L.; Wang, S. Characteristics of thermophysical parameters in the Wugongshan area of South China and their insights for geothermal genesis. Front. Environ. Sci. 2023, 11, 1112143.
  15. Zhang, H.; Shi, Z.; Wang, G.; Yan, X.; Liu, C.; Sun, X.; Ma, Y.; Wen, D. Different Sensitivities of Earthquake-Induced Water Level and Hydrogeological Property Variations in Two Aquifer Systems. Water Resour. Res. 2021, 57, e2020WR028217.
  16. Janetti, E.B.; Guadagnini, L.; Riva, M.; Guadagnini, A. Global sensitivity analyses of multiple conceptual models with uncertain parameters driving groundwater flow in a regional-scale sedimentary aquifer. J. Hydrol. 2019, 574, 544–556.
  17. Tian, X.; Gong, Z.; Fu, L.; You, D.; Li, F.; Wang, Y.; Chen, Z.; Zhou, Y. Determination of Groundwater Recharge Mechanism Based on Environmental Isotopes in Chahannur Basin. Water 2023, 15, 180.
  18. Mousavi, S.M.; Beroza, G.C. Machine Learning in Earthquake Seismology. Annu. Rev. Earth Planet. Sci. 2023, 51, 2365–2378.
  19. Okoroafor, E.R.; Smith, C.M.; Ochie, K.I.; Nwosu, C.J.; Gudmundsdottir, H.; Aljubran, M. Machine learning in subsurface geothermal energy: Two decades in review. Geothermics 2022, 102, 102401.
  20. Mao, H.; Wang, G.; Rao, Z.; Liao, F.; Shi, Z.; Huang, X.; Chen, X.; Yang, Y. Deciphering spatial pattern of groundwater chemistry and nitrogen pollution in Poyang Lake Basin (eastern China) using self-organizing map and multivariate statistics. J. Clean. Prod. 2021, 329, 129697.
  21. Qu, S.; Duan, L.; Mao, H.; Wang, C.; Liang, X.; Luo, A.; Huang, L.; Yu, R.; Miao, P.; Zhao, Y. Hydrochemical and isotopic fingerprints of groundwater origin and evolution in the Urangulan River basin, China’s Loess Plateau. Sci. Total Environ. 2023, 866, 161377.
  22. Xu, K.; Dai, G.; Zhao, D.; Xue, X. Hydrogeochemical Evolution of an Ordovician Limestone Aquifer Influenced by Coal Mining: A Case Study in the Hancheng Mining Area, China. Mine Water Environ. 2018, 37, 238–248.
  23. Bredy, J.; Gallichand, J.; Celicourt, P.; Gumiere, S.J. Water table depth forecasting in cranberry fields using two decision-tree-modeling approaches. Agric. Water Manag. 2020, 233, 106090.
  24. Rodrigues, E.; Gomes, Á.; Gaspar, A.R.; Henggeler Antunes, C. Estimation of renewable energy and built environment-related variables using neural networks—A review. Renew. Sustain. Energy Rev. 2018, 94, 959–988.
  25. Wunsch, A.; Liesch, T.; Broda, S. Groundwater level forecasting with artificial neural networks: A comparison of long short-term memory (LSTM), convolutional neural networks (CNNs), and non-linear autoregressive networks with exogenous input (NARX). Hydrol. Earth Syst. Sci. 2021, 25, 1671–1687.
  26. Zhang, S.; Shi, Z.; Wang, G.; Yan, R.; Zhang, Z. Application of the extreme gradient boosting method to quantitatively analyze the mechanism of radon anomalous change in Banglazhang hot spring before the Lijiang Mw 7.0 earthquake. J. Hydrol. 2022, 612, 128249.
  27. Luque-Espinar, J.A.; López-Chicano, M.; Pardo-Igúzquiza, E.; Chica-Olmo, M. Using numerical methods for map the spatiotemporal geogenic and anthropogenic influences on the groundwater in a detrital aquifer in south spain. J. Environ. Manag. 2024, 355, 120442.
  28. Luo, A.; Dong, S.; Wang, H.; Ji, Z.; Wang, T.; Hu, X.; Wang, C.; Qu, S.; Zhang, S. Impact of long-term mining activity on groundwater dynamics in a mining district in Xinjiang coal Mine Base, Northwest China: Insight from geochemical fingerprint and machine learning. Environ. Sci. Pollut. Res. 2024, 2024, 1–16.
  29. Wang, L.; Jia, J.; Xia, D.; Liu, H.; Gao, F.; Duan, Y.; Wang, Q.; Xie, H.; Chen, F. Climate change in arid central Asia since MIS 2 revealed from a loess sequence in Yili Basin, Xinjiang, China. Quat. Int. 2019, 502, 258–266.
  30. Wang, L.; Jia, J.; Xia, D.; Liu, H.; Gao, F.; Duan, Y.; Wang, Q.; Xie, H.; Chen, F. Vegetation and climate history in arid western China during MIS2, New insights from pollen and grain-size data of the Balikun Lake, eastern Tien Shan. Quat. Sci. Rev. 2015, 126, 112–125.
  31. Acworth, R.I.; Halloran, L.J.S.; Rau, G.C.; Cuthbert, M.O.; Bernardi, T.L. An objective frequency domain method for quantifying confined aquifer compressible storage using Earth and atmospheric tides. Geophys. Res. Lett. 2016, 43, 11671–11678.
  32. Qu, S.; Shi, Z.; Wang, G.; Zhang, H.; Han, J.; Liu, T.; Jin, X. Detection of hydrological responses to longwall mining in an overburden aquifer. J. Hydrol. 2021, 603, 126919.
  33. Gu, X.; Sun, H.; Zhang, Y.; Zhang, S.; Lu, C. Partial Wavelet Coherence to Evaluate Scale-dependent Relationships Between Precipitation/Surface Water and Groundwater Levels in a Groundwater System. Water Resour. Manag. 2022, 36, 2509–2522.
  34. Wang, C.-Y.; Liao, X.; Wang, L.-P.; Wang, C.-H.; Manga, M. Large earthquakes create vertical permeability by breaching aquitards. Water Resour. Res. 2016, 52, 5923–5937.
  35. An, L.; Hao, Y.; Yeh, T.-C.J.; Liu, Y.; Liu, W.; Zhang, B. Simulation of karst spring discharge using a combination of time-frequency analysis methods and long short-term memory neural networks. J. Hydrol. 2020, 589, 125320.
  36. Yan, X.; Shi, Z.; Wang, G.; Zhang, H.; Bi, E. Detection of possible hydrological precursor anomalies using long short-term memory: A case study of the 1996 Lijiang earthquake. J. Hydrol. 2021, 599, 126369.
Figure 1. The geographic location and hydrogeological profile of the study area.
Figure 2. The monthly monitored value of phreatic groundwater level elevation (PGL), confined groundwater level elevation (CGL), precipitation (P), water inflow (WI), and unit goal area (GA) during the interval of 2018~2023.
Figure 3. A schematic flowchart of LSTM algorithm.
Figure 4. The Gibbs diagram of groundwater samples.
Figure 5. Relationships between hydrochemical parameters: (a) SIgypsum vs. SIhalite; (b) SIcalcite vs. SIdolomite; (c) Na+ vs. Cl−; (d) Ca2+ vs. SO42−; (e) CAI1 vs. CAI2; (f) (Na+ + K+ − Cl−) vs. (Ca2+ + Mg2+) − (SO42− + HCO3−); (g) (Ca2+ + Mg2+) vs. (SO42− + HCO3−); (h) Ca2+ vs. HCO3−.
Figure 6. The Piper diagram of groundwater samples collected from phreatic and confined aquifer before and after mining activities.
Figure 7. Wavelet coherence during the period of 2018~2023 between groundwater level in phreatic aquifer and (a) precipitation, (b) mine water inflow, and (c) unit goal area.
Figure 8. Wavelet coherence during the period of 2018~2023 between groundwater level in confined aquifer and (a) precipitation, (b) mine water inflow, and (c) unit goal area.
Figure 9. The training and prediction results of PGL model using (a) Decision Tree algorithm, (b) LSTM algorithm.
Figure 10. The training and prediction results of CGL model using (a) Decision Tree algorithm, (b) LSTM algorithm.
Figure 11. Scatter plot of the monitored value vs. the simulated value in the training and prediction stage of PGL model. (A) and (a) represent the training and prediction stage using Decision Tree, respectively. (B) and (b) represent the training and prediction stage using LSTM, respectively.
Figure 12. Scatter plot of the monitored value vs. the simulated value in the training and prediction stage of CGL model. (A) and (a) represent the training and prediction stage using Decision Tree, respectively. (B) and (b) represent the training and prediction stage using LSTM, respectively.
Table 1. The hydrochemistry parameters of phreatic and confined aquifer before and after mining activities.
Units: mg/L (except pH).

| Aquifer Type / Mining Stage | N | Statistic | pH | TDS | Na+ + K+ | Ca2+ | Mg2+ | Cl− | SO42− | HCO3− + CO32− |
|---|---|---|---|---|---|---|---|---|---|---|
| Phreatic aquifer before mining activities | 6 | Maximum | 7.90 | 648 | 133.52 | 72.69 | 22.92 | 33.44 | 327.55 | 277.09 |
| | | Minimum | 7.70 | 206 | 1.33 | 50.00 | 6.18 | 1.63 | 17.28 | 158.77 |
| | | Average | 7.83 | 349 | 48.17 | 60.06 | 13.00 | 17.94 | 104.41 | 210.37 |
| Confined aquifer before mining activities | 20 | Maximum | 8.00 | 3132 | 956.64 | 129.44 | 60.96 | 447.68 | 1435.31 | 313.52 |
| | | Minimum | 7.50 | 400 | 54.53 | 57.39 | 10.96 | 9.79 | 123.45 | 182.45 |
| | | Average | 7.78 | 810 | 160.55 | 85.28 | 24.43 | 82.83 | 324.66 | 248.62 |
| Phreatic aquifer after mining activities | 5 | Maximum | 8.10 | 1719 | 256.21 | 192.00 | 86.28 | 114.87 | 686.55 | 324.00 |
| | | Minimum | 7.30 | 614 | 100.50 | 106.21 | 6.68 | 93.60 | 181.93 | 220.26 |
| | | Average | 7.82 | 1264 | 188.34 | 137.86 | 42.82 | 103.42 | 520.74 | 251.48 |
| Confined aquifer after mining activities | 28 | Maximum | 8.20 | 1019 | 121.95 | 30.70 | 10.50 | 6.50 | 54.30 | 102.00 |
| | | Minimum | 6.70 | 309 | 13.60 | 140.00 | 86.28 | 186.00 | 665.71 | 474.35 |
| | | Average | 7.45 | 711 | 58.25 | 93.32 | 32.54 | 68.73 | 205.01 | 201.32 |
Table 2. The errors of PGL model and CGL model in training and testing stage (DG Well).

| Model | Algorithm | Stage | MAPE (%) | RMSE | NSE | R2 |
|---|---|---|---|---|---|---|
| PGL | Decision Tree | Training | 0.1261 | 0.1765 | 0.92 | 0.93 |
| | | Testing | 0.1366 | 0.1989 | 0.86 | 0.88 |
| | LSTM | Training | 0.0918 | 0.1562 | 0.96 | 0.95 |
| | | Testing | 0.0986 | 0.1625 | 0.93 | 0.91 |
| CGL | Decision Tree | Training | 0.0996 | 0.1526 | 0.91 | 0.91 |
| | | Testing | 0.1026 | 0.1695 | 0.85 | 0.85 |
| | LSTM | Training | 0.0865 | 0.1368 | 0.95 | 0.96 |
| | | Testing | 0.0958 | 0.1502 | 0.91 | 0.90 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Luo, A.; Dong, S.; Wang, H.; Cao, H.; Wang, T.; Hu, X.; Wang, C.; Zhang, S.; Qu, S. Application of the Data-Driven Method and Hydrochemistry Analysis to Predict Groundwater Level Change Induced by the Mining Activities: A Case Study of Yili Coalfield in Xinjiang, Norwest China. Water 2024, 16, 1611. https://doi.org/10.3390/w16111611
