1. Introduction
Water inflow accidents are a major hidden danger threatening the safety of mines worldwide [1]. The volume of mine water inflow directly affects the safety of a mine; thus, accurate prediction of water inflow can provide a safety guarantee for coal mine production [2].
The history of mine water inflow prediction extends back to the late 19th century, evolving from empirical judgment to artificial intelligence [3,4]. Initially, prediction relied mainly on the experience and intuition of miners [5,6], drawn from similar geological and mining environments; the method was simple and practical, but its accuracy was often unreliable because it lacked a scientific basis [7]. In 1856, the French engineer Henry Darcy studied the flow of water through sand experimentally and proposed the famous Darcy's law, which laid the theoretical foundation for deterministic prediction methods such as the big well method [8] and the analytical method [9]. However, these methods rely on idealized groundwater flow mechanisms: the effects of geological heterogeneity, flow regime, and boundary conditions are usually ignored, which leads to unsatisfactory predictions. With the development of computer technology in the 20th century, numerical simulation methods [10], including the finite difference and finite element methods, were introduced into groundwater modeling. These methods can handle more complex flow conditions and aquifer media, which improves the accuracy of water inflow prediction. Domingue et al. [11] and Wu et al. [12] established three-dimensional unsteady groundwater flow models by repeatedly calibrating parameters on the basis of a conceptual hydrogeological model and used them to calculate mine water inflow. However, such methods require extensive exploration engineering support, and their modeling and calculation processes are complex and require specialist operation [13,14,15].
In the 21st century, artificial intelligence and machine learning have brought new possibilities for meeting the accuracy challenge of mine water inflow prediction [16]. Data-driven prediction models are built by analyzing correlated historical data; they learn and extract common features from large amounts of data without being constrained by explicit physical mechanisms, predict the future from existing information, and improve the accuracy and efficiency of prediction. Earlier statistical methods [17,18,19] introduced data analysis, progressing from simple linear regression to complex multivariate analysis. Researchers have successfully applied regression theory to mine water inflow prediction and have contributed significantly to the progress of prediction technology through detailed data processing and model construction [20,21]. However, these methods rely on assumptions about the distribution of the data and the underlying relationships between variables, typically linearity and additivity, which often do not hold for mine water inflow. The dynamics of mine water inflow are highly nonlinear: the relationships between variables are complex and non-monotonic and change with time and environmental conditions, so statistical methods cannot capture them effectively [22,23]. In addition, with a limited data sample, statistical rules often fail. Artificial intelligence methods, especially machine learning and deep learning, have provided a new perspective on mine water inflow prediction, greatly enriched the available methods, and improved the accuracy and efficiency of prediction [24,25,26]. Among them, techniques such as time series analysis [27] and decision trees [28] have become important means of predicting mine water inflow. Artificial intelligence methods can model nonlinear and complex relationships in data without relying on strict statistical assumptions and have strong feature learning abilities, which makes them more suitable for the dynamic prediction of mine water inflow [29,30].
Mine water inflow is usually predicted from historical time-series records [31,32]. Long Short-Term Memory (LSTM) networks are often used for time-series prediction because they can learn long-term dependencies [33,34]. The Gated Recurrent Unit (GRU) is a variant of the LSTM that addresses the long-term dependency problem of standard recurrent neural networks (RNNs) with a simpler gating structure and has shown better prediction ability [35]. With continued advances in science and technology, the accuracy of mine water inflow prediction has steadily improved through the use of such model variants [36,37]. However, water inflow at a working face is a complex mining geological phenomenon caused by many interacting factors [38,39]. Geological structure, hydrogeological conditions, mining disturbance, and environmental constraints should all be considered in water inflow prediction [40,41]. In the past, only historical water inflow data were used to predict future trends; such models relied too heavily on historical trends and generalized poorly [42]. Faced with multiple influencing factors, a single factor and a single model can rarely capture these complex interactions fully. At present, multi-factor, multivariate time-series forecasting models are not yet mature in mine water inflow prediction, and model and data selection remain challenging [43,44]. Therefore, further research on multi-factor models, together with the extension of traditional single-factor models, can help widen the applicability and improve the prediction accuracy of these models [45,46,47].
In this study, we combined microseismic and borehole water level data and proposed a model that couples an improved sparrow search algorithm (SSA) with a residual network, a Gated Recurrent Unit, and multi-head attention (SSA-RG-MHA). LSTM and its variants have significant advantages in processing time-series data [48]. However, using such a model alone not only requires a large amount of data but also involves many parameters and hyperparameters, which makes tuning complicated. Therefore, an improved SSA is adopted to optimize the hyperparameters. To improve the model's generalization and its ability to extract data features, a residual network (ResNet) is incorporated to extract nonlinear features from complex inputs and reduce the risk of overfitting. Because different inputs have different nonlinear characteristics, the model should pay more attention to the factors that have a greater impact on water inflow; an attention module is therefore used to focus on important information. In summary, combining ResNet and GRU into a new network structure, allocating different weights through a multi-head attention mechanism, and optimizing hyperparameters with an improved sparrow search algorithm is expected to predict mine water inflow more quickly, stably, and accurately and to provide technical support for mine water disaster control.
2. Methods
2.1. Method Overview
In this paper, we propose an improved SSA-RG-MHA model for water inflow prediction; its architecture is outlined in Figure 1.
This model takes three input feature variables: water level data, microseismic energy data, and historical water inflow records. Each sample in the dataset is a sequence. The dataset is divided into training, validation, and test sets, and prediction performance is evaluated on the held-out test set to ensure that the model not only performs well on the training set but also maintains stable prediction ability on previously unseen data. The model combines ResNet and GRU layers and integrates a multi-head attention (MHA) mechanism to strengthen its predictive power. First, ResNet extracts a deep representation of the input data. The extracted features are then fed into the GRU to learn the dynamic relationships of the time series. Finally, the MHA mechanism is applied to the GRU output to allocate attention across time steps and refine the prediction. To tune the model, we use the sparrow search algorithm to optimize its hyperparameters, which improves both prediction accuracy and generalization to new data.
With this integrated approach, the SSA-RG-MHA model can effectively process mine water inflow data on different time scales and improve the accuracy and robustness of prediction.
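The text states that each sample is a sequence and that the data are split into training, validation, and test sets, but it gives neither the window length nor the split ratios. The Python sketch below shows one common way to build such samples, assuming a hypothetical sliding window of past steps that predicts the next inflow value and a chronological 70/15/15 split; `make_windows`, `chronological_split`, and both settings are illustrative assumptions, not the authors' code.

```python
from typing import List, Sequence, Tuple

def make_windows(features: Sequence[Sequence[float]], target: Sequence[float],
                 window: int) -> Tuple[List[List[List[float]]], List[float]]:
    """Build (input sequence, next inflow) samples from time-aligned records.

    features[t] holds the input variables at step t (e.g. water level,
    microseismic energy, historical inflow); target[t] is the inflow at t.
    Each sample uses steps t - window .. t - 1 to predict target[t].
    """
    X, y = [], []
    for t in range(window, len(target)):
        X.append([list(features[k]) for k in range(t - window, t)])
        y.append(target[t])
    return X, y

def chronological_split(n: int, train: float = 0.7, val: float = 0.15):
    """Split sample indices in time order (no shuffling); the ratios are hypothetical."""
    n_train = int(n * train)
    n_val = int(n * val)
    idx = list(range(n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Usage with a synthetic 400-step series and a hypothetical 7-step window
feats = [[float(i), 2.0 * i, 3.0 * i] for i in range(400)]
X, y = make_windows(feats, [float(i) for i in range(400)], window=7)
train_idx, val_idx, test_idx = chronological_split(len(X))
```

Splitting in time order rather than shuffling keeps later observations out of training, which matters for inflow series that trend with mining progress.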
2.2. Residual Network
ResNet is a type of convolutional neural network used to extract high-level features from input data. These features can effectively represent the nonlinear relationships of mine water inflow, and the ResNet structure allows the model to capture long-term dependence of the data on different time scales [49,50,51,52]. As the number of layers in a deep neural network increases, gradients become more prone to vanishing or exploding during training. Unlike conventional deep architectures such as standard CNNs, ResNet introduces identity mapping (shortcut connections) to mitigate the difficulty of training very deep networks [53,54,55]. The residual module is the fundamental building block of the ResNet architecture. The structure of the ResNet network is depicted in Figure 2.
After the input x passes through the convolutional operations of the main path, the result is combined with the input carried by the identity-mapping branch (the feature matrix). This mechanism lets the network learn the residual mapping F(x), the difference between the desired output and the input, instead of approximating the output directly. Because the identity branch introduces no extra parameters, it adds almost no computational load and prevents performance from degrading as the network depth increases.
In this study, we used three residual blocks, each consisting of three layers: batch normalization, a ReLU activation function, and a convolution layer. A 1 × 1 convolution is used to adjust the feature size, which simplifies the calculation and improves computational efficiency. The ResNet structure not only overcomes the degradation problem of deep neural networks but also speeds up training and enhances feature extraction. The network consists of two residual modules.
The output function is

$$H(x) = F(x) + x$$

where $x$ is the input value, $H(x)$ is the output value, and $F(x)$ is the residual mapped by the weight layers. If the identity mapping is optimal, the block only needs to drive the residual $F(x)$ to zero.
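The residual relation H(x) = F(x) + x can be sketched in plain Python. This is a minimal illustration, assuming a hypothetical weight path F made of two 1-D convolutions with a ReLU between them; the batch normalization and the 1 × 1 convolution mentioned above are omitted, and `conv1d_same` and `residual_block` are illustrative names rather than the authors' implementation.

```python
from typing import List

def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """1-D cross-correlation (as in deep-learning conv layers) with zero padding, same length."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + x + [0.0] * (k - 1 - pad)
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]

def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]

def residual_block(x: List[float], k1: List[float], k2: List[float]) -> List[float]:
    """H(x) = F(x) + x, with F = conv(k1) -> ReLU -> conv(k2) (hypothetical weight path)."""
    f = conv1d_same(relu(conv1d_same(x, k1)), k2)
    return [fi + xi for fi, xi in zip(f, x)]   # identity shortcut adds the input back

x = [1.0, -2.0, 3.0]
same = residual_block(x, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])  # F(x) = 0, so H(x) = x
```

With all-zero kernels the weight path contributes nothing and the block returns its input unchanged, which illustrates why an identity mapping is easy for a residual block to represent.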
2.3. GRU Neural Network
A GRU is a common recurrent neural network model, used here to process data on medium time scales and capture the temporal dynamics of water inflow. It introduces a reset gate, which controls how much of the previous state is retained, and an update gate, which controls how much the current state is updated. The GRU preserves information flow while handling long time series effectively and can be used for both univariate and multivariate prediction [56,57,58,59,60]. In this study, a GRU is used to maintain information flow and efficiently model long time-series data.
Experiments showed that moderately increasing the depth of the network improves its prediction ability, so two GRU layers with an output dimension of 512 are used in this paper.
The structural unit of the GRU is shown in Figure 3.
The reset gate $r_t$ and the update gate $z_t$ both take the current input $x_t$ at time $t$ and the output $h_{t-1}$ at time $t-1$, where $\delta$ is the sigmoid activation function. The gate states are computed as follows:

$$r_t = \delta\left(W_{xr} \cdot x_t + W_{hr} \cdot h_{t-1}\right), \qquad z_t = \delta\left(W_{xz} \cdot x_t + W_{hz} \cdot h_{t-1}\right)$$

The reset gate is then applied to $h_{t-1}$; the reset information and the input $x_t$ are each multiplied by their weights and summed with the corresponding bias term, and the candidate state $N_t$ at the current time is obtained through the tanh activation function:

$$N_t = \tanh\left(W_{xN} \cdot x_t + W_{hN} \cdot \left(r_t \odot h_{t-1}\right) + b_N\right)$$

Finally, the update gate combines the information retained from the previous state $h_{t-1}$ with the candidate state $N_t$ to produce the output $h_t$:

$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot N_t$$

In the above formulas, $W_{xz}$ and $W_{xr}$ are the weight matrices applied to the input at time $t$, $W_{hz}$ and $W_{hr}$ are the weight matrices applied to the hidden state at time $t-1$, $W_{xN}$, $W_{hN}$, and $b_N$ are the weights and bias of the candidate state, "·" denotes matrix multiplication, and "⊙" denotes element-wise multiplication.
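A single GRU time step can be sketched in plain Python. The sketch assumes the common formulation with reset gate r_t, update gate z_t, candidate state N_t, and output h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ N_t, with a bias only on the candidate state; the parameter names (`W_xN`, `W_hN`, `b_N`, and the dictionary layout) are illustrative. Frameworks differ in which term the update gate weights, so this is not necessarily the exact convention of the authors' implementation.

```python
import math
from typing import Dict, List

Vec = List[float]
Mat = List[List[float]]

def matvec(W: Mat, v: Vec) -> Vec:
    """Matrix-vector product W · v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def vadd(a: Vec, b: Vec) -> Vec:
    return [u + w for u, w in zip(a, b)]

def sigmoid(v: Vec) -> Vec:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def gru_step(x_t: Vec, h_prev: Vec, p: Dict[str, list]) -> Vec:
    """One GRU step under the assumed formulation (parameter names are illustrative)."""
    # r_t = sigmoid(W_xr · x_t + W_hr · h_{t-1})
    r = sigmoid(vadd(matvec(p["W_xr"], x_t), matvec(p["W_hr"], h_prev)))
    # z_t = sigmoid(W_xz · x_t + W_hz · h_{t-1})
    z = sigmoid(vadd(matvec(p["W_xz"], x_t), matvec(p["W_hz"], h_prev)))
    # N_t = tanh(W_xN · x_t + W_hN · (r_t ⊙ h_{t-1}) + b_N)
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    pre = vadd(vadd(matvec(p["W_xN"], x_t), matvec(p["W_hN"], reset_h)), p["b_N"])
    n = [math.tanh(a) for a in pre]
    # h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ N_t
    return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h_prev, n)]

def gru_run(xs: List[Vec], p: Dict[str, list], hidden: int) -> Vec:
    """Run the cell over a sequence starting from h_0 = 0 and return the last state."""
    h = [0.0] * hidden
    for x_t in xs:
        h = gru_step(x_t, h, p)
    return h
```

Stacking two such layers, as in this paper, amounts to feeding the sequence of states h_t produced by the first layer as the inputs x_t of the second.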
2.4. Multi-Head Attention
The MHA mechanism was developed to improve sequence-to-sequence models and, like conventional attention mechanisms, processes an input sequence to generate outputs. MHA computes several sets of attention weights in parallel and concatenates their results. In mine water inflow prediction, MHA can attend simultaneously to information at different time scales, better capture the relationships between different time intervals, local regions, and features, handle the complexity and uncertainty of time-series data, and improve the robustness and generalizability of the model [61,62,63,64]. The number of hidden units is set to 9 and the number of heads to 8, according to the size of the dataset and the available computing resources.
The structure of the MHA unit is shown in Figure 4.
MHA is an attention mechanism that performs several attention functions simultaneously. The calculation proceeds as follows. First, the input is converted into three matrices: the key $K$, the query $Q$, and the value $V$, and $h$ parallel attention heads attend to different parts of these matrices. Second, the attention score of each head is computed with scaled dot-product attention, as given in Equation (5):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d}}\right)V \qquad (5)$$

where $Q \in \mathbb{R}^{n \times d}$ is the query matrix, $K \in \mathbb{R}^{n \times d}$ is the key matrix, $V \in \mathbb{R}^{n \times d}$ is the value matrix, $d$ is the number of hidden layer nodes, and $n$ is the input length.
The generation process of the weight matrices Q, K, and V is shown in Figure 5.
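Equation (5) and the multi-head concatenation can be sketched in plain Python. Each head has its own projection matrices (the hypothetical keys `W_q`, `W_k`, `W_v` below) that map the input to Q, K, and V; the head outputs are concatenated and projected by an output matrix `W_o`. This follows the standard Transformer formulation, which is assumed here rather than taken from this paper.

```python
import math
from typing import Dict, List

Mat = List[List[float]]

def matmul(A: Mat, B: Mat) -> Mat:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A: Mat) -> Mat:
    return [list(col) for col in zip(*A)]

def softmax_rows(S: Mat) -> Mat:
    out = []
    for row in S:
        m = max(row)                          # subtract the max for numerical stability
        e = [math.exp(v - m) for v in row]
        total = sum(e)
        out.append([v / total for v in e])
    return out

def attention(Q: Mat, K: Mat, V: Mat) -> Mat:
    """Scaled dot-product attention, Equation (5): softmax(Q K^T / sqrt(d)) V."""
    d = len(K[0])                             # per-head key dimension
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(Q, transpose(K))]
    return matmul(softmax_rows(scores), V)

def multi_head_attention(X: Mat, heads: List[Dict[str, Mat]], W_o: Mat) -> Mat:
    """Run each head on its own Q/K/V projections, concatenate, then project with W_o."""
    outs = [attention(matmul(X, h["W_q"]), matmul(X, h["W_k"]), matmul(X, h["W_v"]))
            for h in heads]
    concat = [sum((o[i] for o in outs), []) for i in range(len(X))]
    return matmul(concat, W_o)
```

If every key is identical, all scores in a row are equal, the softmax weights are uniform, and each output row is simply the mean of the value rows; this is a quick sanity check for any implementation.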
2.5. Improved Sparrow Search Algorithm (SSSA)
The SSA is a swarm intelligence algorithm inspired by the foraging behavior of sparrows. However, because the initial population lacks diversity, the search ability of the SSA declines quickly, and in the late search period the algorithm tends to fall into a local optimum [65,66]. Therefore, in this study, a chaotic sequence was used to initialize the positions and fitness values of the sparrow population. This increases the coverage of the search space and the randomness of the initial positions, improves global search ability, and helps avoid local optima. The population is more dispersed, which avoids multiple individuals sharing the same fitness value and increases the diversity of fitness values and the differences between individual sparrows.
The flowchart of the SSSA is shown in Figure 6.
(1) The chaotic sequence was used to initialize the positions and fitness values of the sparrow population: a chaotic sequence was generated in the chaotic variable space (−1, 1) through the mapping relationship and then transformed into the space of each optimized variable. The sine chaotic map is a chaotic model with infinitely many mapping folds. In its one-dimensional mapping, $X_n$ is a sequence of values in (−1, 1), and the initial value cannot be set to zero.
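(2) The fitness value of each sparrow was calculated from its prediction error, and the sparrows were sorted. In this model, the mean square error (MSE) of the validation set is used as the fitness function to judge the degree of optimization; ranking by MSE is equivalent to ranking by RMSE because the square root is monotonic. The fitness is calculated as follows:

$$f_i = \frac{1}{n}\sum_{k=1}^{n}\left(y_k - \hat{y}_{i,k}\right)^2$$

where $f_i$ is the fitness of the $i$-th sparrow, $n$ is the total number of validation samples, $y_k$ is the true value, and $\hat{y}_{i,k}$ is the value predicted by the model configured with the $i$-th sparrow's hyperparameters.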
(3) The top 20% of sparrows were taken as discoverers and the remaining 80% as followers, and 20% of the sparrows, drawn randomly from both groups, were designated as alerters that perceive danger.
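The discoverer positions are updated as follows:

$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left(\dfrac{-i}{\alpha \cdot Iter_{max}}\right), & R_2 < S \\ X_{i,j}^{t} + M \cdot L, & R_2 \ge S \end{cases}$$

where $t$ is the current iteration; $X_{i,j}^{t}$ is the position of the $i$-th sparrow in dimension $j$ at iteration $t$, with $j = 1, 2, \ldots, d$; $Iter_{max}$ is the maximum number of iterations; $\alpha \in (0, 1]$ is a random number; $R_2 \in [0, 1]$ is the warning value; $S \in [0.5, 1]$ is the safety threshold; $M$ is a random number following a normal distribution; and $L$ is a $1 \times d$ matrix whose elements are all 1.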
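The follower positions are updated as follows:

$$X_{i,j}^{t+1} = \begin{cases} M \cdot \exp\left(\dfrac{X_{worst}^{t} - X_{i,j}^{t}}{i^{2}}\right), & i > N/2 \\ X_{p}^{t+1} + \left|X_{i,j}^{t} - X_{p}^{t+1}\right| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases}$$

where $X_p$ is the optimal position occupied by the discoverers, $X_{worst}$ is the current worst position, $N$ is the population size, and $A$ is a $1 \times d$ matrix whose elements are randomly 1 or −1, with $A^{+} = A^{\top}\left(AA^{\top}\right)^{-1}$.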
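The alerter positions are updated as follows:

$$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta \cdot \left|X_{i,j}^{t} - X_{best}^{t}\right|, & f_i > f_g \\ X_{i,j}^{t} + K \cdot \left(\dfrac{\left|X_{i,j}^{t} - X_{worst}^{t}\right|}{\left(f_i - f_w\right) + \varepsilon}\right), & f_i = f_g \end{cases}$$

where $X_{best}$ is the current global optimal position; $\beta$, the step-size control parameter, is a normally distributed random number with mean 0 and variance 1; $K$ is a random number in [−1, 1]; $f_i$ is the fitness value of the current sparrow; $f_g$ and $f_w$ are the current global best and worst fitness values, respectively; and $\varepsilon$ is a very small constant that prevents the denominator from being zero.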
(4) The positions of the three groups were updated continually, and the fitness values were recalculated after each iteration.
(5) At the end of the iterations, the global optimal sparrow position was taken as the optimal set of hyperparameters for the neural network.
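Steps (1) to (5) can be sketched as a compact minimization routine in plain Python. It is a simplified illustration under stated assumptions: the map x_{k+1} = sin(2/x_k), one common form of the sine chaotic map with infinite folds, stands in for the chaotic map; the update rules follow the standard SSA formulation; the value 0.8 reported as the warning value is assumed to be the safety threshold S; alerters move from their pre-update positions; and `sine_chaos`, `chaotic_init`, and `ssa_minimize` are hypothetical names. In the paper the objective would be the validation MSE of a trained RG-MHA model; here any function of a position vector can be passed.

```python
import math
import random
from typing import Callable, List, Tuple

Vec = List[float]

def sine_chaos(n: int, x0: float = 0.7) -> Vec:
    """Assumed sine map x_{k+1} = sin(2 / x_k); values lie in [-1, 1], x0 must be nonzero."""
    xs, x = [], x0
    for _ in range(n):
        x = math.sin(2.0 / x) if x != 0.0 else 0.1   # guard: the map is undefined at 0
        xs.append(x)
    return xs

def chaotic_init(pop: int, lb: Vec, ub: Vec) -> List[Vec]:
    """Step (1): map chaotic values from [-1, 1] onto each variable's [lb, ub]."""
    d = len(lb)
    c = sine_chaos(pop * d)
    return [[lb[j] + (c[i * d + j] + 1.0) / 2.0 * (ub[j] - lb[j]) for j in range(d)]
            for i in range(pop)]

def ssa_minimize(f: Callable[[Vec], float], lb: Vec, ub: Vec, pop: int = 50,
                 iters: int = 15, S: float = 0.8, pd: float = 0.2, sd: float = 0.2,
                 seed: int = 0) -> Tuple[Vec, float]:
    rng = random.Random(seed)
    d = len(lb)

    def clip(x: Vec) -> Vec:
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lb, ub)]

    X = chaotic_init(pop, lb, ub)
    fit = [f(x) for x in X]
    g = min(range(pop), key=fit.__getitem__)
    best_x, best_f = X[g][:], fit[g]                # best position found so far
    n_disc = max(1, int(pd * pop))
    for _ in range(iters):
        # Step (2): sort by fitness (lower validation MSE is better)
        order = sorted(range(pop), key=fit.__getitem__)
        X, fit = [X[i] for i in order], [fit[i] for i in order]
        x_worst, f_worst = X[-1], fit[-1]
        R2 = rng.random()                           # warning value
        new = [x[:] for x in X]
        # Step (3a): discoverers
        for i in range(n_disc):
            if R2 < S:
                alpha = 1.0 - rng.random()          # alpha in (0, 1]
                new[i] = [v * math.exp(-(i + 1) / (alpha * iters)) for v in X[i]]
            else:
                M = rng.gauss(0.0, 1.0)
                new[i] = [v + M for v in X[i]]      # M * L, with L a 1 x d row of ones
        x_p = new[0]
        # Step (3b): followers
        for i in range(n_disc, pop):
            if i + 1 > pop / 2:
                M = rng.gauss(0.0, 1.0)
                new[i] = [M * math.exp((w - v) / (i + 1) ** 2) for v, w in zip(X[i], x_worst)]
            else:
                A = [rng.choice((-1.0, 1.0)) for _ in range(d)]
                # |X_i - X_p| · A^+ · L, where A^+ = A^T / d because A A^T = d
                step = sum(abs(v - p) * a for v, p, a in zip(X[i], x_p, A)) / d
                new[i] = [p + step for p in x_p]
        # Step (3c): alerters, drawn at random from the whole population
        for i in rng.sample(range(pop), max(1, int(sd * pop))):
            if fit[i] > best_f:
                beta = rng.gauss(0.0, 1.0)
                new[i] = [b + beta * abs(v - b) for v, b in zip(X[i], best_x)]
            else:
                K = rng.uniform(-1.0, 1.0)
                new[i] = [v + K * abs(v - w) / ((fit[i] - f_worst) + 1e-50)
                          for v, w in zip(X[i], x_worst)]
        # Step (4): clip to bounds, re-evaluate, and keep the best position found
        X = [clip(x) for x in new]
        fit = [f(x) for x in X]
        for x, fx in zip(X, fit):
            if fx < best_f:
                best_x, best_f = x[:], fx
    return best_x, best_f                           # Step (5)
```

Minimizing a toy objective such as `lambda x: sum(v * v for v in x)` over [−5, 5]³ exercises the loop without training a network; because the best-so-far position is kept, the returned fitness is never worse than that of the best chaotically initialized sparrow.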
The hyperparameters optimized by the algorithm are the numbers of neurons in the first and second hidden layers of the GRU neural network, the number of heads in the multi-head attention (MHA) mechanism, the number of training epochs, and the learning rate. Consequently, the optimization dimension of the sparrow search algorithm is five, with a population size of 50. The maximum number of iterations is 15, and the warning value is set to 0.8. After optimization, the optimal configuration is as follows: 155 neurons in the first hidden layer, 84 neurons in the second hidden layer, 4 MHA heads, 113 training epochs, and a learning rate of 0.007.
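Because SSA positions are continuous while most of these hyperparameters are integers, each position has to be decoded before a model is trained. A minimal sketch, assuming hypothetical search bounds (the ranges searched are not reported) and rounding to the nearest integer; `BOUNDS` and `decode` are illustrative names.

```python
from typing import Dict, List, Union

# Hypothetical search bounds: the actual ranges are not reported.
BOUNDS = [
    ("gru_units_1", 16, 512),
    ("gru_units_2", 16, 512),
    ("mha_heads", 1, 8),
    ("epochs", 20, 200),
    ("learning_rate", 1e-4, 1e-2),
]

def decode(position: List[float]) -> Dict[str, Union[int, float]]:
    """Clip a 5-D sparrow position to the bounds and round the integer hyperparameters."""
    cfg: Dict[str, Union[int, float]] = {}
    for (name, lo, hi), v in zip(BOUNDS, position):
        v = min(max(v, lo), hi)
        cfg[name] = v if name == "learning_rate" else int(round(v))
    return cfg
```

For example, a position near (155.3, 84.0, 4.2, 113.4, 0.007) decodes to the optimal configuration reported above; rounding is one simple choice, and other mappings (such as floor or discrete lookup tables) would also work.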
2.6. RG-MHA Model Construction
The structure of the RG-MHA model is shown in Figure 7.
In this model, the residual network structure and the GRU hidden layers are fused into a new network structure. The output of the residual neural network (ResNet) serves as the input to the gated recurrent network (GRU), forming a series fusion model. Experimental training showed that a 3-layer ResNet model yields better prediction results; accordingly, the ResNet architecture comprises 10 convolutional layers, 1 pooling layer, 1 flattening layer, and 1 fully connected layer. The subsequent GRU channel consists of two GRU units and a fully connected layer. Because the output features of the ResNet model differ in dimension from the input expected by the GRU, a Reshape layer is inserted between the two to adapt the data structure to the GRU network.
Adding a Dropout layer between the two structures further improves the generalization of the model and reduces its training time. In this model, the dropout rate is 0.2. The calculation is as follows:

$$y = m \odot x$$

where $x$ is the input feature; $m$ is a binary mask of the same size as $x$, each element of which independently takes the value 1 with probability $p$ and 0 with probability $(1 - p)$, so a dropout rate of 0.2 corresponds to $p = 0.8$; $\odot$ denotes element-wise multiplication; and $y$ is the output feature. In the testing phase, all elements of $m$ are fixed at 1.
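The dropout formula can be sketched as follows. The `inverted` flag is an addition for comparison: the Keras `Dropout` layer scales the kept inputs by 1/(1 − rate) during training so that no rescaling is needed at test time, whereas the formula in the text applies the mask without rescaling. The function name and signature are illustrative.

```python
import random
from typing import List, Optional

def dropout(x: List[float], rate: float = 0.2, training: bool = True,
            inverted: bool = False, rng: Optional[random.Random] = None) -> List[float]:
    """y = m ⊙ x, where each m_i is 1 with keep probability p = 1 - rate, else 0."""
    if not training:
        return list(x)                          # test phase: m is all ones
    rng = rng or random.Random()
    p = 1.0 - rate
    mask = [1.0 if rng.random() < p else 0.0 for _ in x]
    scale = 1.0 / p if inverted else 1.0        # inverted dropout rescales kept units
    return [m * v * scale for m, v in zip(mask, x)]
```

Without rescaling, the expected training output is p · x, which differs from the test-time output x; the inverted form removes that mismatch, which is why most frameworks use it.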
The concatenate operation fuses the features extracted by the RG network with the features weighted by the attention mechanism, so that the basic features extracted by the GRU and the important features emphasized by the attention mechanism are used together, improving performance in multivariate time-series prediction.
Finally, the fully connected layers extract the correlations among the above features through nonlinear transformations and output the final water inflow prediction. This model uses two dense layers to increase model capacity and accuracy. The core operation is a matrix-vector product:

$$Y = W \cdot X$$

where $Y$ is the final feature output, $W$ is the weight matrix applied to the feature values, and $X$ is the extracted feature vector.
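The concatenation and the two dense layers can be sketched in plain Python. The weights, the bias terms, and the ReLU activation on the hidden layer are assumptions for illustration; the text specifies only the matrix-vector product and that two dense layers are used. `dense` and `fuse_and_predict` are hypothetical helpers.

```python
from typing import List, Optional

def dense(W: List[List[float]], x: List[float], b: Optional[List[float]] = None,
          relu: bool = False) -> List[float]:
    """Fully connected layer: Y = W · X (+ b), optionally followed by ReLU."""
    y = [sum(w * v for w, v in zip(row, x)) for row in W]
    if b is not None:
        y = [a + c for a, c in zip(y, b)]
    return [max(0.0, a) for a in y] if relu else y

def fuse_and_predict(rg_feat: List[float], attn_feat: List[float],
                     W1: List[List[float]], b1: List[float],
                     W2: List[List[float]], b2: List[float]) -> float:
    """Concatenate RG and attention features, then apply two dense layers."""
    z = rg_feat + attn_feat                  # list concatenation = feature concatenation
    hidden = dense(W1, z, b1, relu=True)     # hidden dense layer (ReLU assumed)
    return dense(W2, hidden, b2)[0]          # output layer: predicted water inflow
```

The nonlinearity comes from the activation between the two layers; without it, two stacked dense layers would collapse into a single linear map.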
3. Case Study
3.1. Overview of the Study Area
The Tingnan Coal Mine is located in Changwu County, Shaanxi Province, China. The mine lies in the Binchang mining area of the Jurassic coalfield in the Ordos Basin, which belongs to one of the 13 large coal bases in China [47,67]. The 207 working face is located in the middle of the second panel on the west side [2]. The mined seam is the No. 4 coal of the Jurassic Yan’an Formation. The location of the 207 working face is shown in Figure 8.
According to boreholes No. ZK8-1 and 3-1 (Table 1), the distance between the coal roof of working face 207 and the bottom boundary of the Luohe Formation is about 172.0 to 173.0 m, and the post-mining water-conducting fracture zone connects with the Luohe Formation. The Luohe Formation aquifer consists mainly of medium-grained sandstone, coarse-grained sandstone, and coarse conglomerate, with a permeability coefficient of 0.0241 m/d. Although its permeability and water-richness are weak, the very thick aquifer holds abundant static reserves, resulting in high and sharply fluctuating water inflow at the working face. Accurate prediction of water inflow is therefore a major technical problem for coal mines in this region.
3.2. Datasets
In this case, borehole water level data, microseismic data, and water inflow at the working face were collected to validate the model. Changes in borehole water level have been shown to be strongly correlated with water inflow [68]. Microseismic monitoring data reflect the stress state and fracture activity of the subsurface, both of which are closely associated with changes in water inflow: when microseismic energy increases, rock fracturing intensifies and the water-conducting fracture zone develops further, increasing water inflow. Therefore, the groundwater level of the confined aquifer, historical water inflow, and microseismic energy are selected as input features.
A total of 400 data points from the 207 working face were collected and organized for the period from 30 November 2017 to 4 January 2019. The water level data were obtained from observation hole 3-1 in the Cretaceous Luohe Formation, which is the closest to the working face and has the most complete record. The microseismic data were acquired by a monitoring probe and four pickups covering the entire working area of the 207 working face to ensure the accuracy and completeness of the data. Missing data were filled by linear interpolation. The microseismic, water level, and water inflow data and their relationships are shown in Figure 9.
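Linear interpolation of missing records can be sketched as below. Filling leading or trailing gaps with the nearest observed value is an assumption, since interpolation alone cannot reach them, and `fill_linear` is a hypothetical helper; in practice a library routine such as pandas `Series.interpolate` serves the same purpose.

```python
from typing import List, Optional

def fill_linear(values: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest observed neighbours.

    Leading and trailing gaps take the nearest observed value (an assumption).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(values)
    for i in range(len(out)):
        if out[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            w = (i - left) / (right - left)      # fractional position inside the gap
            out[i] = values[left] + w * (values[right] - values[left])
    return out
```

For daily inflow records, linear interpolation assumes the quantity changed smoothly across the gap; it can understate short spikes that occurred on missing days.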
3.3. Correlation Feasibility Analysis
Spearman’s correlation analysis is suitable for testing monotonic, including nonlinear, relationships between variables and can be used to calculate the correlation coefficient between two ordinal or ranked variables. Therefore, this study used Statistical Product and Service Solutions (SPSS) to conduct a Spearman’s correlation analysis of mine water inflow, microseismic, and water level data. The correlation coefficients are presented in Table 2.
According to the Spearman’s correlation coefficients obtained, significant correlations exist among mine water inflow, water level, and microseismic data.
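The authors used SPSS; the same statistic can be reproduced as the Pearson correlation of average ranks. The sketch below is illustrative (`scipy.stats.spearmanr` is a library alternative) and shows why Spearman's coefficient captures monotonic but nonlinear relationships: a quadratic but still increasing relation scores 1.

```python
from typing import List, Sequence

def average_ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0            # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A significance test, as reported by SPSS, additionally requires the sample size; the coefficient alone does not establish significance.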
As shown in Figure 9, the energy levels of the microseismic events at the working face during the monitoring period can be roughly divided into three stages.
(1) From 5 December 2017 to 20 December 2018, the working face experienced a period of severe stress changes in the floor and expansion and development of the water-conducting fracture zone. At this stage, the fluctuation in water level was relatively small and the mine water inflow was relatively stable, averaging about 400 m³/h. The overall energy level of the microseismic events was relatively low.
(2) From 28 February 2018 to 30 July 2018, the influence of mining pressure gradually increased, with the water level falling and the water inflow rising. The energy level of the microseismic events changed in step with mine water inflow, and the overall energy level remained relatively low. On 25 March 2018, the microseismic energy increased to an energy level of 1.0 × 10⁵, the water inflow rose significantly to 769 m³/h, and the water level dropped markedly.
(3) From 31 August 2018 to 6 November 2018, stress in the mine changed dramatically, reactivating the groundwater and again lowering the water level and increasing the water inflow. On 15 September 2018, the energy level of the microseismic events began to rise suddenly, forming the first peak at 6.4 × 10⁵.
These data show that the borehole water level and microseismic energy are closely related to water inflow, and the monitoring covers a complete cycle, which is sufficient for statistical significance analysis. Therefore, it is feasible to use borehole water level data and microseismic data as the two characteristic factors for predicting mine water inflow.
3.4. Model Evaluation
To better evaluate the prediction results, we used the mean absolute error (MAE), root mean square error (RMSE) [69], and mean absolute percentage error (MAPE) as evaluation indicators.
The MAE measures the average absolute difference between the predicted and actual values. The smaller its value, the better the prediction.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - f(x_i)\right|$$

where $n$ is the number of samples, $y_i$ is the true value of the $i$-th data point, and $f(x_i)$ is the predicted value of the $i$-th data point.
The RMSE measures the error between the predicted and true values. The smaller its value, the smaller the error and the higher the prediction accuracy of the model.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - f(x_i)\right)^2}$$

where $n$ is the number of samples, $y_i$ is the true value of the $i$-th data point, and $f(x_i)$ is the predicted value of the $i$-th data point.
The MAPE expresses the error as a percentage of the true value. The smaller its value, the higher the accuracy of the prediction model.

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - f(x_i)}{y_i}\right|$$

where $n$ is the number of samples, $y_i$ is the true value of the $i$-th data point, and $f(x_i)$ is the predicted value of the $i$-th data point.
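The three indicators, plus the MSE used as the SSA fitness function, can be computed as below. This is a plain-Python sketch with illustrative function names; MAPE is undefined when a true value is zero, which does not arise for inflow values of several hundred m³/h.

```python
import math
from typing import Sequence

def mae(y: Sequence[float], f: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, f)) / len(y)

def mse(y: Sequence[float], f: Sequence[float]) -> float:
    """Mean square error (the fitness used to rank sparrows)."""
    return sum((a - b) ** 2 for a, b in zip(y, f)) / len(y)

def rmse(y: Sequence[float], f: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(mse(y, f))

def mape(y: Sequence[float], f: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; y must contain no zeros."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, f)) / len(y)
```

RMSE weights large errors more heavily than MAE, so a model that occasionally misses an inflow surge can have a similar MAE but a noticeably higher RMSE.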
4. Results and Discussion
4.1. Relationships Between Data and Models
For mine water inflow prediction, we chose a GRU (gated recurrent unit), ResNet (residual network), MHA (multi-head attention mechanism), and SSA (sparrow search algorithm).
The relationship between the specific data and the model is shown in Figure 10.
(1) Residual Structure
Microseismic monitoring data reflect the stress state and fracture activity of the underground rock mass, which are closely related to changes in water inflow. When microseismic energy increases, rock fracturing intensifies and the water-conducting fracture zone develops further, increasing water inflow. Water level changes are affected by many factors, such as rainfall and mining activities, which interact with each other and produce a complex relationship between water level change and water inflow. In addition, water level changes generally exhibit a hysteresis (lag) effect. In short, the relationships among water inflow, microseismic events, and water level change are nonlinear and difficult to describe with simple mathematical models.
By introducing skip connections that let gradients flow directly through the network, the residual structure can learn complex relationships among the individual data series. Mining activities change the groundwater flow state, causing dynamic changes in water inflow, and ResNet is able to capture these dynamics.
(2) Gated Recurrent Unit
The water level data record the water level over time and reflect the dynamic change of the groundwater system, forming a typical time-series dataset. Microseismic events are time-stamped, recording seismic activity over time, so they are also time-series data. The water inflow data record the mine's water inflow at different times and likewise constitute time-series data. All three therefore have typical temporal characteristics.
Through internal mechanisms such as their gated structure, GRUs can capture long-term dependencies in time series. They automatically extract features from the input sequence to learn complex temporal patterns in water level and microseismic data. GRUs can process input sequences of different lengths, which allows the model to adapt to data changes on different time scales. Compared with other recurrent time-series models such as the LSTM, the GRU has fewer parameters and higher computational efficiency.
(3) Multi-Head Attention
Different geological factors influence water inflow to different degrees, and these factors interact. According to the correlation analysis, water level data are more strongly correlated with water inflow, so they should be assigned a higher attention weight.
The multi-head attention mechanism can process multiple factors at the same time and assign them different attention weights, better capturing the importance of each factor. As the relative influence of the factors changes over time, the multi-head attention mechanism can flexibly shift its focus to adapt to this change.
(4) Sparrow Search Algorithm
The optimization mechanism of the sparrow search algorithm suits high-dimensional, nonlinear optimization problems, which makes it well suited to a mine water inflow prediction model with multiple influencing factors. Other optimization algorithms may be less efficient on such complex problems or may easily become trapped in local optima. Compared with algorithms such as genetic algorithms or particle swarm optimization, the sparrow search algorithm has better global search capability and convergence speed and can find the optimal parameter combination faster.
4.2. Comparison Between Multi- and Single-Factor Models
Water inflow prediction at a mine working face generally falls into two categories: single-factor prediction, which uses historical observational data, and multi-factor prediction, which also uses data related to mine water hazards. Because mine data are difficult to obtain and process, single-factor prediction is more widely used. The accuracy of multi-factor and single-factor models is compared on the test dataset of the 207 working face. Table 3 shows the results.
The mixed models outperform the single models with both single-factor and multi-factor inputs, mainly because they exploit the diversity among their component models: each component focuses on a different aspect of the data, and together they provide a more comprehensive forecast. Compared with the SSA-CG-Attention model, the MAE of the SSA-RG-Attention model decreased by 0.29 m³/h with single-factor input and by 0.81 m³/h with multi-factor input. This indicates that ResNet is more advantageous for feature extraction in water inflow prediction.
Compared with the corresponding single-factor models, the MAEs of the multi-factor BP, LSTM, GRU, SSA-CG-Attention, SSA-RG-Attention, and improved SSA-RG-MHA models decrease by 2, 1.29, 1.63, 8.9, 9.42, and 8.08 m³/h, respectively. Overall, multi-factor prediction outperforms single-factor prediction. A single-factor model considers only the historical trend of water inflow and provides a single time series, which makes it prone to misjudgment and omission. Adding borehole water level and microseismic data as inputs constrains the prediction with several factors and supplies more comprehensive information, so the predictions are more stable and accurate. This significantly reduces the chance of misjudgment and improves the reliability and accuracy of water inflow prediction at the working face.
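The MAE comparisons above can be computed as follows, along with the RMSE and MAPE reported later. This minimal sketch uses hypothetical observed and predicted series. MAPE is expressed as a percentage here, which is an assumption, and it requires nonzero observations.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return MAE and RMSE (same units as the data, e.g. m³/h) and MAPE (%, assumed)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = 100.0 * np.mean(np.abs(err / y_true))   # assumes no zero observations
    return mae, rmse, mape

observed = [200.0, 210.0, 190.0, 205.0]   # hypothetical inflow values, m³/h
single = [190.0, 222.0, 181.0, 212.0]     # hypothetical single-factor predictions
multi = [197.0, 214.0, 188.0, 206.0]      # hypothetical multi-factor predictions
mae_s = evaluate(observed, single)[0]
mae_m = evaluate(observed, multi)[0]
print(f"MAE reduction: {mae_s - mae_m:.2f} m3/h")  # MAE reduction: 7.00 m3/h
```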
In short, the results of multi-factor prediction models are more accurate than those of single-factor prediction models, and the results of mixed models are more accurate than those of single model predictions.
4.3. Model Stability Comparison
Historical mine water inflow, borehole water level, and microseismic data collected over the same period were split into training and test sets, and each model was trained on the training set. The trained models were then applied to the test set, and the results of each model were recorded. To reduce randomness and improve the reliability of the results, the data splitting, training, and testing were repeated over multiple iterations. The results of each iteration were analyzed, and the variance of each model's evaluation metrics was calculated using Equation (16):
where m indexes the M model evaluation indicators, n is the number of iterations, and the averaged term denotes the mean value of each indicator over all iterations.
The splitting, training, and testing procedure was repeated for 20 iterations, and the resulting variances are shown in Table 4.
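Equation (16) is not reproduced in this text. The sketch below therefore assumes one plausible reading of the "variance sum". For each evaluation metric, it takes the population variance (dividing by n) of that metric's values across the n iterations. It then sums these variances over the M metrics. The paper's exact normalization may differ, and the run values are hypothetical.

```python
import numpy as np

def variance_sum(scores):
    """scores: array of shape (n_iterations, M_metrics).

    Assumed form: population variance of each metric across iterations,
    then summed over the metrics.
    """
    scores = np.asarray(scores, dtype=float)
    means = scores.mean(axis=0)                        # mean of each metric over iterations
    per_metric = ((scores - means) ** 2).mean(axis=0)  # divide by n (assumption)
    return per_metric, per_metric.sum()

# Hypothetical MAE / RMSE / MAPE values from 4 repeated split-train-test runs.
runs = [[4.0, 7.0, 0.5],
        [5.0, 8.0, 0.7],
        [4.0, 6.0, 0.5],
        [5.0, 7.0, 0.3]]
per_metric, total = variance_sum(runs)
print(per_metric, total)
```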
The variance sums of single-factor models such as BP, LSTM, and GRU are high, indicating that their predictions fluctuate considerably. This volatility is attributed to their reliance on a single indicator, which does not fully reflect the complexity of the water gushing process. In contrast, the improved SSA-RG-MHA models exhibit lower variance sums of 10.3 m³/h (single-factor) and 7.5 m³/h (multi-factor), indicating more stable predictions.
In particular, the multi-factor models generally have lower variance sums when predicting water inflow. The multi-factor BP, LSTM, and GRU models performed better than their single-factor counterparts, with variance sums of 10.50 m³/h for GRU and 13.30 m³/h for LSTM. The improved SSA-CG-Attention and improved SSA-RG-MHA multi-factor models showed the largest improvements, with variance sums of 8.90 m³/h and 7.50 m³/h, respectively. This suggests that these models learn not only the water inflow pattern but also the dynamic behavior of the groundwater level and the effect of microseismic activity on water inflow.
In summary, by integrating historical mine water inflow, borehole water level, and microseismic data, the multi-factor models more effectively capture the complex interactions and dynamic changes of the underground environment. Their variance sums are generally smaller than those of the single-factor models, so they provide more accurate and stable predictions of working face water inflow. This is of great significance for mine safety and water resource management.
4.4. Model Parameter Optimization
For model parameter optimization, this study uses an improved sparrow search algorithm (SSSA) to optimize the parameters of the deep learning model and thereby improve the accuracy of mine water inflow prediction. We compared the fitness convergence of a genetic algorithm (GA), particle swarm optimization (PSO), the sparrow search algorithm (SSA), and the SSSA during optimization. Each individual's fitness value was the sum of absolute evaluation errors on the training data. The results are shown in Figure 11.
The experimental results show that the SSSA converges after only 10 iterations and reaches a lower fitness value than the other optimization algorithms. This indicates that the model parameters it finds allow mine water inflow to be predicted more accurately and quickly. Although the other algorithms can eventually reach low fitness values, their convergence speed and stability fall short of the SSSA, which may cause prediction delays or reduced accuracy in practice.
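Two small helpers make explicit the fitness criterion described above (sum of absolute training errors) and the notion of "converging after 10 iterations". The tolerance `tol`, the helper names, and the example fitness curve are assumptions for illustration. None of them is taken from Figure 11.

```python
import numpy as np

def sae_fitness(y_true, y_pred):
    """Fitness as described in the text: sum of absolute training errors (lower is better)."""
    return float(np.sum(np.abs(np.asarray(y_pred, float) - np.asarray(y_true, float))))

def convergence_iteration(best_so_far, tol=1e-6):
    """Hypothetical helper: 1-based iteration after which the best-so-far fitness
    never changes by more than tol (tol is an assumed threshold)."""
    curve = np.asarray(best_so_far, dtype=float)
    idx = np.flatnonzero(np.abs(np.diff(curve)) > tol)   # steps where fitness still moved
    return 1 if idx.size == 0 else int(idx[-1]) + 2

# Hypothetical per-iteration fitness values, turned into a best-so-far curve.
curve = np.minimum.accumulate([50.0, 42.0, 45.0, 30.0, 30.0, 28.0, 28.0, 28.0])
print(convergence_iteration(curve))  # 6
```

Comparing this iteration count across GA, PSO, SSA, and SSSA runs on the same fitness function gives one concrete way to state which optimizer converges first.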
In summary, the model parameters optimized by the SSSA have a better effect on improving the prediction results of mine water inflow.
4.5. Comparison of Multi-Factor Prediction Models
The improved SSA-RG-MHA prediction model proposed in this study was compared with the SLP, MLP, SVR, GRU, and LSTM models. It was also compared with SSA-optimized GRU, LSTM, and CG-Attention models trained on the same dataset. The results are shown in Figure 12, and the evaluation indicators of each model are listed in Table 5.
As the results show, the MAEs of the SLP, SVR, LSTM, GRU, SSA-LSTM, SSA-GRU, SSA-CG-Attention, and improved SSA-RG-MHA models are 21.70, 14.60, 13.61, 12.81, 11.28, 8.09, 5.24, and 4.42 m³/h, respectively. The MAE of the improved SSA-RG-MHA model, only 4.42 m³/h, is clearly lower than those of the other models. This demonstrates the effectiveness of the improvements made to the SSA-RG-MHA model.
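The snippet below expresses the MAE gap in relative terms. It computes the percentage reduction of the improved SSA-RG-MHA model's MAE relative to each baseline, using only the MAE values listed above.

```python
# MAE values (m³/h) as listed in the text for the multi-factor comparison.
mae = {
    "SLP": 21.70, "SVR": 14.60, "LSTM": 13.61, "GRU": 12.81,
    "SSA-LSTM": 11.28, "SSA-GRU": 8.09, "SSA-CG-Attention": 5.24,
}
proposed = 4.42  # improved SSA-RG-MHA

# Relative reduction = (baseline - proposed) / baseline, in percent.
reductions = {name: 100.0 * (v - proposed) / v for name, v in mae.items()}
for name, pct in reductions.items():
    print(f"{name:>17}: {pct:5.1f}% lower MAE")
```

With these values, the improved model's MAE is roughly 15.6% lower than that of SSA-CG-Attention and about 79.6% lower than that of SLP.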
Notably, all models except the SSA-CG-Attention and improved SSA-RG-MHA models show a small time lag between the predicted and observed values. This is because these two models better learn the behavior at abrupt change points. This property also reflects the strength of the improved SSA-RG-MHA model in handling nonlinear problems and dynamic prediction, which is of great significance for mine warning and emergency response.
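The time lag mentioned above can be quantified by shifting the predicted series forward by k steps and picking the k that minimizes the MAE against the observations. This method is an assumption and is not taken from the paper. The function name, the maximum lag, and the synthetic series are hypothetical. Only non-negative lags (prediction trailing the observation) are checked.

```python
import numpy as np

def estimate_lag(observed, predicted, max_lag=5):
    """Hypothetical helper: shift k (in time steps) minimizing MAE between
    predicted[t + k] and observed[t]. k > 0 means the prediction lags behind.
    Assumes both series have the same length."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    n = len(obs)
    best_k, best_err = 0, np.inf
    for k in range(0, max_lag + 1):
        err = np.mean(np.abs(pred[k:] - obs[: n - k]))
        if err < best_err:          # strict '<' keeps the smallest k on ties
            best_k, best_err = k, err
    return best_k

# Synthetic example: the "prediction" simply repeats the observation from 2 steps earlier.
obs = 200.0 + 20.0 * np.sin(np.linspace(0.0, 6.0, 60))
pred = np.concatenate([obs[:2], obs[:-2]])
print(estimate_lag(obs, pred))  # 2
```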
Overall, the performance of the improved SSA-RG-MHA model in multi-factor mine inflow prediction not only validates its effectiveness but also highlights its potential advantages in dealing with complex forecasting tasks. These findings provide a new and efficient solution for mine water inflow prediction.
5. Implications of Multi-Factor Prediction of Water Inflow from the Working Face Based on an Improved SSA-RG-MHA Model
Commonly used methods for predicting water inflow include traditional hydrogeological calculations such as the large well method, the hydrogeological analogy method, and the analytical method.
The analytical method was used to predict the water inflow with Equation (17).
The calculated normal water inflow is 272.14 m³/h, and the maximum water inflow is 544.28 m³/h, which is twice the normal water inflow.
The area analogy method was also applied, using the similar 205 working face for comparison. The calculation formula is shown in Equation (18), where Q is the water inflow.
According to the area analogy method, the normal water inflow of the 207 working face is 200.6 m³/h. The maximum water inflow, taken as twice the normal value, is 401.2 m³/h.
Figure 13 illustrates a comparison of the computed results obtained through the analytical method and the area analogy method with the observed original water inflow data.
The comparison chart shows that conventional methods typically use the estimated normal water inflow as the predicted value for the entire inflow process. This often produces a large discrepancy between predicted and actual water inflow, giving poor predictive performance. These methods mainly calculate water inflow at specific characteristic moments or under particular equilibrium states rather than capturing how inflow varies over time. They therefore yield static results and cannot effectively support dynamic prediction or early warning of changes in water inflow.
Another common approach is numerical simulation, which simulates water inflow based on the hydrogeological conditions. Its accuracy is limited by the accuracy of the model parameters and by the model's simplifying assumptions. The complexity of the groundwater system can also cause the simulated results to differ from reality. Generalizing the hydrogeological conditions for such a model often introduces some deviation from the actual situation. The method also requires many parameters, including detailed boundary conditions, which in turn demands in-depth exploration of the study area. These requirements impose considerable limitations, especially when simulating a specific working face.
Using deep learning to predict water inflow is therefore a new approach to preventing mine water hazards. The multi-factor prediction of working face water inflow based on the improved SSA-RG-MHA model provides dynamic, data-driven prediction of water inflow. Deep learning-based water inflow prediction has made some progress. As the technology develops, however, multiple methods still need to be integrated to achieve higher prediction accuracy and reliability. Predictions based only on historical water inflow tend to be unstable. To avoid this, the multi-factor approach proposed in this paper predicts water inflow from data strongly correlated with it, addressing the shortcomings of models that rely solely on historical inflow data. This paper uses only some of the relevant data for multi-factor prediction, and other available data strongly correlated with water inflow can be added in further experiments. The results of this study may therefore serve as an intelligent template for mine water inflow prediction, supporting accurate management of mine water inflow.
6. Conclusions
The model proposed in this study is built on deep learning and integrates a residual network with an MHA mechanism. It considers the local features of the data while extracting temporal features, focuses on the most relevant input elements, and learns as many features of the input data as possible. In addition, an SSSA was used to optimize the model parameters. It quickly finds suitable network parameters in few iterations, which avoids convergence to local optima and improves prediction accuracy. Comparison with other models revealed the advantages of the improved SSA-RG-MHA model in prediction stability, prediction accuracy, and avoidance of unexpected results.
This study used a coal mine in Shanxi Province as a case study and analyzed the correlations among borehole water level data, microseismic data, and working face water inflow. The analysis showed that it is feasible to predict working face water inflow from borehole water level and microseismic monitoring data. These two quantities were therefore used as the characteristic variables of the prediction model. The improved SSA-RG-MHA model was compared with other single-factor and multi-factor models on the same dataset. The MAE, RMSE, and MAPE of the improved SSA-RG-MHA multi-factor water inflow prediction model are 4.42 m³/h, 7.17 m³/h, and 0.5, respectively. All three are lower than those of the other single-factor and multi-factor models, which demonstrates the superiority of the improved SSA-RG-MHA model. Moreover, the variance sum of the improved SSA-RG-MHA multi-factor model is 7.5 m³/h, lower than those of the other water inflow prediction models, confirming its stability advantage.
This study considered only two factors related to water inflow, namely borehole water level and microseismic data. Future research should incorporate additional influencing factors and apply this method to other cases, which may further improve the prediction model and its accuracy.