Article

A Novel Artificial Intelligence Prediction Process of Concrete Dam Deformation Based on a Stacking Model Fusion Method

1 The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing 210024, China
2 College of Water Conservancy and Hydropower Engineering, Hohai University, Nanjing 210024, China
3 Cooperative Innovation Center for Water Safety and Hydro Science, Hohai University, Nanjing 210024, China
4 Yunnan Provincial Key Laboratory of Water Resources and Hydropower Engineering Safety, Kunming 650051, China
5 Powerchina Kunming Engineering Corporation Limited, Kunming 650051, China
* Author to whom correspondence should be addressed.
Water 2024, 16(13), 1868; https://doi.org/10.3390/w16131868
Submission received: 8 May 2024 / Revised: 23 June 2024 / Accepted: 27 June 2024 / Published: 29 June 2024

Abstract

Deformation effectively represents the structural integrity of concrete dams and acts as a clear indicator of their operational performance. Predicting deformation is critical for monitoring the safety of hydraulic structures. To this end, this paper proposes an artificial intelligence-based process for predicting concrete dam deformation. Initially, using the principles of feature engineering, the preprocessing of deformation safety monitoring data is conducted. Subsequently, employing a stacking model fusion method, a novel prediction process embedded with multiple artificial intelligence algorithms is developed. Moreover, three new performance indicators—a superiority evaluation indicator, an accuracy evaluation indicator, and a generalization evaluation indicator—are introduced to provide a comprehensive assessment of the model’s effectiveness. Finally, an engineering example demonstrates that the ensemble artificial intelligence method proposed herein outperforms traditional statistical models and single machine learning models in both fitting and predictive accuracy, thereby providing a scientific and effective foundation for concrete dam deformation prediction and safety monitoring.

1. Introduction

Concrete dam projects are mostly built in high mountain canyons characterized by significant faults and deep overburden. These structures not only endure the cyclic action of various loads but also face deterioration from a complex and harsh environment. Their structural behavior in service is a nonlinear dynamic process, evolving through the interactions between materials and structures under the combined influence of various factors [1,2]. Deformation is a comprehensive reflection of the structural state of concrete dams and a crucial indicator for evaluating changes in that state and the long-term service health of the dam. Establishing an accurate and effective deformation prediction model is therefore vital for predicting the evolution of the structural state, conducting safety monitoring, and ensuring stable and efficient operation.
Three traditional approaches are used to describe and assess the performance of concrete dams through the analysis of real-time monitoring data: statistical models, deterministic models, and hybrid models based on causality [3,4]. These models are frequently employed to construct deformation prediction models and primarily consider the effects of hydrostatic pressure, ambient temperature, and time. In a statistical model, the effect quantity is divided into water pressure, temperature, and aging components, and a mathematical model is constructed by combining the measured dam data with statistical methods such as multiple linear regression. The advantages of statistical methods include simplicity of formulation and rapid execution. However, when variables exhibit multicollinearity or contain outliers, the performance of statistical regression models is often unsatisfactory. The deterministic model, which reflects the real operational behavior of the dam and its foundation, uses the finite element method to compute the load-induced fields of displacement, stress, and seepage, and then optimizes and fits these results against the measured values to refine the physical and mechanical parameters. Deterministic models require solving nonlinear partial differential equations, which are often complex or impossible to solve analytically, and their accuracy is difficult to ensure [5,6]. The hybrid model computes the water pressure component with the finite element method, retains the statistical model for the other components, and is then optimized and fitted against the measured values [7].
However, traditional models exhibit several limitations [2]. For instance, due to the strong correlation between water temperature and air temperature among input factors, the traditional statistical model struggles to accurately model this nonlinear effect. Traditional models are usually based on explicit assumptions about data distribution and hypotheses, often employing simpler and more interpretable models. In contrast, artificial intelligence (AI) methods do not impose strict distribution assumptions and are more data-driven. They rely on large datasets and iterative training to continuously optimize model parameters, thereby enhancing prediction accuracy. Additionally, AI methods are typically complex and highly nonlinear, which makes machine learning approaches more effective in handling large-scale, high-dimensional data. In recent years, artificial intelligence methods have been increasingly applied in various engineering fields [8,9,10]. Techniques such as support vector machines [11,12,13], random forests [14,15,16], and neural networks [17,18,19,20] have been incorporated into the concrete dam deformation prediction system.
The structure of ordinary neural networks must be modified by a predetermined algorithm during the training process, which does not guarantee that the optimal structure is achieved and often results in convergence to local optima. Support vector machines effectively handle practical challenges such as small samples, non-linearity, high dimensionality, and local minima, substantially mitigating the issues of “dimension disaster” and “overlearning”. However, due to the numerous and complex factors influencing concrete dam deformation, the accuracy of models based solely on support vector machines is often compromised. The random forest algorithm addresses the interactions and nonlinear relationships between variables, enhancing the accuracy of classification and regression tasks without significantly increasing computational demands. Nevertheless, it is susceptible to overfitting in the presence of noisy data. Some scholars [17,21,22] advocate model integration or strategy combination to improve prediction performance; however, because the individual models share the same training dataset, gains in generalization ability are often limited, the combined models offer little algorithmic synergy, and the theoretical basis is insufficient to convince users.
Moreover, research on concrete dam deformation prediction models frequently overlooks data preprocessing and comprehensive model performance evaluation. The process of collecting deformation monitoring data is often compromised by factors such as instrument failure and human error, leading to data issues such as missing, repeated, erroneous, and outlier values, which diminish the completeness and reliability of the monitoring data and may even result in misjudgments of the dam’s operational state. Therefore, developing effective monitoring data preprocessing methods is essential to enhancing the data’s completeness and reliability, thereby facilitating the establishment of prediction models based on accurate and reliable health status information. Additionally, a comprehensive model performance evaluation system is crucial for assessing various prediction models from multiple perspectives [3,18].
To enhance the integrity and reliability of displacement monitoring information, improve the accuracy and generalization ability of prediction models, and comprehensively evaluate model performance, this paper proposes a novel artificial intelligence-based concrete dam deformation prediction process. This process utilizes a stacking model fusion method combined with feature engineering. Initially, missing values in the displacement time series are addressed using the concept of temporal and spatial correlation among dam monitoring points. The wavelet analysis method is then employed to denoise the displacement time series, ensuring complete and accurate monitoring information and facilitating accurate reflection of the dam’s deformation safety status and the establishment of a reliable prediction model. Subsequently, using the stacking model fusion method, a two-layer stacking ensemble learning prediction process is constructed with XGBoost, Extremely Randomized Trees (Extra-Trees), and SVR as the first-layer base-learners, and multi-response linear regression (MLR) as the second-layer meta-learner. Finally, the measured deformation data from a concrete dam are used for training and prediction. The basic and novel performance evaluation indicators are applied to assess the predictive capabilities of different methods. The results show that the proposed process markedly enhances both accuracy and generalizability over traditional statistical models and single machine learning models, offering a robust foundation for monitoring deformation safety in concrete dams.

2. Materials and Methods

2.1. Feature Engineering of Deformation Monitoring Data

In the process of acquiring time series of monitoring data, various factors such as environmental conditions, instrumentation, data acquisition systems, and human error can degrade data quality, leading to issues such as missing values and noise, etc. These issues may prevent the data from accurately reflecting the actual operational behavior of the dam [23,24,25]. Therefore, filling missing values and reducing noise, coupled with performing feature engineering on dam monitoring data, are crucial for accurately constructing safety monitoring models and comprehensively evaluating operational performance.

2.1.1. Spatiotemporal Correlation Weighted Regression Interpolation Method

During the operation of concrete dams, critical monitoring data such as water level and displacement often go missing due to instrument failure and human error. The absence of this data can significantly impact the analysis of deformation behavior and the estimation of parameters for deformation prediction models. The common approaches to address missing data include deletion and filling methods. The deletion method can result in the loss of substantial monitoring information, potentially leading to errors in the evaluation of the dam’s safety state. Filling methods typically involve statistical techniques (such as mean and median filling) and machine learning techniques (such as SVR, EM, and KNN). However, statistical filling can reduce the variance of the data set, distort its distribution characteristics, and overlook the time series relationships and physical significance in the dam monitoring data. Machine learning filling may lead to overfitting in the prediction model, compromising the model’s robustness. Consequently, this paper proposes a spatiotemporal correlation weighted regression interpolation method.
Concrete dams are a quintessential example of large spatial structures with a certain degree of integrity at different scales. Consequently, the distribution of deformation in dams also exhibits continuity and similarity throughout the structure. Assuming that the material and structure of the dam are homogeneous and consistent, we illustrate the spatial correlation regression interpolation method for handling missing deformation values in concrete dams, using three points as examples. It is assumed that a concrete dam has three spatially adjacent and structurally related monitoring points, A, B, and C, as shown in Figure 1. Puh, Puv, Pdh, and Pdv represent upstream horizontal water pressure, upstream vertical water pressure, downstream horizontal water pressure, and downstream vertical water pressure, respectively.
Among them, the monitoring data for A and C are complete, while the monitoring data for B are partially missing, as depicted in Figure 2. Tl, Tm, and Tn represent the same time period in each year.
Due to the structural integrity of dams, the distribution of deformation values also exhibits continuity in space [26]. If the spatial proximities of measurement points A, B, and C are considered simultaneously, the missing value at measurement point B should be correlated with the deformation values at measurement points A and C during the missing period. Therefore, based on the deformation information from measurement points A and C, the deformation sequence of measurement point B can be expressed as follows:
$$\delta_B = g(\delta_A) + g(\delta_C) + \varepsilon$$
where $g(\delta_A)$ and $g(\delta_C)$ represent the functional relationships between $\delta_B$ and the deformation values at A and C, respectively, and $\varepsilon$ represents the allowable deviation (the same applies below). If the relationships between $\delta_A$ and $\delta_B$ and between $\delta_C$ and $\delta_B$ are expressed by polynomials, then the following expression applies:
$$\delta_B = \alpha_A \sum_{i=1}^{K_A} \lambda_{Ai}\,\delta_A^{\,i} + \alpha_C \sum_{i=1}^{K_C} \lambda_{Ci}\,\delta_C^{\,i} + \beta_B + \varepsilon$$
where $\alpha_A$ and $\alpha_C$ are the coefficients of the polynomials in $\delta_A$ and $\delta_C$, respectively, $\lambda_{Ai}$ and $\lambda_{Ci}$ are the corresponding polynomial term coefficients, and $K_A$ and $K_C$ represent the highest degrees of these polynomials. The values of $K_A$ and $K_C$ can be determined by plotting scatter diagrams to analyze the correlation of $\delta_B$ with $\delta_A$ and with $\delta_C$. The term $\beta_B$ represents a translation term.
Without loss of generality, for the deformation sequence $\delta_i(t)$ of any monitoring point $i$ with missing data, the following expression of spatial correlation applies:
$$\delta_S = \sum_{j=1}^{L} \alpha_{ij}\, g(\delta_j) + \beta_i + \varepsilon$$
where $L$ represents the number of points adjacent to $i$, determined by the spatial continuity between the locations of the points and the dam sections and by the similarity of the deformation distribution trends. $\delta_j$ represents the deformation sequence of the measurement point $j$ adjacent to point $i$, $g(\delta_j)$ denotes the correlation function between the deformation sequences $\delta_j(t)$ and $\delta_i(t)$, and $\beta_i$ represents the translation term for $i$.
Based on the sequences $\delta_i(t)$ and $\delta_j(t)$, the values of $\alpha_{ij}$ are estimated using the least squares method to define the expression of the spatial proximity regression interpolation model.
The single spatial proximity regression interpolation method accounts for the spatial continuity of different measurement points and the consistent trend in deformation value distribution, but it overlooks the fluctuation behavior of the individual measurement point itself. Moreover, spatially adjacent measurement points often have missing data over the same periods. Therefore, considering the periodic distribution over time of the measured values at a single point, the expression for the temporal correlation at a single measurement point is:
$$\delta_T = \sum_{k=1}^{M} \alpha_{tk}\, g(\delta_k) + \gamma_i + \varepsilon$$
where $M$ represents the number of time periods adjacent to time period $t$, $\delta_k$ represents the deformation sequence of a time period adjacent to $t$, $g(\delta_k)$ denotes the correlation function between the deformation sequences $\delta_k$ and $\delta_t$, and $\gamma_i$ represents the translation term for $i$.
Combining the spatial and temporal correlation expressions above, the spatiotemporal correlation weighted regression interpolation proposed in this paper is expressed as:
$$\delta = \frac{\bar{R}_S}{\bar{R}_S + \bar{R}_T}\,\delta_S + \frac{\bar{R}_T}{\bar{R}_S + \bar{R}_T}\,\delta_T$$
where $\bar{R}_S$ is the average correlation coefficient between the target sequence and the deformation time series of the spatially correlated measurement points over the same period, and $\bar{R}_T$ is the average correlation coefficient between the target sequence and the time series of the same measurement point over the temporally correlated periods.
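For illustration, the interpolation described above can be sketched in Python as follows. This is a minimal sketch rather than the implementation used in this study: it assumes the target series and the spatial and temporal reference series are pandas objects aligned on a common index, uses plain linear regressions for g(·), and blends the two estimates with the average correlation coefficients of the last equation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def spatiotemporal_fill(target, spatial_refs, temporal_refs):
    """Fill NaNs in `target` (pd.Series) using spatial neighbours and
    reference-period series of the same point (both pd.DataFrame aligned
    on the same index). A minimal sketch of the weighted interpolation."""
    missing = target.isna()
    # Rows usable for fitting: target and all reference series observed
    train = ~missing & spatial_refs.notna().all(axis=1) & temporal_refs.notna().all(axis=1)

    # Spatial regression: target point against neighbouring points in the same period
    reg_s = LinearRegression().fit(spatial_refs[train], target[train])
    delta_s = reg_s.predict(spatial_refs[missing])

    # Temporal regression: target point against its own values in adjacent periods
    reg_t = LinearRegression().fit(temporal_refs[train], target[train])
    delta_t = reg_t.predict(temporal_refs[missing])

    # Average correlation coefficients used as weights (R_S bar and R_T bar)
    r_s = spatial_refs[train].corrwith(target[train]).abs().mean()
    r_t = temporal_refs[train].corrwith(target[train]).abs().mean()

    filled = target.copy()
    filled[missing] = (r_s * delta_s + r_t * delta_t) / (r_s + r_t)
    return filled
```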

2.1.2. Wavelet Analysis Denoising Method of Monitoring Data

Wavelet transform is a multi-scale analysis method that exhibits strong local recognition capabilities in both the time and frequency domains. In the wavelet domain, the effective signal of dam safety monitoring data is represented by large wavelet coefficients, while noise, characterized by randomness and discontinuity in the time domain, has small coefficients. Based on these characteristics, an appropriate threshold can be determined to nullify noise coefficients while preserving the effective signal, thereby achieving noise reduction. Traditional threshold determination methods include the hard threshold and soft threshold methods. Currently, commonly employed methods for estimating thresholds include sqtwolog, rigrsure, heursure, and minimaxi.
In this paper, the hard threshold and soft threshold methods, along with the four threshold estimation methods mentioned previously, are employed to reduce noise. The root mean square error (RMSE) and signal-to-noise ratio (SNR) after noise reduction are calculated, the indicators after noise reduction are compared, and the method combination that provides the best noise reduction effect is selected.
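A compact sketch of this denoising step is given below, assuming the PyWavelets library, a db4 wavelet, and the sqtwolog (universal) threshold rule; the rigrsure, heursure, and minimaxi rules follow MATLAB's wden conventions and are omitted for brevity. The wavelet and parameter choices are illustrative assumptions, not the settings used in this study.

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db4", level=4, mode="hard"):
    """Sketch of wavelet threshold denoising with the universal (sqtwolog) rule."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise level estimated from the finest-scale detail coefficients
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(signal)))              # sqtwolog threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode=mode) for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

def rmse(raw, denoised):
    raw, denoised = np.asarray(raw), np.asarray(denoised)
    return float(np.sqrt(np.mean((raw - denoised) ** 2)))

def snr(raw, denoised):
    # Signal-to-noise ratio in dB, with the denoised series taken as the signal estimate
    raw, denoised = np.asarray(raw), np.asarray(denoised)
    return float(10 * np.log10(np.sum(denoised ** 2) / np.sum((raw - denoised) ** 2)))
```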

2.2. Stacking Ensemble Learning Prediction Process

Stacking is a method known for its simple structure, high performance, and strong classification capability [27]. Essentially, stacking resembles the hierarchical structure characteristic of neural networks, with its effectiveness largely derived from feature extraction [28,29]. Thus, the enhancement of learning ability through stacking primarily results from the cumulative learning capabilities of different classifiers for various features. As depicted in Figure 3, under the ensemble learning approach based on stacking, the construction of the model is divided into two steps. Prediction accuracy is enhanced by cascading the prediction results among learners. In the first step, the original dataset is divided into a training set and a test set in a predetermined proportion, and suitable base learners are chosen to train the training set via cross-validation. Each base learner predicts the outcomes for the validation set and the test set. In the second step, the prediction outcomes from the base learners are used as feature data for training and predicting with the meta-learner. This meta-learner consolidates the features gathered in the earlier stage and the labels from the original training set to construct the model. Ultimately, it delivers the final prediction results of the stacking method.
In the first step, base learners with excellent predictive performance should be chosen while ensuring diversity among the models. In the second step, the meta-learner is typically a straightforward model with robust stability, which enhances the overall performance of the ensemble.
The model is based on the concept of K-fold cross-validation, typically setting K to 5. This approach effectively avoids the overfitting phenomenon that can occur due to a limited amount of data. The detailed steps for implementing the stacking algorithm are as follows:
  • The original dataset is split into a training set T and a test set D in a specified proportion.
  • The training set (T) is randomly divided into five equal parts: T1, T2, T3, T4, and T5. Each subset Ti (i = 1, 2, 3, 4, 5) serves as the validation set, with the remaining subsets combined as the training set for model training. This setup is used to predict outcomes for both the validation and test sets.
  • After completing the 5-fold cross-validation, the first base learner compiles the predicted values for each validation set and the test set into separate columns. These predictions are then consolidated: the predictions from each validation set are merged into a single column denoted as A1, and the average of the predictions for the test set is calculated and recorded as B1, as depicted in Figure 4.
  • Upon completion of the training for all base learners, the input feature matrix for the meta-learner model, denoted as $\tilde{T} = (A_1, A_2, \cdots, A_n)$, and that for the final test set, denoted as $\tilde{D} = (B_1, B_2, \cdots, B_n)$, are established. The label values from the original training set are used as the output matrix for the model. The meta-learner then uses these matrices to generate the final prediction result after model fusion, as illustrated in Figure 5.
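The procedure listed above can be illustrated with scikit-learn, where cross_val_predict supplies the out-of-fold columns A1–An and fold-averaged test predictions supply the columns B1–Bn. The following sketch is an illustrative reimplementation of these steps, not the authors' code; the learner objects are placeholders passed in by the caller.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold, cross_val_predict

def stacking_fit_predict(base_learners, meta_learner, X_train, y_train, X_test, k=5):
    """Two-layer stacking as described: out-of-fold predictions give the
    meta-features A1..An; fold-averaged test predictions give B1..Bn."""
    X_train, y_train, X_test = map(np.asarray, (X_train, y_train, X_test))
    kf = KFold(n_splits=k, shuffle=True, random_state=0)

    train_meta, test_meta = [], []
    for model in base_learners:
        # Ai: predictions on each validation fold, stitched back into one column
        train_meta.append(cross_val_predict(clone(model), X_train, y_train, cv=kf))
        # Bi: average of the k fold-models' predictions on the test set
        fold_preds = [clone(model).fit(X_train[tr], y_train[tr]).predict(X_test)
                      for tr, _ in kf.split(X_train)]
        test_meta.append(np.mean(fold_preds, axis=0))

    # Second layer: meta-learner trained on (A1..An, y) and applied to (B1..Bn)
    meta = clone(meta_learner).fit(np.column_stack(train_meta), y_train)
    return meta.predict(np.column_stack(test_meta))
```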

2.3. Stacking Ensemble Model-Based Learner

2.3.1. XGBoost

XGBoost, proposed by Chen and Guestrin [30] in 2016, is an enhancement of the serial ensemble machine learning algorithm known as the gradient-boosting machine (GBM). The core concept is to amalgamate hundreds of tree models, each with low prediction accuracy, into a highly accurate composite model. During the generation of each tree, the gradient descent method is employed to guide the optimization process towards minimizing a specified objective function. Building on the GBM, XGBoost performs a second-order Taylor expansion of the loss function, allowing for customization of the loss function and improved prediction accuracy. Its capability for automatic multithreading on CPUs enhances the algorithm’s execution speed. Due to its exceptional performance, XGBoost has been widely adopted in Kaggle competitions and various other machine learning contests, consistently delivering impressive results. Figure 6 illustrates the schematic diagram of XGBoost.

2.3.2. Extra-Trees

Extra-Trees, proposed by Geurts et al. [31] in 2006 after extensive experimental research, is a parallel ensemble machine learning method and a variant of the Random Forest algorithm. It constructs multiple decision trees using random node splitting and uses the weighted average of the trees for regression prediction. Unlike Random Forest, Extra-Trees does not search for the best split within a subset of features at each node; instead, a feature and a threshold are selected at random as the basis for each node’s division. This dual randomness gives Extra-Trees a stronger generalization capability than Random Forest. Moreover, the method can handle high-dimensional data and complex nonlinear relationships between multiple variables, and it is robust against outliers. In recent years, it has been widely used in areas such as agriculture, engineering, medicine, and finance for feature selection and outlier detection. Figure 7 illustrates the schematic diagram of Extra-Trees.

2.3.3. SVR

In 1995, Vapnik [32] proposed the regression support vector machine model SVR. Based on the structural risk minimization criterion, SVR exhibits global optimality and robust generalization capabilities. It effectively addresses the issues of overfitting and the “dimension disaster” that are common in methods based on the empirical risk minimization principle, such as the BP neural network. Through the nonlinear transformation enabled by the inner product kernel function, SVR converts the input variables from low-dimensional space to high-dimensional space. This transformation turns the nonlinear problem into a linear one, facilitating the discovery of linear relationships between input and output variables in high-dimensional space, ultimately finding the optimal separation hyperplane. The schematic diagram of SVR is illustrated in Figure 8.
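For concreteness, the three base learners described above can be instantiated with the xgboost and scikit-learn packages as shown below; the hyperparameter values are illustrative placeholders rather than the optimized values reported later in this paper.

```python
from xgboost import XGBRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.svm import SVR

base_learners = [
    # Boosting: second-order gradient boosting over shallow regression trees
    XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05, subsample=0.8),
    # Bagging variant: random split thresholds on randomly chosen features
    ExtraTreesRegressor(n_estimators=300, min_samples_leaf=2, random_state=0),
    # Kernel regression with an epsilon-insensitive loss
    SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma="scale"),
]
```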

2.4. Model Performance Evaluation System

Performance evaluation indicators play a crucial guiding role in comparing and selecting different models. However, previous model evaluations often relied solely on single indicators such as the Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Square Error (RMSE). This approach did not allow for a comprehensive and categorized analysis and discussion of model performance, and often overlooked the generalization abilities of the models, which is detrimental to the comprehensive evaluation of the prediction models [33]. Therefore, designing scientific performance evaluation indicators for dam deformation prediction models and establishing a comprehensive performance evaluation framework are critical challenges that need to be addressed in concrete dam deformation prediction.

2.4.1. Superiority Evaluation Indicator

The superiority of a prediction model is demonstrated through enhanced prediction accuracy compared to previous models. In past research, differences in model performance were qualitatively explained by comparing MAE and MSE values between models, but the variations in performance were not studied quantitatively. Therefore, from a quantitative comparison perspective, the Superiority Evaluation Indicator (SEI) for concrete dam deformation prediction models is established by selecting two metrics: the Relative Mean Absolute Error (RMAE) and the Percentage Better Mean Absolute Error (PB(MAE)) [34]. The formulas for the indicator are as follows:
$$SEI_1 = RMAE = \frac{MAE}{MAE'}$$
$$SEI_2 = PB(MAE) = 100\% \times \frac{1}{n}\sum_{i=1}^{n} I\{AE_i < AE'_i\}$$
where $MAE$ and $MAE'$ represent the mean absolute errors of the prediction model and the reference model used for comparison, respectively, and $AE_i$ and $AE'_i$ are the corresponding absolute errors at time point $i$. $I$ is an indicator operator that takes the value 0 or 1 according to the following expression:
$$I(AE) = \begin{cases} 0, & \text{if } AE_i < AE'_i \\ 1, & \text{otherwise} \end{cases}$$
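The two superiority metrics translate directly into code. The sketch below assumes y_true, y_pred (candidate model), and y_ref (reference model, e.g., HST) are aligned NumPy arrays; it is an illustration of the definitions above, not the evaluation code used in this study.

```python
import numpy as np

def sei(y_true, y_pred, y_ref):
    """Superiority indicators against a reference model (smaller is better)."""
    ae_pred = np.abs(y_true - y_pred)        # AE_i of the candidate model
    ae_ref = np.abs(y_true - y_ref)          # AE'_i of the reference model
    sei1 = ae_pred.mean() / ae_ref.mean()    # RMAE = MAE / MAE'
    sei2 = np.mean(ae_pred >= ae_ref)        # PB(MAE): share of points not improved
    return sei1, sei2
```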

2.4.2. Accuracy Evaluation Indicator

The accuracy of a prediction model is determined by the consistency between the predicted values derived from model training and the actual measured values. When the monitoring data of a concrete dam include values equal or close to 0, the mean absolute percentage error (MAPE) may become excessively large or incalculable, and the MSE is sensitive to the scale of the data and varies across data subsets. Consequently, this paper utilizes two improved metrics, the Mean Arctangent Absolute Percentage Error (MAAPE) [34] and the Normalized Root Mean Square Error (nRMSE), to establish the Accuracy Evaluation Indicator (AEI) for concrete dam deformation prediction models. The formulas for the indicator are as follows:
$$AEI_1 = nRMSE = \frac{1}{\tilde{y}}\,RMSE = \frac{1}{\tilde{y}}\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - f_i)^2}$$
$$AEI_2 = MAAPE = \frac{1}{n}\sum_{i=1}^{n}\arctan\left(\left|\frac{y_i - f_i}{y_i}\right|\right)$$
where $\tilde{y}$ is the normalization factor, and $y_i$ and $f_i$ denote the measured and predicted values, respectively.

2.4.3. Generalization Evaluation Indicator

The generalization of a prediction model refers to the model’s adaptability when applied to a test set after being trained on a training set. Typically, overfitting is mitigated and the model’s generalization ability is enhanced by increasing the amount of training data and incorporating a regularization term into the model’s loss function. In this paper, the Generalization Evaluation Indicator (GEI) is established using the following formula:
$$GEI = \left|\frac{nRMSE_P}{nRMSE_T}\right|$$
where the subscripts P and T denote the prediction (testing) process and the training process, respectively; that is, $nRMSE_P$ and $nRMSE_T$ are the nRMSE values obtained in the prediction process and the training process.
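The accuracy and generalization indicators can be computed as sketched below. The normalization factor ỹ is taken here as the range of the measured series, which is an assumption since the paper does not fix it, and GEI is computed as the ratio of prediction-stage to training-stage nRMSE in line with the formula above.

```python
import numpy as np

def nrmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    norm = y_true.max() - y_true.min()        # normalization factor (assumed: data range)
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / norm

def maape(y_true, y_pred):
    # Bounded percentage error; measured values are assumed non-zero
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.arctan(np.abs((y_true - y_pred) / y_true)))

def gei(y_train, fit_train, y_test, pred_test):
    """Generalization indicator: test nRMSE relative to training nRMSE."""
    return nrmse(y_test, pred_test) / nrmse(y_train, fit_train)
```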

3. Design and Implementation of the Model

3.1. Project Overview

Construction of the concrete gravity dam began in 1973 and was completed in September 1986. The dam crest elevation is 138.8 m, the lowest bedrock elevation is 86 m, the maximum dam height is 52.8 m, and the main dam length is 1354 m. The normal pool level is 133 m, and the total storage capacity is 320 million cubic meters. The dam is divided into 82 sections, comprising 36 non-overflow sections, 29 overflow sections, 8 flood discharge bottom outlet sections, 1 riverbed power station section, 2 headwork power station sections, 2 drainage bottom outlet sections, 2 east headwork power station sections, and 2 elevator shaft sections.
Deformation monitoring, which covers horizontal and vertical displacement, is currently one of the most important monitoring items of the dam. Three normal vertical lines (PL measurement points) are positioned at sections 1#, 1 k#, and 30#; seven inverted vertical lines (IP measurement points) are positioned at sections XT3, 1#, 14#, 19#, 30#, 56#, and 58#; and sets of extension lines are positioned at an elevation of 110 m on the dam body (EXL measurement points) and at 138.8 m on the dam crest (EXD measurement points). The horizontal displacement monitoring system of the concrete dam comprises the normal vertical lines, inverted vertical lines, and extension lines, and is depicted in Figure 9.

3.2. Model Sample and Input Factor Selection

Monitoring data indicate that the horizontal displacement of the concrete dam adheres to the general deformation law of concrete gravity dams. Specifically, the amplitude of variation is large in the riverbed dam sections, smaller in the bank slope dam sections, larger at higher elevations, and smaller at lower elevations. Given the horizontal displacement distribution law of the concrete dam and its hydrogeological conditions, the measurement points EXD-18 at the crest of riverbed overflow dam section 18# and EXD-D7 at the crest of diversion bottom outlet dam section D7 have been selected as the focal points for analysis in this paper. This analysis aims to evaluate the model’s superiority, accuracy, and generalization.
The HST analysis model is the most widely used quantitative interpretation model in dam deformation analysis. The model predicts the horizontal displacement of concrete dams using long-term monitoring information and mathematical statistical analysis. Based on the specific engineering characteristics and operational state of the dam discussed in this paper, and in conjunction with the HST model, the displacement is decomposed into a water pressure component $\delta_H$, a temperature component $\delta_T$, and an aging component $\delta_\theta$:
$$\delta_H = \sum_{i=1}^{3} a_i\left(H^i - H_0^i\right)$$
$$\delta_T = \sum_{i=1}^{2}\left[b_{1i}\left(\sin\frac{2\pi i t}{365} - \sin\frac{2\pi i t_0}{365}\right) + b_{2i}\left(\cos\frac{2\pi i t}{365} - \cos\frac{2\pi i t_0}{365}\right)\right]$$
$$\delta_\theta = c_1(\theta - \theta_0) + c_2(\ln\theta - \ln\theta_0)$$
Combining the three formulas above, the estimated displacement $\hat{\delta}$ is calculated as follows:
$$\hat{\delta} = a_0 + \sum_{i=1}^{3} a_i\left(H^i - H_0^i\right) + \sum_{i=1}^{2}\left[b_{1i}\left(\sin\frac{2\pi i t}{365} - \sin\frac{2\pi i t_0}{365}\right) + b_{2i}\left(\cos\frac{2\pi i t}{365} - \cos\frac{2\pi i t_0}{365}\right)\right] + c_1(\theta - \theta_0) + c_2(\ln\theta - \ln\theta_0)$$
where $H$ is the upstream water depth on the monitoring day; $H_0$ is the upstream water depth on the starting date; $t$ is the cumulative number of days from the monitoring date to the starting date; $t_0$ is the cumulative number of days from the first day of the monitoring period to the starting date; $\theta$ is $t$ divided by 100; $\theta_0$ is $t_0$ divided by 100; $h_a$ is the dam foundation elevation of a dam section; $a_0$ is a constant term; and $a_i$, $b_{1i}$, $b_{2i}$, $c_1$, and $c_2$ are regression coefficients.
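A sketch of how the HST input factors can be assembled from the water depth and date series is given below; the column name, the handling of the starting date, and the pandas-based workflow are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def hst_factors(df, start_date, depth_col="upstream_depth"):
    """Build the HST input factors (water pressure, temperature, and aging terms).
    `df` is assumed to have a DatetimeIndex and an upstream water depth column;
    the start date is assumed to precede the first monitoring day so that theta > 0."""
    h = df[depth_col].to_numpy()
    h0 = h[0]                                              # depth on the first monitoring day
    t = (df.index - pd.Timestamp(start_date)).days.to_numpy().astype(float)
    t0 = t[0]
    theta, theta0 = t / 100.0, t0 / 100.0

    X = {}
    for i in (1, 2, 3):                                    # water pressure component
        X[f"H{i}"] = h ** i - h0 ** i
    for i in (1, 2):                                       # annual / semi-annual harmonics
        X[f"sin{i}"] = np.sin(2 * np.pi * i * t / 365) - np.sin(2 * np.pi * i * t0 / 365)
        X[f"cos{i}"] = np.cos(2 * np.pi * i * t / 365) - np.cos(2 * np.pi * i * t0 / 365)
    X["theta"] = theta - theta0                            # aging component (linear term)
    X["ln_theta"] = np.log(theta) - np.log(theta0)         # aging component (logarithmic term)
    return pd.DataFrame(X, index=df.index)
```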
In this paper, 1958 groups of monitoring data in the dam deformation time series, from 1 January 2013 to 12 May 2018, are selected as the training set, and 346 groups of data from 13 May 2018 to 23 April 2019 are selected as the test set.
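Assuming the preprocessed series is stored in a pandas DataFrame with a DatetimeIndex, the chronological split described above, combined with the hypothetical hst_factors helper sketched earlier, can be written as follows.

```python
import pandas as pd

# df: preprocessed series with a DatetimeIndex and a "displacement" column (names are placeholders)
X = hst_factors(df, start_date="2013-01-01")   # HST input factors from the sketch above
y = df["displacement"]

X_train, y_train = X.loc[:"2018-05-12"], y.loc[:"2018-05-12"]    # 1958 training records
X_test, y_test = X.loc["2018-05-13":], y.loc["2018-05-13":]      # 346 test records
```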

3.3. Model Implementation

3.3.1. Feature Engineering

Based on the concept of feature engineering, the initial step involves removing gross errors from the concrete dam displacement time series using the widely applied Pauta criterion in practical engineering. This process eliminates apparent outliers and sudden jumps in the data. Given the prevalence of missing values in the time series, which hinder the research and analysis of dam deformation behavior and the estimation of parameters for deformation prediction models, a spatiotemporal correlation weighted regression interpolation model is proposed. This model aims to address missing values by considering fluctuations in space and time of the monitoring data.
Additionally, this paper leverages the multi-scale localization capability of the wavelet transform for noise reduction in dam safety monitoring data. Two threshold determination methods (hard threshold and soft threshold) and four threshold estimation methods (sqtwolog, rigrsure, heursure, and minimaxi) are employed to reduce the noise in the concrete dam deformation monitoring signals. The RMSE and SNR, two noise reduction indicators, are calculated and compared to determine the method combination that achieves the most effective noise reduction.

3.3.2. Stacking Ensemble Learning Prediction Process

To ensure the effectiveness of the stacking model fusion method, it is crucial in the first step to select proper models while ensuring diversity and complementarity between models. In the second step, the meta-learner generally opts for a simple model with good stability to enhance the overall performance of the system.
Bagging family algorithms, such as Random Forest and Extra-Trees, enhance performance by reducing the model’s variance. In contrast, boosting family algorithms such as Gradient Boosting Decision Trees (GBDT) and XGBoost enhance performance by reducing the model’s bias. In this paper, XGBoost (2.0.2), based on the boosting strategy, Extra-Trees, based on the bagging strategy, and SVR, which is widely used in practical engineering, are selected as the base learners for the stacking ensemble learning prediction process. Grey Wolf Optimization is employed to optimize the hyperparameters that influence the performance of the algorithm. A Multiple Linear Regression (MLR) algorithm, known for its strong stability, is selected as the meta-learner. The displacement influence factor from the HST model is used as the input for the model, with the horizontal displacement time series as the output. This study predicts the displacement time series of the concrete dam.
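Putting the pieces together, the prediction process can be approximated with scikit-learn's StackingRegressor, using the HST factors as inputs and multiple linear regression as the meta-learner. The hyperparameters shown are placeholders standing in for the Grey-Wolf-optimized values; this is a sketch of the architecture, not the authors' implementation.

```python
from sklearn.ensemble import StackingRegressor, ExtraTreesRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.svm import SVR
from xgboost import XGBRegressor

stack = StackingRegressor(
    estimators=[
        ("xgb", XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)),
        ("extra_trees", ExtraTreesRegressor(n_estimators=300, random_state=0)),
        ("svr", SVR(kernel="rbf", C=10.0, epsilon=0.01)),
    ],
    final_estimator=LinearRegression(),                    # MLR meta-learner
    cv=KFold(n_splits=5, shuffle=True, random_state=0),    # 5-fold out-of-fold features
)

# X_train/X_test are the HST factor matrices and y_train the displacement series:
# stack.fit(X_train, y_train)
# y_pred = stack.predict(X_test)
```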

3.3.3. Model Performance Evaluation

To evaluate the performance of the prediction method combined with feature engineering proposed in this paper, we have introduced three evaluation indicators: superiority, accuracy, and generalization. These are designed to provide a thorough and comprehensive assessment of the model.
Following data training that incorporates feature engineering, we compared basic evaluation indicators (MAE, MSE, and RMSE) and the superiority, accuracy, and generalization indicators between each individual model (XGBoost, Extra-Trees, SVR, and HST) and the stacking ensemble learning prediction process. This comparison helps verify the superiority of the proposed process in terms of fitting and forecasting accuracy.
The workflow chart described in this paper is illustrated in Figure 10.

4. Results and Discussion

4.1. Feature Engineering of Deformation Monitoring Data

4.1.1. Monitoring Data Spatiotemporal Correlation Weighted Regression Interpolation

Taking measurement point EXD-18 as an example, nearby measurement points EXD-2, EXD-3, and EXD-4, which are spatially close to EXD-18 and exhibit roughly the same small-scale fluctuations in measured values, are selected as the spatial reference series for the spatiotemporal correlation regression interpolation method. Additionally, the data from 2013, 2015, and 2017, which have a relatively complete series of measured values, are chosen as the temporal reference points for the 2014 data, which have a higher incidence of missing measurements. The correlation between each reference sequence and the EXD-18 sequence is illustrated in Figure 11 and Figure 12.
During the operation of the dam, each dam section is uniformly affected by environmental factors. Concurrently, the monitoring instruments at each measurement point may fail, resulting in absent measured values. For the time period requiring interpolation, the spatiotemporal correlation regression interpolation method (STCM) is adopted when the referenced measurement points and periods are available, ensuring that both temporal and spatial correlations are considered. If the referenced measurement points are missing, the time correlation regression interpolation method (TCM) is employed; if the referenced periods are missing, the spatial correlation regression interpolation (SCM) method is utilized.
The interpolation expressions and error indicators for each interpolation period of the EXD-18 and EXD-D7 measurement points are detailed in Table 1 and Table 2. For EXD-18, x1–x6 represent, respectively, the measured value sequences of the point itself in 2013, 2015, and 2017 and the measured value sequences of EXD-2, EXD-3, and EXD-4 in the year to be interpolated. For EXD-D7, x0–x6 represent, respectively, the measured value sequences of the point itself in 2013, 2016, 2017, and 2018 and the measured value sequences of EXD-2, EXD-3, and EXD-4 in the year to be interpolated.
The error indicators demonstrate that, generally, the spatiotemporal correlation regression interpolation method outperforms the spatial correlation regression interpolation method, which in turn is superior to the temporal correlation regression interpolation method. From the perspective of concrete dam operations, the operational conditions among different measurement points within the same year are identical, whereas the conditions across different years are broadly similar. This results in better synchronization of displacement curves among different measurement points within the same year compared to displacement curves of the same measurement point across different years. From the perspective of the measured data, it is evident from the missing sequence at the first section of the EXD-18 measurement point that the coefficient before the displacement time series of the same measurement point in different years is significantly smaller than that before the time series of different measurement points in the same year. This indicates that the correlation between displacement time series and target series at the same measurement point across different years is much weaker than that between time series at different measurement points in the same year and the target series.
Therefore, when selecting independent variable time series for regression analysis, those with a more pronounced correlation to the target series should be chosen, considering both temporal correlation and spatial continuity.

4.1.2. Denoising of Monitoring Data

The hard threshold method and soft threshold method are employed to reduce the noise in sequences. Utilizing the four threshold determination methods, the noise reduction thresholds for different high-frequency subsequences are calculated, as detailed in Table 3.
It is evident from Table 3 that the rigrsure–hard combination (rigrsure threshold estimation with the hard threshold function) yields the smallest RMSE, the highest SNR, and the most effective denoising result. Consequently, this combination has been chosen to denoise the original observation signal.

4.2. Analysis of Modeling Characteristics of Prediction Models

Using measurement point EXD-18 as an example, an upstream water level–temperature–displacement hydrograph has been drawn, as depicted in Figure 13. Analysis of this Figure reveals that the horizontal displacement at this measurement point in the water flow direction exhibits a clear periodic variation pattern, which correlates strongly with changes in environmental temperature and upstream reservoir water levels. As the reservoir water level rises, the displacement of the dam towards the downstream direction increases; conversely, as the water level decreases, the displacement of the dam towards the downstream direction diminishes. Similarly, as the temperature rises, the displacement of the dam upstream increases; inversely, as the temperature decreases, the downstream displacement increases. Overall, the horizontal displacement in the water flow direction of the dam demonstrates a stable and significant periodic variation pattern.
The horizontal displacement measurement points EXD-18, in the water flow direction of riverbed overflow dam section 18#, and EXD-D7, in the water flow direction of the diversion bottom outlet dam section, are designated as the focal points of analysis in this article. A total of 1958 sets of measured data from 1 January 2013 to 12 May 2018 have been selected as the training set, while 346 sets of measured data from 13 May 2018 to 23 April 2019 constitute the test set. The models were trained using the statistical model HST, the machine learning models XGBoost, Extra-Trees, and SVR, and the stacking model fusion method proposed in this paper. The Grey Wolf optimization algorithm, a biomimetic intelligent algorithm, was employed to optimize the hyperparameter space of each sub-model within the stacking method. The optimal hyperparameters are listed in Table 4.

4.2.1. Training and Prediction Process Analysis of Dam Deformation Prediction Models

The training and prediction processes of the time series for EXD-18 and EXD-D7, corresponding to the HST, Extra-Trees, XGBoost, SVR, and the stacking method proposed in this paper, are illustrated in Figure 14 and Figure 15, respectively.
From Figure 14 and Figure 15, it is apparent that the fitting and prediction residuals of the four machine learning models at the two measurement points are relatively small, and their performance exceeds that of the traditional HST model. The efficacy of different models varies across various time series datasets. For instance, during the fitting process for the time series at the two measurement points, the Extra-Trees model and the prediction process proposed in this paper exhibit the best fitting capabilities for the EXD-18 measurement point. At the EXD-D7 measurement point, the Extra-Trees, XGBoost, and the proposed process all demonstrate strong fitting abilities. These differences can be attributed to the distinct distribution and statistical characteristics of the datasets. Moreover, it can be clearly seen from the figures that the prediction residuals of the prediction process proposed in this paper are relatively small, indicating the best prediction performance and a significant improvement over the other four models.
To better demonstrate and compare the prediction performance of various models, the prediction results at EXD-18 of each model and their linear regression analysis graphs were produced. Additionally, to more clearly showcase the level of prediction accuracy, a histogram of the absolute values of the prediction residuals was created.
From Figure 16, it is clear that the HST model exhibits the largest prediction residuals and the poorest prediction performance, whereas the prediction process proposed in this paper has the smallest prediction residuals and the best prediction performance. The residual distributions of the individual machine learning models (XGBoost, Extra-Trees, and SVR) are relatively uneven, and their predictive abilities are average. Regarding deformation trends, HST, XGBoost, Extra-Trees, and SVR all show significant deviations in their predictions at various stages, which the proposed prediction process substantially reduces. In terms of deformation peaks and local extremes, HST, XGBoost, Extra-Trees, and SVR show large prediction errors and limited characterization ability, whereas the proposed process exhibits small prediction errors and robust characterization ability. The time series of dam deformation is non-stationary and nonlinear, and its peaks and local extremes carry valuable information about the underlying impact mechanisms, so capturing them accurately is essential. Overall, the prediction process proposed in this paper outperforms the individual models in both overall trend prediction and local detail capture, and it can accurately and effectively predict the changing laws of dam deformation.
The prediction results of the proposed process at measurement points EXD-2, EXD-14, and EXD-25 are presented in Figure A1 of Appendix A.

4.2.2. Training and Prediction Residual Analysis of Dam Deformation Prediction Models

Figure 17 and Figure 18 display the Raincloud plots for residuals and the absolute value of residuals, respectively. From Figure 17, it is apparent that the residuals of the prediction process proposed in this paper are more concentrated than those of other models in terms of time series prediction performance at the two measurement points. Almost all residuals are distributed within 1.5 times of the interquartile range, with only a few outliers, and the median residual is close to zero. Figure 18 illustrates more clearly that the absolute value of residuals of the prediction process in this paper is relatively minimal, indicating superior prediction accuracy at each point in the time series prediction process compared to other models. Therefore, the prediction process proposed in this paper demonstrates better prediction accuracy and stability relative to other models.

4.2.3. Analysis of Evaluation Indicators of Dam Deformation Prediction Models

Table 5 and Table 6 display the basic performance evaluation indicators (MAE, MSE, and RMSE) for different prediction models at the EXD-18 and EXD-D7 measurement points, respectively. These three indicators are utilized to assess the model’s performance at the two points. According to the tables, the prediction process proposed in this article has MAE values of 0.1007 and 0.0906 for EXD-18 and EXD-D7 during the training process, respectively, while the MSE values are 0.0207 and 0.0164, respectively. During the prediction process, the MAE values are 0.1249 and 0.2153 and the MSE values are 0.0242 and 0.0784, respectively. This demonstrates that the prediction process’s fitting and prediction accuracies are superior to those of the HST model and other single machine learning models.
Due to the strong numerical scale dependence of the MAE and MSE indicators, comparing the performance of the same model at different measurement points based solely on these values is not straightforward. The three model evaluation indicators proposed in this paper provide a more comprehensive and objective evaluation of the model’s performance.
Table 7 and Table 8 present the proposed evaluation indicators for different prediction models at the EXD-18 and EXD-D7 measurement points, respectively.
SEI1 and SEI2 utilize a set of indicators for comparing the performance against the HST model, which is the most commonly used model for predicting the deformation of concrete dams and serves as the benchmark. These indicators demonstrate the relative advancement of the comparative models against this benchmark. The value range for SEI1 and SEI2 is [0, 1], with values closer to 0 indicating a model that is more advanced relative to the benchmark. SEI1 and SEI2 have threshold values of 1 and 0.5, respectively. Values below these thresholds suggest that the model’s predictive ability is more advanced compared to the benchmark; otherwise, it suggests reduced capability.
AEI1 and AEI2 are designed to avoid numerical scale dependence, with a value range also of [0, 1]. Values closer to 0 signify greater accuracy. GEI represents the generalization ability of the model, with a GEI value closer to 1 when the accuracy and superiority of the model are maximized, indicating stronger generalization ability.
From the tables, it is evident that SEI1, SEI2, AEI1, and AEI2 for the prediction process in this paper are 0.1489, 0.0607, 0.0146, and 0.0808 for EXD-18 and 0.3601, 0.2312, 0.0298, and 0.1128 for EXD-D7, respectively. These values demonstrate that the prediction process’s accuracy is superior to that of other single models and exhibits strong progressiveness. Notably, the SEI2 values are very small for both measurement points, indicating that the prediction process’s accuracy at most time points of series predictions surpasses that of the HST model. The GEI values for the prediction process are 1.1906 and 2.6485, respectively, signifying robust generalization. Although the GEI of the SVR model is closer to 1, this does not necessarily indicate superior generalization compared to the prediction process proposed in this paper, as strong generalization must also be founded on accuracy.
Figure 19 and Figure 20 display the radar chart of the basic evaluation indicators and the histogram of the proposed evaluation indicators, respectively. From these figures, it is evident that the MAE, MSE, and RMSE of the prediction process discussed in this paper are all lower than those of other models, indicating the highest prediction accuracy. Similarly, the values of AEI1, AEI2, SEI1, and SEI2 are low, demonstrating the accuracy and progressiveness of the prediction process discussed in this paper.

5. Conclusions

This paper proposes a novel artificial intelligence prediction process for concrete dam deformation, based on a stacking model fusion method. Initially, considering the spatiotemporal correlation, effective filling of dam deformation monitoring data was achieved by using the proposed spatiotemporal correlation weighted regression interpolation method. Using widely accepted wavelet analysis theory, the monitoring signal was denoised. The preprocessing of the original observation signals for concrete dam deformation was accomplished. Subsequently, employing the stacking method, three regression prediction artificial intelligence algorithms—XGBoost, Extra-Trees, and SVR—were utilized as the first layer of base learners, and MLR was employed as the second-layer meta-learner. The Grey Wolf optimization algorithm was used to optimize the hyperparameters of the base learners, and a concrete dam displacement prediction process was constructed by integrating multiple machine learning algorithms within the stacking model fusion framework. Additionally, a multi-dimensional evaluation framework for model performance was established, incorporating indicators such as superiority, accuracy, and generalization, thereby creating a holistic assessment system for prediction effectiveness.
An engineering example demonstrates that the proposed preprocessing for the displacement monitoring information of concrete dams is accurate and effective, avoiding issues such as data loss and noise in monitoring data. Compared to traditional statistical models and commonly used single machine learning models, the stacking ensemble learning prediction process significantly enhances fitting accuracy, prediction accuracy, and generalization capabilities. Whether in predicting overall trends or capturing local details, the proposed process exhibits outstanding performance. The model performance evaluation system introduced in this paper can comprehensively and accurately evaluate the performance of prediction models. The prediction process based on the stacking model fusion method provides a scientific and effective basis for deformation prediction and the safety monitoring of concrete dams.

Author Contributions

Conceptualization, W.W. and Y.F.; methodology, W.W. and S.Z. (Shuai Zhang); software, W.W. and S.Z. (Sen Zheng); validation, W.W. and S.Z. (Sen Zheng); formal analysis, W.W. and W.C.; investigation, W.W. and H.L.; resources, H.S.; data curation, H.S.; writing—original draft preparation, W.W.; writing—review and editing, W.W. and H.S.; visualization, W.W.; supervision, H.S.; project administration, W.C. and H.S.; funding acquisition, H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant No. 52239009, 51979093, the National Key Research and Development Program of China, grant No. 2019YFC1510801, the Open Foundation of The National Key Laboratory of Water Disaster Prevention, grant No. 523024852 and the Fundamental Research Funds for the Central Universities, grant No. 2019B69814.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to laboratory’s policy and project collaboration agreements.

Conflicts of Interest

Authors Yanming Feng and Shuai Zhang are employed by the company Powerchina Kunming Engineering Corporation Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Prediction at Measurement Points EXD-2, EXD-14, and EXD-25

To more effectively illustrate the prediction performance of the proposed process across different measurement point time series, line graphs of the prediction process and linear correlation graphs for the results at measurement points EXD-2, EXD-14, and EXD-25 were created, as depicted in Figure A1.
Figure A1. Proposed process prediction results and their linear analysis at different points: (a) prediction results of EXD-2; (b) linear analysis of EXD-2 prediction; (c) prediction results of EXD-14; (d) linear analysis of EXD-14 prediction; (e) prediction results of EXD-25; (f) linear analysis of EXD-25 prediction.

References

  1. Su, H.; Wen, Z.; Wang, F.; Hu, J. Dam Structural Behavior Identification and Prediction by Using Variable Dimension Fractal Model and Iterated Function System. Appl. Soft. Comput. 2016, 48, 612–620. [Google Scholar] [CrossRef]
  2. Bukenya, P.; Moyo, P.; Beushausen, H.; Oosthuizen, C. Health Monitoring of Concrete Dams: A Literature Review. J. Civil Struct. Health Monit. 2014, 4, 235–244. [Google Scholar] [CrossRef]
  3. Salazar, F.; Moran, R.; Toledo, M.A.; Onate, E. Data-Based Models for the Prediction of Dam Behaviour: A Review and Some Methodological Considerations. Arch. Comput. Method Eng. 2017, 24, 1–21. [Google Scholar] [CrossRef]
  4. Kang, F.; Liu, J.; Li, J.; Li, S. Concrete Dam Deformation Prediction Model for Health Monitoring Based on Extreme Learning Machine. Struct. Control Health Monit. 2017, 24, e1997. [Google Scholar] [CrossRef]
  5. Salazar, F.; Toledo, M.A.; Oñate, E.; Morán, R. An Empirical Comparison of Machine Learning Techniques for Dam Behaviour Modelling. Struct. Saf. 2015, 56, 9–17. [Google Scholar] [CrossRef]
  6. Bernier, C.; Padgett, J.E.; Proulx, J.; Paultre, P. Seismic Fragility of Concrete Gravity Dams with Spatial Variation of Angle of Friction: Case Study. J. Struct. Eng. 2016, 142, 05015002. [Google Scholar] [CrossRef]
  7. Wang, S.; Xu, Y.; Gu, C.; Bao, T.; Xia, Q.; Hu, K. Hysteretic Effect Considered Monitoring Model for Interpreting Abnormal Deformation Behavior of Arch Dams: A Case Study. Struct. Control Health Monit. 2019, 26, e2417. [Google Scholar] [CrossRef]
  8. Nguyen, T.T.; Le, V.D.; Huynh, T.Q.; Nguyen, N.H.T. Influence of Settlement on Base Resistance of Long Piles in Soft Soil—Field and Machine Learning Assessments. Geotechnics 2024, 4, 447–469. [Google Scholar] [CrossRef]
  9. Huynh, T.Q.; Nguyen, T.T.; Nguyen, H. Base Resistance of Super-Large and Long Piles in Soft Soil: Performance of Artificial Neural Network Model and Field Implications. Acta Geotech. 2023, 18, 2755–2775. [Google Scholar] [CrossRef]
  10. Hariri-Ardebili, M.A.; Mahdavi, G.; Nuss, L.K.; Lall, U. The Role of Artificial Intelligence and Digital Technologies in Dam Engineering: Narrative Review and Outlook. Eng. Appl. Artif. Intell. 2023, 126, 106813. [Google Scholar] [CrossRef]
  11. Rankovic, V.; Grujovic, N.; Divac, D.; Milivojevic, N. Development of Support Vector Regression Identification Model for Prediction of Dam Structural Behaviour. Struct. Saf. 2014, 48, 33–39. [Google Scholar] [CrossRef]
  12. Su, H.; Li, X.; Yang, B.; Wen, Z. Wavelet Support Vector Machine-Based Prediction Model of Dam Deformation. Mech. Syst. Signal Proc. 2018, 110, 412–427. [Google Scholar] [CrossRef]
  13. Yao, K.; Wen, Z.; Yang, L.; Chen, J.; Hou, H.; Su, H. A Multipoint Prediction Model for Nonlinear Displacement of Concrete Dam. Comput.-Aided Civil Infrastruct. Eng. 2022, 37, 1932–1952. [Google Scholar] [CrossRef]
  14. Zhang, S.; Zheng, D.; Liu, Y. Deformation Prediction System of Concrete Dam Based on IVM-SCSO-RF. Water 2022, 14, 3739. [Google Scholar] [CrossRef]
  15. Gu, C.; Wu, B.; Chen, Y. A High-Robust Displacement Prediction Model for Super-High Arch Dams Integrating Wavelet De-Noising and Improved Random Forest. Water 2023, 15, 1271. [Google Scholar] [CrossRef]
  16. Dai, B.; Gu, C.; Zhao, E.; Qin, X. Statistical Model Optimized Random Forest Regression Model for Concrete Dam Deformation Monitoring. Struct. Control Health Monit. 2018, 25, e2170. [Google Scholar] [CrossRef]
  17. Cao, E.; Bao, T.; Yuan, R.; Hu, S. Hierarchical Prediction of Dam Deformation Based on Hybrid Temporal Network and Load-Oriented Residual Correction. Eng. Struct. 2024, 308, 117949. [Google Scholar] [CrossRef]
  18. Qu, X.; Yang, J.; Chang, M. A Deep Learning Model for Concrete Dam Deformation Prediction Based on RS-LSTM. J. Sens. 2019, 2019, 4581672. [Google Scholar] [CrossRef]
  19. Song, S.; Zhou, Q.; Zhang, T.; Hu, Y. Automatic Concrete Dam Deformation Prediction Model Based on TPE-STL-LSTM. Water 2023, 15, 2090. [Google Scholar] [CrossRef]
  20. Zhang, C.; Fu, S.; Ou, B.; Liu, Z.; Hu, M. Prediction of Dam Deformation Using SSA-LSTM Model Based on Empirical Mode Decomposition Method and Wavelet Threshold Noise Reduction. Water 2022, 14, 3380. [Google Scholar] [CrossRef]
  21. Cao, E.; Bao, T.; Gu, C.; Li, H.; Liu, Y.; Hu, S. A Novel Hybrid Decomposition-Ensemble Prediction Model for Dam Deformation. Appl. Sci. 2020, 10, 5700. [Google Scholar] [CrossRef]
  22. Li, Y.; Bao, T.; Gong, J.; Shu, X.; Zhang, K. The Prediction of Dam Displacement Time Series Using STL, Extra-Trees, and Stacked LSTM Neural Network. IEEE Access 2020, 8, 94440–94452. [Google Scholar] [CrossRef]
  23. Liu, M.; Wen, Z.; Zhou, R.; Su, H. Bayesian Optimization and Ensemble Learning Algorithm Combined Method for Deformation Prediction of Concrete Dam. Structures 2023, 54, 981–993. [Google Scholar] [CrossRef]
  24. Song, J.; Chen, Y.; Yang, J. A Novel Outlier Detection Method of Long-Term Dam Monitoring Data Based on SSA-NAR. Wirel. Commun. Mob. Comput. 2022, 2022, 6569367. [Google Scholar] [CrossRef]
  25. Shao, C.; Zheng, S.; Gu, C.; Hu, Y.; Qin, X. A Novel Outlier Detection Method for Monitoring Data in Dam Engineering. Expert Syst. Appl. 2022, 193, 116476. [Google Scholar] [CrossRef]
  26. Cao, W.; Wen, Z.; Su, H. Spatiotemporal Clustering Analysis and Zonal Prediction Model for Deformation Behavior of Super-High Arch Dams. Expert Syst. Appl. 2023, 216, 119439. [Google Scholar] [CrossRef]
  27. Gao, Y.; Yu, X.; Su, Y.; Yin, Z.; Wang, X.; Li, S. Intelligent Identification Method for Drilling Conditions Based on Stacking Model Fusion. Energies 2023, 16, 883. [Google Scholar] [CrossRef]
  28. Zhang, Q.; Wu, J.; Ma, Y.; Li, G.; Ma, J.; Wang, C. Short-Term Load Forecasting Method with Variational Mode Decomposition and Stacking Model Fusion. Sustain. Energy Grids Netw. 2022, 30, 100622. [Google Scholar] [CrossRef]
  29. Yu, F.; Liu, X. Research on Student Performance Prediction Based on Stacking Fusion Model. Electronics 2022, 11, 3166. [Google Scholar] [CrossRef]
  30. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the KDD’16: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794. [Google Scholar]
  31. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely Randomized Trees. Mach. Learn. 2006, 63, 3–42. [Google Scholar] [CrossRef]
  32. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  33. Mehdiyev, N.; Enke, D.; Fettke, P.; Loos, P. Evaluating Forecasting Methods by Considering Different Accuracy Measures. Procedia Comput. Sci. 2016, 95, 264–271. [Google Scholar] [CrossRef]
  34. Shcherbakov, M.V.; Brebels, A.; Shcherbakova, N.L.; Tyukov, A.P.; Janovsky, T.A.; Kamaev, V.A. A Survey of Forecast Error Measures. World Appl. Sci. J. 2013, 24, 171–176. [Google Scholar]
Figure 1. Location of deformation monitoring points of a concrete dam.
Figure 2. Missing deformation monitoring data of a concrete dam.
Figure 3. Stacking ensemble learning prediction process.
Figure 4. Schematic diagram of the stacking base-learner.
Figure 5. Schematic diagram of the stacking meta-learner.
Figure 6. Flow chart of the XGBoost algorithm.
Figure 7. Flow chart of the Extra-Trees algorithm.
Figure 8. Flow chart of the SVR algorithm.
Figure 9. Layout of the horizontal displacement monitoring system of a concrete dam.
Figure 10. Overall flowchart of the paper.
Figure 11. Spatial correlation of EXD-18 measurement point reference sequence: (a) Time sequence of EXD-18 and EXD-2; (b) Correlation coefficient between time sequences of EXD-18 and EXD-2; (c) Time sequence of EXD-18 and EXD-3; (d) Correlation coefficient between time sequences of EXD-18 and EXD-3; (e) Time sequence of EXD-18 and EXD-4; (f) Correlation coefficient between time sequences of EXD-18 and EXD-4.
Figure 12. Time correlation of EXD-18 measurement point reference sequence: (a) Time sequence of 2014 and 2013; (b) Correlation coefficient between time sequences of 2014 and 2013; (c) Time sequence of 2014 and 2015; (d) Correlation coefficient between time sequences of 2014 and 2015; (e) Time sequence of 2014 and 2017; (f) Correlation coefficient between time sequences of 2014 and 2017.
Figure 13. Water level–temperature–displacement hydrograph.
Figure 14. Training and prediction processes of different models for the horizontal displacement time series of the EXD-18 measurement point: (a) HST, (b) Extra-Trees, (c) XGBoost, (d) SVR, and (e) Proposed.
Figure 15. Training and prediction processes of different models for the horizontal displacement time series of the EXD-D7 measurement point: (a) HST, (b) Extra-Trees, (c) XGBoost, (d) SVR, and (e) Proposed.
Figure 16. The prediction results at EXD-18 for different models and their linear analysis: (a) prediction results of HST; (b) linear analysis of HST prediction; (c) prediction results of Extra-Trees; (d) linear analysis of Extra-Trees prediction; (e) prediction results of XGBoost; (f) linear analysis of XGBoost prediction; (g) prediction results of SVR; (h) linear analysis of SVR prediction; (i) prediction results of the proposed process; (j) linear analysis of HST and the proposed process.
Figure 17. Raincloud plots for residuals of different models: (a) EXD-18 prediction residuals; (b) EXD-D7 prediction residuals.
Figure 18. Raincloud plots for the absolute value of residuals of different models: (a) Absolute value of residuals of EXD-18 prediction; (b) Absolute value of residuals of EXD-D7 prediction. The red lines connect the various averages.
Figure 19. Radar chart of the basic evaluation indicators for different models: (a) EXD-18; (b) EXD-D7.
Figure 20. Histogram of the proposed evaluation indicators for different models.
Table 1. Interpolation expression and error indicators of EXD-18 measurement points in each period.
Interpolation periods: 18 June 2014~23 July 2014; 26 May 2016~4 June 2016; 16 July 2016~26 July 2016; 22 June 2018~18 July 2018.

Method | Interpolation equation | MAE | MSE
STCM | y = 0.084x1 + 0.003x2 − 0.166x3 + 0.552x4 + 0.351x5 + 0.091x6 | 0.222 | 0.080
TCM | y = −0.021x1 + 0.409x2 + 0.579x3 | 0.436 | 0.291
SCM | y = 0.367x1 + 0.178x2 + 0.516x3 | 0.216 | 0.087
TCM | y = 0.328x1 + 0.642x2 − 0.030x3 | 0.399 | 0.246
TCM | y = 0.238x4 + 0.434x5 + 0.247x6 | 0.441 | 0.282
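The interpolation equations in Tables 1 and 2 are linear combinations of reference series x1–x6. As a rough sketch of this kind of gap filling (not the authors' exact SCM/TCM/STCM implementation), the coefficients can be fitted by ordinary least squares on the dates where the target point was observed and then used to reconstruct the missing span; the series below are synthetic placeholders.

```python
import numpy as np

# Hypothetical data: columns x1..x6 are reference series (e.g., neighbouring
# points or adjacent years) and y is the target measurement point.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))                      # reference series x1..x6
true_coef = np.array([0.08, 0.0, -0.17, 0.55, 0.35, 0.09])
y = X @ true_coef + rng.normal(0, 0.05, 300)       # target series with noise

# Fit y = a1*x1 + ... + a6*x6 by least squares on the dates where y exists.
observed = slice(0, 250)                           # assume y is missing afterwards
coef, *_ = np.linalg.lstsq(X[observed], y[observed], rcond=None)

# Fill the gap and report in-sample error indicators (MAE, MSE) as in Table 1.
y_filled = X[250:] @ coef
resid = y[observed] - X[observed] @ coef
mae, mse = np.mean(np.abs(resid)), np.mean(resid**2)
print("coefficients:", np.round(coef, 3), "MAE:", round(mae, 3), "MSE:", round(mse, 3))
```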
Table 2. Interpolation expression and error indicators of EXD-D7 measurement points in each period.
Interpolation periods: 27 July 2013~7 August 2013; 31 October 2013~19 November 2013; 26 May 2016~4 June 2016; 16 July 2016~26 July 2016; 31 May 2018~12 June 2018.

Method | Interpolation equation | MAE | MSE
STCM | y = −0.447x1 + 0.173x2 + 0.528x3 − 2.396x4 + 1.221x5 + 2.047x6 | 0.196 | 0.072
TCM | y = 0.140x0 + 0.991x2 − 0.219x3 | 0.635 | 0.620
SCM | y = 1.386x4 + 0.822x5 − 1.382x6 | 0.347 | 0.173
TCM | y = 0.249x1 + 0.652x2 + 0.074x3 | 0.485 | 0.333
SCM | y = −1.556x4 + 1.915x5 + 0.722x6 | 0.337 | 0.160
Table 3. Comparison of noise reduction effects of different wavelet denoising methods.

Noise reduction method | EXD-18 RMSE | EXD-18 SNR | EXD-D7 RMSE | EXD-D7 SNR
Sqtwolog-hard | 0.1006 | 32.9741 | 0.0859 | 32.3884
Sqtwolog-soft | 0.1435 | 29.8774 | 0.1186 | 29.5752
Rigorous-hard | 0.0336 | 42.494 | 0.0260 | 42.7755
Rigorous-soft | 0.0517 | 38.7613 | 0.0408 | 38.8422
Heursure-hard | 0.0590 | 37.6046 | 0.0502 | 37.0474
Heursure-soft | 0.0675 | 36.4320 | 0.0557 | 36.1532
Minmaxi-hard | 0.0706 | 36.0454 | 0.0609 | 35.3709
Minmaxi-soft | 0.1104 | 32.1618 | 0.0926 | 31.7294
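Table 3 compares threshold selection rules (with hard and soft shrinkage) by RMSE and SNR. The sketch below, based on PyWavelets, is a minimal illustration of the universal (sqtwolog) threshold only, evaluated against a known synthetic reference signal; the rigorous/heursure/minimaxi rules and the paper's actual monitoring series are not reproduced here.

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db4", level=4, mode="hard"):
    """Denoise with the universal (sqtwolog) threshold: t = sigma * sqrt(2*ln(N))."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745      # noise estimate from finest level
    thr = sigma * np.sqrt(2 * np.log(len(signal)))
    coeffs[1:] = [pywt.threshold(c, thr, mode=mode) for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

def rmse_snr(reference, denoised):
    """RMSE and SNR (dB), the error indicators used in Table 3."""
    err = reference - denoised
    rmse = np.sqrt(np.mean(err ** 2))
    snr = 10 * np.log10(np.sum(reference ** 2) / np.sum(err ** 2))
    return rmse, snr

# Hypothetical displacement series with added noise.
t = np.linspace(0, 4 * np.pi, 1024)
clean = 5 * np.sin(t) + 0.5 * t
noisy = clean + np.random.default_rng(2).normal(0, 0.2, t.size)

for mode in ("hard", "soft"):
    print(mode, [round(v, 4) for v in rmse_snr(clean, wavelet_denoise(noisy, mode=mode))])
```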
Table 4. List of optimal hyperparameters for each base-learner model.

Model | Parameter | Optimal parameter (EXD-18) | Optimal parameter (EXD-D7)
XGBoost | Gamma | 0.05 | 0.04
XGBoost | Alpha | 5.05 | 4.07
XGBoost | Eta | 0.20 | 0.18
XGBoost | Max_depth | 7 | 8
XGBoost | Min_child_weight | 7 | 9
XGBoost | Lambda | 0.65 | 0.53
XGBoost | Colsample_bytree | 0.98 | 0.85
XGBoost | Subsample | 0.77 | 0.64
Extra-Trees | N_estimators | 89 | 69
Extra-Trees | Max_depth | 6 | 8
Extra-Trees | Min_samples_split | 2 | 4
Extra-Trees | Min_samples_leaf | 4 | 6
SVR | C | 1.26 | 1.03
SVR | Gamma | 0.11 | 0.18
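As an assumed, simplified counterpart to the stacking fusion described in the paper, the base learners could be instantiated with the EXD-18 optima from Table 4 and combined with scikit-learn's StackingRegressor. The linear meta-learner, the 5-fold internal cross-validation, and the synthetic feature matrix below are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from xgboost import XGBRegressor

# Base learners configured with the EXD-18 optima listed in Table 4.
base_learners = [
    ("xgb", XGBRegressor(gamma=0.05, reg_alpha=5.05, learning_rate=0.20,
                         max_depth=7, min_child_weight=7, reg_lambda=0.65,
                         colsample_bytree=0.98, subsample=0.77)),
    ("et", ExtraTreesRegressor(n_estimators=89, max_depth=6,
                               min_samples_split=2, min_samples_leaf=4)),
    ("svr", SVR(C=1.26, gamma=0.11)),
]

# Meta-learner (assumed here to be a linear model) fuses the base predictions
# produced by internal cross-validation, as in a standard stacking scheme.
stack = StackingRegressor(estimators=base_learners,
                          final_estimator=LinearRegression(), cv=5)

# Hypothetical feature matrix (water level, temperature, aging terms, etc.).
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 8))
y = X @ rng.normal(size=8) + rng.normal(0, 0.1, 500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)

stack.fit(X_tr, y_tr)
print("test R^2:", round(stack.score(X_te, y_te), 4))
```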
Table 5. List of basic performance evaluation indicators of different prediction models at EXD-18.

Data set | Indicator | HST | XGBoost | Extra-Trees | SVR | Proposed
Training set | MAE | 0.4705 | 0.2213 | 0.0946 | 0.2538 | 0.1007
Training set | MSE | 0.3705 | 0.0813 | 0.0171 | 0.1225 | 0.0207
Training set | RMSE | 0.6087 | 0.2851 | 0.1308 | 0.3500 | 0.1439
Test set | MAE | 0.8389 | 0.4341 | 0.4209 | 0.2990 | 0.1249
Test set | MSE | 0.9183 | 0.2951 | 0.2889 | 0.1488 | 0.0242
Test set | RMSE | 0.9583 | 0.5432 | 0.5375 | 0.3857 | 0.1555
Table 6. List of basic performance evaluation indicators of different prediction models at EXD-D7.

Data set | Indicator | HST | XGBoost | Extra-Trees | SVR | Proposed
Training set | MAE | 0.5416 | 0.1250 | 0.0511 | 0.3699 | 0.0906
Training set | MSE | 0.4852 | 0.0260 | 0.0075 | 0.2944 | 0.0164
Training set | RMSE | 0.6966 | 0.1612 | 0.0866 | 0.5426 | 0.1281
Test set | MAE | 0.5979 | 0.4172 | 0.4233 | 0.4516 | 0.2153
Test set | MSE | 0.5398 | 0.2572 | 0.3067 | 0.3107 | 0.0784
Test set | RMSE | 0.7347 | 0.5071 | 0.5538 | 0.5574 | 0.2800
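The basic indicators reported in Tables 5 and 6 (MAE, MSE, RMSE) can be recomputed for any fitted model from its measured and predicted series; a minimal sketch using scikit-learn's metric functions and placeholder arrays is shown below.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def basic_indicators(y_true, y_pred):
    """MAE, MSE and RMSE as reported in Tables 5 and 6."""
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    return {"MAE": mae, "MSE": mse, "RMSE": np.sqrt(mse)}

# Hypothetical measured/predicted values for a training and a test set.
rng = np.random.default_rng(4)
y_train = rng.normal(size=200); y_train_hat = y_train + rng.normal(0, 0.1, 200)
y_test = rng.normal(size=50);   y_test_hat = y_test + rng.normal(0, 0.2, 50)

print("training set:", {k: round(v, 4) for k, v in basic_indicators(y_train, y_train_hat).items()})
print("test set:    ", {k: round(v, 4) for k, v in basic_indicators(y_test, y_test_hat).items()})
```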
Table 7. List of proposed performance evaluation indicators of different prediction models at EXD-18.

Indicator | HST | XGBoost | Extra-Trees | SVR | Proposed
SEI1 | 1.0000 | 0.5175 | 0.5017 | 0.3564 | 0.1489
SEI2 | – | 0.1272 | 0.1243 | 0.1734 | 0.0607
AEI1 | 0.1000 | 0.0559 | 0.0535 | 0.0391 | 0.0146
AEI2 | 0.3179 | 0.1862 | 0.1798 | 0.1646 | 0.0808
GEI | 1.9564 | 2.1021 | 4.7960 | 1.2484 | 1.1906
Table 8. List of proposed performance evaluation indicators of different prediction models at EXD-D7.

Indicator | HST | XGBoost | Extra-Trees | SVR | Proposed
SEI1 | 1.0000 | 0.6978 | 0.7080 | 0.7553 | 0.3601
SEI2 | – | 0.3035 | 0.3613 | 0.4827 | 0.2312
AEI1 | 0.0779 | 0.0530 | 0.0586 | 0.0531 | 0.0298
AEI2 | 0.2653 | 0.2042 | 0.2207 | 0.2326 | 0.1128
GEI | 1.3052 | 3.6880 | 7.7606 | 1.0624 | 2.6485