For large-volume concrete dams, a deformation anomaly at a single measurement point does not necessarily indicate a change in the safety of the dam structure; analyzing local deformation alone is therefore no longer sufficient. Because concrete dam deformation data have panel characteristics, extending the traditional “point” analysis to account for elevation and regional differences in dam deformation avoids the biased judgement that results from inferring overall behavior from local deformation. At the same time, deformation varies greatly between regions of a concrete dam, because the factors that influence deformation behavior (such as load action, constraint conditions, material properties and environmental factors) differ considerably from one region to another. Thus, the robustness of a regional analysis model depends on how parts with similar deformation patterns and homogeneous responses to loads are grouped and modeled. Deformation partition analysis must answer two questions: (1) what statistics should be used to characterize the degree of similarity between deformations at the measurement points, and (2) what criteria should be used to determine the degree of similarity between regions? In the following section, a similarity criterion for deformation is developed from the panel characteristics of concrete dam deformation, combining spatial and temporal information on deformation. The structural deformation properties in both the time and cross-sectional dimensions are then studied to establish the concrete dam deformation partitioning method.
2.1. Deformation Partitioning Criterion
Deformation partitioning of concrete dam measurement points is the process of distinguishing and classifying the deformation properties of the entire dam based on the similarity (or dissimilarity) of the deformation at each measurement point. Without making any assumptions about the deformation field, the deformation of each part of the concrete dam is examined and processed using mathematical methods, and appropriate classification criteria are established. On the one hand, the dam deformation monitoring data are compressed and their key information is retrieved; on the other hand, a basis for further investigation is created. The goal is to make the deformation law within each region as similar as possible while making the deformation laws of different regions as dissimilar as possible. Traditional deformation partitioning methods use the mean value of the deformation time series at each measurement point, i.e., the deformation series degenerates into a cross-sectional series. This approach can only represent the average change in dam deformation, so deformation information in the time dimension is lost. Moreover, it rests on the assumption that the deformation at every measurement point varies in the same direction over time, which makes it difficult to reflect changes in deformation properties over time. As shown in Figure 1, if the mean-value method is adopted, measurement point 1 and measurement point 3 are grouped into one category; however, if the change in the deformation sequence over the whole time period is considered, it is more reasonable to group measurement point 2 and measurement point 3 into one category.
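The effect described for Figure 1 can be reproduced with a small synthetic example. The sketch below is illustrative only: the three series are hypothetical (they are not the data behind Figure 1), and `mean_gap` and `increment_distance` are helper names introduced here. Points 1 and 3 are built to share the same mean, while points 2 and 3 share the same trend.

```python
import numpy as np

# Hypothetical deformation series (mm) at three measurement points, T = 6.
p1 = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.0])  # decreasing, mean 2.5
p2 = np.array([3.0, 4.0, 5.0, 6.0, 7.0, 8.0])  # increasing, mean 5.5
p3 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])  # increasing, mean 2.5


def mean_gap(a, b):
    """Distance used by the mean-value method: gap between time averages."""
    return abs(a.mean() - b.mean())


def increment_distance(a, b):
    """Euclidean distance between the period-to-period increments."""
    return float(np.sqrt(np.sum((np.diff(a) - np.diff(b)) ** 2)))


# Mean-value view: point 3 looks closest to point 1.
print("mean gap 1-3:", mean_gap(p1, p3), " mean gap 2-3:", mean_gap(p2, p3))
# Whole-sequence view: point 3 evolves like point 2, not like point 1.
print("increment distance 1-3:", increment_distance(p1, p3))
print("increment distance 2-3:", increment_distance(p2, p3))
```

With these series the mean-value comparison pairs points 1 and 3, whereas the comparison of increments pairs points 2 and 3, which is the regrouping the text argues for.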
Concrete dam deformation data provide information on three aspects: first, the absolute dam deformation values; second, the dynamics of the deformation time series, i.e., the increment in deformation over time; and third, the fluctuation in deformation development, i.e., the degree of variability. Thus, three similarity indices (absolute distance, incremental distance and growth rate distance) are combined to reflect the similarity between deformation monitoring sequences, on the basis of which dam deformation partitioning can be performed.
When preprocessing the deformation monitoring data, let $\{\delta_{it}\}$ and $\{\delta_{jt}\}$ denote the absolute deformation of monitoring point $i$ and monitoring point $j$ at time section $t$, where $i$ and $j$ are indices of the cross-sectional dimension (spatial units), with $i, j = 1, 2, \ldots, n$, and $t$ is the monitoring time index of the time dimension (time periods), with $t = 1, 2, \ldots, T$. In total, there are $n$ monitoring points and $T$ monitoring days.
First, the Absolute Quantity Euclidean Distance between points $i$ and $j$, denoted as $d_{ij}(\mathrm{AQED})$, can be expressed as follows:

$$d_{ij}(\mathrm{AQED}) = \sqrt{\sum_{t=1}^{T} \left(\delta_{it} - \delta_{jt}\right)^{2}} \tag{1}$$

where $\delta_{it}$ is the deformation value of measurement point $i$ at time $t$; $\delta_{jt}$ is the deformation value of measurement point $j$ at time $t$; and $d_{ij}(\mathrm{AQED})$ characterizes the distance between measurement point $i$ and measurement point $j$ during the whole period $T$.
Second, the Increment Quantity Euclidean Distance between points $i$ and $j$, denoted as $d_{ij}(\mathrm{IQED})$, can be expressed as follows:

$$d_{ij}(\mathrm{IQED}) = \sqrt{\sum_{t=2}^{T} \left(\Delta\delta_{it} - \Delta\delta_{jt}\right)^{2}} \tag{2}$$

where $\Delta\delta_{it} = \delta_{it} - \delta_{i,t-1}$ and $\Delta\delta_{jt} = \delta_{jt} - \delta_{j,t-1}$ denote the differences in the absolute amount of deformation between two adjacent periods. $d_{ij}(\mathrm{IQED})$ characterizes the difference between the period-to-period increments of the two points, i.e., how differently the deformations at points $i$ and $j$ fluctuate over the $T$ time sections.
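As a minimal sketch of the two distances defined so far, the code below reads the AQED as the Euclidean distance between two deformation series and the IQED as the same distance applied to their first differences. The array layout (rows are measurement points, columns are time sections), the function names and the sample values are assumptions made for illustration.

```python
import numpy as np


def pairwise_euclidean(x):
    """Euclidean distance between every pair of rows of an (n, m) array."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def aqed(delta):
    """Absolute-quantity distance on the raw (n, T) deformation array."""
    return pairwise_euclidean(delta)


def iqed(delta):
    """Increment-quantity distance on the (n, T-1) array of increments."""
    return pairwise_euclidean(np.diff(delta, axis=1))


# Hypothetical deformations (mm): 3 measurement points, 4 time sections.
delta = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0, 4.0],  # same increments as point 0, offset by 1 mm
    [3.0, 2.0, 1.0, 0.0],  # opposite trend
])

d_abs = aqed(delta)  # (3, 3) symmetric matrix with a zero diagonal
d_inc = iqed(delta)
print(d_abs)
print(d_inc)
```

Points 0 and 1 differ in level but not in increments, so they are separated by the AQED and indistinguishable under the IQED; this is why the two indices carry different information.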
Third, the Increment Speed Euclidean Distance between points $i$ and $j$, denoted as $d_{ij}(\mathrm{ISED})$, can be expressed as follows:
$d_{ij}(\mathrm{ISED})$ describes the difference in the incremental deformation trends of measurement points $i$ and $j$ over time. If the deformations change in the same direction over time, then the more coordinated the change, the more similar the two points are and the smaller $d_{ij}(\mathrm{ISED})$ is; if the deformations change in opposite directions, the similarity is poor and $d_{ij}(\mathrm{ISED})$ is larger, which is in line with the basic principle of similarity metrics. However, the traditional Increment Speed Euclidean Distance formula has two problems: (1) the denominator becomes 0 when the measured deformation values at a measuring point are unchanged over two adjacent time periods; and (2) when the measured values at a measuring point change very little over two adjacent time periods, $d_{ij}(\mathrm{ISED})$ tends to infinity. Both cases make the growth rate distance inaccurate. Thus, the Relative Deformation Increase Euclidean Distance between points $i$ and $j$, denoted as $d_{ij}(\mathrm{RDIED})$, is proposed. It can be expressed as follows:
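$$d_{ij}(\mathrm{RDIED}) = \sqrt{\sum_{t=2}^{T} \left(\frac{\Delta\delta_{it} - \Delta\delta_{jt}}{\Delta\delta_{\max} - \Delta\delta_{\min}}\right)^{2}} \tag{4}$$

where $\Delta\delta_{\max}$ and $\Delta\delta_{\min}$ are, respectively, the maximum and minimum values of $\Delta\delta_{it}$ and $\Delta\delta_{jt}$, i.e., of the increments in the absolute amount of deformation between two adjacent periods. $\Delta\delta_{\max} - \Delta\delta_{\min}$ indicates the amplitude of the incremental deformation at measurement points $i$ and $j$. Thus, $d_{ij}(\mathrm{RDIED})$ reflects the relative increment in deformation with respect to the maximum amplitude of the increments in the deformation sequences and can be used as the deformation similarity index instead of $d_{ij}(\mathrm{ISED})$.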
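The following sketch shows one way to evaluate the RDIED for a pair of points. It assumes a pooled reading of the definitions above: the maximum and minimum increments are taken over all increments of both points, and the difference between the increments of the two points is divided by that amplitude before the Euclidean norm is taken. The function name, the handling of a zero amplitude and the sample series are assumptions introduced for illustration.

```python
import numpy as np


def rdied(delta_i, delta_j):
    """Relative-increment distance between two deformation series.

    Assumption: the amplitude is the range of the pooled increments of
    both points, so a flat stretch (zero increment) is harmless.
    """
    inc_i = np.diff(delta_i)
    inc_j = np.diff(delta_j)
    pooled = np.concatenate([inc_i, inc_j])
    amplitude = pooled.max() - pooled.min()
    if amplitude == 0.0:
        # Every increment of both points is identical, so the two
        # increment sequences coincide and the distance is zero.
        return 0.0
    return float(np.sqrt(np.sum(((inc_i - inc_j) / amplitude) ** 2)))


# Hypothetical series (mm); both contain a period with no change.
a = np.array([0.0, 1.0, 1.0, 3.0])  # increments 1, 0, 2
b = np.array([0.0, 2.0, 2.0, 2.0])  # increments 2, 0, 0

print(rdied(a, b))
```

Because the divisor is the amplitude of the increments rather than an individual value, an unchanged reading between two adjacent periods no longer produces a division by zero, which is the motivation the text gives for replacing the ISED.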
In order to accurately describe the deformation characteristics of each measurement point, it is necessary to establish a comprehensive criterion for deformation similarity. The “comprehensive distance” (Comprehensive Euclidean Distance) between measurement points $i$ and $j$, abbreviated as $d_{ij}(\mathrm{CED})$, is introduced as:

$$d_{ij}(\mathrm{CED}) = \omega_1\, d_{ij}(\mathrm{AQED}) + \omega_2\, d_{ij}(\mathrm{IQED}) + \omega_3\, d_{ij}(\mathrm{RDIED}) \tag{5}$$

where $\omega_1 + \omega_2 + \omega_3 = 1$, and $\omega_1$, $\omega_2$ and $\omega_3$ denote the weights of the three distances.
The Comprehensive Euclidean Distance $d_{ij}(\mathrm{CED})$ is a weighted combination of the above three distances, and the weight coefficients can be assigned subjectively or determined objectively according to the actual situation. In order to reflect the comprehensive information contained in the concrete dam’s spatial deformation data itself, a combination of the entropy weighting method and the CRiteria Importance Through Intercriteria Correlation (CRITIC) method is used to calculate the comprehensive distance weight coefficients.
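As a small worked example of the weighted combination, the sketch below combines three distance values with a set of weights that sum to 1. The weights and distances are hypothetical placeholders; they are not obtained with the entropy weighting or CRITIC methods, and the function name is introduced here for illustration.

```python
import numpy as np


def comprehensive_distance(d_aqed, d_iqed, d_rdied, weights):
    """Weighted sum of the three distances (scalars or equal-shape arrays)."""
    w = np.asarray(weights, dtype=float)
    if w.shape != (3,) or np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be three non-negative values summing to 1")
    return w[0] * d_aqed + w[1] * d_iqed + w[2] * d_rdied


# Hypothetical distances between one pair of points and placeholder weights.
d_ced = comprehensive_distance(2.0, 1.0, 0.5, weights=(0.5, 0.3, 0.2))
print(d_ced)  # 0.5*2.0 + 0.3*1.0 + 0.2*0.5
```

The same function works on full $n \times n$ distance matrices, since the combination is applied element by element.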
2.3. Clustering Method for Dam Deformation Monitoring Points
The k-means method is introduced for the clustering analysis to answer the question of which criterion should be used to determine the degree of similarity between deformation regions. The k-means method is a classical unsupervised machine learning algorithm that clusters samples based on their distances to $K$ cluster centers $\mu_k$, with each partition denoted as $c_k$ ($k = 1, 2, \ldots, K$) [36]. The algorithm is widely used owing to its computational simplicity and high efficiency. Based on the similarity measurement criteria proposed above, the calculation steps are as follows [37]:
Step 1: Randomly select $k$ sample points from the dataset $y_i$ ($i = 1, 2, \ldots, n$) as the initial cluster centers $\mu_k$.
Step 2: Calculate the Absolute Quantity Euclidean Distance, the Increment Quantity Euclidean Distance and the Relative Deformation Increase Euclidean Distance according to Equation (1), Equation (2) and Equation (4).
Step 3: Refer to Equation (15) and calculate the entropy weight coefficients of the three distances in step 2. Substitute the weight coefficients into Equation (5) to calculate the comprehensive distance $d_{ij}(\mathrm{CED})$ between the $n$ measurement points.
Step 4: Based on the comprehensive distance $d_{ij}(\mathrm{CED})$ between each sample point and the cluster centers $\mu_k$, assign the sample point to the cluster whose center it is most similar to, i.e., the center with the smallest comprehensive distance.
Step 5: Recalculate the cluster center $\mu_k$ of each cluster based on the samples currently in the cluster.
Step 6: Iterate step 4 and step 5 until the objective function converges, that is, until the cluster centers no longer change. This marks the end of the clustering process. The core code for clustering is shown in Figure 2. The flow chart of the method for clustering dam deformation monitoring points is shown in Figure 3.
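The steps above can be sketched in a few lines of NumPy. This is not the code of Figure 2; it is a minimal illustration under stated assumptions: the weights are passed in as fixed values instead of being computed by the entropy method of step 3, the relative-increment term uses the pooled amplitude of the increments of a point and a center, and the names `ced_to_centers`, `kmeans_ced` and the optional `init` argument are introduced here.

```python
import numpy as np


def ced_to_centers(series, centers, weights):
    """Comprehensive distance from each series (n, T) to each center (K, T)."""
    inc_s = np.diff(series, axis=1)
    inc_c = np.diff(centers, axis=1)
    abs_diff = series[:, None, :] - centers[None, :, :]
    inc_diff = inc_s[:, None, :] - inc_c[None, :, :]
    d_abs = np.sqrt((abs_diff ** 2).sum(axis=2))
    d_inc = np.sqrt((inc_diff ** 2).sum(axis=2))
    # Pooled amplitude of the increments for every (point, center) pair.
    hi = np.maximum(inc_s.max(axis=1)[:, None], inc_c.max(axis=1)[None, :])
    lo = np.minimum(inc_s.min(axis=1)[:, None], inc_c.min(axis=1)[None, :])
    amp = hi - lo
    # A zero amplitude means identical increments, so inc_diff is zero there
    # and any non-zero divisor gives the correct distance of 0.
    safe_amp = np.where(amp > 0, amp, 1.0)
    d_rel = np.sqrt(((inc_diff / safe_amp[:, :, None]) ** 2).sum(axis=2))
    w1, w2, w3 = weights
    return w1 * d_abs + w2 * d_inc + w3 * d_rel


def kmeans_ced(series, k, weights, init=None, seed=0, max_iter=100):
    """k-means on deformation series using the comprehensive distance."""
    if init is None:
        # Step 1: random choice of k measurement points as initial centers.
        rng = np.random.default_rng(seed)
        init = rng.choice(len(series), size=k, replace=False)
    centers = series[np.asarray(init)].astype(float)
    labels = np.full(len(series), -1)
    for _ in range(max_iter):
        # Steps 2 to 4: distances to the centers, then nearest-center assignment.
        new_labels = ced_to_centers(series, centers, weights).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # Step 6: assignments (and hence centers) no longer change.
        labels = new_labels
        # Step 5: each center becomes the mean series of its members.
        for c in range(k):
            members = series[labels == c]
            if len(members) > 0:  # keep the previous center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels, centers


# Hypothetical data: two rising and two falling deformation series (mm).
series = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [0.1, 1.1, 2.1, 3.1],
    [3.0, 2.0, 1.0, 0.0],
    [3.1, 2.1, 1.1, 0.1],
])
labels, centers = kmeans_ced(series, k=2, weights=(0.4, 0.3, 0.3), init=[0, 2])
print(labels)
```

As with any k-means variant, the result depends on the initial centers, so in practice the random initialization is usually repeated several times and the partition with the lowest objective value is kept.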
Using the k-means clustering method, a complete deformation partitioning process is proposed based on the deformation similarity criterion and the method for determining the number of regions. The choice of the value of $k$ is the key step when using k-means to cluster deformation measurement points on concrete dams. In this paper, the elbow method is adopted for selecting the value of $k$. The core index of the elbow method is the Sum of Squared Errors (SSE), which can be calculated as:

$$\mathrm{SSE} = \sum_{k=1}^{K} \sum_{y_{it} \in c_k} \left| y_{it} - y_{\mu} \right|^{2}$$

where $K$ is the number of clusters, i.e., the value of $k$ being tested; $c_k$ denotes the $k$-th cluster; and $y_{\mu}$ is the average value of the monitoring deformation data $y_{it}$ in that cluster. The basic idea of the elbow method for determining the optimal number of clusters is that as the number of clusters $k$ increases, the sample grouping becomes more refined, the degree of aggregation of each cluster gradually increases and the SSE gradually decreases. When $k$ is less than the true number of clusters, increasing $k$ significantly increases the degree of aggregation of each cluster, so the SSE decreases significantly. When $k$ reaches the true number of clusters, the gain in aggregation obtained by increasing $k$ drops rapidly, so the decline in SSE slows sharply and then flattens as $k$ continues to increase. The graph of SSE against $k$ is therefore elbow-shaped, and the value of $k$ at the elbow is taken as the true number of clusters in the data. Figure 4 shows the selection process for parameter $k$ using the elbow method.
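The SSE and the reading of the elbow can be illustrated with a short sketch. It assumes that each measurement point's whole series is compared with the mean series of its cluster. The helper `elbow_k`, which picks the $k$ at which the SSE curve bends most (largest second difference), is a simple heuristic introduced here as an assumption; the paper reads the elbow from the graph. The data and the SSE curve are hypothetical.

```python
import numpy as np


def sse(series, labels):
    """Sum of squared deviations of each series from its cluster mean series."""
    total = 0.0
    for c in np.unique(labels):
        members = series[labels == c]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


def elbow_k(ks, sse_values):
    """Heuristic (assumption): k with the largest second difference of SSE.

    Needs at least three (k, SSE) pairs, ordered by increasing k.
    """
    second_diff = np.diff(np.asarray(sse_values, dtype=float), n=2)
    return ks[int(np.argmax(second_diff)) + 1]


# Hypothetical data: four points forming two well-separated groups.
series = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
sse_k1 = sse(series, np.array([0, 0, 0, 0]))  # one cluster
sse_k2 = sse(series, np.array([0, 0, 1, 1]))  # the two natural groups
print(sse_k1, sse_k2)

# Hypothetical SSE curve for k = 1..4 with a clear bend at k = 2.
best_k = elbow_k([1, 2, 3, 4], [100.0, 20.0, 15.0, 12.0])
print(best_k)
```

Splitting the data into its two natural groups removes most of the SSE, and further splits would bring only small gains, which is the elbow behavior described above.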