1. Introduction
The power grid is an important urban infrastructure that supports the regular operation of a city and the normal functioning of people’s daily lives. Urban development increases both dependence on the power grid and the magnitude of loss and damage caused by large-scale failures. Among natural disasters, earthquakes pose one of the greatest threats to a power grid and can disrupt it entirely. Preventing earthquake damage to the grid helps maintain the safety and sustainability of modern society. Measures to deal with disasters can be divided into pre-disaster damage prevention and preparation, and post-disaster rescue and repair. The academic community generally regards the former as superior to the latter and has called for building a Culture of Prevention. Post-disaster rescue and repair will always be necessary, but responding to a disaster only after it happens is not enough, because the response is costly and its effects are temporary; pre-disaster damage prevention and preparation can contribute to more lasting security. Vulnerability assessment is a major earthquake damage prevention measure for power grids [1].
The concept of vulnerability was first used in international political economy to explain dependency [2]. It was subsequently introduced into the natural sciences and engineering to describe the state of a system and its components that are susceptible to damage or exposure. Vulnerability is universal: almost all systems have some degree of vulnerability. How to conduct a vulnerability analysis is therefore a key issue in ensuring the stable operation of a system. We investigated the vulnerability of power grid nodes to earthquakes by finding weak links in the power grid, with the aim of reducing vulnerability before an earthquake by improving the seismic grade of power facilities and reducing the risk of seismic damage to key nodes. Our results can also guide post-earthquake rescue and repair strategies.
There is no single agreed-upon concept of power grid vulnerability, no perfect vulnerability assessment method, and no widely accepted set of indicators. Vulnerability is influenced by internal and external factors: external factors include natural disasters and human-caused damage, and internal factors include electrical component failures and complex power grid topology. Scholars from different disciplines who study the vulnerability of power grid nodes tend to analyze it from their own professional perspective, so the research focus varies. Some studies are concerned with the mechanical response of electric power facilities to seismic activity and the capacity of the facilities to resist structural damage [3,4,5]. Others use functional models of electrical substations to study how earthquakes damage these structures; their authors found that earthquake-induced component damage led to short circuits that propagated inside the substation and to surrounding substations, and could thus cause a system-wide failure [6,7]. Still others use graph theory to analyze the relationship between network topology and reliability from the perspective of complex networks. This kind of research can be divided into pure models and extended models according to how the power grid is abstracted. Pure models abstract the power grid into a pure network and focus on the influence of topological characteristics (such as degree, closeness, betweenness, and clustering coefficient) on the network; they have been applied to study the failure mechanisms of the U.S. power grid [8,9,10], the European power grid [11], and the Italian power grid [12]. Extended models pay more attention to the electrical performance of the grid, incorporating characteristics of electrical components, such as impedance, power, and capacitance, into the network model. Extended models reflect the physical characteristics of the grid better than pure models do [13,14,15,16], but their computational complexity makes them impractical for large networks and complex situations. Other studies have focused on the performance of the power grid, analyzing power flow and deriving performance indicators that can be used to judge the state of the grid [17,18,19]. These studies examine vulnerability from different perspectives. However, the vulnerability of a power grid is multi-faceted and complex, so it must be assessed comprehensively. We introduce a variable fuzzy clustering algorithm and use it to quantify the vulnerability of a power grid by analyzing the characteristics of its nodes.
We selected two types of indicators to measure the vulnerability of power grid nodes: structural vulnerability indicators and functional vulnerability indicators. The former capture vulnerability due to network topology; the two used here are the hierarchical level of each node and the critical threshold. The functional vulnerability indicator is the service characteristic indicator, that is, the probability of node disconnection. We justify the direction of our research and the choice of indicators as follows.
(1) The topological indicators most commonly used to quantify vulnerability are node degree and betweenness [20,21], but studies [22,23] show that conclusions based on the degree indicator are one-sided. In general, the greater the degree of a node, the greater its vulnerability. However, some special nodes in the network, such as bridge nodes, are very vulnerable even though their degree is small. The betweenness indicator requires a holistic understanding of the network and its information, which is often difficult to obtain. We therefore use the critical threshold and hierarchical level indicators to measure node vulnerability from a topological perspective.
(2) Under normal conditions, the performance of the power grid should be calculated from the power flow in the network, from which the power distribution, voltage, and other performance indicators can be obtained. Under earthquake damage, however, the quantitative relationship between the failure probability of high-voltage electrical equipment and the loss of power flow is extremely difficult to determine. To keep the problem tractable, power flow calculation is generally replaced by network connectivity analysis, and the node disconnection probability is used instead of a power performance indicator: if a node remains connected, its power performance is considered normal. This substitution is acceptably accurate for earthquake-related vulnerability analysis [24].
(3) The boundary between structural and functional vulnerability indicators is not rigid. For example, the critical threshold indicator is derived from the cascading failure model of complex networks. This model represents cascading failures caused by load redistribution and therefore includes the functional influence of nodes, so the critical threshold also carries a functional attribute [22,23]. We therefore combined the critical threshold with the node disconnection probability, instead of performing a power flow analysis of the grid, to account for both the performance and the topology of the network.
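The point in item (1) about bridge nodes can be illustrated with a small, hypothetical network (not the study grid). The sketch below computes each node's degree and checks which single-node removals disconnect the network; the node names and edges are invented for illustration.

```python
from collections import deque

def is_connected(adj, removed=frozenset()):
    """Breadth-first search: True if all nodes not in `removed` are mutually reachable."""
    nodes = [v for v in adj if v not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in removed and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(nodes)

# Hypothetical toy grid: two triangles joined through node "B", which has degree 2.
edges = [("A1", "A2"), ("A2", "A3"), ("A3", "A1"),
         ("A1", "B"), ("B", "C1"),
         ("C1", "C2"), ("C2", "C3"), ("C3", "C1")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

degree = {v: len(nbrs) for v, nbrs in adj.items()}
# Nodes whose removal splits the network (articulation points).
cut_nodes = sorted(v for v in adj if not is_connected(adj, removed={v}))
```

Node `B` has the smallest degree in this network, yet removing it separates the two halves. Removing each node and re-testing connectivity is a brute-force check; for large grids, the linear-time Hopcroft-Tarjan articulation-point algorithm would be the usual choice.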
Previous research has examined power grids from different angles, but each study typically selected a single index to evaluate node vulnerability. For example, Refs. [25,26] use the probability of node failure under earthquakes; Refs. [27,28] use the node degree or a power-based degree; Refs. [8,12,29,30] use node betweenness or electrical betweenness; and Ref. [31] uses node electrical centrality. These indicators do reflect node vulnerability to some extent, but vulnerability is a complex problem, and a single indicator cannot measure it comprehensively. The main contribution of this paper is to apply clustering, with indicators selected from two aspects (internal factors, i.e., the characteristics of the power grid, and external factors, i.e., the effects of earthquakes), to comprehensively evaluate the vulnerability of power grid nodes.
The rest of this paper is organized as follows. Section 2 presents the reasons for using a variable fuzzy clustering model and the advantages it offers. Section 3 gives the specific calculations for each of the three indicators. Section 4 introduces the algorithm flow of the variable fuzzy clustering model. Section 5 uses the grid of a particular city as an example to show how the vulnerabilities of grid nodes are classified and sorted. Section 6 discusses and analyzes the results, identifies the reasons for the high vulnerability of certain nodes, and proposes targeted measures to reduce the impact of vulnerabilities at such nodes. The final section summarizes the research described in this paper and outlines directions for future research.
2. Methodology
We propose using a clustering algorithm, a data mining technique, to classify vulnerability. This approach was inspired by the Walmart beer and diaper story. According to the story, Walmart executives analyzing sales data found that two apparently unrelated products, beer and diapers, were often sold at the same time. The explanation offered was that in a family with a newborn, the mother takes care of the baby and the father buys the diapers, and when he does, he often buys beer as well. The beer–diaper association is difficult to understand at first sight, but it follows a pattern that can be identified through data mining and analysis.
In a power grid, no single indicator can accurately quantify the vulnerability of a node; vulnerability can only be evaluated comprehensively by considering several indicators. Applying a clustering algorithm to data that represent node characteristics therefore appears to be a fruitful approach. Early clustering algorithms, such as hierarchical clustering and k-means clustering, strictly assign data objects to well-defined categories [32], but in many problems the boundaries between categories are vague or ill-defined. Power grid vulnerability is such a case: there is no clear boundary between high and low vulnerability, so hard clustering algorithms may not be appropriate for classification. After Zadeh introduced the concept of a fuzzy set in 1965 [33], fuzzy set theory was applied to clustering problems, giving rise to fuzzy clustering analysis. In 1984, Bezdek developed the widely used fuzzy c-means clustering algorithm (FCM) [34]. We propose a variable fuzzy clustering model that improves on FCM.
To study the vulnerability of the power grid to earthquake damage, it is necessary to classify the nodes and to evaluate different types of vulnerability to develop targeted prevention measures. FCM can only categorize the sample nodes and does not quantify the characteristics of the categories. The variable fuzzy clustering model we introduce in this paper can assess the vulnerability of each category by improving the FCM algorithm.
4. Variable Fuzzy Clustering Model
4.1. Algorithm Flow of Variable Fuzzy Clustering Model
With a sample size of $n$ and $m$ selected indicators, the sample index data are listed in the matrix $X = (x_{ij})_{m \times n}$, where $x_{ij}$ is the value of the $i$th indicator for sample $j$, with $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$.
Because the indicators have different dimensions, the data are normalized. The equation to normalize indicators for which larger values indicate greater vulnerability is:
The equation to normalize indicators for which smaller values indicate greater vulnerability is:
After normalizing $X$ using Equations (12) and (13), the normalized index matrix is $R = (r_{ij})_{m \times n}$, where $r_{ij}$ is the normalized index. The closer $r_{ij}$ is to 1, the greater the vulnerability it indicates. The normalization also makes it possible to rank the cluster centers by vulnerability at a later stage (Section 4.2).
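As an illustration, the sketch below applies a common min-max form of the two normalizations. The exact form of Equations (12) and (13) is assumed here, not quoted: $r = (x - \min)/(\max - \min)$ when larger values mean greater vulnerability, and $r = (\max - x)/(\max - \min)$ when smaller values do. The function name and the sample data are hypothetical.

```python
def normalize(X, larger_is_more_vulnerable):
    """Min-max normalize an m x n indicator matrix (rows = indicators, columns = samples).

    Assumed forms (not taken from the original equations):
      larger value -> more vulnerable:  r = (x - min) / (max - min)
      smaller value -> more vulnerable: r = (max - x) / (max - min)
    After normalization, values closer to 1 mean greater vulnerability.
    """
    R = []
    for row, larger in zip(X, larger_is_more_vulnerable):
        lo, hi = min(row), max(row)
        span = hi - lo
        if span == 0:
            # A constant indicator cannot distinguish samples; map it to 0.
            R.append([0.0] * len(row))
        elif larger:
            R.append([(x - lo) / span for x in row])
        else:
            R.append([(hi - x) / span for x in row])
    return R

# Hypothetical data for 4 nodes: P (larger is worse), G (smaller is worse), T (larger is worse).
X = [[0.0, 0.2, 0.8, 0.4],
     [1.0, 3.0, 2.0, 5.0],
     [0.1, 0.5, 0.3, 0.9]]
R = normalize(X, [True, False, True])
```

Mapping a constant indicator to 0 is a design choice of this sketch; such an indicator could equally be dropped before clustering.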
The $n$ samples are divided into $c$ clusters, and the $c$ cluster centers are represented by the matrix $S = (s_{ih})_{m \times c}$, where $s_{ih}$ is the normalized $i$th indicator of cluster center $h$, with $0 \le s_{ih} \le 1$, $i = 1, 2, \ldots, m$, and $h = 1, 2, \ldots, c$.
The membership matrix $U = (u_{hj})_{c \times n}$ is formed, where $u_{hj}$ is the degree to which sample $j$ belongs to category $h$, with $h = 1, 2, \ldots, c$ and $j = 1, 2, \ldots, n$, subject to $0 \le u_{hj} \le 1$ and $\sum_{h=1}^{c} u_{hj} = 1$.
The difference between sample $j$ and cluster center $h$ is represented by the distance $d_{hj}$. The weight vector $w = (w_1, w_2, \ldots, w_m)$ satisfies $\sum_{i=1}^{m} w_i = 1$, where $w_i$ represents the degree of influence of indicator $i$ on the clustering results. The equation for $d_{hj}$ is:
where different values of $p$ correspond to different distance measures: $p = 1$ gives the Hamming (city-block) distance, and $p = 2$ gives the Euclidean distance.
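A weighted Minkowski distance of the kind described here can be sketched as follows. The placement of the weights, $d_{hj} = \big[\sum_i (w_i\,|r_{ij} - s_{ih}|)^p\big]^{1/p}$, is an assumption of this sketch, and the function name is hypothetical.

```python
def weighted_distance(r, s, w, p=2):
    """Weighted Minkowski distance between a sample r and a cluster center s.

    Assumed form: d = (sum_i (w_i * |r_i - s_i|) ** p) ** (1 / p).
    p = 1 gives the weighted Hamming (city-block) distance, p = 2 the weighted Euclidean distance.
    """
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum((wi * abs(ri - si)) ** p for ri, si, wi in zip(r, s, w)) ** (1.0 / p)

# Two indicators with equal weights.
d1 = weighted_distance([1.0, 1.0], [0.0, 0.0], [0.5, 0.5], p=1)
d2 = weighted_distance([1.0, 1.0], [0.0, 0.0], [0.5, 0.5], p=2)
```

With equal weights, the weights scale every distance by the same factor, so they do not change which center a sample is closest to.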
To obtain the final membership matrix $U^{*}$ and the cluster center matrix $S^{*}$, the objective function is:
where $\alpha$ is a variable parameter: $\alpha = 1$ corresponds to the least absolute deviation criterion, and $\alpha = 2$ corresponds to the least squares criterion. The model is a constrained extremum problem, which is transformed into an unconstrained one using Lagrange multipliers. The final iterative equations are:
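For the setting used in this paper ($\alpha = 2$, $p = 2$), the Lagrangian step can be sketched as follows. We assume an objective of the form $F = \sum_j \sum_h u_{hj}^2 d_{hj}^{\alpha}$, which is consistent with the least absolute ($\alpha = 1$) and least squares ($\alpha = 2$) criteria described above but is our assumption rather than a quotation of the original objective. We also assume $d_{hj} > 0$ and $d_{hj}^2 = \sum_i [w_i (r_{ij} - s_{ih})]^2$; another placement of the weights changes only a constant factor that cancels.

```latex
\begin{aligned}
&\min_{U,S}\; F=\sum_{j=1}^{n}\sum_{h=1}^{c}u_{hj}^{2}\,d_{hj}^{\alpha}
\quad\text{s.t.}\quad \sum_{h=1}^{c}u_{hj}=1,\;\; 0\le u_{hj}\le 1,\\
&L_j=\sum_{h=1}^{c}u_{hj}^{2}\,d_{hj}^{\alpha}-\lambda_j\Big(\sum_{h=1}^{c}u_{hj}-1\Big),
\qquad \frac{\partial L_j}{\partial u_{hj}}=2u_{hj}\,d_{hj}^{\alpha}-\lambda_j=0
\;\Rightarrow\; u_{hj}=\frac{\lambda_j}{2\,d_{hj}^{\alpha}},\\
&\sum_{h=1}^{c}u_{hj}=1\;\Rightarrow\;\frac{\lambda_j}{2}=\Big(\sum_{k=1}^{c}d_{kj}^{-\alpha}\Big)^{-1}
\;\Rightarrow\; u_{hj}=\Bigg[\sum_{k=1}^{c}\Big(\frac{d_{hj}}{d_{kj}}\Big)^{\alpha}\Bigg]^{-1},\\
&\alpha=p=2:\quad \frac{\partial F}{\partial s_{ih}}=-2w_i^{2}\sum_{j=1}^{n}u_{hj}^{2}\,(r_{ij}-s_{ih})=0
\;\Rightarrow\; s_{ih}=\frac{\sum_{j=1}^{n}u_{hj}^{2}\,r_{ij}}{\sum_{j=1}^{n}u_{hj}^{2}}.
\end{aligned}
```

The membership update holds for any $\alpha$ under this assumed objective, whereas the closed-form center update shown is exact only for $\alpha = p = 2$.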
Using Equations (17) and (18), the final membership matrix and the cluster center matrix are found by iterating until convergence, where $\alpha$, $p$, and $w_i$ are the variable parameters. We used $\alpha = 2$, $p = 2$, and equal indicator weights in the clustering calculations.
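A minimal pure-Python sketch of the iteration for this setting ($\alpha = 2$, $p = 2$) follows. It assumes the update rules $s_{ih} = \sum_j u_{hj}^2 r_{ij} / \sum_j u_{hj}^2$ and $u_{hj} = 1/\sum_k (d_{hj}/d_{kj})^2$ with weighted Euclidean distances. These rules are our reconstruction rather than a transcription of Equations (17) and (18), and the function name, arguments, and test data are hypothetical.

```python
import random

def fuzzy_cluster_a2p2(R, c, w=None, tol=1e-8, max_iter=500, seed=0):
    """Alternating updates for the case alpha = p = 2.

    R: m x n normalized indicator matrix (rows = indicators, columns = samples).
    Returns U (c x n memberships) and S (m x c cluster centers).
    Assumed updates: s_ih = sum_j u_hj^2 r_ij / sum_j u_hj^2 and
    u_hj = 1 / sum_k (d_hj / d_kj)^2, with weighted Euclidean distances d.
    """
    m, n = len(R), len(R[0])
    w = w if w is not None else [1.0 / m] * m
    rng = random.Random(seed)
    # Random initial membership matrix; each column is normalized to sum to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        total = sum(U[h][j] for h in range(c))
        for h in range(c):
            U[h][j] /= total
    for _ in range(max_iter):
        # Center update: mean of the samples weighted by squared membership.
        S = [[sum(U[h][j] ** 2 * R[i][j] for j in range(n))
              / sum(U[h][j] ** 2 for j in range(n)) for h in range(c)]
             for i in range(m)]
        # Squared weighted Euclidean distance between sample j and center h.
        D2 = [[sum((w[i] * (R[i][j] - S[i][h])) ** 2 for i in range(m))
               for j in range(n)] for h in range(c)]
        U_new = [[0.0] * n for _ in range(c)]
        for j in range(n):
            hits = [h for h in range(c) if D2[h][j] == 0.0]
            if hits:
                # Sample coincides with a center: give it full membership there.
                U_new[hits[0]][j] = 1.0
                continue
            for h in range(c):
                # (d_hj / d_kj)^2 equals D2[h][j] / D2[k][j].
                U_new[h][j] = 1.0 / sum(D2[h][j] / D2[k][j] for k in range(c))
        shift = max(abs(U_new[h][j] - U[h][j]) for h in range(c) for j in range(n))
        U = U_new
        if shift < tol:
            break
    return U, S

# Hypothetical data: two well-separated groups of three samples, two indicators.
R = [[0.05, 0.10, 0.15, 0.85, 0.90, 0.95],
     [0.10, 0.05, 0.15, 0.90, 0.95, 0.85]]
U, S = fuzzy_cluster_a2p2(R, c=2)
```

Because the initial memberships are random, the order of the returned clusters can change with `seed`; this is why the cluster centers must be reordered by vulnerability before the results are interpreted.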
4.2. Vulnerability Assessment and Level Characteristic Values
To quantify vulnerability, the vulnerability of the $c$ cluster centers is quantified first; the vulnerability of each cluster is represented by that of its center. Equations (12) and (13) normalize the sample data so that vulnerability increases as $r_{ij}$ approaches 1. An ideal node is therefore one whose normalized indicator values are all 1: all of its indicators are at their most vulnerable, so the ideal node has the greatest vulnerability. The vulnerability of cluster center $s_h$ is quantified by its distance from the ideal node, with a smaller distance representing a greater vulnerability.
Because the initial fuzzy membership matrix is generated from random numbers that differ from run to run, the order of the cluster centers in the cluster center matrix can also differ between runs. To facilitate sorting, the cluster centers are ranked by vulnerability from largest to smallest as 1, 2, …, c, and the rows of the membership matrix are reordered to match the new order of the cluster centers.
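The reordering step can be sketched as follows: compute each center's distance to the ideal node $(1, \ldots, 1)$, sort the centers from nearest (most vulnerable) to farthest, and permute the rows of the membership matrix in the same way. The function name and the small example are hypothetical, and the plain Euclidean distance is an assumption (with equal weights, any common scale factor leaves the order unchanged).

```python
import math

def order_by_vulnerability(S, U):
    """Reorder cluster centers (columns of S, m x c) and rows of U (c x n)
    so that the first center is closest to the ideal node (1, ..., 1),
    i.e. most vulnerable. Returns the reordered S, U and the permutation used."""
    m, c = len(S), len(S[0])
    dist = [math.sqrt(sum((1.0 - S[i][h]) ** 2 for i in range(m))) for h in range(c)]
    order = sorted(range(c), key=lambda h: dist[h])
    S_sorted = [[row[h] for h in order] for row in S]
    U_sorted = [U[h] for h in order]
    return S_sorted, U_sorted, order

# Hypothetical example: 2 indicators, 2 centers, 2 samples.
S = [[0.2, 0.9],
     [0.1, 0.8]]
U = [[0.7, 0.2],
     [0.3, 0.8]]
S2, U2, order = order_by_vulnerability(S, U)
```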
After the cluster center matrix and the membership matrix have been reordered, the nodes are classified. FCM clustering customarily uses the maximum membership principle, but Chen and Guo [38] explicitly opposed this method, arguing that classification based on the maximum membership principle loses the global information contained in the membership degrees. In the extreme case where a sample has equal membership in several categories, the maximum membership principle cannot determine which category the sample belongs to. Chen and Guo [38] therefore used the level characteristic value to determine the category. For a sample $u_0$ whose membership distribution over the $c$ categories is $(u_1, u_2, \ldots, u_c)$, the level characteristic value is obtained by summing the products of each grade variable $h$ and the corresponding degree of membership:
$$H(u_0) = \sum_{h=1}^{c} h\,u_h \qquad (19)$$
Using Equation (19) to determine the category of a sample has a clear mathematical and physical meaning. Suppose that an object of unit mass is distributed along the horizontal axis, with masses $u_1, u_2, \ldots, u_c$ concentrated at the $c$ points $1, 2, \ldots, c$. As shown in Figure 6, $H(u_0)$ then represents the position of the object's centroid.
(1) If the membership is concentrated at a single level point $a$ (that is, $u_a = 1$ and $u_h = 0$ for $h \ne a$), then $H(u_0) = a$. In this case $u_0$ belongs entirely to level $a$, and physically the centroid of the object lies at point $a$.
(2) If the membership is evenly distributed over the level points ($u_h = 1/c$ for every $h$), then $H(u_0) = \sum_{h=1}^{c} h/c = (c + 1)/2$. In this case $u_0$ is assigned to level $(c + 1)/2$, and physically the centroid of the object lies at the midpoint of the level axis.
After $H(u_0)$ is obtained, the grade of $u_0$ can be determined by:
The level characteristic value given by Equation (19) reflects the global information contained in the membership degrees of $u_0$ and can therefore determine the grade of $u_0$ more accurately.
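The level characteristic value is straightforward to compute. In the sketch below, `level_characteristic` implements Equation (19) as described in the text. `grade`, which rounds $H(u_0)$ to the nearest level, is a hypothetical stand-in for the grading rule of Equation (20) and may differ from it.

```python
def level_characteristic(u):
    """H(u) = sum_h h * u_h for levels h = 1, ..., c (Equation (19))."""
    return sum(h * uh for h, uh in enumerate(u, start=1))

def grade(u):
    """Hypothetical grading rule: round H(u) to the nearest level (ties round up),
    clipped to the range 1..c."""
    H = level_characteristic(u)
    return min(len(u), max(1, int(H + 0.5)))

# Maximum membership cannot decide here (levels 1 and 3 tie), but the centroid can.
tied = [0.45, 0.10, 0.45]
H_tied = level_characteristic(tied)
```

For `tied`, the centroid sits at level 2, between the two tied levels, which illustrates how Equation (19) uses the whole membership distribution rather than only its largest entry.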
5. Node Vulnerability Analysis of a Power Grid in a Certain Region Under Earthquake Action
Part of the San Francisco Bay Area grid was selected as the research object in this paper. Because of limited data availability, the power grid was simplified according to the following principles: (1) only power plants and substations are retained as nodes of the power grid, and transmission lines with voltages above 110 kV are used as edges; (2) power plants and substations are not distinguished from each other, and each transmission line is abstracted as an unweighted, undirected edge, ignoring the power flow direction and electrical parameters of the line; (3) transmission lines on the same towers are merged, and self-loops and multiple edges are eliminated from the topology model, so that the corresponding graph is simple. The grid simplified according to these principles, shown in Figure 7, contains 20 nodes and 27 edges. Nodes 1 to 5 are power plants, and the rest are substations.
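The three simplification rules can be expressed compactly. The sketch below assumes line records of the hypothetical form `(from_node, to_node, kV)` and reads "above 110 kV" as strictly greater than 110 kV; it is an illustration, not the authors' preprocessing.

```python
def simplify(lines, min_kv=110.0):
    """Build a simple, undirected, unweighted edge set from transmission-line records.

    Rule (1): keep only lines above min_kv.
    Rule (2): ignore direction and electrical parameters (store unordered node pairs).
    Rule (3): merge parallel circuits (duplicate pairs collapse in the set) and drop self-loops.
    """
    edges = set()
    for a, b, kv in lines:
        if kv <= min_kv or a == b:
            continue
        edges.add(frozenset((a, b)))
    return edges

# Hypothetical records: a double circuit 1-6, a line 6-7, a self-loop at 7, and a 60 kV line.
lines = [(1, 6, 230), (6, 1, 230), (6, 7, 115), (7, 7, 115), (7, 8, 60)]
edges = simplify(lines)
```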
In this example, the design earthquake was the 1989 Loma Prieta earthquake (Ms = 7.0), a major historical earthquake in this area, with an epicenter 29 km north of Point 4. The disconnection probability $P_i$ of each node under this earthquake was obtained, together with the hierarchical level $G_i$ and the critical threshold $T_i$ of each node for cascading failures. The initial values of these three indicators are listed in Table 1.
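To illustrate the connectivity-based substitution described in the Introduction, the following sketch estimates node disconnection probabilities by Monte Carlo simulation. It assumes, hypothetically, that each node fails independently with a given probability and that a node counts as disconnected if it has failed or cannot reach any surviving power plant. This is not necessarily the procedure used to obtain the values in Table 1, and the function name and toy network are invented.

```python
import random
from collections import deque

def disconnection_probability(adj, sources, fail_prob, trials=20000, seed=0):
    """Monte Carlo estimate of P(node cannot reach any working source).

    Assumption (illustrative only): each node fails independently with
    probability fail_prob[node] (default 0); a failed node counts as disconnected.
    """
    rng = random.Random(seed)
    counts = {v: 0 for v in adj}
    for _ in range(trials):
        failed = {v for v in adj if rng.random() < fail_prob.get(v, 0.0)}
        # Multi-source breadth-first search from all surviving sources (power plants).
        reached = {s for s in sources if s not in failed}
        queue = deque(reached)
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in failed and w not in reached:
                    reached.add(w)
                    queue.append(w)
        for v in adj:
            if v not in reached:
                counts[v] += 1
    return {v: c / trials for v, c in counts.items()}

# Hypothetical chain: plant G feeds substation S1, which feeds substation S2.
adj = {"G": {"S1"}, "S1": {"G", "S2"}, "S2": {"S1"}}
p = disconnection_probability(adj, sources={"G"}, fail_prob={"S1": 0.5})
```

In this chain, S2 can only be supplied through S1, so the two substations are disconnected in exactly the same trials, even though S2 itself never fails.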
The indicators were normalized to accommodate their different dimensions. For the disconnection probability $P_i$ and the critical threshold $T_i$, larger values indicate greater vulnerability, so Equation (12) was used for normalization. For the hierarchical level $G_i$, smaller values indicate greater vulnerability, so Equation (13) was used. The normalized values are listed in Table 2.
Node vulnerability in the power grid was divided into three categories, high, medium, and low; that is, $c = 3$. The parameter values were $\alpha = 2$ and $p = 2$, and the indicators were equally weighted, $w = (1/3, 1/3, 1/3)$. The cluster center matrix was obtained using Equations (17) and (18).
The three cluster centers were $s_1 = (0.0256, 0.728, 0.2969)^T$, $s_2 = (0.9137, 0.4058, 0.4405)^T$, and $s_3 = (0.1023, 0.7867, 0.6959)^T$. Next, the cluster centers had to be ordered by vulnerability. Using the ideal node with the greatest vulnerability, (1, 1, 1), the distances $d$ from the ideal node to the three cluster centers and the ideal node's degrees of membership $u$ in the three clusters were calculated using Equations (15) and (17): $d = (1.232, 0.8207, 0.9715)$ and $u = (0.206, 0.463, 0.331)$. Cluster center $s_1$ was farthest from (1, 1, 1) and had the smallest degree of membership, indicating that $s_1$ represents the least vulnerable category. Cluster center $s_2$ was at the other extreme and represents the most vulnerable category, and $s_3$ lies between the two. The cluster centers were therefore reordered as $(s_2, s_3, s_1)$, the membership matrix was adjusted accordingly, and the final result is shown in Table 3.
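The distances and memberships reported above can be re-derived from the three cluster centers. The sketch below uses the plain Euclidean distance (with equal weights, a common weight factor would rescale all distances equally and cancel out of $u$) and the membership rule $u_h = d_h^{-2} / \sum_k d_k^{-2}$, which we assume corresponds to Equation (17) with $\alpha = 2$. It reproduces the reported $d$ and $u$ to within about 0.001, together with the order $(s_2, s_3, s_1)$.

```python
import math

centers = {
    "s1": (0.0256, 0.728, 0.2969),
    "s2": (0.9137, 0.4058, 0.4405),
    "s3": (0.1023, 0.7867, 0.6959),
}
ideal = (1.0, 1.0, 1.0)

# Plain Euclidean distance from the ideal node to each cluster center.
d = {k: math.dist(ideal, s) for k, s in centers.items()}

# Membership of the ideal node for alpha = 2: u_h = d_h^-2 / sum_k d_k^-2.
inv = {k: dk ** -2 for k, dk in d.items()}
total = sum(inv.values())
u = {k: v / total for k, v in inv.items()}

# Most vulnerable first: smallest distance to the ideal node.
order = sorted(centers, key=lambda k: d[k])
```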
Applying the level characteristic Equation (19) to the degrees of membership of the 20 nodes in the three clusters (Table 3) gives the vulnerability level characteristic value $H_i$ of each node, listed in Table 4.
Each node was then assigned a vulnerability grade using Equation (20), with the following results. Nodes 9, 10, 12, and 20 have high vulnerability; nodes 2, 4, 6, 7, 8, 13, 16, 17, and 18 have medium vulnerability; and nodes 1, 3, 5, 11, 14, 15, and 19 have low vulnerability. The clustering results are shown in Figure 8.
6. Results and Discussion
We briefly analyze the vulnerability assessment. The four nodes with high vulnerability were those with the highest disconnection probability $P_i$, and their values differ markedly from those of the other nodes, so this indicator has a significant influence on the results. This is consistent with the known sensitivity of fuzzy clustering algorithms to outliers [32]. The seven nodes with low vulnerability had a disconnection probability of 0, and their critical thresholds were also low, which suggests that these nodes not only have a low probability of failure but also have little impact on adjacent nodes when they fail. This agrees with the intuitive expectation that such nodes have low vulnerability, suggesting that the variable fuzzy clustering method can reasonably determine the vulnerability of power grid nodes to earthquakes.
Knowing the vulnerability of power grid nodes is important for deciding what measures to take to reduce the effects of earthquakes and what post-earthquake emergency responses to initiate. For example, if a node loses its functionality due to structural damage, attention should be paid to improving the seismic grade of the facility, and after an earthquake, priority should be given to inspecting and repairing nodes on the main trunk. For nodes that are highly vulnerable because of the network topology, the power grid should be optimized, for example, by adding redundant facilities, dispersing power sources more widely, and building multi-loop power grids in which each loop has a different power source.
Clustering is a typical unsupervised learning algorithm. It is intended to explore and discover patterns in data samples and to find similar groups among them [39]. It can be run without any prior knowledge of the data, which makes it suitable for vulnerability analysis. Current vulnerability research mostly relies on establishing a system performance model: nodes are removed to simulate the impact of their failure on system performance, and this impact is taken to represent how vulnerability arises. The accuracy of the model is therefore a key factor in vulnerability analysis. However, because the mechanism by which vulnerability arises is complex and there is not even an accurate definition of vulnerability, the accuracy of the various system performance models remains under debate. Different system performance models can lead to quite different, or even opposite, conclusions. For example, most studies of complex network vulnerability support the view that nodes with a large load have a great impact on the network. However, Wang and Rong [10] reached a different conclusion after studying the failure mechanism of the power grid in the Western United States: they found that when the model parameters meet certain conditions, attacking a node with a small load is more likely to cause a large-scale collapse than attacking a node with a large load. Similarly, studies of power system vulnerability using pure and extended models [14] have reached divergent conclusions. The purpose of using the clustering method in this paper is therefore to avoid the limitations of any particular system performance model and to analyze vulnerability from a completely different perspective.
The method described in this paper analyzes the impact of earthquake damage on the power grid more comprehensively than previous studies, which usually predicted the effect of node failure on the power grid by removing a node from the grid and determining its vulnerability through analysis of power flow and network topology. Those studies assumed that every node has an equal probability of being destroyed, which is contrary to real-world observation. Some researchers have simulated the grid with a failure probability attached to each node to quantify its vulnerability to earthquake damage, but the computation required for such simulation is huge and complex, and this approach has not yet been successful [40,41]. We used the probability that a node is disconnected by an earthquake as a vulnerability indicator and included it in the cluster analysis, which accounts for the effect of an earthquake more realistically and reasonably than the equal-probability assumption of previous studies.
7. Conclusions
Because of its complexity, the vulnerability of a power grid to earthquake damage is difficult to quantify. Most previous studies used a single indicator of grid vulnerability and thus took only one perspective of vulnerability into account, so they could not represent the vulnerability of the grid comprehensively or accurately. In this study, we used three indicators of the vulnerability of the grid to earthquake damage (the probability of disconnection, the hierarchical level, and the critical threshold), together with the variable fuzzy clustering model, to obtain a more comprehensive measure.
The indicators and methods proposed in this paper can objectively and accurately assess the vulnerability of grid nodes, but the research still has some shortcomings. First, because there is no agreed-upon precise definition of vulnerability, choosing a set of indicators that accurately reflects vulnerability remains a problem that needs deeper study than we were able to undertake. In this paper, the pure model is used to calculate the critical threshold, so the topological metrics identify only a first level of vulnerability in the physical structure. However, power flow in a grid follows Kirchhoff’s laws, and using only topological metrics while ignoring power grid characteristics and technical constraints may lead to inaccurate results. In future work, therefore, the influence of technical constraints (voltage, resistance, maximum power, etc.) should be taken into account. Second, power grid performance is also an important indicator of vulnerability. We used the probability of disconnection and the critical threshold as substitutes for functional indicators. This choice is acceptable for analyzing vulnerability to earthquake damage, but functional indicators are also likely to provide an accurate measure of grid vulnerability and must be considered in future research. Third, the case grid used in this paper is small, unlike large-scale modern power grids, so the methodology should be tested on a larger grid in future work. Finally, with the development of power grid technology, the smart grid has become a new and vibrant research field. In a smart grid, the network topology may change frequently to optimize its behavior, and evaluating the impact of such structural changes on vulnerability is also an important research topic.