1. Introduction
With the ever-increasing penetration of renewable energy [1,2] and the application of power electronic devices [3], the structure and operating conditions of transmission networks are becoming increasingly complex, posing unprecedented challenges to the stability of traditional power systems. Changes in the network structure and power flow necessitate a reassessment of their impacts on power system stability [4]. In particular, as the operational modes of traditional equipment and the distribution of power flows have been altered, rotor angle stability, voltage stability, and frequency stability are all affected. Additionally, new stability issues, including electromechanical-like low-frequency oscillations and electromagnetic wideband oscillations, are gradually emerging [5]. Although renewable energy devices connected to the grid via power electronics improve flexibility and environmental sustainability [6], their low-inertia characteristics weaken the system’s resistance to frequency disturbances [7]. Furthermore, the rapid dynamic response of power electronic devices, along with their complex control strategies, complicates the analysis and control of transient stability in power systems. Consequently, there is an urgent need for an accurate and rapid transient stability assessment method to ensure the safe and stable operation of power systems under these evolving conditions.
Traditional transient stability assessment (TSA) methods primarily rely on energy-based direct methods [8] and time-domain simulations [9]. However, these approaches increasingly struggle to address the uncertainty and complexity introduced by high-penetration renewable energy systems [10]. In recent years, rapidly developing deep learning (DL) has been widely applied to build TSA models for power systems [11]. In contrast to traditional methods, DL establishes a mapping between the physical characteristics of a power system and its transient stability assessment results through offline training, and the trained model is then applied to online transient stability prediction. On the one hand, the trajectories of state and operational variables following a disturbance can directly reflect the transient stability of the power system [12,13]. In [14], a deep belief network (DBN) is presented to model the mapping between the input and the output precisely, achieving accurate evaluation. A TSA method based on a convolutional neural network (CNN) has been proposed, which significantly enhances the speed of batch assessment [15]. Long short-term memory (LSTM) is employed in [16,17] to extract features from voltage phasors at different time intervals, resulting in a time-adaptive TSA model. Furthermore, the integration of LSTM with a parallel CNN effectively captures temporal relationships within transient processes, thereby improving the performance of the TSA model [18]. These methods use electrical measurement data as inputs to construct stability assessment models based on temporal characteristics, and they achieve a certain level of effectiveness [19]. On the other hand, considering the influence of power grid topology on transient stability, some graph neural network (GNN)-based methods have been employed for TSA [20]. To mine the impact of the power grid topology, the graph convolutional network (GCN) is adopted to improve the prediction performance in TSA [21]. Additionally, the graph attention network (GAT) is utilized in [22] to effectively aggregate spatial features during transient processes, while residual structures are introduced to prevent network degradation and enhance the accuracy of TSA.
Existing DL-based TSA models typically focus on either the temporal measurement data of the power system or the spatial characteristics of the grid topology [23], and they rarely integrate both aspects to capture the spatiotemporal features of grid data. Incorporating both spatial and temporal features is crucial for accurately evaluating power system stability [24,25]. During major disturbances, such as faults and stability control actions, the transient processes of power systems exhibit significant spatiotemporal correlations. Analyzing these correlations provides a comprehensive understanding of the dynamic behavior during transients, leading to more reliable assessments of power system stability [26,27]. Furthermore, as power systems continue to grow in scale, prolonged model training times have become an increasingly prominent challenge [28,29]. Under limited computing resources, the effective and rapid training of models on large-scale grid data remains a critical challenge that must be addressed [30].
To overcome these limitations, we propose a spatiotemporal graph convolutional network approach with graph simplification (GS-STGCN) for TSA, which improves prediction performance and accelerates TSA evaluation. The contributions of this paper are as follows:
(1) A node influence-based graph simplification method is proposed, incorporating both the topological and electrical characteristics of the power system. This approach introduces a pruning parameter to remove less influential nodes, thereby optimizing the model input and enhancing the training efficiency of the TSA model.
(2) A spatiotemporal graph convolutional network (STGCN) is designed to simultaneously capture spatial and temporal features during transient processes. It combines a gated convolutional network (Gated CNN) with a GCN to integrate temporal sequence data with topological information, yielding a TSA model with enhanced performance.
(3) The focal loss function is improved to adjust the impact of samples with varying levels of difficulty on model training. This adjustment helps prevent premature convergence caused by excessively small losses in later stages, which could otherwise degrade assessment performance. Additionally, an adaptive class weighting factor is introduced to reduce the need for manual parameter tuning.
The remainder of this paper is organized as follows: Section 2 introduces the GS-STGCN methodology; Section 3 elaborates on the proposed TSA framework; Section 4 conducts case studies on the IEEE 39-bus system; and Section 5 concludes the paper.
2. GS-STGCN Methodology
2.1. Node Influence-Based Graph Simplification Method
As power systems expand in scale, the volume of input data grows accordingly. In power systems, some nodes are particularly important and carry rich information, exerting a substantial impact on TSA. Other nodes contribute relatively little to model training while still increasing training time and memory usage. Therefore, a node influence-based graph simplification method (GS) is proposed to compress large-scale power grid input data, thus improving the calculation speed of the TSA model.
The K-Shell algorithm (KS) is a coarse-grained node importance assessment method based on the entire network structure. The algorithm stratifies the network from the periphery to the core and assigns a KS value to each node, reflecting its importance within the network. The magnitude of the KS value effectively distinguishes core nodes from non-core nodes. A larger KS value indicates that the node is closer to the core of the network and possesses greater influence.
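The peeling procedure described above can be sketched in a few lines of Python. This is a minimal illustration of the generic K-Shell decomposition on an undirected graph, not the authors' implementation; the function name `k_shell` and the adjacency-dictionary input format are assumptions made for the example.

```python
def k_shell(adj):
    """Return the K-Shell (KS) value of every node of an undirected graph.

    adj maps each node to the set of its neighbours (hypothetical input format).
    """
    # Work on a copy so the caller's graph is left untouched.
    remaining = {node: set(neigh) for node, neigh in adj.items()}
    ks = {}
    k = 0
    while remaining:
        # Nodes whose current degree does not exceed k belong to shell k.
        peel = [n for n, neigh in remaining.items() if len(neigh) <= k]
        if not peel:
            k += 1  # nothing left to peel at this level, move one shell inward
            continue
        for n in peel:
            ks[n] = k
            # Remove the node and detach it from its surviving neighbours.
            for m in remaining.pop(n):
                if m in remaining:
                    remaining[m].discard(n)
    return ks


# Toy example: a triangle (1, 2, 3) with a two-node tail 3 - 4 - 5.
toy_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
ks_values = k_shell(toy_graph)
```

In the toy graph, the tail nodes are peeled first and receive the smaller KS value, while the triangle forms the core.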
Additionally, the node information entropy is utilized to assess the amount of information a node carries within the power system. Calculating the information entropy of a node mainly relies on the node’s neighborhood degree information, which can be expressed as
where
is the set of neighbor nodes of node
;
is defined as
where
is the degree of node
.
Considering the location and information entropy of the nodes in the network, the topological entropy impact indicator is defined as
where
is the KS value of node
.
In power systems, power flow is a critical indicator of a node’s output capacity and load demand, reflecting the node’s importance within the system. The active power and reactive power at steady-state conditions are selected to jointly assess the node’s power flow. The power flow influence indicator can be defined as
where
and
are the active power and reactive power of node
in the steady state, respectively.
Considering both the topological entropy and power flow influence of a node within the system, a comprehensive influence indicator can be expressed as
$$CI_i = \omega_1 f(TE_i) + \omega_2 f(PF_i) \tag{5}$$
where $TE_i$ and $PF_i$ are the topological entropy impact indicator and the power flow influence indicator of node $i$, respectively; $\omega_1$ and $\omega_2$ are the weights of the two impact assessment indicators; and $f(\cdot)$ is the MIN-MAX normalization method. The comprehensive influence indicator thus quantifies the importance of each node in the power grid by considering both the system’s topological entropy and power flow influence.
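As a small worked illustration of the weighted combination, the sketch below applies min-max normalization to two per-node indicator lists and blends them. The function names, the equal default weights, and the sample values are hypothetical; the paper does not state the weights it uses.

```python
def min_max(values):
    """Min-max normalize a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no node stands out, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def comprehensive_influence(topo_entropy, power_flow, w_topo=0.5, w_flow=0.5):
    """Weighted sum of the two normalized indicators (weights are assumed)."""
    te = min_max(topo_entropy)
    pf = min_max(power_flow)
    return [w_topo * a + w_flow * b for a, b in zip(te, pf)]


# Three nodes with made-up indicator values.
scores = comprehensive_influence([1.0, 2.0, 3.0], [10.0, 30.0, 20.0])
```

Normalizing before weighting keeps the two indicators on a comparable scale, so neither dominates merely because of its units.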
The comprehensive influence indicator is then used to simplify and compress the system’s graph data structure. The steps are as follows:
(1) Comprehensive influence index calculation. Using Equations (1)–(5), the comprehensive influence index of each node in the power grid is computed from its topological entropy and power flow influence, and the indices are ranked in ascending order, resulting in the sorted node influence vector
$$S = [s_1, s_2, \ldots, s_n] \tag{6}$$
where $s_k$ is the $k$-th smallest comprehensive influence index and $n$ is the number of nodes.
(2) Node selection and elimination. Nodes and their connected transmission lines are removed sequentially according to their order in the sorted vector, with a connectivity check after each removal. If the grid becomes disconnected after a pruning operation, the removal is reversed and the process moves to the next node. This procedure continues until the proportion of removed nodes exceeds the predefined pruning threshold, at which point the process is terminated.
(3) Reconstruction of the input feature matrix. After pruning, the topology and feature matrix of the remaining nodes in the power grid are reconstructed, resulting in a reduced number of nodes and corresponding feature data. The impact on the grid topology before and after pruning is illustrated in Figure 1.
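The pruning steps above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the names `is_connected`, `simplify_graph`, and `prune_ratio` are hypothetical, the graph is an adjacency dictionary, and the loop stops once the removed fraction reaches `prune_ratio` (the exact stopping rule in the paper may differ slightly).

```python
from collections import deque


def is_connected(adj):
    """Breadth-first search check that every node is reachable from one start node."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(adj)


def simplify_graph(adj, influence, prune_ratio):
    """Remove low-influence nodes in ascending order while keeping the graph connected."""
    graph = {n: set(nb) for n, nb in adj.items()}
    total = len(graph)
    removed = []
    for node in sorted(graph, key=lambda n: influence[n]):
        if len(removed) >= prune_ratio * total:
            break  # pruning target reached
        # Tentatively delete the node together with its incident lines.
        trial = {n: nb - {node} for n, nb in graph.items() if n != node}
        if is_connected(trial):
            graph = trial
            removed.append(node)
        # Otherwise the removal is discarded and the next node is tried.
    return graph, removed


# Toy chain 1 - 2 - 3 - 4 - 5 with made-up influence scores.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
influence = {1: 0.1, 3: 0.2, 2: 0.3, 5: 0.4, 4: 0.5}
pruned, removed = simplify_graph(chain, influence, prune_ratio=0.4)
```

In the toy chain, node 3 has the second-lowest score but is skipped because deleting it would split the network, which mirrors the "reverse and move on" rule in step (2).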
2.2. TSA Based on Spatiotemporal Graph Convolutional Networks
TSA is a classification task that relies on spatiotemporal features. The spatial structure of the power grid represents the spatial features, while the temporal variations in electrical quantities during transient processes represent the temporal features. In this study, GCN [21] and Gated CNN [31] are employed to capture these spatial and temporal features, respectively, enabling the accurate extraction of transient stability characteristics.
CNN and GCN are classic neural network models widely used for feature extraction. CNN is primarily designed for data in Euclidean space, excelling at extracting features from two-dimensional tensors. In contrast, GCN is specifically tailored for processing graph-structured data, making it ideal for extracting spatial features from the nodes within the power grid’s graph structure.
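The power grid structure can be defined using the normalized graph Laplacian matrix as follows:
$$L = I_n - D^{-1/2} A D^{-1/2} = U \Lambda U^{T} \tag{7}$$
where $I_n$ is an identity matrix; $D$ is the diagonal degree matrix of the power grid graph structure $G$; $A$ is the adjacency matrix of $G$; $U$ is the matrix of eigenvectors of $L$; and $\Lambda$ is the diagonal matrix of eigenvalues.
The graph convolution formula can be defined as
$$g *_{G} x = U\left( (U^{T} g) \odot (U^{T} x) \right) \tag{8}$$
where $x$ is the input of the spatial convolutional layer; $*_{G}$ is the graph convolution operator; $g$ is the convolution kernel; and $\odot$ is the Hadamard product.
Using $g_{\theta}(\Lambda) = \mathrm{diag}(U^{T} g)$ as the learnable convolution kernel, the graph convolution formula is
$$g_{\theta} *_{G} x = U g_{\theta}(\Lambda) U^{T} x \tag{9}$$
To accelerate the computation of Equation (9), the Chebyshev polynomial is employed to approximate the convolution kernel, which can be obtained as follows:
$$g_{\theta}(\Lambda) \approx \sum_{h=0}^{H-1} \theta_h T_h(\tilde{\Lambda}) \tag{10}$$
where $\tilde{\Lambda} = 2\Lambda/\lambda_{\max} - I_n$; $\lambda_{\max}$ is the largest eigenvalue of $L$; and $\theta$ is the vector of Chebyshev polynomial coefficients. $T_h(\cdot)$ is the Chebyshev polynomial of order $h$, and it can be expressed as
$$T_h(z) = 2z\,T_{h-1}(z) - T_{h-2}(z) \tag{11}$$
where $T_0(z) = 1$ and $T_1(z) = z$.
By introducing Chebyshev polynomials, the graph convolution formula is derived as follows:
$$g_{\theta} *_{G} x \approx \sum_{h=0}^{H-1} \theta_h T_h(\tilde{L})\, x, \qquad \tilde{L} = 2L/\lambda_{\max} - I_n \tag{12}$$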
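To make the benefit of the Chebyshev approximation concrete, the sketch below evaluates a Chebyshev graph convolution using only matrix-vector products and the three-term recurrence, so no eigendecomposition is needed. It is a minimal pure-Python illustration, not the authors' implementation. The function names are hypothetical, and the largest eigenvalue is approximated as 2 (a common simplification that the paper does not state), which reduces the rescaled Laplacian to the negated normalized adjacency matrix.

```python
import math


def scaled_laplacian(adj_matrix):
    """Rescaled Laplacian 2L/lambda_max - I with L = I - D^{-1/2} A D^{-1/2}.

    Assumes lambda_max = 2, so the result equals -D^{-1/2} A D^{-1/2}.
    """
    n = len(adj_matrix)
    deg = [sum(row) for row in adj_matrix]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[-inv_sqrt[i] * adj_matrix[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def matvec(matrix, vector):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def cheb_graph_conv(l_tilde, x, theta):
    """Compute sum_h theta[h] * T_h(l_tilde) x via the Chebyshev recurrence."""
    t_prev = list(x)                      # T_0(L~) x = x
    out = [theta[0] * v for v in t_prev]
    if len(theta) == 1:
        return out
    t_curr = matvec(l_tilde, x)           # T_1(L~) x = L~ x
    out = [o + theta[1] * v for o, v in zip(out, t_curr)]
    for h in range(2, len(theta)):
        # T_h x = 2 L~ (T_{h-1} x) - T_{h-2} x
        lt = matvec(l_tilde, t_curr)
        t_next = [2.0 * a - b for a, b in zip(lt, t_prev)]
        out = [o + theta[h] * v for o, v in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out


# Two buses joined by one line, with one scalar feature per bus.
l_tilde = scaled_laplacian([[0, 1], [1, 0]])
first_order = cheb_graph_conv(l_tilde, [1.0, 2.0], [0.0, 1.0])
second_order = cheb_graph_conv(l_tilde, [1.0, 2.0], [0.0, 0.0, 1.0])
```

Each additional polynomial order costs one more product with the sparse grid matrix, which is why the approximation scales well to large networks.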
The temporal convolutional layer comprises two key components. First, Gated CNN is used to capture long-term dependencies across different time steps. Second, the Gated Linear Unit (GLU) serves as the nonlinear activation function, effectively integrating both linear and nonlinear transformations. In the temporal convolutional layer, the input for each node is a sequence with $C_i$ input channels and a length of $M$, represented as $Y \in \mathbb{R}^{M \times C_i}$. The convolution kernel $\Gamma \in \mathbb{R}^{K_t \times C_i \times 2C_o}$ is designed to perform a one-dimensional convolution operation on $Y$, and the output can be defined as
$$\Gamma *_{T} Y = P \odot \sigma(Q) \tag{13}$$
where $P$ and $Q$ are the inputs of the gates in the GLU, respectively; $\sigma$ denotes the sigmoid activation function; $C_o$ is the number of output channels; and $K_t$ is the size of the convolution kernel. Gated CNN is employed to extract temporal features of the transient process. Compared to recurrent neural networks (RNNs) and their derivatives, Gated CNN offers faster training speeds and is less prone to gradient-related issues.
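The gating mechanism can be illustrated with a small pure-Python sketch of a gated one-dimensional convolution for a single node. It assumes an unpadded ("valid") convolution and a sigmoid gate; the function name, the nested-list layout, and the toy kernel are hypothetical choices for the example, not details taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_temporal_conv(y, kernel):
    """Gated 1-D convolution over time for one node.

    y:      M x C_in nested list (time steps x input channels)
    kernel: K_t x C_in x (2 * C_out) nested list
    Returns an (M - K_t + 1) x C_out nested list (no padding is assumed).
    """
    k_t = len(kernel)
    c_in = len(kernel[0])
    c_out = len(kernel[0][0]) // 2
    out = []
    for t in range(len(y) - k_t + 1):
        # Plain 1-D convolution producing 2 * C_out channels.
        z = [sum(y[t + k][c] * kernel[k][c][o]
                 for k in range(k_t) for c in range(c_in))
             for o in range(2 * c_out)]
        # First half is the linear path P, second half feeds the gate Q.
        p, q = z[:c_out], z[c_out:]
        out.append([p_i * sigmoid(q_i) for p_i, q_i in zip(p, q)])
    return out


# One input channel, kernel size 2, one output channel.
# The P weights sum adjacent samples; the Q weights are zero, so the gate is 0.5.
signal = [[1.0], [2.0], [3.0], [4.0]]
toy_kernel = [[[1.0, 0.0]], [[1.0, 0.0]]]
gated = gated_temporal_conv(signal, toy_kernel)
```

Because every output step depends only on a fixed window of inputs, all time steps can be computed in parallel, unlike the step-by-step recursion of an RNN.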
To accurately and efficiently fuse spatial and temporal features, a spatiotemporal graph convolutional network is designed, as illustrated in Figure 2.
In this network, the power flow data $X$ is the input for the first temporal convolutional layer $\Gamma_0$, while the adjacency matrix $A$ of the power grid’s graph structure is the input for the spatial convolutional layer $g_{\theta}$. First, $X$ is input into $\Gamma_0$, where Gated CNN and GLU are used to capture temporal dependencies, performing convolution operations along the time dimension. The extracted temporal features are then combined with the spatial features, followed by an activation operation to perform graph convolution along the spatial dimension. Subsequently, the temporal and spatial features are further integrated by processing them through the second temporal convolutional layer $\Gamma_1$, ultimately producing the fused spatiotemporal feature $X'$, defined as:
$$X' = \Gamma_1 *_{T} \, \phi\left( g_{\theta} *_{G} (\Gamma_0 *_{T} X) \right) \tag{14}$$
where $*_{T}$ and $*_{G}$ denote the temporal and graph convolution operators, respectively, and $\phi(\cdot)$ is the activation operation.
2.3. GS-STGCN Model
To efficiently and accurately perform TSA, this paper proposes a Graph Simplification-based spatiotemporal graph convolutional network (GS-STGCN) model, as illustrated in
Figure 3. The model includes several key components: a GS module, two STGCN layers with a residual structure, and four fully connected layers followed by a SoftMax layer. The GS module preprocesses electrical measurement data by removing redundant information, enhancing computational efficiency, and allowing the model to focus on relevant features. The STGCN layers extract spatial and temporal features from the graph data, capturing complex patterns in power system behavior over time. The residual structure between the STGCN layers facilitates gradient flow during backpropagation, mitigating the vanishing gradient problem and enabling deeper architectures without sacrificing performance. The four fully connected layers process the extracted features, allowing the model to learn complex interactions. This architecture improves the model’s generalization and prediction accuracy. Finally, the SoftMax layer converts the outputs into probability distributions over stability classes, yielding the final stability assessment result. This structured approach ensures that the GS-STGCN model effectively integrates graph simplification, feature extraction, and classification for reliable TSA.
3. GS-STGCN-Based TSA Model
3.1. Feature Input and Output
To enhance the accuracy of TSA, it is essential to comprehensively consider the spatiotemporal distribution characteristics of transient stability data. Consequently, the input features of the assessment model should be constructed from both temporal and spatial perspectives. For temporal features, variations in voltage magnitudes and phase angles intuitively reflect the system’s response characteristics, making the time series of node voltage magnitudes and phase angles suitable input features. For spatial features, the system’s topological structure is used as an input feature to account for the impact of topological connections on energy transmission paths, and it captures the spatial evolution patterns of transient stability data by considering the dynamic interactions between nodes.
TSA is essentially a binary classification problem, where the labels 0 and 1 correspond to stable and unstable samples, respectively. The criterion for stability is determined based on the Transient Stability Index (TSI):
$$TSI = \frac{360^{\circ} - |\Delta\delta_{\max}|}{360^{\circ} + |\Delta\delta_{\max}|} \times 100\% \tag{15}$$
where $\Delta\delta_{\max}$ is the maximum rotor angle difference between generators, which the TSI uses as its benchmark. A positive TSI ($TSI > 0$) corresponds to a stable system, which is encoded as the one-hot label (1,0), indicating that the system can maintain synchronous operation following a disturbance. Conversely, a non-positive TSI ($TSI \le 0$) is associated with an unstable system, encoded as (0,1), suggesting that the system has failed to sustain synchronization. Fundamentally, the TSI assesses the capacity for synchronous machine operation within an electrical power system by determining whether the rotor angle difference is greater than 360°. If $|\Delta\delta_{\max}|$ is less than this critical threshold, the system has a sufficient stability margin to resist disturbances and allows for the resynchronization of generators. In these cases, the TSI is positive, confirming the system’s stability. In contrast, if the angle difference is equal to or exceeds 360°, it reflects an inability to maintain synchronous operation, resulting in a TSI that is either negative or zero, which is indicative of system instability. Moreover, the TSI is intrinsically linked to the dynamics of power systems, as it quantifies the system’s inertial response following a significant disturbance, such as a fault or a change in load. The TSI value provides a quantitative measure of the system’s resilience, which is essential for comprehending the dynamic behavior of power systems during transient stability evaluations.
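The labeling rule can be sketched as a short helper. The sketch assumes the widely used definition TSI = (360° − |Δδmax|) / (360° + |Δδmax|) × 100%, which matches the 360° threshold described above; the function names are hypothetical.

```python
def transient_stability_index(max_angle_diff_deg):
    """TSI in percent from the maximum rotor angle difference in degrees."""
    d = abs(max_angle_diff_deg)
    return (360.0 - d) / (360.0 + d) * 100.0


def stability_label(max_angle_diff_deg):
    """One-hot label: (1, 0) for stable (TSI > 0), (0, 1) for unstable (TSI <= 0)."""
    if transient_stability_index(max_angle_diff_deg) > 0:
        return (1, 0)
    return (0, 1)


# A 90 degree maximum separation leaves a comfortable margin;
# 360 degrees or more means synchronism has been lost.
margin = transient_stability_index(90.0)
labels = [stability_label(a) for a in (90.0, 360.0, 400.0)]
```

Angles just below 360° still yield a positive but very small TSI, so the index also conveys how close a stable case is to the boundary.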
3.2. The Adaptive Focal Loss Function
In practical power system applications, class imbalance is commonly observed, with the number of stable samples markedly exceeding that of unstable ones. During training, the abundance of stable samples results in more frequent parameter updates for these samples. Consequently, these updates dominate the overall contribution to the loss function, leading to an underfitting of the unstable samples. This imbalance increases the likelihood of the model misclassifying unknown samples as stable.
Simultaneously, the training dataset includes both easily classified samples and difficult samples that are prone to misclassification. A large number of simple samples can dominate the gradient update direction, causing the model’s overall learning process to deviate and leading to ineffective learning. During iterations involving a substantial number of simple samples, the loss function may not decrease toward the desired optimization direction or may fail to reach the optimal solution.
An effective approach is to modify the model’s loss function during training. The focal loss (FL) function, an improvement of the traditional cross-entropy (CE) loss function, is expressed as
$$FL(p) = -\alpha (1 - p)^{\gamma} \ln(p) \tag{16}$$
where $p$ is the predicted probability; $\alpha$ is the class weighting factor used to adjust the balance between positive and negative samples (when $\alpha$ is closer to 1, the contribution of negative samples to the loss function increases); and $\gamma$ is the difficulty weighting factor, where a larger $\gamma$ increases the contribution of difficult samples to the loss function.
Intuitively, the FL function addresses the class imbalance between stable and unstable samples through the class weighting factor $\alpha$, and it tackles the imbalance between easy and difficult samples via the difficulty weighting factor $\gamma$. However, as the model’s recognition ability improves during training, the distinction between easy and difficult samples evolves. Consequently, fixing the value of $\gamma$ may negatively impact the later stages of training. Additionally, treating $\alpha$ as a hyperparameter requires extensive experimentation to optimize the loss function, which can lead to considerable computational time and resource consumption.
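To illustrate how the two factors act, the sketch below implements the standard binary focal loss for a single sample with fixed factors. It follows the common convention in which `alpha` weights the class labeled 1 and `1 - alpha` weights the class labeled 0; the convention used in this paper may differ, and the default values are illustrative only. The adaptive variants of the improved loss are not reproduced here.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one sample.

    p: predicted probability of class 1; y: true label (0 or 1).
    alpha weights class 1 and (1 - alpha) weights class 0 (assumed convention).
    """
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


# With gamma = 0 the loss reduces to class-weighted cross-entropy.
weighted_ce = focal_loss(0.8, 1, alpha=0.5, gamma=0.0)

# With gamma = 2 a confidently correct sample (p = 0.9) is scaled by (1 - 0.9)^2,
# so it contributes far less than a hard sample (p = 0.6).
easy = focal_loss(0.9, 1, alpha=1.0, gamma=2.0)
hard = focal_loss(0.6, 1, alpha=1.0, gamma=2.0)
```

Here the easy sample's loss is roughly one percent of its cross-entropy value, which shows why a fixed large focusing factor can leave very small gradients late in training.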
To address the issues mentioned above, adaptive class weighting factors and dynamically changing difficulty weighting factors are introduced based on the FL function, resulting in the improved focal loss (IFL) function. The improved
and
are defined as follows:
where
and
represent the number of positive and negative samples, respectively;
is the total number of training iterations; and
is the initially set difficulty weighting factor.
3.3. Classification Performance Metrics
The evaluation results of the TSA method are presented in Table 1. Four evaluation metrics, namely accuracy ($ACC$), precision ($PRE$), recall ($REC$), and F1-score ($F1$), are used to comprehensively assess the performance of the different TSA models:
$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$
$$PRE = \frac{TP}{TP + FP}$$
$$REC = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times PRE \times REC}{PRE + REC}$$
where $TP$ is the number of samples predicted by the model to be positive and actually positive; $TN$ is the number of samples predicted to be negative and actually negative; $FP$ is the number of samples predicted by the model to be positive but actually negative; and $FN$ is the number of samples predicted by the model to be negative but actually positive.
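The four metrics follow directly from the confusion-matrix counts, as the short sketch below shows. The function name and the example counts are made up for illustration and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2.0 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for 200 test samples.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
```

Reporting precision and recall alongside accuracy matters for imbalanced data, where a model can reach high accuracy while still missing many samples of the minority class.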
4. Case Study
To verify the effectiveness and applicability of the proposed method, the model is implemented using PaddlePaddle (version 2.4) within a Python (version 3.9) environment and applied to the IEEE 39-bus system for case analysis.
4.1. Database Generation
The Dynamic Simulation Program is used to simulate and generate transient stability data. The load level is increased from 70% to 110% in 10% increments. The fault type is a three-phase short circuit, with faults applied at 20 different locations, from 0% to 95% in 5% increments. Fault durations vary from 2 to 18 cycles. A total of 15,000 samples are generated, including 9500 stable samples and 5500 unstable samples. The simulation data are available at https://github.com/yangyuan123654/yydata (accessed on 12 October 2024).
4.2. Performance Testing and Analysis of GS-STGCN Model Evaluation
To verify the superiority of the GS-STGCN model for TSA, it is compared against the support vector machine (SVM) [32], GCN [21], GAT [22], and LSTM [16] models. The SVM model uses a radial basis function (RBF) kernel, with sample feature data transformed into a one-dimensional format to meet the kernel’s requirements. The LSTM model comprises two LSTM layers, followed by a fully connected layer and a SoftMax output layer. The GCN model consists of two stacked GCN layers, a fully connected layer, and a SoftMax output layer. Similarly, the GAT model consists of two stacked GAT layers, a fully connected layer, and a SoftMax output layer. The first GAT layer utilizes eight attention heads, while the second GAT layer employs two attention heads, with the outputs from these heads averaged. All DL models use the IFL loss function and the Adam optimization algorithm. The pruning parameter for the GS-STGCN model is set to 0.3. The performance results of the five models are presented in Table 1.
The test results indicate that, as a shallow model, the SVM is limited to capturing only surface-level features of the data and struggles to identify deeper, more complex patterns. Consequently, its performance is notably inferior to that of the GCN, GAT, and LSTM models. By effectively leveraging the topological information of the power grid, the GCN and GAT models achieve accuracies of 98.62% and 98.87%, respectively, outperforming the LSTM model. However, the proposed GS-STGCN model surpasses the GCN, GAT, LSTM, and SVM models across all four evaluation metrics. This superior performance can be attributed to the GS-STGCN model’s capability to comprehensively capture the complex topology of the power grid while simultaneously extracting temporal dynamics.
4.3. GS Performance Verification
To validate the effectiveness of the GS, it is applied independently to the STGCN, GCN, and GAT models, with the pruning parameter set to 0.3. Transient stability prediction is then conducted by comparing GS-STGCN with STGCN, GS-GCN with GCN, and GS-GAT with GAT. Each model is iterated for 100 rounds, and the results are analyzed in terms of prediction accuracy and model training time. The comparison of GS effectiveness verification results is shown in
Figure 4.
Models employing the GS exhibit faster training speeds compared to their original counterparts. By reducing the complexity of topology and feature matrix of the power system, GS lowers both computational and storage requirements. Notably, the GS-GCN, GS-GAT, and GS-STGCN models improve their training speeds by 19.16%, 14.12%, and 20.08%, respectively, significantly accelerating the training process for TSA tasks. Additionally, the GS technique not only enhances task efficiency but also improves prediction accuracy. Specifically, the GS-GCN, GS-GAT, and GS-STGCN models increase their prediction accuracy by 0.16%, 0.08%, and 0.14%, respectively. This indicates that the GS technique effectively simplifies the power grid’s topology and feature matrix by removing redundant information. It allows the models to focus on learning and analyzing critical information, thereby improving the precision and accuracy of the predictions.
4.4. Performance Verification of IFL Loss Function
The performance metrics of GS-STGCN on the test set, trained using the CE function, FL function, and IFL function as loss functions, are shown in
Table 2. The accuracy of the model using the IFL function improves by 0.88% and 0.39% compared to the CE function and the FL function, respectively. The reason for this improvement is that the IFL function introduces an adaptive class weight factor compared to the CE function, which helps mitigate the impact of class imbalance between positive and negative samples. Compared to the FL function, the IFL function incorporates dynamic parameters to adjust the model’s focus on difficult samples, leading to more effective utilization of the data during training.
4.5. Validation of the Robustness of the Model to Noise
In practical online applications, the test dataset is derived from real-time phasor measurement unit (PMU) data. Given the inherent variability in PMU measurements of dynamic data, some level of error is inevitable. To assess the robustness of the model, Gaussian white noise, whose instantaneous values follow a Gaussian distribution and whose power spectral density is uniform, is added to the test set to simulate the measurement errors typically encountered in real-world scenarios. The trained models are then evaluated on this noisy test set. The accuracy of the various models under noisy conditions is shown in Figure 5.
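One common way to inject such noise is to scale it to a target signal-to-noise ratio (SNR). The sketch below does this for a single measurement series using only the standard library. The paper does not state the noise level it uses, so the `snr_db` parameter, the function name, and the 20 dB value are assumptions made purely for illustration.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=None):
    """Add zero-mean Gaussian white noise to a series at a target SNR in dB."""
    rng = random.Random(seed)
    # Average signal power sets the reference for the noise power.
    signal_power = sum(v * v for v in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]


# A flat 1.0 p.u. voltage magnitude series with noise at a hypothetical 20 dB SNR.
clean = [1.0] * 20000
noisy = add_white_noise(clean, snr_db=20.0, seed=0)
measured_noise_power = sum((n - c) ** 2 for n, c in zip(noisy, clean)) / len(clean)
```

Fixing the seed makes the perturbed test set reproducible, so different models are compared on exactly the same noisy samples.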
It is observed that, compared to the noise-free scenario, each method’s accuracy is somewhat impacted by the presence of noise. Nevertheless, the proposed GS-STGCN achieves a transient stability prediction accuracy of 99.02%, outperforming the other methods. Furthermore, the accuracy of the GS-STGCN decreases by only 0.49% in the presence of noise, which is a relatively smaller reduction compared to the accuracy losses seen in the other methods. Therefore, the proposed TSA method based on GS-STGCN exhibits strong robustness to noise.
5. Conclusions
In this paper, we develop a TSA method based on a spatiotemporal graph convolutional network with graph simplification. First, the topology and electrical characteristics of the power grid are integrated, and the graph data of the power grid is compressed and input into a deep spatiotemporal graph convolutional network. By combining the spatial features of the power grid with the temporal features of the transient process, a high-performance TSA model is constructed. Second, the improved focal loss function is employed as the loss function of the model. The method proposed in this paper was validated through simulation studies conducted on the IEEE 39-bus system. The experimental conclusions are as follows:
(1) The GS method optimizes the model input by removing low-influence nodes, thereby compressing the large-scale power grid input data. This approach effectively enhances the training speed of the TSA model.
(2) The STGCN model can fully capture the spatiotemporal features of transient processes, effectively reflecting the changing patterns of the system’s transient behavior and ensuring the accuracy of the model’s evaluation.
(3) The IFL function can adaptively adjust the weight factors of the loss function, allowing the model to handle sample imbalance during training. This reduces the misclassification of unstable samples while also enhancing overall accuracy.
Currently, the TSA model proposed in this paper primarily targets offline and online analysis scenarios based on simulation analysis. It effectively shortens model training time and improves evaluation accuracy. This advancement is particularly beneficial for utility companies and power system operators, as it allows them to conduct more rapid and reliable stability assessments without significant computational overhead. In future research, greater emphasis will be placed on integrating the mechanistic knowledge of power systems into the model to further enhance its interpretability and convergence. This integration will not only improve the model’s predictive capabilities but also provide deeper insights into the underlying dynamics of power systems, which can be crucial for operational decision making and strategic planning.