1. Introduction
In the field of power system analysis and optimization, building precise and comprehensive models is essential for predicting and maintaining power network balance, which directly impacts the grid’s reliability and efficiency [1]. The rapid advancement of grid measurement systems and big data technologies poses significant challenges to grid security systems in terms of modeling, simulation, and fault prediction [2]. The increasing uncertainty in renewable energy use presents unprecedented challenges to the safe and cost-effective operation of power systems. Consequently, in-depth analysis and utilization of grid operational data have become essential to ensuring the safe and stable operation of power grids [3,4].
In this context, timely and accurate identification of grid topology and its changes using real measurement data has become crucial for data-driven grid operation and control. Network topology analysis is fundamental for enabling advanced power system management applications. Any errors in the topology can compromise data analysis in these applications and increase the likelihood of security analysis failure. Many identification methods, however, have low error tolerance, require extensive redundant information, and rely heavily on complete active power data. When data are missing or contain significant errors, the identified optimal data combinations are directly affected [5]. Such algorithms also need a large amount of redundant data, and because phasor measurement units (PMUs) are not yet widely deployed, obtaining the additional data is costly [6]. All of this adds to the algorithm’s complexity and implementation costs.
Traditional topological error identification methods include rule-based methods [7], residual methods [8], state estimation methods [9], and Artificial Neural Networks (ANN) [10,11], among others. The rule-based method is simple and fast to implement. It requires minimal computational and human resources when integrated into existing systems, and is reliable under various conditions. However, it is less effective for identifying plant-station topological errors [7]. Residual methods are straightforward after state estimation, but when both analog bad data and topological errors are present, the residuals will exceed the threshold, making it difficult to distinguish between analog and topological errors [8]. Traditional state estimation methods establish a relationship between measurement residuals and topological branches using the measurement branch association matrix, applying covariance principles to identify topological errors [9]. In addition to traditional state estimation, Generalized State Estimation (GSE) is widely used for topological error identification [12]. This method integrates network topology and system parameters, and the algorithm converges efficiently. However, GSE faces challenges with the large number of analyzed nodes, high dimensionality of the iteration matrix, and slow computational speeds. Recently, Wu et al. treated the measurement data as a non-smooth Gaussian process, detecting errors by testing whether the mean vector of the process is zero, independent of the convergence of the standard weighted least squares (WLS) state estimation algorithm [13]. This method addresses the issue of non-convergence in traditional state estimation algorithms and is highly effective in detecting topological errors. These methods do not require extensive historical data for modeling, and have transparent, easily understandable, and verifiable algorithmic steps, providing new solutions and advances in grid topological error identification. However, traditional methods lack flexibility in handling complex and dynamic grid structures and are inefficient in processing large-scale datasets [14].
In recent years, deep learning methods have been extensively applied to power grid topology identification, with commonly used models including convolutional neural networks (CNN), deep neural networks (DNN), and graph convolutional networks (GCN). Wang et al. employed artificial neural networks and power grid balance equations for the identification of grid topological errors [15]. By treating the measured residual state estimation as the optimization objective and the identification outcomes of the state estimation as the input, the neural networks were repeatedly trained to identify switching state errors and electrical connection errors more quickly. Tian et al. employed CNN for smart grid topology identification, which aims to address the issues of frequently changing power grid topology and challenges in learning deep features of data [16]. Additionally, Zhong et al. formed a CNN-LSTM hybrid neural network by combining the spatial feature extraction capability of CNN and the temporal feature extraction capability of LSTM [17]. This allowed them to finally classify the samples to obtain the topology relationship, thus improving the accuracy of topology identification. Through the introduction of deep learning and neural network approaches, these studies significantly advance the development of smart grid topology identification technology [18].
However, deep learning methods are less frequently applied in topological error identification. Traditional artificial neural networks, although generally effective in topological error identification, require extensive training samples, limiting their practical application in real systems. As deep learning models become widely adopted, traditional artificial neural networks struggle to meet the requirements for topological error identification in modern large-scale power grids. CNN, DNN, and similar networks typically handle data with Euclidean characteristics and are restricted to spatial and local feature extraction. This limitation makes it challenging to process non-Euclidean data and capture relationships between nodes, leading to insufficient exploration of the network structure and node connections. In contrast, graph convolutional networks (GCNs) are well-suited for processing non-Euclidean data, enabling the identification of dynamic topological errors in power grids through graph convolution operations [19].
GCNs offer a broad range of applications in grid engineering and power systems, including grid topology identification, state estimation, and fault diagnostics [20,21]. For instance, Zhang et al. introduced GCNs into fault diagnosis to address the limitations of traditional machine learning methods [22]. They employed GCNs for node information aggregation and wavelet transforms to analyze time-domain fault data, achieving higher accuracy and robustness than CNN and DCN. In fault recognition, Li et al. proposed a novel approach to address the challenges of low accuracy and poor real-time performance in existing methods [23]. They modeled grid topology as line graphs and trained GCNs using node features. Therefore, based on the insights from these studies, applying GCNs to topological error identification is feasible.
Knowledge graphs are increasingly applied across various industries due to their ability to clearly and flexibly represent relationships between concepts, entities, and their attributes. For instance, Feng et al. utilized the knowledge graph’s rapid analysis capabilities and robust semantic processing to develop a fault operation and maintenance knowledge graph (FOM-KG) within the power information collection system [24]. This significantly improved the efficiency of operation and maintenance processes. Many researchers combine knowledge graphs with GCNs, as the graphical nature of knowledge graphs aligns with GCNs’ data processing requirements. To address the limitations of using a single knowledge graph approach, Wang et al. applied both knowledge graphs and GCNs for topology identification, using GCNs to analyze relationships between elements identified through the knowledge graph [25]. Inspired by this study, we constructed a knowledge graph for grid data, which was subsequently used with GCNs to identify topological errors.
The topology of a power system represents the relationships between nodes in graphical form, making the knowledge graph an ideal tool for representing this topological information. However, topological errors can arise in the network due to events like line disconnections or data loss. GCNs, designed to analyze graph data and explore relationships between entities in a knowledge graph, can effectively learn the connections between nodes and edges to detect potential topological errors. Thus, the characteristics of knowledge graphs and GCNs make them well-suited for identifying topological errors.
This paper further investigates the combined application of the Knowledge Graph and Graph Convolutional Network (GCN), building on the work of previous researchers. The main contributions of this paper are as follows:
We propose a knowledge graph construction method for power grids that transforms nodes (e.g., generators, transformers, lines) and their relationships into a knowledge graph, offering rich contextual information for topology analysis.
To address the limitations in information aggregation with traditional GCNs, we propose a new GCN model incorporating mechanisms like multilayer graph convolution, neighborhood learning, and splicing operations. This model enhances the ability to capture both node and neighborhood features while preserving node-specific information, thus improving model flexibility.
By combining the knowledge graph with the improved GCN, we input the processed knowledge graph data into the GCN for training. The GCN can identify potential topological errors, such as line disconnections or data loss, by learning node and edge features. Test results demonstrate that this method enhances identification accuracy and robustness compared to traditional GCN–knowledge graph combinations, offering an effective solution for intelligent topological error identification in larger-scale power grid systems.
3. Methodology of the Proposed Method
3.1. Knowledge Graph Construction
The knowledge graph construction process is illustrated in Figure 6.
Semi-structured data is used to construct the knowledge graph, with information extracted using specific wrappers for formats like HTML and XML.
The grid steady-state model data from the State Grid Cloud is primarily provided as Excel sheets containing semi-structured data. Python is used to extract the necessary data from the Excel sheets, including the start point, end point, first-end active power, and first-end reactive power of each branch, as shown in Figure 7. The data are then organized into entity-attribute-value triples, forming an array with elements such as (branch, power status at the endpoint, corresponding attribute value) and (branch, start point, node serial number). Finally, the triple form of the knowledge graph is saved in a JSON file for later error analysis and optimization.
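The extraction step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the in-memory `rows` list stands in for the Excel sheet, and the field names (`branch`, `start`, `end`, `p_first`, `q_first`) and attribute labels are hypothetical.

```python
import json

# Hypothetical records standing in for rows read from the Excel sheet.
# Field names are assumptions made for this sketch.
rows = [
    {"branch": "L1", "start": 1, "end": 2, "p_first": 12.5, "q_first": 3.1},
    {"branch": "L2", "start": 2, "end": 3, "p_first": 8.0, "q_first": 1.4},
]


def row_to_triples(row):
    """Turn one branch record into [entity, attribute, value] triples."""
    branch = row["branch"]
    return [
        [branch, "start point", row["start"]],
        [branch, "end point", row["end"]],
        [branch, "first-end active power", row["p_first"]],
        [branch, "first-end reactive power", row["q_first"]],
    ]


# Flatten the per-row triples into one array.
triples = [triple for row in rows for triple in row_to_triples(row)]

# Serialize to a JSON string; writing it to a file is one extra step.
triples_json = json.dumps(triples, indent=2)
```

Storing each triple as a three-element list keeps the JSON flat and easy to reload for later error analysis.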
After information extraction, the data enters the knowledge fusion phase, which includes entity disambiguation and co-reference resolution. These processes aim to eliminate the ambiguity of entity references and improve the accuracy of entity associations and referential relationships, ensuring more precise semantic interpretations for knowledge mapping tasks.
Following knowledge integration, a series of knowledge processing steps, including ontology construction, are performed. Ontology construction entails designing and building the ontological framework of the knowledge graph. In this context, an ontology refers to a formal expression for describing the relationships between concepts and entities in a particular domain, which defines terms, classifications, attributes, and relationships within the domain and specifies the hierarchies and constraints between them.
Data Intelligence Diagnostics play a crucial role in the construction and maintenance of the Knowledge Graph. This task is a comprehensive and automated analysis of the data in the knowledge graph, with the aim of revealing data quality issues, potential errors, and inconsistencies, and providing appropriate adjustments.
This study introduces a visualization function, incorporating an intelligent search and recommendation engine, aimed at improving the operability and user experience of the knowledge graph. The knowledge graph of power grid data is constructed and visualized through the following steps, utilizing the theory of finite automata and the graph database Neo4j: First, the power grid data are extracted from the Excel sheet. Next, finite automata are employed to extract keywords from the data, which are then organized into triples and imported into the Neo4j database for visualization. The visualized knowledge graph provides detailed node information, including node number, reactive load estimate (WLS voltage magnitude), node name, reference voltage, and node type.
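As a rough illustration of the import step, the sketch below turns triples into parameterized Cypher `MERGE` statements that a Neo4j client could execute. The node label `Entity`, the relationship type `RELATION`, the property names, and the helper `triple_to_cypher` are all assumptions made for this example; the paper does not specify its graph schema, and the database connection itself is not shown.

```python
def triple_to_cypher(head, relation, tail):
    """Return a parameterized Cypher statement and its parameters for one triple.

    The label `Entity` and relationship type `RELATION` are hypothetical.
    """
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        "MERGE (h)-[:RELATION {type: $relation}]->(t)"
    )
    params = {"head": head, "relation": relation, "tail": tail}
    return query, params


# Hypothetical triples; values are stored as strings for simplicity.
example_triples = [
    ("L1", "start point", "1"),
    ("L1", "end point", "2"),
]

statements = [triple_to_cypher(h, r, t) for h, r, t in example_triples]
```

Using `MERGE` rather than `CREATE` means that re-importing the same triple does not duplicate nodes or relationships.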
3.2. GCN Structure
The GCN structure of this paper is shown in Figure 8.
After processing the knowledge graph data, GCN processes the graph structure data by jointly learning node features and the adjacency matrix. This method enhances information extraction, and increases the model’s efficiency in learning deeper graph structure features.
The GCN accepts two main inputs: the node feature matrix and the adjacency matrix. The node feature matrix captures the features of each node, while the adjacency matrix defines the connectivity among the nodes [38,39]. In a GCN, the update of each node depends on its own features and on those of neighboring nodes, weighted by the adjacency matrix. The result of neighborhood aggregation can thus be represented as:
$$X_{\mathrm{agg}} = A X$$
where $A$ represents the adjacency matrix, $X$ denotes the feature matrix, and $X_{\mathrm{agg}}$ is the result of neighborhood aggregation.
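A small worked example may help make the aggregation concrete. The sketch assumes that aggregation is the plain product of the adjacency matrix and the feature matrix; the original formula may additionally include self-loops or degree normalization. The three-node path graph and its feature values are made up for illustration.

```python
def aggregate(adj, feats):
    """Sum each node's neighbor features, weighted by the adjacency matrix."""
    n_nodes = len(adj)
    n_feats = len(feats[0])
    return [
        [sum(adj[i][k] * feats[k][j] for k in range(n_nodes)) for j in range(n_feats)]
        for i in range(n_nodes)
    ]


# Hypothetical 3-node path graph: 0 - 1 - 2 (no self-loops).
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
# Two made-up features per node.
X = [
    [1.0, 0.0],
    [0.0, 2.0],
    [3.0, 1.0],
]

X_agg = aggregate(A, X)
# Node 1 receives the sum of the features of nodes 0 and 2: [4.0, 1.0]
```

Because this adjacency matrix has no self-loops, a node's own features do not appear in its aggregated row, which is one reason to keep the original features alongside the aggregated ones.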
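To mitigate information loss during the convolution process, this paper introduces concatenation and employs a three-layer graph convolution. The graph convolution process for each layer is as follows:
$$X_{\mathrm{out}}^{(i)} = \mathrm{Concat}\left(X_{\mathrm{in}}^{(i)},\, X_{\mathrm{agg}}^{(i)}\right)$$
where $X_{\mathrm{out}}^{(i)}$, $X_{\mathrm{in}}^{(i)}$, and $X_{\mathrm{agg}}^{(i)}$ represent the output, input, and neighborhood aggregation result of the i-th layer, respectively.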
During graph convolution, the graph convolution layer linearly transforms node features into a new feature space, where the adjacency matrix aggregates the transformed features from neighboring nodes to achieve feature fusion. Compared to the traditional GCN, the model in this paper retains feature splicing before and after aggregation, which better preserves the original node information and addresses the issue of information loss during convolution. Moreover, the additional linear and BN layers improve the model’s learning ability and enhance its flexibility and adaptability in topological error identification.
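The splicing idea can be illustrated with a single simplified layer: transform the features linearly, aggregate them over neighbors, and concatenate the pre- and post-aggregation features for each node. This is only a sketch of that idea under stated assumptions. The helper names, the identity weight matrix, and the two-node graph are hypothetical, and the additional linear and BN layers of the actual model are omitted.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def concat_gcn_layer(adj, feats, weight):
    """One simplified layer: linear transform, aggregation, then splicing."""
    transformed = matmul(feats, weight)    # features in the new feature space
    aggregated = matmul(adj, transformed)  # neighbor information
    # Splice (concatenate) each node's own and aggregated features.
    return [own + agg for own, agg in zip(transformed, aggregated)]


# Hypothetical two-node graph with a single edge between the nodes.
A = [[0, 1], [1, 0]]
X = [[1.0, 2.0], [3.0, 4.0]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the example readable

out = concat_gcn_layer(A, X, W)
```

Each output row holds the node's own transformed features followed by the aggregated neighbor features, so the feature width doubles per layer unless a following linear layer reduces it.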
3.3. Loss Function
The loss function used in the network training process is the cross-entropy loss function [40,41], which is formulated as follows:
$$L = -\sum_{i=1}^{N} y_i \log\left(\hat{y}_i\right)$$
where $L$ is the total loss of all nodes, $N$ is the number of nodes, $y_i$ is the true value of the i-th node, and $\hat{y}_i$ is the observed value, i.e., the network output, of the i-th node. Finally, in the output section, the feature matrix output by the network is used to represent the classification result corresponding to each node, i.e., the final topological error identification result.
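The loss can be checked numerically with a short sketch. It assumes one-hot true labels and per-node predicted class probabilities, summed over nodes as the "total loss" wording suggests; the two-class setup, the `eps` clamp, and the sample values are assumptions made for illustration.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Total cross-entropy over all nodes (summed, not averaged)."""
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            # Clamp p away from zero so that log() stays finite.
            total -= t * math.log(max(p, eps))
    return total


# Hypothetical two-node, two-class example (normal vs. topological error).
y_true = [[1, 0], [0, 1]]
y_pred = [[0.5, 0.5], [0.25, 0.75]]

loss = cross_entropy(y_true, y_pred)
```

With one-hot labels, only the probability assigned to the true class of each node contributes to the sum, so here the loss reduces to -(log 0.5 + log 0.75).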