Next Article in Journal
The Construction of a Digital Agricultural GIS Application Suite
Previous Article in Journal
Study on the Hydrodynamic Effects of Bridge Piers Under Velocity-Type Pulse Ground Motion Based on Different Characteristic Periods
Previous Article in Special Issue
A Method for Measuring Spatial Information of Area Maps Considering the Diversity of Node–Edge and Gestalt Principles
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Dual Clustering-Based Method for Geospatial Knowledge Graph Partitioning

1
School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
2
Guizhou Survey & Design Research Institute for Water Resources and Hydropower, Guiyang 550002, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(22), 10704; https://doi.org/10.3390/app142210704
Submission received: 11 October 2024 / Revised: 8 November 2024 / Accepted: 12 November 2024 / Published: 19 November 2024

Abstract

:
Geospatial knowledge graphs provide critical technology for integrating geographic information and semantic knowledge, which are very useful for geographic data analysis. As the scale of geospatial knowledge graphs continues to grow, the distributed management of geospatial knowledge graphs is becoming an inevitable requirement. Geospatial knowledge graph partitioning is the core technology for the distributed management of geospatial knowledge graphs. To support geographic data analysis, spatial relationships between entities should be considered in the application of geospatial knowledge graphs. However, existing knowledge graph partitioning methods overlook the spatial relationships between entities, resulting in the low efficiency of spatial queries. To address this issue, this study proposes a geospatial knowledge graph partitioning method based on dual clustering which performs two different clustering methods step by step. First, the density peak clustering method (DPC) is used to cluster geographic nodes. The nodes within each cluster are merged into a super-node. Then, we use an efficient graph clustering method (i.e., Leiden) to identify the community structure of the graph. Nodes belonging to the same community are further merged to reduce the size of the graph. Finally, partitioning operations are performed on the compressed graph based on the idea of the Linear-Weighted Deterministic Greedy Policy (LDG). We construct a geospatial knowledge graph based on YAGO3 to evaluate the performance of the proposed graph partitioning method. The experimental results show that the proposed method outperforms ten comparison methods in terms of graph partitioning quality and spatial query efficiency.

1. Introduction

Geospatial knowledge graphs adopt structured graphs to effectively organize geographic knowledge by describing geographic concepts, geographic entities, and their inter-relationships [1]. The knowledge system represented by geospatial knowledge graphs is the key to transitioning from geographic data services to geographic knowledge services. Geospatial knowledge graphs are very helpful for geographic knowledge searches, intelligent geographic knowledge Q&As, geographic knowledge recommendations, and geographic data analysis [2]. In the era of big data, the scale of geospatial knowledge graphs is growing exponentially. For example, the LinkedGeodata dataset [3], first released in 2009, has more than 3 billion entities and 20 billion triples in its latest version. The GeoWordNet dataset [4], released in 2011, has over 3.6 million entities and 50 million triples in its latest version. The WorldKG dataset [2] released in 2021, contains more than 100 million entities and 800 million triples in its latest version. For large-scale geospatial knowledge graphs, standalone management methods can no longer meet current demands [5]. Distributed management is becoming an inevitable requirement. The partitioning of geospatial knowledge graphs is an essential task in large-scale knowledge graph management, which partitions a data graph into several subgraphs which are stored on different distributed cluster nodes [6]. Geospatial knowledge graph partitioning provides the foundational support for the distributed storage, management, querying, and reasoning of geospatial knowledge graphs. When executing a distributed query, the query is first decomposed into several subqueries based on the partitioning result. Then, the subqueries are assigned to various cluster nodes to obtain partial solutions. Finally, the results of the subqueries are connected to produce the final solution. The efficiency of query is highly relevant to the partitioning results. The more partitions passed during the query, the slower the query is.
Currently, knowledge graph partitioning is generally regarded as a graph partitioning problem. The optimization goals of node/edge partitioning strategies are (i) minimizing the number of cross-partition edges (cut edges) or nodes that have been duplicated in different partitions (replicated node) and (ii) ensuring balanced node or edge counts within each partition [7]. Minimizing the number of cut edges or replicated vertices aims to reduce network communication costs between cluster nodes in a distributed environment. Meanwhile, balancing the number of nodes or edges within each partition prevents the formation of computational hotspots and avoids excessive differences in processing capabilities. The graph partitioning problem is a well-known NP-hard problem [8,9], and no polynomial-time algorithm has been found to solve it thus far. Therefore, existing work has focused on finding approximate partitioning solutions rather than optimal ones [10,11]. Currently, according to different graph loading methods, existing graph partitioning algorithms can be categorized into streaming partitioning algorithms and offline partitioning algorithms.
Streaming graph partitioning algorithms load edges or nodes into a data stream according to certain rules (such as random-order, depth-first, or breadth-first traversal) and partition only one node or one edge at a time [12]. The hash partitioning method is the simplest streaming graph partitioning technique. It calculates a hash value based on the reading order or globally unique ID of vertices or edges and assigns vertices or edges with equal hash values to the same partition. Hash-based graph partitioning is nearly a random partitioning method, which leads to significant communication costs [13]. Xie et al. (2014) proposed the DBH (Degree-Based Hashing) algorithm, which improves upon the hash algorithm by prioritizing the partitioning of higher-degree vertices to enhance partitioning quality [14]. Degree-Based Hashing (DBH) is a hashing method that takes edges as input. For the nodes of an edge, it leverages the degree information of two nodes. Then, DBH replicates the node with a higher degree and assigns the edge according to the hash value of the node with a lower degree. However, the partitioning algorithms mentioned above do not consider historical partitioning information and have a certain level of randomness, resulting in a large number of cut edges or replicated vertices. This disadvantage leads to substantial communication costs. To solve this problem, many scholars have taken historical partitioning information into account. These algorithms establish a scoring function between the current data and each previously partitioned subgraph. When a new edge or vertex is loaded into the data stream, the partition is assigned based on the scoring function (e.g., Greedy algorithm [15], FENNEL (streaming edge-cut neural network for efficient graph partitioning) algorithm [16], HDRF (High-Degree Replicated First) algorithm [17] and Skewness-aware Vertex-cut Partitioner [18]). For instance, HDRF is an improved method of DBH. Instead of using the degrees of the nodes, HDRF constructs a score function based on the number of subgraph neighbors in the historical partition and computational load. The input edge will be assigned to the partition with the highest score. However, the above traditional streaming partitioning algorithms neglect the global structure of the graph and may fall into local optima. Researchers have introduced sliding window or buffer techniques to alleviate these limitations. These methods first load the data into a window or buffer to increase the number of candidate nodes/edges in one assignment, and then choose the best node/edge from the candidates for assignment to a partition [19,20,21,22,23]. Streaming graph partitioning algorithms are usually efficient, typically requiring only a single pass over the graph data to complete the partitioning. However, it is hard to capture the global structures of graphs with streaming partitioning algorithms, which usually lead to low-quality partitions [21].
Offline partitioning algorithms load the complete graph data into memory before partitioning. These algorithms can be further categorized into global and local search partitioning algorithms. Global search partitioning algorithms directly consider the global information of the entire graph to produce a partition. For instance, spectral partitioning methods calculate the eigenvector corresponding to the second-smallest eigenvalue of the graph’s Laplacian matrix and recursively bisect the graph until the partitioning goal is achieved [24]. The time complexity of spectral partitioning is too high to partition large-scale graphs. Multilevel partitioning algorithms [25,26,27,28,29,30,31] were further proposed to address this problem. Multilevel partitioning algorithms first coarsen a large graph into a smaller graph. Then, they apply a partitioning strategy (such as spectral partitioning [32], KL algorithm [33], or Greedy Graph Growth algorithm [34]) to the small graph. Finally, the partition is projected back onto the original large graph with further refinement. Local search partitioning algorithms optimize the initial partitioning based on local graph information. These algorithms can be classified into three categories: unit or group movement, label propagation partitioning algorithms, and neighborhood expansion-based partitioning algorithms. Algorithms based on unit or group movement aim to reduce edge cuts by moving units or groups between partitions to optimize the initial partitioning (e.g., KL algorithm [26], FM algorithm [35], and JA-BE-JA-VC algorithm [36]). However, the act of exchanging individual vertices or edges between partitions is prone to getting stuck in local optima [37,38,39,40,41]. To solve this problem, vertex group movement methods have been further developed [37,38]. These methods first detect vertex groups at partition boundaries, and then move these vertex groups based on movement gain and partition load to improve partition quality [40,41]. Label propagation-based partitioning algorithms randomly assign each node a label, and then iteratively update the labels under the constraint of balancing computational load between partitions [42,43]. The label of a node is updated to the most frequent label among its adjacent vertices. This update process continues until there are no label changes and loads of partitions are balanced [44,45,46]. Neighborhood expansion-based partitioning algorithms [27,47,48] first select seed points to form a core set, and then iteratively select appropriate vertices from the boundary set to join the core set. The boundary set is iteratively updated until the threshold of computational load is reached.
Based on the above reviews, one can find that the aforementioned methods only rely on the graph structure for partitioning and neglect the spatial relationship between entities. In the application of geospatial knowledge graphs (e.g., geographic knowledge recommendation and geographic data analysis), a spatial query is frequently required [49,50]. In Figure 1, we use a practical example to show the disadvantage of existing non-spatial graph partitioning methods. Different partitions are labeled by the color of nodes (orange and purple). Spatial queries are represented by orange and blue shadows. When a spatial relationship is neglected (Figure 1a), the two spatial queries need to run across partitions. The network communication between partitions will result in the increase in query response time. In contrast, when a spatial relationship is considered to produce partitions (Figure 1b), the two spatial queries can run in a single partition. To improve the efficiency of a spatial query in large-scale geospatial knowledge graphs, spatial relationships between entities must be considered in the partitioning of geospatial knowledge graphs [50,51]. Existing partitioning methods that disregard spatial relationships may lead to spatially adjacent entities being assigned to different partitions, negatively affecting spatial query performance. To address this issue, we treat the partitioning of geospatial knowledge graphs as a special spatial clustering problem and develop a dual clustering-based method for geospatial knowledge graph partitioning (GKGP-DC). GKGP-DC incorporates the constraints of spatial relationships between entity nodes into the optimization objective of graph partitioning.

2. Methods

GKGP-DC first identifies the geographic entities from a geospatial knowledge graph, and then extracts coordinates from these entities. Then, a spatial clustering method named density peak clustering is applied to obtained spatial clusters. Next, nodes within each spatial cluster are merged into super-nodes to ensure that spatially adjacent nodes can be assigned to the same partition. Subsequently, the graph clustering method named Leiden is used, further consolidating the aggregated graph into a weighted graph. The weighted graph not only accounts for geospatial correlation but also reflects the connectivity between entities in the graph. Finally, the partitions are produced by performing LDG on the weighted graph to balance the computational load of each partition. The overall process of the proposed method is shown in Figure 2.

2.1. Geographic Node Partitioning Based on Density Peak Clustering

To improve the efficiency of a spatial query, it is crucial to assign spatially adjacent nodes to the same partition. Therefore, GKGP-DC uses spatial clustering to pre-analyze the spatial structure of data. To find the natural spatial clusters from geographic nodes, we use a well-established spatial clustering algorithm, i.e., density peak clustering (DPC) [52], to cluster geographic nodes. DPC can naturally identify the clustering structure and automatically determine the number of clusters. The DPC algorithm is based on two fundamental observations: (i) the local density of the cluster center is higher than that of the surrounding samples; (ii) the distance between cluster centers is relatively large. DPC determines the cluster centers by identifying the density peaks in the dataset, and then defines the cluster boundaries based on density and relative distance. Inspired by DPC-KNN [53] and an adaptive cluster center selection method proposed by Sun et al. [54], we partition the geographical nodes in the following two steps:

2.1.1. Definition of Local Density and Relative Distance

As mentioned above, local density and relative distance are the basis of choosing cluster centers and determining the cluster boundaries. In this study, we used the definition of local density proposed in DPC-KNN [53]. The local density of a node is defined as the exponent of the mean square distance between the node itself and its k-nearest neighbors, as follows:
ρ i = exp 1 k j K N N i d i , j 2
where ρ i represents the local density of node i , k denotes the number of nearest neighbors, K N N i represents the set of the k nearest nodes to node i , and d i , j represents the Euclidean distance between node i and node j . Calculating the local density based on the k-nearest neighbors can assess the local structure of nodes and is helpful for identifying the clusters with different densities [53]. The relative distance is defined as the minimum distance between a node and the set of nodes with higher local density. For the node with the highest local density, the relative distance is defined as the maximum possible distance between two nodes (Equation (2)).
δ i = m i n j : ρ i > ρ j d i , j   ρ i < max ρ m a x j i d i , j           ρ i = max ρ
where  δ i represents the relative distance of node i , and max ρ represents the maximum local density.

2.1.2. Cluster Center Selection and Node Agglomeration

To determine the cluster centers, the DPC algorithm typically involves manually selecting the cluster centers according to a decision diagram, as shown in Figure 3. However, this manual selection reduces the efficiency of DPC and introduces subjectivity into the results. Therefore, the strategy proposed by Sun et al. [54] is adopted for determining the cluster centers. First, nodes where both the local density ρ and the relative distance δ are greater than the remaining 80% are selected as candidate nodes. Then, the candidate nodes are ranked in descending order based on the product value of δ and ρ ( γ = δ × ρ ). A linear fit line is drawn according to the γ values, and the fitted value γ is calculated. The difference between γ and γ is obtained for each node ( Δ γ = γ γ ). The nodes with Δ γ greater than the average Δ γ are treated as candidates. Finally, cluster centers are consecutively selected from the front to the back of the ranked candidate sequence.
After the centers of clusters are determined, non-cluster center nodes are directly assigned to the cluster of the nearest node with a higher density than themselves. To ensure the spatial correlation of geographic nodes, the nodes within each cluster are merged into a super-node. The weight of the super-node is set as the sum of the weights of the nodes contained within the cluster. The edges between a cluster are merged into a single edge, with the edge weight set as the sum of the weights of the edges. For instance, nodes A, B, C, and D are merged into the orange super-node, and node I, J, and K are merged into the purple super-node (Figure 4). The weights of the orange super-node and purple super-node are 4 and 3, respectively. For example, the weight of the edge between the orange super-node and node G is set as the sum of the weights of BG and DG.

2.2. Discovery of Graph Structure Based on Graph Clustering

To avoid tightly connected nodes being assigned to different partitions, we further used a graph clustering method to identify the graph structure from the coarse graph obtained by Section 2.1, which can find the closely connected groups of nodes. Based on the graph structure, the coarse graph can be further comprised. Constructing partitions according to closely connected groups will be helpful for minimizing the number of cut edges or duplicated nodes. In this study, we use the Leiden algorithm [55] on the coarse graph to comprise the clustering result obtained by DPC. The Leiden algorithm was introduced by Traag et al. [55] as an improvement over the Louvain algorithm [56]. The Leiden algorithm is one of the most advanced community detection algorithms in recent years and has been shown to achieve state-of-the-art performance on many graphs [57].
The Leiden algorithm requires a pre-defined function to evaluate the quality of communities. GKGP-DC adopts modularity as the quality function for evaluating communities. Given a weighted graph G and a partition C = { C 1 C 2 , C n } , the modularity is calculated as follows:
Q = 1 2 m i , j w i j K i K j 2 m δ C i , C j
Based on the definition of quality function of communities and the graph aggregated from the DPC results in Section 2.1, the Leiden algorithm is applied to detect community structures with tight internal connections and loose external connections. The graph is further aggregated based on these community structures. The detailed steps of the algorithm are as follows:
(i) Local Movement of Nodes. Initially, each node is considered as a separate community (Figure 5a). Then, each node is moved to the community that yields the largest modularity gain (Figure 5b). In this phase, the Leiden algorithm uses a fast local movement strategy, only accessing nodes whose neighbors have moved to other partitions. This strategy enables the algorithm to perform local moves quickly and efficiently. The modularity gain from moving node i from community C j to community C s (where j ≠ k) can be calculated as follows:
Q i = 1 2 m K i , s K i , j + K i m j , t o t s , t o t K i
where K i , s and K i , j represent the total weight of edges between node i and nodes in communities C s and C j , respectively. j , t o t and s , t o t is the total weight of edges in C s and C j , respectively. m denotes the total weight of all edges.
(ii) Refinement of Partitions. After local movement, each node within a partition is treated as a separate community, and community merging is performed within each partition (Figure 5c). This merging does not necessarily maximize modularity gain but allows the random selection of a community that increases modularity. The larger the modularity increase, the higher the likelihood of selecting that community for merging. The randomness in selecting communities allows a broader exploration of the partitioning space and also accelerates the algorithm.
(iii) Graph Aggregation Based on Refined Partitions. The graph is aggregated based on the refined partitioning results obtained by step (ii). Nodes in each partition are merged into a single node, and the corresponding edges are merged into a single edge (Figure 5d). The node weights are updated as the sum of the weights of the nodes in the partition, and the edge weights are set to the sum of the weights of the parallel edges. Then, step (i) and step (ii) are repeated on the aggregated graph (Figure 5e). The iteration will not stop until modularity can no longer be improved, and the graph will be the final result (Figure 5f).

2.3. Super-Node Partitioning Based on the LDG Algorithm

The geospatial knowledge graph is compressed into a smaller-scale weighted graph after the process of Section 2.1 and Section 2.2. To achieve the optimization goal of load balancing in graph partitioning, the LDG algorithm [12] is applied to partition the weighted graph (Figure 6). The main idea is to construct a scoring function based on the feature of vertices, their neighboring nodes, and the existing partition. Then, the nodes are assigned to the partition with the highest score under the load balancing constraint. The scoring function for a node is defined from two aspects: one is the sum of the edge weights between the node and its neighbors already assigned to the partition, and the other is the capacity constraint term. The calculation formulas are as follows:
g v , P i = u N v , u P i w v , u × ( 1 P i C )
C = β × x V w ( x ) k
where N v represents the set of neighbors of node v , and w v , u represents the weight between nodes v and u . P i represents the partition, and C represents the capacity constraint for the partition. β ( β > 1 ) is a user-defined parameter. V represents the total set of nodes in the graph, and k represents the number of partitions.
Since the smaller-scale weighted graph may contain some nodes with larger weights (Figure 6a), the optimization goal of load balancing needs extra attention. In our method, the k nodes with the largest weights are first selected and assigned to different partitions. Then, the remaining nodes are loaded based on a breadth-first search strategy and assigned based on the aforementioned scoring function. The specific implementation process (Figure 6) is shown as follows:
(i) Pre-allocation. As shown in the lower part of Figure 6b, nodes with the largest weight are first selected for each partition to prevent significant load imbalances in later partitioning stages.
(ii) Node Assignment Order. A node is randomly selected from the remaining nodes, and the node partitioning order is determined using the breadth-first search strategy.
(iii) Data Allocation. One node is loaded each time. If the sum of the weight of the current node and the total weight of the current partition does not exceed the capacity constraint C defined by Equation (6), the partition is considered a candidate partition. Then, the scoring function described by Equation (5) is used to calculate a score for the node and each candidate partition. The node is assigned to the partition with the highest score (Figure 6b). If multiple partitions have the same maximum score, the partition with the smallest weight is selected. The final result of partitioning is obtained after all nodes are assigned to corresponding partitions (Figure 6c).

3. Experiments

GKGP-DC was compared with ten representative graph partitioning algorithms: Hash, LDG, FENNEL, WStream, HeiStream, IHCGP, METIS, ParMetis, KAFFPA, and XtraPuLP. The experiments were executed on an Ubuntu cluster which was set up using VMWare virtual machines, consisting of one master node and eight worker nodes. Each virtual machine is allocated two processors, 4 GB of memory, and a 50 GB hard drive.

3.1. Dataset

In this study, a portion of the YAGO3 [58] dataset was selected to construct the geospatial knowledge graph, referred to as GeoKG. YAGO3 integrates information from Wikipedia with other data sources such as WordNet and Geonames, and it describes various entities such as people, organizations, cities, and the relationships between them. There are two reasons that YAGO3 is chosen to construct GeoKG. First, YAGO3 is a large-scale knowledge graph, which can highlight the difference in query response time between different partitioning methods. Second, YAGO3 has enough geospatial information to support a spatial–graph coupled query. The process of constructing GeoKG is as follows: The file named “yagoFacts” describes the relationships between all entities in the YAGO3 dataset. Therefore, all data (5,628,166 triples and 37 types of edges) from “yagoFacts” were selected. Then, the geospatial information of entities, such as location, was extracted from the “yagoGeonamesOnlyData” file. The geospatial information was connected with the entities in “yagoFacts”. Additionally, some random entities and their location information were selected from the remaining data in “yagoGeonamesOnlyData”. Finally, the aforementioned triples were merged to build GeoKG, which served as the geospatial knowledge graph for testing (Figure 7). GeoKG describes entities such as people, organizations, events, and places, along with the relationships between them. Entities with spatial location information are defined as geographic entity nodes. As shown in Table 1, GeoKG is a large-scale geographic spatial knowledge graph which has 3312247 nodes and 7022211 edges. There are approximately 12% (384981) geographic nodes in GeoKG, which are distributed globally (Figure 8). GeoKG contains 65 types of edges, with 23 types of edges related to geographic nodes.

3.2. Query Design

To validate the effectiveness of GKGP-DC, four types of spatial queries were designed for testing: withinDistance queries, buffer queries, k nearest neighbor (kNN) queries, and intersects queries. These four types of spatial queries are widely used in the field of geospatial data processing and are the basis of complex spatial queries [45]. Therefore, we believe that these four types of queries can comprehensively evaluate the performance of the proposed algorithm in practical applications. In addition, coupled queries that combine spatial and graph queries are an inevitable requirement in geospatial knowledge graph query scenarios. Therefore, the four spatial queries mentioned above were combined with four types of graph queries (Figure 9) to construct various spatial–graph coupled queries.
As shown in Figure 9a, the star queries were designed for obtaining the first neighbors of the node. Snowflake queries consist of multiple star queries. Figure 9b shows a snowflake query consisting of two star queries. The chain queries are step-by-step queries, which obtain only one neighbor according to the label of edge at each step (Figure 9c). The complex queries are composed of multiple basic queries. For example, Figure 9d shows a complex query consisting of two star queries and a chain query. In the experiments of spatial–graph coupled query, the spatial query was first applied to the geospatial knowledge graph. Based on the results of the spatial query, the graph query was further used to obtain the final results.

3.3. Results

3.3.1. Spatial Query Results

For withinDistance queries, buffer queries, kNN queries, and intersect queries, 20 geographic nodes were randomly selected as query inputs. The average number of computational nodes required to execute these queries (rounded up) was calculated for each partitioning algorithm. As shown in Table 2 and Figure 10, GKGP-DC required fewer computational nodes for all types of queries compared to other methods. Specifically, for withinDistance queries and buffer queries, the proposed method only needed three and two computational nodes, respectively. In contrast, algorithms such as Hash, LDG, FENNEL, WStream, and HeiStream required seven to eight computational nodes for most queries. GKGP-DC requires significantly fewer nodes than other algorithms, indicating a lower resource demand. Offline partitioning algorithms like METIS, ParMetis, KAFFPA, and XtraPuLP showed relatively lower computational node requirements for certain queries. For example, ParMetis needed only five computational nodes for kNN queries, which can be attributed to the fact that offline partitioning algorithms focus on minimizing edge cuts. As a result, proposed method can group nodes that are spatially close and tightly connected by edges into the same partition, reducing the number of computational nodes required for some queries.
The above experimental results can be explained as follows. Traditional graph partitioning methods do not consider spatial correlation; therefore, the spatial neighboring nodes are randomly distributed in different partitions. As a result, traditional graph partitioning methods require more computational nodes when spatial queries are executed. In contrast, the proposed method in this study takes spatial correlation into account, prioritizing the partitioning of geographically close nodes into the same partition. As a result, the number of computational nodes required for spatial queries is significantly reduced, which decreases dependency on computational resources, improves query response times, and enhances overall system performance.

3.3.2. Spatial–Graph Coupled Query Experimental Results

For withinDistance–graph coupled queries, GKGP-DC exhibits significant performance improvements (Table 3, Figure 11). For a withinDistance–star query, the proposed algorithm reduces the query response time by at least 0.868 s and up to 2.102 s. For a withinDistance–snowflake query, the reduction is at least 4.786 s and up to 15.325 s. The proposed method has a significant improvement on a withinDistance–chain query and a withinDistance–complex query, whose response time is reduced by at least 9.098 s and up to 23.053 s and at least 10.429 s and up to 22.260 s, respectively.
For buffer–graph coupled queries, the proposed algorithm also performs better compared to other partitioning methods (Table 4, Figure 12). Specifically, for a buffer–star query, the proposed algorithm reduces the query response time by at least 1.797 s and up to 2.487 s. For a buffer–snowflake query, the reduction is at least 4.907 s and up to 19.571 s. As for buffer–chain queries and buffer–complex queries, the response time is reduced by at least 7.162 s and up to 19.506 s and at least 10.756 s and up to 22.774 s, respectively.
The result of a kNN–graph coupled query also shows that GKGP-DC outperforms the comparison methods (Table 5, Figure 13). For example, the proposed algorithm reduces the query response time by at least 0.586 s and up to 0.886 s on a kNN–star query. For a kNN–snowflake query, the reduction is at least 4.992 s and up to 12.360 s. For a kNN–chain query, the response time is reduced by at least 5.172 s and up to 13.258 s. And for a kNN–complex query, the response time is reduced by at least 4.248 s and up to 16.845 s.
In addition, Table 6 and Figure 14 show that GKGP-DC exhibits significant performance improvements when handling different intersect–graph coupled queries. Specifically, for an intersect–star query, the proposed algorithm reduces the query response time by at least 0.472 s and up to 3.078 s. For an intersect–snowflake query, the reduction is at least 2.613 s and up to 10.772 s. For an intersect–chain query, the response time is reduced by at least 2.455 s and up to 11.819 s. And for an intersect–complex query, the response time is reduced by at least 4.039 s and up to 19.744 s.

3.4. Discussion

The experiment results (Table 2 and Figure 10) show that GKGP-DC outperforms the comparison methods on spatial-related queries. The number of computational nodes traversed is at least three less for withinDistance and buffer queries and at least three less for kNN queries and intersect queries. In addition, the improvement is also speciously obvious in spatial–complex coupled queries. In all experimental results of spatial–graph coupled queries, the response time is reduced by approximately 76% at most (result of intersect–complex queries, Table 6 and Figure 14). The main reasons for this can be summarized as follows:
(i) GKGP-DC preprocesses geographic nodes using density peak clustering, effectively grouping spatially proximate nodes together. This strategy significantly optimizes the execution efficiency of spatial queries. As for spatial–graph coupled queries, the results of spatial subqueries are confined to only a few partitions. This not only significantly reduces the assembly time of the sub-results, but also decreases the number of results generated by the factorized queries. Furthermore, the network transmission consumption is reduced. Overall, the partitions produced by GKGP-DC effectively reduce the total query response time for spatial-related queries.
(ii) The preprocessing of geographic nodes may result in inferior partition quality compared to the results obtained by high-quality offline partitioning algorithms (such as Metis). However, GKGP-DC mitigates this issue by using the Leiden algorithm to identify graph structures. The graph clusters detected by Leiden are treated as the basic units for graph partitioning, which can preserve the structural characteristics of the graph. It is helpful for reducing the number of cross-partition edges. Therefore, the weighted graph produced after applying DPC and Leiden contains information on both spatial relationships and graph structures. The efficiency of both spatial queries and graph queries can benefit from this partitioning scheme.
The significant reductions in query response time and computational node usage show that GKGP-DC can optimize the performance of spatial queries and spatial–graph coupled queries. For a system requiring the distributed management of large-scale geospatial knowledge graphs, such as a geographic intelligent question-answering platform, GKGP-DC can improve system operation efficiency.

4. Conclusions

Geospatial knowledge graphs contain rich geographic location semantics, requiring graph partitioning algorithms to comprehensively consider both spatial relationships and graph structures. Traditional graph partitioning methods discard the spatial relationship between entities, resulting in poor performance for spatial queries. To solve this problem, we propose GKGP-DC to produce spatially aware partitions. By using density peak clustering to aggregate spatial entities to super-nodes, GKGP-DC avoids dividing close spatial entities into different partitions. Then, the Leiden algorithm is applied on the coarse graph, producing clusters which meet the requirement of minimizing cut edges. Finally, the LDG algorithm is used to refine the partitioning result, ensuring load balance without altering the clusters. Therefore, the final partitioning result can be obtained under three constraints: spatial relationship, graph structure, and load balance. Experiments on a geospatial knowledge graph (i.e., GeoKG) show that compared to ten representative methods, GKGP-DC can effectively reduce the number of computational node resources required for spatial queries and obviously improve the response time for spatial–graph coupled queries. GKGP-DC can be further used for the distributed management of large-scale geospatial knowledge graphs.
Future studies should focus on these two aspects. First, GKGP-DC lacks the ability of processing spatial relationships between entities with different geometric types, such as points and polygons. GKGP-DC treats all geographic entities as points. The centroids of entities were used to represent the locations of entities. It is possible that a point entity is in a polygon entity, but the distance between the point and the centroid of the polygon is large. In this situation, the point may be assigned to a different partition, resulting in cross-partition communication and the low efficiency of spatial queries. When searching the k-nearest neighbors for DPC, taking the spatial relationship into account may address this issue. For example, if a polygon entity contains a point entity, they should always be considered as neighbors. In addition, GKGP-DC can only partition geospatial knowledge graphs based on the spatial relationship and graph structure. In practice, user preference also should be considered to partition the graph. One way to alleviate this problem is to adjust the partition based on the history records of users. We leave this issue for further research.

Author Contributions

Conceptualization, Y.C. and K.C.; data curation, F.O., Y.C. and M.C.; methodology, Y.C., K.C. and Q.L.; visualization, Y.C. and F.O.; writing—original draft, Y.C., K.C. and G.W.; writing—review and editing, Q.L., M.D. and R.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded through support from the National Key Research and Development Program of China (No. 2021YFB3900903), the National Natural Science Foundation of China (No. 42271484), the water conservancy science and technology project of Guizhou, China (No. KT202410), and the Research Foundation of the Department of Natural Resources of Hunan Province (Grant No. HBZ20240132).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author(s).

Acknowledgments

We deeply appreciate the anonymous reviewers for their constructive and helpful suggestions.

Conflicts of Interest

Rui Xu was employed by the company Guizhou Survey & Design Research Institute for Water Resources and Hydropower. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

  1. Varanka, D.E. A geospatial knowledge graph prototype for national topographic mapping. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, 48, 511–516. [Google Scholar] [CrossRef]
  2. Dsouza, A.; Tempelmeier, N.; Yu, R.; Gottschalk, S.; Demidova, E. Worldkg: A world-scale geographic knowledge graph. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Queensland, Australia, 1–5 November 2021; pp. 4475–4484. [Google Scholar]
  3. Auer, S.; Lehmann, J.; Hellmann, S. Linkedgeodata: Adding a spatial dimension to the web of data. In Proceedings of the Semantic Web-ISWC 2009: 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, 25–29 October 2009; Proceedings 8. pp. 731–746. [Google Scholar]
  4. Giunchiglia, F.; Maltese, V.; Farazi, F.; Dutta, B. GeoWordNet: A resource for geo-spatial applications. In Proceedings of the Semantic Web: Research and Applications: 7th Extended Semantic Web Conference, ESWC 2010, Heraklion, Greece, 30 May–3 June 2010; Proceedings, Part I 7. pp. 121–136. [Google Scholar]
  5. Qi, Z.; Wang, H.; Zhang, H. A dual-store structure for knowledge graphs. IEEE Trans. Knowl. Data Eng. 2023, 35, 1104–1108. [Google Scholar] [CrossRef]
  6. Heidari, S.; Simmhan, Y.; Calheiros, R.N.; Buyya, R. Scalable graph processing frameworks: A taxonomy and open challenges. ACM Comput. Surv. (CSUR) 2018, 51, 1–53. [Google Scholar] [CrossRef]
  7. Rahimian, F.; Payberah, A.H.; Girdzijauskas, S.; Haridi, S. A distributed algorithm for large-scale graph partitioning. ACM Trans. Auton. Adapt. Syst. 2015, 10, 1–24. [Google Scholar] [CrossRef]
  8. Garey, M.R.; Johnson, D.S.; Stockmeyer, L. Some simplified NP-complete problems. Theor. Comput. Sci. 1974, 1, 237–267. [Google Scholar] [CrossRef]
  9. Ding, L.; Li, C.; Jin, D.; Ding, S. Survey of spectral clustering based on graph theory. Pattern Recognit. 2024, 151, 110366. [Google Scholar] [CrossRef]
  10. Kolountzakis, M.N.; Miller, G.L.; Peng, R.; Tsourakakis, C.E. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Math. 2012, 8, 161–185. [Google Scholar] [CrossRef]
  11. Camilus, K.S.; Govindan, V.K. A review on graph based segmentation. Int. J. Image Graph. Signal Process. 2012, 4, 1. [Google Scholar] [CrossRef]
  12. Stanton, I.; Kliot, G. Streaming graph partitioning for large distributed graphs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 1222–1230. [Google Scholar]
  13. Malewicz, G.; Austern, M.H.; Bik, A.J.; Dehnert, J.C.; Horn, I.; Leiser, N.; Czajkowski, G. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, Indianapolis, IN, USA, 6–10 June 2010; pp. 135–146. [Google Scholar]
  14. Xie, C.; Yan, L.; Li, W.J.; Zhang, Z. Distributed power-law graph computing: Theoretical and empirical analysis. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 1673–1681. [Google Scholar]
  15. Gonzalez, J.E.; Low, Y.; Gu, H.; Bickson, D.; Guestrin, C. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, Hollywood, CA, USA, 8–10 October 2012; pp. 17–30. [Google Scholar]
  16. Tsourakakis, C.; Gkantsidis, C.; Radunovic, B.; Vojnovic, M. FENNEL: Streaming graph partitioning for massive scale graphs. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, New York, NY, USA, 24–28 February 2014; pp. 333–342. [Google Scholar]
  17. Petroni, F.; Querzoni, L.; Daudjee, K.; Kamali, S.; Iacoboni, G. HDRF: Stream-based partitioning for power-law graphs. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 243–252. [Google Scholar]
  18. Ding, Z.; Xiang, Y.; Wang, S.; Xie, X.; Zhou, S.K. Play like a vertex: A stackelberg game approach for streaming graph partitioning. Proc. ACM Manag. Data 2024, 2, 1–27. [Google Scholar] [CrossRef]
  19. Mayer, C.; Mayer, R.; Tariq, M.A.; Geppert, H.; Laich, L.; Rieger, L.; Rothermel, K. ADWISE: Adaptive window-based streaming edge partitioning for high-speed graph processing. In Proceedings of the 38th IEEE International Conference on Distributed Computing Systems (ICDCS), Vienna, Austria, 2–6 July 2018; pp. 685–695. [Google Scholar]
  20. Patwary, M.A.K.; Garg, S.; Kang, B. Window-based streaming graph partitioning algorithm. In Proceedings of the Australasian Computer Science Week Multiconference, Sydney, NSW, Australia, 29–31 January 2019; pp. 1–10. [Google Scholar]
  21. Li, Y.; Li, C.; Orgerie, A.C.; Parvédy, P.R. WSGP: A window-based streaming graph partitioning approach. In Proceedings of the 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, Melbourne, Australia, 10–13 May 2021; pp. 586–595. [Google Scholar]
  22. Faraj, M.F.; Schulz, C. Buffered streaming graph partitioning. ACM J. Exp. Algorithmics 2022, 27, 1–26. [Google Scholar] [CrossRef]
  23. Wang, Z.; Yang, Z.; Wang, N.; Du, Y.; Nie, J.; Wei, Z.; Yu, G. Lightweight streaming graph partitioning by fully utilizing knowledge from local view. In Proceedings of the 43rd International Conference on Distributed Computing Systems, Hong Kong, China, 18–21 July 2023; pp. 614–625. [Google Scholar]
  24. Fiedler, M. Algebraic connectivity of graphs. Czechoslov. Math. J. 1973, 23, 298–305. [Google Scholar] [CrossRef]
  25. Karypis, G.; Kumar, V. Analysis of multilevel graph partitioning. In Proceedings of the 1995 ACM/IEEE Conference on Supercomputing, San Diego, CA, USA, 8 December 1995; p. 29-es. [Google Scholar]
  26. Karypis, G.; Kumar, V. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 1998, 20, 359–392. [Google Scholar] [CrossRef]
  27. Holtgrewe, M.; Sanders, P.; Schulz, C. Engineering a scalable high-quality graph partitioner. In Proceedings of the 2010 IEEE International Symposium on Parallel & Distributed Processing, Atlanta, GA, USA, 19–23 April 2010; pp. 1–12. [Google Scholar]
  28. Sanders, P.; Schulz, C. Engineering multilevel graph partitioning algorithms. In Proceedings of the 19th Annual European Symposium on Algorithms, Saarbrücken, Germany, 5–9 September 2011; pp. 469–480. [Google Scholar]
  29. Sanders, P.; Schulz, C. Distributed evolutionary graph partitioning. In Proceedings of the 14th Workshop on Algorithm Engineering and Experiments, Kyoto, Japan, 16 January 2012; pp. 16–29. [Google Scholar]
  30. Meyerhenke, H.; Sanders, P.; Schulz, C. Parallel graph partitioning for complex networks. IEEE Trans. Parallel Distrib. Syst. 2017, 28, 2625–2638. [Google Scholar] [CrossRef]
  31. Jafari, N.; Selvitopi, O.; Aykanat, C. Fast shared-memory streaming multilevel graph partitioning. J. Parallel Distrib. Comput. 2021, 147, 140–151. [Google Scholar] [CrossRef]
  32. Chan, P.K.; Schlag, M.D.F.; Zien, J.Y. Spectral k-way ratio-cut partitioning and clustering. In Proceedings of the 30th International Design Automation Conference, Dallas, TX, USA, 14–18 June 1993; pp. 749–754. [Google Scholar]
  33. Kernighan, B.W.; Lin, S. An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 1970, 49, 291–307. [Google Scholar] [CrossRef]
  34. Predari, M.; Esnard, A. A k-way greedy graph partitioning with initial fixed vertices for parallel applications. In Proceedings of the 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Heraklion, Greece, 17–19 February 2016; pp. 280–287. [Google Scholar]
  35. Fiduccia, C.M.; Mattheyses, R.M. A linear-time heuristic for improving network partitions. In Proceedings of the 19th Design Automation Conference, Las Vegas, NV, USA, 14–16 June 1982; pp. 175–181. [Google Scholar]
  36. Rahimian, F.; Payberah, A.H.; Girdzijauskas, S.; Haridi, S. Distributed vertex-cut partitioning. In Proceedings of the 14th IFIP International Conference on Distributed Applications and Interoperable System, Berlin, Germany, 3–6 June 2014; pp. 186–200. [Google Scholar]
  37. Li, H.; Yuan, H.; Huang, J.; Cui, J.; Yoo, J. Dynamic graph repartitioning: From single vertex to vertex group. In Proceedings of the 25th International Conference on Database Systems for Advanced Applications, Jeju, Republic of Korea, 24–27 September 2020; pp. 482–497. [Google Scholar]
  38. Li, H.; Liu, Y.; Yang, S.; Lin, Y.; Yang, Y.; Yoo, J. An improved hill climbing algorithm for graph partitioning. Comput. J. 2023, 66, 1176–1761. [Google Scholar] [CrossRef]
  39. Mayer, C.; Tariq, M.A.; Mayer, R.; Rothermel, K. Graph: Traffic-aware graph processing. IEEE Trans. Parallel Distrib. Syst. 2018, 29, 1289–1302. [Google Scholar] [CrossRef]
  40. Li, H.; Yuan, H.; Huang, J.; Ma, X.; Cui, J.; Yoo, J. Edge repartitioning via structure-aware group migration. IEEE Trans. Comput. Soc. Syst. 2021, 9, 751–760. [Google Scholar] [CrossRef]
  41. Li, H.; Yuan, H.; Huang, J.; Cui, J.; Ma, X.; Wang, S.; Philip, S.Y. Group reassignment for dynamic edge partitioning. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 2477–2490. [Google Scholar] [CrossRef]
  42. Ugander, J.; Backstrom, L. Balanced label propagation for partitioning massive graphs. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining, Rome, Italy, 4–8 February 2013; pp. 507–516. [Google Scholar]
  43. Vaquero, L.; Cuadrado, F.; Logothetis, D.; Martella, C. Adaptive partitioning for large-scale dynamic graphs. In Proceedings of the 4th Annual Symposium on Cloud Computing, Sydney, Australia, 3–5 December 2014; pp. 144–153. [Google Scholar]
  44. Martella, C.; Logothetis, D.; Loukas, A.; Siganos, G. Spinner: Scalable graph partitioning in the cloud. In Proceedings of the 33rd IEEE International Conference on Data Engineering, San Diego, CA, USA, 19–22 April 2017; pp. 1083–1094. [Google Scholar]
  45. Slota, G.M.; Rajamanickam, S.; Madduri, K. PuLP/XtraPuLP: Partitioning Tools for Extreme-Scale Graphs; Technical report; Sandia National Lab. (SNL-NM): Albuquerque, NM, USA, 2017. [Google Scholar]
  46. Moussawi, A.E.; Seghouani, N.B.; Bugiotti, F. A graph partitioning algorithm for edge or vertex balance. In Proceedings of the 31st International Conference on Database and Expert Systems Applications, Bratislava, Slovakia, 14–17 September 2020; pp. 23–37. [Google Scholar]
  47. Zhang, C.; Wei, F.; Liu, Q.; Tang, Z.G.; Li, Z. Graph edge partitioning via neighborhood heuristic. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 605–614. [Google Scholar]
  48. Hanai, M.; Suzumura, T.; Tan, W.J.; Liu, E.; Theodoropoulos, G.; Cai, W. Distributed edge partitioning for trillion-edge graphs. VLDB Endow. 2019, 12, 2379–2392. [Google Scholar] [CrossRef]
  49. Lee, K.; Ganti, R.K.; Srivatsa, M.; Liu, L. Efficient spatial query processing for big data. In Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Dallas, TX, USA, 4–7 November 2014; pp. 469–472. [Google Scholar]
  50. García-García, F.; Corral, A.; Iribarne, L.; Vassilakopoulos, M.; Manolopoulos, Y. Efficient distance join query processing in distributed spatial data management systems. Inf. Sci. 2020, 512, 985–1008. [Google Scholar] [CrossRef]
  51. Zhong, Y.; Han, J.; Zhang, T.; Li, Z.; Fang, J.; Chen, G. Towards parallel spatial query processing for big spatial data. In Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, Shanghai, China, 21–25 May 2012; pp. 2085–2094. [Google Scholar]
  52. Rodriguez, A.; Laio, A. Clustering by fast search and find of density peaks. Science 2014, 344, 1492–1496. [Google Scholar] [CrossRef] [PubMed]
  53. Du, M.; Ding, S.; Jia, H. Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl. Based Syst. 2016, 99, 135–145. [Google Scholar] [CrossRef]
  54. Sun, L.; Ye, T.; Sun, J.; Duan, X.; Luo, Y. Density-peak-based overlapping community detection algorithm. IEEE Trans. Comput. Soc. Syst. 2021, 9, 1211–1223. [Google Scholar] [CrossRef]
  55. Traag, V.A.; Waltman, L.; Van Eck, N.J. From Louvain to Leiden: Guaranteeing well-connected communities. Sci. Rep. 2019, 9, 5233. [Google Scholar] [CrossRef]
  56. Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, 10008. [Google Scholar] [CrossRef]
  57. Anuar, S.H.H.; Abas, Z.A.; Yunos, N.M.; Zaki, N.H.M.; Hashim, N.A.; Mokhtar, M.F.; Nizam, A.F. Comparison between Louvain and Leiden algorithm for network structure: A review. J. Phys. Conf. Ser. 2021, 2129, 012028. [Google Scholar] [CrossRef]
  58. Mahdisoltani, F.; Biega, J.; Suchanek, F.M. Yago3: A knowledge base from multilingual Wikipedias. In Proceedings of the 7th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, 6–9 January 2013. [Google Scholar]
Figure 1. Comparison of partitioning methods when ignoring and considering spatial information.
Figure 1. Comparison of partitioning methods when ignoring and considering spatial information.
Applsci 14 10704 g001
Figure 2. Flowchart of GKGP-DC.
Figure 2. Flowchart of GKGP-DC.
Applsci 14 10704 g002
Figure 3. Decision diagram of DPC.
Figure 3. Decision diagram of DPC.
Applsci 14 10704 g003
Figure 4. Node agglomeration based on DPC.
Figure 4. Node agglomeration based on DPC.
Applsci 14 10704 g004
Figure 5. Graph structure discovery.
Figure 5. Graph structure discovery.
Applsci 14 10704 g005
Figure 6. Super-node partitioning based on LDG.
Figure 6. Super-node partitioning based on LDG.
Applsci 14 10704 g006
Figure 7. Part of GeoKG.
Figure 7. Part of GeoKG.
Applsci 14 10704 g007
Figure 8. Distribution of geographic entities.
Figure 8. Distribution of geographic entities.
Applsci 14 10704 g008
Figure 9. Four types of graph query.
Figure 9. Four types of graph query.
Applsci 14 10704 g009
Figure 10. The number of computational nodes for different queries (unit: count).
Figure 10. The number of computational nodes for different queries (unit: count).
Applsci 14 10704 g010
Figure 11. Response time of withinDistance–graph coupled query (unit: seconds).
Figure 11. Response time of withinDistance–graph coupled query (unit: seconds).
Applsci 14 10704 g011
Figure 12. Response time of buffer–graph coupled query (unit: seconds).
Figure 12. Response time of buffer–graph coupled query (unit: seconds).
Applsci 14 10704 g012
Figure 13. Response time of buffer–graph coupled query (unit: seconds).
Figure 13. Response time of buffer–graph coupled query (unit: seconds).
Applsci 14 10704 g013
Figure 14. Response time of intersect–graph coupled query (unit: seconds).
Figure 14. Response time of intersect–graph coupled query (unit: seconds).
Applsci 14 10704 g014
Table 1. Data description of GeoKG.
Table 1. Data description of GeoKG.
DatasetNumber of NodesNumber of EdgesNumber of Geographic NodesNumber of Edge Labels
GeoKG3,312,2477,022,211384,98165
Table 2. The number of computational nodes required for different queries (unit: count).
Table 2. The number of computational nodes required for different queries (unit: count).
AlgorithmWithinDistance QueryBuffer QuerykNN QueryIntersect Query
Hash8788
LDG8887
FENNEL7777
WStream8888
HeiSream6676
IHCGP7677
METIS7766
ParMetis6658
KAFFPA6566
XtraPuLP7657
GKGP-DC3234
Table 3. Response time of withinDistance–graph coupled query (unit: seconds).
Table 3. Response time of withinDistance–graph coupled query (unit: seconds).
AlgorithmwithinDistance–
Star Query
withinDistance–
Snowflake Query
withinDistance–
Chain Query
withinDistance–
Complex Query
hash5.73926.24836.12838.154
LDG5.82123.24535.75938.047
FENNEL4.58724.05634.80237.095
wstream5.40722.10833.25237.165
HeiStream5.42421.23633.28537.043
IHCGP5.70519.09432.90837.094
METIS5.04316.24622.57126.981
parmetis5.10315.70922.54526.323
KaFFPa5.15215.74522.17326.584
xtraPuLP4.64815.87922.55626.427
GKGP-DC3.71910.92313.07515.894
Table 4. Response time of buffer–graph coupled query (unit: seconds).
Table 4. Response time of buffer–graph coupled query (unit: seconds).
AlgorithmBuffer–Star QueryBuffer–Snowflake QueryBuffer–Chain QueryBuffer–Complex Query
hash4.04528.69932.36735.879
LDG4.63024.15829.41033.543
Fennel4.01524.53729.07233.109
wstream4.40323.78929.32833.132
HeiStream4.35221.58529.09133.096
IHCGP4.70520.95627.92330.745
METIS4.04315.11421.51824.511
parmetis4.23315.39321.53124.462
KaFFPa4.19214.03520.02323.861
xtraPuLP4.04814.75221.47924.533
GKGP-DC2.2189.12812.86113.105
Table 5. Response time of kNN coupled query (unit: seconds).
Table 5. Response time of kNN coupled query (unit: seconds).
AlgorithmkNN–Star QuerykNN–Snowflake QuerykNN–Chain QuerykNN–Complex Query
hash2.14222.42121.3125.994
LDG2.28722.20420.15625.754
FENNEL2.07921.31220.01524.695
wstream2.26022.18419.98623.624
HeiStream1.98721.66119.08423.345
IHCGP2.07619.27318.60820.309
METIS2.05415.9414.58413.397
parmetis1.99315.70214.98714.082
KaFFPa2.12115.47413.22413.599
xtraPuLP2.01815.05314.43113.48
GKGP-DC 1.40110.0618.0529.149
Table 6. Response time of intersect–graph coupled query (unit: seconds).
Table 6. Response time of intersect–graph coupled query (unit: seconds).
AlgorithmIntersect–Star QueryIntersect–Snowflake QueryIntersect–Chain QueryIntersect–Complex Query
hash4.20216.23516.14625.992
LDG4.56416.20915.1123.415
FENNEL4.69715.77615.83723.639
wstream4.33915.75315.40122.177
HeiStream3.98715.06714.94521.046
IHCGP3.74513.26814.51521.768
METIS2.0938.3887.57310.639
parmetis2.1228.1717.54410.287
KaFFPa2.0918.0766.78210.334
xtraPuLP2.1468.5626.88610.639
GKGP-DC 1.6195.4634.3276.248
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chen, Y.; Ou, F.; Liu, Q.; Wu, G.; Chen, K.; Deng, M.; Chen, M.; Xu, R. Dual Clustering-Based Method for Geospatial Knowledge Graph Partitioning. Appl. Sci. 2024, 14, 10704. https://doi.org/10.3390/app142210704

AMA Style

Chen Y, Ou F, Liu Q, Wu G, Chen K, Deng M, Chen M, Xu R. Dual Clustering-Based Method for Geospatial Knowledge Graph Partitioning. Applied Sciences. 2024; 14(22):10704. https://doi.org/10.3390/app142210704

Chicago/Turabian Style

Chen, Yuxuan, Feifei Ou, Qiliang Liu, Gusheng Wu, Kaiqi Chen, Min Deng, Meihua Chen, and Rui Xu. 2024. "Dual Clustering-Based Method for Geospatial Knowledge Graph Partitioning" Applied Sciences 14, no. 22: 10704. https://doi.org/10.3390/app142210704

APA Style

Chen, Y., Ou, F., Liu, Q., Wu, G., Chen, K., Deng, M., Chen, M., & Xu, R. (2024). Dual Clustering-Based Method for Geospatial Knowledge Graph Partitioning. Applied Sciences, 14(22), 10704. https://doi.org/10.3390/app142210704

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop