Dual Clustering-Based Method for Geospatial Knowledge Graph Partitioning
Abstract
:1. Introduction
2. Methods
2.1. Geographic Node Partitioning Based on Density Peak Clustering
2.1.1. Definition of Local Density and Relative Distance
2.1.2. Cluster Center Selection and Node Agglomeration
2.2. Discovery of Graph Structure Based on Graph Clustering
2.3. Super-Node Partitioning Based on the LDG Algorithm
3. Experiments
3.1. Dataset
3.2. Query Design
3.3. Results
3.3.1. Spatial Query Results
3.3.2. Spatial–Graph Coupled Query Experimental Results
3.4. Discussion
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Varanka, D.E. A geospatial knowledge graph prototype for national topographic mapping. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, 48, 511–516. [Google Scholar] [CrossRef]
- Dsouza, A.; Tempelmeier, N.; Yu, R.; Gottschalk, S.; Demidova, E. Worldkg: A world-scale geographic knowledge graph. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Queensland, Australia, 1–5 November 2021; pp. 4475–4484. [Google Scholar]
- Auer, S.; Lehmann, J.; Hellmann, S. Linkedgeodata: Adding a spatial dimension to the web of data. In Proceedings of the Semantic Web-ISWC 2009: 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, 25–29 October 2009; Proceedings 8. pp. 731–746. [Google Scholar]
- Giunchiglia, F.; Maltese, V.; Farazi, F.; Dutta, B. GeoWordNet: A resource for geo-spatial applications. In Proceedings of the Semantic Web: Research and Applications: 7th Extended Semantic Web Conference, ESWC 2010, Heraklion, Greece, 30 May–3 June 2010; Proceedings, Part I 7. pp. 121–136. [Google Scholar]
- Qi, Z.; Wang, H.; Zhang, H. A dual-store structure for knowledge graphs. IEEE Trans. Knowl. Data Eng. 2023, 35, 1104–1108. [Google Scholar] [CrossRef]
- Heidari, S.; Simmhan, Y.; Calheiros, R.N.; Buyya, R. Scalable graph processing frameworks: A taxonomy and open challenges. ACM Comput. Surv. (CSUR) 2018, 51, 1–53. [Google Scholar] [CrossRef]
- Rahimian, F.; Payberah, A.H.; Girdzijauskas, S.; Haridi, S. A distributed algorithm for large-scale graph partitioning. ACM Trans. Auton. Adapt. Syst. 2015, 10, 1–24. [Google Scholar] [CrossRef]
- Garey, M.R.; Johnson, D.S.; Stockmeyer, L. Some simplified NP-complete problems. Theor. Comput. Sci. 1974, 1, 237–267. [Google Scholar] [CrossRef]
- Ding, L.; Li, C.; Jin, D.; Ding, S. Survey of spectral clustering based on graph theory. Pattern Recognit. 2024, 151, 110366. [Google Scholar] [CrossRef]
- Kolountzakis, M.N.; Miller, G.L.; Peng, R.; Tsourakakis, C.E. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Math. 2012, 8, 161–185. [Google Scholar] [CrossRef]
- Camilus, K.S.; Govindan, V.K. A review on graph based segmentation. Int. J. Image Graph. Signal Process. 2012, 4, 1. [Google Scholar] [CrossRef]
- Stanton, I.; Kliot, G. Streaming graph partitioning for large distributed graphs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 1222–1230. [Google Scholar]
- Malewicz, G.; Austern, M.H.; Bik, A.J.; Dehnert, J.C.; Horn, I.; Leiser, N.; Czajkowski, G. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, Indianapolis, IN, USA, 6–10 June 2010; pp. 135–146. [Google Scholar]
- Xie, C.; Yan, L.; Li, W.J.; Zhang, Z. Distributed power-law graph computing: Theoretical and empirical analysis. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 1673–1681. [Google Scholar]
- Gonzalez, J.E.; Low, Y.; Gu, H.; Bickson, D.; Guestrin, C. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, Hollywood, CA, USA, 8–10 October 2012; pp. 17–30. [Google Scholar]
- Tsourakakis, C.; Gkantsidis, C.; Radunovic, B.; Vojnovic, M. FENNEL: Streaming graph partitioning for massive scale graphs. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, New York, NY, USA, 24–28 February 2014; pp. 333–342. [Google Scholar]
- Petroni, F.; Querzoni, L.; Daudjee, K.; Kamali, S.; Iacoboni, G. HDRF: Stream-based partitioning for power-law graphs. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 243–252. [Google Scholar]
- Ding, Z.; Xiang, Y.; Wang, S.; Xie, X.; Zhou, S.K. Play like a vertex: A stackelberg game approach for streaming graph partitioning. Proc. ACM Manag. Data 2024, 2, 1–27. [Google Scholar] [CrossRef]
- Mayer, C.; Mayer, R.; Tariq, M.A.; Geppert, H.; Laich, L.; Rieger, L.; Rothermel, K. ADWISE: Adaptive window-based streaming edge partitioning for high-speed graph processing. In Proceedings of the 38th IEEE International Conference on Distributed Computing Systems (ICDCS), Vienna, Austria, 2–6 July 2018; pp. 685–695. [Google Scholar]
- Patwary, M.A.K.; Garg, S.; Kang, B. Window-based streaming graph partitioning algorithm. In Proceedings of the Australasian Computer Science Week Multiconference, Sydney, NSW, Australia, 29–31 January 2019; pp. 1–10. [Google Scholar]
- Li, Y.; Li, C.; Orgerie, A.C.; Parvédy, P.R. WSGP: A window-based streaming graph partitioning approach. In Proceedings of the 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, Melbourne, Australia, 10–13 May 2021; pp. 586–595. [Google Scholar]
- Faraj, M.F.; Schulz, C. Buffered streaming graph partitioning. ACM J. Exp. Algorithmics 2022, 27, 1–26. [Google Scholar] [CrossRef]
- Wang, Z.; Yang, Z.; Wang, N.; Du, Y.; Nie, J.; Wei, Z.; Yu, G. Lightweight streaming graph partitioning by fully utilizing knowledge from local view. In Proceedings of the 43rd International Conference on Distributed Computing Systems, Hong Kong, China, 18–21 July 2023; pp. 614–625. [Google Scholar]
- Fiedler, M. Algebraic connectivity of graphs. Czechoslov. Math. J. 1973, 23, 298–305. [Google Scholar] [CrossRef]
- Karypis, G.; Kumar, V. Analysis of multilevel graph partitioning. In Proceedings of the 1995 ACM/IEEE Conference on Supercomputing, San Diego, CA, USA, 8 December 1995; p. 29-es. [Google Scholar]
- Karypis, G.; Kumar, V. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 1998, 20, 359–392. [Google Scholar] [CrossRef]
- Holtgrewe, M.; Sanders, P.; Schulz, C. Engineering a scalable high-quality graph partitioner. In Proceedings of the 2010 IEEE International Symposium on Parallel & Distributed Processing, Atlanta, GA, USA, 19–23 April 2010; pp. 1–12. [Google Scholar]
- Sanders, P.; Schulz, C. Engineering multilevel graph partitioning algorithms. In Proceedings of the 19th Annual European Symposium on Algorithms, Saarbrücken, Germany, 5–9 September 2011; pp. 469–480. [Google Scholar]
- Sanders, P.; Schulz, C. Distributed evolutionary graph partitioning. In Proceedings of the 14th Workshop on Algorithm Engineering and Experiments, Kyoto, Japan, 16 January 2012; pp. 16–29. [Google Scholar]
- Meyerhenke, H.; Sanders, P.; Schulz, C. Parallel graph partitioning for complex networks. IEEE Trans. Parallel Distrib. Syst. 2017, 28, 2625–2638. [Google Scholar] [CrossRef]
- Jafari, N.; Selvitopi, O.; Aykanat, C. Fast shared-memory streaming multilevel graph partitioning. J. Parallel Distrib. Comput. 2021, 147, 140–151. [Google Scholar] [CrossRef]
- Chan, P.K.; Schlag, M.D.F.; Zien, J.Y. Spectral k-way ratio-cut partitioning and clustering. In Proceedings of the 30th International Design Automation Conference, Dallas, TX, USA, 14–18 June 1993; pp. 749–754. [Google Scholar]
- Kernighan, B.W.; Lin, S. An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 1970, 49, 291–307. [Google Scholar] [CrossRef]
- Predari, M.; Esnard, A. A k-way greedy graph partitioning with initial fixed vertices for parallel applications. In Proceedings of the 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Heraklion, Greece, 17–19 February 2016; pp. 280–287. [Google Scholar]
- Fiduccia, C.M.; Mattheyses, R.M. A linear-time heuristic for improving network partitions. In Proceedings of the 19th Design Automation Conference, Las Vegas, NV, USA, 14–16 June 1982; pp. 175–181. [Google Scholar]
- Rahimian, F.; Payberah, A.H.; Girdzijauskas, S.; Haridi, S. Distributed vertex-cut partitioning. In Proceedings of the 14th IFIP International Conference on Distributed Applications and Interoperable System, Berlin, Germany, 3–6 June 2014; pp. 186–200. [Google Scholar]
- Li, H.; Yuan, H.; Huang, J.; Cui, J.; Yoo, J. Dynamic graph repartitioning: From single vertex to vertex group. In Proceedings of the 25th International Conference on Database Systems for Advanced Applications, Jeju, Republic of Korea, 24–27 September 2020; pp. 482–497. [Google Scholar]
- Li, H.; Liu, Y.; Yang, S.; Lin, Y.; Yang, Y.; Yoo, J. An improved hill climbing algorithm for graph partitioning. Comput. J. 2023, 66, 1176–1761. [Google Scholar] [CrossRef]
- Mayer, C.; Tariq, M.A.; Mayer, R.; Rothermel, K. Graph: Traffic-aware graph processing. IEEE Trans. Parallel Distrib. Syst. 2018, 29, 1289–1302. [Google Scholar] [CrossRef]
- Li, H.; Yuan, H.; Huang, J.; Ma, X.; Cui, J.; Yoo, J. Edge repartitioning via structure-aware group migration. IEEE Trans. Comput. Soc. Syst. 2021, 9, 751–760. [Google Scholar] [CrossRef]
- Li, H.; Yuan, H.; Huang, J.; Cui, J.; Ma, X.; Wang, S.; Philip, S.Y. Group reassignment for dynamic edge partitioning. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 2477–2490. [Google Scholar] [CrossRef]
- Ugander, J.; Backstrom, L. Balanced label propagation for partitioning massive graphs. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining, Rome, Italy, 4–8 February 2013; pp. 507–516. [Google Scholar]
- Vaquero, L.; Cuadrado, F.; Logothetis, D.; Martella, C. Adaptive partitioning for large-scale dynamic graphs. In Proceedings of the 4th Annual Symposium on Cloud Computing, Sydney, Australia, 3–5 December 2014; pp. 144–153. [Google Scholar]
- Martella, C.; Logothetis, D.; Loukas, A.; Siganos, G. Spinner: Scalable graph partitioning in the cloud. In Proceedings of the 33rd IEEE International Conference on Data Engineering, San Diego, CA, USA, 19–22 April 2017; pp. 1083–1094. [Google Scholar]
- Slota, G.M.; Rajamanickam, S.; Madduri, K. PuLP/XtraPuLP: Partitioning Tools for Extreme-Scale Graphs; Technical report; Sandia National Lab. (SNL-NM): Albuquerque, NM, USA, 2017. [Google Scholar]
- Moussawi, A.E.; Seghouani, N.B.; Bugiotti, F. A graph partitioning algorithm for edge or vertex balance. In Proceedings of the 31st International Conference on Database and Expert Systems Applications, Bratislava, Slovakia, 14–17 September 2020; pp. 23–37. [Google Scholar]
- Zhang, C.; Wei, F.; Liu, Q.; Tang, Z.G.; Li, Z. Graph edge partitioning via neighborhood heuristic. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 605–614. [Google Scholar]
- Hanai, M.; Suzumura, T.; Tan, W.J.; Liu, E.; Theodoropoulos, G.; Cai, W. Distributed edge partitioning for trillion-edge graphs. VLDB Endow. 2019, 12, 2379–2392. [Google Scholar] [CrossRef]
- Lee, K.; Ganti, R.K.; Srivatsa, M.; Liu, L. Efficient spatial query processing for big data. In Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Dallas, TX, USA, 4–7 November 2014; pp. 469–472. [Google Scholar]
- García-García, F.; Corral, A.; Iribarne, L.; Vassilakopoulos, M.; Manolopoulos, Y. Efficient distance join query processing in distributed spatial data management systems. Inf. Sci. 2020, 512, 985–1008. [Google Scholar] [CrossRef]
- Zhong, Y.; Han, J.; Zhang, T.; Li, Z.; Fang, J.; Chen, G. Towards parallel spatial query processing for big spatial data. In Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, Shanghai, China, 21–25 May 2012; pp. 2085–2094. [Google Scholar]
- Rodriguez, A.; Laio, A. Clustering by fast search and find of density peaks. Science 2014, 344, 1492–1496. [Google Scholar] [CrossRef] [PubMed]
- Du, M.; Ding, S.; Jia, H. Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl. Based Syst. 2016, 99, 135–145. [Google Scholar] [CrossRef]
- Sun, L.; Ye, T.; Sun, J.; Duan, X.; Luo, Y. Density-peak-based overlapping community detection algorithm. IEEE Trans. Comput. Soc. Syst. 2021, 9, 1211–1223. [Google Scholar] [CrossRef]
- Traag, V.A.; Waltman, L.; Van Eck, N.J. From Louvain to Leiden: Guaranteeing well-connected communities. Sci. Rep. 2019, 9, 5233. [Google Scholar] [CrossRef]
- Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, 10008. [Google Scholar] [CrossRef]
- Anuar, S.H.H.; Abas, Z.A.; Yunos, N.M.; Zaki, N.H.M.; Hashim, N.A.; Mokhtar, M.F.; Nizam, A.F. Comparison between Louvain and Leiden algorithm for network structure: A review. J. Phys. Conf. Ser. 2021, 2129, 012028. [Google Scholar] [CrossRef]
- Mahdisoltani, F.; Biega, J.; Suchanek, F.M. Yago3: A knowledge base from multilingual Wikipedias. In Proceedings of the 7th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, 6–9 January 2013. [Google Scholar]
Dataset | Number of Nodes | Number of Edges | Number of Geographic Nodes | Number of Edge Labels |
---|---|---|---|---|
GeoKG | 3,312,247 | 7,022,211 | 384,981 | 65 |
Algorithm | WithinDistance Query | Buffer Query | kNN Query | Intersect Query |
---|---|---|---|---|
Hash | 8 | 7 | 8 | 8 |
LDG | 8 | 8 | 8 | 7 |
FENNEL | 7 | 7 | 7 | 7 |
WStream | 8 | 8 | 8 | 8 |
HeiSream | 6 | 6 | 7 | 6 |
IHCGP | 7 | 6 | 7 | 7 |
METIS | 7 | 7 | 6 | 6 |
ParMetis | 6 | 6 | 5 | 8 |
KAFFPA | 6 | 5 | 6 | 6 |
XtraPuLP | 7 | 6 | 5 | 7 |
GKGP-DC | 3 | 2 | 3 | 4 |
Algorithm | withinDistance– Star Query | withinDistance– Snowflake Query | withinDistance– Chain Query | withinDistance– Complex Query |
---|---|---|---|---|
hash | 5.739 | 26.248 | 36.128 | 38.154 |
LDG | 5.821 | 23.245 | 35.759 | 38.047 |
FENNEL | 4.587 | 24.056 | 34.802 | 37.095 |
wstream | 5.407 | 22.108 | 33.252 | 37.165 |
HeiStream | 5.424 | 21.236 | 33.285 | 37.043 |
IHCGP | 5.705 | 19.094 | 32.908 | 37.094 |
METIS | 5.043 | 16.246 | 22.571 | 26.981 |
parmetis | 5.103 | 15.709 | 22.545 | 26.323 |
KaFFPa | 5.152 | 15.745 | 22.173 | 26.584 |
xtraPuLP | 4.648 | 15.879 | 22.556 | 26.427 |
GKGP-DC | 3.719 | 10.923 | 13.075 | 15.894 |
Algorithm | Buffer–Star Query | Buffer–Snowflake Query | Buffer–Chain Query | Buffer–Complex Query |
---|---|---|---|---|
hash | 4.045 | 28.699 | 32.367 | 35.879 |
LDG | 4.630 | 24.158 | 29.410 | 33.543 |
Fennel | 4.015 | 24.537 | 29.072 | 33.109 |
wstream | 4.403 | 23.789 | 29.328 | 33.132 |
HeiStream | 4.352 | 21.585 | 29.091 | 33.096 |
IHCGP | 4.705 | 20.956 | 27.923 | 30.745 |
METIS | 4.043 | 15.114 | 21.518 | 24.511 |
parmetis | 4.233 | 15.393 | 21.531 | 24.462 |
KaFFPa | 4.192 | 14.035 | 20.023 | 23.861 |
xtraPuLP | 4.048 | 14.752 | 21.479 | 24.533 |
GKGP-DC | 2.218 | 9.128 | 12.861 | 13.105 |
Algorithm | kNN–Star Query | kNN–Snowflake Query | kNN–Chain Query | kNN–Complex Query |
---|---|---|---|---|
hash | 2.142 | 22.421 | 21.31 | 25.994 |
LDG | 2.287 | 22.204 | 20.156 | 25.754 |
FENNEL | 2.079 | 21.312 | 20.015 | 24.695 |
wstream | 2.260 | 22.184 | 19.986 | 23.624 |
HeiStream | 1.987 | 21.661 | 19.084 | 23.345 |
IHCGP | 2.076 | 19.273 | 18.608 | 20.309 |
METIS | 2.054 | 15.94 | 14.584 | 13.397 |
parmetis | 1.993 | 15.702 | 14.987 | 14.082 |
KaFFPa | 2.121 | 15.474 | 13.224 | 13.599 |
xtraPuLP | 2.018 | 15.053 | 14.431 | 13.48 |
GKGP-DC | 1.401 | 10.061 | 8.052 | 9.149 |
Algorithm | Intersect–Star Query | Intersect–Snowflake Query | Intersect–Chain Query | Intersect–Complex Query |
---|---|---|---|---|
hash | 4.202 | 16.235 | 16.146 | 25.992 |
LDG | 4.564 | 16.209 | 15.11 | 23.415 |
FENNEL | 4.697 | 15.776 | 15.837 | 23.639 |
wstream | 4.339 | 15.753 | 15.401 | 22.177 |
HeiStream | 3.987 | 15.067 | 14.945 | 21.046 |
IHCGP | 3.745 | 13.268 | 14.515 | 21.768 |
METIS | 2.093 | 8.388 | 7.573 | 10.639 |
parmetis | 2.122 | 8.171 | 7.544 | 10.287 |
KaFFPa | 2.091 | 8.076 | 6.782 | 10.334 |
xtraPuLP | 2.146 | 8.562 | 6.886 | 10.639 |
GKGP-DC | 1.619 | 5.463 | 4.327 | 6.248 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chen, Y.; Ou, F.; Liu, Q.; Wu, G.; Chen, K.; Deng, M.; Chen, M.; Xu, R. Dual Clustering-Based Method for Geospatial Knowledge Graph Partitioning. Appl. Sci. 2024, 14, 10704. https://doi.org/10.3390/app142210704
Chen Y, Ou F, Liu Q, Wu G, Chen K, Deng M, Chen M, Xu R. Dual Clustering-Based Method for Geospatial Knowledge Graph Partitioning. Applied Sciences. 2024; 14(22):10704. https://doi.org/10.3390/app142210704
Chicago/Turabian StyleChen, Yuxuan, Feifei Ou, Qiliang Liu, Gusheng Wu, Kaiqi Chen, Min Deng, Meihua Chen, and Rui Xu. 2024. "Dual Clustering-Based Method for Geospatial Knowledge Graph Partitioning" Applied Sciences 14, no. 22: 10704. https://doi.org/10.3390/app142210704
APA StyleChen, Y., Ou, F., Liu, Q., Wu, G., Chen, K., Deng, M., Chen, M., & Xu, R. (2024). Dual Clustering-Based Method for Geospatial Knowledge Graph Partitioning. Applied Sciences, 14(22), 10704. https://doi.org/10.3390/app142210704