Geo-L: Topological Link Discovery for Geospatial Linked Data Made Easy
Abstract
1. Introduction
- Scalability and efficiency: As mentioned before, the linked data cloud is continually growing through new sources and datasets, so the service should be able to handle big datasets. The idea is to provide a service for different linked data environments (open or closed). Therefore, the time required for linking has to be minimized, and the vision is to discover links even in extensive datasets in near real time.
- Robustness: The service must retain functionality under unforeseen conditions, such as corrupted data. This is especially true for crowd-sourced or automatically generated datasets, which are likely to include errors as the size of data grows.
- Interoperability and flexibility: The service has to be as easy and transparent to use as possible. The (SPARQL-affine) user should be able to easily formulate queries to retrieve source and target datasets, as well as the linking condition. This includes the ability to handle data of different formats, as datasets are heterogeneous. For example, the computation of topological relations requires that geometries are represented similarly, e.g., in WKT format. However, if the datasets use different formats, then the service has to provide the means to unify these representations. The service has to operate easily as a standalone system, as a module integrated into other applications, or through a RESTful API.
- Quality: Given two sets of RDF resources with geospatial data, a source set and a target set, and a spatial predicate, P, the service shall return all the links between source and target resources which satisfy P (see more in Section 2).
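The quality requirement above can be read as a filter over all source/target pairs. The following is a minimal sketch of that reading, not Geo-L's implementation: the toy geometry types, the `point_within_box` predicate, the `discover_links` function, and the `ex:` URIs are all hypothetical names chosen for illustration, and the exhaustive pairwise comparison ignores the indexing that a real system would use.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy geometries: a point is (x, y);
# a box is (min_x, min_y, max_x, max_y).
Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def point_within_box(point: Point, box: Box) -> bool:
    """Return True if the point lies in the interior of the box.

    The boundary is excluded, mirroring the topological 'within'
    relation for a point and an areal geometry.
    """
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x < x < max_x and min_y < y < max_y


def discover_links(
    source: Dict[str, Point],
    target: Dict[str, Box],
    predicate: Callable[[Point, Box], bool],
) -> List[Tuple[str, str]]:
    """Return every (source URI, target URI) pair satisfying the predicate."""
    return [
        (s_uri, t_uri)
        for s_uri, s_geom in source.items()
        for t_uri, t_geom in target.items()
        if predicate(s_geom, t_geom)
    ]


# Illustrative data only; the URIs are made up.
source = {"ex:poi1": (1.0, 1.0), "ex:poi2": (5.0, 5.0)}
target = {"ex:regionA": (0.0, 0.0, 2.0, 2.0), "ex:regionB": (4.0, 4.0, 6.0, 6.0)}
links = discover_links(source, target, point_within_box)
print(links)
```

The comparison cost grows with the product of the two dataset sizes, which is why the scalability requirement matters for large inputs.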
2. Background
3. Related Work
4. Geo-L
4.1. Input
4.2. Download
4.3. Caching
Algorithm 1: Dataset Caching
4.4. Link Discovery
4.5. Implementation
4.5.1. GeoPandas
4.5.2. PostgreSQL
- GiST indexes are “null safe”, i.e., they can index columns that include null values, whereas attempting to build an R-Tree index on data which contain an empty geometry field will fail.
- GiST uses a compression technique which results in fast indexing.
- The database facilitates the implementation of the resource caching mechanism.
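As a rough illustration of how a GiST index and a topological join might be expressed in PostGIS, the sketch below only builds the SQL text and does not connect to a database. The table names, the `uri` and `geom` columns, and the helper functions are assumptions made for this example and do not reflect Geo-L's actual schema; `ST_Within`, `ST_Contains`, `ST_Intersects`, and `ST_Touches` are standard PostGIS predicates.

```python
# Hypothetical mapping from relation names to PostGIS predicate functions.
POSTGIS_PREDICATES = {
    "within": "ST_Within",
    "contains": "ST_Contains",
    "intersects": "ST_Intersects",
    "touches": "ST_Touches",
}


def _check_identifier(name: str) -> str:
    """Reject anything that is not a plain identifier before interpolating it."""
    if not name.isidentifier():
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def gist_index_sql(table: str, column: str = "geom") -> str:
    """Build a CREATE INDEX statement for a GiST index on a geometry column."""
    table, column = _check_identifier(table), _check_identifier(column)
    return f"CREATE INDEX {table}_{column}_gist ON {table} USING GIST ({column});"


def spatial_join_sql(source: str, target: str, relation: str) -> str:
    """Build a join returning URI pairs whose geometries satisfy the relation."""
    if relation not in POSTGIS_PREDICATES:
        raise ValueError(f"unsupported relation: {relation!r}")
    source, target = _check_identifier(source), _check_identifier(target)
    predicate = POSTGIS_PREDICATES[relation]
    return (
        f"SELECT s.uri AS source_uri, t.uri AS target_uri "
        f"FROM {source} AS s JOIN {target} AS t "
        f"ON {predicate}(s.geom, t.geom);"
    )


print(gist_index_sql("poi"))
print(spatial_join_sql("poi", "regions", "within"))
```

Because the index tolerates null values, the index statement needs no filter for rows with a missing geometry.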
5. Experimental Settings
5.1. Datasets
- SPOI—Smart Points of Interest: A dataset which contains over 30 million Points of Interest important for tourism around the world [49].
- OLU—Open Land-Use: Maps land use on a local and regional level; contains over 11 million geometries—Polygons and MultiPolygons [50].
- NUTS—Nomenclature of Territorial Units for Statistics: A standard for referencing European countries and their regions, for statistical processes [15].
5.2. Experiments
5.2.1. Simulation
5.2.2. Real-World Scenarios
5.2.3. Practical Use Cases
6. Discussion
- Scalability and efficiency: Geo-L's configuration allowed a dataset to be formed directly by the SPARQL query that defined it. This feature was particularly useful when data at the SPARQL endpoint were stored differently than specified for the linking task but could be transformed into the required format through SPARQL functions. LIMES, on the other hand, only allowed the detection of relations applied directly to entities of the datasets.
  - Download time: Datasets were not cached for a single task only but were treated as resources in their own right. Thanks to its caching mechanism, Geo-L accessed the SPARQL endpoints only when data required in the dataset were missing, and expanded existing datasets where possible. LIMES, on the other hand, performed a download for each dataset query; previously downloaded datasets were re-downloaded and, as a result, its operation required more time and space.
  - Mapping time: Geo-L used PostgreSQL with PostGIS spatial indexing to store and index the data. This enabled efficient spatial joins between source and target datasets.
- Robustness: Geo-L included multiple features that strengthened the robustness of the application.
  - Caching: Geo-L cached portions of the data as they were downloaded, rather than writing the whole dataset only after the download had finished, as LIMES did. This property prevented data loss when, e.g., the connection to the remote endpoint was lost.
  - Mapping accuracy: Geo-L detected entities with invalid geometries (with validity as defined by the OGC OpenGIS “Simple Features for SQL” specification) and did not include them in the search space. In addition, in several cases, LIMES did not include valid geometries in the result set, whereas Geo-L correctly did.
- Interoperability and flexibility: Geo-L could be used as a stand-alone application or as a REST service (in a Docker container), which would allow it to be integrated with other applications. The simple, SPARQL-based, and slim set-up of the target and source configuration (as JSON) made the tool flexible to use.
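The download-time and caching behaviour discussed above might be sketched as follows. This is an illustrative sketch under stated assumptions, not Geo-L's implementation: it uses Python's built-in `sqlite3` in place of PostgreSQL, `fetch_page` is a hypothetical stand-in for a paged SPARQL request, and the table layout and function names are invented for the example.

```python
import sqlite3
from typing import Callable, List, Tuple

Row = Tuple[str, str]  # (resource URI, WKT geometry)
PageFetcher = Callable[[int, int], List[Row]]  # (offset, limit) -> rows


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the cache table; the layout is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "dataset TEXT, position INTEGER, uri TEXT, wkt TEXT, "
        "PRIMARY KEY (dataset, position))"
    )
    return conn


def cached_count(conn: sqlite3.Connection, dataset: str) -> int:
    """Number of rows already cached for the dataset."""
    query = "SELECT COUNT(*) FROM cache WHERE dataset = ?"
    return conn.execute(query, (dataset,)).fetchone()[0]


def ensure_cached(conn: sqlite3.Connection, dataset: str, wanted: int,
                  fetch_page: PageFetcher, page_size: int = 2) -> int:
    """Download only the missing rows; return the number of remote requests."""
    requests = 0
    offset = cached_count(conn, dataset)  # resume after what is already stored
    while offset < wanted:
        rows = fetch_page(offset, min(page_size, wanted - offset))
        requests += 1
        if not rows:
            break  # the endpoint has no more data
        with conn:  # commit each page as soon as it arrives
            conn.executemany(
                "INSERT INTO cache VALUES (?, ?, ?, ?)",
                [(dataset, offset + i, uri, wkt)
                 for i, (uri, wkt) in enumerate(rows)],
            )
        offset += len(rows)
    return requests


# Made-up endpoint content used only to exercise the sketch.
REMOTE = [(f"ex:poi{i}", f"POINT({i} {i})") for i in range(5)]


def fetch(offset: int, limit: int) -> List[Row]:
    return REMOTE[offset:offset + limit]


conn = open_cache()
first = ensure_cached(conn, "poi", 3, fetch)   # cache is empty: downloads 3 rows
repeat = ensure_cached(conn, "poi", 3, fetch)  # fully cached: no remote access
grown = ensure_cached(conn, "poi", 5, fetch)   # expands the cached dataset
print(first, repeat, grown, cached_count(conn, "poi"))
```

Committing page by page means that rows stored before a lost connection remain available, and a later call resumes from the cached count instead of starting over.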
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- W3C. W3C Data Web Activity—Building the Web of Data; W3C: Cambridge, MA, USA, 2014.
- Klyne, G.; Carroll, J.J. Resource Description Framework (RDF): Concepts and Abstract Syntax; W3C: Cambridge, MA, USA, 2004.
- RDF Working Group. Resource Description Framework (RDF); W3C: Cambridge, MA, USA, 2014.
- Bechhofer, S.; Van Harmelen, F.; Hendler, J.; Horrocks, I.; McGuinness, D.L.; Patel-Schneider, P.F.; Stein, L.A. OWL web ontology language reference. W3C Recomm. 2004, 10, 1–53.
- OWL Working Group. OWL 2 Web Ontology Language Document Overview, 2nd ed.; W3C: Cambridge, MA, USA, 2012.
- Prud’hommeaux, E.; Seaborne, A. SPARQL Query Language for RDF. W3C Recomm. 2008. Available online: http://www.w3.org/TR/rdf-sparql-query/ (accessed on 11 October 2021).
- Hitzler, P.; Krotzsch, M.; Rudolph, S. Foundations of Semantic Web Technologies; Chapman and Hall/CRC: Boca Raton, FL, USA, 2009.
- Battle, R.; Kolas, D. GeoSPARQL: Enabling a geospatial semantic web. Semant. Web J. 2011, 3, 355–370.
- Nikolaou, C.; Dogani, K.; Bereta, K.; Garbis, G.; Karpathiotakis, M.; Kyzirakos, K.; Koubarakis, M. Sextant: Visualizing time-evolving linked geospatial data. J. Web Semant. 2015, 35, 35–52.
- Koubarakis, M.; Bereta, K.; Papadakis, G.; Savva, D.; Stamoulis, G. Big, Linked Geospatial Data and Its Applications in Earth Observation. IEEE Internet Comput. 2017, 21, 87–91.
- Auer, S.; Lehmann, J.; Hellmann, S. LinkedGeoData: Adding a spatial dimension to the web of data. In Proceedings of the International Semantic Web Conference, Washington, DC, USA, 25–29 October 2009; pp. 731–746.
- Čerba, O.; Charvát, K.; Mildorf, T.; Bērziņš, R.; Vlach, P.; Musilová, B. SDI4Apps Points of Interest Knowledge Base. In Progress in Cartography; Springer: Berlin/Heidelberg, Germany, 2016; pp. 229–237.
- De León, A.; Saquicela, V.; Vilches, L.M.; Villazón-Terrazas, B.; Priyatna, F.; Corcho, O. Geographical linked data: A Spanish use case. In Proceedings of the 6th International Conference on Semantic Systems, Graz, Austria, 1–3 September 2010; p. 36.
- Debruyne, C.; Clinton, É.; McNerney, L.; Nautiyal, A.; O’Sullivan, D. Serving Ireland’s Geospatial Information as Linked Data. In Proceedings of the International Semantic Web Conference (Posters & Demos), Kobe, Japan, 19 October 2016.
- Eurostat-European Commission. Regions in the European Union. Nomenclature of Territorial Units for Statistics—NUTS 2013/EU-28; European Union: Brussels, Belgium, 2015.
- Bizer, C.; Heath, T.; Berners-Lee, T. Linked data: The story so far. In Semantic Services, Interoperability and Web Applications: Emerging Concepts; IGI Global: Hershey, PA, USA, 2011; pp. 205–227.
- Wiemann, S.; Bernard, L. Spatial data fusion in Spatial Data Infrastructures using Linked Data. Int. J. Geogr. Inf. Sci. 2016, 30, 613–636.
- Clementini, E.; Di Felice, P.; Van Oosterom, P. A small set of formal topological relationships suitable for end-user interaction. In Proceedings of the International Symposium on Spatial Databases, Singapore, 23–25 June 1993; Springer: Berlin/Heidelberg, Germany, 1993; pp. 277–295.
- Clementini, E.; Sharma, J.; Egenhofer, M.J. Modelling topological spatial relations: Strategies for query processing. Comput. Graph. 1994, 18, 815–822.
- International Organization for Standardization. Geographic Information—Spatial Schema; International Organization for Standardization: Geneva, Switzerland, 2003.
- Strobl, C. Dimensionally Extended Nine-Intersection Model (DE-9IM). In Encyclopedia of GIS; Shekhar, S., Xiong, H., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 240–245.
- Freeman, H.; Shapira, R. Determining the Minimum-Area Encasing Rectangle for an Arbitrary Closed Curve. Commun. ACM 1975, 18, 409–413.
- Smeros, P.; Koubarakis, M. Discovering Spatial and Temporal Links among RDF Data. In Proceedings of the Workshop on Linked Data on the Web (LDOW 2016), Montreal, QC, Canada, 12 April 2016; Volume 1593.
- Isele, R.; Jentzsch, A.; Bizer, C. Efficient Multidimensional Blocking for Link Discovery without losing Recall. In Proceedings of the Fourteenth International Workshop on Web and Databases (WebDB 2011), Athens, Greece, 12 June 2011.
- Volz, J.; Bizer, C.; Gaedke, M.; Kobilarov, G. Silk—A Link Discovery Framework for the Web of Data. In Proceedings of the Workshop on Linked Data on the Web (LDOW 2009), Madrid, Spain, 20 April 2009; Volume 538.
- Sherif, M.A.; Dreßler, K.; Smeros, P.; Ngomo, A.N. Radon—Rapid Discovery of Topological Relations. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 175–181.
- Ngomo, A.C.N.; Auer, S. LIMES—A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona, Spain, 16–22 July 2011; pp. 2312–2317.
- DICE Group. LIMES—Link Discovery Framework for Metric Spaces: v1.5.5. 2019. Available online: https://github.com/dice-group/LIMES/releases/tag/1.5.5 (accessed on 11 October 2021).
- Faria, D.; Balasubramani, B.S.; Shivaprabhu, V.R.; Mott, I.; Pesquita, C.; Couto, F.M.; Cruz, I.F. Results of AML in OAEI 2017. In Proceedings of the Twelfth International Workshop on Ontology Matching, Vienna, Austria, 21 October 2017; pp. 122–128.
- Faria, D.; Pesquita, C.; Santos, E.; Cruz, I.F.; Couto, F.M. AgreementMakerLight: A scalable automated ontology matching system. In Proceedings of the 10th International Conference on Data Integration in the Life Sciences (DILS 2014), Lisbon, Portugal, 17–18 July 2014; pp. 29–32.
- Esri. Geometry API for Java: v2.2.3. 2019. Available online: https://github.com/Esri/geometry-api-java/releases/tag/v2.2.3 (accessed on 11 October 2021).
- Khiat, A.; Mackeprang, M. I-Match and OntoIdea results for OAEI 2017. In Proceedings of the Twelfth International Workshop on Ontology Matching, Vienna, Austria, 21 October 2017; pp. 135–137.
- Achichi, M.; Cheatham, M.; Dragisic, Z.; Euzenat, J.; Faria, D.; Ferrara, A.; Flouris, G.; Fundulaki, I.; Harrow, I.; Ivanova, V.; et al. Results of the Ontology Alignment Evaluation Initiative 2017. In Proceedings of the Twelfth International Workshop on Ontology Matching, Vienna, Austria, 21 October 2017; pp. 61–113.
- TomTom. Available online: https://www.tomtom.com (accessed on 11 October 2021).
- Saveta, T.; Fundulaki, I.; Flouris, G.; Ngomo, A.N. SPgen: A Benchmark Generator for Spatial Link Discovery Tools. In Proceedings of the 17th International Semantic Web Conference (ISWC2018), Part I, Monterey, CA, USA, 8–12 October 2018; Volume 11136, pp. 408–423.
- Doudali, T.D.; Konstantinou, I.; Koziris, N. Spaten: A Spatio-temporal and Textual Big Data Generator. In Proceedings of the 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, USA, 11–14 December 2017; pp. 3416–3421.
- Kyzirakos, K.; Karpathiotakis, M.; Koubarakis, M. Strabon: A Semantic Geospatial DBMS. In Proceedings of the 11th International Semantic Web Conference (ISWC 2012), Boston, MA, USA, 11–15 November 2012; Volume 7649, pp. 295–311.
- European Commission. CORINE Land Cover Project—Technical Guide; Office for Official Publications of the European Communities: Luxembourg, 1994.
- Lee, J.G.; Kang, M. Geospatial Big Data: Challenges and Opportunities. Big Data Res. 2015, 2, 74–81.
- Li, S.; Dragicevic, S.; Castro, F.A.; Sester, M.; Winter, S.; Coltekin, A.; Pettit, C.; Jiang, B.; Haworth, J.; Stein, A.; et al. Geospatial Big Data Handling Theory and Methods: A Review and Research Challenges. ISPRS J. Photogramm. Remote. Sens. 2016, 115, 119–133.
- Guttman, A. R-Trees: A Dynamic Index Structure for Spatial Searching. In Proceedings of the 1984 ACM SIGMOD Conference, Boston, MA, USA, 18–21 June 1984; pp. 47–57.
- Brinkhoff, T.; Kriegel, H.; Seeger, B. Efficient Processing of Spatial Joins Using R-Trees. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA, 25–28 May 1993; pp. 237–246.
- Jordahl, K.; den Bossche, J.V. Geopandas/Geopandas: v0.4.0. 2018. Available online: https://github.com/geopandas/geopandas/tree/v0.4.0 (accessed on 11 October 2021).
- Beckmann, N.; Kriegel, H.P.; Schneider, R.; Seeger, B. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD’90), Atlantic City, NJ, USA, 23–26 May 1990; pp. 322–331.
- Behnel, S.; Bradshaw, R.; Citro, C.; Dalcin, L.; Seljebotn, D.S.; Smith, K. Cython: The Best of Both Worlds. Comput. Sci. Eng. 2011, 13, 31–39.
- The PostGIS Development Group. PostGIS 3.1.5dev Manual; The PostGIS Development Group: Beaverton, OR, USA, 2021.
- Refractions Research Inc. PostGIS 2.5.0 Manual; Refractions Research Inc.: Victoria, BC, Canada, 2018.
- Hellerstein, J.M.; Naughton, J.F.; Pfeffer, A. Generalized search trees for database systems. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB’95), Zurich, Switzerland, 11–15 September 1995; pp. 562–573.
- Cerba, O.; Mildorf, T. Smart Points of Interest: Big, Linked and Harmonized Spatial Data. In Proceedings of the 19th International Research Symposium on Computer-Based Cartography (AutoCarto 2016), Albuquerque, NM, USA, 14–16 September 2016; pp. 4–13.
- Mildorf, T.; Charvát, K.; Ježek, J.; Templer, S.; Malewski, C. Open Land Use Map. AGRIS On-Line Pap. Econ. Inform. 2014, 6, 81–88.
- Charvát, K.; Esbri, M.A.; Mayer, W.; Charvát, K., Jr.; Campos, A.; Palma, R.; Krivanek, Z. FOODIE—Open data for agriculture. In Proceedings of the IST-Africa 2014 Conference Proceedings, Pointe aux Piments, Mauritius, 7–9 May 2014; pp. 1–9.
- Norton, B.; Vilches, L.M.; De Léon, A.; Goodwin, J.; Stadler, C.; Anand, S.; Harries, D.; Villazón-Terrazas, B.; Atemezing, G.A. NeoGeo Vocabulary Specification–Madrid Edition. Martín Salas, J., Harth, A., Eds.; Public Draft. 2012. Available online: http://geovocab.org/doc/neogeo/ (accessed on 11 October 2021).
- Zhang, X.; Liu, X.; Zhang, M.; Dahlgren, R.A.; Eitzel, M. A Review of Vegetated Buffers and a Meta-analysis of Their Mitigation Efficacy in Reducing Nonpoint Source Pollution. J. Environ. Qual. 2010, 39, 76–84.
- Vanwalleghem, T. Soil Erosion and Conservation. In International Encyclopedia of Geography: People, the Earth, Environment and Technology; Wiley Online Library: Hoboken, NJ, USA, 2016; pp. 1–10.
- Larson, W.E.; Pierce, F.J.; Dowdy, R.H. The threat of soil erosion to long-term crop production. Science 1983, 219, 458–465.
- Blanco-Canqui, H.; Lal, R. Erosion Control and Soil Quality. In Principles of Soil Conservation and Management; Springer: Berlin/Heidelberg, Germany, 2010; pp. 477–492.
- Curl, E.A. Control of plant diseases by crop rotation. Bot. Rev. 1963, 29, 413–479.
System | Scalability and Efficiency | Robustness | Interoperability and Flexibility |
---|---|---|---|
Silk | | | |
AML | | | |
OntoIdea | | | |
Strabon | | | |
LIMES | | | |
Geo-L | | | |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zinke-Wehlmann, C.; Kirschenbaum, A. Geo-L: Topological Link Discovery for Geospatial Linked Data Made Easy. ISPRS Int. J. Geo-Inf. 2021, 10, 712. https://doi.org/10.3390/ijgi10100712