Challenges in Geocoding: An Analysis of R Packages and Web Scraping Approaches
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Application Programming Interfaces (APIs)
3.2. Automated Data Extraction from Websites (Web Scraping)
4. Results
5. Discussion and Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Name of API | R Package |
---|---|
Google Maps Geocoding API | ggmap, tidygeocoder, googleway, RgoogleMaps |
Bing Maps API | tidygeocoder |
HERE Geocoding & Search API | hereR, tidygeocoder, nominatimlite |
MapQuest Geocoding API | mapquestr, tidygeocoder, nominatimlite |
Nominatim API | osmar, tidygeocoder, ggspatial, osmdata, nominatimlite, tmaptools |
OpenCage Geocoding API | opencage, tidygeocoder, nominatimlite |
Mapbox Geocoding API | tidygeocoder |
TomTom Geocoding API | tidygeocoder |
ArcGIS REST API-Geocoding Services | arcgisbinding, tidygeocoder |
R Package | Number of Downloads |
---|---|
ggmap | 724,324 |
RgoogleMaps | 286,308 |
tmaptools | 187,888 |
ggspatial | 115,003 |
osmdata | 89,737 |
tidygeocoder | 42,605 |
googleway | 42,265 |
nominatimlite | 7798 |
mapboxapi | 7724 |
hereR | 7494 |
opencage | 4061 |
osmar | 1267 |
Name of API | API-KEY Needed | Conditions of Use |
---|---|---|
Google Maps Geocoding API | YES | Credit of USD 200 per month (equivalent to 28,500 requests). * |
Bing Maps API | YES | 50,000 requests per day for educational/non-commercial uses. |
HERE Geocoding & Search API | YES | 1000 free requests per day. |
MapQuest Geocoding API | YES | 15,000 free transactions per month. |
Nominatim API | NO | |
OpenCage Geocoding API | YES | 2500 free requests per day (for testing purposes). |
Mapbox Geocoding API | YES | 100,000 free requests per month. * |
TomTom Geocoding API | YES | 2500 free requests per day. |
ArcGIS REST API-Geocoding Services | NO | |
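The free-tier conditions listed in the table above lend themselves to simple planning arithmetic. The Python sketch below derives the price per 1000 requests implied by the Google row (a USD 200 credit stated as equivalent to 28,500 requests) and estimates how many days a batch would need under the daily quotas shown. The batch size of 10,000 addresses, the assumption of one request per address, and the helper names `implied_price_per_1000` and `days_needed` are illustrative choices, not values or methods taken from the study.

```python
import math

# Free daily quotas copied from the table above (requests per day).
DAILY_FREE_QUOTA = {
    "Bing Maps API": 50_000,
    "HERE Geocoding & Search API": 1_000,
    "OpenCage Geocoding API": 2_500,
    "TomTom Geocoding API": 2_500,
}

# Google row: USD 200 monthly credit, stated as equivalent to 28,500 requests.
GOOGLE_CREDIT_USD = 200
GOOGLE_REQUESTS = 28_500


def implied_price_per_1000(credit_usd: float, requests: int) -> float:
    """Price per 1000 requests implied by a credit and its request equivalent."""
    return credit_usd / requests * 1000


def days_needed(n_addresses: int, daily_quota: int) -> int:
    """Whole days needed to send one request per address within a daily quota."""
    return math.ceil(n_addresses / daily_quota)


if __name__ == "__main__":
    price = implied_price_per_1000(GOOGLE_CREDIT_USD, GOOGLE_REQUESTS)
    print(f"Google: about USD {price:.2f} per 1000 requests")

    batch = 10_000  # hypothetical batch size, chosen only for illustration
    for api, quota in DAILY_FREE_QUOTA.items():
        print(f"{api}: {days_needed(batch, quota)} day(s) for {batch} addresses")
```

Under these assumptions, the same batch would fit in a single day on the most generous daily quota and take ten days on the most restrictive one, which is one reason quota limits matter when choosing a service for large datasets.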
Method | S1 | S2 | S3 | S98 | S99 | S100 | Average * (S1 to S100) |
---|---|---|---|---|---|---|---|
Google | 16.399 (0) | 14.927 (0) | 14.161 (0) | 13.309 (0) | 11.689 (0) | 13.055 (0) | 15.172 (0)
Bing | 34.862 (0) | 41.397 (0) | 34.254 (0) | 45.591 (0) | 47.16 (0) | 46.922 (0) | 39.658 (0.007) |
HERE | 22.591 (0) | 21.751 (1) | 22.067 (0) | 21.888 (0) | 21.799 (2) | 21.905 (0) | 22.08 (0.947) |
MapQuest | 2.095 (0) | 1.966 (0) | 2.002 (0) | 1.992 (0) | 1.903 (0) | 2.053 (0) | 2.099 (0) |
OSM | 103.15 (3) | 105.852 (3) | 106.63 (7) | 153.322 (1) | 243.697 (1) | 176.853 (4) | 139.641 (3.107) |
OpenCage | 102.294 (0) | 102.24 (0) | 102.413 (0) | 102.121 (0) | 102.385 (0) | 102.24 (0) | 104.805 (0) |
Mapbox | 16.286 (0) | 19.577 (0) | 18.654 (0) | 17.618 (0) | 18.15 (0) | 20.239 (0) | 20.968 (0) |
TomTom | 19.125 (0) | 19.485 (0) | 18.885 (0) | 19.107 (0) | 19.677 (0) | 18.981 (0) | 19.894 (0.007) |
ArcGIS | 48.599 (0) | 53.701 (0) | 54.623 (0) | 50.831 (0) | 49.717 (0) | 51.08 (0) | 50.785 (0) |
R Package | S1 | S2 | S3 | S98 | S99 | S100 | Average * (S1 to S100) |
---|---|---|---|---|---|---|---|
ggmap | 12.284 (0) | 17.146 (0) | 15.662 (0) | 15.64 (0) | 16.008 (0) | 15.369 (0) | 16.706 (0) |
hereR | 26.773 (0) | 24.621 (0) | 24.395 (0) | 24.472 (0) | 25.554 (0) | 26.036 (0) | 24.743 (0) |
mapquestr | 26.432 (0) | 26.001 (0) | 27.406 (0) | 27.417 (0) | 27.412 (0) | 27.031 (0) | 27.169 (0) |
tmaptools | 59.07 (3) | 57.651 (3) | 57.134 (7) | 54.086 (1) | 56.773 (1) | 54.241 (4) | 57.908 (3.053) |
opencage | 128.102 (0) | 130.952 (0) | 123.679 (0) | 124.793 (0) | 125.149 (0) | 124.511 (0) | 131.743 (0) |
mapboxapi | 7.25 (0) | 18.988 (0) | 18.333 (0) | 16.164 (0) | 15.829 (0) | 16.144 (0) | 16.335 (0) |
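Each cell in the table above pairs a main figure with a second figure in parentheses. As a minimal sketch of how such cells could be handled programmatically, the Python snippet below splits a cell into its two numbers and orders the R packages by the main figure in the "Average" column. The unit of the main figure and the meaning of the parenthesized figure are not stated in this excerpt, so the code treats both as plain numbers; the names `parse_cell` and `rank_by_value` are hypothetical helpers introduced here.

```python
import re

# "Average" column of the R package table above, copied verbatim.
AVERAGES = {
    "ggmap": "16.706 (0)",
    "hereR": "24.743 (0)",
    "mapquestr": "27.169 (0)",
    "tmaptools": "57.908 (3.053)",
    "opencage": "131.743 (0)",
    "mapboxapi": "16.335 (0)",
}

# A cell looks like "<number> (<number>)".
CELL = re.compile(r"^\s*([\d.]+)\s*\(([\d.]+)\)\s*$")


def parse_cell(cell: str) -> tuple[float, float]:
    """Split a 'value (count)' cell into its two numbers."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def rank_by_value(cells: dict[str, str]) -> list[tuple[str, tuple[float, float]]]:
    """Return (name, (value, count)) pairs sorted by ascending main value."""
    parsed = {name: parse_cell(cell) for name, cell in cells.items()}
    return sorted(parsed.items(), key=lambda item: item[1][0])


if __name__ == "__main__":
    for name, (value, count) in rank_by_value(AVERAGES):
        print(f"{name:10s} {value:8.3f}  ({count})")
```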
ID | Method | S1 | S2 | S3 | S98 | S99 | S100 | Average * (S1 to S100) |
---|---|---|---|---|---|---|---|---|
M1 | Google ** | 6.6925 | 6.779 | 7.985 | 5.2305 | 8.453 | 7.771 | 6.5186 |
M2 | Bing ** | 4.9065 | 6.0135 | 6.3115 | 5.487 | 5.2745 | 4.7655 | 5.2322 |
M3 | HERE ** | 1.7485 | 1.775 | 1.7635 | 1.756 | 1.7815 | 1.92 | 1.7739 |
M4 | MapQuest ** | 9.5625 | 8.6115 | 9.7975 | 9.628 | 8.932 | 9.8735 | 9.6415 |
M5 | OSM ** | 65.413 | 56.843 | 91.327 | 60.341 | 74.502 | 83.3505 | 71.2541 |
M6 | OpenCage ** | 461.807 | 264.147 | 1286.724 | 270.0545 | 709.77 | 242.6665 | 483.547 |
M7 | Mapbox ** | 9.711 | 8.1825 | 15.6265 | 9.4355 | 9.896 | 15.297 | 10.6494 |
M8 | TomTom ** | 6.8475 | 6.46 | 7.1395 | 5.4055 | 5.53 | 5.6115 | 5.7818 |
M9 | ArcGIS ** | 0.003 | 0.002 | 0.003 | 0.002 | 0.003 | 0.002 | 0.0025 |
M10 | ggmap | 6.502 | 6.967 | 7.985 | 5.2305 | 8.453 | 7.771 | 6.5124 |
M11 | hereR | 1.7485 | 1.775 | 1.7635 | 1.756 | 1.767 | 1.92 | 1.7744 |
M12 | mapquestr | 1.7485 | 1.7845 | 1.7635 | 1.756 | 1.829 | 1.92 | 1.7957 |
M13 | tmaptools | 79.43 | 65.845 | 79.323 | 61.588 | 75.837 | 87.4775 | 70.0706 |
M14 | opencage | 466.58 | 264.147 | 1286.724 | 270.0545 | 686.682 | 242.6665 | 481.9236 |
M15 | mapboxapi | 9.711 | 8.335 | 15.468 | 9.4355 | 9.896 | 15.297 | 10.6473 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pérez, V.; Aybar, C. Challenges in Geocoding: An Analysis of R Packages and Web Scraping Approaches. ISPRS Int. J. Geo-Inf. 2024, 13, 170. https://doi.org/10.3390/ijgi13060170