Is ChatGPT a Good Geospatial Data Analyst? Exploring the Integration of Natural Language into Structured Query Language within a Spatial Database
Abstract
:1. Introduction
2. System Framework
- A training component that provides the LLM with the database schema and sample data through a prompt. The schema can usually be obtained through certain database commands. Please see Figure 2 for an example.
- A prompt template that combines the training data with a data analysis question in natural language. Prompt templates are pre-defined recipes for generating prompts for language models. A template may include instructions, few-shot examples, and specific context and questions appropriate for a given task. In our setup, the template is “given the following tables in PostGIS, [database schema], please write a SQL query to answer [question]”. “Database schema” is a text description of the database schema and sample data, as described in Figure 2. “Question” is the data analytics question we would like to ask.
- A parser then parses the response from the database into human language following the format of the input question. This is also implemented using the LLM. For example, when the question is “what is the total population of New York City?”, the framework would return “the total population of New York City is around 8 million”, instead of returning a result table with the total column as the single column (Figure 3).
3. Case Study
3.1. Data
- Census blocks [26]: A census block is the smallest geography for which census data are reported. All higher-level census geographies (metro areas, counties, etc.) can be built from unions of census blocks. Example attributes include the total number of people, number of people self-identifying as “Asian”, etc.
- Neighborhoods: Neighborhoods are social constructs that do not follow the lines laid down by the government. For example, the Brooklyn neighborhoods of Carroll Gardens, Red Hook, and Cobble Hill were once collectively known as “South Brooklyn”.
- Streets: The street centerlines form the transportation network of the city. These streets have been flagged as different types in order to distinguish between thoroughfares such as back alleys, arterial streets, freeways, and smaller streets.
- Subway stations: Subway stations link the upper world where people live to the invisible network of subways beneath.
- Social economic data at the census tract level: Example attributes of this variable include the number of families and median family income.
3.2. Training Data
3.3. Non-Spatial Query
3.4. Spatial Query Based on a Single Table
3.5. Spatial Query with Spatial Joins
- ChatGPT knows how to translate the question into a query with the spatial join: “ST_Intersects ()”
- The right aggregation operation “SUM ()” was selected.
- The right tables and geometry types were identified, which are “nyc_census_blocks” and “nyc_neighborhoods”.
4. Quantitative Evaluation
5. Conclusions and Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wei, J.; Tay, Y.; Bommasani, R.; Raffel, C.; Zoph, B.; Borgeaud, S.; Yogatama, D.; Bosma, M.; Zhou, D.; Metzler, D. Emergent abilities of large language models. arXiv 2022, arXiv:2206.07682. [Google Scholar]
- Rajkumar, N.; Li, R.; Bahdanau, D. Evaluating the text-to-sql capabilities of large language models. arXiv 2022, arXiv:2204.00498. [Google Scholar]
- Bahrini, A.; Khamoshifar, M.; Abbasimehr, H.; Riggs, R.J.; Esmaeili, M.; Majdabadkohne, R.M.; Pasehvar, M. ChatGPT: Applications, opportunities, and threats. In Proceedings of the 2023 Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, USA, 27–28 April 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 274–279. [Google Scholar]
- Lund, B.D.; Wang, T. Chatting about ChatGPT: How may AI and GPT impact academia and libraries? Libr. Hi Tech News 2023, 40, 26–29. [Google Scholar] [CrossRef]
- Biswas, S.S. Role of chat gpt in public health. Ann. Biomed. Eng. 2023, 51, 868–869. [Google Scholar] [CrossRef] [PubMed]
- Fraiwan, M.; Khasawneh, N. A Review of ChatGPT Applications in Education, Marketing, Software Engineering, and Healthcare: Benefits, Drawbacks, and Research Directions. arXiv 2023, arXiv:2305.00237. [Google Scholar]
- Dong, Y.; Jiang, X.; Jin, Z.; Li, G. Self-collaboration Code Generation via ChatGPT. arXiv 2023, arXiv:2304.07590. [Google Scholar]
- Zhang, C.; Zhang, C.; Zheng, S.; Qiao, Y.; Li, C.; Zhang, M.; Dam, S.K.; Thwal, C.M.; Tun, Y.L.; Huy, L.L. A complete survey on generative ai (aigc): Is chatgpt from gpt-4 to gpt-5 all you need? arXiv 2023, arXiv:2303.11717. [Google Scholar]
- Zhang, Y.; Wei, C.; Wu, S.; He, Z.; Yu, W. GeoGPT: Understanding and Processing Geospatial Tasks through An Autonomous GPT. arXiv 2023, arXiv:2307.07930. [Google Scholar]
- Mai, G.; Huang, W.; Sun, J.; Song, S.; Mishra, D.; Liu, N.; Gao, S.; Liu, T.; Cong, G.; Hu, Y. On the opportunities and challenges of foundation models for geospatial artificial intelligence. arXiv 2023, arXiv:2304.06798. [Google Scholar]
- Osco, L.P.; Lemos, E.L.D.; Gonçalves, W.N.; Ramos, A.P.M.; Marcato Junior, J. The Potential of Visual ChatGPT For Remote Sensing. Remote Sens. 2023, 15, 3232. [Google Scholar] [CrossRef]
- Wang, D.; Lu, C.-T.; Fu, Y. Towards automated urban planning: When generative and chatgpt-like ai meets urban planning. arXiv 2023, arXiv:2304.03892. [Google Scholar]
- Tao, R.; Xu, J. Mapping with ChatGPT. ISPRS Int. J. Geo-Inf. 2023, 12, 284. [Google Scholar] [CrossRef]
- Gulwani, S. Automating string processing in spreadsheets using input-output examples. ACM Sigplan Not. 2011, 46, 317–330. [Google Scholar] [CrossRef]
- Balog, M.; Gaunt, A.L.; Brockschmidt, M.; Nowozin, S.; Tarlow, D. Deepcoder: Learning to write programs. arXiv 2016, arXiv:1611.01989. [Google Scholar]
- Le, T.H.; Chen, H.; Babar, M.A. Deep learning for source code modeling and generation: Models, applications, and challenges. ACM Comput. Surv. CSUR 2020, 53, 1–38. [Google Scholar] [CrossRef]
- Liu, A.; Hu, X.; Wen, L.; Yu, P.S. A comprehensive evaluation of ChatGPT’s zero-shot Text-to-SQL capability. arXiv 2023, arXiv:2303.13547. [Google Scholar]
- Liu, J.; Xia, C.S.; Wang, Y.; Zhang, L. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. arXiv 2023, arXiv:2305.01210. [Google Scholar]
- Vaithilingam, P.; Zhang, T.; Glassman, E.L. Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models. In Proceedings of the Chi Conference on Human Factors in Computing Systems Extended Abstracts, New Orleans, LA, USA, 29 April–5 May 2022; pp. 1–7. [Google Scholar]
- Khan, J.Y.; Uddin, G. Automatic code documentation generation using gpt-3. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, Rochester, MI, USA, 10–14 October 2022; pp. 1–6. [Google Scholar]
- Poldrack, R.A.; LU, T.; Beguš, G. AI-assisted coding: Experiments with GPT-4. arXiv 2023, arXiv:2304.13187. [Google Scholar]
- Qin, B.; Hui, B.; Wang, L.; Yang, M.; Li, J.; Li, B.; Geng, R.; Cao, R.; Sun, J.; Si, L. A survey on text-to-sql parsing: Concepts, methods, and future directions. arXiv 2022, arXiv:2208.13629. [Google Scholar]
- Finegan-Dollak, C.; Kummerfeld, J.K.; Zhang, L.; Ramanathan, K.; Sadasivam, S.; Zhang, R.; Radev, D. Improving text-to-sql evaluation methodology. arXiv 2018, arXiv:1806.09029. [Google Scholar]
- Li, J.; Hui, B.; Qu, G.; Li, B.; Yang, J.; Li, B.; Wang, B.; Qin, B.; Cao, R.; Geng, R. Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls. arXiv 2023, arXiv:2305.03111. [Google Scholar]
- Trummer, I. CodexDB: Generating Code for Processing SQL Queries using GPT-3 Codex. arXiv 2022, arXiv:2204.08941. [Google Scholar]
- Leslie, M.; Ramsey, P. Introduction to PostGIS. 2023. Available online: https://postgis.net/workshops/postgis-intro/ (accessed on 1 September 2023).
- Obe, R.; Hsu, L.S. PostGIS in Action; Simon and Schuster: New York, NY, USA, 2021. [Google Scholar]
- Loukas, L.; Stogiannidis, I.; Malakasiotis, P.; Vassos, S. Breaking the Bank with ChatGPT: Few-Shot Text Classification for Finance. arXiv 2023, arXiv:2308.14634. [Google Scholar]
- Scaringi, G.; Loche, M. An Interview with ChatGPT: Discussing Artificial Intelligence in Teaching, Research, and Practice. Available online: https://eartharxiv.org/repository/view/5041/ (accessed on 1 September 2023).
- Kasneci, E.; Sessle, K.; Küchemann, S.; Bannert, M.; Dementieva, D.; Fischer, F.; Gasser, U.; Groh, G.; Günnemann, S.; Hüllermeier, E. ChatGPT for good? On opportunities and challenges of large language models for education. Learn. Individ. Differ. 2023, 103, 102274. [Google Scholar] [CrossRef]
- Sallam, M. ChatGPT utility in healthcare education, research, and practice: Systematic review on the promising perspectives and valid concerns. Healthcare 2023, 11, 887. [Google Scholar] [CrossRef]
- Zhong, Q.; Ding, L.; Liu, J.; Du, B.; Tao, D. Can chatgpt understand too? a comparative study on chatgpt and fine-tuned bert. arXiv 2023, arXiv:2302.10198. [Google Scholar]
Non-Spatial | Single Table Spatial Query | Spatial Join Query | |
---|---|---|---|
Temperature = 0 | 0.93 | 1 | 0.73 |
Temperature = 0.5 | 0.93 | 0.93 | 0.6 |
Temperature = 0.8 | 0.67 | 0.6 | 0.27 |
Non-Spatial | Single Table Spatial Query | Spatial Join Query | |
---|---|---|---|
GPT-4 | 0.93 | 1 | 0.73 |
PALM 2 | 1 | 0.93 | 0.46 |
LLAMA 2 | 0.8 | 0.93 | 0.67 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Jiang, Y.; Yang, C. Is ChatGPT a Good Geospatial Data Analyst? Exploring the Integration of Natural Language into Structured Query Language within a Spatial Database. ISPRS Int. J. Geo-Inf. 2024, 13, 26. https://doi.org/10.3390/ijgi13010026
Jiang Y, Yang C. Is ChatGPT a Good Geospatial Data Analyst? Exploring the Integration of Natural Language into Structured Query Language within a Spatial Database. ISPRS International Journal of Geo-Information. 2024; 13(1):26. https://doi.org/10.3390/ijgi13010026
Chicago/Turabian StyleJiang, Yongyao, and Chaowei Yang. 2024. "Is ChatGPT a Good Geospatial Data Analyst? Exploring the Integration of Natural Language into Structured Query Language within a Spatial Database" ISPRS International Journal of Geo-Information 13, no. 1: 26. https://doi.org/10.3390/ijgi13010026
APA StyleJiang, Y., & Yang, C. (2024). Is ChatGPT a Good Geospatial Data Analyst? Exploring the Integration of Natural Language into Structured Query Language within a Spatial Database. ISPRS International Journal of Geo-Information, 13(1), 26. https://doi.org/10.3390/ijgi13010026