A DeBERTa-Based Semantic Conversion Model for Spatiotemporal Questions in Natural Language

Lu, Wenjuan; Ming, Dongping; Mao, Xi; Wang, Jizhou; Zhao, Zhanjie; Cheng, Yao

doi:10.3390/app15031073


by Wenjuan Lu ¹,², Dongping Ming ¹, Xi Mao ²,*, Jizhou Wang ², Zhanjie Zhao ² and Yao Cheng ²

¹ School of Information Engineering, China University of Geosciences Beijing, Beijing 100083, China

² Chinese Academy of Surveying and Mapping, Beijing 100036, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(3), 1073; https://doi.org/10.3390/app15031073

Submission received: 14 December 2024 / Revised: 20 January 2025 / Accepted: 20 January 2025 / Published: 22 January 2025


Abstract:

To address current issues in natural language spatiotemporal queries, including insufficient question semantic understanding, incomplete semantic information extraction, and inaccurate intent recognition, this paper proposes NL2Cypher, a DeBERTa (Decoding-enhanced BERT with disentangled attention)-based natural language spatiotemporal question semantic conversion model. The model first performs semantic encoding on natural language spatiotemporal questions, extracts pre-trained features based on the DeBERTa model, inputs feature vector sequences into BiGRU (Bidirectional Gated Recurrent Unit) to learn text features, and finally obtains globally optimal label sequences through a CRF (Conditional Random Field) layer. Then, based on the encoding results, it performs classification and semantic parsing of spatiotemporal questions to achieve question intent recognition and conversion to Cypher query language. The experimental results show that the proposed DeBERTa-based conversion model NL2Cypher can accurately achieve semantic information extraction and intent understanding in both simple and compound queries when using Chinese corpus, reaching an F1 score of 92.69%, with significant accuracy improvement compared to other models. The conversion accuracy from spatiotemporal questions to query language reaches 88% on the training set and 92% on the test set. The proposed model can quickly and accurately query spatiotemporal data using natural language questions. The research results provide new tools and perspectives for subsequent knowledge graph construction and intelligent question answering, effectively promoting the development of geographic information towards intelligent services.

Keywords:

semantic encoding; natural language spatiotemporal questions; semantic understanding; DeBERTa

1. Introduction

With the rapid development of big data technology, researchers have placed higher demands on the processing of massive geographic information data [1]. The specific goal is to improve human–computer interaction efficiency and promote the comprehensive advancement of geographic spatiotemporal information from traditional data management and analysis modes towards intelligent services. Currently, most search engines are based on traditional keyword matching, which has limitations in handling natural language questions with spatiotemporal characteristics, such as “What are the five restaurants closest to the Summer Palace?”, “Which subway stations were in Beijing in 2012?”, and “Which cities are within 600 km of Beijing?” [2]. To address these issues and help users experience the convenience brought by the big data era [3], enabling users to perform spatial queries and analyses through natural language descriptions, without prior knowledge of complex Geographic Information Systems (GIS) or structured query languages, is an innovative and practical approach [4].

Spatiotemporal queries refer to query statements involving temporal and spatial attributes, typically combining geographic and temporal information for understanding and answering. Semantic understanding is a key technology in spatiotemporal query processing [5]. Converting natural language spatiotemporal queries into structured query statements is currently a hot research topic in natural language processing. Three main approaches address this challenge: rule-based methods, machine learning, and deep learning.

Rule-based methods utilize longest matching word segmentation with spatial knowledge base support to perform word segmentation on query statements while using spatial query sentence pattern templates for syntactic analysis [6], thereby interpreting spatial query targets and corresponding spatial operations [7]. Natural language questions are converted into standard spatial query statements and executed by relational databases, with results presented in textual or graphical form [8]. Reference [9] proposed a rule-matching model based on edit distance for matching words and sentence patterns in queries. Reference [10] combined rule-based and statistical models to extract spatiotemporal and attribute information from Chinese texts. In terms of query processing, however, methods based on statistical models are not suitable for performing the necessary data extraction tasks [11]. Reference [12] conducted a statistical analysis and classification of predicates and quantifiers describing spatial relationships in natural language, establishing four syntactic patterns for natural language spatial queries. Regarding structured query language conversion, the common approach is to implement natural language to structured language mapping through template construction [13,14]. Reference [15] designed a template-based question answering system that converts natural language questions in the geospatial dimension into GeoSQL (Geo Spatial SQL) queries.

Machine learning methods construct models using corpus data and machine learning algorithms, automatically converting natural language queries into query statements by learning the correspondence between language and databases. References [16,17] employ pre-trained named entity recognition models and dictionary lookup methods to identify geographic entities and utilize constituent syntax to extract spatial relationships between different entities. Through semantic constraint syntax, they extract spatial relationships between entities, and after annotating geographic entities and spatial relationship terms, they map their semantics to predefined templates [18].

In recent years, with the rapid accumulation of big data and significant improvements in computing performance, deep learning methods have become a research hotspot. Goodfellow et al. [19] analyzed the core principles of deep learning in depth and laid a solid theoretical foundation for exploring its application in natural language processing. Deep learning models are now widely applied in natural language processing, and they are particularly capable in the task of transforming natural language questions into query statements. This process mainly relies on the encoding–decoding framework of deep learning models; as explained by Aggarwal [20] in the book Neural Networks and Deep Learning, algorithms such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) play a key role in the semantic representation and parsing of query statements and have greatly advanced natural language processing technology.

Rule-based methods are overly dependent on predefined rules and templates for parsing and converting natural language queries, making it challenging to handle complex and variable semantics. Meanwhile, machine learning methods require manual feature extraction from data to achieve semantic understanding. However, this approach faces challenges, such as time consumption with massive datasets, and the model’s effectiveness is significantly impacted by feature quality, thus limiting the extracted features’ utility. The emergence of deep learning has, to some extent, alleviated these issues. Deep natural language learning models primarily focus on general natural language structure analysis and semantic expression [21,22], while research on transferring deep learning models to the spatiotemporal query domain remains relatively scarce. Existing pre-trained models are largely based on general corpora and lack a deep understanding of domain-specific terminology and special structures in spatiotemporal queries, resulting in suboptimal performance when processing such specialized terms. More critically, research on spatiotemporal query processing for Chinese language corpora is particularly limited.

Given the uniqueness of Chinese text, the complexity of spatiotemporal questions, and the abstractness of entity relations, traditional rule-based and syntactic pattern-based parsing methods struggle to accurately capture the intent of natural language spatiotemporal questions. Complex queries such as “What are the hotels within 3 km north of the Forbidden City in Beijing?” pose significant challenges for these methods. Based on the above background and challenges, this study proposes two core hypotheses:

(1) With a semantic coding module, by introducing the deep learning model DeBERTa for pre-training feature extraction, feeding the feature vector sequence into BiGRU [23] to learn text features, and combining with the CRF [24] layer to obtain the globally optimal labeling sequences, the entity recognition accuracy and generalization ability in natural language spatiotemporal interrogative sentences can be significantly improved. The model is able to capture the contextual information and semantic features in the text more effectively and thus recognize and classify entities more accurately.

(2) Semantic understanding methods complete the classification of interrogative sentences according to the encoding module and the expected query type and realize the conversion of natural language spatiotemporal interrogative sentences to the database language Cypher based on the semantic parsing of the interrogative sentences within the class.

In this paper, the semantic encoding module and the semantic understanding methods are combined to build NL2Cypher, a model that converts natural language spatiotemporal questions into database queries. We use a Chinese corpus to generate Cypher statements and test their accuracy. The model outperforms other models, with a clear accuracy improvement over GPT2, Thousand Questions, and other large models; the conversion accuracy reaches 88% on the training set and 92% on the test set. The experimental results show that the proposed model can query spatiotemporal data quickly and accurately using natural language questions, and the results are useful for advancing the interaction between artificial intelligence and databases, facilitating more efficient and convenient data querying and analysis [25].

2. NL2Cypher Conversion Model

NL2Cypher aims to address the challenge of converting complex natural language spatiotemporal queries into structured Cypher query statements. These natural language spatiotemporal queries extensively encompass multiple elements including geographic entities, location information, spatiotemporal relationships, and place types, while involving various query purposes such as location queries, distance calculations, directional guidance, quantitative statistical analysis, and extremum searches. Given the limitations of traditional keyword-based search methods in handling such complex and variable semantic requirements, and in order to ensure the reliability of the experiments and the superiority of the results, this study preprocesses the datasets, which includes steps such as data cleansing, word segmentation, data annotation, and segmentation of sentence lengths according to the data characteristics and model requirements. This paper thoroughly explores the semantic information extraction and parsing process of spatiotemporal queries, thereby achieving precise mapping between natural language and database query language.

The conversion model consists of two modules: a semantic encoding module and a semantic understanding method. The overall research framework is illustrated in Figure 1, and Table 1 provides an example of a natural language spatiotemporal query. The specific contents are as follows:

(1) Semantic encoding module: In this study, the DeBERTa pre-trained model is applied for the first time to entity recognition in natural language spatiotemporal questions, which overcomes the limitations of traditional static word vectors and the BERT model in processing Chinese character features. By adding BiGRU, the model mines the temporal and spatial features of the text, and the CRF layer then integrates these features dynamically, which improves the model’s ability to understand and represent Chinese natural language spatiotemporal questions. In Section 5.2 (Semantic Encoding Model Results), the model is compared with other state-of-the-art models; the experimental results show that its F1 score reaches 92.69%, a significant improvement in accuracy over the other models.

(2) Semantic understanding method: This component classifies queries based on semantic encoding results; performs semantic parsing of intra-class queries according to temporal relationships, spatial relationships, and dependency relationships in the query structure; and subsequently generates Cypher statements to complete query intent recognition.

Through the above research framework, natural language spatiotemporal questions can be comprehensively understood and mapped to database queries, improving the efficiency and accuracy of the connection between the two and providing an effective solution for querying with natural language questions that have spatiotemporal characteristics.

3. Semantic Encoding Module

3.1. Semantic Encoding Elements

The key to the NL2Cypher conversion model is to accurately extract semantic information from natural language spatiotemporal questions and to understand the intent of each question in depth, so that accurate Cypher query statements can be generated automatically.

A study by Ehsan Hamzei et al. [26] provides an in-depth analysis of patterns in location-based questions using large-scale question/answer datasets and generalizes five core semantic categories related to location: location, location type, activity, spatiotemporal relationship, and quality. To further improve the comprehension of natural language spatiotemporal questions, this paper builds on the research of Hamzei et al. and on the two datasets GeoQuestions and GeoQuery mentioned in Section 4.1. The semantic encoding schema for locations is extended from five categories to eight, and a more fine-grained encoding mechanism is introduced to extract semantic information effectively. The extended schema is designed to cover the complex semantics of natural language questions more comprehensively and thus represent the information in spatiotemporal questions more accurately.

According to Table 2, the basic semantic encoding elements for Chinese spatiotemporal queries include eight categories:

(1) Interrogative words: locative interrogatives (where, which place, etc.), specific interrogatives (what, etc.), selective interrogatives (which one, which ones, etc.), quantitative interrogatives (how many, how much, etc.), measurement interrogatives (how long, how far, etc.), and judgment interrogatives (is it, whether, etc.);

(2) Place names: Names of geographic entities, such as Shanghai, Beijing’s Forbidden City, etc.;

(3) Location types: Classifications of geographic entities, such as hospitals, parks, etc.;

(4) Attributes: Properties of locations, such as population, area, time, etc.;

(5) Quality: Quality of locations and their attributes and location types, such as nearest, highest, etc.;

(6) Activities: Indicating actions, such as crossing, flowing through, building, etc.;

(7) Spatiotemporal relationships: Describing spatial and temporal relationships, including direction, topology, and temporal relationships, such as northeast, nearby, surrounding, before, after, etc.;

(8) Numbers and units: Including time, distance, quantity, etc., such as the year 2021, five kilometers, and three units.

Based on the above description, the query “Which cities are within 600 km around Beijing?” is encoded as “prd3t”.

3.2. Encoding Model Construction

The semantic encoding of spatiotemporal queries in Chinese corpora can be viewed as a sequence labeling process, where training on annotated data enables automatic encoding of query semantics. Chinese spatiotemporal queries contain rich semantic information, with sentence structures exhibiting long-distance dependencies and diverse semantic encoding elements [27], resulting in phenomena where the same word may be encoded as different types of information in various contexts.

To address these challenges, this paper proposes a DeBERTa-BiGRU-CRF model for spatiotemporal query semantic encoding, which is an improvement based on the dynamic pre-training model DeBERTa, and the model structure is shown in Figure 2. The model integrates a BiGRU network following the DeBERTa model to capture more contextual semantic features, followed by a CRF layer to obtain globally optimal label sequences. This architecture better understands semantic relationships between different words in sentences, captures long-distance dependencies, and resolves ambiguities.

(1) Embedding Layer

The embedding layer converts discrete text data into continuous, low-dimensional vector representations. Given an input text $X = \{X_{1}, X_{2}, X_{3}, \dots, X_{n}\}$, the embedding layer learns a corresponding embedding vector for each character $X_{i}$. The output $Y = \{Y_{1}, Y_{2}, Y_{3}, \dots, Y_{n}\}$, where $Y_{i}$ is the embedding vector of the corresponding character $X_{i}$, serves as the input to subsequent layers.

(2) DeBERTa Layer

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model based on the Transformer architecture that enhances the performance of various NLP (natural language processing) tasks [28]. It uses a bidirectional training approach, leveraging contextual information from both directions for understanding and processing. Compared to traditional models like recurrent neural networks (RNN) and convolutional neural networks (CNN), BERT can better capture long-range dependencies through the attention mechanism of Transformers [29], thus improving language modeling [30].

The DeBERTa (Decoding-enhanced BERT with disentangled attention) model is an improvement over the BERT model. Introduced by Microsoft in 2020, the DeBERTa model primarily utilizes the Transformer’s encoder and demonstrates superior performance and robustness in spatiotemporal questions [31,32]. The task of the DeBERTa layer is to acquire deep semantic features of natural language spatiotemporal questions through word embedding training, converting text into word vectors. The model structure is shown in Figure 3.

The calculation process for the disentangled attention input representation in the DeBERTa model is shown in Equation (1).

$$A_{i,j} = H_{i} H_{j}^{T} + H_{i} P_{j|i}^{T} + P_{i|j} H_{j}^{T} + P_{i|j} P_{j|i}^{T} \tag{1}$$

Here, $H_{i}$ and $H_{j}$ represent the content embeddings of the i-th and j-th words, respectively, and $P_{i|j}$ represents the relative positional embedding between words i and j. $H_{i} H_{j}^{T}$ represents the computation from content to content, while $H_{i} P_{j|i}^{T}$ and $P_{i|j} H_{j}^{T}$ represent the computations from content to position and position to content, respectively. $P_{i|j} P_{j|i}^{T}$ represents the computation from position to position. The formula for calculating the attention scores is as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V \tag{2}$$

Here, Q, K, and V are the query, key, and value matrices computed from the word-embedding representations, $d_{k}$ is the dimensionality of the key vectors, and T indicates the transpose operation. In DeBERTa, the relative positions encoded by P are clipped to a maximum relative distance.
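To make Equations (1) and (2) concrete, the following minimal Python sketch computes the four score terms literally as written, using raw embeddings with no learned projection matrices. The helper names (`relative_index`, `disentangled_scores`, `attention`), the shared table `P` of 2k relative-position vectors, and the clipping of offsets to the range [0, 2k) are assumptions made for illustration and are not taken from the authors' implementation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def relative_index(i, j, k):
    """Map the signed offset i - j into [0, 2k); k is the maximum relative distance."""
    offset = i - j
    if offset <= -k:
        return 0
    if offset >= k:
        return 2 * k - 1
    return offset + k


def disentangled_scores(H, P, k):
    """Equation (1): H is a list of content vectors, P a table of 2k position vectors."""
    n = len(H)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            p_ij = P[relative_index(i, j, k)]  # P_{i|j}
            p_ji = P[relative_index(j, i, k)]  # P_{j|i}
            A[i][j] = (dot(H[i], H[j])      # content-to-content
                       + dot(H[i], p_ji)    # content-to-position
                       + dot(p_ij, H[j])    # position-to-content
                       + dot(p_ij, p_ji))   # position-to-position
    return A


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Equation (2): scaled dot-product attention over lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        weights = softmax([dot(q, key) / math.sqrt(d_k) for key in K])
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out
```

With an all-zero position table the scores reduce to the plain content term $H_{i} H_{j}^{T}$, which is a convenient sanity check for the sketch.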

(3) BiGRU Layer

The task of the BiGRU layer is to extract more semantic features from the text. A Bidirectional Gated Recurrent Unit (BiGRU) is constructed based on two GRUs, each operating in opposite directions [33]. In the structure of a unidirectional GRU network, state information is propagated in a single direction; for example, a left-to-right GRU can only learn information from previous time steps. However, the output of a BiGRU network is determined by both GRUs, allowing it to capture information from both past and future time steps. Since the semantic information of a sentence is related to both its preceding and succeeding context, this paper employs BiGRU to capture more comprehensive contextual information.

In BiGRU, the reset gate selectively forgets parts of the information in the previous hidden state, while the update gate controls how much of the previous hidden state is retained and how much of the candidate hidden state is written into the new state [34]. The computational propagation of BiGRU is described by Equations (3)–(6).

$$z_{t} = \sigma(U_{z} h_{t-1} + W_{z} x_{t} + b_{z}) \tag{3}$$

$$r_{t} = \sigma(U_{r} h_{t-1} + W_{r} x_{t} + b_{r}) \tag{4}$$

$$\tilde{h}_{t} = \tanh(U_{h} (r_{t} \otimes h_{t-1}) + W_{h} x_{t} + b_{h}) \tag{5}$$

$$h_{t} = z_{t} \otimes \tilde{h}_{t} + (1 - z_{t}) \otimes h_{t-1} \tag{6}$$

Here, $z_{t}$ and $r_{t}$ represent the update gate and reset gate at time t, respectively; $U$ and $W$ are the weight matrices; and $b$ denotes the bias matrix. $\sigma$ is the sigmoid activation function, $x_{t}$ represents the input information at the current time step, $\tilde{h}_{t}$ is the state information of the new memory unit at the current time step, and $h_{t}$ denotes the updated state information at the current time step, while $h_{t-1}$ represents the updated state from the previous time step. The symbol $\otimes$ denotes element-wise multiplication of matrices. The final calculation result of the BiGRU is a combination of the forward and backward propagation results, with the backward propagation process mirroring the forward propagation.
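Equations (3)–(6) can be traced step by step with a short pure-Python sketch. The parameter dictionary layout (`Uz`, `Wz`, `bz`, and so on) and the choice of concatenation to combine the forward and backward states are assumptions made for illustration; the text only states that the two directions are combined.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def affine(U, W, b, h, x):
    """Compute U h + W x + b for one gate."""
    return [u + w + c for u, w, c in zip(matvec(U, h), matvec(W, x), b)]


def gru_step(x_t, h_prev, p):
    """One GRU update; p maps 'Uz', 'Wz', 'bz', 'Ur', ... to the weights."""
    z = [sigmoid(a) for a in affine(p["Uz"], p["Wz"], p["bz"], h_prev, x_t)]  # Eq. (3)
    r = [sigmoid(a) for a in affine(p["Ur"], p["Wr"], p["br"], h_prev, x_t)]  # Eq. (4)
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = [math.tanh(a)
              for a in affine(p["Uh"], p["Wh"], p["bh"], gated, x_t)]          # Eq. (5)
    return [zi * ci + (1.0 - zi) * hi
            for zi, ci, hi in zip(z, h_cand, h_prev)]                          # Eq. (6)


def run_gru(xs, p, hidden):
    h, states = [0.0] * hidden, []
    for x in xs:
        h = gru_step(x, h, p)
        states.append(h)
    return states


def bigru(xs, p_fwd, p_bwd, hidden):
    """Run one GRU left to right and one right to left, then concatenate per step."""
    fwd = run_gru(xs, p_fwd, hidden)
    bwd = run_gru(xs[::-1], p_bwd, hidden)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]
```

Each output vector has twice the hidden size, so every position carries a summary of both its left and right context before it reaches the CRF layer.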

(4) CRF Layer

Following the BiGRU layer, a Conditional Random Field (CRF) layer is added to predict the label sequence [35]. The CRF effectively models the dependencies between labels, ensuring that the generated label sequence is coherent and reasonable. The goal of this layer is to recognize geographic entities in natural language questions and to annotate the extracted word vectors using the BIOES format.
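The way a CRF layer selects a globally optimal label sequence at inference time is commonly implemented with Viterbi decoding. The sketch below is a generic illustration under that assumption; the function name `viterbi`, the score layout, and the toy scores are hypothetical and do not come from the paper.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label path.

    emissions[t][y] is the score of label y at position t;
    transitions[a][b] is the score of moving from label a to label b.
    """
    n_labels = len(labels)
    score = list(emissions[0])
    back = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for cur in range(n_labels):
            best_prev = max(range(n_labels),
                            key=lambda prev: score[prev] + transitions[prev][cur])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][cur]
                             + emissions[t][cur])
        score = new_score
        back.append(pointers)
    best = max(range(n_labels), key=lambda y: score[y])
    path = [best]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return [labels[y] for y in reversed(path)]


# Toy example: picking the best label per position would give B then O,
# but the transition scores penalize B -> O, so the decoded path is B, E.
labels = ["B", "E", "O"]
emissions = [[1.0, 0.0, 0.9], [0.0, 0.5, 1.0]]
transitions = [[-5.0, 1.0, -5.0],   # from B
               [0.0, -5.0, 0.0],    # from E
               [0.0, -5.0, 0.0]]    # from O
print(viterbi(emissions, transitions, labels))  # ['B', 'E']
```

The toy example shows why sequence-level decoding matters for entity boundaries: the transition scores rule out label sequences that per-position classification would happily produce.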

In the annotation stage, commonly used annotation schemes include BIO and BIOES. The BIOES scheme suits the spatiotemporal question data in this study because it clearly marks entity boundaries and distinguishes single-word entities from multi-word ones. Compared with the simpler BIO scheme, BIOES explicitly marks the end of each entity and reduces ambiguity in boundary identification. B denotes the start of an entity, I denotes the interior of an entity, E denotes the character at the end of an entity, S denotes a single-word entity, and O denotes irrelevant information.
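As a small illustration of the BIOES scheme, the following sketch converts entity spans into per-character tags. The function name `bioes_tags`, the span format, and the use of the codes `p`, `r`, and `t` as tag suffixes are assumptions for illustration; the paper's actual tag inventory is defined in its Table 2.

```python
def bioes_tags(length, spans):
    """Build BIOES tags for a sequence of `length` characters.

    spans is a list of (start, end_exclusive, entity_type) tuples.
    """
    tags = ["O"] * length
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"S-{etype}"          # single-unit entity
        else:
            tags[start] = f"B-{etype}"          # beginning
            for k in range(start + 1, end - 1):
                tags[k] = f"I-{etype}"          # interior
            tags[end - 1] = f"E-{etype}"        # end
    return tags


# Hypothetical phrase "北京周边的城市" (cities around Beijing):
# 北京 -> place name (p), 周边 -> spatial relation (r), 城市 -> location type (t)
chars = list("北京周边的城市")
tags = bioes_tags(len(chars), [(0, 2, "p"), (2, 4, "r"), (5, 7, "t")])
for ch, tag in zip(chars, tags):
    print(ch, tag)
```

The explicit E and S tags are what let a decoder tell where one entity stops and the next begins without looking ahead to the following tag.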

4. Semantic Understanding Methods

4.1. Question Classification

The extraction of semantic information from natural language spatiotemporal questions focuses on the perspective of the question itself. Often, the semantics within a question may be abbreviated or misspelled. After extracting spatiotemporal questions using the semantic encoding module, it is necessary to perform a normalization process on the question set. This involves classifying and semantically parsing the questions to further determine the category to which a spatiotemporal question belongs.

Question classification aids in better understanding the user’s intent and requirements, thereby providing customized query results based on different question categories. This paper uses the benchmark datasets GeoQuestions [36] and GeoQuery [37] for geospatial question answering as a basis, extending categories to include direction, distance, and time question types, such as basic queries, composite queries, and fuzzy queries, and reclassifying the questions. In the geospatial context, the answers expected from spatiotemporal question queries are related to factors such as location, distance, direction, quantity, and extremum. Therefore, as shown in Table 3, questions are classified into eight types based on the expected answer type. These classification methods offer a structured processing framework for spatiotemporal questions, facilitating accurate understanding of user needs and the provision of corresponding query results.

After semantic encoding, the extracted keywords serve as critical features for distinguishing between question categories. Because interrogative sentences have distinctive features, this paper adopts the BERT (Bidirectional Encoder Representations from Transformers) [38] large model to distinguish different types of questions. To accurately distinguish complex queries with similar textual descriptions, as well as question types that return answers in similar formats, question vectors are fused with the encoded key elements, and the final classification is obtained through model training. For example, interrogative keywords differentiate between two complex queries such as “Where is the hotel closest to Tiananmen?” and “Which is the hotel closest to Tiananmen?” Buffer zone queries, extremum queries, and list queries all return answers as lists of location types, but because the key elements in the questions differ, the question type that best reflects the features of the keywords is prioritized in the response.

4.2. Semantic Parsing

Building on question classification, semantic parsing was performed on intra-category questions by identifying the relationships among encoded entity elements to determine the query conditions for each type of question. In the context of geospatial and temporal questions, the focus was primarily on spatial relationships, temporal relationships, and dependency relationships [39]. The LTP (Language Technology Platform) tool [40] was used to perform dependency syntax analysis on the questions, and relationship recognition was carried out based on the needs and characteristics of different question types.

(1) Spatial Relationships

The three elements of spatial descriptions are the position object (P), the reference object (C), and the spatial relation (r) [41]. The position object is the core of the spatial description, representing the object being located or described. The reference object serves as the benchmark or reference point used to locate the position object within the spatial description, providing a reference framework for the position object. Spatial relations are the spatial connections between the position object and the reference object, including directional relationships (such as east, west, south, north, northeast, southeast, etc.), distance relationships (such as around, nearby, etc.), and topological relationships (such as bordering, containing, adjacent, etc.). The spatial relationship triplet in a question is represented in the following form:

$$\{\langle P_{i}, r_{j}, C_{i} \rangle\}_{j}, \quad i, j = 1, 2, \dots, n \tag{7}$$

In the formula, i represents all objects related to r in the question, and j represents all spatial relationships in the sentence. For example, the sentence “the park and bank north of the museum” contains $\{\langle \mathrm{Park}, \mathrm{North}, \mathrm{Museum} \rangle\}_{1}$, $\{\langle \mathrm{Bank}, \mathrm{North}, \mathrm{Museum} \rangle\}_{1}$, $\{\langle \mathrm{Park}, \mathrm{Nearby}, \mathrm{Museum} \rangle\}_{2}$, and $\{\langle \mathrm{Bank}, \mathrm{Nearby}, \mathrm{Museum} \rangle\}_{2}$.

Based on the semantic encoding results of the sentence, if a spatial relationship encoding r is present, a combination of dependency syntax and rule-based methods is used to identify the position object and reference object of the spatial relationship and determine the spatial relationship triplet.

(2) Temporal Relationships

When describing geographic entities, temporal factors should be considered, such as the historical evolution of geographic entities, changing trends, and the state of specific periods [42], to more comprehensively understand and describe geographic phenomena and analyze the changes in geographic data over time. Spatiotemporal questions mainly include absolute time (e.g., September 2010) and relative time (e.g., five years ago). Their temporal relationship words mainly express temporal sequence relationships (e.g., before, after, until, etc.) and temporal modification relationships (e.g., around, about, approximately, etc.) [43]. In spatiotemporal questions, the queries and descriptions of time primarily focus on the time attributes of geographic entities or the temporal relationships between geographic entities. If a time encoding ‘d’ and a temporal relationship encoding ‘r’ are identified in the question, then time is assigned as an attribute or association to the entity encodings in the question, and ‘r’ is mapped to the temporal relationship function based on the temporal relationship set.
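The mapping from a temporal relationship word to a temporal relationship function can be sketched as a lookup table of comparison operators. Everything in this sketch is hypothetical: the relation set, the representation of relative time as an `("ago", n)` tuple, the `year` attribute, and the function names are assumptions for illustration only.

```python
import operator

# Hypothetical temporal relationship set: relation word -> comparison on years.
TEMPORAL_RELATIONS = {
    "before": operator.lt,
    "after": operator.gt,
    "until": operator.le,
}


def resolve_time(value, reference_year):
    """Absolute years pass through; ('ago', n) is resolved against reference_year."""
    if isinstance(value, int):
        return value
    kind, amount = value
    if kind == "ago":
        return reference_year - amount
    raise ValueError(f"unsupported relative time: {value!r}")


def temporal_filter(entities, relation, value, reference_year):
    """Keep entities whose 'year' attribute satisfies the temporal relation."""
    compare = TEMPORAL_RELATIONS[relation]
    target = resolve_time(value, reference_year)
    return [e for e in entities if compare(e["year"], target)]
```

For instance, “before five years ago” with a reference year of 2020 resolves to the condition `year < 2015`, which can then be attached to the entity as an attribute constraint.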

(3) Dependency Relationships

The entities in the question usually have dependency relationships, such as subject–predicate, verb–object, coordination, attribute modified, and preposition–object [44,45]. For instance, if there is a coordination relationship among entities in the sentence and the first entity is associated with a certain relationship (such as a spatial relationship), then the coordinated entity is usually also associated with the same relationship. If the dependency relationship between entities is that of an attribute modifier, it indicates a modification relationship between the entities.
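The coordination rule described above, where a coordinated entity inherits the relationship of the first entity, can be sketched as follows. The `SpatialTriple` type and the `propagate_coordination` helper are hypothetical names, and the coordination pairs are assumed to come from a dependency parser.

```python
from typing import NamedTuple


class SpatialTriple(NamedTuple):
    position: str   # position object P
    relation: str   # spatial relation r
    reference: str  # reference object C


def propagate_coordination(triples, coordinated):
    """Copy each triple of a first entity to the entities coordinated with it.

    coordinated is a list of (first_entity, coordinated_entity) pairs.
    """
    result = list(triples)
    for first, other in coordinated:
        for t in triples:
            if t.position == first:
                candidate = SpatialTriple(other, t.relation, t.reference)
                if candidate not in result:
                    result.append(candidate)
    return result


triples = [SpatialTriple("Park", "North", "Museum")]
print(propagate_coordination(triples, [("Park", "Bank")]))
```

In the phrase “the park and bank north of the museum”, this yields a second triple with Bank as the position object, matching the pairs of triples listed for Equation (7).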

Based on the syntactic relationships, determine the query target of the question: location queries typically take the location ‘p’ or location type ‘t’ as the subject; distance queries take location ‘p’ as the subject; direction queries must determine the reference object and query object based on the subject–predicate relationship, verb–object relationship, etc.; buffer, counting, extremum, and list queries generally have the location type ‘t’ as the query target; and judgment queries have the IF function as the query target. List the extracted entity encoding classes and define the relationships between entities as functions; treat the query target of the question as a variable and the remaining information as constants; and define functions for buffer, direction, distance, sorting, fuzzy queries, and topological queries, such as Buffer(p, number, t), Direction($p_{1}$, East, $p_{2}$), Distance($p_{1}$, $p_{2}$), Order(o, number), etc.

As an example, for the query “What parks within five kilometers south of the Forbidden City in Beijing were built in the twentieth century?”, follow the following steps:

(1) Extract entities and relations, location (p) and relation, location type (t), etc.

(2) Declare terms based on the encoding, declare location p (‘Forbidden City’), declare location type t (‘Park’).

(3) Define query target by taking the unknown query target as a variable $x$, $x$ = t (‘Park’).

(4) Define function by defining the query conditions, such as dependency, spatial and temporal relations as functions, InCity (‘Forbidden City’, ‘Beijing’), Buffer (‘Forbidden City’, 5000 m, ‘Park’), Direction (‘Forbidden City’, ‘South’, ‘Park’).

(5) Normalize language as p (‘Forbidden City’) AND p (‘Beijing’) AND InCity (‘Forbidden City’, ‘Beijing’) AND t (‘Park’) AND $x$ = t AND Direction (‘Forbidden City’, ‘South’, $x$) AND Buffer (‘Forbidden City’, 5000 m, $x$) AND o (‘Time’ = ‘twentieth century’) AND Return ($x$).
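The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names `pred`, `const`, and `normalize` are hypothetical, the conditions are written out by hand rather than produced by the encoding model, and the third argument of Direction is assumed to be the query variable x.

```python
def const(text: str) -> str:
    """Render a constant (a recognized entity) in single quotes."""
    return f"'{text}'"


def pred(name: str, *args: str) -> str:
    """Render one predicate, e.g. Buffer('Forbidden City', 5000 m, x)."""
    return f"{name}({', '.join(args)})"


def normalize(conditions: list, target: str = "x") -> str:
    """Join the conditions with AND and append the Return clause."""
    return " AND ".join([*conditions, pred("Return", target)])


# Conditions for the Forbidden City example, written out by hand (hypothetical input).
conditions = [
    pred("p", const("Forbidden City")),
    pred("p", const("Beijing")),
    pred("InCity", const("Forbidden City"), const("Beijing")),
    pred("t", const("Park")),
    "x = t",  # the query target is the unknown variable x
    pred("Direction", const("Forbidden City"), const("South"), "x"),
    pred("Buffer", const("Forbidden City"), "5000 m", "x"),
    pred("o", "'Time' = 'twentieth century'"),
]

logical_form = normalize(conditions)
print(logical_form)
```

Keeping constants quoted and the variable bare makes it easy to see which parts of the normalized form come from the question and which part is the unknown to be returned.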

4.3. Generating Structured Query Language

Based on the classification of questions (count queries) and semantic parsing, a structured query statement in Cypher was constructed. Cypher is a query language widely used for interacting with graph databases [46]. It occupies a significant position in the field of graph data processing due to its intuitive and powerful expressive capability. Cypher provides a natural language-like way to query, update, and manage graph data, enabling developers to efficiently manipulate and explore complex graph structures. This paper took Cypher as an example to conduct an in-depth study of the algorithmic process for generating structured language, specifically Cypher query statements.

The basic structure of a Cypher query statement includes components such as MATCH, WHERE, RETURN, ORDER BY, and WITH. The algorithm flow is illustrated in Figure 4. The MATCH statement matches the geographical entities to the nodes and relationships in the knowledge base, assigning a unique variable to each extracted geographic concept. The WHERE clause is generated from the query conditions (function modules), which are written in Cypher. For example, the direction function in the query conditions uses the arctangent function to obtain directional results such as east, south, northeast, southwest, south by southwest, and west–southwest. In the distance function, fuzzy queries (such as around, nearby, etc.) refer to the quantitative distance ranges assigned according to place types as proposed by D. Punjani [36]. In the time function, time modifiers (such as around, approximately, etc.) are standardized and quantified using time units of different scales. The RETURN clause specifies the answer to the query, which is determined by the type of question and the query target. The semantic parsing results are shown in Table 4.
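To make the arctangent-based direction function concrete, the following Python sketch maps two coordinate pairs to a compass sector. It is an illustration under stated assumptions rather than the paper's Cypher implementation: the function name `direction` is hypothetical, only eight sectors are used (the paper also mentions finer ones such as west–southwest), and a flat-earth approximation is assumed, which is reasonable only for short distances.

```python
import math

# Sector names in counter-clockwise order, starting from east (angle 0).
COMPASS_8 = ["east", "northeast", "north", "northwest",
             "west", "southwest", "south", "southeast"]


def direction(ref_lon: float, ref_lat: float,
              tgt_lon: float, tgt_lat: float) -> str:
    """Return the compass sector of the target as seen from the reference point."""
    # Scale the longitude offset by cos(latitude) so that east-west and
    # north-south offsets are comparable (planar approximation).
    mean_lat = math.radians((ref_lat + tgt_lat) / 2.0)
    dx = (tgt_lon - ref_lon) * math.cos(mean_lat)
    dy = tgt_lat - ref_lat
    # atan2 gives the angle from the east axis; fold it into [0, 360).
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    # Each sector spans 45 degrees and is centred on its compass direction.
    sector = int((angle + 22.5) // 45) % 8
    return COMPASS_8[sector]


# Synthetic coordinates, chosen only to exercise the function.
print(direction(0.0, 0.0, 0.0, -1.0))
```

A finer 16-sector variant would only change the sector width (22.5 degrees) and the list of names.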

4.4. Evaluation Metrics

4.4.1. Model Performance Analysis

To quantitatively evaluate the performance of the model, the confusion matrix and related metrics are used in this study. In the context of the semantic encoding model, “TN” (True Negative) refers to the number of samples correctly predicted as negative, “TP” (True Positive) refers to the number of samples correctly predicted as positive, “FN” (False Negative) refers to the number of samples incorrectly predicted as negative, and “FP” (False Positive) refers to the number of samples incorrectly predicted as positive [47,48]. These four counts form the fundamental components of the confusion matrix, which constrain each other and collectively reflect the partitioning outcome.

Precision indicates the proportion of samples in the test set predicted to be positive that were actually positive.

Recall indicates the proportion of actually positive samples in the test set that were predicted to be positive.

The F1 score is the harmonic mean of precision and recall. A higher F1 score indicates superior performance of the model, showcasing its effectiveness in accurately and comprehensively identifying positive instances.

Accuracy rate indicates the overall rate of correct predictions in the test set.

The calculation formulas are shown in Figure 5.
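Figure 5 is not reproduced in this text. Assuming it follows the standard confusion-matrix definitions, the four metrics would read as follows, with TP, TN, FP, and FN as defined above:

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP}, &
\text{Recall} &= \frac{TP}{TP + FN}, \\
F1 &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, &
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}.
\end{aligned}
```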

4.4.2. Accuracy Verification

In this paper, the accuracy of Cypher statements is evaluated using three metrics [49]; here, $N$ denotes the total number of query statements in the dataset; $N_{qm}$ represents the number of query statements that yield correct results upon execution; and $N_{lf}$ indicates the number of query statements that match perfectly with manually generated standard Cypher queries.

(1) Query accuracy ($A_{CC_{qm}}$) is the prediction accuracy, where “the execution result of the automatically generated Cypher matches the execution result of the true Cypher”.

$$A_{CC_{qm}} = \frac{N_{qm}}{N} \tag{8}$$

(2) Logical form accuracy ($A_{CC_{lf}}$) involves a manual determination of whether the generated Cypher language matches the verification labels in terms of query intent and the query return results.

$$A_{CC_{lf}} = \frac{N_{lf}}{N} \tag{9}$$

(3) Execution accuracy ($A_{CC_{ex}}$) combines the previous two metrics to check whether the execution results of the two Cypher statements are consistent.

$$A_{CC_{ex}} = \frac{A_{CC_{qm}}}{A_{CC_{lf}}} \tag{10}$$
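Equations (8) to (10) can be computed directly from three counts. The Python sketch below is illustrative only: the function name `cypher_accuracy` is hypothetical and the counts in the usage lines are made up, not taken from the paper's tables.

```python
import math


def cypher_accuracy(n_total: int, n_qm: int, n_lf: int):
    """Return (ACC_qm, ACC_lf, ACC_ex) following Equations (8)-(10)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    acc_qm = n_qm / n_total  # Equation (8): share of queries with correct results
    acc_lf = n_lf / n_total  # Equation (9): share matching the reference Cypher
    # Equation (10) is a ratio, so it is undefined when no logical form matches.
    acc_ex = acc_qm / acc_lf if acc_lf > 0 else math.nan
    return acc_qm, acc_lf, acc_ex


# Made-up counts, used only to exercise the function.
acc_qm, acc_lf, acc_ex = cypher_accuracy(n_total=50, n_qm=45, n_lf=40)
print(f"ACC_qm={acc_qm:.2f}  ACC_lf={acc_lf:.2f}  ACC_ex={acc_ex:.3f}")
```

Because Equation (10) is a ratio of two accuracies, it exceeds 1 whenever more queries return correct results than match the reference logical form exactly.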

5. Experiment and Discussion

5.1. Dataset Construction

Due to the lack of Chinese datasets for natural language spatiotemporal interrogative queries, this study primarily constructs a Chinese dataset with spatiotemporal characteristics for natural language question queries by performing entity replacement, sentence structure reconstruction, and synonym substitution on the gold standard geospatial datasets GeoQuestions and GeoQuery. The specifics of the dataset are as follows:

(1) GeoQuestions contains 201 English geospatial questions, including questions about location, spatial relationships of geographic entities, spatial relationships of geographic features, quantities, and aggregation.

(2) GeoQuery (http://www.cs.utexas.edu/users/ml/geo.html, accessed on 2 July 2023) consists of 880 natural language questions divided into a training set with 600 question–answer pairs and a test set with 280 question–answer pairs.

Based on the GeoQuestions and GeoQuery datasets, this paper performs entity substitution, sentence structure reconstruction, synonym substitution, and accurate Chinese translation on geographic queries and expands and optimizes the original query sentences into 9416 Chinese geographic questions, with 13 coding types, totaling 64,994 characters. Alongside this expansion and optimization, particular attention is paid to the representativeness and bias of the dataset. To address these issues, data expansion and filtering techniques are used in this paper. Data expansion increases the number of samples in the dataset, while data filtering removes noise and abnormal data and reduces bias. These measures support the accuracy and reliability of the study. The number of semantic encodings for each type is shown in Table 5 (datasets: https://github.com/yinxingren1/dataset.git, accessed on 10 December 2024).

The dataset is annotated using the BIOES format (B for the beginning of an annotation sequence, I for inside, E for the end, S for a single character, and O for outside information; the meanings of the other letters are given in Table 2), with detailed annotations shown in Table 6.
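As a rough illustration of character-level BIOES tagging, the Python sketch below converts pre-segmented spans into per-character tags. The function name `bioes_tags`, the `B-p` label spelling, and the segment codes in the example are assumptions made for illustration; they are not taken from Table 6.

```python
from typing import List, Optional, Tuple


def bioes_tags(segments: List[Tuple[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Turn (text, code) segments into (character, BIOES tag) pairs.

    A code of None marks text outside any annotated element.
    """
    tagged: List[Tuple[str, str]] = []
    for text, code in segments:
        chars = list(text)
        if code is None:
            tagged.extend((ch, "O") for ch in chars)
        elif len(chars) == 1:
            tagged.append((chars[0], f"S-{code}"))  # single-character element
        else:
            tagged.append((chars[0], f"B-{code}"))
            tagged.extend((ch, f"I-{code}") for ch in chars[1:-1])
            tagged.append((chars[-1], f"E-{code}"))
    return tagged


# Hypothetical segmentation of a question like "How many parks are there in Beijing?"
segments = [("北京市", "p"), ("的", None), ("公园", "t"),
            ("有", None), ("几", "4"), ("个", None)]
for char, tag in bioes_tags(segments):
    print(char, tag)
```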

(3) The classification dataset contains a total of 9416 Chinese geospatial questions with manually annotated question categories, ensuring that each question is assigned to its correct category and providing a rich dataset for training the semantic encoding model described below.

5.2. Semantic Encoding Model Results

This study utilizes the PyTorch deep learning framework and is programmed in Python. The integrated development environment used is JetBrains PyCharm software (version 2020.1), running on the Windows 11 64-bit operating system. After tuning the parameters of the DeBERTa-BiGRU-CRF model, the number of layers is set to 12, and the Adam optimization algorithm is employed with a learning rate of 1 × 10⁻⁵. To prevent overfitting, adam_epsilon and dropout mechanisms are applied. The model’s batch size is set to 64, with the number of epochs being 60. The dataset is divided into training, validation, and test sets in an 8:1:1 ratio, with specific corpus statistics shown in Table 7.
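The 8:1:1 division can be sketched as a shuffled split. The snippet below is one possible way to do it, not the authors' procedure: the function name `split_811` and the fixed seed are assumptions, and the actual per-set statistics are those reported in Table 7.

```python
import random


def split_811(items, seed: int = 42):
    """Shuffle the items and split them into train/validation/test at 8:1:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n = len(shuffled)
    n_train = n * 8 // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Question indices stand in for the 9416 annotated questions.
train, val, test = split_811(range(9416))
print(len(train), len(val), len(test))
```

Integer arithmetic is used for the boundaries so that no sample is lost to rounding; any remainder falls into the test set.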

In the comparative experiments, the model performance was evaluated using metrics commonly employed in sequence labeling tasks, including precision, recall, and F1 score. The DeBERTa-BiGRU-CRF model demonstrated improvements in accuracy by 12.33%, 7.32%, and 2.71% over the baseline models BiLSTM-CRF, BERT-BiLSTM-CRF, BERT-BiGRU-CRF, and T5, respectively; recall increased by 12.97%, 7.52%, 3.63%, and 0.9%; and the F1 score improved by 11.93%, 7.18%, 2.53%, and 1.17%.

To facilitate a more intuitive visualization of the experimental data, Figure 6 presents and compares the results of the different models using a bar chart.

We performed a comparative analysis using the training process curves of the validation dataset. As indicated by the curves in Figure 7, the gradual decrease in the loss function and increase in accuracy reflect the learning capabilities of the five models. The eventual stabilization of the curves indicates that each model has developed into a stable structure with decision-making capability. In terms of training outcomes, the model proposed in this paper has the highest training accuracy, with an accuracy of 0.9562.

The experimental results show that the BiLSTM-CRF model is limited in its feature representation ability, especially in handling polysemy, long-distance dependencies, and generalization, which directly affects the accuracy of its results. The BERT-based models successfully incorporate contextual information but ignore the dependencies between words and do not fully consider absolute positional information; their performance is therefore better than that of BiLSTM-CRF, but their F1 values remain lower than that of the model proposed in this paper. T5 shows a stronger ability in natural language generation than the other large models, but its accuracy in spatiotemporal question scenarios is slightly lower than that of the proposed method. The main reason is that the model in this paper has been designed specifically for the processing needs of spatiotemporal interrogative sentences, whereas T5 targets a wide range of natural language processing tasks and requires extensive pre-training and fine-tuning. The DeBERTa model adopted in this paper, with its disentangled attention mechanism and its incorporation of absolute positional encoding, achieves a significant performance improvement, not only optimizing pre-training efficiency but also substantially enhancing generalization ability.

The experimental analysis in this paper shows that the accuracy of question word coding is higher, while the accuracy of location and spatiotemporal relationship extraction is slightly lower than that of other coding types. Question words are relatively fixed and limited in number, so the model learns them well, whereas complex, multi-level place names are more difficult to recognize. Because the training corpus is limited in size, these complex place names and certain spatiotemporal relationship words occur infrequently, and the model fails to fully learn this semantic information. To address this problem, future work will focus on enhancing the diversity of recognized entity types and will consider introducing a hierarchical named entity recognition method to further improve recognition.

5.3. Conversion Result

To evaluate the accuracy of transforming spatiotemporal questions into database language, 160 questions were randomly selected from the dataset according to the category, with 20 questions per category. Accurate Cypher language annotations were provided as validation tags. Using the three evaluation metrics outlined in Section 4.4, the NL2Cypher translation model’s performance was assessed across different datasets. Accuracy inspection of NL2Cypher is shown in Table 8.

The results indicate that the NL2Cypher translation model achieved over 88% when measuring $A_{CC_{qm}}$ for the training set, over 88% when measuring $A_{CC_{lf}}$, and over 92% when measuring $A_{CC_{qm}}$ for the test set of the GeoQuestions dataset, with execution accuracy around 1.06. On the GeoQuery dataset, it achieved over 88% when measuring $A_{CC_{qm}}$ for the training set and over 93% when measuring $A_{CC_{qm}}$ for the test set. Table 9 presents the accuracy evaluation results obtained for these two datasets by different models.

The results indicate that compared to other models, NL2Cypher achieves a significant improvement in accuracy. This improvement is attributable to the use of the DeBERTa model to generate word vectors, as DeBERTa supports training on larger-scale corpora in an unsupervised manner, thereby enhancing accuracy and generalization capability. In summary, the research on generating structured language Cypher from natural language spatiotemporal queries conducted by this model holds considerable significance.

5.4. Error Analysis

The NL2Cypher model, based on the DeBERTa semantic conversion of natural language spatiotemporal queries, adheres to the semantics of Cypher, removing non-executable parts of Cypher queries to achieve more accurate results. This involves handling scenarios where the generated Cypher statements contain syntactic errors, mismatches between functions and target field names, and instances where query results return null values. This mechanism can be used with any autoregressive model to improve the accuracy of results. Consequently, the final generated Cypher statements were compared with standard Cypher statements by randomly selecting 100 samples where logical forms did not match for analysis. These were divided into two categories:

(1) “Unanswerable” cases (20 instances) are situations where the Cypher statements cannot generate correct query results from the given data. They primarily include the following two types. Type 1, insufficient semantic information recognition: complex place names, abbreviations, and spatiotemporal relationships are not accurately recognized, such as “Xing’an League A’ershan City Hetu Ala Town”, “Peking University”, and “adjacent”. Type 2, limitations in question type: the question classification framework proposed in this paper covers the most commonly used basic and compound queries in the geographic spatiotemporal field, but a few questions cannot be classified, leading to incorrect results. For example, the question “What is the length of the river that runs through London?” is an attribute query related to spatial association, which is not adequately considered in this classification framework. The existing framework is limited in capturing implicit semantic relationships and complex contextual information, leading to incomplete or inaccurate conversion results.

(2) “Answerable” cases (80 instances): analysis of the remaining 80 Cypher statements, which were answerable, showed that 35 of them had logical form errors. Of these 35 statements, 33 had errors in function generation, for example, overlooking “max” or “min” during comparison, and 2 contained errors in constructing the WHERE clause, such as confusing the “feature code” when generating the clause for “What cities in ChaoYang have the highest populations?”, which led to confusion between Chaoyang in Beijing and Chaoyang in Liaoning.

Manual inspection revealed that in these 35 Cypher statements with logical form errors, 31 were capable of producing correct results despite not fully matching standard Cypher statements. This indicates that the actual performance of the model may be underestimated.

6. Conclusions and Future Work

Natural language semantic understanding enables computers to better comprehend user intentions and provides interfaces for user queries, thereby improving the user-friendliness, query expressiveness, and human–computer interaction of database systems, promoting intelligent geographic knowledge services. To address the issues of incomplete semantic information extraction and inaccurate intention recognition in natural language queries with spatiotemporal characteristics, this paper proposes the NL2Cypher model for automated semantic encoding and understanding of spatiotemporal queries. Through inter-class coarse division and intra-class fine division strategies, the model effectively enhances the semantic understanding and intention recognition capabilities of spatiotemporal queries. The model achieves relatively high accuracy in query matching and logical form generation, and even some Cypher statements with partially mismatched logical forms can still provide correct results based on given data. Overall, the conversion from natural language spatiotemporal queries to structured Cypher queries proposed in this paper is effective and feasible.

However, our method still has some limitations. For example, this study mainly relies on the dictionary information and glyph features of Chinese, whereas recent studies have revealed that the phonetic information of Chinese characters has potential value in improving task performance. Therefore, we will consider integrating the phonetic information of Chinese characters into the character-level representation to achieve a more comprehensive grasp of Chinese features. Future work can mainly improve the following areas:

(1) Enhance Model Generalization

Although the NL2Cypher model performs well on the current test dataset, user queries in practical applications may be more diverse and complex. Therefore, it is necessary to collect more diverse natural language spatiotemporal query data that should cover a variety of complex spatiotemporal relationships and the use of unknown vocabulary, especially datasets from foreign language environments such as Russian, Greek, or Arabic. By incorporating these diverse data, we can more effectively train and optimize the NL2Cypher model to ensure that it can maintain high accuracy and stability in a wider range of query scenarios.

(2) Improve Query Matching and Logical Form Accuracy

While the model demonstrates good performance in query matching and logical form accuracy, there is still room for improvement. Future work should focus on more detailed model tuning and training as well as introducing additional constraints and rules to further enhance query matching accuracy and logical form consistency. Given the potential of the NL2Cypher model in enhancing database query functions and improving user friendliness, we can subsequently cooperate with big data analysis companies, artificial intelligence, and machine learning companies to promote the development and application of technology.

(3) Integrate Multimodal Data to Improve Query Results

Spatial data such as maps and satellite images will be integrated into the NL2Cypher model, and reinforcement learning techniques will be introduced to more accurately predict user intentions and provide more personalized query suggestions based on user query preferences and patterns. At the same time, we are committed to developing and improving fairness assessment algorithms to ensure that query results are not only accurate and efficient but also meet the needs of different user groups.

This research result not only has a far-reaching impact in China but also has broad promotion value in the international market. By promoting the NL2Cypher model to the international market, it is expected to provide users in more countries and regions with an efficient and intelligent query experience.

Author Contributions

Conceptualization, methodology, and validation, W.L., X.M. and D.M.; investigation and data curation, Y.C.; resources, J.W.; review and editing, all authors; supervision, W.L. and Z.Z.; project administration, X.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the research and development and application demonstration of remote sensing monitoring technology for typical natural resource elements (Grant No. 2023YFE0207900).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Liu, J.; Liu, H.; Chen, X.; Guo, X.; Zhao, Q.; Liu, J.; Kang, L. Rapid Retrieval of Geospatial Data Considering Semantic Knowledge. Geomat. Inf. Sci. Wuhan Univ. 2022, 47, 463–472.
2. Kefalidis, S.A.; Punjani, D.; Tsalapati, E.; Plas, K.; Pollali, M.; Mitsios, M.; Tsokanaridou, M.; Koubarakis, M.; Maret, P. Benchmarking geospatial question answering engines using the dataset GeoQuestions1089. In Proceedings of the International Semantic Web Conference, Athens, Greece, 6–10 November 2023; Springer Nature: Cham, Switzerland, 2023; pp. 266–284.
3. Li, S.; Zhu, X.; Li, Z.; Liu, W.; Cui, B. From geographic information service to geographic knowledge service: Research issues and development roadmap. Acta Geod. Cartogr. Sin. 2021, 50, 1194–1202.
4. Wu, R. Scenario-Based Query for Power Marketing Based on Knowledge Graph. Bachelor’s Thesis, Donghua University, Shanghai, China, 2023.
5. Janowicz, K.; Gao, S.; McKenzie, G.; Hu, Y.; Bhaduri, B. GeoAI: Spatially explicit artificial intelligence techniques for geographic knowledge discovery and beyond. Int. J. Geogr. Inf. Sci. 2020, 34, 625–636.
6. Wei, Y.; Li, H.; Hu, D.; Li, X.; Ma, L. A method of Chinese place name recognition based on composite features. Geomat. Inf. Sci. Wuhan Univ. 2018, 43, 17–23.
7. Meng, C.; Li, Q.; Li, H.; Jia, J. Research on query method of geographic information based on ontology. Sci. Surv. Mapp. 2008, 33, 251–253.
8. Li, B. Research on Interpreting Mechanism of Natural Spatial Query Language. Bachelor’s Thesis, Information Engineering University, Zhengzhou, China, 2009.
9. Gai, S.; Liu, J.; Xiong, W.; Zhang, X.; Li, J. Research on Rule Matching in Natural Language Spatial Query Based on Levenshtein Distance. J. Geomat. Sci. Technol. 2015, 32, 416–421.
10. Zhang, C. Interpretation of Event Spatio-temporal and Attribute Information in Chinese Text. Acta Geod. Cartogr. Sin. 2015, 44, 590.
11. Sirichanya, C.; Kraisak, K. Semantic data mining in the information age: A systematic review. Int. J. Intell. Syst. 2021, 36, 3880–3916.
12. Deng, M.; Huang, X.; Liu, H.; Liu, G. An Approach for Spatial Query Based on Natural-Language Spatial Relations. Geomat. Inf. Sci. Wuhan Univ. 2011, 36, 1089–1093.
13. Cai, Q.; Xu, B.; Dong, X. Knowledge Graph Completion Model using Semantically Enhanced Prompts and structural Information. Comput. Sci. 2024, 12, 7–23.
14. Tu, W.; Li, B.; Liu, X.; Zheng, J. Application of NL2SQL with knowledge graph fusion in equipment maintenance data retrieval. Intell. Comput. Appl. 2024, 14, 118–124.
15. Cao, J.; Huang, T.; Chen, G.; Wu, X.; Chen, K. Research on Technology of Generating Multi-table SQL Query Statement by Natural Language. J. Front. Comput. Sci. Technol. 2020, 14, 1133–1141.
16. Chen, W.; Fosler-Lussier, E.; Xiao, N.; Raje, S.; Ramnath, R.; Sui, D. A synergistic framework for geographic question answering. In Proceedings of the 2013 IEEE Seventh International Conference on Semantic Computing, Irvine, CA, USA, 16–18 September 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 94–99.
17. Chen, W. Parameterized spatial SQL translation for geographic question answering. In Proceedings of the 2014 IEEE International Conference on Semantic Computing, Newport Beach, CA, USA, 16–18 June 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 23–27.
18. Scheider, S.; Nyamsuren, E.; Kruiger, H.; Xu, H. Geo-analytical question-answering with GIS. Int. J. Digit. Earth 2021, 14, 1–14.
19. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
20. Aggarwal, C.C. Neural Networks and Deep Learning: A Textbook; Springer: Cham, Switzerland, 2018.
21. Le, X.; Yang, C.; Yu, W. Spatial concept extraction based on spatial semantic role in natural language. Geomat. Inf. Sci. Wuhan Univ. 2005, 30, 1100–1103.
22. Kang, M.; Du, Q.; Wang, M. A new method of Chinese address extraction based on address tree model. Acta Geod. Cartogr. Sin. 2015, 44, 99–107.
23. Zhang, X.; Cai, Z. Text Sentiment Analysis Based on Bert-BiGRU-CNN. Comput. Simul. 2023, 40, 519–523.
24. Ren, Z.; Qin, X.; Ran, W. SLNER: Chinese Few-Shot Named Entity Recognition with Enhanced Span and Label Semantics. Appl. Sci. 2023, 13, 8609.
25. Lee, J.G.; Kang, M. Geospatial big data: Challenges and opportunities. Big Data Res. 2015, 2, 74–81.
26. Hamzei, E.; Li, H.; Vasardani, M.; Baldwin, T.; Winter, S.; Tomko, M. Place questions and human-generated answers: A data analysis approach. In Geospatial Technologies for Local and Regional Development, Proceedings of the 22nd AGILE Conference on Geographic Information Science 22, Limassol, Cyprus, 17–20 June 2019; Springer International Publishing: Cham, Switzerland, 2020; pp. 3–19.
27. Cheng, Y.; Xu, D.; Lv, X. Research on text reading comprehension and question answering methods based on hierarchical interactive network. Data Anal. Knowl. Discov. 2019, 2, 23–32.
28. Koroteev, M.V. BERT: A review of applications in natural language processing and understanding. arXiv 2021, arXiv:2103.11943.
29. Gui, T.; Xi, Z.; Zheng, R. A review of robustness research in natural language processing based on deep learning. Comput. Sci. 2023, 7, 1–26.
30. Lewis, M.; Liu, Y.; Goyal, N. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020.
31. Liu, Q.; Xiao, K.; Cao, S.; Zhang, H.; Jiang, D. Research on Text Classification Methods by Fusing DeBERTa Model with Graph Convolutional Networks. Artif. Intell. Robot. Res. 2024, 13, 715.
32. He, P.; Liu, X.; Gao, J.; Chen, W. Deberta: Decoding-enhanced bert with disentangled attention. arXiv 2020, arXiv:2006.03654.
33. Zhao, Z.A.; Wang, J.; Mao, X.; Ma, W.; Lu, W.; He, Y.; Gao, X. A Multi-dimensional CNN Coupled Landslide Susceptibility Assessment Method. Geomat. Inf. Sci. Wuhan Univ. 2024, 49, 1466–1481.
34. Gao, X.; Wang, J.; Mao, X.; Zhao, Z.; Lu, W. The susceptibility assessment of landslide based on Bi-GRU network. Sci. Surv. Mapp. 2023, 48, 221–230.
35. Yu, B.; Fan, Z. A review of conditional random field models for natural Language Processing. J. Inf. Resour. Manag. 2020, 10, 96–111.
36. Punjani, D.; Singh, K.; Both, A.; Koubarakis, M.; Angelidis, I.; Bereta, K.; Beris, T.; Bilidas, D.; Ioannidis, T.; Karalis, N.; et al. Template-based question answering over linked geospatial data. In Proceedings of the 12th Workshop on Geographic Information Retrieval, Seattle, WA, USA, 6 November 2018; ACM: New York, NY, USA, 2018; pp. 1–10.
37. Zelle, J.M.; Mooney, R.J. Learning to parse database queries using inductive logic programming. In Proceedings of the National Conference on Artificial Intelligence, Portland, OR, USA, 4–8 August 1996; pp. 1050–1055.
38. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010.
39. Lu, F.; Zhu, Y.; Zhang, X. Spatiotemporal knowledge graph: Advances and perspectives. J. Geo-Inf. Sci. 2023, 25, 1091–1105.
40. Che, W.; Feng, Y.; Qin, L.; Liu, T. N-LTP: An Open-source Neural Language Technology Platform for Chinese. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 42–49.
41. Vasardani, M.; Timpf, S.; Winter, S.; Tomko, M. From descriptions to depictions: A conceptual framework. In Spatial Information Theory, Proceedings of the 11th International Conference, COSIT 2013, Scarborough, UK, 2–6 September 2013; Springer International Publishing: Cham, Switzerland, 2013; pp. 299–319.
42. Zhang, X.; Zhang, C.; Wu, M.; Lv, G. Spatiotemporal features based geographical knowledge graph construction. Sci. Sin. Inf. Sci. China: Inf. Sci. 2020, 50, 1019–1032.
43. Zhang, C.J.; Zhang, X.Y.; Wang, S.; Chen, X.D. Annotation of Spatial-Temporal Information of Event in Chinese Text. J. Chin. Inf. Process. 2016, 30, 213–222.
44. Guo, X.; He, T.; Hu, X.; Chen, Q. Chinese named entity relation extraction based on syntactic and semantic features. J. Chin. Inf. Process. 2014, 28, 183–189.
45. Gan, L.; Wan, C.; Liu, D.; Zhong, Q.; Jiang, T. Chinese named entity relation extraction based on syntactic and semantic features. J. Comput. Res. Dev. 2016, 53, 284–302.
46. Francis, N.; Green, A.; Guagliardo, P.; Libkin, L.; Lindaaker, T.; Marsault, V.; Plantikow, S.; Rydberg, M.; Selmer, P.; Taylor, A. Cypher: An evolving query language for property graphs. In Proceedings of the 2018 International Conference on Management of Data, Houston, TX, USA, 10–15 June 2018; ACM: New York, NY, USA, 2018; pp. 1433–1445.
47. Qi, Y.; Zhai, R.; Wu, F.; Yin, J.; Gong, X.; Zhu, L.; Yu, H. CSMNER: A Toponym Entity Recognition Model for Chinese Social Media. ISPRS Int. J. Geo-Inf. 2024, 13, 311.
48. Yao, X.; Hao, X.; Liu, R.; Li, L.; Guo, X. AgCNER, the first large-scale Chinese named entity recognition dataset for agricultural diseases and pests. Sci. Data 2024, 11, 769.
49. Yang, M. Research on Semantic Driven Data Query and Intelligent Visualization. Bachelor’s Thesis, Chongqing University, Chongqing, China, 2018.

Figure 1. Technology framework of semantic understanding of spatiotemporal questions in natural language.

Figure 2. Model structure of DeBERTa-BiGRU-CRF. (N stands for multiple Transformer layers).

Figure 3. Model structure of DeBERTa.

Figure 4. Generating structured query workflows.

Figure 5. Model performance evaluation indices and calculation formulas.

Figure 6. Precision, recall, and F1 score of different models.

Figure 7. Comparison of model training processes.

Table 1. Examples of natural language to Cypher.

Question	What Are the Three Closest Parks to Tiananmen Square?
Semantic encoding	spqdt2
Question type	maximum/minimum value query
Cypher query	MATCH (p {name: 'Tiananmen Square'}), (t:parks) WITH p, t ORDER BY point.distance(point({latitude: p.latitude, longitude: p.longitude}), point({latitude: t.latitude, longitude: t.longitude})) LIMIT 3 RETURN t

Table 2. Elements of spatiotemporal semantic encoding.

Semantic Types	Word Class	Code	Semantic Types	Word Class	Code
Location interrogative	Question word	1	Place type	Noun	t
Specific interrogative	Question word	2	Attribute	Noun	o
Choice interrogative	Question word	3	Quality	Adjective	q
Quantity interrogative	Question word	4	Activity	Verb	s
Measurement interrogative	Question word	5	Spatiotemporal relation	Preposition/Noun	r
Judgment interrogative	Question word	6	Numbers and units	Numerals, Units	d
Place name	Noun	p
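
The code table above can be read as a simple lookup from one-character codes to semantic types. The sketch below transcribes Table 2 into a dictionary and expands an encoding string into its semantic types; the names SEMANTIC_CODES and describe_encoding are illustrative assumptions, not identifiers from the paper.

```python
# Code table transcribed from Table 2; identifier names are illustrative only.
SEMANTIC_CODES = {
    "1": "Location interrogative",
    "2": "Specific interrogative",
    "3": "Choice interrogative",
    "4": "Quantity interrogative",
    "5": "Measurement interrogative",
    "6": "Judgment interrogative",
    "p": "Place name",
    "t": "Place type",
    "o": "Attribute",
    "q": "Quality",
    "s": "Activity",
    "r": "Spatiotemporal relation",
    "d": "Numbers and units",
}


def describe_encoding(encoding: str) -> list[str]:
    """Expand a semantic encoding string into semantic types, one per code."""
    unknown = [c for c in encoding if c not in SEMANTIC_CODES]
    if unknown:
        raise ValueError(f"unknown semantic codes: {unknown}")
    return [SEMANTIC_CODES[c] for c in encoding]


if __name__ == "__main__":
    # "spqdt2" is the encoding given for the example question in Table 1.
    for code, name in zip("spqdt2", describe_encoding("spqdt2")):
        print(code, name)
```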

Table 3. Question sentence type.

Question Types	Question Examples
Location Query	Where is the [parking lot] [near] [Tiananmen Square]?
Distance Query	What’s the distance between [Beijing] and [Shanghai]?
Direction Query	What direction is [Nanjing] from [Paris]?
Buffer Query	What [hotels] are within [3 km] [north] of [Beijing][Forbidden City]?
Count Query	How many [parks] were there in [Beijing] in [2000]?
Maximum/Minimum Query	What are the [closest][eight][hotels] to [Beijing][Forbidden City]?
List Query	What [universities] were in [Beijing] in [1980]?
Judgment Query	Was [Beijing]’s [population][one million] in the [19th century]?

Table 4. Semantic parsing results.

Question 1	How many cities are there within a 500 km radius of Jinan, Shandong?
Semantic Encoding	ppd3t
Question Type	Count Query
Language Normalization	COUNT(x): p('Shandong') AND p('Jinan') AND t(city) AND x = COUNT(t) AND Buffer('Jinan', 500000 m, x) AND Return(x)

Table 5. Dataset statistical information.

Semantic Coding Elements	Count	Semantic Coding Elements	Count
Location interrogative-1	371	Place type-t	2056
Specific interrogative-2	186	Attribute-o	1004
Choice interrogative-3	1468	Quality-q	493
Quantity interrogative-4	629	Activity-s	945
Measurement interrogative-5	240	Spatiotemporal relation-r	669
Judgment interrogative-6	405	Numbers and units-d	1013
Place name-p	6094

Table 6. Example of spatiotemporal semantic encoding annotation. (Which direction is Wuhan relative to Qingdao? How many restaurants are there near Yuyuantan Park in Beijing? What hospitals were in Nanjing in the 1980s?).

Corpus	Annotation Label	Corpus	Annotation Label	Corpus	Annotation Label
武	B-p	北	B-p	二	B-d
汉	E-p	京	E-p	十	I-d
相	O	玉	B-p	世	I-d
对	O	渊	I-p	纪	E-d
青	B-p	潭	I-p	八	B-d
岛	E-p	公	I-p	十	I-d
在	O	园	E-p	年	I-d
哪	B-3	附	B-r	代	E-d
个	E-3	近	E-r	,	O
方	B-o	有	O	南	B-p
向	E-o	多	B-4	京	E-p
?	O	少	E-4	有	O
		家	O	哪	B-3
		餐	B-t	些	E-3
		厅	E-t	医	B-t
		?	O	院	E-t
				?	O
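
To make the annotation scheme in Table 6 concrete, the following minimal sketch decodes character-level B-/I-/E-/O labels into labeled spans and joins the span codes into an encoding string. The function names are hypothetical, and two points are assumptions rather than statements from the paper: that the encoding is the span codes concatenated in sentence order, and that malformed label sequences are simply dropped.

```python
def decode_bioe(chars, tags):
    """Return (text, code) spans from character-level B-/I-/E- tags; O is skipped."""
    spans, buf, code = [], [], None
    for ch, tag in zip(chars, tags):
        if tag == "O":
            buf, code = [], None
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "B":
            buf, code = [ch], label          # start a new span
        elif prefix in ("I", "E") and code == label:
            buf.append(ch)                   # continue the open span
            if prefix == "E":                # E closes the span
                spans.append(("".join(buf), label))
                buf, code = [], None
        else:
            buf, code = [], None             # malformed sequence: drop it (assumption)
    return spans


def semantic_encoding(spans):
    """Concatenate span codes in sentence order (assumed encoding convention)."""
    return "".join(code for _, code in spans)


if __name__ == "__main__":
    # Second question of Table 6, with the labels copied from the table.
    question = "北京玉渊潭公园附近有多少家餐厅?"
    labels = "B-p E-p B-p I-p I-p I-p E-p B-r E-r O B-4 E-4 O B-t E-t O".split()
    spans = decode_bioe(list(question), labels)
    print(spans)
    print(semantic_encoding(spans))
```

The table shows no single-character entities, so the sketch does not handle a dedicated single-character tag; one would need to be added if the full label set contains it.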

Table 7. Training corpus statistics.

Corpus	Training Set	Validation Set	Test Set
Character count	52,311	6234	6449
Question count	7508	948	960
Code count	13,618	1656	1799

Table 8. Accuracy verification of NL2Cypher.

Dataset	Training Set			Test Set
	$ACC_{qm}$	$ACC_{lf}$	$ACC_{ex}$	$ACC_{qm}$	$ACC_{lf}$	$ACC_{ex}$
GeoQuestions	0.8825	0.8371	1.054	0.9226	0.8863	1.041
GeoQuery	0.8837	0.8321	1.062	0.9282	0.8833	1.051

Table 9. Accuracy of different models on the two datasets (%).

Model	GeoQuestions				GeoQuery
	Training Set		Test Set		Training Set		Test Set
	$ACC_{qm}$	$ACC_{lf}$	$ACC_{qm}$	$ACC_{lf}$	$ACC_{qm}$	$ACC_{lf}$	$ACC_{qm}$	$ACC_{lf}$
NL2Cypher	88.3	83.7	92.3	88.6	88.4	83.2	92.8	88.3
GPT2 (standard)	84.0	76.0	83.8	75.5	81.2	79.3	82.5	81.7
BaiChuan	65.8	59.5	59.4	55.9	76.0	70.4	72.5	71.6
QianWen	57.9	52.4	55.8	49.5	52.3	48.2	61.3	56.3
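
As a reading aid for Table 9, the sketch below computes the test-set margin, in percentage points, between NL2Cypher and the strongest baseline for each dataset and metric. The numbers are transcribed from the table; the dictionary layout and the function name margin_over_best_baseline are assumptions made for illustration.

```python
# Test-set accuracies (%) transcribed from Table 9 as (ACC_qm, ACC_lf) pairs.
TEST_ACC = {
    "NL2Cypher": {"GeoQuestions": (92.3, 88.6), "GeoQuery": (92.8, 88.3)},
    "GPT2 (standard)": {"GeoQuestions": (83.8, 75.5), "GeoQuery": (82.5, 81.7)},
    "BaiChuan": {"GeoQuestions": (59.4, 55.9), "GeoQuery": (72.5, 71.6)},
    "QianWen": {"GeoQuestions": (55.8, 49.5), "GeoQuery": (61.3, 56.3)},
}
METRICS = ("ACC_qm", "ACC_lf")


def margin_over_best_baseline(dataset: str, ours: str = "NL2Cypher") -> dict:
    """Percentage-point gap between `ours` and the best other model, per metric."""
    result = {}
    for i, metric in enumerate(METRICS):
        best = max(v[dataset][i] for m, v in TEST_ACC.items() if m != ours)
        result[metric] = round(TEST_ACC[ours][dataset][i] - best, 1)
    return result


if __name__ == "__main__":
    for ds in ("GeoQuestions", "GeoQuery"):
        print(ds, margin_over_best_baseline(ds))
```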

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
