1. Introduction
Place names are the designations of natural or human geographic entities at specific spatial locations. In simple terms, place names originate from the conceptualization and naming of geographical elements, entities, or places [1]. Place names are a representative and special category of geospatial data within geographic information systems: they provide an intuitive way to identify and access specific geographic locations, thereby enhancing the retrieval, analysis, and visualization of geospatial data. The naming of places typically reflects factors such as geography, history, culture, society, and language [2]. Place names can be categorized into primary and related names based on their interrelationships [3]. Primary names are newly coined for new places, while related names reuse primary names to name new places. If the relationship between two place names is derivational, the newly named place is referred to as a derived place name [4]. A derivational relationship arises when a new geographical entity is named by affixing, combining, or condensing existing place names, which establishes a relationship between the two place names. Among derivational relationships, spatio-temporal derivational relationships stand out because they reflect not only the similarity of the two names but also the locations of the geographical entities: they indicate spatial proximity between the two named places and therefore play an important role in semantic expression and geographic information retrieval.
Existing research has largely overlooked spatio-temporal derivation relationships among place names, leaving untapped their potential to strengthen the connections between place names and to improve spatial retrieval and geographic question-answering systems. This paper proposes an approach to identifying and expressing these relationships. It first defines the spatio-temporal derivation relationship of place names and states the constraints that govern it. On that basis, it investigates methods for recognizing such relationships and builds a spatio-temporal derivation network of place names that formally expresses the derivational connections between them. Finally, it uses this network to reason about spatial adjacency and positional relationships.
2. Related Work
Surveying and mapping geospatial information is an important strategic data resource and a new factor of production, and geographic name information is a representative and special category of geospatial data within it. Statistical data show that place names play a pivotal role in the organization and management of geographic information, and research on the qualitative geospatial expression of geographic name information and its application services has become a hot topic in the GIS academic community at home and abroad in recent years. The application of geographic name information can be traced back to the 17th century in China and the 19th century in the United Kingdom, but it was not widely applied or valued for nearly two centuries [1]. Only in the 21st century did the study and application of geographic name information begin to receive close attention in the GIS academic community. Scholars have since conducted research on the derivation of place names. Reference [5] explored the role of place name derivation in constructing Yoruba riddles, finding that Yoruba riddles with derived place names not only reflect the habits, characteristics, and personalities of the people of the relevant towns and cities but can also serve as a supplementary source of information for historical construction and reconstruction. Reference [6] analyzed in depth the derivational characteristics of suffixes in English and German, expanding our understanding of language evolution. Reference [7] proposed a new method for identifying semantic relationships of proper noun derivation in geographic entities, which has potential application value in place name management and translation. Reference [3] analyzed the concept of derived place names, derivation methods, characteristics, and the current state of transliteration of foreign derived place names, helping readers to understand derived place names comprehensively and systematically and providing a reference for those who transliterate foreign place names into Chinese characters. Reference [8] elaborated on three pairs of concepts in toponymic linguistics: old and new place names, primary and derived place names, and symmetric and compound place names. Reference [9] addressed the lack of annotation of derived place names in global geographic name data, including their corresponding primary place names, derivation categories, and positional relationships. This lack creates barriers to the translation, study, and tracing of derived place names and makes the manual identification of completely derived place names and the annotation of derivation information inefficient. To address it, the authors proposed an identification algorithm for common noun derivation and completely derived place names, improving the efficiency of translation. The research above examined the definition of place name derivation and the factors behind the genesis of derived place names, enriching the theoretical foundation of place name derivation and providing technical support for practical applications. However, there has been less research on the derivational relationships between primary place names and derived place names.
Semantics refers to the meaning inherent in a linguistic symbol, which aids readers in gaining a deeper understanding and comprehension of data. Understanding semantics can enhance people’s grasp of the meanings conveyed by language. Geographical name semantics refers to the meaning expressed by geographical symbols, encompassing the textual origins of place names. While place names, as symbolic representations, convey “what it is”, the semantics of place names, with the implied meaning, discuss “why it is so”. The semantics of place names, based on their constituent parts, include aspects such as spatial location, the meaning of the place name, its etymology, and administrative affiliation. Although semantics do not exist within the geographical entities and attributes themselves, when describing these entities and attributes using symbolic language, semantics permeate, expressing their intrinsic meanings. These intrinsic meanings are the result of the continuous accumulation of people’s cognition of the objective world in their living and growing environments.
In the study of place name semantics, reference [10] revealed the one-to-one correspondence between the meanings of the constituent morphemes of Igbo place names and their arrangement (syntactic structure) and demonstrated how place names are derived, which supports a deeper understanding of the history, origins, and culture of the ethnic group. Reference [11] conducted a morpho-syntactic and semantic analysis of the place names of the Luhya ethnic group in Bungoma County, western Kenya, using Fillmore’s frame semantics theory to determine whether the semantic elements in the place names reflect the historical functions and meanings of the names. The study showed that Luhya place names are generated through lexical rules and word transformations involving prefixation, compounding, and borrowing. Semantically, they are transparent and descriptive in function and are usually named after topographical features, historical events, climatic conditions, and prominent figures. Reference [12] investigated the distinctive meanings within place names by analyzing their constituent elements and classifying them morphologically, finding that Dholuo place names are formed primarily through derivational morphology, with compounding and inflection also playing a part. Reference [13] categorized types of villages and towns by mining the spatial information, naming characteristics, and spatial distribution of their place name semantics. It showed that village and town place names change less than urban place names and show a strong correspondence between the origin of the place name and the entity, and it used place name semantics to identify twenty-one characteristic forms of villages and towns in the Qinba mountain area. Reference [14] summarized the characteristics of standardized place name word formation and analyzed people’s cognitive habits towards place names. By calculating the semantic similarity of place names and the semantic consistency of the spatial topological relationships of geographical entities, it carried out comprehensive semantic consistency matching of place names, improving the accuracy and efficiency of place name semantic matching. Reference [15] established a standardized semantic knowledge base of common place names based on the relationship between the common names and types of place names in standardized Chinese place names, and used the semantic meanings it provides as an important indicator for place name similarity matching. The study of place name semantics reveals the deep meanings behind geographical symbols; it not only enriches the theoretical system of place name semantics but also allows the intrinsic value of place names to be explored more comprehensively.
Scholars have conducted research on place name derivation and place name semantics from various perspectives, yet existing studies lack discussion of the spatio-temporal derivation relationships of place names [10,11,12,13,14,15]. There has been insufficient exploration of how these relationships could enhance the connectivity between place names and improve the performance of spatial retrieval or geographic question-answering systems. As the application scenarios of place names continue to expand, they play an essential role in geographic information systems, not only serving as the core reference for spatial data positioning but also carrying a wealth of semantic information. The spatio-temporal derivation relationship, as a fundamental attribute of place names, is an important component of place name semantics and a crucial aspect of how people recognize and use place names. The lack of research on it has constrained further exploration and application of place names and their semantics. In this work, we introduce the concept of spatio-temporal derivation relationships for place names and express these connections formally. Through a rigorous definition and the establishment of clear criteria and identification techniques, we construct a network that captures the spatio-temporal derivation of place names. We then use this network to reason about spatial adjacency relationships.
The rest of this paper is organized as follows. In Section 2, we introduce the relevant work of this article. In Section 3, we provide a standardized definition of spatio-temporal derivation relationships of place names and outline the constraints of these relationships. In Section 4, based on place name semantics, we construct a spatio-temporal derivation network of place names to formally express spatio-temporal derivation relationships. In Section 5, we explain the reasoning of spatial adjacency relationships and spatial positions through the constructed network. In Section 6, we discuss the experimentation and analysis of the proposed methods, comparing and discussing their advantages and disadvantages.
3. Concept, Definition and Judgment Methods of the Spatio-Temporal Derivative Relationship of Place Names
3.1. Concept of Spatio-Temporal Derivative Relationships of Place Names
Places are named in diverse ways, and according to the naming method, place names can be categorized into the following types [16]: (1) Descriptive place names: Those that depict the geographical characteristics of geographical entities, mainly including place names that indicate geographical locations, describe natural landscapes, and explain natural resources. (2) Narrative place names: Those that reflect the characteristics of human geography, mainly including place names that narrate cultural landscapes, record ethnic identities, document historical facts and legends, and embody certain ideological concepts. (3) Primary and related place names: Place names are divided into primary and related place names based on the relationships between them, with related place names mainly including transformed place names, imitation place names, and derived place names. The derivation relationship between primary and derived place names reveals the connections between place names.
The term “derivation relationship” refers to the process of naming newly discovered geographical entities by affixing, combining, and condensing existing place names to establish a relationship between two place names. In this context, the existing place name is called the “primary place name”, and the newly formed place name through derivation is defined as the “derived place name”.
The derivation relationship can be further divided into inheritance derivation, influence derivation, and spatio-temporal derivation: (1) Inheritance derivation directly inherits the primary place name and adds a word such as “New” in front of it to create a new place name. For example, “New York” adds the word “New” to the existing place name “York” in the United Kingdom. (2) Influence derivation borrows the name of a well-known place for a new name; for example, “Yantai Road” in Jinan City, Shandong Province, is named after Yantai, a prefecture-level city in Shandong Province. (3) Spatio-temporal derivation occurs when a newly named geographical entity is located near an existing entity, so that the primary place name, in its most recent form, is borrowed as part of the proper noun of the new name. For example, “Peking University Subway Station” is so named because the subway station is located near “Peking University”.
In the aforementioned derivation relationships, the spatio-temporal derivation relationship can indicate the semantic relationship and spatial proximity between two place names. However, current research has not conducted a detailed discussion on the spatio-temporal derivation relationships of place names, and there is also a lack of standardized definitions. Therefore, based on summarizing existing research, this paper provides a definition of the spatio-temporal derivation relationship of place names; that is, when naming newly discovered geographical entities, people often combine the existing place names of surrounding natural or artificial geographical entities and generate new place names through the derivation of these existing place names. In this context, we define the relationship between the two place names as the “spatio-temporal derivation relationship of place names”. Its formal expression is as follows:
$$\forall\, t_p \in T_p' \;\; \exists\, t_d \in T_d' : \; R(t_p, t_d), \qquad T_p' \subseteq T_p,\; T_d' \subseteq T_d,\; \tau_p < \tau_d,\; d(t_p, t_d) \le \delta$$

In the formula, $T_p$ represents the set of primary place names, and $T_d$ represents the set of derived place names. $T_p'$ and $T_d'$ are sub-sets of $T_p$ and $T_d$, respectively. $R$ denotes the spatio-temporal derivation relationship. For each primary place name $t_p$ in the set $T_p'$, there is at least one derived place name $t_d$ in the set $T_d'$ such that the two place names satisfy the spatio-temporal derivation relationship $R$. That is, each primary place name has at least one associated derived place name, and they are connected through the spatio-temporal derivation relationship $R$. The symbol $\tau_p$ denotes the time at which the primary place name was generated, while $\tau_d$ denotes the time at which the derived place name emerged. Since derived place names are generated from primary place names through derivation, the primary place name originates before the derived place name; that is, $\tau_p < \tau_d$. The term $d(t_p, t_d) \le \delta$ represents the spatial distance constraint between the geographical entities denoted by the primary and derived place names, indicating a certain proximity.
Derived place names named using the spatio-temporal derivation relationship not only retain the geographical information that reflects the surrounding environment but also imply the relative positional relationship between the geographical entity and its neighboring entities, embodying the location function of place names. As a special type of place name relationship, the spatio-temporal derivation relationship can simultaneously represent the semantic and spatio-temporal connections between two place names. Therefore, identifying the derivation relationship between place names can not only enrich the expression of place name semantics but also more accurately retrieve geographical information.
In addition, other place name-related concepts involved in this paper are as follows:
Concept One: A primary geographical entity refers to the entity denoted by a primary place name.
Concept Two: A derived geographical entity refers to the entity denoted by a derived place name.
3.2. Definition of the Spatio-Temporal Derivative Relationship of Place Names
The determination of spatio-temporal derivation relationships of place names primarily involves constraints in both semantic and spatial aspects. First, the spatio-temporal derivation relationship and its components are formally expressed, and the semantic constraints are analyzed [17].

$$t = s + g, \qquad t_p = s_p + g_p, \qquad t_d = s_d + g_d, \qquad c_p, c_d \in C, \qquad R = \{R_c, R_i\}$$

In the formula, place name $t$ is composed of the proper noun $s$ and the common noun $g$; that is, $t = s + g$; $t_p$ represents the primary place name, which is composed of the primary proper noun $s_p$ and the primary common noun $g_p$; $t_d$ represents the derived place name, which is composed of the derived proper noun $s_d$ and the derived common noun $g_d$; $C$ represents the set of categories, $c_p$ represents the category of the primary geographical entity, and $c_d$ represents the category of the derived geographical entity; the spatio-temporal derivation relationship $R$ includes the complete derivation relationship $R_c$ and the incomplete derivation relationship $R_i$.
A derived place name with a complete derivation relationship is composed of the primary proper noun $s_p$, the primary common noun $g_p$, and the derived common noun $g_d$; a derived place name with an incomplete derivation relationship is composed of the primary proper noun $s_p$, the primary common noun $g_p$, the derived proper noun $s_d$, and the derived common noun $g_d$. In addition, the derivation relationship requires that the categories of the two geographical entities differ; that is, $c_p \neq c_d$. Therefore, the semantic constraint requires that the derived place name include the primary place name and that the geographical entities denoted by the two place names do not belong to the same category.
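The semantic constraint described above can be illustrated with a minimal sketch. It assumes each place name has already been split into the part before its common noun and the common noun itself, and that English names are joined by single spaces. The `PlaceName` class, the `derivation_type` function, the category labels, and the "East Gate" example name are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceName:
    proper: str    # everything before the common noun (assumed already split)
    common: str    # common noun, e.g. "University" or "Subway Station"
    category: str  # category of the geographical entity (hypothetical labels)

    @property
    def full(self) -> str:
        """Full place name: proper part followed by the common noun."""
        return f"{self.proper} {self.common}".strip()


def derivation_type(primary: PlaceName, derived: PlaceName) -> str:
    """Return 'complete', 'incomplete', or 'none' under the semantic constraint."""
    # Category constraint: the two entities must belong to different categories.
    if primary.category == derived.category:
        return "none"
    # Complete derivation: primary name + derived common noun, nothing else.
    if derived.proper == primary.full:
        return "complete"
    # Incomplete derivation: primary name plus an additional proper-noun part.
    if primary.full in derived.proper:
        return "incomplete"
    return "none"


university = PlaceName("Peking", "University", "university")
station = PlaceName("Peking University", "Subway Station", "station")
east_gate = PlaceName("Peking University East Gate", "Subway Station", "station")

print(derivation_type(university, station))    # complete
print(derivation_type(university, east_gate))  # incomplete
```

The sketch checks only name inclusion and category difference; the spatial constraint is a separate step.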
In terms of spatial distribution, the derived geographical entities should be clustered around the primary geographical entities and have a certain adjacency relationship with them. As shown in Figure 1, if the spatial topological relationship between the two geographical entities is containment or adjacency, the two geographical entities are spatially adjacent. Therefore, provided that the two place names meet the semantic constraints, it can be directly determined that they have a spatio-temporal derivation relationship. If the spatial topological relationship between the two geographical entities is separation, whether they have a spatio-temporal derivation relationship is determined by whether the spatial distance between them meets a specific spatial constraint distance.
The constraint distance between geographical entities is affected by the category of the primary geographical entity, and different categories of geographical entities have different spatial influence ranges. Therefore, it is necessary to determine the spatial distance constraint for this category in conjunction with the category of the primary geographical entity.
3.3. Methods for Determining the Derivative Relationship of Place Names
3.3.1. Semantic Constraint Judgment
(1) Semantic Similarity
Semantic similarity is a metric for the degree of closeness in meaning between two texts. The smaller the value, the greater the semantic difference between the texts; conversely, the larger the value, the higher the semantic similarity between the texts [18,19]. A distinctive feature of a primary place name and its derived place name is the similarity of their names. Therefore, this paper uses semantic similarity to calculate the similarity between two place names and to preliminarily determine the semantic relationship between them from the similarity score.
Given that place names are expressed as strings, this paper employs a sequence comparison-based method to calculate the similarity between two place names. The method identifies the longest common contiguous character sequence between two strings, referred to here as the longest common sub-sequence (LCS) [20]. It is fast to compute, takes the length of the sequence into account, and normalizes the results so that outcomes can be compared directly. The formula is as follows:
In the formula, LCS refers to the longest common sub-sequence between the two place names, s1 represents the primary place name, and s2 represents the proper noun of the derived place name. The similarity score is normalized to the range 0 to 1. A score of 0 indicates that the two strings share no common sub-sequence, meaning that the two place names have only a common place name relationship (that is, no derivation); a score of 1 indicates that the two strings are identical, meaning that the two place names have a complete derived place name relationship; any other score indicates that the two strings are partially identical, suggesting that the two place names have an incomplete derived place name relationship.
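As an illustration of the similarity score described above, the following sketch computes the longest common contiguous run of characters with dynamic programming and normalizes it to the range 0 to 1. The normalization used here, twice the LCS length divided by the sum of the two string lengths, is an assumption made for this sketch; it yields 0 for strings with nothing in common and 1 for identical strings, but it is not necessarily the normalization used in the paper. The function names are hypothetical.

```python
def longest_common_run(a: str, b: str) -> int:
    """Length of the longest contiguous character sequence shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: common run ending at previous char of a and b[j-1]
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1  # extend the run ending at the previous characters
                best = max(best, curr[j])
        prev = curr
    return best


def lcs_similarity(s1: str, s2: str) -> float:
    """Similarity in [0, 1]; normalization 2*|LCS| / (|s1| + |s2|) is assumed."""
    if not s1 or not s2:
        return 0.0
    return 2 * longest_common_run(s1, s2) / (len(s1) + len(s2))


primary = "Peking University"
print(lcs_similarity(primary, "Peking University"))            # identical strings
print(lcs_similarity(primary, "Peking University East Gate"))  # partial match
print(lcs_similarity(primary, "Zoo"))                          # no shared characters
```

Comparison here is case-sensitive and character-based; any lower-casing or tokenization before comparison would be an additional design choice.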
(2) Category Ontology
The place name data studied in this paper were publicly obtained from OpenStreetMap (OSM). The classification of data in OSM is primarily based on the characteristics and uses of geographical elements. This classification is implemented through tags, where each geographical element can be associated with one or more tags to describe its attributes, features, and uses. In OSM, the classification is typically divided into two levels: major categories and sub-categories. Sub-categories are refinements of major categories. For example: tag: shopping = supermarket, tag: shopping = clothes, where supermarket and clothes are refinements of the category shopping. Different combinations of tags can be used to define different categories, thereby achieving a more nuanced classification.
By preprocessing the acquired tags, redundant ones are eliminated, and category levels are defined, that is, the types of relationships between various categories. Among these, there is an inheritance relationship between major categories and sub-categories; for example, supermarket inherits from shopping. Sub-categories have a sibling relationship, meaning that supermarket and clothes have a sibling category relationship. For two place names with spatio-temporal derivation relationships, they are of different category derivation, so the categories to which they belong cannot be the same; that is, the categories of the two place names being different satisfies the category constraint.
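The category levels described above can be sketched as a simple parent map. This is only an in-memory illustration, not the graph-database representation; the `PARENT` dictionary, the function names, and the `education`/`university` entries are hypothetical, while `shopping`, `supermarket`, and `clothes` follow the example in the text.

```python
from typing import Optional

# Sub-category -> major category (inheritance relationship).
PARENT = {
    "supermarket": "shopping",
    "clothes": "shopping",
    "university": "education",  # hypothetical entry for illustration
}


def parent_of(category: str) -> Optional[str]:
    """Major category that a sub-category inherits from, if known."""
    return PARENT.get(category)


def are_siblings(a: str, b: str) -> bool:
    """Two distinct sub-categories are siblings when they share a major category."""
    return a != b and parent_of(a) is not None and parent_of(a) == parent_of(b)


def satisfies_category_constraint(primary_cat: str, derived_cat: str) -> bool:
    """The category constraint holds when the two categories differ."""
    return primary_cat != derived_cat


print(are_siblings("supermarket", "clothes"))                     # True
print(satisfies_category_constraint("university", "university"))  # False
```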
To store and manage the category ontology effectively, this paper employs the Neo4j graph database. Mapping the ontology information into the graph database allows the ontology to be stored and queried efficiently [21].
3.3.2. Spatial Constraint Judgment
(1) Topological Relationships
This paper first adopts the dimensionally extended nine-intersection model proposed by Clementini [22]. The model constructs a relationship matrix from the dimensions (DIM) of the intersections of the interior (I), boundary (B), and exterior (E) of geographical entities a and b, from which the topological relationships of the geographical entities are extracted [23].

$$
\begin{bmatrix}
\mathrm{DIM}(I(a) \cap I(b)) & \mathrm{DIM}(I(a) \cap B(b)) & \mathrm{DIM}(I(a) \cap E(b)) \\
\mathrm{DIM}(B(a) \cap I(b)) & \mathrm{DIM}(B(a) \cap B(b)) & \mathrm{DIM}(B(a) \cap E(b)) \\
\mathrm{DIM}(E(a) \cap I(b)) & \mathrm{DIM}(E(a) \cap B(b)) & \mathrm{DIM}(E(a) \cap E(b))
\end{bmatrix}
$$

In the formula, a and b represent two geographical entities; I, B, and E represent the interior, boundary, and exterior of the geographical entities, respectively; and DIM indicates the dimensions [24]. Considering the characteristics of spatio-temporal derivation relationships, this study focuses on three basic topological relationships: disjoint, adjacent, and containment.
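As an illustration of how the three relationships can be read off a nine-intersection matrix, the sketch below classifies a matrix given as a nine-character string in row-major order (II, IB, IE, BI, BB, BE, EI, EB, EE), where `F` marks an empty intersection and `0`, `1`, or `2` its dimension. Treating "adjacent" as the standard touches pattern and "containment" as the standard contains or within patterns is an assumption of this sketch, and the function name is hypothetical.

```python
def classify_topology(de9im: str) -> str:
    """Map a 9-character intersection matrix to disjoint / adjacent / containment."""
    if len(de9im) != 9:
        raise ValueError("expected a 9-character matrix string")
    ii, ib, ie, bi, bb, be, ei, eb, _ee = de9im
    # Disjoint: interiors and boundaries do not meet at all (pattern FF*FF****).
    if ii == ib == bi == bb == "F":
        return "disjoint"
    # Adjacent: interiors do not meet, but a boundary is involved (touches).
    if ii == "F":
        return "adjacent"
    # Containment: a contains b (T*****FF*) or b contains a (T*F**F***).
    if (ei == "F" and eb == "F") or (ie == "F" and be == "F"):
        return "containment"
    return "other"


print(classify_topology("FF2FF1212"))  # two separate polygons -> disjoint
print(classify_topology("FF2F11212"))  # polygons sharing an edge -> adjacent
print(classify_topology("0F2FF1FF2"))  # polygon with a point inside -> containment
```

Relationships outside the three of interest, such as partial overlap, fall into the `other` branch.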
(2) Spatial Measurement
To quantify the spatial relationships between geographical entities accurately, this paper represents the distance between entities numerically. Because the geographical entities denoted by place names include point, line, and area features, the distance is calculated by first extracting the centroid of area and line features and then calculating the distance between the centroid and the point features. The formula for the distance between geographical entities is as follows:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

In the formula, $(x_1, y_1)$ and $(x_2, y_2)$ represent the spatial coordinate values of the geographical entities denoted by the two place names, respectively, and $d$ represents the distance between the two entities.
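The centroid-then-distance procedure can be sketched as follows. The sketch assumes planar (projected) coordinates and straight-line Euclidean distance, which may differ from the measure used in the paper; it computes an area-weighted centroid for polygons with the shoelace formula and a length-weighted centroid for lines. All function names are hypothetical.

```python
import math

Point = tuple[float, float]


def polygon_centroid(ring: list[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return cx / (3 * area2), cy / (3 * area2)


def line_centroid(path: list[Point]) -> Point:
    """Length-weighted centroid of a polyline."""
    total = sx = sy = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        total += seg
        sx += seg * (x0 + x1) / 2
        sy += seg * (y0 + y1) / 2
    if total == 0:
        raise ValueError("degenerate line")
    return sx / total, sy / total


def distance(p: Point, q: Point) -> float:
    """Planar Euclidean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


campus = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]  # area feature
station = (4.0, 5.0)                                        # point feature
print(distance(polygon_centroid(campus), station))          # 5.0
```

For longitude/latitude data, the coordinates would first need to be projected, or a geodesic distance substituted for `distance`.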
(3) Decision Tree Model
The decision tree (DT) model is a supervised learning algorithm that constructs a tree-like model through a top-down recursive process. The objective of this model is to learn decision rules from the training data to predict the label values of the target variable. The classification method based on the decision tree model is straightforward and easy to understand and interpret. Additionally, as a very fast learning and prediction algorithm, it can provide high efficiency for text classification and is suitable for classifying large-scale text data scenarios.
The CART decision tree classification method is characterized by its convenience, understandability, and high efficiency, making it one of the mainstream classification methods today [25,26]. In this study, because geographical entities of different categories have different ranges of influence, the category of the primary geographical entity and the distance between the two geographical entities are taken as the features for decision tree classification. A decision tree model is constructed and trained to determine the spatial constraints for place names that already satisfy the semantic relationship.
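To make the role of the distance feature concrete, the sketch below finds a single distance threshold for one category by minimizing weighted Gini impurity, which is the split criterion CART uses for classification. It is a one-split illustration under stated assumptions, not a full CART tree and not the trained model from this study; the function names and the sample distances and labels are invented for the example.

```python
from typing import Optional


def gini(labels: list[int]) -> float:
    """Gini impurity of binary labels (1 = derivation, 0 = no derivation)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_distance_threshold(samples: list[tuple[float, int]]) -> Optional[float]:
    """Threshold on distance that minimizes the weighted Gini impurity of the split."""
    distances = sorted({d for d, _ in samples})
    best_score, best_thr = float("inf"), None
    # Candidate thresholds are midpoints between consecutive distinct distances.
    for lo, hi in zip(distances, distances[1:]):
        thr = (lo + hi) / 2
        left = [y for d, y in samples if d <= thr]
        right = [y for d, y in samples if d > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
        if score < best_score:
            best_score, best_thr = score, thr
    return best_thr


# Invented (distance, label) pairs for a single primary-entity category.
university_pairs = [(50.0, 1), (120.0, 1), (300.0, 1), (800.0, 0), (1500.0, 0)]
print(best_distance_threshold(university_pairs))  # 550.0
```

Repeating the search per category would give one constraint distance for each category of primary entity; a full tree would instead learn the category and distance splits jointly.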
In summary, the identification of spatio-temporal derivation relationships of place names requires consideration of both semantic and spatial dimensions. In this process (Figure 2), semantic constraints form the basic premise for derivation identification. If two place names do not meet the semantic constraints, there is no need to further explore their spatial relationships. This hierarchical identification method can improve the accuracy and efficiency of identification, avoiding ineffective spatial analysis between obviously unrelated pairs of place names.
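The hierarchical procedure summarized above can be sketched as a single decision function. It assumes that the name similarity, the categories, the topological relationship, and the distance have already been computed, and that a per-category constraint distance is available; the function name, the topology labels, and the threshold value are hypothetical, and the temporal ordering of the two names is not checked here.

```python
from typing import Optional


def has_spatiotemporal_derivation(
    similarity: float,
    primary_cat: str,
    derived_cat: str,
    topology: str,                 # "containment", "adjacent", or "disjoint"
    distance: Optional[float],     # only needed when topology is "disjoint"
    thresholds: dict[str, float],  # constraint distance per primary category
) -> bool:
    # Step 1: semantic constraints. Fail fast so no spatial analysis is wasted.
    if similarity <= 0.0 or primary_cat == derived_cat:
        return False
    # Step 2: containment or adjacency implies spatial adjacency directly.
    if topology in ("containment", "adjacent"):
        return True
    # Step 3: disjoint entities must lie within the category's constraint distance.
    if topology == "disjoint":
        limit = thresholds.get(primary_cat)
        return limit is not None and distance is not None and distance <= limit
    raise ValueError(f"unknown topology: {topology}")


limits = {"university": 500.0}  # hypothetical constraint distance
print(has_spatiotemporal_derivation(1.0, "university", "station", "disjoint", 300.0, limits))  # True
print(has_spatiotemporal_derivation(1.0, "university", "station", "disjoint", 900.0, limits))  # False
```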
7. Conclusions
This research first provides a standardized definition of spatio-temporal derivation relationships of place names, establishes the criteria and identification methods for these relationships, and constructs the corresponding network of spatio-temporal derivation relationships of place names. Through this network, inference of spatial adjacency relationships was conducted, thereby providing an effective approach to enhance the expression of place name semantics and the retrieval of geographic information. Using the method proposed in this paper to identify the spatio-temporal derivation relationships of Canadian place names, the precision rate reached 98.5%, and the recall rate was 93.4%. Furthermore, inference of spatial adjacency relationships through the network constructed in this paper can enhance the accuracy of existing quantitative query results, and the reasoning of spatial locations provides a solution for data not yet included in the repository.
In addition, this paper also has limitations such as incomplete identification of spatio-temporal derivation relationships of place names and insufficient coverage of place name data sources. Therefore, in further research, the sources of place name data can be expanded, and mappings between category systems of different sources of place name data can be increased to construct a more comprehensive multi-source network of spatio-temporal derivation relationships of place names.