1. Introduction
With the rise of advanced technologies like the Internet of Things (IoT) and augmented reality (AR), there is a growing demand for efficient and accurate indoor navigation systems that can support a wide range of applications [1,2,3], from guiding users through complex buildings to ensuring safe, wheelchair-accessible pathways. At the core of these systems is the need for precise indoor navigation models, which represent the spatial and semantic structure of indoor environments.
In recent years, various indoor navigation models have been developed, mainly of three types [4,5]: the network model [6,7,8,9], the grid/voxel model [10,11], and the space entity model [12]. These models, along with their development methods, each have unique advantages and limitations in representing complex indoor environments, making it essential to choose both the appropriate model and construction method based on the specific requirements of different navigation applications, such as low development cost, high efficiency, and robust performance in real-world use. The network model, in particular, is widely favored for constructing indoor navigation systems owing to its simplicity, flexibility, and efficiency. Moreover, recent advancements in 3D mapping technologies, together with the development of deep learning, have opened new avenues for the automatic extraction of indoor navigation network models [2,13,14].
In the field of 3D mapping, the availability of low-cost sensor devices such as Apple smart devices [15,16], the Microsoft HoloLens [17,18,19], and depth cameras [20,21,22] allows an increasing number of users, even those without professional knowledge, to perform 3D mapping and obtain indoor spatial information [15]. This not only reduces the cost of indoor mapping but also allows more non-experts to participate. Furthermore, with the rapid development of deep learning technologies [23,24,25], it has become possible to train 3D point cloud semantic segmentation models that automatically recognize and classify the various elements within indoor spaces, producing semantic point clouds with a reasonable degree of accuracy. Semantic point clouds, which contain detailed geometric and semantic information about indoor environments, offer significant potential for extracting indoor navigation elements. However, their quality can vary substantially due to differences in data acquisition methods, annotation techniques, and segmentation models, often resulting in inconsistencies, noise, and incomplete semantic labels. Consequently, extracting accurate navigation networks from such point clouds often requires manual intervention during the preprocessing stage. However, to the best of our knowledge, no standardized preprocessing framework currently exists to guide this process.
While research on extracting indoor navigation networks from semantic point clouds remains limited, some progress has been made. For example, Yang et al. [26] proposed a semantics-guided method for reconstructing indoor navigation networks directly from RGB-D sensor data. However, their research does not fully address the challenges posed by multi-floor environments, particularly the accurate representation of transitional spaces such as staircases and elevators. These transitional spaces are critical for seamless vertical connectivity and play a crucial role in various real-world applications. For example, in robotics, an accurate navigation network of such spaces facilitates autonomous navigation across multiple floors. Additionally, in emergency response systems, precise information about transitional spaces enables responders to quickly traverse complex indoor environments, thereby improving the success rate of rescue operations.
Specifically, staircases pose a notable challenge for automatically extracting their direction and layout due to their complex geometries, variable step structures, and possible direction changes, especially in unstructured point cloud data. While current methods can automatically identify staircase areas to some extent, representing a staircase with a single node or a single straight line in the navigation network fails to accurately reflect its actual path [6,27,28], and manually extracting detailed staircase nodes [5] is time-consuming and may not be suitable for large-scale indoor navigation network extraction. Using Building Information Modeling (BIM) data, Qiu et al. [29] proposed an automatic method to generate comprehensive 3D indoor maps, integrating both floor-level and cross-floor paths through a hybrid grid-topological map approach. Their method effectively adapts to complex building geometries and achieves high accuracy in extracting paths across multi-floor structures. However, point cloud data can be acquired more directly than structured BIM data, and developing an automatic, efficient method that accurately captures the actual paths of staircases from 3D point clouds and adapts to different staircase designs still requires further research.
Therefore, the main challenges in accurately extracting indoor navigation networks from semantic point clouds can be summarized into two key aspects: (1) developing a standardized preprocessing framework to ensure the quality and reliability of the data for navigation network extraction; and (2) effectively representing complex multi-floor transitional spaces, particularly staircases, in a way that accurately and efficiently captures their true paths and layouts.
To address the challenges above, in this work, we mainly focused on the following aspects:
- (1) Introducing a preprocessing rule to standardize and refine the semantic point cloud data, ensuring its accuracy and consistency for reliable navigation network extraction.
- (2) Developing a moving window method to ensure that the extracted navigation network aligns more closely with the actual structure of staircases.
- (3) Designing a lightweight data storage structure in the JavaScript Object Notation (JSON) format [30] specifically to store navigation network information extracted from 3D semantic point clouds.
Additionally, for indoor navigable space classification, this work referenced the IndoorGML standard [31], which divides navigable space into general space, connection space, and transition space. For experimental evaluation, besides using the public Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS) [32], we further validated the effectiveness and robustness of our indoor navigation network extraction algorithm on our own dataset.
In the following, Section 2 outlines the methodology for extracting JSON-encoded navigation networks from 3D semantic point clouds, including data preprocessing, space classification, and navigation network generation. Section 3 presents the experimental process and results. Section 4 provides a discussion and analysis of the results, and Section 5 concludes the study and suggests future work.
2. Methodology
The goal of this study is to develop an automatic algorithm to extract indoor navigation networks from 3D semantic point clouds.
Figure 1 illustrates the pipeline of the proposed method. In the following sections, each step of the proposed method will be introduced in detail.
2.1. Data Preprocessing
In this study, the purpose of data preprocessing is to establish specific rules for generating standardized semantic point clouds that are suitable for our indoor navigation network extraction algorithm, while also ensuring consistency in the file storage structure.
To ensure that the semantic point cloud data input to the algorithm are of good quality, we conducted data preprocessing floor-wise and manually segmented the data to obtain the required semantic point clouds. All data segmentation and semantic labeling were conducted in CloudCompare software, version 2.13 [33]. In association with the space classification in Section 2.2, we conducted data preprocessing according to the characteristics of the different navigation spaces, with a particular focus on hallways and stairwells, as well as the supplementation of virtual doors. Additionally, the preprocessing step removed noise points from the data, ensuring higher accuracy and reliability of the semantic point clouds.
Hallways are important passages connecting rooms and other indoor areas. Thus, the accurate segmentation and labeling of hallways are important for indoor navigation element extraction and network construction. Our preprocessing of hallways mainly focuses on those with turns (i.e., close to 90°) in the original data. If a hallway has turns along its centerline, it is treated as a composite hallway. Such composite hallways can complicate indoor navigation element extraction and the construction of navigation networks. Therefore, we segmented these composite hallways into multiple simple, straight hallways without turns, as illustrated in Figure 2.
In general, a stairwell consists mainly of the staircase and the landing. The staircase is the key component that establishes the topological connection between different floors. In this study, we mainly preprocessed the data for the staircase part and extracted the staircase elements from the stairwell individually. Additionally, we also extracted the elevator element from the elevator lobby. The extracted staircase and elevator are illustrated in Figure 3.
Furthermore, we also extracted the doors corresponding to each type of space, such as the doors leading to office rooms or those associated with hallways. It is important to note that indoor spaces contain not only physical doors but also walkable shared boundaries, for example, the walkable shared boundary between two adjacent hallways, or that between a room and a hallway without a physical door. Yang et al. [26] defined these walkable shared boundaries as virtual doors. Although virtual doors do not look like physical doors, they play an equally important role in connecting adjacent indoor spaces.
In our study, we adopted this definition, using virtual doors to connect adjacent indoor spaces that have walkable shared boundaries. Users can also move from one space to another through these virtual doors. These virtual doors, along with physical doors, are part of the connection space. Through data preprocessing, we individually extracted these doors and virtual doors and saved them in their respective spatial directories. As an example, the extracted doors and virtual doors from the hallway are illustrated in Figure 4. It is worth noting that, because our virtual doors are manually segmented, we did not apply a uniform thickness standard during the segmentation process. As a result, in some cases, the thickness of the virtual doors may vary. However, this difference does not affect the primary function of the virtual doors in the navigation network, which is to connect adjacent spaces that lack physical doors but have walkable shared boundaries.
Additionally, to ensure the consistency of the file storage structure, all data are saved following a uniform file tree structure. For example, a hypothetical Floor 1 scene of a building includes 2 hallways, 1 stairwell, 1 elevator lobby, 1 office, and 1 conference room. The corresponding file tree for the semantic point clouds in this scene is shown in Figure 5. Taking stairwell_1 as an example, stairwell_1.txt corresponds to the point cloud file for the stairwell, while the annotations folder contains the point cloud files for the doors and staircase associated with the stairwell. Through data preprocessing, we obtained the refined semantic point cloud files, ensuring that the input files to the algorithm follow the same directory structure.
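To make this structure concrete, the following minimal Python sketch shows one way the per-floor directory tree described above could be traversed to gather each space's point cloud and its annotation files. The folder and function names (e.g., Floor_1, collect_spaces) are hypothetical illustrations of the expected layout, not part of the proposed algorithm.

```python
import os

def collect_spaces(floor_dir):
    """Gather per-space point cloud files and their annotation files.

    Assumes the layout described above, e.g.:
      Floor_1/stairwell_1/stairwell_1.txt
      Floor_1/stairwell_1/annotations/door_1.txt, staircase_1.txt, ...
    """
    spaces = {}
    for space_name in sorted(os.listdir(floor_dir)):
        space_dir = os.path.join(floor_dir, space_name)
        if not os.path.isdir(space_dir):
            continue
        space_cloud = os.path.join(space_dir, space_name + ".txt")
        ann_dir = os.path.join(space_dir, "annotations")
        annotations = []
        if os.path.isdir(ann_dir):
            annotations = [os.path.join(ann_dir, f)
                           for f in sorted(os.listdir(ann_dir))
                           if f.endswith(".txt")]
        spaces[space_name] = {"cloud": space_cloud, "annotations": annotations}
    return spaces
```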
2.2. Space Classification
For the classification of point clouds, we adopted the key concept of navigable space classification from the IndoorGML standard [31]. We classified the point clouds into general space (e.g., offices, conference rooms, and storage), connection space (e.g., doors and virtual doors), and transition space (e.g., hallways and stairwells), as shown in Table 1. These three space categories together form the navigable space, which is an important component of the indoor navigation network.
However, the number of space categories can be adjusted based on specific applications or objectives. For example, if it is necessary to highlight certain areas such as building entrances, emergency exits, or water closets (WCs), additional space types can be introduced. In this work, the detailed connectivity provided by connection spaces and transition spaces is sufficient for constructing an effective indoor navigation network, so we classified the point clouds into just these three categories.
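As a minimal sketch of this classification step, the mapping below assigns semantic labels to the three navigable space categories; the label names are assumptions based on the examples in Table 1 and could be extended for other applications (e.g., with entrances or WCs).

```python
# Hypothetical mapping from semantic space labels to navigable space categories.
NAVIGABLE_TYPE = {
    "office": "general space",
    "conference_room": "general space",
    "storage": "general space",
    "door": "connection space",
    "virtual_door": "connection space",
    "hallway": "transition space",
    "stairwell": "transition space",
    "staircase": "transition space",
    "elevator_lobby": "transition space",
}

def classify_space(space_name: str) -> str:
    """Infer the navigable type from a space directory name such as 'hallway_2'."""
    base = space_name.rsplit("_", 1)[0]  # strip the trailing index
    return NAVIGABLE_TYPE.get(base, "general space")
```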
2.3. Node Extraction and Edge Construction
In this section, we detail the process of extracting nodes from 3D semantic point clouds. Nodes representing spaces with connected topological relationships are subsequently linked to form edges. The nodes and edges form the backbone of our indoor navigation network, representing key locations and the paths connecting them.
The theoretical foundation of our approach relies on two key concepts: the Node-Relation Graph (NRG) [6] and Poincaré duality [34]. The NRG is a framework for representing topological relationships, such as adjacency and connectivity, among indoor objects. It implements these relationships as a graph without requiring geometrical properties, enabling efficient path planning in indoor navigation systems. Poincaré duality provides the theoretical basis for mapping indoor spaces to an NRG: a k-dimensional object in N-dimensional primal space maps to an (N−k)-dimensional object in dual space. For example, 3D rooms are represented as 0D nodes, and the 2D surfaces between rooms become 1D edges. This simplification allows for a combined topological network model that effectively manages complex spatial relationships indoors.
Because of their different structural features, the various types of navigable spaces also differ in importance; for example, hallways connecting many rooms and staircases connecting different floors are more important than general and connection spaces. Therefore, the node extraction strategies for general space, connection space, and transition space differ slightly.
For each semantic point cloud representing general, connection, or some transition spaces (e.g., elevator lobbies), we extract the centroid to serve as the node corresponding to that space in the NRG. For a group of points $P = \{p_1, p_2, \ldots, p_n\}$ in a semantic point cloud (e.g., offices, doors), where $p_i = (x_i, y_i, z_i)$, the centroid is calculated as

$$C = \frac{1}{n} \sum_{i=1}^{n} p_i \quad (1)$$
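As a small illustration of this step (a sketch, assuming each space's point cloud is stored as an N × 3 array of x, y, z coordinates in a plain text file), the centroid of Equation (1) can be computed as follows:

```python
import numpy as np

def centroid(points: np.ndarray) -> np.ndarray:
    """Equation (1): mean of the points, where each row is (x, y, z)."""
    return points.mean(axis=0)

# Example usage with a hypothetical file name:
# points = np.loadtxt("office_1.txt")[:, :3]
# node_position = centroid(points)
```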
However, for staircases and hallways, which belong to the transition space, a single representative node sometimes fails to accurately reflect the geometric relationships, making it impossible to generate the shortest paths. Since hallways can be long and connect many rooms, and staircases link different floors, it is necessary to refine the nodes for both hallways and staircases. In the following, we detail the extraction of nodes for staircases and hallways in Section 2.3.1 and Section 2.3.2, respectively.
2.3.1. Node Extraction and Edge Construction for Staircases
In this study, considering the vertical height changes and the more complex 3D structure of staircases, we designed a dedicated algorithm for extracting the nodes and edges of staircases for each floor.
The first step involves extracting the centerline of the staircase, which is achieved using a moving window approach along the Z-axis, as illustrated in Figure 6. This method first calculates the bounding box of the point cloud data representing the staircase to determine its spatial extent. The point cloud data are then processed in slices incrementally along the Z-axis. For each slice within the bounding box, we identify the points that fall within the current Z-range and calculate their centroid using Equation (1). These centroids form the centerline points of the staircase.
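A minimal sketch of this moving-window slicing is given below; the slice height is an assumed illustrative parameter, not a value prescribed in this paper.

```python
import numpy as np

def staircase_centerline(points: np.ndarray, slice_height: float = 0.2) -> np.ndarray:
    """Slice the staircase point cloud along Z and take the centroid of each slice."""
    z_min, z_max = points[:, 2].min(), points[:, 2].max()
    centerline = []
    z = z_min
    while z < z_max:
        mask = (points[:, 2] >= z) & (points[:, 2] < z + slice_height)
        if mask.any():
            centerline.append(points[mask].mean(axis=0))  # slice centroid, Equation (1)
        z += slice_height
    return np.asarray(centerline)
```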
To ensure the extracted centerline is smooth and continuous, we adjust each point on the centerline by considering the positions of its neighboring points, resulting in a more regular representation of the staircase path.
Specifically, given a sequence of centerline points $\{c_1, c_2, \ldots, c_m\}$, where $i$ denotes the index of each point along the centerline, for each interior point $c_i$ (i.e., excluding the first and last points), the new smoothed position $c_i'$ is calculated by blending the current point with the average of its neighboring points:

$$c_i' = (1 - \alpha)\, c_i + \alpha \, \frac{c_{i-1} + c_{i+1}}{2} \quad (2)$$

In Equation (2), $c_i$ is the current point being smoothed, and $c_{i-1}$ and $c_{i+1}$ are the previous and next points, respectively. α is the smoothing factor, controlling the influence of the neighboring points: a smaller value of α results in less smoothing, while a larger value increases the smoothing effect.
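The following sketch applies Equation (2) to the centerline points extracted above; the default value of α is an assumption chosen only for illustration.

```python
import numpy as np

def smooth_centerline(centerline: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Equation (2): blend each interior point with the mean of its two neighbors."""
    smoothed = centerline.copy()
    for i in range(1, len(centerline) - 1):
        neighbor_mean = (centerline[i - 1] + centerline[i + 1]) / 2.0
        smoothed[i] = (1.0 - alpha) * centerline[i] + alpha * neighbor_mean
    return smoothed
```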
Once the centerline is extracted and smoothed, nodes are created at each point along the smoothed centerline, each assigned an ID and labeled as a staircase node, with coordinates reflecting its position on the centerline. Edges are created between consecutive nodes along the centerline, establishing a connected path. Additionally, nodes representing doors in the stairwell space are connected to the first node of the staircase.
To maintain consistency in the vertical dimension, the Z-values of non-staircase nodes are adjusted to match the minimum Z-value of the staircase nodes. This ensures that all extracted non-staircase nodes on the same floor have a uniform height with respect to the minimum Z-value of the staircase node. We first extract the navigation network structure for each floor separately and then connect these structures through the staircase nodes.
2.3.2. Node Extraction and Edge Construction for Hallways
For a more practical extraction of navigation routes in hallways, Jianga et al. [35] proposed a method for subdividing corridors based on the positions of doors. By calculating the cutting points between doors and projecting perpendicular lines, they divide the hallway into several segments, thereby generating a more detailed and accurate navigation route. However, this method relies on 2D CAD files for navigation network generation and may not be directly applicable to 3D point clouds. Yang et al. [26] also extracted sub-nodes along the hallway; one key step of their method is determining the hallway centerline, followed by projecting the positions of doors onto the centerline for sub-node extraction, but they did not provide details on how the centerline of the hallway is calculated.
In this study, we used principal component analysis (PCA) [36] to determine the main direction of the hallway. As shown in Figure 7, the red arrow indicates the main direction, which corresponds to the primary axis and aligns with the longer direction of the hallway. The yellow arrow represents the secondary direction, which is perpendicular to the primary axis. By identifying the primary axis of variance in the point cloud data, we projected the points onto this axis to define the centerline. However, we observed that the centerline extracted directly using PCA occasionally drifted towards one side of the hallway. This drift occurs mainly due to the asymmetric or irregular distribution of the point cloud, as the direction of maximum variance calculated by PCA may not always perfectly align with the geometric center of the hallway. To optimize the extraction results, we introduced the centroid of the hallway, calculated by Equation (1), as a reference.
Let the primary direction vector obtained from PCA be denoted as $\mathbf{v}$ and the centroid of the hallway as $C$, and let the minimum and maximum projection values of the points along the primary direction be $t_{\min}$ and $t_{\max}$, respectively. Without correction, the endpoints derived from the projections can be expressed as

$$E_{\min} = t_{\min}\,\mathbf{v}, \qquad E_{\max} = t_{\max}\,\mathbf{v} \quad (3)$$

However, such endpoints may deviate from the true center of the hallway. Therefore, we correct the endpoints by considering the centroid and adjusting the positions symmetrically around it. The corrected endpoints are calculated using the following equations:

$$E'_{\min} = C + (t_{\min} - t_C)\,\mathbf{v}, \qquad E'_{\max} = C + (t_{\max} - t_C)\,\mathbf{v} \quad (4)$$

In Equation (4), $t_C = \mathbf{v} \cdot C$ represents the projection of the centroid onto the primary direction, ensuring that the endpoint correction is made relative to the position of the centroid. The terms $(t_{\min} - t_C)$ and $(t_{\max} - t_C)$ compute the offsets of the minimum and maximum projection values relative to the centroid's projection. The final endpoints $E'_{\min}$ and $E'_{\max}$ are determined by adding the corrected offsets, along $\mathbf{v}$, to the centroid $C$, ensuring symmetrical positioning around it.
By projecting the points onto the main direction identified by PCA and then adjusting these projections relative to the centroid, we reduced the drift of the extracted centerline, ensuring that it more accurately represents the true central axis of the hallway.
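A compact sketch of this procedure is shown below; it computes the primary direction with an eigendecomposition of the covariance matrix and applies the centroid correction of Equation (4). Variable names are illustrative.

```python
import numpy as np

def hallway_centerline_endpoints(points: np.ndarray):
    """PCA-based centerline endpoints with centroid correction (Equations (3) and (4))."""
    c = points.mean(axis=0)                      # hallway centroid, Equation (1)
    cov = np.cov((points - c).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, np.argmax(eigvals)]           # primary direction from PCA
    t = points @ v                               # projections onto the primary axis
    t_min, t_max, t_c = t.min(), t.max(), c @ v
    e_min = c + (t_min - t_c) * v                # corrected endpoints, Equation (4)
    e_max = c + (t_max - t_c) * v
    return e_min, e_max
```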
Subsequently, the centers of the doors parallel to the centerline of each hallway are projected onto the centerline to obtain intersection points, as shown in Figure 8. These intersection points are then sorted, and if the distance between any two intersection points is less than a set threshold of 1 m, they are merged into a new intersection point. This process continues until the distances between all intersection points are greater than the set threshold, ensuring that the geometric relationships within the hallway are accurately reflected.
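The sketch below illustrates this projection and merging step under the 1 m threshold stated above; the interface (door centers given as 3D points and the centerline endpoints from the previous step) is an assumption for illustration.

```python
import numpy as np

def hallway_subnodes(door_centers, e_min, e_max, merge_dist=1.0):
    """Project door centers onto the hallway centerline and merge nearby points."""
    axis = e_max - e_min
    axis = axis / np.linalg.norm(axis)
    # Signed projection of each door center onto the centerline axis.
    t_values = sorted(float((np.asarray(c) - e_min) @ axis) for c in door_centers)
    merged = []
    for t in t_values:
        if merged and t - merged[-1] < merge_dist:
            merged[-1] = (merged[-1] + t) / 2.0   # merge into a new midpoint
        else:
            merged.append(t)
    return [e_min + t * axis for t in merged]
```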
Based on our prior knowledge, not all hallways require sub-node extraction. To avoid unnecessary sub-node extraction and improve the efficiency of the algorithm, this study classifies hallway network extraction into two types based on the length of the hallways. For hallways with a length less than or equal to 4 m, we extracted only the centroid as the representative node. However, for hallways with a length longer than 4 m, we applied sub-node extraction to enhance the accuracy of the navigation network.
In this way, a short hallway is represented by a single node, which is connected to its associated door nodes to form the corresponding edges. In contrast, a longer hallway is represented by multiple sub-nodes within the navigation network, with each sub-node connected to either adjacent doors or neighboring sub-nodes, thereby forming the edges of the network.
2.3.3. Node Duplicate Detection for Doors
According to the file tree structure shown in Figure 5, point cloud files representing the same door can appear simultaneously in both general space (e.g., office) and transition space (e.g., hallway) directories. This setup introduces the potential issue of duplicate node extraction when processing door data, as a newly extracted door node may have already been processed and saved in the navigation network. To address this issue, we used a node duplicate detection algorithm that filters out duplicate nodes extracted from the door point cloud data.
Although the semantically segmented point cloud files representing the same door in the general space (e.g., office) and transition space (e.g., hallway) directories are generally consistent, they may differ slightly in detail, such as in point distribution or number of points; as a result, the centroids extracted for the same door may differ. To accurately identify duplicate nodes, we employed a distance threshold-based method.
To ensure that each node uniquely represents a specific door, for each new door point cloud file, we calculate its centroid coordinates $C_{\text{new}} = (x_{\text{new}}, y_{\text{new}}, z_{\text{new}})$. For each existing door node in the extracted navigation network, we iterate through the centroid coordinates $C_{e} = (x_{e}, y_{e}, z_{e})$ and calculate the Euclidean distance $d$ between the new door centroid and each existing door centroid using the following equation:

$$d = \sqrt{(x_{\text{new}} - x_{e})^2 + (y_{\text{new}} - y_{e})^2 + (z_{\text{new}} - z_{e})^2} \quad (5)$$
If the calculated distance d is less than the threshold, the nodes are considered duplicates, and only the existing node is retained. Based on our prior knowledge, we set the threshold to 1 m. This method ensures that each node is unique and accurately represents a door location within the indoor space.
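As a short sketch of this check (with the 1 m threshold from the text and an otherwise assumed interface), duplicate door nodes can be filtered as follows:

```python
import numpy as np

def is_duplicate_door(new_centroid, existing_centroids, threshold=1.0):
    """Return True if the new door centroid lies within `threshold` metres of an
    already extracted door node (Equation (5))."""
    for existing in existing_centroids:
        d = np.linalg.norm(np.asarray(new_centroid) - np.asarray(existing))
        if d < threshold:
            return True
    return False
```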
2.4. JSON-Encoded Navigation Network
The final extracted data structure is saved in the JSON [30] format, as illustrated in Figure 9. JSON is a lightweight data exchange format that is easy for people to read and write and easy for machines to parse and generate. This format facilitates easy manipulation and integration with various applications, making JSON an ideal choice for data exchange, especially in web applications.
In this work, the designed JSON data structure includes two main components, nodes and edges, as shown in Figure 9. Each node in the JSON structure represents a specific point (e.g., office, hallway, door, or staircase) in the indoor navigation network. The attributes of each node include the following:
ID: A unique identifier for each node.
Name: The name assigned to the node (e.g., “door”, “staircase”).
Navigable Type: The type of space the node represents (e.g., general space, connection space, or transition space).
Affiliation: The specific floor to which the node belongs in a multi-floor building.
Coordinates: The x, y, and z coordinates that specify the node’s position in the 3D space.
Accessibility: A Boolean attribute indicating whether the node is wheelchair accessible.
Edges define the connections between nodes, forming the basis of the indoor navigation network. Each edge includes the following:
This approach enhances the interoperability of the extracted indoor navigation network data, allowing it to be seamlessly used in further computational analyses and real-world navigation applications.
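For illustration only, the snippet below builds a minimal network with the node attributes listed above and serializes it with Python's json module. The exact key names and the edge fields (here simply the IDs of the two connected nodes) are assumptions and may differ from the structure shown in Figure 9.

```python
import json

# Hypothetical minimal example of the node/edge structure.
network = {
    "nodes": [
        {"id": 0, "name": "hallway_1", "navigable_type": "transition space",
         "affiliation": "Floor_1", "coordinates": [3.2, 1.5, 0.0],
         "accessibility": True},
        {"id": 1, "name": "door_1", "navigable_type": "connection space",
         "affiliation": "Floor_1", "coordinates": [5.0, 1.5, 0.0],
         "accessibility": True},
    ],
    "edges": [
        {"source": 0, "target": 1},  # assumed edge fields: the connected node IDs
    ],
}

with open("navigation_network.json", "w") as f:
    json.dump(network, f, indent=2)
```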
5. Conclusions
This study presents a comprehensive method for extracting accurate and practical indoor navigation network models from 3D semantic point clouds, with a focus on addressing key challenges in complex environments, such as staircases and multi-floor spaces. To ensure consistency in the data input, we developed a series of preprocessing rules. Additionally, we refined the staircase node extraction algorithm and provided a more comprehensive solution for hallway sub-node extraction. Furthermore, we designed a lightweight JSON data structure to store the extracted indoor navigation network data, enabling the successful construction of indoor navigation networks using semantic point clouds.
To evaluate the effectiveness of our approach, we first validated the method using the S3DIS dataset. Subsequently, we conducted further validation on a self-collected dataset, which consisted of 3D data from a multi-floor building captured using HoloLens 2 and pre-processed to generate the required semantic point clouds. This two-step evaluation demonstrated the robustness of our method across various spatial environments. Finally, we evaluated the path planning capabilities of the navigation network using the Dijkstra algorithm. The results showed that our network effectively supports spatial connectivity analysis and performs well in different indoor environments. Additionally, wheelchair accessibility was considered, and the paths were correctly determined.
While our method successfully extracted indoor navigation networks from semantic point clouds and applied them to shortest-path planning, our research mainly focuses on Manhattan-world models [38] and static point cloud data. In the future, we aim to further improve our method by adapting it to more complex indoor environments. Additionally, we plan to integrate scene recognition and self-adaptive algorithms to achieve more efficient and flexible construction of indoor navigation networks.