1. Introduction
The underground pipeline system includes pipes used for water supply, drainage, natural gas, and electricity, forming a critical component of urban underground infrastructure [1,2,3]. However, discrepancies between the designed and actual pipelines, arising from construction errors, design changes, and the complexity of the underground environment, are common. Such deviations exacerbate maintenance and management challenges, leading to construction delays, cost overruns, and conflicts with existing utilities. These issues can result in utility outages, project disruptions, and elevated risks to urban operations. Consequently, accurate and efficient digital information on underground pipelines is crucial for urban management and the maintenance of essential services.
Building Information Modeling (BIM) has emerged as a powerful tool for managing both above-ground and underground infrastructure. It supports the creation, management, and maintenance of digital representations of structures throughout their lifecycle [4]. Within urban infrastructure, BIM integrates and coordinates various information, facilitating the visualization and management of underground pipes [5]. Recent advances in remote sensing technologies have provided new opportunities for modeling pipeline systems [6]. Among these technologies, LiDAR has proven particularly valuable due to its ability to capture high-resolution three-dimensional data of terrain and objects through laser-based distance measurements, and it is widely applied in the Architecture, Engineering, and Construction (AEC) industry [7,8,9,10,11]. LiDAR offers several advantages over traditional surveying methods, including non-invasive point cloud collection and minimal disruption to existing infrastructure [12]. The resulting point cloud data serve as a reliable foundation for the creation of BIMs of pipeline systems [13,14].
Significant efforts have been directed toward developing efficient frameworks for pipeline system reconstruction using high-resolution point cloud data [15,16,17,18]. The key to modeling underground pipeline systems lies in the accurate detection of pipeline structures, which predominantly exhibit cylindrical geometry. A bottom-up approach simplifies the problem by reducing pipe detection to cylinder detection, enabling more intuitive and efficient modeling. Commonly used methods of cylinder detection include Random Sample Consensus (RANSAC) [19], region growing [20], and Hough transform algorithms. The RANSAC algorithm fits models by iteratively sampling random subsets of points and evaluating model quality based on fitting errors [21]. Liu et al. [22] applied the RANSAC algorithm to detect circular features by projecting pipe structures onto a plane. However, this approach is limited to pipelines that are either vertical or parallel to the ground. Jin and Lee [23] introduced a RANSAC-based sphere fitting method, which involved rolling spheres of uniform diameter along the cylinder interior to determine its central axis. While innovative, this method is highly susceptible to noise and inaccuracies, particularly in scenarios with sparse or incomplete point cloud data.
The region growing algorithm segments point cloud data by analyzing the similarity between seed points and their neighboring points. Kawashima, Kanai, and Date [16] applied a normal-based region growing method to extract pipe points, transforming 3D cylinder detection into a circle fitting problem on a 2D projection plane. Li et al. [24] combined density-based clustering and region growing algorithms to accurately extract geometric features, but this method is only suitable for straight pipes. The region growing method is sensitive to similarity measures and parameter selection, and improper choices can introduce errors, especially when dealing with noisy data. The Hough transform maps data points to parameter space and detects specific geometric shapes through a voting mechanism. Ahmed et al. [25] sliced the point cloud and projected it at equal intervals, using a 2D Hough transform for circle detection in each slice. The main challenge with the classical Hough transform is its memory requirements and computation time. To address this, Figueiredo et al. [26] and Patil et al. [27], inspired by the work of Rabbani and Van Den Heuvel [28], each proposed staged Hough transform approaches to reduce complexity.
Despite these advancements, bottom-up methods often lack a global perspective, posing significant challenges in the reconstruction of topological connections of pipelines. To address these challenges, some studies have employed skeletonization techniques to derive the pipeline’s central axis with geometric and topological features [29,30,31]. For instance, Lee et al. [32] extracted the skeleton based on Voronoi diagrams and topological refinement methods, enabling component segmentation by analyzing node connectivity and acute angles between adjacent nodes. Additionally, curvature-based methods [33] and RANSAC-based methods [34] have been applied to estimate skeleton nodes effectively. More recently, deep learning models, such as convolutional neural networks [35] and the PipeNet framework [17], have demonstrated promising applications in pipeline reconstruction.
Current research predominantly focuses on relatively controlled and accessible building environments, such as mechanical, electrical, and plumbing (MEP) systems and industrial plants, where precise and multi-angle on-site measurements ensure high-quality data collection [17,27,33,36]. However, pipeline reconstruction in underground environments poses numerous challenges. The complex underground conditions and cluttered surface cover often lead to occlusions and indistinct surface features. These issues result in sparse and incomplete point cloud data, complicating the development of models that meet precision requirements for practical applications.
To address these limitations, this study proposes a framework for underground pipeline reconstruction based on unstructured point clouds. The proposed framework, which requires no prior assumptions, is capable of processing pipes with arbitrary orientation and size and constructing topologically connected underground pipeline models. First, the pipeline point cloud is derived through semantic segmentation. Subsequently, the geometric features of the pipes are utilized to preliminarily extract the central point set and radius information, with the central point set serving as input for subsequent structural analysis. To address data gaps and noise in the point cloud, this study proposes a centerline generation method combining a parameter-adaptive adjustment strategy and angle-distance metrics. This approach ultimately enables the analysis of pipe topological connections, facilitating the creation of high-precision underground pipe BIMs. The contributions of this framework include: (1) an improved RANSAC algorithm that effectively mitigates noise interference and enhances segmentation efficiency compared to traditional methods; (2) a parameter-adaptive adjustment strategy that makes the framework applicable to pipes of various types and sizes; (3) a set of evaluation metrics based on angle and distance indicators, which can reconnect fragmented straight pipe structures during refinement and reconstruct pipeline topological connections; (4) high-precision underground pipeline reconstruction from incomplete point clouds to structured BIMs. This research improves the efficiency of underground pipe management, facilitating the intelligent development of urban infrastructure. It also enhances the sustainability and adaptability of urban management practices, providing a robust foundation for the realization of smart cities.
2. Materials
The point cloud data used in this study were obtained from the underground pipeline system in the Sha Tin District of Hong Kong, using ground-based laser scanning equipment. The pipe network includes both freshwater and saltwater pipes. The dataset consists of 171 scan files stored in ASCII format, with each file containing the following attributes: {x, y, z, r, g, b, label}, where {x, y, z} represents the spatial coordinates, {r, g, b} denotes the color attribute, and {label} corresponds to the semantic label. The semantic labels are manually annotated and classified into two classes: pipe and non-pipe. This dataset is used to train and test the semantic segmentation network, which segments pipe data from complex underground environments for subsequent structural analysis and BIM reconstruction. Due to factors such as sensor resolution, viewpoint, light reflection, and pipe material, some pipe areas exhibit inconsistent or missing color information. Therefore, the RGB color attribute is excluded as an input feature for the semantic segmentation network. Instead, it is used as a reference during the BIM construction.
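As an illustration of the record layout described above, the following minimal sketch parses one ASCII scan into coordinates, colors, and labels. The delimiter (whitespace or commas), the function name `parse_scan`, and the convention that label 1 means pipe are assumptions made for the example; the dataset's actual conventions are not specified here.

```python
from typing import List, Tuple


def parse_scan(text: str):
    """Parse lines of 'x y z r g b label' into separate lists.

    Assumes whitespace- or comma-separated fields; lines that do not
    contain exactly seven fields are skipped.
    """
    xyz: List[Tuple[float, float, float]] = []
    rgb: List[Tuple[int, int, int]] = []
    labels: List[int] = []
    for line in text.splitlines():
        parts = line.replace(",", " ").split()
        if len(parts) != 7:
            continue
        x, y, z = (float(v) for v in parts[:3])
        r, g, b = (int(float(v)) for v in parts[3:6])
        xyz.append((x, y, z))
        rgb.append((r, g, b))  # kept only as a reference, not as a network input
        labels.append(int(float(parts[6])))
    return xyz, labels, rgb


sample = "0.0 1.0 2.0 255 0 0 1\n3.0 4.0 5.0 10 20 30 0\n"
xyz, labels, rgb = parse_scan(sample)
# Hypothetical convention: label 1 = pipe, label 0 = non-pipe.
pipe_points = [p for p, lab in zip(xyz, labels) if lab == 1]
```

Keeping the color triplets in a separate list mirrors the choice in the text: they are available for the BIM stage but never reach the segmentation input.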
To validate the proposed reconstruction framework, three representative real-world scenes are selected based on segmented data. These scenes encompass different pipe arrangements, background complexities, and data quality levels. Scene A, derived from a freshwater pipeline system, presents a complex pipe layout and a cluttered background, with significant data missing due to occlusions. Its spatial dimensions are 104.05 × 57.92 × 6.45 m. Scene B, also from a freshwater pipeline system, features a simpler pipe arrangement with some residual noise. Its spatial dimensions are 43.11 × 35.31 × 5.84 m. Scene C, from a saltwater pipeline system, represents a simple scene with high-quality data. Its spatial dimensions are 106.60 × 18.58 × 11.19 m. The basic statistical details of these scenes are shown in Table 1.
3. Methods
Due to the harsh underground environment, object occlusions, and limitations of LiDAR technology, underground pipe data often suffer from missing data, along with noise and outliers, which complicates the pipe structure analysis. To address these issues, this study proposes a point cloud-based underground pipeline reconstruction framework designed to extract high-precision pipe topological information from incomplete point clouds.
The proposed framework consists of four main steps (as shown in Figure 1). First, the raw point cloud is preprocessed to isolate the underground pipe data and eliminate irrelevant points. Second, an initial set of central points and radius information of pipes is extracted. To mitigate the effects of sparse and missing data in the point cloud, normals are used for compensation. Third, a centerline generation method is proposed, incorporating parameter-adaptive adjustment strategies and introducing angle and distance metrics. The adaptive adjustment strategy optimizes the fitting range of the RANSAC algorithm, enabling its applicability to different types of pipes. The angle and distance metrics enhance the accuracy of the centerline. Finally, a topological connectivity analysis is conducted on centerline segments, ultimately generating the underground pipeline BIMs.
3.1. Data Preprocessing
Due to the complex background of point cloud data in real-world underground pipe scenes (including ground, markers, and fixed objects), this study first applies the RandLA-Net network for semantic segmentation of pipes. The segmentation results are then further optimized using clustering and filtering algorithms to improve the accuracy and reliability of pipe point extraction.
3.1.1. Semantic Segmentation Based on RandLA-Net
Recent advancements in deep learning have revolutionized point cloud processing, and existing semantic segmentation networks show great potential for eliminating the influence of complex backgrounds. RandLA-Net, an advanced semantic segmentation network, is capable of directly processing point clouds [37]. RandLA-Net utilizes multilayer perceptrons (MLPs) as its foundational components and integrates random sampling (RS) and local feature aggregation (LFA) modules, enabling efficient and accurate segmentation of underground pipe points. As shown in Figure 2a, RandLA-Net adopts an encoder–decoder architecture with skip connections. In the encoder, random sampling is used to reduce point cloud density, thereby minimizing memory and computational overhead. In the LFA module, an attention mechanism and dilated residual block are introduced to fully extract features. In the decoder, upsampling is achieved by k-nearest neighbor (KNN) search and linear interpolation.
3.1.2. Filtering and Optimization
The pipe points extracted from semantic segmentation still contain some sparsely distributed noise and outliers. To address this, statistical filtering and HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) clustering are employed. Statistical filtering calculates the distance between each point and its neighboring points, statistically analyzes the distribution of these distances, and compares them with a predefined threshold (usually determined by the mean distance and standard deviation). Points exceeding this threshold are then removed, effectively targeting small-scale and scattered noise points. HDBSCAN constructs a density tree to form a clustering structure [38]. This method does not require pre-specifying the number of clusters and can automatically identify the optimal number of clusters, making it suitable for handling data with size variations and a large amount of noise. The segmentation results before and after applying the filtering and optimization are shown in Figure 2b.
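The statistical filtering step can be sketched as follows. This is a brute-force illustration under stated assumptions: the neighbor count `k`, the multiplier `std_ratio`, and the threshold form (mean plus a multiple of the standard deviation) are illustrative choices, not the values used in the study, and the HDBSCAN stage is not shown.

```python
import math
import statistics


def statistical_filter(points, k=4, std_ratio=1.0):
    """Drop points whose mean distance to their k nearest neighbors
    exceeds mean + std_ratio * std of that quantity over all points.

    Brute-force O(n^2) search; a spatial index would be used for real scans.
    """
    mean_knn = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(sum(dists[:k]) / k)
    mu = statistics.fmean(mean_knn)
    sigma = statistics.pstdev(mean_knn)
    threshold = mu + std_ratio * sigma
    return [p for p, d in zip(points, mean_knn) if d <= threshold]


# A small planar patch plus one isolated point far away from it.
grid = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]
noisy = grid + [(50.0, 50.0, 50.0)]
cleaned = statistical_filter(noisy, k=4, std_ratio=1.0)
```

In this toy case the isolated point has a mean neighbor distance far above the threshold and is removed, while the nine patch points are kept.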
3.2. Extraction of Centerline Point Set
The centerline, also described as the skeleton, serves as an abstract representation of pipelines, reflecting both their geometric features and topological structures [29]. A pipe can be defined by its radius and centerline, along with the associated topological connections. In this study, the centerline of a pipe is initially extracted based on the Rotational Symmetry Axis (ROSA) method proposed by Tagliasacchi et al. [39]. This algorithm conceptualizes the pipe centerline as a generalized rotational symmetry axis, where the centerline can be formed by connecting the center points of several locally optimal cutting planes (ROSA points). Thus, extraction of the pipe centerline is transformed into searching for local optimal cutting planes and localizing ROSA points. Figure 3a presents an example of pipe data. To mitigate the impact of missing data, the ROSA algorithm incorporates compensation strategies using features derived from point normals.
3.2.1. Local Optimal Cutting Plane
A random sample point is selected from the pipe points, and its corresponding local optimal cutting plane, which is perpendicular to the pipe’s extension direction, is determined. The center point of this plane is the ROSA point, defined by its coordinates and a normal vector. The cutting plane is determined iteratively. Starting from a randomly selected initial direction, the normal vector is iteratively optimized to update the cutting plane. In each iteration, the cutting plane is determined by the sample point and the current normal. The Mahalanobis distance, incorporating both Euclidean and directional space information, is used to derive the current neighborhood points of the sample point at direction iteration j. The optimal normal of the ROSA point (i.e., the normal of the optimal cutting plane) is the one that minimizes the sum of angular variance with the normals of the neighborhood points, ensuring that the ROSA point’s normal is most rotationally symmetric with respect to those normals. As shown in Figure 3a, black points represent randomly selected sample points, blue and green bands represent the neighborhood point clouds derived from them, gray arrows indicate the normals of the neighborhood points, and red arrows indicate the normals of the ROSA points. Due to missing data, the neighborhood is often incomplete, but this is compensated for using point normals to preserve symmetry.
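The selection criterion for the cutting-plane normal can be illustrated with a small sketch. It only evaluates the angular variance for a fixed list of candidate directions; the iterative update and the Mahalanobis-based neighborhood search of ROSA are not reproduced, and the function names are hypothetical.

```python
import math
import statistics


def angular_variance(direction, normals):
    """Variance of the angles between a candidate direction and unit normals."""
    length = math.sqrt(sum(c * c for c in direction))
    d = [c / length for c in direction]
    angles = []
    for n in normals:
        cos_a = sum(a * b for a, b in zip(d, n))
        angles.append(math.acos(max(-1.0, min(1.0, cos_a))))
    return statistics.pvariance(angles)


def best_direction(candidates, normals):
    """Candidate with the smallest angular variance against the normals."""
    return min(candidates, key=lambda c: angular_variance(c, normals))


# Surface normals of a cylinder along z, sampled on only part of the
# circumference to mimic missing data.
normals = [(math.cos(t), math.sin(t), 0.0) for t in (0.0, 0.7, 1.4, 2.1, 2.8)]
candidates = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
axis = best_direction(candidates, normals)
```

Every normal of an ideal cylinder makes the same angle with the axis, so the axis direction yields zero variance even when part of the circumference is missing. This is the sense in which normals compensate for incomplete neighborhoods.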
3.2.2. Positioning of the ROSA Point
After determining the local optimal cutting plane, the normal of the ROSA point is also determined. Next, the position of the ROSA point is optimized. Given the optimal cutting plane and its corresponding neighborhood $N$, the optimal position $x^{*}$ of the ROSA point is determined by minimizing the sum of squared distances from $x$ to the extensions of the normals of the points in $N$. The red point in Figure 3a represents the final position of the ROSA point. The optimization function is as follows:

$$x^{*} = \arg\min_{x} \sum_{p_k \in N} \left\lVert (x - p_k) \times n_k \right\rVert^{2} \qquad (1)$$

where $n_k$ is the unit normal of point $p_k$. By solving the function, the position $x^{*}$ of the ROSA point can be found. Since pipeline joints are usually non-cylindrical and lack a clear rotational symmetry axis, the Laplacian smoothing algorithm is used to ensure smooth connections between branches. Additionally, the moving least squares (MLS) method is applied for further refinement. Finally, the center of the joint is re-adjusted using Equation (1), resulting in the initial centerline point set C.
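Minimizing the sum of squared distances from a position to the lines that extend the point normals is a linear least-squares problem. A minimal sketch follows, assuming unit normals and using the normal equations (sum of (I - n nᵀ)) x = sum of (I - n nᵀ) p, solved here with Cramer's rule; the function names are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def rosa_position(points, normals):
    """Position minimizing the summed squared distance to the lines
    through each point along its unit normal."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for p, n in zip(points, normals):
        for r in range(3):
            for c in range(3):
                m_rc = (1.0 if r == c else 0.0) - n[r] * n[c]
                A[r][c] += m_rc
                b[r] += m_rc * p[c]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("normals are nearly parallel; position is not unique")
    x = []
    for k in range(3):
        Ak = [row[:] for row in A]
        for r in range(3):
            Ak[r][k] = b[r]
        x.append(det3(Ak) / d)
    return tuple(x)


# Points on a partial arc of a circle (radius 0.5) centered at (1, 2, 3).
angles = (0.0, 0.5, 1.0, 1.5)
nrm = [(math.cos(t), math.sin(t), 0.0) for t in angles]
pts = [(1 + 0.5 * n[0], 2 + 0.5 * n[1], 3.0) for n in nrm]
center = rosa_position(pts, nrm)
```

Because all normal lines of an ideal cross-section pass through the axis, the center is recovered from a quarter-arc alone, which is why this formulation tolerates incomplete neighborhoods.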
The radius of the pipe is calculated from the shortest distance between each center point and the surrounding pipe points. However, due to the presence of noise and outliers, the nearest point may not correspond to the correct point on the pipe surface, leading to errors. To mitigate the impact of noise, the k-nearest neighbor method is used, where the median of the distances from each center point to its k nearest pipe points is taken as the radius. Because the initial centerline point set C is unordered and may contain positional offsets, fitting and refinement of the centerline are necessary.
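The median-based radius estimate can be sketched in a few lines. The value of `k` and the function name are illustrative assumptions.

```python
import math
import statistics


def estimate_radius(center, pipe_points, k=5):
    """Median of the distances from a center point to its k nearest pipe points."""
    dists = sorted(math.dist(center, q) for q in pipe_points)
    return statistics.median(dists[:k])


# Eight surface points at radius 1.0 plus one noisy point inside the pipe.
surface = [(math.cos(i * math.pi / 4), math.sin(i * math.pi / 4), 0.0) for i in range(8)]
cloud = surface + [(0.2, 0.0, 0.0)]
radius = estimate_radius((0.0, 0.0, 0.0), cloud, k=5)
nearest = min(math.dist((0.0, 0.0, 0.0), q) for q in cloud)
```

Here the nearest-point distance is 0.2 because of the interior noise point, whereas the median over the five nearest points stays at the true radius.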
3.3. Centerline Generation
3.3.1. Centerline Fitting
This study uses the RANSAC line fitting algorithm to process the initial centerline point set C. The classic RANSAC algorithm iteratively optimizes the fitting result by identifying inliers and outliers. At each iteration, the distances between data points and the fitted model are measured and compared against a predefined distance threshold. Points within the threshold are classified as inliers, while those outside are marked as outliers. By adjusting the threshold, the sensitivity of the model to noise and outliers can be controlled, which affects the accuracy of the final fitting result.
However, the RANSAC algorithm’s inherent randomness makes it vulnerable to noisy data, increasing the risk of misfitting different line segments as a single line. To address this limitation, an improved RANSAC fitting method is proposed (Figure 3b). Specifically, after each RANSAC fitting, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering is applied to the remaining data points. This clustering step partitions the points into multiple subsets based on spatial proximity, allowing subsequent line fitting to be performed independently within each subset. Since points belonging to the same line segment are theoretically close in space, this data partition strategy effectively reduces the risk of misfitting.
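The partitioning step can be illustrated with a compact DBSCAN written against the standard library. It is a brute-force sketch; `eps` and `min_pts` are illustrative values, and in practice a library implementation with a spatial index would be preferable.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: claimed but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = region(j)
            if len(nb) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nb)
    return labels


# Two parallel runs of centerline points and one stray point.
run_a = [(float(i), 0.0, 0.0) for i in range(5)]
run_b = [(float(i), 10.0, 0.0) for i in range(5)]
labels = dbscan(run_a + run_b + [(100.0, 100.0, 100.0)], eps=1.5, min_pts=2)
```

Each resulting subset can then be passed to line fitting on its own, so a single fit cannot span both runs.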
To further suppress false line segments caused by noisy points in C, this study adopts a parameter-adaptive adjustment strategy. Noisy points in C are mostly located inside the pipe, with a denser distribution closer to the pipe centerline. The RANSAC algorithm is applied iteratively to generate candidate centerlines from C. The distance threshold for each fit is set from the average radius of the points in C together with an empirically set constant scalar. Next, a cylindrical neighborhood search is performed around each candidate centerline, with the search radius set using a second empirically set constant scalar. The points found in the search are labeled as noisy points and removed from C, preventing them from participating in subsequent fittings. Thus, the fitting parameters are dynamically adjusted based on the pipe radius, enhancing the algorithm’s generalization ability across pipes of various sizes. The final result is a candidate centerline set CL, in which each candidate is defined by its centerline endpoints, direction vector, and radius, the radius being the average radius of the fitted inlier points.
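A minimal sketch of radius-adaptive RANSAC line fitting follows. It assumes the threshold is the product of a constant `alpha` and the mean radius, and it approximates the cylindrical neighborhood by the distance to the infinite fitted line; both are assumptions for illustration, as are the names `alpha`, `ransac_line`, and `remove_near_line`.

```python
import math
import random


def point_line_distance(p, a, d):
    """Distance from p to the infinite line through a with unit direction d."""
    w = [p[i] - a[i] for i in range(3)]
    t = sum(w[i] * d[i] for i in range(3))
    return math.sqrt(max(0.0, sum(c * c for c in w) - t * t))


def ransac_line(points, radii, alpha=0.5, iterations=200, seed=0):
    """Fit one line; the inlier threshold scales with the mean pipe radius."""
    rng = random.Random(seed)
    threshold = alpha * (sum(radii) / len(radii))
    best = (None, None, [])
    for _ in range(iterations):
        a, b = rng.sample(points, 2)
        length = math.dist(a, b)
        if length == 0.0:
            continue
        d = [(b[i] - a[i]) / length for i in range(3)]
        inliers = [i for i, p in enumerate(points)
                   if point_line_distance(p, a, d) <= threshold]
        if len(inliers) > len(best[2]):
            best = (a, d, inliers)
    return best, threshold


def remove_near_line(points, a, d, search_radius):
    """Drop every point within search_radius of the fitted line."""
    return [p for p in points if point_line_distance(p, a, d) > search_radius]


line_pts = [(0.5 * i, 0.0, 0.0) for i in range(20)]
stray = [(1.0, 3.0, 0.0), (4.0, -2.0, 1.0), (7.0, 2.0, -2.0)]
cloud = line_pts + stray
(anchor, direction, inliers), thr = ransac_line(cloud, [0.2] * len(cloud))
remaining = remove_near_line(cloud, anchor, direction, search_radius=0.3)
```

Tying the threshold to the radius means that a large main pipe and a small branch each get a tolerance in proportion to their size, rather than sharing one fixed value.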
3.3.2. Centerline Refinement
To prevent accuracy degradation caused by incorrect fitting and overfitting, this study introduces three indicators to optimize the fitting results. The first indicator is the angle between candidate centerlines. The second is the endpoint distance (ED), which represents the shortest distance between the endpoints of two straight pipes. The third is the line distance (LD), which represents the shortest distance from one candidate line to another, as shown in Figure 4. ED is obtained from the distance matrix between the endpoint pairs of the two segments, and LD is computed using the relative displacement between the endpoints of the line segments.
The optimization strategy applies the angle and distance matching criteria iteratively. In iteration m, all candidate line segments that meet the criteria are fitted jointly by least squares, and the endpoints of the fitted line form the refined point pair. The ED threshold is a pre-set constant and can be set to a relatively large value, as the centerline may be disconnected due to missing pipe data. The LD indicator ensures that, even with a large ED threshold, different parallel pipes will not be incorrectly connected. After each iteration, the line segments that meet the criteria are removed from CL, and the longest remaining candidate line segment is taken as the new seed, until all valid candidate line segments are processed.
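The interplay of the three indicators can be sketched as follows. The LD computation here is a stand-in (distance from the midpoint of one segment to the infinite line through the other) rather than the study's formula, and all thresholds are hypothetical values chosen for the example.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def angle_deg(seg_a, seg_b):
    """Acute angle between the directions of two segments, in degrees."""
    da, db = sub(seg_a[1], seg_a[0]), sub(seg_b[1], seg_b[0])
    c = abs(dot(da, db)) / math.sqrt(dot(da, da) * dot(db, db))
    return math.degrees(math.acos(min(1.0, c)))


def endpoint_distance(seg_a, seg_b):
    """ED: smallest entry of the 2x2 endpoint-to-endpoint distance matrix."""
    return min(math.dist(p, q) for p in seg_a for q in seg_b)


def line_distance(seg_a, seg_b):
    """LD stand-in: midpoint of seg_a to the infinite line through seg_b."""
    mid = tuple((p + q) / 2 for p, q in zip(seg_a[0], seg_a[1]))
    d = sub(seg_b[1], seg_b[0])
    t = dot(sub(mid, seg_b[0]), d) / dot(d, d)
    foot = tuple(b0 + t * di for b0, di in zip(seg_b[0], d))
    return math.dist(mid, foot)


def should_merge(seg_a, seg_b, max_angle=5.0, max_ed=2.0, max_ld=0.1):
    """All three criteria must hold for two candidates to be merged."""
    return (angle_deg(seg_a, seg_b) <= max_angle
            and endpoint_distance(seg_a, seg_b) <= max_ed
            and line_distance(seg_a, seg_b) <= max_ld)


seed_seg = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
gap_seg = ((2.5, 0.0, 0.0), (4.0, 0.0, 0.0))       # same pipe, 1.5 m data gap
parallel_seg = ((0.0, 0.5, 0.0), (1.0, 0.5, 0.0))  # neighboring parallel pipe
```

With a generous ED threshold, the fragment across the gap is merged with the seed, while the parallel pipe passes the angle and ED checks but is rejected by LD.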
3.4. Topological Reconstruction
Underground pipe components include straight pipes, elbows, and tees. Through the aforementioned steps, the key point set representing the positions of straight pipes is obtained. The information on elbows and tees can be determined by analyzing the topological connections between straight pipes (Figure 5). To achieve topological construction, this study introduces a new indicator, segment distance (SD). $SD_{ij}$ represents the shortest distance from the endpoints of the i-th straight pipe to the j-th straight pipe, and is calculated as follows:

$$SD_{ij} = \min_{e \in E_i} \left\lVert e - e' \right\rVert$$

where $E_i$ is the set of endpoints of the i-th straight pipe and $e'$ is the projected point of $e$ on line segment $j$.
By combining the SD and ED proposed in Section 3.3.2, the connection type between straight pipes is determined. If the ratio of ED to SD exceeds a preset threshold, indicating the SD between straight pipes is small but the ED is large, the connection is classified as a tee, with the i-th straight pipe as the branch pipe and the j-th straight pipe as the main pipe. If both the SD and ED are small, the connection is classified as an elbow. The classification of these three connection types is shown in Figure 5. The parameter extraction results (radius and centerline with topological connections) are integrated to realize BIM reconstruction of the pipeline. The final model is created in Revit with Dynamo.
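The classification rule can be sketched as below. The thresholds `near` and `ratio`, the order of the checks, and the clamping of the projection to the segment are assumptions made for the illustration; segments are assumed to have non-zero length.

```python
import math


def project_on_segment(p, a, b):
    """Closest point to p on segment ab (projection clamped to the segment)."""
    ab = [b[i] - a[i] for i in range(3)]
    t = sum((p[i] - a[i]) * ab[i] for i in range(3)) / sum(c * c for c in ab)
    t = max(0.0, min(1.0, t))
    return tuple(a[i] + t * ab[i] for i in range(3))


def segment_distance(seg_i, seg_j):
    """SD: shortest distance from the endpoints of seg_i to seg_j."""
    return min(math.dist(e, project_on_segment(e, seg_j[0], seg_j[1])) for e in seg_i)


def endpoint_distance(seg_i, seg_j):
    """ED: shortest endpoint-to-endpoint distance."""
    return min(math.dist(p, q) for p in seg_i for q in seg_j)


def classify(seg_i, seg_j, near=0.5, ratio=3.0):
    """Return 'tee' (seg_i branches off seg_j), 'elbow', or 'none'."""
    sd = segment_distance(seg_i, seg_j)
    ed = endpoint_distance(seg_i, seg_j)
    if sd > near:
        return "none"      # the pipes do not meet
    if ed <= near:
        return "elbow"     # both SD and ED are small
    if ed / max(sd, 1e-9) > ratio:
        return "tee"       # SD small, ED large
    return "none"


main = ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
branch = ((5.0, 0.2, 0.0), (5.0, 4.0, 0.0))
bend = ((10.2, 0.2, 0.0), (10.2, 5.0, 0.0))
kinds = [classify(branch, main), classify(bend, main)]
```

The branch ends 0.2 m from the middle of the main pipe but about 5 m from either of its endpoints, giving a large ED/SD ratio, whereas the bend starts next to an endpoint of the main pipe so both distances are small.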
3.5. Accuracy Evaluation
To evaluate the performance of the proposed framework, accuracy is assessed from multiple perspectives. At the object level, the framework’s ability to detect pipeline components is assessed using precision and recall metrics. Precision measures the proportion of true pipe components among the components detected. Recall measures the proportion of pipe components correctly detected among the true components. At the point level, the deviation analysis method proposed by Anil et al. [40] for BIM quality evaluation is adopted. The nearest distance from each point to the reconstructed model is calculated as a measure of similarity, which reflects the overall fit and also identifies areas with poor local fit. The deviation threshold of 0.15 m set in this study is an empirically determined parameter, designed to effectively exclude local errors while providing an accurate representation of the overall alignment between the model and the actual point cloud data [15]. Points with deviations exceeding this threshold are considered noise or irrelevant objects that were not modeled. These points are excluded from further analysis to minimize the impact of noise on accuracy metrics. Additionally, three metrics, namely mean absolute error (MAE), root mean square error (RMSE), and mean relative error (MRE), are used to measure the error in pipe radius, as shown in the following formulas:

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|\hat{r}_i - r_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\hat{r}_i - r_i\right)^{2}}$$

$$\mathrm{MRE} = \frac{1}{m}\sum_{i=1}^{m}\frac{\left|\hat{r}_i - r_i\right|}{r_i}$$

where $r_i$ is the true radius of the i-th pipe, obtained through manual measurements and verified with design documents, $\hat{r}_i$ is the estimated radius, and m is the number of samples.
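The evaluation metrics can be written out directly, assuming the standard definitions of MAE, RMSE, and mean relative error. The radius values in the example are made up for illustration and are not measurements from the study.

```python
import math


def radius_errors(true_r, est_r):
    """Return (MAE, RMSE, MRE) for paired true and estimated radii."""
    m = len(true_r)
    abs_err = [abs(e - t) for t, e in zip(true_r, est_r)]
    mae = sum(abs_err) / m
    rmse = math.sqrt(sum(a * a for a in abs_err) / m)
    mre = sum(a / t for a, t in zip(abs_err, true_r)) / m
    return mae, rmse, mre


def precision(true_positives, false_positives):
    """Share of detected components that are real components."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Share of real components that were detected."""
    return true_positives / (true_positives + false_negatives)


# Illustrative radii in meters (not data from the study).
mae, rmse, mre = radius_errors([0.10, 0.20], [0.11, 0.18])
# 175 of 197 components detected, as reported in the conclusions.
overall_recall = recall(175, 197 - 175)
```

With the component counts reported in the conclusions, `overall_recall` evaluates to roughly 0.888, matching the stated 88.8%.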
5. Discussion
This study proposes a BIM reconstruction framework for underground pipeline systems based on laser point clouds. Given the prevalent challenges of missing and sparse point cloud data, achieving high-precision and fully connected BIMs has remained elusive in prior research. Many existing methods for 3D reconstruction face notable limitations. For instance, some approaches require pipe orientations to align with the main axes [25], fail to reconstruct the topological relationships between pipes [22,35], or involve complex parameter settings [23]. Moreover, most research has focused on pipeline systems in open environments, such as industrial plants or mechanical, electrical, and plumbing (MEP) systems. These methods are often unsuitable for the unique complexities of underground environments, making their direct application to BIM reconstruction of underground pipelines challenging. To address these issues, this study proposes a high-precision centerline generation and topological reconstruction framework, which effectively mitigates the influence of residual noise in pipe points and enables the topological connection between disconnected pipes.
To validate the effectiveness of the proposed framework, actual point cloud data with varying complexity were tested. Experimental results demonstrated its high efficiency and robustness across all test scenes, achieving an overall recall rate of 88.8% and a precision rate of 96.2%. These metrics highlight the framework’s superior performance and practical applicability in real-world scenes.
Despite achieving remarkable results, this study has some limitations. Firstly, the performance of the RandLA-Net semantic segmentation network may be suboptimal in highly complex scenes, particularly when there is substantial noise or when the distinction between pipes and the surrounding background is minimal. Given that the primary focus of this study is on the structural reconstruction of pipelines, optimization strategies were employed to enhance segmentation performance. However, to address the challenges posed by more complex underground environments, future research will seek to enhance segmentation accuracy by redesigning the segmentation architecture and benchmarking it against advanced, state-of-the-art semantic segmentation networks.
Second, while the framework is designed for underground pipelines, it does not address other common components in underground utility systems, such as valves and manholes. The reconstruction of these irregularly shaped components poses challenges for geometry-based methods. Future work will explore the automatic detection and reconstruction of these complex components based on template matching or deep learning algorithms. Adding these capabilities would improve the framework’s practicality and expand its application range in BIM reconstruction of underground utility systems.
6. Conclusions
This paper presents a framework for topological reconstruction of underground pipelines based on laser point clouds. Given the complex structure of underground pipelines and the issues of self-occlusion or occlusion by surrounding objects, a comprehensive framework is developed to achieve a high-precision BIM representation. It integrates data preprocessing, extraction of initial point sets for pipe centerlines, radius estimation, centerline generation, and topology reconstruction. Centerline generation is a critical module of this framework. Compared to above-ground scenes, an underground point cloud typically contains more intricate background noise, which can persist even after pipe segmentation and lead to inaccuracies in centerline determination. To address this, a centerline fitting and refinement approach is developed, significantly improving the modeling accuracy.
The framework is validated using real-world data from Hong Kong, successfully detecting and modeling 175 out of 197 components, including structures such as pipes, elbows, and tees. The framework achieves an overall recall rate of 88.8%, demonstrating its ability to effectively model most components. Deviation analysis reveals that the mean distance between pipe points and the reconstructed model is approximately 3.79 cm, indicating a strong geometric match between them. The mean relative errors in pipe radius estimation across three scenes were 2.60%, 2.54%, and 2.22%, respectively, all below 3%, confirming the high accuracy of parameter extraction. Additionally, the ablation experiment demonstrates that the improved RANSAC and centerline refinement method play crucial roles in enhancing the quality of centerline generation results.
In conclusion, the proposed framework provides a robust technical foundation for the management of underground pipeline systems in the context of smart city development. By facilitating the digitalization and intelligent management of urban infrastructure, this framework supports the operational efficiency and sustainable development of urban environments.