1. Introduction
In the context of contemporary urban informatization and digitalization, the precise extraction of building facade structures holds significant value in areas such as urban planning, heritage conservation, disaster assessment, and smart city development [1,2,3,4]. With the rapid advancement of 3D laser scanning (LiDAR) technology, it is now possible to efficiently obtain high-precision 3D point cloud data of buildings, enabling the automatic extraction of building facade structures [5]. However, due to factors such as occlusion during the scanning process, variations in reflectivity, and limitations in instrument accuracy, the acquired point cloud data often suffer from incompleteness, noise, and structural discontinuities, posing challenges to the accurate extraction of building facade structures.
In the extraction of building facade structures, researchers have proposed various methods. These methods are primarily based on the geometric properties of point clouds, such as curvature, normals, and distance metrics [6], and use algorithms like region growing, Random Sample Consensus (RANSAC), and graph cuts to identify and reconstruct building facade structures [7]. With the development of deep learning technology, learning-based methods have also been gradually introduced into this field [8,9,10], leveraging trained datasets to identify and extract facade structure features. While these methods have improved the efficiency and accuracy of extraction to some extent, they still face limitations in handling large-scale data, complex scenes, and missing data. Current research faces several core bottlenecks. First, existing methods rely heavily on the quality and completeness of point cloud data and handle the gaps and noise generated during scanning insufficiently. Second, the geometric primitives derived from complex feature descriptions still struggle to support object-level correction. Owing to the uneven sampling of facade point clouds, variations in reflective materials, and occlusions or missing data, it is challenging to extract robust contour structures from individual window targets. Our approach treats each window as an independent object, incorporating consistent discrimination at the object level and mutual optimization among objects, which improves the structural extraction results; we emphasize optimizing the spatial structural logic of semantic objects after extracting structural primitives such as points and lines. Third, the increasing complexity of urban architecture constrains the generalization capabilities of existing algorithms.
In response to these challenges, we propose a novel point cloud feature description and a dual-layer optimization method for building facade semantic structure extraction based on optimal feature guidance. This method addresses low-quality or occluded facades by performing object-level extraction and inference from both positional and shape perspectives. It accomplishes the identification and optimized extraction of semantic objects on building facades through three coherent steps: feature screening, similar-type recognition, and dual-layer optimization. The proposed approach is designed for extracting the facade structures of common high-rise buildings in urban environments and automatically generates accurate spatial boundaries of semantic structure objects. The resulting facade layouts can be directly applied to the enhancement and reconstruction of LOD1/2 (Level of Detail) models [11,12,13], achieving high accuracy and applicability. This method provides innovative technological support for urban informatization and digitalization efforts.
The main contributions of this work are as follows:
(1) A novel point cloud feature description and feature point extraction method is proposed to identify the semantic information of the facade, namely the window structure. Our method incorporates a dynamic neighborhood search mechanism that significantly amplifies the distinction between feature points and non-feature points, thereby enabling more accurate alignment with the underlying geometric edges of building facades; it has been verified on three datasets with defects and occlusions.
(2) We develop an approach based on an improved Hausdorff distance criterion for effectively identifying objects with similar structures from raw data. This algorithm overcomes the issues of missing data and granularity differences in the scans of structurally similar objects and achieves global optimization of semantic object grouping through matrix analysis.
(3) We implement a dual-layer optimization algorithm for facade semantic structure enhancement that takes both position and shape into account. By fully leveraging the spatial and morphological similarities of semantic objects on building facades, it effectively addresses the incompleteness caused by missing point cloud data and noise interference, significantly improving the accuracy and efficiency of 3D reconstruction.
The remainder of this paper is organized as follows. Section 2 introduces existing related work in this field, providing a foundation for our research. Section 3 provides a comprehensive description of the semantic detail enhancement framework for building facade point clouds, delving into the core innovations from three perspectives. Section 4 outlines the entire experimental process and conducts a thorough evaluation, including an assessment of accuracy. Lastly, Section 5 presents a comprehensive discussion and the conclusions of this paper.
2. Related Works
Unlike building-roof extraction, on which most methods focus, the extraction of building facades presents unique challenges due to the presence of multiple heterogeneous structures and occlusions during data collection, particularly the uneven point cloud density caused by TLS (Terrestrial Laser Scanning). To enhance the semantic details of facades, it is crucial to detect window targets from the extracted features of the building. Therefore, this paper briefly reviews the semantic enhancement of building facade point clouds from two perspectives: (1) point cloud feature extraction and (2) facade structure detection.
2.1. Point Cloud Feature Extraction
As an unstructured form of 3D data, point clouds contain rich spatial information. However, due to their large volume, lack of order, and absence of topological structure, direct processing and analysis of point clouds present numerous challenges. Consequently, effective feature extraction has become a critical research topic in the field of point cloud processing.
Compared to early methods that focused on statistical values of point cloud geometric features (such as point distribution, normals, and curvature) [14,15,16,17], recent research has increasingly turned to geometric feature extraction methods to better utilize the spatial structural information of point clouds. For example, due to the complexity of handling point clouds in three-dimensional space, some researchers have suggested extracting features in two-dimensional space: by projecting point clouds onto a 2D plane, high-precision feature points can be extracted from the intersections of feature lines, which can be determined accurately in 2D [18]. Although this method achieves high precision in feature extraction, it inevitably leads to occlusions and data loss during the projection process, particularly when dealing with point clouds with complex structures.
When a single geometric feature is insufficient for handling complex extraction tasks, methods combining multiple geometric features (such as curvature, normals, and point distances) have been proposed. These methods use the cross-entropy of multiple geometric features to select relevant features [19]. Compared to using only one geometric feature, this approach improves accuracy; however, it results in longer processing times and lower efficiency when dealing with large-scale data.
To robustly address multi-dimensional unstructured point clouds, a new point cloud feature descriptor called Spherical Shell Point Feature (SSPF) was introduced, which combines point coordinates to generate embedded features [20]. This approach considers both point coordinates and spatial alignment, but it faces challenges in handling point clouds of urban scenes.
Deep learning methods, by automatically learning complex patterns within data, are capable of extracting more expressive features from large-scale point cloud data [21,22,23,24]. Methods that combine multi-scale features with spectral information can distinguish the semantic information of point clouds: multi-scale point cloud features are filtered using a random forest approach and, with spectral information incorporated, the constructed features are fused with local feature aggregation modules in the network to learn deeper semantic information [25]. This approach enables not only feature extraction but also semantic learning; however, it is limited in handling object-level tasks, such as window detection.
A novel residual network module introduces smaller residual blocks within the residual units to achieve a finer-grained multi-scale feature representation, integrating Res2Net with a cascade structure to extract multi-scale features [26]. This method considers the geometric features of point clouds across multiple scales and expands the receptive field of each network layer. While Res2Net enhances the receptive field and multi-scale feature representation by constructing hierarchical residual connections within residual units, it simultaneously increases the complexity of the network architecture, making network design and debugging more challenging. Although numerous feature extraction methods exist, most are not well suited to point clouds of building facades.
2.2. Facade Structure Detection
Existing research on building modeling primarily focuses on the extraction and reconstruction of the main structure and roof contours of buildings [27,28,29,30,31,32], while relatively little attention has been paid to the semantic information of building facades, particularly the extraction and analysis of windows. This limitation, to some extent, affects the accuracy of the models and their effectiveness in practical applications. Existing methods apply ground filtering to facades [33] or enhance traditional mature methods by constructing new semantic segmentation networks [34]. These approaches improve traditional mature methods and explore solutions from the perspective of multi-module coupled networks. However, the former requires careful parameter selection, while the latter must address the issue of incomplete point clouds. Facade semantic features such as doors and windows are reflected in fine structures on the one hand and are easily identified in color textures on the other; multi-objective functions have been used to select optimal facade texture images, and geometric correction, color balancing, and texture restoration are considered to correct distortions, color inconsistencies, and occlusions by other objects (such as trees), thereby producing realistic 3D models of building facades [35]. Although this method can achieve visually realistic 3D models, the strong texture but weak structure of image data makes it difficult to accurately restore the true geometry.
The geometric and topological relationships of semantic entities can be used to reconstruct complex facade models. By formalizing the geometric and topological relationships of window semantic entities on building facades through facade layout models, semantic building models can be developed automatically using the extracted window parameter sets [36]. While this method ensures consistency between geometry, semantics, and topology in the reconstructed building models, it is limited in handling subtle spatial positional differences. Utilizing the rich radiometric information in laser point clouds can enhance point cloud segmentation. A method based on the fusion of point cloud radiance intensity with images for road segmentation can effectively separate dry, damp, and snowy road surfaces and address the issues caused by data sparsity [37]. However, this method targets roads and has limitations when it comes to window segmentation.
Due to the high time and labor costs of fully supervised learning, weakly supervised learning methods have been proposed. For instance, a weakly supervised building facade segmentation method using Spatially Adaptive Fusion with Consistency Contrast Constraint (SAF-C3) has been developed; it employs a Spatially Adaptive Fusion (SAF) module to extract discriminative features for building facade point clouds [38]. Although weakly supervised learning reduces annotation costs and improves the generalization ability of models, segmentation at the dataset level fails to account for object-level features such as windows.
The challenges of window reconstruction also include incomplete building facades. To bridge the gap caused by noise, sparsity, and occlusion, and to generate high-fidelity 3D building models, APC2Mesh was proposed; this method integrates the completed point cloud into the 3D reconstruction pipeline, enabling the learning of dense and geometrically accurate representations of buildings [39]. While APC2Mesh bridges the gap between neural point completion and surface reconstruction and addresses the challenges posed by occlusion, noise, and variable point densities when generating building models from ALS (Airborne Laser Scanning) point sets, it still suffers from minor edge- and corner-smoothing issues when applied to complex or high-rise buildings. It is evident that while 3D urban reconstruction has become highly advanced, little work has been conducted on object-level semantic enhancement for building facades.
3. Methods
This paper introduces a novel approach for identifying and optimizing the semantic structure of windows from building facade point clouds. Specifically tailored for complex high-rise buildings commonly encountered in urban management, the method starts with local semantic recognition to derive an optimized window layout for the entire facade, thereby enhancing LOD1/2 models that lack detailed facade information. The approach consists of three key steps: (1) the robust extraction of semantic feature point clouds from the facade; (2) the optimization of object clustering based on the Hausdorff distance; and (3) facade enhancement through a dual-layer optimization of shape and position.
Figure 1 provides an overview of this comprehensive framework.
3.1. Robust Extraction of Feature Point Clouds
The first step is to extract semantic structure points from the building facade’s point cloud data as accurately as possible, while excluding non-semantic structure points. Effective feature description is the foundation for mining high-precision information from point clouds. This study focuses on the facade point clouds of large urban buildings, which not only contain abundant planar structures but also exhibit significant variations in architectural contours and subtle differences in semantic structures such as windows and doors, presenting new challenges for feature description methods. Before feature extraction, the raw point cloud data must be preprocessed using established algorithms, including filtering and classification to extract the building point cloud and planar segmentation to isolate the building’s facade point cloud [36,37,38,39]. On this basis, the extraction of facade feature point clouds is achieved through the following two steps:
3.1.1. Optimal Neighborhood Computation
For three-dimensional point clouds with discrete distributions, the local structural information of a point must be extracted from the spatial relationships of its adjacent points. In the semantic feature description and recognition of facade point clouds, the challenge is to identify minor structural changes among numerous planar structures and to overcome the noise introduced during point cloud acquisition. To this end, we construct a dynamic neighborhood search mechanism that adjusts according to the complexity of the local neighborhood structure surrounding a point. Shannon entropy, as a measure of the randomness or uncertainty of information, is commonly used to quantify information content [40]. In point cloud analysis, it is often used to assess the amount of information contained within the neighborhood of a point; less information indicates more uniform local features, which aids in dynamically adjusting the neighborhood settings of points.
The Shannon entropy of a facade point is calculated as follows:

E = -(a_1D ln a_1D + a_2D ln a_2D + a_3D ln a_3D)  (1)

In the formula, E represents the magnitude of the Shannon entropy, and a_1D, a_2D, and a_3D represent the probabilities of a point belonging to a linear point, a planar point, or a scattered point, respectively. These three values can be determined using Principal Component Analysis (PCA). In brief, for each point p_i in the point cloud and its neighborhood set N(p_i), PCA is used to fit the local plane, resulting in the covariance matrix C. After performing an eigenvalue decomposition, the eigenvalues λ_1 ≥ λ_2 ≥ λ_3 are obtained. Then, the saliency features a_1D, a_2D, a_3D of point p_i are calculated as Equation (2):

a_1D = (√λ_1 − √λ_2)/√λ_1,  a_2D = (√λ_2 − √λ_3)/√λ_1,  a_3D = √λ_3/√λ_1  (2)
It can be observed that changes in the neighborhood radius affect the value of the Shannon entropy. A lower Shannon entropy indicates that the neighborhood of the target point contains less information and has more uniform structural features, which guides the adaptive selection of the radius. Choosing the optimal radius by minimizing the Shannon entropy enhances the ability to recognize local subtle structures. It is important to note that a smaller radius does not necessarily yield a lower Shannon entropy, as the influence of noise points decreases as the radius increases. Therefore, an adequate neighborhood range and number of points must be ensured: we set reasonable upper and lower limits for the search radius and use the Shannon entropy to determine the optimal neighborhood radius within them.
For any set of point cloud data, we first calculate the average point density ρ by randomly selecting multiple seed points from the point cloud and computing the average distance between these seed points and their neighboring points. We set the maximum search radius to 10ρ and the minimum to 3ρ, and then calculate the optimal radius r according to Equation (3):

r = argmin E(r), 3ρ ≤ r ≤ 10ρ  (3)
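The radius selection described above can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the authors' implementation: `dimensionality_entropy` derives the linear/planar/scattered probabilities from the eigenvalues of the local covariance matrix, and `optimal_radius` scans candidate radii in [3ρ, 10ρ] for the entropy minimum. Function names and the number of scan steps are our own choices.

```python
import numpy as np

def dimensionality_entropy(neighbors):
    """Shannon entropy of the linear/planar/scattered probabilities
    derived from the eigenvalues of the neighborhood covariance matrix."""
    C = np.cov(neighbors.T)                      # 3x3 covariance of the neighborhood
    lam = np.sort(np.linalg.eigvalsh(C))[::-1]   # eigenvalues, descending
    s = np.sqrt(np.maximum(lam, 0.0))
    if s[0] == 0.0:
        return 0.0
    a = np.array([(s[0] - s[1]) / s[0], (s[1] - s[2]) / s[0], s[2] / s[0]])
    a = np.clip(a, 1e-12, 1.0)                   # avoid log(0)
    return float(-(a * np.log(a)).sum())

def optimal_radius(points, p, rho, n_steps=8):
    """Scan radii in [3*rho, 10*rho] and keep the one minimizing entropy."""
    best_r, best_e = None, np.inf
    for r in np.linspace(3 * rho, 10 * rho, n_steps):
        d = np.linalg.norm(points - p, axis=1)
        nb = points[d <= r]
        if len(nb) < 4:                          # need enough points for a stable fit
            continue
        e = dimensionality_entropy(nb)
        if e < best_e:
            best_r, best_e = r, e
    return best_r, best_e
```

For a purely planar neighborhood, a_2D dominates and the entropy approaches zero, which is exactly the behavior the radius search exploits.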
3.1.2. Semantic Feature Point Detection via Centroid Displacement
Point cloud data are complex and varied due to their rich geometric features, ranging from coarse structures to fine edges. Precisely identifying feature points directly from these discrete point clouds is a technical challenge [41,42,43,44]. To address this challenge, we propose a feature point detection scheme based on centroid displacement; combined with the previously determined optimal neighborhood radius r, it ensures that the detected feature points accurately match the underlying geometric edges.
Specifically, we first calculate point features using traditional methods, as shown in Figure 2a. For a point p_i in the point cloud, we obtain its neighborhood point set N(p_i) based on the optimal neighborhood radius r and fit its centroid o_i. We define the vector V_i = o_i − p_i; the larger the magnitude of this vector, the more complex the local shape of the neighborhood. However, since V_i is a uniform Laplacian vector, its weighting strategy is insensitive to surface changes, which limits its effectiveness in describing complex and detailed facade semantic structures. As shown in Figure 2b, in flat areas and shallow structural regions, the vectors from points to the centroid are small and indistinguishable, whereas, as shown in Figure 2c, under noise interference the vectors from points to the centroid lengthen, potentially misidentifying p_i as a feature point.
To enhance the distinction between feature points and non-feature points, and to mitigate the impact of noise on feature detection, we propose a locally weighted centroid displacement. Building on the calculation of the vector for each neighboring point p_j of point p_i, we first create a temporary point p′_j (the green point in Figure 3b), which is used to construct a weighted vector. Subsequently, we fit a new centroid o′_i using all these temporary points p′_j (as depicted in Figure 3c). We then construct a new centroid displacement vector V′_i = o′_i − p_i, which is calculated according to Equation (4).
Here, N(p_i) represents the set of nearest neighboring points of point p_i, and the average distance between p_i and all its neighboring points is used in the weighting. The locally weighted centroid displacement scheme enhances the differences between points in shallow features and those in flat areas by considering the relative positions of points to their neighbors.
Subsequently, the magnitude of the centroid displacement vector V′_i is calculated for each point and statistically analyzed, serving as a superior reference metric to replace |V_i|. Based on the statistical outcomes, a reasonable threshold is established to differentiate feature points from non-feature points. With these steps completed, the extraction of the facade’s semantic feature point cloud set U is finalized.
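As a rough illustration of the detection principle, the sketch below scores each point by the plain (unweighted) displacement between it and the centroid of its k nearest neighbors, then thresholds the scores by quantile. It deliberately omits the locally weighted variant of Equation (4); k and the quantile are illustrative choices, not the paper's settings.

```python
import numpy as np

def centroid_displacement(points, k=12):
    """For each point, |centroid(k nearest neighbors) - point|: large values
    flag edge/feature points, small values flat regions (unweighted sketch)."""
    n = len(points)
    disp = np.zeros(n)
    for i in range(n):
        d = np.linalg.norm(points - points[i], axis=1)
        nb = points[np.argsort(d)[1:k + 1]]      # k nearest neighbors, self excluded
        disp[i] = np.linalg.norm(nb.mean(axis=0) - points[i])
    return disp

def feature_points(points, k=12, quantile=0.9):
    """Keep the points whose displacement exceeds the chosen quantile."""
    disp = centroid_displacement(points, k)
    return disp > np.quantile(disp, quantile)
```

On a flat patch the neighborhood is symmetric and the displacement is near zero, while at a boundary or edge the centroid is pulled inward, producing a large displacement.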
3.2. Improved Hausdorff Distance Discrimination
Considering that the facade windows of large urban buildings are often arranged in a regular pattern of identical or similar structural arrays, identifying and grouping windows with the same structure will help overcome data gaps and point density differences caused by scanning view limitations, thereby optimizing the extraction results of semantic structures. Therefore, in this phase, we utilize object-level grouping based on the Hausdorff distance for discrimination.
3.2.1. Utilizing Slice-Based Region Growing for Preliminary Extraction of Semantic Individual Objects
At the onset of processing, we first employ the region growing method to segment the obtained facade semantic feature point cloud U into individual objects. Since common attachments on building facades, such as long pipelines and drying clothes, can interfere with the effectiveness of the region growing algorithm, we first slice the point cloud U along the vertical direction. The initial slice thickness is set based on common building floor heights and is refined through a sliding window algorithm that adjusts the thickness for different levels, gradually optimizing until each slice fully covers all areas of the same floor. Subsequently, the region growing algorithm is applied to each floor slice for preliminary segmentation, and the segmentation results of the slices are merged to form an overall preliminary object segmentation. Ultimately, we obtain several semantic object point cloud clusters, denoted as U_i, where i indexes the segmented semantic objects, as shown in Figure 4.
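The slice-then-grow idea can be illustrated with a naive sketch (the sliding-window thickness refinement is omitted): points are binned into floor-height slices, and each slice is segmented by simple single-linkage region growing with a distance threshold `eps`. All names and parameter values here are illustrative assumptions.

```python
import numpy as np

def slice_and_cluster(points, floor_h=3.0, eps=0.3):
    """Bin points (n x 3, z up) into horizontal slices of roughly one floor
    height, then grow clusters inside each slice with a distance threshold."""
    labels = -np.ones(len(points), dtype=int)
    next_label = 0
    z0 = points[:, 2].min()
    n_slices = max(1, int(np.ceil((points[:, 2].max() - z0) / floor_h)))
    for s in range(n_slices):
        lo, hi = z0 + s * floor_h, z0 + (s + 1) * floor_h
        idx = np.where((points[:, 2] >= lo) & (points[:, 2] <= hi))[0]
        remaining = set(int(i) for i in idx if labels[i] < 0)
        while remaining:
            seed = remaining.pop()
            cluster, frontier = [seed], [seed]
            while frontier:                      # naive region growing
                q = frontier.pop()
                near = [r for r in remaining
                        if np.linalg.norm(points[r] - points[q]) <= eps]
                for r in near:
                    remaining.discard(r)
                    cluster.append(r)
                    frontier.append(r)
            labels[cluster] = next_label
            next_label += 1
    return labels
```

A production version would use a spatial index (k-d tree) instead of the O(n²) neighbor scan, but the slicing logic is the same.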
3.2.2. Object Similarity Measurement
The Hausdorff distance is a metric for measuring the dissimilarity between two point sets, defined as the greatest of all distances from a point in one set to the closest point in the other set. This measurement effectively captures the similarity of shapes and is particularly useful for comparing and identifying geometric objects, especially when dealing with nonlinear or irregular shapes. However, the traditional Euclidean distance may overestimate differences in the presence of erroneous data, particularly with nonlinear data, because it treats errors in all dimensions equally.
To address this challenge, we introduce the FPFH (Fast Point Feature Histograms) feature as an alternative to the traditional Euclidean distance [16]. FPFH captures the local geometric characteristics of point clouds through statistical analysis, generating a 33-dimensional feature vector that effectively describes the local shape and surface variations of the point cloud. It extends the scope of feature description by considering the relationships between the query point, its neighboring points, and the neighbors of those neighbors, while maintaining high computational efficiency. FPFH excels in multi-scale feature extraction, adapting to point cloud datasets of varying sizes, reducing the need for parameter tuning, and enhancing the algorithm’s adaptability to changes in point cloud scale. This ensures the accurate capture of key geometric features at various scales, strengthening the comprehensiveness and robustness of the feature description.
For two point sets A and B, the Hausdorff distance H(A, B) is computed as follows (Figure 5):

H(A, B) = max{h(A, B), h(B, A)},  h(A, B) = max_{a∈A} min_{b∈B} d(a, b)

where d(a, b) denotes the distance between the FPFH feature vectors of points a and b. The Hausdorff distance quantifies the similarity between the two point sets and is used in the subsequent processing steps.
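The computation can be sketched as follows. In practice the per-point FPFH descriptors would come from a library (e.g., Open3D's `compute_fpfh_feature`); here we simply operate on generic per-point feature matrices and use the Euclidean distance between descriptor rows, so this is an assumption-laden sketch rather than the paper's exact pipeline.

```python
import numpy as np

def directed_hausdorff(FA, FB):
    """h(A, B) = max over rows of FA of the min distance to any row of FB.
    FA is (n_a x d), FB is (n_b x d): one descriptor per point."""
    D = np.linalg.norm(FA[:, None, :] - FB[None, :, :], axis=2)
    return D.min(axis=1).max()

def hausdorff(FA, FB):
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(FA, FB), directed_hausdorff(FB, FA))
```

Note the asymmetry of the directed term: a subset has zero directed distance to its superset, but not vice versa, which is why both directions are taken.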
3.2.3. Hierarchical Clustering Grouping Based on Similarity Distance Difference Matrix
Considering the focus of this paper on semantic extraction for large urban building facades, the window structures on building facades typically exhibit regular spatial arrangements and often include a multitude of similarly shaped and sized individual objects. Leveraging this characteristic, we group the individual objects obtained from region growing segmentation in this stage based on structural similarity.
Specifically, we calculate the Hausdorff distances between objects within the point cloud clusters U_i to assess the similarity among these semantic object point cloud clusters. Assuming the preliminary segmentation yields n objects, we construct an n × n symmetric matrix whose element in the i-th row and j-th column is the Hausdorff distance between the i-th and j-th segmented objects. By summing and statistically analyzing each row of this matrix, we can use hierarchical clustering to group segmented objects with similar structures together [45]. Generally, windows with the same structure, even if not completely identical in the point cloud scanning results, will show clear similarities. Moreover, windows on different facades of the same building are often designed to be the same or similar. Therefore, when grouping, we aim to minimize the number of groups to better accommodate noise and data loss in the point cloud.
Additionally, attachments on building facades, such as pipes, can interfere with the segmentation of semantic objects, causing adjacent windows to be incorrectly grouped together or a single object to be split into multiple parts due to point cloud data loss. Experiments show that while such missegmentations are inevitable, they are not dominant. Through the similarity assessment between objects in this stage, these missegmentations are identified, excluded from the existing groups, and removed from the set to prevent interference with subsequent computations.
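Grouping from the pairwise distance matrix can be sketched with standard hierarchical clustering, here via SciPy; the cut threshold and the average-linkage method are illustrative choices, not necessarily the paper's exact settings.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def group_objects(dist_matrix, cut):
    """Group segmented objects whose pairwise (Hausdorff) distances fall
    below `cut`, via average-linkage hierarchical clustering.
    `dist_matrix` is a symmetric n x n matrix with a zero diagonal."""
    condensed = squareform(dist_matrix, checks=False)  # upper triangle as vector
    Z = linkage(condensed, method="average")
    return fcluster(Z, t=cut, criterion="distance")    # integer group labels
```

Cutting the dendrogram at a generous threshold keeps the number of groups small, which matches the paper's preference for few groups under noise and data loss.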
3.3. Position-Shape Dual Optimization
In the previous phase, we obtained preliminary object-level point cloud segmentation results and grouped segmented objects with similar structures. In this phase, we will fully consider the spatial and shape-related similarities of window objects on building facades. To address the challenges caused by missing point cloud data, noise interference, and segmentation errors resulting from the limitations of segmentation methods, we will enhance and optimize the semantic structure of the facades from both positional and shape perspectives.
3.3.1. Spatial Distribution Correlated Position Optimization
When processing the grouped point cloud sets U_i, we first calculate the bounding box of each object and determine its central point c_i. Although, in theory, the central points of these semantic structure objects should follow spatial patterns of collinearity, coplanarity, or equal spacing, the characteristics of point cloud scanning and the accuracy of the segmentation algorithm affect the actual distribution of these central points. Therefore, the primary task of this step is to establish spatial relationships among the central points.
Specifically, for the extracted building facade point cloud, we project it onto a plane and define a planar coordinate system XOZ. The origin O of the coordinate system is located at the lower-left corner of the plane’s bounding box, with the Z-axis pointing towards the zenith and the X-axis pointing to the right, perpendicular to the Z-axis. Next, we project the extracted central points c_i onto both the X-axis and the Z-axis. The projection onto the X-axis is used to identify the collinear relationships of the central points, while the projection onto the Z-axis is used to identify the coplanar relationships. Figure 6 illustrates this step, with Figure 6a showing the extracted central points c_i (blue dots), and Figure 6b demonstrating that these fitted central points should theoretically satisfy collinear constraints, although minor deviations arise from the accumulation of errors.
Based on the segmentation results of the windows, the target center points (the blue points c_i) are obtained. By projecting these points onto the X and Z axes and fitting row and column lines to them, the intersections of these lines represent the optimized window positions (yellow points). Furthermore, window structures previously obscured in the point cloud can also be inferred (red points), as shown in Figure 6b. The corresponding co-row and co-column lines can be expressed as follows.
Subsequently, we use these intersection points to optimize the positions of the central points c_i, ensuring that they strictly adhere to the collinear relationships. For each segmented object U_i, we find the nearest intersection point and replace the current center coordinates with it, while removing that intersection point from the set, as shown in Figure 6c. This process does not alter the original positions of the building point cloud; instead, it provides a more accurate center point for each segmented object as an anchor (yellow dots) to support further structural optimization.
It should be noted that the intervals of the co-row and co-column relationships should be regular, meaning that the row spacings and column spacings should conform to some arithmetic sequence combination. In the simplest case, if all windows on a facade have the same structure and are evenly distributed, the row and column positions form arithmetic sequences. For more complex facades, as shown in Figure 7, the distribution of window columns may exhibit a pattern similar to alternating arithmetic sequences: for a set of column positions (x_1, x_2, …, x_n), the intervals between adjacent columns can be described as an alternating pattern of spacings.
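A simplified version of this position regularization can be sketched as follows: center X and Z coordinates are clustered into columns and rows, line positions are taken as cluster means, and each center is snapped to the nearest grid intersection. The 1-D gap-based clustering and the tolerance `tol` are our own simplifications of the line-fitting step, not the paper's exact procedure.

```python
import numpy as np

def grid_anchor_positions(centers, tol):
    """Cluster X and Z coordinates of window centers (n x 2, columns = X, Z)
    into columns/rows and return the grid of line intersections."""
    def cluster_1d(vals):
        vals = np.sort(vals)
        groups, cur = [], [vals[0]]
        for v in vals[1:]:
            if v - cur[-1] <= tol:               # same column/row line
                cur.append(v)
            else:
                groups.append(cur)
                cur = [v]
        groups.append(cur)
        return np.array([np.mean(g) for g in groups])
    cols = cluster_1d(centers[:, 0])             # fitted column lines (X positions)
    rows = cluster_1d(centers[:, 1])             # fitted row lines (Z positions)
    return np.array([(x, z) for z in rows for x in cols])

def snap_to_grid(centers, tol):
    """Replace each center by its nearest grid intersection (anchor point)."""
    grid = grid_anchor_positions(centers, tol)
    idx = np.linalg.norm(centers[:, None, :] - grid[None, :, :], axis=2).argmin(axis=1)
    return grid[idx]
```

Grid intersections with no assigned center correspond to the inferred, previously occluded windows (the red points in Figure 6b).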
3.3.2. Robust Extraction of Window Contours with In-Group Superimposed Line Optimization
Considering the uniform structural arrangement of windows on a building facade, in the shape optimization phase we superimpose and fit the windows of the same group to obtain the most complete and accurate window outline, and match it to the center point of each window, thereby compensating for differences in scanning accuracy, omissions, and interference from facade attachments. For each segmented object U_i, the classic α-shape algorithm is used to extract the contour boundary from the point cloud data; it handles the concavity and convexity of the point cloud, generating a polygonal shell that approximates the region occupied by the points.
Subsequently, based on the grouping results, each group is processed individually. For any group, with the optimized object center point serving as the insertion anchor point, the extracted contour boundaries are superimposed, and the line segments are measured for similarity and merged. Specifically, we first define a metric to measure the similarity between two adjacent line segments as follows:

$D(l_i, l_j) = d_{\theta} + d_{\perp} + d_{e},$

where $l_i$ and $l_j$ are two similar line segments. We created a class LINE to store and represent the parameters of a line segment, as shown in Figure 8. The term $d_{\theta}$ translates angular differences into a length representation, where $\theta$ represents the angle between line segments $l_i$ and $l_j$. The term $d_{\perp}$ denotes the perpendicular distance, computed in a way that comprehensively considers distance while mitigating the impact of extreme values; here, $d_1$ and $d_2$ represent the projection distances from the two endpoints of line segment $l_i$ to line segment $l_j$. The term $d_{e}$ quantifies the distance between two line segments by calculating the distance between their projected endpoints, where $d_{\min}$ indicates the minimum distance between the endpoints of the two line segments. Generally, if two line segments $l_i$ and $l_j$ are similar, the similarity metric $D(l_i, l_j)$ will be small; in other words, a smaller $D$ value implies that line segments $l_i$ and $l_j$ are more similar, as shown in Step 2 of Figure 9.
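One plausible concrete form of such a three-term similarity (an angle term converted to a length, a perpendicular-distance term, and an endpoint-distance term, with smaller values meaning more similar) is sketched below; the exact terms and weights in the paper may differ:

```python
import numpy as np

def segment_similarity(seg_a, seg_b):
    """Similarity D = d_theta + d_perp + d_e between two 2-D segments,
    each given as an endpoint pair; smaller values mean more similar.
    This is an illustrative form, not the paper's exact metric."""
    p1, p2 = (np.asarray(p, dtype=float) for p in seg_a)
    q1, q2 = (np.asarray(q, dtype=float) for q in seg_b)
    u, v = p2 - p1, q2 - q1
    lu, lv = np.linalg.norm(u), np.linalg.norm(v)
    # d_theta: angular difference expressed as a length
    cos_t = min(1.0, abs(np.dot(u, v)) / (lu * lv))
    d_theta = min(lu, lv) * np.sqrt(1.0 - cos_t ** 2)
    # d_perp: mean projection distance of seg_b's endpoints onto line(seg_a),
    # averaging the two distances to soften the effect of one extreme value
    n = np.array([-u[1], u[0]]) / lu              # unit normal of seg_a
    d_perp = 0.5 * (abs(np.dot(q1 - p1, n)) + abs(np.dot(q2 - p1, n)))
    # d_e: minimum endpoint-to-endpoint distance between the segments
    d_e = min(np.linalg.norm(a - b) for a in (p1, p2) for b in (q1, q2))
    return d_theta + d_perp + d_e
```

Identical segments score 0, and the score grows with angular deviation, lateral offset, and endpoint separation, so thresholding it clusters near-collinear, nearby segments together.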
Under typical circumstances, the longer a line segment is, the more likely it is to be correct. Therefore, the midpoint of the longest line segment is used as the positioning point for the new line segment (the red point in the figure). Using similarity-based line segment clustering, a dominant direction is calculated, and a new line is generated from the central point and the directional vector. Finally, based on orthogonal projection, the two endpoints with the maximum separation are used to crop the line, yielding the merged line segment, as shown in Figure 9.
Through the aforementioned steps, we can construct a stable window outline for the segmented objects within each group. Windows within the same group should exhibit consistent structural features, allowing us to use this outline to uniformly replace the contours of all individual objects in the group. Each group is processed independently, and by integrating the optimized positional results, we can generate a semantic layout map for the entire building facade.
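Replacing each group member's contour with the shared group-level outline amounts to translating one template polygon to every optimized anchor; a minimal sketch (names are ours):

```python
import numpy as np

def place_group_template(template, anchors):
    """Instantiate one group-level window outline at every object anchor:
    the template polygon is expressed relative to its own centroid, then
    translated to each anchor point."""
    tpl = np.asarray(template, dtype=float)
    local = tpl - tpl.mean(axis=0)                 # center the template
    return [local + np.asarray(a, dtype=float) for a in anchors]
```

Collecting the placed polygons of all groups yields the facade-wide semantic layout map.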
3.3.3. Enhancement of Facade Model
The facade layout map obtained from the aforementioned steps will be applied to optimize the facade structure of 3D building models. Currently, it is easy to obtain LOD1/2 block models of buildings, where facades are often represented by planes with images as textures to describe the facade structures. Utilizing the facade layout map resulting from the methods proposed in this paper, we can achieve an accurate distribution of facade semantic structures that closely aligns with the real structures, guiding the refinement operations of building models.
Figure 10 demonstrates the enhancement outcomes of the facade models from the dataset used in this study. In comparison to the block models, the enhanced models are better suited for further applications in urban management and analysis.
5. Discussion and Conclusions
5.1. Discussion and Limitation
Based on these analyses, the following discussion highlights both the innovations and limitations of the method presented in this paper.
(1) Innovations. A novel point cloud feature extraction method is proposed, capable of accurately capturing features in shallow regions and adapting to a variety of complex environments. Moreover, window reconstruction achieves consistently high detection accuracy: through effective feature discrimination and regularization, the reconstructed windows attain a high degree of precision in both position and shape.
(2) Limitations. The proposed method simplifies semantic entities into rectangles, overlooking original shapes such as columns and staircases, which leads to inadequate recognition of complex windows. Additionally, the effectiveness of judging the similarity of window structures depends on the preceding segmentation results; poor segmentation can cause errors in discerning window similarity. Although the use of a sliding window to segment buildings helps reduce errors, this approach still produces missed detections. Furthermore, the chosen thresholds may exclude valid entities.
5.2. Conclusion and Future Work
In summary, this study successfully implemented a dual optimization method for building facade point cloud semantic detail enhancement based on optimal feature guidance in terms of both position and shape. The proposed framework effectively addresses the shortcomings of urban building facade reconstruction, such as the lack of semantic information, shape deformation, and positional inaccuracies. The novelty of our framework lies in the introduction of a new feature extraction method based on optimal neighborhood search, coupled with the dual optimization of position and shape to enhance the semantic details of facade point clouds. This framework not only efficiently extracts point cloud features in complex scenes but also integrates similarity discrimination and dual optimization of position and shape into the semantic facade model reconstruction process. The detection accuracy reached 98.5%, surpassing existing methods. According to the experimental analysis, the proposed framework adapts well to different complex environments and can recover point cloud facade semantic information to the greatest extent, even in areas with significant occlusion.
Future research aims to advance the extraction of semantic details from photogrammetric and LiDAR point clouds, reducing potential quality degradation in facade detail reconstruction; we have already made progress on point cloud feature extraction and regularization. Furthermore, considering the increasing prevalence of large models such as GPT (Generative Pretrained Transformer) and Stable Diffusion, we plan to integrate the point cloud facade semantic information model with these models to explore methods for recognizing and completing the semantic reconstruction of complex building structures. Finally, the extraction and weighting of features for cultural heritage details, patterns, and graphics involves fitting curves in the plane and even line segments in three-dimensional space; this is an independent and complex research topic that forms part of our further work.