Article

Reconstructing Geometric Models from the Point Clouds of Buildings Based on a Transformer Network

School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(3), 359; https://doi.org/10.3390/rs17030359
Submission received: 5 December 2024 / Revised: 19 January 2025 / Accepted: 20 January 2025 / Published: 22 January 2025

Abstract

Geometric building models are essential in BIM technology. The reconstruction results of current methods are usually represented as meshes, which are limited to visualization purposes and difficult to import directly into BIM or modeling software for further application. In this paper, we propose a building model reconstruction method based on a transformer network (DeepBuilding). Instead of reconstructing a polyhedral model of a building, we strive to recover the CAD modeling operations that construct the building model from the building point cloud. By representing the building model with its modeling sequence, the reconstruction results can be imported into BIM software for further application. We first translate the procedure of constructing a building model into a command sequence that can be vectorized and processed by the transformer network. We then propose a transformer-based network that converts input point clouds into the vectorized representation of the modeling sequence by decoding the geometry information encoded in the point features. A tool is developed to convert the vectorized modeling sequence into a 3D shape representation (such as a mesh) or into file formats supported by BIM software. Comprehensive experiments demonstrate that our method produces competitive reconstruction results with high geometric fidelity while preserving more details of the reconstructed buildings.

1. Introduction

Building Information Modeling (BIM) technology represents a comprehensive virtual model that encapsulates all the relevant information of a building asset [1]. The use of three-dimensional geometric models enables the integration of various types of building data into a unified, parametric representation [2], facilitating efficient and seamless interactions within a Common Data Environment (CDE). In parallel, point clouds derived from advanced sensing technologies, such as 3D laser scanning and photogrammetry, are capable of capturing the surface geometries of target objects with high levels of accuracy and efficiency [3,4]. Consequently, point clouds have become an essential tool for reconstructing the geometric models of existing buildings [5], aiding in the creation of their digital twins [6]. However, current building reconstruction methods typically generate mesh models, which present challenges when it comes to importing them directly into BIM platforms like Revit [7]. As a result, there is a notable gap in the development of an automated solution capable of converting point clouds into editable geometric models that are easily integrated into BIM software for subsequent tasks.
Computer-aided design (CAD) is a cornerstone of modern industries, and is widely used in mechanical engineering, aerospace, architecture, and many other disciplines. CAD models are widely used as representations of parameterized designs, particularly in the field of civil engineering. During the engineering design phase, building design typically follows a top-down approach, where 3D CAD models of buildings are first created and subsequently utilized in various downstream tasks [8], including further design modifications and construction activities. In practice, building models are usually constructed through a sequence of modeling commands, such as drafting, sketching, extrusion, and Boolean operations. In contrast, establishing the Building Information Model (BIM) of an existing building can be viewed as the reverse process of the design workflow. If the original modeling commands are recoverable, both the geometric models and BIM data can be reconstructed. Additionally, these modeling commands can be reformatted into file formats compatible with BIM or modeling software, providing significant benefits for downstream applications like finite element analysis [9] and daily management [10]. Motivated by this concept, this paper explores the recovery of modeling commands used to construct a building’s geometry from input point clouds. These modeling commands can be regarded as a design language, akin to the language tokens in natural language processing [11], which encapsulate the operations and commands employed to generate the geometric model. In this study, we refer to this collection of commands as the modeling sequence. By recovering this modeling sequence, the corresponding building geometry can be recreated, and the generated sequence can be easily imported into BIM software for further use.
Recent advancements in Artificial Intelligence have led to significant breakthroughs, particularly in the field of Large Language Models (LLMs). Models such as BERT [11] and ChatGPT [12] have demonstrated exceptional capabilities in reasoning and multi-modal predictions [13]. The modeling sequence of civil structures can be interpreted as a design language, which can similarly be predicted using transformer-based language models. Moreover, the process of recovering the modeling sequence can be viewed as extracting the implicit geometric features encoded within the point cloud data. To address this, we propose an end-to-end network based on a transformer network designed to convert input point clouds into corresponding modeling sequences. By leveraging the transformer’s powerful ability of feature representation learning, our approach decodes the implicit geometric information embedded in the point cloud and transforms it into a modeling sequence.
In this paper, we propose an approach for reconstructing building models by recovering the modeling sequence from point cloud data. By retrieving the modeling sequence, we can achieve high geometric fidelity in building reconstruction, with results that can be seamlessly imported into BIM software for further applications. Our approach begins with the representation of building models as a vectorized modeling sequence, utilizing the widely adopted sketch–extrude modeling process. We then introduce an end-to-end network, based on the transformer architecture (DeepBuilding), which is capable of converting point clouds into the corresponding vectorized modeling sequence.
The main contributions of this paper can be summarized as follows:
  • We propose a method for building model reconstruction by recovering the modeling sequence from input point clouds. In this approach, the building model is represented by its vectorized modeling sequence rather than a mesh. By representing the geometry through modeling commands, the reconstruction results can be directly imported into BIM software for further applications. Additionally, we develop a tool to convert the modeling sequence into file formats compatible with BIM software.
  • We introduce an end-to-end network based on the transformer architecture that converts point clouds into vectorized modeling sequences. This network employs PointNet++ as the point tokenizer to extract point embeddings, after which a transformer network decodes the extracted features into corresponding command sequences.
  • We conduct a comprehensive evaluation of the proposed building reconstruction method. The results show that our approach preserves more geometric details in the reconstruction while achieving competitive performance compared to existing methods.

2. Related Works

The reconstruction of building models has garnered significant attention due to its potential to integrate various types of data within Building Information Modeling (BIM) [14]. A wide range of approaches for urban building reconstruction has been proposed, including rule-based methods [15], image-based techniques [16], and 3D point cloud-based methods [17]. This paper specifically focuses on the reconstruction of geometric building models using point clouds. Furthermore, we propose a novel reconstruction method that predicts the modeling sequence directly from the input point cloud. This modeling sequence is formatted in accordance with CAD standards, ensuring seamless integration with BIM software such as Revit. Thus, both building model reconstruction techniques and CAD-based methods in reverse engineering are thoroughly discussed in the following section.

2.1. Building Model Reconstruction

Building model reconstruction methods based on point clouds can be primarily classified into two categories: data-driven and model-driven approaches [18]. Data-driven reconstruction begins by segmenting the point cloud using various data features. The predicted labeled point sets are then modeled individually, and different primitives are combined to obtain the final 3D building reconstruction. This approach includes several mainstream techniques for 3D building reconstruction from point clouds, such as region growing methods [19,20], feature clustering methods [21,22,23,24], model fitting methods [25,26,27,28], and global energy optimization methods [29].
Model-driven reconstruction, in contrast, is a top-down approach in which a primitive library containing the geometrical and topological information of the primitives is first established. Optimization methods, such as least-squares matching, are then used to fit the point cloud to these predefined primitives. Zheng and Weng [30] divided irregular building contours into non-overlapping quadrilaterals using 2D building contour data, with each corresponding roof primitive being identified via decision trees; the adjacent simple roof primitives were subsequently combined to form complex roof models. Similarly, Xiong et al. [31] developed roof topology graphs to identify the corresponding primitives for each roof and, through topological operations, generated complete building models. Nan and Wonka [32] proposed the PolyFit framework for reconstructing lightweight polygonal surfaces from point clouds by intersecting plane primitives and combining them to generate a manifold polygonal surface model without boundaries. Huang et al. [33] introduced a fully automatic approach (City3D) for reconstructing compact 3D building models from large-scale airborne point clouds, directly inferring vertical walls from the data. Model-driven methods benefit from prior assumptions that ensure the correctness of topological relationships and aid in semantic information extraction. However, challenges persist, such as mismatches between the point cloud and primitive types, low model fitting accuracy, and the need for careful parameter tuning.
Additionally, some researchers have integrated deep learning algorithms into building model reconstruction. Huang et al. [34] designed 11 roof primitives and reconstructed complex roofs by merging them in the influence space. Li et al. [35] and Zhang et al. [36] proposed a model-driven parametric modeling method that uses the PointNet++ network model for high-precision classification of roof point cloud data. Chen et al. [37] proposed a framework called Points2poly for reconstructing building models from point clouds. This framework leverages a deep neural network for building occupancy estimation and uses a Markov random field to extract the outer surface of the building through combinatorial optimization. Chen et al. [38] also introduced PolyGNN, a polyhedron-based graph neural network for 3D building reconstruction, which learns to assemble primitives obtained by polyhedral decomposition via graph node classification to achieve a watertight and compact reconstruction.
While most point cloud-based 3D building reconstruction methods focus on geometric dimensions such as model accuracy and abstraction levels, they often lack semantic descriptions of the models. The output representations for these reconstructions are typically parametric models (e.g., model fitting or matching) and surface modeling (e.g., boundary representation, B-rep) [39]. The results are usually expressed using geometric representations like polygonal models (mesh), which are limited to visualization purposes. This restricts the ability to query, edit, and analyze building models and their components, thereby hindering deeper applications [40]. If the reconstruction results are programmatic or parametric, users can modify complex objects by altering their positions or performing logical operations. However, CSG-based representations for 3D building reconstruction are rarely employed, and no method currently uses programmatic CSG models to represent reconstruction results.
In this paper, we propose a novel method for building model reconstruction based on an end-to-end transformer network that converts point clouds into modeling sequences. By representing the geometric model using modeling commands, the reconstruction results can be easily imported into BIM software for further applications.

2.2. CAD Model Reconstruction

CAD reverse engineering has long been a sought-after goal in the CAD industry due to the significant time and resource savings it offers [41]. Fitting geometric primitives to 3D point clouds through algorithms such as RANSAC and the Hough transform has dominated this field for years [27,42,43,44]. However, these methods require meticulous parameter tuning for each shape type and often rely on user assistance.
With the recent availability of large boundary representation (B-Rep) datasets, data-driven frameworks have emerged for reverse engineering in the CAD industry. SPFN [45] was the first end-to-end supervised primitive fitting network. ParSeNet [46] expanded the range of primitives the network can handle, improving surface representation fidelity. CPFN [47] further enhanced SPFN's performance on high-resolution point clouds by dynamically aggregating primitives across global and local scales. Despite these advances, these methods primarily segment primitive patches and overlook the edges that define the boundaries of the primitives, leading to inaccurate and incomplete reconstructions.
Beyond fitting primitives, methods such as ExtrudeNet represent sketches using signed distance fields (SDFs). Inspired by DeepCAD [48], Point2Cyl reconstructs 3D shapes from point clouds through extrusion segmentation and encodes the sketch using the latent embeddings of its SDF. Similarly, [49] advocates for the use of implicit fields for sketch representation and proposes SECAD-Net, which predicts 2D sketches and 3D extrusion parameters from raw shapes. However, these methods rely on implicit sketch representations, which introduce curved edges in the inferred sketches. Transforming from the implicit field to the sketch can lead to parameter errors, further complicating the process.
Some CAD generation methods [50,51] utilize generative models, such as diffusion models, to generate modeling sequences. CAD-Diffuser [50] is a generative model based on diffusion, but it is not designed for reconstruction purposes. While the diffusion model is inherently generative, our goal is to facilitate building reconstruction through parametric CAD modeling sequences for further application. By recovering the modeling sequence from input point clouds, the reconstruction results can be used in mechanical simulations or BIM reconstruction, moving beyond mere visualization.
To address this, we propose a reconstruction-based approach using a transformer network, similar to DETR [52] in image detection and PoinTr [53] in point cloud completion. Our method directly predicts or infers the modeling sequence, emphasizing editable and reusable building model reconstruction. The proposed geometric model reconstruction method explicitly represents the 3D shape of the point cloud through its CAD modeling sequence. The DeepBuilding network can directly produce vectors of the sketch command sequences and extrusion operations, which can then be converted and imported into BIM software to reconstruct editable 3D solids with sharp edges.

3. The Modeling Sequence of Buildings

Our goal is to generate the CAD modeling sequence of building models from input point clouds. To represent the modeling process of buildings, we introduce a command formation method, as illustrated in Figure 1. This modeling sequence is then vectorized to enable processing by the neural network. We leverage the widely used 3D modeling pattern, specifically the sketch–extrude modeling procedure, as the basis for the modeling sequence. This approach is capable of covering most modeling procedures, as demonstrated in the Fusion 360 Gallery Dataset [54]. In the following section, we will provide a detailed description of the sequence hierarchy and the definition of each command within the sequence.

3.1. Sequence Hierarchy

Figure 1 shows a simple example of constructing a civil building in BIM or CAD software (such as AutoCAD 25.0.071.0) and the formulation of the modeling sequence that records the whole modeling procedure. The building modeling procedure can be viewed as a repetitive sketch–extrude process, and only this sketch–extrude process is considered in this paper. More specifically, a sketch S_i is extruded by an extrusion operation E_i to form an extruded solid. Multiple extrusions are grouped by Boolean operations and constitute the final geometrical model of the building. Thus, the modeling sequence M of a building can be formulated as M = [SOL, S_1, E_1, S_2, E_2, ..., S_i, E_i, EOL]. The modeling sequence M can then be seen as a type of building modeling language and can be processed and reasoned over by a language model.

3.2. Command Specification

The definition of each modeling command is shown in Table 1. The command sequence mainly consists of two types of modeling commands, drafting a sketch and extrusion operations, respectively.
Sketch. The sketch hierarchy consists of curves and loops. In this paper, the curves include the three most widely used commands (draw a line, draw an arc, and draw a circle). A closed path, namely a loop, is defined by grouping one or multiple curves. A sketch profile is then formed by outer loops as boundaries and inner loops as holes. A loop always starts with the indicator command SOL, and the whole modeling sequence ends with an EOL command as the indicator. Furthermore, if a curve were defined by its two endpoints, an extra matching procedure would be needed to align its endpoint with the start point of the next curve, introducing additional error. We therefore omit the starting point of each curve, treating the endpoint of the preceding curve as its start, with the endpoint of the last curve serving as the starting point of the loop. A line is represented by its endpoint, an arc by its endpoint and midpoint, and a circle by its center and radius. This simplification helps to unify the parameters. Therefore, a sketch command S_i is defined as S_i = [t_i, x, y, x_m, y_m, r], where t_i encodes the command type information.
Extrude. The extrusion operation consists of four types of parameters, including the sketch plane orientation, sketch plane origin, extrusion distances towards both sides, and Boolean operation. The sketch plane defines where the sketch is drafted, and the extrusion distance indicates how far the sketch is extruded. Each extrusion is grouped according to the type of Boolean operation. Thus, the extrusion operation can be defined as E = [θ, ϕ, γ, o_x, o_y, o_z, e_1, e_2, b, s].
To sum up, for each command m_i, its parameters are stacked into a 1 × 15 vector param_i = [x, y, x_m, y_m, r, θ, ϕ, γ, o_x, o_y, o_z, e_1, e_2, b, s], the elements of which correspond to the collective parameters shown in Table 1. Unused parameters are padded with −1. Therefore, each command C_i = (t_i, param_i) = (t_i, x, y, x_m, y_m, r, θ, ϕ, γ, o_x, o_y, o_z, e_1, e_2, b, s) is specified by its command type t_i and command parameters param_i. In conclusion, we represent a geometrical model using a fixed total number N_c of modeling commands, namely M = [SOL, ME_1, ME_2, ..., ME_i, EOL]. In this paper, we choose N_c = 120, the maximal command-sequence length of the models in the training dataset.
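For illustration only (not the authors' released code), the vectorization described above can be sketched in a few lines of Python. The 16-column layout (one type index plus 15 parameters), the −1 padding, and N_c = 120 follow the text and Table 1, while the ordering of the command-type indices, the padding of trailing rows, and the helper names are our assumptions.

```python
import numpy as np

# Command-type indices (assumed ordering; the set of commands follows Table 1).
COMMANDS = ["SOL", "L", "A", "R", "E", "EOL"]
PARAM_NAMES = ["x", "y", "xm", "ym", "r",
               "theta", "phi", "gamma", "ox", "oy", "oz",
               "e1", "e2", "b", "s"]
N_PARAMS = len(PARAM_NAMES)      # 15 parameters per command
N_C = 120                        # maximal sequence length used in the paper

def vectorize(sequence):
    """Turn a list of (command, {parameter: value}) pairs into an (N_C, 16) matrix.
    Unused parameter slots are padded with -1; trailing rows are padded as EOL
    (the padding convention for trailing rows is an assumption)."""
    mat = np.full((N_C, 1 + N_PARAMS), -1.0, dtype=np.float32)
    mat[:, 0] = COMMANDS.index("EOL")
    for i, (cmd, params) in enumerate(sequence):
        mat[i, 0] = COMMANDS.index(cmd)
        for name, value in params.items():
            mat[i, 1 + PARAM_NAMES.index(name)] = value
    return mat

# Example: one rectangular loop of lines extruded into a box-like solid.
seq = [("SOL", {}),
       ("L", {"x": 1.0, "y": 0.0}), ("L", {"x": 1.0, "y": 1.0}),
       ("L", {"x": 0.0, "y": 1.0}), ("L", {"x": 0.0, "y": 0.0}),
       ("E", {"theta": 0.0, "phi": 0.0, "gamma": 0.0, "ox": 0.0, "oy": 0.0,
              "oz": 0.0, "e1": 0.5, "e2": 0.0, "b": 0.0, "s": 1.0})]
print(vectorize(seq).shape)      # (120, 16)
```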

4. Materials and Methods

4.1. Overview

We propose a building geometry model reconstruction method from point clouds, as shown in Figure 2. The point cloud of a building is first tokenized into point embeddings by the point tokenizer. The point embeddings are refined by the transformer encoder to extract fine-grained point features, which are then decoded into the vectorized modeling sequence M = [SOL, ME_1, ME_2, ..., ME_i, EOL] by the transformer decoder. A tool developed on the basis of the PythonOCC library [55] converts the command vectors M into 3D representations, such as meshes, and into file formats supported by BIM software, such as '.step' and '.JSON'. By importing the modeling sequence into such software, the final geometrical models can be further used in simulation and daily management.
By recovering the modeling sequence from the building point cloud, the reconstruction results can be seamlessly imported into BIM software such as REVIT or CAD. To achieve this, we propose an end-to-end network based on the transformer architecture, namely DeepBuilding, as illustrated in Figure 3. The input point cloud is first tokenized into point embeddings using a point tokenizer. These point embeddings are then processed by the transformer encoder. By leveraging the long-range dependency modeling capabilities of the attention layer, the point features are refined. Following this, a transformer decoder decodes the refined point features into sequence features. Finally, two separate prediction heads—namely the type and parameter prediction heads—are employed to produce the final modeling sequence prediction.
To extract the geometry information encoded in the point cloud of buildings or infrastructure, we first develop a point tokenizer based on the PointNet++ of Qi et al. [56] to extract point embeddings that the network can learn from and further decode into modeling sequences, as shown in Figure 4a. After repeated sampling–grouping–PointNet operations, the final point embeddings are extracted. The point embeddings can be seen as a sequence of key-point features that encode the latent geometry information contained in the input point cloud. The point embeddings are then fed into the transformer network for modeling sequence prediction.
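The sampling step of such a PointNet++-style tokenizer can be illustrated with a generic farthest point sampling (FPS) routine. The sketch below is a reference implementation in PyTorch under our own naming, not the tokenizer used in DeepBuilding.

```python
import torch

def farthest_point_sampling(xyz: torch.Tensor, n_samples: int) -> torch.Tensor:
    """Pick n_samples indices from an (N, 3) point cloud by repeatedly selecting
    the point farthest from the already-selected set (the sampling stage of a
    PointNet++ set-abstraction layer)."""
    n = xyz.shape[0]
    selected = torch.zeros(n_samples, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    farthest = int(torch.randint(0, n, (1,)))
    for i in range(n_samples):
        selected[i] = farthest
        d = ((xyz - xyz[farthest]) ** 2).sum(dim=1)  # squared distance to the newest center
        dist = torch.minimum(dist, d)                # distance to the closest selected point
        farthest = int(torch.argmax(dist))           # next center: farthest remaining point
    return selected

# Usage: 128 key points from a 2048-point building scan; the grouping and
# mini-PointNet stages would then turn each neighborhood into one point embedding.
points = torch.rand(2048, 3)
centers = points[farthest_point_sampling(points, 128)]
```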

4.2. Transformer Encoder for Implicit Space Conversion

The procedure of decoding point embeddings into a modeling sequence can be viewed as a set-to-set conversion problem, which involves transforming latent embeddings from the latent space of point features to the latent space of the modeling sequence. A bridge is required to achieve this transformation. We leverage the long-range modeling capabilities of the transformer [57] and use the transformer encoder as this bridge connecting the two latent spaces. The architecture of the transformer encoder is shown in Figure 4b; it consists of stacked self-attention layers. The embeddings are generated by the previously mentioned point tokenizer. This process can be seen as establishing deep connections between the point features $f_{p_i} \in \mathbb{R}^{1 \times C}$ through the repeated application of the self-attention layers. By leveraging the efficient long-range dependency modeling of the attention mechanism, the latent space of the point features is effectively connected to the latent space of the modeling sequence.
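With the layer counts and widths reported in Section 5.1 (six encoder layers, eight heads, model dimension 256), this encoder can be approximated with PyTorch's built-in modules as follows. The feed-forward width and other details are assumptions rather than the exact DeepBuilding configuration.

```python
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 256, 8, 6       # values reported in Section 5.1

encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
    batch_first=True)                        # (batch, tokens, channels) layout
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

# 128 point embeddings per sample, as produced by the point tokenizer.
point_embeddings = torch.rand(2, 128, d_model)
refined = encoder(point_embeddings)          # same shape, with long-range context mixed in
print(refined.shape)                         # torch.Size([2, 128, 256])
```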

4.3. Transformer Decoder

The transformer decoder decodes the point features refined by the transformer encoder into a modeling sequence. As shown in Figure 4c, the transformer decoder has an architecture symmetric to that of the transformer encoder, except for the additional self-attention layers for the learnable queries. N layers of self-attention and cross-attention constitute the transformer decoder. The point features $f_p \in \mathbb{R}^{N \times C}$ refined by the transformer encoder are fed into the cross-attention layer, where learnable queries extract the corresponding information from the point features. We feed positional queries (generated from the key points sampled by FPS) and content queries (point features) into the decoder to probe the commands that correspond to the position encodings and have patterns similar to the content queries. The output of the last transformer block is fed into a linear layer to predict the modeling command sequence M = [SOL, S_1, E_1, S_2, E_2, ..., S_i, E_i, EOL], including both the command type $\hat{t}_i$ and the command parameters $\hat{p}_i$. Two separate Multi-Layer Perceptrons (MLPs), namely the type head and the parameter head, are used to output the command type and command parameters. The prediction of our model can be factorized as
$$ p(\hat{M} \mid \hat{f}_p, \Theta) = \prod_{i=1}^{N_c} p(\hat{t}_i, \hat{p}_i \mid \hat{f}_p, \Theta), $$
where Θ denotes the network parameters of the transformer decoder, $\hat{f}_p$ denotes the point features refined by the transformer encoder, i indexes the i-th modeling command, and N_c denotes the number of predicted modeling commands.
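A simplified stand-in for the decoder and its two prediction heads is sketched below using PyTorch's built-in transformer decoder. It uses a single set of learnable queries (one per output command slot) instead of the separate positional and content queries described above, and the head widths are assumptions.

```python
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 256, 8, 8        # eight decoder layers per Section 5.1
n_commands, n_types, n_params = 120, 6, 15    # sequence length, command types, parameters

decoder_layer = nn.TransformerDecoderLayer(
    d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=n_layers)

# One learnable query per output command slot.
queries = nn.Parameter(torch.randn(n_commands, d_model))

# Separate MLP heads for the command type and the command parameters.
type_head = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                          nn.Linear(d_model, n_types))
param_head = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                           nn.Linear(d_model, n_params))

refined_points = torch.rand(2, 128, d_model)             # output of the encoder
tgt = queries.unsqueeze(0).expand(2, -1, -1)             # (batch, N_c, C)
seq_features = decoder(tgt=tgt, memory=refined_points)   # cross-attends to point features
type_logits = type_head(seq_features)                    # (2, 120, 6)
param_pred = param_head(seq_features)                    # (2, 120, 15)
```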

4.4. Loss Function

The command parameter prediction is supervised using the L-1 loss, while the command type prediction is supervised using the focal loss [58]. The loss L for the training process can be formulated as
$$ L = \sum_{i=1}^{N_c} l_{focal}(\hat{t}_i, t_i) + \beta \sum_{i=1}^{N_c} \sum_{j=1}^{N_p} l_1(\hat{p}_{i,j}, p_{i,j}), $$
where $l_{focal}(\cdot,\cdot)$ and $l_1(\cdot,\cdot)$ denote the focal loss and the L-1 loss, respectively, $N_p$ is the number of parameters, $N_c$ is the number of commands, and β is the weight balancing the two terms (β = 2 in our experiments).
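A compact PyTorch version of this loss is given below for illustration. The focal-loss focusing parameter (gamma = 2) and the decision to exclude padded (−1) parameter slots from the L1 term are our assumptions, while beta = 2 follows the text.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0):
    """Multi-class focal loss averaged over all commands (gamma is an assumption)."""
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    return (-(1.0 - log_pt.exp()) ** gamma * log_pt).mean()

def sequence_loss(type_logits, param_pred, type_gt, param_gt, beta=2.0):
    """Focal loss on command types plus beta-weighted L1 loss on parameters;
    ground-truth slots padded with -1 are excluded from the L1 term."""
    l_type = focal_loss(type_logits, type_gt)
    mask = (param_gt != -1).float()
    l_param = (torch.abs(param_pred - param_gt) * mask).sum() / mask.sum().clamp(min=1)
    return l_type + beta * l_param

# Shapes: (batch, N_c, n_types), (batch, N_c, 15), (batch, N_c), (batch, N_c, 15).
loss = sequence_loss(torch.randn(2, 120, 6), torch.rand(2, 120, 15),
                     torch.randint(0, 6, (2, 120)), torch.rand(2, 120, 15))
```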

4.5. Importing into BIM Software

As for our method, the input of the model is the point cloud, and the output is a matrix M = [SOL, S_1, E_1, SOL, S_2, E_2, ..., SOL, S_i, E_i, EOL] that defines the modeling sequence. The matrix can easily be converted into '.step' or '.JSON' files using tools developed on the basis of the PythonOCC library [55]. Each row M[i] of the output matrix represents one modeling command: the first element M[i][0] indicates the command type, and the remaining elements M[i][1:] are the parameters. After parsing the output matrix into a sequence of modeling commands [S_i, E_i], a tool developed with the PythonOCC library constructs the CAD shape from the modeling sequence, and the function 'write_step_file()' of the OCC data API can then be used to export the reconstructed CAD shape to a '.step' file. The pseudocode for converting the command sequence into other file formats is presented in Appendix A. Therefore, the final output of our proposed reconstruction method can be imported into CAD software. We also provide a simple example of the exported JSON file format in Appendix B.
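The row layout described above (type index in column 0, the 15 parameters after it, −1 for unused slots) can be parsed with a few lines of Python. The command-index ordering and the helper names in this sketch are illustrative assumptions, not the authors' tool.

```python
import numpy as np

TYPE_NAMES = ["SOL", "L", "A", "R", "E", "EOL"]   # assumed index order

def parse_sequence(M: np.ndarray):
    """Split an (N_c, 16) output matrix into a readable list of commands,
    stopping at the first EOL row and dropping -1 padded parameter slots."""
    commands = []
    for row in M:
        cmd = TYPE_NAMES[int(round(row[0]))]
        if cmd == "EOL":
            break
        commands.append((cmd, [float(p) for p in row[1:] if p != -1]))
    return commands

M = np.full((3, 16), -1.0)
M[0, 0], M[1, 0], M[2, 0] = 0, 1, 5               # SOL, one line command, then EOL
M[1, 1:3] = [1.0, 0.0]                            # line endpoint (x, y)
print(parse_sequence(M))                          # [('SOL', []), ('L', [1.0, 0.0])]
```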

5. Results

We evaluate the proposed building reconstruction method on the widely used public ABC dataset [59] of modeling sequences and on a collected building dataset.

5.1. Experiment Setup

The proposed transformer-based network, DeepBuilding, requires a large amount of data to be trained from scratch. Given the limited availability of building modeling-sequence datasets, we utilized the ABC dataset [59] to pre-train the network. The ABC dataset contains one million modeling-sequence samples collected from CAD software, which provides ample data for pre-training; the data include the modeling operations commonly used by humans to create and design 3D models. In total, we collected more than ten thousand training samples from it. Additionally, we created a smaller building dataset containing 1000 samples based on the OnShape API [60] to fine-tune the model. We report the performance of our method on these two datasets to evaluate the effectiveness of the proposed reconstruction method.
We implemented our network using the PyTorch library [61], which is widely used in this community. Our network was built with Python 3.10 and PyTorch 1.13.1. The transformer encoder and decoder contain six and eight transformer layers, respectively, with eight attention heads and a latent model dimension of 256. We trained our model on a Linux server with two Nvidia V100 GPUs with 32 GB of memory each. The model weights pre-trained on the ABC dataset were used to initialize the model parameters during fine-tuning. The proposed network was trained for 400 epochs with a total batch size of 64 and a learning rate of 1 × 10⁻⁵ with a linear warmup. AdamW [62] with a weight decay of 1 × 10⁻⁵ was used to optimize our model. The learning-rate schedule starts with a 10-epoch linear warmup from 1 × 10⁻⁷ and then follows an ExponentialLR strategy with an initial learning rate of 1 × 10⁻⁵ and a per-epoch decay factor of 0.9^(1/50); the learning rate is adjusted at each epoch.
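The optimizer and learning-rate schedule above can be reproduced with standard PyTorch components, as sketched below. Combining the warmup and decay with SequentialLR is our choice, and the per-epoch decay factor is read here as 0.9^(1/50); neither detail is confirmed by the original implementation.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR, ExponentialLR, SequentialLR

model = torch.nn.Linear(256, 256)             # stand-in for the DeepBuilding network

base_lr, warmup_lr, warmup_epochs = 1e-5, 1e-7, 10
optimizer = AdamW(model.parameters(), lr=base_lr, weight_decay=1e-5)

# 10-epoch linear warmup from 1e-7 to 1e-5, then per-epoch exponential decay.
warmup = LambdaLR(optimizer,
                  lambda e: (warmup_lr + (base_lr - warmup_lr) * e / warmup_epochs) / base_lr)
decay = ExponentialLR(optimizer, gamma=0.9 ** (1 / 50))
scheduler = SequentialLR(optimizer, [warmup, decay], milestones=[warmup_epochs])

for epoch in range(400):
    # ... one training epoch over the batches, calling optimizer.step() per batch ...
    scheduler.step()                           # the schedule is advanced once per epoch
```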

5.2. Evaluation Metrics

Following Vitruvion [63], we used the command type and parameter accuracies as the metrics for evaluating the performance of the proposed reconstruction method. For each modeling command, we calculated the command type accuracy using
$$ ACC_{type} = \frac{1}{N_c} \sum_{i=1}^{N_c} \Psi[C_i = \hat{C}_i], $$
where $N_c$ denotes the total number of CAD commands; $C_i$ and $\hat{C}_i$ are the ground-truth and predicted command types, respectively; and $\Psi[\cdot] \in \{0, 1\}$ is the indicator function. The command parameter accuracy is calculated by
$$ ACC_{param} = \frac{1}{K} \sum_{i=1}^{N_c} \sum_{j=1}^{|\hat{p}_i|} \Phi\big[\,|p_{i,j} - \hat{p}_{i,j}| < \epsilon\,\big]\, \Psi[C_i = \hat{C}_i], $$
where $K = \sum_{i=1}^{N_c} \Psi[C_i = \hat{C}_i]\,|p_i|$ is the total number of parameters in all correctly recovered commands, $p_{i,j}$ and $\hat{p}_{i,j}$ are the ground-truth and predicted command parameters, and ϵ is the tolerance threshold for the parameter accuracy. In practice, we used ϵ = 0.01.
After reconstructing the modeling command sequences of the input point clouds, the 3D geometry can be rebuilt using our tools developed based on PythonOCC [55]. To measure the quality of the recovered 3D geometry, following [64] we report the chamfer distance (CD), edge chamfer distance (ECD), normal consistency (NC), and number of generated primitives (#ΔP); the CD is evaluated by uniformly sampling 2000 points on the surfaces of the reconstructed shapes. Following [38], we also evaluate the building reconstruction using the Hausdorff distance.
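For reference, straightforward NumPy versions of the accuracy metrics and the symmetric chamfer distance are given below. How padded (−1) parameter slots are counted and the brute-force chamfer formulation are our assumptions, not the exact evaluation code.

```python
import numpy as np

def type_accuracy(types_gt, types_pred):
    """Fraction of commands whose predicted type matches the ground truth."""
    return float(np.mean(types_gt == types_pred))

def param_accuracy(params_gt, params_pred, types_gt, types_pred, eps=0.01):
    """Fraction of parameters, within correctly typed commands, whose absolute
    error is below eps; padded (-1) slots are excluded (an assumption)."""
    correct_cmd = (types_gt == types_pred)[:, None]
    valid = (params_gt != -1) & correct_cmd
    hits = (np.abs(params_gt - params_pred) < eps) & valid
    return float(hits.sum() / max(valid.sum(), 1))

def chamfer_distance(a, b):
    """Symmetric chamfer distance between two (N, 3) surface samples."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

pts_a, pts_b = np.random.rand(500, 3), np.random.rand(500, 3)
print(chamfer_distance(pts_a, pts_b))
```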

5.3. Results on Public Modeling Sequence Dataset

The proposed network was first pre-trained on the public ABC dataset [59], so we first report the quantitative and visualized results on this dataset to evaluate the effectiveness of our proposed reconstruction method.
Figure 5 shows the validation loss curves during pre-training. It can be seen from Figure 5a that the type errors decrease stably during training and reach a minimum value of 4.1%, which indicates that the network has fully converged and that the type prediction accuracy is 95.9%. Furthermore, Figure 5b shows the parameter loss during validation: the L1 parameter prediction errors for all command types (line, arc, and circle) decrease stably during training. The final parameter errors are below 0.039 for lines, 0.0009 for circles, and 0.0127 for arcs. From both the type and parameter validation loss curves, we can see that the proposed network performs well in predicting modeling command types and parameters.
To demonstrate the effectiveness of our method, we compare the proposed model with several existing approaches. These include generation-based methods such as DeepCAD [48] and HNC-CAD [51], as well as methods that represent sketches using SDF encoding, including SECAD-Net [49], Point2Cyl [65], and ExtrudeNet [66].
Table 2 presents the quantitative comparison results. In terms of command type and parameter accuracy, our method achieves the best performance in modeling sequence prediction, with type and parameter accuracies of 93.27% and 83.97%, respectively. Additionally, our method achieves a lower chamfer distance (CD) of 0.410 and edge chamfer distance (ECD) of 0.607, indicating reconstruction results with higher geometric fidelity. The number of primitives #ΔP reflects how close the reconstruction is to human design; our method achieves the lowest value of 4.97, suggesting that the generated modeling sequences are closest to human design. Given that our goal is to recover the modeling operations used to build the 3D model, these results validate the effectiveness of our proposed method.
For a more comprehensive comparison, we visualized the reconstruction results from all the methods, as shown in Figure 6. From Figure 6, it is evident that the geometric models produced by our method are the most complete and accurate. The edges are sharper than those produced by the other methods, and the reconstructed models fit the input point cloud perfectly. These model samples represent typical operations used in real-world design or modeling practices in industry. Therefore, these results demonstrate that our proposed method can effectively recover the modeling operations used to design real-world objects.

5.4. Results on Building Dataset

After being trained on the public ABC dataset, the proposed network was fine-tuned on a building dataset to enable it to reconstruct building models from input point clouds. Several widely used building reconstruction methods were selected for comparison, including PolyFit [32], City3D [33], Point2Poly [37], and PolyGNN [38]. Point2Poly [37] and PolyGNN [38] are learning-based methods, while PolyFit [32] and City3D [33] are non-learning methods.
Table 3 presents the quantitative results on the building dataset. The results show that our method achieves results comparable to those of PolyGNN [38]. The traditional optimization-based approach, City3D, is designed for airborne point clouds and relies on stronger priors, so its results contain less detail and its error metrics are relatively higher than those of the other methods.
Figure 7 showcases some reconstruction examples of the comparison experiment. All the methods exhibit compact reconstructions. Compared to the traditional optimization-based approach, City3D [33], our method demonstrates the capability to handle more complex buildings commonly found in urban scenes. City3D [33] adopts exhaustive partitioning, resulting in inferior reconstruction accuracy as measured by the Hausdorff distance. In contrast, our approach directly recovers the modeling sequence, resulting in a more detailed reconstruction.
Figure 8 shows the reconstruction results on real-world point clouds. For the small city scenes, we first normalized each building point set and stored the transformation (shift and scale) used for the normalization. The point set of each building was then converted into a modeling sequence using our proposed method, and by applying the stored normalization transformations to these reconstructions, the small cities were rebuilt. From the results, we can see that the geometrical models produced by all the methods fit the point clouds. There are some failures in the results of PolyFit [32] and City3D [33], while the other methods achieve better completeness. The results demonstrate that our method can produce satisfying reconstruction results.
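The per-building normalization used for these city-scale results can be paraphrased as below. The unit-cube scaling and the helper names are our assumptions, since the exact transform is not specified in the text.

```python
import numpy as np

def normalize(points: np.ndarray):
    """Shift a building point set to the origin and scale it into a unit cube,
    returning the (shift, scale) transform so the model can be placed back."""
    center = points.mean(axis=0)
    scale = float(np.abs(points - center).max()) or 1.0
    return (points - center) / scale, (center, scale)

def denormalize(vertices: np.ndarray, transform):
    """Apply the stored (shift, scale) back to the reconstructed model's vertices."""
    center, scale = transform
    return vertices * scale + center

# Round-trip check; in the full pipeline the normalized points would be fed to
# the network and the reconstructed vertices denormalized into the city model.
pts = np.random.rand(100, 3) * 50.0 + 200.0
norm_pts, tf = normalize(pts)
assert np.allclose(denormalize(norm_pts, tf), pts)
```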
The experimental results demonstrate the effectiveness of the proposed method: by recovering the modeling sequence from the point cloud, accurate geometrical models can be reconstructed, and the quantitative performance is competitive with state-of-the-art reconstruction methods. Figure 9 shows close-up views of the visualized reconstruction results. More details of the building reconstruction are preserved by our method because the recovered modeling sequence describes the whole design process of the buildings. Moreover, since the modeling sequence can easily be converted into common file formats (such as '.JSON' and '.step'), the reconstruction results can be imported into BIM software such as AutoCAD or Revit (the conversion process is detailed in Appendices A and B).

6. Discussion

6.1. Robustness Analysis

We randomly dropped the input points to analyze the robustness of our method against variations in point cloud density.
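The uniform point dropping used in this analysis can be simulated as below; this is a minimal sketch, and the exact sampling procedure used in the experiments is not specified.

```python
import numpy as np

def drop_points(points: np.ndarray, rate: float, seed: int = 0) -> np.ndarray:
    """Uniformly discard `rate` of the points to simulate a sparser scan."""
    rng = np.random.default_rng(seed)
    keep = rng.random(len(points)) >= rate
    return points[keep]

sparse = drop_points(np.random.rand(2048, 3), rate=0.25)   # ~75% of the points remain
```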
Table 4 presents the quantitative results of the robustness analysis. When 25% of the points are uniformly dropped, the overall shape can still be maintained. The method fails to produce reconstruction results when the dropping rate exceeds 25%, so results beyond a 25% dropping rate are omitted.
Figure 10 shows examples of building reconstruction under different dropping rates. Most of the main structure can be recovered when the dropping rate is less than 10%, while details of the building reconstruction are lost once the dropping rate exceeds 15%.

6.2. Limitations

We acknowledge that our proposed method has some limitations. The modeling sequence utilized in our approach currently supports only the sketch–extrude modeling procedure. While this sketch–extrude process is capable of describing most models, especially those commonly used in BIM, it may not cover all possible 3D shapes. Additionally, the sketch command in our method currently supports only basic curves (lines, arcs, and circles), which may restrict the expressive capacity of the building reconstruction. Models that cannot be constructed using sketch-and-extrude commands, such as roofs with conical shapes, cannot be reconstructed by the proposed method. Furthermore, our approach requires a relatively complete point cloud as input for accurate reconstruction; incorporating point cloud completion techniques could help address this limitation.

7. Conclusions

We propose a method for reconstructing a building model by recovering the modeling sequence from the input point cloud. By representing the building model through its modeling sequence, the reconstruction results can be directly imported into BIM software for further applications. We introduce a transformer-based network that converts the point cloud into a vectorized modeling sequence. Additionally, a tool based on PythonOCC has been developed to transform the modeling sequence into a 3D shape representation (e.g., mesh) and file formats supported by BIM software. The experimental results demonstrate that the proposed building reconstruction method achieves competitive performance compared to state-of-the-art methods while preserving more geometric details.

Author Contributions

Conceptualization, C.W. and F.D.; methodology, C.W.; software, C.W. and H.L.; validation, C.W. and H.L.; formal analysis, C.W. and F.D.; investigation, C.W.; resources, F.D.; data curation, H.L.; writing—original draft preparation, C.W. and H.L.; writing—review and editing, C.W. and F.D.; visualization, C.W.; supervision, F.D.; project administration, F.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Key R&D Program of Zhejiang (Grant No. 2024C01G1752215).

Data Availability Statement

Data are available on request due to privacy restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PC | Point cloud
CD | Chamfer distance
ECD | Edge chamfer distance
NC | Normal consistency
Param | Parameter

Appendix A

We provide the code part of the tools we used to convert the tensors predicted by our method into the ‘.step’ file as below. Although the code is not complete, we add essential code comments as well as relevant variable and function names to make it readable. This pseudocode is only used to exemplify the general idea of the conversion operation.
[Pseudocode listing for converting the predicted tensors into a '.step' file.]
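For readers without access to the listing above, the same sketch–extrude–export path can be illustrated with pythonocc-core's standard API as follows. This is only an assumed minimal example (a rectangular loop of line commands extruded into a solid and written to '.step'), not the authors' conversion tool.

```python
from OCC.Core.gp import gp_Pnt, gp_Vec
from OCC.Core.BRepBuilderAPI import (BRepBuilderAPI_MakeEdge,
                                     BRepBuilderAPI_MakeWire,
                                     BRepBuilderAPI_MakeFace)
from OCC.Core.BRepPrimAPI import BRepPrimAPI_MakePrism
from OCC.Extend.DataExchange import write_step_file

def extrude_rectangle_loop(width, depth, height):
    """Build one 'sketch a closed loop of lines, then extrude' command pair."""
    corners = [gp_Pnt(0, 0, 0), gp_Pnt(width, 0, 0),
               gp_Pnt(width, depth, 0), gp_Pnt(0, depth, 0)]
    wire_builder = BRepBuilderAPI_MakeWire()
    for a, b in zip(corners, corners[1:] + corners[:1]):
        wire_builder.Add(BRepBuilderAPI_MakeEdge(a, b).Edge())   # loop of line commands
    face = BRepBuilderAPI_MakeFace(wire_builder.Wire()).Face()   # closed sketch profile
    return BRepPrimAPI_MakePrism(face, gp_Vec(0, 0, height)).Shape()  # the extrusion

solid = extrude_rectangle_loop(10.0, 6.0, 3.0)
write_step_file(solid, "building.step")       # the '.step' file opens in CAD/BIM software
```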

Appendix B

We provide an example of the generated ‘JSON’ file format using the proposed method below:
[Example of the exported JSON file format.]

References

  1. Carrasco, C.A.; Lombillo, I.; Sánchez-Espeso, J.M.; Blanco, H.; Boffill, Y. Methodology for 3D Management of University Faculties Using Integrated GIS and BIM Models: A Case Study. Buildings 2024, 14, 3547. [Google Scholar] [CrossRef]
  2. Mol, A.; Cabaleiro, M.; Sousa, H.S.; Branco, J.M. HBIM for storing life-cycle data regarding decay and damage in existing timber structures. Autom. Constr. 2020, 117, 103262. [Google Scholar] [CrossRef]
  3. Wang, Q.; Tan, Y.; Mei, Z. Computational Methods of Acquisition and Processing of 3D Point Cloud Data for Construction Applications. Arch. Comput. Methods Eng. 2020, 27, 479–499. [Google Scholar] [CrossRef]
  4. Andrianesi, D.E.; Dimopoulou, E. An Integrated BIM-GIS Platform for Representing and Visualizing 3D Cadastral Data. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, VI-4/W1-2020, 3–11. [Google Scholar] [CrossRef]
  5. Wang, J.; Xu, Y.; Oussama, R.; Xie, X.; Ye, N.; Yi, C.; Wei, M. Automatic Modeling of Urban Facades from Raw LiDAR Point Data. Comput. Graph. Forum 2016, 35, 269–278. [Google Scholar] [CrossRef]
  6. Bruno, S.; De Fino, M.; Fatiguso, F. Historic Building Information Modelling: Performance assessment for diagnosis-aided information modelling and management. Autom. Constr. 2018, 86, 256–276. [Google Scholar] [CrossRef]
  7. Zotkin, S.P.; Ignatova, E.V.; Zotkina, I.A. The organization of autodesk revit software interaction with applications for structural analysis. Procedia Eng. 2016, 153, 915–919. [Google Scholar] [CrossRef]
  8. Xia, S.; Chen, D.; Wang, R.; Li, J.; Zhang, X. Geometric Primitives in LiDAR Point Clouds: A Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 685–707. [Google Scholar] [CrossRef]
  9. Nath, D.; Ankit; Neog, D.R.; Gautam, S.S. Application of machine learning and deep learning in finite element analysis: A comprehensive review. In Archives of Computational Methods in Engineering; Springer: Berlin/Heidelberg, Germany, 2024; pp. 1–40. [Google Scholar]
  10. Pavón, R.M.; Alberti, M.G.; Álvarez, A.A.A.; Cepa, J.J. Bim-based Digital Twin development for university Campus management. Case study ETSICCP. Expert Syst. Appl. 2025, 262, 125696. [Google Scholar] [CrossRef]
  11. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 6–7 June 2019. [Google Scholar] [CrossRef]
  12. An, J.; Ding, W.; Lin, C. ChatGPT: Tackle the Growing Carbon Footprint of Generative AI. Nature 2023, 615, 586. [Google Scholar]
  13. Liu, H.; Li, C.; Li, Y.; Lee, Y.J. Improved Baselines with Visual Instruction Tuning. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 26286–26296. [Google Scholar] [CrossRef]
  14. Moyano, J.; León, J.; Nieto-Julián, J.E.; Bruno, S. Semantic interpretation of architectural and archaeological geometries: Point cloud segmentation for HBIM parameterisation. Autom. Constr. 2021, 130, 103856. [Google Scholar] [CrossRef]
  15. Müller, P.; Wonka, P.; Haegler, S.; Ulmer, A.; Van Gool, L. Procedural Modeling of Buildings. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2, 1st ed.; Association for Computing Machinery: New York, NY, USA, 2023. [Google Scholar]
  16. Furukawa, Y.; Curless, B.; Seitz, S.M.; Szeliski, R. Reconstructing building interiors from images. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 80–87. [Google Scholar] [CrossRef]
  17. Friedman, S.; Stamos, I. Online facade reconstruction from dominant frequencies in structured point clouds. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 1–8. [Google Scholar] [CrossRef]
  18. Maas, H.G.; Vosselman, G. Two algorithms for extracting building models from raw laser altimetry data. ISPRS J. Photogramm. Remote Sens. 1999, 54, 153–163. [Google Scholar] [CrossRef]
  19. Vo, A.V.; Truong-Hong, L.; Laefer, D.F.; Bertolotto, M. Octree-based region growing for point cloud segmentation. ISPRS J. Photogramm. Remote Sens. 2015, 104, 88–100. [Google Scholar] [CrossRef]
  20. Wang, M.; Tseng, Y.H. Automatic segmentation of LiDAR data into coplanar point clusters using an octree-based split-and-merge algorithm. Photogramm. Eng. Remote Sens. 2010, 76, 407–420. [Google Scholar] [CrossRef]
  21. Wang, Y.; Hao, W.; Ning, X.; Zhao, M.; Zhang, J.; Shi, Z.; Zhang, X. Automatic Segmentation of Urban Point Clouds Based on the Gaussian Map. Photogramm. Rec. 2013, 28, 342–361. [Google Scholar] [CrossRef]
  22. Ferraz, A.; Bretar, F.; Jacquemoud, S.; Gonçalves, G.; Pereira, L. 3D segmentation of forest structure using a mean-shift based algorithm. In Proceedings of the 2010 IEEE International Conference on Image Processing, Hong Kong, 26–29 September 2010; pp. 1413–1416. [Google Scholar] [CrossRef]
  23. Sampath, A.; Shan, J. Segmentation and Reconstruction of Polyhedral Building Roofs From Aerial Lidar Point Clouds. IEEE Trans. Geosci. Remote Sens. 2010, 48, 1554–1567. [Google Scholar] [CrossRef]
  24. Biosca, J.M.; Lerma, J.L. Unsupervised robust planar segmentation of terrestrial laser scanner point clouds based on fuzzy clustering methods. ISPRS J. Photogramm. Remote Sens. 2008, 63, 84–98. [Google Scholar] [CrossRef]
  25. Gu, Y.; Cao, Z.; Dong, L. A hierarchical energy minimization method for building roof segmentation from airborne LiDAR data. Multimed. Tools Appl. 2017, 76, 4197–4210. [Google Scholar] [CrossRef]
  26. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
  27. Schnabel, R.; Wahl, R.; Klein, R. Efficient RANSAC for Point-Cloud Shape Detection. Comput. Graph. Forum 2007, 26, 214–226. [Google Scholar] [CrossRef]
  28. Xu, B.; Jiang, W.; Shan, J.; Zhang, J.; Li, L. Investigation on the weighted ransac approaches for building roof plane segmentation from lidar point clouds. Remote Sens. 2015, 8, 5. [Google Scholar] [CrossRef]
  29. Dong, Z.; Yang, B.; Hu, P.; Scherer, S. An efficient global energy optimization approach for robust 3D plane segmentation of point clouds. ISPRS J. Photogramm. Remote Sens. 2018, 137, 112–133. [Google Scholar] [CrossRef]
  30. Zheng, Y.; Weng, Q. Model-Driven Reconstruction of 3-D Buildings Using LiDAR Data. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1541–1545. [Google Scholar] [CrossRef]
  31. Xiong, B.; Jancosek, M.; Oude Elberink, S.; Vosselman, G. Flexible building primitives for 3D building modeling. ISPRS J. Photogramm. Remote Sens. 2015, 101, 275–290. [Google Scholar] [CrossRef]
  32. Nan, L.; Wonka, P. Polyfit: Polygonal surface reconstruction from point clouds. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2353–2361. [Google Scholar]
  33. Huang, J.; Stoter, J.; Peters, R.; Nan, L. City3D: Large-Scale Building Reconstruction from Airborne LiDAR Point Clouds. Remote Sens. 2022, 14, 2254. [Google Scholar] [CrossRef]
  34. Huang, H.; Brenner, C.; Sester, M. A generative statistical approach to automatic 3D building roof reconstruction from laser scanning data. ISPRS J. Photogramm. Remote Sens. 2013, 79, 29–43. [Google Scholar] [CrossRef]
  35. Li, Z.; Zhang, W.; Shan, J. Holistic Parametric Reconstruction of Building Models from Point Clouds. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2020, XLIII-B2-2020, 689–695. [Google Scholar] [CrossRef]
  36. Zhang, W.; Li, Z.; Shan, J. Optimal Model Fitting for Building Reconstruction From Point Clouds. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 9636–9650. [Google Scholar] [CrossRef]
  37. Chen, Z.; Ledoux, H.; Khademi, S.; Nan, L. Reconstructing compact building models from point clouds using deep implicit fields. ISPRS J. Photogramm. Remote Sens. 2022, 194, 58–73. [Google Scholar] [CrossRef]
  38. Chen, Z.; Shi, Y.; Nan, L.; Xiong, Z.; Zhu, X.X. PolyGNN: Polyhedron-based graph neural network for 3D building reconstruction from point clouds. ISPRS J. Photogramm. Remote Sens. 2024, 218, 693–706. [Google Scholar] [CrossRef]
  39. Xu, Y.; Stilla, U. Toward Building and Civil Infrastructure Reconstruction From Point Clouds: A Review on Data and Key Techniques. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2857–2885. [Google Scholar] [CrossRef]
  40. Zhang, W.; Chen, J.; Tan, G. Complex Roof Structure Reconstruction by 3D Primitive Fitting from Point Clouds. J. Geo-Inf. Sci. 2023, 25, 1531–1545. [Google Scholar] [CrossRef]
  41. Mallis, D.; Ali, S.A.; Dupont, E.; Cherenkova, K.; Karadeniz, A.S.; Khan, M.S.; Kacem, A.; Gusev, G.; Aouada, D. SHARP Challenge 2023: Solving CAD History and pArameters Recovery from Point clouds and 3D scans. Overview, Datasets, Metrics, and Baselines. arXiv 2023, arXiv:2308.15966. [Google Scholar]
  42. Li, Y.; Wu, X.; Chrysathou, Y.; Sharf, A.; Cohen-Or, D.; Mitra, N.J. GlobFit: Consistently fitting primitives by discovering global relations. In Proceedings of the SIGGRAPH ’11: Special Interest Group on Computer Graphics and Interactive Techniques Conference, Vancouver, BC, Canada, 7–11 August 2011. ACM SIGGRAPH 2011 papers. [Google Scholar] [CrossRef]
  43. Tran, T.T.; Cao, V.T.; Laurendeau, D. Extraction of Reliable Primitives from Unorganized Point Clouds. 3D Res. 2015, 6, 44. [Google Scholar] [CrossRef]
  44. Romanengo, C.; Raffo, A.; Biasotti, S.; Falcidieno, B. Recognising geometric primitives in 3D point clouds of mechanical CAD objects. Comput.-Aided Des. 2023, 157, 103479. [Google Scholar] [CrossRef]
  45. Li, L.; Sung, M.; Dubrovina, A.; Yi, L.; Guibas, L.J. Supervised Fitting of Geometric Primitives to 3D Point Clouds. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar] [CrossRef]
  46. Sharma, G.; Liu, D.; Maji, S.; Kalogerakis, E.; Chaudhuri, S.; Měch, R. ParSeNet: A Parametric Surface Fitting Network for 3D Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2652–2660. [Google Scholar]
  47. Le, E.T.; Sung, M.; Ceylan, D.; Mech, R.; Boubekeur, T.; Mitra, N. CPFN: Cascaded Primitive Fitting Networks for High-Resolution Point Clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 7457–7466. [Google Scholar]
  48. Wu, R.; Xiao, C.; Zheng, C. DeepCAD: A Deep Generative Network for Computer-Aided Design Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 6772–6782. [Google Scholar]
  49. Li, P.; Guo, J.; Zhang, X.; Yan, D.m. SECAD-Net: Self-Supervised CAD Reconstruction by Learning Sketch-Extrude Operations. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023. [Google Scholar]
  50. Ma, W.; Chen, S.; Lou, Y.; Li, X.; Zhou, X. Draw Step by Step: Reconstructing CAD Construction Sequences from Point Clouds via Multimodal Diffusion. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 27144–27153. [Google Scholar] [CrossRef]
  51. Xu, X.; Jayaraman, P.K.; Lambourne, J.G.; Willis, K.D.D.; Furukawa, Y. Hierarchical Neural Coding for Controllable CAD Model Generation. arXiv 2023, arXiv:2307.00149. [Google Scholar]
  52. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
  53. Yu, X.; Rao, Y.; Wang, Z.; Liu, Z.; Lu, J.; Zhou, J. Pointr: Diverse point cloud completion with geometry-aware transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 12498–12507. [Google Scholar]
  54. Willis, K.D.D.; Pu, Y.; Luo, J.; Chu, H.; Du, T.; Lambourne, J.G.; Solar-Lezama, A.; Matusik, W. Fusion 360 Gallery: A Dataset and Environment for Programmatic CAD Construction from Human Design Sequences. arXiv 2021, arXiv:2010.02392. [Google Scholar] [CrossRef]
  55. Paviot, T.; Feringa, J. pythonOCC–3D CAD for Python. October 2017. Available online: http://www.pythonocc.org (accessed on 1 January 2008).
  56. Qi, C.; Yi, L.; Su, H.; Guibas, L. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. arXiv 2017, arXiv:1706.02413. [Google Scholar]
  57. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762. [Google Scholar]
  58. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar] [CrossRef]
  59. Koch, S.; Matveev, A.; Jiang, Z.; Williams, F.; Artemov, A.; Burnaev, E.; Alexa, M.; Zorin, D.; Panozzo, D. ABC: A Big CAD Model Dataset For Geometric Deep Learning. arXiv 2019, arXiv:1812.06216. [Google Scholar]
  60. Guerrero, J.; Mantelli, L.; Naqvi, S.B. Cloud-based CAD parametrization for design space exploration and design optimization in numerical simulations. Fluids 2020, 5, 36. [Google Scholar] [CrossRef]
  61. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2019; Volume 32. [Google Scholar]
  62. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar] [CrossRef]
  63. Seff, A.; Zhou, W.; Richardson, N.; Adams, R. Vitruvion: A Generative Model of Parametric CAD Sketches. arXiv 2021, arXiv:2109.14124. [Google Scholar]
  64. Chen, Z.; Zhang, H. Learning Implicit Fields for Generative Shape Modeling. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar] [CrossRef]
  65. Uy, M.A.; Chang, Y.Y.; Sung, M.; Goel, P.; Lambourne, J.; Birdal, T.; Guibas, L. Point2Cyl: Reverse Engineering 3D Objects from Point Clouds to Extrusion Cylinders. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11840–11850. [Google Scholar] [CrossRef]
  66. Ren, D.; Zheng, J.; Cai, J.; Li, J.; Zhang, J. ExtrudeNet: Unsupervised Inverse Sketch-and-Extrude for Shape Parsing; Springer Nature: Cham, Switzerland, 2022. [Google Scholar]
Figure 1. The formation of a building CAD modeling sequence begins with drafting a sketch using multiple closed curves. This sketch is then extruded to form a 3D solid. By combining multiple extrusions, the final model is constructed.
Figure 2. The proposed building reconstruction method, which is based on a transformer network. The modeling sequence of the input point cloud is recovered through the DeepBuilding network. The sequence is easily transformed into other file formats supported by BIM or modeling software. Then, other 3D representation formats can be exported.
Figure 3. The network architecture of the proposed DeepBuilding network, which can convert the point cloud into a modeling sequence of buildings. The point cloud is first tokenized into point embeddings and then converted to sequence vectors by a transformer network.
Figure 4. Module specification of the proposed DeepBuilding network: (a) the point tokenizer for the extraction of the point embeddings; (b) the transformer encoder used to refine the point embeddings; (c) the transformer decoder for decoding the point features into the modeling sequence.
Figure 5. The validation loss curves during training on the public ABC dataset: (a) the validation curve for type errors; (b) the L1 loss curve for parameter errors.
Figure 6. The reconstruction results on the public ABC dataset.
Figure 7. Reconstruction examples of the building dataset.
Figure 8. The visualized reconstruction results of small cities.
Figure 9. Some zoomed-in views of the reconstruction results for the buildings dataset.
Figure 10. Reconstruction examples of the robustness analysis.
Table 1. Modeling sequence definition. SOL indicates the start of a loop; EOL indicates the end of the whole modeling sequence.
Command | Parameters
SOL | —
L (Line) | x, y: line endpoint
A (Arc) | x, y: arc endpoint; x_m, y_m: arc midpoint
R (Circle) | x, y: circle center; r: radius
E (Extrude) | θ, ϕ, γ: sketch plane orientation; o_x, o_y, o_z: sketch plane origin; e_1, e_2: extrusion distances toward both sides; b: Boolean type; s: sketch scale
EOL | —
Table 2. Evaluation results on the public ABC dataset.
Methods | Acc_t | Acc_p | CD | ECD | NC | #ΔP
Point2Cyl [65] | 31.98% | 27.94% | 0.518 | 1.065 | 0.791 | 27.96
ExtrudeNet [66] | 24.46% | 21.83% | 0.614 | 1.117 | 0.776 | 36.14
SECAD-Net [49] | 33.18% | 28.81% | 0.437 | 1.079 | 0.806 | 34.18
DeepCAD [48] | 79.49% | 70.19% | 0.898 | 1.883 | 0.708 | 7.16
HNC-CAD [51] | 81.27% | 73.41% | 0.827 | 1.064 | 0.711 | 6.56
Our Method | 93.27% | 83.97% | 0.410 | 0.607 | 0.819 | 4.97
Table 3. Evaluation results on the building dataset.
Methods | Hausdorff Distance | Chamfer Distance
City3D [33] | 0.238 | 0.719
PolyFit [32] | 0.191 | 0.675
Point2Poly [37] | 0.160 | 0.516
PolyGNN [38] | 0.107 | 0.218
Ours | 0.096 | 0.213
Table 4. Quantitative results of the robustness analysis.
Dropping Rate | Hausdorff Distance | Chamfer Distance
0% | 0.107 | 0.218
5% | 0.153 | 0.273
10% | 0.201 | 0.389
15% | 0.307 | 0.417
20% | 0.607 | 0.961
25% | 0.607 | 0.961
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

