3DLEB-Net: Label-Efficient Deep Learning-Based Semantic Segmentation of Building Point Clouds at LoD3 Level
Round 1
Reviewer 1 Report
Contribution: A novel label-efficient DL network that obtains per-point semantic labels of LoD3 buildings’ point clouds with limited supervision.
The paper is quite well organized and written, and the authors' method seems very useful. I have identified only one issue of any significance, and a number of suggestions / questions for improvement.
Major issues:
line 155: This is where LoD1–LoD3 are first defined and referenced. I suggest this be done very early in the paper, probably in the first paragraph of the intro.
Minor issues:
line 12: Define LoD3 here or in the intro, perhaps with a reference. You have done so for LoDs in general, but what does the "3" refer to?
line 30: "enable and promote" -> "enables and promotes"
line 72: Something seems to be missing, likely at the end of the sentence.
line 94: Does this sentence apply only to #3, or to all? If to all, move it out of #3: "The result shows that our result surpasses or achieves performance comparable to that of recent state-of-the-art methods, with only 10% of training data."
line 278: just a suggestion: write the "th" in "ith" as a superscript.
Eq. 1, Eq. 3 (as examples): these should not end with a comma, since they come at the end of a sentence; use a period instead. This applies throughout.
line 294: I don't think this should be a new paragraph.
line 454: "retrieved form" -> "retrieved from"
line 458: italicize x, y, z
Table 1: Why report Scene B before Scene A? Not critical, just seems odd.
line 543: "Chamfer Loss (CD)". Why is it CD instead of CL? May want to clarify, unless widely accepted.
lines 578 - 585: italicize x, y, z
References: Should "Cnn" be "CNN" (in Wang et al. 2019)? Check for this kind of issue throughout the references. Also, while you are mostly very consistent, some author lists end in a period, some in a colon, and some with no punctuation.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
The qualitative results are not promising.
Figure 8: stairs are mixed with the floor, the floor is mixed with the roof, and there is a problem with the columns.
In my opinion, based on these results, I am not convinced that the proposed approach is worth reusing. I think a classic approach to point cloud segmentation could reach much better results (e.g., simple shape segmentation with PCA/SVD or RANSAC, combined with some simple semantic topology).
I advise the authors to check whether, e.g., the Point Cloud Library (PCL) could reach better results. Comparing only to DNN-based methods is not convincing.
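For illustration only, here is a minimal sketch of the kind of classic geometric baseline meant above: iterative RANSAC plane extraction. Open3D is used as a stand-in for PCL, and the file path, thresholds, and iteration counts are placeholders, not values from the manuscript.

```python
# Sketch of a classic geometric baseline: iteratively extract dominant planar
# segments (walls, floors, roofs) with RANSAC. Open3D stands in for PCL here;
# the file path and parameters are placeholders.
import open3d as o3d

pcd = o3d.io.read_point_cloud("building_scene.ply")  # hypothetical input scan

planes = []
remaining = pcd
for _ in range(10):  # extract up to 10 dominant planes
    if len(remaining.points) < 100:
        break
    model, inliers = remaining.segment_plane(distance_threshold=0.02,
                                             ransac_n=3,
                                             num_iterations=1000)
    planes.append((model, remaining.select_by_index(inliers)))
    remaining = remaining.select_by_index(inliers, invert=True)

# Each extracted plane could then be assigned a semantic label (wall/floor/roof)
# from its normal orientation and height, i.e. the "simple semantic topology"
# mentioned above.
```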
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
A new machine learning model for architecture-oriented point cloud semantic segmentation is proposed. An autoencoder (AE) network is used to extract features without any pre-labeled data. Then, with a very limited number of labels, a semantic segmentation network for LoD3 point clouds is successfully trained.
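For clarity, a minimal sketch of this two-stage scheme as I read it (not the authors' code): stage 1 pre-trains an autoencoder on unlabeled point clouds with a Chamfer-style reconstruction loss, and stage 2 reuses the encoder to train a per-point segmentation head on a small labeled subset. A simple per-point MLP stands in for the actual DGCNN encoder and folding-based decoder; all names, shapes, and hyperparameters below are hypothetical.

```python
# Hypothetical sketch of an AE pre-training + few-label fine-tuning pipeline;
# not the authors' implementation.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Per-point MLP + max pooling; a stand-in for the DGCNN encoder."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, feat_dim), nn.ReLU())

    def forward(self, pts):                           # pts: (B, N, 3)
        point_feats = self.mlp(pts)                   # (B, N, F)
        global_feat = point_feats.max(dim=1).values   # (B, F)
        return point_feats, global_feat

class Decoder(nn.Module):
    """Reconstructs the point set from the global code; stand-in for the folding decoder."""
    def __init__(self, feat_dim=256, n_points=1024):
        super().__init__()
        self.n_points = n_points
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                 nn.Linear(512, n_points * 3))

    def forward(self, global_feat):
        return self.mlp(global_feat).view(-1, self.n_points, 3)

class SegHead(nn.Module):
    """Per-point classifier on concatenated point and global features."""
    def __init__(self, feat_dim=256, n_classes=10):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim * 2, 128), nn.ReLU(),
                                 nn.Linear(128, n_classes))

    def forward(self, point_feats, global_feat):
        g = global_feat.unsqueeze(1).expand_as(point_feats)
        return self.mlp(torch.cat([point_feats, g], dim=-1))  # (B, N, C)

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a and b, each (B, N, 3)."""
    d = torch.cdist(a, b)                              # (B, N, N) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

# Dummy tensors standing in for the real unlabeled / sparsely labeled LoD3 scans.
unlabeled = [torch.rand(8, 1024, 3) for _ in range(4)]
labeled = [(torch.rand(8, 1024, 3), torch.randint(0, 10, (8, 1024))) for _ in range(2)]

enc, dec, head = Encoder(), Decoder(), SegHead()

# Stage 1: self-supervised AE pre-training on unlabeled scans.
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
for pts in unlabeled:
    _, g = enc(pts)
    loss = chamfer(dec(g), pts)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: supervised fine-tuning of encoder + segmentation head on the small labeled subset.
opt = torch.optim.Adam(list(enc.parameters()) + list(head.parameters()), lr=1e-3)
ce = nn.CrossEntropyLoss()
for pts, labels in labeled:
    pf, g = enc(pts)
    logits = head(pf, g)                               # (B, N, n_classes)
    loss = ce(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```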
Strengths: well written, consistent experiment.
Weaknesses:
1. The innovation is only at the level of the proposed architecture.
2. A reference to the code repository used in the experiments is missing.
A few observations:
1. The last sentence at lines 72–74 seems unfinished.
2. Line 252: "with just a few labeled data...". Maybe give the number of labels used?
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 4 Report
A novel method, 3DLEB-Net, consisting of two steps is proposed in the manuscript. The first step (an autoencoder, AE) is composed of a Dynamic Graph Convolutional Neural Network (DGCNN) encoder and a folding-based decoder; the second step is the semantic segmentation network. The paper is technically correct; some comments are given as follows.
- Most of the results in Tables 1 and 2 come from [5], but the main method of [5], DGCNN-Mod, is not included in these tables. I found that DGCNN-Mod was included for comparison in Table 8, and in fact it outperformed PointNet, PointNet++, PCNN and DGCNN as well. Please explain why it was left out of Tables 1 and 2. Meanwhile, please state in the responses which page or table of [5] the data come from.
- In [5], the scenes were divided into three parts (train, validation, and test), but this is not mentioned here. The scenes in the manuscript are divided into train and test only, so in my opinion the two papers use different frameworks for evaluating the methods. The best way is to evaluate in the same way; if not, the details should be explained truthfully in the article. The proposed method could also be evaluated with the same number of training scenes as the other methods, and those results shown as well.
- Figures 4, 7 and 8 illustrate the classification results of the scenes, but only for the proposed method. I suggest that the classification results of the other methods mentioned also be presented, at least for DGCNN and DGCNN-Mod.
- Table 3 demonstrates the effectiveness of the encoders. I suggest changing "setting" in the table to "encoder", and "DGCNN" to "DGCNN-based", to reflect the content of the paragraph.
- The article states "only 3 scenes (about 10% of 10 scenes)"; does 10% of 10 equal 3?
- Label the x-axis of Figure 11.
- Note 3 in Table 8 is not annotated.
- It is better to use italics for variable symbols in the running text, as in the equations; please modify this throughout the article.
Author Response
Please check the attached cover letter.
Author Response File: Author Response.docx