Local Transformer Network on 3D Point Cloud Semantic Segmentation
Round 1
Reviewer 1 Report
This manuscript contains exciting work on local transformer networks for 3D point cloud semantic segmentation. However, the manuscript is written in a style more like a report than a research article. The global innovativeness of the research development has not been fully presented. The figures and tables that involve novel research should be described and discussed in more detail to emphasize the state-of-the-art novelty worldwide. Please cite the newest (2018-2022) Web of Science journal papers.
Author Response
Dear Reviewer:
We are very grateful for your comments and valuable suggestions, which have further improved our paper. We sincerely hope that this revised manuscript has addressed all your comments and suggestions.
Sincerely,
Ms. Wang
Author Response File: Author Response.pdf
Reviewer 2 Report
The paper presents interesting topics, but it is undermined by the missing meanings of the mathematical symbols used in many formulas. Therefore, the following points should be addressed carefully for a better presentation:
- in formula 3, what does the concatenation operation ⊕ stand for?
- in formula 4, what does MLP stand for?
- in figure 2, please specify the role of KNN
- in formula 5, what do ??(???1), ??(???2), and ??(???3) stand for? Please also explain what batch normalization is
- what is V in formula 17 and following?
- there is Chinese text at row 253!
- in figure 6, what does up-sampling stand for?
- in formula 23, what does mij stand for? And what is Mij in formula 22? Should mij = Mij?
- what formula is used to obtain the numbers in tables 1, 2, 3, and 4?
Author Response
Dear Reviewer:
We are very grateful for your comments and valuable suggestions, which have further improved our paper. We sincerely hope that this revised manuscript has addressed all your comments and suggestions.
Sincerely,
Ms. Wang
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments
1 – The paper is well written and well organized.
2 – Line 90. Please define the “mIoU” acronym.
3 – Section 1 should end with a paragraph stating the organization of the remainder of the paper.
4 – Line 112. Please, define the MLP acronym.
5 – Line 127. Please, define the DGCNN acronym.
6 – Lines 129 and 130. We have “proposed a graph attention convolution (GAV),…” I think it should be GAC.
7 – Line 163. We have the “normal of every point”. Please explain what the normal is in this context.
8 – Please treat the equations as elements of text. The punctuation rules also apply to equations. After equations (1) and (2), we should have a “,”. Equations (3) and (4) should end with a final dot. Equations (6) and (11) are missing a final dot. Please check all the equations.
9 – Line 177. We have “is the ℓ1 distance.” It should be “is the ℓ1 norm.”
10 – Please do not use symbols with double meaning. In equation (1), ‘n’ is the normal vector. In line 262, ‘n’ is the number of neighbors. Please check on this.
11 – On the experimental results from Figure 7 to Figure 11.
- On the caption, please change “line” to “row”.
- Explain what is the difference between the (a), (b), and (c) cases.
12 – On the experimental results from Figure 9 to Figure 11, please change “Third line is the results …..” -> “The third row has the results …..”
Writing
1 – Line 22
sematic
->
semantic
2 – Line 30
Convolutional neural network has
->
Convolutional neural network (CNN) has
researchers considered how
->
with researchers considering how
3 – Line 34
projected the 3D point clouds onto the 2D plane, generated the bird's
->
project the 3D point clouds onto the 2D plane, generating the bird's
4 – Line 43
segmentation accuracy; When the voxel
->
segmentation accuracy. When the voxel
5 – Line 58
This paper proposed a local transformer
->
This paper proposes a local transformer
6 – Line 60
we used transformer to learn
->
we use transformer to learn
7 – Line 70
adopted encoder-decoder structure.
->
adopts encoder-decoder structure.
8 – Line 74
In decoder layer, we used the
->
In the decoder layer, we use the
9 – Line 85
We proposed two different key
->
We propose two different key
10 – Line 99
Squeezeseg[8] converts
->
SqueezeSeg [8] converts
11 – Line 103
range image (RV) and bird’s view image (BEV),
->
range view (RV) image and bird’s eye view (BEV) image,
12 - Line 105
The network divided the point
->
The network divides the point
13 – Line 106
and then extracted the features
->
and then extracts the features
14 – Line 124
and then obtained
->
and then obtains
15 – Line 125
local feature map; Finally
->
local feature map. Finally
16 – Line 127
resorted to
->
resorts to
17 – Line 131
more accurate represent of local features
->
more accurate representations of local features
18 – Line 146
In this paper, we proposed a local transformer
->
In this paper, we propose a local transformer
19 – Line 171
of points are concatenate by the normalized
->
of points are concatenated by the normalized
20 – Line 173
and encode each neighbor
->
and encodes each neighbor
21 – Line 178
Then original input feature
->
Then, original input feature
22 – Line 188
is shown in Figure 2:
->
is shown in Figure 2.
23 – Line 202
of the Q, K, V matrix.
->
of the Q, K, V matrices.
24 – Line 204
product between Q and K, The attention score
->
product between Q and K. The attention score
25 – Line 217
defined as follow:
->
defined as follows:
26 – Line 232
has not learned enough feature yet to obtain
->
has not learned enough features yet to obtain
27 – Line 245
can be finally obtain by the operation
->
can be finally obtained by the operation
28 – Line 253
We have “shown in Figure 5 错误!未找到引用源。” (the Chinese reads “Error! Reference source not found.”). Please correct this.
to the encoder module obtained
->
to the encoder module are obtained
29 – Line 260
is the coordinate set
->
are the coordinate set
30 – Line 261
at decoder block. respectively.
->
at decoder block, respectively.
31 – Line 272
datasets is a dataset for semantic task
->
datasets address semantic task
32 – Line 292
We set the number of encoder layers is 7
->
We set the number of encoder layers as 7
33 – Line 294
Since the random down
->
The random down
34 – Line 295
is most efficient than other
->
is more efficient than other
35 – Line 311
and SemanticKIITI respectively.
->
and SemanticKIITI, respectively.
36 - Line 352. Caption of figure 9.
We have repeated “First line is the ground truth. First line is the ground truth.”
Author Response
Dear Reviewer:
We are very grateful for your comments and valuable suggestions, which have further improved our paper. We sincerely hope that this revised manuscript has addressed all your comments and suggestions.
Sincerely,
Ms. Wang
Author Response File: Author Response.pdf
Reviewer 4 Report
This paper presents a transformer-based architecture for semantic segmentation of 3D point clouds. It utilizes local content information to make the processing of large-scale point cloud data possible. A cross-skip selection is proposed to expand the receptive field without increasing the computational load. The experiments are carried out on several public datasets, and comparisons are performed with other techniques. There are some merits to this work, but several issues remain to be addressed or fixed. First, the major contribution should be emphasized more clearly. Although some key features are presented, it is also expected to see the improvement over previous transformer-based methods. Second, several figures in the paper are not clear enough. Please make the figures easy to read in terms of text size, etc. If some network structures are borrowed from previous works, please provide the sources. Third, semantic segmentation of 3D point clouds is an important research topic with many practical applications. This paper presents results on outdoor traffic scenes and indoor environments; however, adoption in robotics is also an important application scenario, for example, in the recent work "BiLuNetICP: a deep neural network for object semantic segmentation and 6D pose recognition," IEEE Sensors Journal, May 2021. This should be properly addressed or compared. Fourth, the writing of this work is not straightforward to follow. It is suggested to substantially rewrite it in a more structured way, especially providing more reasoning instead of only plain statements of the implementation details. Also, one reference is missing on page 8. Fifth, as most related works provide open source code for testing and validation, it is also suggested that the authors make their code publicly available to increase the impact of this paper.
Author Response
Dear Reviewer:
We are very grateful for your comments and valuable suggestions, which have further improved our paper. We sincerely hope that this revised manuscript has addressed all your comments and suggestions.
Sincerely,
Ms. Wang
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
I recommend paying more attention to describing worldwide research innovativeness in the future.
Reviewer 2 Report
The authors have replied positively to the remarks of the reviewer; their corrections are exhaustive.