Learning Color Distributions from Bitemporal Remote Sensing Images to Update Existing Building Footprints
Round 1
Reviewer 1 Report
(1) The first concern is whether accuracy is measured only on the changed buildings or on all existing buildings in the post-temporal image. This is unclear to readers.
(2) Generally, extracted building footprints are connected together in lower-resolution images, especially where buildings are densely distributed, whereas the labels acquired from a historical database are distinguishable building instances. I doubt that the proposed threshold-based post-processing works when changes occur near existing buildings.
(3) In my view, although the changed buildings are converted into polygons during the proposed post-processing, the additional changed buildings are not regularized and simplified enough to be used directly by the relevant departments. The conclusion that the proposed update method works without manual relabeling is not fully supported.
(4) For the two building change detection datasets, the pre-processing strategy needs to be clarified. Generally, the test data would be cropped without overlap. Additionally, the numbers of samples in the training and test subsets should be stated clearly.
(5) The introduction would benefit from being simplified.
(6) The details of the manuscript should be checked carefully, such as line 245 ‘from one from to’, line 369 ‘0.2m’, and lines 411-412 ‘TPs FNs’.
(7) The abbreviations of the methods in Table 1 and Table 2 are unfriendly to readers. It is advised to replace them with their full names or to add a description.
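As an illustration of comment (4), the following is a minimal sketch of cropping an image into test patches without overlap. It assumes the image is a NumPy array; the function name `crop_without_overlap`, the 256-pixel tile size, and the choice to drop incomplete border tiles are all hypothetical and are not taken from the manuscript.

```python
import numpy as np

def crop_without_overlap(image, tile=256):
    """Split an H x W (x C) array into non-overlapping tile x tile patches.

    Border strips that do not fill a whole tile are dropped. This is an
    assumption for the sketch; padding the border is another common choice.
    """
    h, w = image.shape[:2]
    patches = []
    # Step by exactly one tile so that no two patches share a pixel.
    for top in range(0, h - tile + 1, tile):
        for left in range(0, w - tile + 1, tile):
            patches.append(image[top:top + tile, left:left + tile])
    return patches
```

Reporting the tile size, the stride, and the resulting patch counts for the training and test subsets would make the pre-processing reproducible.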
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
I only suggest improving the references with a bibliography related to the state of the art from a cultural point of view.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
The manuscript presents a method to update existing building footprints by learning color distributions from bitemporal remote sensing images. The work is interesting, and the manuscript is well organized. However, the process of the method is a little confusing. The detailed comments are as follows.
1. The semantic segmentation in Part 2 of the flowchart in Figure 1 is not clearly described. According to the figure, the image of the later period is used to train the building extraction model. The purpose of this paper is to replace the invariant areas with historical label data, but building label data for the later period do not exist. If the building semantic segmentation model is trained directly on the post-period image, this implies that post-period building labels exist, in which case the method is meaningless.
2. For lines 349-357, the authors describe that the former image X is color-transformed to obtain Fake X, which has a style similar to the later image. Fake X and the labels of the former period are then used to train the building semantic segmentation model, and the trained model is used to predict on the images of the later period.
The images and labels of the former period are fixed, while the images of the later period may differ each time because of different acquisition methods. According to the process described by the authors, the model would have to be retrained every time a new image is used to update the building dataset. Is this process reasonable? Why not train the building semantic segmentation model directly on the pre-period image X and pre-period label L, and then predict on Fake Y, the post-period image obtained by color transformation from image Y?
3. Figures 6-9 describe the conversion of the color style of the pre-phase image into a style like that of the post-phase image, but the post-phase image is used in the building extraction and comparison experiment in Figures 10-11. Is there any contradiction in the experimental process?
The purpose of the article is to update the building dataset, namely to extract buildings from the new image and update the old label data. Therefore, for Figure 6 and Figure 9 in the experiments, the later image should be converted to a color style similar to that of the former images. For Figures 10 and 11, buildings should be extracted from the transformed later image.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
(1) The authors evaluate the results on the existing buildings corresponding to the post-temporal images, while most of the unchanged buildings are copied from the former labels. The authors did not clearly describe how the ground truth for the post-temporal images was obtained. In fact, most of the building labels for the post-temporal images are identical to those of the pre-temporal images. In my view, it is necessary to design an ablation experiment to validate the effectiveness of the post-processing and the other modules.
(2) For the experiment in Section 3.4, the best thresholds for the two datasets are at opposite extremes: one is 0.2 and the other is 1. Additionally, the threshold has a significant effect on the results, around 10% in the IoU metric. In practical engineering applications, there is no ground truth with which to select and validate the best threshold. So, this strategy seems meaningless.
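The sensitivity raised in comment (2) can be made concrete with a small sketch that computes IoU for several thresholds against a reference mask. How the threshold enters the authors' post-processing is not specified in this report, so binarizing a per-pixel score map is purely an assumption here, and the names `iou` and `iou_per_threshold` are hypothetical.

```python
import numpy as np

def iou(pred, truth):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        # Both masks are empty; treat this as perfect agreement.
        return 1.0
    return float(np.logical_and(pred, truth).sum() / union)

def iou_per_threshold(score, truth, thresholds):
    """Binarize a score map at each threshold (assumed usage) and report IoU."""
    score = np.asarray(score, dtype=float)
    return {t: iou(score >= t, truth) for t in thresholds}
```

A sweep of this kind requires the reference mask, which is exactly what is unavailable when the method is applied in practice.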
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
No comments
Author Response
Thanks for your suggestions.