Article
Peer-Review Record

Recurrent Residual Deformable Conv Unit and Multi-Head with Channel Self-Attention Based on U-Net for Building Extraction from Remote Sensing Images

Remote Sens. 2023, 15(20), 5048; https://doi.org/10.3390/rs15205048
by Wenling Yu 1,2, Bo Liu 1,2,3,*, Hua Liu 1,2,3 and Guohua Gou 4
Reviewer 1: Anonymous
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 31 July 2023 / Revised: 18 October 2023 / Accepted: 19 October 2023 / Published: 20 October 2023

Round 1

Reviewer 1 Report

The authors have implemented a hybrid model for building extraction that combines several deep learning models. The novelty is good.

The following are some of the observations:

1. The abstract needs to be rewritten to highlight the proposed method.

2. The contributions should be stated more specifically.

3. In the experimental analysis, if possible, include the model's performance for different batch sizes.

4. Recheck for repeated words in certain places (e.g., line 322), and check the whole manuscript for this.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

In this study, the authors propose a new architecture for building extraction from remote sensing images. The architecture combines a Recurrent Residual Deformable Conv Unit and Multi-Head with Channel Self-Attention on a U-Net base.

Comments/Suggestions

·         Lines 25-27 (Abstract): rewrite the sentence.

·         The literature review needs to be updated. Some recently published papers on building extraction are missing:

o   Chen K, Zou Z, Shi Z. Building Extraction from Remote Sensing Images with Sparse Token Transformers. Remote Sensing. 2021; 13(21):4441. https://doi.org/10.3390/rs13214441

o   Tejeswari B, Sharma SK, Kumar M, Gupta K. Building Footprint Extraction from Space-Borne Imagery Using Deep Neural Networks. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2022; XLIII-B2-2022:641–647. https://doi.org/10.5194/isprs-archives-XLIII-B2-2022-641-2022

o   Xu Y, Wu L, Xie Z, Chen Z. Building Extraction in Very High Resolution Remote Sensing Imagery Using Deep Learning and Guided Filters. Remote Sensing. 2018; 10(1):144. https://doi.org/10.3390/rs10010144

o   Li WB, Sun KM, Zhao HP, Li WZ, Wei JJ, Gao S. Extracting buildings from high-resolution remote sensing images by deep ConvNets equipped with structural-cue-guided feature alignment. Int. J. Appl. Earth Obs. Geoinf. 2022; 113:102970. https://doi.org/10.1016/j.jag.2022.102970

·         Sections 3.1.1 and 3.1.2 could be placed under a new heading, “Data Used”, before the methodology section.

·         Line 322: rewrite the sentence.

·         Many building extraction architectures exist in the literature. It is suggested that more of them be added for comparison.

·         The authors used only two datasets in this study. It is suggested that more datasets be included.

·         More discussion needs to be added.

·         In the conclusion, the authors state: “The proposed RDCU module has proven instrumental in mitigating the challenges of gradient vanishing….” The vanishing gradient problem has already been addressed in the literature. How does this architecture differ in handling it?

Grammatical errors were detected in many places in the manuscript.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper proposes a U-Net-based method for extracting buildings from remote sensing images that incorporates a recurrent residual deformable convolution unit (RDCU) and multi-head with channel self-attention (MHCSA). The paper is innovative to some extent, but the following problems remain:

(1) What does the green box in Figure 1 indicate? This is not stated.

(2) Figure 4 does not show how the channel self-attention mechanism is used.

(3) Section 2.2 should explain the principle of the channel self-attention mechanism.

(4) Many of the comparison methods in this paper, especially U-Net and DeepLab v3+, are not state of the art. Many building extraction methods have been published in the past two years, and the proposed method should be compared with state-of-the-art (SOTA) methods, in particular Transformer-based semantic segmentation models with self-attention, to verify its advantages.

(5) There are some grammatical errors in this paper that need to be corrected.


Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Please make the minor revisions detailed in the attached file.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

All the comments/suggestions have been addressed by the authors.

Author Response

Thanks for your comment.

Reviewer 3 Report

The manuscript has been revised according to the comments and can be accepted in its current form.

Minor editing of English language required.

Author Response

Thanks for your comment. I have revised the abstract; the changes are detailed in the latest revised version.
