Open Access Article

Peer-Review Record

Recurrent Residual Deformable Conv Unit and Multi-Head with Channel Self-Attention Based on U-Net for Building Extraction from Remote Sensing Images

Remote Sens. 2023, 15(20), 5048; https://doi.org/10.3390/rs15205048

by Wenling Yu^1,2, Bo Liu^1,2,3,*, Hua Liu^1,2,3 and Guohua Gou^4

Reviewer 1: Anonymous

Reviewer 2: Surendra Kumar Sharma

Reviewer 3: Liang Huang

Reviewer 4: Anonymous

Submission received: 31 July 2023 / Revised: 18 October 2023 / Accepted: 19 October 2023 / Published: 20 October 2023

(This article belongs to the Special Issue Explainable Deep Neural Networks for Remote Sensing Image Understanding II)

Round 1

Reviewer 1 Report

The authors have implemented a hybrid model for building extraction that combines several deep learning models. The novelty is good.

The following are some of the observations:

1. The abstract needs to be rewritten to highlight the proposed method.

2. The contributions should be stated more specifically.

3. In the experimental analysis, if possible, include the model performance with respect to different batch sizes.

4. Recheck the repetition of words in certain locations (e.g., line 322), and check the rest of the manuscript for similar cases.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

In this study, the authors have proposed a new architecture for building extraction from remote sensing images. The architecture includes a Recurrent Residual Deformable Conv Unit and Multi-Head with Channel Self-Attention on a U-Net base.

Comments/Suggestions

· Lines 25-27, abstract section: rewrite the sentence.

· The literature review needs to be updated. Some recently published papers on building extraction are missing:

o Chen K, Zou Z, Shi Z. Building Extraction from Remote Sensing Images with Sparse Token Transformers. Remote Sensing. 2021; 13(21):4441. https://doi.org/10.3390/rs13214441.

o Tejeswari B, Sharma SK, Kumar M, Gupta K. Building Footprint Extraction from Space-Borne Imagery Using Deep Neural Networks. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2022; XLIII-B2-2022:641–647. https://doi.org/10.5194/isprs-archives-XLIII-B2-2022-641-2022.

o Xu Y, Wu L, Xie Z, Chen Z. Building Extraction in Very High Resolution Remote Sensing Imagery Using Deep Learning and Guided Filters. Remote Sensing. 2018; 10(1):144. https://doi.org/10.3390/rs10010144

o Li WB, Sun KM, Zhao HP, Li WZ, Wei JJ, Gao S. Extracting buildings from high-resolution remote sensing images by deep ConvNets equipped with structural-cue-guided feature alignment. Int. J. Appl. Earth Obs. Geoinf. 2022; 113:102970. https://doi.org/10.1016/j.jag.2022.102970.

· Sections 3.1.1 and 3.1.2 can be moved under a new heading, “Data used”, before the methodology section.

· Line 322: rewrite the sentence.

· There are many building extraction architectures in the literature. It is suggested to add more architectures for comparison.

· The authors have used only two datasets in this study. It is suggested to include more datasets.

· More discussion needs to be added.

· In the conclusion, the authors state: “The proposed RDCU module has proven instrumental in mitigating the challenges of gradient vanishing….” The vanishing gradient problem is already addressed in the literature. How does this architecture differ in handling it?

Grammatical errors were detected in many places in the manuscript.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper proposes a U-Net-based method for building extraction from remote sensing images that includes a recurrent residual deformable convolution unit (RDCU) and multi-head with channel self-attention (MHCSA). The paper is innovative to some extent, but the following problems remain:

(1) What does the green box in Figure 1 indicate? This is not stated.

(2) Figure 4 does not show how the channel self-attention mechanism is used.

(3) Section 2.2 should supplement the principle of the channel self-attention mechanism.

(4) Many of the comparison methods in this paper are not the most advanced, especially U-Net and DeepLab v3+. Many papers on building extraction have been published in the past two years, and the proposed method needs to be compared with SOTA methods to verify that it is competitive. In particular, Transformer-based semantic segmentation models with self-attention should be included.

(5) There are some grammatical problems in this paper, which need to be revised.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 4 Report

Please follow the minor revisions detailed in the attached file

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

All the comments/suggestions have been addressed by the authors.

Author Response

Thanks for your comment.

Reviewer 3 Report

The manuscript has been modified according to the comments and can be accepted in its current form.

Minor editing of English language required.

Author Response

Thanks for your comment. I have revised the abstract, as detailed in the latest revised draft.
