Multisource High-Resolution Remote Sensing Image Vegetation Extraction with Comprehensive Multifeature Perception
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Review of: Multisource high-resolution remote sensing image vegetation extraction with comprehensive multi feature perception
Comments:
- Overall, the introduction is not structured very well. It reads as a mix between motivation and related work; these should be separated and addressed in turn instead of jumping back and forth between the two.
- Line 60: “Currently, these methods have been widely used in tasks such as…” only mentions image tasks/computer vision. The sentence before talks about AI/deep learning methods, which are not limited to images.
- Line 69: “networks are not suitable”… why? Explain this. You can’t make a baseless claim in the introduction without explaining the reasoning. Isn’t the loss in resolution an artifact of the networks? Don’t spatially sensitive tasks depend on their input data?
- Line 86: Isn’t U-NET a pixel-based method? It performs semantic segmentation at the pixel level… be specific about what you mean by pixel-based methods.
- Line 95: What is vegetation commission?
- Should have a paragraph break at line 95. “Pixel-based methods” and “combining multiple features/vegetation indices” don’t belong in the same overall paragraph.
- Lines 112-116: This is a major run-on sentence, and it makes claims that are not supported by relevant citations. Explain what this all means and how you can make such claims.
- The first contribution point is way too long. This is meant to be a high-level summary of the contribution.
Method comments:
- Line 157: “center pixel of the collected samples”. Until now, you have given no information about the input samples. Why the center pixel? Can’t this be different than the rest of an input?
- Line 159: “sample labels”… what are the labels? This is necessary to understand what “higher numerical values” represent and why this model is necessary.
- Why would some random forest correlation/importance have any bearing on how a neural network would learn? This is not well motivated. If anything, deep learning theory suggests you should input all the raw information and let the model learn for itself what to learn from. Your approach doesn’t make sense to me, and you do not explain the motivation for it.
- Lines 212-214: “Regular convolution cannot effectively extract the position information of spatial features, especially the boundary and fine features”. I do not believe this claim and you do not provide any support for it. CNNs encode spatial representations within the input data. Do you mean for your specific input data (i.e., vegetation data)? If so, show that this is the case and that this addition is necessary; otherwise omit this claim.
- In “Simplified Dense Block”, this is an interesting concept, but you discuss how different dimensions of CONVOLUTIONAL layers are concatenated? Why are these labeled as dense blocks? This is confusing with fully connected layers also often being referred to as “dense” layers.
- Line 252: you do not specify that “QK” refers to the query (Q) and key (K). Please do this for the reader. This is confusing given that Equation 1 uses a lowercase “k” for the key.
- Lines 258-260: You do not explain any of these variables. They could mean anything; please be specific. (For reference, the standard scaled dot-product attention notation is reproduced after this list.)
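For reference, and assuming Equation 1 is meant to follow the standard Transformer-style scaled dot-product attention (an assumption on my part, since the variables are not defined in the paper), the conventional notation is:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_K}}\right) V$

where Q, K, and V are the query, key, and value matrices and $d_K$ is the dimensionality of the key vectors. If Equation 1 matches this form, each of these symbols should be defined explicitly at Lines 258-260.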
Results:
- Did you manually label 266k pixel values? This dataset would be valuable if made public.
- Please include information about the loss function and final layer activation used for the binary classification task.
- What is the performance of simply thresholding NDVI? Vegetation is the only thing with a somewhat healthy NDVI, so I’d expect this could be used for binary vegetation/non-vegetation classification (a minimal baseline is sketched after this list).
- Did you train the model using the severely imbalanced dataset (266 vegetation / 435k non-vegetation)? This could be problematic for training and lead to significant bias in the model; one common way of handling the imbalance, together with the loss-function point above, is sketched after this list.
- I’m not seeing obvious evidence for all of the claims made in Lines 373-381, considering you state “more deeply”. More compared to what? You should do the same comparative analysis as Table 4 for all of the test area (comparing against the other algorithms for the entire test dataset).
- It would be nice to have some background on the type of other models used in Table 4 (i.e., RFC and OSBCL are not explained).
- Precision for MSICN in Table 4 (Planet data) is not the highest, but it is bolded. The same goes for Recall in Table 7. Please explain what the bold is supposed to indicate (typically it highlights the best result).
- Line 421: Y-NET uses both spectral features and a vegetation index. Be specific about what you mean by “classical algorithms”.
- Nice ablation study; well done addressing both the data inputs and the model structure. It would be nice to point to it when claiming the model is “more effective” and the like because of these features.
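To make the NDVI suggestion concrete, here is a minimal sketch of such a baseline, assuming the red and near-infrared bands are available as NumPy reflectance arrays; the 0.3 threshold, the array shapes, and the variable names are illustrative assumptions, not values taken from the paper:

import numpy as np

def ndvi_threshold_baseline(red, nir, threshold=0.3):
    # NDVI = (NIR - Red) / (NIR + Red); the small epsilon guards against division by zero
    ndvi = (nir - red) / (nir + red + 1e-8)
    return ndvi > threshold  # boolean vegetation mask

# Dummy reflectance arrays standing in for the real red/NIR bands
red = np.random.rand(256, 256).astype(np.float32)
nir = np.random.rand(256, 256).astype(np.float32)
vegetation_mask = ndvi_threshold_baseline(red, nir)

Reporting the same metrics as Table 4 for a baseline like this would show how much the deep model actually adds.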
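On the loss function and the class imbalance together, here is a minimal sketch of one common setup, assuming a PyTorch implementation with a single-logit output per sample; the framework choice and the counts fed to pos_weight are illustrative assumptions, not the authors’ actual configuration:

import torch
import torch.nn as nn

# Class counts as quoted above; replace with the actual dataset statistics
n_vegetation, n_background = 266, 435_000
pos_weight = torch.tensor([n_background / n_vegetation])

# BCEWithLogitsLoss applies the sigmoid internally, so the final layer should
# output a single raw logit per sample
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 1)                     # dummy model outputs
targets = torch.randint(0, 2, (8, 1)).float()  # dummy binary labels
loss = criterion(logits, targets)

# At inference, the vegetation probability comes from an explicit sigmoid
probability = torch.sigmoid(logits)

Class weighting like this (or resampling) is one standard way to keep such a severe imbalance from biasing the model toward the background class; the paper should state whichever strategy was actually used.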
Could cite:
- “Beyond measurement: Extracting vegetation height from high resolution imagery with deep learning”
- “MASANET: Multi-Angle Self-Attention Network for Semantic Segmentation of Remote Sensing Images”
Comments on the Quality of English Language
The English language is fine. Below are some typos I caught:
- Line 58/59: “making deep learning methods gradually become a research hotspot in AI” doesn’t make sense.
- Line 94: “in t vegetation…”
- Line 198: “parallelly” -> “simultaneously”… parallelly is not a word
- Equation 7: “Precess” -> “Precision”
- Table 3: “Test aera” -> “Test Area”
Author Response
Please see the attachment
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
In this paper, a multi-feature joint perceptive convolutional neural network with parallel input of spectral features and vegetation index features is proposed for vegetation extraction from multi-source high-resolution remote sensing images. This method can effectively address the problems of traditional methods, such as low accuracy and internal fragmentation across different data sources, and achieves higher accuracy in vegetation extraction. The work of the paper is meaningful. Furthermore, the following issues should be revised:
1. In lines 198 to 222: Two network-structure simplification experiments are proposed, but the adjustments to the network training strategy after removing a given module are not described, which may affect the experimental results. It is suggested that the authors explain whether the training configuration, such as the optimizer, needs to be adjusted after a module is removed.
2. In lines 384 to 385: It needs to be explained whether the MSICN network used the same training strategy as the other methods it is compared with.
3. In lines 382 to 552: In the result analysis, more detailed visualization results could be provided to show the vegetation extraction performance of the MSICN method in different scenes, including different terrain types such as cities and farmland. This would help evaluate the effectiveness of the method intuitively.
Author Response
Please see the attachment
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
- Why are the citations in Section 2 so weird? The authors use "[33] et al.," and include citations at the beginning of sentences (i.e., "[35] The Y-NET network..."), both of which are not common practice. You should refer to papers by their authors' names and cite either directly after the name or at the end of the sentence (i.e., Radke et al. [35] propose the Y-NET network...).
- "dK" (line 295) should be $d_{K}$, the 'K' is a subscript in Equation 1.
- "Test Aera" in Table 3 is still a typo. Should be "Test Area"
- Line 511, "The Table5 shows" -> "Table 5 shows"
- Line 513, "and the figures7 show" -> "and Figure 7 shows"
Comments on the Quality of English Language
There are several typos I've outlined in the comments above. The citation formatting is very strange and should be corrected and made consistent throughout the paper. There are several instances similar to "the Table5" in which "the" should be removed and a space should be added between "Table" and "5". In LaTeX, this can be done using ~, such as "Figure~\ref{fig_label}". Whenever you reference a table or figure in the paper, it should be capitalized.
Author Response
Please see the attachment.
Author Response File: Author Response.docx