Article
Peer-Review Record

Physical-Based Spatial-Spectral Deep Fusion Network for Chlorophyll-a Estimation Using MODIS and Sentinel-2 MSI Data

Remote Sens. 2022, 14(22), 5828; https://doi.org/10.3390/rs14225828
by Yuting He 1, Penghai Wu 1,2,3,*, Xiaoshuang Ma 1,2, Jie Wang 1,3 and Yanlan Wu 2,4
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 11 October 2022 / Revised: 12 November 2022 / Accepted: 15 November 2022 / Published: 17 November 2022
(This article belongs to the Special Issue Deep Learning in Optical Satellite Images)

Round 1

Reviewer 1 Report

Well written.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

All the comments and suggestions are written in the attached file.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

This manuscript proposes an interesting SSF model that combines physical constraints with a deep fusion network, which is useful for retrieving Chl-a in coastal waters. However, in my opinion, major revision is needed before publication.

1. For the PSSDFN structure, the MSI images are part of the input variables and also serve as the labels. Is my understanding correct? And is this practice common for SSF?

2. Is the sample size (120, as shown in Section 2.1) sufficient for ML training? Are the models chosen to retrieve Chl-a suitable for small samples?

3. In terms of Chl-a estimation, are there any differences in accuracy between the downscaled MODIS Chl-a products (Line 54, Ref. [16]) and the Chl-a retrieved by the method proposed in this manuscript?

Line 54: Please move the reference at the beginning of the sentence, i.e., [16], to the middle or the end of the sentence.

Line 65: What does “DL” mean? Deep learning? The acronym should be defined in the text upon first use.

Lines 79-80: I did not find any Chl-a retrieval in Reference [23]. Please check this statement and the cited reference.

Lines 80-81: “The selection of remote sensed images and sensitive bands may affect the physical constraints in SSF methods.” Is there any literature to support this statement, or can it be demonstrated by the conclusions of this manuscript?

Line 81: “Therefore”: is this the logical consequence of the previous statement? Lines 79-81 need to be improved for clarity.

Line 111: I am sorry, but I did not find Equation (1) in Reference [26]; I only found the sentence “Chl-a was measured using a visible light spectrophotometer (HJ 897–2017)”. Was the Chl-a concentration in this manuscript measured in the same way?

Equation (1): Chl-a denotes the Chl-a concentration, but Lc is also defined as the concentration of Chl-a. What is the difference between Chl-a and Lc?

Section 3.1.1 “SRF-guided grouping”: I did not find a definition of “spectral response function” (SRF) in the manuscript; could this term be explained briefly? As a constituent of the physical constraints highlighted in the manuscript, more details about the SRF should be added. Besides, why is B11 classified into Group 3 (B3, B10, B11)? According to the proximity principle and Figure 2, B11 is closer to B4 than to B10. Is there a more specific, quantified criterion for grouping?
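
One possible quantified criterion would be to group each band with the counterpart band(s) whose SRFs overlap it most, rather than relying on band-centre proximity alone. A minimal sketch of such an overlap-based check is given below; the Gaussian curves and the approximate band centres/widths are illustrative stand-ins, not the actual MODIS or MSI response functions:

```python
import numpy as np

# Illustrative only: Gaussian stand-ins for SRFs with approximate band centres (nm).
wavelengths = np.arange(400.0, 800.0, 1.0)  # nm grid

def gaussian_srf(center_nm, fwhm_nm):
    """Normalized Gaussian surrogate for a spectral response function."""
    sigma = fwhm_nm / 2.355
    srf = np.exp(-0.5 * ((wavelengths - center_nm) / sigma) ** 2)
    return srf / srf.sum()

modis_srf = {"MB10": gaussian_srf(488, 10),   # ~488 nm
             "MB11": gaussian_srf(531, 10),   # ~531 nm
             "MB12": gaussian_srf(551, 10)}   # ~551 nm
msi_srf = {"B3": gaussian_srf(560, 36),       # ~560 nm
           "B4": gaussian_srf(665, 31)}       # ~665 nm

def srf_overlap(a, b):
    """Overlap integral of two normalized SRFs; larger means more shared spectrum."""
    return float(np.sum(np.minimum(a, b)))

for msi_name, msi in msi_srf.items():
    scores = {m: round(srf_overlap(msi, srf), 3) for m, srf in modis_srf.items()}
    best = max(scores, key=scores.get)
    print(msi_name, "-> best match:", best, scores)
```

Reporting such an overlap (or correlation) score for each group would make the grouping criterion explicit and reproducible.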

 

Line 218: “the MSI images were used as labels”. I am a little confused. The input variables in Figure 4 include MSI images, but the labels are MSI images too; in other words, the MSI image is treated as the truth/reference. Is this the general practice for SSF? A further question: for example, to predict the fused band 1 image, MODIS Bt is B1, MODIS Bi’ and Bj’ are MB10 and MB12, and MSI Bi and Bj are SB2 and SB3 in Figure 4(a). Is my understanding correct? And what is the label for training FB1?
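
For context, using the HR (MSI) image itself as the reference is common when fusion networks are trained under a scale-invariance (Wald-protocol-style) assumption: the HR band is degraded to the LR (MODIS) scale to build the input, the original HR band serves as the label, and the trained network is then applied to the real LR data. The sketch below only illustrates this generic pair construction; the array names and the scale ratio are placeholders, not necessarily the exact PSSDFN setup:

```python
import numpy as np

def degrade(hr_band: np.ndarray, ratio: int = 25) -> np.ndarray:
    """Block-average an HR band down to the LR grid (a simple stand-in for the sensor PSF)."""
    h = (hr_band.shape[0] // ratio) * ratio
    w = (hr_band.shape[1] // ratio) * ratio
    blocks = hr_band[:h, :w].reshape(h // ratio, ratio, w // ratio, ratio)
    return blocks.mean(axis=(1, 3))

def make_training_pair(msi_band: np.ndarray, ratio: int = 25):
    """Return (input, label): the degraded MSI band is the input, the original MSI band the label."""
    return degrade(msi_band, ratio), msi_band

# Toy usage with a fake 250 x 250 pixel MSI band.
rng = np.random.default_rng(0)
fake_msi = rng.random((250, 250)).astype(np.float32)
lr_input, hr_label = make_training_pair(fake_msi)
print(lr_input.shape, hr_label.shape)  # (10, 10) (250, 250)
```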

 

Equation (2): Which is the final fused image used for Chl-a retrieval, Ls or HR? Besides, are the values of B, D, and n determined by the network training or by a physical mechanism?

Equation (5) and Line 236: since three MODIS bands (Bt, Bi’, Bj’) are input in Figure 4(a), which one is the original MODIS image used in Equation (5)?

Line 242: What is the “spatial critical point”? Can a specific value be given as an example? And how is the spatial critical point determined?

Line 265: Why were FB1 and FB11 not chosen? Is this due to low spectral resolution or low correlation?

Figure 9: In my opinion, the amount of training data (only 44 samples) may not be enough for an ML model. Are the models chosen to retrieve Chl-a suitable for such a small sample? Besides, please show the mean absolute percentage error (MAPE) in the figure.
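
As a concrete reference for the requested metric, MAPE can be computed directly from the in-situ and estimated Chl-a values, and with only 44 samples a leave-one-out cross-validation would give every point a held-out prediction. A minimal sketch follows; the random-forest regressor and the placeholder data are illustrative only, not the models or measurements used in the manuscript:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def mape(y_true, y_pred):
    """Mean absolute percentage error in %, assuming no zero in-situ Chl-a values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_pred - y_true) / y_true))

# Placeholder features (fused reflectances) and in-situ Chl-a for 44 stations.
rng = np.random.default_rng(0)
X = rng.random((44, 4))
y = rng.uniform(1.0, 60.0, size=44)

# Leave-one-out CV: each of the 44 samples is predicted by a model trained on the other 43.
y_hat = cross_val_predict(RandomForestRegressor(random_state=0), X, y, cv=LeaveOneOut())
print(f"LOO MAPE = {mape(y, y_hat):.1f}%")
```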

 

Line 378: “in -situ” should be “in situ” or “in-situ”.

Line 387: How is the retrieved parameter (Chl-a) considered during the fusion process? In other words, does the retrieved parameter influence the fused image, and if so, how?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

The authors have addressed all my comments.
