A Neural Network-Based Fusion Approach for Improvement of SAR Interferometry-Based Digital Elevation Models in Plain and Hilly Regions of India
Round 1
Reviewer 1 Report (Previous Reviewer 1)
The authors corrected the paper with respect to my remarks.
I do not have any further comments.
Author Response
Dear Sir/Madam,
Greetings!
We are extremely grateful for your appreciation and suggestions, which brought the manuscript to its current form.
Your constructive suggestions also motivate us to pursue further academic and research activities and to share them through publications.
Thanks.
Kind regards,
Author Response File: Author Response.pdf
Reviewer 2 Report (Previous Reviewer 3)
The article's quality has improved; I recommend it to be published.
Author Response
Dear Sir/Madam,
Greetings!
We are extremely grateful for your appreciation and suggestions, which brought the manuscript to the current form.
Your constructive suggestions have motivated us to pursue further academic and research activities and to share the learnings, outputs and results through publications.
Thanks.
Kind regards,
Author Response File: Author Response.pdf
Reviewer 3 Report (New Reviewer)
This paper presents an ANN that is trained with DEM features to predict the elevation of two new areas. Overall, the Materials and Methods and Results sections are challenging to read and understand. Therefore, as a reviewer, I do not recommend accepting the manuscript for publication in its current state. The authors are also invited to revise it into a more concise and intuitive layout.
Regarding methods and materials, many points need to be clarified:
Please describe in detail the appearance of the input data and how it corresponds to the input layer of the ANN model.
Please explain the content and quantity of train/valid/test data in detail.
What is the output of the model? How is it compared with the actual values of the test data?
In the inference phase, what is the input data, and what is the ground truth value?
The training data includes slope, aspect, TPI, TRI, VRM, LULC, and other data. Except for LULC, most of them are derived from the DEM. Please explain the influence. There may be experimental support for the repeated addition of these derived values to benefit the predicted DEM results.
Please simplify the presentation of loss or parameter changes in the training process, for example by combining Figure 3 and Figure 6 into one figure. Also, why is Tanh not plotted?
The meaning of the presentation in Table 2 is unclear, as is that of Table 4. If the reader cannot get the information from the table, please emphasize the key points or differences.
It is not appropriate to judge an image only by MSE or RMSE. Moreover, with such a large range of data, whether such an indicator can represent the overall quality is debatable. Please list other indicators supporting DEM performance, such as Mean, StDev, error % in meter, etc.
Please plot the 1 D profile of elevations to compare predicted (fusion) DEM to test (ground truth) DEM.
The description of the ANN model is not detailed enough; please list the input and output parameters of each layer. In addition, for the Keras model, please directly export the summary information for comparison in the paper. Please also show the input and output parameters of the MATLAB model.
In the NN architecture, does 31-60-50-1 represent the layers' parameters, width, or depth?
In addition, the number of layers used by the Keras model and the Matlab model is different. The exact structure should be used if the two are to be compared.
The article includes too many unnecessary graphs of variable data.
Author Response
Dear Sir/Madam
Greetings!
We thank you and are very grateful for the valuable suggestions, which helped us bring the manuscript to its current improved form. The manuscript has been revised and improved as per the given suggestions. Please find below our response to the comments/suggestions and pointwise details of the improvements made in the manuscript. The specific portions have been revised in “track changes” mode and highlighted for your reference.
Reviewer 3
This paper presents an ANN that is trained with DEM features to predict the elevation of two new areas. Overall, the Materials and Methods and Results sections are challenging to read and understand. Therefore, as a reviewer, I do not recommend accepting the manuscript for publication in its current state. The authors are also invited to revise it into a more concise and intuitive layout.
Regarding methods and materials, many points need to be clarified:
Please describe in detail the appearance of the input data and how it corresponds to the input layer of the ANN model.
=> The input data includes the elevation values from the multiple InSAR-based DEMs for each study site. The DEM derivatives, namely the slope, aspect, TRI, TPI and VRM values (each a raster layer), were derived from the input DEMs and are used along with the LULC class information for each area. This information is extracted in the form of point values and applied to the neurons of the input layer of the neural network models. Each node in the input layer corresponds to the value of an individual raster layer, as described in lines 327 to 342 of the Methodology section.
Please explain the content and quantity of train/valid/test data in detail.
=> The training and testing datasets are prepared from the values extracted from the raster layers (elevation values of the multiple InSAR DEMs, DEM derivative values and LULC class values) at the point locations of the ICESat-2 footprints (reference elevation) distributed across the study areas. In the Keras models, the dataset is split randomly into training and testing (validation) sets in a ratio of 70:30. In the MATLAB-based models, the default data division function of the Feed Forward Backpropagation model, “DIVIDERAND”, is used, which splits the complete dataset into Train:Validation:Test samples in the ratio of 70:15:15. The new dataset for testing the trained models is prepared in a similar manner from a subset of the study area. The quantity of train/valid/test data is given in lines 423-425 and 523-525. The new data for testing the trained model comprises approximately 209000 and 550000 point samples for the Ghaziabad and Dehradun regions respectively.
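For illustration only, the two split ratios described above can be sketched in plain Python. This is a minimal sketch, not the code used in the study: the helper name `split_indices`, the fixed seed and the sample count of 1000 are hypothetical, and the actual Keras and MATLAB routines may shuffle and round differently.

```python
import random

def split_indices(n_samples, fractions, seed=0):
    """Shuffle sample indices and cut them into consecutive parts.

    `fractions` is e.g. (0.70, 0.30) or (0.70, 0.15, 0.15).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    parts, start = [], 0
    for i, fraction in enumerate(fractions):
        # The last part takes the remainder so every sample is used exactly once.
        is_last = i == len(fractions) - 1
        stop = n_samples if is_last else start + round(fraction * n_samples)
        parts.append(indices[start:stop])
        start = stop
    return parts

# Hypothetical dataset of 1000 point samples.
train, test = split_indices(1000, (0.70, 0.30))                   # 70:30 split
train3, valid3, test3 = split_indices(1000, (0.70, 0.15, 0.15))   # 70:15:15 split
```

Each index appears in exactly one part, so no training sample leaks into the validation or test sets.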
What is the output of the model? How is it compared with the actual values of the test data?
=> The output from the trained model is the prediction of elevation values over the test area, which is a subset of the study area. The point samples from this area were not used in training, and the ICESat-2 reference elevation is not available for them. Thus, the trained neural network model gives a prediction of elevation for the new/test area. The output from the models is assessed by comparing it with the TanDEM-X 90 m DEM, and the RMSE is calculated (please refer to lines 490-492 and 565-566 in the Results section).
In the inference phase, what is the input data, and what is the ground truth value?
=> In the inference phase, the performance of the trained model is evaluated on the data points from the test area, which is a subset of the study area. The data points from this new or test area were not used in training and are unseen by the model. The input data includes the elevation values of the InSAR DEMs and their corresponding DEM derivative values along with the LULC class values. Since the performance of the trained model is being evaluated here, only those data points are considered for which the ICESat-2 reference values are not available. Hence, the trained model gives elevation predictions for the new area, and these predictions are evaluated for accuracy by comparing them with the TanDEM-X 90 m DEM values. A code snippet is attached in the figure for reference. The figure shows the complete input dataset used in the inference phase.
Fig: code snippet showing the input data in the inference phase
The training data includes slope, aspect, TPI, TRI, VRM, LULC, and other data. Except for LULC, most of them are derived from the DEM. Please explain the influence. There may be experimental support for the repeated addition of these derived values to benefit the predicted DEM results.
=> In this study, we have developed a DEM fusion framework that uses neural network models to improve the DEMs. The neural network models the relationship of the input elevation with the reference elevation based on parameters (or dependencies) such as the slope, aspect, TPI, TRI, VRM and LULC class information. Although the modelling can be performed with only elevation values, including the DEM derivatives will improve the predictions. Moreover, the literature survey shows that these derivatives have a relationship with DEM quality and hence are useful in a DEM fusion study (reference [17]). Each of these DEM derivatives and the class information provides better detail of the topography and allows a better prediction from the neural network models. The following references were considered while selecting these DEM derivatives through the literature survey:
Each of the DEM derivatives gives a different perspective of the terrain as derived from the DEM. Slope gives the amount of change in elevation in the steepest direction, whether uphill or downhill. As the slope becomes steeper, the accuracy of the DEM decreases, and the stereo-pair geometry becomes more crucial in determining that accuracy. Aspect is a measure of the steepest slope of the DEM in the downhill direction. A foreslope will generally give the best elevation accuracy, while a backslope gives the worst (reference [37] in the manuscript). TRI is derived from the standard deviation of slope or elevation, and it is the difference between the elevation of a cell and the mean elevation of its 8 neighbouring cells (reference [38]). TPI is also a measure of elevation relative to the heights of the neighbouring pixels. A positive TPI indicates a higher elevation than the surrounding pixels and a negative value indicates a lower elevation (references [39], [40]). VRM is a measure of ruggedness in a three-dimensional perspective of the raster grid with its neighbourhood. It shows the vector dispersion of the elevation values in the neighbourhood (reference [41]).
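To make the sign convention of TPI concrete, here is a minimal sketch assuming the common formulation in which TPI is the elevation of a cell minus the mean elevation of its 8 neighbours. The function name `tpi` and the two 3x3 elevation windows are hypothetical illustrations; the neighbourhood size and software used in the manuscript are not restated here.

```python
def tpi(dem, row, col):
    """Elevation of a cell minus the mean elevation of its 8 neighbours.

    `dem` is a list of rows (metres); only interior cells are supported.
    """
    neighbours = [
        dem[r][c]
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
        if (r, c) != (row, col)
    ]
    return dem[row][col] - sum(neighbours) / len(neighbours)

# Hypothetical 3x3 windows: a local peak and a local pit on flat ground.
peak = [[10.0, 10.0, 10.0],
        [10.0, 18.0, 10.0],
        [10.0, 10.0, 10.0]]
pit = [[10.0, 10.0, 10.0],
       [10.0, 2.0, 10.0],
       [10.0, 10.0, 10.0]]

tpi_peak = tpi(peak, 1, 1)  # positive: centre is higher than its surroundings
tpi_pit = tpi(pit, 1, 1)    # negative: centre is lower than its surroundings
```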
Please simplify the presentation of loss or parameter changes in the training process, for example by combining Figure 3 and Figure 6 into one figure. Also, why is Tanh not plotted?
=> The training and validation loss curves show the performance of the neural network model during training. Figures 3 and 6 depict the performance of the different models for the Ghaziabad and Dehradun areas respectively. Hence, they cannot be combined, as they represent different models and pertain to different areas. Several iterations have been carried out to find the most suitable models for each study area, and these are attached in Appendix A. Only the performance curves for the best architecture models are shown in Figures 3 and 6, as per previous suggestions by the anonymous reviewers.
The meaning of the presentation in Table 2 is unclear, as is that of Table 4. If the reader cannot get the information from the table, please emphasize the key points or differences.
=> Tables 2 and 4 present the values of the loss parameters obtained while performing iterations to select the best model architecture in the Ghaziabad and Dehradun regions respectively (described in lines 439-443). These tables represent the heuristic approach applied in selecting the best-fit Keras-based neural network models in each of the study areas. From Table 2 it is observed that for the plain terrain region of Ghaziabad and its surroundings, the architecture of 31-21-15-1 neurons in the input layer, hidden layer 1, hidden layer 2 and output layer with the sigmoid activation function performs best. From Table 4, it is observed that for the hilly region of Dehradun and its surroundings, a model with 31-64-128-1 neurons in the input layer, hidden layer 1, hidden layer 2 and output layer with the TanH activation function performs well.
It is not appropriate to judge an image only by MSE or RMSE. Moreover, with such a large range of data, whether such an indicator can represent the overall quality is debatable. Please list other indicators supporting DEM performance, such as Mean, StDev, error % in meter, etc.
=> We have carried out a point-based accuracy assessment on the data samples of the study areas by estimating the RMSE of the fused output predictions against the TanDEM-X 90 m DEM elevation values. Further, the percentage improvement factor is provided to show the improvement achieved in the fused output over the input DEMs. We have added the LE90 (linear error at the 90th percentile) in support of RMSE as a performance indicator of scalar accuracy. (Please refer to lines 383-387 and Tables 3 and 5.)
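The three indicators named above can be sketched as follows. This is a hedged illustration rather than the manuscript's implementation: LE90 is computed here as the 90th percentile of absolute elevation errors using the nearest-rank method, the improvement factor is assumed to be the relative RMSE reduction in percent, and all elevation values are hypothetical.

```python
import math

def rmse(predicted, reference):
    """Root mean square error between predicted and reference elevations."""
    errors = [p - r for p, r in zip(predicted, reference)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def le90(predicted, reference):
    """90th percentile of absolute errors (nearest-rank method, an assumption)."""
    abs_errors = sorted(abs(p - r) for p, r in zip(predicted, reference))
    rank = math.ceil(9 * len(abs_errors) / 10)
    return abs_errors[rank - 1]

def improvement_percent(rmse_input, rmse_fused):
    """Assumed definition: relative RMSE reduction of the fused DEM, in percent."""
    return 100.0 * (rmse_input - rmse_fused) / rmse_input

# Hypothetical elevations in metres: ten points with errors of 1 m to 10 m.
reference = [100.0] * 10
predicted = [101.0, 98.0, 103.0, 96.0, 105.0, 94.0, 107.0, 92.0, 109.0, 90.0]

fused_rmse = rmse(predicted, reference)
fused_le90 = le90(predicted, reference)
gain = improvement_percent(10.0, 6.0)
```

Reporting LE90 next to RMSE is useful because RMSE is sensitive to a few large outliers, while LE90 bounds the error for 90% of the points.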
Please plot the 1 D profile of elevations to compare predicted (fusion) DEM to test (ground truth) DEM.
=> The DEMs are continuous raster layers depicting the terrain in three-dimensional form and are thus tested as continuous raster layers, using ICESat-2 ATL08 photon data for point-based assessment and TanDEM-X for area-wise assessment.
The description of the ANN model is not detailed enough; please list the input and output parameters of each layer. In addition, for the Keras model, please directly export the summary information for comparison in the paper. Please also show the input and output parameters of the MATLAB model.
=> The parameters of the input and output layers for the Keras-based models are as follows. Each layer is a sequential dense layer with the number of neurons required to obtain the best-fit model. For the plain terrain of Ghaziabad, the input and hidden layers use a sigmoid activation function while the output layer uses a ReLU activation function. For the Dehradun region, the input and hidden layers use the tanh activation function and the output layer uses ReLU. These are feed-forward backpropagation neural network models implemented in Keras, a Python-based library. (Please refer to lines 349-353.)
The MATLAB models are also Feed Forward Backpropagation neural network models, using the TRAINLM training algorithm. Here the neural network architecture is similar; that is, the same number of neurons and the same activation functions are used as in the Keras-based models. To implement a model in MATLAB, we define the name and select the type of model, the loss parameter, the training function, the number of layers and the activation function for each layer from the available parameters, and the model is created in the NN toolbox. (Please refer to lines 390-398.)
Further, the details of each model used in the two study areas specifically have been described in sections 5.1 a, b and 5.2 a, b.
In the NN architecture, does 31-60-50-1 represent the layers' parameters, width, or depth?
=> A neural network architecture such as 31-21-15-1 represents the number of neurons in the input layer, hidden layer 1, hidden layer 2 and output layer respectively, meaning that the input layer has 31 neurons, hidden layer 1 has 21 neurons, hidden layer 2 has 15 neurons and the output layer has one node. (Please refer to lines 426-428 and 518-520, and Tables 2 and 4.)
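As an illustration of this notation only, the sketch below parses an architecture string into layer widths and counts the trainable weights and biases. It assumes that the first number (31) is the width of the input feature vector and that consecutive layers are fully connected with one bias per neuron; the helper names are hypothetical, and the counts would differ if the input layer were itself implemented as a dense layer with its own weights.

```python
def parse_architecture(spec):
    """Turn a string such as '31-21-15-1' into a list of layer widths."""
    return [int(width) for width in spec.split("-")]

def dense_parameter_count(widths):
    """Weights (n_in * n_out) plus biases (n_out) for each pair of adjacent layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

ghaziabad = parse_architecture("31-21-15-1")   # plain-terrain model
dehradun = parse_architecture("31-64-128-1")   # hilly-terrain model

ghaziabad_params = dense_parameter_count(ghaziabad)
dehradun_params = dense_parameter_count(dehradun)
```

Under these assumptions the numbers describe layer width (neurons per layer), while the depth is the number of entries in the string.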
In addition, the number of layers used by the Keras model and the Matlab model is different. The exact structure should be used if the two are to be compared.
=> The number of layers, as well as the number of neurons in each layer is the same for both Keras and MATLAB-based models. For Ghaziabad and its surrounding regions, both models use 31-21-15-1 architecture and for Dehradun and its surrounding regions, both models use 31-64-128-1 architecture. (Please refer to Table 2 & Figure 4; Table 4 & Figure 7).
The article includes too many unnecessary graphs of variable data.
=> The relevant graphs are included in the main body of the article representing the crucial elements of the study. Further, the heuristic approach and several iterations in obtaining the best fit models for the study areas are attached in Appendix A.
We again thank you for indicating the important points for improvement of the manuscript. We have incorporated almost every suggestion provided by all the respected reviewers.
Thanks.
Kind regards,
Author Response File: Author Response.pdf
Round 2
Reviewer 3 Report (New Reviewer)
Thanks to the authors' replies to the previous review comments, there are no more questions about the current manuscript.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
The article submitted to the review is interesting, but it requires a number of modifications, corrections and additions before publication.
From my point of view, the main drawback is the fact that the authors do not explain all the symbols used in the formulas. They also do not indicate the source of the formulas. That needs to be added.
Furthermore, the same symbolism must be used when explaining symbols, i.e. ITALIC style. This error occurs many times, e.g. lines 131, 132, 170, 174, 175, 341, 342 ....
The authors use a BOLD style when writing formulas, which is not necessary and typographically unsightly.
In addition, in Formula 2, the description of the individual members is completely confusing. It is necessary to correct and explain everything in detail!
In the text, some of the explained symbols are given by the authors in parentheses and some are not; the usage is very inconsistent. In my opinion, parentheses are not necessary; Italic style and keeping the correct index notation is enough.
Figure 3 and Figure 6 are too small. The axes are completely illegible, moreover, the meaning of the axes is missing. I recommend enlarging, leaving only some parts of the pictures in the article and putting the rest in the Appendix.
Page 13, picture - what is Mu? .... need to be explained, again the description of the x-axis is missing.
Reviewer 2 Report
The paper is well written. The organization of the presentation is easy for the reader to understand. The results seem to be worthy of disseminating to researchers interested in topographical applications.
Reviewer 3 Report
The authors propose a novel approach for improvement of SAR Interferometry based Digital Elevation Models in Plain and hilly terrain of India based on a fusion neural network. Nevertheless, the following observations were made in the manuscript:
1. The authors should clarify the research's novelty and clearly state their contributions in the Introduction section. It is also necessary to clearly state the structure of the article.
2. The author needs to state the purpose of using the loss function MSE.
3. Section 6 can be included in section 5.
4. It is necessary to check the detailed wording of acronyms such as RMSE in the Abstract section.
Reviewer 4 Report
The manuscript proposes to utilize NN models to predict elevations in the test areas. Overall, I don't think the manuscript is fit for the journal because:
1. The NN model is one of the most basic models in the AI domain. The manuscript describes a lot of basic NN knowledge, with which most readers would already be very familiar.
2. The manuscript mainly focuses on applications and doesn't present any new improvements for NN models, which is more suitable for other journals, such as Remote Sensing.