Peer-Review Record

Boundary-Aware Deformable Spiking Neural Network for Hyperspectral Image Classification

Remote Sens. 2023, 15(20), 5020; https://doi.org/10.3390/rs15205020
by Shuo Wang 1, Yuanxi Peng 1, Lei Wang 2 and Teng Li 3,4,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 12 September 2023 / Revised: 13 October 2023 / Accepted: 16 October 2023 / Published: 19 October 2023
(This article belongs to the Section Remote Sensing Image Processing)

Round 1

Reviewer 1 Report

Introduction

The introduction could also cover transformer-based methods.

The authors should consider expanding on the limitations of existing methods for HSI classification, particularly the Hughes phenomenon and the inability to distinguish boundaries.

Method

It is necessary to present the technical details of the temporal-channel joint attention mechanism more thoroughly, for example, how the mechanism is represented mathematically.
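For illustration, a generic temporal-channel joint attention could take a squeeze-and-excitation-like form applied jointly over timesteps and channels; this is a sketch of one possible formulation, not necessarily the authors' exact design:

\[
\alpha_{t,c} = \sigma\big( W_2\, \delta( W_1\, \mathrm{GAP}(X_{t,c}) ) \big), \qquad \tilde{X}_{t,c} = \alpha_{t,c}\, X_{t,c},
\]

where \(X_{t,c}\) is the feature map at timestep \(t\) and channel \(c\), \(\mathrm{GAP}\) is global average pooling over the spatial dimensions, \(W_1\) and \(W_2\) are learnable projections, \(\delta\) is a ReLU, and \(\sigma\) is a sigmoid producing a joint weight over \((t, c)\). Spelling out the actual formulation at this level of detail would make the mechanism reproducible.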

The paper introduces a temporal-channel joint attention mechanism, which shares its motivation with the paper titled "MSLAN: A Two-Branch Multidirectional Spectral-Spatial LSTM Attention Network for Hyperspectral Image Classification" (TGRS, 2022). The authors should compare the two approaches and discuss their differences in more detail.

Experiments

Use more recent and advanced baseline methods for comparison, not just CNN and SNN models.

The computational efficiency and energy consumption of BDSNN should be analyzed more clearly and compared against the other methods.

Conclusion

Discuss the implications of the results and their potential impact on the field.

Discuss limitations of the current method and future research directions.

Author Response

Thank you very much for taking the time to review this manuscript. The detailed responses can be seen in the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

In this manuscript, the authors report "Boundary-Aware Deformable Spiking Neural Network for Hyperspectral Image Classification". In view of the existing problems in hyperspectral image (HSI) classification, a boundary-aware deformable spiking residual neural network (BDSNN) method is proposed for HSI classification, and a classification model based on deformable convolution and a residual neural network is established. The classification accuracy of this model is comparable to that of existing methods in terms of overall accuracy (OA), average accuracy (AA), and the kappa (κ) coefficient. This study provides a theoretical basis for HSI classification.

The technical terms of this manuscript are standardized. 

Some grammar and expressions require minor changes.

Author Response

Thank you very much for taking the time to review this manuscript. The detailed responses can be seen in the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

First, let me state that my expertise is more in hyperspectral sensor hardware and phenomenology and less in neural networks.

As I understand it, one of the proposed advantages of using spiking neural networks is integration with neuromorphic computing: by only processing when a spike is received, you get much less computational overhead. The authors discuss this in their Section 3.5 but don't really expand on it. I would have liked to hear more, perhaps predictions of how the processing times are expected to improve if this were integrated into a neuromorphic processing system. As it is, the extra overhead of training and testing is significant when using a synchronous computing system. If the authors' BDSNN is employed in the real world, there could be a significant slowdown in throughput using this technique. That fact is presented but not really dealt with, and I think this is a critical question for the implementation of any SNN on data-intensive classification tasks such as the HSI classification investigated here.
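To make the point concrete, here is a minimal sketch of an event-driven leaky integrate-and-fire (LIF) update (my own illustration under assumed dynamics, not the authors' implementation). On neuromorphic hardware, the accumulation step runs only for inputs that actually spiked, which is where the savings come from:

```python
import numpy as np

def lif_step(v, spikes_in, w, tau=2.0, v_th=1.0):
    """One leaky integrate-and-fire timestep (hypothetical dynamics).

    Only presynaptic neurons that spiked contribute to the update;
    on event-driven hardware the silent inputs cost nothing.
    """
    active = np.flatnonzero(spikes_in)          # indices of input spikes
    current = w[:, active].sum(axis=1)          # accumulate active columns only
    v = v / tau + current                       # leak, then integrate
    spikes_out = (v >= v_th).astype(float)      # fire where threshold crossed
    v = v * (1.0 - spikes_out)                  # hard reset after a spike
    return v, spikes_out
```

On a synchronous CPU or GPU, this loop still executes at every timestep regardless of spike sparsity, which is exactly the throughput penalty noted above.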

I understand what their algorithm is doing at a very high level based on their description, but I can't say that I tracked the implementation completely. The authors introduce a "temporal-channel" filter. Presumably this is a filter that operates over the internal timesteps of the network as it progresses, rather than a filter that uses successive temporal inputs to gain confidence. That distinction wasn't clear, and terms like "temporal" mean different things to different readers.

The authors jump around with how they define the term "pixels". In Section 3.1, "Date Sets" (should this be "Data Sets"?), they use it to refer both to the total number of individual elements (line 230, for example) and to target instances. I think their math is off here anyway, as it should be 207400 and not 2207400. Regardless, they use the term "pixels" for different things, and that whole section is confusing.
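For reference, if the scene in question is the Pavia University image (an assumption on my part; that scene is 610 × 340 pixels), the arithmetic is simply

\[ 610 \times 340 = 207400, \]

which matches the smaller figure.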

Also, I am curious how the training pixels are chosen and whether the choice is consistent among the various neural nets under consideration. Did the same 1% of pixels get chosen each time, or just another random 1%? Did the authors consider the percentage of training pixels as an input variable? From my understanding, the methods for training an SNN are quite different from those for a standard neural network. Would this not imply that the training data volume and methods are a hidden variable in your experiment?
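One way to remove that hidden variable (a sketch of my own, not the authors' procedure) is to draw the same per-class fraction with an explicit seed, so that every model under comparison sees the identical split and repetitions vary the seed deliberately:

```python
import numpy as np

def stratified_split(labels, frac=0.01, seed=0):
    """Pick `frac` of labeled pixels per class, reproducibly.

    `labels` is a 1-D array of class ids with 0 meaning unlabeled.
    Fixing `seed` gives every model the identical training set.
    """
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in np.unique(labels[labels > 0]):     # skip unlabeled pixels
        idx = np.flatnonzero(labels == c)
        n = max(1, int(round(frac * idx.size)))
        train_idx.append(rng.choice(idx, size=n, replace=False))
    return np.concatenate(train_idx)
```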

Do the different patch sizes (Section 3.2) have any impact on the computational time of the algorithm?

In all of your data tables, you claim some level of error, which is great. How do you arrive at these error estimates? Are they just the standard deviation of the results, or have you done something different? I ask because you rarely see HSI classification tasks with errors as small as 0.001%, such as in line 296; the inherent shot noise in the data is likely more than that. One observation on the data, though: a lot of "easy" tasks are chosen. Most of the results are at 95% or greater classification accuracy, with some between 99% and 100%. How does the algorithm do if the classification problem is harder? Does it outperform other algorithms if the average score is 50%?
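If the quoted errors are run-to-run variability, the conventional report would be the mean and sample standard deviation over repeated runs with different random splits; a minimal sketch with placeholder numbers (my own illustration, not the authors' protocol):

```python
import numpy as np

def summarize_runs(oa_scores):
    """Mean and sample standard deviation over repeated runs."""
    oa = np.asarray(oa_scores, dtype=float)
    return oa.mean(), oa.std(ddof=1)            # ddof=1: sample std

# e.g. overall accuracy from ten runs with different random splits
mean_oa, std_oa = summarize_runs([0.991, 0.993, 0.992, 0.990, 0.994,
                                  0.992, 0.991, 0.993, 0.992, 0.991])
print(f"OA = {mean_oa:.4f} +/- {std_oa:.4f}")
```

Stating explicitly which of these (or something else) produced the quoted ± values would resolve the question.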

Figures 13 and 14 are very, very dark. Perhaps you could invert the color mapping of the results so that the areas that are not classified appear white instead of black. As they are, the figures are almost unreadable.
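A small matplotlib sketch of what I mean (variable names are hypothetical; `cls_map` is assumed to hold predicted class ids, with 0 for unclassified pixels). Mapping class 0 to white keeps the background from dominating the figure:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

cls_map = np.random.randint(0, 10, size=(145, 145))   # placeholder prediction map

class_colors = plt.cm.tab10(np.linspace(0, 1, 9))     # one color per class 1..9
cmap = ListedColormap(np.vstack([[1, 1, 1, 1], class_colors]))  # class 0 -> white

plt.imshow(cls_map, cmap=cmap, interpolation="nearest")
plt.axis("off")
plt.show()
```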

Overall, an interesting paper; I learned some things from it. I like how the authors examined a number of data sets rather than focusing on a single one. Their tables are clear and their results are reasonable. The paper could be written for a wider audience if the authors considered an actual implementation of this technique in a real-world remote sensing application, perhaps on problems that are not already near 100%. Improving from 99% to 99.5% by switching to a new algorithm often isn't as useful as going from 40% to 60%...

The authors do a good job with their English. It is readable, and the places where the English is not perfect do not impair readability. However, there are a number of instances where case and number agreement are not correct (line 35, where "pixel" should be "pixels", for example).

Author Response

Thank you very much for taking the time to review this manuscript. The detailed responses can be seen in the attachment.

Author Response File: Author Response.pdf
