Review
Peer-Review Record

A Survey on Visual Mamba

Appl. Sci. 2024, 14(13), 5683; https://doi.org/10.3390/app14135683
by Hanwei Zhang 1,2,3, Ying Zhu 4, Dan Wang 4, Lijun Zhang 1, Tianxiang Chen 5, Ziyang Wang 6 and Zi Ye 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 22 May 2024 / Revised: 17 June 2024 / Accepted: 27 June 2024 / Published: 28 June 2024
(This article belongs to the Special Issue Application of Artificial Intelligence in Visual Processing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

While acknowledging the merit of this comprehensive review of Mamba applications to the field of computer vision, I found the paper lacking in objective comparisons between Mamba and other available technologies or, in many cases, even between different implementations of Mamba. I suggest including such information in order to make the review more useful and complete.

Comments on the Quality of English Language

I also suggest a careful revision of the text. Below are my notes:

Fig.1 caption: insert a space between “published” and “(“

Line 59: The sentence seems to be not finished. Termed mamba born?

Line 67: outperformed → outperforming

Line 77 and following: This survey paper is the first → remove this survey is

By expanding upon the Naive-based Mamba visual framework, we have investigated how Mamba’s capabilities can be enhanced and combined with other architectures in order to achieve superior performance. → The investigation on how Mamba’s capabilities can be enhanced and combined with other architectures in order to achieve superior performance, by expanding upon the Naive-based Mamba visual framework.

We offer an in-depth → remove we offer

Line 96: please define what sigma symbol stands for.

Figure 2 caption: expand it → Graphical representation of Mamba Block (or something similar)

Line 107: insert a space between “Hold” and “(ZOH)”

Line 110: (1) → Eq.(1)

Line 111: (3) → Eq.(3)

Line 116: SSM usually serveS or SSMs usually serve

Line 121: remove comma after Hyena [21]

Line 125: two SSMs

Line 132: what does before(2,3) refer to? Selective scan before Eqs. (2) and (3)? Please clarify this and also consider splitting the sentence in LL 129-134 because it is not very clear.

Line 170: VMamaba → VMamba.

Line 171: please define ViM and VSS acronyms.

Figure 3 caption: expand caption

Line 187: (a) Fig. 4. → fig.4(a). Be consistent with what is already used before. Also change all following occurrences (L 196, L199)

L189: Eqs. (2) and (3). Also change L194

Fig4 caption: put letters in caption referring to images

L215 mamba should start with capital letter also in L319

L226: vim should be capitalized

L257: As a part of multi-dimensional data, existing models for multi-dimensional data → As a part of multi-dimensional data can be dropped.

L279-280: (3) → Eq.(3) and (7) → Eq.(7)

Eq 7: Define the operation represented by a circle with a dot inside in third equation and also define z(t)

L284: Table. 1. → Table 1. Remove the full stop between Table and 1

L599-607: Few places where a new sentence starts without a blank space after previous sentence

Author Response

While acknowledging the merit of this comprehensive review of Mamba applications to the field of computer vision, I found the paper lacking in objective comparisons between Mamba and other available technologies or, in many cases, even between different implementations of Mamba. I suggest including such information in order to make the review more useful and complete.
Thank you very much for your valuable comments and suggestions on our paper. We have considered your feedback and made several revisions to enhance the paper's usefulness and completeness. To address your concerns, we have made the following changes:
Diagrams for Better Understanding: We have added a diagram in Section 2.1.1 (Figure 3. Graphical representation of discretized SSM) in the revised version to help readers better understand the mathematical formulations.
Detailed Comparative Analysis: We have included a more detailed comparative analysis with specific metrics and benchmarks. In the revised paper, we have added Section 3.4, which presents various tables of comparative studies between the Mamba model and other advanced models, including transformers and CNNs. This section includes multiple standardized benchmark results on the same dataset and uses common performance metrics for a detailed comparison. Through these quantitative comparisons, we demonstrate the advantages of the Mamba model over traditional models across various visual tasks.
Challenges and Limitations: We highlight the challenges and limitations associated with Mamba models in Section 5.1. Specifically, we discuss issues such as scalability, stability, and the difficulties related to causality and sequential data. By including these points, we aim to provide a balanced perspective on the practical applications of Mamba models, highlighting potential issues such as scalability, integration with existing systems, and computing resource requirements.
Future Directions: We have added Section 5.2, "Future Directions," to our paper, where we outline potential areas for further research. This section highlights innovative scanning mechanisms and other gaps identified in the current literature. We aim to guide future research efforts and encourage further exploration in developing and applying Mamba models.
We believe these additions provide a more objective comparison and enhance the overall utility of the review.
  1. Fig.1 caption: insert a space between “published” and “(“
Thank you for your valuable feedback. We have inserted a space between "published" and "(" in the caption of Fig.1. Thank you for pointing out this issue.  
  1. Line 59: The sentence seems to be not finished. Termed mamba born?
Thank you for your feedback. I have revised the sentence to ensure completeness and clarity.  
  1. Line 67: outperformed → outperforming
Thank you for your valuable feedback. Based on your suggestion, I have revised the sentence by changing "outperformed" to "outperforming" to ensure consistency in structure and tense.  
  1. Line 77 and following: This survey paper is the first → remove this survey is
Thank you for your valuable feedback. We have carefully considered your comments and made the necessary revisions to improve our manuscript.  
  1. By expanding upon the Naive-based Mamba visual framework, we have investigated how Mamba’s capabilities can be enhanced and combined with other architectures in order to achieve superior performance. → The investigation on how Mamba’s capabilities can be enhanced and combined with other architectures in order to achieve superior performance, by expanding upon the Naive-based Mamba visual framework.
Thank you for your valuable feedback on our paper. We have adopted this wording to more clearly express the focus and method of our research.  
  1. We offer an in-depth → remove we offer
Thank you for your valuable feedback. We have made the suggested change to improve the clarity and conciseness of our manuscript.  
  1. Line 96: please define what sigma symbol stands for.
Thank you for your valuable feedback. We have included a clear definition of this symbol in the revised manuscript to ensure clarity for the readers.  
  1. Figure 2 caption: expand it → Graphical representation of Mamba Block (or something similar)
Thank you for your valuable feedback. We have expanded the caption of Figure 2 to provide a more precise description.  
  1. Line 107: insert a space between “Hold” and “(ZOH)”
Thank you for your thorough review and valuable feedback. We have addressed the issue you pointed out regarding the need to insert a space between "Hold" and "(ZOH)" in line 107.  
  1. Line 110: (1) → Eq.(1)
Thank you for pointing this out. We have revised the text accordingly to refer to the equation correctly.  
  1. Line 111: (3) → Eq.(3)
Thank you for your valuable feedback. We have revised the text accordingly to refer to the equation correctly.  
  1. Line 116: SSM usually serveS or SSMs usually serve
Thank you for pointing out the grammatical issue in Line 116. We appreciate your careful review. We have corrected the sentence to ensure grammatical accuracy.  
  1. Line 121: remove comma after Hyena [21]
Thank you for your thorough review and valuable feedback. We have carefully addressed the issue you mentioned regarding the comma after "Hyena [21]" in line 121.  
  1. Line 125: two SSMs
We appreciate your careful review of our manuscript. Regarding the issue you pointed out in Line 125, we have made the necessary correction in the manuscript.  
  1. Line 132: what does before(2,3) refer to? Selective scan before Eqs. (2) and (3)? Please clarify this and also consider splitting the sentence in LL 129-134 because it is not very clear.
Thank you for pointing this out. We have added the clear meaning of "before(2,3)" and divided the long unclear sentences.  
  1. Line 172: VMamaba → VMamba.
Thank you for your valuable input. We have changed the name of VMamba correctly.  
  1. Line 172: please define ViM and vss acronyms.
Thanks to your feedback, we have clarified the definitions of the acronyms ViM and vss.  
  1. Figure 3 caption: expand caption
Thank you for your valuable input. We have revised the text correctly.  
  1. Line 190: (a) Fig. 4. → fig.4(a). Be consistent with what is already used before. Also change all following occurrences (L 196, L199)
Thank you for pointing this out. We have revised the text correctly.  
  1. L189: Eqs.(2) and (3). Also change to L194
Thank you for your valuable suggestions. We have expanded the title of Figure 3.  
  1. Fig4 caption: put letters in caption referring to images
Thank you for your valuable suggestions. To make the image captions clearer, we have labeled the caption with the letters.  
  1. L215 mamba should start with capital letter also in L319
Thank you for pointing this out. We have revised the text correctly.  
  1. L226: vim should be capitalized
Thank you for pointing this out. We have revised the text correctly.  
  1. L257: As a part of multi-dimensional data, existing models for multi-dimensional data → As a part of multi-dimensional data can be dropped.
Thank you for your valuable feedback. We have made the suggested change to improve the clarity and conciseness of our manuscript.  
  1. L279-280: (3) → Eq.(3) and (7) → Eq.(7)
Thank you for pointing this out. We have revised the text accordingly to refer to the equation correctly.  
  1. Eq7: Define the operation represented by a circle with a dot inside in third equation and also define z(t)
Thank you for your valuable suggestions. We have added an explicit definition of the operation with points in a circle in the third equality and of z(t).  
  1. L285: Table. 1. → Table 1. Remove the full stop between Table and 1
Thank you for your valuable feedback. We have revised the text accordingly to refer to the table correctly.  
  1. L599-607: Few places where a new sentence starts without a blank space after previous sentence
Thank you for your thorough review and valuable feedback. We've added a space before the start of the new sentence.

Reviewer 2 Report

Comments and Suggestions for Authors

This article is the first comprehensive survey on the use of Mamba model in computer vision. It covers both 2D and 3D image processing, discussing the integration of Mamba techniques with other algorithms. The content is both interesting and important, paving the way for new advancements in image processing using the Mamba model. The article is well-described and well-organized.

Author Response

This article is the first comprehensive survey on the use of Mamba model in computer vision. It covers both 2D and 3D image processing, discussing the integration of Mamba techniques with other algorithms. The content is both interesting and important, paving the way for new advancements in image processing using the Mamba model. The article is well-described and well-organized.
Thank you for your high praise and valuable feedback on our article. We deeply appreciate your recognition of the article's clarity and organization. This will inspire us to continue our in-depth research in this field and share our findings.

Reviewer 3 Report

Comments and Suggestions for Authors

The document is a comprehensive survey of Mamba models applied to computer vision tasks. It details the foundational concepts, architectures, and applications of Mamba models. The paper highlights the challenges in existing models, particularly focusing on the computational demands of Transformers and the advantages of Mamba's state space model (SSM) with selective mechanisms. It includes sections on the mathematical formulation of Mamba, integration with other technologies, and specific applications in vision tasks.

 

1. It is recommended that the authors incorporate diagrams or other techniques that allow greater understanding in the mathematical section, since the mathematical formulations and detailed descriptions of the models can be dense.

In addition, it is felt that although the study highlights the advantages of Mamba over traditional models such as Transformers, a more detailed comparative analysis with specific metrics and benchmarks would strengthen the argument for Mamba's superiority.

 

 

2. The paper should include a more thorough analysis of the challenges and limitations associated with Mamba models, especially in real-world applications. Addressing potential issues such as scalability, integration with existing systems and computing resource requirements would provide a more balanced perspective.

 

3. Although the aim of this article is to raise further interest in Mamba models, it would be useful to devote a specific section to future lines of research and gaps identified in the current literature. This would guide researchers on where to focus their efforts in the future.

 

4. Add a section presenting comparative studies with quantitative data comparing Mamba models to other state-of-the-art models. Use standardized benchmarks to clearly illustrate performance differences.

 

Author Response

  1. It is recommended that the authors incorporate diagrams or other techniques that allow greater understanding in the mathematical section, since the mathematical formulations and detailed descriptions of the models can be dense.
In addition, it is felt that although the study highlights the advantages of Mamba over traditional models such as Transformers, a more detailed comparative analysis with specific metrics and benchmarks would strengthen the argument for Mamba's superiority.
Thank you very much for your valuable comments and suggestions on our paper. We have added a diagram in Section 2.1.1 (Figure 3. Graphical representation of discretized SSM) to help readers better understand the mathematical formulas. We have included a detailed analysis and specific metric comparisons to compare the Mamba and traditional models. In the revised paper, we have added multiple benchmark results on the same dataset and used common performance metrics for comparison. Please see Section 3.4. We compare the Mamba model with various transformer and CNN models across multiple well-known public datasets. Through this detailed comparison, we further demonstrate the superiority of the Mamba model over traditional models across various visual tasks.
  1. The paper should include a more thorough analysis of the challenges and limitations associated with Mamba models, especially in real-world applications. Addressing potential issues such as scalability, integration with existing systems and computing resource requirements would provide a more balanced perspective.
  Thank you for your valuable feedback. Section 5.1 of our paper has addressed the challenges and limitations associated with Mamba models. Specifically, we have discussed issues such as scalability and stability and the difficulties related to causality and sequential data. By including these points, we aim to provide a balanced perspective on the practical applications of Mamba models, highlighting potential issues such as scalability, integration with existing systems, and computing resource requirements.  
  1. Although the aim of this article is to raise further interest in Mamba models, it would be useful to devote a specific section to future lines of research and gaps identified in the current literature. This would guide researchers on where to focus their efforts in the future.
  Thank you for your insightful feedback. We have added Section 5.2, "Future Directions," to our paper, where we outline potential areas for further research. This section highlights innovative scanning mechanisms and other gaps identified in the current literature. We aim to guide future research efforts and encourage further exploration in developing and applying Mamba models.  
  1. Add a section presenting comparative studies with quantitative data comparing Mamba models to other state-of-the-art models. Use standardized benchmarks to clearly illustrate performance differences.
Thank you for your valuable comments on our paper. In response to your suggestion to add a section with quantitative comparisons between the Mamba model and other state-of-the-art models, we have made the following changes: We have added a new Section 3.4 in the revised paper, specifically presenting various tables of comparative studies between the Mamba model and other advanced models. This section includes multiple standardized benchmark results to illustrate performance differences clearly. We used several common performance metrics for a detailed comparison across different tasks. Through these quantitative comparisons, we further demonstrate the advantages of the Mamba model across various metrics.