A Survey on Visual Mamba
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
While acknowledging the merit of this comprehensive review of Mamba applications in the field of computer vision, I found the paper lacking in objective comparisons between Mamba and other available technologies, and in many cases even between different implementations of Mamba. I suggest including such information in order to make the review more useful and complete.
Comments on the Quality of English Language
I also suggest a careful revision of the text. Below are my notes:
Fig.1 caption: insert a space between “published” and “(“
Line 59: The sentence seems to be not finished. Termed mamba born?
Line 67: outperformed → outperforming
Line 77 and following: This survey paper is the first → remove this survey is
By expanding upon the Naive-based Mamba visual framework, we have investigated how Mamba’s capabilities can be enhanced and combined with other architectures in order to achieve superior performance. → The investigation on how Mamba’s capabilities can be enhanced and combined with other architectures in order to achieve superior performance, by expanding upon the Naive-based Mamba visual framework.
We offer an in-depth → remove we offer
Line 96: please define what sigma symbol stands for.
Figure 2 caption: expand it → Graphical representation of Mamba Block (or something similar)
Line 107: insert a space between “Hold” and “(ZOH)”
Line 110: (1) → Eq.(1)
Line 111: (3) → Eq.(3)
Line 116: SSM usually serveS or SSMs usually serve
Line 121: remove comma after Hyena [21]
Line 125: two SSMs
Line 132: what does “before(2,3)” refer to? Selective scan before Eqs. 2 and 3? Please clarify this, and also consider splitting the sentence in lines 129-134 because it is not very clear.
Line 170: VMamaba → VMamba.
Line 171: please define ViM and VSS acronyms.
Figure 3 caption: expand caption
Line 187: (a) Fig. 4. → fig.4(a). Be consistent with what is already used before. Also change all following occurrences (L 196, L199)
L189: Eqs. (2) and (3). Also change L194
Fig4 caption: put letters in caption referring to images
L215 mamba should start with capital letter also in L319
L226: vim should be capitalized
L257: As a part of multi-dimensional data, existing models for multi-dimensional data → As a part of multi-dimensional data can be dropped.
L279-280: (3) → Eq.(3) and (7) → Eq.(7)
Eq 7: Define the operation represented by a circle with a dot inside in third equation and also define z(t)
L284: Table. 1. → Table 1. Remove the full stop between Table and 1
L599-607: Few places where a new sentence starts without a blank space after previous sentence
Author Response
While acknowledging the merit of this comprehensive review of mamba applications to the field of computer vision I found the paper lacking in objective comparisons between mamba and other available technologies or in many cases even between different implementations of mamba. I suggest including such information in order to make the review more useful and complete.
Thank you very much for your valuable comments and suggestions on our paper. We have considered your feedback and made several revisions to enhance the paper's usefulness and completeness. To address your concerns, we have made the following changes:
Diagrams for Better Understanding: We have added a diagram in Section 2.1.1 (Figure 3. Graphical representation of discretized SSM) in the revised version to help readers better understand the mathematical formulations.
Detailed Comparative Analysis: We have included a more detailed comparative analysis with specific metrics and benchmarks. In the revised paper, we have added Section 3.4, which presents various tables of comparative studies between the Mamba model and other advanced models, including transformers and CNNs. This section includes multiple standardized benchmark results on the same dataset and uses common performance metrics for a detailed comparison. Through these quantitative comparisons, we demonstrate the advantages of the Mamba model over traditional models across various visual tasks.
Challenges and Limitations: We highlight the challenges and limitations associated with Mamba models in Section 5.1. Specifically, we discuss issues such as scalability, stability, and the difficulties related to causality and sequential data. By including these points, we aim to provide a balanced perspective on the practical applications of Mamba models, highlighting potential issues such as scalability, integration with existing systems, and computing resource requirements.
Future Directions: We have added Section 5.2, "Future Directions," to our paper, where we outline potential areas for further research. This section highlights innovative scanning mechanisms and other gaps identified in the current literature. We aim to guide future research efforts and encourage further exploration in developing and applying Mamba models.
We believe these additions provide a more objective comparison and enhance the overall utility of the review.
- Fig.1 caption: insert a space between “published” and “(“
- Line 59: The sentence seems to be not finished. Termed mamba born?
- Line 67: outperformed → outperforming
- Line 77 and following: This survey paper is the first → remove this survey is
- By expanding upon the Naive-based Mamba visual framework, we have investigated how Mamba’s capabilities can be enhanced and combined with other architectures in order to achieve superior performance. → The investigation on how Mamba’s capabilities can be enhanced and combined with other architectures in order to achieve superior performance, by expanding upon the Naive-based Mamba visual framework.
- We offer an in-depth → remove we offer
- Line 96: please define what sigma symbol stands for.
- Figure 2 caption: expand it → Graphical representation of Mamba Block (or something similar)
- Line 107: insert a space between “Hold” and “(ZOH)”
- Line 110: (1) → Eq.(1)
- Line 111: (3) → Eq.(3)
- Line 116: SSM usually serveS or SSMs usually serve
- Line 121: remove comma after Hyena [21]
- Line 125: two SSMs
- Line 132: what does “before(2,3)” refer to? Selective scan before Eqs. 2 and 3? Please clarify this, and also consider splitting the sentence in lines 129-134 because it is not very clear.
- Line 172: VMamaba → VMamba.
- Line 172: please define ViM and vss acronyms.
- Figure 3 caption: expand caption
- Line 190: (a) Fig. 4. → fig.4(a). Be consistent with what is already used before. Also change all following occurrences (L 196, L199)
- L189: Eqs.(2) and (3). Also change to L194
- Fig4 caption: put letters in caption referring to images
- L215 mamba should start with capital letter also in L319
- L226: vim should be capitalized
- L257: As a part of multi-dimensional data, existing models for multi-dimensional data → As a part of multi-dimensional data can be dropped.
- L279-280: (3) → Eq.(3) and (7) → Eq.(7)
- Eq7: Define the operation represented by a circle with a dot inside in third equation and also define z(t)
- L285: Table. 1. → Table 1. Remove the full stop between Table and 1
- L599-607: Few places where a new sentence starts without a blank space after previous sentence
Reviewer 2 Report
Comments and Suggestions for Authors
This article is the first comprehensive survey on the use of Mamba model in computer vision. It covers both 2D and 3D image processing, discussing the integration of Mamba techniques with other algorithms. The content is both interesting and important, paving the way for new advancements in image processing using the Mamba model. The article is well-described and well-organized.
Author Response
This article is the first comprehensive survey on the use of Mamba model in computer vision. It covers both 2D and 3D image processing, discussing the integration of Mamba techniques with other algorithms. The content is both interesting and important, paving the way for new advancements in image processing using the Mamba model. The article is well-described and well-organized.
Thank you for your high praise and valuable feedback on our article. We deeply appreciate your recognition of the article's clarity and organization. This will inspire us to continue our in-depth research in this field and share our findings.
Reviewer 3 Report
Comments and Suggestions for Authors
The document is a comprehensive survey of Mamba models applied to computer vision tasks. It details the foundational concepts, architectures, and applications of Mamba models. The paper highlights the challenges in existing models, particularly focusing on the computational demands of Transformers and the advantages of Mamba's state space model (SSM) with selective mechanisms. It includes sections on the mathematical formulation of Mamba, integration with other technologies, and specific applications in vision tasks.
1. It is recommended that the authors incorporate diagrams or other techniques that allow greater understanding in the mathematical section, since the mathematical formulations and detailed descriptions of the models can be dense.
In addition, it is felt that although the study highlights the advantages of Mamba over traditional models such as Transformers, a more detailed comparative analysis with specific metrics and benchmarks would strengthen the argument for Mamba's superiority.
2. The paper should include a more thorough analysis of the challenges and limitations associated with Mamba models, especially in real-world applications. Addressing potential issues such as scalability, integration with existing systems and computing resource requirements would provide a more balanced perspective.
3. Although the aim of this article is to raise further interest in Mamba models, it would be useful to devote a specific section to future lines of research and gaps identified in the current literature. This would guide researchers on where to focus their efforts in the future.
4. Add a section presenting comparative studies with quantitative data comparing Mamba models to other state-of-the-art models. Use standardized benchmarks to clearly illustrate performance differences.
Author Response
- It is recommended that the authors incorporate diagrams or other techniques that allow greater understanding in the mathematical section, since the mathematical formulations and detailed descriptions of the models can be dense.
- The paper should include a more thorough analysis of the challenges and limitations associated with Mamba models, especially in real-world applications. Addressing potential issues such as scalability, integration with existing systems and computing resource requirements would provide a more balanced perspective.
- Although the aim of this article is to raise further interest in Mamba models, it would be useful to devote a specific section to future lines of research and gaps identified in the current literature. This would guide researchers on where to focus their efforts in the future.
- Add a section presenting comparative studies with quantitative data comparing Mamba models to other state-of-the-art models. Use standardized benchmarks to clearly illustrate performance differences.