Develop a Lightweight Convolutional Neural Network to Recognize Palms Using 3D Point Clouds
Round 1
Reviewer 1 Report
The authors have proposed a novel neural network architecture for palm recognition. The system is slightly less accurate than other comparable networks (as per the authors' own comparison), but it shows a tremendous reduction in the number of FLOPs, which is of significant importance.
Author Response
We are delighted that you recognize the contributions we have made, and your valuable feedback has greatly assisted our research.
Reviewer 2 Report
The Multi-view Projection (MVP) algorithm is proposed to project 3D point clouds onto 2D images from several different views. Next, TMBNet is employed, which combines advanced feature fusion and extraction methods. The processing pipeline is tested for palm recognition.
Overall the paper reads nicely, but it has three major weaknesses:
1) Better formatting is needed so that figures and tables appear close to the point where they are first cross-referenced. Symbols need to be defined in Eq. (1). A better description is needed for Table 1. Define the term "accuracy".
2) Describe the experimental protocol. Have you employed leave-one-person-out? The description should enable reproduction of the results. This is not the case here, although several technical details are included. I would strongly suggest releasing code upon paper acceptance.
3) With such a small dataset, the performance differences might not be statistically significant. Please address this matter with detailed and precise arguments.
Author Response
We sincerely appreciate the reviewer's accurate reading of our paper and valuable feedback. Your expertise and suggestions have been invaluable to our research.
1) We have reorganized the formatting, including the positioning and size of figures and tables. We now explicitly clarify what "Accuracy" represents in the experimental description.
2) We have rewritten the experimental section and incorporated new experiments to provide more robust evidence. The code will be publicly released upon acceptance.
3) Indeed, because this is a small dataset focused on palms, we conducted a new experiment. In this experiment, we reassign the training and testing sets in a 4:1 ratio and repeat the process for ten runs. We have presented the results of this experiment in a new table to strengthen the evidence for our conclusions.
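The repeated 4:1 protocol described in item 3 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the split is drawn by shuffling sample indices at random, which the response does not state, and it does not group samples by person (the leave-one-person-out question raised by the reviewer). The function name `repeated_holdout_splits`, the seed, and the sample count are hypothetical.

```python
import random


def repeated_holdout_splits(n_samples, n_runs=10, train_fraction=0.8, seed=0):
    """Yield (train_indices, test_indices) for each run of a repeated 4:1 split.

    Assumption: samples are shuffled at random and split at the sample level.
    """
    rng = random.Random(seed)  # fixed seed so the runs can be reproduced
    n_train = round(n_samples * train_fraction)
    for _ in range(n_runs):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[:n_train], indices[n_train:]


# Hypothetical dataset size, for illustration only.
splits = list(repeated_holdout_splits(n_samples=100))
for run, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"run {run}: {len(train_idx)} train / {len(test_idx)} test")
```

Each run would then train on `train_idx` and report accuracy on `test_idx`. Fixing the seed is one way to make the ten runs reproducible by others.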
Reviewer 3 Report
This article discusses various aspects of image recognition and deep machine learning technologies in the context of solving problems related to the automatic conversion of 2D hand palm images into 3D representations. This topic is undeniably relevant, especially in terms of its potential as an alternative biometric approach. The article effectively highlights the current applicability of automated hand identification systems in everyday life. However, the development of such systems relies heavily on the availability of high quality 3D images in the early stages, as the effectiveness of new neural network methods depends on the quality of the visual data. While the article is generally well written and the authors' material is clear, it is not without certain shortcomings that warrant attention.
Shortcomings:
1) A notable issue is the lack of comparative analysis between the authors' results and those of other works that use the same dataset in their experiments. This absence raises questions about the lack of contemporary work or the omission of such comparisons.
2) The article lacks a comprehensive description of modern and, in particular, semi-polar datasets relevant to the task at hand. It also fails to provide examples of the best work using these datasets. This deficiency raises doubts about the availability of other modern datasets or the exclusion of noteworthy work.
3) The references to previous work within the global scientific community (from 2020 to 2023) should be expanded, in particular those presented at highly regarded conferences focusing on video analysis (such as CVPR, ICCV, ECCV, FG, ICASSP, SPECOM) or corpus collection (including LREC), as well as top-tier journals (such as Neurocomputing). Enriching the article with references to these sources would enhance its academic credibility.
4) Where possible, the article should include drawings in vector format, using tools such as draw.io or similar software. This would improve the clarity and quality of the visuals.
5) Any abbreviations used should be fully explained when they are first introduced. In addition, a comprehensive list of abbreviations should be included at the end of the article for easy reference.
6) The conclusion should include a discussion of potential avenues for future research, outlining the directions in which the field could progress and suggesting further investigations.
7) The style of the article needs to be revised to remove spelling and punctuation errors. Although the article is generally easy to read, these errors detract from its overall quality.
In its current state, the article raises certain questions and would benefit from expansion and refinement to achieve a more comprehensive and satisfactory presentation.
Moderate editing of English language required.
Author Response
We sincerely appreciate the reviewer's meticulous reading of our paper and valuable feedback. Your expertise and suggestions have been invaluable to our research.
1-2) To our understanding, the PolyU dataset is not particularly obscure, but it lacks a consistent metric, which makes it challenging to compare different methods directly. Although numerous studies have been conducted on it over the past decade, cross-study comparisons have been difficult to perform. However, based on your feedback, we experimented with PointNet++, an improved version of PointNet, and observed slight improvements over the original PointNet. We believe this new experimental result supports the notion that relying solely on model improvements has a limited impact on recognition performance. Additionally, because the PolyU dataset is a small, palm-only dataset, we emphasize the importance of enhancing data representation capability rather than focusing solely on model enhancement. As a result, we have adjusted the content to highlight our proposed MVP method, which enhances recognition capability by projecting 3D palms onto 2D palm images in a way that mimics human observation behavior. Finally, we have combined standard classical 2D networks and our lightweight model with MVP to strengthen our conclusion.
3) We have reorganized the related work section. We first review some methods for palm recognition, then discuss the 3D CNN models for point cloud processing, introduce several classical 2D CNN models, and present some past projection methods to introduce the main focus of our paper. We have added ten new references, including top conferences such as CVPR and ECCV.
4-6) We have attempted to reorganize the wording and structure of the article as much as possible while striving to maintain high-quality images. We have also further revised the conclusion section to emphasize the role of our proposed method in palm recognition. We also provide a comprehensive list of abbreviations, as you suggested.
Round 2
Reviewer 2 Report
The authors falsely claim that they have improved the appearance of Tables. Table 1 is inserted far from the place it is cross-referenced.
The definition of accuracy appears too late.
What is important is to measure the standard deviation of accuracy.
Author Response
Once again, we sincerely appreciate your valuable feedback and suggestions, which have contributed significantly to improving the clarity and accuracy of our paper.
1. Thank you for your corrections and suggestions. We acknowledge that in Section 3.2.3, we mistakenly referenced Table 1 as Figure 1, leading to confusion. We have now rectified this error and ensured that all tables and figures are positioned closer to their first mention.
2. We appreciate your reminder. We now define accuracy at the point where Table 1 is first presented. Additionally, we will restate the definition in the experiments section to provide further clarity.
3. We appreciate your suggestion; however, we believe that the minimum value is a more stringent criterion than the standard deviation. If the minimum accuracy remains consistently high across ten independent experiments, we believe that our method demonstrates a certain degree of reliability.
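The two criteria discussed in item 3, the standard deviation requested by the reviewer and the minimum preferred by the authors, can both be computed from the same per-run accuracies. The following is a minimal sketch under that assumption. The function name `summarize_runs` is hypothetical, and the input values are placeholders, not results from the paper.

```python
import statistics


def summarize_runs(accuracies):
    """Summarize per-run accuracies with mean, sample standard deviation, min, and max."""
    return {
        "mean": statistics.mean(accuracies),
        "std": statistics.stdev(accuracies),  # sample standard deviation (n - 1)
        "min": min(accuracies),
        "max": max(accuracies),
    }


# Placeholder accuracies for ten runs; NOT values reported by the authors.
placeholder_runs = [0.90, 0.92, 0.91, 0.93, 0.90, 0.94, 0.92, 0.91, 0.93, 0.92]
summary = summarize_runs(placeholder_runs)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Reporting the mean and standard deviation alongside the minimum costs nothing extra and lets readers judge both the spread and the worst case.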
Reviewer 3 Report
The authors of the article have diligently addressed and resolved all of the significant comments raised in the previous review. As a result, the current version of the article is now suitable and ready for publication. It demonstrates the authors' commitment to refining their work and ensuring its quality.
Minor editing of English language required.
Author Response
We appreciate your valuable suggestions and efforts in reviewing our paper. Your comments have significantly improved the quality of our manuscript.