Multimodal Fusion of Voice and Gesture Data for UAV Control
Round 1
Reviewer 1 Report
Multimodal Fusion of Voice and Gesture Data for UAV Control
This paper addresses the fusion of input modalities, namely voice and gesture data captured through a microphone and a Leap Motion controller, respectively, to control UAV swarms. The proposed approach is very interesting, but there are still some problems that need to be revised. My review comments are as follows:
1. The content of this paper consists mostly of mature methods and technology applications; the problems to be solved and the substantial contributions of this paper need to be thought through further.
2. The language of this paper should be polished and further improved. There are also minor problems in the formatting of the references; please check carefully and correct them.
3. The key novel conclusions should be stated more clearly compared with existing works.
4. The experimental results must be expanded, and comparisons must be included and well justified. It would also be helpful to explain the evaluation and quantification of the results in a more technical way.
5. In Section 4, details about image streaming from the UAV camera are missing. It is essential to describe the deployment procedure of the algorithm on the platform.
6. The method in this paper needs to be further tested in real-world environments and applications.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
In the manuscript, the authors propose the fusion of voice and gesture data for controlling UAVs in simulation environments. However, I found the following limitations: The reviewed related work is very limited. The technical depth of the paper is shallow. It is not appropriate to compare the proposed method only with voice-only or gesture-only control; the method should be compared with other state-of-the-art fusion methods. The novelty of the paper is limited. The authors need to bring novelty and originality to the work, and need to establish clear superiority of their methodology through comprehensive comparisons with very recent algorithms. In the experiment section, there is no detailed explanation of the simulation setup or method, and the results are only numerical. Thus, the accuracy of the provided analytical results needs to be validated. Overall, I cannot recommend the manuscript for publication in its current state.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
I would like to thank the authors for the careful responses to the revision process. They have addressed my major concerns in a reasonable manner.
Author Response
Thank you very much for your review and full recognition of our work.
Reviewer 2 Report
The majority of the previous review comments were not completely answered or thoroughly explained in the authors' response.
My biggest concerns are as follows:
For example, the authors claim there are "few methods" using fusion of gesture and speech. In fact, the authors should search for the more general topic of gesture and speech control for robotics, instead of looking only at applications for drones.
Moreover, the lack of publications on gesture and speech control for drones is itself a concern. Is it possible that the method simply does not perform as well as conventional joystick control or automatic control of drones?
The authors still did not provide many of the important experiments needed to support and demonstrate the advantage of the proposed method. For example, what is the system delay for a drone system that adopts the proposed design, and how does it compare with conventional joystick control? What are the recognition precision and error rate compared to conventional joystick control?
The authors should also revisit review comments 1, 2, and 3 and provide more thorough answers to them in the next revision.
Author Response
Please see the attachment.
Author Response File: Author Response.docx