Figure 1. Three basic hand gestures of a musical conductor: (a) 2/4 duple, (b) 3/4 triple, and (c) 4/4 quad.
Figure 2. Diagram of our proposed method.
Figure 3. Three inputs for our proposed system: (a) original input, (b) skeleton image, and (c) depth image with the target body marked in green.
Figure 4. Data points of the continuous gestures captured by tracking the users’ palms: (a) duple, (b) triple, and (c) quad.
Figure 5. Simplified graph of the height of the palm position during a duple gesture.
Figure 6. Visualization of a simplified graph of the palm position during a duple gesture after segmentation.

Figure 6 shows how a continuous gesture is segmented into six sections, each of which is individually processed and classified based on the information between its start and end points. Figure 7 shows one of the segments from Figure 6.
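As an illustration of this segmentation step, the sketch below (Python, not the implementation used in this work) cuts a per-frame palm-height sequence at local minima of the curve, which correspond to the turning points visible in Figure 6. The function name `segment_palm_heights` and the `min_len` jitter threshold are hypothetical and introduced only for this example.

```python
# Illustrative sketch of palm-height segmentation; the paper's exact
# segmentation rules are not reproduced here.
from typing import List, Tuple

def segment_palm_heights(heights: List[float], min_len: int = 3) -> List[Tuple[int, int]]:
    """Split a palm-height sequence into sections bounded by local minima.

    `heights` holds the tracked palm's vertical position per frame; each
    returned (start, end) pair indexes one section that would then be
    processed and classified on its own.
    """
    cut_points = [0]
    for i in range(1, len(heights) - 1):
        # A local minimum of the height curve is treated as a candidate beat point.
        if heights[i] <= heights[i - 1] and heights[i] < heights[i + 1]:
            if i - cut_points[-1] >= min_len:  # ignore minima that are too close (jitter)
                cut_points.append(i)
    cut_points.append(len(heights) - 1)
    # Pair consecutive cut points into sections.
    return [(cut_points[k], cut_points[k + 1]) for k in range(len(cut_points) - 1)]

# Synthetic down-up-down-up trace resembling one bar of a duple gesture.
trace = [0.9, 0.5, 0.2, 0.5, 0.7, 0.4, 0.1, 0.5, 0.9]
print(segment_palm_heights(trace, min_len=2))  # [(0, 2), (2, 6), (6, 8)]
```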
Figure 7. Duple gesture segmentation.
Figure 8. Both the template and user data graphed together, where the blue curve indicates the template gesture and the orange curve indicates the user gesture.
Figure 9. Dynamic warping path marked in red between the template data, marked in blue, and the user data, marked in orange.
Figure 10. Visual representation of dynamic time warping (DTW) with the optimal warp, where the blue line represents the template and the orange line represents the user’s gesture: (a) duple gesture, (b) triple gesture, and (c) quad gesture.
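The optimal warping paths drawn in Figures 9 and 10 are what standard dynamic time warping produces. As a rough sketch of that idea (not the authors’ code), the snippet below fills the DTW cost matrix for two 1D curves and backtracks the minimum-cost alignment; the function name `dtw` and the sample sequences are illustrative only.

```python
# Minimal DTW sketch for 1D sequences; illustrative only.
from math import inf
from typing import List, Tuple

def dtw(template: List[float], user: List[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the DTW cost and the optimal warping path between two sequences."""
    n, m = len(template), len(user)
    # cost[i][j]: minimal cumulative cost of aligning template[:i] with user[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(template[i - 1] - user[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # move in template only
                                 cost[i][j - 1],      # move in user only
                                 cost[i - 1][j - 1])  # move in both
    # Backtrack from (n, m) to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j],     i - 1, j),
                      (cost[i][j - 1],     i,     j - 1))
    return cost[n][m], path[::-1]

# A slow user gesture (8 frames) warped onto a shorter template (5 frames).
template = [0.0, 0.5, 1.0, 0.5, 0.0]
user = [0.0, 0.2, 0.5, 0.9, 1.0, 0.6, 0.4, 0.0]
score, path = dtw(template, user)
print(round(score, 2), path)
```

Because the warping path can map several user frames onto one template frame (and vice versa), a single template can match the same gesture performed at different tempos.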
Figure 11. Two examples of a duple (2/4) gesture.
Figure 12. Examples of a triple (3/4) gesture.
Figure 13. Examples of a quad (4/4) gesture.
Figure 14. Screenshots of a single musical conducting gesture (duple) being made while facing the camera, including the 2D video, a skeleton model of the musical conductor, and the depth video.
Figure 15. Screenshots of a single musical conducting gesture (triple) being made while facing the camera, including the 2D video, a skeleton model of the musical conductor, and the depth video.
Figure 16. Screenshots of a single musical conducting gesture (quad) being made while facing the camera, including the 2D video, a skeleton model of the musical conductor, and the depth video.
Table 1. Hardware and software used in our system.

| Category | Item | Specification |
|---|---|---|
| Hardware | Personal Computer | CPU: Intel® Core™ i5-4460 @ 3.2 GHz; RAM: 8 GB |
| Hardware | Webcam | Xbox 360 Kinect |
| Software | Operating System | Microsoft Windows 7 64-bit |
| Software | Development Tools | Visual Studio C++ 2010; Kinect for Windows SDK v1.7 |
Table 2. Accuracy of musical conducting gesture recognition.

| Musical Gesture | Number of Test Samples | Correctly Classified | Misclassified | Accuracy |
|---|---|---|---|---|
| Duple | 1750 | 1633 | 117 | 93.31% |
| Triple | 1750 | 1571 | 179 | 89.77% |
| Quad | 1750 | 1458 | 292 | 83.31% |
Table 3. Accuracy of musical conducting gesture recognition classified by BPM.

| Musical Gesture | Number of Test Samples | Correctly Classified | Misclassified | Accuracy |
|---|---|---|---|---|
| Duple (150 BPM) | 350 | 334 | 16 | 95.43% |
| Triple (150 BPM) | 350 | 319 | 31 | 91.14% |
| Quad (150 BPM) | 350 | 295 | 55 | 84.29% |
| Duple (200 BPM) | 350 | 249 | 101 | 71.14% |
| Triple (200 BPM) | 350 | 202 | 148 | 57.71% |
| Quad (200 BPM) | 350 | 113 | 237 | 32.29% |
| Other | 350 | 316 | 34 | 90.29% |
Table 4. Accuracy of musical gestures classified by BPM.

| Tempo | Number of Test Samples | Correctly Classified | Misclassified | Accuracy |
|---|---|---|---|---|
| 60 BPM | 1050 | 1050 | 0 | 100% |
| 90 BPM | 1050 | 1050 | 0 | 100% |
| 120 BPM | 1050 | 1050 | 0 | 100% |
| 150 BPM | 1050 | 948 | 102 | 90.29% |
| 200 BPM | 1050 | 564 | 486 | 53.71% |
Table 5. Accuracy of musical conducting gesture recognition at different viewing angles.

| Viewing Angle | Number of Gestures Made | Correctly Classified | Misclassified | Accuracy |
|---|---|---|---|---|
| 45° Upward | 800 | 713 | 87 | 89.13% |
| 45° Downward | 800 | 716 | 84 | 89.50% |
| 0° Facing Camera | 800 | 724 | 76 | 90.50% |
| −45° Left | 800 | 715 | 85 | 89.38% |
| −90° Left | 800 | 702 | 98 | 87.75% |
| 45° Right | 800 | 710 | 90 | 88.75% |
| 90° Right | 800 | 698 | 102 | 87.25% |
Table 6. Mean squared error results.

| Speed of the Gesture | Number of Gestures | Mean Squared Error |
|---|---|---|
| 60 BPM | 75 | 0.3 BPM |
| 90 BPM | 97 | 0.4 BPM |
| 120 BPM | 136 | 0.6 BPM |
| 150 BPM | 173 | 0.9 BPM |
Table 7. Accuracy of musical conducting gesture recognition in a whole musical piece.

| Musical Piece | Number of Gestures Made | Correctly Classified | Misclassified | Accuracy |
|---|---|---|---|---|
| Canon in D | 94 | 94 | 0 | 100% |
| Mozart Symphony No. 40 | 107 | 107 | 0 | 100% |
| Hallelujah | 111 | 110 | 1 | 99.10% |
Table 8. Comparisons of related work with our method.

| Experimental Condition | H. Je et al. | N. Kawarazaki et al. | S. Cosentino et al. | Our Method |
|---|---|---|---|---|
| 50–60 BPM | ✔ | ✔ | ✔ | ✔ |
| 61–90 BPM | ✔ | ✔ | ✔ | ✔ |
| 91–120 BPM | ✔ | ✔ | ✔ | ✔ |
| 121–150 BPM | | | | ✔ |
| 150–200 BPM | | | | ✔ |
| Duple (2/4) | ✔ | ✔ | | ✔ |
| Triple (3/4) | ✔ | ✔ | | ✔ |
| Quad (4/4) | ✔ | ✔ | ✔ | ✔ |
| Facing Camera | ✔ | ✔ | ✔ | ✔ |
| −45° Left | | ✔ | | ✔ |
| −90° Left | | ✔ | | ✔ |
| 45° Right | | ✔ | | ✔ |
| 90° Right | | ✔ | | ✔ |
| 45° Upward | | | | ✔ |
| 90° Downward | | | | ✔ |
| Depth Camera | ✔ | ✔ | | ✔ |
| Inertial Sensors | | ✔ | ✔ | |
Table 9. Accuracy comparison of related work with our method.

| Method | H. Je et al. | N. Kawarazaki et al. | S. Cosentino et al. | Ours |
|---|---|---|---|---|
| Accuracy | 79.75% | 78.00% | 90.10% | 89.17% |