Deep Learning-Based Action Recognition Using 3D Skeleton Joints Information
Abstract
1. Introduction
2. Literature Review
2.1. Action Recognition System
2.2. Skeleton-Based Action Recognition
3. Proposed Methodology
3.1. Research Proposal
Algorithm 1. Working principle of the proposed method.

1. Input a sequence of frames (f_1, f_2, …, f_n), where n is the length of the video.
2. Calculate the difference between adjacent frames separately for the X, Y, and Z axes, yielding a new sequence of length n − 1: d_i = f_{i+1} − f_i for i = 1, …, n − 1.
3. Split the difference sequence into fixed-length clips for the training and testing sets so that adjacent clips overlap by half of their frames.
4. Classify each clip into the desired action class, then repeat steps 1 to 4 for the next video.
5. Finish.
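Since the body text of this section is not preserved, the following is a minimal Python sketch of Algorithm 1's preprocessing, assuming each skeleton frame is stored as a NumPy array of per-joint (X, Y, Z) coordinates; the function names `frame_difference` and `extract_clips` and the 20-joint example are illustrative, not from the paper.

```python
import numpy as np

def frame_difference(frames):
    """Step 2: difference between adjacent frames along X, Y, and Z.

    frames: array of shape (n, num_joints, 3) holding per-frame 3D
    skeleton joint coordinates. Returns an array of shape (n - 1, num_joints, 3).
    """
    return frames[1:] - frames[:-1]

def extract_clips(diff_seq, clip_len=32):
    """Step 3: fixed-length clips where adjacent clips overlap by half their frames."""
    step = clip_len // 2  # 50% overlap between neighbouring clips
    clips = [diff_seq[s:s + clip_len]
             for s in range(0, len(diff_seq) - clip_len + 1, step)]
    return np.stack(clips) if clips else np.empty((0, clip_len) + diff_seq.shape[1:])

# Example: a video of 100 skeleton frames with 20 joints (hypothetical sizes)
video = np.random.rand(100, 20, 3).astype(np.float32)
diffs = frame_difference(video)            # shape (99, 20, 3)
clips = extract_clips(diffs, clip_len=32)  # clips overlapping by 16 frames
print(clips.shape)                         # (5, 32, 20, 3)
```

With a 32-frame clip and 50% overlap, a 100-frame video yields five clips, which is how step 3's half-overlap requirement plays out in practice.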
3.2. Skeleton Joints Analysis and Preprocessing
3.3. Architecture of the Proposed System
4. Experimental Results
4.1. Experimental Setup
4.1.1. Dataset
4.1.2. System Specification
4.2. Performance Evaluation
4.3. State-of-the-Art Comparisons
5. Conclusions and Future Work
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Köpüklü, O.; Gunduz, A.; Kose, N.; Rigoll, G. Real-time hand gesture detection and classification using convolutional neural networks. In Proceedings of the 14th IEEE International Conference on Automatic Face & Gesture Recognition, Lille, France, 14–18 May 2019; pp. 1–8.
- Molchanov, P.; Yang, X.; Gupta, S.; Kim, K.; Tyree, S.; Kautz, J. Online detection and classification of dynamic hand gestures with recurrent 3D convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4207–4215.
- Zou, Q.; Wang, Y.; Wang, Q.; Zhao, Y.; Li, Q. Deep Learning-Based Gait Recognition Using Smartphones in the Wild. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3197–3212.
- Wu, Z.; Huang, Y.; Wang, L.; Wang, X.; Tan, T. A comprehensive study on cross-view gait based human identification with deep CNNs. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 209–226.
- Farooq, A.; Won, C.S. A survey of human action recognition approaches that use an RGB-D sensor. IEIE Trans. Smart Process. Comput. 2015, 4, 281–290.
- Chen, C.; Jafari, R.; Kehtarnavaz, N. Action recognition from depth sequences using depth motion maps-based local binary patterns. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 5–9 January 2015; pp. 1092–1099.
- Trelinski, J.; Kwolek, B. Convolutional Neural Network-Based Action Recognition on Depth Maps. In Proceedings of the International Conference on Computer Vision and Graphics, Warsaw, Poland, 17–19 September 2018; pp. 209–221.
- Wang, P.; Li, W.; Gao, Z.; Zhang, J.; Tang, C.; Ogunbona, P.O. Action recognition from depth maps using deep convolutional neural networks. IEEE Trans. Hum. Mach. Syst. 2015, 46, 498–509.
- Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in video. Adv. Neural Inf. Process. Syst. 2014, 1, 568–576.
- Dollar, P.; Rabaud, V.; Cottrell, G.; Belongie, S. Behavior recognition via sparse spatio-temporal features. In Proceedings of the IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, Beijing, China, 15–16 October 2005; pp. 65–72.
- Wu, D.; Shao, L. Silhouette analysis-based action recognition via exploiting human poses. IEEE Trans. Circuits Syst. Video Technol. 2013, 23, 236–243.
- Ahmad, M.; Lee, S.W. HMM-based human action recognition using multiview image sequences. In Proceedings of the 18th International Conference on Pattern Recognition, Hong Kong, China, 20–24 August 2006; pp. 263–266.
- Xia, L.; Chen, C.C.; Aggarwal, J.K. View invariant human action recognition using histograms of 3D joints. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 20–27.
- Luo, J.; Wang, W.; Qi, H. Spatio-temporal feature extraction and representation for RGB-D human action recognition. Pattern Recognit. Lett. 2014, 50, 139–148.
- Megavannan, V.; Agarwal, B.; Babu, R.V. Human action recognition using depth maps. In Proceedings of the IEEE International Conference on Signal Processing and Communications (SPCOM), Bangalore, India, 22–25 July 2012; pp. 1–5.
- Imran, J.; Raman, B. Evaluating fusion of RGB-D and inertial sensors for multimodal human action recognition. J. Ambient Intell. Humaniz. Comput. 2020, 11, 189–208.
- Li, C.; Zhong, Q.; Xie, D.; Pu, S. Skeleton-based action recognition with convolutional neural networks. In Proceedings of the IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Hong Kong, China, 10–14 July 2017; pp. 597–600.
- Du, Y.; Fu, Y.; Wang, L. Skeleton based action recognition with convolutional neural network. In Proceedings of the IEEE 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur, Malaysia, 3–6 November 2015; pp. 579–583.
- Chen, Y.; Wang, L.; Li, C.; Hou, Y.; Li, W. ConvNets-based action recognition from skeleton motion maps. Multimed. Tools Appl. 2020, 79, 1707–1725.
- Li, C.; Hou, Y.; Wang, P.; Li, W. Joint distance maps-based action recognition with convolutional neural networks. IEEE Signal Process. Lett. 2017, 24, 624–628.
- Hou, Y.; Li, Z.; Wang, P.; Li, W. Skeleton optical spectra-based action recognition using convolutional neural networks. IEEE Trans. Circuits Syst. Video Technol. 2018, 28, 807–811.
- Wang, P.; Li, P.; Hou, Y.; Li, W. Action recognition based on joint trajectory maps using convolutional neural networks. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 102–106.
- Rashmi, M.; Guddeti, R.M.R. Skeleton based Human Action Recognition for Smart City Application using Deep Learning. In Proceedings of the International Conference on Communication Systems & Networks (COMSNETS), Bengaluru, India, 7–11 January 2020; pp. 756–761.
- Huynh-The, T.; Hua, C.H.; Ngo, T.T.; Kim, D.S. Image representation of pose-transition feature for 3D skeleton-based action recognition. Inf. Sci. 2020, 513, 112–126.
- Si, C.; Jing, Y.; Wang, W.; Wang, L.; Tan, T. Skeleton-Based Action Recognition with Hierarchical Spatial Reasoning and Temporal Stack Learning Network. Pattern Recognit. 2020, 107, 107511.
- Li, Y.; Xia, R.; Liu, X. Learning shape and motion representations for view invariant skeleton-based action recognition. Pattern Recognit. 2020, 103, 107293.
- Yang, Y.; Deng, C.; Gao, S.; Liu, W.; Tao, D.; Gao, X. Discriminative multi-instance multitask learning for 3D action recognition. IEEE Trans. Multimed. 2017, 19, 519–529.
- Yang, X.; Tian, Y. Super normal vector for activity recognition using depth sequences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014.
- Zanfir, M.; Leordeanu, M.; Sminchisescu, C. The moving pose: An efficient 3D kinematics descriptor for low-latency action recognition and detection. In Proceedings of the International Conference on Computer Vision, Sydney, Australia, 3–6 December 2013.
- Straka, M.; Hauswiesner, S.; Rüther, M.; Bischof, H. Skeletal Graph Based Human Pose Estimation in Real-Time. In BMVC; Graz University of Technology: Graz, Austria, 2011; pp. 1–12.
- Sapiński, T.; Kamińska, D.; Pelikant, A.; Anbarjafari, G. Emotion recognition from skeletal movements. Entropy 2019, 21, 646.
- Filntisis, P.P.; Efthymiou, N.; Koutras, P.; Potamianos, G.; Maragos, P. Fusing Body Posture With Facial Expressions for Joint Recognition of Affect in Child–Robot Interaction. IEEE Robot. Autom. Lett. 2019, 4, 4011–4018.
- Raptis, M.; Kirovski, D.; Hoppe, H. Real-time classification of dance gestures from skeleton animation. In Proceedings of the 2011 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, Vancouver, BC, Canada, 5 August 2011; pp. 147–156.
- Chen, C.; Jafari, R.; Kehtarnavaz, N. UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor. In Proceedings of the IEEE International Conference on Image Processing, Quebec City, QC, Canada, 27 September 2015; pp. 168–172.
- Li, W.; Zhang, Z.; Liu, Z. Action recognition based on a bag of 3D points. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 9–14.
| Hardware Requirements | Software Requirements |
|---|---|
| Intel Core i7 | Ubuntu 16.04, MS Office |
| NVIDIA Graphics Card | MATLAB 2019b |
| Microsoft Kinect | Python 2.7 |
| Parameters | Values |
|---|---|
| Clip length | 20, 22, …, 32 |
| Number of epochs | 20 |
| Initial learning rate | 0.001 |
| Learning rate drop factor | 0.5 |
| Learning rate drop period (epochs) | 5 |
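The drop factor and drop period rows describe a piecewise (step) learning-rate schedule: the rate starts at 0.001 and is halved every 5 epochs. As a hedged illustration only, an equivalent schedule in PyTorch's StepLR is sketched below; the linear stand-in model and SGD optimizer are placeholders, since the paper's actual network and training loop are not reproduced here.

```python
import torch
from torch.optim.lr_scheduler import StepLR

# Placeholder model and optimizer; the paper's CNN is not reproduced here.
model = torch.nn.Linear(10, 27)  # 27 outputs = number of UTD-MHAD action classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # initial learning rate
scheduler = StepLR(optimizer, step_size=5, gamma=0.5)      # halve the LR every 5 epochs

for epoch in range(20):  # number of epochs from the table
    # ... one training pass over the skeleton clips would run here ...
    scheduler.step()
    print(f"epoch {epoch + 1}: lr = {scheduler.get_last_lr()[0]:.6f}")
```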
Recognition accuracy for different clip lengths:

| Clip Length (frames) | UTD-MHAD | MSR-Action3D |
|---|---|---|
| 20 | 87.42% | 89.62% |
| 22 | 89.21% | 91.96% |
| 24 | 90.86% | 93.44% |
| 26 | 91.84% | 93.71% |
| 28 | 93.33% | 94.84% |
| 30 | 94.58% | 95.06% |
| 32 | 95.01% | 95.36% |
Comparison with state-of-the-art methods on the UTD-MHAD dataset:

| Methods | Accuracy |
|---|---|
| Ours | 95.01% |
| ResNet18 | 95.88% |
| MobileNetV2 | 93.42% |
| Ref. [19] | 93.26% |
| Ref. [20] | 88.10% |
| Ref. [21] | 86.97% |
| Ref. [22] | 85.81% |
Comparison with state-of-the-art methods on the MSR-Action3D dataset:

| Methods | Accuracy |
|---|---|
| Ours | 95.36% |
| ResNet18 | 96.28% |
| MobileNetV2 | 93.69% |
| Ref. [27] | 93.63% |
| Ref. [28] | 93.09% |
| Ref. [29] | 91.70% |
| Model | Parameters (M) | FLOPs (M) |
|---|---|---|
| ResNet18 | 11.69 | 31.6 |
| MobileNetV2 | 3.505 | 8.3 |
| Ours | 0.171 | 2.6 |
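For context, parameter counts like those in the table are commonly obtained by summing tensor sizes over a model's trainable parameters. The sketch below does this for the two baselines using torchvision's reference implementations (it reproduces the ~11.69 M and ~3.505 M figures above); the proposed 0.171 M network is not public here, and FLOPs require a separate profiler, which is not shown.

```python
import torch
from torchvision import models

def count_parameters(model: torch.nn.Module) -> float:
    """Trainable parameter count in millions (M), as reported in the table."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Baselines from torchvision; the paper's own 0.171 M model is not shown here.
print(f"ResNet18:    {count_parameters(models.resnet18()):.2f} M")     # ~11.69 M
print(f"MobileNetV2: {count_parameters(models.mobilenet_v2()):.3f} M")  # ~3.505 M
```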
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).