Semi-Supervised Adaptation for Skeletal-Data-Based Human Action Recognition
Abstract
1. Introduction
2. Related Work
3. Method
3.1. Data Domain Shift in Skeletal Data
3.2. Semi-Supervised Learning
- In the first stage, the Extremely Augmented Skeleton (EAS) scheme [6] is used to augment the training data, enriching the input space in both spatial and temporal dimensions via eight augmentation operations: spatial shear, spatial flip, axis-wise rotate, axis-wise mask, temporal flip, temporal crop, Gaussian noise, and Gaussian blur.
- In the second stage, the pretrained model is refined on a small amount of labeled skeleton data from the target environment. It takes the skeletal-knowledge-aware GCN encoder learned in the first stage and fine-tunes the reassembled GCN model on the target domain. This stage is driven by a cross-entropy loss.
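To make the first stage concrete, a few of the spatial and temporal augmentations listed above can be sketched in NumPy over a skeleton sequence stored as a `(frames, joints, 3)` array. The function names and default magnitudes here are illustrative assumptions, not the authors' EAS implementation:

```python
import numpy as np

def spatial_shear(skel, s=0.1, rng=np.random):
    """Apply a random shear to joint coordinates. skel: (T, J, 3)."""
    # Identity matrix plus off-diagonal shear factors drawn from [-s, s]
    shear = np.eye(3) + rng.uniform(-s, s, size=(3, 3)) * (1 - np.eye(3))
    return skel @ shear

def temporal_crop(skel, ratio=0.8, rng=np.random):
    """Keep a random contiguous window of frames (temporal crop)."""
    T = skel.shape[0]
    keep = max(1, int(T * ratio))
    start = rng.randint(0, T - keep + 1)
    return skel[start:start + keep]

def gaussian_noise(skel, sigma=0.01, rng=np.random):
    """Add i.i.d. Gaussian jitter to every joint coordinate."""
    return skel + rng.normal(0.0, sigma, size=skel.shape)
```

In a contrastive pretraining loop, two independently augmented views of the same sequence would be produced by composing such operations and fed to the query and key encoders.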
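The cross-entropy objective driving the second stage is, in its standard form, $\mathcal{L}_{CE} = -\frac{1}{N}\sum_{i=1}^{N} \log p_{\theta}(y_i \mid x_i)$, averaged over the $N$ labeled target-domain samples. A minimal NumPy version, assuming integer class labels and raw classifier logits, is:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch. logits: (N, C), labels: (N,) ints."""
    z = logits - logits.max(axis=1, keepdims=True)      # stabilize softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

In practice a framework primitive (e.g. PyTorch's `torch.nn.CrossEntropyLoss`) would be used on the reassembled GCN's outputs; the sketch above only makes the quantity being minimized explicit.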
4. Experiments
4.1. Datasets and Implementations
4.2. Results
4.3. t-SNE Action Cluster Visualization
4.4. Semi-Supervised Learning vs. Supervised Learning
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Shahroudy, A.; Liu, J.; Ng, T.-T.; Wang, G. NTU RGB+D: A large scale dataset for 3D human activity analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1010–1019.
- Liu, C.; Hu, Y.; Li, Y.; Song, S.; Liu, J. PKU-MMD: A large scale benchmark for skeleton-based human action understanding. In Proceedings of the Workshop on Visual Analysis in Smart and Connected Communities, Mountain View, CA, USA, 23 October 2017; pp. 1–8.
- Yan, S.; Xiong, Y.; Lin, D. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
- Chen, Y.; Zhang, Z.; Yuan, C.; Li, B.; Deng, Y.; Hu, W. Channel-wise topology refinement graph convolution for skeleton-based action recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 13359–13368.
- Ben-David, S.; Blitzer, J.; Crammer, K.; Kulesza, A.; Pereira, F.; Vaughan, J.W. A theory of learning from different domains. Mach. Learn. 2010, 79, 151–175.
- Guo, T.; Liu, H.; Chen, Z.; Liu, M.; Wang, T.; Ding, R. Contrastive learning from extremely augmented skeleton sequences for self-supervised action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; Volume 36, pp. 762–770.
- Chi, H.G.; Ha, M.H.; Chi, S.; Lee, S.W.; Huang, Q.; Ramani, K. InfoGCN: Representation learning for human skeleton-based action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 20186–20196.
- Choi, J.; Sharma, G.; Chandraker, M.; Huang, J.-B. Unsupervised and semi-supervised domain adaptation for action recognition from drones. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 1717–1726.
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 1597–1607.
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738.
- Chen, X.; Fan, H.; Girshick, R.; He, K. Improved baselines with momentum contrastive learning. arXiv 2020, arXiv:2003.04297.
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
| Model | On Benchmark (NTU RGB+D) | On Target Domain (PKU-MMD) |
|---|---|---|
| Source only | 69.07% | 57.32% |
| Adaptation | 34.41% | 75.74% |
| Percentage of data used | 5% | 20% | 30% | 50% | 70% | 100% |
|---|---|---|---|---|---|---|
| Accuracy | 71.60% | 77.33% | 81.03% | 82.25% | 83.28% | 85.06% |
| Training Scheme | Full Supervision: NTU RGB+D and 10% PKU-MMD (Fine-Tuning) | Full Supervision: NTU RGB+D and 10% PKU-MMD (Combined) | Semi-Supervision: NTU RGB+D and 10% PKU-MMD (Fine-Tuning) |
|---|---|---|---|
| Accuracy | 45.97% | 62.61% | 75.74% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Tian, H.; Payeur, P. Semi-Supervised Adaptation for Skeletal-Data-Based Human Action Recognition. Eng. Proc. 2023, 58, 25. https://doi.org/10.3390/ecsa-10-16083