Part-Wise Adaptive Topology Graph Convolutional Network for Skeleton-Based Action Recognition
Abstract
1. Introduction
- We propose a hierarchical approach to partition the skeleton topology into multiple parts at two different scales. This method enables the exploration of movement patterns for various body parts, as well as their interrelationships during motion.
- We propose a part-wise adaptive topology graph convolution design that leverages data-driven methods to obtain an adaptive topology and extract discriminative features.
- Extensive experiments highlight the benefits of the part-wise adaptive topology graph convolution. The proposed PAT-GCN outperforms state-of-the-art methods on three skeleton-based action recognition benchmarks.
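
The two-scale partition named in the first contribution can be illustrated with a small sketch. The 14-joint skeleton, the joint names, and the grouping into fine and coarse parts below are all hypothetical choices made for illustration; they are not the layout or the partition used by PAT-GCN. The sketch only shows how a part restricts the skeleton graph to its own joints.

```python
# Toy 14-joint skeleton. Joint names and bones are illustrative assumptions,
# not the NTU RGB+D layout and not the partition used in the paper.
JOINTS = ["head", "neck", "spine", "hip",
          "l_shoulder", "l_elbow", "l_hand",
          "r_shoulder", "r_elbow", "r_hand",
          "l_knee", "l_foot", "r_knee", "r_foot"]
BONES = [("head", "neck"), ("neck", "spine"), ("spine", "hip"),
         ("neck", "l_shoulder"), ("l_shoulder", "l_elbow"), ("l_elbow", "l_hand"),
         ("neck", "r_shoulder"), ("r_shoulder", "r_elbow"), ("r_elbow", "r_hand"),
         ("hip", "l_knee"), ("l_knee", "l_foot"),
         ("hip", "r_knee"), ("r_knee", "r_foot")]

# Hypothetical fine scale: trunk and four limbs.
FINE_PARTS = {
    "trunk": ["head", "neck", "spine", "hip"],
    "left_arm": ["l_shoulder", "l_elbow", "l_hand"],
    "right_arm": ["r_shoulder", "r_elbow", "r_hand"],
    "left_leg": ["l_knee", "l_foot"],
    "right_leg": ["r_knee", "r_foot"],
}
# Hypothetical coarse scale: each coarse part is a union of fine parts.
COARSE_PARTS = {
    "upper_body": ["trunk", "left_arm", "right_arm"],
    "lower_body": ["left_leg", "right_leg"],
}


def expand(coarse_parts, fine_parts):
    """Resolve each coarse part into the list of joints it contains."""
    return {name: [j for p in members for j in fine_parts[p]]
            for name, members in coarse_parts.items()}


def part_adjacency(part_joints, bones):
    """Adjacency matrix (with self-loops) restricted to the joints of one part."""
    idx = {j: i for i, j in enumerate(part_joints)}
    n = len(part_joints)
    adj = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for a, b in bones:
        if a in idx and b in idx:  # keep only bones lying inside the part
            adj[idx[a]][idx[b]] = adj[idx[b]][idx[a]] = 1.0
    return adj


if __name__ == "__main__":
    coarse = expand(COARSE_PARTS, FINE_PARTS)
    print(part_adjacency(FINE_PARTS["left_arm"], BONES))
    print(len(coarse["upper_body"]), "joints in the coarse upper-body part")
```

Bones that cross a part boundary (for example neck to shoulder at the fine scale) are dropped by `part_adjacency`, which is one reason a second, coarser scale is useful: it keeps those connections inside a single part.
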
2. Related Work
2.1. Graph Convolutional Network
2.2. GCN-Based Skeleton Action Recognition
2.3. Part-Based Skeleton Action Recognition
3. Methods
3.1. Part-Wise Graph Convolution
3.2. Adaptive Topology Graph Convolution
3.2.1. Feature Transformation
3.2.2. Adaptive Topology
3.3. Network Architecture
4. Experiments
4.1. Datasets
4.2. Implementation Details
4.3. Ablation Study
4.4. Comparisons with the State-of-the-Art Methods
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2014; Volume 27.
2. Feichtenhofer, C.; Pinz, A.; Zisserman, A. Convolutional two-stream network fusion for video action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1933–1941.
3. Tu, Z.; Li, H.; Zhang, D.; Dauwels, J.; Li, B.; Yuan, J. Action-stage emphasized spatiotemporal VLAD for video action recognition. IEEE Trans. Image Process. 2019, 28, 2799–2812.
4. Tu, Z.; Xie, W.; Dauwels, J.; Li, B.; Yuan, J. Semantic cues enhanced multimodality multistream CNN for action recognition. IEEE Trans. Circuits Syst. Video Technol. 2018, 29, 1423–1437.
5. Thakkar, K.; Narayanan, P. Part-based graph convolutional network for action recognition. arXiv 2018, arXiv:1809.04983.
6. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2016; Volume 29.
7. Li, R.; Wang, S.; Zhu, F.; Huang, J. Adaptive graph convolutional neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
8. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the International Conference on Learning Representations (ICLR 2017), Toulon, France, 24–26 April 2017.
9. Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv 2018, arXiv:1810.00826.
10. Ying, Z.; You, J.; Morris, C.; Ren, X.; Hamilton, W.; Leskovec, J. Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2018; Volume 31.
11. Wu, F.; Souza, A.; Zhang, T.; Fifty, C.; Yu, T.; Weinberger, K. Simplifying graph convolutional networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6861–6871.
12. Abu-El-Haija, S.; Perozzi, B.; Kapoor, A.; Alipourfard, N.; Lerman, K.; Harutyunyan, H.; Ver Steeg, G.; Galstyan, A. MixHop: Higher-order graph convolutional architectures via sparsified neighborhood mixing. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 21–29.
13. Song, S.; Lan, C.; Xing, J.; Zeng, W.; Liu, J. An end-to-end spatio-temporal attention model for human action recognition from skeleton data. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
14. Wang, J.; Liu, Z.; Wu, Y.; Yuan, J. Mining actionlet ensemble for action recognition with depth cameras. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 1290–1297.
15. Zhang, P.; Lan, C.; Xing, J.; Zeng, W.; Xue, J.; Zheng, N. View adaptive recurrent neural networks for high performance human action recognition from skeleton data. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2117–2126.
16. Li, C.; Xie, C.; Zhang, B.; Han, J.; Zhen, X.; Chen, J. Memory attention networks for skeleton-based action recognition. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 4800–4814.
17. Li, Y.; He, Z.; Ye, X.; He, Z.; Han, K. Spatial temporal graph convolutional networks for skeleton-based dynamic hand gesture recognition. EURASIP J. Image Video Process. 2019, 2019, 78.
18. Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 12026–12035.
19. Liu, Z.; Zhang, H.; Chen, Z.; Wang, Z.; Ouyang, W. Disentangling and unifying graph convolutions for skeleton-based action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 143–152.
20. Cheng, K.; Zhang, Y.; He, X.; Chen, W.; Cheng, J.; Lu, H. Skeleton-based action recognition with shift graph convolutional network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 183–192.
21. Shah, A.; Mishra, S.; Bansal, A.; Chen, J.C.; Chellappa, R.; Shrivastava, A. Pose and joint-aware action recognition. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 3850–3860.
22. Oikonomou, K.M.A.I.; Manaveli, P.; Grekidis, A.; Menychtas, D.; Aggelousis, N.; Sirakoulis, G.C.; Gasteratos, A. Joint-Aware Action Recognition for Ambient Assisted Living. In Proceedings of the 2022 IEEE International Conference on Imaging Systems and Techniques (IST), Kaohsiung, Taiwan, 21–23 June 2022; pp. 1–6.
23. Li, J.; Liu, X.; Zhang, W.; Zhang, M.; Song, J.; Sebe, N. Spatio-temporal attention networks for action recognition and detection. IEEE Trans. Multimed. 2020, 22, 2990–3001.
24. Santavas, N.; Kansizoglou, I.; Bampis, L.; Karakasis, E.; Gasteratos, A. Attention! A lightweight 2D hand pose estimation approach. IEEE Sens. J. 2020, 21, 11488–11496.
25. Du, Y.; Wang, W.; Wang, L. Hierarchical recurrent neural network for skeleton based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1110–1118.
26. Si, C.; Jing, Y.; Wang, W.; Wang, L.; Tan, T. Skeleton-based action recognition with spatial reasoning and temporal stack learning. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 103–118.
27. Song, Y.F.; Zhang, Z.; Shan, C.; Wang, L. Stronger, faster and more explainable: A graph convolutional baseline for skeleton-based action recognition. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1625–1633.
28. Wang, M.; Ni, B.; Yang, X. Learning multi-view interactional skeleton graph for action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 1.
29. Gou, R.; Yang, W.; Luo, Z.; Yuan, Y.; Li, A. Tohjm-Trained Multiscale Spatial Temporal Graph Convolutional Neural Network for Semi-Supervised Skeletal Action Recognition. Electronics 2022, 11, 3498.
30. Dang, L.; Nie, Y.; Long, C.; Zhang, Q.; Li, G. MSR-GCN: Multi-scale residual graph convolution networks for human motion prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 11467–11476.
31. Yan, Z.; Zhai, D.H.; Xia, Y. DMS-GCN: Dynamic mutiscale spatiotemporal graph convolutional networks for human motion prediction. arXiv 2021, arXiv:2112.10365.
32. Chen, T.; Zhou, D.; Wang, J.; Wang, S.; Guan, Y.; He, X.; Ding, E. Learning multi-granular spatio-temporal graph network for skeleton-based action recognition. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual, 20–24 October 2021; pp. 4334–4342.
33. Chen, Z.; Li, S.; Yang, B.; Li, Q.; Liu, H. Multi-scale spatial temporal graph convolutional network for skeleton-based action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 1113–1122.
34. Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Skeleton-based action recognition with directed graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7912–7921.
35. Liu, J.; Shahroudy, A.; Perez, M.; Wang, G.; Duan, L.Y.; Kot, A.C. NTU RGB+D 120: A large-scale benchmark for 3D human activity understanding. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 2684–2701.
36. Kay, W.; Carreira, J.; Simonyan, K.; Zhang, B.; Hillier, C.; Vijayanarasimhan, S.; Viola, F.; Green, T.; Back, T.; Natsev, P.; et al. The kinetics human action video dataset. arXiv 2017, arXiv:1705.06950.
37. Ye, F.; Pu, S.; Zhong, Q.; Li, C.; Xie, D.; Tang, H. Dynamic GCN: Context-enriched topology learning for skeleton-based action recognition. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 55–63.
38. Yan, S.; Xiong, Y.; Lin, D. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
39. Li, M.; Chen, S.; Chen, X.; Zhang, Y.; Wang, Y.; Tian, Q. Actional-structural graph convolutional networks for skeleton-based action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3595–3603.
40. Zhang, P.; Lan, C.; Zeng, W.; Xing, J.; Xue, J.; Zheng, N. Semantics-guided neural networks for efficient skeleton-based human action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1112–1121.
41. Xu, K.; Ye, F.; Zhong, Q.; Xie, D. Topology-aware convolutional neural network for efficient skeleton-based action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2022; Volume 36, pp. 2866–2874.
42. Song, Y.F.; Zhang, Z.; Shan, C.; Wang, L. Constructing stronger and faster baselines for skeleton-based action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1474–1488.
Design | #Variant | Hierarchical | Acc (%) |
---|---|---|---|
PAT-GCN | 1 | ✔ | 89.9 |
PAT-GCN | 2 | ✔ | 90.4 |
PAT-GCN | 4 | ✔ | 90.3 |
w/o D | 2 | ✔ | 89.6 |
w/o A | 2 | ✔ | 89.3 |
w/o fine-scale | 2 | ✔ | 90.0 |
w/o coarse-scale | 2 | ✔ | 89.5 |
1 PAT-GC | 2 | × | 90.1 |
2 PAT-GC | 2 | × | 90.4 |
Methods | r | Activation | Acc (%) |
---|---|---|---|
Baseline | - | - | 88.1 |
A | 4 | Tanh | 90.2 |
B | 8 | Tanh | 90.4 |
C | 16 | Tanh | 90.0 |
D | 8 | Sigmoid | 89.9 |
E | 8 | ReLU | 90.2 |
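
The ablation above varies `r` and an activation function. The following is a minimal sketch of how such a setting could enter a data-driven topology, under two explicit assumptions that the table itself does not confirm: that `r` is a channel-reduction ratio, and that the topology correction is built from activated pairwise differences of projected joint features, as is common in adaptive-topology GCNs. The function names, the `alpha` scale, and the random weights are hypothetical and do not reproduce the PAT-GCN layer.

```python
import math
import random

# Activations compared in the ablation table.
ACTIVATIONS = {
    "tanh": math.tanh,
    "sigmoid": lambda v: 1.0 / (1.0 + math.exp(-v)),
    "relu": lambda v: max(0.0, v),
}


def project(x, w):
    """Project each joint feature (length C) with a C x C_out weight matrix."""
    return [[sum(xi[c] * w[c][o] for c in range(len(w))) for o in range(len(w[0]))]
            for xi in x]


def adaptive_topology(x, static_adj, w_q, w_k, activation="tanh", alpha=1.0):
    """Static adjacency plus a data-dependent correction (assumed formulation).

    correction[i][j] = mean over reduced channels of act(q_i - k_j).
    """
    q, k = project(x, w_q), project(x, w_k)
    act = ACTIVATIONS[activation]
    n, d = len(x), len(q[0])
    return [[static_adj[i][j]
             + alpha * sum(act(q[i][c] - k[j][c]) for c in range(d)) / d
             for j in range(n)] for i in range(n)]


def graph_conv(x, adj):
    """Aggregate joint features with the given adjacency: out = adj @ x."""
    n, c = len(x), len(x[0])
    return [[sum(adj[i][j] * x[j][ch] for j in range(n)) for ch in range(c)]
            for i in range(n)]


# Example: 3 joints, C = 16 channels, reduction ratio r = 8 (so C // r = 2).
random.seed(0)
N, C, R = 3, 16, 8
x = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(N)]
static_adj = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
w_q = [[random.uniform(-0.5, 0.5) for _ in range(C // R)] for _ in range(C)]
w_k = [[random.uniform(-0.5, 0.5) for _ in range(C // R)] for _ in range(C)]

adj = adaptive_topology(x, static_adj, w_q, w_k, "tanh")
out = graph_conv(x, adj)
```

Under these assumptions, a larger `r` shrinks the projected dimension `C // r` and therefore the cost of the pairwise term, while a signed activation such as Tanh lets the correction both strengthen and weaken edges, which ReLU and Sigmoid cannot do.
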
Methods | NTU RGB + D 60 X-Sub (%) | NTU RGB + D 60 X-View (%) |
---|---|---|
ST-GCN (2018) [38] | 81.5 | 88.3 |
AS-GCN (2019) [39] | 86.8 | 94.2 |
2s-AGCN (2019) [18] | 88.5 | 95.1 |
DGNN (2019) [34] | 89.9 | 96.1 |
SGN (2020) [40] | 89.0 | 94.5 |
Shift-GCN (2020) [20] | 90.7 | 96.5 |
Dynamic GCN (2020) [37] | 91.5 | 96.0 |
MS-G3D (2020) [19] | 91.5 | 96.2 |
MST-GCN (2021) [33] | 91.5 | 96.6 |
4s DualHead-Net (2021) [32] | 92.0 | 96.6 |
Ta-CNN (2022) [41] | 90.4 | 94.8 |
Ta-CNN+ (2022) [41] | 90.7 | 95.1 |
EfficientGCN-B4 (2022) [42] | 91.7 | 95.7 |
PAT-GCN (Joint Only) | 90.4 | 95.1 |
PAT-GCN (Joint+Bone) | 92.2 | 96.4 |
PAT-GCN | 92.7 | 97.1 |
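
The last three rows compare a joint-only model with multi-stream variants. The table does not say how the streams are built or combined, so the sketch below only illustrates a practice that is common in skeleton-based recognition: deriving a bone stream as the difference between each joint and its parent, and fusing streams by summing their softmax scores. The parent indices, logits, and equal stream weights are made-up illustrative values.

```python
import math


def bone_stream(joints, parents):
    """Bone vector of each joint: its position minus its parent's position.

    The root joint lists itself as parent, so its bone vector is zero.
    """
    return [[c - p for c, p in zip(joints[j], joints[parents[j]])]
            for j in range(len(joints))]


def softmax(logits):
    """Numerically stable softmax over one list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(stream_logits, weights=None):
    """Weighted sum of per-stream softmax scores; returns (label, fused scores)."""
    weights = weights or [1.0] * len(stream_logits)
    probs = [softmax(logits) for logits in stream_logits]
    n_cls = len(probs[0])
    fused = [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(n_cls)]
    return max(range(n_cls), key=fused.__getitem__), fused


if __name__ == "__main__":
    joints = [[0, 0, 0], [1, 2, 3], [2, 2, 2]]  # three joints, 3D coordinates
    parents = [0, 0, 1]                          # hypothetical kinematic tree
    print(bone_stream(joints, parents))
    joint_logits = [2.0, 1.0, 0.0]               # made-up joint-stream output
    bone_logits = [0.0, 3.0, 0.0]                # made-up bone-stream output
    print(fuse_scores([joint_logits, bone_logits]))
```

In this toy case the joint stream alone would predict class 0, while the fused scores select class 1 because the bone stream is more confident.
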
Methods | Param. (M) | NTU RGB + D 120 X-Sub (%) | NTU RGB + D 120 X-Set (%) |
---|---|---|---|
2s-AGCN (2019) [18] | 6.9 | 82.9 | 84.9 |
SGN (2020) [40] | 1.8 | 79.2 | 81.5 |
Shift-GCN (2020) [20] | 2.7 | 85.9 | 87.6 |
MS-G3D (2020) [19] | 6.4 | 86.9 | 88.4 |
Dynamic GCN (2020) [37] | 14.4 | 87.3 | 88.6 |
MST-GCN (2021) [33] | 12.0 | 87.5 | 88.8 |
4s DualHead-Net (2021) [32] | 12.0 | 88.2 | 89.3 |
Ta-CNN+ (2022) [41] | 4.4 | 85.7 | 87.3 |
EfficientGCN-B4 (2022) [42] | 1.1 | 88.4 | 89.1 |
PAT-GCN | 5.4 | 89.2 | 90.6 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, J.; Zou, L.; Fan, C.; Chi, R. Part-Wise Adaptive Topology Graph Convolutional Network for Skeleton-Based Action Recognition. Electronics 2023, 12, 1992. https://doi.org/10.3390/electronics12091992