Human Motion Prediction via Dual-Attention and Multi-Granularity Temporal Convolutional Networks
Abstract
1. Introduction
2. Related Work
2.1. Human Motion Prediction
2.2. Temporal Convolutional Networks
2.3. Multi-Granularity Convolution
2.4. Attention Mechanisms
3. Approach
3.1. Problem Formulation
3.2. Overview
3.3. Dual Attention (DA)
3.4. Multi-Granularity TCN (MgTCN)
3.5. Global and Local Residual Connection
3.6. Loss Function
4. Experiments
4.1. Datasets
4.2. Implementation Details
4.3. Evaluation Metrics and Baselines
4.4. Experimental Results and Analysis
4.5. Ablation Study
4.6. Limitations
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
DA | Dual attention
TCNs | Temporal convolutional networks
MgTCN | Multi-granularity TCN
GNNs | Graph neural networks
References
- Chen, S.; Liu, B.; Feng, C.; Vallespi-Gonzalez, C.; Wellington, C. 3D point cloud processing and learning for autonomous driving: Impacting map creation, localization, and perception. IEEE Signal Process. Mag. 2020, 38, 68–86.
- Gui, L.Y.; Zhang, K.; Wang, Y.X.; Liang, X.; Moura, J.M.; Veloso, M. Teaching robots to predict human motion. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 562–567.
- Koppula, H.S.; Saxena, A. Anticipating human activities using object affordances for reactive robotic response. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 14–29.
- Sheng, W.; Li, X. Multi-task learning for gait-based identity recognition and emotion recognition using attention enhanced temporal graph convolutional network. Pattern Recognit. 2021, 114, 107868.
- Kong, Y.; Wei, Z.; Huang, S. Automatic analysis of complex athlete techniques in broadcast taekwondo video. Multimed. Tools Appl. 2018, 77, 13643–13660.
- Dong, Y.; Li, X.; Dezert, J.; Zhou, R.; Zhu, C.; Wei, L.; Ge, S.S. Evidential reasoning with hesitant fuzzy belief structures for human activity recognition. IEEE Trans. Fuzzy Syst. 2021, 29, 3607–3619.
- Dong, Y.; Li, X.; Dezert, J.; Zhou, R.; Zhu, C.; Cao, L.; Khyam, M.O.; Ge, S.S. Multi-source weighted domain adaptation with evidential reasoning for activity recognition. IEEE Trans. Ind. Inform. 2022, 19, 5530–5542.
- Lehrmann, A.M.; Gehler, P.V.; Nowozin, S. Efficient nonlinear Markov models for human motion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1314–1321.
- Wang, J.M.; Fleet, D.J.; Hertzmann, A. Gaussian process dynamical models for human motion. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 30, 283–298.
- Taylor, G.W.; Hinton, G.E.; Roweis, S. Modeling human motion using binary latent variables. Adv. Neural Inf. Process. Syst. 2006, 19, 1345–1352.
- Li, C.; Zhang, Z.; Lee, W.S.; Lee, G.H. Convolutional sequence to sequence model for human dynamics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2275–2284.
- Li, M.; Chen, S.; Chen, X.; Zhang, Y.; Wang, Y.; Tian, Q. Symbiotic graph neural networks for 3D skeleton-based human action recognition and motion prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3316–3333.
- Li, M.; Chen, S.; Zhao, Y.; Zhang, Y.; Wang, Y.; Tian, Q. Dynamic multiscale graph neural networks for 3D skeleton-based human motion prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 214–223.
- Zhong, C.; Hu, L.; Zhang, Z.; Ye, Y.; Xia, S. Spatio-temporal gating-adjacency GCN for human motion prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6447–6456.
- Fragkiadaki, K.; Levine, S.; Felsen, P.; Malik, J. Recurrent network models for human dynamics. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4346–4354.
- Jain, A.; Zamir, A.R.; Savarese, S.; Saxena, A. Structural-RNN: Deep learning on spatio-temporal graphs. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 5308–5317.
- Martinez, J.; Black, M.J.; Romero, J. On human motion prediction using recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2891–2900.
- Liu, Z.; Wu, S.; Jin, S.; Liu, Q.; Lu, S.; Zimmermann, R.; Cheng, L. Towards natural and accurate future motion prediction of humans and animals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10004–10012.
- Shu, X.; Zhang, L.; Qi, G.J.; Liu, W.; Tang, J. Spatiotemporal co-attention recurrent neural networks for human-skeleton motion prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3300–3315.
- Liu, Z.; Wu, S.; Jin, S.; Ji, S.; Liu, Q.; Lu, S.; Cheng, L. Investigating pose representations and motion contexts modeling for 3D motion prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 681–697.
- Lebailly, T.; Kiciroglu, S.; Salzmann, M.; Fua, P.; Wang, W. Motion prediction using temporal inception module. In Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020.
- Cui, Q.; Sun, H.; Kong, Y.; Zhang, X.; Li, Y. Efficient human motion prediction using temporal convolutional generative adversarial network. Inf. Sci. 2021, 545, 427–447.
- Mao, W.; Liu, M.; Salzmann, M.; Li, H. Multi-level motion attention for human motion prediction. Int. J. Comput. Vis. 2021, 129, 2513–2535.
- Medjaouri, O.; Desai, K. HR-STAN: High-resolution spatio-temporal attention network for 3D human motion prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2540–2549.
- Mao, W.; Liu, M.; Salzmann, M.; Li, H. Learning trajectory dependencies for human motion prediction. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 4317–4326.
- Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Decoupled spatial-temporal attention network for skeleton-based action recognition. arXiv 2020, arXiv:2007.03263.
- Aksan, E.; Kaufmann, M.; Cao, P.; Hilliges, O. A spatio-temporal transformer for 3D human motion prediction. In Proceedings of the 2021 International Conference on 3D Vision (3DV), London, UK, 1–3 December 2021; pp. 565–574.
- Li, Y.; Wang, Z.; Yang, X.; Wang, M.; Poiana, S.I.; Chaudhry, E.; Zhang, J. Efficient convolutional hierarchical autoencoder for human motion prediction. Vis. Comput. 2019, 35, 1143–1156.
- Chiu, H.K.; Adeli, E.; Wang, B.; Huang, D.A.; Niebles, J.C. Action-agnostic human pose forecasting. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 1423–1432.
- Guo, X.; Choi, J. Human motion prediction via learning local structure representations and temporal dependencies. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 2580–2587.
- Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271.
- Lea, C.; Flynn, M.D.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal convolutional networks for action segmentation and detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 156–165.
- Farha, Y.A.; Gall, J. MS-TCN: Multi-stage temporal convolutional network for action segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3575–3584.
- Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language modeling with gated convolutional networks. In Proceedings of the International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017; pp. 933–941.
- van den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. WaveNet: A generative model for raw audio. arXiv 2016, arXiv:1609.03499.
- Pavllo, D.; Feichtenhofer, C.; Grangier, D.; Auli, M. 3D human pose estimation in video with temporal convolutions and semi-supervised training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7753–7762.
- Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
- Reis, M.S. Multiscale and multi-granularity process analytics: A review. Processes 2019, 7, 61.
- Yang, B.; Yang, J.; Ni, R.; Yang, C.; Liu, X. Multi-granularity scenarios understanding network for trajectory prediction. Complex Intell. Syst. 2023, 9, 851–864.
- Chorowski, J.K.; Bahdanau, D.; Serdyuk, D.; Cho, K.; Bengio, Y. Attention-based models for speech recognition. arXiv 2015, arXiv:1506.07503.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762.
- Xu, Y.; Yu, L.; Xu, H.; Zhang, H.; Nguyen, T. Vector sparse representation of color image using quaternion matrix analysis. IEEE Trans. Image Process. 2015, 24, 1315–1329.
- Tang, Y.; Ma, L.; Liu, W.; Zheng, W. Long-term human motion prediction by modeling motion context and enhancing motion dynamic. arXiv 2018, arXiv:1805.02513.
- Cai, Y.; Huang, L.; Wang, Y.; Cham, T.J.; Cai, J.; Yuan, J.; Liu, J.; Yang, X.; Zhu, Y.; Shen, X.; et al. Learning progressive joint propagation for human motion prediction. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 226–242.
- Ma, T.; Nie, Y.; Long, C.; Zhang, Q.; Li, G. Progressively generating better initial guesses towards next stages for high-quality human motion prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022.
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101.
- Ionescu, C.; Papava, D.; Olaru, V.; Sminchisescu, C. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 1325–1339.
- Mao, W.; Liu, M.; Salzmann, M. History repeats itself: Human motion prediction via motion attention. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 474–489.
- Dang, L.; Nie, Y.; Long, C.; Zhang, Q.; Li, G. MSR-GCN: Multi-scale residual graph convolution networks for human motion prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 11467–11476.
Hyperparameter/Config | Value
---|---
Optimizer | AdamW
Base learning rate |
Weight decay |
Optimizer momentum |
Batch size | 16
Warmup epochs | 5
Epochs | 60
Layers | 10
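For concreteness, the training configuration above can be wired up roughly as follows. This is a minimal sketch, not the authors' released code: the base learning rate, weight decay, and momentum values are blank in this extract, so the numbers below are explicitly placeholders, and a trivial module stands in for the DA-MgTCN model of Section 3.

```python
import torch
from torch import nn, optim

# Stand-in module for illustration only; the actual DA-MgTCN
# architecture is described in Section 3.
model = nn.Linear(66, 66)

# AdamW = Adam with decoupled weight decay (Loshchilov & Hutter).
base_lr = 1e-3       # placeholder: value not given in this extract
weight_decay = 1e-2  # placeholder: value not given in this extract
optimizer = optim.AdamW(model.parameters(), lr=base_lr, weight_decay=weight_decay)

# Linear warmup over the first 5 of 60 epochs, matching the table above.
warmup_epochs, total_epochs = 5, 60

def lr_lambda(epoch: int) -> float:
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs  # ramp up from lr/5 to lr
    return 1.0  # post-warmup schedule is not specified in this extract

scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)
```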
Prediction error on Human3.6M (MPJPE in mm):

Action | Walking | | | | | | Eating | | | | | |
Time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 | 80 | 160 | 320 | 400 | 560 | 1000
---|---|---|---|---|---|---|---|---|---|---|---|---
Res. sup. [17] | 29.4 | 50.8 | 76.0 | 81.5 | 81.7 | 100.7 | 16.8 | 30.6 | 56.9 | 68.7 | 79.9 | 100.2
convSeq2Seq [11] | 17.7 | 33.5 | 56.3 | 63.6 | 72.2 | 82.3 | 11.0 | 22.4 | 40.7 | 48.4 | 61.3 | 87.1
DMGNN [13] | 17.3 | 30.7 | 54.6 | 65.2 | 73.4 | 95.8 | 11.0 | 21.4 | 36.2 | 43.9 | 58.1 | 86.7
LTD [25] | 12.3 | 23.0 | 39.8 | 46.1 | 54.1 | 59.8 | 8.4 | 16.9 | 33.2 | 40.7 | 53.4 | 77.8
MSR [49] | 12.2 | 22.7 | 38.6 | 45.2 | 52.7 | 63.0 | 8.4 | 17.1 | 33.0 | 40.4 | 52.5 | 77.1
Hisrep [48] | 10.0 | 19.5 | 34.2 | 39.8 | 47.4 | 58.1 | 6.4 | 14.0 | 28.7 | 36.2 | 50.0 | 75.7
ST-DGCN [45] | 10.2 | 19.8 | 34.5 | 40.3 | 48.1 | 56.4 | 7.0 | 15.1 | 30.6 | 38.1 | 51.1 | 76.0
Our model | 10.1 | 19.2 | 33.8 | 40.2 | 46.1 | 55.4 | 7.0 | 14.3 | 30.2 | 38.5 | 48.9 | 72.6

Action | Smoking | | | | | | Discussion | | | | | |
Time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 | 80 | 160 | 320 | 400 | 560 | 1000
---|---|---|---|---|---|---|---|---|---|---|---|---
Res. sup. [17] | 23.0 | 42.6 | 70.1 | 82.7 | 94.8 | 137.4 | 32.9 | 61.2 | 90.9 | 96.2 | 121.3 | 161.7
convSeq2Seq [11] | 11.6 | 22.8 | 41.3 | 48.9 | 60.0 | 81.7 | 17.1 | 34.5 | 64.8 | 77.6 | 98.1 | 129.3
DMGNN [13] | 9.0 | 17.6 | 32.1 | 40.3 | 50.9 | 72.2 | 17.3 | 34.8 | 61.0 | 69.8 | 81.9 | 138.3
LTD [25] | 7.9 | 16.2 | 31.9 | 38.9 | 50.7 | 72.6 | 12.5 | 27.4 | 58.5 | 71.7 | 91.6 | 121.5
MSR [49] | 8.0 | 16.3 | 31.3 | 38.2 | 49.5 | 71.6 | 12.0 | 26.8 | 57.1 | 69.7 | 88.6 | 117.6
Hisrep [48] | 7.0 | 14.9 | 29.9 | 36.4 | 47.6 | 69.5 | 10.2 | 23.4 | 52.1 | 65.4 | 86.6 | 119.8
ST-DGCN [45] | 6.6 | 14.1 | 28.2 | 34.7 | 46.5 | 69.5 | 10.0 | 23.8 | 53.6 | 66.7 | 87.1 | 118.2
Our model | 6.5 | 14.6 | 28.0 | 33.8 | 46.1 | 66.7 | 9.8 | 24.2 | 54.5 | 65.1 | 83.1 | 114.8

Action | Directions | | | | | | Greeting | | | | | |
Time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 | 80 | 160 | 320 | 400 | 560 | 1000
---|---|---|---|---|---|---|---|---|---|---|---|---
Res. sup. [17] | 35.4 | 57.3 | 76.3 | 87.7 | 110.1 | 152.5 | 34.5 | 63.4 | 124.6 | 142.5 | 156.1 | 166.5
convSeq2Seq [11] | 13.5 | 29.0 | 57.6 | 69.7 | 86.6 | 115.8 | 22.0 | 45.0 | 82.0 | 96.0 | 116.9 | 147.3
DMGNN [13] | 13.1 | 24.6 | 64.7 | 81.9 | 110.1 | 115.8 | 23.3 | 50.3 | 107.3 | 132.1 | 152.5 | 157.7
LTD [25] | 9.0 | 19.9 | 43.4 | 53.7 | 71.0 | 101.8 | 18.7 | 38.7 | 77.7 | 93.4 | 115.4 | 148.8
MSR [49] | 8.6 | 19.7 | 43.3 | 53.8 | 71.2 | 100.6 | 16.5 | 37.0 | 77.3 | 93.4 | 116.3 | 147.2
Hisrep [48] | 7.4 | 18.4 | 44.5 | 56.5 | 73.9 | 106.5 | 13.7 | 30.1 | 63.8 | 78.1 | 101.9 | 138.8
ST-DGCN [45] | 7.2 | 17.6 | 40.9 | 51.5 | 69.3 | 100.4 | 15.2 | 34.1 | 71.6 | 87.1 | 110.2 | 143.5
Our model | 6.9 | 17.0 | 40.7 | 49.0 | 68.0 | 98.5 | 14.6 | 33.3 | 68.5 | 86.4 | 112.0 | 135.9

Action | Phoning | | | | | | Posing | | | | | |
Time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 | 80 | 160 | 320 | 400 | 560 | 1000
---|---|---|---|---|---|---|---|---|---|---|---|---
Res. sup. [17] | 38.0 | 69.3 | 115.0 | 126.7 | 141.2 | 131.5 | 36.1 | 69.1 | 130.5 | 157.1 | 194.7 | 240.2
convSeq2Seq [11] | 13.5 | 26.6 | 49.9 | 59.9 | 77.1 | 114.0 | 16.9 | 36.7 | 75.7 | 92.9 | 122.5 | 187.4
DMGNN [13] | 12.5 | 25.8 | 48.1 | 58.3 | 78.9 | 98.6 | 15.3 | 29.3 | 71.5 | 96.7 | 163.9 | 310.1
LTD [25] | 10.2 | 21.0 | 42.5 | 52.3 | 69.2 | 103.1 | 13.7 | 29.9 | 66.6 | 84.1 | 114.5 | 173.0
MSR [49] | 10.1 | 20.7 | 41.5 | 51.3 | 68.3 | 104.4 | 12.8 | 29.4 | 67.0 | 85.0 | 116.3 | 174.3
Hisrep [48] | 8.6 | 18.3 | 39.0 | 49.2 | 67.4 | 105.0 | 10.2 | 24.2 | 58.5 | 75.8 | 107.6 | 178.2
ST-DGCN [45] | 8.3 | 18.3 | 38.7 | 48.4 | 65.9 | 102.7 | 10.7 | 25.7 | 60.0 | 76.6 | 106.1 | 164.8
Our model | 8.3 | 18.1 | 39.2 | 47.9 | 64.5 | 95.6 | 10.4 | 25.4 | 60.5 | 74.8 | 103.2 | 162.2
Action | Purchases | | | | | | Sitting | | | | | |
Time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 | 80 | 160 | 320 | 400 | 560 | 1000
---|---|---|---|---|---|---|---|---|---|---|---|---
Res. sup. [17] | 36.3 | 60.3 | 86.5 | 95.9 | 122.7 | 160.3 | 42.6 | 81.4 | 134.7 | 151.8 | 167.4 | 201.5
convSeq2Seq [11] | 20.3 | 41.8 | 76.5 | 89.9 | 111.3 | 151.5 | 13.5 | 27.0 | 52.0 | 63.1 | 82.4 | 120.7
DMGNN [13] | 21.4 | 38.7 | 75.7 | 92.7 | 118.6 | 153.8 | 11.9 | 25.1 | 44.6 | 50.2 | 60.1 | 104.9
LTD [25] | 15.6 | 32.8 | 65.7 | 79.3 | 102.0 | 143.5 | 10.6 | 21.9 | 46.3 | 57.9 | 78.3 | 119.7
MSR [49] | 14.8 | 32.4 | 66.1 | 79.6 | 101.6 | 139.2 | 10.5 | 22.0 | 46.3 | 57.8 | 78.2 | 120.0
Hisrep [48] | 13.0 | 29.2 | 60.4 | 73.9 | 95.6 | 134.2 | 9.3 | 20.1 | 44.3 | 56.0 | 76.4 | 115.9
ST-DGCN [45] | 12.5 | 28.7 | 60.1 | 73.3 | 95.3 | 133.3 | 8.8 | 19.2 | 42.4 | 53.8 | 74.4 | 116.1
Our model | 12.6 | 29.1 | 59.0 | 72.4 | 91.6 | 128.3 | 8.4 | 18.5 | 40.4 | 52.9 | 72.0 | 113.6

Action | Sitting Down | | | | | | Taking Photo | | | | | |
Time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 | 80 | 160 | 320 | 400 | 560 | 1000
---|---|---|---|---|---|---|---|---|---|---|---|---
Res. sup. [17] | 47.3 | 86.0 | 145.8 | 168.9 | 205.3 | 277.6 | 26.1 | 47.6 | 81.4 | 94.7 | 117.0 | 143.2
convSeq2Seq [11] | 20.7 | 40.6 | 70.4 | 82.7 | 106.5 | 150.3 | 12.7 | 26.0 | 52.1 | 63.6 | 84.4 | 128.1
DMGNN [13] | 15.0 | 32.9 | 77.1 | 93.0 | 122.1 | 168.8 | 13.6 | 29.0 | 46.0 | 58.8 | 91.6 | 120.7
LTD [25] | 16.1 | 31.1 | 61.5 | 75.5 | 100.0 | 150.2 | 9.9 | 20.9 | 45.0 | 56.6 | 77.4 | 119.8
MSR [49] | 16.1 | 31.6 | 62.5 | 76.8 | 102.8 | 155.5 | 9.9 | 21.0 | 44.6 | 56.3 | 77.9 | 121.9
Hisrep [48] | 14.9 | 30.7 | 59.1 | 72.0 | 97.0 | 143.6 | 8.3 | 18.4 | 40.7 | 51.5 | 72.1 | 115.9
ST-DGCN [45] | 13.9 | 27.9 | 57.4 | 71.5 | 96.7 | 147.8 | 8.4 | 18.9 | 42.0 | 53.3 | 74.3 | 118.6
Our model | 13.8 | 27.0 | 58.1 | 72.2 | 95.7 | 143.7 | 8.2 | 18.1 | 40.6 | 51.2 | 70.9 | 117.1

Action | Waiting | | | | | | Walking Dog | | | | | |
Time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 | 80 | 160 | 320 | 400 | 560 | 1000
---|---|---|---|---|---|---|---|---|---|---|---|---
Res. sup. [17] | 30.6 | 57.8 | 106.2 | 121.5 | 146.2 | 196.2 | 64.2 | 102.1 | 141.1 | 164.4 | 191.3 | 209.0
convSeq2Seq [11] | 14.6 | 29.7 | 58.1 | 69.7 | 87.3 | 117.7 | 27.7 | 53.6 | 90.7 | 103.3 | 122.4 | 162.4
DMGNN [13] | 12.2 | 24.2 | 59.6 | 77.5 | 106.0 | 136.7 | 47.1 | 93.3 | 160.1 | 171.2 | 194.0 | 182.3
LTD [25] | 11.4 | 24.0 | 50.1 | 61.5 | 79.4 | 108.1 | 23.4 | 46.2 | 83.5 | 96.0 | 111.9 | 148.9
MSR [49] | 10.7 | 23.1 | 48.3 | 59.2 | 76.3 | 106.3 | 20.7 | 42.9 | 80.4 | 93.3 | 111.9 | 148.2
Hisrep [48] | 8.7 | 19.2 | 43.4 | 54.9 | 74.5 | 108.2 | 20.1 | 40.3 | 73.3 | 86.3 | 108.2 | 146.9
ST-DGCN [45] | 8.9 | 20.1 | 43.6 | 54.3 | 72.2 | 103.4 | 18.8 | 39.3 | 73.7 | 86.4 | 104.7 | 139.8
Our model | 9.2 | 19.9 | 43.6 | 53.0 | 67.3 | 100.8 | 18.5 | 37.7 | 72.8 | 87.6 | 105.8 | 137.2

Action | Walking Together | | | | | | Average | | | | | |
Time (ms) | 80 | 160 | 320 | 400 | 560 | 1000 | 80 | 160 | 320 | 400 | 560 | 1000
---|---|---|---|---|---|---|---|---|---|---|---|---
Res. sup. [17] | 26.8 | 50.1 | 80.2 | 92.2 | 107.6 | 131.1 | 34.7 | 62.0 | 101.1 | 115.5 | 97.6 | 130.5
convSeq2Seq [11] | 15.3 | 30.4 | 53.1 | 61.2 | 72.0 | 87.4 | 16.6 | 33.3 | 61.4 | 72.7 | 90.7 | 124.2
DMGNN [13] | 14.3 | 26.7 | 50.1 | 63.2 | 83.4 | 115.9 | 17.0 | 33.6 | 65.9 | 79.7 | 103.0 | 137.2
LTD [25] | 10.5 | 21.0 | 38.5 | 45.2 | 55.0 | 65.6 | 12.7 | 26.1 | 52.3 | 63.5 | 81.6 | 114.3
MSR [49] | 10.6 | 20.9 | 37.4 | 43.9 | 52.9 | 65.9 | 12.1 | 25.6 | 51.6 | 62.9 | 81.1 | 114.2
Hisrep [48] | 8.9 | 18.4 | 35.1 | 41.9 | 52.7 | 64.9 | 10.4 | 22.6 | 47.1 | 58.3 | 77.3 | 112.1
ST-DGCN [45] | 8.7 | 18.6 | 34.4 | 41.0 | 51.9 | 64.3 | 10.3 | 22.7 | 47.4 | 58.5 | 76.9 | 110.3
Our model | 8.7 | 18.5 | 33.5 | 40.5 | 50.8 | 61.4 | 10.2 | 22.3 | 46.9 | 57.7 | 75.1 | 106.9
Average prediction error on CMU-MoCap (MPJPE in mm):

Time (ms) | 80 | 160 | 320 | 400 | 560 | 1000
---|---|---|---|---|---|---
Res. sup. [17] | 24.0 | 43.0 | 74.5 | 87.2 | 105.5 | 136.3
convSeq2Seq [11] | 12.5 | 22.2 | 40.7 | 49.7 | — | 84.6
DMGNN [13] | 13.6 | 24.1 | 47.0 | 58.8 | 77.4 | 112.6
LTD [25] | 9.3 | 17.1 | 33.0 | 40.9 | 55.8 | 86.2
LPJP [44] | 9.8 | 17.6 | 35.7 | 45.1 | — | 93.2
MSR [49] | 8.1 | 15.2 | 30.6 | 38.6 | 53.7 | 83.0
ST-DGCN [45] | 7.6 | 14.3 | 29.0 | 36.6 | 50.9 | 80.1
Our model (DA-MgTCN) | 7.5 | 14.0 | 28.1 | 34.8 | 49.0 | 77.4
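All results above are the mean per-joint position error (MPJPE) in millimeters at each prediction horizon. For reference, a minimal sketch of how MPJPE is typically computed over predicted 3D joint positions; the tensor shapes and joint count here are illustrative assumptions, not values taken from the paper:

```python
import torch

def mpjpe(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean per-joint position error (mm).

    pred, target: (..., joints, 3) 3D joint positions in millimeters.
    Euclidean distance per joint, averaged over all leading dimensions.
    """
    return torch.norm(pred - target, dim=-1).mean()

# Example with hypothetical shapes: batch of 16, 25 future frames, 22 joints.
pred = torch.randn(16, 25, 22, 3)
target = torch.randn(16, 25, 22, 3)
print(mpjpe(pred, target))              # averaged over the whole horizon
print(mpjpe(pred[:, 9], target[:, 9]))  # error at a single frame (e.g., 400 ms)
```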
 | | Human3.6M MPJPE (mm) | | | | | | CMU-MoCap MPJPE (mm) | | | | | |
Channel-att | MgTCN | 80 | 160 | 320 | 400 | 560 | 1000 | 80 | 160 | 320 | 400 | 560 | 1000
---|---|---|---|---|---|---|---|---|---|---|---|---|---
√ | | 10.4 | 22.9 | 48.1 | 59.0 | 75.9 | 108.7 | 7.7 | 14.5 | 29.0 | 36.1 | 51.9 | 81.7
 | √ | 10.5 | 23.1 | 48.0 | 59.4 | 76.8 | 111.0 | 7.9 | 14.6 | 29.2 | 36.7 | 52.3 | 82.6
√ | √ | 10.2 | 22.4 | 46.9 | 57.7 | 74.8 | 106.7 | 7.5 | 14.0 | 28.1 | 34.8 | 49.0 | 79.4
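The "Channel-att" column above refers to the channel-attention branch of the dual-attention module. The paper's exact DA design is not reproduced in this extract, so the following is only a generic squeeze-and-excitation-style channel-attention sketch illustrating the kind of component being ablated; every name and shape is an illustrative assumption:

```python
import torch
from torch import nn

class ChannelAttention(nn.Module):
    """Generic SE-style channel attention over pose features.

    Illustrative sketch only; not the paper's actual DA module.
    """
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) motion features
        w = self.fc(x.mean(dim=-1))   # squeeze: global average over time
        return x * w.unsqueeze(-1)    # excite: re-weight each channel

# Hypothetical usage: 66 joint coordinates (22 joints x 3) over 50 frames.
x = torch.randn(8, 66, 50)
print(ChannelAttention(66)(x).shape)  # torch.Size([8, 66, 50])
```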
Effect of the number of stacked DA-MgTCN blocks:

DA-MgTCNs | Human3.6M MPJPE (mm) | | | | | | CMU-MoCap MPJPE (mm) | | | | | |
 | 80 | 160 | 320 | 400 | 560 | 1000 | 80 | 160 | 320 | 400 | 560 | 1000
---|---|---|---|---|---|---|---|---|---|---|---|---
6 | 11.5 | 25.0 | 51.9 | 65.6 | 81.1 | 118.0 | 8.4 | 15.5 | 31.7 | 39.5 | 54.3 | 88.3
8 | 10.6 | 23.5 | 49.0 | 61.4 | 78.3 | 111.1 | 7.9 | 14.8 | 29.8 | 37.1 | 52.4 | 82.9
10 | 10.2 | 22.3 | 46.9 | 57.7 | 74.8 | 106.7 | 7.5 | 14.0 | 28.1 | 34.8 | 49.0 | 77.4
12 | 10.3 | 22.7 | 47.9 | 58.6 | 74.6 | 106.5 | 7.8 | 13.8 | 28.3 | 35.3 | 50.9 | 78.8
14 | 10.3 | 22.8 | 46.8 | 58.0 | 76.3 | 108.1 | 7.6 | 14.3 | 28.9 | 35.6 | 49.2 | 78.4
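For intuition on what a multi-granularity temporal convolution block might look like, here is a generic sketch with parallel 1D convolutions of different kernel sizes fused and wrapped in a local residual connection (cf. Sections 3.4 and 3.5). It is an illustrative assumption, not the paper's exact MgTCN; kernel sizes, channel counts, and names are placeholders:

```python
import torch
from torch import nn

class MultiGranularityTCN(nn.Module):
    """Illustrative multi-granularity temporal convolution block.

    Parallel temporal convolutions with different kernel sizes capture
    motion dynamics at several temporal granularities; a 1x1 convolution
    fuses the branches. Generic sketch only, not the paper's exact block.
    """
    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2)
            for k in kernel_sizes
        )
        self.fuse = nn.Conv1d(channels * len(kernel_sizes), channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); odd kernels with k//2 padding keep length
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.fuse(feats)  # local residual connection (Section 3.5)

# Hypothetical usage: 66 pose channels over 50 frames.
x = torch.randn(8, 66, 50)
print(MultiGranularityTCN(66)(x).shape)  # torch.Size([8, 66, 50])
```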
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Huang, B.; Li, X. Human Motion Prediction via Dual-Attention and Multi-Granularity Temporal Convolutional Networks. Sensors 2023, 23, 5653. https://doi.org/10.3390/s23125653