Attention-Based Variational Autoencoder Models for Human–Human Interaction Recognition via Generation
Abstract
1. Introduction
- The efficiency, generation accuracy, and classification accuracy of the three models (M1, M2, M3) are analyzed on benchmark datasets in both first-person (FP) and third-person (TP) environments. M1 yields the highest classification accuracy, followed closely by M2. In each environment, accuracy correlates with the number of trainable parameters. No model is a clear winner in generation accuracy.
- Three action selection methods (i.e., where to attend) are analyzed for each of M1, M2, and M3. Classification accuracy is comparable whether the sampling locations are determined from the prediction error (without any weighting) or from learned weights (without involving the prediction error); however, the latter is less efficient in terms of model size.
2. Related Work
3. Models and Methods
3.1. Preliminaries
3.2. Problem Statement
3.3. Models
Algorithm 1 Learning the proposed network
Algorithm 2
Algorithm 3
3.4. Agent Architecture
- Environment. The environment is the source of sensory data. It is time-varying.
- Observation. The agent interacts with the environment via a sequence of eye and body movements. The observations, sampled from the environment at each time instant, are in two modalities: perceptual and proprioceptive.
- Pattern completion. A multimodal variational recurrent neural network (MVRNN) for variable-length sequences is used for completing the pattern for each modality. Recognition and generation are the two processes involved in the operation of an MVRNN.
- Action selection. In the proposed models, action selection decides the weight (attention) given to each location in the environment when sampling the current observation. At any time t, a saliency map is computed for modality i, from which the action is determined. The saliency map assigns a salience score to each location l. There are 15 locations corresponding to the 15 skeleton joints: head (J1), neck (J2), torso (J3), left shoulder (J4), left elbow (J5), left hand (J6), right shoulder (J7), right elbow (J8), right hand (J9), left hip (J10), left knee (J11), left foot (J12), right hip (J13), right knee (J14), right foot (J15). As in [11], we compute the weights in three ways: from the prediction error alone (pe), from learned weights alone (lw), and from learned weights combined with the prediction error (lwpe); a minimal sketch of these schemes follows this list.
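The weighting schemes referred to above can be illustrated with a short sketch. This is a minimal illustration rather than the authors' implementation: it assumes the per-joint prediction error is available from the MVRNN, that the learned weights form a trainable vector over the 15 joints, that salience scores are normalized with a softmax, and that the baseline (bs) corresponds to uniform weighting. Function and variable names are ours.

```python
import numpy as np

NUM_JOINTS = 15  # J1 (head) ... J15 (right foot)

def saliency_map(pred_error, learned_w=None, scheme="pe"):
    """Toy saliency map over the 15 joint locations for one modality at time t.

    pred_error: (15,) per-joint prediction error of the MVRNN's one-step prediction.
    learned_w:  (15,) trainable attention weights.
    The scheme labels mirror those used in the result tables (pe, lw, lwpe).
    """
    if scheme == "pe":        # salience from the prediction error alone
        s = pred_error
    elif scheme == "lw":      # salience from the learned weights alone
        s = learned_w
    elif scheme == "lwpe":    # prediction error modulated by the learned weights
        s = learned_w * pred_error
    else:                     # assumed baseline (bs): uniform attention
        s = np.ones(NUM_JOINTS)
    s = np.exp(s - s.max())   # softmax normalization (an assumption)
    return s / s.sum()

# Example: pick the most salient joint as the location to attend to.
rng = np.random.default_rng(0)
attention = saliency_map(rng.random(NUM_JOINTS), rng.random(NUM_JOINTS), scheme="lwpe")
attended_joint = int(attention.argmax())  # 0 corresponds to J1 (head)
```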
4. Experimental Results
4.1. Datasets
- (1) The SBU Kinect Interaction Dataset [52] is a two-person interaction dataset comprising eight interactions: approaching, departing, pushing, kicking, punching, exchanging objects, hugging, and shaking hands. The data are recorded from seven participants, forming a total of 21 sets such that each set consists of a unique pair of participants performing all actions. The dataset has approximately 300 interactions of duration 9 to 46 frames. The dataset is divided into five distinct train–test splits as in [52].
- (2) The K3HI: Kinect-Based 3D Human Interaction Dataset [53] is a two-person interaction dataset comprising eight interactions: approaching, departing, kicking, punching, pointing, pushing, exchanging an object, and shaking hands. The data are recorded from 15 volunteers. Each pair of participants performs all the actions. The dataset has approximately 320 interactions of duration 20 to 104 frames. The dataset is divided into four distinct train–test splits as in [53].
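Both datasets provide variable-length sequences of two 15-joint skeletons in 3D (see the duration ranges above). As a rough illustration of how such samples can be arranged for a recurrent model, the sketch below pads each interaction to a common length; the (T, 2, 15, 3) layout, the padding value, and the function name are our assumptions, not the datasets' native format.

```python
import numpy as np

def pad_interactions(samples, pad_value=0.0):
    """Stack variable-length two-person interactions into one batch.

    samples: list of arrays of shape (T_i, 2, 15, 3), i.e., T_i frames,
    2 skeletons, 15 joints, 3D coordinates (layout assumed here).
    Returns a padded batch of shape (N, T_max, 2, 15, 3) plus the true
    lengths, which a recurrent model can use to mask the padded frames.
    """
    lengths = np.array([s.shape[0] for s in samples])
    t_max = int(lengths.max())
    batch = np.full((len(samples), t_max, 2, 15, 3), pad_value, dtype=np.float32)
    for i, s in enumerate(samples):
        batch[i, : s.shape[0]] = s
    return batch, lengths

# Example with two dummy interactions of 9 and 46 frames (SBU's duration range)
demo = [np.zeros((9, 2, 15, 3)), np.zeros((46, 2, 15, 3))]
batch, lengths = pad_interactions(demo)
print(batch.shape, lengths)  # (2, 46, 2, 15, 3) [ 9 46]
```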
4.2. Experimental Setup
4.3. Evaluation
- First person: Here we model the agent as the first person (one of the two skeletons). Its body constitutes its internal environment, while the other skeleton constitutes its external (visual) environment. Two modalities are used in our model (see Figure 1a): (i) visual perception, which captures the other skeleton’s 3D joint coordinates, and (ii) body proprioception, which captures the first skeleton’s 3D joint coordinates. The objective function follows Equations (9)–(11).
- Third person: Here we model the agent as a third person (e.g., an audience member). The two interacting skeletons constitute the agent’s external (visual) environment. One modality is used in our model (see Figure 1b): visual perception, which captures both skeletons’ 3D joint coordinates. The objective function follows Equations (9)–(11).
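The two settings differ only in how the two skeletons are routed to modalities. The sketch below arranges one frame of an interaction accordingly; the function name, dictionary keys, and array layout are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def build_observation(skel_agent, skel_other, third_person=False):
    """Arrange one frame of a two-person interaction into modalities.

    skel_agent, skel_other: (15, 3) joint coordinates of the two skeletons.
    First person: vision carries the other skeleton, proprioception the agent's own body.
    Third person: a single visual modality carries both skeletons.
    """
    if third_person:
        return {"vision": np.stack([skel_agent, skel_other])}  # (2, 15, 3)
    return {
        "vision": skel_other,          # other person's joints (visual perception)
        "proprioception": skel_agent,  # agent's own joints (body proprioception)
    }

frame_a, frame_b = np.zeros((15, 3)), np.ones((15, 3))
obs_fp = build_observation(frame_a, frame_b)                      # two modalities
obs_tp = build_observation(frame_a, frame_b, third_person=True)   # one modality
```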
4.4. Evaluation Results
4.4.1. Qualitative Evaluation
4.4.2. Evaluation for Generation Accuracy
4.4.3. Evaluation for Classification Accuracy
4.4.4. Analysis of Action Selection
4.4.5. Evaluation for Efficiency
4.5. Design Evaluation for Different Models
4.5.1. Handling Missing Class Labels
4.5.2. Number of Trainable Parameters
4.5.3. Training Time
4.5.4. End-to-End Training
5. Discussion
5.1. Limitations of the Proposed Approach
5.1.1. Limited Interaction Context
5.1.2. Limited Interaction Modalities
5.1.3. Need for Labeled Training Data
5.2. Future Work
5.2.1. Incorporate More Interaction Context
5.2.2. Incorporate Multiple Interaction Modalities
5.2.3. Alleviate the Need for Labeled Training Data
5.2.4. Experiment with Other Models
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Loss Function Derivation
Appendix A.1. Model M1
Appendix A.2. Model M2
Appendix A.3. Model M3
Appendix B. Experimental Results (Details)
Appendix B.1. Efficiency Evaluation
References
- Lokesh, R.; Sullivan, S.; Calalo, J.A.; Roth, A.; Swanik, B.; Carter, M.J.; Cashaback, J.G.A. Humans utilize sensory evidence of others’ intended action to make online decisions. Sci. Rep. 2022, 12, 8806. [Google Scholar] [CrossRef] [PubMed]
- Byom, L.J.; Mutlu, B. Theory of mind: Mechanisms, methods, and new directions. Front. Hum. Neurosci. 2013, 7, 413. [Google Scholar] [CrossRef] [PubMed]
- Huang, C.M.; Andrist, S.; Sauppé, A.; Mutlu, B. Using gaze patterns to predict task intent in collaboration. Front. Psychol. 2015, 6, 1049. [Google Scholar] [CrossRef] [PubMed]
- Wetherby, A.M.; Prizant, B.M. The expression of communicative intent: Assessment guidelines. Semin. Speech Lang. 1989, 10, 77–91. [Google Scholar] [CrossRef]
- Woodward, A.L. Infants’ grasp of others’ intentions. Curr. Dir. Psychol. Sci. 2009, 18, 53–57. [Google Scholar] [CrossRef] [PubMed]
- Woodward, A.L.; Sommerville, J.A.; Gerson, S.; Henderson, A.M.; Buresh, J. The emergence of intention attribution in infancy. Psychol. Learn. Motiv. 2009, 51, 187–222. [Google Scholar] [PubMed]
- Jain, S.; Argall, B. Probabilistic human intent recognition for shared autonomy in assistive robotics. ACM Trans. Hum.-Robot Interact. 2019, 9, 1–23. [Google Scholar] [CrossRef] [PubMed]
- Losey, D.P.; McDonald, C.G.; Battaglia, E.; O’Malley, M.K. A review of intent detection, arbitration, and communication aspects of shared control for physical human–robot interaction. Appl. Mech. Rev. 2018, 70, 010804. [Google Scholar] [CrossRef]
- Xie, D.; Shu, T.; Todorovic, S.; Zhu, S.C. Learning and inferring “dark matter” and predicting human intents and trajectories in videos. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1639–1652. [Google Scholar] [CrossRef] [PubMed]
- Camara, F.; Bellotto, N.; Cosar, S.; Weber, F.; Nathanael, D.; Althoff, M.; Wu, J.; Ruenz, J.; Dietrich, A.; Markkula, G.; et al. Pedestrian models for autonomous driving part ii: High-level models of human behavior. IEEE Trans. Intell. Transp. Syst. 2020, 22, 5453–5472. [Google Scholar] [CrossRef]
- Baruah, M.; Banerjee, B.; Nagar, A.K. Intent prediction in human–human interactions. IEEE Trans. Hum.-Mach. Syst. 2023, 53, 458–463. [Google Scholar] [CrossRef]
- Baruah, M.; Banerjee, B. The perception-action loop in a predictive agent. In Proceedings of the CogSci, Virtual, 29 July–1 August 2020; pp. 1171–1177. [Google Scholar]
- Baruah, M.; Banerjee, B.; Nagar, A.K. An attention-based predictive agent for static and dynamic environments. IEEE Access 2022, 10, 17310–17317. [Google Scholar] [CrossRef]
- Banerjee, B.; Baruah, M. An attention-based predictive agent for handwritten numeral/alphabet recognition via generation. In Proceedings of the NeuRIPS Workshop on Gaze Meets ML, New Orleans, LA, USA, 10 December 2023. [Google Scholar]
- Baruah, M.; Banerjee, B. A multimodal predictive agent model for human interaction generation. In Proceedings of the CVPR Workshops, Seattle, WA, USA, 15 June 2020. [Google Scholar]
- Baruah, M.; Banerjee, B. Speech emotion recognition via generation using an attention-based variational recurrent neural network. In Proceedings of the Interspeech, Incheon, Republic of Korea, 18–22 September 2022; pp. 4710–4714. [Google Scholar]
- Lukander, K.; Toivanen, M.; Puolamäki, K. Inferring intent and action from gaze in naturalistic behavior: A review. Int. J. Mob. Hum. Comput. Interact. 2017, 9, 41–57. [Google Scholar] [CrossRef]
- Kong, Y.; Fu, Y. Human action recognition and prediction: A survey. Int. J. Comput. Vis. 2022, 130, 1366–1401. [Google Scholar] [CrossRef]
- Xu, Y.T.; Li, Y.; Meger, D. Human Motion Prediction via Pattern Completion in Latent Representation Space. In Proceedings of the Computer and Robot Vision, Kingston, QC, Canada, 29–31 May 2019; pp. 57–64. [Google Scholar]
- Chopin, B.; Otberdout, N.; Daoudi, M.; Bartolo, A. Human Motion Prediction Using Manifold-Aware Wasserstein GAN. arXiv 2021, arXiv:2105.08715. [Google Scholar]
- Vinayavekhin, P.; Chaudhury, S.; Munawar, A.; Agravante, D.J.; De Magistris, G.; Kimura, D.; Tachibana, R. Focusing on what is relevant: Time-series learning and understanding using attention. In Proceedings of the ICPR, Beijing, China, 20–24 August 2018; pp. 2624–2629. [Google Scholar]
- Hoshen, Y. Vain: Attentional multi-agent predictive modeling. In Proceedings of the NIPS, Long Beach, CA, USA, 4–9 December 2017; pp. 2701–2711. [Google Scholar]
- Vemula, A.; Muelling, K.; Oh, J. Social attention: Modeling attention in human crowds. In Proceedings of the ICRA, Brisbane, Australia, 21–25 May 2018; pp. 1–7. [Google Scholar]
- Varshneya, D.; Srinivasaraghavan, G. Human trajectory prediction using spatially aware deep attention models. arXiv 2017, arXiv:1705.09436. [Google Scholar]
- Fernando, T.; Denman, S.; Sridharan, S.; Fookes, C. Soft+ hardwired attention: An LSTM framework for human trajectory prediction and abnormal event detection. Neural Netw. 2018, 108, 466–478. [Google Scholar] [CrossRef] [PubMed]
- Adeli, V.; Adeli, E.; Reid, I.; Niebles, J.C.; Rezatofighi, H. Socially and contextually aware human motion and pose forecasting. IEEE Robot. Autom. Lett. 2020, 5, 6033–6040. [Google Scholar] [CrossRef]
- Kothari, P.; Kreiss, S.; Alahi, A. Human trajectory forecasting in crowds: A deep learning perspective. IEEE Trans. Intell. Transp. Syst. 2021, 23, 7386–7400. [Google Scholar] [CrossRef]
- Huang, D.; Kitani, K. Action-reaction: Forecasting the dynamics of human interaction. In Proceedings of the ECCV, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 489–504. [Google Scholar]
- Yao, T.; Wang, M.; Ni, B.; Wei, H.; Yang, X. Multiple granularity group interaction prediction. In Proceedings of the CVPR, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2246–2254. [Google Scholar]
- Ng, E.; Xiang, D.; Joo, H.; Grauman, K. You2me: Inferring body pose in egocentric video via first and second person interactions. In Proceedings of the CVPR, Seattle, WA, USA, 13–19 June 2020; pp. 9890–9900. [Google Scholar]
- Yu, J.; Gao, H.; Yang, W.; Jiang, Y.; Chin, W.; Kubota, N.; Ju, Z. A discriminative deep model with feature fusion and temporal attention for human action recognition. IEEE Access 2020, 8, 43243–43255. [Google Scholar] [CrossRef]
- Li, C.; Zhong, Q.; Xie, D.; Pu, S. Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation. arXiv 2018, arXiv:1804.06055. [Google Scholar]
- Manzi, A.; Fiorini, L.; Limosani, R.; Dario, P.; Cavallo, F. Two-person activity recognition using skeleton data. IET Comput. Vis. 2018, 12, 27–35. [Google Scholar] [CrossRef]
- Song, S.; Lan, C.; Xing, J.; Zeng, W.; Liu, J. An end-to-end spatio-temporal attention model for human action recognition from skeleton data. In Proceedings of the AAAI, San Francisco, CA, USA, 4–9 February 2017; pp. 4263–4270. [Google Scholar]
- Fan, Z.; Zhao, X.; Lin, T.; Su, H. Attention-based multiview re-observation fusion network for skeletal action recognition. IEEE Trans. Multimed. 2018, 21, 363–374. [Google Scholar] [CrossRef]
- Le, T.M.; Inoue, N.; Shinoda, K. A fine-to-coarse convolutional neural network for 3D human action recognition. arXiv 2018, arXiv:1805.11790. [Google Scholar]
- Baradel, F.; Wolf, C.; Mille, J. Pose-conditioned spatio-temporal attention for human action recognition. arXiv 2017, arXiv:1703.10106. [Google Scholar]
- Qin, Y.; Mo, L.; Li, C.; Luo, J. Skeleton-based action recognition by part-aware graph convolutional networks. Vis. Comput. 2020, 36, 621–631. [Google Scholar] [CrossRef]
- Li, M.; Leung, H. Multi-view depth-based pairwise feature learning for person-person interaction recognition. Multimed. Tools Appl. 2019, 78, 5731–5749. [Google Scholar] [CrossRef]
- Kundu, J.N.; Buckchash, H.; Mandikal, P.; Jamkhandi, A.; Radhakrishnan, V.B. Cross-conditioned recurrent networks for long-term synthesis of inter-person human motion interactions. In Proceedings of the WACV, Snowmass Village, CO, USA, 1–5 March 2020; pp. 2724–2733. [Google Scholar]
- Chopin, B.; Tang, H.; Otberdout, N.; Daoudi, M.; Sebe, N. Interaction Transformer for Human Reaction Generation. arXiv 2022, arXiv:2207.01685. [Google Scholar] [CrossRef]
- Men, Q.; Shum, H.P.H.; Ho, E.S.L.; Leung, H. GAN-based reactive motion synthesis with class-aware discriminators for human–human interaction. Comput. Graph. 2022, 102, 634–645. [Google Scholar] [CrossRef]
- Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, 4th ed.; Prentice Hall: Englewood Cliffs, NJ, USA, 2020. [Google Scholar]
- Han, J.; Waddington, G.; Adams, R.; Anson, J.; Liu, Y. Assessing proprioception: A critical review of methods. J. Sport Health Sci. 2016, 5, 80–90. [Google Scholar] [CrossRef] [PubMed]
- Goodfellow, I. NIPS 2016 tutorial: Generative adversarial networks. arXiv 2016, arXiv:1701.00160. [Google Scholar]
- Kingma, D.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
- Spratling, M. Predictive coding as a model of the V1 saliency map hypothesis. Neural Netw. 2012, 26, 7–28. [Google Scholar] [CrossRef] [PubMed]
- Friston, K.J.; Daunizeau, J.; Kiebel, S.J. Reinforcement learning or active inference? PLoS ONE 2009, 4, e6421. [Google Scholar] [CrossRef] [PubMed]
- Kingma, D.P.; Mohamed, S.; Rezende, D.J.; Welling, M. Semi-supervised learning with deep generative models. In Proceedings of the NIPS, Cambridge, MA, USA, 8–13 December 2014; pp. 3581–3589. [Google Scholar]
- Chung, J.; Kastner, K.; Dinh, L.; Goel, K.; Courville, A.C.; Bengio, Y. A recurrent latent variable model for sequential data. In Proceedings of the NIPS, Cambridge, MA, USA, 7–12 December 2015; pp. 2980–2988. [Google Scholar]
- Wu, M.; Goodman, N. Multimodal generative models for scalable weakly-supervised learning. In Proceedings of the NIPS, Red Hook, NY, USA, 3–8 December 2018; pp. 5575–5585. [Google Scholar]
- Yun, K.; Honorio, J.; Chattopadhyay, D.; Berg, T.; Samaras, D. Two-person interaction detection using body-pose features and multiple instance learning. In Proceedings of the CVPR Workshops, Providence, RI, USA, 16–21 June 2012; pp. 28–35. [Google Scholar]
- Hu, T.; Zhu, X.; Guo, W.; Su, K. Efficient interaction recognition through positive action representation. Math. Probl. Eng. 2013, 2013, 795360. [Google Scholar] [CrossRef]
- Nguyen, X.S. GeomNet: A Neural Network Based on Riemannian Geometries of SPD Matrix Space and Cholesky Space for 3D Skeleton-Based Interaction Recognition. In Proceedings of the ICCV, Virtual, 16 October 2021; pp. 13379–13389. [Google Scholar]
- Li, M.; Leung, H. Multiview skeletal interaction recognition using active joint interaction graph. IEEE Trans. Multimed. 2016, 18, 2293–2302. [Google Scholar] [CrossRef]
- Verma, A.; Meenpal, T.; Acharya, B. Multiperson interaction recognition in images: A body keypoint based feature image analysis. Comput. Intell. 2021, 37, 461–483. [Google Scholar] [CrossRef]
- Zhu, W.; Lan, C.; Xing, J.; Zeng, W.; Li, Y.; Shen, L.; Xie, X. Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks. In Proceedings of the AAAI, Phoenix, AZ, USA, 12–17 February 2016; Volume 30. [Google Scholar]
- Liu, J.; Shahroudy, A.; Xu, D.; Kot, A.C.; Wang, G. Skeleton-based action recognition using spatio-temporal LSTM network with trust gates. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 3007–3021. [Google Scholar] [CrossRef] [PubMed]
- Du, Y.; Wang, W.; Wang, L. Hierarchical recurrent neural network for skeleton based action recognition. In Proceedings of the CVPR, Boston, MA, USA, 7–12 June 2015; pp. 1110–1118. [Google Scholar]
- Hu, T.; Zhu, X.; Wang, S.; Duan, L. Human interaction recognition using spatial-temporal salient feature. Multimed. Tools Appl. 2019, 78, 28715–28735. [Google Scholar] [CrossRef]
- Banerjee, B.; Kapourchali, M.H.; Baruah, M.; Deb, M.; Sakauye, K.; Olufsen, M. Synthesizing skeletal motion and physiological signals as a function of a virtual human’s actions and emotions. In Proceedings of the SIAM International Conference on Data Mining, Virtual Event, 29 April–1 May 2021; pp. 684–692. [Google Scholar]
- Tsai, Y.H.H.; Bai, S.; Liang, P.P.; Kolter, J.Z.; Morency, L.P.; Salakhutdinov, R. Multimodal transformer for unaligned multimodal language sequences. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; Volume 2019, pp. 6558–6569. [Google Scholar]
- Banerjee, B.; Dutta, J.K. SELP: A general-purpose framework for learning the norms from saliencies in spatiotemporal data. Neurocomputing 2014, 138, 41–60. [Google Scholar] [CrossRef]
- Banerjee, B. Multi-Sensor Device for Environment State Estimation and Prediction by Sampling Its Own Sensors and Other Devices. U.S. Patent App. 16/719,828, 23 December 2021. [Google Scholar]
- Kapourchali, M.H.; Banerjee, B. State estimation via communication for monitoring. IEEE Trans. Emerg. Top. Comput. Intell. 2020, 4, 786–793. [Google Scholar] [CrossRef]
- Kapourchali, M.H.; Banerjee, B. EPOC: Efficient perception via optimal communication. In Proceedings of the AAAI, New York, NY, USA, 7–12 February 2020; pp. 4107–4114. [Google Scholar]
- Najnin, S.; Banerjee, B. Emergence of vocal developmental sequences in a predictive coding model of speech acquisition. In Proceedings of the Interspeech, San Francisco, CA, USA, 8–12 September 2016; pp. 1113–1117. [Google Scholar]
- Najnin, S.; Banerjee, B. A predictive coding framework for a developmental agent: Speech motor skill acquisition and speech production. Speech Commun. 2017, 92, 24–41. [Google Scholar] [CrossRef]
Model | Appr. | Depart | Kick | Push | Sh. Hands | Hug | Exc. Ob. | Punch | Avg. AFD |
---|---|---|---|---|---|---|---|---|---|
M1 (bs) | 0.031, 0.02 | 0.034, 0.02 | 0.072, 0.04 | 0.044, 0.02 | 0.032, 0.01 | 0.060, 0.02 | 0.037, 0.05 | 0.053, 0.02 | 0.045, 0.01 |
M2 (bs) | 0.026, 0.01 | 0.028, 0.02 | 0.064, 0.03 | 0.043, 0.02 | 0.031, 0.02 | 0.055, 0.02 | 0.032, 0.01 | 0.046, 0.02 | 0.041, 0.01 |
M3 (bs) | 0.020, 0.01 | 0.023, 0.02 | 0.050, 0.03 | 0.030, 0.01 | 0.021, 0.01 | 0.042, 0.02 | 0.024, 0.01 | 0.036, 0.02 | 0.031, 0.01 |
M1 (pe) | 0.102, 0.07 | 0.125, 0.10 | 0.244, 0.27 | 0.129, 0.10 | 0.112, 0.06 | 0.171, 0.11 | 0.132, 0.10 | 0.170, 0.11 | 0.148, 0.04 |
M2 (pe) | 0.092, 0.06 | 0.100, 0.07 | 0.228, 0.20 | 0.131, 0.08 | 0.113, 0.06 | 0.170, 0.07 | 0.126, 0.11 | 0.159, 0.11 | 0.140, 0.04 |
M3 (pe) | 0.065, 0.05 | 0.085, 0.06 | 0.189, 0.28 | 0.093, 0.10 | 0.076, 0.03 | 0.129, 0.07 | 0.092, 0.10 | 0.126, 0.12 | 0.107, 0.04 |
M1 (lwpe) | 0.028, 0.02 | 0.033, 0.02 | 0.071, 0.04 | 0.043, 0.02 | 0.032, 0.03 | 0.059, 0.03 | 0.035, 0.01 | 0.052, 0.02 | 0.044, 0.01 |
M2 (lwpe) | 0.029, 0.02 | 0.033, 0.02 | 0.077, 0.04 | 0.046, 0.02 | 0.033, 0.03 | 0.062, 0.02 | 0.036, 0.01 | 0.056, 0.02 | 0.047, 0.02 |
M3 (lwpe) | 0.026, 0.02 | 0.030, 0.02 | 0.067, 0.04 | 0.040, 0.02 | 0.027, 0.02 | 0.052, 0.02 | 0.033, 0.02 | 0.047, 0.02 | 0.040, 0.01 |
M1 (lw) | 0.032, 0.02 | 0.035, 0.02 | 0.072, 0.04 | 0.045, 0.02 | 0.032, 0.02 | 0.057, 0.02 | 0.036, 0.02 | 0.052, 0.02 | 0.045, 0.01 |
M2 (lw) | 0.061, 0.05 | 0.066, 0.07 | 0.146, 0.10 | 0.102, 0.05 | 0.076, 0.06 | 0.125, 0.07 | 0.082, 0.05 | 0.113, 0.07 | 0.096, 0.03 |
M3 (lw) | 0.020, 0.01 | 0.023, 0.02 | 0.052, 0.03 | 0.031, 0.02 | 0.022, 0.01 | 0.043, 0.02 | 0.025, 0.01 | 0.037, 0.02 | 0.032, 0.01 |
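In the tables above and below, each cell appears to list the mean and standard deviation of the average frame distance (AFD) per interaction class; lower is better. For reference, the sketch below computes one common variant of such a metric (mean per-joint Euclidean distance over all frames); the paper's exact definition and normalization are not reproduced here and should be treated as an assumption.

```python
import numpy as np

def average_frame_distance(generated, ground_truth):
    """AFD between two skeleton sequences of shape (T, J, 3).

    Takes the Euclidean distance of each joint in each frame and averages
    over joints and frames (one widely used convention; the paper's exact
    normalization is assumed here).
    """
    per_joint = np.linalg.norm(generated - ground_truth, axis=-1)  # (T, J)
    return float(per_joint.mean())

# Example: a constant 0.03 offset along one axis gives an AFD of ~0.03
gt = np.zeros((30, 15, 3))
gen = gt.copy()
gen[..., 0] += 0.03
print(average_frame_distance(gen, gt))  # ~0.03
```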
Model | Appr. | Depart | Kick | Push | Sh. Hands | Hug | Exc. Ob. | Punch | Avg. AFD |
---|---|---|---|---|---|---|---|---|---|
M1 (bs) | 0.040, 0.03 | 0.043, 0.03 | 0.097, 0.05 | 0.059, 0.03 | 0.042, 0.03 | 0.075, 0.04 | 0.046, 0.01 | 0.067, 0.03 | 0.059, 0.02 |
M2 (bs) | 0.056, 0.04 | 0.058, 0.04 | 0.134, 0.08 | 0.083, 0.04 | 0.056, 0.05 | 0.100, 0.05 | 0.063, 0.02 | 0.092, 0.05 | 0.080, 0.03 |
M3 (bs) | 0.026, 0.02 | 0.030, 0.02 | 0.072, 0.04 | 0.042, 0.02 | 0.028, 0.02 | 0.056, 0.02 | 0.034, 0.01 | 0.049, 0.02 | 0.042, 0.02 |
M1 (pe) | 0.098, 0.04 | 0.101, 0.04 | 0.215, 0.08 | 0.114, 0.07 | 0.172, 0.07 | 0.108, 0.04 | 0.152, 0.04 | 0.152, 0.04 | 0.137, 0.04 |
M2 (pe) | 0.118, 0.06 | 0.129, 0.06 | 0.279, 0.11 | 0.171, 0.08 | 0.126, 0.08 | 0.215, 0.06 | 0.126, 0.04 | 0.186, 0.04 | 0.169, 0.06 |
M3 (pe) | 0.068, 0.04 | 0.079, 0.04 | 0.184, 0.07 | 0.107, 0.04 | 0.082, 0.04 | 0.141, 0.06 | 0.082, 0.03 | 0.120, 0.03 | 0.108, 0.04 |
M1 (lwpe) | 0.046, 0.04 | 0.054, 0.05 | 0.121, 0.06 | 0.072, 0.03 | 0.051, 0.03 | 0.095, 0.04 | 0.059, 0.02 | 0.083, 0.03 | 0.073, 0.02 |
M2 (lwpe) | 0.078, 0.06 | 0.084, 0.09 | 0.177, 0.10 | 0.108, 0.04 | 0.079, 0.04 | 0.144, 0.08 | 0.089, 0.04 | 0.133, 0.07 | 0.111, 0.04 |
M3 (lwpe) | 0.038, 0.03 | 0.044, 0.03 | 0.095, 0.05 | 0.055, 0.02 | 0.039, 0.04 | 0.073, 0.03 | 0.046, 0.02 | 0.065, 0.02 | 0.057, 0.02 |
M1 (lw) | 0.042, 0.03 | 0.047, 0.03 | 0.108, 0.07 | 0.063, 0.03 | 0.044, 0.04 | 0.077, 0.04 | 0.048, 0.01 | 0.071, 0.03 | 0.062, 0.02 |
M2 (lw) | 0.076, 0.09 | 0.119, 0.22 | 0.191, 0.18 | 0.124, 0.10 | 0.092, 0.08 | 0.155, 0.14 | 0.101, 0.10 | 0.139, 0.11 | 0.125, 0.04 |
M3 (lw) | 0.028, 0.02 | 0.033, 0.02 | 0.078, 0.04 | 0.042, 0.02 | 0.029, 0.02 | 0.057, 0.02 | 0.034, 0.01 | 0.050, 0.02 | 0.044, 0.02 |
Model | Appr. | Depart | Exc. Ob. | Kick | Point | Punch | Push | Sh. Hands | Avg. AFD |
---|---|---|---|---|---|---|---|---|---|
M1 (bs) | 0.153, 0.99 | 0.015, 0.01 | 0.006, 0.01 | 0.011, 0.01 | 0.007, 0.00 | 0.010, 0.01 | 0.010, 0.00 | 0.006, 0.00 | 0.027, 0.05 |
M2 (bs) | 0.146, 1.0 | 0.016, 0.01 | 0.006, 0.01 | 0.012, 0.01 | 0.008, 0.00 | 0.010, 0.01 | 0.010, 0.01 | 0.006, 0.00 | 0.027, 0.05 |
M3 (bs) | 0.143, 0.85 | 0.022, 0.02 | 0.013, 0.01 | 0.022, 0.03 | 0.016, 0.02 | 0.020, 0.03 | 0.019, 0.02 | 0.012, 0.02 | 0.033, 0.04 |
M1 (pe) | 0.135, 0.74 | 0.037, 0.03 | 0.020, 0.01 | 0.033, 0.02 | 0.025, 0.02 | 0.026, 0.02 | 0.027, 0.01 | 0.019, 0.02 | 0.040, 0.04 |
M2 (pe) | 0.136, 0.66 | 0.048, 0.03 | 0.029, 0.02 | 0.052, 0.03 | 0.038, 0.03 | 0.039, 0.02 | 0.041, 0.02 | 0.031, 0.02 | 0.052, 0.03 |
M3 (pe) | 0.126, 0.61 | 0.041, 0.03 | 0.021, 0.01 | 0.038, 0.03 | 0.028, 0.03 | 0.029, 0.03 | 0.031, 0.02 | 0.021, 0.02 | 0.042, 0.03 |
M1 (lwpe) | 0.143, 0.87 | 0.017, 0.02 | 0.007, 0.01 | 0.013, 0.01 | 0.010, 0.02 | 0.011, 0.01 | 0.011, 0.01 | 0.007, 0.01 | 0.027, 0.05 |
M2 (lwpe) | 0.148, 0.91 | 0.020, 0.02 | 0.009, 0.01 | 0.016, 0.01 | 0.012, 0.01 | 0.013, 0.01 | 0.013, 0.01 | 0.009, 0.01 | 0.030, 0.05 |
M3 (lwpe) | 0.135, 0.75 | 0.029, 0.03 | 0.017, 0.02 | 0.031, 0.05 | 0.021, 0.03 | 0.026, 0.04 | 0.027, 0.04 | 0.017, 0.03 | 0.038, 0.04 |
M1 (lw) | 0.164, 1.1 | 0.016, 0.01 | 0.006, 0.01 | 0.012, 0.01 | 0.007, 0.00 | 0.009, 0.01 | 0.009, 0.01 | 0.006, 0.00 | 0.029, 0.05 |
M2 (lw) | 0.154, 0.97 | 0.018, 0.02 | 0.007, 0.01 | 0.014, 0.01 | 0.008, 0.01 | 0.011, 0.01 | 0.011, 0.01 | 0.006, 0.01 | 0.029, 0.05 |
M3 (lw) | 0.141, 0.85 | 0.027, 0.02 | 0.017, 0.02 | 0.030, 0.05 | 0.021, 0.03 | 0.026, 0.04 | 0.025, 0.04 | 0.017, 0.03 | 0.038, 0.04 |
Model | Appr. | Depart | Exc. Ob. | Kick | Point | Punch | Push | Sh. Hands | Avg. AFD |
---|---|---|---|---|---|---|---|---|---|
M1 (bs) | 0.155, 0.96 | 0.024, 0.01 | 0.013, 0.01 | 0.025, 0.02 | 0.018, 0.02 | 0.019, 0.02 | 0.020, 0.01 | 0.014, 0.01 | 0.036, 0.05 |
M2 (bs) | 0.155, 0.89 | 0.026, 0.01 | 0.016, 0.01 | 0.027, 0.02 | 0.023, 0.03 | 0.022, 0.02 | 0.023, 0.01 | 0.019, 0.02 | 0.039, 0.05 |
M3 (bs) | 0.154, 0.96 | 0.017, 0.01 | 0.007, 0.01 | 0.015, 0.01 | 0.010, 0.01 | 0.011, 0.01 | 0.012, 0.01 | 0.007, 0.01 | 0.029, 0.05 |
M1 (pe) | 0.161, 0.75 | 0.044, 0.02 | 0.027, 0.02 | 0.054, 0.03 | 0.047, 0.04 | 0.040, 0.02 | 0.042, 0.02 | 0.031, 0.02 | 0.056, 0.04 |
M2 (pe) | 0.169, 0.66 | 0.047, 0.02 | 0.031, 0.02 | 0.062, 0.03 | 0.055, 0.05 | 0.046, 0.02 | 0.048, 0.02 | 0.035, 0.02 | 0.062, 0.04 |
M3 (pe) | 0.154, 0.71 | 0.038, 0.02 | 0.024, 0.02 | 0.048, 0.03 | 0.038, 0.03 | 0.037, 0.03 | 0.039, 0.02 | 0.026, 0.02 | 0.051, 0.04 |
M1 (lwpe) | 0.159, 0.94 | 0.024, 0.02 | 0.013, 0.01 | 0.026, 0.02 | 0.022, 0.03 | 0.019, 0.01 | 0.021, 0.01 | 0.014, 0.01 | 0.037, 0.05 |
M2 (lwpe) | 0.156, 0.92 | 0.029, 0.02 | 0.020, 0.02 | 0.036, 0.03 | 0.029, 0.03 | 0.029, 0.02 | 0.031, 0.02 | 0.020, 0.01 | 0.044, 0.04 |
M3 (lwpe) | 0.151, 1.0 | 0.033, 0.02 | 0.021, 0.02 | 0.041, 0.05 | 0.039, 0.05 | 0.033, 0.03 | 0.033, 0.03 | 0.023, 0.02 | 0.047, 0.04 |
M1 (lw) | 0.161, 1.0 | 0.021, 0.02 | 0.010, 0.01 | 0.020, 0.01 | 0.015, 0.02 | 0.014, 0.01 | 0.015, 0.01 | 0.009, 0.01 | 0.033, 0.05 |
M2 (lw) | 0.154, 0.92 | 0.024, 0.02 | 0.012, 0.01 | 0.024, 0.02 | 0.019, 0.02 | 0.018, 0.01 | 0.019, 0.01 | 0.012, 0.01 | 0.035, 0.05 |
M3 (lw) | 0.146, 0.90 | 0.031, 0.02 | 0.019, 0.02 | 0.036, 0.05 | 0.030, 0.04 | 0.030, 0.04 | 0.030, 0.04 | 0.020, 0.03 | 0.043, 0.03 |
Model | First Person | Third Person |
---|---|---|
M1 (bs) [11] | 1,656,348 | 1,089,996 |
M2 (bs) | 1,134,284 | 833,676 |
M3 (bs) | 1,111,420 | 827,692 |
M1 (pe) [11] | 1,656,348 | 1,089,996 |
M2 (pe) | 1,134,284 | 833,676 |
M3 (pe) | 1,111,420 | 827,692 |
M1 (lwpe) [11] | 1,657,728 | 1,092,726 |
M2 (lwpe) | 1,135,664 | 836,406 |
M3 (lwpe) | 1,112,800 | 830,422 |
M1 (lw) [11] | 1,657,728 | 1,092,726 |
M2 (lw) | 1,135,664 | 836,406 |
M3 (lw) | 1,112,800 | 830,422 |
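The table above lists the number of trainable parameters of each model and attention variant in the first- and third-person settings. For reference, a minimal PyTorch-style sketch of how such a count is typically obtained; the toy model is a stand-in, not the paper's architecture.

```python
import torch.nn as nn

def count_trainable_parameters(model: nn.Module) -> int:
    """Total number of trainable weights, analogous to the counts tabulated above."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example with a small stand-in network (45 inputs = 15 joints x 3 coordinates)
toy = nn.Sequential(nn.Linear(45, 128), nn.ReLU(), nn.Linear(128, 8))
print(count_trainable_parameters(toy))  # 6920
```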
Dataset | Raw Skeleton | Skeletal Features | Attention | Models | Method | Accuracy (%)
---|---|---|---|---|---|---
SBU | ✓ | Other models | [54] | 96.3 | ||
✓ | [55] | 94.12 | ||||
✓ | [56] | 94.28 | ||||
✓ | [57] | 90.41 | ||||
✓ | [58] | 93.3 | ||||
✓ | [57,59] | 80.35 | ||||
✓ | Our models (first person) | M1 (bs) | 93.2 | |||
✓ | ✓ | M1 (pe) | 93.1 | |||
✓ | ✓ | M2 (lwpe) | 93.8 | |||
✓ | ✓ | M1 (lw) | 91.5 | |||
✓ | Our models (third person) | M1 (bs) | 93.7 | |||
✓ | ✓ | M1 (pe) | 92.5 | |||
✓ | ✓ | M2 (lwpe) | 91.4 | |||
✓ | ✓ | M1 (lw) | 92.9 | |||
K3HI | ✓ | Other models | [53] | 83.33 | ||
✓ | [60] | 80.87 | ||||
✓ | [53] | 45.2 | ||||
✓ | [60] | 48.54 | ||||
✓ | Our models (first person) | M1 (bs) | 87.5 | |||
✓ | ✓ | M1 (pe) | 85.9 | |||
✓ | ✓ | M2 (lwpe) | 84.9 | |||
✓ | ✓ | M1 (lw) | 86.9 | |||
✓ | Our models (third person) | M1 (bs) | 83.0 | |||
✓ | ✓ | M1 (pe) | 82.7 | |||
✓ | ✓ | M2 (lwpe) | 82.1 | |||
✓ | ✓ | M1 (lw) | 80.8 |
Model | Acc. | Recall | Precision | F1 Score |
---|---|---|---|---|
M1 (bs) | 93.2, 4.7 | 0.934, 0.04 | 0.931, 0.05 | 0.928, 0.05 |
M2 (bs) | 91.9, 5.6 | 0.927, 0.04 | 0.913, 0.06 | 0.912, 0.05 |
M3 (bs) | 82.2, 10.1 | 0.846, 0.09 | 0.817, 0.11 | 0.814, 0.11 |
M1 (pe) | 93.1, 3.75 | 0.940, 0.03 | 0.924, 0.04 | 0.925, 0.03 |
M2 (pe) | 89.3, 5.1 | 0.895, 0.03 | 0.869, 0.05 | 0.886, 0.04 |
M3 (pe) | 80.4, 8.5 | 0.837, 0.08 | 0.799, 0.09 | 0.796, 0.09 |
M1 (lwpe) | 93.1, 3.9 | 0.939, 0.04 | 0.929, 0.04 | 0.929, 0.04 |
M2 (lwpe) | 93.8, 4.7 | 0.945, 0.04 | 0.934, 0.06 | 0.931, 0.06 |
M3 (lwpe) | 81.4, 9.1 | 0.842, 0.08 | 0.809, 0.10 | 0.807, 0.10 |
M1 (lw) | 91.5, 6.0 | 0.920, 0.05 | 0.902, 0.07 | 0.903, 0.07 |
M2 (lw) | 59.8, 14.7 | 0.655, 0.13 | 0.564, 0.14 | 0.627, 0.13 |
M3 (lw) | 83.2, 8.3 | 0.855, 0.07 | 0.823, 0.09 | 0.823, 0.09 |
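In the classification tables, accuracy is given in percent and recall, precision, and F1 score in [0, 1]; each cell appears to list the mean and standard deviation over the train–test splits. The sketch below shows how such metrics can be computed with scikit-learn, assuming macro averaging over classes (the averaging scheme is an assumption on our part).

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def summarize(y_true, y_pred):
    """Accuracy (%) plus macro-averaged recall, precision, and F1 score."""
    acc = 100.0 * accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return acc, rec, prec, f1

# Example with the 8 interaction classes encoded as labels 0..7
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
y_pred = [0, 1, 2, 3, 4, 5, 6, 0, 0, 1]
print(summarize(y_true, y_pred))
```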
Model | Acc. | Recall | Precision | F1 Score |
---|---|---|---|---|
M1 (bs) | 87.5, 7.1 | 0.865, 0.08 | 0.859, 0.08 | 0.856, 0.08 |
M2 (bs) | 82.7, 3.1 | 0.817, 0.04 | 0.806, 0.04 | 0.804, 0.04 |
M3 (bs) | 80.1, 3.1 | 0.796, 0.03 | 0.783, 0.02 | 0.777, 0.03 |
M1 (pe) | 85.9, 5.2 | 0.854, 0.07 | 0.838, 0.06 | 0.839, 0.06 |
M2 (pe) | 84.9, 3.3 | 0.836, 0.04 | 0.835, 0.04 | 0.831, 0.04 |
M3 (pe) | 76.9, 2.6 | 0.768, 0.02 | 0.760, 0.02 | 0.752, 0.02 |
M1 (lwpe) | 84.9, 3.5 | 0.850, 0.05 | 0.818, 0.03 | 0.818, 0.03 |
M2 (lwpe) | 82.1, 6.3 | 0.828, 0.07 | 0.802, 0.07 | 0.801, 0.06 |
M3 (lwpe) | 75.6, 4.0 | 0.759, 0.03 | 0.746, 0.03 | 0.739, 0.03 |
M1 (lw) | 86.9, 4.3 | 0.865, 0.05 | 0.852, 0.05 | 0.853, 0.05 |
M2 (lw) | 83.7, 3.0 | 0.840, 0.05 | 0.824, 0.04 | 0.822, 0.04 |
M3 (lw) | 76.3, 4.7 | 0.760, 0.04 | 0.753, 0.04 | 0.745, 0.04 |
Model | Acc. | Recall | Precision | F1 Score |
---|---|---|---|---|
M1 (bs) | 93.7, 6.1 | 0.944, 0.05 | 0.935, 0.05 | 0.934, 0.06 |
M2 (bs) | 92.1, 3.9 | 0.923, 0.03 | 0.920, 0.04 | 0.914, 0.04 |
M3 (bs) | 82.5, 8.8 | 0.847, 0.08 | 0.818, 0.10 | 0.814, 0.10 |
M1 (pe) | 92.5, 5.5 | 0.930, 0.05 | 0.927, 0.05 | 0.922, 0.05 |
M2 (pe) | 90.1, 6.2 | 0.909, 0.05 | 0.879, 0.05 | 0.894, 0.06 |
M3 (pe) | 79.3, 7.8 | 0.807, 0.09 | 0.781, 0.09 | 0.775, 0.09 |
M1 (lwpe) | 91.3, 7.5 | 0.915, 0.06 | 0.907, 0.08 | 0.906, 0.07 |
M2 (lwpe) | 91.4, 5.5 | 0.919, 0.05 | 0.908, 0.05 | 0.905, 0.06 |
M3 (lwpe) | 81.7, 7.2 | 0.842, 0.07 | 0.815, 0.08 | 0.811, 0.07 |
M1 (lw) | 92.9, 5.8 | 0.951, 0.03 | 0.921, 0.05 | 0.924, 0.05 |
M2 (lw) | 71.3, 6.0 | 0.773, 0.07 | 0.694, 0.08 | 0.738, 0.04 |
M3 (lw) | 82.1, 8.5 | 0.074, 0.08 | 0.815, 0.09 | 0.813, 0.09 |
Model | Acc. | Recall | Precision | F1 Score |
---|---|---|---|---|
M1 (bs) | 83.0, 6.6 | 0.827, 0.07 | 0.816, 0.08 | 0.813, 0.08 |
M2 (bs) | 81.1, 3.3 | 0.796, 0.03 | 0.783, 0.03 | 0.780, 0.03 |
M3 (bs) | 80.1, 3.1 | 0.796, 0.03 | 0.783, 0.02 | 0.777, 0.03 |
M1 (pe) | 82.7, 7.3 | 0.816, 0.08 | 0.815, 0.08 | 0.810, 0.08 |
M2 (pe) | 82.4, 3.9 | 0.825, 0.04 | 0.804, 0.04 | 0.805, 0.05 |
M3 (pe) | 75.0, 5.7 | 0.762, 0.04 | 0.741, 0.05 | 0.738, 0.05 |
M1 (lwpe) | 82.1, 4.5 | 0.809, 0.04 | 0.800, 0.06 | 0.796, 0.05 |
M2 (lwpe) | 80.5, 7.8 | 0.794, 0.08 | 0.790, 0.10 | 0.784, 0.09 |
M3 (lwpe) | 72.7, 8.3 | 0.731, 0.07 | 0.720, 0.07 | 0.712, 0.07 |
M1 (lw) | 80.8, 6.3 | 0.793, 0.07 | 0.775, 0.08 | 0.777, 0.08 |
M2 (lw) | 78.3, 6.3 | 0.803, 0.07 | 0.766, 0.07 | 0.764, 0.08 |
M3 (lw) | 75.0, 7.1 | 0.758, 0.05 | 0.741, 0.06 | 0.736, 0.06 |
Dataset | Model | Approach | Depart | Kick | Push | Sh. Hands | Exc. Obj. | Punch | Hug | Avg. |
---|---|---|---|---|---|---|---|---|---|---|
SBU | M1 [11] | 48.9, 4.2 | 48.7, 3.9 | 46.6, 2.8 | 49.3, 2.1 | 49.8, 2.3 | 48.9, 2.2 | 49.9, 3.2 | 48.3, 3.0 | 48.8, 1.0 |
SBU | M2 | 48.3, 3.5 | 48.4, 4.3 | 46.7, 3.3 | 49.2, 2.4 | 49.8, 2.7 | 48.4, 2.5 | 49.3, 2.5 | 47.4, 2.6 | 48.4, 1.0 |
SBU | M3 | 48.5, 3.7 | 47.8, 4.4 | 46.3, 2.6 | 49.2, 2.2 | 48.7, 2.4 | 48.0, 1.9 | 49.3, 3.4 | 48.4, 2.3 | 48.3, 1.0 |

Dataset | Model | Approach | Depart | Exc. Obj. | Kick | Point | Punch | Push | Sh. Hands | Avg. |
---|---|---|---|---|---|---|---|---|---|---|
K3HI | M1 [11] | 47.9, 3.0 | 47.6, 2.4 | 47.8, 3.0 | 45.8, 2.6 | 46.8, 4.4 | 47.4, 2.4 | 47.6, 2.0 | 46.3, 2.9 | 47.2, 1.0 |
K3HI | M2 | 48.4, 2.4 | 48.4, 2.1 | 48.3, 3.9 | 44.5, 2.5 | 44.8, 4.1 | 47.3, 2.7 | 47.9, 3.2 | 47.7, 4.2 | 47.1, 1.6 |
K3HI | M3 | 48.0, 2.2 | 47.9, 2.2 | 48.2, 3.2 | 44.6, 2.6 | 45.9, 4.4 | 47.5, 2.7 | 48.0, 3.3 | 47.0, 3.9 | 47.1, 1.3 |
Dataset | Model | Approach | Depart | Kick | Push | Sh. Hands | Exc. Obj. | Punch | Hug | Avg. |
---|---|---|---|---|---|---|---|---|---|---|
SBU | M1 [11] | 47.5, 3.8 | 45.8, 4.6 | 45.1, 3.2 | 48.4, 2.7 | 47.7, 3.2 | 47.6, 2.8 | 48.7, 3.8 | 47.4, 2.9 | 47.3, 1.2 |
SBU | M2 | 47.8, 4.5 | 45.6, 4.4 | 44.4, 3.0 | 48.6, 3.6 | 47.7, 3.8 | 47.5, 3.2 | 48.0, 3.9 | 47.1, 3.4 | 47.1, 1.4 |
SBU | M3 | 46.7, 3.4 | 46.2, 4.4 | 44.6, 3.0 | 48.9, 3.1 | 47.9, 4.0 | 47.4, 2.5 | 47.7, 5.3 | 47.8, 3.4 | 47.1, 1.3 |

Dataset | Model | Approach | Depart | Exc. Obj. | Kick | Point | Punch | Push | Sh. Hands | Avg. |
---|---|---|---|---|---|---|---|---|---|---|
K3HI | M1 [11] | 47.2, 2.9 | 47.9, 3.0 | 46.9, 2.9 | 41.1, 3.5 | 39.9, 7.2 | 45.5, 3.1 | 45.8, 3.7 | 46.8, 5.5 | 45.1, 3.0 |
K3HI | M2 | 48.0, 3.6 | 48.6, 2.7 | 47.1, 2.6 | 41.0, 3.1 | 37.7, 6.4 | 44.6, 3.8 | 45.5, 3.1 | 45.9, 4.3 | 44.8, 3.7 |
K3HI | M3 | 47.2, 4.3 | 47.1, 3.0 | 45.9, 3.4 | 41.2, 3.7 | 40.4, 6.9 | 45.0, 2.5 | 44.3, 3.4 | 45.4, 4.4 | 44.6, 2.5 |
Model | SBU (First Person) | SBU (Third Person) | K3HI (First Person) | K3HI (Third Person) |
---|---|---|---|---|
M1 (bs) | 1.0, 7368 | 0.4, 4364 | 1.6, 5388 | 0.7, 2452 |
M2 (bs) | 1.5, 9201 | 0.9, 8720 | 2.2, 5862 | 4.4, 17,499 |
M3 (bs) | 0.4, 8250 | 0.3, 8018 | 0.7, 3459 | 0.5, 3310 |
M1 (pe) | 1.8, 7146 | 0.5, 4166 | 5.2, 9154 | 1.8, 7199 |
M2 (pe) | 2.7, 10,627 | 1.0, 8282 | 3.8, 6673 | 5.6, 17,430 |
M3 (pe) | 0.6, 8207 | 0.3, 8105 | 1.0, 3255 | 0.4, 2926 |
M1 (lwpe) | 1.2, 5512 | 0.5, 2844 | 2.7, 5421 | 2.5, 5832 |
M2 (lwpe) | 3.4, 12,169 | 2.6, 13,030 | 6.5, 10,350 | 8.7, 17,499 |
M3 (lwpe) | 0.5, 7727 | 0.4, 7586 | 0.8, 2887 | 0.6, 2685 |
M1 (lw) | 1.4, 5203 | 1.3, 6889 | 4.6, 10,519 | 2.0, 3350 |
M2 (lw) | 4.0, 17,999 | 3.4, 17,999 | 7.0, 12,352 | 8.2, 17,499 |
M3 (lw) | 0.6, 8857 | 0.5, 8541 | 0.8, 2715 | 1.0, 3491 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).