Using 3D Hand Pose Data in Recognizing Human–Object Interaction and User Identification for Extended Reality Systems
Abstract
1. Introduction
List of Contributions
- This work presents a novel multi-stage approach that combines hand pose data from both hand joints and hand vertices to recognize human–object interactions and to identify which user performs them. Statistical features are computed independently on the hand joints and the hand vertices, and the resulting features are fused to enhance recognition performance.
- It proposes a new mathematical model for feature extraction and description that captures distinctive statistical attributes from hand joints and vertices and represents them as a feature vector for HOIR, achieving an average F1-score of 81% for HOI recognition and 80% for user identification.
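The multi-stage idea above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the feature subset, window length, and function names are assumptions; the shapes assume a MANO-style hand model (21 joints, 778 mesh vertices), which the paper's hand representation builds on.

```python
import numpy as np

def statistical_features(seq: np.ndarray) -> np.ndarray:
    """Per-window statistical descriptor of a hand pose sequence.

    seq: (frames, points, dims) array, e.g. 3D joints or mesh vertices.
    Only a small illustrative subset of the paper's features is used here.
    """
    flat = seq.reshape(seq.shape[0], -1)             # (frames, points*dims)
    feats = [
        flat.max(axis=0), flat.min(axis=0),          # amplitude extremes
        flat.mean(axis=0), flat.std(axis=0),         # mean, standard deviation
        np.abs(np.diff(flat, axis=0)).mean(axis=0),  # mean abs. first difference
    ]
    return np.concatenate(feats)

rng = np.random.default_rng(0)
joints = rng.normal(size=(30, 21, 3))     # one 30-frame window of 21 3D joints
vertices = rng.normal(size=(30, 778, 3))  # matching 778 3D mesh vertices

# Features are extracted independently per modality and then fused
# (concatenation shown); the fused vector feeds the classifier stages.
fused = np.concatenate([statistical_features(joints),
                        statistical_features(vertices)])
print(fused.shape)
```

The fused descriptor has one fixed length per window regardless of window duration, which is what lets a conventional classifier consume it.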
2. Related Studies
3. Proposed System Model
Algorithm 1: HOI Recognition Model
3.1. Data Acquisition
3.2. Preprocessing
3.2.1. Window-Based Segmentation
3.2.2. Snippet-to-Frame Conversion
3.2.3. Hand Pose Data Extraction
3.2.4. 3D Hand Joint and Vertex Representation
Algorithm 2: Segmentation and Hand Pose Extraction
3.3. Hand Joint and Vertex Transformation: 3D to 7D Space
3.3.1. Enhanced Hand Joint Representation
3.3.2. Enhanced Hand Vertex Representation
3.3.3. Normalization
Algorithm 3: Hand Joint and Vertex Transformation
3.4. Feature Extraction and Description
3.4.1. Statistical Feature Extraction
3.4.2. Mean Pooling
3.4.3. Feature Vector Concatenation
Algorithm 4: Feature Extraction and Description
3.5. Classification
4. Experimental Results and Analysis
4.1. Implementation Details
4.2. Evaluation Metrics and Validation Method
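The tables that follow report accuracy, precision, recall, and F1-score per class. As a hedged sketch (the paper's exact averaging scheme is not restated here), a macro-averaged F1 over classes can be computed as:

```python
import numpy as np

def macro_f1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Macro-averaged F1: per-class F1 from TP/FP/FN, averaged over classes."""
    f1s = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

# Toy labels, purely illustrative.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(round(macro_f1(y_true, y_pred), 3))  # → 0.656
```

Macro averaging weights every class equally, which matters here because some object–interaction classes have far fewer samples than others.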
4.3. Analysis of Multi-Stage HOI Recognition
4.3.1. Stage 1: Analysis of Object Recognition
4.3.2. Stage 2: Analysis of Interaction Recognition
4.3.3. Stage 3: Analysis of User Recognition
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Objects | Interactions
---|---
C: Bottle | T: Pick and place
 | T: Pick and place (with water)
 | T: Pour all the water into a mug
 | T: Put it in the drawer
 | T: Reposition the bottle
 | T: Take it out of the drawer
C: Bowl | T: Pick and place
 | T: Pick and place (with ball)
 | T: Put it in the drawer
 | T: Put the ball in the bowl
 | T: Take it out of the drawer
 | T: Take the ball out of the bowl
C: Bucket | T: Pick and place
 | T: Pour water into another bucket
C: Chair | T: Pick and place to a new position
 | T: Pick and place to the original position
C: Kettle | T: Pick and place
 | T: Pour water into a mug
C: Knife | T: Cut apple
 | T: Pick and place
 | T: Put it in the drawer
 | T: Take it out of the drawer
C: Lamp | T: Pick and place
 | T: Turn and fold
 | T: Turn on and turn off
C: Laptop | T: Open and close the display
 | T: Pick and place
C: Mug | T: Fill with water by a kettle
 | T: Pick and place
 | T: Pick and place (with water)
 | T: Pour water into another mug
 | T: Put it in the drawer
 | T: Take it out of the drawer
C: Pliers | T: Clamp something
 | T: Pick and place
 | T: Put it in the drawer
 | T: Take it out of the drawer
C: Safe | T: Open and close the door
 | T: Put something in it
 | T: Take something out of it
C: Scissor | T: Cut something
 | T: Pick and place
C: Stapler | T: Bind the paper
 | T: Pick and place
C: Furniture | T: Open and close the door
 | T: Open and close the drawer
 | T: Put the drink in the door
 | T: Put the drink in the drawer
C: Toy Car | T: Pick and place
 | T: Push toy car
 | T: Put it in the drawer
 | T: Take it out of the drawer
C: Trash Can | T: Open and close
 | T: Throw something in it
Feature | Mathematical Description
---|---
F1: Maximum Amplitude | $F_1 = \max_i D_b(i)$, where $D$ can be either accelerometer or gyroscope data and $b$ represents the axis.
F2: Minimum Amplitude | $F_2 = \min_i D_b(i)$
F3: Arithmetic Mean | $F_3 = \frac{1}{M}\sum_{i=1}^{M} D_b(i)$, where $M$ is the total number of samples.
F4: Standard Deviation | $F_4 = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left(D_b(i)-\mu\right)^2}$, where $\mu$ represents the mean value.
F5: Kurtosis | $F_5 = \dfrac{E\left[(D_b-\mu)^4\right]}{\sigma^4}$, where $E$ represents the expected value and $\sigma$ the standard deviation.
F6: Skewness | $F_6 = \dfrac{E\left[(D_b-\mu)^3\right]}{\sigma^3}$
F7: Peak-to-Peak Signal Value | $F_7 = \max_i D_b(i) - \min_i D_b(i)$, the peak-to-peak value of the signal.
F8: Peak-to-Peak Time | $F_8 = t_{\max} - t_{\min}$, the time elapsed between the maximum and minimum peaks.
F9: Peak-to-Peak Slope | $F_9 = F_7 / F_8$
F10: Maximum Latency | $F_{10} = t_{\max}$, the time at which the maximum amplitude occurs.
F11: Minimum Latency | $F_{11} = t_{\min}$, the time at which the minimum amplitude occurs.
F12: Absolute Latency-to-Amplitude Ratio | $F_{12} = \left\lvert F_{10} / F_1 \right\rvert$
F13: Mean of Absolute Values of First Difference | $F_{13} = \frac{1}{M-1}\sum_{i=1}^{M-1}\left\lvert D_b(i+1) - D_b(i) \right\rvert$
F14: Mean of Absolute Values of Second Difference | $F_{14} = \frac{1}{M-2}\sum_{i=1}^{M-2}\left\lvert D_b(i+2) - D_b(i) \right\rvert$
F15: Normalized Mean of Absolute Values of First Difference | $F_{13}$ computed on the normalized signal $\tilde{D}_b(i) = \left(D_b(i)-\mu\right)/\sigma$.
F16: Normalized Mean of Absolute Values of Second Difference | $F_{14}$ computed on the normalized signal $\tilde{D}_b(i)$.
F17: Energy | $F_{17} = \sum_{i=1}^{M} D_b(i)^2$
F18: Normalized Energy | $F_{18} = \frac{1}{M}\sum_{i=1}^{M} D_b(i)^2$
F19: Entropy | $F_{19} = -\sum_j p_j \log_2 p_j$, where $p_j$ is the probability of the signal amplitude falling in the $j$-th bin.
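A hedged sketch of computing several of the listed features for a single 1-D signal (one axis of one channel). The definitions follow the standard formulas; the paper's exact normalizations and binning choices may differ, and only a subset of F1–F19 is shown.

```python
import numpy as np

def features(x: np.ndarray) -> dict:
    """A subset of the statistical features F1-F19 for a 1-D signal."""
    mu, sigma = x.mean(), x.std()
    d1 = np.abs(np.diff(x))          # absolute first differences
    d2 = np.abs(np.diff(x, n=2))     # absolute second differences
    hist, _ = np.histogram(x, bins=16)
    p = hist / hist.sum()            # bin probabilities for entropy
    p = p[p > 0]
    return {
        "F1_max": x.max(),
        "F2_min": x.min(),
        "F3_mean": mu,
        "F4_std": sigma,
        "F5_kurtosis": np.mean((x - mu) ** 4) / sigma ** 4,
        "F6_skewness": np.mean((x - mu) ** 3) / sigma ** 3,
        "F7_peak_to_peak": x.max() - x.min(),
        "F13_mean_abs_first_diff": d1.mean(),
        "F14_mean_abs_second_diff": d2.mean(),
        "F17_energy": np.sum(x ** 2),
        "F19_entropy": -np.sum(p * np.log2(p)),
    }

x = np.sin(np.linspace(0, 4 * np.pi, 200))  # toy signal: two sine periods
f = features(x)
print(round(f["F7_peak_to_peak"], 3))
```

In the paper's pipeline such per-axis features are computed on every joint and vertex coordinate, then pooled and concatenated into the final descriptor.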
All values are F1-scores.

Object | Hand Joints | Hand Vertex | Fusion Concat | Fusion Mean | Fusion Max | Fusion Min | Fusion Sum
---|---|---|---|---|---|---|---
C: Bottle | 0.87 | 0.88 | 0.89 | 0.83 | 0.82 | 0.83 | 0.79
C: Bowl | 0.77 | 0.76 | 0.90 | 0.93 | 0.79 | 0.79 | 0.79
C: Bucket | 0.69 | 0.70 | 0.76 | 0.77 | 0.70 | 0.71 | 0.72
C: Chair | 0.89 | 0.88 | 0.90 | 0.85 | 0.92 | 0.89 | 0.94
C: Kettle | 0.78 | 0.79 | 0.89 | 0.88 | 0.88 | 0.86 | 0.80
C: Knife | 0.67 | 0.68 | 0.77 | 0.76 | 0.81 | 0.80 | 0.78
C: Lamp | 0.68 | 0.68 | 0.78 | 0.72 | 0.79 | 0.81 | 0.83
C: Laptop | 0.81 | 0.80 | 0.85 | 0.87 | 0.88 | 0.91 | 0.92
C: Mug | 0.74 | 0.77 | 0.77 | 0.80 | 0.75 | 0.73 | 0.84
C: Pliers | 0.65 | 0.65 | 0.75 | 0.71 | 0.71 | 0.73 | 0.69
C: Safe | 0.59 | 0.60 | 0.66 | 0.72 | 0.77 | 0.75 | 0.77
C: Scissors | 0.81 | 0.84 | 0.87 | 0.89 | 0.87 | 0.86 | 0.84
C: Stapler | 0.70 | 0.68 | 0.81 | 0.77 | 0.81 | 0.81 | 0.80
C: Furniture | 0.70 | 0.68 | 0.66 | 0.58 | 0.79 | 0.85 | 0.79
C: Toy Car | 0.60 | 0.58 | 0.64 | 0.60 | 0.66 | 0.67 | 0.69
C: Trash Can | 0.70 | 0.71 | 0.77 | 0.77 | 0.71 | 0.77 | 0.73
Average | 0.73 | 0.73 | 0.79 | 0.78 | 0.79 | 0.80 | 0.79
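The five fusion strategies compared in the tables can be sketched as below. This is an illustrative sketch, not the paper's code: the element-wise fusions (mean/max/min/sum) assume the joint and vertex feature vectors share the same length, e.g. after mean pooling to a common dimensionality, and the function name is an assumption.

```python
import numpy as np

def fuse(joint_feat: np.ndarray, vertex_feat: np.ndarray, mode: str) -> np.ndarray:
    """Fuse two equal-length feature vectors with one of five strategies."""
    stacked = np.stack([joint_feat, vertex_feat])  # shape (2, d)
    return {
        "concat": np.concatenate([joint_feat, vertex_feat]),  # length 2d
        "mean": stacked.mean(axis=0),   # element-wise average
        "max": stacked.max(axis=0),     # element-wise maximum
        "min": stacked.min(axis=0),     # element-wise minimum
        "sum": stacked.sum(axis=0),     # element-wise sum
    }[mode]

# Toy feature vectors, purely illustrative.
j = np.array([0.2, 0.8, 0.5])
v = np.array([0.4, 0.6, 0.9])
print(fuse(j, v, "concat").shape)  # (6,)
print(fuse(j, v, "max"))           # [0.4 0.8 0.9]
```

Concatenation preserves both modalities in full at the cost of doubling the feature dimensionality, while the element-wise variants keep the dimensionality fixed.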
All values are F1-scores.

Objects | Interactions | Hand Joints | Hand Vertex | Fusion Concat | Fusion Sum | Fusion Mean | Fusion Max | Fusion Min
---|---|---|---|---|---|---|---|---
C: Bottle | T: Pick and place | 0.72 | 0.72 | 0.84 | 0.88 | 0.63 | 0.62 | 0.61
 | T: Pick and place (with water) | 0.61 | 0.67 | 0.84 | 0.57 | 0.86 | 0.81 | 0.74
 | T: Pour all the water into a mug | 0.37 | 0.41 | 0.47 | 0.42 | 0.62 | 0.44 | 0.57
 | T: Put it in the drawer | 0.45 | 0.43 | 0.61 | 0.68 | 0.49 | 0.40 | 0.56
 | T: Reposition the bottle | 0.75 | 0.73 | 0.82 | 0.78 | 0.79 | 0.81 | 0.81
 | T: Take it out of the drawer | 0.58 | 0.65 | 0.73 | 0.69 | 0.68 | 0.77 | 0.76
C: Bowl | T: Pick and place | 0.86 | 0.85 | 0.91 | 0.90 | 0.88 | 0.83 | 0.91
 | T: Pick and place (with ball) | 0.83 | 0.83 | 0.71 | 0.71 | 1.00 | 1.00 | 1.00
 | T: Put it in the drawer | 0.83 | 0.79 | 0.91 | 0.88 | 0.83 | 0.81 | 0.84
 | T: Put the ball in the bowl | 0.77 | 0.69 | 0.73 | 0.85 | 1.00 | 0.95 | 0.84
 | T: Take it out of the drawer | 0.71 | 0.68 | 0.77 | 0.72 | 0.78 | 0.70 | 0.82
 | T: Take the ball out of the bowl | 0.76 | 0.82 | 0.78 | 0.82 | 0.92 | 0.84 | 0.90
C: Bucket | T: Pick and place | 0.77 | 0.77 | 0.90 | 0.87 | 0.86 | 0.88 | 0.89
 | T: Pour water into another bucket | 0.84 | 0.84 | 0.93 | 0.91 | 0.89 | 0.91 | 0.92
C: Chair | T: Pick and place to a new position | 0.54 | 0.52 | 0.51 | 0.45 | 0.57 | 0.53 | 0.45
 | T: Pick and place to the original position | 0.54 | 0.48 | 0.54 | 0.44 | 0.61 | 0.64 | 0.51
C: Kettle | T: Pick and place | 0.96 | 0.96 | 0.94 | 0.92 | 0.96 | 0.97 | 0.95
 | T: Pour water into a mug | 0.95 | 0.96 | 0.93 | 0.91 | 0.96 | 0.97 | 0.95
C: Knife | T: Cut apple | 0.86 | 0.85 | 0.92 | 0.90 | 0.92 | 0.93 | 0.93
 | T: Pick and place | 0.93 | 0.92 | 0.96 | 0.94 | 0.91 | 0.93 | 0.93
 | T: Put it in the drawer | 0.62 | 0.63 | 0.65 | 0.65 | 0.52 | 0.60 | 0.56
 | T: Take it out of the drawer | 0.64 | 0.67 | 0.83 | 0.70 | 0.67 | 0.81 | 0.69
C: Lamp | T: Pick and place | 0.90 | 0.83 | 0.95 | 0.97 | 0.86 | 0.89 | 0.86
 | T: Turn and fold | 0.97 | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
 | T: Turn on and turn off | 0.89 | 0.87 | 0.94 | 0.97 | 0.81 | 0.84 | 0.80
C: Laptop | T: Open and close the display | 0.92 | 0.90 | 0.93 | 0.95 | 0.95 | 0.94 | 0.96
 | T: Pick and place | 0.87 | 0.83 | 0.88 | 0.92 | 0.90 | 0.89 | 0.93
C: Mug | T: Fill with water by a kettle | 0.60 | 0.56 | 0.59 | 0.56 | 0.96 | 0.83 | 0.77
 | T: Pick and place | 0.76 | 0.74 | 0.70 | 0.75 | 0.87 | 0.84 | 0.85
 | T: Pick and place (with water) | 0.67 | 0.63 | 0.47 | 0.45 | 0.40 | 0.40 | 0.47
 | T: Pour water into another mug | 0.75 | 0.71 | 0.68 | 0.85 | 0.89 | 0.81 | 0.85
 | T: Put it in the drawer | 0.50 | 0.52 | 0.40 | 0.45 | 0.54 | 0.46 | 0.57
 | T: Take it out of the drawer | 0.82 | 0.79 | 0.82 | 0.83 | 0.79 | 0.64 | 0.85
C: Pliers | T: Clamp something | 0.85 | 0.86 | 0.84 | 0.83 | 0.82 | 0.86 | 0.74
 | T: Pick and place | 0.88 | 0.83 | 0.91 | 0.88 | 0.77 | 0.76 | 0.77
 | T: Put it in the drawer | 0.52 | 0.48 | 0.50 | 0.48 | 0.50 | 0.68 | 0.52
 | T: Take it out of the drawer | 0.70 | 0.66 | 0.76 | 0.74 | 0.73 | 0.67 | 0.63
C: Safe | T: Open and close the door | 0.75 | 0.76 | 0.87 | 0.77 | 0.72 | 0.71 | 0.72
 | T: Put something in it | 0.66 | 0.65 | 0.80 | 0.65 | 0.71 | 0.68 | 0.68
 | T: Take something out of it | 0.54 | 0.53 | 0.74 | 0.56 | 0.59 | 0.67 | 0.61
C: Scissor | T: Cut something | 0.95 | 0.95 | 0.99 | 0.91 | 0.94 | 0.92 | 0.94
 | T: Pick and place | 0.83 | 0.86 | 0.97 | 0.71 | 0.85 | 0.82 | 0.85
C: Stapler | T: Bind the paper | 0.89 | 0.89 | 0.96 | 0.91 | 0.92 | 0.93 | 0.94
 | T: Pick and place | 0.85 | 0.86 | 0.95 | 0.89 | 0.91 | 0.92 | 0.93
C: Furniture | T: Open and close the door | 0.81 | 0.79 | 0.78 | 0.79 | 0.76 | 0.77 | 0.63
 | T: Open and close the drawer | 0.85 | 0.82 | 1.00 | 0.89 | 0.82 | 0.87 | 0.79
 | T: Put the drink in the door | 0.73 | 0.66 | 0.66 | 0.69 | 0.83 | 0.84 | 0.74
 | T: Put the drink in the drawer | 0.76 | 0.70 | 0.81 | 0.73 | 0.81 | 0.83 | 0.81
C: Toy Car | T: Pick and place | 0.80 | 0.75 | 0.86 | 0.84 | 0.82 | 0.79 | 0.83
 | T: Push toy car | 0.63 | 0.62 | 0.68 | 0.63 | 0.68 | 0.63 | 0.73
 | T: Put it in the drawer | 0.47 | 0.45 | 0.63 | 0.67 | 0.45 | 0.43 | 0.48
 | T: Take it out of the drawer | 0.74 | 0.69 | 0.77 | 0.77 | 0.62 | 0.67 | 0.74
C: Trash Can | T: Open and close | 0.83 | 0.81 | 0.81 | 0.82 | 0.82 | 0.82 | 0.82
 | T: Throw something in it | 0.83 | 0.81 | 0.82 | 0.81 | 0.79 | 0.77 | 0.79
Average | | 0.75 | 0.73 | 0.82 | 0.78 | 0.80 | 0.80 | 0.79
HOI | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
Bottle: Pick and place | 0.85 | 0.78 | 0.85 | 0.81 |
Bottle: Pick and place (with water) | 1.00 | 1.00 | 1.00 | 1.00 |
Bottle: Pour all the water into a mug | 0.80 | 0.78 | 0.80 | 0.79 |
Bottle: Put it in the drawer | 0.50 | 0.50 | 0.50 | 0.50 |
Bottle: Reposition the bottle | 0.68 | 0.59 | 0.68 | 0.62 |
Bottle: Take it out of the drawer | 0.50 | 0.50 | 0.50 | 0.50 |
Bowl: Pick and place | 0.66 | 0.66 | 0.66 | 0.66 |
Bowl: Put it in the drawer | 1.00 | 1.00 | 1.00 | 1.00 |
Bowl: Put the ball in the bowl | 1.00 | 1.00 | 1.00 | 1.00 |
Bowl: Take it out of the drawer | 0.93 | 0.95 | 0.93 | 0.93 |
Bowl: Take the ball out of the bowl | 1.00 | 1.00 | 1.00 | 1.00 |
Bowl: Pick and place (with Ball) | 1.00 | 1.00 | 1.00 | 1.00 |
Bucket: Pick and place | 0.40 | 0.39 | 0.40 | 0.39 |
Bucket: Pour water into another bucket | 0.35 | 0.34 | 0.35 | 0.34 |
Chair: Pick and place to a new position | 0.91 | 0.92 | 0.91 | 0.91 |
Chair: Pick and place to the original position | 0.87 | 0.90 | 0.87 | 0.87 |
Kettle: Pick and place | 0.94 | 0.95 | 0.94 | 0.93 |
Kettle: Pour water into a mug | 0.90 | 0.89 | 0.90 | 0.89 |
Knife: Cut apple | 0.88 | 0.85 | 0.88 | 0.84 |
Knife: Pick and place | 0.94 | 0.95 | 0.94 | 0.94 |
Knife: Put it in the drawer | 0.80 | 0.75 | 0.80 | 0.77 |
Knife: Take it out of the drawer | 0.80 | 0.77 | 0.80 | 0.78 |
Lamp: Pick and place | 0.80 | 0.81 | 0.80 | 0.79 |
Lamp: Turn and fold | 0.94 | 0.92 | 0.94 | 0.92 |
Lamp: Turn on and turn off | 0.87 | 0.86 | 0.87 | 0.85 |
Laptop: Open and close the display | 0.93 | 0.93 | 0.93 | 0.93 |
Laptop: Pick and place | 0.68 | 0.69 | 0.68 | 0.66 |
Mug: Fill with water by a kettle | 0.67 | 0.67 | 0.67 | 0.67 |
Mug: Pick and place | 0.93 | 0.95 | 0.93 | 0.93 |
Mug: Pick and place (with water) | 0.60 | 0.57 | 0.60 | 0.58 |
Mug: Pour water into another mug | 0.91 | 0.88 | 0.91 | 0.89 |
Mug: Put it in the drawer | 0.59 | 0.59 | 0.59 | 0.59 |
Mug: Take it out of the drawer | 0.60 | 0.58 | 0.60 | 0.59 |
Pliers: Clamp something | 0.75 | 0.69 | 0.75 | 0.71 |
Pliers: Pick and place | 0.81 | 0.80 | 0.81 | 0.78 |
Pliers: Put it in the drawer | 0.75 | 0.75 | 0.75 | 0.75 |
Pliers: Take it out of the drawer | 0.63 | 0.63 | 0.63 | 0.63 |
Safe: Open and close the door | 0.78 | 0.73 | 0.78 | 0.72 |
Safe: Put something in it | 0.88 | 0.93 | 0.88 | 0.88 |
Safe: Take something out of it | 0.90 | 0.85 | 0.90 | 0.87 |
Scissors: Cut something | 0.98 | 0.98 | 0.98 | 0.98 |
Scissors: Pick and place | 0.65 | 0.65 | 0.65 | 0.65 |
Stapler: Bind the paper | 0.92 | 0.92 | 0.92 | 0.92 |
Stapler: Pick and place | 0.92 | 0.90 | 0.92 | 0.90 |
Storage Furniture: Open and close the door | 0.70 | 0.70 | 0.70 | 0.70 |
Storage Furniture: Open and close the drawer | 0.63 | 0.60 | 0.63 | 0.60 |
Storage Furniture: Put the drink in the door | 0.58 | 0.59 | 0.58 | 0.58 |
Storage Furniture: Put the drink in the drawer | 0.62 | 0.63 | 0.62 | 0.62 |
Toy Car: Pick and place | 0.95 | 0.98 | 0.95 | 0.96 |
Toy Car: Push toy car | 0.87 | 0.92 | 0.87 | 0.87 |
Toy Car: Put it in the drawer | 0.98 | 0.98 | 0.98 | 0.97 |
Toy Car: Take it out of the drawer | 1.00 | 1.00 | 1.00 | 1.00 |
Trash Can: Open and close | 0.88 | 0.92 | 0.88 | 0.89 |
Trash Can: Throw something in it | 0.90 | 0.94 | 0.90 | 0.90 |
Average | 0.80 | 0.80 | 0.80 | 0.79 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Hamid, D.; Ul Haq, M.E.; Yasin, A.; Murtaza, F.; Azam, M.A. Using 3D Hand Pose Data in Recognizing Human–Object Interaction and User Identification for Extended Reality Systems. Information 2024, 15, 629. https://doi.org/10.3390/info15100629