Coordinate-Corrected and Graph-Convolution-Based Hand Pose Estimation Method
Abstract
1. Introduction
- (1) When a stacked hourglass network is used for hand pose estimation, heatmaps must be generated by treating each hand joint coordinate as the mean of a Gaussian distribution. This process introduces quantization errors that degrade the localization accuracy of hand joints. In this paper, we propose generating unbiased heatmaps to improve on the standard coordinate encoding. We also adopt a distribution-aware method for coordinate decoding to reduce the errors introduced by joint coordinate encoding and decoding.
- (2) Because the human hand has many degrees of freedom, hand joints often occlude one another, which makes joint prediction difficult. Considering the interdependence among hand joints, we introduce a hand topology structure to extract additional features that benefit joint estimation. This allows us to make reasonable predictions for occluded hand joints.
- (3) We employ graph convolution to model the complex dependencies between hand joints, as well as the relationships between pixels and joints. Determining the relationships between hand joints enriches the feature information of each joint. We also use a skeleton constraint loss function to constrain the direction and length of the bones connecting the joints, which yields a natural, undistorted hand skeleton.
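Contribution (1) can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the 32 × 32 heatmap size, and sigma = 2 are assumptions chosen for illustration. The biased encoder snaps the Gaussian centre to the pixel grid, while the unbiased encoder keeps the sub-pixel centre. The decoder refines the argmax with a second-order Taylor expansion of the log-heatmap, in the spirit of the distribution-aware decoding of Zhang et al. (see the reference list), and it omits the heatmap smoothing step that such methods typically apply to network outputs.

```python
import numpy as np


def encode_heatmap(joint_xy, size=32, sigma=2.0, unbiased=True):
    """Render a Gaussian heatmap for one joint (coordinates in heatmap pixels)."""
    cx, cy = float(joint_xy[0]), float(joint_xy[1])
    if not unbiased:
        # Biased variant: the centre is quantized to the nearest pixel.
        cx, cy = float(np.round(cx)), float(np.round(cy))
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def decode_argmax(heatmap):
    """Integer-pixel decoding: location of the maximum, returned as (x, y)."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return np.array([x, y], dtype=float)


def decode_taylor(heatmap):
    """Sub-pixel decoding via a second-order Taylor expansion of log(heatmap)."""
    m = decode_argmax(heatmap)
    x, y = int(m[0]), int(m[1])
    h, w = heatmap.shape
    if not (1 <= x < w - 1 and 1 <= y < h - 1):
        return m  # no neighbours for finite differences at the border
    logp = np.log(np.maximum(heatmap, 1e-10))
    # Finite-difference gradient and Hessian of the log-heatmap at the maximum.
    dx = 0.5 * (logp[y, x + 1] - logp[y, x - 1])
    dy = 0.5 * (logp[y + 1, x] - logp[y - 1, x])
    dxx = logp[y, x + 1] - 2.0 * logp[y, x] + logp[y, x - 1]
    dyy = logp[y + 1, x] - 2.0 * logp[y, x] + logp[y - 1, x]
    dxy = 0.25 * (logp[y + 1, x + 1] - logp[y + 1, x - 1]
                  - logp[y - 1, x + 1] + logp[y - 1, x - 1])
    hess = np.array([[dxx, dxy], [dxy, dyy]])
    grad = np.array([dx, dy])
    if abs(np.linalg.det(hess)) < 1e-12:
        return m  # degenerate curvature, keep the integer estimate
    # mu = m - H^{-1} g
    return m - np.linalg.solve(hess, grad)


if __name__ == "__main__":
    gt = np.array([10.3, 7.6])  # hypothetical sub-pixel joint location
    biased = decode_argmax(encode_heatmap(gt, unbiased=False))
    refined = decode_taylor(encode_heatmap(gt, unbiased=True))
    print("biased + argmax error:  ", np.linalg.norm(biased - gt))
    print("unbiased + Taylor error:", np.linalg.norm(refined - gt))
```

For an ideal Gaussian heatmap the log-heatmap is exactly quadratic, so the Taylor step recovers the sub-pixel centre; on real network outputs the refinement is only approximate.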
2. Related Work
3. Methodology
3.1. Overall Network Structure
3.2. Hourglass Network
3.3. Coordinate-Corrected and Graph-Convolution-Based Hand Pose Estimation Method
3.3.1. Encoding of Hand Joint Coordinates
3.3.2. Decoding of Hand Joint Point Coordinates
3.4. Hand Arthrogram Reasoning Module
3.4.1. Pixel–Joint Voting
3.4.2. Graph Convolution
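As a rough illustration of the graph convolution named in contribution (3), the sketch below applies one standard graph-convolution layer, ReLU(Â X W), to per-joint features. Everything specific here is an assumption and not taken from the paper: the 21-joint layout (wrist plus four joints per finger), the symmetric normalization Â = D^(-1/2)(A + I)D^(-1/2), the feature sizes, and the function names. The pixel–joint voting and joint–pixel mapping steps of the reasoning module are not modelled.

```python
import numpy as np

NUM_JOINTS = 21  # assumed layout: joint 0 is the wrist, then 4 joints per finger


def hand_adjacency():
    """Adjacency matrix of an assumed 21-joint hand skeleton."""
    a = np.zeros((NUM_JOINTS, NUM_JOINTS))
    for finger in range(5):
        base = 1 + 4 * finger
        chain = [0, base, base + 1, base + 2, base + 3]  # wrist -> fingertip
        for u, v in zip(chain[:-1], chain[1:]):
            a[u, v] = a[v, u] = 1.0
    return a


def normalize_adjacency(a):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, a_norm, w):
    """One graph-convolution layer: each joint aggregates its neighbours' features."""
    return np.maximum(a_norm @ x @ w, 0.0)  # ReLU


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(NUM_JOINTS, 16))  # hypothetical per-joint features
    w = rng.normal(size=(16, 32)) * 0.1    # hypothetical layer weights
    out = graph_conv(x, normalize_adjacency(hand_adjacency()), w)
    print(out.shape)
```

Because each joint mixes in the features of its skeletal neighbours, an occluded joint can draw on evidence from visible joints along the same finger, which is the intuition behind using the hand topology.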
3.4.3. Joint–Pixel Mapping
3.5. Skeletal Constraint Loss Function
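The Introduction states that the skeleton constraint loss constrains direction and length. The exact formulation is not reproduced in this text, so the following is only one plausible sketch: a bone is taken as the vector from a parent joint to its child, the length term is the mean absolute difference of bone lengths, and the direction term is the mean of one minus the cosine similarity. The 21-joint parent table, the weights `w_len` and `w_dir`, and the function names are assumptions.

```python
import numpy as np

# Assumed 21-joint layout: joint 0 is the wrist; each finger is a chain of
# four joints whose first joint attaches to the wrist.
PARENTS = [-1] + [0 if j % 4 == 1 else j - 1 for j in range(1, 21)]


def bone_vectors(joints):
    """Bone vectors (child minus parent) for a (21, D) array of joint positions."""
    children = np.arange(1, len(PARENTS))
    parents = np.array(PARENTS[1:])
    return joints[children] - joints[parents]


def skeletal_constraint_loss(pred, gt, w_len=1.0, w_dir=1.0, eps=1e-8):
    """Hypothetical bone length + direction penalty between two poses."""
    bp, bg = bone_vectors(pred), bone_vectors(gt)
    len_p = np.linalg.norm(bp, axis=1)
    len_g = np.linalg.norm(bg, axis=1)
    # Length term: predicted bones should be as long as the ground-truth bones.
    length_term = np.mean(np.abs(len_p - len_g))
    # Direction term: predicted bones should point the same way (cosine -> 1).
    cos = np.sum(bp * bg, axis=1) / (len_p * len_g + eps)
    direction_term = np.mean(1.0 - cos)
    return w_len * length_term + w_dir * direction_term


if __name__ == "__main__":
    # Synthetic pose with well-separated joints, for demonstration only.
    idx = np.arange(21, dtype=float)
    gt = np.stack([idx, idx % 4, np.zeros(21)], axis=1)
    print(skeletal_constraint_loss(gt, gt))        # identical poses
    print(skeletal_constraint_loss(2.0 * gt, gt))  # same directions, longer bones
```

A loss of this shape is invariant to translating the whole hand, so it would normally be added to a coordinate or heatmap loss instead of replacing it.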
4. Experiments
4.1. Experimental Environment
4.2. Experimental Settings
5. Results
5.1. Qualitative Comparison
5.2. Quantitative Comparison
5.2.1. Comparative Experiments with Different Methods
5.2.2. Ablation Experiment
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Xia, S.; Gao, L.; Lai, Y.-K.; Yuan, M.-Z.; Chai, J. A Survey on Human Performance Capture and Animation. J. Comput. Sci. Technol. 2017, 32, 536–554.
2. Usai, M.; Meyer, R.; Baier, R.; Herzberger, N.; Lebold, K.; Flemisch, F. System architecture for gesture control of maneuvers in automated driving. In Intelligent Human Systems Integration 2021: Proceedings of the 4th International Conference on Intelligent Human Systems Integration (IHSI 2021): Integrating People and Intelligent Systems, Palermo, Italy, 22–24 February 2021; Springer: Cham, Switzerland, 2021.
3. Gao, Q.; Chen, Y.; Ju, Z.; Liang, Y. Dynamic Hand Gesture Recognition Based on 3D Hand Pose Estimation for Human–Robot Interaction. IEEE Sensors J. 2021, 22, 17421–17430.
4. Okano, M.; Liu, J.Q.; Tateyama, T.; Chen, Y.-W. DHGD: Dynamic hand pose dataset for skeleton-based gesture recognition and baseline evaluations. In Proceedings of the IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 6–8 January 2024; pp. 1–4.
5. Zhang, F.; Bazarevsky, V.; Vakunov, A.; Tkachenka, A.; Sung, G.; Chang, C.L.; Grundmann, M. MediaPipe Hands: On-device real-time hand tracking. arXiv 2020, arXiv:2006.10214.
6. Hakim, N.L.; Shih, T.K.; Kasthuri Arachchi, S.P.; Aditya, W.; Chen, Y.C.; Lin, C.Y. Dynamic hand pose recognition using 3D CNN and LSTM with FSM context-aware model. Sensors 2019, 19, 5429.
7. Materzynska, J.; Berger, G.; Bax, I.; Memisevic, R. The jester dataset: A large-scale video dataset of human gestures. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019; pp. 1–7.
8. Gupta, P.; Kautz, K. Online detection and classification of dynamic hand gestures with recurrent 3d convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; Volume 1, p. 3.
9. Benitez-Garcia, G.; Olivares-Mercado, J.; Sanchez-Perez, G.; Yanai, K. Ipn hand: A video dataset and benchmark for real-time continuous hand pose recognition. In Proceedings of the 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 4340–4347.
10. Ng, M.Y.; Chng, C.B.; Koh, W.K.; Chui, C.K.; Chua, M.C. An enhanced self-attention and A2J approach for 3D hand pose estimation. Multimed. Tools Appl. 2022, 81, 41661–41676.
11. Ge, L.; Liang, H.; Yuan, J.; Thalmann, D. Robust 3d hand pose estimation in single depth images: From single-view cnn to multi-view cnns. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3593–3601.
12. Fang, L.; Liu, X.; Liu, L.; Xu, H.; Kang, W. Jgr-p2o: Joint graph reasoning based pixel-to-offset prediction network for 3d hand pose estimation from a single depth image. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 120–137.
13. Moon, G.; Chang, J.Y.; Lee, K.M. V2v-posenet: Voxel-to-voxel prediction network for accurate 3d hand and human pose estimation from a single depth map. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5079–5088.
14. Ge, L.; Cai, Y.; Weng, J.; Yuan, J. Hand pointnet: 3d hand pose estimation using point sets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8417–8426.
15. Gong, J.; Fan, Z.; Ke, Q.; Rahmani, H.; Liu, J. Meta agent teaming active learning for pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11079–11089.
16. Almadani, M.; Elhayek, A.; Malik, J.; Stricker, D. Graph-Based Hand-Object Meshes and Poses Reconstruction With Multi-Modal Input. IEEE Access 2021, 9, 136438–136447.
17. Chang, X.; Yi, W.; Lin, X.; Sun, Y. 3D hand reconstruction with both shape and appearance from an RGB image. Image Vis. Comput. 2023, 135, 104690.
18. Yu, Z.; Huang, S.; Fang, C.; Breckon, T.P.; Wang, J. ACR: Attention collaboration-based regressor for arbitrary two-hand reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 12955–12964.
19. Zimmermann, C.; Brox, T. Learning to estimate 3d hand pose from single RGB images. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4903–4911.
20. Moon, G.; Yu, S.I.; Wen, H.; Shiratori, T.; Lee, K.M. InterHand2.6M: A dataset and baseline for 3d interacting hand pose estimation from a single RGB image. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 548–564.
21. Hampali, S.; Sarkar, S.D.; Rad, M.; Lepetit, V. Keypoint transformer: Solving joint identification in challenging hands and object interactions for accurate 3d pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11090–11100.
22. Doosti, B.; Naha, S.; Mirbagheri, M.; Crandall, D.J. Hope-net: A graph-based model for hand-object pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 6608–6617.
23. Ge, L.; Ren, Z.; Li, Y.; Xue, Z.; Wang, Y.; Cai, J.; Yuan, J. 3d hand shape and pose estimation from a single RGB image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10833–10842.
24. Khaleghi, L.; Sepas-Moghaddam, A.; Marshall, J.; Etemad, A. Multiview Video-Based 3-D Hand Pose Estimation. IEEE Trans. Artif. Intell. 2022, 4, 896–909.
25. Dong, X.; Yu, J.; Zhang, J. Joint usage of global and local attentions in hourglass network for human pose estimation. Neurocomputing 2022, 472, 95–102.
26. Newell, A.; Yang, K.; Deng, J. Stacked hourglass networks for human pose estimation. In Proceedings of the Computer Vision–ECCV 2016, 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016.
27. Zhang, F.; Zhu, X.; Dai, H.; Ye, M.; Zhu, C. Distribution-aware coordinate representation for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7093–7102.
28. Cai, Y.; Ge, L.; Cai, J.; Thalmann, N.M.; Yuan, J. 3D Hand Pose Estimation Using Synthetic Data and Weakly Labeled RGB Images. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3739–3753.
29. Oikonomidis, I.; Kyriazis, N.; Argyros, A.A. Efficient model-based 3d tracking of hand articulations using kinect. BMVC 2011, 1, 3.
30. Qian, C.; Sun, X.; Wei, Y.; Tang, X.; Sun, J. Realtime and robust hand tracking from depth. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1106–1113.
31. Sun, X.; Wei, Y.; Liang, S.; Tang, X.; Sun, J. Cascaded hand pose regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 824–832.
32. Li, R.; Liu, Z.; Tan, J. A survey on 3D hand pose estimation: Cameras, methods, and datasets. Pattern Recognit. 2019, 93, 251–272.
Algorithm | EPE (Mean) | EPE (Median) | AUC (0–50 mm) |
---|---|---|---|
Z&B [19] | 12.210 mm | 9.405 mm | 0.948 |
Cai [28] | 7.342 mm | 4.561 mm | 0.994 |
PSO [29] | - | - | 0.709 |
ICCPSO [30] | - | - | 0.748 |
CHPR [31] | - | - | 0.839 |
Ours | 9.284 mm | 5.178 mm | 0.995 |

Encoding Method | Decoding Method | EPE (Mean) | EPE (Median) | AUC (0–50 mm) |
---|---|---|---|---|
Biased | No Offset | 18.374 mm | 16.137 mm | 0.628 |
Biased | Standard Offset | 16.235 mm | 13.998 mm | 0.684 |
Biased | Taylor Expanded Offset | 14.468 mm | 11.556 mm | 0.731 |
Unbiased | Taylor Expanded Offset | 13.949 mm | 11.113 mm | 0.785 |

Model | EPE (Mean) | EPE (Median) | AUC (0–50 mm) |
---|---|---|---|
Hourglass | 13.346 mm | 11.448 mm | 0.789 |
Hourglass + Graph Reasoning | 9.951 mm | 8.274 mm | 0.895 |
Hourglass + Graph Reasoning + DARK | 6.895 mm | 5.479 mm | 0.998 |

Model | EPE (Mean) | EPE (Median) | AUC (0–50 mm) |
---|---|---|---|
Hourglass | 16.235 mm | 13.998 mm | 0.684 |
Hourglass + Graph Reasoning | 13.456 mm | 11.233 mm | 0.794 |
Hourglass + Graph Reasoning + DARK | 10.782 mm | 7.389 mm | 0.948 |
Hourglass + Graph Reasoning + DARK + Skeletal Constraint Loss | 9.284 mm | 5.178 mm | 0.995 |
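The tables report mean and median end-point error (EPE) and the area under the curve (AUC) over 0–50 mm. The sketch below shows one common way such numbers are computed; it assumes that EPE is the Euclidean distance per joint in millimetres and that the AUC is the normalized area under the PCK curve (the fraction of joints with error at or below a threshold). The evaluation protocol behind the reported figures may differ in detail, and the function names and threshold count are illustrative.

```python
import numpy as np


def joint_errors(pred, gt):
    """Per-joint Euclidean error for arrays of shape (samples, joints, dims)."""
    return np.linalg.norm(pred - gt, axis=-1)


def summarize_errors(pred, gt, max_mm=50.0, steps=101):
    """Mean EPE, median EPE, and normalized AUC of the PCK curve on [0, max_mm]."""
    err = joint_errors(pred, gt).ravel()
    thresholds = np.linspace(0.0, max_mm, steps)
    # PCK: fraction of joints whose error is within each threshold.
    pck = np.array([(err <= t).mean() for t in thresholds])
    # Trapezoidal integration, divided by the threshold range so AUC lies in [0, 1].
    area = np.sum(0.5 * (pck[1:] + pck[:-1]) * np.diff(thresholds))
    return {
        "epe_mean": float(err.mean()),
        "epe_median": float(np.median(err)),
        "auc": float(area / max_mm),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.uniform(-100.0, 100.0, size=(8, 21, 3))    # synthetic poses in mm
    pred = gt + rng.normal(scale=5.0, size=gt.shape)    # synthetic prediction noise
    print(summarize_errors(pred, gt))
```

Under this definition, errors above 50 mm all count equally against the AUC, which is why a method can rank differently on AUC than on mean EPE, as with the Cai [28] and proposed rows in the comparison table.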
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Rong, D.; Gang, F. Coordinate-Corrected and Graph-Convolution-Based Hand Pose Estimation Method. Sensors 2024, 24, 7289. https://doi.org/10.3390/s24227289