Synthesizing Depth Hand Images with GANs and Style Transfer for Hand Pose Estimation
Abstract
1. Introduction
2. Related Works
3. Synthesizing Depth Hand Images with GANs and Style Transfer
3.1. Generator G and Discriminator D
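For reference, the standard adversarial objective of Goodfellow et al. [22] can be written in terms of discriminator outputs. The sketch below is a minimal, dependency-free illustration. It assumes D outputs probabilities in (0, 1) and that G uses the common non-saturating loss. It is not necessarily the exact loss used for G and D in this work; a Wasserstein loss [28], for example, would look different.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0) when the discriminator saturates


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Binary cross-entropy for D: real samples labelled 1, generated samples 0.

    d_real: D(x) for a batch of real depth images, each in (0, 1).
    d_fake: D(G(z)) for a batch of generated images, each in (0, 1).
    """
    real_term = -sum(math.log(max(p, EPS)) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating generator loss: -mean(log D(G(z)))."""
    return -sum(math.log(max(p, EPS)) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # A confident, correct discriminator yields a low D loss and a high G loss.
    print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
    print(generator_loss([0.1, 0.2]))
```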
3.2. The Style-Transfer Variant
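The three losses named in Sections 3.2.1 to 3.2.3 are usually combined into a single objective. Below is a sketch in the notation of Gatys et al. [31], extended with the total variation regularizer used by Johnson et al. [33]. Here $\hat{x}$ is the image being synthesized, $p$ the content image, and $a$ the style image. The weights $\alpha$, $\beta$, $\gamma$ are placeholders. Assigning the GAN output and a real depth image to the content and style roles is an assumption, not something stated here.

```latex
\mathcal{L}_{\mathrm{total}}(\hat{x}) =
  \alpha\,\mathcal{L}_{\mathrm{content}}(\hat{x}, p)
  + \beta\,\mathcal{L}_{\mathrm{style}}(\hat{x}, a)
  + \gamma\,\mathcal{L}_{\mathrm{TV}}(\hat{x})
```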
3.2.1. Content Loss
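Below is a minimal sketch of the content loss as defined by Gatys et al. [31]. For one network layer it computes $\mathcal{L}_{\mathrm{content}} = \tfrac{1}{2}\sum_{i,j}(F_{ij} - P_{ij})^2$, where $F$ and $P$ are the feature maps of the generated and content images. To keep the example dependency-free, feature maps are plain nested lists of shape ($N_l$ filters × $M_l$ spatial positions). In practice they would come from a pretrained CNN such as VGG [45]; which layer this work uses is not assumed here.

```python
from typing import List

Matrix = List[List[float]]


def content_loss(gen_features: Matrix, content_features: Matrix) -> float:
    """L_content = 1/2 * sum_ij (F_ij - P_ij)^2 for one layer.

    gen_features:     F, shape (N_l filters, M_l positions), from the generated image.
    content_features: P, same shape, from the content image.
    """
    if len(gen_features) != len(content_features):
        raise ValueError("feature maps must have the same number of filters")
    total = 0.0
    for f_row, p_row in zip(gen_features, content_features):
        if len(f_row) != len(p_row):
            raise ValueError("feature maps must have the same spatial size")
        total += sum((f - p) ** 2 for f, p in zip(f_row, p_row))
    return 0.5 * total
```

For example, `content_loss([[1.0, 2.0]], [[1.0, 0.0]])` returns `2.0`, which is half of the single squared difference of 4.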
3.2.2. Style Loss
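The style loss of Gatys et al. [31] compares Gram matrices $G_{ij} = \sum_k F_{ik}F_{jk}$, which capture correlations between filter responses, across several layers. The per-layer term is $E_l = \frac{1}{4N_l^2M_l^2}\sum_{i,j}(G_{ij} - A_{ij})^2$, and $\mathcal{L}_{\mathrm{style}} = \sum_l w_l E_l$. Below is a dependency-free sketch of these formulas. The layer weights $w_l$ are hypothetical inputs.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """G_ij = sum_k F_ik * F_jk: inner products between filter responses."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features] for fi in features]


def layer_style_loss(gen_features: Matrix, style_features: Matrix) -> float:
    """E_l = 1 / (4 N_l^2 M_l^2) * sum_ij (G_ij - A_ij)^2 for one layer."""
    n = len(gen_features)      # N_l: number of filters
    m = len(gen_features[0])   # M_l: number of spatial positions
    g = gram_matrix(gen_features)
    a = gram_matrix(style_features)
    sq = sum(
        (gij - aij) ** 2
        for g_row, a_row in zip(g, a)
        for gij, aij in zip(g_row, a_row)
    )
    return sq / (4.0 * n ** 2 * m ** 2)


def style_loss(gen_layers: Sequence[Matrix], style_layers: Sequence[Matrix],
               weights: Sequence[float]) -> float:
    """L_style = sum_l w_l * E_l over the chosen layers."""
    return sum(w * layer_style_loss(f, s)
               for f, s, w in zip(gen_layers, style_layers, weights))
```

Because the Gram matrix sums over spatial positions, it discards where features occur and keeps only which features co-occur. That is why it captures texture (style) rather than layout (content).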
3.2.3. Total Variation Loss
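A total variation loss penalizes differences between neighboring pixels, which suppresses high-frequency noise in the synthesized depth image. The minimal sketch below assumes the squared-difference form. Some implementations instead use absolute differences or the square root of the summed squares, and the exact variant used in this work is not assumed here.

```python
from typing import List

Image = List[List[float]]


def total_variation_loss(img: Image) -> float:
    """Sum of squared differences between horizontally and vertically adjacent pixels.

    img: a single-channel depth image given as rows of pixel values.
    A smooth image scores low; noisy, high-frequency artifacts raise the loss.
    """
    h, w = len(img), len(img[0])
    horizontal = sum((img[i][j + 1] - img[i][j]) ** 2
                     for i in range(h) for j in range(w - 1))
    vertical = sum((img[i + 1][j] - img[i][j]) ** 2
                   for i in range(h - 1) for j in range(w))
    return horizontal + vertical
```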
Algorithm 1. Generate depth hand image via GAN and Style Transfer
4. Experimental Details
4.1. Datasets and Preparation
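A common preparation step for depth hand images is to normalize depth values relative to the hand center. This resembles the cube-crop normalization of DeepPrior (Oberweger et al. [38]) and matches the [-1, 1] range of a tanh generator output as in DCGAN [24]. The sketch below is illustrative only. The 250 mm cube size, the treatment of zero-depth pixels as background, and the function name are all assumptions rather than details taken from this work.

```python
from typing import List

Image = List[List[float]]


def normalize_depth(patch: Image, center_depth_mm: float,
                    cube_size_mm: float = 250.0) -> Image:
    """Map depths within +/- cube_size/2 of the hand center to [-1, 1], clipping the rest.

    Pixels with depth <= 0 (no sensor reading) are treated as background and set to 1.0,
    the far end of the range. The default cube size is a hypothetical choice.
    """
    half = cube_size_mm / 2.0
    out = []
    for row in patch:
        new_row = []
        for d in row:
            if d <= 0.0:  # invalid / missing depth reading
                new_row.append(1.0)
            else:
                v = (d - center_depth_mm) / half
                new_row.append(max(-1.0, min(1.0, v)))
        out.append(new_row)
    return out
```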
4.2. Model Architecture and Internal Parameters
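The layer parameters of this work are not listed here. As a general illustration, the helper below shows how a DCGAN-style generator [24] grows spatial resolution. A transposed convolution maps an input of size $s$ to $(s-1)\cdot\mathrm{stride} - 2\cdot\mathrm{padding} + \mathrm{kernel}$. With kernel 4, stride 2 and padding 1, a setting common in public DCGAN implementations, each layer doubles the resolution. The start size and layer count in the example are hypothetical.

```python
from typing import List


def conv_transpose_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a transposed convolution (no output padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel


def generator_resolutions(start: int, num_layers: int, kernel: int = 4,
                          stride: int = 2, padding: int = 1) -> List[int]:
    """Resolution after each up-sampling layer of a DCGAN-style generator."""
    sizes = [start]
    for _ in range(num_layers):
        sizes.append(conv_transpose_out(sizes[-1], kernel, stride, padding))
    return sizes
```

For example, `generator_resolutions(4, 4)` returns `[4, 8, 16, 32, 64]`, the 4×4 to 64×64 progression of the original DCGAN generator.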
5. Empirical Experiments
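The result tables report the average 3D error. In the hand pose literature this is usually the mean Euclidean distance, in millimetres, between predicted and ground-truth joint positions, averaged over all joints of all test frames. Below is a minimal sketch of that computation. It assumes predictions and ground truth are already in the same camera coordinate frame and in millimetres. Dataset-specific joint subsets are not modeled.

```python
import math
from typing import Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) in millimetres


def average_3d_error(pred: Sequence[Sequence[Joint]],
                     gt: Sequence[Sequence[Joint]]) -> float:
    """Mean Euclidean distance (mm) between predicted and ground-truth joints,
    averaged over every joint of every frame."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of frames")
    total, count = 0.0, 0
    for pred_frame, gt_frame in zip(pred, gt):
        if len(pred_frame) != len(gt_frame):
            raise ValueError("each frame must have the same number of joints")
        for p, g in zip(pred_frame, gt_frame):
            total += math.dist(p, g)  # Euclidean distance in 3D
            count += 1
    return total / count
```

For one frame with joint errors of 0 mm and 5 mm (a 3-4-5 offset), the function returns 2.5 mm.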
5.1. NYU Hand Pose Dataset
5.2. MSRA Hand Pose Dataset
5.3. ICVL Hand Posture Dataset
6. Ablation Study
6.1. Effects of the Components
6.2. Visual Results
7. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Meena, Y.K.; Cecotti, H.; Wong-Lin, K.; Dutta, A.; Prasad, G. Toward Optimization of Gaze-Controlled Human-Computer Interaction: Application to Hindi Virtual Keyboard for Stroke Patients. IEEE Trans. Neural Syst. Rehabil. Eng. 2018, 26, 911–922.
- Preece, J. Human-Computer Interaction; Addison-Wesley Longman Ltd.: Essex, UK, 1994; Volume 19, pp. 43–50.
- Supancic, J.S.; Rogez, G.; Yang, Y.; Shotton, J.; Ramanan, D. Depth-Based Hand Pose Estimation: Data, Methods, and Challenges. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015; pp. 1868–1876.
- Tompson, J.; Stein, M.; LeCun, Y.; Perlin, K. Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks. ACM Trans. Graph. 2014, 33, 1–10.
- Tang, D.; Taylor, J.; Kohli, P.; Keskin, C.; Kim, T.; Shotton, J. Opening the Black Box: Hierarchical Sampling Optimization for Estimating Human Hand Pose. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015; pp. 3325–3333.
- Ge, L.; Liang, H.; Yuan, J.; Thalmann, D. 3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5679–5688.
- Li, Y.; Chen, J.; Ye, F.; Liu, D. The Improvement of DS Evidence Theory and Its Application in IR/MMW Target Recognition. J. Sensors 2016, 2016, 1–15.
- Li, J.; Qiu, T.; Wen, C.; Xie, K.; Wen, F.-Q. Robust Face Recognition Using the Deep C2D-CNN Model Based on Decision-Level Fusion. J. Sensors 2018, 18, 2080.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015; pp. 1–9.
- Deng, X.; Yang, S.; Zhang, Y.; Tan, P.; Chang, L.; Wang, H. Hand3D: Hand Pose Estimation Using 3D Neural Network. Available online: https://arxiv.org/pdf/1704.02224.pdf (accessed on 7 April 2017).
- Zhou, X.; Wan, Q.; Zhang, W.; Xue, X.; Wei, Y. Model Based Deep Hand Pose Estimation. Available online: https://arxiv.org/pdf/1606.06854.pdf (accessed on 22 June 2016).
- Fourure, D.; Emonet, R.; Fromont, E.; Muselet, D.; Neverova, N.; Tremeau, A.; Wolf, C. Multi-Task, Multi-Domain Learning: Application to Semantic Segmentation and Pose Regression. Neurocomputing 2017, 1, 68–80.
- Neverova, N.; Wolf, C.; Nebout, F.; Taylor, G. Hand Pose Estimation through Semi-Supervised and Weakly-Supervised Learning. Comput. Vision Image Understanding 2017, 164, 56–67.
- Xu, C.; Govindarajan, L.N.; Zhang, Y.; Cheng, L. LieX: Depth Image Based Articulated Object Pose Estimation, Tracking, and Action Recognition on Lie Groups. Int. J. Comput. Vision 2017, 454–478.
- Wang, G.; Chen, X.; Guo, H.; Zhang, C. Region Ensemble Network: Towards Good Practices for Deep 3D Hand Pose Estimation. J. Vis. Commun. Image Represent. 2018, 55, 404–414.
- Oberweger, M.; Lepetit, V. DeepPrior++: Improving Fast and Accurate 3D Hand Pose Estimation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 585–594.
- Chen, X.; Wang, G.; Guo, H.; Zhang, C. Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation. Available online: https://arxiv.org/pdf/1708.03416.pdf (accessed on 24 June 2018).
- Yang, H.; Zhang, J. Hand Pose Regression via a Classification-Guided Approach. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; pp. 452–466.
- Sun, X.; Wei, Y.; Liang, S.; Tang, X.; Sun, J. Cascaded hand pose regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015; pp. 824–832.
- Tang, D.; Chang, H.J.; Tejani, A.; Kim, T.-K. Latent regression forest: Structured estimation of 3D articulated hand posture. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 3786–3793.
- Kingma, D.; Welling, M. Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations, Banff, AB, Canada, 14–16 April 2014; pp. 3452–3457.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
- van den Oord, A.; Kalchbrenner, N.; Kavukcuoglu, K. Pixel recurrent neural networks. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1747–1756.
- Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. Available online: https://arxiv.org/pdf/1511.06434.pdf (accessed on 7 January 2016).
- Mirza, M.; Osindero, S. Conditional generative adversarial nets. Available online: https://arxiv.org/pdf/1411.1784.pdf (accessed on 6 November 2014).
- Denton, E.; Chintala, S.; Fergus, R. Deep generative image models using a Laplacian pyramid of adversarial networks. In Proceedings of the Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 1486–1494.
- Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; Abbeel, P. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Proceedings of the Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 2172–2180.
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. Available online: https://arxiv.org/pdf/1701.07875.pdf (accessed on 6 December 2017).
- Hertzmann, A.; Jacobs, C.; Oliver, N.; Curless, B.; Salesin, D. Image analogies. In Proceedings of the Conference on Computer Graphics and Interactive Techniques, New York, NY, USA, 26–30 July 2001; pp. 327–340.
- Cheng, L.; Vishwanathan, S.; Zhang, X. Consistent image analogies using semi-supervised learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 24–26 June 2008; pp. 1–8.
- Gatys, L.; Ecker, A.; Bethge, M. A neural algorithm of artistic style. Available online: https://arxiv.org/pdf/1508.06576.pdf (accessed on 2 September 2015).
- Ulyanov, D.; Lebedev, V.; Vedaldi, A.; Lempitsky, V. Texture networks: Feed-forward synthesis of textures and stylized images. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1349–1357.
- Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 694–711.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
- Ciregan, D.; Meier, U.; Schmidhuber, J. Multi-column deep neural networks for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3642–3649.
- Ge, L.; Liang, H.; Yuan, J.; Thalmann, D. Robust 3D hand pose estimation in single depth images: From single-view CNN to multi-view CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 3593–3601.
- Guo, H.; Wang, G.; Chen, X.; Zhang, C.; Qiao, F.; Yang, H. Region ensemble network: Improving convolutional network for hand pose estimation. In Proceedings of the IEEE International Conference on Image Processing, Beijing, China, 17–20 September 2017; pp. 4512–4516.
- Oberweger, M.; Wohlhart, P.; Lepetit, V. Hands deep in deep learning for hand pose estimation. In Proceedings of the Computer Vision Winter Workshop, Styria, Austria, 9–11 February 2015; pp. 21–30.
- Tang, D.; Yu, T.-H.; Kim, T.-K. Real-time articulated hand pose estimation using semi-supervised transductive regression forests. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 3–6 December 2013; pp. 3224–3231.
- Liang, H.; Yuan, J.; Thalmann, D. Parsing the hand in depth images. IEEE Trans. Multimed. 2014, 16, 1241–1253.
- Wan, C.; Probst, T.; Van Gool, L.; Yao, A. Crossing nets: Dual generative models with a shared latent space for hand pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 22–25 July 2017; pp. 3642–3649.
- Bouchacourt, D.; Kumar, M.P.; Nowozin, S. DISCO Nets: Dissimilarity Coefficient Networks. In Proceedings of the Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 352–360.
- Baek, S.; Kim, K.I.; Kim, T.K. Augmented skeleton space transfer for depth-based hand pose estimation. Available online: https://arxiv.org/pdf/1805.04497v1.pdf (accessed on 11 May 2018).
- Oberweger, M.; Wohlhart, P.; Lepetit, V. Training a Feedback Loop for Hand Pose Estimation. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015; pp. 3316–3324.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. Available online: https://arxiv.org/pdf/1409.1556.pdf (accessed on 10 April 2015).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Krejov, P.; Gilbert, A.; Bowden, R. Guided Optimisation through Classification and Regression for Hand Pose Estimation. Comput. Vision Image Understanding 2017, 115, 124–138.

Average 3D error on the NYU Hand Pose Dataset (lower is better).
Method | Average 3D Error |
---|---|
Bouchacourt et al. [42] (DISCO) | 20.7 mm |
Oberweger et al. [38] (DeepPrior) | 19.8 mm |
Deng et al. [10] (Hand3D) | 17.6 mm |
Zhou et al. [11] (DeepModel) | 17.04 mm |
Fourure et al. [12] (JTSC) | 16.8 mm |
Oberweger et al. [44] (Feedback) | 16.2 mm |
Neverova et al. [13] | 14.9 mm |
Xu et al. [14] (Lie-X) | 14.5 mm |
Ge et al. [6] (3DCNN) | 14.1 mm |
Baek et al. [43] | 14.1 mm |
Guo et al. [37] (REN-) | 13.39 mm |
Wang et al. [15] (REN-) | 12.69 mm |
Oberweger et al. [16] (DeepPrior++) | 12.24 mm |
Chen et al. [17] (Pose-REN) | 11.81 mm |
This work | 11.40 mm |

Average 3D error on the MSRA Hand Pose Dataset (lower is better).
Method | Average 3D Error |
---|---|
Sun et al. [19] (HPR) | 15.2 mm |
Yang et al. [18] (Cls-Guided) | 13.7 mm |
Ge et al. [36] (MultiView) | 13.2 mm |
Baek et al. [43] | 12.5 mm |
Wan et al. [41] (CrossingNets) | 12.2 mm |
Wang et al. [15] (REN-) | 9.7 mm |
Oberweger et al. [16] (DeepPrior++) | 9.5 mm |
Chen et al. [17] (Pose-REN) | 8.65 mm |
This work | 8.41 mm |

Average 3D error on the ICVL Hand Posture Dataset (lower is better).
Method | Average 3D Error |
---|---|
Tang et al. [20] (LRF) | 12.6 mm |
Zhou et al. [11] (DeepModel) | 11.56 mm |
Deng et al. [10] (Hand3D) | 10.9 mm |
Krejov et al. [47] (CDO) | 10.5 mm |
Oberweger et al. [38] (DeepPrior) | 10.4 mm |
Wan et al. [41] (CrossingNets) | 10.2 mm |
Baek et al. [43] | 8.5 mm |
Oberweger et al. [16] (DeepPrior++) | 8.1 mm |
Guo et al. [37] (REN-) | 7.63 mm |
Wang et al. [15] (REN-) | 7.31 mm |
Chen et al. [17] (Pose-REN) | 6.79 mm |
This work | 6.45 mm |
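
Pose-REN [17] has the lowest error among the compared methods in all three tables, so simple arithmetic against it makes the margin of this work concrete. The dataset labels below assume the three tables appear in the order of Sections 5.1 to 5.3 (NYU, MSRA, ICVL). The error values are copied from the tables.

```python
# Average 3D errors (mm) copied from the three result tables.
results = {
    "NYU": {"Pose-REN": 11.81, "This work": 11.40},
    "MSRA": {"Pose-REN": 8.65, "This work": 8.41},
    "ICVL": {"Pose-REN": 6.79, "This work": 6.45},
}


def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage reduction in error relative to the baseline."""
    return 100.0 * (baseline - ours) / baseline


for dataset, r in results.items():
    gap = r["Pose-REN"] - r["This work"]
    pct = relative_reduction(r["Pose-REN"], r["This work"])
    print(f"{dataset}: {gap:.2f} mm lower ({pct:.2f}% relative)")
```

The output shows reductions of 0.41 mm (3.47%) on NYU, 0.24 mm (2.77%) on MSRA, and 0.34 mm (5.01%) on ICVL.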
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
He, W.; Xie, Z.; Li, Y.; Wang, X.; Cai, W. Synthesizing Depth Hand Images with GANs and Style Transfer for Hand Pose Estimation. Sensors 2019, 19, 2919. https://doi.org/10.3390/s19132919