Dexterous Object Manipulation with an Anthropomorphic Robot Hand via Natural Hand Pose Transformer and Deep Reinforcement Learning
Abstract
1. Introduction
2. Methods
2.1. Databases for T-DRL Training
2.2. Object Manipulation T-DRL
2.2.1. Transformer Network with Attention Mechanism
2.2.2. DRL with Reward Shaping
2.3. Training and Validation of T-DRL
2.3.1. Training and Validation of Transformer Network
2.3.2. Training and Validation of T-DRL
3. Results
3.1. Simulation Environments
3.2. Hand Pose Estimation via Transformer Network
3.3. Object Manipulation Performance by T-DRL
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Objects | NPG (mean ± std) | T-DRL (mean ± std) |
|---|---|---|
| Apple | 91.58 (±3.03) | 94.68 (±1.42) |
| Banana | 88.36 (±8.12) | 92.63 (±3.20) |
| Hammer | 89.03 (±4.85) | 94.37 (±2.98) |
| Light-bulb | 68.53 (±8.45) | 87.61 (±7.14) |
| Bottle | 21.81 (±7.51) | 90.35 (±4.20) |
| Camera | 32.40 (±6.06) | 81.21 (±3.52) |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lopez, P.R.; Oh, J.-H.; Jeong, J.G.; Jung, H.; Lee, J.H.; Jaramillo, I.E.; Chola, C.; Lee, W.H.; Kim, T.-S. Dexterous Object Manipulation with an Anthropomorphic Robot Hand via Natural Hand Pose Transformer and Deep Reinforcement Learning. Appl. Sci. 2023, 13, 379. https://doi.org/10.3390/app13010379