Bridging Locomotion and Manipulation Using Reconfigurable Robotic Limbs via Reinforcement Learning
Abstract
1. Introduction
2. Materials and Methods
2.1. Problem Formulation
2.2. Towards a Unified Loco-Manipulation Model
2.3. Learning Algorithms
2.4. Learning a Structured Policy with Inductive Bias
2.5. Reconfiguration with Overconstrained Robotic Limbs
2.6. Simulation Environment
- Learning MLP Policy from Scratch: We trained agents using the MLP policy from scratch. The locomotion task (LocoFromScratch) and manipulation task (ManiFromScratch) were trained in separate simulation runs, as shown in Figure 2.
- Learning MLP Policy Transferred from Expert Policy: The policy learned from scratch was transferred to another task. Specifically, for the locomotion transfer experiments (LocoFromMani), the expert manipulation policy trained from the ManiFromScratch experiment was used as the initial policy, while for the manipulation transfer experiments (ManiFromLoco), the initial policy was set to the policy trained from LocoFromScratch.
- Co-learning a Unified Loco-manipulation Policy: We combined the locomotion and manipulation tasks into a single simulation run and trained a unified policy using either an MLP policy (ColearnMLP) or a graph policy (ColearnGraph).
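The transfer experiments above amount to initializing the new task's policy with the expert's weights rather than random values. A minimal NumPy sketch of that initialization scheme (the MLP sizes and function names here are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def init_mlp(obs_dim, act_dim, hidden=64, rng=None):
    """Randomly initialize a small two-layer MLP policy (hypothetical sizes)."""
    rng = rng or np.random.default_rng(0)
    return {
        "W1": rng.normal(0.0, 0.1, (obs_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, act_dim)),
        "b2": np.zeros(act_dim),
    }

def transfer_policy(expert_params):
    """Start the new task from a copy of the expert's weights,
    as in the LocoFromMani / ManiFromLoco initialization."""
    return {k: v.copy() for k, v in expert_params.items()}

def forward(params, obs):
    """Tanh MLP policy: observation in, bounded action out."""
    h = np.tanh(obs @ params["W1"] + params["b1"])
    return np.tanh(h @ params["W2"] + params["b2"])

# Train an expert on manipulation (training loop elided), then
# reuse its weights as the starting point for locomotion training.
expert_mani = init_mlp(obs_dim=24, act_dim=12)
loco_init = transfer_policy(expert_mani)
```

The copy keeps the expert intact, so both the transferred policy and the original expert can continue training independently.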
3. Experiment Results
3.1. Transferability between Locomotion and Manipulation in Two Robot Configurations
3.2. Co-Learning Unified Loco-Manipulation Policies with MLP and GNN
3.3. Transfer Learned Skills to Physical Hardware
4. Conclusions and Limitations
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Reward | Reward Scale of MLP | Reward Scale of GNN |
|---|---|---|
| Rotation Reward | 0.5 | 2 |
| Position Deviation Penalty | 2.5 | 2.5 |
| Joint Acceleration Penalty | 0.0005 | 0.002 |
| Action Rate Penalty | 0.02 | 0.02 |
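The terms in the table combine into a single per-step scalar: the rotation term is rewarded and the remaining terms are penalized, each weighted by its scale. A minimal sketch using the MLP column; the per-step term values passed in at the bottom are hypothetical:

```python
# Reward scales taken from the MLP column of the table above.
SCALES = {
    "rotation": 0.5,               # reward: progress toward the goal rotation
    "position_deviation": 2.5,     # penalty: drift from the target position
    "joint_acceleration": 0.0005,  # penalty: discourages jerky joint motion
    "action_rate": 0.02,           # penalty: discourages rapid action changes
}

def total_reward(rotation, position_deviation, joint_acceleration, action_rate):
    """Weighted sum: the rotation term is added, penalty terms are subtracted."""
    return (SCALES["rotation"] * rotation
            - SCALES["position_deviation"] * position_deviation
            - SCALES["joint_acceleration"] * joint_acceleration
            - SCALES["action_rate"] * action_rate)

# Hypothetical per-step term values for illustration.
r = total_reward(rotation=1.0, position_deviation=0.05,
                 joint_acceleration=10.0, action_rate=0.5)
```

Note the large spread in scales: joint-acceleration magnitudes are typically orders of magnitude larger than the other terms, so their scale is correspondingly small.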
| Hyperparameter | Value | Hyperparameter | Value | Hyperparameter | Value |
|---|---|---|---|---|---|
| learning_epochs | 5 | mini_batches | 1 | discount_factor | 0.99 |
| lambda | 0.95 | kl_threshold | 0.012 | grad_norm_clip | 1.0 |
| ratio_clip | 0.2 | value_clip | 0.2 | clip_predicted_values | True |
| value_loss_scale | 1.0 | rollouts | 48 | min_learning_rate | |
| Hyperparameter | Value | Hyperparameter | Value | Hyperparameter | Value |
|---|---|---|---|---|---|
| learning_epochs | 5 | mini_batches | 1 | discount_factor | 0.99 |
| lambda | 0.95 | kl_threshold | 0.012 | grad_norm_clip | 1.0 |
| ratio_clip | 0.2 | value_clip | 0.2 | clip_predicted_values | True |
| value_loss_scale | 1.0 | rollouts | 48 | min_learning_rate | |
| model_size | 32 | stack numbers N | 3 | iteration numbers K | 2 |
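For reference, the PPO hyperparameters above can be gathered into a configuration dictionary. The key names below follow the skrl library's PPO agent configuration; assuming skrl here is an inference from the training setup, and the GNN architecture keys are hypothetical names for the table's entries:

```python
# PPO hyperparameters from the tables above, arranged as a skrl-style
# configuration dictionary (key naming is an assumption, not confirmed).
ppo_config = {
    "rollouts": 48,                  # environment steps collected per update
    "learning_epochs": 5,            # optimization passes over each rollout batch
    "mini_batches": 1,               # mini-batches per learning epoch
    "discount_factor": 0.99,         # gamma
    "lambda": 0.95,                  # GAE lambda
    "kl_threshold": 0.012,           # early-stop threshold on KL divergence
    "grad_norm_clip": 1.0,           # global gradient-norm clipping
    "ratio_clip": 0.2,               # PPO surrogate clipping range
    "value_clip": 0.2,               # clipping range for value targets
    "clip_predicted_values": True,
    "value_loss_scale": 1.0,
}

# GNN-specific architecture settings from the second table
# (hypothetical key names for model size, stack count N, iteration count K).
gnn_config = {"model_size": 32, "stack_numbers_N": 3, "iteration_numbers_K": 2}
```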
| Model Transfer | Horizontal | Vertical |
|---|---|---|
| LocoFromMani | 66.4% | 3.6% |
| ManiFromLoco | 51.8% | 18.0% |
| Model Transfer | LocoFromMani | ManiFromLoco |
|---|---|---|
| LocoFromScratch | 0.2024 | 0.1063 |
| ManiFromScratch | 0.1007 | 0.1923 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sun, H.; Yang, L.; Gu, Y.; Pan, J.; Wan, F.; Song, C. Bridging Locomotion and Manipulation Using Reconfigurable Robotic Limbs via Reinforcement Learning. Biomimetics 2023, 8, 364. https://doi.org/10.3390/biomimetics8040364