Reinforcement Learning of Bipedal Walking Using a Simple Reference Motion
Abstract
1. Introduction
2. Related Work
3. Proposed Method
3.1. Reference Motion
3.2. Action and Target Joint Angle
3.3. State
3.4. Reward
3.4.1. Orientation Reward
3.4.2. Velocity Reward
3.4.3. Symmetry Reward
3.4.4. Survival Reward
3.4.5. Torque Penalty
3.5. Inverse Kinematics
3.6. Optimizing the Parameters of the Reference Motion
3.7. Optimizing
4. Numerical Experiments
4.1. Simulation Setups
4.2. Methods for Comparison
4.2.1. RL Only
4.2.2. Imitation Learning 1
4.2.3. Imitation Learning 2
4.3. Experimental Results
5. Real-Robot Experiments
5.1. Real-Robot System
5.2. Domain Randomization
5.3. Experimental Results
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Joint | a [mm] | α [rad] | d [mm] | θ [rad] |
---|---|---|---|---|
1 | 0 | | 23 | * |
2 | 41 | | 0 | |
3 | 65 | 0 | 0 | |
4 | 65 | 0 | 0 | |
5 | 41 | | 0 | |
6 | 25 | 0 | 0 | |
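The column units (length, angle, length, angle) follow the pattern of standard Denavit–Hartenberg link parameters, which is the convention assumed in the reconstructed header; the blank cells were equation images lost in extraction. Under that convention, joint i contributes the homogeneous transform below, and the foot pose is the product T_1 T_2 ⋯ T_6 that an inverse-kinematics routine (Section 3.5) inverts to reach a target foot position.

```latex
T_i
= \mathrm{Rot}_z(\theta_i)\,\mathrm{Trans}_z(d_i)\,\mathrm{Trans}_x(a_i)\,\mathrm{Rot}_x(\alpha_i)
= \begin{pmatrix}
\cos\theta_i & -\sin\theta_i\cos\alpha_i & \sin\theta_i\sin\alpha_i & a_i\cos\theta_i \\
\sin\theta_i & \cos\theta_i\cos\alpha_i & -\cos\theta_i\sin\alpha_i & a_i\sin\theta_i \\
0 & \sin\alpha_i & \cos\alpha_i & d_i \\
0 & 0 & 0 & 1
\end{pmatrix}
```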
Hyperparameter | Value |
---|---|
Time steps per actor batch | 2048 |
Batch size | 128 |
Optimization epochs | 10 |
Learning rate | |
Discount factor | |
GAE parameter λ |
Clip parameter | |
Entropy coefficient |
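For concreteness, the sketch below shows how a configuration like the one above might be passed to a PPO implementation. Stable-Baselines3 and the stand-in environment are assumptions on our part, and the rows whose values are blank in the table (learning rate, discount factor, GAE parameter, clip parameter, entropy coefficient) are filled with the library's common defaults purely as placeholders, not the paper's settings.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Stand-in environment for illustration; the paper trains a simulated
# humanoid robot, not this Box2D walker.
env = gym.make("BipedalWalker-v3")

model = PPO(
    "MlpPolicy",
    env,
    n_steps=2048,        # time steps per actor batch (from the table)
    batch_size=128,      # batch size (from the table)
    n_epochs=10,         # optimization epochs (from the table)
    learning_rate=3e-4,  # placeholder; value missing from the extracted table
    gamma=0.99,          # placeholder discount factor
    gae_lambda=0.95,     # placeholder GAE parameter
    clip_range=0.2,      # placeholder clip parameter
    ent_coef=0.0,        # placeholder entropy coefficient
)
model.learn(total_timesteps=1_000_000)
```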
Method | Walking Distance in 750 Steps (i.e., 30 s) [cm] | Speed [cm/s] |
---|---|---|
Proposed method | ||
Proposed method w/o Bayesian optimization | ||
RL only | - | - |
Imitation Learning 1 | - | - |
Imitation Learning 2 | - | - |
Parameter | Value |
---|---|
Applied force | |
Direction of the applied force | |
Interval between force applications | |
Motor noise | |
Motor offset | |
Gyro noise | |
Gyro offset | |
Accelerometer noise (x and y axis) | |
Accelerometer noise (z axis) | |
Accelerometer offset |
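The table lists which quantities are randomized, but the sampled ranges were lost in extraction. As an illustration of the general technique, domain randomization of this kind typically draws fresh constant offsets each episode, adds white noise to every sensor reading, and applies pushes of random magnitude and direction at random intervals. All ranges and helper names below are placeholders.

```python
import numpy as np

rng = np.random.default_rng()

# Placeholder ranges, NOT the paper's values (those are missing above).
FORCE_MAG = (0.0, 5.0)        # applied force [N]
PUSH_INTERVAL = (2.0, 5.0)    # interval between force applications [s]
MOTOR_NOISE_STD = 0.01        # motor noise [rad]
GYRO_NOISE_STD = 0.005        # gyro noise [rad/s]
ACC_NOISE_XY = 0.05           # accelerometer noise, x/y axes [m/s^2]
ACC_NOISE_Z = 0.10            # accelerometer noise, z axis [m/s^2]
OFFSET_RANGE = (-0.01, 0.01)  # per-episode constant offsets


def sample_episode_offsets():
    """Draw the constant motor/sensor biases held fixed for one episode."""
    return {
        "motor": rng.uniform(*OFFSET_RANGE),
        "gyro": rng.uniform(*OFFSET_RANGE, size=3),
        "acc": rng.uniform(*OFFSET_RANGE, size=3),
    }


def corrupt_sensors(gyro, acc, offsets):
    """Add white noise plus the episode's constant offsets to raw readings."""
    gyro = gyro + offsets["gyro"] + rng.normal(0.0, GYRO_NOISE_STD, size=3)
    acc_std = np.array([ACC_NOISE_XY, ACC_NOISE_XY, ACC_NOISE_Z])
    acc = acc + offsets["acc"] + rng.normal(0.0, acc_std)
    return gyro, acc


def sample_push():
    """Random horizontal push applied at random intervals during training."""
    magnitude = rng.uniform(*FORCE_MAG)
    angle = rng.uniform(0.0, 2.0 * np.pi)
    return magnitude * np.array([np.cos(angle), np.sin(angle), 0.0])
```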
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Itahashi, N.; Itoh, H.; Fukumoto, H.; Wakuya, H. Reinforcement Learning of Bipedal Walking Using a Simple Reference Motion. Appl. Sci. 2024, 14, 1803. https://doi.org/10.3390/app14051803