Towards Multi-Objective Object Push-Grasp Policy Based on Maximum Entropy Deep Reinforcement Learning under Sparse Rewards
Abstract
1. Introduction
- (1) We design a maximum entropy deep reinforcement learning grasping method based on an attention mechanism that copes with complex tasks under sparse rewards while avoiding manual hyper-parameter tuning in unstructured grasping environments (a minimal sketch of the entropy-regularized update follows this list).
- (2) We design an experience replay mechanism that reduces the correlation between training samples and combine it with advantage functions to strengthen reasoning and decision-making in complex environments (see the replay-buffer sketch after this list).
- (3) We design object affordance perception based on space-channel attention, allowing the robot to handle a wider variety of complex grasping tasks more flexibly (see the attention-block sketch after this list).
- (4) The proposed method generalizes from simulation to the real world. In cluttered scenes, the grasp success rate for unknown objects reaches up to 100% in single-object settings and 91.6% in multi-object settings.
2. Related Work
3. Preliminaries and Problem Formulation
3.1. Model Description
3.2. Prioritized Experience Replay
3.3. Reward Reshaping
4. Push-Grasp Policy Design
4.1. Affordance Perception
4.2. Maximum Entropy DQN
5. Experiment Analysis
5.1. Experimental Setup
5.2. Training
5.3. Object Grasping Simulation Experiments
5.4. Ablation Experiment
5.5. Physical Experiment
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhang, H.; Lan, X.; Bai, S.; Zhou, X.; Tian, Z.; Zheng, N. ROI-based Robotic Grasp Detection for Object Overlapping Scenes. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 4768–4775. [Google Scholar] [CrossRef]
- Zhou, X.; Lan, X.; Zhang, H.; Tian, Z.; Zhang, Y.; Zheng, N. Fully Convolutional Grasp Detection Network with Oriented Anchor Box. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 7223–7230. [Google Scholar] [CrossRef]
- Chen, T.; Shenoy, A.; Kolinko, A.; Shah, S.; Sun, Y. Multi-Object Grasping—Estimating the Number of Objects in a Robotic Grasp. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 7 September–1 October 2021; pp. 4995–5001. [Google Scholar] [CrossRef]
- Liu, S.; Wang, L.; Vincent Wang, X. Multimodal Data-Driven Robot Control for Human–Robot Collaborative Assembly. ASME J. Manuf. Sci. Eng. 2022, 144, 051012. [Google Scholar] [CrossRef]
- Valencia, D.; Jia, J.; Hayashi, A.; Lecchi, M.; Terezakis, R.; Gee, T.; Liarokapis, M.; MacDonald, B.A.; Williams, H. Comparison of Model-Based and Model-Free Reinforcement Learning for Real-World Dexterous Robotic Manipulation Tasks. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 871–878. [Google Scholar] [CrossRef]
- Yu, K.-T.; Bauza, M.; Fazeli, N.; Rodriguez, A. More than a million ways to be pushed. A high-fidelity experimental dataset of planar pushing. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; pp. 30–37. [Google Scholar] [CrossRef]
- Bauza, M.; Rodriguez, A. A probabilistic data-driven model for planar pushing. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 3008–3015. [Google Scholar] [CrossRef]
- Palleschi, A.; Angelini, F.; Gabellieri, C.; Park, D.W.; Pallottino, L.; Bicchi, A.; Garabini, M. Grasp It Like a Pro 2.0: A Data-Driven Approach Exploiting Basic Shape Decomposition and Human Data for Grasping Unknown Objects. IEEE Trans. Robot. 2023, 39, 4016–4036. [Google Scholar] [CrossRef]
- Lee, M.A.; Zhu, Y.; Srinivasan, K.; Shah, P.; Savarese, S.; Fei-Fei, L.; Garg, A.; Bohg, J. Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 8943–8950. [Google Scholar] [CrossRef]
- Takahashi, K.; Ko, W.; Ummadisingu, A.; Maeda, S.-I. Uncertainty-aware Self-supervised Target-mass Grasping of Granular Foods. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 2620–2626. [Google Scholar] [CrossRef]
- Zeng, A.; Song, S.; Welker, S.; Lee, J.; Rodriguez, A.; Funkhouser, T. Learning Synergies Between Pushing and Grasping with Self-Supervised Deep Reinforcement Learning. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 4238–4245. [Google Scholar] [CrossRef]
- Berscheid, L.; Meißner, P.; Kröger, T. Robot Learning of Shifting Objects for Grasping in Cluttered Environments. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 612–618. [Google Scholar] [CrossRef]
- Liu, H.; Yuan, Y.; Deng, Y.; Guo, X.; Wei, Y.; Lu, K.; Fang, B.; Guo, D. Active Affordance Exploration for Robot Grasping. In Intelligent Robotics and Applications. ICIRA 2019; Yu, H., Liu, J., Liu, L., Ju, Z., Liu, Y., Zhou, D., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; Volume 11744, pp. 426–438. [Google Scholar] [CrossRef]
- Peng, G.; Liao, J.; Guan, S.; Yang, J.; Li, X. A pushing-grasping collaborative method based on deep Q-network algorithm in dual viewpoints. Sci. Rep. 2022, 12, 3927. [Google Scholar] [CrossRef] [PubMed]
- Chen, Y.; Ju, Z.; Yang, C. Combining Reinforcement Learning and Rule-based Method to Manipulate Objects in Clutter. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–6. [Google Scholar] [CrossRef]
- Mohammed, M.Q.; Kwek, L.C.; Chua, S.C.; Aljaloud, A.S.; Al-Dhaqm, A.; Al-Mekhlafi, Z.G.; Mohammed, B.A. Deep Reinforcement Learning-Based Robotic Grasping in Clutter and Occlusion. Sustainability 2021, 13, 13686. [Google Scholar] [CrossRef]
- Lu, N.; Lu, T.; Cai, Y.; Wang, S. Active Pushing for Better Grasping in Dense Clutter with Deep Reinforcement Learning. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; pp. 1657–1663. [Google Scholar] [CrossRef]
- Kiatos, M.; Sarantopoulos, I.; Koutras, L.; Malassiotis, S.; Doulgeri, Z. Learning Push-Grasping in Dense Clutter. IEEE Robot. Autom. Lett. 2022, 7, 8783–8790. [Google Scholar] [CrossRef]
- Lu, N.; Cai, Y.; Lu, T.; Cao, X.; Guo, W.; Wang, S. Picking out the Impurities: Attention-based Push-Grasping in Dense Clutter. Robotica 2023, 41, 470–485. [Google Scholar] [CrossRef]
- Kalashnikov, D.; Irpan, A.; Pastor, P.; Ibarz, J.; Herzog, A.; Jang, E.; Quillen, D.; Holly, E.; Kalakrishnan, M.; Vanhoucke, V.; et al. QT-Opt: Scalable deep reinforcement learning for vision-based robotic manipulation. arXiv 2018, arXiv:1806.10293. [Google Scholar]
- Yuan, W.; Hang, K.; Song, H.; Kragic, D.; Wang, M.Y.; Stork, J.A. Reinforcement Learning in Topology-based Representation for Human Body Movement with Whole Arm Manipulation. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 2153–2160. [Google Scholar] [CrossRef]
- Yu, W.; Tan, J.; Liu, C.K.; Turk, G. Preparing for the unknown: Learning a universal policy with online system identification. arXiv 2017, arXiv:1702.02453. [Google Scholar] [CrossRef]
- Andrychowicz, M.; Baker, B.; Chociej, M.; Józefowicz, R.; McGrew, B.; Pachocki, J.; Petron, A.; Plappert, M.; Powell, G.; Ray, A.; et al. Learning dexterous in-hand manipulation. Int. J. Robot. Res. 2020, 39, 3–20. [Google Scholar] [CrossRef]
- Hossain, D.; Capi, G.; Jindai, M.; Kaneko, S.-I. Pick-place of dynamic objects by robot manipulator based on deep learning and easy user interface teaching systems. Ind. Robot. 2017, 44, 11–20. [Google Scholar] [CrossRef]
- Hossain, D.; Capi, G. Multiobjective evolution for deep learning and its robotic applications. In Proceedings of the 8th International Conference on Information, Intelligence, Systems & Applications (IISA), Larnaca, Cyprus, 27–30 August 2017; pp. 1–6. [Google Scholar] [CrossRef]
- Zhang, T.; Mo, H. Reinforcement learning for robot research: A comprehensive review and open issues. Int. J. Adv. Robot. Syst. 2021, 18. [Google Scholar] [CrossRef]
- Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized experience replay. arXiv 2015, arXiv:1511.05952. [Google Scholar]
- Jin, Y.; Liu, Q.; Shen, L.; Zhu, L. Deep Deterministic Policy Gradient Algorithm Based on Convolutional Block Attention for Autonomous Driving. Symmetry 2021, 13, 1061. [Google Scholar] [CrossRef]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 3–19. [Google Scholar]
- Ni, P.; Zhang, W.; Zhang, H.; Cao, Q. Learning efficient push and grasp policy in a totebox from simulation. Adv. Robot. 2020, 34, 873–887. [Google Scholar] [CrossRef]
- Rohmer, E.; Singh, S.P.N.; Freese, M. V-REP: A versatile and scalable robot simulation framework. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 1321–1326. [Google Scholar] [CrossRef]
- Tang, Z.; Shi, Y.; Xu, X. CSGP: Closed-Loop Safe Grasp Planning via Attention-Based Deep Reinforcement Learning from Demonstrations. IEEE Robot. Autom. Lett. 2023, 8, 3158–3165. [Google Scholar] [CrossRef]
- Mosbach, M.; Behnke, S. Efficient Representations of Object Geometry for Reinforcement Learning of Interactive Grasping Policies. In Proceedings of the 2022 Sixth IEEE International Conference on Robotic Computing (IRC), Rome, Italy, 5–7 December 2022; pp. 156–163. [Google Scholar] [CrossRef]
- Sarantopoulos, I.; Kiatos, M.; Doulgeri, Z.; Malassiotis, S. Split Deep Q-Learning for Robust Object Singulation. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 6225–6231. [Google Scholar] [CrossRef]
| Layer Name | Output Size | Kernel Size/Number | Output Feature Maps |
|---|---|---|---|
| Conv | 112 × 112 | 64 | |
| Pooling | 56 × 56 | 64 | |
| Attention block_1 | 56 × 56 | 256 | |
| Transition layer_1 | 56 × 56 | 128 | |
| | 28 × 28 | 128 | |
| Attention block_2 | 28 × 28 | 512 | |
| Transition layer_2 | 28 × 28 | 256 | |
| | 14 × 14 | 256 | |
| Attention block_3 | 14 × 14 | 1024 | |
| Transition layer_3 | 14 × 14 | 512 | |
| | 7 × 7 | 512 | |
| Attention block_4 | 7 × 7 | 1024 | |
| Module | GS cub (%) | GS cy (%) | GS o (%) | GE cub (per hour) | GE cy (per hour) | GE o (per hour) | GT cub (s) | GT cy (s) | GT o (s) |
|---|---|---|---|---|---|---|---|---|---|
| DenseNet-201 | 78.5 | 75.1 | 68.5 | 800 | 642 | 590 | 4.5 | 5.6 | 6.1 |
| DenseNet-169 | 89.2 | 85.7 | 80.3 | 947 | 734 | 679 | 3.8 | 4.9 | 5.3 |
| DenseNet-121 (ours) | 100 | 100 | 100 | 972 | 782 | 750 | 3.7 | 4.6 | 4.8 |
| | GS (%) | GE (number per hour) | GT (s) |
|---|---|---|---|
| Same structure | 93.1 | 702 ± 3 | 7.9 |
| Different structure | 92.4 | 519 ± 3 | 10.8 |
| Methods | Completion (mean %) | GS (mean %) |
|---|---|---|
| Dual viewpoint [14] | 92 | 83.2 |
| Rule-based method [15] | 90 | 72.8 |
| VPG-only depth [30] | 96 | 74.6 |
| VPG [11] | 90 | 86.9 |
| Ours | 98 | 92.4 |
| Module | 60% | 70% | 80% | 90% |
|---|---|---|---|---|
| DQN (DenseNet-121) | 185 | 525 | - | - |
| ME-DQN-noAF | 269 | 337 | 402 | - |
| ME-DQN-noattention | 213 | 286 | 592 | - |
| ME-DQN (ours) | 287 | 368 | 435 | 711 |
| Methods | Attempts | Average Success Rate (Time per Object) | Workspace-Emptying Success Rate, 10 Objects | Workspace-Emptying Success Rate, 20 Objects | Workspace-Emptying Success Rate, 30 Objects |
|---|---|---|---|---|---|
| UCB [32] | 523 | 82% (15.8 s) | 89% | 83% | 75% |
| 3DCNN [33] | 471 | 87% (12.7 s) | 92.5% | 89.5% | 79% |
| Coordinator [34] | 509 | 85% (17.3 s) | 94.5% | 81% | 79.5% |
| VPG [11] | 497 | 82.9% (10.9 s) | 94.8% | 83.6% | 70.3% |
| Ours | 511 | 91.6% (8.9 s) | 96% | 88% | 87.2% |