Self-Generating Evaluations for Robot’s Autonomy Based on Sensor Input
Abstract
1. Introduction
2. Existing Method on SGE
2.1. Evaluation Indices
2.2. Integration of Evaluation
2.3. Reward Generation
3. Adaptation by Considering the Input Properties
4. Experiments
4.1. Simulation Experiments with Path Learning
4.1.1. Experimental Setup
4.1.2. Experimental Results and Discussion
4.2. Simulation Experiments with Sparse External Rewards
4.2.1. Experimental Setup
4.2.2. Experimental Results and Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Li, Z.; Barenji, A.V.; Jiang, J.; Zhong, R.Y.; Xu, G. A mechanism for scheduling multi robot intelligent warehouse system face with dynamic demand. J. Intell. Manuf. 2020, 31, 469–480. [Google Scholar] [CrossRef]
- Matheson, E.; Minto, R.; Zampieri, E.G.G.; Faccio, M.; Rosati, G. Human–Robot Collaboration in Manufacturing Applications: A Review. Robotics 2019, 8, 100. [Google Scholar] [CrossRef]
- Zhang, Q.; Zhao, W.; Chu, S.; Wang, L.; Fu, J.; Yang, J.; Gao, B. Research progress of nuclear emergency response robot. IOP Conf. Ser. Mater. Sci. Eng. 2018, 452, 042102. [Google Scholar] [CrossRef]
- Li, F.; Hou, S.; Bu, C.; Qu, B. Robots for the urban earthquake environment. Disaster Med. Public Health Prep. 2022, 17, 181. [Google Scholar] [CrossRef] [PubMed]
- He, Z.; Ye, D.; Liu, L.; Di, C.A.; Zhu, D. Advances in materials and devices for mimicking sensory adaptation. Mater. Horiz. 2022, 9, 147–163. [Google Scholar] [CrossRef] [PubMed]
- Graczyk, E.L.; Delhaye, B.P.; Schiefer, M.A.; Bensmaia, S.J.; Tyler, D.J. Sensory adaptation to electrical stimulation of the somatosensory nerves. J. Neural Eng. 2018, 15, 046002. [Google Scholar] [CrossRef] [PubMed]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
- Zhu, H.; Yu, J.; Gupta, A.; Shah, D.; Hartikainen, K.; Singh, A.; Kumar, V.; Levine, S. The Ingredients of Real-World Robotic Reinforcement Learning. In Proceedings of the International Conference on Learning Representations, 2020; arXiv:2004.12570. [Google Scholar]
- Akalin, N.; Loutfi, A. Reinforcement Learning Approaches in Social Robotics. Sensors 2021, 21, 1292. [Google Scholar] [CrossRef] [PubMed]
- Kuhnle, A.; Kaiser, J.-P.; Theiß, F.; Stricker, N.; Lanza, G. Designing an adaptive production control system using reinforcement learning. J. Intell. Manuf. 2021, 32, 855–876. [Google Scholar] [CrossRef]
- Eschmann, J. Reward function design in reinforcement learning. In Reinforcement Learning Algorithms: Analysis and Applications; Springer: Berlin/Heidelberg, Germany, 2021; pp. 25–33. [Google Scholar]
- Everitt, T.; Hutter, M.; Kumar, R.; Krakovna, V. Reward tampering problems and solutions in reinforcement learning: A causal influence diagram perspective. Synthese 2021, 198, 6435–6467. [Google Scholar] [CrossRef]
- Fu, J.; Korattikara, A.; Levine, S.; Guadarrama, S. From language to goals: Inverse reinforcement learning for vision-based instruction following. arXiv 2019, arXiv:1902.07742. [Google Scholar]
- Arora, S.; Doshi, P. A survey of inverse reinforcement learning: Challenges, methods and progress. Artif. Intell. 2021, 297, 103500. [Google Scholar] [CrossRef]
- Chentanez, N.; Barto, A.; Singh, S. Intrinsically motivated reinforcement learning. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2004; Volume 17. [Google Scholar]
- Aubret, A.; Matignon, L.; Hassas, S. A survey on intrinsic motivation in reinforcement learning. arXiv 2019, arXiv:1908.06976. [Google Scholar]
- Colas, C.; Fournier, P.; Chetouani, M.; Sigaud, O.; Oudeyer, P.Y. CURIOUS: Intrinsically motivated modular multi-goal reinforcement learning. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 1331–1340. [Google Scholar]
- Hakim, A.A.B.M.N.; Fukuzawa, K.; Kurashige, K. Proposal of Time-based evaluation for Universal Sensor Evaluation Index in Self-generation of Reward. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020; pp. 1161–1166. [Google Scholar] [CrossRef]
- Ono, Y.; Kurashige, K.; Hakim, A.A.B.M.N.; Kondo, S.; Fukuzawa, K. Proposal of Self-generation of Reward for danger avoidance by disregarding specific situations. In Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI), Online, 5–7 December 2021; pp. 1–6. [Google Scholar] [CrossRef]
- Kurashige, K.; Nikaido, K. Self-Generation of Reward by Moderate-Based Index for Sensor Inputs. J. Robot. Mechatron. 2015, 27, 57–63. [Google Scholar] [CrossRef]
- Watanabe, M.; Narita, M. Brain Reward Circuit and Pain. In Advances in Pain Research: Mechanisms and Modulation of Chronic Pain; Springer: Berlin/Heidelberg, Germany, 2018; pp. 201–210. [Google Scholar]
- Pathak, D.; Agrawal, P.; Efros, A.A.; Darrell, T. Curiosity-Driven Exploration by Self-Supervised Prediction. In Proceedings of the International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; pp. 2778–2787. [Google Scholar] [CrossRef]
- Sugimoto, S. The Effect of Prolonged Lack of Sensory Stimulation upon Human Behavior. Philosophy 1967, 50, 361–374. [Google Scholar]
- Sugimoto, S. Human mental processes under sensory restriction environment. Jpn. J. Soc. Psychol. 1985, 1, 27–34. [Google Scholar]
- Zhong, H.; Wang, J.; Jia, H.; Mu, Y.; Lv, S. Vector field-based support vector regression for building energy consumption prediction. Appl. Energy 2019, 242, 403–414. [Google Scholar] [CrossRef]
- Quan, Q.; Zou, H.; Huang, X.; Lei, J. Research on water temperature prediction based on improved support vector regression. Neural Comput. Appl. 2020, 34, 8501–8510. [Google Scholar] [CrossRef]
| Parameter | Value |
| --- | --- |
| The number of trials | 1000 |
| The number of actions per trial | 200 |
| Learning method | Q-learning |
| Learning rate in Q-learning | 0.3 |
| Discount rate in Q-learning | 0.99 |
| The action selection method | ε-greedy |
| ε in the ε-greedy method | 0.01 |
| Goal reward | 1 |
| Reward for goal failure per trial | |
| Reward for colliding with a wall | |
| Reward per action | |
| The maximum value of the temperature sensor | 100 |
| The minimum value of the temperature sensor | 0 |
| Parameter N | 0.08 |
| Parameter in the evaluation for the time with no input | 250 |
| Parameter in the evaluation for the time with no input | 0.99 |
| Parameter c of the proposed method | 0.001 |
| Parameter of the proposed method | 0.001 |
| Input for every action taken | 100 |
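The learning settings in the table above (Q-learning with learning rate 0.3, discount rate 0.99, and ε-greedy action selection with ε = 0.01) can be sketched as follows. This is a minimal illustration of those hyperparameters only; the state and action spaces of the grid task, and the environment itself, are hypothetical stand-ins, not the paper's actual implementation.

```python
import random

# Hyperparameters taken from the table above.
ALPHA = 0.3     # learning rate in Q-learning
GAMMA = 0.99    # discount rate in Q-learning
EPSILON = 0.01  # epsilon in the epsilon-greedy method

# Hypothetical discrete action set for illustration.
ACTIONS = ["up", "down", "left", "right"]

Q = {}  # Q-table: (state, action) -> estimated value


def q(state, action):
    """Look up a Q-value, defaulting to 0 for unseen pairs."""
    return Q.get((state, action), 0.0)


def select_action(state):
    """Epsilon-greedy: explore with probability EPSILON, else act greedily."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(state, a))


def update(state, action, reward, next_state):
    """One-step Q-learning update toward reward + discounted best next value."""
    best_next = max(q(next_state, a) for a in ACTIONS)
    Q[(state, action)] = q(state, action) + ALPHA * (
        reward + GAMMA * best_next - q(state, action)
    )
```

With an all-zero table, a single update for a reward of 1 (the goal reward in the table) moves the Q-value to ALPHA × 1 = 0.3, which shows how the learning rate scales each correction.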
| Parameter | Value |
| --- | --- |
| The number of trials | 2000 |
| The number of actions per trial | 200 |
| Learning method | Q-learning |
| Learning rate in Q-learning | 0.3 |
| Discount rate in Q-learning | 0.99 |
| The action selection method | ε-greedy |
| ε in the ε-greedy method | 0.01 |
| Reward for continued 200 actions | 1 |
| The reward of inactivity | |
| Reward for colliding with a wall | |
| The maximum value of the temperature sensor | 100 |
| The minimum value of the temperature sensor | 0 |
| Parameter N | 0.08 |
| Parameter in the evaluation for the time with no input | 250 |
| Parameter in the evaluation for the time with no input | 0.99 |
| Parameter c of the proposed method | 0.001 |
| Parameter of the proposed method | 0.001 |
| Input for every action taken | 100 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sakamoto, Y.; Kurashige, K. Self-Generating Evaluations for Robot’s Autonomy Based on Sensor Input. Machines 2023, 11, 892. https://doi.org/10.3390/machines11090892