Transformable Gaussian Reward Function for Socially Aware Navigation Using Deep Reinforcement Learning
Abstract
1. Introduction
2. Related Works
2.1. Integration of Prior Knowledge through Human-Delivered Reward Functions
2.2. Reward Function Analysis for Human Avoidance in Robot Navigation
3. Suggested Reward Function
3.1. Markov Decision Process (MDP)
3.2. Transformable Gaussian Reward Function (TGRF)
3.3. Application of Transformable Gaussian Reward Function (TGRF)
4. Simulation Experiments
4.1. Environment and Navigation Methods
4.1.1. Simulation Environment
4.1.2. Navigation Methods
- DS-RNN: A model that uses a recurrent neural network (RNN) and does not predict human trajectories.
- No pred + HH Attn: An attention-based model without trajectory prediction.
- Const vel + HH Attn: The trajectory predictor assumes that each human continues to move at a constant velocity (see the sketch after this list).
- Truth + HH Attn: The robot is given the ground-truth future human trajectories in place of predictions.
- GST + HH Attn: The robot predicts human trajectories nonlinearly using the GST predictor.
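For clarity, the constant-velocity baseline simply extrapolates each pedestrian's most recent velocity into the future. The following is a minimal sketch of such a predictor; the function name, time step `dt`, and prediction horizon are illustrative assumptions, not the implementation used in the paper.

```python
import numpy as np

def constant_velocity_prediction(positions, dt=0.25, horizon=5):
    """Extrapolate future (x, y) positions assuming constant velocity.

    positions: array-like of shape (T, 2), the last T observed positions
               of one pedestrian.
    Returns an array of shape (horizon, 2) with the predicted positions.
    """
    positions = np.asarray(positions, dtype=float)
    velocity = (positions[-1] - positions[-2]) / dt      # latest velocity estimate
    steps = np.arange(1, horizon + 1).reshape(-1, 1)     # 1, 2, ..., horizon
    return positions[-1] + steps * dt * velocity

# Example: a pedestrian observed at two consecutive time steps
observed = [[0.0, 0.0], [0.1, 0.05]]
print(constant_velocity_prediction(observed))
```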
4.2. Results
4.2.1. Results in Different Navigation Methods
4.2.2. Performance Comparison with Other Reward Functions
5. Real-World Experiments
6. Results and Future Research
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Reward | Navigation Method | Mean (Sigma) of SR | SR (%) | NT (s) | PL (m) | ITR (%) |
|---|---|---|---|---|---|---|
| Without TGRF | DS-RNN | 35.5 (7.697) | 44.0 | 20.48 | 20.58 | 15.45 |
| Without TGRF | No pred + HH Attn | 59.636 (4.848) | 67.0 | 17.49 | 20.30 | 17.22 |
| Without TGRF | Const vel + HH Attn | 65 (8.023) | 81.0 | 17.34 | 21.95 | 6.15 |
| Without TGRF | Truth + HH Attn | 5.545 (1.616) | 5.0 | 21.60 | 23.89 | 14.83 |
| Without TGRF | GST + HH Attn | 77.1 (5.718) | 88.0 | 14.18 | 20.19 | 7.38 |
| With TGRF (Ours) | DS-RNN | 30.5 (4.843) | 40.0 | 27.11 | 25.19 | 12.19 |
| With TGRF (Ours) | No pred + HH Attn | 59.364 (6.692) | 72.0 | 18.17 | 21.92 | 14.16 |
| With TGRF (Ours) | Const vel + HH Attn | 87.909 (3.029) | 92.0 | 16.38 | 22.33 | 5.08 |
| With TGRF (Ours) | Truth + HH Attn | 84.909 (4.776) | 92.0 | 17.13 | 22.52 | 5.30 |
| With TGRF (Ours) | GST + HH Attn | 94.091 (2.843) | 97.0 | 17.63 | 23.81 | 3.92 |
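As a purely illustrative aid for reading the table above (not part of the paper's evaluation code), the short script below computes the change in success rate when the TGRF is added for each navigation method, using the SR (%) values copied from the table.

```python
# SR (%) values copied from the table above.
sr_without_tgrf = {
    "DS-RNN": 44.0,
    "No pred + HH Attn": 67.0,
    "Const vel + HH Attn": 81.0,
    "Truth + HH Attn": 5.0,
    "GST + HH Attn": 88.0,
}
sr_with_tgrf = {
    "DS-RNN": 40.0,
    "No pred + HH Attn": 72.0,
    "Const vel + HH Attn": 92.0,
    "Truth + HH Attn": 92.0,
    "GST + HH Attn": 97.0,
}

for method, baseline in sr_without_tgrf.items():
    delta = sr_with_tgrf[method] - baseline
    print(f"{method}: {delta:+.1f} percentage points")
```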
| Reward | Navigation Method | Mean (Sigma) of SR | SR (%) | NT (s) | PL (m) | ITR (%) |
|---|---|---|---|---|---|---|
| Without TGRF | DS-RNN | 29.8 (5.231) | 36.0 | 23.26 | 27.13 | 13.38 |
| Without TGRF | No pred + HH Attn | 12.091 (8.062) | 28.0 | 26.52 | 34.98 | 12.78 |
| Without TGRF | Const vel + HH Attn | 92.182 (3.588) | 96.0 | 14.74 | 21.49 | 5.24 |
| Without TGRF | GST + HH Attn | 91.636 (2.267) | 95.0 | 13.74 | 20.47 | 5.37 |
| With TGRF (Ours) | DS-RNN | 48.6 (9.013) | 62.0 | 22.48 | 25.26 | 10.14 |
| With TGRF (Ours) | No pred + HH Attn | 77.364 (6.079) | 87.0 | 16.19 | 21.95 | 13.43 |
| With TGRF (Ours) | Const vel + HH Attn | 95.273 (1.911) | 98.0 | 17.00 | 23.55 | 5.39 |
| With TGRF (Ours) | GST + HH Attn | 92.909 (3.579) | 96.0 | 15.37 | 21.91 | 5.81 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kim, J.; Kang, S.; Yang, S.; Kim, B.; Yura, J.; Kim, D. Transformable Gaussian Reward Function for Socially Aware Navigation Using Deep Reinforcement Learning. Sensors 2024, 24, 4540. https://doi.org/10.3390/s24144540