Coverage Path Planning Using Reinforcement Learning-Based TSP for hTetran—A Polyabolo-Inspired Self-Reconfigurable Tiling Robot
Abstract
:1. Introduction
2. The hTetran Platform Description
2.1. The hTetran System Architecture
2.2. Description of hTetran in the Polyabolo-Based Worspace
3. Complete Path Planning by hTetran the Polyabolo-Based Tiling Platform
3.1. Tiling Theory for Polyabolo-Based hTetran
3.2. Optimal Complete Overage Framework
4. Reinforcement Learning Approach for TSP-Based Coverage Path Planning
4.1. Energy Aware RL Reward Function
4.2. Optimization with Reinforcement Learning
5. Experimental Results
5.1. RL Training and Trajectory Generation Results
5.2. Real Environment Testbed
6. Conclusions
- A model for assessing necessity in a dynamic workspace;
- The autonomous tuning for hyperparameters of RL frameworks;
- Multi-target RL;
- Increased autonomy of considerable distance with the robot stage tiled movement;
- Consideration of robot locomotion and environment friction.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
ACO | Ant Colony Optimization |
CPP | Coverage Path Planning |
GA | Genetic Algorithm |
GRU | Gated Recurrent Unit |
LSTM | Long Short-Term Memory |
PPO | Proximal Policy Optimization |
RL | Reinforcement Learning |
RNN | Recurrent Neural Network |
ROS | Robot Operating System |
TSP | Travelling Salesman Problem |
References
- Pandey, A.; Pandey, S.; Parhi, D. Mobile robot navigation and obstacle avoidance techniques: A review. Int. Robot. Autom. J. 2017, 2, 00022. [Google Scholar] [CrossRef] [Green Version]
- Holland, J.M. Designing Autonomous Mobile Robots: Inside the Mind of an Intelligent Machine; Elsevier: Amsterdam, The Netherlands, 2004. [Google Scholar]
- Murata, S.; Kurokawa, H. Self-reconfigurable robots. IEEE Robot. Autom. Mag. 2007, 14, 71–78. [Google Scholar] [CrossRef]
- Zhao, B.; Liu, D. Event-triggered decentralized tracking control of modular reconfigurable robots through adaptive dynamic programming. IEEE Trans. Ind. Electron. 2019, 67, 3054–3064. [Google Scholar] [CrossRef]
- Le, A.V.; Parween, R.; Elara Mohan, R.; Khanh Nhan, N.H.; Enjikalayil, R. Optimization Complete Area Coverage by Reconfigurable hTrihex Tiling Robot. Sensors 2020, 20, 3170. [Google Scholar] [CrossRef]
- Prabakaran, V.; Elara, M.R.; Pathmakumar, T.; Nansai, S. hTetro: A tetris inspired shape shifting floor cleaning robot. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017. [Google Scholar] [CrossRef]
- Le, A.V.; Parween, R.; Kyaw, P.T.; Mohan, R.E.; Minh, T.H.Q.; Borusu, C.S.C.S. Reinforcement Learning-Based Energy-Aware Area Coverage for Reconfigurable hRombo Tiling Robot. IEEE Access 2020, 8, 209750–209761. [Google Scholar] [CrossRef]
- Le, A.V.; Nhan, N.H.K.; Mohan, R.E. Evolutionary Algorithm-Based Complete Coverage Path Planning for Tetriamond Tiling Robots. Sensors 2020, 20, 445. [Google Scholar] [CrossRef] [Green Version]
- Carvalho, R.D.; Vidal, H.; Vieira, P.; Ribeiro, M. Complete coverage path planning and guidance for cleaning robots. In Proceedings of the ISIE 97 Proceeding of the IEEE International Symposium on Industrial Electronics, Guimaraes, Portugal, 7–11 July 1997. [Google Scholar] [CrossRef]
- Kang, J.W.; Kim, S.J.; Chung, M.J.; Myung, H.; Park, J.H.; Bang, S.W. Path planning for complete and efficient coverage operation of mobile robots. In Proceedings of the 2007 International Conference on Mechatronics and Automation, Harbin, China, 5–8 August 2007; pp. 2126–2131. [Google Scholar]
- Le, A.V.; Prabakaran, V.; Sivanantham, V.; Mohan, R.E. Modified a-star algorithm for efficient coverage path planning in tetris inspired self-reconfigurable robot with integrated laser sensor. Sensors 2018, 18, 2585. [Google Scholar] [CrossRef] [Green Version]
- Di Franco, C.; Buttazzo, G. Energy-aware coverage path planning of UAVs. In Proceedings of the 2015 IEEE International Conference on Autonomous Robot Systems and Competitions, Vila Real, Portugal, 8–10 April 2015; pp. 111–117. [Google Scholar]
- Cheng, K.P.; Mohan, R.E.; Nhan, N.H.K.; Le, A.V. Graph Theory-Based Approach to Accomplish Complete Coverage Path Planning Tasks for Reconfigurable Robots. IEEE Access 2019, 7, 94642–94657. [Google Scholar] [CrossRef]
- Le, A.V.; Kyaw, P.T.; Veerajagadheswar, P.; Muthugala, M.V.J.; Elara, M.R.; Kumar, M.; Nhan, N.H.K. Reinforcement learning-based optimal complete water-blasting for autonomous ship hull corrosion cleaning system. Ocean Eng. 2021, 220, 108477. [Google Scholar] [CrossRef]
- Lakshmanan; Mohan, R.E.; Ramalingam, B.; Le, A.V.; Veerajagadeshwar, P.; Tiwari, K.; Ilyas, M. Complete coverage path planning using reinforcement learning for tetromino based cleaning and maintenance robot. Autom. Constr. 2020, 112, 103078. [Google Scholar]
- Lobos-Tsunekawa, K.; Leiva, F.; Ruiz-del Solar, J. Visual navigation for biped humanoid robots using deep reinforcement learning. IEEE Robot. Autom. Lett. 2018, 3, 3247–3254. [Google Scholar] [CrossRef]
- Niroui, F.; Zhang, K.; Kashino, Z.; Nejat, G. Deep Reinforcement Learning Robot for Search and Rescue Applications: Exploration in Unknown Cluttered Environments. IEEE Robot. Autom. Lett. 2019, 4, 610–617. [Google Scholar] [CrossRef]
- Panov, A.I.; Yakovlev, K.S.; Suvorov, R. Grid path planning with deep reinforcement learning: Preliminary results. Procedia Comput. Sci. 2018, 123, 347–353. [Google Scholar] [CrossRef]
- Konar, A.; Chakraborty, I.G.; Singh, S.J.; Jain, L.C.; Nagar, A.K. A deterministic improved Q-learning for path planning of a mobile robot. IEEE Trans. Syst. Man Cybern. Syst. 2013, 43, 1141–1153. [Google Scholar] [CrossRef] [Green Version]
- Low, E.S.; Ong, P.; Cheah, K.C. Solving the optimal path planning of a mobile robot using improved Q-learning. Robot. Auton. Syst. 2019, 115, 143–161. [Google Scholar] [CrossRef]
- Cruz, D.L.; Yu, W. Path planning of multi-agent systems in unknown environment with neural kernel smoothing and reinforcement learning. Neurocomputing 2017, 233, 34–42. [Google Scholar] [CrossRef]
- Yuan, J.; Wang, H.; Lin, C.; Liu, D.; Yu, D. A Novel GRU-RNN Network Model for Dynamic Path Planning of Mobile Robot. IEEE Access 2019, 7, 15140–15151. [Google Scholar] [CrossRef]
- Acar, E.U.; Choset, H.; Rizzi, A.A.; Atkar, P.N.; Hull, D. Morse decompositions for coverage tasks. Int. J. Robot. Res. 2002, 21, 331–344. [Google Scholar] [CrossRef]
- Galceran, E.; Campos, R.; Palomeras, N.; Carreras, M.; Ridao, P. Coverage path planning with realtime replanning for inspection of 3d underwater structures. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 6586–6591. [Google Scholar]
- Kim, D.; Lee, D.; Myung, H.; Choi, H.T. Artificial landmark-based underwater localization for AUVs using weighted template matching. Intell. Serv. Robot. 2014, 7, 175–184. [Google Scholar] [CrossRef]
- Choset, H. Coverage for robotics—A survey of recent results. Ann. Math. Artif. Intell. 2001, 31, 113–126. [Google Scholar] [CrossRef]
- Ghaddar, A.; Merei, A.; Natalizio, E. PPS: Energy-Aware Grid-Based Coverage Path Planning for UAVs Using Area Partitioning in the Presence of NFZs. Sensors 2020, 20, 3742. [Google Scholar] [CrossRef]
- Manimuthu, A.; Le, A.V.; Mohan, R.E.; Veerajagadeshwar, P.; Huu Khanh Nhan, N.; Ping Cheng, K. Energy Consumption Estimation Model for Complete Coverage of a Tetromino Inspired Reconfigurable Surface Tiling Robot. Energies 2019, 12, 2257. [Google Scholar] [CrossRef] [Green Version]
- Yang, S.X.; Luo, C. A neural network approach to complete coverage path planning. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2004, 34, 718–724. [Google Scholar] [CrossRef]
- Kyaw, P.T.; Paing, A.; Thu, T.T.; Mohan, R.E.; Le, A.V.; Veerajagadheswar, P. Coverage Path Planning for Decomposition Reconfigurable Grid-Maps Using Deep Reinforcement Learning based Travelling Salesman Problem. IEEE Access 2020, 8, 225945. [Google Scholar] [CrossRef]
- Le, A.V.; Arunmozhi, M.; Veerajagadheswar, P.; Ku, P.C.; Minh, T.H.Q.; Sivanantham, V.; Mohan, R.E. Complete path planning for a tetris-inspired self-reconfigurable robot by the genetic algorithm of the traveling salesman problem. Electronics 2018, 7, 344. [Google Scholar] [CrossRef] [Green Version]
- A Polyomino Tiling Algorithm. 2018. Available online: https://gfredericks.com/gfrlog/99 (accessed on 15 July 2020).
- Bello, I.; Pham, H.; Le, Q.V.; Norouzi, M.; Bengio, S. Neural combinatorial optimization with reinforcement learning. arXiv 2016, arXiv:1611.09940. [Google Scholar]
- Konda, V.R.; Tsitsiklis, J.N. Actor-critic algorithms. Adv. Neural Inf. Process. Syst. 2000, 42, 1008–1014. [Google Scholar]
- Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer networks. Adv. Neural Inf. Process. Syst. 2015, 28, 2692–2700. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Le, A.V.; Ku, P.C.; Than Tun, T.; Huu Khanh Nhan, N.; Shi, Y.; Mohan, R.E. Realization Energy Optimization of Complete Path Planning in Differential Drive Based Self-Reconfigurable Floor Cleaning Robot. Energies 2019, 12, 1136. [Google Scholar] [CrossRef] [Green Version]
- Shi, Y.; Elara, M.R.; Le, A.V.; Prabakaran, V.; Wood, K.L. Path tracking control of self-reconfigurable robot hTetro with four differential drive units. IEEE Robot. Autom. Lett. 2020, 5, 3998–4005. [Google Scholar] [CrossRef] [Green Version]
Wd | Rectangle | Triangle | Parallelogram | Curve | Square | |
---|---|---|---|---|---|---|
Ws | ||||||
Rectangle | 0 0 0 0 | 0 0 0 | 0 0 | 0 | 0 0 − (, −) | |
Triangle | 0 0 0 | 0 0 0 0 | 0 0 0 | 0 0 | 0 | |
Parallelogram | 0 0 | 0 0 0 | 0 0 0 0 | (, ) 0 | (, ) 0 0 | |
Curve | 0 | 0 0 | (, ) 0 | 0 0 0 0 | 0 0 0 | |
Square | 0 0 (, ) | 0 | (, ) 0 0 | 0 0 0 | 0 0 0 0 |
Wd | Rectangle | Triangle | Parallelogram | Curve | Square | |
---|---|---|---|---|---|---|
Ws | ||||||
Rectangle | 0 0 0 0 | 0 0 0 | 0 0 | 0 | 0 0 (, ) | |
Triangle | 0 0 0 | 0 0 0 0 | 0 0 0 | 0 0 | 0 | |
Parallelogram | 0 0 | 0 0 0 | 0 0 0 0 | (, ) 0 | (, ) 0 0 | |
Curve | 0 | 0 0 | (, ) 0 | 0 0 0 0 | 0 0 0 | |
Square | 0 0 (, ) | 0 | (, ) 0 0 | 0 0 0 | 0 0 0 0 |
Approach | 2D Distance (m) | Total Cost Weight (Nm) | Running Time (second) |
---|---|---|---|
Zigzag | 51.43 | 382.26 | 0.05 |
Spiral | 50.91 | 384.32 | 0.06 |
ACO | 49.42 | 322.15 | 6.21 |
RL | 49.09 | 315.36 | 2.16 |
Method | Costweight | Summation | Translation | Transformation | Orientation | Travel |
---|---|---|---|---|---|---|
- | (Nm) | Energy (J) | Energy (J) | Energy (J) | Energy (J) | Time (second) |
Zigzag | 382.26 | 63.26 | 32.39 | 19.52 | 11.35 | 1683 |
Spiral | 384.32 | 60.26 | 30.32 | 19.11 | 10.83 | 1679 |
ACO | 322.15 | 53.59 | 25.51 | 17.95 | 10.13 | 1244 |
RL | 315.36 | 51.15 | 26.24 | 15.56 | 9.35 | 1212 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Le, A.V.; Veerajagadheswar, P.; Thiha Kyaw, P.; Elara, M.R.; Nhan, N.H.K. Coverage Path Planning Using Reinforcement Learning-Based TSP for hTetran—A Polyabolo-Inspired Self-Reconfigurable Tiling Robot. Sensors 2021, 21, 2577. https://doi.org/10.3390/s21082577
Le AV, Veerajagadheswar P, Thiha Kyaw P, Elara MR, Nhan NHK. Coverage Path Planning Using Reinforcement Learning-Based TSP for hTetran—A Polyabolo-Inspired Self-Reconfigurable Tiling Robot. Sensors. 2021; 21(8):2577. https://doi.org/10.3390/s21082577
Chicago/Turabian StyleLe, Anh Vu, Prabakaran Veerajagadheswar, Phone Thiha Kyaw, Mohan Rajesh Elara, and Nguyen Huu Khanh Nhan. 2021. "Coverage Path Planning Using Reinforcement Learning-Based TSP for hTetran—A Polyabolo-Inspired Self-Reconfigurable Tiling Robot" Sensors 21, no. 8: 2577. https://doi.org/10.3390/s21082577
APA StyleLe, A. V., Veerajagadheswar, P., Thiha Kyaw, P., Elara, M. R., & Nhan, N. H. K. (2021). Coverage Path Planning Using Reinforcement Learning-Based TSP for hTetran—A Polyabolo-Inspired Self-Reconfigurable Tiling Robot. Sensors, 21(8), 2577. https://doi.org/10.3390/s21082577