Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications: A Survey
1. Introduction
2. Background
2.1. MDP and Q-Learning
2.2. Multi-Agent Q-Learning
2.3. (Multi-Agent) Deep Q-Learning
2.3.1. Q-Networks
2.3.2. Policy Optimization Techniques
2.3.3. Extensions to Multi-Agent
3. Multi-Robot System Applications of Multi-Agent Deep Reinforcement Learning
3.1. Coverage and Exploration
3.2. Path Planning and Navigation
3.3. Swarm Behavior Modeling
3.4. Pursuit-Evasion
3.5. Information Collection
3.6. Task Allocation
3.7. Object Transportation
3.8. Collective Construction
4. Challenges and Discussion
- VMAS: The Vectorized Multi-Agent Simulator for Collective Robot Learning (VMAS) is an open-source simulator for benchmarking multi-robot applications [258]. Its built-in scenarios include swarm behaviors, such as flocking and dispersion, as well as object transportation and multi-robot football. Note that VMAS is a 2D physics simulator powered by PyTorch [259].
- MultiRoboLearn: Like VMAS, this is an open-source framework for multi-robot deep reinforcement learning applications [260]. The authors aim to unify simulation and real-world experiments with multiple robots, which they accomplish by integrating ROS into the simulator. The reported tests cover mostly multi-robot navigation scenarios. It would be interesting to extend this software to other multi-robot applications, especially those in which the robots might be static.
- MARLlib: Although not built strictly for robots, Multi-Agent RLlib (MARLlib) [261] is a multi-agent DRL software framework built upon Ray [262] and its toolkit RLlib [263]. This rich open-source software follows OpenAI Gym standards and supports not only cooperative tasks but also competitive multi-robot applications. MARLlib currently supports ten environments, among which the grid world environment might be the most relevant to multi-robot researchers. Many algorithms, including ones that are highly popular among roboticists such as DDPG, PPO, and TRPO, are available as baselines. The authors also show that this software is much more versatile than some existing alternatives, including [264,265].
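To make the Gym-style interface mentioned for MARLlib more concrete, the following is a minimal sketch of a cooperative multi-robot grid world with the classic `reset()`/`step()` pattern. It is purely illustrative: `GridCoverageEnv`, `MOVES`, and `run_random_episode` are hypothetical names invented for this example and are not part of MARLlib, VMAS, MultiRoboLearn, or Gym, and the shared coverage reward is an assumed design, not one taken from the cited tools.

```python
"""Toy Gym-style multi-agent grid world (hypothetical; not the MARLlib or VMAS API)."""
import random
from typing import Dict, List, Tuple

# Action index -> (dx, dy); action 0 keeps the robot in place.
MOVES: List[Tuple[int, int]] = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]


class GridCoverageEnv:
    """N robots share one team reward for visiting new cells of a size x size grid."""

    def __init__(self, size: int = 4, n_agents: int = 2, max_steps: int = 50, seed: int = 0):
        self.size, self.n_agents, self.max_steps = size, n_agents, max_steps
        self.rng = random.Random(seed)
        self.agents = [f"robot_{i}" for i in range(n_agents)]

    def reset(self) -> Dict[str, Tuple[int, int]]:
        # Place every robot on a random cell and mark those cells as covered.
        self.t = 0
        self.pos = {a: (self.rng.randrange(self.size), self.rng.randrange(self.size))
                    for a in self.agents}
        self.covered = set(self.pos.values())
        return dict(self.pos)

    def step(self, actions: Dict[str, int]):
        # Apply each robot's move, clipping at the grid boundary.
        newly_covered = 0
        for agent, action in actions.items():
            dx, dy = MOVES[action]
            x, y = self.pos[agent]
            nx = min(max(x + dx, 0), self.size - 1)
            ny = min(max(y + dy, 0), self.size - 1)
            self.pos[agent] = (nx, ny)
            if (nx, ny) not in self.covered:
                self.covered.add((nx, ny))
                newly_covered += 1
        self.t += 1
        # Cooperative setting: every robot receives the same team reward.
        rewards = {a: float(newly_covered) for a in self.agents}
        done = len(self.covered) == self.size ** 2 or self.t >= self.max_steps
        info = {"coverage": len(self.covered) / self.size ** 2}
        return dict(self.pos), rewards, done, info


def run_random_episode(env: GridCoverageEnv, seed: int = 1):
    """Roll out one episode with uniformly random actions (a stand-in for a learned policy)."""
    policy_rng = random.Random(seed)
    env.reset()
    total, done, info = 0.0, False, {"coverage": 0.0}
    while not done:
        actions = {a: policy_rng.randrange(len(MOVES)) for a in env.agents}
        _, rewards, done, info = env.step(actions)
        total += rewards[env.agents[0]]
    return total, info["coverage"]


if __name__ == "__main__":
    print(run_random_episode(GridCoverageEnv()))
```

Keying observations, actions, and rewards by agent name is one common convention for multi-agent environments. A competitive variant would return different rewards per agent instead of the shared team reward assumed here.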
5. Conclusions
Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Arai, T.; Pagello, E.; Parker, L.E. Advances in multi-robot systems. IEEE Trans. Robot. Autom. 2002, 18, 655–661.
- Gautam, A.; Mohan, S. A review of research in multi-robot systems. In Proceedings of the 7th IEEE International Conference on Industrial and Information Systems (ICIIS), Chennai, India, 6–9 August 2012; pp. 1–5.
- Rizk, Y.; Awad, M.; Tunstel, E.W. Cooperative heterogeneous multi-robot systems: A survey. ACM Comput. Surv. (CSUR) 2019, 52, 1–31.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
- Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Fawzi, A.; Balog, M.; Huang, A.; Hubert, T.; Romera-Paredes, B.; Barekatain, M.; Novikov, A.; Ruiz, F.J.R.; Schrittwieser, J.; Swirszcz, G.; et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature 2022, 610, 47–53.
- Popova, M.; Isayev, O.; Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 2018, 4, eaap7885.
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489.
- Mammeri, Z. Reinforcement learning based routing in networks: Review and classification of approaches. IEEE Access 2019, 7, 55916–55950.
- Panov, A.I.; Yakovlev, K.S.; Suvorov, R. Grid path planning with deep reinforcement learning: Preliminary results. Procedia Comput. Sci. 2018, 123, 347–353.
- Theile, M.; Bayerlein, H.; Nai, R.; Gesbert, D.; Caccamo, M. UAV coverage path planning under varying power constraints using deep reinforcement learning. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 1444–1449.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870.
- Nguyen, H.; La, H. Review of deep reinforcement learning for robot manipulation. In Proceedings of the 3rd IEEE International Conference on Robotic Computing (IRC), Naples, Italy, 25–27 February 2019; pp. 590–595.
- Busoniu, L.; Babuska, R.; De Schutter, B. A comprehensive survey of multiagent reinforcement learning. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2008, 38, 156–172.
- Yang, E.; Gu, D. Multiagent Reinforcement Learning for Multi-Robot Systems: A Survey; Technical Report of the Department of Computer Science; 2004.
- Dutta, A.; Roy, S.; Kreidl, O.P.; Bölöni, L. Multi-Robot Information Gathering for Precision Agriculture: Current State, Scope, and Challenges. IEEE Access 2021, 9, 161416–161430.
- Zhou, Z.; Liu, J.; Yu, J. A survey of underwater multi-robot systems. IEEE/CAA J. Autom. Sin. 2021, 9, 1–18.
- Queralta, J.P.; Taipalmaa, J.; Pullinen, B.C.; Sarker, V.K.; Gia, T.N.; Tenhunen, H.; Gabbouj, M.; Raitoharju, J.; Westerlund, T. Collaborative multi-robot search and rescue: Planning, coordination, perception, and active vision. IEEE Access 2020, 8, 191617–191643.
- Yliniemi, L.; Agogino, A.K.; Tumer, K. Multirobot coordination for space exploration. AI Mag. 2014, 35, 61–74.
- Long, P.; Fan, T.; Liao, X.; Liu, W.; Zhang, H.; Pan, J. Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 6252–6259.
- Wang, X.; Xuan, S.; Ke, L. Cooperatively pursuing a target unmanned aerial vehicle by multiple unmanned aerial vehicles based on multiagent reinforcement learning. Adv. Control Appl. Eng. Ind. Syst. 2020, 2, e27.
- Pham, H.X.; La, H.M.; Feil-Seifer, D.; Nefian, A. Cooperative and distributed reinforcement learning of drones for field coverage. arXiv 2018, arXiv:1803.07250.
- Sutton, R.S.; Precup, D.; Singh, S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artif. Intell. 1999, 112, 181–211.
- Kober, J.; Bagnell, J.A.; Peters, J. Reinforcement learning in robotics: A survey. Int. J. Robot. Res. 2013, 32, 1238–1274.
- Bloembergen, D.; Tuyls, K.; Hennes, D.; Kaisers, M. Evolutionary dynamics of multi-agent learning: A survey. J. Artif. Intell. Res. 2015, 53, 659–697.
- Littman, M.L. Markov games as a framework for multi-agent reinforcement learning. In Machine Learning Proceedings 1994; Elsevier: Amsterdam, The Netherlands, 1994; pp. 157–163.
- Bowling, M.; Veloso, M. Multiagent learning using a variable learning rate. Artif. Intell. 2002, 136, 215–250.
- Kaisers, M.; Tuyls, K. Frequency adjusted multi-agent Q-learning. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, Toronto, ON, Canada, 10–14 May 2010; Volume 1, pp. 309–316.
- Dutta, A.; Dasgupta, P.; Nelson, C. Adaptive locomotion learning in modular self-reconfigurable robots: A game theoretic approach. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 3556–3561.
- Matignon, L.; Laurent, G.J.; Le Fort-Piat, N. Hysteretic q-learning: An algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, CA, USA, 29 October–2 November 2007; pp. 64–69.
- Dutta, A.; Dasgupta, P.; Nelson, C. Distributed adaptive locomotion learning in modred modular self-reconfigurable robot. In Distributed Autonomous Robotic Systems; Springer: Berlin/Heidelberg, Germany, 2018; pp. 345–357.
- Sadhu, A.K.; Konar, A. Improving the speed of convergence of multi-agent Q-learning for cooperative task-planning by a robot-team. Robot. Auton. Syst. 2017, 92, 66–80.
- Hu, J.; Wellman, M.P. Nash Q-learning for general-sum stochastic games. J. Mach. Learn. Res. 2003, 4, 1039–1069.
- Buşoniu, L.; Babuška, R.; Schutter, B.D. Multi-agent reinforcement learning: An overview. In Innovations in Multi-Agent Systems and Applications—1; Springer: Berlin/Heidelberg, Germany, 2010; pp. 183–221.
- Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
- Moon, J.; Papaioannou, S.; Laoudias, C.; Kolios, P.; Kim, S. Deep reinforcement learning multi-UAV trajectory control for target tracking. IEEE Internet Things J. 2021, 8, 15441–15455.
- Wang, D.; Deng, H. Multirobot coordination with deep reinforcement learning in complex environments. Expert Syst. Appl. 2021, 180, 115128.
- Yu, C.; Dong, Y.; Li, Y.; Chen, Y. Distributed multi-agent deep reinforcement learning for cooperative multi-robot pursuit. J. Eng. 2020, 2020, 499–504.
- Zellner, A.; Dutta, A.; Kulbaka, I.; Sharma, G. Deep Recurrent Q-learning for Energy-constrained Coverage with a Mobile Robot. arXiv 2022, arXiv:2210.00327.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust region policy optimization. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 1889–1897.
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA, 20–22 June 2016; pp. 1928–1937.
- Li, B.; Li, S.; Wang, C.; Fan, R.; Shao, J.; Xie, G. Distributed Circle Formation Control for Quadrotors Based on Multi-agent Deep Reinforcement Learning. In Proceedings of the 2021 IEEE China Automation Congress (CAC), Beijing, China, 22–24 October 2021; pp. 4750–4755.
- Xu, Z.; Lyu, Y.; Pan, Q.; Hu, J.; Zhao, C.; Liu, S. Multi-vehicle flocking control with deep deterministic policy gradient method. In Proceedings of the 14th IEEE International Conference on Control and Automation (ICCA), Anchorage, AK, USA, 12–15 June 2018; pp. 306–311.
- Bezcioglu, M.B.; Lennox, B.; Arvin, F. Self-Organised Swarm Flocking with Deep Reinforcement Learning. In Proceedings of the 7th IEEE International Conference on Automation, Robotics and Applications (ICARA), Prague, Czech Republic, 4–6 February 2021; pp. 226–230.
- Na, S.; Niu, H.; Lennox, B.; Arvin, F. Bio-Inspired Collision Avoidance in Swarm Systems via Deep Reinforcement Learning. IEEE Trans. Veh. Technol. 2022, 71, 2511–2526.
- Gu, S.; Holly, E.; Lillicrap, T.; Levine, S. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 3389–3396.
- Li, Y. Deep reinforcement learning: An overview. arXiv 2017, arXiv:1701.07274.
- Agrawal, A.; Won, S.J.; Sharma, T.; Deshpande, M.; McComb, C. A multi-agent reinforcement learning framework for intelligent manufacturing with autonomous mobile robots. Proc. Des. Soc. 2021, 1, 161–170.
- Bromo, C. Reinforcement Learning Based Strategic Exploration Algorithm for UAVs Fleets. Ph.D. Thesis, Politecnico di Torino, Turin, Italy, 2022.
- Han, R.; Chen, S.; Wang, S.; Zhang, Z.; Gao, R.; Hao, Q.; Pan, J. Reinforcement Learned Distributed Multi-Robot Navigation With Reciprocal Velocity Obstacle Shaped Rewards. IEEE Robot. Autom. Lett. 2022, 7, 5896–5903.
- Na, S.; Niu, H.; Lennox, B.; Arvin, F. Universal artificial pheromone framework with deep reinforcement learning for robotic systems. In Proceedings of the 6th IEEE International Conference on Control and Robotics Engineering (ICCRE), Beijing, China, 16–18 April 2021; pp. 28–32.
- Thumiger, N.; Deghat, M. A Multi-Agent Deep Reinforcement Learning Approach for Practical Decentralized UAV Collision Avoidance. IEEE Control Syst. Lett. 2021, 6, 2174–2179.
- Wang, G.; Liu, Z.; Xiao, K.; Xu, Y.; Yang, L.; Wang, X. Collision Detection and Avoidance for Multi-UAV based on Deep Reinforcement Learning. In Proceedings of the 40th IEEE Chinese Control Conference (CCC), Shanghai, China, 26–28 July 2021; pp. 7783–7789.
- Tampuu, A.; Matiisen, T.; Kodelja, D.; Kuzovkin, I.; Korjus, K.; Aru, J.; Aru, J.; Vicente, R. Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE 2017, 12, e0172395.
- Egorov, M. Multi-agent deep reinforcement learning. In CS231n: Convolutional Neural Networks for Visual Recognition; 2016; pp. 1–8.
- Yang, Y.; Luo, R.; Li, M.; Zhou, M.; Zhang, W.; Wang, J. Mean field multi-agent reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 5571–5580.
- Lowe, R.; Wu, Y.I.; Tamar, A.; Harb, J.; Pieter Abbeel, O.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. Adv. Neural Inf. Process. Syst. 2017, 30, 6382–6393.
- Yu, C.; Velu, A.; Vinitsky, E.; Wang, Y.; Bayen, A.; Wu, Y. The surprising effectiveness of ppo in cooperative, multi-agent games. arXiv 2021, arXiv:2103.01955.
- Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-decomposition networks for cooperative multi-agent learning. arXiv 2017, arXiv:1706.05296.
- Du, W.; Ding, S. A survey on multi-agent deep reinforcement learning: From the perspective of challenges and applications. Artif. Intell. Rev. 2021, 54, 3215–3238.
- Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE Trans. Cybern. 2020, 50, 3826–3839.
- OroojlooyJadid, A.; Hajinezhad, D. A review of cooperative multi-agent deep reinforcement learning. arXiv 2019, arXiv:1908.03963.
- Wei, Y.; Zheng, R. Multi-Robot Path Planning for Mobile Sensing through Deep Reinforcement Learning. In Proceedings of the INFOCOM 2021-IEEE Conference on Computer Communications, Vancouver, BC, Canada, 10–13 May 2021; pp. 1–10.
- Mou, Z.; Zhang, Y.; Gao, F.; Wang, H.; Zhang, T.; Han, Z. Deep reinforcement learning based three-dimensional area coverage with UAV swarm. IEEE J. Sel. Areas Commun. 2021, 39, 3160–3176.
- Li, W.; Zhao, T.; Dian, S. Multirobot Coverage Path Planning Based on Deep Q-Network in Unknown Environment. J. Robot. 2022, 2022, 6825902.
- Kakish, Z.; Elamvazhuthi, K.; Berman, S. Using reinforcement learning to herd a robotic swarm to a target distribution. In Proceedings of the International Symposium Distributed Autonomous Robotic Systems; Springer: Berlin/Heidelberg, Germany, 2021; pp. 401–414.
- Yang, Y.; Juntao, L.; Lingling, P. Multi-robot path planning based on a deep reinforcement learning DQN algorithm. CAAI Trans. Intell. Technol. 2020, 5, 177–183.
- Bae, H.; Kim, G.; Kim, J.; Qian, D.; Lee, S. Multi-robot path planning method using reinforcement learning. Appl. Sci. 2019, 9, 3057.
- Zhang, L.; Sun, Y.; Barth, A.; Ma, O. Decentralized control of multi-robot system in cooperative object transportation using deep reinforcement learning. IEEE Access 2020, 8, 184109–184119.
- Marchesini, E.; Farinelli, A. Enhancing deep reinforcement learning approaches for multi-robot navigation via single-robot evolutionary policy search. In Proceedings of the 2022 IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 5525–5531.
- Marchesini, E.; Farinelli, A. Centralizing state-values in dueling networks for multi-robot reinforcement learning mapless navigation. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 4583–4588.
- Zhang, H.; Li, D.; He, Y. Multi-robot cooperation strategy in game environment using deep reinforcement learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), Kuala Lumpur, Malaysia, 12–15 December 2018; pp. 886–891.
- Manko, S.V.; Diane, S.A.; Krivoshatskiy, A.E.; Margolin, I.D.; Slepynina, E.A. Adaptive control of a multi-robot system for transportation of large-sized objects based on reinforcement learning. In Proceedings of the 2018 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus), Moscow and St. Petersburg, Russia, 29 January–1 February 2018; pp. 923–927.
- Yasuda, T.; Ohkura, K. Collective behavior acquisition of real robotic swarms using deep reinforcement learning. In Proceedings of the 2nd IEEE International Conference on Robotic Computing (IRC), Laguna Hills, CA, USA, 31 January–2 February 2018; pp. 179–180.
- Eoh, G.; Park, T.H. Cooperative object transportation using curriculum-based deep reinforcement learning. Sensors 2021, 21, 4780.
- Huang, W.; Wang, Y.; Yi, X. Deep q-learning to preserve connectivity in multi-robot systems. In Proceedings of the 9th International Conference on Signal Processing Systems, ICSPS 2017, Auckland, New Zealand, 27–30 November 2017; pp. 45–50.
- Gupta, J.K.; Egorov, M.; Kochenderfer, M. Cooperative multi-agent control using deep reinforcement learning. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems; Springer: Berlin/Heidelberg, Germany, 2017; pp. 66–83.
- Wang, Z.; Gombolay, M. Learning scheduling policies for multi-robot coordination with graph attention networks. IEEE Robot. Autom. Lett. 2020, 5, 4509–4516.
- Yan, C.; Wang, C.; Xiang, X.; Lan, Z.; Jiang, Y. Deep reinforcement learning of collision-free flocking policies for multiple fixed-wing uavs using local situation maps. IEEE Trans. Ind. Inform. 2021, 18, 1260–1270.
- Liu, Y.; Peng, Y.; Wang, M.; Xie, J.; Zhou, R. Multi-usv system cooperative underwater target search based on reinforcement learning and probability map. Math. Probl. Eng. 2020, 2020, 7842768.
- Viseras, A.; Meissner, M.; Marchal, J. Wildfire front monitoring with multiple uavs using deep q-learning. IEEE Access 2021.
- Goyal, A. Multi-Agent Deep Reinforcement Learning for Robocup Rescue Simulator. Ph.D. Thesis, The University of Texas, Austin, TX, USA, 2020.
- Chen, L.; Wang, Y.; Mo, Y.; Miao, Z.; Wang, H.; Feng, M.; Wang, S. Multi-Agent Path Finding Using Deep Reinforcement Learning Coupled With Hot Supervision Contrastive Loss. IEEE Trans. Ind. Electron. 2022, 70, 7032–7040.
- Jestel, C.; Surmann, H.; Stenzel, J.; Urbann, O.; Brehler, M. Obtaining Robust Control and Navigation Policies for Multi-robot Navigation via Deep Reinforcement Learning. In Proceedings of the 7th IEEE International Conference on Automation, Robotics and Applications (ICARA), Prague, Czech Republic, 4–6 February 2021; pp. 48–54.
- Gautier, P.; Laurent, J.; Diguet, J.P. Deep Q-Learning-Based Dynamic Management of a Robotic Cluster. IEEE Trans. Autom. Sci. Eng. 2022, 1–13.
- Song, C.; He, Z.; Dong, L. A Local-and-Global Attention Reinforcement Learning Algorithm for Multiagent Cooperative Navigation. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–11.
- Ding, S.; Aoyama, H.; Lin, D. Combining Multiagent Reinforcement Learning and Search Method for Drone Delivery on a Non-grid Graph. In Proceedings of the International Conference on Practical Applications of Agents and Multi-Agent Systems; Springer: Berlin/Heidelberg, Germany, 2022; pp. 112–126.
- Choi, H.B.; Kim, J.B.; Ji, C.H.; Ihsan, U.; Han, Y.H.; Oh, S.W.; Kim, K.H.; Pyo, C.S. MARL-based Optimal Route Control in Multi-AGV Warehouses. In Proceedings of the 2022 IEEE International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Jeju Island, Republic of Korea, 21–24 February 2022; pp. 333–338.
- Johnson, D.; Chen, G.; Lu, Y. Multi-Agent Reinforcement Learning for Real-Time Dynamic Production Scheduling in a Robot Assembly Cell. IEEE Robot. Autom. Lett. 2022, 7, 7684–7691.
- Chen, L.; Zhao, Y.; Zhao, H.; Zheng, B. Non-communication decentralized multi-robot collision avoidance in grid map workspace with double deep Q-network. Sensors 2021, 21, 841.
- Miyashita, Y.; Sugawara, T. Analysis of coordinated behavior structures with multi-agent deep reinforcement learning. Appl. Intell. 2021, 51, 1069–1085.
- Caccavale, R.; Calà, V.; Ermini, M.; Finzi, A.; Lippiello, V.; Tavano, F. Multi-robot Sanitization of Railway Stations Based on Deep Q-Learning. In Proceedings of the 8th Italian Workshop on AI and Robotics (AIRO), Online, 30 November 2021.
- Chen, W.; Zhou, S.; Pan, Z.; Zheng, H.; Liu, Y. Mapless collaborative navigation for a multi-robot system based on the deep reinforcement learning. Appl. Sci. 2019, 9, 4198.
- Ma, J.; Lu, H.; Xiao, J.; Zeng, Z.; Zheng, Z. Multi-robot target encirclement control with collision avoidance via deep reinforcement learning. J. Intell. Robot. Syst. 2020, 99, 371–386.
- Kheawkhem, P.; Khuankrue, I. Study on Deep Reinforcement Learning for Mobile Robots Flocking Control in Certainty Situations. In Proceedings of the 19th IEEE International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Prachuap Khiri Khan, Thailand, 24–27 May 2022; pp. 1–4.
- Qiu, Y.; Zhan, Y.; Jin, Y.; Wang, J.; Zhang, X. Sample-Efficient Multi-Agent Reinforcement Learning with Demonstrations for Flocking Control. arXiv 2022, arXiv:2209.08351.
- Setyawan, G.E.; Hartono, P.; Sawada, H. Cooperative Multi-Robot Hierarchical Reinforcement Learning. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 35–44.
- Meng, S.; Kan, Z. Deep reinforcement learning-based effective coverage control with connectivity constraints. IEEE Control Syst. Lett. 2021, 6, 283–288.
- Hamed, O.; Hamlich, M. Hybrid Formation Control for Multi-Robot Hunters Based on Multi-Agent Deep Deterministic Policy Gradient. Mendel 2021, 27, 23–29.
- Liu, C.H.; Chen, Z.; Tang, J.; Xu, J.; Piao, C. Energy-efficient UAV control for effective and fair communication coverage: A deep reinforcement learning approach. IEEE J. Sel. Areas Commun. 2018, 36, 2059–2070.
- Kouzehgar, M.; Meghjani, M.; Bouffanais, R. Multi-agent reinforcement learning for dynamic ocean monitoring by a swarm of buoys. In Proceedings of the Global Oceans 2020: Singapore–US Gulf Coast, IEEE, Biloxi, MS, USA, 5–30 October 2020; pp. 1–8.
- Salimi, M.; Pasquier, P. Deep Reinforcement Learning for Flocking Control of UAVs in Complex Environments. In Proceedings of the 6th IEEE International Conference on Robotics and Automation Engineering (ICRAE), Guangzhou, China, 19–22 November 2021; pp. 344–352.
- Fan, T.; Long, P.; Liu, W.; Pan, J. Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios. Int. J. Robot. Res. 2020, 39, 856–892.
- Zhao, W.; Queralta, J.P.; Qingqing, L.; Westerlund, T. Towards closing the sim-to-real gap in collaborative multi-robot deep reinforcement learning. In Proceedings of the 5th IEEE International Conference on Robotics and Automation Engineering (ICRAE), Singapore, 20–22 November 2020; pp. 7–12.
- Lin, J.; Yang, X.; Zheng, P.; Cheng, H. End-to-end decentralized multi-robot navigation in unknown complex environments via deep reinforcement learning. In Proceedings of the 2019 IEEE International Conference on Mechatronics and Automation (ICMA), Tianjin, China, 4–7 August 2019; pp. 2493–2500.
- Tolstaya, E.; Paulos, J.; Kumar, V.; Ribeiro, A. Multi-robot coverage and exploration using spatial graph neural networks. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 8944–8950.
- Blumenkamp, J.; Morad, S.; Gielis, J.; Li, Q.; Prorok, A. A framework for real-world multi-robot systems running decentralized GNN-based policies. In Proceedings of the 2022 IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 8772–8778.
- Lin, J.; Yang, X.; Zheng, P.; Cheng, H. Connectivity guaranteed multi-robot navigation via deep reinforcement learning. In Proceedings of the Conference on Robot Learning, PMLR, Virtual, 16–18 November 2020; pp. 661–670.
- Wang, J.; Cao, J.; Stojmenovic, M.; Zhao, M.; Chen, J.; Jiang, S. Pattern-rl: Multi-robot cooperative pattern formation via deep reinforcement learning. In Proceedings of the 18th IEEE International Conference On Machine Learning And Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 210–215.
- Park, B.; Kang, C.; Choi, J. Cooperative Multi-Robot Task Allocation with Reinforcement Learning. Appl. Sci. 2021, 12, 272.
- Yao, S.; Chen, G.; Pan, L.; Ma, J.; Ji, J.; Chen, X. Multi-robot collision avoidance with map-based deep reinforcement learning. In Proceedings of the 32nd IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Baltimore, MD, USA, 9–11 November 2020; pp. 532–539.
- Tan, Q.; Fan, T.; Pan, J.; Manocha, D. DeepMNavigate: Deep reinforced multi-robot navigation unifying local & global collision avoidance. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 6952–6959.
- Han, R.; Chen, S.; Hao, Q. Cooperative multi-robot navigation in dynamic environment with deep reinforcement learning. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 448–454.
- Blumenkamp, J.; Prorok, A. The emergence of adversarial communication in multi-agent reinforcement learning. arXiv 2020, arXiv:2008.02616.
- Sivanathan, K.; Vinayagam, B.; Samak, T.; Samak, C. Decentralized motion planning for multi-robot navigation using deep reinforcement learning. In Proceedings of the 3rd IEEE International Conference on Intelligent Sustainable Systems (ICISS), Thoothukudi, India, 3–5 December 2020; pp. 709–716.
- Liu, J.y.; Wang, G.; Fu, Q.; Yue, S.h.; Wang, S.y. Task assignment in ground-to-air confrontation based on multiagent deep reinforcement learning. Def. Technol. 2022, 19, 210–219.
- Sadhukhan, P.; Selmic, R.R. Multi-agent formation control with obstacle avoidance using proximal policy optimization. In Proceedings of the 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Melbourne, Australia, 17–20 October 2021; pp. 2694–2699.
- Sadhukhan, P. Proximal Policy Optimization for Formation Control and Obstacle Avoidance in Multi-Agent Systems. Ph.D. Thesis, Concordia University, Montreal, QC, Canada, 2021.
- Ourari, R.; Cui, K.; Koeppl, H. Decentralized swarm collision avoidance for quadrotors via end-to-end reinforcement learning. arXiv 2021, arXiv:2104.14912.
- Zhang, T.; Liu, Z.; Pu, Z.; Yi, J. Multi-Target Encirclement with Collision Avoidance via Deep Reinforcement Learning using Relational Graphs. In Proceedings of the 2022 IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 8794–8800.
- Sadhukhan, P.; Selmic, R.R. Proximal policy optimization for formation navigation and obstacle avoidance. Int. J. Intell. Robot. Appl. 2022, 6, 746–759.
- Allen, R.E.; Gupta, J.K.; Pena, J.; Zhou, Y.; Bear, J.W.; Kochenderfer, M.J. Health-Informed Policy Gradients for Multi-Agent Reinforcement Learning. arXiv 2019, arXiv:1908.01022.
- Xia, J.; Luo, Y.; Liu, Z.; Zhang, Y.; Shi, H.; Liu, Z. Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning. Def. Technol. 2022, in press.
- Li, S.; Guo, W. Supervised Reinforcement Learning for ULV Path Planning in Complex Warehouse Environment. Wirel. Commun. Mob. Comput. 2022, 2022, 4384954.
- Paull, S.; Ghassemi, P.; Chowdhury, S. Learning Scalable Policies over Graphs for Multi-Robot Task Allocation using Capsule Attention Networks. In Proceedings of the 2022 IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 8815–8822.
- Na, S.; Krajník, T.; Lennox, B.; Arvin, F. Federated Reinforcement Learning for Collective Navigation of Robotic Swarms. arXiv 2022, arXiv:2202.01141.
- Fan, T.; Long, P.; Liu, W.; Pan, J. Fully distributed multi-robot collision avoidance via deep reinforcement learning for safe and efficient navigation in complex scenarios. arXiv 2018, arXiv:1808.03841.
- Elfakharany, A.; Ismail, Z.H. End-to-end deep reinforcement learning for decentralized task allocation and navigation for a multi-robot system. Appl. Sci. 2021, 11, 2895.
- Wen, S.; Wen, Z.; Zhang, D.; Zhang, H.; Wang, T. A multi-robot path-planning algorithm for autonomous navigation using meta-reinforcement learning based on transfer learning. Appl. Soft Comput. 2021, 110, 107605.
- Khan, A.; Tolstaya, E.; Ribeiro, A.; Kumar, V. Graph policy gradients for large scale robot control. In Proceedings of the Conference on Robot Learning, PMLR, Virtual, 16–18 November 2020; pp. 823–834.
- Alon, Y.; Zhou, H. Multi-agent reinforcement learning for unmanned aerial vehicle coordination by multi-critic policy gradient optimization. arXiv 2020, arXiv:2012.15472.
- Khan, A.; Kumar, V.; Ribeiro, A. Graph policy gradients for large scale unlabeled motion planning with constraints. arXiv 2019, arXiv:1909.10704.
- Asayesh, S.; Chen, M.; Mehrandezh, M.; Gupta, K. Least-restrictive multi-agent collision avoidance via deep meta reinforcement learning and optimal control. arXiv 2021, arXiv:2106.00936.
- Qamar, S.; Khan, S.H.; Arshad, M.A.; Qamar, M.; Gwak, J.; Khan, A. Autonomous Drone Swarm Navigation and Multi-target Tracking with Island Policy-based Optimization Framework. IEEE Access 2022, 10, 91073–91091.
- Zhou, W.; Li, J.; Zhang, Q. Joint Communication and Action Learning in Multi-Target Tracking of UAV Swarms with Deep Reinforcement Learning. Drones 2022, 6, 339.
- Hüttenrauch, M.; Šošić, A.; Neumann, G. Local communication protocols for learning complex swarm behaviors with deep reinforcement learning. In Proceedings of the International Conference on Swarm Intelligence; Springer: Berlin/Heidelberg, Germany, 2018; pp. 71–83. [Google Scholar]
- Hüttenrauch, M.; Šošić, A.; Neumann, G. Deep reinforcement learning for swarm systems. J. Mach. Learn. Res. 2019, 20, 1–31. [Google Scholar]
- Wang, W.; Wang, L.; Wu, J.; Tao, X.; Wu, H. Oracle-Guided Deep Reinforcement Learning for Large-Scale Multi-UAVs Flocking and Navigation. IEEE Trans. Veh. Technol. 2022, 71, 10280–10292. [Google Scholar] [CrossRef]
- Prianto, E.; Kim, M.; Park, J.H.; Bae, J.H.; Kim, J.S. Path planning for multi-arm manipulators using deep reinforcement learning: Soft actor–critic with hindsight experience replay. Sensors 2020, 20, 5911. [Google Scholar] [CrossRef]
- Cao, Y.; Wang, S.; Zheng, X.; Ma, W.; Xie, X.; Liu, L. Reinforcement Learning with Prior Policy Guidance for Motion Planning of Dual-Arm Free-Floating Space Robot. arXiv 2022, arXiv:2209.01434. [Google Scholar] [CrossRef]
- Galceran, E.; Carreras, M. A survey on coverage path planning for robotics. Robot. Auton. Syst. 2013, 61, 1258–1276. [Google Scholar] [CrossRef] [Green Version]
- Agmon, N.; Hazon, N.; Kaminka, G.A. Constructing spanning trees for efficient multi-robot coverage. In Proceedings of the IEEE International Conference on Robotics and Automation, ICRA 2006, Orlando, FL, USA, 15–19 May 2006; pp. 1698–1703. [Google Scholar]
- Kapoutsis, A.C.; Chatzichristofis, S.A.; Kosmatopoulos, E.B. DARP: Divide areas algorithm for optimal multi-robot coverage path planning. J. Intell. Robot. Syst. 2017, 86, 663–680. [Google Scholar] [CrossRef] [Green Version]
- Rekleitis, I.; New, A.P.; Rankin, E.S.; Choset, H. Efficient boustrophedon multi-robot coverage: An algorithmic approach. Ann. Math. Artif. Intell. 2008, 52, 109–142. [Google Scholar] [CrossRef] [Green Version]
- Zheng, X.; Jain, S.; Koenig, S.; Kempe, D. Multi-robot forest coverage. In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, AB, Canada, 2–6 August 2005; pp. 3852–3857. [Google Scholar]
- Marjovi, A.; Nunes, J.G.; Marques, L.; De Almeida, A. Multi-robot exploration and fire searching. In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, MO, USA, 10–15 October 2009; pp. 1929–1934. [Google Scholar]
- Nieto-Granda, C.; Rogers III, J.G.; Christensen, H.I. Coordination strategies for multi-robot exploration and mapping. Int. J. Robot. Res. 2014, 33, 519–533. [Google Scholar] [CrossRef]
- Simmons, R.; Apfelbaum, D.; Burgard, W.; Fox, D.; Moors, M.; Thrun, S.; Younes, H. Coordination for multi-robot exploration and mapping. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI-00), Austin, TX, USA, 30 July–3 August 2000; pp. 852–858. [Google Scholar]
- Rooker, M.N.; Birk, A. Multi-robot exploration under the constraints of wireless networking. Control Eng. Pract. 2007, 15, 435–445. [Google Scholar] [CrossRef]
- Zhou, X.; Liu, X.; Wang, X.; Wu, S.; Sun, M. Multi-Robot Coverage Path Planning based on Deep Reinforcement Learning. In Proceedings of the 24th IEEE International Conference on Computational Science and Engineering (CSE), Shenyang, China, 20–22 October 2021; pp. 35–42. [Google Scholar]
- Hu, J.; Niu, H.; Carrasco, J.; Lennox, B.; Arvin, F. Voronoi-based multi-robot autonomous exploration in unknown environments via deep reinforcement learning. IEEE Trans. Veh. Technol. 2020, 69, 14413–14423. [Google Scholar] [CrossRef]
- Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized experience replay. arXiv 2015, arXiv:1511.05952. [Google Scholar]
- Koenig, N.; Howard, A. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No. 04CH37566), Sendai, Japan, 28 September–2 October 2004; Volume 3, pp. 2149–2154. [Google Scholar]
- Gama, F.; Marques, A.G.; Leus, G.; Ribeiro, A. Convolutional neural network architectures for signals supported on graphs. IEEE Trans. Signal Process. 2018, 67, 1034–1049. [Google Scholar] [CrossRef] [Green Version]
- Aydemir, F.; Cetin, A. Multi-Agent Dynamic Area Coverage Based on Reinforcement Learning with Connected Agents. Comput. Syst. Sci. Eng. 2023, 45, 215–230. [Google Scholar] [CrossRef]
- Zhang, H.; Cheng, J.; Zhang, L.; Li, Y.; Zhang, W. H2GNN: Hierarchical-Hops Graph Neural Networks for Multi-Robot Exploration in Unknown Environments. IEEE Robot. Autom. Lett. 2022, 7, 3435–3442. [Google Scholar] [CrossRef]
- Gao, M.; Zhang, X. Cooperative Search Method for Multiple UAVs Based on Deep Reinforcement Learning. Sensors 2022, 22, 6737. [Google Scholar] [CrossRef]
- Sheng, W.; Guo, H.; Yau, W.Y.; Zhou, Y. PD-FAC: Probability Density Factorized Multi-Agent Distributional Reinforcement Learning for Multi-Robot Reliable Search. IEEE Robot. Autom. Lett. 2022, 7, 8869–8876. [Google Scholar] [CrossRef]
- Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar]
- Reynolds, C.W. Flocks, herds and schools: A distributed behavioral model. In Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, Anaheim, CA, USA, 27–31 July 1987; pp. 25–34. [Google Scholar]
- Liang, Z.; Cao, J.; Lin, W.; Chen, J.; Xu, H. Hierarchical Deep Reinforcement Learning for Multi-robot Cooperation in Partially Observable Environment. In Proceedings of the 3rd IEEE International Conference on Cognitive Machine Intelligence (CogMI), Atlanta, GA, USA, 13–15 December 2021; pp. 272–281. [Google Scholar]
- Acar, E.U.; Choset, H.; Lee, J.Y. Sensor-based coverage with extended range detectors. IEEE Trans. Robot. 2006, 22, 189–198. [Google Scholar] [CrossRef]
- Chen, D.; Qi, Q.; Zhuang, Z.; Wang, J.; Liao, J.; Han, Z. Mean field deep reinforcement learning for fair and efficient UAV control. IEEE Internet Things J. 2020, 8, 813–828. [Google Scholar] [CrossRef]
- Zhang, Y.; Yang, C.; Li, J.; Han, Z. Distributed interference-aware traffic offloading and power control in ultra-dense networks: Mean field game with dominating player. IEEE Trans. Veh. Technol. 2019, 68, 8814–8826. [Google Scholar] [CrossRef]
- Guéant, O.; Lasry, J.M.; Lions, P.L. Mean field games and applications. In Paris-Princeton Lectures on Mathematical Finance 2010; Springer: Berlin/Heidelberg, Germany, 2011; pp. 205–266. [Google Scholar]
- Kadanoff, L.P.; Martin, P.C. Statistical physics: Statics, dynamics, and renormalization. Phys. Today 2001, 54, 54–55. [Google Scholar]
- Nemer, I.A.; Sheltami, T.R.; Belhaiza, S.; Mahmoud, A.S. Energy-Efficient UAV Movement Control for Fair Communication Coverage: A Deep Reinforcement Learning Approach. Sensors 2022, 22, 1919. [Google Scholar] [CrossRef] [PubMed]
- Liu, C.H.; Ma, X.; Gao, X.; Tang, J. Distributed energy-efficient multi-UAV navigation for long-term communication coverage by deep reinforcement learning. IEEE Trans. Mob. Comput. 2019, 19, 1274–1285. [Google Scholar] [CrossRef]
- Surynek, P. An optimization variant of multi-robot path planning is intractable. In Proceedings of the AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, 11–15 July 2010; Volume 24, pp. 1261–1263. [Google Scholar]
- Hart, P.E.; Nilsson, N.J.; Raphael, B. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 100–107. [Google Scholar] [CrossRef]
- Wagner, G.; Choset, H. Subdimensional expansion for multirobot path planning. Artif. Intell. 2015, 219, 1–24. [Google Scholar] [CrossRef]
- Bennewitz, M.; Burgard, W.; Thrun, S. Finding and optimizing solvable priority schemes for decoupled path planning techniques for teams of mobile robots. Robot. Auton. Syst. 2002, 41, 89–99. [Google Scholar] [CrossRef]
- Dutta, A.; Dasgupta, P. Bipartite graph matching-based coordination mechanism for multi-robot path planning under communication constraints. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 857–862. [Google Scholar]
- Kimmel, A.; Bekris, K. Decentralized multi-agent path selection using minimal information. In Distributed Autonomous Robotic Systems; Springer: Berlin/Heidelberg, Germany, 2016; pp. 341–356. [Google Scholar]
- Yu, J.; LaValle, S.M. Multi-agent path planning and network flow. In Algorithmic Foundations of Robotics X; Springer: Berlin/Heidelberg, Germany, 2013; pp. 157–173. [Google Scholar]
- Xu, Y.; Wei, Y.; Wang, D.; Jiang, K.; Deng, H. Multi-UAV Path Planning in GPS and Communication Denial Environment. Sensors 2023, 23, 2997. [Google Scholar] [CrossRef]
- Wang, D.; Deng, H.; Pan, Z. MRCDRL: Multi-robot coordination with deep reinforcement learning. Neurocomputing 2020, 406, 68–76. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Li, M.; Jie, Y.; Kong, Y.; Cheng, H. Decentralized Global Connectivity Maintenance for Multi-Robot Navigation: A Reinforcement Learning Approach. In Proceedings of the 2022 IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 8801–8807. [Google Scholar]
- Achiam, J.; Held, D.; Tamar, A.; Abbeel, P. Constrained policy optimization. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 22–31. [Google Scholar]
- Dutta, A.; Ghosh, A.; Kreidl, O.P. Multi-robot informative path planning with continuous connectivity constraints. In Proceedings of the 2019 IEEE International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 3245–3251. [Google Scholar]
- Challita, U.; Saad, W.; Bettstetter, C. Interference management for cellular-connected UAVs: A deep reinforcement learning approach. IEEE Trans. Wirel. Commun. 2019, 18, 2125–2140. [Google Scholar] [CrossRef] [Green Version]
- Wang, B.; Liu, Z.; Li, Q.; Prorok, A. Mobile robot path planning in dynamic environments through globally guided reinforcement learning. IEEE Robot. Autom. Lett. 2020, 5, 6932–6939. [Google Scholar] [CrossRef]
- Rashid, T.; Samvelyan, M.; Schroeder, C.; Farquhar, G.; Foerster, J.; Whiteson, S. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 4295–4304. [Google Scholar]
- Chen, Y.F.; Liu, M.; Everett, M.; How, J.P. Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 285–292. [Google Scholar]
- Chen, Y.F.; Everett, M.; Liu, M.; How, J.P. Socially aware motion planning with deep reinforcement learning. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 1343–1350. [Google Scholar]
- Weiss, K.; Khoshgoftaar, T.M.; Wang, D. A survey of transfer learning. J. Big Data 2016, 3, 1–40. [Google Scholar] [CrossRef] [Green Version]
- Konečný, J.; McMahan, B.; Ramage, D. Federated optimization: Distributed optimization beyond the datacenter. arXiv 2015, arXiv:1511.03575. [Google Scholar]
- Luo, R.; Ni, W.; Tian, H.; Cheng, J. Federated Deep Reinforcement Learning for RIS-Assisted Indoor Multi-Robot Communication Systems. IEEE Trans. Veh. Technol. 2022, 71, 12321–12326. [Google Scholar] [CrossRef]
- Sartoretti, G.; Kerr, J.; Shi, Y.; Wagner, G.; Kumar, T.S.; Koenig, S.; Choset, H. PRIMAL: Pathfinding via reinforcement and imitation multi-agent learning. IEEE Robot. Autom. Lett. 2019, 4, 2378–2385. [Google Scholar] [CrossRef] [Green Version]
- Ross, S.; Gordon, G.; Bagnell, D. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 627–635. [Google Scholar]
- Damani, M.; Luo, Z.; Wenzel, E.; Sartoretti, G. PRIMAL_2: Pathfinding via reinforcement and imitation multi-agent learning-lifelong. IEEE Robot. Autom. Lett. 2021, 6, 2666–2673. [Google Scholar] [CrossRef]
- Bengio, Y.; Louradour, J.; Collobert, R.; Weston, J. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009; pp. 41–48. [Google Scholar]
- Sun, L.; Yan, J.; Qin, W. Path planning for multiple agents in an unknown environment using soft actor critic and curriculum learning. Comput. Animat. Virtual Worlds 2023, 34, e2113. [Google Scholar] [CrossRef]
- Pu, Y.; Gan, Z.; Henao, R.; Yuan, X.; Li, C.; Stevens, A.; Carin, L. Variational autoencoder for deep learning of images, labels and captions. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, Barcelona, Spain, 5–10 December 2016; Volume 29. [Google Scholar]
- Li, H. Decentralized Multi-Agent Collision Avoidance and Reinforcement Learning. Ph.D. Thesis, The Ohio State University, Columbus, OH, USA, 2021. [Google Scholar]
- Andrychowicz, M.; Wolski, F.; Ray, A.; Schneider, J.; Fong, R.; Welinder, P.; McGrew, B.; Tobin, J.; Abbeel, P.; Zaremba, W. Hindsight experience replay. Adv. Neural Inf. Process. Syst. 2017, 30, 5048–5058. [Google Scholar]
- Everett, M.; Chen, Y.F.; How, J.P. Motion planning among dynamic, decision-making agents with deep reinforcement learning. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 3052–3059. [Google Scholar]
- Semnani, S.H.; Liu, H.; Everett, M.; De Ruiter, A.; How, J.P. Multi-agent motion planning for dense and dynamic environments via deep reinforcement learning. IEEE Robot. Autom. Lett. 2020, 5, 3221–3226. [Google Scholar] [CrossRef] [Green Version]
- Zhang, H.; Luo, J.; Lin, X.; Tan, K.; Pan, C. Dispatching and Path Planning of Automated Guided Vehicles based on Petri Nets and Deep Reinforcement Learning. In Proceedings of the 2021 IEEE International Conference on Networking, Sensing and Control (ICNSC), Xiamen, China, 3–5 December 2021; Volume 1, pp. 1–6. [Google Scholar]
- Huang, H.; Zhu, G.; Fan, Z.; Zhai, H.; Cai, Y.; Shi, Z.; Dong, Z.; Hao, Z. Vision-based Distributed Multi-UAV Collision Avoidance via Deep Reinforcement Learning for Navigation. arXiv 2022, arXiv:2203.02650. [Google Scholar]
- Yarats, D.; Zhang, A.; Kostrikov, I.; Amos, B.; Pineau, J.; Fergus, R. Improving sample efficiency in model-free reinforcement learning from images. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 10674–10681. [Google Scholar]
- Jeon, S.; Lee, H.; Kaliappan, V.K.; Nguyen, T.A.; Jo, H.; Cho, H.; Min, D. Multiagent Reinforcement Learning Based on Fusion-Multiactor-Attention-Critic for Multiple-Unmanned-Aerial-Vehicle Navigation Control. Energies 2022, 15, 7426. [Google Scholar] [CrossRef]
- Shalev-Shwartz, S.; Shammah, S.; Shashua, A. Safe, multi-agent, reinforcement learning for autonomous driving. arXiv 2016, arXiv:1610.03295. [Google Scholar]
- Ammar, H.B.; Tutunov, R.; Eaton, E. Safe policy search for lifelong reinforcement learning with sublinear regret. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 2361–2369. [Google Scholar]
- Schulman, J.; Moritz, P.; Levine, S.; Jordan, M.; Abbeel, P. High-dimensional continuous control using generalized advantage estimation. arXiv 2015, arXiv:1506.02438. [Google Scholar]
- Taskar, B.; Chatalbashev, V.; Koller, D.; Guestrin, C. Learning structured prediction models: A large margin approach. In Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 7–11 August 2005; pp. 896–903. [Google Scholar]
- Liang, Z.; Cao, J.; Jiang, S.; Saxena, D.; Xu, H. Hierarchical Reinforcement Learning with Opponent Modeling for Distributed Multi-agent Cooperation. arXiv 2022, arXiv:2206.12718. [Google Scholar]
- Farrow, N.; Klingner, J.; Reishus, D.; Correll, N. Miniature six-channel range and bearing system: Algorithm, analysis and experimental validation. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 6180–6185. [Google Scholar]
- Shiell, N.; Vardy, A. A bearing-only pattern formation algorithm for swarm robotics. In Proceedings of the Swarm Intelligence: 10th International Conference, ANTS 2016, Brussels, Belgium, 7–9 September 2016; pp. 3–14. [Google Scholar]
- Rubenstein, M.; Cornejo, A.; Nagpal, R. Programmable self-assembly in a thousand-robot swarm. Science 2014, 345, 795–799. [Google Scholar] [CrossRef]
- Zhu, P.; Dai, W.; Yao, W.; Ma, J.; Zeng, Z.; Lu, H. Multi-robot flocking control based on deep reinforcement learning. IEEE Access 2020, 8, 150397–150406. [Google Scholar] [CrossRef]
- Lan, X.; Liu, Y.; Zhao, Z. Cooperative control for swarming systems based on reinforcement learning in unknown dynamic environment. Neurocomputing 2020, 410, 410–418. [Google Scholar] [CrossRef]
- Kortvelesy, R.; Prorok, A. ModGNN: Expert policy approximation in multi-agent systems with a modular graph neural network architecture. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 9161–9167. [Google Scholar]
- Yan, C.; Xiang, X.; Wang, C. Fixed-Wing UAVs flocking in continuous spaces: A deep reinforcement learning approach. Robot. Auton. Syst. 2020, 131, 103594. [Google Scholar] [CrossRef]
- Yan, C.; Xiang, X.; Wang, C.; Lan, Z. Flocking and Collision Avoidance for a Dynamic Squad of Fixed-Wing UAVs Using Deep Reinforcement Learning. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 4738–4744. [Google Scholar]
- Fujimoto, S.; Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596. [Google Scholar]
- Ng, A. Sparse Autoencoder. CS294A Lecture Notes. 2011; Volume 72, pp. 1–19. [Google Scholar]
- Bhagat, S.; Das, B.; Chakraborty, A.; Mukhopadhyaya, K. k-Circle Formation and k-epf by Asynchronous Robots. Algorithms 2021, 14, 62. [Google Scholar] [CrossRef]
- Datta, S.; Dutta, A.; Gan Chaudhuri, S.; Mukhopadhyaya, K. Circle formation by asynchronous transparent fat robots. In Proceedings of the International Conference on Distributed Computing and Internet Technology; Springer: Berlin/Heidelberg, Germany, 2013; pp. 195–207. [Google Scholar]
- Dutta, A.; Gan Chaudhuri, S.; Datta, S.; Mukhopadhyaya, K. Circle formation by asynchronous fat robots with limited visibility. In Proceedings of the International Conference on Distributed Computing and Internet Technology; Springer: Berlin/Heidelberg, Germany, 2012; pp. 83–93. [Google Scholar]
- Flocchini, P.; Prencipe, G.; Santoro, N.; Viglietta, G. Distributed computing by mobile robots: Uniform circle formation. Distrib. Comput. 2017, 30, 413–457. [Google Scholar] [CrossRef] [Green Version]
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
- Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Zhou, W.; Li, J.; Liu, Z.; Shen, L. Improving multi-target cooperative tracking guidance for UAV swarms using multi-agent reinforcement learning. Chin. J. Aeronaut. 2022, 35, 100–112. [Google Scholar]
- Nowak, M.A. Five rules for the evolution of cooperation. Science 2006, 314, 1560–1563. [Google Scholar] [CrossRef] [Green Version]
- Smola, A.; Gretton, A.; Song, L.; Schölkopf, B. A Hilbert space embedding for distributions. In Proceedings of the International Conference on Algorithmic Learning Theory; Springer: Berlin/Heidelberg, Germany, 2007; pp. 13–31. [Google Scholar]
- Chung, T.H.; Hollinger, G.A.; Isler, V. Search and pursuit-evasion in mobile robotics. Auton. Robot. 2011, 31, 299–316. [Google Scholar] [CrossRef] [Green Version]
- Sincák, D. Multi-robot control system for pursuit-evasion problem. J. Electr. Eng. 2009, 60, 143–148. [Google Scholar]
- Stiffler, N.M.; O’Kane, J.M. A sampling-based algorithm for multi-robot visibility-based pursuit-evasion. In Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA, 14–18 September 2014; pp. 1782–1789. [Google Scholar]
- Oh, S.; Schenato, L.; Chen, P.; Sastry, S. Tracking and coordination of multiple agents using sensor networks: System design, algorithms and experiments. Proc. IEEE 2007, 95, 234–254. [Google Scholar] [CrossRef] [Green Version]
- Wang, Y.; Dong, L.; Sun, C. Cooperative control for multi-player pursuit-evasion games with reinforcement learning. Neurocomputing 2020, 412, 101–114. [Google Scholar] [CrossRef]
- Tokekar, P.; Vander Hook, J.; Mulla, D.; Isler, V. Sensor planning for a symbiotic UAV and UGV system for precision agriculture. IEEE Trans. Robot. 2016, 32, 1498–1511. [Google Scholar] [CrossRef]
- Batjes, N.H.; Ribeiro, E.; Van Oostrum, A.; Leenaars, J.; Hengl, T.; Mendes de Jesus, J. WoSIS: Providing standardised soil profile data for the world. Earth Syst. Sci. Data 2017, 9, 1–14. [Google Scholar] [CrossRef] [Green Version]
- Viseras, A.; Garcia, R. DeepIG: Multi-robot information gathering with deep reinforcement learning. IEEE Robot. Autom. Lett. 2019, 4, 3059–3066. [Google Scholar] [CrossRef] [Green Version]
- Said, T.; Wolbert, J.; Khodadadeh, S.; Dutta, A.; Kreidl, O.P.; Bölöni, L.; Roy, S. Multi-robot information sampling using deep mean field reinforcement learning. In Proceedings of the 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Melbourne, Australia, 17–20 October 2021; pp. 1215–1220. [Google Scholar]
- Khamis, A.; Hussein, A.; Elmogy, A. Multi-robot task allocation: A review of the state-of-the-art. Coop. Robot. Sens. Netw. 2015, 2015, 31–51. [Google Scholar]
- Korsah, G.A.; Stentz, A.; Dias, M.B. A comprehensive taxonomy for multi-robot task allocation. Int. J. Robot. Res. 2013, 32, 1495–1512. [Google Scholar] [CrossRef]
- Verma, S.; Zhang, Z.L. Graph capsule convolutional neural networks. arXiv 2018, arXiv:1805.08090. [Google Scholar]
- Kool, W.; Van Hoof, H.; Welling, M. Attention, learn to solve routing problems! arXiv 2018, arXiv:1803.08475. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
- Devin, C.; Gupta, A.; Darrell, T.; Abbeel, P.; Levine, S. Learning modular neural network policies for multi-task and multi-robot transfer. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 2169–2176. [Google Scholar]
- Tavakoli, A.; Pardo, F.; Kormushev, P. Action branching architectures for deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
- Alkilabi, M.H.M.; Narayan, A.; Tuci, E. Cooperative object transport with a swarm of e-puck robots: Robustness and scalability of evolved collective strategies. Swarm Intell. 2017, 11, 185–209. [Google Scholar] [CrossRef] [Green Version]
- Tuci, E.; Alkilabi, M.H.; Akanyeti, O. Cooperative object transport in multi-robot systems: A review of the state-of-the-art. Front. Robot. AI 2018, 5, 59. [Google Scholar] [CrossRef] [PubMed]
- Niwa, T.; Shibata, K.; Jimbo, T. Multi-agent Reinforcement Learning and Individuality Analysis for Cooperative Transportation with Obstacle Removal. In Proceedings of the International Symposium Distributed Autonomous Robotic Systems; Springer: Berlin/Heidelberg, Germany, 2021; pp. 202–213. [Google Scholar]
- Narvekar, S.; Peng, B.; Leonetti, M.; Sinapov, J.; Taylor, M.E.; Stone, P. Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey. J. Mach. Learn. Res. 2020, 21, 181:1–181:50. [Google Scholar]
- Stroupe, A.; Huntsberger, T.; Okon, A.; Aghazarian, H.; Robinson, M. Behavior-based multi-robot collaboration for autonomous construction tasks. In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, AB, Canada, 2–6 August 2005; pp. 1495–1500. [Google Scholar]
- Werfel, J.K.; Petersen, K.; Nagpal, R. Distributed multi-robot algorithms for the TERMES 3D collective construction system. In Proceedings of the Robotics: Science and Systems VII, Institute of Electrical and Electronics Engineers, Los Angeles, CA, USA, 25–30 September 2011. [Google Scholar]
- Werfel, J.; Petersen, K.; Nagpal, R. Designing collective behavior in a termite-inspired robot construction team. Science 2014, 343, 754–758. [Google Scholar] [CrossRef] [PubMed]
- Sartoretti, G.; Wu, Y.; Paivine, W.; Kumar, T.; Koenig, S.; Choset, H. Distributed reinforcement learning for multi-robot decentralized collective construction. In Distributed Autonomous Robotic Systems; Springer: Berlin/Heidelberg, Germany, 2019; pp. 35–49. [Google Scholar]
- Liang, Z.; Cao, J.; Jiang, S.; Saxena, D.; Chen, J.; Xu, H. From Multi-agent to Multi-robot: A Scalable Training and Evaluation Platform for Multi-robot Reinforcement Learning. arXiv 2022, arXiv:2206.09590. [Google Scholar]
- Bettini, M.; Kortvelesy, R.; Blumenkamp, J.; Prorok, A. VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning. arXiv 2022, arXiv:2207.03530. [Google Scholar]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8024–8035. [Google Scholar]
- Chen, J.; Deng, F.; Gao, Y.; Hu, J.; Guo, X.; Liang, G.; Lam, T.L. MultiRoboLearn: An open-source Framework for Multi-robot Deep Reinforcement Learning. arXiv 2022, arXiv:2209.13760. [Google Scholar]
- Hu, S.; Zhong, Y.; Gao, M.; Wang, W.; Dong, H.; Li, Z.; Liang, X.; Chang, X.; Yang, Y. MARLlib: Extending RLlib for Multi-agent Reinforcement Learning. arXiv 2022, arXiv:2210.13708. [Google Scholar]
- Moritz, P.; Nishihara, R.; Wang, S.; Tumanov, A.; Liaw, R.; Liang, E.; Elibol, M.; Yang, Z.; Paul, W.; Jordan, M.I.; et al. Ray: A distributed framework for emerging AI applications. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), Carlsbad, CA, USA, 8–10 October 2018; pp. 561–577. [Google Scholar]
- Liang, E.; Liaw, R.; Nishihara, R.; Moritz, P.; Fox, R.; Gonzalez, J.; Goldberg, K.; Stoica, I. Ray RLlib: A composable and scalable reinforcement learning library. arXiv 2017, arXiv:1712.09381. [Google Scholar]
- Hu, J.; Jiang, S.; Harding, S.A.; Wu, H.; Liao, S.w. Rethinking the implementation tricks and monotonicity constraint in cooperative multi-agent reinforcement learning. arXiv 2021, arXiv:2102.03479. [Google Scholar]
- Zhou, M.; Wan, Z.; Wang, H.; Wen, M.; Wu, R.; Wen, Y.; Yang, Y.; Zhang, W.; Wang, J. MALib: A parallel framework for population-based multi-agent reinforcement learning. arXiv 2021, arXiv:2106.07551. [Google Scholar]
- Michel, O. Cyberbotics Ltd. Webots™: Professional mobile robot simulation. Int. J. Adv. Robot. Syst. 2004, 1, 5. [Google Scholar] [CrossRef] [Green Version]
- Rohmer, E.; Singh, S.P.; Freese, M. V-REP: A versatile and scalable robot simulation framework. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 1321–1326. [Google Scholar]
- Dasari, S.; Ebert, F.; Tian, S.; Nair, S.; Bucher, B.; Schmeckpeper, K.; Singh, S.; Levine, S.; Finn, C. RoboNet: Large-scale multi-robot learning. arXiv 2019, arXiv:1910.11215. [Google Scholar]
- Challita, U.; Saad, W.; Bettstetter, C. Deep reinforcement learning for interference-aware path planning of cellular-connected UAVs. In Proceedings of the 2018 IEEE International Conference on Communications (ICC), Kansas City, MO, USA, 20–24 May 2018; pp. 1–7. [Google Scholar]
- Baca, J.; Hossain, S.; Dasgupta, P.; Nelson, C.A.; Dutta, A. Modred: Hardware design and reconfiguration planning for a high dexterity modular self-reconfigurable robot for extra-terrestrial exploration. Robot. Auton. Syst. 2014, 62, 1002–1015. [Google Scholar] [CrossRef]
- Chennareddy, S.; Agrawal, A.; Karuppiah, A. Modular self-reconfigurable robotic systems: A survey on hardware architectures. J. Robot. 2017, 2017, 5013532. [Google Scholar]
- Tan, N.; Hayat, A.A.; Elara, M.R.; Wood, K.L. A framework for taxonomy and evaluation of self-reconfigurable robotic systems. IEEE Access 2020, 8, 13969–13986. [Google Scholar] [CrossRef]
- Yim, M.; Shen, W.M.; Salemi, B.; Rus, D.; Moll, M.; Lipson, H.; Klavins, E.; Chirikjian, G.S. Modular self-reconfigurable robot systems [grand challenges of robotics]. IEEE Robot. Autom. Mag. 2007, 14, 43–52. [Google Scholar] [CrossRef]
Q-Networks | Policy Gradients (DDPG) | Policy Gradients (PPO) | Policy Gradients (Other) |
[23,24,39,40,59,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92] (QMIX), [38] (DDQN), [93] (DDQN), [94] (DDQN), [95] (DQN), [96] (DQN) | [46,47,48,49,50,97,98,99,100,101,102,103,104,105,106] | [22,52,53,54,55,56,57,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128] | [86,88,129,130,131,132,133,134,135,136,137,138,139,140] (TRPO), [141] (TRPO), [81] (TRPO), [142] (TD3), [143] (SAC), [144] (SAC) |
Refs. | State | Action | Reward |
[59] | Map of the environment and robots’ locations with 4 channels | Discrete | Based on the locations of the robots. |
[67] | Robot locations and budgets | Discrete | Based on collected sensor data. |
[68] | Position of the leader UAVs, the coverage map, and the connection network | Discrete | Based on the overall coverage and connectivity of the leader UAVs. |
[69] | Map with obstacles and the coverage area | Discrete | Based on the robot reaching a coverage region within its task area. |
[24] | Map with covered area | Discrete | Based on robot coverage. |
[71] | Map of the environment | Discrete | Based on the robot movements and reaching the target without collisions. |
[70] | Map with robots’ positions | Discrete | Based on the herding pattern. |
[72] | Map with robots’ positions and target locations | Discrete | Distance from the goal and collision status. |
[39] | Map with robots’ positions and target locations | Discrete | Distance from the goal and collision status. |
[40] | Pursuer and evader positions | Discrete | Collision status and time to capture the predator. |
[73] | The map, locations, and orientations of the robots, and the objects the robots are connected to | Discrete | Based on the position of the object and the robots hitting the boundaries. |
[74] | The map, and the locations of the robots | Discrete | Based on distance from the target and collisions. |
[76] | The regions of the robots, positions of the defender and the attacker UAVs and the intruder | Discrete | Based on distance. |
[77] | The distance from the MRS center to the goal, the difference in orientation of the direction of MRS to the goal, and the distance between the robots | Discrete | Based on the distance to the goal, orientation to the goal, proximity of obstacles, and the distance between the robots. |
[78] | Sensor input information that includes distance to other robots and the target landmarks | Discrete | Based on becoming closer to the target landmark. |
[79] | Spatial information on the robots, the object, and the goal | Discrete | Based on the object reaching the goal while avoiding collisions. |
[80] | Robot position and velocity | Discrete | Based on the robots being within sensing range of one another. |
[38] | The positions and speed of the first responders and UAVs | Discrete | Based on the Cramér–Rao lower bound (CRLB) for the whole system. |
[93] | The agents’ positions, types, and remaining jobs | Discrete | Based on minimizing the makespan. |
[83] | The position and direction of the leader and the followers | Discrete | Based on the distance from followers to leaders and collision status. |
[84] | Information on the target, other agents, maps, and collisions | Discrete | Based on finding targets and avoiding obstacles. |
[85] | The robot’s position, position relative to other robots, and angle and direction of the robots | Discrete | Based on covering a location on fire. |
[23] | The positions, velocities, and distances between UAVs | Discrete | Determined by the distance from the target and the evader being reached by the pursuer. |
[87] | Consists of static obstacle locations and the locations of other agents | Discrete | Based on the robots’ movements toward the goal while avoiding collisions. |
[94] | Contains the sensor data for the location of the target relative to the robot and the last action done by the robot | Discrete | Determined by the robot reaching the goal, reducing the number of direction changes, and avoiding collisions. |
[95] | A map that includes the agent’s locations, empty cells, obstacle cells, and the location of the tasks | Discrete | Determined by laying pieces of flooring in the installation area. |
[110] | Map of the environment represented with waypoints, locations of the UAVs, and points of interest | Discrete | Based on the coverage of the team of robots. |
[114] | The positions and tasks of the robots, the state of the robot | Discrete | Based on minimizing the number of timesteps in an episode. |
[96] | Map of the area to be sterilized and the positions of the agents, the cleaning priority, size, and area of the cleaning zone | Discrete | Based on the agents cleaning priority areas for sanitation. |
[86] | Temperature and “fieryness” of a building, location of the robots, water in the tanks, and busy or idle status | Discrete | Based on keeping the fires to a minimum “fieryness” level. |
[53] | Sensory information on obstacles | Discrete | Based on the UAV’s coverage of the area. |
[52] | Robots’ positions and velocities and the machine status | Discrete | Determined by robots completing machine jobs to meet the throughput goal, and their motions while avoiding collisions. |
[107] | Includes the laser readings of the robots, the goal position, and the robot’s velocity | Continuous | Based on the smooth movements of the robots while avoiding collisions. |
[22] | Includes the laser readings of the robots, the goal position, and the robot’s velocity | Continuous | Based on the time to reach the target while avoiding collisions. |
[108] | An environment that includes the coordinates of the manipulator arm gripper | Continuous | Based on reaching the target object. |
[109] | Laser measurements of the robots and their velocities | Continuous | Based on the centroid of the robot team reaching the goal. |
[117] | The state of the robot, other robots, obstacles, and the target position | Continuous | Based on the robots’ relative distance from the target location. |
[97] | Sensed LiDAR data | Continuous | Based on the robot approaching and arriving at the target, avoiding collisions and the formation of the robots. |
[132] | Consists of the goal positions, the robots’ positions, past observations | Continuous | Based on the robot moving towards the goal in the shortest amount of time. |
[133] | Contains laser data, speeds and positions of the robots, and the target position | Continuous | Based on arriving at the target, avoiding collisions, and relative position to other robots. |
[113] | The position information of other robots (three consecutive frames) | Continuous | Based on time for formation, collisions, and the formation progress. |
[115] | The most recent three frames of the map, local goals that include positions and directions | Continuous | Based on minimizing the arrival time of each robot while avoiding collisions. |
[116] | Map of the environment, robot positions and velocities, and laser scans | Continuous | Based on the arrival time of the robot to the destination, avoiding collisions, and smoothness of travel. |
[104] | The coverage score and coverage state for each point of interest and the energy consumption of each UAV | Continuous | Defined by coverage score, connectivity fairness, and energy consumption. |
[134] | Robot’s relative position to the goal and its velocity | Continuous | Based on the robots having collisions. |
[88] | Robot motion parameters, relative distance and orientation to the goal, and their laser scanner data | Discrete/Continuous | Determined by reaching the goal without timing out and avoiding collisions. |
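Several rows in the table above describe rewards built from the distance to the goal and the collision status (e.g., [72], [39], and [74]). As a minimal sketch of what such a reward could look like for a single robot, the Python snippet below combines a progress term, a collision penalty, and an arrival bonus. It is not taken from any of the cited papers: the names `RobotState` and `navigation_reward`, the weights, and the goal radius are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RobotState:
    """Hypothetical per-robot state: planar position plus a collision flag."""
    x: float
    y: float
    collided: bool = False


def navigation_reward(prev, curr, goal,
                      goal_radius=0.2,
                      progress_weight=1.0,
                      collision_penalty=-10.0,
                      arrival_bonus=10.0):
    """Illustrative per-step reward (all constants are assumptions).

    - A collision yields a fixed penalty.
    - Arriving within goal_radius of the goal yields a fixed bonus.
    - Otherwise the reward is proportional to the reduction in
      Euclidean distance to the goal since the previous step.
    """
    if curr.collided:
        return collision_penalty

    d_prev = math.hypot(goal[0] - prev.x, goal[1] - prev.y)
    d_curr = math.hypot(goal[0] - curr.x, goal[1] - curr.y)

    if d_curr <= goal_radius:
        return arrival_bonus

    # Positive when the robot moved closer, negative when it moved away.
    return progress_weight * (d_prev - d_curr)


if __name__ == "__main__":
    goal = (3.0, 0.0)
    step = navigation_reward(RobotState(0.0, 0.0), RobotState(1.0, 0.0), goal)
    print(step)
```

In a multi-robot setting, a team reward could be formed by summing or averaging such per-robot terms, though the cited works differ in how (and whether) they share rewards across agents.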
Aerial Robots | Ground Robots | Manipulators |
[23,24,38,46,53,56,57,68,76,83,85,91,104,106,110,120,123,134,135,138,139,141,142,161,167,171,172,186,205,207,218,219,220,230,240,269] | [22,39,49,52,54,55,59,70,71,73,74,75,77,78,79,82,87,90,92,97,107,109,111,112,115,117,120,124,128,130,132,133,137,155,162,183,189,190,194,202,212,251] | [50,93,108,131,143,144] |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Orr, J.; Dutta, A. Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications: A Survey. Sensors 2023, 23, 3625.