Reinforcement-Learning-Based Asynchronous Formation Control Scheme for Multiple Unmanned Surface Vehicles
Abstract
1. Introduction
- Modeling for maritime formation control: We introduce a model of an underactuated USV that considers only its kinematics. We also propose a formation shape model that describes the physical relationships between USV members, including their relationship in the formation, the relative distance offset, and the scaling coefficient.
- Formation control scheme: We propose an asynchronous decentralized formation control scheme for multiple cooperative USVs. Within this scheme, we propose the advantage deep deterministic policy gradient (ADDPG) algorithm and design reward functions for the formation control problem. Given the required geometric shape of the USV team, decentralized formation generation policies and decentralized formation maintenance policies are trained with ADDPG to generate the formation and to keep its geometric shape, respectively.
- Performance validation: We design evaluation criteria to assess the proposed scheme and conduct extensive simulations to verify it. The simulation results show that the proposed scheme generates formations effectively and maintains them stably.
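To make the modeling contribution more concrete, the following is a minimal sketch of how a kinematics-only USV model and a formation shape with relative offsets and a scaling coefficient could fit together. It is not the model from Section 3: the unicycle-style update, the leader body-frame convention, and all names (`FormationShape`, `desired_positions`, `kinematic_step`) are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class FormationShape:
    """Hypothetical container for a formation shape.

    offsets: one (dx, dy) pair per follower, expressed in the leader's
             body frame (assumption: x points along the leader's heading).
    scale:   scaling coefficient that stretches or shrinks the formation.
    """
    offsets: list
    scale: float = 1.0


def desired_positions(leader_xy, leader_heading, shape):
    """Return the world-frame target position of each follower."""
    lx, ly = leader_xy
    c, s = math.cos(leader_heading), math.sin(leader_heading)
    targets = []
    for dx, dy in shape.offsets:
        # Apply the scaling coefficient, then rotate into the world frame.
        sx, sy = shape.scale * dx, shape.scale * dy
        targets.append((lx + c * sx - s * sy, ly + s * sx + c * sy))
    return targets


def kinematic_step(x, y, heading, speed, yaw_rate, dt):
    """One Euler step of a unicycle-style kinematic model (assumed form)."""
    return (
        x + speed * math.cos(heading) * dt,
        y + speed * math.sin(heading) * dt,
        heading + yaw_rate * dt,
    )


if __name__ == "__main__":
    # A triangle: two followers behind the leader, spread to either side.
    triangle = FormationShape(offsets=[(-1.0, 1.0), (-1.0, -1.0)], scale=2.0)
    print(desired_positions((0.0, 0.0), 0.0, triangle))
```

Changing `scale` resizes the whole formation without touching the individual offsets, which is one plausible reason to keep the two quantities separate.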
2. Related Works
3. System Models
3.1. USV Model
3.2. Formation Shape Model
3.3. Control Objective
4. Proposed Scheme
4.1. Problem Formulation
4.2. Formation Control Algorithm Based on Reinforcement Learning
4.2.1. Formation Generation Policy
Algorithm 1: Formation Generation Policy Based on the Advantage Deep Deterministic Policy Gradient (ADDPG) | |
Input: Reward function for the formation generation scenario | |
Input: The predefined formation shape | |
Output: Formation generation policies for the USV team | |
Initialize the experience replay buffer | |
1: | for do |
2: | Initialize a random process for action exploration |
3: | Receive environment state |
4: | for do |
5: | for do |
6: | Select action |
7: | Execute the actions and observe the reward and the new state |
8: | Add the transition to the replay buffer |
9: | Sample a random minibatch of transitions from the replay buffer |
10: | Calculate the reward in real time |
11: | for do |
12: | |
13: | Update the critic by (9): |
14: | |
15: | Update the formation generation policy for USV by (10): |
16: | |
17: | Update the “soft” target networks for the actor and critic: |
18: | |
19: | |
20: | end for |
21: | end for |
22: | end for |
23: | end for |
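Two generic building blocks named in Algorithm 1, the experience replay buffer and the "soft" target-network update, can be sketched as follows. This is a plain-Python illustration rather than the authors' implementation: the class and function names are hypothetical, network parameters are represented as flat lists of floats, and the mixing coefficient `tau` is an assumed parameter that is not listed among the paper's hyperparameters. The critic and policy updates in (9) and (10) are not reproduced here.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.storage.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Sample without replacement; never ask for more than is stored.
        k = min(batch_size, len(self.storage))
        return rng.sample(list(self.storage), k)

    def __len__(self):
        return len(self.storage)


def soft_update(target_params, online_params, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [
        tau * w + (1.0 - tau) * t
        for t, w in zip(target_params, online_params)
    ]


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=3)
    for step in range(5):
        buffer.add(state=step, action=0.0, reward=-1.0, next_state=step + 1)
    print(len(buffer), buffer.sample(2))
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

A small `tau` makes the target networks trail the online networks slowly, which is the usual motivation for soft rather than hard target updates.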
4.2.2. Decentralized Formation Maintenance Policy
5. Experiment and Analysis
5.1. Experimental Setting
5.2. Results Analysis
5.2.1. Formation Generation
- The deep deterministic policy gradient (DDPG) scheme: In this scheme, each USV learns its formation generation policy using the deep deterministic policy gradient algorithm.
- The deep Q-network (DQN) scheme: In this scheme, each USV learns its formation generation policy using deep Q-learning.
5.2.2. Formation Maintenance
- Cumulative discounted reward during training
- The final distance between the leader and the team goal in each episode
- The stability difference of the whole team
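The first two criteria above can be computed with a few lines of code. The sketch below assumes a per-step reward list for one episode and 2-D positions for the leader and the team goal; the function names are hypothetical, and the default discount factor of 0.95 is taken from the hyperparameter table. The stability difference is left out because its definition is not given in this part of the text.

```python
import math


def discounted_return(rewards, gamma=0.95):
    """Cumulative discounted reward: sum over t of gamma**t * rewards[t]."""
    total = 0.0
    # Accumulate from the last step backwards to avoid explicit powers.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def final_distance(leader_xy, goal_xy):
    """Euclidean distance between the leader and the team goal."""
    return math.hypot(leader_xy[0] - goal_xy[0], leader_xy[1] - goal_xy[1])


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
    print(final_distance((3.0, 4.0), (0.0, 0.0)))
```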
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Hyperparameter | Value |
---|---|
Minibatch size | 1024 |
Replay buffer size | 10^6 |
Discount factor | 0.95 |
Learning rate | 0.01 |
Maximum episode length | 100 |
Number of episodes | 20,000 |
Number of units in the multilayer perceptron (MLP) | 64 |
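For readers who want to reuse these settings, one way to keep them together is a small immutable configuration object. The sketch below is an assumption about packaging, not part of the paper: the class and field names are hypothetical, and the replay buffer size is taken as 10^6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the hyperparameters in the table."""
    minibatch_size: int = 1024
    replay_buffer_size: int = 10**6
    discount_factor: float = 0.95
    learning_rate: float = 0.01
    max_episode_length: int = 100
    num_episodes: int = 20_000
    mlp_units: int = 64

    def max_env_steps(self):
        """Upper bound on time steps if every episode runs to full length."""
        return self.max_episode_length * self.num_episodes


if __name__ == "__main__":
    config = TrainingConfig()
    print(config.max_env_steps())
```

With these values the upper bound is 100 × 20,000 = 2,000,000 time steps, so a buffer of 10^6 entries would overwrite its oldest transitions during training if one transition is stored per step.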
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Xie, J.; Zhou, R.; Liu, Y.; Luo, J.; Xie, S.; Peng, Y.; Pu, H. Reinforcement-Learning-Based Asynchronous Formation Control Scheme for Multiple Unmanned Surface Vehicles. Appl. Sci. 2021, 11, 546. https://doi.org/10.3390/app11020546