A Control Algorithm for Sea–Air Cooperative Observation Tasks Based on a Data-Driven Algorithm
Abstract
1. Introduction
- Mission 1: USVs search for the isotherm of the mesoscale eddy, self-navigate along the isotherm, and collect ocean data;
- Mission 2: A UAV takes off and, within a single round trip, moves cooperatively with the USVs, reading the observation data from each USV in turn while staying within a limited communication distance.
1. Mission 1 requires the USV to follow the isotherm of a mesoscale eddy, which can be regarded as a navigation problem for the USV in an unknown environment. Because the state of the USV at sea changes continuously, the environment is unstable and the policy is difficult to model and predict, so the Q-learning algorithm struggles to converge. Unlike supervised learning, which requires a large amount of labeled sample data, the DDPG algorithm does not need an accurate mathematical model of the controlled object, which is of great significance for controlling a USV in an unknown environment; it also handles high-dimensional continuous state and action spaces well. Therefore, we use the DDPG algorithm to learn from the system data and train a data-driven DDPG controller that completes the USV isotherm-tracking task (a minimal sketch of the DDPG update appears after this list).
2. Mission 2 requires multiple USVs and a UAV to be guided to cooperate and to transmit a large amount of data. The control network of such a sea–air cross-domain heterogeneous system is very complex, and communication over long distances at sea is limited, so it is difficult to build an accurate environment and control model. Moreover, heterogeneous cooperation between the UAV and the USVs is not a simple matter of training each agent separately: from the perspective of either the USV or the UAV, the environment becomes non-stationary because each agent updates its own policy independently, and convergence of the algorithm is then hard to guarantee. In view of this deficiency of single-agent DDPG for heterogeneous cooperative control, this paper adopts a cooperative control strategy based on the multi-agent DDPG (MADDPG) algorithm (its centralized-critic structure is sketched after this list).
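As a concrete illustration of the first point, the following is a minimal, self-contained sketch of a DDPG actor–critic update in PyTorch. It is not the authors' implementation: the state and action dimensions, network sizes, and variable names are illustrative assumptions for an isotherm-tracking USV; only the learning rates, discount factor, and soft-update rate follow the values listed in the parameter table for the isotherm-tracking experiment.

```python
# Minimal DDPG update sketch (illustrative, not the authors' implementation).
# State/action dimensions and network sizes are assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 3, 1   # e.g., [temperature error, heading error, cross-track distance]; action = heading change
GAMMA, TAU = 0.99, 0.01        # discount factor and target soft-update rate from the experiment table

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACTION_DIM), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
actor_t, critic_t = Actor(), Critic()            # target networks
actor_t.load_state_dict(actor.state_dict())
critic_t.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)   # policy-network learning rate
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)  # evaluation-network learning rate

def ddpg_update(s, a, r, s_next):
    """One gradient step on a sampled mini-batch (s, a, r, s_next)."""
    with torch.no_grad():
        y = r + GAMMA * critic_t(s_next, actor_t(s_next))   # TD target from target networks
    critic_loss = nn.functional.mse_loss(critic(s, a), y)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    actor_loss = -critic(s, actor(s)).mean()                # deterministic policy gradient
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    for tgt, src in ((actor_t, actor), (critic_t, critic)): # soft target update
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1 - TAU).add_(TAU * p.data)
```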
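For the second point, the key structural difference from single-agent DDPG is the centralized critic: during training, each agent's critic conditions on the observations and actions of all agents, which keeps the environment stationary from the critic's point of view, while each actor still acts only on its own local observation at execution time. The sketch below illustrates this split; the number of agents and the per-agent observation/action dimensions are assumptions, not values from the paper.

```python
# Sketch of the MADDPG actor/critic split for the sea-air team (illustrative).
import torch
import torch.nn as nn

N_AGENTS = 4                 # e.g., 1 UAV + 3 USVs (assumption)
OBS_DIM, ACT_DIM = 4, 2      # per-agent observation/action sizes (assumption)

class CentralizedCritic(nn.Module):
    """Q_i(o_1..o_N, a_1..a_N): trained on the joint observations and actions."""
    def __init__(self):
        super().__init__()
        joint = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(joint, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, obs_all, act_all):
        # obs_all: [batch, N_AGENTS, OBS_DIM], act_all: [batch, N_AGENTS, ACT_DIM]
        x = torch.cat([obs_all.flatten(1), act_all.flatten(1)], dim=-1)
        return self.net(x)

class LocalActor(nn.Module):
    """pi_i(o_i): each agent acts on its own observation at execution time."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs_i):
        return self.net(obs_i)

actors = [LocalActor() for _ in range(N_AGENTS)]        # decentralized execution
critics = [CentralizedCritic() for _ in range(N_AGENTS)]  # centralized training
```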
2. Reinforcement Learning
2.1. Markov Decision Process (MDP)
2.2. Q Learning
3. USV Tracking Isotherm Algorithm
3.1. Deep Deterministic Policy Gradient Algorithm
3.2. USV Tracking Isotherm Algorithm
4. Sea–Air Cooperative Control Algorithm
4.1. Multi-Agent Deep Deterministic Policy Gradient (MADDPG) Algorithm
4.2. Sea–Air Cooperative Control Algorithm
5. Simulation Results and Analysis
1. The experimental verification of the USV tracking isotherm algorithm;
2. The experimental verification of the sea–air cooperative control algorithm.
5.1. Experiment on USV Tracking Isotherm Algorithm
5.2. Experiment on the Sea–Air Cooperative Control Algorithm
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
References
- Burk, S.D.; Thompson, W.T. Mesoscale eddy formation and shock features associated with a coastally trapped disturbance. In Proceedings of the Oceans 2003. Celebrating the Past…Teaming Toward the Future (IEEE Cat. No. 03CH37492), San Diego, CA, USA, 22–26 September 2003; pp. 1763–1770.
- Vu, M.T.; Van, M.; Bui, D.H.P.; Do, Q.T.; Huynh, T.-T.; Lee, S.-D.; Choi, H.-S. Study on Dynamic Behavior of Unmanned Surface Vehicle-Linked Unmanned Underwater Vehicle System for Underwater Exploration. Sensors 2020, 20, 1329.
- Cho, H.; Jeong, S.-K.; Ji, D.-H.; Tran, N.-H.; Vu, M.T.; Choi, H.-S. Study on Control System of Integrated Unmanned Surface Vehicle and Underwater Vehicle. Sensors 2020, 20, 2633.
- Jung, D.W.; Hong, S.M.; Lee, J.H.; Cho, H.J.; Choi, H.S.; Vu, M.T. A Study on Unmanned Surface Vehicle Combined with Remotely Operated Vehicle System. Proc. Eng. Technol. Innov. 2018, 9, 17–24.
- Polvara, R.; Sharma, S.; Wan, J.; Manning, A.; Sutton, R. Autonomous vehicular landings on the deck of an unmanned surface vehicle using deep reinforcement learning. Robotica 2019, 37, 1867–1882.
- Liu, Y.-J.; Cheng, S.-M.; Hsueh, Y.-L. eNB selection for machine type communications using reinforcement learning based Markov decision process. IEEE Trans. Veh. Technol. 2017, 66, 11330–11338.
- Lin, S.-W.; Huang, Y.-L.; Hsieh, W.-K. Solving Maze Problem with Reinforcement Learning by a Mobile Robot. In Proceedings of the 2019 IEEE International Conference on Computation, Communication and Engineering (ICCCE), Fujian, China, 8–10 November 2019; pp. 215–217.
- Woo, J.; Yu, C.; Kim, N. Deep reinforcement learning-based controller for path following of an unmanned surface vehicle. Ocean Eng. 2019, 183, 155–166.
- Bouhamed, O.; Ghazzai, H.; Besbes, H.; Massoud, Y. Autonomous UAV navigation: A DDPG-based deep reinforcement learning approach. In Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain, 12–14 October 2020; pp. 1–5.
- Wu, X.; Liu, S.; Zhang, T.; Yang, L.; Li, Y.; Wang, T. Motion control for biped robot via DDPG-based deep reinforcement learning. In Proceedings of the 2018 WRC Symposium on Advanced Robotics and Automation (WRC SARA), Beijing, China, 15–19 August 2018; pp. 40–45.
- Werbos, P.J.; Miller, W.T.; Sutton, R.S. A menu of designs for reinforcement learning over time. Neural Netw. Control 1990, 3, 67–95.
- Modares, H.; Lewis, F.L. Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning. Automatica 2014, 50, 1780–1792.
- Passalis, N.; Tefas, A. Deep reinforcement learning for controlling frontal person close-up shooting. Neurocomputing 2019, 335, 37–47.
- Xia, M.; Wang, T.; Zhang, Y.; Liu, J.; Xu, Y. Cloud/shadow segmentation based on global attention feature fusion residual network for remote sensing imagery. Int. J. Remote Sens. 2021, 42, 2022–2045.
- Xia, M.; Wang, K.; Song, W.; Chen, C.; Li, Y. Non-intrusive load disaggregation based on composite deep long short-term memory network. Expert Syst. Appl. 2020, 160, 113669.
- Chen, B.; Xia, M.; Huang, J. Mfanet: A multi-level feature aggregation network for semantic segmentation of land cover. Remote Sens. 2021, 13, 731.
- Jiang, X.; Ji, Y. HD3: Distributed Dueling DQN with Discrete-Continuous Hybrid Action Spaces for Live Video Streaming. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2632–2636.
- Wang, Z.; Schaul, T.; Hessel, M.; Hasselt, H.; Lanctot, M.; Freitas, N. Dueling network architectures for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1995–2003.
- Sutton, R.S.; McAllester, D.A.; Singh, S.P.; Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. Adv. Neural Inf. Process. Syst. 2000, 12, 1057–1063.
- Bhatnagar, S. An actor–critic algorithm with function approximation for discounted cost constrained Markov decision processes. Syst. Control Lett. 2010, 59, 760–766.
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1928–1937.
- Qi, C.; Hua, Y.; Li, R.; Zhao, Z.; Zhang, H. Deep reinforcement learning with discrete normalized advantage functions for resource management in network slicing. IEEE Commun. Lett. 2019, 23, 1337–1341.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with Deep Reinforcement Learning. arXiv 2013, arXiv:1312.5602.
- Zhou, X.; Wu, P.; Zhang, H.; Guo, W.; Liu, Y. Learn to navigate: Cooperative path planning for unmanned surface vehicles using deep reinforcement learning. IEEE Access 2019, 7, 165262–165278.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Wang, D.; Shen, Y.; Sha, Q.; Li, G.; Kong, X.; Chen, G.; He, B. Adaptive DDPG design-based sliding-mode control for autonomous underwater vehicles at different speeds. In Proceedings of the 2019 IEEE Underwater Technology (UT), Kaohsiung, Taiwan, 16–19 April 2019; pp. 1–5.
- Xing, S.; Guan, X.; Luo, X. Trajectory tracking and optimal obstacle avoidance of mobile agent based on data-driven control. In Proceedings of the 29th Chinese Control Conference, Beijing, China, 29–30 July 2010; pp. 4619–4623.
- Zhang, H.; Jiang, H.; Luo, Y.; Xiao, G. Data-driven optimal consensus control for discrete-time multi-agent systems with unknown dynamics using reinforcement learning method. IEEE Trans. Ind. Electron. 2016, 64, 4091–4100.
- Foerster, J.N.; Farquhar, G.; Afouras, T.; Nardelli, N.; Whiteson, S. Counterfactual multi-agent policy gradients. arXiv 2017, arXiv:1705.08926.
- Xiong, S.; Hou, Z.; Yu, X. Data-driven Formation Control for a Class of Unknown Heterogeneous Discrete-time MIMO Multi-agent System with Switching Topology. In Proceedings of the 2019 12th Asian Control Conference (ASCC), Fukuoka, Japan, 9–12 June 2019; pp. 277–282.
| Classification | Name | Introduction |
|---|---|---|
| Value-based RL | DQN | Two identical Q-network structures are used, but the algorithm cannot deal with continuous action control. |
| Value-based RL | Double DQN [17] | The DQN training algorithm is improved to decouple the selection and evaluation of actions. |
| Value-based RL | Dueling DQN [18] | The DQN is improved by splitting the Q value into a state value and an action advantage function. |
| Policy-based RL | PG [19] | The value-based approach is replaced with a policy-based approach. |
| Policy-based RL | AC [20] | A policy network and a value network are combined. |
| Hybrid algorithm | A3C [21] | A general asynchronous, concurrent reinforcement learning framework is proposed, and the evaluation network is optimized. |
| Hybrid algorithm | DDPG | The evaluation and policy networks are used to update the model parameters iteratively, but the method is not suited to stochastic environments. |
| Hybrid algorithm | NAF [22] | A single network is trained to output both the action and the Q value, but the algorithm is not mature. |
| Parameters | Value |
|---|---|
| Number of training rounds (max-el) | 1000 |
| Learning rate of the policy network | 0.0001 |
| Learning rate of the evaluation network | 0.001 |
| Discount factor | 0.99 |
| Capacity of the experience pool (R-size) | 10,000 |
| Soft-update rate of the target networks | 0.01 |
| Coefficients of the reward function | 0.5, 0.5 |
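For readers who want to reproduce this setting, the table above maps naturally onto a small configuration object. The key names below are hypothetical; only the numeric values come from the table. The target-network value 0.01 plays the role of the soft-update rate τ in θ′ ← τθ + (1 − τ)θ′ used in the DDPG sketch given in the Introduction.

```python
# Hypothetical configuration collecting the DDPG experiment settings above.
ddpg_config = {
    "max_episodes": 1000,          # number of training rounds (max-el)
    "actor_lr": 1e-4,              # learning rate of the policy network
    "critic_lr": 1e-3,             # learning rate of the evaluation network
    "gamma": 0.99,                 # discount factor
    "replay_size": 10_000,         # capacity of the experience pool (R-size)
    "tau": 0.01,                   # soft-update rate of the target networks
    "reward_weights": (0.5, 0.5),  # coefficients of the reward function
}
```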
| Training round | 110 | 210 | 410 | 610 | 710 | 910 |
|---|---|---|---|---|---|---|
| Average value | 2.142 | 1.786 | 1.167 | 0.815 | 0.534 | 0.399 |
| Variance | 9.119 | 6.981 | 4.046 | 2.473 | 1.583 | 1.042 |
| Parameters | Value |
|---|---|
| Number of training rounds (max-el) | 20,000 |
| Learning rate of the policy network | 0.001 |
| Learning rate of the evaluation network | 0.0001 |
| Discount factor | 0.99 |
| Capacity of the experience pool (R-size) | 1,000,000 |
| Soft-update rate of the target networks | 0.01 |
| Coefficients of the reward function | 0.5, 0.5 |
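The large experience pool (R-size = 1,000,000) in the cooperative-control experiment stores joint transitions covering all agents, consistent with the centralized-critic sketch given in the Introduction. The buffer layout below is an assumption, not the authors' implementation; only the capacity comes from the table above.

```python
# Hypothetical joint replay buffer for the MADDPG experiment.
import random
from collections import deque

class JointReplayBuffer:
    def __init__(self, capacity=1_000_000):      # capacity of the experience pool (R-size)
        self.buffer = deque(maxlen=capacity)

    def push(self, obs_all, act_all, rewards, next_obs_all, done):
        """Store one joint transition covering every agent (UAV and USVs)."""
        self.buffer.append((obs_all, act_all, rewards, next_obs_all, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```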