State Super Sampling Soft Actor–Critic Algorithm for Multi-AUV Hunting in 3D Underwater Environment
Abstract
1. Introduction
2. Materials and Methods
2.1. The Proposed Framework
2.2. Assumptions
- The motion of the AUV in the roll, pitch, and sway directions was neglected;
- The AUV maintained neutral buoyancy, and the origin of the body-fixed (carrier) coordinate frame was located at its center of mass;
- The AUV had three planes of symmetry;
- The maximum speed in the surge direction was 5 knots, and the maximum speed in the heave direction was 2 knots;
- The action decision interval of the AUV was a step time of s. Within each step time, the change in speed was limited to within knots, and the heading change to within degrees;
- The communication delay between the AUVs was set to a fixed value of 2 s; i.e., at time t, an AUV could only obtain accurate location information about the other AUVs as of time t − 2 s (see the sketch after this list).
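Below is a minimal Python sketch of how these assumptions could be enforced in a simulation step. The maximum surge and heave speeds and the 2 s communication delay come from the list above; the step time `DT_S` and the per-step limits `MAX_DV_KNOTS` and `MAX_DPSI_DEG` are hypothetical placeholders, since the numerical values from the original text are not reproduced here.

```python
from collections import deque

import numpy as np

MAX_SURGE_KNOTS = 5.0   # maximum surge speed (Section 2.2)
MAX_HEAVE_KNOTS = 2.0   # maximum heave speed (Section 2.2)
COMM_DELAY_S = 2.0      # fixed inter-AUV communication delay (Section 2.2)

# Placeholder per-step limits; the paper's numerical values are not reproduced here.
DT_S = 1.0              # decision step time (hypothetical)
MAX_DV_KNOTS = 0.5      # max speed change per step (hypothetical)
MAX_DPSI_DEG = 10.0     # max heading change per step (hypothetical)


def clip_action(speed, heading_deg, d_speed, d_heading_deg):
    """Apply the per-step speed/heading change limits and the surge speed bound."""
    d_speed = np.clip(d_speed, -MAX_DV_KNOTS, MAX_DV_KNOTS)
    d_heading_deg = np.clip(d_heading_deg, -MAX_DPSI_DEG, MAX_DPSI_DEG)
    speed = np.clip(speed + d_speed, 0.0, MAX_SURGE_KNOTS)
    heading_deg = (heading_deg + d_heading_deg) % 360.0
    return speed, heading_deg


def clip_heave(heave_speed):
    """Bound the vertical (heave) speed."""
    return np.clip(heave_speed, -MAX_HEAVE_KNOTS, MAX_HEAVE_KNOTS)


class DelayedPositionChannel:
    """Buffer that returns another AUV's position as it was COMM_DELAY_S ago."""

    def __init__(self, dt=DT_S, delay=COMM_DELAY_S):
        self.lag_steps = int(round(delay / dt))
        self.history = deque(maxlen=self.lag_steps + 1)

    def push(self, position):
        self.history.append(np.asarray(position, dtype=float))

    def read(self):
        # Until enough history has accumulated, this returns the oldest known position.
        return self.history[0]
```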
2.3. AUV Motion Model
2.4. Hunting Environment Description
3. State Super Sampling Soft Actor–Critic (S4AC) Algorithm
3.1. Super Sampling Info-GAN
3.2. Inferring the Predictive State with SSIG
3.3. Soft Policy Iteration
3.4. Reward Function
3.4.1. Safe Rewards
3.4.2. Cooperating Rewards
3.4.3. Hunting Rewards
3.4.4. Finishing Rewards
3.5. Training Progress
Algorithm 1: S4AC Training Process
4. Simulation Results
4.1. Semi-Physical Simulation Platform
4.2. Experimental Environment and Training Parameters
4.3. Results and Analysis
4.3.1. Training Scenario
- When the evader does not sense any hunter within its perception range, which is set to 50 m, it moves at a speed of 4 knots and heads straight toward its target point.
- When any hunter is detected within the perception range, the evader employs the following strategy: First, it calculates the centroid of all hunters within its perception range. Then, starting from its original direction of travel, the evader adjusts its sailing direction by adding a vector that points away from this centroid, with a magnitude inversely proportional to the distance between the evader and the centroid of the hunters. Finally, the evader's sailing direction is determined by the sum of this vector and the target direction, as shown in the sketch below.
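Below is a minimal Python sketch of this evasion rule. The 50 m perception range and the 4-knot cruise speed come from the list above; the repulsion gain `K_REPULSE` is a hypothetical value, since the paper states only that the repulsive term's magnitude is inversely proportional to the evader-to-centroid distance. The returned unit vector would then be followed at the evader's cruise speed.

```python
import numpy as np

PERCEPTION_RANGE_M = 50.0   # evader perception range (Section 4.3.1)
EVADER_SPEED_KNOTS = 4.0    # evader cruise speed (Section 4.3.1)
K_REPULSE = 25.0            # repulsion gain (hypothetical; only "inversely proportional" is stated)


def evader_heading(evader_pos, target_pos, hunter_positions):
    """Return a unit vector giving the evader's sailing direction."""
    evader_pos = np.asarray(evader_pos, dtype=float)
    to_target = np.asarray(target_pos, dtype=float) - evader_pos
    target_dir = to_target / np.linalg.norm(to_target)

    # Keep only the hunters inside the perception range.
    visible = [np.asarray(p, dtype=float) for p in hunter_positions
               if np.linalg.norm(np.asarray(p, dtype=float) - evader_pos) <= PERCEPTION_RANGE_M]
    if not visible:
        return target_dir  # no hunter sensed: head straight for the target point

    # Repulsive vector pointing away from the hunters' centroid, with magnitude
    # inversely proportional to the evader-to-centroid distance.
    centroid = np.mean(visible, axis=0)
    away = evader_pos - centroid
    dist = max(np.linalg.norm(away), 1e-6)
    repulse = (K_REPULSE / dist) * (away / dist)

    # The final sailing direction is the (normalized) sum of the two terms.
    combined = target_dir + repulse
    return combined / np.linalg.norm(combined)
```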
4.3.2. Testing Scenarios in Different Layouts
4.3.3. Testing Scenarios with Current Disturbance
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Hyperparameter | Value |
|---|---|
| Learning rate of all networks | 0.0003 |
| Discount factor | 0.99 |
| State error threshold | 0.75 |
| Mini-batch size | 256 |
| Replay buffer size | 2048 |
| Soft update frequency | 0.01 |
| Max training episode num | 6000 |
| Input size of the Generator | 3 |
| Output size of the Generator | 3 × 1024 × 1 |
| Hidden layers of the Generator | 2 |
| Input size of the Discriminator | 3 × 1024 × 1 |
| Output size of the Discriminator | 1 |
| Hidden layers of the Discriminator | 1 |
| Activation function | ReLU |
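Below is a minimal PyTorch sketch of generator and discriminator networks with the input/output sizes, hidden-layer counts, and ReLU activations listed in the table. The hidden-layer width `hidden_dim` is a hypothetical choice, since the table does not specify it.

```python
import torch
import torch.nn as nn

OUT_SHAPE = (3, 1024, 1)                          # Generator output size from the table
OUT_DIM = OUT_SHAPE[0] * OUT_SHAPE[1] * OUT_SHAPE[2]


class Generator(nn.Module):
    """Input size 3, two hidden layers, output reshaped to 3 x 1024 x 1."""

    def __init__(self, hidden_dim=256):          # hidden_dim is a hypothetical choice
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, OUT_DIM),
        )

    def forward(self, z):
        return self.net(z).view(-1, *OUT_SHAPE)


class Discriminator(nn.Module):
    """Input size 3 x 1024 x 1 (flattened), one hidden layer, scalar output."""

    def __init__(self, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OUT_DIM, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x):
        return self.net(x.view(x.size(0), -1))
```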
| Initial Position of AUVs | Obstacle Parameters |
|---|---|
| Algorithm | With or Without a Current Disturbance | Minimum Hunting Steps | Success Rate |
|---|---|---|---|
| S4AC | Without a current disturbance | 127 | 86.7% |
| S4AC | With a current disturbance | 129 | 85.3% |
| MADDPG | Without a current disturbance | 129 | 77.5% |
| MADDPG | With a current disturbance | 150 | 65.2% |