A Fast and Robust Algorithm with Reinforcement Learning for Large UAV Cluster Mission Planning
Abstract
1. Introduction
2. Mission Planning Problem Formulation
2.1. Mission and UAV Formulation
2.2. Multiple Objective Functions of Mission Planning
2.3. Constraint Conditions of Mission Planning
3. DIRL-SAM
3.1. Encoder with SAM
3.2. DIRL
Algorithm 1. Dynamical Information Reinforcement Learning
1: Input: Initialized actor–network and critic–network parameters
2: Output: The optimal actor–network parameters
3: for each epoch do
4: for each batch do
5: …
6: …
7: …
8: …
9: …
10: end for batch
11: if the paired t-test on the test batches is passed, then
12: …
13: end if
14: end for epoch
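The body of the loop (steps 5–9 and 12) is not reproduced above. Its structure is consistent with self-critical policy-gradient training against a greedy rollout baseline, as in the attention-model approach of Kool et al. cited below, with the critic network replaced only after a paired t-test confirms a significant improvement. The following is a minimal sketch under that assumption; the interfaces (`actor`, `critic`, `sample_batch`, `rollout`) and all hyperparameters are illustrative reconstructions, not the authors' exact procedure.

```python
import torch
from scipy import stats

def train_dirl(actor, critic, sample_batch, rollout,
               epochs=100, batches_per_epoch=50, test_batches=10,
               alpha=0.05, lr=1e-4):
    """Sketch of Algorithm 1. Assumed interfaces:
    actor(batch) -> (per-instance cost, summed log-prob) via sampling;
    rollout(net, batch) -> per-instance cost via greedy decoding."""
    opt = torch.optim.Adam(actor.parameters(), lr=lr)
    for epoch in range(epochs):
        for _ in range(batches_per_epoch):
            batch = sample_batch()                  # random planning instances
            cost, log_prob = actor(batch)           # stochastic rollout
            with torch.no_grad():
                baseline = rollout(critic, batch)   # greedy baseline rollout
            advantage = cost - baseline             # self-critical advantage
            loss = (advantage.detach() * log_prob).mean()  # REINFORCE objective
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Steps 11-13: replace the critic (baseline network) only when the
        # actor is significantly better on held-out test batches.
        actor_costs, critic_costs = [], []
        with torch.no_grad():
            for _ in range(test_batches):
                tb = sample_batch()
                actor_costs.append(rollout(actor, tb))
                critic_costs.append(rollout(critic, tb))
        a = torch.cat(actor_costs).cpu().numpy()
        b = torch.cat(critic_costs).cpu().numpy()
        t_stat, p_value = stats.ttest_rel(a, b)
        if t_stat < 0 and p_value / 2 < alpha:      # one-sided: actor cost lower
            critic.load_state_dict(actor.state_dict())
    return actor
```

Updating the baseline only on a significant one-sided t-test keeps the advantage estimate stable while still tracking the improving policy.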
4. Experimental Results
4.1. Experimental Settings
- 1. Parameter settings of the comparison optimization algorithms:
- 2. Parameter settings of the objective functions:
4.2. Simulation Experimental Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Fraser, B.T.; Congalton, R.G. Monitoring Fine-Scale Forest Health Using Unmanned Aerial Systems (UAS) Multispectral Models. Remote Sens. 2021, 13, 4873. [Google Scholar] [CrossRef]
- Kurdi, H.; AlDaood, M.F.; Al-Megren, S.; Aloboud, E.; Aldawood, A.S.; Youcef-Toumi, K. Adaptive Task Allocation for Multi-UAV Systems Based on Bacteria Foraging Behavior. Appl. Soft Comput. 2019, 83, 105643. [Google Scholar] [CrossRef]
- Wang, Y.; Ru, Z.Y.; Wang, K.Z.; Huang, P.Q. Joint Deployment and Task Scheduling Optimization for Large-Scale Mobile Users in Multi-UAV-Enabled Mobile Edge Computing. IEEE Trans. Cybern. 2020, 50, 3984–3997. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Shirani, B.; Najafi, M.; Izadi, I. Cooperative Load Transportation Using Multiple UAVs. Aerosp Sci Technol. 2019, 84, 158–169. [Google Scholar] [CrossRef]
- Sun, L.; Chen, J.; Feng, D.; Xing, M. Parallel Ensemble Deep Learning for Real-Time Remote Sensing Video Multi-Target Detection. Remote Sens. 2021, 13, 4377. [Google Scholar] [CrossRef]
- Milani, I.; Bongioanni, C.; Colone, F.; Lombardo, P. Fusing Measurements from Wi-Fi Emission-Based and Passive Radar Sensors for Short-Range Surveillance. Remote Sens. 2021, 13, 3556. [Google Scholar] [CrossRef]
- Wu, H.S.; Li, H.; Xiao, R.B.; Liu, J. Modeling and Simulation of Dynamic Ant Colony’s Labor Division for Task Allocation of UAV Swarm. Phys. A Stat. Mech. Appl. 2018, 491, 127–141. [Google Scholar] [CrossRef]
- Zhang, J.; Xing, J.H. Cooperative Task Assignment of Multi-UAV System. Chin. J. Aeronaut. 2020, 33, 2825–2827. [Google Scholar] [CrossRef]
- Li, Y.B.; Zhang, H.J.; Long, K.P. Joint Resource, Trajectory, and Artificial Noise Optimization in Secure Driven 3-D UAVs with NOMA and Imperfect CSI. IEEE J. Sel. Area Commun. 2021, 39, 3363–3377. [Google Scholar] [CrossRef]
- Wang, Y.; Wang, D.B.; Wang, J.H. A Convex Optimization Based Method for Multiple UAV Autonomous Formation Reconfiguration. Sci China Technol. Sci. 2017, 47, 249–258. [Google Scholar] [CrossRef]
- Mohr, H.; Schroeder, K.; Black, J. Distributed Source Seeking and Robust Obstacle Avoidance through Hybrid Gradient Descent. In Proceedings of the 2019 IEEE Aerospace Conference, Big Sky, MT, USA, 2–9 March 2019; pp. 1–8. [Google Scholar]
- Bagheri, S.M.; Taghaddos, H.; Mousaei, A.; Shahnavaz, F.; Hermann, U. An A-Star Algorithm for Semi-optimization of Crane Location and Configuration in Modular Construction. Automat. Constr. 2021, 121, 103447. [Google Scholar] [CrossRef]
- Martin, R.A.; Rojas, I.; Franke, K.; Hedengren, J.D. Evolutionary View Planning for Optimized UAV Terrain Modeling in a Simulated Environment. Remote Sens. 2016, 8, 26. [Google Scholar] [CrossRef] [Green Version]
- Huang, X.; Dong, X.; Ma, J.; Liu, K.; Ahmed, S.; Lin, J.; Qiu, B. The Improved A* Obstacle Avoidance Algorithm for the Plant Protection UAV with Millimeter Wave Radar and Monocular Camera Data Fusion. Remote Sens. 2021, 13, 3364. [Google Scholar] [CrossRef]
- Banerjee, B.P.; Raval, S. A Particle Swarm Optimization Based Approach to Pre-Tune Programmable Hyperspectral Sensors. Remote Sens. 2021, 13, 3295. [Google Scholar] [CrossRef]
- Alhaqbani, A.; Kurdi, H.; Youcef-Toumi, K. Fish-Inspired Task Allocation Algorithm for Multiple Unmanned Aerial Vehicles in Search and Rescue Missions. Remote Sens. 2021, 13, 27. [Google Scholar] [CrossRef]
- Zhen, Z.Y.; Xing, D.; Gao, C. Cooperative Search-Attack Mission Planning for Multi-UAV based on Intelligent Self-Organized Algorithm. Aerosp. Sci. Technol. 2018, 76, 402–411. [Google Scholar] [CrossRef]
- Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer Networks. In Proceedings of the 2015 Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
- Meng, X.; Wu, L.; Yu, S. Research on Resource Allocation Method of Space Information Networks Based on Deep Reinforcement Learning. Remote Sens. 2019, 11, 448. [Google Scholar] [CrossRef] [Green Version]
- Nazari, M.; Oroojlooy, A.; Snyder, L.V.; Takac, M. Reinforcement Learning for Solving the Vehicle Routing Problem. In Proceedings of the 2018 Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 2–8 December 2018; Volume 31. [Google Scholar]
- Bello, I.; Pham, H.V.; Le, Q.V.; Norouzi, M.; Bengio, S. Neural Combinatorial Optimization with Reinforcement Learning. In Proceedings of the 2017 International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017. [Google Scholar]
- Kool, W.; van Hoof, H.; Welling, M. Attention, Learn to Solve Routing Problems! In Proceedings of the 2019 International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Huang, Y.; Mu, Z.; Wu, S.; Cui, B.; Duan, Y. Revising the Observation Satellite Scheduling Problem Based on Deep Reinforcement Learning. Remote Sens. 2021, 13, 2377. [Google Scholar] [CrossRef]
- Ying, C.S.; Chow, A.H.F.; Chin, K.S. An Actor-Critic Deep Reinforcement Learning Approach for Metro Train Scheduling with Rolling Stock Circulation under Stochastic Demand. Transp. Res. B Meth. 2020, 140, 210–235. [Google Scholar] [CrossRef]
- Yang, J.; You, X.H.; Wu, G.X.; Hassan, M.M.; Almogren, A.; Guna, J. Application of Reinforcement Learning in UAV Cluster Task Scheduling. Future Gener. Comput. Syst. 2019, 95, 140–148. [Google Scholar]
- Qie, H.; Shi, D.X.; Shen, T.L.; Xu, X.H.; Li, Y.; Wang, L.J. Joint Optimization of Multi-UAV Target Assignment and Path Planning Based on Multi-Agent Reinforcement Learning. IEEE Access 2019, 7, 146264–146272. [Google Scholar] [CrossRef]
- Wang, C.; Wu, L.Z.; Yan, C.; Wang, Z.C.; Long, H.; Yu, C. Coactive Design of Explainable Agent-Based Task Planning and Deep Reinforcement Learning for Human-UAVs Teamwork. Chin. J. Aeronaut. 2020, 33, 2930–2945. [Google Scholar] [CrossRef]
- Li, K.W.; Zhang, T.; Wang, R. Deep Reinforcement Learning for Multi-objective Optimization. IEEE Trans. Cybern. 2021, 51, 3103–3114. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Atencia, R.C.; Ser, J.D.; Camacho, D. Weighted Strategies to Guide a Multi-Objective Evolutionary Algorithm for Multi-UAV Mission Planning. Swarm Evol. Comput. 2019, 44, 480–495. [Google Scholar] [CrossRef]
- Zhen, Z.Y.; Chen, Y.; Wen, L.D.; Han, B. An Intelligent Cooperative Mission Planning Scheme of UAV Swarm in Uncertain Dynamic Environment. Aerosp. Sci. Technol. 2020, 100, 105826. [Google Scholar] [CrossRef]
- Zhao, X.Y.; Zong, Q.; Tian, B.L.; Zhang, B.Y.; You, M. Fast Task Allocation for Heterogeneous Unmanned Aerial Vehicles Through Reinforcement Learning. Aerosp. Sci. Technol. 2019, 92, 588–594. [Google Scholar] [CrossRef]
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 2015 International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015. [Google Scholar]
- He, K.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 2017 Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 3–9 December 2017. [Google Scholar]
- Hussain, R.; Karbhari, Y.; Ijaz, M.F.; Wozniak, M.; Singh, P.K.; Sarkar, R. Revise-Net: Exploiting Reverse Attention Mechanism for Salient Object Detection. Remote Sens. 2021, 13, 4941. [Google Scholar] [CrossRef]
- Guo, Q.P.; Qiu, X.P.; Liu, P.F. Star-Transformer. In Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), Minneapolis, MN, USA, 5–7 June 2019. [Google Scholar]
- Rennie, S.J.; Marcheret, E.; Mroueh, Y.; Ross, J.; Goel, V. Self-Critical Sequence Training for Image Captioning. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Sutton, R.S.; McAllester, D.; Singh, S.; Mansour, Y. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In Proceedings of the 1999 Neural Information Processing Systems (NIPS), Denver, CO, USA, 29 November–4 December 1999. [Google Scholar]
- Zhou, N.; Lau, L.; Bai, R.; Moore, T. A Genetic Optimization Resampling Based Particle Filtering Algorithm for Indoor Target Tracking. Remote Sens. 2021, 13, 132. [Google Scholar] [CrossRef]
- Huang, J.; Xing, Y.; You, H.; Qin, L.; Tian, J.; Ma, J. Particle Swarm Optimization-Based Noise Filtering Algorithm for Photon Cloud Data in Forest Area. Remote Sens. 2019, 11, 980. [Google Scholar] [CrossRef] [Green Version]
Algorithms | Concept | Advantages | Limitations |
---|---|---|---|
Mathematical | Transform the problem into a mathematical programming model, then solve it with a dedicated method chosen for the model and objective function, e.g., gradient descent or dynamic programming. | High solution precision. Strong interpretability. | Poor generalization to complex problems, e.g., nonconvex models or multiple coupled objective functions. Time-consuming for large-scale problems. |
Heuristic | Design greedy optimization rules from the problem's structure and the researchers' experience and knowledge, then iterate a predefined search process to produce feasible solutions until convergence criteria are met. | Independent of the initial solution. Robust to difficult solution domains, e.g., non-differentiable or discontinuous objectives. Easy to implement. | Time-consuming, with limited performance on large-scale problems. |
Artificial intelligence | Use a deep neural network to extract high-dimensional features of the problem data and output solutions through a designed decoding policy; unsupervised training optimizes all model parameters. | Fast solving speed and good generalization on large-scale problems. | Expensive training process. Limited robustness on complex models, e.g., mixed continuous/discrete variables or complex multiple constraints. |
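As a concrete instance of the artificial-intelligence row, the sketch below pairs an attention encoder with a pointer-style decoder that maps instance features to assignment probabilities. The feature dimension, layer sizes, and module names are illustrative assumptions; the paper's SAM encoder is a specific variant not reproduced here.

```python
import torch
import torch.nn as nn

class AttentionEncoder(nn.Module):
    """Embed per-node features (UAVs/missions) with self-attention."""
    def __init__(self, in_dim=6, d_model=128, n_heads=8, n_layers=3):
        super().__init__()
        self.embed = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):                   # x: (batch, n_nodes, in_dim)
        return self.encoder(self.embed(x))  # (batch, n_nodes, d_model)

class PointerDecoder(nn.Module):
    """One decoding step: attend from a context vector to all node
    embeddings and return a masked probability over feasible nodes."""
    def __init__(self, d_model=128):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)

    def forward(self, node_emb, context, mask):
        # Scaled dot-product compatibility; infeasible nodes are masked out.
        scores = torch.einsum("bd,bnd->bn", self.query(context), node_emb)
        scores = scores / node_emb.size(-1) ** 0.5
        scores = scores.masked_fill(mask, float("-inf"))
        return torch.softmax(scores, dim=-1)
```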
GA Algorithm | Value |
---|---|
Population size | 100 |
Crossover probability | 0.9 |
Mutation probability | 0.1 |
Number of iterations | 3000 |
PSO Algorithm | Value |
---|---|
Population size | 100 |
Inertia weight | 0.8 |
Learning factor 1 | 1.5 |
Learning factor 2 | 1.5 |
Number of iterations | 3000 |
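For reference, the two baseline configurations above can be expressed directly in code; the snippet also shows the canonical PSO velocity update in which the inertia weight and the two learning factors enter. This is a generic sketch, not the comparison implementations used in the paper.

```python
import numpy as np

# Baseline settings from the two tables above.
GA_CFG = {"pop_size": 100, "p_crossover": 0.9, "p_mutation": 0.1, "iters": 3000}
PSO_CFG = {"pop_size": 100, "w": 0.8, "c1": 1.5, "c2": 1.5, "iters": 3000}

def pso_step(x, v, p_best, g_best, cfg=PSO_CFG):
    """Canonical PSO update: inertia term plus cognitive (c1) and social
    (c2) attraction toward the personal and global best positions."""
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    v = cfg["w"] * v + cfg["c1"] * r1 * (p_best - x) + cfg["c2"] * r2 * (g_best - x)
    return x + v, v
```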
Training Data | Size |
---|---|
UAV training data | 100 |
Mission training data | 10 |
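The training set above fixes the instance size (100 UAVs, 10 missions per instance). A sketch of how such instances might be sampled is given below; the feature layout, value ranges, and the 5 × 5 mission area are assumptions inferred from the coordinates in the tables that follow, not the paper's generator.

```python
import numpy as np

def sample_instance(n_uavs=100, n_missions=10, area=5.0, seed=None):
    """Draw one random training instance (all distributions assumed)."""
    rng = np.random.default_rng(seed)
    uavs = {
        "pos": rng.uniform(0.0, area, size=(n_uavs, 2)),      # initial UAV positions
        "speed": rng.uniform(1.0, 3.0, size=n_uavs),          # cruise speeds
    }
    missions = {
        "pos": rng.uniform(0.0, area, size=(n_missions, 2)),  # mission locations
        "priority": rng.permutation(n_missions) + 1,          # priority ranks 1..n
        "demand": rng.integers(10, 21, size=n_missions),      # UAVs required
    }
    return uavs, missions
```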
Mission | Priority Rank |  |  |  |  |  |  |  |  |
---|---|---|---|---|---|---|---|---|---|
mission 1 | 1 | 1.6 | 2.4 | 20 | (1.0, 1.5) | 0.1 | 0.1 | 0.8 | 0.8 |
mission 2 | 2 | 2.7 | 2.7 | 20 | (4.0, 4.0) | 0.2 | 0.2 | 0.6 | 0.9 |
mission 3 | 3 | 1.4 | 1.4 | 10 | (4.0, 1.0) | 0.3 | 0.3 | 0.4 | 0.8 |
mission 4 | 4 | 2.8 | 2.8 | 10 | (2.5, 2.5) | 0.3 | 0.3 | 0.4 | 0.7 |
mission 5 | 5 | 1.8 | 2.7 | 10 | (1.0, 4.0) | 0.8 | 0.1 | 0.1 | 0.9 |
mission 6 | 6 | 2.7 | 1.8 | 10 | (3.0, 2.0) | 0.8 | 0.1 | 0.1 | 0.9 |
Algorithms | Mission 1 | Mission 2 | Mission 3 | Mission 4 | Mission 5 | Mission 6 | Total Value | Running Time |
---|---|---|---|---|---|---|---|---|
GA | 2.53 | 3.82 | 2.98 | 11.25 | 5.23 | 5.07 | 30.88 | 33.98 s |
PSO | 2.99 | 3.84 | 3.28 | 11.62 | 5.36 | 5.12 | 32.21 | 45.32 s |
RL-SAM | 10.54 | 8.46 | 4.95 | 13.52 | 5.78 | 7.37 | 50.62 | 2.10 s |
DIRL-AM | 2.38 | 3.82 | 2.75 | 11.07 | 4.99 | 4.83 | 29.84 | 3.06 s |
DIRL-SAM | 2.38 | 3.82 | 2.75 | 11.07 | 5.09 | 4.90 | 30.01 | 1.98 s |
Algorithms | Objective Value (150 UAVs) | Running Time (150 UAVs) | Objective Value (200 UAVs) | Running Time (200 UAVs) |
---|---|---|---|---|
GA | 6.51 | 4.43 s | 8.32 | 6.31 s |
PSO | 6.67 | 6.38 s | 8.85 | 9.53 s |
RL-SAM | 49.05 | 0.39 s | 67.43 | 0.48 s |
DIRL-AM | 5.57 | 0.63 s | 4.93 | 0.94 s |
DIRL-SAM | 5.57 | 0.41 s | 5.07 | 0.57 s |
Algorithms | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Test 6 |
---|---|---|---|---|---|---|
Exhaustive method | 8.85/40.1 s | 7.70/84.51 s | 13.55/168.59 s | 4.83/341.24 s | 5.45/682.24 s | 6.34/1676.28 s |
GA | 8.85/1.97 s | 7.70/1.62 s | 13.55/1.96 s | 4.83/2.01 s | 5.45/1.82 s | 6.34/1.71 s |
PSO | 8.85/2.13 s | 7.70/2.18 s | 13.55/2.23 s | 4.83/2.28 s | 5.45/2.31 s | 6.34/2.53 s |
DIRL-SAM | 9.08/0.32 s | 7.76/0.24 s | 13.58/0.35 s | 4.83/0.19 s | 5.46/0.28 s | 6.34/0.33 s |
Random | 19.21/0 s | 18.32/0 s | 21.53/0 s | 18.15/0 s | 14.64/0 s | 17.66/0 s |
Test | Number of UAVs |  |  |  |  |  |  |  |
---|---|---|---|---|---|---|---|---|
Test 1 | 20 | 3.4 | 4.1 | (4.55, 0.30) | 0.5 | 0.3 | 0.2 | 1 |
Test 2 | 21 | 2.5 | 4.8 | (3.80, 2.40) | 0.2 | 0.4 | 0.3 | 1 |
Test 3 | 22 | 4.0 | 5.3 | (3.85, 4.95) | 0.2 | 0.3 | 0.5 | 1 |
Test 4 | 23 | 2.4 | 2.2 | (1.40, 1.90) | 0.5 | 0.1 | 0.4 | 1 |
Test 5 | 24 | 3.1 | 2.5 | (2.75, 1.80) | 0.5 | 0.4 | 0.1 | 1 |
Test 6 | 25 | 1.6 | 5.4 | (3.50, 3.90) | 0.1 | 0.3 | 0.6 | 1 |