Delay-Informed Intelligent Formation Control for UAV-Assisted IoT Application
Abstract
1. Introduction
1.1. Related Works
1.2. Contribution
- To regulate the UAV motion, a UAV formation model that accounts for time delay is first established in discrete-time form based on the UAV error dynamics. An optimization problem that minimizes a quadratic cost function is then formulated for optimal formation control under time delays.
- Based on the error dynamics and the formation control optimization problem, a delay-informed MDP (DIMDP) framework is presented by augmenting the state and reward function with the previous control actions. The classical DDPG algorithm is then extended into a delay-informed DDPG (DIDDPG) algorithm to solve the DIMDP.
- The computational complexity and the effect of the time delay are analyzed, and the proposed algorithm is further extended to the case of arbitrary communication delays. Training results show that the proposed DIDDPG achieves better convergence and system performance for UAV formation control. A minimal mathematical sketch of the delayed error dynamics and the quadratic cost is given below this list.
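To make the delayed error dynamics and the quadratic cost concrete, the following is a minimal sketch in generic notation; the matrices A and B, the delay of d sampling intervals, and the weights Q and R are illustrative placeholders rather than the paper's exact Equations (4)-(7).

```latex
% Hedged sketch: generic delayed discrete-time error dynamics and finite-horizon quadratic cost.
% A, B, d, Q, R are illustrative placeholders, not the paper's exact Equations (4)-(7).
\begin{aligned}
  e_{j+1} &= A\, e_j + B\, u_{j-d},
    && \text{error state driven by the control applied } d \text{ sampling intervals earlier} \\
  J &= \sum_{j=0}^{N-1} \left( e_j^{\top} Q\, e_j + u_j^{\top} R\, u_j \right),
    && \text{finite-horizon quadratic cost to be minimized}
\end{aligned}
```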
2. System Modeling and Problem Formulation
2.1. System Modeling
2.2. Optimization Problem Formulation
3. DIDDPG Algorithm for Formation Control
3.1. DIMDP-Based Environmental Model
- (1) State: Referring to the leader–follower UAV formation, several factors are considered, including the action of the follower and the error states between the leader and the follower. As shown in (4), the state errors of the follower are determined by the position and velocity errors. From the discrete-time dynamics (6), the effect of the previous control strategy is also attributed to the time delay, as shown in Figure 4. Therefore, the state in the j-th sampling interval is defined as in (8), where the updated state-error information and the local previous control-strategy information are extracted to represent the environment state used to regulate the follower UAV’s tracking. In particular, the previous control strategy is used to compensate for the effects of the time delay.
- (2) Action: The decision action is given by
- (3) State transition function: The state transition function can be determined according to the discrete-time dynamics of the follower in (6) as follows:
- (4) Reward function: The reward evaluates the performance of an action, enabling the follower to intelligently learn the proper control strategy for maintaining formation tracking. The reward function is designed as the negative of the cost function in the optimization problem (7), as follows. In fact, the closer the follower’s states are to the desired ones, the greater the reward. With this well-designed reward function, the follower can rapidly reach the desired position and velocity by continuously adjusting its action so as to maximize the long-term cumulative reward, which is formulated over a finite horizon of N steps. (A minimal environment sketch that illustrates these elements is given after this list.)
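As a hedged illustration of the DIMDP elements above, the following Python sketch implements a delay-informed environment for a single follower. The double-integrator error dynamics, the buffered action delay, the weights q_pos, q_vel, and r_u, and the class and field names are assumptions made for illustration and do not reproduce the paper's exact Equations (4)-(8).

```python
import numpy as np

class DelayInformedFormationEnv:
    """Hedged sketch of a DIMDP environment: the state augments the
    position/velocity errors with the previously issued control actions
    so the agent can compensate for the actuation delay."""

    def __init__(self, dt=0.1, delay_steps=1, q_pos=1.0, q_vel=0.5, r_u=0.1):
        self.dt = dt                      # sampling interval (illustrative)
        self.delay_steps = delay_steps    # control delay in sampling intervals
        self.q_pos, self.q_vel, self.r_u = q_pos, q_vel, r_u
        self.reset()

    def reset(self):
        self.pos_err = np.random.uniform(-5.0, 5.0)   # position error to the leader
        self.vel_err = np.random.uniform(-1.0, 1.0)   # velocity error to the leader
        self.pending = [0.0] * self.delay_steps       # actions issued but not yet applied
        return self._state()

    def _state(self):
        # Augmented state: current errors plus the buffered previous actions.
        return np.array([self.pos_err, self.vel_err, *self.pending], dtype=np.float32)

    def step(self, action):
        # The action chosen now takes effect only after `delay_steps` intervals;
        # the oldest buffered action is the one actually applied.
        self.pending.append(float(action))
        applied = self.pending.pop(0)

        # Double-integrator error dynamics (illustrative placeholder for Eq. (6)).
        self.pos_err += self.vel_err * self.dt
        self.vel_err += applied * self.dt

        # Reward = negative quadratic cost (illustrative placeholder for Eq. (7)).
        cost = (self.q_pos * self.pos_err ** 2
                + self.q_vel * self.vel_err ** 2
                + self.r_u * float(action) ** 2)
        return self._state(), -cost, False, {}
```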
3.2. DIDDPG UAV Formation Algorithm
Algorithm 1: DIDDPG-based UAV formation algorithm.
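The following is a minimal PyTorch sketch of a DDPG-style training loop operating on the delay-augmented state, reusing the DelayInformedFormationEnv sketch above. The network sizes, learning rates, action bound, exploration noise, and soft-update rate are illustrative assumptions; the sketch omits details of Algorithm 1 such as the exact network architectures and the arbitrary-delay extension. It requires torch and numpy.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=64, out_act=None):
    # Small two-hidden-layer network used for both actor and critic.
    layers = [nn.Linear(in_dim, hidden), nn.ReLU(),
              nn.Linear(hidden, hidden), nn.ReLU(),
              nn.Linear(hidden, out_dim)]
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)

def soft_update(target, source, tau=0.005):
    # Polyak averaging of target-network parameters.
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1 - tau).add_(tau * s.data)

env = DelayInformedFormationEnv()                 # delay-informed environment sketch (Section 3.1)
state_dim, act_dim, max_act = 2 + env.delay_steps, 1, 2.0   # max_act: assumed acceleration bound

actor = mlp(state_dim, act_dim, out_act=nn.Tanh())           # outputs action in [-1, 1]
critic = mlp(state_dim + act_dim, 1)
actor_t = mlp(state_dim, act_dim, out_act=nn.Tanh())
critic_t = mlp(state_dim + act_dim, 1)
actor_t.load_state_dict(actor.state_dict())
critic_t.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

buffer, gamma, batch_size = deque(maxlen=100_000), 0.99, 32   # mini-batch size as in the parameter table

for episode in range(260):                                    # cf. N = 260 episodes
    s = env.reset()
    for step in range(200):                                    # cf. M = 200 time steps
        with torch.no_grad():
            a = max_act * actor(torch.as_tensor(s)).numpy()
        a = a + 0.1 * max_act * torch.randn(act_dim).numpy()   # Gaussian exploration noise
        s2, r, done, _ = env.step(float(a[0]))
        buffer.append((s, a, r, s2))
        s = s2

        if len(buffer) < batch_size:
            continue
        batch = random.sample(buffer, batch_size)
        S, A, R, S2 = (torch.as_tensor(np.array(x), dtype=torch.float32) for x in zip(*batch))
        R = R.unsqueeze(1)

        # Critic update: regress Q(s, a) toward the delay-augmented Bellman target.
        with torch.no_grad():
            target_q = R + gamma * critic_t(torch.cat([S2, max_act * actor_t(S2)], dim=1))
        critic_loss = nn.functional.mse_loss(critic(torch.cat([S, A], dim=1)), target_q)
        opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

        # Actor update: ascend the critic's estimate of the current deterministic policy.
        actor_loss = -critic(torch.cat([S, max_act * actor(S)], dim=1)).mean()
        opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

        soft_update(actor_t, actor)
        soft_update(critic_t, critic)
```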
3.3. Algorithm Analysis
3.3.1. Time Delay Analysis
3.3.2. Computational Complexity Analysis
4. Simulation Results and Discussions
4.1. Performance Comparison of Convergence
4.2. Performance Comparison of Different Scenarios
4.3. Performance Comparison with Different Aspects
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Symbol | Description | Setting |
|---|---|---|
|  | Velocity space | m/s |
|  | Acceleration space | m/s² |
| K | Mini-batch size | 32 |
| N | Number of episodes | 260 |
| M | Time steps per episode | 200 |
|  | Learning rates for actor and critic |  |
|  | Discount factor |  |
|  | Maximum headway | 30 |
|  | Minimum headway | 5 |
|  | Sampling interval | s |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).