Deep Reinforcement Learning with Corrective Feedback for Autonomous UAV Landing on a Mobile Platform
Abstract
1. Introduction
2. Preliminaries
2.1. Reinforcement Learning
2.2. Deep Reinforcement Learning
2.3. Reinforcement Learning with Corrective Feedback
3. Approach
3.1. UAV Dynamics
3.2. Baseline: Standard PID for UAV Landing
3.3. PID with RL for UAV Landing
3.4. RL with Corrective Feedback for UAV Landing
Algorithm 1: PID with DDPG and corrective feedback
4. Experiments and Results
4.1. Environmental Settings
4.2. Training in the Simulation Environment
4.3. Testing in the Simulation Environment
4.3.1. Testing with a Static Vehicle
4.3.2. Testing with a Moving Vehicle
4.4. Real-World Experiments
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
| Approach | PID | RL | RL-PID | RLC-PID |
|---|---|---|---|---|
| | 90 | 25 | 100 | 100 |
| | 52 | 0 | 55 | 92 |

| Approach | PID | RL | RL-PID | RLC-PID |
|---|---|---|---|---|
| | 10 | 0 | 91 | 97 |
| | 5 | 0 | 52 | 93 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wu, L.; Wang, C.; Zhang, P.; Wei, C. Deep Reinforcement Learning with Corrective Feedback for Autonomous UAV Landing on a Mobile Platform. Drones 2022, 6, 238. https://doi.org/10.3390/drones6090238