Powered Landing Control of Reusable Rockets Based on Softmax Double DDPG
Abstract
1. Introduction
- 1. We introduce RL to obtain the optimal control commands for the reusable rocket landing problem. We apply the SD3 algorithm to simultaneously predict both the optimal thrust and its nozzle angle, considering the three-axis translational and three-axis rotational dynamics of the rocket as well as environmental disturbances and uncertainties.
- 2. We design a novel reward function for the reusable rocket landing problem to be used during the training of the RL models. The function provides sufficient guidance to the rocket in terms of position, velocity, thrust, and attitude angle. We introduce Gaussian-shaped rewards for position and attitude angle so that states with a higher probability of a successful landing receive higher rewards.
2. Materials and Methods
2.1. Mathematical Modelling of the Reusable Rocket Landing
2.2. SD3 Reinforcement Learning Algorithm
- Clipped double Q-learning: As an extension of DQN to continuous control, DDPG inherits the over-estimation problem: the estimated Q-value function is larger than the true Q-value function. To alleviate this problem, TD3 maintains two Q-value networks simultaneously and, during the update, uses the smaller of the two estimates to compute the update target for the value function.
- Delayed policy updates: Since an unstable value network can lead to blind updates of the policy network, the policy should be updated at a lower frequency than the value network. In this way, the error of the target networks is reduced, avoiding policy updates on high-error states, which can cause divergent behaviour.
- Target policy smoothing: Because the policy is deterministic, the update target of the Q network is susceptible to estimation error. TD3 therefore introduces a regularization strategy for deep value learning when computing the target Q value, adding a small amount of clipped random noise to the target policy action and averaging over mini-batches. A minimal sketch of the resulting target computation is given after this list.
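The following PyTorch-style sketch shows how these three mechanisms combine when computing the critic target. It assumes standard TD3 conventions; the function and parameter names (policy_noise, noise_clip, max_action) are illustrative and are not taken from the paper.

```python
import torch

def td3_critic_target(next_state, reward, not_done,
                      actor_target, critic1_target, critic2_target,
                      gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    """Clipped double-Q target with target policy smoothing."""
    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped Gaussian noise.
        base_action = actor_target(next_state)
        noise = (torch.randn_like(base_action) * policy_noise).clamp(-noise_clip, noise_clip)
        next_action = (base_action + noise).clamp(-max_action, max_action)

        # Clipped double Q: take the element-wise minimum of the two target critics.
        q1 = critic1_target(next_state, next_action)
        q2 = critic2_target(next_state, next_action)
        target_q = reward + not_done * gamma * torch.min(q1, q2)
    return target_q
```

Delayed policy updates are then realized by stepping the actor and the target networks only once every few critic updates.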
Algorithm 1. SD3-based rocket recycling.
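SD3 keeps the double critics of TD3 but replaces the hard minimum with a softmax-weighted value estimate computed over actions sampled around the target policy. The sketch below shows this softmax value for a single target critic, assuming the standard SD3 formulation; the sample count, the inverse temperature beta, and all names are illustrative assumptions, and the importance-sampling correction for the Gaussian proposal is omitted for brevity.

```python
import torch

def softmax_value(next_state, target_actor, target_critic,
                  beta=0.05, num_samples=50, action_noise=0.2, max_action=1.0):
    """Softmax-weighted state value used in the SD3 critic target."""
    with torch.no_grad():
        base_action = target_actor(next_state)                       # (B, act_dim)
        # Draw K candidate actions around the target-policy action.
        noise = torch.randn(num_samples, *base_action.shape) * action_noise
        actions = (base_action.unsqueeze(0) + noise).clamp(-max_action, max_action)

        # Evaluate every candidate with the target critic: shape (K, B, 1).
        states = next_state.unsqueeze(0).expand(num_samples, *next_state.shape)
        q = target_critic(states.reshape(-1, next_state.shape[-1]),
                          actions.reshape(-1, base_action.shape[-1]))
        q = q.reshape(num_samples, base_action.shape[0], 1)

        # Softmax weighting over the sampled actions.
        weights = torch.softmax(beta * q, dim=0)
        value = (weights * q).sum(dim=0)                             # (B, 1)
    return value  # critic target: y = reward + not_done * gamma * value
```

As beta grows the estimate approaches the greedy maximum over the sampled actions, while a small beta pulls it toward their mean; this is how SD3 balances the over-estimation of DDPG against the under-estimation introduced by the clipped double-Q minimum.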
2.3. Reinforcement Learning for Rocket Landing
2.3.1. Action Space
2.3.2. Observation Space
2.3.3. Reward Function
- Reward for successful landing: The agent receives a reward when the rocket successfully lands at the target location. The successful-landing condition comprises bounds on the terminal position, velocity, and attitude angles of the rocket, and the corresponding reward term is granted only when this condition is satisfied.
- Penalty for failed landing: The agent receives a penalty if the landing fails, for instance because of an excessive attitude angle or a deviation from the landing zone, and the simulation is then terminated. A failed landing is the exact opposite of a successful landing.
- Penalty for fuel consumption: The landing performance is also evaluated by fuel consumption. This term penalizes the fuel used during the descent and therefore encourages fuel-efficient landings.
- Reward for accurate landing: To encourage the rocket to land as close to the ideal landing location as possible, we construct the landing-position reward in the form of a Gaussian distribution centred on that location.
- Reward for velocity direction: We construct a reward on the velocity direction to stabilize the translational flight of the rocket. The reference direction points from the rocket’s centre of mass to the ideal landing location, and the angle between the current velocity and this reference direction guides the learning: the smaller the angle, the larger the reward.
- Reward for Euler angle: Similarly to the velocity-direction reward, we want to stabilize the attitude motion of the rocket during landing, so the Euler angles are rewarded for staying as close as possible to the predefined reference attitude. Finally, the overall reward function is given as the summation of all the terms above; a sketch combining these terms follows this list.
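To make the structure of the overall reward concrete, the following Python sketch combines the components described above; all weights, thresholds, and Gaussian widths are illustrative placeholders, not the values used in the paper.

```python
import numpy as np

def landing_reward(pos, vel, euler, fuel_used, landed, crashed,
                   target_pos=np.zeros(3), ref_euler=np.zeros(3),
                   sigma_pos=5.0, sigma_att=np.deg2rad(5.0),
                   w_fuel=1e-4, bonus=100.0, penalty=-100.0):
    """Illustrative composition of the reward terms described above."""
    r = 0.0

    # Terminal bonus for a successful landing / penalty for a failed one.
    if landed:
        r += bonus
    elif crashed:
        r += penalty

    # Penalty for fuel consumption (per step).
    r -= w_fuel * fuel_used

    # Gaussian reward for landing accuracy (distance to the ideal location).
    dist = np.linalg.norm(pos - target_pos)
    r += np.exp(-0.5 * (dist / sigma_pos) ** 2)

    # Velocity-direction reward: angle between the velocity and the line of sight to the target.
    los = target_pos - pos
    if np.linalg.norm(vel) > 1e-6 and np.linalg.norm(los) > 1e-6:
        cos_ang = np.dot(vel, los) / (np.linalg.norm(vel) * np.linalg.norm(los))
        angle = np.arccos(np.clip(cos_ang, -1.0, 1.0))
        r += 1.0 - angle / np.pi            # smaller angle -> larger reward

    # Gaussian reward for keeping the Euler angles close to the reference attitude.
    att_err = np.linalg.norm(euler - ref_euler)
    r += np.exp(-0.5 * (att_err / sigma_att) ** 2)

    return r
```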
2.3.4. Network Architecture
3. Results
3.1. Environmental Settings
3.2. Training Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
DDPG | Deep deterministic policy gradient |
TD3 | Twin-delayed deep deterministic policy gradient |
SD3 | Softmax double deep deterministic policy gradient |
Appendix A
Appendix B
Parameters | Values |
---|---|
Launch azimuth A | |
Latitude of launch point B | N |
Longitude of launch point | E |
Mass of rocket m | 35,000 kg |
Radius of rocket r | 3.7 m |
Height of rocket h | 40 m |
Minimum thrust | 250,000 N |
Maximum thrust | 750,000 N |
Reference area of rocket | 10.75 m² |
Specific impulse | 345 s |
Landing zone area | 10 m × 10 m |
Acceptable vertical landing velocity | m/s |
Acceptable translational landing velocity | m/s |
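To reproduce the simulation environment, the known parameters from the table above can be gathered into a single configuration object, as in the sketch below; the class and field names are our own, and the entries that are blank in the extracted table are kept as explicit placeholders.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RocketLandingConfig:
    """Environment parameters from the settings table (SI units)."""
    mass: float = 35_000.0                 # initial rocket mass [kg]
    radius: float = 3.7                    # rocket radius [m]
    height: float = 40.0                   # rocket height [m]
    thrust_min: float = 250_000.0          # minimum engine thrust [N]
    thrust_max: float = 750_000.0          # maximum engine thrust [N]
    ref_area: float = 10.75                # aerodynamic reference area [m^2]
    isp: float = 345.0                     # specific impulse [s]
    landing_zone: tuple = (10.0, 10.0)     # landing zone size [m x m]
    # Values not recoverable from the table are left as placeholders.
    launch_azimuth_deg: Optional[float] = None
    launch_lat_deg: Optional[float] = None        # northern latitude
    launch_lon_deg: Optional[float] = None        # eastern longitude
    max_vertical_landing_speed: Optional[float] = None       # [m/s]
    max_translational_landing_speed: Optional[float] = None  # [m/s]
```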
Initial States of the Rocket | Value Ranges |
---|---|
X-axis component of the initial position | m |
Y-axis component of the initial position | 2500 m |
Z-axis component of the initial position | m |
X-axis component of the initial velocity | m/s |
Y-axis component of the initial velocity | m/s |
Z-axis component of the initial velocity | m/s |
Initial pitch angle | |
Initial yaw angle | |
Initial roll angle |
Parameters | Values |
---|---|
Size of replay buffer | |
Variance of policy noise | |
Maximum noise c | |
Discount factor | |
Learning rate of actor networks | |
Learning rate of critic networks |
Landing Point Coordinates | DDPG | TD3 | SD3 |
---|---|---|---|
Average | m | m | m |
Standard deviation | m | m | m |
Best | m | m | m |
Worst | m | m | m |
Reward Function Components | DDPG | TD3 | SD3 |
---|---|---|---|