Research on Self-Learning Control Method of Reusable Launch Vehicle Based on Neural Network Architecture Search
Abstract
1. Introduction
2. Problem Description
2.1. General Framework of Self-Learning Control for Launch Vehicle Recovery
2.2. Deep Network Architecture and Automatic Optimization of Hyperparameters
3. Algorithm Design
3.1. Neural Network Architecture Search Algorithm
3.1.1. Lightweight Search Design
3.1.2. Particle Swarm Optimization Algorithm
Algorithm 1: NAS algorithm based on the improved MOHPSO
Input: the search space; the particle swarm position range; the maximum particle velocity; the number of particles; the number of iterations
Output: the optimized neural network architecture model
1. Initialize the particle swarm and the non-dominated-solution Archive Set according to the position range and maximum velocity
2. for i ← 1 to the number of iterations do
3.   for j ← 1 to the number of particles do
4.     Select an architecture, train the model, and compute the particle fitness according to Formula (15)
5.     if the particle fitness exceeds the Pbest fitness then
6.       update Pbest
7.     if the particle fitness dominates members of the Archive Set then
8.       update the Archive Set
9.   end for
10.  Update the velocity and position of the particles according to Formulas (12) and (13)
11. end for
12. return the optimized architecture
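The search loop of Algorithm 1 can be sketched in plain Python. This is a minimal illustrative sketch, not the paper's implementation: the multi-objective archive is reduced to a single global best, and a hypothetical target architecture stands in for the trained-network fitness of Formula (15).

```python
import random

# Toy stand-in fitness: the paper trains each candidate network and scores it
# with Formula (15); here a hypothetical target architecture replaces that,
# so only the search loop itself is demonstrated.
SEARCH_SPACE = {"hidden_layers": [1, 2, 3, 4], "neurons": [64, 128, 256]}
TARGET = {"hidden_layers": 3, "neurons": 128}  # hypothetical optimum

def decode(position):
    # Map each continuous coordinate in [0, 1) onto its discrete option list.
    return {key: opts[int(p * len(opts))]
            for (key, opts), p in zip(SEARCH_SPACE.items(), position)}

def fitness(arch):
    # Higher is better; 0 only at the hypothetical target architecture.
    return (-abs(arch["hidden_layers"] - TARGET["hidden_layers"])
            - abs(arch["neurons"] - TARGET["neurons"]) / 64.0)

def pso_nas(n_particles=12, n_iters=40, v_max=0.2, seed=1):
    rng = random.Random(seed)
    dim = len(SEARCH_SPACE)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p)) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                # Velocity update: inertia + cognitive + social terms,
                # clipped to the maximum particle velocity (cf. Formula (12)).
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, vel[i][d]))
                # Position update, kept inside [0, 1) (cf. Formula (13)).
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), 0.999)
            f = fitness(decode(pos[i]))
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return decode(gbest), gbest_fit
```

Encoding the discrete architecture choices as continuous coordinates lets the standard PSO velocity/position updates apply unchanged; only the decode step is search-space specific.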
3.2. Bayesian Optimization Algorithm
Algorithm 2: Bayesian reinforcement learning hyperparameter optimization
Input: the combination of hyperparameters to be optimized for launch vehicle recovery reinforcement learning; the maximum number of iterations; the maximum number of evaluations; the median fitness of the first five evaluations
Output: the optimal combination of hyperparameters
1. Initialization: generate the initial evaluation points randomly
2. while the maximum number of evaluations has not been reached:
3.   if the maximum number of iterations has not been reached:
4.     maximize the acquisition function to select the next evaluation point
5.     train the model to obtain the model parameters
6.     calculate the fitness value according to Equation (5)
7.     if the fitness stays below the median of the first five evaluations: end the algorithm early
8.   end if
9.   update the TPE surrogate model and the optimal hyperparameter values
10. end while
11. return the optimal combination of hyperparameters
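The loop of Algorithm 2 can be sketched as follows. This is a toy sketch under stated assumptions: the TPE surrogate and acquisition maximization are replaced by a simple nearest-to-best candidate rule, and `toy_objective` stands in for training the recovery policy and scoring it with Equation (5). Only the evaluation budget and the median-based early stop mirror the algorithm.

```python
import random
import statistics

def toy_objective(hp):
    # Stand-in for "train the RL policy, score with Equation (5)";
    # peaks at a hypothetical best learning rate and discount factor.
    return -((hp["lr"] - 0.01) ** 2) * 1e4 - ((hp["gamma"] - 0.95) ** 2) * 1e2

def sample_hp(rng):
    return {"lr": rng.uniform(1e-5, 0.1), "gamma": rng.uniform(0.8, 0.999)}

def bayes_opt(max_evals=40, n_init=5, patience=10, seed=0):
    rng = random.Random(seed)
    history = [(toy_objective(hp), hp)
               for hp in (sample_hp(rng) for _ in range(n_init))]
    # Early-stop baseline: median fitness of the first five evaluations.
    baseline = statistics.median(f for f, _ in history)
    stale = 0
    while len(history) < max_evals:
        # Surrogate stand-in: real TPE splits the history into "good"/"bad"
        # sets and maximizes the density ratio l(x)/g(x); here we just draw
        # candidates and evaluate the one nearest the current best point.
        best_hp = max(history, key=lambda t: t[0])[1]
        cands = [sample_hp(rng) for _ in range(20)]
        hp = min(cands, key=lambda c: abs(c["lr"] - best_hp["lr"])
                                      + abs(c["gamma"] - best_hp["gamma"]))
        f = toy_objective(hp)
        history.append((f, hp))
        if f < baseline:
            stale += 1
            if stale >= patience:  # end the search early (Algorithm 2, step 7)
                break
        else:
            stale = 0
    return max(history, key=lambda t: t[0])
```

The median-of-early-evaluations threshold is a cheap pruning signal: runs that never clear it are unlikely to produce a competitive policy, so the budget is spent elsewhere.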
4. Simulation Analysis
4.1. Simulation Parameter Settings
4.2. Analysis of Training Results
4.3. Simulation Verification
4.4. Generalization Ability Verification
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Wang, Z.G.; Luo, S.B.; Wu, J.J. Recent Progress on Reusable Launch Vehicle; National University of Defense Technology Press: Changsha, China, 2004. [Google Scholar]
- Jones, H.W. The Recent Large Reduction in Space Launch Cost. In Proceedings of the 48th International Conference on Environmental Systems, Albuquerque, NM, USA, 8–12 July 2018. [Google Scholar]
- Xu, D.; Zhang, Z.; Wu, K.; Li, H.B.; Lin, J.F.; Zhang, X.; Guo, X. Recent progress on development trend and key technologies of vertical take-off vertical landing reusable launch vehicle. Chin. Sci. Bull. 2016, 61, 3453–3463. [Google Scholar] [CrossRef]
- Jo, B.U.; Ahn, J. Optimal staging of reusable launch vehicles for minimum life cycle cost. Aerosp. Sci. Technol. 2022, 127, 107703. [Google Scholar] [CrossRef]
- Li, X.D.; Liao, Y.X.; Liao, J.; Luo, S.B. Finite-time sliding mode control for vertical recovery of the first-stage of reusable rocket. J. Cent. South Univ. (Sci. Technol.) 2020, 51, 979–988. [Google Scholar] [CrossRef]
- Blackmore, L.; Scharf, D.P. Minimum-landing-error powered-descent guidance for Mars landing using convex optimization. J. Guid. Control Dyn. 2010, 33, 1161–1171. [Google Scholar] [CrossRef]
- Tang, B.T.; Cheng, H.B.; Yang, Y.X.; Jiang, X.J. Research on iterative guidance method of solid sounding rocket. J. Solid Rocket Technol. 2024, 47, 135–142. [Google Scholar] [CrossRef]
- Tian, W.L.; Yan, Z.W.; Li, W.; Jia, D. Design and analysis of takeoff and landing control algorithm for four-rocket boosting drone. Adv. Aeronaut. Sci. Eng. 2024, 15, 105–117. [Google Scholar] [CrossRef]
- Wu, N.; Zhang, L. A fast and accurate injection strategy for solid rockets based on the phase plane control. Aerosp. Control 2020, 38, 44–50. [Google Scholar] [CrossRef]
- Zhang, L.; Li, D.Y.; Cui, N.G.; Zhang, Y.H. Full-profile flight prescribed performance control for vertical take-off and landing reusable launch vehicle. Acta Aeronaut. Sin. 2019, 44, 179–195. [Google Scholar]
- Liu, W.; Wu, Y.Y.; Liu, W.; Tian, M.M.; Huang, T.P. RLV reentry robust fault-tolerant attitude control considering unknown disturbance. Acta Aeronaut. Sin. 2019, 44, 169–176. [Google Scholar]
- Yang, Z.S.; Mao, Q.; Dou, L.Q. Design of interval type-2 adaptive fuzzy sliding mode control for reentry attitude of reusable aircraft. J. Beijing Univ. Aeronaut. Astronaut. 2019, 46, 781–790. [Google Scholar] [CrossRef]
- Wang, Z.; Zhang, J.; Li, Y.; Gong, Q.; Luo, W.; Zhao, J. Automated Reinforcement Learning Based on Parameter Sharing Network Architecture Search. In Proceedings of the 2021 6th International Conference on Robotics and Automation Engineering (ICRAE), Guangzhou, China, 19–22 November 2021; pp. 358–363. [Google Scholar]
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef] [PubMed]
- Hadi, B.; Khosravi, A.; Sarhadi, P. Deep reinforcement learning for adaptive path planning and control of an autonomous underwater vehicle. Appl. Ocean Res. 2022, 129, 103326. [Google Scholar] [CrossRef]
- Bijjahalli, S.; Sabatini, R.; Gardi, A. Advances in intelligent and autonomous navigation systems for small UAS. Prog. Aerosp. Sci. 2020, 115, 100617. [Google Scholar] [CrossRef]
- Alagumuthukrishnan, S.; Deepajothi, S.; Vani, R.; Velliangiri, S. Reliable and Efficient Lane Changing Behaviour for Connected Autonomous Vehicle through Deep Reinforcement Learning. Procedia Comput. Sci. 2023, 218, 1112–1121. [Google Scholar] [CrossRef]
- Huang, Z.G.; Liu, Q.; Zhu, F. Hierarchical reinforcement learning with adaptive scheduling for robot control. Eng. Appl. Artif. Intell. 2023, 126, 107130. [Google Scholar] [CrossRef]
- Liu, J.Y.; Wang, G.; Fu, Q.; Yue, S.H.; Wang, S.Y. Task assignment in ground-to-air confrontation based on multiagent deep reinforcement learning. Def. Technol. 2023, 19, 210–219. [Google Scholar] [CrossRef]
- Federici, L.; Scorsoglio, A.; Zavoli, A.; Furfaro, R. Meta-reinforcement learning for adaptive spacecraft guidance during finite-thrust rendezvous missions. Acta Astronaut. 2022, 201, 129–141. [Google Scholar] [CrossRef]
- Federici, L.; Zavoli, A. Robust interplanetary trajectory design under multiple uncertainties via meta-reinforcement learning. Acta Astronaut. 2024, 214, 147–158. [Google Scholar] [CrossRef]
- Costa, B.A.; Parente, F.L.; Belfo, J.; Somma, N.; Rosa, P.; Igreja, J.M.; Belhadj, J.; Lemos, J.M. A reinforcement learning approach for adaptive tracking control of a reusable rocket model in a landing scenario. Neurocomputing 2024, 577, 127377. [Google Scholar] [CrossRef]
- Belkhale, S.; Li, R.; Kahn, G.; McAllister, R.; Calandra, R.; Levine, S. Model-Based Meta-Reinforcement Learning for Flight With Suspended Payloads. IEEE Robot. Autom. Lett. 2021, 6, 1471–1478. [Google Scholar] [CrossRef]
- Xue, S.; Han, Y.; Bai, H. Research on Ballistic Planning Method Based on Improved DDPG Algorithm. In Proceedings of the 2023 International Conference on Cyber-Physical Social Intelligence (ICCSI), Xi’an, China, 20–23 October 2023; pp. 13–19. [Google Scholar]
- Xu, J.; Du, T.; Foshey, M.; Li, B.; Zhu, B.; Schulz, A.; Matusik, W. Learning to fly: Computational controller design for hybrid UAVs with reinforcement learning. ACM Trans. Graph. 2019, 38, 1–12. [Google Scholar] [CrossRef]
- Hutter, F.; Kotthoff, L.; Vanschoren, J. Automated Machine Learning: Methods, Systems, Challenges; Springer Nature: Berlin/Heidelberg, Germany, 2019. [Google Scholar] [CrossRef]
- Wen, L.; Gao, L.; Li, X.; Li, H. A new genetic algorithm based evolutionary neural architecture search for image classification. Swarm Evol. Comput. 2022, 75, 101191. [Google Scholar] [CrossRef]
- Chen, L.C.; Collins, M.D.; Zhu, Y.; Papandreou, G.; Zoph, B.; Schroff, F.; Adam, H.; Shlens, J. Searching for efficient multi-scale architectures for dense image prediction. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 8699–8710. [Google Scholar]
- Wang, Y.; Yang, Y.; Chen, Y.; Bai, J.; Zhang, C.; Su, G.; Kou, X.; Tong, Y.; Yang, M.; Zhou, L. TextNAS: A neural architecture search space tailored for text representation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 9242–9249. [Google Scholar]
- Bergstra, J.; Yamins, D.; Cox, D. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 115–123. [Google Scholar]
- Zoph, B.; Le, Q.V. Neural architecture search with reinforcement learning. In Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 24–26 April 2017; pp. 1–16. [Google Scholar]
- Xie, L.; Yuille, A. Genetic CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1379–1388. [Google Scholar]
- Falanti, A.; Lomurno, E.; Ardagna, D.; Matteucci, M. POPNASv3: A pareto-optimal neural architecture search solution for image and time series classification. Appl. Soft Comput. 2023, 145, 110555. [Google Scholar] [CrossRef]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
- Yang, C.H.; Xia, Y.S.; Chu, Z.F.; Zha, X. Logic Synthesis Optimization Sequence Tuning Using RL-Based LSTM and Graph Isomorphism Network. IEEE Trans. Circuits Syst. II Express Briefs 2022, 69, 3600–3604. [Google Scholar] [CrossRef]
- Eberhart, R.; Kennedy, J. A new optimizer using particle swarm theory. In Proceedings of the MHS '95 Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, 4–6 October 1995; IEEE: Piscataway, NJ, USA, 1995. [Google Scholar]
- Feng, X.F.; Liu, T.W.; Li, B.; Su, S. Wind power slope climbing event detection method based on sliding window two-sided CUSUM algorithm. Sci. Technol. Eng. 2024, 24, 595–603. [Google Scholar] [CrossRef]
- Young, M.T.; Hinkle, J.D.; Kannan, R.; Ramanathan, A. Distributed Bayesian optimization of deep reinforcement learning algorithms. J. Parallel Distrib. Comput. 2020, 139, 43–52. [Google Scholar] [CrossRef]
- Deng, S. CNN hyperparameter optimization method based on improved Bayesian optimization algorithm. Appl. Res. Comput. 2019, 36, 1984–1987. [Google Scholar] [CrossRef]
- Dong, L.J.; Fang, Z.; Chen, H.T. Compressor fault diagnosis based on deep learning and Bayesian optimization. Mach. Des. Manuf. 2023, 384, 45–52. [Google Scholar] [CrossRef]
No. | State Variable | Symbol
---|---|---
1 | Position coordinate in the x direction in the inertial coordinate system |
2 | Position coordinate in the y direction in the inertial coordinate system |
3 | Position coordinate in the z direction in the inertial coordinate system |
4 | Velocity in the x direction in the inertial coordinate system |
5 | Velocity in the y direction in the inertial coordinate system |
6 | Velocity in the z direction in the inertial coordinate system |
7 | Pitch angle |
8 | Yaw angle |
9 | Roll angle |
10 | Angular velocity about the x axis in the launch vehicle coordinate system |
11 | Angular velocity about the y axis in the launch vehicle coordinate system |
12 | Angular velocity about the z axis in the launch vehicle coordinate system |
13 | Rocket mass |
No. | Action Variable | Symbol |
---|---|---|
1 | Thrust direction angle | |
2 | Thrust direction angle | |
3 | Magnitude of thrust |
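The state and action tables above can be assembled into vectors as follows. This is an illustrative sketch only, with descriptive argument names standing in for the symbols that did not survive extraction.

```python
# Assemble the 13-dimensional state and 3-dimensional action vectors defined
# in the tables above, using plain Python lists.
def make_state(position, velocity, attitude, body_rates, mass):
    """position/velocity: inertial-frame (x, y, z) triples;
    attitude: (pitch, yaw, roll) angles;
    body_rates: body-frame angular velocities about (x, y, z);
    mass: current rocket mass."""
    state = [*position, *velocity, *attitude, *body_rates, mass]
    assert len(state) == 13, "state vector must have 13 components"
    return state

def make_action(thrust_angle_1, thrust_angle_2, thrust):
    """Two thrust-direction angles and the thrust magnitude."""
    return [thrust_angle_1, thrust_angle_2, thrust]
```

For example, the initial condition from the simulation settings maps to `make_state((2000, -1600, 50), (-90, 180, 0), (0, 0, 0), (0, 0, 0), 41000.0)`.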
No. | Type | Hyperparameter | Symbol | Search Area
---|---|---|---|---
1 | Actor network | Number of hidden layers | | {1, 2, 3, 4}
| | Hidden layer neurons | | {64, 128, 256}
| | Activation function | | {tanh, relu, sigmoid}
2 | Critic network | Number of hidden layers | | {1, 2, 3, 4}
| | Hidden layer neurons | | {64, 128, 256}
| | Activation function | | {tanh, relu, sigmoid}
3 | LSTM network | Hidden layer neurons | | {16, 32, 64, 128, 256, 512}
No. | Hyperparameter | Symbol | Search Area
---|---|---|---
1 | Batch learning size | b | {32, 64, 128, 256, 512}
2 | Maximum steps | N | {256, 512, 1024, 2048, 8192}
3 | Discount factor | | uniform (0.8, 0.999)
4 | Learning rate | | loguniform (1 × 10−5, 1)
5 | Clipping factor | | uniform (0.1, 0.4)
6 | Generalized advantage estimation (GAE) parameter | | uniform (0.8, 1)
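The two tables above define the full search space. A minimal sketch of sampling from it, assuming plain Python, with descriptive key names in place of the symbols omitted from the extracted tables:

```python
import math
import random

# Architecture choices from the network-structure table.
ARCH_SPACE = {
    "actor_layers": [1, 2, 3, 4],
    "actor_neurons": [64, 128, 256],
    "actor_activation": ["tanh", "relu", "sigmoid"],
    "critic_layers": [1, 2, 3, 4],
    "critic_neurons": [64, 128, 256],
    "critic_activation": ["tanh", "relu", "sigmoid"],
    "lstm_neurons": [16, 32, 64, 128, 256, 512],
}

def sample_ppo_hyperparams(rng):
    """Draw one PPO hyperparameter combination from the tabulated ranges."""
    return {
        "batch_size": rng.choice([32, 64, 128, 256, 512]),      # b
        "max_steps": rng.choice([256, 512, 1024, 2048, 8192]),  # N
        "gamma": rng.uniform(0.8, 0.999),                       # discount factor
        "lr": math.exp(rng.uniform(math.log(1e-5), 0.0)),       # log-uniform on [1e-5, 1]
        "clip": rng.uniform(0.1, 0.4),                          # PPO clipping factor
        "gae_lambda": rng.uniform(0.8, 1.0),                    # GAE parameter
    }
```

Sampling the learning rate as `exp(uniform(log(1e-5), log(1)))` reproduces the log-uniform distribution in the table, which spreads trials evenly across orders of magnitude rather than clustering them near 1.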
No. | Rocket Parameter | Value
---|---|---
1 | Total length of rocket | 40 m
2 | Diameter of rocket | 3.66 m
3 | Mass of rocket | 41 t
4 | Specific impulse | 360 s
5 | Atmospheric density | 1.225 kg/m3
6 | Acceleration of gravity | 9.8 m/s2
7 | Aerodynamic drag coefficient | 0.82
8 | Maximum thrust of the rocket | 981 kN
9 | Initial position coordinates | (2000, −1600, 50) m
10 | Initial velocity | (−90, 180, 0) m/s
11 | Landing position coordinates | (0, 0, 0) m
12 | Attitude angle limits during landing | [85, 85, 360]°
13 | Terminal position constraint | 5 m
14 | Terminal velocity constraint | 2 m/s
15 | Terminal attitude deviation constraint | [10, 10, 360]°
16 | Rotational angular velocity constraint | [0.2, 0.2, 0.2] rad/s
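Two derived quantities, not stated in the table, can be computed from the listed values as a quick consistency check; this is an illustrative sketch only.

```python
# Physical constants from the simulation-parameter table.
G0 = 9.8                 # m/s^2, gravitational acceleration
ISP = 360.0              # s, specific impulse
MASS = 41_000.0          # kg, initial rocket mass (41 t)
MAX_THRUST = 981_000.0   # N, maximum thrust (981 kN)

def exhaust_velocity(isp, g0=G0):
    """Effective exhaust velocity v_e = Isp * g0 (m/s), used in mass-flow bookkeeping."""
    return isp * g0

def thrust_to_weight(thrust, mass, g0=G0):
    """Initial thrust-to-weight ratio; it must exceed 1 to decelerate a descent."""
    return thrust / (mass * g0)
```

With the tabulated values, the exhaust velocity is 3528 m/s and the initial thrust-to-weight ratio is about 2.44, consistent with a powered vertical landing.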
No. | NAS Search Policy | Search Time | Average Fitness Value | Accuracy Rate |
---|---|---|---|---|
1 | Reinforcement learning | 10 h | 162 | 79.4% |
2 | Genetic algorithm | 8 h | 140 | 80.5% |
3 | Particle swarm optimization | 7 h | 200 | 80.2% |
4 | Algorithm of this paper | 1.5 h | 298.8 | 99.6% |
No. | Type | Hyperparameter | Symbol | Optimized Value
---|---|---|---|---
1 | Actor network | Number of hidden layers | | 3
| | Hidden layer neurons | | [64, 64, 128]
| | Activation function | | tanh
2 | Critic network | Number of hidden layers | | 3
| | Hidden layer neurons | | [64, 64, 128]
| | Activation function | | relu
3 | LSTM network | Hidden layer neurons | | 128
No. | Hyperparameter | Symbol | Optimized Value
---|---|---|---
1 | Batch learning size | b | 64
2 | Maximum steps | N | 2048
3 | Discount factor | | 0.9391973108460121
4 | Learning rate | | 0.0001720593703687
5 | Clipping factor | | 0.2668120684510983
6 | Generalized advantage estimation (GAE) parameter | | 0.8789545362092943
No. | Name | Terminal Position Accuracy/m | Terminal Speed Accuracy/m/s | Terminal Attitude Deviation /° | Fuel Consumption/kg |
---|---|---|---|---|---|
1 | NAS-RL | 0.4 | 1.0 | (1.916, 0.129, 0.003) | 3437.9 |
2 | RL | 11.9 | 12.9 | (2.593, 0.228, 0.008) | 4086.3 |
3 | Convex optimization | 10.0 | 7.5 | (2.420, −0.774, 0.003) | 3452.2 |
4 | NAS-RL-Embedded platform | 0.4 | 1.0 | (1.916, 0.129, 0.003) | 3437.9 |
No. | Name | Symbol | Maximum Deviation | Minimum Deviation |
---|---|---|---|---|
1 | The initial position of the rocket | +10% | −10% | |
2 | The initial velocity of the rocket | +10% | −10% | |
3 | Mass of rocket | +10% | −10% | |
4 | Aerodynamic drag coefficient | +10% | −10% | |
5 | Fuel specific impulse | +1% | −1% | |
6 | Atmospheric density | +5% | −5% |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xue, S.; Wang, Z.; Bai, H.; Yu, C.; Li, Z. Research on Self-Learning Control Method of Reusable Launch Vehicle Based on Neural Network Architecture Search. Aerospace 2024, 11, 774. https://doi.org/10.3390/aerospace11090774