1. Introduction
In this paper, an adaptive convex optimization algorithm is presented for the application of lunar surface pinpoint landings. Unlike the classical method that directly convexifies the lunar landing dynamics model with the corresponding constraints into a second-order cone programming (SOCP) problem, the proposed approach introduces parameter-adaptive and target-adaptive algorithms to improve the adaptability of the method for parameter uncertainties and constraints on the final attitude and thrust magnitude of the lander.
To successfully implement autonomous obstacle avoidance and ensure an accurate and safe landing, the lunar landing process is usually divided into three phases: powered descent, approach, and landing, as shown in Figure 1 [1, 2, 3, 4]. In the powered descent phase, the lander’s altitude and velocity are greatly reduced. In the subsequent approach phase, various sensors on the lander begin working actively to avoid obstacles and select the landing site, and a soft landing is achieved in the final landing phase. These phases can be subdivided further depending on the mission. Taking Chang’e-3 [2, 5, 6, 7, 8] as an example, the lunar lander is in a transfer orbit with an apocynthion of 100 km and pericynthion of 15 km before the powered descent phase. From near the pericynthion, it enters the powered descent phase. The main goal of the powered descent phase is to reduce the altitude and velocity of the lander as much as possible. During the powered descent phase, the altitude is reduced from 15 km to 2.4 km and the velocity is reduced from 1.7 km/s to approximately 50 m/s [9, 10]. This phase typically consumes approximately 80% of the lander’s fuel, so the guidance algorithm for this phase must minimize fuel consumption in the presence of initial state errors and model uncertainty while meeting the entrance state requirements of the subsequent approach phase.
To save as much fuel as possible during the powered descent phase, the linear tangent guidance law, which has suboptimal fuel characteristics, was adopted for the Chang’e-3 probe. However, the linear tangent guidance law cannot control the flight distance; in fact, the allowable landing area of Chang’e-3 was a rectangular area, 356 km long and 91 km wide, in Sinus Iridum [11]. This clearly does not meet the high-precision landing requirements of future lunar exploration missions.
China plans to establish a manned lunar base in the future, which requires the lander to be able to land with pinpoint accuracy near the base [12]. In addition, as the exploration of the moon progresses further, lunar landers need the ability to land precisely near high-science-value targets with complex terrain [13, 14].
All of these goals require the next-generation lunar lander to have the capability of precision pinpoint landing. The quality of trajectory planning and the accuracy of the guidance algorithm have a direct impact on the final landing accuracy and fuel consumption during the powered descent phase, where most of the lander’s kinetic and potential energy is dissipated. Due to their critical importance, these algorithms have received extensive attention from various institutions and individuals [15].
Current lunar surface landing guidance algorithms are divided into explicit guidance algorithms and trajectory optimization algorithms. Explicit guidance algorithms are typically represented by augmented Apollo powered-descent guidance (A²PDG) [16], fractional-polynomial powered descent guidance [17], and zero-effort-miss/zero-effort-velocity (ZEM/ZEV) guidance [18, 19, 20]. These algorithms have good autonomy for real-time adjustments and anti-interference capabilities and can guarantee terminal guidance accuracy.
Compared to explicit guidance algorithms, numerical optimization algorithms based on optimization principles design nominal trajectories according to certain objectives and constraints. These methods typically consume less fuel and are more robust. In particular, convex optimization algorithms have been widely studied and applied owing to their polynomial computational complexity and theoretical global optimality [21, 22].
The method presented in [22] transforms the Mars landing problem into a convex optimization problem, ensuring convergence to the optimal solution within a limited number of iterations. As a result, subsequent research has focused on extending this method to more complex dynamics, constraints, free-final-time problems, and other related areas. These improvements aim to enhance the robustness and applicability of the algorithm [23, 24]. The determination of the optimal time-of-flight required for convex optimization was further specified in [23]. As reported in [24], by combining convex optimization with the pseudo-spectral method, the number of nodes and the CPU time are reduced, and the accuracy of the solution is improved by reasonably selecting the set of nodes.
Furthermore, other improvements have been investigated to extend the applicability of the method under different conditions, such as complex dynamics environments, or to add functions, such as hazard avoidance. In [25], convex optimization and curvature adjustment strategies were combined to ensure obstacle avoidance while reducing fuel consumption. In [26], an approach to optimal trajectory design with uncertainty was considered. The uncertainty was further considered in [27], where a convex optimization method was proposed to achieve precise landing of the rocket under different initial states and various disturbances. In [28], an optimal control approach based on learning and theoretical support was proposed to solve the problem of the on-board fuel-optimal guidance law in the powered descent phase. Supervised learning (SL) and optimal control theory were combined, and the dimension of the learning space was significantly reduced by guiding the learning process with the necessary conditions derived from Pontryagin’s minimum principle. In [29], a learning-based six-DOF planetary powered descent and landing method was introduced using reinforcement learning theory. By learning a policy, the estimated state of the lander was mapped to the thrust command of each engine, achieving precise positioning and soft landing with robustness to noise and parameter uncertainty. Different discount rates were also introduced to calculate shape and terminal rewards, significantly improving performance.
In [30], a convex optimal trajectory programming algorithm was improved for application to landing on asteroids with irregular shapes and gravitational fields. In [31], an effective solution for powered-descent guidance was proposed, which considered multiple constraint conditions, such as attitude and state-triggered constraints. The authors of [32] utilized a dual-quaternion-based approach to simultaneously handle attitude and thrust constraints, achieving an integrated design of guidance and control. Building upon these works, the non-convex trajectory optimization problem, which is difficult to handle, was solved in real time by combining continuous convex programming and first-order cone optimization in [33]. Multiple constraint conditions, including attitude, thrust, and state-triggered constraints, can be simultaneously handled by this algorithm, which achieves higher computational efficiency and faster convergence rates.
In this study, we first consider that landers usually fly with uncertainties in parameters such as specific impulse and initial mass [34, 35], which can degrade the convex optimization, reducing the optimality of the lander trajectory and increasing fuel consumption. In addition, various lunar surface sensors, such as cameras and lidars, typically do not work when the lander is far from the moon, which means that the lander has no attitude constraints at this time. However, as the powered descent phase comes to an end and the lander gets close enough to the moon, it is usually desired that the lander has a specific attitude so that it can obtain images and measurements from various sensors for obstacle avoidance and navigation [9, 35, 36, 37, 38] and a specific thrust magnitude for a smooth transition to the subsequent phase [10, 39].
On this basis, an adaptive convex optimization algorithm for lunar landing guidance is proposed. In the proposed algorithm, we first consider the effect of parameter uncertainty during the powered descent phase. An optimal observer using accelerometer measurements is designed to estimate the specific impulse and real-time mass of the lander and thereby compensate for parameter uncertainty. After the observer converges, the lander’s trajectory is replanned online on the basis of the new parameter values. In contrast to learning-based methods, this algorithm does not require training datasets and can provide more precise estimations due to its mathematical model-based nature. Furthermore, the optimal observer-based approach can offer error quantification and correction for estimations, thereby improving the robustness of the algorithm. Additionally, unlike the approaches in [31, 32, 33] that introduce attitude constraints throughout the entire flight, this paper inserts a rapid adjustment phase at the end of the trajectory generated by the classical convex optimization algorithm. This method does not introduce attitude constraints in the convex optimization process, nor does it require integrated guidance and control design. Instead, the rapid adjustment phase is used to constrain the lander’s attitude (primarily pitch angle) and thrust magnitude. The deviation of the position and velocity caused by the rapid adjustment of the attitude and thrust is eliminated by adaptively adjusting the powered descent phase target to meet the entrance window requirements of the approach phase. Compared with classical methods, the proposed approach, which incorporates a rapid adjustment phase and a target-adaptive algorithm, only requires the lander to have a specific attitude at the end of the powered descent phase, rather than imposing unnecessary constraints on the attitude range throughout the entire flight. This provides the lander with greater flexibility for adjusting attitude during flight, leading to more efficient fuel consumption. Moreover, it avoids the need for a six-DOF integrated guidance and control design, which reduces the design and system complexity as well as the state dimension. This results in a smaller computational load and better real-time performance of the guidance system.
The remainder of this paper is organized as follows. Section 2 presents the lunar landing process and phase division as well as the dynamics model and briefly describes how to transform the lunar landing optimal trajectory planning problem into a finite-dimensional convex optimization problem. Section 3 presents the design of an optimal estimator to estimate the uncertainties in the lunar landing process to achieve parameter adaption. Section 4 presents a target-adaptive algorithm that allows the lander to make rapid adjustments to meet the approach phase entrance requirements without causing position and velocity deviations. Section 5 presents the numerical results using only parameter adaption, only target adaption, and integrated simulation. Finally, Section 6 summarizes the conclusions of this study.
2. Problem Formulation
2.1. Guidance System Scheme
The convex optimization algorithm, which is used to plan the optimal trajectory, can transfer the lander from the given initial state to a predetermined target state while saving as much fuel as possible. However, two problems remain.
First, it is difficult to add final attitude constraints to the guidance process, which leaves the final attitude uncertain. However, when the lander is close enough to the lunar surface, the sensors and detection elements used for navigation and hazard avoidance usually require the lander to have a certain attitude. Furthermore, the lander requires a specific thrust for a smooth transition to the subsequent phase. To satisfy the attitude and thrust constraints at the end of the powered descent phase, the powered descent phase is divided into the main braking phase and the rapid adjustment phase. During the rapid adjustment phase, the thrust and attitude are smoothly transferred to meet the requirements of the subsequent phase. Second, the trajectory generated by convex optimization depends on the specific impulse, initial mass, and other parameters of the engine. The uncertainty of these parameters affects the optimality of the trajectory formed by the convex optimization and the accuracy of the final state. The rapid adjustment process also affects the final state. Therefore, it is necessary to adopt an adaptive convex optimization algorithm that achieves the entrance condition of the approach phase by automatically adjusting the powered descent phase target and estimating the guidance parameters in real time, thereby improving guidance accuracy and reducing fuel consumption.
The sequence of the guidance system is shown in Figure 2. The powered descent phase is divided into the main braking phase and the rapid adjustment phase. A closed-loop feedback system is implemented along the optimal trajectory generated by convex optimization. The nominal trajectory is generated twice. The first trajectory is generated offline, and only the target-adaptive algorithm is considered. After the online parameter estimation converges, the adaptive convex optimization is performed according to the current lander state; both target adaption and parameter adaption are considered during this optimization. The lander then carries out closed-loop feedback guidance according to the nominal trajectory. Before the end of the powered descent phase, the lander enters the rapid adjustment phase, in which it rapidly adjusts its attitude and thrust under open-loop guidance. As the target adaption has been carried out previously, the lander reaches the target state of the powered descent phase once the rapid adjustment phase is completed.
2.2. Dynamic Model
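As no inertial forces arise in the lunar-centered inertial coordinate system, the dynamic model is established in this coordinate system to simplify the problem. Under this condition, the dynamic model is given by:
$$\dot{\mathbf{r}}(t) = \mathbf{v}(t), \qquad \dot{\mathbf{v}}(t) = \frac{\mathbf{T}(t)}{m(t)} + \mathbf{g}(t), \qquad \dot{m}(t) = -\alpha \lVert \mathbf{T}(t) \rVert, \qquad \alpha = \frac{1}{I_{sp}\, g_0} \tag{1}$$
where $\mathbf{r}$ and $\mathbf{v}$ are the position and velocity of the lander at time $t$, respectively; $\mathbf{T}$ is the thrust; $m$ is the mass; $I_{sp}$ is the vacuum specific impulse; $g_0$ is the nominal gravity acceleration constant of the Earth; $\mathbf{g}$ is the gravitational acceleration; and $\alpha$ denotes the mass flow coefficient.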
As this paper adopts the inverse-square gravity model, and the coordinate system of the model is an inertial frame, the gravitational acceleration varies with the position during flight. On the one hand, as the flight altitude decreases, the magnitude of the gravitational acceleration changes. On the other hand, as the latitude and longitude of the lander change, the direction of the gravitational acceleration in the inertial frame also changes. Therefore, in this paper, the gravitational acceleration is assumed to be a time-varying quantity that changes along the nominal trajectory of the lander during the approach phase before powered descent guidance.
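As a hedged illustration of the model described above, the following Python sketch evaluates the point-mass state derivatives with inverse-square gravity in a lunar-centered inertial frame. The function names, the approximate lunar gravitational parameter, and the mass flow relation (thrust magnitude divided by the product of specific impulse and standard gravity) are assumptions made for illustration and are not taken from the paper's implementation.

```python
import math

MU_MOON = 4.9028e12  # assumed lunar gravitational parameter, m^3/s^2 (approximate)
G0 = 9.80665         # standard gravity of the Earth, m/s^2


def gravity(r):
    """Inverse-square gravitational acceleration at position r (m, lunar-centered)."""
    dist = math.sqrt(sum(c * c for c in r))
    return [-MU_MOON * c / dist**3 for c in r]


def state_derivative(r, v, m, thrust, isp):
    """Return (r_dot, v_dot, m_dot) for a point-mass lander.

    r, v, thrust are 3-element lists in the inertial frame; m is the mass in kg;
    isp is the vacuum specific impulse in s.
    """
    g = gravity(r)
    thrust_mag = math.sqrt(sum(c * c for c in thrust))
    v_dot = [thrust[i] / m + g[i] for i in range(3)]
    m_dot = -thrust_mag / (isp * G0)  # assumed mass flow: alpha * ||T||
    return list(v), v_dot, m_dot


# Usage: lander 15 km above a 1737.4 km mean radius, thrusting radially outward.
r0 = [1737.4e3 + 15.0e3, 0.0, 0.0]
r_dot, v_dot, m_dot = state_derivative(r0, [0.0, 1700.0, 0.0], 5000.0,
                                       [7500.0, 0.0, 0.0], 309.0)
```

Because the gravity vector is recomputed from the position, both its magnitude and its inertial direction change along the trajectory, which is the effect the paragraph above describes.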
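The state of the lander must be constrained to ensure vehicle safety during the landing process. To prevent the lander from hitting the lunar surface, a height constraint is introduced:
$$h(t) \ge h_{\min} \tag{2}$$
where $h$ denotes the height of the lander, and $h_{\min}$ denotes the minimum allowable height. The thrust constraint can also be introduced as:
$$T_{\min} \le \lVert \mathbf{T}(t) \rVert \le T_{\max} \tag{3}$$
where $T_{\min}$, $T_{\max}$ are the minimum and maximum thrusts, respectively, provided by the thruster. The mass constraint is:
$$m(t_f) \ge m_{dry} \tag{4}$$
where $m_{dry}$ is the mass of the lander without fuel in the powered descent phase, and $m(t_f)$ is the final mass of the lander.
Finally, the constraints on the initial and final states of the lander are given as:
$$\mathbf{r}(0) = \mathbf{r}_0, \qquad \mathbf{v}(0) = \mathbf{v}_0, \qquad \mathbf{r}(t_f) = \mathbf{r}_f, \qquad \mathbf{v}(t_f) = \mathbf{v}_f \tag{5}$$
where $\mathbf{r}_0$ and $\mathbf{r}_f$ are the initial and target positions, respectively, and $\mathbf{v}_0$ and $\mathbf{v}_f$ are the initial and target velocities, respectively.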
The trajectory planning problem for lunar landing can be formulated as a two-point boundary value fuel-optimal control problem. By taking the fuel consumption as the landing performance index and maximizing the final mass $m(t_f)$, the following objective function must be minimized:
$$J = \int_0^{t_f} \lVert \mathbf{T}(t) \rVert \, dt \tag{6}$$
By combining the dynamic model, constraints, and objective function in Formulas (1)–(6), the optimization problem referred to as Problem 1 is expressed as follows.
Problem 1: Continuous-time minimum fuel optimization problem
$$\min_{\mathbf{T}(t)} \; J \quad \text{subject to Formulas (1)–(5)} \tag{7}$$
2.3. Trajectory Planning Algorithm Based on Convex Optimization
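For Problem 1, as shown in Formula (7), because the mass $m$ in the dynamic model appears in the denominator and the thrust constraint is a nonconvex constraint, Problem 1 is a nonconvex problem. To solve it with convex optimization, the model must be convexified and converted into a convex problem, specifically, an SOCP problem. First, the logarithmic mass is introduced [22]:
$$z = \ln m \tag{8}$$
Second, new control variables are introduced:
$$\mathbf{u} = \frac{\mathbf{T}}{m}, \qquad \sigma = \frac{\Gamma}{m} \tag{9}$$
where $\Gamma$ is a slack variable that bounds the thrust magnitude, $\lVert \mathbf{T} \rVert \le \Gamma$. From Formulas (8) and (9), the dynamic model becomes:
$$\dot{\mathbf{r}} = \mathbf{v}, \qquad \dot{\mathbf{v}} = \mathbf{u} + \mathbf{g}, \qquad \dot{z} = -\alpha \sigma \tag{10}$$
The objective function becomes:
$$J = \int_0^{t_f} \sigma(t) \, dt \tag{11}$$
Furthermore, the thrust constraint, which now involves the logarithmic mass, becomes:
$$T_{\min} e^{-z} \le \sigma \le T_{\max} e^{-z} \tag{12}$$
The remaining mass constraint becomes:
$$z(t_f) \ge z_{dry} \tag{13}$$
where $z_{dry} = \ln m_{dry}$ denotes the logarithmic dry mass.
Furthermore, the control variable constraint must be introduced:
$$\lVert \mathbf{u} \rVert \le \sigma \tag{14}$$
When the optimal solution is obtained, the control variable must satisfy [40]:
$$\lVert \mathbf{u}^* \rVert = \sigma^* \tag{15}$$
where the superscript asterisks indicate the optimal control variables.
To avoid the exponential nonconvex constraint, a Taylor expansion of Formula (12) can be performed:
$$T_{\min} e^{-z_0} \left[ 1 - (z - z_0) + \frac{(z - z_0)^2}{2} \right] \le \sigma \le T_{\max} e^{-z_0} \left[ 1 - (z - z_0) \right] \tag{16}$$
where $z_0$ satisfies:
$$z_0(t) = \ln \left( m_0 - \alpha T_{\max} t \right) \tag{17}$$
where $m_0$ denotes the initial mass of the lander.
The logarithmic mass $z$ must also satisfy the following constraints:
$$\ln \left( m_0 - \alpha T_{\max} t \right) \le z(t) \le \ln \left( m_0 - \alpha T_{\min} t \right) \tag{18}$$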
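To make the effect of the Taylor expansion concrete, the sketch below compares the exact exponential thrust bounds with their expanded counterparts. It assumes the expansion commonly used with this change of variables in [22]: a second-order expansion for the lower bound and a first-order expansion for the upper bound, both taken about a reference logarithmic mass corresponding to continuous maximum thrust. The function names and numerical values are illustrative assumptions.

```python
import math

G0 = 9.80665  # standard gravity, m/s^2


def z_reference(t, m0, alpha, t_max):
    """Assumed reference log-mass: the mass left after burning at maximum thrust."""
    return math.log(m0 - alpha * t_max * t)


def sigma_bounds_exact(z, t_min, t_max):
    """Exact (nonconvex) bounds T_min*exp(-z) <= sigma <= T_max*exp(-z)."""
    e = math.exp(-z)
    return t_min * e, t_max * e


def sigma_bounds_convex(z, z0, t_min, t_max):
    """Expanded bounds: quadratic lower bound (cone), linear upper bound."""
    dz = z - z0
    lower = t_min * math.exp(-z0) * (1.0 - dz + 0.5 * dz * dz)
    upper = t_max * math.exp(-z0) * (1.0 - dz)
    return lower, upper


# Usage with illustrative numbers: 5000 kg lander, 1500-7500 N engine, Isp 309 s.
alpha = 1.0 / (309.0 * G0)
z0 = z_reference(100.0, 5000.0, alpha, 7500.0)  # reference at t = 100 s
z = math.log(4900.0)                            # an actual log-mass above z0
exact = sigma_bounds_exact(z, 1500.0, 7500.0)
approx = sigma_bounds_convex(z, z0, 1500.0, 7500.0)
```

Under these assumptions the expanded interval lies inside the exact one whenever the logarithmic mass is at or above the reference, so a control that satisfies the expanded bounds also satisfies the original thrust limits.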
From the preceding convexification of the dynamics and constraints, the fuel-optimal continuous-time convex optimization problem represented as Problem 2 can be described as follows.
Problem 2: Continuous-time minimum fuel SOCP problem
Problem 2 is an SOCP problem obtained by lossless convexification from Problem 1. It must be discretized before solving. The state and augmented control variables are defined as follows:
Accordingly, the dynamic model of Formula (10) becomes:
where
To discretize the problem, the time interval
is divided into
intervals with time increment
; the temporal nodes are
,
,
. In the time interval
, the control variables
satisfy:
where
denotes a specific basis function. In the proposed algorithm, the basis function is taken as a constant function, which means that the control variables remain unchanged in each interval. The discrete dynamic model is [28, 41]:
where
,
, and
are
,
,
, and
is the solution of the linear matrix ordinary differential equation:
and
are defined as:
Problem 3 can be obtained after discretization:
Problem 3: Convex finite-dimensional fuel-optimal lunar landing problem
where
,
,
, and
.
In this way, the lunar landing problem becomes a discrete SOCP problem. If a feasible solution exists for Problem 3, then this solution also defines a feasible solution for Problem 1 [22]; moreover, Problem 3 can be solved to global optimality in polynomial time [42]. Further discussion on the speed of convergence can be found in [23].
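The discretization with piecewise-constant controls can be sketched as follows. This is a minimal illustration, assuming the convexified dynamics take the double-integrator form (position rate equals velocity, velocity rate equals control acceleration plus gravity, logarithmic mass rate equals minus the mass flow coefficient times the thrust-acceleration magnitude) and that gravity is held constant within each interval. Under those assumptions the state transition has a closed form, so no matrix exponential is needed. All names are hypothetical.

```python
def zoh_step(r, v, z, u, sigma, g, alpha, dt):
    """Propagate one interval exactly with u, sigma, and g held constant."""
    a = [u[i] + g[i] for i in range(3)]                       # net acceleration
    r_next = [r[i] + v[i] * dt + 0.5 * a[i] * dt * dt for i in range(3)]
    v_next = [v[i] + a[i] * dt for i in range(3)]
    z_next = z - alpha * sigma * dt                           # log-mass depletion
    return r_next, v_next, z_next


def rollout(r0, v0, z0, controls, g, alpha, dt):
    """Apply a sequence of (u, sigma) pairs and return the state at every node."""
    states = [(list(r0), list(v0), z0)]
    r, v, z = list(r0), list(v0), z0
    for u, sigma in controls:
        r, v, z = zoh_step(r, v, z, u, sigma, g, alpha, dt)
        states.append((r, v, z))
    return states


# Usage: five 1 s intervals in which the control exactly cancels gravity.
g_const = [0.0, 0.0, -1.62]
hover = [([0.0, 0.0, 1.62], 1.62)] * 5
nodes = rollout([0.0, 0.0, 1000.0], [10.0, 0.0, 0.0], 8.5, hover,
                g_const, 3.3e-4, 1.0)
```

In a full implementation, these linear recursions would appear as equality constraints between consecutive nodes of the finite-dimensional SOCP rather than being rolled out forward.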
4. Target-Adaptive Algorithm
It is usually necessary for the lander to have a specific thrust and attitude at the end of the powered descent phase to facilitate the operation of various lunar surface sensors and to transition smoothly to the subsequent approach phase. The classical convex optimization algorithm typically imposes constraints on the thrust range and attitude throughout the entire flight process, rather than requiring the lander to have a specific thrust magnitude and direction at the end of the powered descent phase. To address this issue, this paper proposes a target-adaptive algorithm based on convex optimization for real-time trajectory planning. During most of the powered descent phase, the lander’s control system adjusts the attitude in real time based on the thrust direction commanded by the guidance system to ensure that the lander’s thrust direction is consistent with the command. In the last several seconds of the powered descent phase (generally 10 to 20 s), the lander switches to open-loop guidance and adjusts its attitude and thrust linearly in time so that they transition smoothly to the desired values.
Because the lander switches to open-loop guidance in the final seconds of the powered descent phase and rapidly adjusts its attitude and thrust toward the desired values, its thrust direction and magnitude naturally deviate from the nominal values. Consequently, the lander’s final state at the end of the powered descent phase deviates from the desired position and velocity. To address this issue, a target offset technique is proposed in this paper. It intentionally shifts the final target state used as a constraint in the convex optimization so that, after the open-loop rapid adjustment phase, the lander reaches the original target state, which has not been offset.
The flight process of the lander is shown in Figure 3. Assume that the lander initially flies along the initial nominal trajectory, indicated by the black line in Figure 3. After it enters the rapid adjustment phase, the attitude and thrust no longer follow the nominal command designed by the nominal convex optimization algorithm, so the position and velocity of the lander inevitably deviate from the ideal position and velocity along the black dotted line, and the actual final state of the lander deviates from the target state. The target-adaptive algorithm adopts the target offset technique: after the target is set to an offset target state, the convex optimization trajectory is re-planned. The final nominal trajectory is indicated by the blue line in Figure 3. The lander flies along the nominal trajectory through closed-loop guidance during the main braking phase. In the rapid adjustment phase, owing to the rapid adjustment of attitude and thrust, the lander again deviates from the nominal trajectory along the blue dotted line. However, owing to the target offset, the lander reaches the original target state after the rapid adjustment.
The specific algorithm for target adaption is as follows. The target position and velocity of the lander are $\mathbf{r}_f$ and $\mathbf{v}_f$, respectively. First, $\mathbf{r}_f$ and $\mathbf{v}_f$ are taken as the target values of the convex optimization for trajectory planning. On the basis of the current optimal trajectory, it is assumed that the last $t_a$ s are flown in the rapid adjustment phase so that the final attitude and thrust of the lander meet the constraints. However, the final position and velocity of the lander then deviate from the nominal state owing to the insertion of the rapid adjustment phase. Let the times before and after the rapid adjustment be $t_s$ and $t_e$, the ideal initial and end thrust values be $T_s$ and $T_e$, and the initial and end pitch angles be $\theta_s$ and $\theta_e$, respectively. As the lander generally does not exhibit lateral movement during landing, its yaw angle changes little; therefore, the yaw angle is assumed to follow the original yaw angle profile. The roll angle is independent of the thrust direction and can be set arbitrarily, so its effect is not considered in this process. During the rapid adjustment phase, the thrust and pitch angles change uniformly according to the program instructions:
$$T(\tau) = T_s + \Delta T \, \tau, \qquad \theta(\tau) = \theta_s + \Delta \theta \, \tau \tag{36}$$
where $\tau = (t - t_s)/(t_e - t_s)$ represents the nondimensional time, and $\Delta T = T_e - T_s$, $\Delta \theta = \theta_e - \theta_s$ represent the change values of thrust and pitch angle, respectively, to be adjusted in the rapid adjustment phase.
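The uniform change of thrust and pitch angle described above can be sketched as a linear ramp in nondimensional time. The function and argument names below are hypothetical, and the planar thrust-vector convention (pitch measured from the local horizontal, positive upward) is an assumption made only for this illustration; the paper's own frame definition may differ.

```python
import math


def ramp_command(t, t_start, t_end, thrust_start, thrust_end,
                 pitch_start, pitch_end):
    """Linearly interpolate thrust magnitude (N) and pitch angle (rad)."""
    tau = (t - t_start) / (t_end - t_start)   # nondimensional time
    tau = min(1.0, max(0.0, tau))             # clamp outside the ramp window
    thrust = thrust_start + (thrust_end - thrust_start) * tau
    pitch = pitch_start + (pitch_end - pitch_start) * tau
    return thrust, pitch


def planar_thrust_vector(thrust, pitch):
    """Assumed convention: (horizontal, vertical) components in the pitch plane."""
    return thrust * math.cos(pitch), thrust * math.sin(pitch)


# Usage: a 15 s ramp from 7500 N at 20 deg pitch to 5000 N at 80 deg pitch.
thrust_mid, pitch_mid = ramp_command(407.5, 400.0, 415.0, 7500.0, 5000.0,
                                     math.radians(20.0), math.radians(80.0))
tx, tz = planar_thrust_vector(thrust_mid, pitch_mid)
```

Halfway through the window the command equals the average of the start and end values, which is what a uniform (constant-rate) adjustment implies.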
According to Formula (36) the output thrust vector of the lander is given by:
From the thrust vector output, together with the position, velocity, and mass of the lander at the start of the rapid adjustment phase, the predicted position and velocity at the end of the phase can be obtained.
Because the rapid adjustment phase lasts only 10 to 20 s, the mass of the lander changes little during this process. Consider a typical 7500 N variable-thrust engine with a specific impulse of 309 s as an example. At full thrust for 20 s, its fuel consumption is approximately 50 kg. If the initial mass of the lander is 5000 kg, the mass change is only 1%. Therefore, in the prediction process, the mass of the lander is regarded as unchanged and is taken to be the average mass over the rapid adjustment phase. First, the average mass is predicted as follows.
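The figures quoted in this example can be checked with a few lines of arithmetic. The sketch below assumes the usual relation between thrust, specific impulse, and mass flow (mass flow equals thrust divided by specific impulse times standard gravity) and a constant full-thrust burn; the function names are hypothetical.

```python
G0 = 9.80665  # standard gravity, m/s^2


def fuel_used(thrust, isp, duration):
    """Propellant mass (kg) burned at constant thrust (N) over duration (s)."""
    return thrust * duration / (isp * G0)


def average_mass(m_start, thrust, isp, duration):
    """Mean of the start and end masses for a constant-thrust burn."""
    return m_start - 0.5 * fuel_used(thrust, isp, duration)


burned = fuel_used(7500.0, 309.0, 20.0)          # about 49.5 kg
fraction = burned / 5000.0                       # about 1% of the lander mass
m_avg = average_mass(5000.0, 7500.0, 309.0, 20.0)
```

The result, roughly 49.5 kg or about 1% of a 5000 kg lander, supports treating the mass as constant at its average value during the short rapid adjustment phase.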
As the mass satisfies:
the final mass
can be obtained as:
and the average mass is:
Therefore, the predicted position and velocity changes of the lander are:
If the nominal target states without any offset are
and
, the position and velocity deviations caused by the rapid adjustment are:
In the proposed algorithm, the target bias method is used to adjust the target-state deviation; that is, the new landing target is set as:
where
and
are feedback correction coefficients for the target position and target velocity, respectively.
According to the new landing targets obtained previously, convex optimization is performed again to obtain the optimal trajectory. After the optimal trajectory is obtained, the position and velocity deviations are recalculated. This process is repeated until the final position and velocity deviations are less than specified thresholds.
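The iteration described above can be sketched as a simple fixed-point loop. This is a hedged illustration, not the paper's implementation: the convex re-planning and the open-loop propagation of the rapid adjustment phase are replaced by a hypothetical stand-in function, `predict_final`, which returns the state actually reached for a given planning target. The gains play the role of the feedback correction coefficients.

```python
def adapt_target(desired, predict_final, gains, tol=1e-3, max_iter=20):
    """Offset the planning target until the predicted final state matches `desired`.

    desired       : list of desired final state components (position, velocity)
    predict_final : callable(target) -> reached state (stand-in for re-planning
                    plus rapid-adjustment propagation; hypothetical)
    gains         : per-component feedback correction coefficients
    """
    target = list(desired)
    for iteration in range(max_iter):
        reached = predict_final(target)
        deviation = [reached[i] - desired[i] for i in range(len(desired))]
        if max(abs(d) for d in deviation) < tol:
            return target, iteration
        # Shift the target against the deviation (target bias method).
        target = [target[i] - gains[i] * deviation[i] for i in range(len(desired))]
    raise RuntimeError("target adaption did not converge")


# Toy stand-in: the rapid adjustment adds a bias plus a small target-dependent term.
def toy_predict(target):
    bias = [30.0, 2.0]
    return [target[i] + bias[i] + 0.05 * target[i] for i in range(2)]


desired_state = [2400.0, -50.0]   # altitude (m) and vertical velocity (m/s)
new_target, n_iter = adapt_target(desired_state, toy_predict, [1.0, 1.0])
```

With unit gains, the toy deviation shrinks by a constant factor at each pass, which mirrors the repeat-until-small-enough loop in the text; in practice the gains would be tuned to trade convergence speed against the number of convex optimizations.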
According to the preceding discussion, the closed-loop guidance system is shown in Figure 4. Its main guidance loop compares the nominal trajectory generated by convex optimization with the actual flight trajectory and applies position and velocity feedback on top of the nominal guidance command, so that the lander does not leave the nominal trajectory in the presence of disturbance forces and output uncertainty and successfully reaches the target state.
Simultaneously, the parameter-adaptive module uses the information measured by the accelerometer and the output data of the thruster to estimate the online parameters through a nonlinear optimal observer. The estimated parameters converge to the real value with the accumulation of acceleration measurement data. To avoid violent oscillation of the estimated parameters, the estimated parameters are smoothed by the sliding-window integral smoother, and then the parameters are transmitted to the convex optimization system to generate a new nominal trajectory. In the subsequent guidance process, the lander uses a new nominal trajectory for closed-loop feedback guidance. This module improves the adaptability of the guidance to mass and specific impulse uncertainty.
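The smoothing step can be illustrated with a sliding-window average. The paper refers to a sliding-window integral smoother without giving its form in this section, so the class below is a plain moving average used as a stand-in; the class name and window length are assumptions.

```python
from collections import deque


class SlidingWindowSmoother:
    """Running mean over the most recent `window` parameter estimates."""

    def __init__(self, window):
        self.samples = deque(maxlen=window)
        self.total = 0.0

    def update(self, estimate):
        """Add a new raw estimate and return the smoothed value."""
        if len(self.samples) == self.samples.maxlen:
            self.total -= self.samples[0]   # drop the sample about to be evicted
        self.samples.append(estimate)
        self.total += estimate
        return self.total / len(self.samples)


# Usage: smooth a noisy sequence of specific impulse estimates (s).
smoother = SlidingWindowSmoother(window=3)
smoothed = [smoother.update(x) for x in [309.0, 312.0, 306.0, 309.0]]
```

Only the smoothed value would be passed on to the re-planning step, which keeps a single noisy observer output from triggering a large change in the nominal trajectory.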
In addition, the target-adaptive module first performs a preliminary convex optimization. From the resulting trajectory, the final state of the main braking phase is obtained, namely, the state of the lander at the start of the rapid adjustment phase. It is assumed that the attitude and thrust of the lander then transition linearly and smoothly to the target attitude and thrust. This transition causes position and velocity deviations from the target state of the lander. Therefore, the target of the lander is offset according to Formula (44), and convex optimization is performed again until the final state deviation of the lander is less than the given value. The trajectory formed at this time is the nominal trajectory, and the lander conducts closed-loop feedback guidance according to this trajectory. This module ensures that the lander satisfies the terminal attitude and thrust constraints.