1. Introduction
In recent years, demand has grown for lunar rovers whose designs suit the extreme environmental conditions of the Moon and whose hardware is economically feasible. Four-wheeled platforms have received significant attention as promising candidates because they are compact enough to fit the limited space available in launch vehicles while satisfying critical physical requirements [1,2,3].
Lunar rovers are designed as four-wheeled two-steering mobile robots with a shielded body to allow them to tolerate the extreme environmental conditions experienced on Moon exploration missions, including the absence of air and microgravity [
4]. They must also meet the physical requirements of the Korean launch vehicles and landing modules in which they are loaded. The rover’s mechanical parts must also ensure the safety of their integrated electrical components, protecting them from the extreme environmental conditions of the Moon. For these reasons, a unique mechanism called modified explicit steering is used for the rover’s steering system. When all of the steering motors work properly, steering control can be achieved by approximating a kinematic model of the target rover, considering its slow-moving speed, and the speed of telecommunication between the Moon and the Earth.
This method, however, has a critical limitation: a failure in even a single steering motor makes it impossible to establish a mathematical dynamics model that is precise enough for robot control, because the resulting recursive effects are difficult to formulate. More specifically, such a failure leads to slipping between the wheel and the ground, which causes the robot body to slip. This in turn leads to further slipping between the other wheels and the ground.
In addition, existing four-wheeled mobile robots with non-holonomic characteristics exhibit low mobility on the Moon’s surface, where wheel slip is maximized due to microgravity and soft soil [
5]. Furthermore, a failure in the steering motor of a rover may affect its wheel alignment. Even when only one of the four wheels becomes misaligned, the overall locomotion of the rover will be significantly affected by slippage [
3]. However, most studies on mathematical dynamics models concerning the interaction between the Moon’s surface and the rovers’ wheels have been based on a single wheel–terrain interaction and, thus, the results do not necessarily guarantee that the same degree of controllability would be achieved in four-wheeled systems.
Meanwhile, there has been sustained interest in fault tolerance and mobility health for planetary exploration rovers. However, many of these studies focus mainly on diagnosis, that is, on detecting and classifying mobility faults or sensing errors [6]. Most of the remaining work concerns the implementation of fault protection sequences that rely on human decisions through a ground-in-the-loop solution [7]. Recently, attention to self-reliant rovers (SRR) has also been rising, because advances in onboard computational autonomy make unmanned rover activities more efficient [8,9,10,11]. According to a NASA report, multiple hardware failures were recorded during the first seven years after the Mars Science Laboratory (MSL) exploration rover landed on Mars in 2012. The report highlights that one or more failures of steering or driving actuators are classified as primary risks to the rover’s mobility [12]. Another report identifies an increased scope of fault response as one of the remaining outstanding challenges in the field and emphasizes the importance of autonomous recovery from mobility failures for planetary rovers [
13]. In response to these challenges, some works study the development of hardware design toolkits for fault-tolerant systems [
14], while others have focused on software architecture based on control allocation to address actuator failures [
15]. However, these compensation strategies for actuator failure may not fully address the rover’s motion planning.
In recent studies, the focus on safety in motion planning of planetary rovers has been explored from various perspectives. Some researchers consider reliable motion planning to be an energy-efficiency issue [
16,
17,
18,
19,
20,
21,
22], while others focus on the development of classification and navigation technologies for avoiding hazardous terrain such as deformable or high-slip areas (Egan; Blacker, 2021; Ugur, 2021; Tang, 2022; Endo, 2022). Additionally, some works concentrate on the rover’s path planning itself, employing classical or learning-based methods [
23,
24,
25]. Simply put, these studies have primarily centered around traversability analysis and the prediction of slippage on rough terrain in the motion planning of rovers.
As part of these efforts, various studies aimed at predicting slippage on rough terrain have been undertaken. Some previous studies focused on estimating the exact degree of slippage [
26,
27,
28,
29]. Recent works aim to model wheel–terrain dynamics and calculate traction forces [
30,
31,
32,
33]. Additionally, some studies utilize machine learning or reinforcement learning to model the dynamics through the neural networks [
34,
35,
36,
37]. However, the common aim of these studies is to develop a more sophisticated wheel–terrain contact model for rough or soft terrain where slippage is likely to occur, in order to increase the controllability of the planetary rover during normal operation without any failure. In addition, accurately predicting rover slippage is a challenging task, especially on sandy slopes, and the accuracy of the resulting wheel–terrain modeling is uncertain.
The uncertainty associated with actuator failures presents a critical challenge for wheeled rovers, as a loss of mobility can end the mission. Given the inaccessibility of planetary rovers once they are sent into space, a contingency plan for failure scenarios is essential to complete the mission without loss of mobility. Despite its importance, there is little research on safe motion planning strategies for potential mobility failures of planetary rovers.
Therefore, we propose a method for failure-safe motion planning that maintains mobility even in the event of motor failure. As previously mentioned, machine learning and reinforcement learning methods have been increasingly applied in various technology fields relevant to planetary rovers [
38,
39]. Thus, in this paper, we present a new reinforcement-learning-based approach for safe motion planning in the presence of actuator failure. The method uses only the kinematic model derived for the case in which all motors function correctly and incorporates no dynamic model, such as a wheel–terrain model, even though the kinematic model can no longer be formulated accurately once a steering motor fails and slip becomes unavoidable.
In this paper, a model-free learning approach is proposed for the failure-safe motion planning of a lunar rover. The lunar rover is a four-wheeled, two-steered mobile robot that is subject to loss of mobility in the case of steering motor failure due to the limitation of design and steering mechanism. Reinforcement learning is used to compensate for the unmodeled dynamics effects in the kinematic model in local motion planning by fitting parameters to environments for motor failure. As will be mentioned again later in the paper, the applicability of various model-free learning methods was evaluated and compared with respect to detailed classification criteria (including the state transition probability, the presence of target networks, the capability of calculating high degrees of freedom, and the required time for learning) [
40,
41,
42,
43,
44,
45], while also considering the control inputs of rovers performing as mobile robots with non-holonomic characteristics [
46]. As a result, the Deep Q-learning Network (DQN) was selected as the final approach.
The performance of failure-safe motion planning using DQN was verified on a path-planning task for Moon exploration missions. The ultimate purpose of a rover is to move from a starting point to a target point. Thus, the primary performance criterion for any rover controller is the success rate of reaching the target point. Because the reference path is linear, the robot can move along the optimal, shortest trajectory if it approaches the reference path as fast as possible and then follows the path until it reaches the target point. Thus, the second performance criterion is defined as the ability to approach the optimum path. The third performance criterion is the path-tracking ability, which verifies whether the actual path of the rover generated by the controller matches the given reference path. Two sets of experiments were performed. In the first experiment, the applicability and performance of the failure-safe algorithm were tested both in a normal state, in which all motors were working properly, and in a failure state, in which one of the steering motors was out of order. In the second experiment, the reliability of the reinforcement learning-based motion planning method was evaluated, and the results were compared with those obtained from a conventional control method.
The contributions of this paper are:
A kinematic model for controlling a four-wheeled Lunar Rover using a modified explicit steering mechanism is presented.
A new method for motion planning is proposed by combining the reinforcement learning approach with the robot’s kinematic model without the requirement for dynamics modeling such as a wheel–terrain interaction model.
A failure-safe algorithm is proposed as a safe motion planning strategy to address critical mobility limitations in four-wheeled rovers in case of steering motor failure. The algorithm is designed to ensure the execution of the rover’s mission even in the event of motor failure by scaling the above motion planning method.
The proposed failure-safe algorithm is validated through simulations on high-slip flat terrain resulting from steering motor failure, serving as a preliminary study prior to testing on rough or soft terrain in future work. Additionally, this work has practical value because the weights of the deep neural network are pre-optimized through simulation and can be used for fine-tuning in subsequent physical experiments.
This work represents a meaningful prior study on an autonomous recovery mission execution strategy for SRR with mobility health management and maintenance issues.
This paper is composed as follows.
Section 2 provides a description of a four-wheeled two-steering lunar rover equipped with the modified explicit-steering mechanism.
Section 3 describes the failure-safe motion planning of the rover using DQN.
Section 4 presents experimental results obtained from two different sets of experiments. In the first experiment, the applicability of the failure-safe algorithm was evaluated when there was a failure in the rover’s steering motor, especially with respect to the success rate of reaching the target point, the ability to approach the optimum path, and the path-tracking ability. In the second experiment, the reliability of the proposed motion planning method was compared to that of a conventional control method.
Section 5 presents the conclusions of this paper, along with directions for future studies.
2. Hardware Development for a 4-Wheeled 2-Steering Lunar Rover
This section describes the rover designed in this work. The design fulfills the conditions required for Moon exploration missions and meets the minimum operating requirements, but it also has a hard-to-overcome limitation in its steering mechanism: a failure in one of the two steering motors might result in the complete loss of steering control.
2.1. Development Requirements
Developing lunar rover hardware is generally limited by the physical constraints imposed by the requirements of Moon exploration missions. The maximum weight of a rover is 20 kg because of the limited loading capacity of Korean launch vehicles. The limited space available in lunar landing modules and the extreme environmental conditions of the Moon also critically affect the design of rovers. On Moon exploration missions, objects can be directly exposed to radically changing temperatures, from −153 °C to −170 °C [
47], as well as radiation, fine dust, etc., and thus the reliability of the rover’s actuation system should be considered the primary concern in its design.
2.2. Design and Mechanisms
All of the actuators and electronic components of rovers are installed inside a warm box to ensure actuation reliability. Only the mechanical components are exposed to the external environment. The torque generated by actuators inside the box is transmitted to the external wheels via a power transmission mechanism. This actuator-shielded body design not only ensures the overall safety and reliability of the system but also helps reduce its weight and volume.
In lunar rover design, the number of wheels must be as small as possible due to the limited space available in lunar landing modules. Given this constraint, a 4-wheeled system was proposed in this paper.
Figure 1 presents the steering mechanisms that are generally used in 4-wheeled systems. Ackermann steering [
48,
49] and skid steering [
50] have been widely used as steering mechanisms for 4-wheeled mobile platforms. While the number of active actuators is small, which is an advantage, these two methods are considered less suitable for the Moon’s environment, where actuation dynamics are more pronounced due to microgravity and the soft soil on the surface. In contrast, general explicit steering requires having the same number of actuators as wheels, and thus explicit steering-based robots tend to be large and heavy, which is not consistent with the ultimate direction of lunar rover hardware development.
As an alternative, a modified explicit-steering system was devised. This system is relatively less affected by slippage, while still satisfying all the requirements of Moon exploration missions. In this system, a single actuator is coupled to the front and rear wheels on one side through gears and two rockers. As a result, the alignment of the four wheels can be controlled using the internal torque generated by only two steering motors, and these motors are located inside the shielded body of the rover. This configuration is suitable for an actuator-shielded structure.
This modified explicit steering design was finally selected as the steering mechanism for the 4-wheeled lunar rover because it was suitable for the Moon’s extreme environmental conditions while satisfying all physical constraints [
4].
At the same time, a failure-safe design is needed to ensure that the rover can still move, even in case of failure. If the driving motor fails, the rover can end up losing mobility in the forward direction and, thus, will be unable to complete its mission. Therefore, although there is a trade-off relationship between the number of actuators and the weight and volume limits, additional redundancy must be built into the hardware design, especially the driving motors [
51]. Accordingly, a dual-motor system was employed in the design of the driving mechanism of the rover [
4]. In this system, the coupled set of front and rear wheels on each side is driven by two motors, for a total of four actuators. As shown in
Figure 2a, even if one driving motor of the robot breaks down, the other motor will continue to rotate while also bearing the additional load resulting from the failed motor, which provides a hardware backup.
However, despite its various advantages (lighter weight and less affected by wheel slip), the modified explicit-steering system has the disadvantage of increased uncertainty in steering operations even when only one of the steering motors malfunctions.
Figure 2b presents two situations, one in which the right steering motor works properly and one where it breaks down, respectively.
Figure 2c illustrates the situation where the steering control of the rover is lost due to the failure of the right steering motor. Due to the addition of actuators in the driving system, there is a physical limit to the degree of hardware redundancy in the steering system. Therefore, a software-level solution must be prepared to respond to a failure in the steering motor, and this will be described in more detail in
Section 3.
2.3. Kinematic Modeling
It is very difficult to formulate precise kinematic models for non-holonomic mobile robots with special structural designs, such as lunar rovers [
52]. Therefore, the kinematics of lunar rovers need to be approximated, and this requires the following three assumptions [
53]. First, from a control perspective, given that explicit steering can minimize slip between the wheels and the terrain, it is assumed that the effect of both lateral and longitudinal slips is insignificant.
Second, given that the rover moves at a very low speed of about 1 cm/s, lateral slip can be considered insignificant compared to the longitudinal rolling of each wheel. Accordingly, a non-lateral slip condition can be assumed, as shown in
Figure 3b. Under this condition, the alignment of each wheel corresponds to the tangential direction of the gyration trajectory of the robot, and thus the rotation center point of each wheel ends up being aligned with the instant center of gyration.
Therefore, as shown in Figure 4, the distance between this point and the center point of the rover body corresponds to the radius of gyration of the rover, and thus the alignment of each wheel can be expressed as in Equations (1) and (2), where L and W refer to the length and width of the robot body, respectively.
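As a rough illustration of the non-lateral-slip geometry, the following sketch computes wheel alignment angles from a given radius of gyration. Equations (1) and (2) are not reproduced in this text, so the formula used here, atan((L/2)/(R ∓ W/2)), is the textbook construction for a symmetric four-wheel layout whose wheels all point tangentially to circles around a common instant center. It is an assumption and may differ from the paper's exact expressions. The function name and the body length of 0.4 m are hypothetical; the 0.3 m width follows the body width quoted in Section 4.2.

```python
import math

def wheel_alignment_angles(R, L, W):
    """Return (inner, outer) wheel alignment magnitudes in radians.

    R: radius of gyration, measured from the body center to the instant
       center, which is assumed to lie on the lateral axis of the body.
    L, W: length and width of the robot body (same unit as R).
    """
    if abs(R) <= W / 2.0:
        raise ValueError("instant center must lie outside the wheel track")
    half_l, half_w = L / 2.0, W / 2.0
    # Each wheel is perpendicular to the line joining it to the instant center.
    inner = math.atan2(half_l, abs(R) - half_w)
    outer = math.atan2(half_l, abs(R) + half_w)
    return inner, outer

# Hypothetical dimensions in meters.
inner, outer = wheel_alignment_angles(R=1.0, L=0.4, W=0.3)
```

Under this construction the wheels on the inside of the turn are steered more sharply than those on the outside, and both angles tend to zero as the radius of gyration grows.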
As can be seen in
Figure 5a, each wheel’s alignment
is controlled by both the lever’s displacement
and
and, therefore, the relationship can be expressed as a proportional expression with respect to the revolution of the bevel gear ratio
. Simply put, in the 2D Cartesian frame, the relationship between the steering motor input
and the radius of gyration
of the robot can be expressed as in Equation (3).
The last assumption for the approximation of the kinematic model is the pure-rolling condition. This assumption is made based on the fact that the effect of the longitudinal slip of each wheel is also very small due to the slow movement of the rover. If this assumption holds, the relationship
is satisfied, as shown in
Figure 3c. Accordingly, in the 2D Cartesian frame, the relationship between the linear velocity
and the dual-motor system’s output rotation speed
can be determined.
As shown in
Figure 4, the radius of gyration
of each wheel can be defined as in Equation (4), given its symmetric steering structure, where the constant
refers to the steering offset due to the physical limit of wheel alignment.
By substituting these relationships into Equation (2), the following equation can be obtained:
Each wheel’s rolling speed in Equation (5) can be expressed as a relationship between the radius connecting each wheel and the rotation center point and the angular velocity of the entire robot trajectory. In addition, is also associated with the steering velocity of wheel alignment , where refers to the radius of each wheel.
As shown in
Figure 5b,
can be expressed as
, a proportional expression with respect to the composite gear ratio
, and thus Equation (6) can be obtained as follows.
Finally, given the identical relationship between Equations (5) and (6), a simplified expression, can be obtained as follows.
Overall, in the 2D Cartesian frame, the angular velocity of the robot is proportional to the dual motor system’s output rotation speed , and accordingly, the linear velocity of the robot can also be controlled by the relationship .
These results indicate that the kinematic model of the rover can be mathematically approximated in the condition where all motors are working properly [
4]. Rover control can be implemented using this approach. More specifically, the forward kinematics can be calculated with respect to motor control inputs
using the relationships defined in Equations (3) and (7). As a result, the radius of gyration
and angular velocity
of the rover can be obtained in the 2D Cartesian frame, and the rover’s status
can, accordingly, be controlled, as shown in
Figure 3a.
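The forward kinematics described above can be sketched as a planar pose update along a circular arc. This is a minimal illustration under the no-slip assumptions, not the paper's implementation: the function name, the sign convention (positive radius means a left turn), and the use of linear speed and radius as inputs are assumptions, with the angular velocity taken as the ratio of linear speed to radius of gyration.

```python
import math

def step_pose(x, y, theta, v, R, dt):
    """Advance the planar pose (x, y, theta) for dt seconds.

    v: linear speed of the body center.
    R: signed radius of gyration (math.inf means straight motion,
       a positive value is assumed to mean a left turn).
    """
    if math.isinf(R):
        return (x + v * dt * math.cos(theta),
                y + v * dt * math.sin(theta),
                theta)
    omega = v / R                  # angular velocity about the instant center
    dtheta = omega * dt
    # Exact integration along the arc around the instant center.
    x_new = x + R * (math.sin(theta + dtheta) - math.sin(theta))
    y_new = y - R * (math.cos(theta + dtheta) - math.cos(theta))
    return x_new, y_new, theta + dtheta

# Quarter turn at 1 cm/s around a hypothetical 1 m radius.
pose = step_pose(0.0, 0.0, 0.0, v=0.01, R=1.0, dt=(math.pi / 2) / 0.01)
```

Once a steering motor fails, the slip described in the following paragraphs makes the true motion deviate from this idealized arc, which is the gap the learning-based planner is meant to cover.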
However, a failure in even one of the steering motors leads to a situation in which the locomotion of the rover is inevitably affected by the slip of each wheel, and this means that all the assumptions made above no longer hold, and accordingly, the approximated kinematic model cannot be used. Therefore, for rover control, a new dynamic model that considers the resulting force generated by the slip of each wheel must be developed. As shown in
Figure 6 and
Figure 7, however, any slip occurring between one wheel of a 4-wheeled mobile robot and the terrain changes the position of the center point of the robot, and this, in turn, affects the slip between each of the other wheels and the terrain.
A wide range of studies have been performed to derive mathematical models that represent the contact dynamics between a single wheel and the terrain, but more complex dynamics between the four wheels as a whole and the terrain have not been sufficiently studied [
54]. Going forward, further research is also needed in the area of whole dynamics modeling, which focuses on the slip between the body of 4-wheeled mobile robots and the terrain [
55].
It is also very difficult to formulate contact dynamics with respect to the slip between each wheel and the terrain in an environment where the terrain conditions are constantly changing over time. Furthermore, even if such modeling were possible, the resulting computation would not be suitable for the real-time control of robots on unstructured terrain such as the Moon’s surface.
4. Experiments
4.1. System Setup
The deep neural network model DQN was implemented using Keras [
64], a deep learning library. The failure-safe algorithm was developed using OpenAI Gym [
65], an algorithm development library for reinforcement learning. The entire learning process for motion planning was performed based on an interaction with a test environment established via ROS Gazebo simulations [
66].
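For readers unfamiliar with DQN, the sketch below computes the standard temporal-difference target toward which the Q-network is regressed. It is written in plain Python, not Keras, and the function name, discount factor, and sample values are hypothetical; the network architecture and hyperparameters used in the paper are not given in this section.

```python
def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Standard DQN target: y = r + gamma * max_a' Q_target(s', a').

    next_q_values[i] holds the target network's Q-values for every
    action in the next state of transition i. Terminal transitions
    (dones[i] is True) use the reward alone.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets

# Two hypothetical transitions with three discrete actions each.
targets = dqn_targets(
    rewards=[1.0, -10.0],
    next_q_values=[[0.5, 2.0, 1.0], [3.0, 3.0, 3.0]],
    dones=[False, True],
    gamma=0.9,
)
```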
A 3D model of the full-scale lunar rover built in this work was employed. As shown in
Figure 16, the model was given the same mechanical properties as the actual rover (mass, inertia, gear ratio, etc.) so that the Gazebo simulation would closely reproduce the behavior of the physical system. In addition, the static and kinetic friction coefficients of the actual testbed floor used for the rover driving experiments were measured and applied as simulation environment variables. This replicates the real-life slip between the rover’s wheels and the ground, so that the simulation results transfer more closely to the actual rover. The wheel–terrain interaction is simplified to a point-contact model because, at the rover’s slow speed, the tips of the wheel’s grousers contact the surface one at a time as the wheel rolls. The uncertainty in this simplified model is compensated for by reinforcement learning, as the robot’s wheel model interacts with the testbed environment through the simulator’s dynamics engine.
Future work will be conducted on physical robots and testbeds to achieve the sim-to-real goal. Reinforcement learning algorithms by nature require a large number of trial-and-error episodes, and it is challenging to repeat these consistently under the same conditions in physical experiments. To address this issue, we first optimize the weights of the deep neural network through simulation. The pre-optimized weights are then used for fine-tuning in physical experiments.
The control system of the actual robot was developed in a ROS Melodic [67] and Ubuntu 16.04 environment, as shown in
Figure 17a. In this environment, detailed features, including motor control, data communication between the hardware interface and the embedded PC, and localization using RGB cameras (RealSense D435) and AR Marker, were implemented. The control system of the simulated robot model is presented in
Figure 17b. The actual full-scale robot and the testbed were simulated as 3D models using Gazebo simulations [
68]. A joint controller package with the same gear ratio as that of the actual robot was employed in the simulation model, replacing the encoder and motor driver. The robot’s localization data were obtained from the simulated testbed environment instead of using RGB cameras and an AR Marker. Noise was intentionally added to these data.
4.2. Parameter Setup
The starting point and the target point were selected considering the actual dimensions of the testbed for the rover moving experiments. Boundary conditions were also needed to determine whether each training episode was a success or a failure. If the agent took an action and the resulting values of the five observation variables describing the state of the rover were within these boundaries, the episode was considered to be still ongoing. Accordingly, rewards were given based on the three criteria, and the next action was then executed. Conversely, when the state values resulting from an action of the agent were no longer within the boundaries, the episode was immediately terminated and then evaluated as either a success or a failure based on the final state values and boundary conditions. If even one of the five observation variables was outside the boundary limit, the episode was considered a failure, and a penalty was deducted from the corresponding cumulative reward. Then, the state of the robot was reset, and a new episode began.
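The episode bookkeeping described above can be sketched as a small status check. This is an illustrative sketch only: the five observation variables are defined in Section 3 and are not named here, so the dictionary keys are placeholders, and checking the success condition before the boundary test is an assumption about ordering.

```python
def episode_status(observation, bounds, distance_to_target, success_radius):
    """Classify the current step as 'ongoing', 'success', or 'failure'.

    observation: dict mapping observation-variable name -> current value.
    bounds: dict mapping the same names -> (low, high) limits.
    """
    if distance_to_target < success_radius:
        return "success"
    for name, value in observation.items():
        low, high = bounds[name]
        # A single out-of-bounds variable ends the episode as a failure.
        if not (low <= value <= high):
            return "failure"
    return "ongoing"

# Placeholder variables and limits (hypothetical values, in meters and radians).
bounds = {"lateral_offset": (-0.5, 0.5), "heading": (-1.0, 1.0)}
status = episode_status({"lateral_offset": 0.1, "heading": 0.2},
                        bounds, distance_to_target=1.5, success_radius=0.2)
```

In a Gym-style environment, a check like this would typically decide the termination flag returned by each step, with the reset following any non-ongoing status.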
Finally, the three reward thresholds needed to be adjusted. These parameters are directly related to the applied reward strategy and critically affect the motion planning of the agent. Smaller thresholds lengthen training but allow the robot to learn more precise motions.
Figure 18 presents the setting margins for each criterion and the boundary conditions used as the reset criterion. As discussed in 3.2.2 above, the first threshold determines how closely the agent must approach the reference path. If the agent’s distance from the path is within this range, a greater reward is given, and the size of the reward depends on the degree of closeness. Given that the width of the rover body is 300 mm, this threshold was set to 100 mm. The second threshold represents how faithfully the agent follows the reference path. If the agent’s heading deviation is within this range, a greater reward is given. In this paper, this margin angle was set to ±10° to allow the robot to learn to perform motions while staying within the reference path. The third threshold is the one based on which the agent is considered to have reached the target point. If the agent’s final distance from the target is less than this threshold, the corresponding episode is finished and is considered a success. This success radius was set to 200 mm, about half of the total width of the robot, including the wheels.
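To make the role of the three thresholds concrete, the following sketch shows one possible shaped reward. Only the threshold values (100 mm, ±10°, 200 mm) come from the text; the function name, the argument names, and all reward magnitudes are hypothetical, and the actual reward strategy is the one defined in 3.2.2.

```python
def shaped_reward(path_offset_mm, heading_error_deg, target_distance_mm,
                  offset_threshold_mm=100.0, heading_threshold_deg=10.0,
                  success_radius_mm=200.0):
    """Return (reward, done) for one step under the three criteria."""
    if target_distance_mm < success_radius_mm:
        return 100.0, True             # hypothetical terminal bonus
    reward = -1.0                      # hypothetical per-step cost
    if abs(path_offset_mm) < offset_threshold_mm:
        # Closer to the reference path earns more, up to +1.
        reward += 1.0 - abs(path_offset_mm) / offset_threshold_mm
    if abs(heading_error_deg) < heading_threshold_deg:
        reward += 0.5                  # hypothetical heading bonus
    return reward, False

on_path = shaped_reward(0.0, 0.0, 1000.0)
off_heading = shaped_reward(50.0, 20.0, 1000.0)
arrived = shaped_reward(500.0, 0.0, 150.0)
```

Tightening any of the thresholds shrinks the region in which the agent collects the extra reward, which is consistent with the longer training times noted above.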
As described in 3.2.2, the robot actions in the first experiment were composed of three discretized locomotion options.
Figure 19a illustrates how the three inputs of the forward kinematics that are fixed at
,
, and
are translated into
,
, and
, respectively, i.e., output actions in a normal state.
Figure 19b shows how these three output motions differ in the case of a failure in the right steering motor of the robot, even though the same inputs as before are entered. The figure demonstrates how the red paths deviate from the original routes observed in a normal state when the robot makes turning actions. This is because the wheel slippage caused by the malfunction of the steering motor significantly affects the moving performance of the robot.
In the second experiment, the same test conditions were applied to compare the two different methodologies. Thus, the robot’s actions were constrained.
Figure 20a shows that numerous action options may be produced as outputs of the forward kinematics depending on the angular velocity
. Only three output motions were allowed, as in the previous experiment, but the
value was continuous in the PID Path Follower, and this led to a difference in the moving speed of the rover when making turning actions. In order to ensure the same test conditions as in the first experiment, the radius of gyration
was fixed at
, and the time per action step was set to
. As can be seen in
Figure 20b, the gyration radius of the rover was the same in all experiments.
Thus, although in the second experiment only three discretized steering motions were allowed in gyration, there can be numerous action options as outputs depending on the angular velocity. The yellow path is the trajectory of the rover in the case of a failure in the right steering motor, which differs from that in a normal state. When the control cycle is the same, this approach is more advantageous than the DQN in motion planning, thanks to its higher DOF with respect to the angular velocity. Except for that, all other test conditions were the same as in the first experiment.
4.3. Experimental Details and Results
The applicability of the DQN as a failure-safe motion planner for lunar rover control was verified by performing two sets of experiments in a virtual environment.
In the first experiment, the applicability of the failure-safe algorithm was evaluated, especially with respect to the mechanical unit of the lunar rover equipped with special steering mechanisms. The mission execution performance of the rover was evaluated based on three performance criteria, the success rate of reaching the target point, the ability to approach the optimum path, and the path-tracking ability, both in a normal state and when there was a failure in the right steering motor. In the normal state
training episodes were performed, and 1000 were performed for the steering-motor failure case, as shown in
Figure 21a. Fifty test episodes were performed for each of the normal and failure conditions, as shown in
Figure 21b. The actual paths of the rover and the perpendicular position error from the optimum reference trajectory were repeatedly measured, and the averaged results were obtained, as shown in
Figure 22.
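The perpendicular position error used in these measurements can be computed as the distance from each recorded rover position to the straight reference line. The sketch below is a minimal illustration with hypothetical function names and sample coordinates; how the paper samples and averages the trajectories is not specified beyond the description above.

```python
import math

def perpendicular_error(point, start, target):
    """Unsigned distance from point to the infinite line start -> target."""
    (px, py), (sx, sy), (tx, ty) = point, start, target
    dx, dy = tx - sx, ty - sy
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("start and target must differ")
    # Magnitude of the 2D cross product divided by the line length.
    return abs(dx * (py - sy) - dy * (px - sx)) / length

def mean_perpendicular_error(paths, start, target):
    """Average the error over every sample of every test episode."""
    errors = [perpendicular_error(p, start, target)
              for path in paths for p in path]
    return sum(errors) / len(errors)

# Two short hypothetical episodes along a reference line on the x-axis.
paths = [[(0.0, 0.0), (1.0, 0.2)], [(1.0, -0.4), (2.0, 0.0)]]
mean_error = mean_perpendicular_error(paths, start=(0.0, 0.0), target=(2.0, 0.0))
```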
In the second experiment, the reliability of the failure-safe motion planning algorithm using the DQN was verified, especially with respect to linear paths given for each sequence of the Moon exploration missions. A PID controller-based tracking approach, among various conventional control methodologies, was employed as a comparison group, and their performance was assessed and compared based on three performance criteria. A PID controller tuned with optimal values was used, and 50 test episodes were performed in a normal state, while 50 test episodes were performed in the case of a failure in the steering motor in the same way as in the first experiment, as shown in
Figure 23. Similarly, the actual paths of the rover and the perpendicular position errors between the actual position of the rover and the optimum reference trajectory were repeatedly measured and averaged, as shown in
Figure 24.
4.4. Applicability of the Failure-Safe Algorithm
The results demonstrated that the DQN-based motion planning algorithm developed in this paper was applicable as a failure-safe solution for rover control in case of a failure in the steering motor. The detailed results are presented in
Figure 25.
The averaged actual paths of the rover in the first experiment showed no significant difference between the normal and failure states in the ability to approach the optimum path. In both cases, the coordinates at which the rover first reached the linear path were similar to each other , .
The averaged perpendicular position errors between the actual position of the rover and the optimum reference trajectory were in the normal state and in the failure state, respectively. Despite the steering motor malfunction, the position error was less than , the threshold . These results confirmed that the path-tracking performance of the rover was sufficiently high for the given mission.
Finally, the success rate of reaching the target point was tested over 50 runs in each of the normal and failure states. The last row of
Figure 25 shows the details. The success rate in the failure state was about
lower than that in the normal state. Although fewer runs reached the target point because of the steering motor failure, the overall success rate remained high, at
. In addition, the position error between the target point and the average final position of the rover was
in the failure state, but that figure was less than
, the threshold
, below which the rover was considered to have reached the target point. These results showed that the success rate of reaching the target point was within the allowable range.
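The success criterion described above, in which a run counts as successful when the final position error is below a threshold, can be sketched as follows. The threshold value and the sample positions in any usage are placeholders, and the function names are hypothetical.

```python
import math


def reached_target(final_pos, target, threshold):
    """A run succeeds when the Euclidean distance between the final
    position and the target point is strictly below the threshold."""
    return math.dist(final_pos, target) < threshold


def success_rate(final_positions, target, threshold):
    """Fraction of test runs whose final position satisfies the criterion."""
    hits = sum(reached_target(p, target, threshold) for p in final_positions)
    return hits / len(final_positions)
```

Calling `success_rate` once on the final positions from the normal state and once on those from the failure state gives the two rates being compared.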
4.5. Reliability Comparison with a Conventional Control Method
The second experiment was performed to verify the reliability of the DQN-based motion planner by comparing its results with those obtained from a conventional control method. Among various linear path-tracking methods, the PID Path Follower was chosen as the comparison group because of its reliable control performance for linear references, such as constant inputs.
The control loop of the PID Path Follower used in the experiment is presented in
Figure 26. The controller receives data on the position error (
) between the robot’s current position and the reference trajectory as a control input from the robot state calibrator. Subsequently, the controller adjusts the robot’s steering angular velocity
on a 2D Cartesian coordinate system to minimize this position error. The PID gains were set as follows:
,
I, and
. These values were determined by repeated parameter tuning so that the four-wheeled two-steering lunar rover would exhibit its best performance as a mobile platform in the normal state.
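The control loop described above can be illustrated with a minimal discrete-time PID sketch that maps the position error to a steering angular velocity command. The class name, the gains, the time step, and the sign convention are assumptions for illustration; the tuned gains used in the experiment are not reproduced here.

```python
class PIDPathFollower:
    """Discrete PID: position error in, steering angular velocity command out."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first sample

    def step(self, error):
        # Accumulate the integral term with a rectangular rule.
        self.integral += error * self.dt
        # Backward-difference derivative; zero until two samples exist.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # Assumed sign convention: a positive error yields a positive command.
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

With placeholder gains, `PIDPathFollower(2.0, 0.5, 0.1, dt=0.1).step(e)` would be called once per control step with the latest position error `e`. Because the gains are fixed after tuning in the normal state, the same parameters are applied unchanged when a steering motor fails.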
The results of the second experiment showed that, in the normal state, the PID-based Path Follower with optimized tuning parameters was superior to the DQN-based motion planning method in both the ability to approach the optimum path and the path-tracking ability. However, when the steering motor failed and the same PID parameters were applied, the robot approached the reference path at a later step (, ), as shown in the averaged actual paths of the robot. Furthermore, the position error between the current position of the rover and the reference trajectory exceeded the threshold , indicating that path tracking was not performed reliably.
An overall comparison of the two methods is presented in
Figure 27. The averaged actual paths of the robot in the failure state showed that the DQN-based motion planner brought the robot to the reference trajectory faster than the PID-based path follower. The perpendicular position errors between the current position of the rover and the optimum reference trajectory also confirmed the superiority of the DQN-based motion planner in path-tracking performance.
Further result analyses are presented in
Figure 28. In the normal state, the number of steps required for the rover to reach the target point was
for the PID controller and
for the DQN-based motion planner, respectively. In the failure state, the figure was
for the PID controller and
for the DQN-based motion planner, respectively. With or without the steering motor failure, the DQN-based motion planner was more efficient: the rover needed fewer steps to arrive, which reduces both time and cost.
In summary, as previously discussed in
Section 4.2, the path follower using the PID controller is considered more flexible for motion planning because it can produce numerous motion options through the steering angular velocity
. However, the DQN-based motion planner generally requires less time and is superior, especially in a failure state, in both the ability to approach the optimum path and the path-tracking ability.
5. Conclusions
In this paper, a deep reinforcement learning-based failure-safe motion planning method was studied for a four-wheeled two-steering lunar rover. A distinguishing feature of this mobile robot platform is its modified explicit-steering mechanism. Four-wheeled lunar rovers are compact because they have fewer wheels, and they fit better in the limited space available in launch vehicles. The modified explicit-steering mechanism minimizes the number of actuators required, reducing the overall weight of the system. Furthermore, because the steering motors are located inside the rover body, the mechanism gives the rover a robust structure for the Moon’s environmental conditions. This paper proposed a reinforcement learning-based failure-safe motion planning algorithm to address the critical mobility limitations of non-holonomic four-wheeled rovers and the uncertainty in rover control caused by various environmental factors on the Moon’s surface. The proposed motion planning method combines the reinforcement learning approach with the robot’s kinematic model and does not require dynamics modeling such as a wheel–terrain interaction model. In addition, the failure-safe algorithm provides a safe motion planning strategy that ensures the execution of the rover’s mission even in the event of a steering motor failure. To validate this method, simulations were conducted on flat terrain with high slip resulting from a steering motor failure.
The experimental results confirmed that the proposed failure-safe algorithm exhibited comparable mission execution performance in the normal state and in the case of a steering motor failure. The first experiment validated the applicability of the failure-safe algorithm. In particular, in the failure state, the overall success rate of reaching the target point was high, at , and the position error between the target point and the average final position of the rover was while satisfying the threshold of less than . In addition, the ability to approach the optimum path was sufficiently high because the coordinates at which the rover first reached the reference path were comparable to those of the normal state , . The path-tracking ability was also suitable for the given mission, as the averaged perpendicular position errors were and less than the threshold of .
It was also found that, in the case of a hardware failure, i.e., a steering motor malfunction, the Deep Q-learning Network was more reliable than a conventional control method for motion planning of the four-wheeled two-steering lunar rover. The second experiment was conducted to verify the reliability of the proposed motion planner by comparing the performance of the conventional path follower with that of the DQN-based motion planner. The results showed that the DQN method outperformed the conventional control method in the case of a steering motor failure. The ability to approach the optimum path was also higher with the proposed method, with a convergence point of , , while the conventional path follower’s point was at (, ). The path-tracking ability of the conventional path follower did not satisfy the condition, as the position error between the current position of the rover and the reference trajectory exceeded the threshold. Furthermore, the DQN-based motion planner required fewer steps for mission success than the conventional path follower in both the normal and failure states. More specifically, the number of steps required for the rover to reach the target point was respectively in the normal state and in the failure state, which was less than the number of steps required by the conventional path follower. These results indicate that the proposed method reduced the time and cost required for the rover to arrive at its destination.
The reinforcement learning-based failure-safe algorithm proposed as a motion planning solution for the four-wheeled two-steering lunar rover can be employed effectively in a failure state. It can also serve as a control system in a normal state if its applicability is extended to various ground types, such as rough or deformable terrains. A slippage-robust algorithm, a subject for future studies, could be applied to ground conditions in which the slip behavior of the wheels is difficult to predict. With this level of scalability, the developed algorithm can be adopted as a flexible motion planning solution that suits the unique conditions of each planet. In addition, this algorithm is valuable as a failure-safe solution for planetary exploration rovers, which are inherently difficult to maintain.