1. Introduction
Since the 1980s [1], the locomotion of quadruped robots has been widely investigated to ascertain a stable and more effective walking motion. Studies have typically involved either the moment of inertia of the entire robot system [2,3,4] or the zero-moment point (ZMP) [5,6] to optimize the trajectory of the robot's center of mass so that the robot can walk in a stable fashion. To this end, the locomotion controller feeds the optimized center-of-mass (COM) trajectory into the inverse kinematics calculations to obtain the joint solution.
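To make this pipeline concrete, the following is a minimal sketch of the inverse kinematics step for a planar two-link leg; the function name, link lengths, and frame convention are illustrative assumptions rather than the configuration of the robot used in this study (described in Section 2).

```python
import numpy as np

def two_link_ik(x, z, l1=0.15, l2=0.15):
    """Closed-form IK for a planar two-link leg (hip pitch and knee).

    (x, z) is the desired foot position in the hip frame (downward is
    negative z); l1 and l2 are illustrative link lengths, not the
    specifications of the robot in this study.
    """
    d2 = x ** 2 + z ** 2
    # Law of cosines for the knee; clip guards against numerical drift.
    cos_knee = np.clip((d2 - l1 ** 2 - l2 ** 2) / (2 * l1 * l2), -1.0, 1.0)
    knee = np.arccos(cos_knee)  # knee-backward solution branch
    hip = np.arctan2(x, -z) - np.arctan2(l2 * np.sin(knee),
                                         l1 + l2 * np.cos(knee))
    return hip, knee

# Example: foot 25 cm below and 5 cm ahead of the hip.
print(two_link_ik(0.05, -0.25))
```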
As a promising approach to stable walking for a quadruped robot, the robot dynamics can be modeled as the optimization objective, with a stability criterion, such as the zero-moment point or the inverted pendulum model, imposed as a constraint. However, the resulting calculations are tedious, at best. For example, Jin et al. proposed an adaptive controller for quadruped locomotion (ACQL) with a recursive update, so that the external-force term of the control law could be incorporated into an inverse-dynamics-based quadratic programming (QP) method to realize an optimized trotting gait [7]. However, although the ACQL provided an optimal and stable walk for the quadruped, it could only do so at the expense of time efficiency. In another study, Chen et al. applied QP to optimize the model predictive control (MPC) of the robot's inertia, which could effectively generate optimal locomotion [8]. In this case, the robot's walking dynamics involved a non-linear model, thus requiring a non-linear programming (NP) optimization method. Among the NP methods for optimizing robot trajectories, mixed-integer quadratic programming (MIQP) has been the most promising for obtaining an optimized solution. MIQP concerns the problem of optimizing a quadratic function over the points of a polyhedral set described by both integer and continuous variables [9].
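In its generic form, and using notation that is ours rather than that of [9] or [10], an MIQP can be sketched as

```latex
\begin{aligned}
\min_{\mathbf{x}} \quad & \tfrac{1}{2}\,\mathbf{x}^{\top} Q \,\mathbf{x} + \mathbf{c}^{\top}\mathbf{x} \\
\text{s.t.} \quad & A\mathbf{x} \le \mathbf{b}, \\
& x_i \in \mathbb{Z} \ \text{ for } i \in \mathcal{I}, \qquad
  x_j \in \mathbb{R} \ \text{ for } j \notin \mathcal{I},
\end{aligned}
```

where the index set \(\mathcal{I}\) marks the integer-valued decision variables; in locomotion planning these typically encode discrete choices, while the continuous variables describe quantities such as the COM trajectory.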
This optimization was also implemented by Ahn et al. [10] to optimize the center-of-mass trajectory for quadruped robot locomotion control. That implementation demonstrated that the MIQP method could provide an optimal solution and a time-effective optimization process compared to other non-linear programming methods. However, the optimization had to be re-solved for every walking step, which degraded the time efficiency of the quadruped's overall walking performance. To address this issue, this study extended the conventional locomotion control method and implemented a machine-learning algorithm to simplify the control policy used to generate the joint solution. By removing the per-step optimization, the time efficiency of the walking simulation was noticeably improved: the simplified control policy could generate joint solutions for the quadruped in a short time, reducing the computational time during locomotion. The advantage of this implementation was that it eliminated the optimization time of the conventional method while maintaining the quadruped's stable walking locomotion.
Machine learning involves two main families of methods relevant here: reinforcement learning and supervised learning. There has been substantial research on locomotion control for legged robots using reinforcement learning, which provides a practical and general approach to building controllers, and such trained controllers have been used to perform a range of sophisticated tasks. However, encoding the designated behaviors into the reward functions required for reinforcement training involves a tedious, task-specific tuning process. In addition, a model trained with reinforcement learning can adopt unnatural behaviors that are dangerous and infeasible in real-world applications, particularly if the reward functions have not been designed carefully. Therefore, most studies that promote reinforcement learning for locomotion control must choose from a large family of controllers to develop one suited to the task at hand. For example, Tan et al. [11] created a dynamic motor model for the "Minitaur" quadruped robot that was used to learn a walking policy for locomotion control. In another study, Jain et al. [12] proposed a hierarchical reinforcement learning approach that combines high-level and low-level control schemes for the quadruped base and the overall locomotion control.
Supervised learning-based models, by contrast, can be trained with a reliable locomotion dataset. This eliminates the unpredictability of the exploration and exploitation of actions that occurs during reinforcement learning, making the supervised approach more efficient from the standpoint of computational cost during learning. With a reliable dataset, joint models trained using supervised learning can achieve low error in the joint solution prediction, leading to trained joint models suitable for autonomous walking scenarios.
A trained model can provide a one-off solution that supports robotic control while maintaining time efficiency, and by understanding the supervised training process, the accuracy of a trained model can be pre-evaluated for reliability. Reliable control accuracy and time efficiency in quadruped locomotion control provided the motivation for this study. Although Hiroki et al. identified the benefits of implementing supervised learning in robotics, a training method for a high degree-of-freedom robot remains unsolved [13]. Therefore, we focused on training individual joint models of the quadruped and then combining them to perform a walk.
Accordingly, this study proposed an offline approach to building a simplified locomotion control policy using a supervised learning method for a 12-degree-of-freedom quadruped robot. The joint solutions generated by the conventional method proposed by Ahn et al. [10] were used; this method has been proven capable of producing stable quadruped walking motions, and its sample data were therefore suitable for training joint models with a supervised learning algorithm. Hence, the dataset used in this study originated from the stable walking control algorithm proposed by Ahn et al.
To obtain the joint models, a supervised learning algorithm was applied to train the robot's leg joints to predict the joint solution for a walking sequence. The data features from the previous study's dataset included the robot states (the robot position before and after each single-cycle walking motion) and the respective sequences of joint solutions. Each walking sequence therefore involved multi-input and multi-output data features for joint model training. Among the training methods available for supervised learning, the multilayer perceptron is well suited to multi-input, multi-output (MIMO) training and was therefore adopted in this study. The trained models were compared against a test set (see Section 6.1) and subsequently evaluated using a 3D quadruped robot in the Gazebo simulation environment (see Section 6.2).
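As an illustration of this setup, the following is a minimal MIMO training sketch using a scikit-learn multilayer perceptron; the synthetic data, layer sizes, and hyperparameters are placeholder assumptions, not the dataset or architecture used in this study.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 0.5, size=(1000, 4))   # placeholder robot states
Y = np.tanh(X @ rng.normal(size=(4, 12)))    # placeholder 12-DoF joint targets

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# MIMO regression: state inputs -> a vector of joint angles per sample.
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
mlp.fit(X_tr, Y_tr)

rmse = np.sqrt(np.mean((mlp.predict(X_te) - Y_te) ** 2))
print(f"test RMSE: {rmse:.4f} rad")
```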
The remainder of this paper is organized as follows. The robot configuration is described in Section 2. The concept of ZMP is described in Section 3. The data generation method is explained in Section 4. An overview of the control policy is presented in Section 5. The evaluation results of the proposed control policy are presented in Section 6.
6. Results and Discussion
To evaluate the enhancement in time efficiency achieved by the proposed method for robot locomotion, it was compared to the conventional method proposed by Ahn et al. [10]. Each method was tested five times using the same parameters, including the same start and goal positions of (0, 0) and (0, −0.25) as well as the same distance allowance.
Table 5 shows all results for the distance error (D, in centimeters (cm)) and time (T, in seconds (s)) after completing the walking simulations, with the respective average value (Ave.) shown for each measure. The results showed that the proposed method completed the walk in an average of 91.9862 s, approximately half the time required by the conventional method. These results demonstrated that the proposed method significantly improved the locomotion time efficiency of the quadruped robot and enhanced its accuracy in reaching the desired target coordinate.
Two evaluation methods were used to investigate the joint model performance: comparing the parameters predicted by the trained models with the test dataset, and testing the trained models in a walking simulation. The results are presented in Section 6.1 and Section 6.2, respectively.
6.1. Trained Model Evaluation
In this study, the joint models were trained using different numbers of epochs (100, 500, 1000, 2500, 5000, 7500, and 10,000) to identify the best prediction model. The results are shown in Figure 6.
The results indicated that the root mean squared error of joint models trained using 5000 or more epochs saturated at 0.005 rad, with a standard deviation of 0.005 rad. The accuracy of these joint predictions was higher than that of the models trained using fewer than 5000 epochs, which exhibited under-fitting: the number of training epochs was insufficient for the trained model to provide accurate predictions. Conversely, joint models trained using more than 5000 epochs yielded evaluation results similar to those obtained with 5000 epochs; therefore, 5000 was the optimal number of training epochs for the model architecture used in this study. To verify the performance of the trained models, a walking simulation was conducted to evaluate the distance error and the total running time, as discussed in the following section.
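For illustration, a minimal sketch of this epoch-sweep model selection is given below; the synthetic data and network architecture are placeholder assumptions, and only the epoch budgets match those listed above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-0.5, 0.5, size=(1000, 4))   # placeholder robot states
Y = np.tanh(X @ rng.normal(size=(4, 12)))    # placeholder joint targets
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=1)

for n_epochs in (100, 500, 1000, 2500, 5000, 7500, 10000):
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=n_epochs,
                         random_state=1)
    model.fit(X_tr, Y_tr)
    rmse = np.sqrt(np.mean((model.predict(X_te) - Y_te) ** 2))
    print(f"{n_epochs:>6} epochs -> test RMSE {rmse:.4f} rad")
# A curve that flattens from 5000 epochs onward indicates convergence;
# budgets sitting well above the plateau indicate under-fitting.
```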
6.2. Walking Simulation
The joint models trained using different epoch numbers were used to predict the joint solutions in the walking simulation, five times for each model. Using the conventional walking simulation based on the COM optimization as the benchmark for this study, the running time of the simulation was limited to 250 s for performing a walk from the origin (0 m, 0 m) to the goal (0 m, −0.25 m). The average results for the distance error and running time were calculated and are plotted in Figure 7 and Figure 8, respectively.
Referring to Figure 7, although there was some fluctuation in the distance error as the number of epochs increased, the joint model trained using 5000 epochs achieved the lowest distance error of 7.197 cm. This value was even lower than that of the conventional walking simulation using COM optimization (7.9738 cm). The trained model therefore enhanced the walking performance by reducing the distance error.
Figure 8 shows the simulation running time versus the number of epochs. The shortest running time for a successful walk, 91.986 s, was achieved using the model trained for 5000 epochs, which is approximately half the time consumed by the conventional walking method simulation (189.7820 s).
The model evaluation and simulation results showed that 5000 epochs was the optimal number for training the joint models used in this study. These results also confirmed that the supervised trained joint models enhanced the walking simulation performance by reducing both the distance error and the computational time.
Figure 9 shows the trajectory of each of the robot's joints with respect to the change in the robot's COM in the Gazebo simulation (Figure 10). The joint trajectories in Figure 9 exhibit a pattern similar to the data collected in Section 4.1 and the data generated in Figure 3, confirming that the trained joint models, whose accuracy was evaluated in Section 6.1, were reliable for stable walking.
Furthermore, to verify the robot’s walking stability, the rotational angle of the robot (in both the pitch and roll directions) was recorded during the walking simulations.
Table 6 indicates that the angle ranged from −0.0308 to 0.0554 rad in the pitch direction and from −0.0433 to 0.0242 rad in the roll direction. The angle variation gap was thus 0.0862 rad for the pitch direction and 0.0675 rad for the roll direction, both far below the angles that would cause an unrecoverable tilting action. Therefore, the walking simulation indicated that a stable walk could be achieved by implementing the trained joint models using the method proposed in this study.
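As an illustration of this stability check, a minimal sketch is given below; the sample angle logs and the tip-over threshold are placeholder assumptions, not the values recorded in Table 6.

```python
import numpy as np

# The variation gap is max(angle) - min(angle) over the recorded walk.
pitch_log = np.array([-0.0308, 0.0121, 0.0554])  # rad, illustrative samples
roll_log  = np.array([-0.0433, 0.0005, 0.0242])  # rad, illustrative samples
TIP_OVER_LIMIT = 0.5                             # rad, assumed threshold

for name, log in (("pitch", pitch_log), ("roll", roll_log)):
    gap = log.max() - log.min()
    print(f"{name}: gap = {gap:.4f} rad, stable = {gap < TIP_OVER_LIMIT}")
```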
7. Conclusions
In this study, supervised learning was employed to improve upon the conventional method of robotic control. Using the dataset generated from the COM optimization solution, based on MIQP and ZMP theory, to train the joint models of a four-legged robot proved to be a promising method for achieving suitable walking stability. In general, quadruped locomotion control involves 12 degrees of freedom (DoF), which makes defining the reward rules needed in reinforcement learning difficult. In deep reinforcement learning, to achieve higher rewards, 12 variables must be manipulated through trial and error, which may require a long or unpredictable running time and may cause unpredictable robot behavior. In contrast, providing the data collected from the simulations, using both the joint outputs and the robot state information as inputs, was sufficient to train the joint models for successful quadruped walking tasks. In addition, training quadruped locomotion with reinforcement learning requires high computational performance to manipulate and alter the memory of the learning process [11,12]. By comparison, the method proposed in this study provided a light computational process for training a high-DoF locomotion robot, resulting in a time-efficient model.
Therefore, employing a multilayer perceptron architecture for the supervised training of the quadruped's multi-joint models was more time-effective and less costly than deep reinforcement learning. In the validation of the trained joint models, each model showed a low mean squared error, indicating that its predictions were close to the joint solutions generated by the COM-optimized algorithm. Meanwhile, the reliability analysis of the trained models in an autonomous walking simulation showed that the walking robot reduced both the distance error and the total computational time compared with the simulation using the conventional method. Therefore, using the dataset from the COM optimization study and training a multilayer perceptron for each joint model under supervised learning was an effective way to simplify the control policy for the autonomous walking of a quadruped robot. Although the outcome of this study met the desired objectives in terms of time efficiency and accuracy enhancement, it could be further extended by incorporating a perception module to acquire environment state information.
Finally, the significant contribution of this study is a validated scheme for joint model training, which should help to speed up computations in robotics applications. In particular, this contribution to artificially intelligent robots may extend its applicability to broader robotic research.