1. Introduction
In today’s robotics, the need for precise manipulation has encouraged the development of human-like robots. This has led to the emergence of manipulator arms of many shapes and configurations, aiming at systems capable of reproducing natural, human-inspired movements. To guarantee this similarity, robotic arms have a large number of degrees of freedom (DoF). For this reason, path planning and control for these platforms have become a very difficult task [1]. The most common way to solve this kind of problem in robotics is to use inverse kinematics. However, when redundant manipulators carry out complex tasks, classical methods struggle to solve this problem.
The most common way to overcome this limitation is to use learning methods. Learning algorithms obtain a policy that works in a similar way to how a human does. The main objective of these strategies is to identify patterns and then generalize the most relevant features of the given data. With that information, the algorithm tries to reproduce new behaviors resembling previously learned experiences, even if the environment changes or is unknown. When applying these methods to manipulation, the concept of learning complex manipulation tasks lies in the use of knowledge taught by a human, who shows how to solve a task, and the generalization of that information to similar tasks [2,3].
Traditional learning methods have shown good performance in the resolution of manipulation problems [4,5]. However, the process used by those methods to obtain the learning policy generates a mathematical model that usually relies on probabilistic terms, which makes these learning methods stochastic. Depending on the problem to be solved, this may not be a desirable property, since even with the same demonstrations, different solutions could be obtained in each test. This stochastic behavior may cause algorithms to end up with a solution that does not converge.
In this paper, we propose a learning-based solution that overcomes typical problems of both traditional planners and learning methods. It is an extension of our previous work in [6,7], which focused on the generation of trajectories for a robotic manipulator for tasks in two dimensions and without the use of imitation learning techniques. We present the fast marching learning (FML) method, which is a deterministic and asymptotically globally stable learning algorithm created for learning point-to-point 2D and 3D tasks based on the fast marching square (FM²) method [8,9]. The proposed learning approach is referred to in this work either as fast marching learning or as learning based on the fast marching square.
Initially, paths are kinesthetically taught to the robot by guiding its robotic manipulator. Then, these trajectories are represented by FM², a path planner based on how light propagates. It treats the environment where it is applied in a continuous way, deriving multiple benefits which will be discussed later. FML applies FM² to every learned path to parameterize experience, which is later merged to obtain a global knowledge representation. Additionally, some relevant optimizations are made with respect to our previous publications and a new concept of auto-learning is presented herein. A direct relation between the proposed method and common learning techniques, such as reinforcement learning (RL), is also established. Finally, a large number of trials on a real robotic manipulator are performed to fully validate the proposed strategy in complex 3D indoor environments. With respect to state-of-the-art techniques and the previous work presented in [6,7], the following advantages were obtained:
The generated trajectories are ensured to be smooth and free of local minima, which means that they always converge.
The path planner method is complete. If there is any solution, it is always found.
The proposed approach can handle high-dimensional data and it can learn from one or more demonstrations.
The method presents an ability to explore unknown environments, even if it has not been taught beforehand, searching for the fastest paths. In addition, this exploration allows the method to work in dynamic environments where objects that were not accounted for during data acquisition are present.
It can guarantee a correct performance regardless of the variation of starting points even in the case of high-dimensional complex motions.
It is robust against erroneous paths learned during data acquisition.
The method outperforms state-of-the-art methods in terms of similarity with respect to demonstrations.
The method improves on previous works because it can be applied to a real robot, makes use of kinesthetic learning techniques and can be applied in both 2D and 3D.
The work presented in this paper focuses on a specific learning method known as learning by imitation [10], which is also referred to as programming by demonstration (PbD) [11]. In this type of learning, the robot is provided with a set of demonstrations as input. These demonstrations can be given in different ways, including observing human demonstrators doing a task, collecting data with haptic devices and physically guiding the robot to emulate the task that must be learned (i.e., kinesthetic teaching). When dealing with learning by human demonstration, motion capture (MOCAP) suits are usually the preferred option, as highlighted in [12]. This methodology is based on marker point measurements from a 3D motion capture system. However, it requires the system to deal with the re-targeting problem, since robotic arm joints do not necessarily have the same structure as the human arm. Regarding haptic learning, there are different options. One of the most common consists in the use of a haptic device similar to a joystick [13], which the user controls to generate training data. Other common ways to work with the haptic method are based on the use of a twin master to capture data [14]. Typically, that twin master is a simplified version of the robotic arm, provided with a sufficient number of sensors to capture the required data and to inform the user, through haptic feedback, of any problem that may occur. Another important detail is that those devices normally have the same structure as the real arm, which eliminates the re-targeting problem. However, these haptic-based methods have some drawbacks. Given the large number of DoF of manipulator arms, it may not be intuitive to operate these devices through just a joystick. When using a twin master, delays between the twin and the real arm movements could also occur, which could lead to delays in data collection or even potential collisions when performing complex trajectories. Kinesthetic teaching solves all these problems.
First of all, the same embodiment, the real robot, is used both for the demonstration (data collection) and for the reproduction of the learned movements. Hence, the re-targeting problem does not appear and, furthermore, no delays exist in the data acquisition process. Secondly, data collection is performed in a very intuitive way by moving the robotic arm through the desired positions. This process generates point-to-point trajectories, which are used as input for the learning algorithm. When performing kinesthetic teaching, two different types of tasks can be identified. Firstly, there are tasks where the path does not necessarily require critical accuracy. In those kinds of tasks, the main objective is not to move to a specific point but to perform a certain controlled dynamic in the movement. This type of task focuses on teaching the robot how to complete a given task without any data about the initial or goal position. Examples of these kinds of missions include ability games taught to the robot, like the ball-in-a-cup game [15,16] or the work in [17], where a robot learns how to play table tennis using a variety of simple motor tasks which represent just elementary actions. The most common way to solve these learning tasks is by using dynamic motion primitives (DMP) [18,19]. The main idea is to teach simple functional movements that can be combined and adapted to solve new and more complex tasks. Based on this idea, reinforcement learning techniques are found in the literature to carry out this learning of primitives [20,21]. Even though these methods are helpful for the above-mentioned applications, this paper is focused on manipulating objects in a certain location. Hence, the trajectories they follow are not initially optimized with respect to distance, requiring additional steps.
The second type of task that can be performed for motion learning is known as point-to-point tasks. In this case, a specific goal state is given while the initial state may vary. The aim of this type of motion learning is to teach the robot how a discrete motion is performed. Hence, this is the learning type on which our work is focused. In order to face this problem, different approaches found in the literature are reviewed. Some works propose the use of gradient techniques for point-to-point task learning. The authors in [22] defined an extension of Maximum Margin Planning (MMP) [23]. The algorithm, called LEARCH (Learning to Search), uses gradient descent techniques to learn the cost map given the training paths, which were generated by an expert. The strategy tries to imitate the expert policy by performing regression upon features from paths using support vector regression (SVR) and iteratively updates the map with that knowledge. This map is then used to plan the needed path for the task through any path planning algorithm. The Unified Motion and Variable Impedance Control (UMIC) technique was presented in [24]. It defines a control policy to regulate both the robot movement and its dynamic interactions with the environment. The main objective is to encapsulate the potential function gradient and the curvature in the same model so that motions resemble the learned path while respecting the velocity profile taught in the demonstration. It is time-invariant and the learning process can be carried out based on a few demonstrations by solving just two constrained quadratic optimization problems. However, the main issue of using gradients is the risk of falling into local minima. The authors in [25] proposed the stable estimator of dynamical systems (SEDS), which constrains the Gaussian mixture regression model parameters to learn globally stable motions. One of the main issues of this method is that strict constraints limit the learning accuracy with respect to taught data. In [26], the algorithm named control Lyapunov function-based dynamic movements (CLF-DM) is presented. Initially, a roughly consistent Lyapunov function is learned with respect to taught data, which is used for deriving estimated motions using regression techniques. Stability is then ensured by solving a constrained convex optimization problem. The work presented in [27] proposes the FSM-DM method based on extreme learning machine to consider three factors when learning: stability, accuracy and learning speed. Even though the results of these three methodologies are promising, there are two main drawbacks: none of them are applicable to high-dimensional spaces and they do not work with a single demonstration.
Other techniques are better suited to three-dimensional spaces. The authors in [28] proposed a kinesthetic teaching approach based on Motion Primitives (MP) and Gaussian Process Latent Variable Models (GPLVM). Initially, the paths are kinesthetically taught to a robot with the help of an operator. In order to model these learned paths, MPs group motion trajectories that present time-dependent variances, and their dimensionality is reduced with GPLVM. The resulting latent space is finally mapped into a reward for the learning algorithm. In [29], a neural network is proposed for the learning procedure, obtaining a better reproduction performance while guaranteeing the generalization ability. This method (modified DS) has several limitations derived from the use of neural networks, such as the large number of parameters that need to be tuned and the training time. Overall, the main issue of all these kinesthetic learning methods is that they are highly dependent on taught trajectories. New paths calculated using acquired knowledge strictly follow learned data. There might be cases in which it could be useful for learning methods to have the ability to explore new trajectories. This is key in situations such as having corrupted learned data or having a single demonstration. The Fast Marching Learning method proposes an auto-learning factor to combine the usage of learned data and the exploration of unknown regions.
The rest of the paper is organized as follows. In Section 2, the Fast Marching Square method is explained, with special emphasis on its use in the path planning problem. In Section 3, the FML method is presented, giving an in-depth account of kinesthetic data collection, explaining the auto-learning process developed, explaining the ability to add obstacles and making a direct comparison with the RL structure. Section 4 presents a series of experiments in both two and three dimensions performed to test the benefits of the developed method. For this purpose, a handwriting human motion dataset created by LASA has been used for the empirical comparison of the method in two dimensions. In the case of three dimensions, a series of tests were carried out in an indoor environment, both in simulation and in a real environment using a robotic manipulator. Section 5 presents the final conclusions and proposes future work.
2. Fast Marching Applied to Path Planning
The Fast Marching Square (FM²) method [30] is a path planning algorithm which is a variation of the original Fast Marching Method [31]. It is based on the idea of guiding the desired path by following light propagation. Mathematically, light propagation is defined by the Eikonal equation, which states that the speed of light is determined by the substance in which it is travelling [32]. This equation is solved by the solution of wave propagation, as stated in Equation (1):
$$ W(x,t) = W_0 \, e^{\, i\,\omega\,(\eta\, x / c_0 \,-\, t)} \qquad (1) $$
where W and W_0 represent the wave value and initial wave value, respectively, η is the refraction index, c_0 is the light speed and ω is the angular frequency of the wave. According to this definition, some properties of the proposed method can be derived:
The path that light follows is always the fastest feasible one, so the proposed planning method ensures the calculation of the path of least possible time.
Given the characteristics of Equation (1), if the propagation velocity is smooth, then the resulting arrival-time potential is also smooth, and so are the trajectories calculated by applying the gradient method over this potential. This results in smooth paths that avoid the need for extra refinement steps.
Since the method is based on wave propagation, if there is a feasible solution, it is always found, so it is complete.
Following its original definition, fast marching solves the Eikonal equation applied on a rectangular orthogonal mesh, deriving into an O(n log n) algorithm, where n is the total number of grid points. The discrete formulation of the Eikonal function can be seen in Equation (2):
$$ \left( \frac{T_{i,j}-T_x}{\Delta x} \right)^{2} + \left( \frac{T_{i,j}-T_y}{\Delta y} \right)^{2} = \frac{1}{F_{i,j}^{2}} \qquad (2) $$
where Δx and Δy are the grid spacing in the x and y directions, F_{i,j} is the wave propagation speed for the grid cell (i, j) and T represents the arrival time of the Eikonal equation for each position. The values T_x and T_y are obtained as
$$ T_x = \min(T_{i-1,j},\, T_{i+1,j}), \qquad T_y = \min(T_{i,j-1},\, T_{i,j+1}), $$
where T_{i,j} sets the time in the current cell and T_x and T_y are the minimum times of the neighboring cells previously visited by the expansion of the Eikonal equation. Iterations of the propagation procedure on a grid can be visually observed in
Figure 1.
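To make the discrete update of Equation (2) concrete, the following sketch shows how the arrival time of a single cell could be computed from its already-visited neighbors. It is an illustrative reimplementation rather than the authors' code; the function name, the border handling and the fallback to a one-sided update are assumptions.

```python
import numpy as np

def solve_eikonal_cell(T, F, i, j, dx=1.0, dy=1.0):
    """Candidate arrival time T_{i,j} from Equation (2), first-order upwind scheme.

    T : 2D array of arrival times (np.inf where the wave has not arrived yet).
    F : 2D array of propagation speeds (the velocity map).
    """
    rows, cols = T.shape
    # T_x and T_y: minimum arrival times among the horizontal / vertical neighbors
    tx = min(T[i - 1, j] if i > 0 else np.inf,
             T[i + 1, j] if i < rows - 1 else np.inf)
    ty = min(T[i, j - 1] if j > 0 else np.inf,
             T[i, j + 1] if j < cols - 1 else np.inf)

    f = max(F[i, j], 1e-9)                       # avoid division by zero on obstacles
    if np.isinf(tx) and np.isinf(ty):
        return np.inf

    # Quadratic in T_{i,j}: ((T - T_x)/dx)^2 + ((T - T_y)/dy)^2 = 1/F^2
    a = (1.0 / dx**2 if np.isfinite(tx) else 0.0) + (1.0 / dy**2 if np.isfinite(ty) else 0.0)
    b = -2.0 * ((tx / dx**2 if np.isfinite(tx) else 0.0) + (ty / dy**2 if np.isfinite(ty) else 0.0))
    c = ((tx**2 / dx**2 if np.isfinite(tx) else 0.0) +
         (ty**2 / dy**2 if np.isfinite(ty) else 0.0) - 1.0 / f**2)

    disc = b**2 - 4.0 * a * c
    if disc >= 0.0:
        return (-b + np.sqrt(disc)) / (2.0 * a)
    # Degenerate case: fall back to a one-sided update from the closest neighbor
    return min(tx + dx / f, ty + dy / f)
```

In a full Fast Marching implementation, these candidate times are kept in a priority queue and the smallest one is frozen at each iteration, which yields the propagation illustrated in Figure 1.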
The final outcome is a matrix where each cell indicates the arrival time of the wave with respect to the propagation point. As a modification of FM, FM² applies this procedure twice, resulting in safer and smoother paths. This path planning method (FM²) can be applied to binary maps by propagating a wavefront that considers all obstacles as source points. The final matrix is considered as a velocity map F. To better show the differences between FM and FM², paths are calculated on a sample binary map and shown in Figure 2 and Figure 3. Both figures show that the path obtained with the FM² method is smoother and reproducible by a robot.
The velocity map values range from 0 to 1, representing the maximum speed allowed for the vehicle at each point of the map. Obstacles imply speeds equal to zero, whilst points in space far enough from obstacles allow maximum speeds. When computing a path for the vehicle to follow, the FM² method obtains the shortest path from the initial position to the goal position that lets the vehicle navigate at greater speeds. On the other hand, the original FM method computes the shortest path, not taking into account the safety of the vehicle. Assuming that F contains relative velocities between 0 and 1, it is possible to trim (saturate) this velocity map. With this small modification, the safety and smoothness of the computed paths are still ensured (except for saturation values close to 0), while obtaining trajectories closer to the optimal one in terms of distance. Examples are shown in Figure 4. It is also possible to saturate the map based on a safety distance from obstacles. That is, instead of using an index between 0 and 1 to saturate the map, we can directly set a distance in cells from which the velocity will be 1 (safe for the vehicle). We call this parameter the area of influence (aoi). If each cell corresponds to a square meter, an aoi of five cells corresponds to a security distance from obstacles of 5 m. Examples are shown in Figure 5.
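As an illustration of how the first FM² step and the two saturation variants described above could be implemented, the sketch below builds a velocity map from a binary map. The function name and parameters are assumptions, and the uniform-speed wave expanded from all obstacle cells is approximated with a Euclidean distance transform instead of an actual Fast Marching propagation.

```python
import numpy as np
from scipy import ndimage

def build_fm2_velocity_map(binary_map, saturation=None, aoi=None):
    """First FM^2 step (sketch): velocity map F from a binary map (1 = free, 0 = obstacle)."""
    # Wave expanded from every obstacle at uniform speed ~ distance to the closest obstacle
    dist = ndimage.distance_transform_edt(binary_map)
    F = dist / dist.max() if dist.max() > 0 else dist        # relative velocities in [0, 1]

    if aoi is not None:
        # Saturate by safety distance: full speed beyond `aoi` cells from any obstacle
        F = np.minimum(dist / float(aoi), 1.0)
    elif saturation is not None:
        # Saturate by index: any value above `saturation` is trimmed to 1
        F = np.minimum(F / saturation, 1.0)

    F[binary_map == 0] = 0.0                                   # obstacles keep zero speed
    return F
```

The second FM² step then propagates a wave from the goal over this map, and the path is obtained by gradient descent on the resulting arrival times.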
3. Learning Manipulation Trajectories via Fast Marching Square
The main idea of using Fast Marching Square (FM²) for learning is to take advantage of the benefits of this path planner. One of its greatest advantages is its capacity to guide planned paths through places of the environment in which the propagation velocity is higher, which means that the total path can be covered in less time. Consequently, one of the main ideas in the use of this method for learning is to force generated paths to take a similar direction to learned paths. Following this concept, the method tries to exploit learned data to generate similar paths. To do that, the method amends the F matrix so that learned data obtained via kinesthetic teaching have higher velocity values in the matrix. Other advantages that arise from the use of FM² are its smoothness and the absence of local minima.
In this paper, we focus on point-to-point tasks in which we teach the robot different trajectories for the manipulation of elements of the environment. Once these trajectories are available, they are codified as point sequences in the workspace. By generalizing these paths, similar paths can be generated for any specific task. Therefore, the application of FM² generates more efficient results than those obtained by just applying kinesthetic teaching, producing faster and smoother paths.
Figure 6 shows the three proposed steps for performing learning based on Fast Marching Square. First, data are kinesthetically collected. Then, Fast Marching Learning is applied to acquire knowledge from provided data. Finally, new paths are generated considering an additional auto-learning process, which includes the capacity of finding new unexplored paths. These processes are hereunder described in more detail.
3.1. Kinesthetic Data Collection
The FML algorithm needs learning data generated by a user as input. To prevent the problems presented in Section 1, such as re-targeting, data collection is directly performed with the robot itself. The data acquisition process, therefore, is based on kinesthetic learning, where the user moves the arms of the ADAM robot, generating the trajectories required for the task to be performed. This robotic platform is presented in Figure 7.
The arms used for this project are UR3 arms from the Universal Robots company. This arm is an ultra-light and compact collaborative industrial robot, with a weight of 11 kg, a payload of 3 kg, a rotation of ±360° for the first five joints and infinite rotation at the end-effector. To facilitate the process of taking data with the arm, two grips have been designed at the critical control points of the arm, namely the end-effector and the elbow, as shown in Figure 8. These grips allow the user to have greater control of the six joints, thus allowing the generation of trajectories to be much more natural and similar to those that a human would perform.
These elements can be easily removed from the robot after kinesthetic teaching has been carried out and they can be used on both arms. With these grips, the user can handle the arm in a simple and controlled way, as shown in
Figure 9, where the person can handle the last three joints through the end-effector grip (gray), and the first three joints through the elbow grip (green).
For path generation, an algorithm with a constant time cycle T has been designed. This algorithm stores the position of the end-effector in Cartesian coordinates. Therefore, each of the paths P that are taught contains a set of N points in the three-dimensional space, of the form p = (x, y, z). Since this method has been created for use in real environments, more than one path can be taken to teach the robot. Therefore, K different paths can be made, which make up the robot’s experience E, codified as
$$ E = \{P_1, P_2, \ldots, P_K\} \qquad (4) $$
where each path consists of
$$ P_k = \{p_1, p_2, \ldots, p_{N_k}\}, \quad p_n = (x_n, y_n, z_n). \qquad (5) $$
Although data collection may seem trivial, it is a very important step and must be performed correctly in order not to generate errors that can accumulate during the learning step. Empirically, it has been found that there are two preferable ways to collect data. The first one is to perform the movement at a very low speed, having full control of all the joints of the robot. The second one is to perform the path at a high speed, in which, due to the inertia of the movement, the transmission of any possible disturbance from the person teaching the trajectory is avoided.
Figure 10 shows results from gathering data in three different ways, where the same path has been taught using different strategies depending on the velocity and stability.
As shown in Figure 10, results are clearly better in cases where the user has full control of the robotic arm. When using low velocity and full control of the 6 DoF of the arm, correct paths are generated, but with a greater number of points than in the other cases. If the user decides to use a high-velocity demonstration, the generated path has fewer points than in the previous case, but the path precision is reduced compared with the low-velocity case. Therefore, there is no single way to solve the kinesthetic learning task. It has been empirically proven that, for precise tasks, it is better to perform the teaching process at low speed with control of all the DoF, and for more general tasks, it is better to use a teaching process at high speed. In addition to this, a combination of both forms of data acquisition can be carried out, performing high-speed data acquisition when approaching the end point and then low-speed control to increase the accuracy of the task.
Regarding data acquisition, software was developed which, using ROS, reads the joint values and stores them as a path. Subsequently, a filtering process is performed, in which possible repeated points due to the sampling rate are removed and the relevant transformations are applied to go from a reference frame at the base of the arm to a reference frame at the base of the robot. As such, we can apply FML and directly send movement commands to the real robot.
It is important to emphasize that, although filtering for duplicate point removal is necessary, small perturbation removal is not required. This is because the paths will be used as a basis for learning, and applying the FM method will provide some of its advantages, directly obtaining smooth paths; this is an advantage with respect to the paths used in traditional learning.
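The post-processing described above can be summarized with the following sketch, which removes repeated samples and expresses the recorded points in the robot base frame. The function name, the minimum step used to detect duplicates and the homogeneous-transform representation are assumptions made for illustration.

```python
import numpy as np

def filter_and_transform(raw_points, T_arm_to_robot, min_step=1e-3):
    """Filter a kinesthetically recorded path and change its reference frame.

    raw_points     : (N, 3) end-effector positions sampled at a constant time cycle,
                     expressed in the arm base frame.
    T_arm_to_robot : 4x4 homogeneous transform from the arm base to the robot base.
    min_step       : samples closer than this to the last kept point are treated as
                     duplicates caused by the sampling rate.
    """
    kept = [raw_points[0]]
    for p in raw_points[1:]:
        if np.linalg.norm(p - kept[-1]) >= min_step:   # drop repeated points
            kept.append(p)
    kept = np.asarray(kept)

    # Express every point in the robot base frame
    homogeneous = np.hstack([kept, np.ones((len(kept), 1))])
    return (T_arm_to_robot @ homogeneous.T).T[:, :3]

# The experience E of Equations (4) and (5) is then the list of processed paths:
# E = [filter_and_transform(P_k, T_arm_to_robot) for P_k in raw_paths]
```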
3.2. Fast Marching Learning Algorithm
Fast Marching Square, as mentioned previously, creates a velocity map in which a path is optimal when the propagation velocity is higher. Hence, by modifying these velocities, the final calculated path can be reshaped. This fact is very useful for the learning step. Given a set of experiences, the main objective of Fast Marching Learning is to encode this information into the velocity map. As such, the final path would be biased by learned experiences without losing the main properties of Fast Marching Square, such as avoiding local minima and being smooth.
The proposed algorithm takes as input the point-to-point paths of the robot’s end-effector learned with kinesthetic teaching, defined in Equations (4) and (5). As output, it is intended to replicate the shape of the learned paths when a similar task is commanded. Given that the outcome is based on learned experience, the final path is expected to be efficient, smooth and fast.
The first step is labeling all the points contained in E as 1 in an empty workspace, that is to say, a workspace filled with 0. Then, a dilation operation is applied on this binary workspace. The dilation size is defined by the area of influence (aoi), specified in voxels in the case of 3D environments. After the dilation process is applied, the workspace is divided into regions filled with 1, where previous knowledge is found, and other zones with 0, where no information is available. For the latter, the algorithm works as FM normally does.
When the workspace is ready, Fast Marching is applied in the same way as in the first step of FM², turning the binary workspace into a velocity map. This map is linearly rescaled to lie within the established bounds, defined by the saturation parameter, which is in charge of weighting the importance of new data with respect to the rest of the environment. With this, the final velocity map F is obtained. This leads to a generalization in F of the provided demonstrations. Finally, Fast Marching is applied on the entire workspace considering a unique wave source defined by the centroid of all final trajectory points and the velocity map F. A detailed description of the algorithm is shown in Algorithm 1.
Algorithm 1 Fast Marching Learning Algorithm.
Require: experience E, area of influence aoi, saturation value
Ensure: final velocity map F
1. Label every point of each path in E as 1 in a workspace initialized to 0
2. Dilate the labeled points according to the aoi
3. Apply Fast Marching over the dilated workspace to obtain a velocity map
4. Rescale the velocity map linearly within the saturation bounds
5. return F
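The following sketch mirrors the steps of Algorithm 1 in Python. It is not the authors' implementation: the spherical structuring element, the use of a distance transform in place of the Fast Marching expansion and the default parameter values are assumptions.

```python
import numpy as np
from scipy import ndimage

def fast_marching_learning(E, shape, aoi=3, saturation=0.2):
    """Build the FML velocity map from the experience E (sketch of Algorithm 1).

    E          : list of paths, each an (N, 3) array of voxel indices.
    shape      : dimensions of the discretized 3D workspace, in voxels.
    aoi        : area of influence (dilation radius), in voxels.
    saturation : lower bound of the rescaled velocity map; it weights learned
                 regions against the rest of the environment.
    """
    # 1. Label every learned point as 1 in a workspace initialized to 0
    W = np.zeros(shape, dtype=bool)
    for path in E:
        for x, y, z in path.astype(int):
            W[x, y, z] = True

    # 2. Dilate with a sphere of radius aoi (see Section 3.4)
    r = np.arange(-aoi, aoi + 1)
    zz, yy, xx = np.meshgrid(r, r, r, indexing="ij")
    sphere = xx**2 + yy**2 + zz**2 <= aoi**2
    W = ndimage.binary_dilation(W, structure=sphere)

    # 3. Expand a wave from the learned region (distance transform as a stand-in
    #    for the Fast Marching propagation)
    dist = ndimage.distance_transform_edt(~W)

    # 4. Rescale linearly: learned cells are fast (1), the farthest cells keep
    #    the saturation velocity
    return 1.0 - (1.0 - saturation) * dist / max(dist.max(), 1e-9)
```

Planning then amounts to propagating a wave from the goal over this map and descending its gradient from the queried start position.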
3.3. Auto-Learning Process
The use of kinesthetic learning generates a series of advantages over other learning methods, such as computation speed, since it is based on data taken by an expert, or the generation of paths and movement commands free of collision problems. These ideas are based on the exploitation of known information as a means to generate optimal and fast results. When working with learning methods in robotics, it is very important to find a balance between the use of already learned data (exploitation) and the autonomous acquisition of new data (exploration) [33]. Without exploration, the FML method will always return to the first learned path, and better paths will never be found. On the other hand, if the FML method explores too much, it cannot stick to a path, producing different paths each time, which does not amount to real learning.
There are different strategies for balancing exploration and exploitation, which allow a choice to be made between exploiting prior knowledge and generating new data through exploration. One of the most commonly used is the ε-greedy method, in which a small probability ε is responsible for selecting either the best action or a random option that encourages exploration. Another option is Boltzmann selection, where the probability that an action is selected depends on how its value compares with the other action values. In order to obtain such a balance in FML, FM² exploits its characteristic of always obtaining the fastest path. When kinesthetic learning is carried out, a modification of the F matrix is forced, thus generating a series of tunnels or routes which the method uses for the estimation of ideal paths. It may happen that, when FM² is applied, it estimates that, to go from one point to another, there is a faster path than the learned one. If only the exploitation of the obtained learning is used, the method is forced to obtain a path similar to the learned ones, which in this case generates a non-optimal path. Therefore, in order to encourage exploration, an extension of FML has been developed. An estimation of the obtained path is made and a learning process is carried out so that the new path, which is better than previous knowledge, is also learned. As such, an auto-learning process is generated with which the robot, through exploration, increases its knowledge of the environment. In order to control when auto-learning takes place and when FML relies on the exploitation of learned information, it is necessary to establish criteria similar to those explained above. For this purpose, a selection strategy has been created based on the coincidence of the generated path with the kinesthetically learned paths. Given a new path, the velocity values of the F matrix at each of its points are checked. In zones where the new path lies outside the kinesthetically learned paths, very low velocity values close to 0 are found, whereas for those zones within the learned paths, high velocities close to 1 are present. Following these facts, the selection strategy is defined according to Equation (6):
$$ \lambda = \frac{1}{l} \sum_{i=1}^{l} F(p_i) \qquad (6) $$
where λ is the auto-learning factor, F(p_i) is the value of the velocity matrix at point p_i of the path and l is the number of points of the evaluated path. Based on the value of λ, auto-learning is applied or not. The threshold was empirically obtained by calculating the mean values for different trajectories that use only the kinesthetically learned regions, so that, through various tests, it was established that if λ falls below this threshold, the path can be considered a new learned path and, therefore, it is added to the previously taught paths. An example of the application of the auto-learning method with respect to the λ value is presented in Figure 11.
As in the ε-greedy method, the value obtained as the optimal threshold is not mandatory for the implementation. Depending on the strategy to follow, the user can decide to work more conservatively, encouraging the use of learned data (decreasing the value of the threshold), or to encourage the addition of newly obtained paths (increasing the threshold).
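A minimal sketch of Equation (6) and the resulting decision rule is given below. It assumes, as discussed above, that a low mean velocity along the candidate path indicates that it crosses mostly unlearned regions; the function names and the way the threshold is passed in are illustrative.

```python
import numpy as np

def auto_learning_factor(path, F):
    """Equation (6): mean velocity-map value over the l points of a candidate path."""
    values = [F[tuple(p)] for p in path.astype(int)]
    return float(np.mean(values))

def maybe_auto_learn(path, F, E, threshold):
    """Add the candidate path to the experience E when it lies outside learned zones."""
    lam = auto_learning_factor(path, F)
    if lam < threshold:        # low mean velocity -> path found through exploration
        E.append(path)         # auto-learning: the new path becomes part of E
        return True, lam
    return False, lam          # high mean velocity -> path already matches the experience
```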
The application of auto-learning is highly beneficial in FML oriented to manipulation in two cases in particular. The first is that, by having the ability to learn and add such automatically acquired knowledge to previous data, it is not necessary to teach all possible cases of a workspace. As such, by teaching the robot the most common cases, so that it can generalize, and by relying on the auto-learning method to learn more specific cases, almost all the activities to be carried out in a given workspace can be covered. This also ensures that no segregation of data arises. That is, the robot may only be taught to perform one type of movement (e.g., downward movements); with auto-learning, the robot can learn other types of movements, such as lateral movements, thus generating a series of new paths that have not been taught in a kinesthetic way.
Secondly, the use of auto-learning allows the correction of possible errors in data collection. If incorrect data acquisition is performed, in which movements that are impossible for the robot to perform are generated, auto-learning will avoid carrying out those erroneous paths and will allow correct paths to be learned to perform the required task. An example of both cases can be seen in Section 4.2.
3.4. Important Parameters: Saturation and Area of Influence
In the FML method, there are two parameters derived from Fast Marching that are essential for the correct behavior of the algorithm: saturation and the area of influence.
Saturation represents the propagation velocity assigned by FM to the regions without experience. Saturation in FML is used to distinguish the places with previous or auto-learned experience; therefore, places that hold previous experience are reached earlier. If the saturation value is very low, the method will only use the experience provided by the kinesthetic data. On the contrary, if the value is very high, the method will encourage the exploration of the environment by not exactly following the learned values.
The area of influence (aoi) determines the dilation of the path points in order to give connectivity to the demonstrations and their spatial surroundings. In two-dimensional environments, the area of influence is measured in pixels (px) and in three dimensions it is measured in voxels (vx). This parameter affects the learning generalization. When an excessively low value is given, the algorithm will not generalize and it will strictly follow the taught trajectories. In the opposite case, the reproductions will generalize excessively and the shapes of the demonstrations will be lost. Normally, this parameter follows a morphologically flat structuring element based on a disc, which allows the dilation processes to be carried out according to the value of the selected aoi, as shown in Figure 12.
In the proposed case, where we are working in three dimensions, both for the FML process and for the cases where auto-learning is necessary, a multidimensional structuring element has to be applied (in this case, three-dimensional). When working with two-dimensional elements, as shown in Figure 12, it may happen that areas that have been auto-learned do not appear as traversable areas for the algorithm. This is due to path orientation changes. If the path is totally parallel to the base where the data are taken, the disc will dilate parallel to the ground. Therefore, non-passable areas between auto-learned zones are created due to the lack of the third dimension (height) in the structuring element, resulting in a totally unconnected path. An example of this can be seen in Figure 13, where the result of the same auto-learning is compared using a typical two-dimensional structuring element (disc) and a three-dimensional one (sphere). It can also be seen that the use of 3D structuring elements does not only improve the regions that are parallel to the ground; a positive influence is also achieved for curved paths, as shown in Figure 13.
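The difference between the flat disc and the sphere can be made explicit with the small helper below, which builds either structuring element for the dilation step; the function name and the boolean flag are illustrative choices.

```python
import numpy as np
from scipy import ndimage

def structuring_element(aoi, three_dimensional=True):
    """Disc (2D, radius in pixels) or sphere (3D, radius in voxels) of radius aoi."""
    r = np.arange(-aoi, aoi + 1)
    if three_dimensional:
        zz, yy, xx = np.meshgrid(r, r, r, indexing="ij")
        return xx**2 + yy**2 + zz**2 <= aoi**2
    yy, xx = np.meshgrid(r, r, indexing="ij")
    return xx**2 + yy**2 <= aoi**2

# Dilating a 3D path with the flat disc only spreads experience parallel to the
# ground and can leave auto-learned zones disconnected (Figure 13); the sphere
# preserves connectivity for curved or non-horizontal paths.
# dilated = ndimage.binary_dilation(path_voxels, structure=structuring_element(5))
```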
3.5. Addition of Obstacles in the Workspace
Real environments are rarely completely free and are usually dynamic. This is why obstacle inclusion is necessary. Obstacles can belong to the environment from the beginning, which allows the robot to be taught to take them into account, or they can be added later, once kinesthetic learning has already taken place. Due to the advantages of FML and FM², both cases can be solved in a simple way.
If the obstacles in the environment are initially known, the solution is trivial because the kinesthetic teaching will take these obstacles into account, thus generating paths that allow the obstacles to be circumvented. An example of this case in two dimensions is shown in Figure 14.
In the case of initially having a free environment, the learning process will be carried out without taking into account any obstacle, generating a certain experience E with an associated velocity map F. As soon as any obstacle appears dynamically, it will be necessary to create a new, saturated velocity map. For this work, we considered a dynamic object as one that appears and interferes with previously learned data after a series of trajectories has been learned. In other types of algorithms, where only the learned information is used and there is no exploration, this type of situation requires either a new learning procedure over the modified environment, or systems for detecting and processing this type of situation. In the presented algorithm, the auto-learning process, through exploration, determines a new path that is able to avoid the dynamic object. After verifying its feasibility, the new plan is added as a learned path, thus extending the knowledge of the environment as well as accelerating the process in case the object remains in that position. As such, to obtain the final velocity map, it is necessary to apply Equation (7) to all the points at which the velocity map is not saturated.
That process has to be repeated any time a new obstacle appears in the environment. The method is the same for 2D and 3D. An example in two dimensions of the dynamic appearance of an object in a free environment can be seen in Figure 15, with the corresponding saturation and area of influence values.
An example of the process in three dimensions is represented in
Section 4.2.
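Since Equation (7) is not reproduced here, the sketch below only illustrates one plausible reading of the described procedure: the dynamic obstacle generates its own saturated velocity map and the learned map is slowed down wherever the obstacle invalidates it. The merge rule (an element-wise minimum), the function name and the safety distance are assumptions.

```python
import numpy as np
from scipy import ndimage

def update_velocity_map(F_learned, obstacle_mask, safety_cells=5):
    """Hypothetical velocity-map update when a dynamic obstacle appears.

    F_learned     : velocity map built from the kinesthetic experience E.
    obstacle_mask : boolean array, True where the new obstacle occupies space.
    """
    # Saturated velocity map of the obstacle alone: zero inside it, full speed
    # beyond `safety_cells` cells away from it
    dist = ndimage.distance_transform_edt(~obstacle_mask)
    F_obs = np.minimum(dist / float(safety_cells), 1.0)
    F_obs[obstacle_mask] = 0.0

    # Keep the slower of the two speeds everywhere, so learned cells covered or
    # threatened by the obstacle are no longer preferred by the planner
    return np.minimum(F_learned, F_obs)
```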
3.6. Fast Marching Learning Characteristics
Due to the characteristics derived from FM, FML has a number of advantages which make it possible to solve problems that may arise during learning.
3.6.1. Deterministic Behavior
Since FM is a deterministic method, FML is also deterministic. This means that, when using FML, we will obtain the same output (path) as long as the input is invariant. This factor can be important because it allows the behavior of the generated trajectories to be known in advance and therefore avoids erratic behavior that could cause damage to the robot, which is common in probabilistic methods.
3.6.2. Bi-Directional Behavior
Other learning methods are designed in such a way that they can only generate solutions in one direction, i.e., from the start point to the end point. If a trajectory going in the opposite direction, from the end point to the start point, is needed, a totally new and unexpected behavior would be generated. In the case of FML, the same movement learned to go in one direction is applied to return in the opposite direction. Although this can be regarded as a limitation, it allows the robot’s behavior to be predicted, thus increasing safety when working in environments with humans, as well as ensuring that, in both cases, the fastest path is obtained.
3.6.3. Behavior with No Previous Experience
In other learning algorithms, having a point away from the learned experience generates erratic and unpredictable behavior. In cases where there is no prior information, the FML method obtains the fastest path from the starting point to the end point, always according to the metric obtained from the velocity matrix. Due to the use of a constant saturation value at points far from previous experience, or where there is no previous experience, the fastest path is also the shortest path in terms of distance and time. An example of such behavior is represented in Figure 16, where a path without previous experience (no kinesthetic data) and a path with previous experience (kinesthetic) have been created.
3.6.4. One-Shot Learning
As described in [34,35], one-shot learning refers to the ability of some learning methods to successfully reproduce and generalize certain tasks from a single taught demonstration. This is a relevant and important feature, as it allows a reduction in the amount of data to be collected, which can, to a certain extent, reduce the errors derived from the difficulty that data collection sometimes entails.
When a large number of kinesthetic demonstrations are applied in the FML method, in the dilation stage the algorithm tends to generate a single area of influence where, depending on the aoi set, more or fewer demonstrations can be grouped together. Therefore, what the method establishes is a single demonstration derived from the union of many demonstrations. The union of several learned tasks makes it possible to generate a single area of interest larger than with a single learned path, but which behaves as one-shot learning. An example of the application of one-shot learning in two dimensions is represented in Figure 17.
3.6.5. Stable Behavior
Fast Marching Learning has been proven to be stable according to an analysis based on the Lyapunov Stability Theorem adapted to non-dynamical systems [7]. Equation (2), defining wave propagation, is considered as the Lyapunov function. The propagation starts at the specified goal for the robot, where the arrival time T is zero. Given that the propagation values are based on the wave arrival times, no negative values can be found. Additionally, as the distance from the initial point increases, the values of T also increase. Finally, since no local minima are found in T and given the increasing value of the wave, the derivative of the function is always positive.
Added to the above facts, when the gradient is applied on the function, it descends following the direction in which time decreases the most. Hence, the Lyapunov conditions are met, proving that the system is asymptotically stable. This means that every motion calculated with FML converges to the same point, since only a single minimum exists in T.
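The reasoning of this subsection can be condensed into the standard Lyapunov conditions, written here for the arrival-time potential; the goal point is denoted x_g, a symbol introduced only for this summary:

```latex
% Candidate Lyapunov function: the arrival-time potential T(x) propagated from the goal x_g
T(x_g) = 0, \qquad
T(x) > 0 \quad \forall\, x \neq x_g, \qquad
\dot{T}(x) = \nabla T(x) \cdot \dot{x} = -\lVert \nabla T(x) \rVert^{2} < 0
\quad \text{along the descent } \dot{x} = -\nabla T(x),
```

so every reproduction obtained by descending T converges asymptotically to x_g, the only minimum of the potential.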
3.6.6. Limitations of the Method
The main disadvantage of FML derives directly from the disadvantages of FM². The main problem is the high dimensionality of the working environment. Working in environments where the search space is very large or the resolution is very small results in an excessively large working matrix, which makes the FM² method unable to obtain a solution in a reasonable time, because the equation must be evaluated at a large number of points. This problem is often encountered when working in outdoor environments, where large navigation areas have to be covered for trajectory planning with mobile robots or UAVs [36]. To avoid this problem in robotic manipulation, a constant environment matrix is set. This matrix has as its maximum and minimum range the values of the manipulator workspace and the minimum possible resolution. In our case, a working matrix covering the manipulator workspace with a resolution of 1 cm is established.
3.7. Fast Marching Learning as a Reinforcement Learning Problem
As explained in previous sections, robotic arm control for the manipulation of environmental elements needs a balance between tasks that we give to the robot and tasks that it can learn autonomously. The generation and use of the FML method is based on the application of the general characteristics of reinforcement learning algorithms [37], where the robot is supposed to take different actions in an environment so as to maximize a specific cumulative reward, while obtaining the advantages of the FM² method for the generation of the trajectories. Specifically, the developed algorithm establishes a direct relationship between its structure and Markov Decision Processes (MDPs) [38]. In MDPs, the robot selects an action a in a given state s according to a specific policy π, which tries to maximize a reward function r, provided that there is a transition probability of being in state s′ after selecting action a when in state s. MDPs are built on the idea that the actions and states are finite. Thus, the task to be solved can be easily defined as a sequence of different behaviors given descriptive state variables. A schematic of an MDP is represented in Figure 18.
Despite the advantages of MDPs, working with this method in manipulation with a large number of DoF is not trivial. Due to the high dimensionality, a large number of actions can exist as a function of the states, which leads to an infinite number of state–action pairs. The FML algorithm manages to solve this problem due to the direct application of FM, which allows working in a discretized space while obtaining results in a continuous space. As such, FML allows working in an a priori continuous environment with infinite actions, simplifying the positions of the end-effector to a discrete space and subsequently obtaining a sequence of actions for each state. If we relate the elements established for MDPs to their direct application using FML on the robotic arms, the actions are denoted by the commands to be performed by each motor. The states represent each of the joint and end-effector positions. The control policy, when working with FM, is denoted by the potential matrix, which represents the arrival times at the specified points. As this policy always seeks to minimize the distance and time needed to perform a task, which entails seeking the maximum combination of velocities, it can be assumed that the reward for FML is represented by the velocity matrix. Positive reinforcement is obtained for going through the fastest and more favorable areas (without obstacles), and negative reinforcement is given for the slower and less favorable areas (areas close to obstacles). This control policy therefore allows the position commands for each motor to be estimated on a state-by-state basis. In this case, the obtained policy is deterministic.
Unlike in FM, where a direct estimation is made between the initial and final points, thus obtaining the path to follow, in FML this process is carried out for each of the points that make up the path between the initial and final points. As such, the action of the next state is obtained taking into account only the previous state and the decisions taken in it.
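To make the MDP correspondence explicit, the sketch below implements the deterministic policy induced by the arrival-time potential on the discretized workspace: the state is the current voxel of the end-effector and the action is the move to the neighboring voxel with the smallest arrival time. This discrete neighbor-to-neighbor descent is an illustrative stand-in for the continuous gradient descent actually used to extract paths.

```python
import numpy as np

def greedy_policy(T, state):
    """Deterministic policy: move to the 26-connected neighbor with the lowest arrival time."""
    best, best_t = state, T[state]
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for dk in (-1, 0, 1):
                n = (state[0] + di, state[1] + dj, state[2] + dk)
                if all(0 <= n[a] < T.shape[a] for a in range(3)) and T[n] < best_t:
                    best, best_t = n, T[n]
    return best                      # equals `state` at the goal (the minimum of T)

def rollout(T, start, max_steps=100000):
    """Follow the policy state by state until the goal is reached."""
    path, state = [start], start
    for _ in range(max_steps):
        nxt = greedy_policy(T, state)
        if nxt == state:
            break
        path.append(nxt)
        state = nxt
    return path
```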
5. Conclusions
In this paper, the operation of the extended FML algorithm has been presented, adding the ability to work in three-dimensional environments, and the auto-learning process developed for this method has been detailed. It has been shown that the method works correctly in empty workspaces, with obstacles that are present while the data are being collected and with obstacles that behave as dynamic objects. Due to the capabilities inherited from FM, the method offers a different point of view from other motion learning methods.
The method shows an equilibrium between the exploration of the environment and the exploitation of the learning data, which generates results that adapt to every type of environment tested, independently of errors in data collection and segregation in the learned data. The principal advantages of this method over others are its determinism, stable behavior, bi-directional behavior, one-shot learning, capability to solve the problem without enough data, and capability to auto-learn from its own experience. Additionally, experimental results show that the FML method generates reliable and safe paths.
The method was qualitatively presented with the LASA dataset and quantitatively compared against five different imitation learning algorithms from the state of the art. This comparison showed that the developed method generates paths with less error and in less time than the previous methods. This is mathematically reflected in the swept error area (SEA) metric, in which our method shows an improvement both without and with the auto-learning process compared with the most restrictive method (modified DS). It can be seen that the developed method tends to seek the fastest solution by making use of both the learned information and the exploration of the environment, generating solutions that, compared with other typical methods, optimize the resolution. The method does not only make use of contrasted information, but is also able to carry out auto-learning of possible options that optimize the solution. The idea of this method is not to position itself as a better option than other methods, but to present its own solution based on the idea of learning the environment for the manipulation of elements with a real robot. In spite of this, it can be observed that the FML algorithm presents clear advantages over other imitation learning techniques, since it is able to solve the main defects of imitation algorithms, such as data dependence: the algorithm is able to work with no data, with a single demonstration and even with incorrect data, thanks to its balance between the exploitation factor and the exploration of the environment. In addition to this, the presented algorithm can be implemented in several dimensions, a capability that other algorithms do not have, and it presents a higher efficiency than the algorithms typically used for these types of techniques.
Therefore, future work will focus on applying this method to the bimanipulation of elements of the environment using the same robot, taking into account the learning process for each of the arms, generating the auto-learned data and correlating them to generate coordinated and independent tasks.