1. Introduction
The future of navigation points toward increasing the autonomous capabilities of ships [1,2], from decision support systems to fully autonomous navigation. The framework that will guide the future of the maritime world hinges on the classification released by the International Maritime Organization (IMO) [3], which describes four incremental MASS (Maritime Autonomous Surface Ship) autonomy levels for the classification of autonomous ships, ranging from MASS Level 1, featuring automated decision support systems for the crew, to MASS Level 4, an uncrewed ship with full decisional autonomy. The push toward the development of autonomous technologies in the maritime field has created fertile ground for the research community, which has identified numerous scientific gaps in areas typical of autonomous navigation and marine robotics, such as complex guidance and control algorithms [4,5,6], collaborative control [7,8,9], situational awareness [10], path planning, and collision avoidance.
The challenge for maritime collision avoidance is to develop algorithms capable of reacting to the presence of fixed or moving obstacles in the surrounding environment and of generating trajectories or maneuvers that avoid collision, maintain an adequate safety distance, and, when applicable, comply with the COLREG [11,12], which sets the “rules of the road” a ship must follow when interacting with other ships. This requirement is essential in scenarios where autonomous systems interact with human-controlled systems [13]. Collision avoidance falls under the broader problem of path planning, i.e., the automatic determination of an optimal path based on information about the surrounding environment. In the robotics field, path planning is commonly divided into two levels [14,15]: the off-line or global level, in which the path is determined from a priori known information, such as information about fixed obstacles or weather forecasts (weather routing), and the local or reactive level, in which subparts of the path are re-planned online in reaction to changes in the environment detected by sensors, such as moving or unexpected obstacles. The scientific literature has proposed various approaches to the reactive collision avoidance of marine vessels, including A* [16], Dijkstra’s algorithm [17,18], visibility graphs [19], rapidly exploring random trees [20,21,22], Artificial Potential Field methods [23,24], and various population-based heuristics [25,26,27,28].
Describing a ship’s route or maneuver through a sequence of waypoints is a common approach in the literature, not only at the global planning level but also at the reactive planning level. Such an approach offers relevant advantages for application to large human-crewed ships, both in a fully automatic collision avoidance system and within a MASS Level 1 decision support framework, such as the one illustrated in Figure 1. The representation of a maneuver by waypoints and legs is intelligible to seafarers, and the decision support system can propose it to the officer on the watch, who can understand it and decide whether to acknowledge it. The new sequence of waypoints is then taken over by a motion control system, for example, one based on Line of Sight [29,30], which determines, based on GNSS localization, the propulsion and steering actions needed to track the course according to the ship’s dynamics with a reasonably low track error.
Dynamic programming (DP) is an effective approach for solving multi-stage optimization problems. Since its introduction by Bellman [31,32], DP has been successfully generalized and formulated to describe path planning problems [33]. A collision avoidance system must be able to represent obstacles and the surrounding environment effectively and efficiently and to determine, in real time and with low computational cost, evasive maneuvers that onboard control systems can actuate. These requirements perfectly match the potential of DP, which, in this framework, offers a simple and effective problem description, straightforward implementation of constraints, and efficient solution schemes capable of exploiting parallelism. Despite the inherently recursive formulation of DP, very efficient solution strategies based on function memoization or tabulation [34] and leveraging parallel computation [35,36,37] have been proposed to optimize computation efficiency. DP-based algorithms are ideal for scenarios where resources are limited and real-time computation is crucial, such as autonomous vehicles, robotics, and embedded systems, and they have found application in a broad range of industrial fields, including railway transportation [38], robotics [39,40], and maritime transport. The maritime literature features various applications of DP, primarily in typical global planning problems such as weather routing [41,42,43,44], where the objective is to determine the optimal trajectory across a space–time domain to reach the destination by navigating effectively through forecasted dynamic weather conditions while complying with various constraints, including ship motions and seasickness, and minimizing fuel consumption or a motion-related loss function.
In this article, we introduce a DP formulation of the collision avoidance problem for autonomous vessels that can be applied in a reactive collision avoidance context. In Section 2, we set up the framework used to describe the ship collision avoidance problem. In Section 3, we formulate the optimal path planning problem as a multi-stage decision process with a recursive definition based on Bellman’s equation. In Section 4, we describe the cost function and constraints based on the actual requirements of a collision avoidance maneuver in marine applications, taking into account maneuvering limits and regulations to obtain smooth, collision-free, and COLREG-compliant maneuvers. In Section 5, we propose a bottom-up solution scheme for the problem, leveraging parallel computing, and in Section 6 we discuss some implementation-related strategies. In Section 7, we further reduce the algorithm’s time complexity by proposing a greedy approximation of the DP problem formulated previously. Finally, in Section 8, we assess the performance of the proposed approach: we first discuss the results of the proposed algorithms in some relevant navigation scenarios; we then analyze how their performance and computation effort depend on the algorithm parameters over a large set of randomly generated scenarios; and we finally compare the proposed algorithms with the RRT* in randomly generated scenarios using performance metrics proposed in the literature.
2. Collision Avoidance Framework Definition
Despite the Earth’s curvature being relevant when planning long routes [41], the horizon of a maneuver in a collision avoidance context is usually such that we can approximate the ship’s state space X with a Euclidean plane. Figure 2 visually represents the definitions and notations we will provide in the following.
At the beginning of the collision avoidance maneuver, the ship is located at the initial position $P_0$. We introduce the orthogonal unit vectors $\hat{x}$ and $\hat{y}$ to represent the cardinal axes of X; the unit vector $\hat{x}$ is aligned with the ship’s course at the beginning of the collision avoidance maneuver, while $\hat{y}$ points towards the ship’s starboard (right) side so that $\hat{z} = \hat{x} \times \hat{y}$ points downwards, as per the standard convention for ship maneuverability. We will refer to any generic position $W \in X$ as a waypoint. We can represent a generic maneuver or route R as a sequence of consecutive waypoints as follows:
$$ R = \left( W_0, W_1, \ldots, W_n \right). $$
A sequence of two consecutive waypoints, $\ell_k = (W_{k-1}, W_k)$, is called a route leg. If we denote with “⊕” the sequence concatenation operator, we can represent the route R as the concatenation of its legs:
$$ R = \ell_1 \oplus \ell_2 \oplus \cdots \oplus \ell_n. $$
We introduce the notation $\vec{\ell}_k = W_k - W_{k-1}$ to represent the vector connecting the start point of the leg to its end point. We can obtain the course change between two consecutive legs $\ell_{k-1}$, $\ell_k$ using a function $\Delta\chi(\ell_{k-1}, \ell_k)$, defined as the signed angle between $\vec{\ell}_{k-1}$ and $\vec{\ell}_k$.
Some considerations about the ship’s kinematics are also needed. Firstly, we assume the ship is moving at a constant speed u. Such an assumption is reasonable since, in maritime practice, course alterations are preferred to speed reductions due to the long transients a speed alteration takes. In particular, it is common practice to avoid obstacles by altering the ship’s course while trying to keep the speed constant, or by reducing the speed by a limited amount in the first part of the maneuver, keeping it constant, and slowly regaining the cruise speed after the collision risk is mitigated. Leveraging the constant speed assumption, we can easily map R to a sequence of time instants $t_0, t_1, \ldots, t_n$. The instant $t_k$ at which the ship engages the waypoint $W_k$ is given by:
$$ t_k = t_{k-1} + \frac{\left\| W_k - W_{k-1} \right\|}{u}, \qquad t_0 = 0. $$
Therefore, the instantaneous vessel position $P(t)$ at time $t \in [t_{k-1}, t_k]$ is given by:
$$ P(t) = W_{k-1} + \frac{t - t_{k-1}}{t_k - t_{k-1}} \left( W_k - W_{k-1} \right). $$
Moreover, since collision avoidance is about the interaction with obstacles, we need to introduce a notation and some hypotheses to describe them. First, we assume there are M obstacles in the scenario with known kinematics. We denote with $O_m(t)$ the instantaneous position of the mth obstacle. For this study, we will approximate the motion of the obstacles as straight-line motion at constant speed as follows:
$$ O_m(t) = O_m(0) + \vec{v}_m \, t, $$
where $\vec{v}_m$ represents the speed vector of the mth obstacle. The proposed formulation could be generalized to any known motion law.
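To make the kinematic assumptions of this section concrete, the following Rust sketch maps a constant-speed route to its waypoint arrival times and evaluates the constant-velocity obstacle model. Type and function names are illustrative assumptions, not the notation of the implementation described later in the paper.

```rust
/// Minimal geometric types for the framework of this section (names are illustrative).
#[derive(Clone, Copy, Debug)]
struct Vec2 { x: f64, y: f64 }

impl Vec2 {
    fn add(self, o: Vec2) -> Vec2 { Vec2 { x: self.x + o.x, y: self.y + o.y } }
    fn sub(self, o: Vec2) -> Vec2 { Vec2 { x: self.x - o.x, y: self.y - o.y } }
    fn scale(self, k: f64) -> Vec2 { Vec2 { x: k * self.x, y: k * self.y } }
    fn norm(self) -> f64 { (self.x * self.x + self.y * self.y).sqrt() }
}

/// A route is a sequence of waypoints sailed at the constant speed `u`.
struct Route { waypoints: Vec<Vec2>, u: f64 }

impl Route {
    /// Arrival time at each waypoint: t_0 = 0, t_k = t_{k-1} + |W_k - W_{k-1}| / u.
    fn waypoint_times(&self) -> Vec<f64> {
        let mut times = vec![0.0];
        for w in self.waypoints.windows(2) {
            let last = *times.last().unwrap();
            times.push(last + w[1].sub(w[0]).norm() / self.u);
        }
        times
    }
}

/// Obstacle with straight-line, constant-speed motion: O_m(t) = O_m(0) + v_m * t.
struct Obstacle { p0: Vec2, v: Vec2 }

impl Obstacle {
    fn position_at(&self, t: f64) -> Vec2 { self.p0.add(self.v.scale(t)) }
}
```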
3. Dynamic Programming Formulation
With reference to Figure 3a, let $P_0$ be the initial position of the ship, let the unit vector $\hat{x}$ be aligned with the ship’s initial course, and let $\hat{y}$ be a unit vector orthogonal to $\hat{x}$ such that $\hat{z} = \hat{x} \times \hat{y}$ points downwards. Let $L_x$ and $L_y$ be the dimensions of the region in which the ship can maneuver. We introduce a discretization of the domain by defining grid coordinates along the two axes as follows:
where the parameters N and D control the fineness of the discretization along $\hat{x}$ and $\hat{y}$, respectively, and the integer subscripts are the indices of the discretization. Figure 3b exemplifies a graphical representation. A route R is then restricted to waypoints belonging to this discrete grid.
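As an illustration only, one plausible realization of such a grid places the stations along the initial course and the lateral offsets across it, centred on the initial position; the exact indexing of Equation (7) may differ.

```rust
/// One plausible realization of the computation grid: `n_stages` points along the
/// initial course and `n_lateral` offsets across it, centred on the initial position.
/// This is an illustrative assumption, not necessarily the indexing of Equation (7).
fn grid(p0: (f64, f64), lx: f64, ly: f64, n_stages: usize, n_lateral: usize) -> Vec<Vec<(f64, f64)>> {
    (1..=n_stages)
        .map(|i| {
            let x = i as f64 * lx / n_stages as f64;
            (0..=n_lateral)
                .map(|j| {
                    let y = j as f64 * ly / n_lateral as f64 - ly / 2.0;
                    // In this sketch x_hat = (1, 0) and y_hat = (0, 1).
                    (p0.0 + x, p0.1 + y)
                })
                .collect()
        })
        .collect()
}
```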
We can imagine a route R as the result of a sequence of transitions between consecutive route legs $\ell_{i-1}$ and $\ell_i$, with i ranging over the stages of the maneuver. In this framework, optimal route planning can be represented as finding an optimal policy in a multi-stage decision process in which the route legs represent the states. Let J be a function expressing the cost of a route up to state $\ell_i$; therefore, the value of J at the last state represents the cost of the whole route. Let us assume that J is additively separable, i.e., there exists a transition cost function c such that
$$ J(\ell_i) = J(\ell_{i-1}) + c(\ell_{i-1}, \ell_i). $$
Let us denote with $F(\ell_i)$ the route’s cost up to the state $\ell_i$ following an optimal policy, i.e., a policy such that this cost is the minimum over all routes reaching $\ell_i$. Since J is additively separable, we can leverage Bellman’s equation to define F recursively [33]:
where $\ell_0$ represents the state of the ship before the collision avoidance maneuver begins, and $t(\ell_{i-1}, \ell_i)$ is a feasibility function whose value indicates whether the transition from $\ell_{i-1}$ to $\ell_i$ is feasible, so that infeasible transitions are excluded from the minimization.
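A minimal sketch of this recursion, under the illustrative assumption that the feasibility function t equals one exactly for the feasible transitions (so that infeasible predecessors are simply excluded from the minimization), is:
$$ F(\ell_i) \;=\; \min_{\ell_{i-1} \,:\, t(\ell_{i-1}, \ell_i) = 1} \left[ F(\ell_{i-1}) + c(\ell_{i-1}, \ell_i) \right], \qquad F(\ell_0) = 0. $$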
Similarly, we can define the optimal predecessor function $p(\ell_i)$, returning the optimal predecessor state at the previous stage for the state $\ell_i$, i.e., the predecessor that attains the minimum in the recursion above. Once F and p are known for all states, the optimal solution R can be obtained by backtracking the optimal decisions: starting from the final state with minimum cost, the chain of optimal predecessors is followed back to the initial state $\ell_0$, and the corresponding legs are concatenated to form the route.
4. Constraints and Cost Function
The constraints are needed to formulate the function t that identifies whether a transition is feasible. For the transition to be feasible, the next leg must begin at the end of the previous one. Moreover, each transition must be such that the final resulting sequence, i.e., the route, is compatible with the ship’s maneuvering capabilities, respects the COLREG, and is collision safe. Therefore, the function t takes the following form:
In particular, the route leg connection constraint function ensures that the two route legs can be connected, the course change constraint function ensures that the course changes along the path are compatible with the ship’s maneuvering capabilities and good seamanship, the COLREG constraint function implements a set of rules to ensure the COLREG compliance of the own ship’s kinematics relative to the other vessels’, and the collision avoidance constraint function aims to guarantee an appropriate distance from the obstacles to avoid collisions safely.
4.1. Route Leg Connection
The feasibility of the transition from $\ell_{i-1}$ to $\ell_i$ with respect to the leg connection is expressed by a dedicated constraint function. The transition is possible if the end point of the first leg is the start point of the second:
where the symbol “·” is a placeholder for any point in the proper domain, and the comparison operator in the expression indicates whether the two points coincide.
4.2. Course Change Constraint
We need to define minimum and maximum threshold values for the ship’s course changes. The introduction of the maximum threshold is motivated by the ship’s maneuvering capabilities and the performance of the motion control system: during a course change, the ship goes off the predefined track by a distance related to the magnitude of the course change, and a large course change will push the ship too far off track, increasing the risk of collision. The reason for the minimum threshold can be found in COLREG Rule 8(b), which requires that “Any alteration in course and/or speed to avoid collision must, if the circumstances of the case permit, be large enough to be readily apparent to another vessel observing visually or by radar; a succession of small alterations in course and/or speed must be avoided”. We thus want to ensure that every course change between legs is either large enough to be apparent to an observer but not so large that the control and steering system cannot actuate it, or null. Thus, we can define the course change constraint function as:
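A compact sketch of this check, with illustrative parameter names and angles in radians (the thresholds shown are parameters, not the values used in the paper), could look as follows:

```rust
use std::f64::consts::PI;

/// Signed course change between two legs given as displacement vectors (dx, dy),
/// wrapped to (-pi, pi] so that port/starboard turns keep their sign.
fn course_change(prev_leg: (f64, f64), next_leg: (f64, f64)) -> f64 {
    let mut d = next_leg.1.atan2(next_leg.0) - prev_leg.1.atan2(prev_leg.0);
    while d > PI { d -= 2.0 * PI; }
    while d <= -PI { d += 2.0 * PI; }
    d
}

/// Course change constraint of Section 4.2: either no course change at all, or a change
/// whose magnitude lies between `theta_min` (readily apparent, COLREG Rule 8(b)) and
/// `theta_max` (maneuvering and track-keeping limits).
fn course_change_feasible(delta: f64, theta_min: f64, theta_max: f64) -> bool {
    let a = delta.abs();
    a < 1e-9 || (a >= theta_min && a <= theta_max)
}
```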
4.3. COLREG Compliance Constraint
The COLREG is an international convention regulating several aspects of navigation, including visibility, navigation lights, and required behaviors in encounter situations to avoid collisions. This last part is a set of “rules of the road” that each ship has to follow when encountering other ships. COLREG-compliant behavior can be identified by analyzing the kinematics of the ships engaging in the scenario and relying on a set of if-then-else rules to assess which behavior each ship has to keep relative to the others. This action is usually called COLREG classification. Various approaches and algorithms for COLREG classification have been proposed in the literature [45,46].
For this study, we suppose that the COLREG-compliant behavior that the own ship must keep relative to the ith dynamic obstacle in the scenario has already been determined, and that it belongs to the following enumeration of all the possible behaviors:
In particular:
(Stand On): COLREG requires the target ship to maneuver to avoid a collision, so the own ship must keep its course and speed. In this case, we will neglect the presence of the target ship;
(Give Way): The own ship must maneuver to avoid collision with the target ship, letting the latter pass ahead;
(Head On): The own ship and the target ship are sailing on parallel and opposite routes, and the own ship must avoid the collision by turning to starboard (right);
(Any Action): The own ship must take any appropriate action to avoid collision; this behavior is adopted in emergencies, such as when the target ship is expected to maneuver to avoid the collision but does not seem to initiate the evasive maneuver.
These behaviors are ensured in the solution by imposing constraints on the leg in which the own ship and the target ship cross each other’s route.
Let us assume that the own ship and the target cross during a given leg of the route, and let us denote the intersection of the two routes as the crossing point. In a give-way situation, COLREG requires the target ship to engage the crossing point before the own ship; thus, we need to impose that, if the own ship engages the crossing point during that leg, it does so after the target. In a head-on situation, the correct behavior is ensured by requiring that the computed path keeps the target ship on the port side of the own ship. Finally, the COLREG constraint function takes the following form:
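The two checks above can be sketched as follows; the quantities passed in (arrival times at the crossing point and the target position expressed in an own-ship body frame with the transverse axis positive to starboard) are illustrative assumptions:

```rust
/// Give-way check: the own ship must engage the crossing point only after the target
/// has already passed it. `t_own` and `t_target` are the respective arrival times.
fn give_way_ok(t_own: f64, t_target: f64) -> bool {
    t_own > t_target
}

/// Head-on check: turning to starboard keeps the target on the own ship's port side,
/// i.e., at a negative transverse coordinate in a body frame where starboard is positive.
fn head_on_ok(target_transverse_coord: f64) -> bool {
    target_transverse_coord < 0.0
}
```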
4.4. Collision Avoidance Constraint
The obstacle avoidance constraint aims to guarantee that the evasive maneuver is collision safe. To this end, let us introduce the concepts of safety distance and Closest Point of Approach (CPA). The safety distance is the distance that the own ship must maintain from all obstacles during an evasive maneuver. It provides a safety margin that accounts for ship dimensions and uncertainties related to the environment, control, and measurement systems, ensuring the ship’s safety during the avoidance maneuver. The CPA is the minimum distance between the center points of the own ship and an obstacle over the time horizon of the maneuver, given their positions over time. The ship and obstacles may have different sizes, speeds, and relative positions. In addition, previous results have shown that some encounter scenarios are more critical than others in their dynamic evolution [13]. For the sake of generality, it may thus be appropriate to determine a specific safety distance for each obstacle belonging to the scenario. Thus, if there are M obstacles, the collision avoidance constraint function checks whether a transition is collision safe by requiring that the CPA relative to each obstacle is not smaller than the corresponding safety distance.
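Since both the own ship (along one leg) and the obstacles move with constant velocity, the CPA over a leg can be computed in closed form. The following sketch, with illustrative names and a simple (x, y) tuple representation, shows one way to evaluate it and to apply per-obstacle safety distances:

```rust
/// CPA between the own ship and an obstacle over the time window [t0, t1] of one leg.
/// `p_own` and `p_obs` are the positions at time t0; both velocities are constant.
fn cpa_on_leg(p_own: (f64, f64), v_own: (f64, f64),
              p_obs: (f64, f64), v_obs: (f64, f64),
              t0: f64, t1: f64) -> f64 {
    // Relative position at t0 and relative velocity of the own ship w.r.t. the obstacle.
    let p = (p_own.0 - p_obs.0, p_own.1 - p_obs.1);
    let v = (v_own.0 - v_obs.0, v_own.1 - v_obs.1);
    let v2 = v.0 * v.0 + v.1 * v.1;
    // Unconstrained minimizer of |p + v * tau|^2, clamped to the leg duration.
    let tau = if v2 > 0.0 {
        (-(p.0 * v.0 + p.1 * v.1) / v2).clamp(0.0, t1 - t0)
    } else {
        0.0
    };
    ((p.0 + v.0 * tau).powi(2) + (p.1 + v.1 * tau).powi(2)).sqrt()
}

/// A transition is collision safe if the CPA with every obstacle is at least that
/// obstacle's safety distance.
fn collision_safe(cpas: &[f64], safety_distances: &[f64]) -> bool {
    cpas.iter().zip(safety_distances).all(|(d, sd)| d >= sd)
}
```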
4.5. Cost Function
The definition of the cost function plays an essential role since it directly influences the characteristics of the optimal path. Various approaches can be used, depending on the application. At the global route planning level, cost functions related to travel time or distance are often used to determine the shortest or fastest route. More elaborate approaches feature models of the ship’s response to the environment, including ship motions, to determine a safer or more comfortable route, and of the ship’s propulsion, to determine the most fuel-efficient route [41,42,47,48]. Such approaches require more or less accurate modeling of the ship’s response to weather conditions and the availability of weather forecast data. Conversely, local planning aims to alter short route segments in reaction to an encounter situation with other ships to ensure a safe distance and avoid collisions. Fuel consumption and comfort relative to ship motions have a minor relevance in executing an evasive maneuver, where the time horizon is short and the priority is collision avoidance. While the constraints described in the previous sections are crucial for this purpose, the role of the cost function is to describe the characteristics of a “desirable” route so that the algorithm can choose it. Although the same approaches described for high-level planning are possible, such calculations come at the price of a potentially increased computational effort due to the complexity of the ship response model.
Within the proposed framework, a maneuver, i.e., a series of waypoints, should be simple for an operator or autopilot to execute, with limited course alterations to limit overshoots and prevent the ship from going off track. To this end, we propose a cost function inspired by the concept of minimum control energy. In the context of this paper, control energy is related to the amplitude of the maneuvers needed to follow a path; in other words, a path with high control energy requires the ship to perform large course changes over time. According to this principle, we can define the transition cost c as follows:
A minimum-path-length cost function would be equally easy to evaluate and would guarantee a shorter and more direct solution, but at the price of potentially larger course changes and, in general, a smaller margin on the safety distance.
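One plausible instantiation of the minimum-control-energy cost (an illustrative assumption, not necessarily the exact form of Equation (24)) penalizes the squared course change required at the junction between the previous leg and the next one:

```rust
use std::f64::consts::PI;

/// Control-energy-like transition cost: the squared course change between two legs,
/// each given as a displacement vector (dx, dy). Illustrative choice only.
fn transition_cost(prev_leg: (f64, f64), next_leg: (f64, f64)) -> f64 {
    let mut d = next_leg.1.atan2(next_leg.0) - prev_leg.1.atan2(prev_leg.0);
    while d > PI { d -= 2.0 * PI; }
    while d <= -PI { d += 2.0 * PI; }
    d * d
}
```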
5. Solution Scheme
This section presents a bottom-up solution scheme for the problem posed in Section 3 based on a tabulation approach. We can divide the algorithm into two phases: the tabulation phase and the backtracking phase.
Algorithm 1 illustrates the tabulation phase. We sequentially generate the feasible states at each stage by leveraging Equation (7), while keeping track of the partial optimum F values and the backward links to the optimal predecessor p in proper data structures.
Concerning Algorithm 1, the number of states at each stage is determined by the grid parameters, and so is the total number of transitions to be evaluated at each stage. Since the only transitions to be evaluated are those whose initial state ends at the starting point of the final state, a proper implementation allows such states to be accessed directly in constant time; thus, the number of evaluated transitions per stage can be reduced accordingly, and the overall process runs in polynomial time if the cost and constraint functions have constant time complexity. It is worth noting that the inner for-loop at line 10 of Algorithm 1 can be run in parallel since the loop does not mutate shared data. Once the tabulation phase is completed, the backtracking phase, described in Algorithm 2, reconstructs the optimal solution based on the previously tabulated back-links returned by the best predecessor function p.
Algorithm 1: Bottom-up solution scheme.
Algorithm 2: Backtracking of the optimal solution.
6. Implementation
By leveraging a graph structure, we can implement the approach described in Algorithms 1 and 2. We can think of each node of the graph as a data structure which stores the following for each stage i and for each state j (see the sketch at the end of this section):
The current state, i.e., the route leg associated with the node;
The optimal cost of the current state;
A back-link to the optimal predecessor, implemented as a pointer to the predecessor node.
Algorithm 1 allows us to create all the nodes of the graph, starting from the initial stage (lines 2–6) and generating the nodes of each subsequent stage from the previous one by initializing each node’s state (lines 11, 13, 14) and then assessing the node’s optimal cost and predecessor by solving a simple optimization problem to choose the best predecessor node (lines 15 to 19).
Figure 4 shows a section of an example graph structure generated based on Algorithm 1. Each node contains the state, the optimal cost, and the optimal predecessor, i.e., a back-link to the optimal predecessor node represented with an arrow.
When the graph has been completed, we can select the node with the minimum cost (i.e., minimum value of F) at the last stage and then backtrack the optimal solution by following the chain of back-links, concatenating each state to the front of a sequence according to Algorithm 2 until we reach the starting point.
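The following Rust sketch illustrates how the node structure, the parallel tabulation of Algorithm 1 (here using the Rayon library, as in our implementation), and the backtracking of Algorithm 2 fit together. Grid indexing, cost, and feasibility functions are passed in as stubs, and all names are illustrative; this is a schematic sketch, not the actual code used for the experiments.

```rust
use rayon::prelude::*;

#[derive(Clone, Copy)]
struct Node {
    leg: (usize, usize), // lateral grid indices of the leg's start and end waypoints
    cost: f64,           // optimal cost-to-come F
    pred: Option<usize>, // back-link: index of the optimal predecessor at the previous stage
}

/// Tabulation phase: build the stage-by-stage table of optimal costs and back-links.
fn tabulate(
    n_stages: usize,
    d: usize, // number of lateral grid points per stage
    feasible: &(dyn Fn(usize, (usize, usize), (usize, usize)) -> bool + Sync),
    cost: &(dyn Fn(usize, (usize, usize), (usize, usize)) -> f64 + Sync),
) -> Vec<Vec<Node>> {
    // Stage 1: legs leaving the initial position (start index 0 by convention here);
    // for brevity their own cost contribution is set to zero.
    let mut stages: Vec<Vec<Node>> = vec![(0..d)
        .map(|b| Node { leg: (0, b), cost: 0.0, pred: None })
        .collect()];
    for i in 1..n_stages {
        let prev = &stages[i - 1];
        // Each new node only reads the previous stage, so the loop can run in parallel.
        let next: Vec<Node> = (0..d * d)
            .into_par_iter()
            .map(|idx| {
                let leg = (idx / d, idx % d);
                let mut best = Node { leg, cost: f64::INFINITY, pred: None };
                for (k, p) in prev.iter().enumerate() {
                    // Only predecessors whose end point is this leg's start point qualify.
                    if p.leg.1 != leg.0 || !p.cost.is_finite() { continue; }
                    if !feasible(i, p.leg, leg) { continue; }
                    let c = p.cost + cost(i, p.leg, leg);
                    if c < best.cost { best = Node { leg, cost: c, pred: Some(k) }; }
                }
                best
            })
            .collect();
        stages.push(next);
    }
    stages
}

/// Backtracking phase: follow the chain of back-links from the cheapest final node.
fn backtrack(stages: &[Vec<Node>]) -> Vec<(usize, usize)> {
    let (mut j, _) = stages.last().unwrap().iter().enumerate()
        .min_by(|a, b| a.1.cost.partial_cmp(&b.1.cost).unwrap())
        .unwrap();
    let mut route = Vec::new();
    for i in (0..stages.len()).rev() {
        let node = stages[i][j];
        route.push(node.leg);
        match node.pred {
            Some(k) => j = k,
            None => break,
        }
    }
    route.reverse();
    route
}
```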
7. Greedy Approximation
From Equation (15), we can note that only part of the conditions for the feasibility of the transition from one stage to the next depends on the previous state, as does the transition cost c defined in Equation (24). In this section, we propose a greedy approximate dynamic programming (GADP) scheme that makes greedy decisions to reduce the number of evaluated transitions. The proposed scheme loses the capacity to find the globally optimal policy, yet it allows the time complexity of the algorithm to be reduced. The basic idea is to reformulate the route as a sequence of transitions between consecutive waypoints. In other words, we use the waypoints to represent the ship’s state rather than representing it with a leg connecting two consecutive waypoints. The greedy minimum cost at stage i is expressed by a dedicated cost-to-come function, while the greedy optimal predecessor is represented by a dedicated predecessor function:
where the terms that depend on the previous leg are computed greedily, based on the optimal predecessor already selected for the candidate predecessor waypoint:
Algorithm 3 shows the proposed approach. Since a whole cycle over the candidate predecessor legs disappears from the solution scheme, far fewer transitions are evaluated at each stage, and the time complexity required to run the algorithm drops accordingly.
Algorithm 3: Approximate dynamic programming bottom-up solution scheme.
The backtracking phase works similarly to the previous case.
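A hedged sketch of the greedy selection step is shown below: the state is a waypoint (its lateral grid index) rather than a leg, and the terms that depend on the incoming course are evaluated through the predecessor’s already-fixed back-link. Names and the exact greedy criterion are illustrative assumptions.

```rust
#[derive(Clone, Copy)]
struct GreedyNode {
    j: usize,            // lateral grid index of the waypoint
    cost: f64,           // greedy cost-to-come
    pred: Option<usize>, // greedy optimal predecessor at the previous stage
}

/// One stage of the greedy scheme: for every waypoint, pick the best predecessor using
/// only the predecessor's own (already fixed) back-link to evaluate the course change.
fn greedy_stage(
    prev: &[GreedyNode],
    d: usize,
    feasible: impl Fn(usize, Option<usize>, usize) -> bool, // (pred, pred's pred, waypoint)
    cost: impl Fn(usize, Option<usize>, usize) -> f64,
) -> Vec<GreedyNode> {
    (0..d)
        .map(|j| {
            let mut best = GreedyNode { j, cost: f64::INFINITY, pred: None };
            for (k, p) in prev.iter().enumerate() {
                if !p.cost.is_finite() || !feasible(k, p.pred, j) { continue; }
                let c = p.cost + cost(k, p.pred, j);
                if c < best.cost { best = GreedyNode { j, cost: c, pred: Some(k) }; }
            }
            best
        })
        .collect()
}
```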
8. Results
This section aims to evaluate the performance and capabilities of the proposed algorithms to understand their potential contribution if applied within an autonomous navigation and decision support context. In particular, this section is divided into three parts. In Section 8.1, we present results for three example scenarios which include moving and fixed obstacles; this analysis aims to show, qualitatively and quantitatively, the differences between the proposed algorithms in simple scenarios inspired by practical navigation conditions. The second and third subsections involve extensive tests of the presented algorithms against randomly generated scenarios, including fixed and moving obstacles with randomly assigned positions, directions, and speeds. We solve one thousand randomly generated scenarios to statistically evaluate the algorithms’ performance. In particular, Section 8.2 analyzes the influence of the domain discretization on the optimality of the solution, the computation time, and the algorithm failure rate, while Section 8.3 compares DP and GADP with the RRT* algorithm according to performance metrics from the literature. All scenarios take place in a square domain, with dimensions expressed in nautical miles, in which the own ship starts from one side with an assigned initial heading and must reach the opposite side of the domain. An implementation of the two proposed algorithms has been developed for testing and comparison purposes. The algorithms are implemented in the Rust language, relying on the Rayon library for parallelization. We perform the computations on an AMD Ryzen 9 5900HS CPU (8 cores, 3.3 GHz base clock, 4.6 GHz boost) with 32 GB of DDR RAM.
8.1. Example Navigation Scenarios
In this subsection, we apply the proposed algorithms to three navigation scenarios. The first features two sets of fixed obstacles extended transversely to the navigation direction, mimicking two docks; the second is a COLREG encounter scenario with sailboats; the third is a head-on scenario within a channel. More details on the analyzed scenarios are given below.
Scenario 1, shown in Figure 5a, has two barriers placed at two different distances along the navigation direction and extending transversely across part of the domain. These obstacles force the own ship to perform a zig-zag maneuver between the barriers;
Scenario 2, shown in Figure 5b, features two sailing vessels crossing the own ship’s route, the second with a speed of 6 knots. The COLREG imposes on the own ship a “give-way” behavior against both target vessels;
Scenario 3, shown in Figure 5c, features a double “head-on” with two target vessels approaching on reciprocal courses with speeds of 9 and 8 knots, respectively. In addition, two fixed side barriers form a channel parallel to the x-axis that is 8 nautical miles wide.
To comply with the COLREG, the own ship must make only clearly visible heading alterations, subject to a minimum course change angle, while turns up to a maximum allowed value are accepted. The algorithms perform the path optimization on a discrete computation grid defined as per Equation (7) with assigned values of the grid parameters N and D.
Figure 6 shows the solutions found by the two algorithms in the three proposed test scenarios. In scenarios 1 and 2 (Figure 6a,b), DP and GADP propose different maneuvers; in particular, the GADP solution features more delayed direction changes. In scenario 3 (Figure 6c), on the contrary, the solution computed by the two approaches is the same, i.e., the greedy optimum computed by GADP corresponds to the global optimum of the DP. Finally, Figure 7a,b present the value of the cost function and the computation time required to determine the solution, respectively. We can note that, at the price of a higher cost function value, the solutions determined by the GADP algorithm require less computation time.
8.2. Effect of the Grid Fineness Parameters
The proposed algorithms determine the best maneuver by selecting a sequence of waypoints on a discrete grid controlled by the parameters N and D. Section 5 discusses how the computational effort depends strictly on these two parameters; however, a too-sparse grid could compromise the quality of the optimum. To estimate how much the fineness of the grid affects the minimum cost function and the computational time, we perform a series of experiments based on the random generation of 1000 scenarios. Each scenario includes up to 10 fixed and 10 moving obstacles, each with a randomly assigned position, direction, and speed. The scenarios thus generated are solved with the DP and GADP algorithms for several values of N and with D doubling from 5 to 80.
The results are presented in Figure 8 using violin plots [49]. We can observe how, as the grid fineness increases, the cost function decreases and stabilizes once the grid is sufficiently fine (Figure 8a). Conversely, Figure 8b shows that the computation time increases, as expected. We can also note how GADP requires less time than DP and how the greedy optimum tends to approach the exact optimum as the grid becomes denser. Finally, Figure 8c shows the trend in the percentage of failed random scenarios as the fineness of the grid changes. With a sufficiently high D, the percentage stabilizes to a fraction of “unsolvable” scenarios, which decreases as N increases. We can also note that the greedy approximated algorithm can solve fewer scenarios. In other words, although a solution exists, the GADP cannot find it due to the consequences of the greedy assumptions made in Section 7 on the transition feasibility.
8.3. Performance Assessment and Comparison with RRT*
This subsection compares the proposed algorithms against a set of randomly generated scenarios according to a set of performance metrics, to better highlight the advantages and disadvantages of the proposed algorithms, partially discussed in Section 8.1. The scenarios used for the test are randomly generated using the same technique described in Section 8.2, i.e., by arranging in the domain a random number (from 1 to 10) of fixed obstacles and moving obstacles with randomly assigned positions, directions, and speeds. For completeness, the comparison includes a state-of-the-art algorithm with comparable features and applicability, the RRT*. The RRT* is a random sampling algorithm designed to explore domains using tree structures and to determine collision-free maneuvers quickly. RRTs [50] iteratively generate tree-like structures, initiating from a root node and terminating when a node is close enough to the desired goal. The “*” (star) variant, or Optimal RRT [51], includes local optimizations of the tree topology within the neighborhood of each newly generated node, leveraging a cost function to generate heuristically optimal trajectories. The implementation of the RRT* used for the comparison includes assumptions similar to those made for the dynamic programming algorithms described in this paper. In addition, our implementation can account for fixed and moving obstacles with constant speed and direction, similarly to the DP-based schemes proposed here [21].
The RRT* produces heuristically better solutions the more iterations it runs. However, a large number of iterations leads to a high computation time, since the complexity of the RRT* algorithm, if the cost function runs in constant time, is O(K log K), where K is the number of iterations [51]. As a stopping criterion, we assume that the algorithm ends when it is possible to reach the target side of the domain from a node closer than the distance used to select the near nodes in the tree optimization. If more than one maneuver is possible, the one with the lowest cost is chosen. Setting a minimum number of nodes in the tree ensures enough iterations to obtain sufficiently optimized paths. Since the computation time depends on the number of iterations, i.e., roughly on the number of newly created nodes, this parameter significantly affects the trade-off between computation time and optimality of the solution. On the other hand, to avoid infinite iteration in dead-end scenarios, a maximum number of nodes after which the algorithm stops is usually set. To diversify the comparison, we consider two values for the minimum number of nodes of the RRT*: 500 nodes (RRT*-500n) and 2000 nodes (RRT*-2000n). In both cases, the algorithm stops if the number of nodes reaches ten times the minimum.
The four algorithms, DP, GADP, and RRT* with the two different minimum numbers of tree nodes, are tested against one thousand randomly generated scenarios, three of which are shown in Figure 9 as examples. The comparison includes the cost function defined in Equation (24), the computation time, and the percentage of failed scenarios. Moreover, for each scenario, further algorithm evaluation metrics proposed in the literature have been included [15]. In summary, we consider the following metrics: the control energy (i.e., the cost function), the computation time, the smoothness of the trajectory, the minimum CPA, the path length, and the percentage of failed scenarios.
Each of the considered metrics is analyzed both in absolute value and in normalized form. Given a generic metric measuring the performance of an algorithm over a testing scenario, its normalized counterpart is computed, for each scenario, with respect to the values attained by all the compared algorithms on that same scenario. While absolute metrics allow us to appreciate the absolute values of certain quantities of interest, which, in some cases, are significant in themselves (e.g., computation time), normalized metrics allow us to understand how often one algorithm is better than another on a per-scenario basis. We present the results in graphical format using violin plots in Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14.
Figure 10 shows the distribution of control energy, i.e., the cost function. DP and GADP have lower values than RRT*. We also note that the optimality of the RRT* result increases if we impose a higher minimum number of nodes, as expected. The normalized values highlight that DP and GADP score the best results in most cases. In particular, the average value obtained by DP is the lowest of the four, followed by that of the GADP.
Figure 11 shows the other side of the trade-off, which is computation time. In this case, we can see that GADP and RRT* with 500 nodes score the best performance, the former showing less variance than the latter. RRT* with 2000 nodes averages an order of magnitude above the others in absolute terms and is almost always the slowest on a per-scenario basis. Cross-comparison with Figure 10 allows us to appreciate the good trade-off of the proposed algorithms in the analyzed scenarios.
Figure 12 shows the smoothness of the trajectories. Since DP and GADP provide a fixed number of waypoints, evaluating control energy and smoothness is roughly equivalent. This is not the case for RRT*, which provides solutions with varying numbers of waypoints. In comparison with Figure 10, we can observe that this difference favors the DP-based algorithms.
Figure 13 presents the minimum CPAs obtained by the four algorithms. We can observe that they all meet the required safety distance constraint of 1 nmi with comparable performance. Such behavior is reasonable since the distance to obstacles does not intervene in ranking the solutions but only as a constraint.
Figure 14 shows the results in terms of path length. Considering the absolute values, we might note that the four algorithms provide similar results; however, the normalized values highlight that the DP-based algorithms generally provide shorter paths than the RRT*-based algorithms on a per-scenario basis. Finally, Figure 15 shows the percentage of failed scenarios, i.e., scenarios in which the algorithm fails to find a solution. We can again see how the greedy approximation has a greater tendency to fail in scenarios where a solution exists, since DP can compute it. We note that the RRT* algorithms have a lower failure rate due to the greater flexibility that a non-grid-based algorithm allows.
9. Conclusions
In this paper, we proposed a dynamic programming approach for the collision avoidance of autonomous ships. We showed how the collision-free route calculation for a ship can be formulated as a multi-stage optimal decision problem by leveraging Bellman’s equation, and we described appropriate constraints for obtaining a collision-safe route with features compatible with the maneuvering capabilities of a ship and compliant with collision regulations.
We proposed and discussed a tabulation-based solution scheme to compute the solution. Moreover, we proposed a greedy approximate dynamic programming (GADP) scheme that allows, using a greedy approach, the number of transitions to be evaluated for each step to be reduced, consequently reducing the algorithm’s time complexity.
We evaluated the performance of the proposed algorithms. Firstly, we tested the algorithms in some navigation scenarios. Secondly, we tested the proposed algorithms against randomly generated scenarios to evaluate the influence of grid parameters on solution optimality, computation time, and failure rate, highlighting the trade-off between DP and GADP. Finally, we compared the proposed algorithms with a known planning algorithm, the RRT*, using performance metrics proposed in the literature. Dynamic programming confirmed its role as a powerful and flexible tool for solving practical problems efficiently with low computational burden.
This work has laid a solid theoretical foundation, paving the way for integrated applications. The results showed that the proposed algorithms can be used in real-time applications, such as automatic navigation or decision support systems, and can thus play a first-class role in the transition towards autonomous shipping. As a future development of this research, we foresee testing the developed algorithms in an autonomous navigation architecture, both in simulation and in model-scale experiments. Moreover, we envision human-in-the-loop decision support tests in the simulator as a step towards full-scale testing.