1. Introduction
A bread and butter operation on a road network is finding the shortest path from a source to a destination and a natural corollary to that is the determination of whether a POI is along the shortest path or can be reached by making a small detour. This problem is called an “in-path” query. While this problem is deceptively simple, it almost always involves computing the shortest path or at least some part of the shortest path, which makes this operation slow. Given a set of POIs, the in-path query is a determination of which of the POIs lies on the shortest path or can be reached by taking a bounded detour from the shortest path. The detour can be expressed either by a fixed value (e.g., 1 km) or a percentage fraction (e.g., say 5% additional detour from the shortest path).
Now, there are many applications that require answering millions of in-path queries on a road network. Consider the following real-life scenarios where the notion of in-path is of critical importance.
- 1.
In-path POIs A coupon company wants to place digital coupons on route planner apps. The coupons of participating businesses that are in-path to the user’s shortest paths are placed on the maps app. Since routing apps are quite popular, there may be millions of drivers requiring quick lookups of the coupons that are in-path to their shortest paths. Related use-cases include ride-sharing [
1,
2,
3,
4], location-based POI recommendation [
5,
6,
7,
8,
9,
10,
11], package delivery [
12], location-based advertisement [
13,
14,
15,
16], or scenic spots [
17,
18,
19,
20].
- 2.
Analysis or Simulation Queries A city wants to choose a COVID vaccination camp and wants to choose one among hundreds of possible locations. One of the considerations among others is that these camps should be in-path to many commuters from a target group whose home and work locations are known. The city takes the daily commute routes of their target group and chooses the one location for the camp that is in-path to the most number of commuters. Related use-cases here include evacuation or shelter planning [
21,
22,
23,
24,
25,
26], location planning for new businesses [
27], choosing new sites for playgrounds, etc.
Both these queries require computing millions of in-path queries per second and at high throughput. When a user is interacting with a routing app, the window of opportunity for displaying coupons or gas stations along the way is very short. Answering in-path queries requires finding the shortest path or at least part of it. Our challenge in this paper is doing so without having to actually compute the shortest path during runtime.
We introduce an in-path oracle that is able to answer millions of in-path queries per second. An oracle [
28,
29,
30,
31] is a data structure that is able to answer a query not by computing but by looking up a precomputed result quickly. In our case, the in-path oracle is able to make the determination for any arbitrary source and destination if a POI is along the shortest path or not. The key innovation here is that to make this determination, we do not have to compute the shortest path between the sources and the destinations. The construction of the oracle involves precomputation and storing all the pertinent information needed to answer the in-path queries. The precomputation is done in an offline process which aids in our stated goal of building a spatial platform that can support in-path queries at high throughput. To the best of our knowledge, the high-throughput computation of “in-path” queries is not yet explored on a real road network. Our work builds on a body of road network oracles like
distance oracles [
29], and
path oracles [
30] that have been pioneered by the authors and others over the last decade. This work adds to the arsenal of high throughput operations one can perform on road networks. We list out the following important contributions in the paper.
Throughput: The in-path oracle relies on precomputing using an offline process so that queries can be answered at a high throughput rate. Once precomputation is done, our in-path oracle can answer 1.5M in-path queries per second on a single desktop machine. We compared our approach to the
dual Dijkstra approach [
12] and show that our approach is many orders of magnitude faster since it can answer in-path queries without computing the shortest path.
Size: The next consideration is the size of the oracles. We show both theoretically and experimentally that the size of our oracle is
where
n is the number of nodes in the road network and
m is the number of POIs. In-path queries can be answered in
time where
F is the number of POIs in the result. The complexity bound is achieved by appealing to the similarity to WSP decomposition [
32,
33,
34,
35,
36] which provides us with the linear bound on the number of nodes on the road network. Note that the linear bound means that each POI requires just a few MB of storage and our approach can be scaled to a large number of POIs.
Computation Time: The precomputation takes about tens of minutes on a single machine for city-sized road networks. Note that, although not explored in the paper, since the computation is data parallel, there are straightforward techniques to speed up the precomputation by employing more machines.
Interoperability: The in-path oracle can be embedded, employed, and used in multiple ways. One of the methods described in this paper is embedding in a relational database system [
37] and querying using SQL. This means that the in-path oracle can be used in complex querying scenarios.
Related work is discussed in
Section 2. Next,
Section 3 provides the background and problem statement. A dual Dijkstra based baseline algorithm for finding in-path POIs is provided in
Section 4. We describe the in-path oracle in
Section 5, while experimental results are provided in
Section 6. Finally, concluding remarks are drawn in
Section 7.
3. Background
In this section, we first develop the preliminary concepts before describing the problem setup.
3.1. Road Network
In our formulation, the road network is modeled as a directed, and weighted graph, , such that V corresponds to nodes which are road intersections, while E corresponds to edges which are directed road segments connecting two nodes. Let n, and m be the number of nodes and edges respectively in the road network. Edge weights represent either the travel distance or the travel times between two adjacent nodes and that have an edge between them. The edge weight is strictly greater than zero. In other words, there are no negative or zero edge weights.
3.2. Shortest Distance
Given source s and destination t nodes, the shortest distance is obtained by summing up edges along the shortest path between s and t. Note that since denotes either distance or trip time, denotes the shortest path using an appropriate unit. Note that G is fully connected such that given any two nodes u and v, there is a path from u to v and vice-versa, although they may not have the same distance due to the directed nature of the road network.
3.3. Detour
Given source s and destination t nodes, let denote a simple path that is not necessarily the shortest. The detour of such a path is the difference in the network distance along compared to . Furthermore, it is fairly trivial to see that the detour of any path is greater or equal to zero.
3.4. Detour Bound
We are interested in finding detours that are bounded by a fraction such that the detours are bounded by . For instance, if , we allow up to 10% increase in the path distances to accommodate the detour, so, all paths between s and t of distance are acceptable.
3.5. POI
A point of interest (POI) is an arbitrary node on the road network through which we want to route the shortest paths. There are m such POIs in the road network and these can be dynamically added or deleted as the algorithm proceeds.
3.6. In-Path POI
A traveller starts from a source node s and is driving to the destination node t. A POI p is said to be in-path if there is a path that passes through p that is a detour bounded by .
Now that we have provided the necessary details to define the problem setup, we can provide the problem statement that we solve in this paper.
3.7. Problem Definition
We are given a road network G, set P of m POIs, and a detour bound . A driver travels from source s and destination t, we want to find the set of POIs in P that are “in-path” under the conditions specified.
Now that we have provided our problem statement we can also discuss the limitations of our approach. First, we are interested in retrieving all the POIs that satisfy the detour constraints. Our focus here is maximising the throughput where one can answer millions of in-path queries a second using a single machine. Our solution is not geared towards a driver that wants to visit multiple POIs yet stay within the detour bound. In our model, the expectation is that the user is presented with the POI choices and may choose one of the in-path POI to visit. Such examples include coffee shops, restaurants, gas stations, vaccination clinics, etc. The driver is unlikely to visit another POI of the same kind. Our work is in the context of the placement of relevant POIs on a map as an opportunistic service where speed is of the essence, so, the composition of complex trips that include visiting multiple POIs is not the focus of this work.
4. Dijkstra-Based In-Path Algorithm
Over the years, there have been many variants, modifications, extensions, and case-specific (negative weight edges, small and fixed integer weight edges) alternatives proposed to Dijkstra’s algorithm. Prominent among them are
[
60], Bellman–Ford algorithm [
61,
62], Moore’s algorithm [
63], Dial’s algorithm [
64], etc. that improved the original algorithm or applied it the algorithm to different use-cases. For instance, Moore’s algorithm uses a queue (instead of a priority queue) that can handle negative edge weights. In our problem formulation, we require a way of not only finding the shortest paths but also visiting the detour paths in an incremental fashion.
In this paper, we focus on developing the dual Dijkstra algorithm [
12], a Dijkstra based variant for finding POIs within the specified detour tolerance limit
. This approach serves as a baseline algorithm for our oracle-based approach as well as a useful mechanism to highlight a few properties that we will use later. Here we are given a source
s, a destination
t, detour limit
, and a set of POIs
P. The algorithm fetches the subset of POIs in
P that are in-path with respect to the given source and destination.
Algorithm 1 sets up the dual Dijkstra’s implementation for determining the in-path POIs for a given source
s and destination
t. Our algorithm makes use of a priority queue
Q of objects
O, which are made up of triples,
,
, and
, namely, the current node, road network distance, and the traversal label. To aid efficient storage representation, these handy functions are defined to access the fields from the object
O. Note that
can either be a backward or forward scan or a POI. The current state of the algorithm is stored in two data structures
T and
Visited.
T stores the forward and backward distances of a node from
s and
t. For instance,
and
provide the backward and forward distances of a node
v, respectively. Another data structure
Visited helps avoid revisiting already visited nodes and is also initially empty. Since we have two simultaneous traversals, this data structure needs to keep track of which traversal (i.e.,
and/or
) has visited a node. For
s and
t,
Visited is marked TRUE, i.e.,
Visited is initialised. We initialise the forward scans from
s and the backward scans from
t. Updation of the priority queue
Q takes place with the forward and backward traversals from
s and
t respectively. The incoming and outgoing nodes are added into
Q with their appropriate labels. Finally,
is the estimate of the shortest path distance computed so far, and
result stores the POIs found thus far.
Algorithm 1 Setup for determining in-path POIs |
- 1:
Object O⇐ triple of node v, network distance d and label l indicating forward or backward traversal - 2:
, and are helper functions to access elements of O - 3:
Q⇐ min-priority queue of O ordered by distance from s. Q is initially empty. - 4:
is a table of forward or backward distances of from s or t. - 5:
Visited( returns true or false if has been visited by forward or backward scan. - 6:
Visited() = True - 7:
for each outgoing edges do - 8:
Q.Insert() - 9:
end for - 10:
Visited() = True - 11:
for each incoming edges do - 12:
Q.Insert() - 13:
end for - 14:
⇐∞ ▹ Holds estimate of - 15:
⇐ ▹ Output of the algorithm
|
The primary approach is presented in Algorithm 2, which is based on the aforementioned setup and initialisation. Line 1 illustrates the termination condition of the algorithm. When the Q is empty, the algorithm stops. In other words, when the whole graph has been traversed or if the element at the front of the queue is at a distance more than , the algorithm stops. We provide correctness proof for the algorithm in Lemma 1.
We start by looking into object
O, which has been taken out of the
Q front. When a POI is found at the front of the queue, the POI is in-path and is added to the result. This is shown in lines 2–5. In lines 6–10 it checks whether the current extracted object
O from
Q is already marked, i.e., we look at
Visited to check if the node has been already traversed. If so, we ignore it and continue with the traversal, otherwise, the node is marked as visited. To ensure that every object’s label is correctly reflected in the
Visited data structure, we label every object whether the traversal is forward or backward. After that, in line 11, we check the corresponding entries of
T, i.e. whether an update requires the current object’s node
and the label
. If the distance has already been computed, we stop trying to improve it. This is because of the best-first nature of the backward and forward traversals on
G. At this point, we furthermore check whether both the labels (i.e., backward and forward) of node
are computed. When both the entries (i.e., backward and forward distance) entries are computed, then in line 14 we compute the current shortest path distance and compare it with the existing shortest path distance. In case, the current shortest path distance is less than the existing shortest path distance, we replace the existing shortest path distance with the current shortest path distance. Then in line 19, we add an object triple into
Q which stores the fully resolved node’s current shortest path distance and the POI node. When we say fully resolved, we mean that both the forward and backward traversals have reached the POI after which it is added back to
Q.
Algorithm 2 Determine if p is in-path to for a given |
- 1:
while O⇐do - 2:
if then - 3:
▹ POI is found - 4:
continue - 5:
end if - 6:
if Visited() == True then ▹ Indicates cycle - 7:
continue - 8:
else - 9:
Visited() = True - 10:
end if - 11:
if Not Exists then - 12:
Update with - 13:
if Both () and () Exists then - 14:
) + - 15:
if then ▹ Updated once. - 16:
▹ holds detour limit - 17:
end if - 18:
if then - 19:
Q.Insert() ▹ Add fully resolved POI to priority queue - 20:
end if - 21:
end if - 22:
end if - 23:
if then ▹ Forward scan - 24:
for outgoing nodes of s.t. !Visited(, ) do - 25:
F) - 26:
end for - 27:
else ▹ Backward Scan - 28:
for incoming nodes of s.t. !Visited(, ) do - 29:
B) - 30:
end for - 31:
end if - 32:
end while - 33:
return
|
Finally, lines 23–32 are based on the kind of the label, i.e., depending on , all the incoming nodes of are added into Q in case of backward traversal. All the outgoing nodes are added into Q in the case of forward traversal. The algorithm continues until either the queue is empty or a decision for an in-path POI is reached. At this point, if is empty, then it means that the algorithm failed to find an in-path POI (line 33).
Lemma 1. The above algorithm correctly retrieves all the POIs in P that are in-path to the shortest path from s to t.
Proof. Suppose is a POI that may be in-path with respect to the shortest path between s and t. Here we have to be cognizant of two cases and handle them appropriately.
Case 1: p lies on the way from s to t, so to reach p, no detour is required. In other words, p is part of the shortest path.
Case 2: p does not lie on the way from s to t, so to reach p, a detour is required. The complexity, in this case, is that the determination of whether p is within the detour limit can only be made after the shortest distance between s and t has been determined. Therefore, the algorithm should seamlessly work for an arbitrary value of .
For Case 1, the algorithm can only stop when the object at the front of the queue is . Note that , so we are guaranteed that all the nodes in the shortest path are visited by the traversal before the algorithm fetches the POIs corresponding to Case 2.
Now for Case 2, similarly, the algorithm cannot stop before all nodes that have a distance of have been visited by both the forward and backward scans. Note that this means all the POIs qualifying for the detour requirement are retrieved by the algorithm. □
5. In-Path Oracles
The objective of this paper is to design an in-path oracle to efficiently answer the problem defined in
Section 3 whether a given POI is in-path with respect to a given source and destination provided during query time. The goal of an in-path oracle is to answer this question quickly using precomputed results. In this paper, we define a new kind of oracle but the concept of oracles in itself is not new. For instance, a distance oracle [
29] provides an
-approximate shortest distance, while a path oracle [
30] provides both an approximate distance as well as the shortest path. In contrast, the in-path oracle proposed here retrieves the POIs that are in-path to the shortest path of the driver travelling from
s to a destination
t, subject to a detour limit
.
The key idea behind in-path oracles is that it exploits the
spatial coherence [
46] property in road networks which observes that spatially adjacent nodes in a road network share common characteristics. The shortest paths and distances between nodes and their spatial locations are coherent with one another which forms the key principle behind the path and distance oracles [
29,
30].
The in-path oracle expands on the spatial coherence property in road networks. We know from path coherence that a set of source nodes A and destination nodes B may share the same shortest paths if A and B are sufficiently far apart from one another but the nodes in A and B themselves are close to one another. Since they share common paths it means that we can now determine if a given POI is in-path with respect to this group of nodes as opposed to single pairs of shortest paths. This is a powerful idea since it allows us to define the “in-path” property on groups of nodes as opposed to pairs of nodes resulting in dramatic space savings and efficient access.
In the rest of this section, in
Section 5.1, we establish the sufficient conditions by which we can determine if a POI
p is in-path to a set of sources
A and destinations
B. Next, we provide the algorithm for computing the in-path oracle in
Section 5.2. We discuss how the oracle is stored and indexed in
Section 5.3. Finally, discuss how the oracle is queried using a few examples in
Section 5.4.
5.1. Identifying In-Path Property
In this section, we define the in-path property for a set of source nodes
A and a set of destination nodes
B. The sources
A and destinations
B are represented by a bounding box (later we restrict the box to be a quadtree [
65,
66,
67,
68,
69,
70,
71,
72] block) that contains all the nodes in
A (and
B). Let
s be a randomly chosen representative source node in
A while
t is a representative destination node in
B. Let
p be the POI to which we want to make a determination if all the shortest path from all the sources in
A to all the destinations in
B are in-path to
p. In other words, we want to determine if block-pair (
) is in-path to
p. Note that we make this determination for a single POI and then repeat the process for all the
m POIs.
Let
be the forward radius from
s denoting the farthest distance of a node in
A from
s. Similarly,
is the backward radius to
s denoting the farthest distance from a node in
A to
s. We can compute a similar radius for
t in
B. We define
and
denoting the forward and backward radius of
B, which is depicted in
Figure 1. For every pair of source nodes in
A to every destination node in
B, one can define the shortest path. Among these shortest paths, one can bound the shortest and longest shortest paths. These are captured by the following lemmas.
Lemma 2. The length of any shortest path between A and B is greater than or equal to .
Proof. This can be trivially shown by counter example. If the path is smaller than that quantity, then one can go from s to t using a shorter distance than . Let us choose an arbitrary and to be an arbitrary source and destination and, furthermore, the network distance between them is smaller than the unit derived above. Then one can take a shortest path . Note that is bounded by and is bounded by . Hence, has to be greater than or equal to else is not the shortest distance between , which would be a contradiction. □
Lemma 3. The length of any shortest path between A and B is at most .
Proof. Let us choose an arbitrary and be an arbitrary source and destination and, furthermore, one can define a path as follows: . This path is bounded by . Hence, the lemma. □
Lemma 4. is in-path to p if the following condition is satisfied.
Proof. For any given node
in
, respectively, the network distance between
is at least
from Lemma 2. We can estimate a path that passed
. If
p is in-path to
, then any path passing via
p should be at most
. For the shortest possible path in
to be in-path with
p, we have the following inequality:
If the above inequality is satisfied for a particular pair of , then any shortest path between A and B is in-path to p. □
Lemma 5. is not in-path to p if the following condition is satisfied.
Proof. For any given node
in
, respectively, the network distance between
is at most
from Lemma 3. A path that passes through
p is
is at least
if
are part of the shortest path between
and
, respectively. If the shortest path between
is not in-path to
p, then we have the following inequality:
If the above inequality is satisfied for a particular pair of , then any shortest path between A and B is not an in-path to p. □
Lemma 6. If both conditions in Lemmas 4 and 5 are not satisfied, then some sub-division of is in-path to p.
Proof. Subdividing A and B reduces the radius of the blocks and could satisfy the conditions in Lemmas 4 and 5. In other words, if we cannot make a determination whether is in-path to p, we need to keep subdividing it until children blocks are either empty, in-path, or not in-path. □
Figure 2 shows an example of both in-path and not in-path cases in practice. Here the POI is shown using a green circle. The set of sources and destinations are shown using blue colour. The common shortest path between a randomly chosen representative source and destination, and the resulting detour path are also shown. Now that the lemmas establishing the properties have been established, an algorithm for computing the in-path oracle for a road network is described next.
5.2. Computing In-Path Oracle
Algorithm 3 provides the mechanics for computing the in-path oracle. The input to the algorithm is a road network consisting of nodes and edges. There is a quadtree on the spatial positions of the nodes such that
R denotes the root block. Queue
Q holds the quadtree block pairs. Note that
Q is initialised with the block-pair
. The algorithm continues till
Q is empty (line 4). At every iteration of the algorithm, a block pair
is retrieved from the front of the queue. We choose a source
s and destination
t at random from
A and
B, respectively. We compute the shortest distance between
s and
t, i.e.,
, and the distance of the shortest path that passes through
p, i.e.,
and
. We also compute the forward and backward radiuses of the blocks, i.e.,
,
,
,
.
Algorithm 3 Determine in-path oracle for a given POI p |
- 1:
quadtree root block on the nodes of the road network. - 2:
is the list of block-pairs forming the output. - 3:
Q ← is a Queue of block pairs. - 4:
while
do - 5:
← - 6:
Choose at random from , respectively. - 7:
Compute , , , , , , - 8:
if satisfies Lemma 4 then - 9:
Add to - 10:
else if satisfies Lemma 5 then - 11:
continue ▹ A, B is known to be not in-path - 12:
end if - 13:
Subdivide A and B into 4 children blocks. Discard empty children blocks if any. - 14:
Insert all children block-pairs into Q - 15:
end while - 16:
return
|
Armed with the necessary components, one can use Lemma 4 if denotes an in-path block-pair. If so, we add it to the result set. Similarly, Lemma 5 informs if the block-pair cannot be in-path in which case it is discarded. Finally, if both the conditions are not satisfied, we will simply break A, B into their 4 children blocks to form the resulting children blocks. These are inserted into Q. It is fairly easy to see that the children blocks might satisfy one of the two lemmas after a few decomposition steps. If one continues running the algorithm at the end we will get pairs of blocks such that they are all in-path with respect to p.
Referring back to the example in
Figure 2a,b, one can see a block pair A, B denoting both the in-path and not in-path cases. Here we can see that
Figure 2a corresponds to the case in line 9 where all the paths between A, B pass through the POI under consideration. The POI is shown by the green dot in the figures.
Figure 2b corresponds to the case where we can determine that the
p is definitely not in-path and the pair A, B can simply be discarded. This covers in line 11 in the construction algorithm. Finally, if some of the paths between A and B pass through
p but some of them do not, we have to further subdivide. This is the case in line 13 where we have to further subdivide A, B into children block pairs and insert them into the priority queue.
Now, we have established the principles of the in-path oracle decomposition, for a single POI, we can estimate the size of the resulting decomposition in Lemma 7.
Lemma 7. The size of the in-path oracle for p is since it is a Well-Separated Pair (WSP) decomposition of the road network.
Proof. The size of the in-path oracle uses the same arguments as in [
29]. Since there is a bit of background information that connects road network oracles (e.g., distance, path, and now in-path) to WSP, the interested reader is referred to [
29,
30,
45,
46,
73] for the necessary background information. Here we will only provide a rough sketch of how the bounds are established. □
The first intuition is that the in-path oracle decomposition in Algorithm 3 is a WSP decomposition of the nodes of the road network. This can be seen from Lemmas 4 and 5 that the block-pair under consideration is in-path or not depending on the relative magnitude of the radius of the blocks to the distance separating the blocks. For instance, if the blocks are sufficiently “far” and their radiuses are small, the block may (or may not) satisfy the in-path condition. It is important to note that the WSP decomposition has an identical condition of determining if a block-pair is sufficiently far. Once the similarity has been established, we know that the size of the in-path oracle decomposition is
, where
s is the separation factor [
33] and
d is the dimensionality of the embedding space which is 2 for the case of road networks. In our case,
s is dependent on
as we discuss next.
Now, we can observe that, as increases (i.e., relaxed), more blocks easily satisfy the in-path oracle condition. Similarly, as decreases (i.e., made more stringent), more block-pairs can be marked off as not being in-path. So, there is a sweet-spot for that controls the size of the in-path oracle; setting it high or low results in a smaller in-path oracle. Since we are only interested in the worst-case size behaviour here, we can note that the size of the in-path oracle is inversely proportional to the . This can be intuitively observed from the fact that a smaller requires that the radiuses are much smaller than the distance between the blocks. In other words, the separation factor s is proportional to where the factor 2 denotes the dimensionality of the embedding space.
5.3. Storing the In-Path Oracle
From Lemma 7 we know that a single POI results in an in-path oracle of size
. For each entry, we store a block-pair
corresponding to the quadtree blocks as a four-dimensional Morton code [
72,
74,
75,
76]. In this scheme, a pair of Morton codes of the same level can be bit-interleaved to form a 64-bit number with the level of the block stored in the most-significant bit position of the code. Here 64 bits is sufficient to represent a fairly small area of less than 100 meters on the surface of the road network. This variation is also
linear quadtrees [
72] and forms the basis of storing multidimensional quadtrees on a database system. In the following we provide the size and access time results for our in-path oracle.
Lemma 8. The in-path oracle for m POIs takes up space and has an access latency of , where F is the number of POIs in the result.
Proof. Given
m POIs, we compute the in-path oracle for each POI. We store the identity of the POI along with each block-pair. The table containing these blocks is encoded as a four-dimensional Morton code, sorted and indexed by a B-tree [
73,
77,
78,
79,
80,
81]. The size of this representation is
. Given a source
s and destination
t, we can first compute its four-dimensional Morton code, i.e.,
in
. Next, we can use that key to look for matching blocks using the B-tree index. The time complexity of this operation is
, where
F is the number of POIs that match the results. This leads us to the storage and access complexity of the in-path oracles. □
5.4. Querying the In-Path Oracle
To illustrate how one can use the in-path oracle effectively, we consider a relational database [
37,
73,
82] containing a few sample tables using which we can create a few scenarios. The oracle is stored as a table
ORACLE (, ). Here
represents the block-pair that is known to be in-path with respect to
. Here the attribute
is a foreign key to another table
POI (, , , ...) containing additional dimensional information about the POIs. Finally, there is a very large table of
ACTIVE_TRIPS(, ) where the starting and ending locations are encoded
as 4-dimensional Morton codes. Note that the
ORACLE table is indexed by (
,
) which means that one can binary search for a particular
pair in
time, where
n is the number of nodes and
m is the number of POIs. Once we have the following setup, we can apply some interesting queries to the in-path oracles.
Find all POIs that are in-path to a given source and destination : Here we are given a source s and a destination t and we want to find all the POIs that are in-path to the given source and destination. In this case, the query is a lookup on the ORACLE table using the primary index to fetch the in-path and joining it with the POI table to provide the dimensional fields of the POIs. This query is expected to be very quick since it is a simple lookup using a primary index. One can do millions of such queries per second using a single machine hosting the database which means that one can trivially scale up this setup for use in the critical path of a popular navigation app to add an in-path layer during navigation.
SELECT , Type, Name
FROM POI JOIN ORACLE using ()
WHERE = ;
In this setup, one can add new POIs by adding a row to the POI table and corresponding entries to the ORACLE table. Similarly, if one had to delete a POI, one just needs to delete the entry from the POI table and the delete operation would cascade to the ORACLE table as well.
Find the POIs that are in-path to most active trips: Here we use the ACTIVE_TRIPS table since we want to find the most common set of POIs to a potentially large table of trips. Such a query denotes an analysis OLAP style of the query. Here we join the ACTIVE_TRIPS table to obtain the list of POIs and then join it with the POI table to obtain the list of POIs and only retain those POIs that are of type “VACCINATION-CLINIC.”
SELECT , count(*) as num_in_path
FROM POI JOIN ORACLE using ()
WHERE IN
(SELECT FROM ACTIVE_TRIPS)
AND POI.Type = “VACCINATION-CLINIC”
GROUP BY
ORDER BY 2 DESC;
As one can see, a fairly complex query can be reduced to a few lines of SQL using the in-path oracle which speaks to the real power of this abstraction.
6. Experimental Results
The experiments were conducted on a GPU consisting of an AMD Ryzen Threadripper 3960 CPU, which has multi-processing power. It has 48 threads and 24 cores, having 2.2 GHz clock speed, 128 GB RAM, 4 TB HDD, and 512 KB cache memory. The implementation of proposed algorithms is done using Python programming language, version 3.10, running in 64-bit Ubuntu Linux operating system, version 20.04.3, Kernel version 5.4.0-125.
Dataset: The road network dataset was obtained from the 9th DIMACS Implementation Challenge website [
83]. We used two datasets in our evaluation, Washington, D.C. road network with 9559 nodes and 12,304 edges and San Francisco dataset which included 321,270 nodes and 800,172 edges. Each directed edge’s weight represents the road network travel distances between two nodes. The spatial position of each node was also available which was normalised into a [0, 1) grid box.
Comparison Techniques: Our primary evaluation consists of dual Dijkstra baseline algorithm and comparing it against the proposed in-path oracle. In particular, we choose a source and destination node at random and ask both dual Dijkstra algorithm and in-path oracle to determine which of the POIs are in-path to the shortest path. The parameters that we change to examine how the algorithms perform under diverse settings are listed in
Table 1. Each data point was repeated 100 times by randomly chosen sources and destinations for each time and then averaged across all execution runs. The POIs used in the experiments were randomly chosen by uniformly sampling the nodes in the road network. The sampling rate of the POI refers to the rate of choosing POIs among the nodes of the road network. For instance,
means that we roughly choose 1 node from a set of 1000 nodes. Recall that
refers to the detour tolerance limit which is varied as a fraction over the shortest distance between the source and destination. We vary the epsilon value from 0.1 (i.e., 10%) to 5 (500%).
Methodology: We collected the following metrics for each experiment: (1) the elapsed time spent waiting for the program to deliver its outputs (measured in seconds), and (2) the throughput of the algorithm using a single machine. For the dual Dijkstra approach, we additionally report the (3) largest size of the priority queue and (4) the number of POIs added to the priority queue as a way of quantifying how much of the road network had to be traversed by the algorithm. Note that among these experiments, the throughput approach provides the best indicator of the relative performance of the experiments. At the end of the day when these algorithms are deployed in production, the key question is how much machine resources are needed to support the incoming workload. Here, as one can see later, the in-path oracle is a much superior solution compared to the dual Dijkstra baseline approach.
6.1. Baseline Approach
In this section, we examine the performance of a general in-path query variant where the driver is travelling between a source and a destination. The experimental results are on the dual Dijkstra algorithm that we developed earlier, which serves as a credible variant for the in-path oracle approach.
6.1.1. Varying Detour Limit
The plot shown in
Figure 3a demonstrates that the running time increases with the detour tolerance limit. It can be seen from the graphs that, as we increase the detour tolerance limit, the running time also increases. Similarly, the size of the priority queue increases when the detour tolerance limit increases, shown in
Figure 3b. Finally,
Figure 3c shows that the number of POIs added to the priority queue increases when the detour tolerance limit increases. Note that this result again underscores the observation from earlier that the algorithm is sensitive to detour tolerance limit. The larger the detour tolerance limit we allow, the larger is the search space and consequently slower is the query response time.
6.1.2. Varying POIs Sampling Rates
We vary the POIs sampling rate from 0.0001 to 0.1 while keeping the detour tolerance limit
is fixed at 0.01. As mentioned earlier, we conducted each experiment 100 times, averaging the results. We track latency, the maximum size of the priority queue, and the number of POIs added to the priority queue during the course of the search process. The plot shown in
Figure 4a shows that the running time increases with the POIs sampling rate. It can be seen from the graphs that as we increase the POIs sampling rate, running time also increases. Similarly, the priority queue size increases when the POIs sampling rate increases shown in
Figure 4b. Finally,
Figure 4c shows that the number of POIs added to the priority queue increases when the POIs sampling rate increases.
The key insight here is that as the detour tolerance limit increases, the search space of the algorithm increases and consequently the query latency. Although the sensitivity of the algorithm to the POI sampling rate does not seem very high. This is expected since the query is expected to return all the POIs found, so, there is no opportunity to stop the search process when a few POIs have been found.
6.2. In-Path Oracle
In this section, we examine the performance of oracle size with varying parameter values like detour limits and road network size for a fixed POI.
6.2.1. Varying Road Network Size
The road network used in the experiment was the San Francisco Bay area dataset, consisting of 321,270 road nodes and 800,172 road edges. For different sizes (number of nodes) of the road network, we recorded the size of the oracle. We considered the size of the road network between 1000 and 100,000 and kept the detour tolerance limit fixed at 0.25.
Figure 5 illustrates the different sizes of oracles with varying numbers of nodes. One can see that the size of the oracle is linear in the number of nodes which confirms the earlier linear complexity result. This means that, for a road network of 100,000 nodes, the size of the in-path oracle is about 8 MB which is quite modest by today’s computing standards. Note that computing the oracle took less than 30 min for all these cases using a single machine running a single processor thread. One can substantially decrease the query execution by using multiple threads or employing multiple machines. Since the algorithm is data parallel, this denotes a fairly trivial extension to our work.
6.2.2. Varying Detour Limit
For this experiment, we use the Washington D.C. road network dataset and vary the detour limits from 0.05 to 5.
Figure 6 illustrates the size of the in-path oracle and one can see that the size of the oracle is bell-shaped. In particular, when
is greater than 2 the size of the oracle sharply decreases since at that point many larger block-pairs would start becoming in-path thus there is no need to subdivide these blocks deeper. For instance, at
, most of the road network would be in-path, so, the size of the oracle would be quite small. In the case of the baseline approach, the latency increases with
but in the case of in-path oracles the size of the oracle decreases with
which is a desirable property.
6.3. In-Path Query Throughput Experiment
In this set of experiments, the throughput of the in-path queries is measured by the varying sampling rates and compared against baseline dual Dijkstra’s algorithm which was provided in
Section 4. The experiment used 100,000 nodes belonging to the San Francisco dataset. Experiments were done using a single machine with 20 parallel threads corresponding to the number of cores in the machines.
The POIs were randomly chosen and the corresponding in-path oracles were computed. They were inserted into a disk-resident B-tree. The experiments randomly generated source and destination node pairs and looked up the disk-based in-path oracle to find which of the POIs (if any) were in-path to the chosen node pairs. Internally, these queries looked up a disk-resident B-tree with a fixed 3 GB cache. The results shown here correspond to the warm-cache case where the experiments have been running for some period of time. At this point, the in-memory buffers have cached pages of the B-tree in memory. Running the experiment on a cold cache results in high variation in the throughput results, but this case is not interesting from a production standpoint. In our use-case, it was envisioned that the in-path oracle service would be running continuously, and hence the cache is seldom cold.
The number of POIs is chosen similarly to the previous experiments and measured in terms of the sampling rate. The average number of blocks per in-path oracles was about 1 M, which is roughly about 8 MB in memory. Note that the disk size of these oracles are considerably smaller since B-trees encode and compress the data and the resulting representation is an order of magnitude smaller on disk.
We can observe many things from this experimental setup. First of all, the throughput from the dual Dijkstra approach was about 100–200 in-path queries/second for most of the POI sampling rates. In this case, we ran 20 parallel versions of the setup in
Figure 3. In this case, there is no opportunity for work-sharing among the threads, so, we just run them in separate threads that do not share anything other than the graph data structure. In the case of the in-path oracle experiments, they share the cache and the underlying B-trees. Note that the dual Dijkstra baseline algorithm searches the graph representation at runtime while in-path oracle looks up the precomputed representation, so, is expected to be much faster.
In particular,
Figure 7 shows the throughput results as a function of the POI sampling rate. One can see that we get more than one million in-path queries per second. In other words, we can answer queries four orders of magnitude faster than the existing baseline algorithm.
This is a very powerful result that shows that we can use just a few machines to answer in-path queries for very large map services. This means that our in-path oracle technique can also be used in extensive simulations where in-path considerations may be important. This exceptional improvement that we report in this paper is due to the sensible use of precomputation in solving fundamental and basic operations in road networks.