1. Introduction
Searching strategies for finding targets using appropriate sensing modalities are of great importance in many aspects of life, from national security [1,2] and rescue and recovery missions [3,4] to biological applications [5,6,7]. A taxonomy of search problems is proposed in [8]. We focus on probabilistic search, where the objective is to localise the target in the shortest time (on average) for a given search volume.
The earliest theoretical studies of search strategies [9,10] were based on systematic search along a predetermined (deterministic) path, such as the parallel sweep or the Archimedean spiral [3,11]. The search patterns of animals, by contrast, are random rather than deterministic. An explanation for this phenomenon is that an event, such as a detection (false or true), changes the strategy and hence the behaviour of the searcher. Subsequent changes of strategy manifest themselves as a random-like motion pattern. Most current research into search strategies is directed towards the mathematical modelling and explanation of random search patterns [2,12,13].
By studying the GPS data of albatrosses, it was discovered that the search patterns of these birds consist of segments whose lengths are random draws from the Pareto–Lévy distribution [6]. This discovery led to several papers demonstrating that so-called Lévy walks/flights are the optimal search strategy for foraging animals (deer, bees, etc.), resulting in fractal geometries of search paths. An alternative to Lévy strategies is intermittent search: a combination of a fast, non-receptive displacement phase (long jumps within the search domain, with no sensing) and a slow search phase characterised by sensing and reaction [14]. Bénichou et al. provided both a theoretical study and experimental verification of intermittent search [13,15]. In their terminology, the fast relocation phase is referred to as the ballistic flight, with constant velocity and random direction. The slow sensing/detection phase is modelled as either a motionless wait or a diffusive displacement. Bénichou et al. studied intermittent search without taking into account the information gathered by sensing during the search, so the searcher could revisit the same location multiple times; this leads to apparent redundancy in the search process. To overcome this shortcoming, Vergassola et al. [16] proposed an information-driven search strategy (referred to as infotaxis), which selects the motion option that maximises the expected rate of information gain. Information-driven search by infotaxis has made a profound impact on the research community (for a recent review, see [2]). Vergassola et al. considered information-driven search only in the slow sensing/detection phase and for a single searching agent. Multi-agent infotaxis has subsequently been proposed in [17,18,19,20,21].
In this paper, we propose a fully decentralised intermittent information-driven search by a coordinated team of autonomous agents. Each searching agent is equipped with a sensor characterised by unreliable detection, in the sense that the probability of correct detection depends on the distance to the target, while the probability of false detection is non-zero. Displacement decisions for each agent (i.e., where to move next), both in the ballistic phase and in the diffusive displacement phase, are based on maximisation of the expected information gain. Each searching agent performs the computations (estimation and motion control) locally and independently of other agents. Group coordination, for the sake of achieving the common mission, is carried out via consensus [22], by exchanging data only with neighbours, in a manner which does not require global knowledge of the communication network topology. The proposed approach is therefore scalable, in the sense that the complexities of sensing, communication and computing per agent are independent of the network and agent formation size. In addition, because all sensor platforms are treated equally (no leader–follower hierarchy), the approach is robust to the failure of any of the searching agents. The only requirement for avoiding the break-up of the multi-agent formation is that the graph of its communication network remains connected at all times during the search.
2. Problem Formulation
For convenience, and without loss of generality, let us consider a search area in the shape of a square with sides of length $b$. The area is discretised into an upright square lattice of unit mesh size, thus consisting of $N = b^2$ cells. The grid is specified as $\mathcal{G} = \{\mathbf{c}_1, \dots, \mathbf{c}_N\}$, where $\mathbf{c}_n = (x_n, y_n)$ are the Cartesian coordinates of the centre of the $n$th cell.
The team of searching agents consists of $S$ members. Let the searching agent $s \in \{1, \dots, S\}$ at discrete-time $k$ be located in the cell $\boldsymbol{\eta}_k^s \in \mathcal{G}$. If the agent is in the sensing mode, it collects at time $k$ a set of detections $\mathbf{Z}_k^s$. Each detection can originate from the true target or be a false alarm. A vector $\mathbf{z} \in \mathbf{Z}_k^s$ consists of a range and azimuth measurement of the perceived target, relative to agent position $\boldsymbol{\eta}_k^s$. Thus, if the true target is in the grid cell $n$, its corresponding measurement is a Gaussian random vector
$$\mathbf{z} \sim \mathcal{N}\!\left( \begin{bmatrix} \rho_n^s \\ \theta_n^s \end{bmatrix}, \mathbf{R} \right), \qquad (1)$$
where $\rho_n^s = \lVert \mathbf{c}_n - \boldsymbol{\eta}_k^s \rVert$ is the Euclidean distance between the cell centre and the agent, $\theta_n^s$ is the angle between $\mathbf{c}_n - \boldsymbol{\eta}_k^s$ and the $y$-axis of the reference coordinate system, and $\mathbf{R}$ is the measurement covariance matrix. The probability of detection $p_d$ is a function of the distance between the agent and the target. The probability of false alarm $p_{fa}$ is constant within the sensing area around the agent (and zero otherwise). For example, the probability of detection can be adopted as $p_d(r) = e^{-r^2/a^2}$, where $r$ is the distance between the target and the agent and $a > 0$ is a sensor characteristic. Assuming sensor coverage down to the level $p_d = e^{-1}$, the sensing area can be defined as the circular area around the sensor position with radius $r_0 = a$ (in this area, $p_d(r) \geq e^{-1}$).
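To make the sensing model concrete, the following Python sketch simulates one snapshot of detections for a single agent under the exponential detection profile adopted above. The function name, the noise covariance and the way false alarms are scattered over the sensing disc are illustrative assumptions, not part of the formulation itself.

```python
import numpy as np

rng = np.random.default_rng(1)

def sense(agent_xy, target_xy, a=5.0, p_fa=1e-3, R=np.diag([0.5, 0.01])):
    """Simulate one snapshot of (range, azimuth) detections for one agent."""
    detections = []
    dx = target_xy[0] - agent_xy[0]
    dy = target_xy[1] - agent_xy[1]
    r = np.hypot(dx, dy)
    if rng.random() < np.exp(-(r / a) ** 2):       # target detected with p_d(r)
        theta = np.arctan2(dx, dy)                 # azimuth measured from the y-axis
        detections.append(rng.multivariate_normal([r, theta], R))
    n_cells = int(np.pi * a ** 2)                  # unit cells inside radius r0 = a
    for _ in range(rng.binomial(n_cells, p_fa)):   # false alarms: p_fa per cell
        fa_range = a * np.sqrt(rng.random())       # uniform over the sensing disc
        fa_azimuth = rng.uniform(-np.pi, np.pi)
        detections.append(np.array([fa_range, fa_azimuth]))
    return detections
```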
Searching agents move in formation. Each agent knows its relative coordinates within the formation (for example, the offset from the centroid); however, it does not need to know the topology of the formation or its size. Communication between two agents in the formation can be established only if their mutual distance is smaller than some range $d_{\text{com}}$. Motion is subject to small perturbation errors, meaning that an agent whose destination during a displacement is a certain cell may end up in a cell adjacent to it. These motion errors cause the network topology to vary with time. For simplicity, we assume that communication links, when established, are error-free.
Figure 1 shows a formation of 13 searching agents, with two different communication graphs corresponding to two values of $d_{\text{com}}$. Green lines indicate the existence of a communication link between two nodes (agents) in the graph.
The problem for the team of agents is to coordinate the search and find the target in the shortest possible time, using the information-driven intermittent search strategy. The described problem arises, for example, in the search for a submarine using a swarm of amphibious drones, each equipped with an active sonar system and a wireless communication device. When a drone floats on the sea surface, it turns on its sonar to collect detections of underwater targets.
3. Decentralised Estimation: The Probability of Occupancy Map
Let us denote the complete set of measurement data from all $S$ agents at time $k$ as $\mathcal{Z}_k = \{(\mathbf{Z}_k^s, \boldsymbol{\eta}_k^s)\}_{s=1}^{S}$, where $\mathbf{Z}_k^s$ and $\boldsymbol{\eta}_k^s$ represent the measurement set and the location of the $s$th agent, respectively, at time $k$.
The current knowledge about the target location within the search area is represented by the probability of occupancy map (POM). This is a map in which each pixel corresponds to a cell of the grid and represents the posterior probability of target presence in that cell. For cell $n$ and agent $s$, this posterior probability at time $k$ is expressed as $P_k^s(n) = \Pr\{\chi_n = 1 \mid \mathcal{Z}_{1:k}\}$, where $n = 1, \dots, N$ and $\chi_n$ is a Bernoulli random variable representing the event that the target is located in cell $n$. The POM of agent $s$ is then the collection $\{P_k^s(n)\}_{n=1}^{N}$.
The POM is updated sequentially using the Bayes rule. Given the POM at the previous time, that is $P_{k-1}^s(n)$, and the measurement data at time $k$ from agent $t$, that is $(\mathbf{Z}_k^t, \boldsymbol{\eta}_k^t)$, the probability of occupancy in the $n$th cell is updated as [23]
$$P_k^s(n) = \frac{\left[1 - p_d^t(n)\right] P_{k-1}^s(n)}{\left[1 - p_d^t(n)\right] P_{k-1}^s(n) + \left[1 - p_{fa}^t(n)\right]\left[1 - P_{k-1}^s(n)\right]} \qquad (2)$$
if none of the detections in $\mathbf{Z}_k^t$ falls into the $n$th cell. The term $p_d^t(n)$ in Equation (2) represents the probability of detection in the $n$th cell given that the $t$th agent is located at $\boldsymbol{\eta}_k^t$. A similar explanation applies to the probability of false alarm $p_{fa}^t(n)$. If a detection from $\mathbf{Z}_k^t$ falls in the $n$th cell, the update equation for the POM of agent $s$ is:
$$P_k^s(n) = \frac{p_d^t(n)\, P_{k-1}^s(n)}{p_d^t(n)\, P_{k-1}^s(n) + p_{fa}^t(n)\left[1 - P_{k-1}^s(n)\right]}. \qquad (3)$$
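The updates (2) and (3) can be applied to the whole map at once. The following Python sketch does this for a single agent's snapshot, assuming the exponential detection profile of Section 2; the function and argument names are ours, and `detected_cells` holds the grid cells into which detections fell.

```python
import numpy as np

def update_pom(pom, agent_pos, detected_cells, a=5.0, p_fa=1e-3):
    """One Bayes update of the POM, Equations (2) and (3).

    pom:            (b, b) array of prior probabilities P_{k-1}(n)
    agent_pos:      (x, y) grid position of the sensing agent
    detected_cells: iterable of (x, y) cells containing a detection
    """
    b = pom.shape[0]
    xs, ys = np.meshgrid(np.arange(b), np.arange(b), indexing="ij")
    r = np.hypot(xs - agent_pos[0], ys - agent_pos[1])
    pd = np.exp(-(r / a) ** 2)                 # distance-dependent P_d
    pfa = np.where(r <= a, p_fa, 0.0)          # false alarms only inside r0 = a
    det = np.zeros_like(pom, dtype=bool)
    for (x, y) in detected_cells:
        det[x, y] = True
    like_occ = np.where(det, pd, 1.0 - pd)     # likelihood if cell occupied
    like_emp = np.where(det, pfa, 1.0 - pfa)   # likelihood if cell empty
    num = like_occ * pom
    den = num + like_emp * (1.0 - pom)
    return np.divide(num, den, out=pom.copy(), where=den > 0)
```

Starting from the uniform map and calling `update_pom` once per received data-pair reproduces the sequential behaviour described above.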
Initially, before any sensing information is received, that is at $k = 0$, the POM of each agent is set to $P_0^s(n) = 1/2$, for all $n$ and $s$. This POM corresponds to the state of complete ignorance.
Each agent in the team updates its local POM sequentially, using its local measurements and the measurements it receives from other agents in the team (ideally, using the complete set of data $\mathcal{Z}_k$). We adopt the dissemination-based decentralised architecture [24] for this purpose, where the entire $\mathcal{Z}_k$ is exchanged via an iterative scheme over the communication network. In the first iteration, agent $s$ broadcasts its data-pair $(\mathbf{Z}_k^s, \boldsymbol{\eta}_k^s)$ to its neighbours and receives from them their respective data-pairs. In the second, third and all subsequent iterations, agent $s$ broadcasts its newly acquired data-pairs to its neighbours and accepts from them only the data-pairs that it has not seen before (i.e., newly acquired ones). Provided that the communication graph is connected, after a sufficient number of iterations (which depends on the topology and the size of the graph), the complete list of measurement data-pairs from all agents in the formation (i.e., $\mathcal{Z}_k$) will be available to each agent for updating its local POM.
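A minimal sketch of this flooding-style dissemination, assuming synchronous rounds and hashable data-pairs (the identifiers are ours):

```python
def disseminate(neighbours, local_data, num_iters):
    """Flooding-style dissemination over the communication graph.

    neighbours: dict mapping agent id -> list of neighbour ids
    local_data: dict mapping agent id -> its (Z, eta) data-pair
    num_iters:  number of exchange rounds
    Returns, per agent, the set of (origin, data-pair) items collected.
    """
    known = {s: {(s, local_data[s])} for s in neighbours}   # seen so far
    fresh = {s: {(s, local_data[s])} for s in neighbours}   # to broadcast next
    for _ in range(num_iters):
        inbox = {s: set() for s in neighbours}
        for s in neighbours:                 # broadcast fresh pairs to neighbours
            for t in neighbours[s]:
                inbox[t] |= fresh[s]
        for s in neighbours:                 # keep only pairs not seen before
            fresh[s] = inbox[s] - known[s]
            known[s] |= fresh[s]
    return known
```

A number of rounds equal to the diameter of the communication graph suffices for every data-pair to reach every agent.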
An illustration of the evolution of a POM and the effect of the sequential Bayes update is shown in Figure 2. A group of 13 agents, in the formation shown in Figure 1a, is placed in the lower left corner of a square search area whose side length is expressed in arbitrary units (a.u.). The target is far from all agents and cannot be detected. The sensors are characterised by the parameter $a$, while the distance between neighbouring agents is 6 a.u. The probability of false alarm within the detection (sensing) volume was set to a small constant value per cell. At $k = 0$ (Figure 2a), all pixels of the POM are set to $1/2$. At subsequent times (Figure 2b,c), the regions of the map in the vicinity of the agent locations become white, indicating a low probability of target occupancy there. Occasional false detections increase the probability of occupancy in the affected pixels; however, with time, they all tend to zero (see Figure 2c).
By staying longer in the same position, the white areas around the agent formation grow only up to a certain saturation level, determined by the probability of detection as a function of distance. The measurements received after reaching this saturation level become increasingly uninformative. For this reason, at some point in time, the formation should move to another location (as discussed in the next section).
The search is terminated when the probability of occupancy in one of the cells of the POM is higher than a given threshold, i.e., when $\max_n P_k^s(n) > P_{\text{th}}$, with $P_{\text{th}}$ close to 1. The cell which satisfies this condition is declared to contain the target.
4. Formation Motion Control
The search objective is driven by two conflicting demands: exploration and exploitation [25]. Exploration drives the agents to new locations in order to investigate as much of the search volume as possible. The exploitation demand urges the agents to stay longer in one place, because this helps determine with certainty whether a detection is false or true and improves the localisation accuracy. The balance between exploration and exploitation raises two questions: how long to stay in one position and where to move next. The intermittent search strategy [13,15] was proposed as a balance between exploration and exploitation: exploration corresponds to the ballistic flight phase, while exploitation is carried out in the diffusion phase.
In decentralised multi-agent search, each agent autonomously makes a decision about its next action. However, some form of coordination between the agents is essential, in order to collectively maintain the prescribed geometric shape of the formation and thus avoid its break-up. Section 4.1 discusses the individual decisions made by agents, while the team coordination is explained in Section 4.2.
4.1. Individual Decisions
During the diffusion phase, the formation is static and the agents repeatedly collect measurements. If the sensing interval (i.e., the time required to acquire a snapshot of measurements) is $\tau_s$, then the duration of the diffusion phase is $m^\ast \tau_s$, where $m^\ast$ is the number of sensing intervals, computed as follows. Recall that, after a single sensing interval, the probability that an agent detects a target within a circle centred at its location and with radius $r_0$ is at least $p_0 = p_d(r_0)$. After an arbitrary number of sensing intervals $m$, the probability that the agent does not detect the target within a circle of radius $r_0$ is then:
$$g(m) = (1 - p_0)^m. \qquad (4)$$
$g(m)$ is monotonically decreasing with $m$. The agent should stay in one location as long as $g(m) > \varepsilon$, where $\varepsilon$ is a user-defined (small) probability value. Let $m^\ast$ be the minimal number of snapshots which satisfies $g(m) \leq \varepsilon$. After $m^\ast$ snapshots, the agent is certain, with probability $1 - \varepsilon$, that the target is not within the radius $r_0$. Then, from Equation (4), we can write:
$$m^\ast = \left\lceil \frac{\ln \varepsilon}{\ln (1 - p_0)} \right\rceil, \qquad (5)$$
where $\lceil \cdot \rceil$ is the ceiling function (defined as the smallest integer greater than or equal to its argument).
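As a numerical illustration, with assumed values: for $p_0 = 0.5$ and $\varepsilon = 0.01$, Equation (5) gives $m^\ast = \lceil \ln 0.01 / \ln 0.5 \rceil = \lceil 6.64 \rceil = 7$ snapshots.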
After the diffusion phase, the agent wants to jump outside the explored area; that is, the length of its subsequent ballistic flight should be at least $2 r_0$ (the diameter of the sensed disc). Let the speed of the ballistic flight be $V$. The ballistic time $\tau_{\text{bal}}$, according to Bénichou et al. [13], is:
$$\tau_{\text{bal}} = C\,\frac{a}{V}, \qquad (6)$$
where $a$ was introduced earlier (the sensing parameter) and $C$ is a numerical factor dependent on the search area geometry [13]:
$$C = \left[\ln\!\left(\frac{b}{a}\right) - \frac{1}{2}\right]^{1/2}. \qquad (7)$$
Note that the value of $C$ slowly increases with the ratio $b/a$. Typically, $b/a$ is of the order of $10^2$, and then $C \approx 2$. The minimum length of a ballistic flight, from Equation (6) and using $L = V\,\tau_{\text{bal}}$, equals $L_{\min} = C\,a \approx 2a = 2r_0$. In summary, for given $a$, $b$ and $\varepsilon$, the count $m^\ast$ can be computed from Equation (5), with $p_0 = p_d(r_0) = e^{-1}$, as
$$m^\ast = \left\lceil \frac{\ln \varepsilon}{\ln\!\left(1 - e^{-1}\right)} \right\rceil. \qquad (8)$$
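The complete schedule (how long to sense, how far to fly) can then be computed as in the sketch below. The function name and the example values are illustrative assumptions; the expressions implement Equations (5)-(8).

```python
import math

def intermittent_schedule(a, b, V, eps):
    """Diffusion/ballistic schedule of the intermittent search.

    a:   sensor characteristic (sensing radius r0 = a)
    b:   side length of the square search area
    V:   speed of the ballistic flight
    eps: residual probability that a target within r0 stays undetected
    """
    p0 = math.exp(-1.0)                      # detection prob. at range r0 = a
    m_star = math.ceil(math.log(eps) / math.log(1.0 - p0))   # Equations (5)/(8)
    C = math.sqrt(math.log(b / a) - 0.5)     # geometry factor, Equation (7)
    tau_bal = C * a / V                      # ballistic time, Equation (6)
    L_min = V * tau_bal                      # minimum flight length = C * a
    return m_star, tau_bal, L_min

# Example with assumed values: a = 5, b = 500, V = 1, eps = 0.01
print(intermittent_schedule(5.0, 500.0, 1.0, 0.01))   # -> (11, 10.13..., 10.13...)
```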
The search starts with a diffusion phase. After collecting and exchanging all snapshots of measurements by the agents in the formation, each agent compares the highest value of its local POM with a threshold, set just above the initial value of $1/2$, in order to test if a target has been detected. If the comparison with the threshold is positive, further investigation is required; hence, the agent moves by one step on the grid towards the cell containing the suspected target position and repeats the diffusion phase. There are four options for this one-step move: left, right, up and down.
If the comparison is negative, the agent considers a ballistic flight, as follows. First, it creates a set of actions $\mathcal{A} = \{a_0, a_1, \dots, a_{N_a}\}$, which consists of potential destinations for a ballistic displacement ("move" actions), as well as the option to remain static (no move, action $a_0$). Let a "move" action $a_i$ represent the destination of the centroid of the formation (after the hypothetical ballistic flight). For each action $a \in \mathcal{A}$, a reward is computed.
The reward function is defined as the information gain rate, i.e.,
$$R(a) = \frac{H_k - \mathbb{E}\left[H_{k+1}(a)\right]}{T(a)}, \qquad (9)$$
where $H_k$ is the current entropy of the POM, defined as:
$$H_k = -\sum_{n=1}^{N}\left[P_k^s(n) \ln P_k^s(n) + \left(1 - P_k^s(n)\right) \ln\left(1 - P_k^s(n)\right)\right]. \qquad (10)$$
$H_{k+1}(a)$ is the entropy of the POM after the hypothetical move of the agent (and the entire formation) to a new destination (commanded by action $a$), followed by sensing and updating of its local POM during the subsequent diffusion phase. $\mathbb{E}$ is the expectation operator with respect to all possible realisations of the random measurements. To simplify computations, we assume that, during the diffusion (sensing) phase following an action $a$, sensing results in no detections (the most likely scenario). In this way, we can ignore $\mathbb{E}$ in Equation (9).
$T(a)$ is the time required to carry out the hypothetical move and perform sensing: the ballistic time $\tau_{\text{bal}}(a)$ is the time required for the agent to travel to the new location, while $m^\ast \tau_s$ is the sensing time in the subsequent diffusion phase, so that $T(a) = \tau_{\text{bal}}(a) + m^\ast \tau_s$. The ballistic time is computed as the quotient of the distance to be travelled (according to action $a$) and the velocity of the ballistic flight $V$. Recall that one of the actions in $\mathcal{A}$ is not to move; for this action, $\tau_{\text{bal}}$ is zero.
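The reward of a candidate action can be evaluated as in the following sketch; `no_detect_update` stands for a helper (ours) that applies Equation (2) over the whole map for every agent of the hypothetically relocated formation.

```python
import numpy as np

def entropy(pom):
    """POM entropy, Equation (10); cells treated as independent Bernoulli."""
    p = np.clip(pom, 1e-12, 1.0 - 1e-12)          # numerical safety
    return float(-(p * np.log(p) + (1.0 - p) * np.log(1.0 - p)).sum())

def reward(pom, action_dest, centroid, V, m_star, tau_s, no_detect_update):
    """Information gain rate, Equation (9), under the no-detection assumption."""
    dx = action_dest[0] - centroid[0]
    dy = action_dest[1] - centroid[1]
    tau_bal = np.hypot(dx, dy) / V                # zero for the 'no move' action
    h_now = entropy(pom)
    h_next = entropy(no_detect_update(pom, action_dest))
    return (h_now - h_next) / (tau_bal + m_star * tau_s)
```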
Note that the rewards for all hypothetical actions are computed before the agent actually moves. The action which results in the maximum reward is selected and subsequently involved in the processing described in Section 4.2. Let us denote this action–reward pair for agent $s$ as $(a^s, R^s)$.
It remains to explain how the "move" actions of $\mathcal{A}$ are created. Their number, $N_a$, is a parameter of the algorithm. For each "move" action, the length of the ballistic flight is a random draw from the exponential distribution with mean $\kappa L_{\min}$, where $\kappa \geq 1$ is a multiplying factor and $L_{\min}$ was introduced above as the minimal length of the ballistic flight. The direction of the ballistic flight is also random: it is drawn from the uniform distribution over the interval $[0, 2\pi)$ rad. A sketch of this action-generation step is given below.
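In the sketch, `kappa` and `num_moves` are our placeholder names for the multiplying factor and the algorithm parameter $N_a$.

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_actions(centroid, L_min, num_moves, kappa=2.0):
    """Build the action set A: 'no move' plus random ballistic destinations."""
    actions = [tuple(centroid)]                  # option a0: remain static
    for _ in range(num_moves):
        L = rng.exponential(kappa * L_min)       # flight length, exponential draw
        phi = rng.uniform(0.0, 2.0 * np.pi)      # flight direction, uniform draw
        actions.append((centroid[0] + L * np.sin(phi),
                        centroid[1] + L * np.cos(phi)))
    return actions

# Each agent evaluates reward() (see above) for every sampled action and keeps
# the pair (a, R) with the highest reward for the consensus step of Section 4.2.
```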
4.2. Coordination through Consensus
In a decentralised fusion architecture, each agent makes a decision on its future action independently of the other agents in the formation. This can result in a disagreement between the agents, unless the decided actions $a^s$, $s = 1, \dots, S$, are all equal. Disagreement is undesirable, because it leads to a break-up of the searching formation. The consequence of a break-up is the loss of connectivity in the communication network and, ultimately, reduced effectiveness of the search. Initially, at the start of the search mission, the formation is created so that its communication graph is connected. The goal of cooperative control is to maintain (approximately) the shape of the formation during the mission and thereby keep the communication graph connected; in addition, it prevents collisions between agents in motion. For this reason, whenever an action decision is made by an agent, it needs to engage in a protocol which ensures that all members of the formation reach an agreement on the common action to be applied by all of them.
The decentralised cooperative control is based on the consensus protocol [22,26]. Consensus algorithms are a family of iterative schemes which enable autonomous agents to achieve an agreement based on local information in a decentralised fusion architecture. Among the consensus protocols, the most widespread are the average consensus and the max-consensus [27]. Because we use both protocols in the proposed intermittent multi-agent search, they are briefly reviewed next.
Consider a graph representing the communication network used by the agents to share data (such as those in Figure 1). Let $\mathcal{V}$ be the set of vertices of the graph (representing the agents) and $\mathcal{E}$ the set of its edges, where an edge $(s, t) \in \mathcal{E}$ exists only if agents $s$ and $t$ can communicate directly. The set of neighbours of agent $s \in \mathcal{V}$ is then $\mathcal{N}_s = \{t \in \mathcal{V} : (s, t) \in \mathcal{E}\}$. Suppose all vertices (agents) in the network have locally calculated a certain scalar value or "state" (such as the action reward). The goal of the max-consensus is, for each agent, to determine the globally maximum state by communicating only with its neighbours. Let the original (initial) state of agent $s \in \mathcal{V}$ be denoted $x_s[0]$. At each iteration $j = 0, 1, 2, \dots$, agent $s$ communicates and updates its state according to [27]:
$$x_s[j+1] = \max\left\{ x_s[j],\ \max_{t \in \mathcal{N}_s} x_t[j] \right\}. \qquad (11)$$
After a sufficient number of iterations (which depends on the topology of the network), all agents will reach an agreement on the global maximum.
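In code, one round of Equation (11) per iteration looks as follows (synchronous sketch, our naming):

```python
def max_consensus(neighbours, state, num_iters):
    """Max-consensus, Equation (11).

    neighbours: dict agent -> list of neighbour agents
    state:      dict agent -> initial scalar state x_s[0]
    """
    x = dict(state)
    for _ in range(num_iters):
        x = {s: max([x[s]] + [x[t] for t in neighbours[s]]) for s in neighbours}
    return x   # on a connected graph, all values equal the global maximum
```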
Average consensus is an iterative algorithm by which agent $s \in \mathcal{V}$ computes the mean value of the collection of initial states $\{x_t[0]\}_{t \in \mathcal{V}}$, by communicating only with its neighbours. At each iteration $j = 0, 1, 2, \dots$, agent $s$ updates its state according to [22]
$$x_s[j+1] = q_{ss}\, x_s[j] + \sum_{t \in \mathcal{N}_s} q_{st}\, x_t[j], \qquad (12)$$
where $\mathbf{Q} = [q_{st}]$ is referred to as the averaging matrix. $\mathbf{Q}$ must be symmetric and doubly stochastic [28], and must satisfy $q_{st} = 0$ if $(s, t) \notin \mathcal{E}$ (for $s \neq t$). We adopt as the averaging matrix the Metropolis weight matrix [29]:
$$q_{st} = \begin{cases} \dfrac{1}{1 + \max\{d_s, d_t\}}, & \text{if } (s, t) \in \mathcal{E}, \\[2mm] 1 - \sum_{u \in \mathcal{N}_s} q_{su}, & \text{if } s = t, \\[2mm] 0, & \text{otherwise}, \end{cases} \qquad (13)$$
where $d_s = |\mathcal{N}_s|$ denotes the degree of vertex $s$.
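The corresponding sketch for Equations (12) and (13) computes the Metropolis weights on the fly from the vertex degrees:

```python
def average_consensus(neighbours, state, num_iters):
    """Average consensus with Metropolis weights, Equations (12) and (13)."""
    deg = {s: len(neighbours[s]) for s in neighbours}
    x = dict(state)
    for _ in range(num_iters):
        new_x = {}
        for s in neighbours:
            q = {t: 1.0 / (1.0 + max(deg[s], deg[t])) for t in neighbours[s]}
            q_ss = 1.0 - sum(q.values())          # self weight, Equation (13)
            new_x[s] = q_ss * x[s] + sum(q[t] * x[t] for t in neighbours[s])
        x = new_x
    return x   # converges to the mean of the initial states on connected graphs
```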
The consensus algorithm is iterative and hence its convergence properties are very important. Although the network topology may change with time (as the agents move), during the short interval in which the exchange of information takes place, the topology can be considered time-invariant. The convergence of the consensus algorithm in the adopted framework (time-invariant, undirected communication topology) is guaranteed if the graph is connected [22,29,30]. While this is a theoretical result, valid for an infinite number of iterations, in practice, due to the finite number of iterations, consensus may not be reached and, as a consequence, one or more agents may be lost during the search (a lost agent is one that has lost its connection with the formation). This event, however, does not mean that the search mission has failed: the target can still be found by the remaining (smaller) formation, albeit over a (possibly) longer interval of time.
The max-consensus is used by the agents to agree on: (1) the destination of the ballistic flight with the overall highest reward; and (2) the termination of the search. The average consensus is used to agree, by simple majority, in which direction (left, right, up or down) to carry out the one-step move towards the closest potential target.