1. Introduction
Robotic technology has immense potential to change our daily lives. In industry, human–robot co-working is envisioned to play a key role in the next industrial revolution, known as Industry 5.0 [1]. In healthcare, robots also see increasing use, e.g., providing assistance to patients and the elderly in personalized healthcare [2,3], and during the COVID-19 pandemic robots were deployed to disinfect common spaces such as supermarkets and hospitals [4]. Common to the above is the increased need for autonomous robots that can safely and naturally interact with humans while solving different abstractly and/or vaguely defined tasks. Due to the uncertainties in such problems, purely goal-driven problem-solving architectures will often end up in local minima of the problem formulation, also known as impasses, i.e., situations where the information or action selection strategy currently available to the robot is insufficient to solve the task. Thus, one core faculty of such robotic systems should be the ability to reflect on the current situation in order to deviate from one action selection strategy in a timely manner, either to try out other strategies or to retrieve new information about the task.
The next generation of cognitive architectures, based on modern machine learning techniques, has the potential to revolutionize robotics by allowing roboticists to develop such autonomous systems easily. In previous work, we proposed the Generalized Cognitive Hour-glass Model, a framework for developing cognitive architectures by composing them from generally applicable probabilistic programming idioms over which powerful general algorithms can perform inference [5]. The idiomatic approach to composing cognitive architectures, encouraged by this framework, allows researchers and practitioners to cooperate more easily by mixing and matching probabilistic programming idioms developed by others, while still being able to handcraft the parts of a system for which current solutions do not suffice.
In another work, we proposed one such probabilistic programming idiom, based on the “standard model of the mind” [6], for the task of Active Knowledge Search (AKS) in unknown environments [7]. This idiom defines a probabilistic decision process that encourages a robot to take actions to discover, i.e., obtain information about, its environment based purely on notions of progress and information gain while avoiding constraint violations. Simulations applying this idiom to the specific problem of active mapping and robot exploration showed promising results. However, limitations were also identified. The main limitation was that in specific situations the simulated robot would get “stuck” in an impasse, taking repetitive actions that yielded no new information about the environment and thus hindered full exploration of it. As we will discuss in more detail in Section 3, this is essentially caused by the fixed strategy for action selection employed by the previous solution.
In the literature related to robot navigation, impasse phenomena are commonly known as “the local minima issue” [8], “deadlocks” [9], “limit cycles” [10], “infinite loops” [11], “dead ends”, “cyclic dead ends”, or “trap-situations” [12]. Like the problem mentioned above, all of these terms refer to situations in which a fixed strategy for action selection results in no meaningful progress towards a goal state or relative to a measure of optimality. To resolve such situations, the solutions proposed within robotics usually rely on problem-specific information, e.g., geometric properties, to detect and/or resolve the impasse. As an example, consider the approach used in [13], where a grid map is defined over the workspace with a counter attached to each cell, keeping track of the number of times the cell has been visited. Whenever this counter reaches a predefined threshold, a limit cycle is registered. When a limit cycle is detected, a temporary way-point is generated, guiding the robot out of the enclosure causing the limit cycle. Finally, when the robot gets outside the enclosure, a virtual wall is generated, ensuring that the robot does not enter the problematic enclosure again. As another example, consider the approach used in [14], where deadlock loops are detected based on the periodicity of the distance to the goal. Similarly, in [15], deadlocks are detected based on a preferred velocity magnitude, the actual velocity magnitude, and the unsigned distance between robots. While the solutions suggested above might work for specific problems, they do not easily generalize to other problems.
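To make the flavor of such problem-specific heuristics concrete, the following Python sketch illustrates a visit-counter-based limit-cycle detector in the spirit of [13]; the class name, grid resolution, and threshold are illustrative assumptions, not the original implementation.

```python
# A minimal sketch of the visit-counter heuristic described above, in the spirit
# of [13]; grid resolution and threshold are illustrative assumptions.
import numpy as np

class VisitCounterLimitCycleDetector:
    def __init__(self, workspace_size=(20.0, 20.0), cell_size=0.5, threshold=5):
        shape = (int(workspace_size[0] / cell_size), int(workspace_size[1] / cell_size))
        self.counts = np.zeros(shape, dtype=int)  # visit counter per grid cell
        self.cell_size = cell_size
        self.threshold = threshold

    def update(self, position_xy):
        """Register a visit to the cell containing the robot and report whether
        the visit count indicates a limit cycle."""
        i = int(np.clip(position_xy[0] / self.cell_size, 0, self.counts.shape[0] - 1))
        j = int(np.clip(position_xy[1] / self.cell_size, 0, self.counts.shape[1] - 1))
        self.counts[i, j] += 1
        return self.counts[i, j] >= self.threshold

# Usage: when update(...) returns True, a temporary way-point would be generated
# to guide the robot out of the enclosure, followed by a virtual wall.
detector = VisitCounterLimitCycleDetector()
limit_cycle_detected = detector.update((3.2, 4.7))
```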
In the literature related to cognitive architectures, similar phenomena in which an agent is unable to make progress with the information that is currently available are often referred to as impasses [16,17]. Research in cognitive architectures is mainly focused on developing systems with generally applicable capabilities, as opposed to the problem-specific solutions commonly proposed in the robotics literature. This is done by taking substantial inspiration from theories about the workings of animal and human cognition developed within cognitive science, based on which computational instantiations are proposed. Nevertheless, solutions to tackle impasse phenomena seem to follow the same pattern as those proposed by researchers in robotics. First, systems are made able to detect impasses. Second, systems are endowed with some reflective mechanism that, based on the detected impasse, can choose appropriate temporary decision strategies until the impasse has been resolved. Consider, e.g., the tri-level control structures implemented by two of the most prominent cognitive architectures, SOAR [16] and Sigma [17]. These tri-level control structures consist of a reflective control mechanism, a deliberative control mechanism, and a reactive control mechanism, each of which activates the control layer above it based on the detection of impasses. These control structures assume that a discrete/symbolic set of operators exists a priori that can reactively be either sorted out or proposed for further evaluation, thereby making the detection of impasses straightforward without any problem-specific knowledge. This approach has some clear benefits with respect to attention, i.e., the effective allocation of limited (computational) resources: since each of the layers focuses computations on the information actually needed to solve a given problem in a specific context/state, a lot of computation is saved. However, the approach also raises some difficulties in robotics, where much of the low-level control is more naturally described by means of continuous variables. As an example, consider the position control of a robot. In SOAR and Sigma, such controls are usually abstracted to symbolic representations such as “walk towards target object”, “run towards target object”, “pick up target object”, or “walk towards random object” [17]. These symbolic representations then have to be decoded by an extra module external to the decision process, called the motor buffer, before they can be manifested in the environment. This layer of abstraction makes it hard to incorporate uncertainties resulting from low-level control into the decision process, simply due to the inevitable loss of information that happens when low-level controls, e.g., motor currents or positions, which are continuous in nature, are lumped together into coarse, discretized compound controls in the form of symbolic representations. In the end, this results in less optimal responses being picked, since the fine level of control needed within robotics cannot be accounted for as an intrinsic part of the decision process.
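As a point of contrast, and purely as an illustrative sketch (not code from SOAR, Sigma, or [7]), a probabilistic program can instead treat a continuous motor command as a first-class random variable, so that its uncertainty enters the decision process directly rather than being hidden behind a symbolic operator:

```python
# Illustrative sketch: a continuous velocity command modeled directly as a random
# variable in Pyro, so low-level control uncertainty can propagate through the
# decision process. The distribution parameters are assumptions for illustration.
import torch
import pyro
import pyro.distributions as dist

def motor_buffer_model(nominal_cmd):
    # nominal_cmd: (linear velocity [m/s], angular velocity [rad/s]) from a
    # low-level controller; here wrapped in explicit uncertainty.
    v = pyro.sample("linear_velocity", dist.Normal(nominal_cmd[0], 0.05))
    w = pyro.sample("angular_velocity", dist.Normal(nominal_cmd[1], 0.1))
    return torch.stack([v, w])

sampled_cmd = motor_buffer_model(torch.tensor([0.5, 0.0]))
```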
For these reasons, our intention with this paper is to present our recent efforts toward implementing general reflective mechanisms, similar to the ones found in cognitive architectures, within the scope of the framework proposed in [5] in a way that is suitable for robotic applications. The main contributions of this paper are:
A description and implementation of a control structure grounded in stochastic variational inference that is capable of deliberate and reflective control based on architectural appraisals, allowing for the incorporation of uncertainties resulting from low-level control.
A demonstration of how such a control structure overcomes the limitations of the probabilistic programming idiom previously proposed in [7].
A demonstration of how such a general control structure compares to problem-specific approaches commonly used in robotics.
A discussion of the time complexity of the proposed control structure.
This paper is organized as follows: Section 2 introduces the notation used within this paper. Section 3 reviews the previously proposed probabilistic programming idiom in more detail together with the observed impasse phenomenon. Modifications and extensions to the previously proposed idiom are presented in Section 4 and Section 5. Simulation results utilizing the modifications are provided in Section 6. Finally, Section 7 concludes the paper and gives potential future directions.
3. Related Work
As stated in Section 1, the probabilistic programming idiom proposed in [7] defines a probabilistic decision process for Active Knowledge Search in unknown environments, based on the “standard model of the mind” [6]. This was done by first defining a probabilistic model relating the previous content of the working memory, $WM_{t-1}$, with the future content, $WM_t$, while taking variables stored in the long-term memory, $LTM$, into account. In [7], the working memory was further subdivided into variables related to motoric actions, i.e., the motor buffer, $M_t$; variables related to the perceptual buffer, $P_t$; state variables, $S_t$, representing the state of the agent itself and the environment; and decision variables, $D_t$. From this, a probabilistic decision model was derived by factorizing the distribution over the future working memory content,

$$p\left(WM_t \mid WM_{t-1}, LTM\right) = p\left(M_t, P_t, S_t, D_t \mid WM_{t-1}, LTM\right), \qquad (1)$$

where $WM_t = \{M_t, P_t, S_t, D_t\}$. Inspired by the work on emotions in [18], a subset of the decision variables, $A_t \subseteq D_t$, was denoted attention variables. The purpose of these attention variables is to control how the decision process is influenced by the other decision variables: progress, $\mathcal{P}_t$, information gain, $\mathcal{I}_t$, and constraints, $\mathcal{C}_t$, hereafter referred to as appraisal variables. In [7] this was done via the fixed relation

$$A_t = \left(\mathcal{P}_t \lor \mathcal{I}_t\right) \land \lnot\,\mathcal{C}_t, \qquad (2)$$

which basically states that during the decision process attention should be given to future states that yield progress or new information and do not violate constraints. Having defined the model in Equation (1) and the relation in Equation (2), stochastic variational inference was used to approximate the posterior over optimal future motoric actions given the attention variables, i.e.,

$$p\left(M_t \mid A_t = 1, WM_{t-1}, LTM\right) \approx q_{\phi}\left(M_t\right). \qquad (3)$$
The above was implemented as an abstract class utilizing the probabilistic programming language Pyro [19], thereby ensuring that the probabilistic programming idiom can be reused in multiple applications by implementing a few abstract methods defined by the abstract class.
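To illustrate this abstract-class pattern, the following is a minimal sketch with hypothetical method names (not the actual interface defined in [7]): subclasses supply the problem-specific model parts, while the base class owns the generic decision model and the stochastic variational inference machinery. The encoding of the attention relation below is one way to express the spirit of Equation (2) with probabilities, not necessarily the original formulation.

```python
# Minimal sketch of the abstract-class pattern; method names are assumptions.
from abc import ABC, abstractmethod
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

class AKSIdiomSketch(ABC):
    @abstractmethod
    def sample_motor_buffer(self):
        """Sample candidate motoric actions, e.g., a continuous velocity command."""

    @abstractmethod
    def appraisals(self, action):
        """Return probabilities of progress, information gain, and constraint violation."""

    @abstractmethod
    def guide(self):
        """Variational distribution over the motor buffer (free parameters via pyro.param)."""

    def model(self):
        action = self.sample_motor_buffer()
        p_prog, p_info, p_con = self.appraisals(action)
        # One way to encode the spirit of Equation (2): attend to actions that
        # yield progress OR information gain AND do not violate constraints.
        p_attention = (p_prog + p_info - p_prog * p_info) * (1.0 - p_con)
        pyro.sample("attention", dist.Bernoulli(p_attention), obs=torch.tensor(1.0))
        return action

    def infer(self, steps=200):
        # Approximate the posterior over motoric actions given attention = 1.
        svi = SVI(self.model, self.guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
        for _ in range(steps):
            svi.step()
```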
To investigate the performance of the idiom, it was used to implement an algorithm for autonomous robot exploration, which was simulated on the full HouseExpo dataset [20] containing 35,126 different floor plans. From these simulations, one of the observations was that the robot would sometimes end up taking repetitive actions driven purely by the progress appraisal variable, $\mathcal{P}_t$. As a result, the robot would not fully explore its environment, as illustrated in Figure 1. In other words, the robot ended up at an impasse. It was further concluded that an alternative to the fixed decision strategy given by Equation (2) would be needed to overcome this problem.
4. Overall Idea
Both SOAR and Sigma implement a tri-level control structure in which the distinction between the deliberative and the reflective control mechanisms is architectural rather than conceptual; both simply comprise a specific architectural response to similar architectural stimuli, i.e., the detection of impasses. When we further consider the statement
“Work in Sigma on appraisal, and its relationship to attention, has led to the conclusion that the detection of impasses should itself be considered as a form of appraisal”
it hints toward the possibility that similar functionality might be obtained from an architecturally simpler control structure. Instead of treating the detection of and responses to impasses as distinct architectural mechanisms, we propose treating them as affective responses to appraisals arising from evaluations of deliberate attention, allowing us to incorporate uncertainties from low-level control. As illustrated in Figure 2, our proposal is a control structure consisting of a single architectural layer, with decisions being the result of three main steps: Deliberate Attention Proposal, Deliberate Attention Evaluations, and Affective Responses.
In the state at time $t$, $S_t$, the Deliberate Attention Proposal step proposes one or more relevant attention mechanisms, each corresponding to a specific attention variable, $A_t^{(i)}$, e.g., as in Equation (2). The Deliberate Attention Evaluations step then first follows the steps described in Section 3 to estimate the distribution over specific future motoric actions, $p\left(M_t \mid A_t^{(i)} = 1, WM_{t-1}, LTM\right)$, corresponding to each of the attention mechanisms proposed by the Deliberate Attention Proposal. This is indicated with red, yellow, and blue boxes in Figure 2. After inferring the motoric action posteriors, the Deliberate Attention Evaluations step evaluates the expected appraisals that would result from effectuating each of the motoric actions, i.e., the expectations of $\mathcal{P}_t$, $\mathcal{I}_t$, and $\mathcal{C}_t$ under each posterior. This is indicated by the gray boxes in Figure 2. As also indicated in Figure 2, both parts of the Deliberate Attention Evaluations step require access to parts of cognition distal to the decision process. Based on the expected appraisals of each of the motoric action posteriors, the last step in the decision process can initiate different affective responses, such as proposing additional attention mechanisms to evaluate or effectuating one of the action posteriors, as indicated by the thin arrows in Figure 2.
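The control flow of a single decision cycle can be summarized by the following schematic Python sketch; the helper functions are hypothetical placeholders for the three steps in Figure 2 and the inference procedure of Section 3, not the concrete implementation.

```python
# Schematic sketch of one decision cycle; helper callables are placeholders.
def decision_cycle(state, propose_attention, infer_action_posterior,
                   expected_appraisals, affective_response):
    evaluated = []
    proposals = propose_attention(state)                # Deliberate Attention Proposal
    while True:
        for attention in proposals:                     # Deliberate Attention Evaluations
            posterior = infer_action_posterior(state, attention)  # SVI as in Section 3
            appraisal = expected_appraisals(state, posterior)      # gray boxes in Figure 2
            evaluated.append((attention, posterior, appraisal))
        action, proposals = affective_response(evaluated)          # Affective Responses
        if action is not None:                          # effectuate one of the posteriors
            return action
        # Otherwise additional attention mechanisms were proposed; evaluate them too.
```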
Considering the tri-level control structure implemented by SOAR and Sigma, this resembles a combination of the deliberative and reflective mechanisms. The main difference is that here attention mechanisms for choosing motoric actions, similar to Equation (2), are proposed for evaluation rather than operators for which the outcome is known a priori. The benefit of this is that motoric actions do not need to be represented by symbolic operators a priori. Instead, they are made symbolic implicitly through the choice of deliberate attention mechanism, thereby allowing the incorporation of uncertainties of low-level controls. Each of the proposed deliberate attention mechanisms might consider a subset of, and/or special combinations and weightings of, the appraisal variables available to the robot, thereby promoting different behaviors. While the proposed approach conceptually does support deliberative and reflective responses via the affective responses, it does not currently support reactive responses, since all motoric actions have to be inferred from the deliberate attention mechanisms. However, in Section 7 we will discuss how we imagine reactive responses could be incorporated into the control structure. Furthermore, modern probabilistic programming languages such as Pyro [19] can combine stochastic variational inference with enumeration to infer the motoric action posterior, thereby making it possible to combine operators represented by both discrete/symbolic and continuous variables in the proposed control structure.
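As a small, self-contained illustration of this capability (a toy example with assumed numbers, not the model used in this paper), a discrete operator choice can be enumerated exactly while a continuous motor command is inferred with stochastic variational inference:

```python
# Toy illustration of mixing exact enumeration of a discrete operator with SVI
# over a continuous motor command in Pyro; all numbers are assumptions.
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, TraceEnum_ELBO, config_enumerate
from pyro.optim import Adam

@config_enumerate
def model():
    # Discrete/symbolic operator choice, enumerated exactly rather than sampled.
    operator = pyro.sample("operator", dist.Categorical(torch.ones(3) / 3.0))
    # Continuous motor command whose mean depends on the chosen operator.
    target_speed = torch.tensor([0.0, 0.5, 1.0])[operator]
    speed = pyro.sample("speed", dist.Normal(target_speed, 0.1))
    # Toy "appraisal" likelihood preferring speeds near 0.8 m/s.
    pyro.sample("appraisal", dist.Normal(speed, 0.1), obs=torch.tensor(0.8))

def guide():
    loc = pyro.param("loc", torch.tensor(0.0))
    scale = pyro.param("scale", torch.tensor(0.5),
                       constraint=dist.constraints.positive)
    pyro.sample("speed", dist.Normal(loc, scale))

svi = SVI(model, guide, Adam({"lr": 0.02}), loss=TraceEnum_ELBO(max_plate_nesting=0))
for _ in range(100):
    svi.step()
```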
7. Discussion
The intention of the presented efforts was to implement general reflective mechanisms suitable for robotic applications, taking our previous work as the starting point. In Section 6.1 and Section 6.2 we demonstrated that the proposed method functionally improves upon our previously proposed probabilistic programming idiom and that it can perform at least as well as, if not better than, problem-specific methods. However, in Section 6.3 it was concluded that the current implementation would probably be too slow and jerky for real-world robot applications. The approach presented within this paper is intended to be generally applicable and reusable, which makes it hard to assess how much the current implementation would have to be improved to be applicable to real-world robot applications, since this of course depends on each specific use case. One way to assess this anyway is to compare it to Allen Newell’s analysis of the time scales of human cognition [21]. This is reasonable because the ultimate goal of our efforts is to make robots as capable as humans.
In a single cycle of deliberate attention proposal, deliberate attention evaluation, and affective response, access to parts of cognition distal to the decision process has to occur multiple times in order to infer the motor buffer posteriors. This places the proposed approach somewhere above the “biological band” of Newell’s analysis, said to be on the order of ∼10 ms. The next step up in Newell’s analysis is the “cognitive band”, starting at the level of deliberate acts on the order of ∼100 ms. However, the proposed approach does not merely comprise deliberation, i.e., choosing one known operator over other known operators by bringing available knowledge to bear, since operators are constructed for the to-be-produced response based on the proposed deliberate attention mechanisms. Therefore, the proposed approach also belongs somewhere above the time scale of deliberate acts. At the other end of the “cognitive band” we have unit tasks on the order of ∼10 s. At the time scale of unit tasks, operations should be composed to deal with tasks. By design, the specific affective response presented in Section 5.3 can only deliver simple responses in one decision cycle and not a plan of responses to solve complete tasks. This leaves us at the time scale of elementary cognitive operations or immediate external cognitive behavior at ∼1 s. According to Newell’s analysis, such elementary reactions often take ∼2–3 s; however, with learning from experience, simplification, preparation, and carefully shaped anticipation, they can take less than ∼0.5 s. By design, the specific affective response presented in Section 5.3 can deliver simple responses within 1 to 3 full cycles of deliberate attention proposal, deliberate attention evaluation, and affective response. With the timings in Figure 6a, a response thus takes anywhere from ∼1.7 s up to at most ∼5.34 s in the case of two impasses. Thus, to arrive at the upper end of elementary reactions, the computation times of the current implementation would have to be improved by a factor of ∼2–3. Obtaining such improvements does not seem implausible via code optimizations; however, it brings us nowhere near the lower end of ∼0.5 s. This begs the question: can the proposed approach support the necessary machinery to learn from experience in order to deliver responses at the lower end of ∼0.5 s?
In Section 3 and Section 4 we described the use of stochastic variational inference as the basis for inferring a parametric approximation of the posterior over the motor buffer, $p\left(M_t \mid A_t = 1, WM_{t-1}, LTM\right)$. It was assumed that this inference process would have to be done from scratch in each decision cycle. However, this need not necessarily be the case. Instead, we could make use of amortized variational inference [24,25,26,27]. Thus, instead of making use of a variational distribution with free parameters, $q_{\phi}\left(M_t\right)$, we would make use of a variational distribution whose parameters are determined by a parametric function, $q_{f_{\theta}\left(\cdot\right)}\left(M_t\right)$, e.g., a neural network. When new situations are encountered, we would not necessarily gain much by doing so; however, over time this would in principle allow the system to generate proper responses to situations similar to those it has previously encountered without performing any inference, thereby removing the need for the most time-consuming step in the decision cycle. Again, when considering the timings in Figure 6a, reducing the inference step to near zero would bring the total time of a single decision cycle down to around ∼500 ms with the current implementation. If it is then also possible to improve the other steps by a factor of ∼2–3 via code optimizations, it would indeed seem plausible to achieve immediate external cognitive behavior in around ∼0.5 s after an initial learning period.
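To make this concrete, the following is a minimal sketch of how such an amortized variational distribution could look in Pyro, with assumed names, network sizes, and a generic state encoding; it is not part of the current implementation.

```python
# Sketch of an amortized guide: a small network maps the current situation to
# the parameters of the variational distribution over the motor buffer, so
# familiar situations can be answered with a single forward pass. Shapes and
# names are assumptions for illustration.
import torch
import torch.nn as nn
import pyro
import pyro.distributions as dist

class AmortizedGuide(nn.Module):
    def __init__(self, state_dim=16, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * action_dim))
        self.action_dim = action_dim

    def forward(self, state):
        # Register the network's parameters with Pyro so SVI can train them.
        pyro.module("amortized_guide", self)
        loc, log_scale = self.net(state).split(self.action_dim, dim=-1)
        return pyro.sample("motor_buffer",
                           dist.Normal(loc, log_scale.exp()).to_event(1))

# Used as the guide in SVI against a model containing a "motor_buffer" site;
# after training across many situations, no per-situation optimization is needed.
guide = AmortizedGuide()
action = guide(torch.zeros(16))
```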
Further optimization might be achieved by considering when to stop the underlying inference algorithm. In the current implementation, the underlying inference algorithm runs for a fixed, pre-defined number of iterations. The same number of iterations might not be necessary in all situations, and thus time could be saved if a more clever mechanism for deciding the number of iterations were implemented.
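One simple possibility, shown below as an assumption rather than the paper's implementation, is to stop the SVI loop once the ELBO estimate has stopped improving by more than a tolerance for a number of consecutive iterations:

```python
# Early stopping of an SVI loop based on ELBO stagnation; patience and tolerance
# values are illustrative assumptions.
def run_svi_with_early_stopping(svi, max_steps=500, patience=20, tol=1.0):
    best_loss = float("inf")
    steps_without_improvement = 0
    for _ in range(max_steps):
        loss = svi.step()
        if loss < best_loss - tol:
            best_loss = loss
            steps_without_improvement = 0
        else:
            steps_without_improvement += 1
        if steps_without_improvement >= patience:
            break  # ELBO has plateaued; stop early instead of running all steps
    return best_loss
```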
With these additions and optimizations of the approach and its implementation, we believe that the approach will be applicable to real-world robot applications, and thereby contribute to the goal of constructing autonomous robots that can safely and naturally interact with humans while solving different abstractly and/or vaguely defined tasks. As such, these optimizations will be the focus of our future work.