1. Introduction
Reinforcement Learning (RL) has achieved great success in solving several tasks, such as Atari games in the Arcade Learning Environment (ALE) [1], where the sequence of environmental observations serves as the basis for determining decisions [2]. RL algorithms process environmental data to learn a policy that chooses the best action to maximize cumulative reward [3]. During RL, the agent interacts with its environment, arriving at different states by performing actions that yield positive or negative rewards.
Nevertheless, the limited adaptability of RL approaches poses challenges when dealing with complex tasks. Consequently, developing methods that enable the application of RL to complex environments is a significant research problem [3]. The goal is to enhance the capabilities of RL algorithms to effectively handle intricate tasks, allowing for more robust and efficient learning in complex scenarios.
The combination of deep neural networks (DNNs) and Q-learning [4] led to the deep Q-network (DQN) algorithm [2,5], which has been used in various works to develop models for complex tasks. These agents demonstrate remarkable success, surpassing human-level performance and outperforming baseline benchmarks [6].
However, the DQN algorithm can suffer from inefficiency and inflexibility, which can limit its performance [6,7]. It is vulnerable in complex environments in terms of data efficiency, as such environments admit a practically infinite number of possible experiences and require processing many states, demanding high computational power [8]. The DQN algorithm has also been criticized for its lack of flexibility, specifically when adapting to changes in the environment or incorporating new tasks. In such cases, the algorithm typically requires a complete restart of the learning process, which can be time-consuming and inefficient [7]. It also has certain limitations in terms of generalization compared to regular neural networks [9].
Researchers have developed a range of extensions to the original DQN algorithm to address these issues, such as double DQN [10]. Moreover, several works have proposed the use of visual attention mechanisms [11,12,13,14], which allow the network to focus on specific regions of an input image rather than processing the entire image at once [11].
The attention mechanism can be implemented in various ways, such as using convolutional neural networks (CNNs) to extract image features, recurrent neural networks (RNNs), and visual question answering (VQA) models that process textual question input [15,16,17]. The VQA attention mechanism serves to selectively direct attention toward image regions that are crucial for answering a question. It prioritizes the attended regions by assigning importance scores, or weights, to different image regions based on their relevance to the question. As a result, these regions are granted a higher degree of significance in the overall analysis [15,16,17,18,19,20].
In this work, we address the aforementioned limitations of the baseline DQN agent and propose Attention-Augmented Deep Q-Network (AADQN) by extending DQN with a dual attention mechanism to highlight task-relevant features of input. The dual attention mechanism unifies bottom-up and top-down visual attention mechanisms within the AADQN model, as illustrated in
Figure 1. This way, AADQN allows the agent to efficiently concentrate on the most relevant parts of the input image. The top-down attention (TDA) incorporates particle filters, enabling dynamic identification of task-related features. The integration of particle filters provides a regularization effect, effectively mitigating overfitting and enhancing generalization. The bottom-up attention consists of two parts: a preliminary bottom-up attention (PBA) module and a bottom-up attention refinement (BAR) module. The PBA extracts the feature importances, which TDA subsequently utilizes to initialize the particle filters. BAR then refines attention by considering the saliency value of the features.
AADQN differs from the previous work using visual attention with DQN [11,12,13,14] by refining top-down focus through bottom-up attention. The collaboration of both attention mechanisms enables more accurate decision-making and improves the model’s robustness to variations, reducing its sensitivity to noise. It is also fundamentally different from VQA-based RL approaches, which aim to align visual and textual information to make decisions and are typically applied to tasks where understanding and reasoning about visual and textual data are essential, such as answering questions about the content of an image [15,16,17,18,19,20]. Our dual attention mechanism, enhanced by a particle filter, can be applied to various tasks without relying on textual information, enabling AADQN to adaptively focus on different regions of the input.
Overall, the contributions of our work can be summarized as follows:
AADQN introduces a novel particle-filter-enhanced dual attention approach to DQN, integrating both bottom-up and top-down attention mechanisms. This unified approach handles the complexity of the environment by extracting essential task-related features, thereby enhancing efficiency and improving overall performance.
AADQN enhances the flexibility of DQN by freezing the unimportant or irrelevant units of the CNN’s inner layers during the decision-making process. These units, identified by the low relevance scores assigned by AADQN, undergo transfer learning to optimize the model’s robustness and flexibility.
The remainder of this paper is organized as follows:
Section 2 gives an overview of the related work. Then,
Section 3 describes the background of Q-learning and baseline DQN. The proposed algorithm and algorithmic efficiency analysis are presented in
Section 4 and
Section 5.4, respectively. These are followed by the experiment results and performance comparison reported in
Section 5. Finally,
Section 6 concludes the paper.
2. Related Work
Reinforcement learning algorithms, such as DQN [2], have shown great success in learning to play Atari 2600 games from raw pixel inputs. While DQN [2] can learn to play games effectively, it can suffer from instability and inefficiency in learning [6,7].
To address these shortcomings, several modifications have been proposed to the original DQN algorithm [6,8,10,21,22]. The prioritized experience replay method, proposed by Schaul et al. [21], prioritizes experience replay based on the importance of each sample, so that critical transitions are replayed more frequently and learning becomes more efficient. The dueling neural network architecture, introduced by Wang et al. [10], separates the state value function and the action advantage function, resulting in more stable learning and better performance than the state of the art on the Atari 2600 domain. Several works have presented distributed architectures for deep reinforcement learning [8,22,23]. Such architectures distribute the learning process across multiple parallel agents, which enables more efficient exploration of the state space and faster convergence [8]. Another line of work has aimed to learn additional auxiliary functions with denser training rewards to improve sample efficiency [24]. Some research has combined multiple techniques, such as prioritized experience replay, dueling networks, and distributional RL, to achieve performance enhancements over DQN [6].
A number of studies have used attentional mechanisms to improve the performance of their models [12,13,25,26,27]. Some of these use bottom-up attention, allowing the agent to selectively attend to different parts of the input, regardless of the agent’s task. Others have applied attention to DRL problems in the Atari game domain [11,12,28]. Additionally, several others have explored attention by incorporating a saliency map [29] as an extra layer [30] that modifies the weights of CNN features.
Studies that use the basic bottom-up saliency map [29] as an attention mechanism in RL have relied on many hand-crafted features as inputs [31]. Yet, these models are unable to attend to multiple pieces of input information with sufficient importance simultaneously. Top-down attention mechanisms can also be used to improve the performance of DQNs by allowing the agent to selectively attend to relevant parts of the input based on its current task [32].
Most of the previous DRL studies that use attention mechanisms are based on back-propagation learning [13,27], which is not ideal for DRL [25,30] as it can lead to inflexibility [33]. A few other works have proposed to learn attention without back-propagation [25,30]. Yuezhang et al. infer attention from optical flow, which only applies to problems involving visual movement [30]. The attention mechanism of Mousavi et al. uses a bottom-up approach, directed by the inherent characteristics of the input information regardless of the reward [25]. In contrast, we incorporate a unified bottom-up and top-down visual attention mechanism into the DQN model to improve the game agent’s training stability and efficiency.
3. Background on Q-Learning and DQN
Basic Q-learning learns a function that updates an action-value (Q) table for different states and actions, enabling an agent to learn the optimal policy for taking actions in an environment by maximizing the expected cumulative reward based on the current state, represented by the variable $s$ in Equation (1) [34]:

$$Q(s, a) = r + \gamma \max_{a'} Q(s', a') \qquad (1)$$

In the above, the Q-value for a state $s$ when taking action $a$ is calculated as the sum of the immediate reward $r$ and the highest Q-value achievable from the subsequent state $s'$. The variable $\gamma$, referred to as the discount factor, governs the extent to which future rewards contribute to this calculation.
The agent performs a sequence of actions to maximize the total expected reward. The Q-value update is formalized as follows [34]:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \qquad (2)$$

where $\alpha$ is the learning rate, and $\delta = r + \gamma \max_{a'} Q(s', a') - Q(s, a)$ is called the temporal difference error.
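As a concrete illustration, the tabular update in Equation (2) can be sketched in a few lines of Python; the table size, learning rate, and discount factor below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Minimal tabular Q-learning sketch of Equation (2); states and actions are
# assumed to be small integer indices, and the hyperparameters are illustrative.
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.99          # learning rate and discount factor
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """Apply one temporal-difference update to the Q-table."""
    td_target = r + gamma * np.max(Q[s_next])   # r + gamma * max_a' Q(s', a')
    td_error = td_target - Q[s, a]              # temporal difference error
    Q[s, a] += alpha * td_error
    return td_error

# Example transition: from state 3, action 1 yields reward 1.0 and leads to state 7.
q_update(3, 1, 1.0, 7)
```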
To address the limitations of basic Q-learning, such as slow convergence and the need for manual feature engineering, Mnih et al. proposed the deep Q-network (DQN) [2], which utilizes deep neural networks to approximate the action-value function, allowing for more complex and efficient learning directly from raw sensory inputs. The DQN architecture (Figure 2) uses a CNN to process the input image representing the game state and produces Q-values for all available actions. The CNN’s convolutional layers extract important features, generating a feature map. This feature map is then flattened and fed into a fully connected (FC) network, which computes the Q-values for the actions in the current state.
In traditional supervised neural networks, the learning target is fixed at each step, and only the estimates are updated. In contrast, there is no fixed target in reinforcement learning: the target itself is estimated and therefore changes during learning. However, changing the target during the training process leads to unstable learning. To resolve this issue, DQN uses a dual neural network architecture, referred to as the prediction and target networks: two Q-networks with identical structures but different parameters. Learning is primarily accomplished by updating the prediction network at each step to minimize the loss between its current Q-values and the target Q-values. The target network is a copy of the prediction network that is updated less frequently, usually by copying the weights of the prediction network every n steps. In this way, the target network maintains stability and prevents the prediction network from overfitting to the current data by keeping the target values fixed for a window of n time steps.
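A rough sketch of this prediction/target interplay is given below; the network sizes, hyperparameters, and the Huber loss are illustrative assumptions, not the authors' implementation:

```python
import copy
import torch
import torch.nn as nn

# Hypothetical prediction/target network pair for DQN (sizes are illustrative only).
prediction_net = nn.Sequential(nn.Linear(84 * 84, 256), nn.ReLU(), nn.Linear(256, 4))
target_net = copy.deepcopy(prediction_net)      # frozen copy used to compute targets
optimizer = torch.optim.Adam(prediction_net.parameters(), lr=1e-4)
gamma, sync_every = 0.99, 1000                  # discount factor and sync period n

def dqn_step(step, s, a, r, s_next, done):
    """One gradient step on the prediction network; periodic copy to the target network."""
    with torch.no_grad():
        target_q = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    q = prediction_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.smooth_l1_loss(q, target_q)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % sync_every == 0:                  # copy weights every n steps
        target_net.load_state_dict(prediction_net.state_dict())
    return loss.item()

# Example with a dummy batch of two flattened 84x84 frames.
s = torch.rand(2, 84 * 84); s_next = torch.rand(2, 84 * 84)
a = torch.tensor([0, 3]); r = torch.tensor([1.0, 0.0]); done = torch.tensor([0.0, 1.0])
dqn_step(step=1000, s=s, a=a, r=r, s_next=s_next, done=done)
```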
4. Attention-Augmented DQN Algorithm (AADQN)
Our proposed AADQN approach enhances the DQN architecture by incorporating bottom-up and top-down attention mechanisms. This integration empowers the game agent to selectively attend to task-relevant features while disregarding irrelevant features, thereby reducing the complexity of the task at hand.
As illustrated in Figure 1, the CNN part of the model takes the present state of the game as input I and extracts a set of feature maps. Using these feature maps, PBA defines the feature importance vector. The TDA mechanism then generates particle vectors, each with D dimensions representing the D feature maps, and converts them into an attention vector. Next, the BAR mechanism refines this attention vector by considering the saliency map.
After calculating the CNN mid-layer relevance scores by LRP rules, AADQN freezes the irrelevant units on the mid-layers. Freezing the irrelevant neurons on the mid-layers improves the model’s robustness and flexibility in complex environments.
The overall flow of AADQN is given in Algorithm 1. In the following, we break down this process and describe the AADQN modules in detail.
Algorithm 1: Attention-Augmented DQN (AADQN)
4.1. Preliminary Bottom-Up Attention (PBA)
PBA determines the feature importance vector of the set of feature maps, which is then utilized by the TDA mechanism to initialize the particles’ states.
To generate the feature importance vector, we first compute the normalized mean activation value of each of the D feature maps $f_j$, where $j = 1, \dots, D$. The mean activation value of a feature map gives a measure of its activity within the CNN and therefore serves to assess how active that feature map is.
To select the most informative features, we calculate the entropy of each feature map $f_j$ using the Shannon entropy formula [35], given by

$$H_j = -\sum_{i=1}^{B} P_i \log P_i \qquad (3)$$

where $B$ is the number of feature value ranges, and $P_i$ is the probability of observing a feature value in the $i$th range [35].
The entropy $H_j$ is then also normalized across the feature maps.
Features with a uniform activation distribution are not desired in CNNs, as they do not help the network learn complex patterns and features in the input data. To this end, the chi-square ($\chi^2$) test [36], given by

$$\chi^2_j = \sum_{i=1}^{B} \frac{(O_i - E_i)^2}{E_i} \qquad (5)$$

is used to improve the feature selection, where the $\chi^2$ metric is used to assess the uniformity of the activation values. Here, $O_i$ represents the observed activation values of the feature map, while $E_i$ corresponds to the activation values expected under a uniform distribution. A higher value of $\chi^2_j$ indicates a lower importance of the feature in determining the output.
The evaluation of feature importance typically relies on activation values [37,38] or, in certain cases, the entropy of feature maps [39,40]. We use both metrics, together with the uniformity measure above, to achieve better performance across diverse environments and to enhance the assessment of feature importance. Combining them makes up the feature importance vector, which is used in TDA to initialize the particle states.
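A compact sketch of how these per-feature-map statistics could be computed is shown below; the histogram binning and, especially, the way the three metrics are combined into a single score are illustrative assumptions rather than the paper's exact rule:

```python
import numpy as np

def pba_importance(feature_maps, n_bins=16):
    """Hypothetical sketch of PBA: score each feature map by its normalized mean
    activation, the Shannon entropy of its value histogram (Eq. (3)), and a
    chi-square uniformity statistic (Eq. (5)). The combination rule is assumed."""
    D = feature_maps.shape[0]                       # feature_maps: (D, H, W)
    mean_act = feature_maps.reshape(D, -1).mean(axis=1)
    mean_act = mean_act / (mean_act.max() + 1e-8)   # normalized mean activation

    entropy = np.zeros(D)
    chi_sq = np.zeros(D)
    for j in range(D):
        counts, _ = np.histogram(feature_maps[j], bins=n_bins)
        probs = counts / counts.sum()
        entropy[j] = -np.sum(probs[probs > 0] * np.log(probs[probs > 0]))
        expected = counts.sum() / n_bins            # uniform expectation
        chi_sq[j] = np.sum((counts - expected) ** 2 / expected)

    entropy = entropy / (entropy.max() + 1e-8)
    chi_sq = chi_sq / (chi_sq.max() + 1e-8)
    # Assumed combination: reward activity and informativeness, penalize uniformity score.
    return mean_act + entropy - chi_sq

# Example with random feature maps (64 maps of size 7x7, as in the final CNN layer).
v = pba_importance(np.random.rand(64, 7, 7))
```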
4.2. Top-Down Attention (TDA)
The TDA mechanism helps the agent focus on task-specific information by assigning weights to the feature maps through the attention vector, whose $j$th element is the attention weight of feature map $f_j$. This mechanism uses a particle filter that generates a set of particles to approximate the posterior distribution over the attention vector for the current task. Each particle is a D-dimensional binary vector that estimates the probability distribution of the feature maps with respect to importance. In a particle vector, the $j$th element corresponds to feature map $f_j$ and indicates whether that feature is useful (‘1’) or not (‘0’) for the task at hand. This binary representation provides a selection of essential features [41,42,43,44].
The algorithm iteratively updates the particles using the reward feedback, then re-samples the particles (Figure 3) to generate a new sample set. Finally, the distribution of the feature maps is estimated.
Combining gradient descent and the particle filter provides the following advantages:
Following the local gradient within the particle filter allows minima (including the global minimum) to be found with fewer particles [45].
Enhancing the gradient descent method with multiple hypotheses helps to avoid getting stuck in local minima [46].
Similar to DQN [2], we use 64 feature maps in the final CNN layer. To estimate the distribution of these feature maps, the number of particles required depends on the complexity of the distribution. Increasing the number of particles generally enhances the accuracy of the estimate; however, there is no universally optimal number of particles that applies to all scenarios, and it is common to employ a sufficiently large number of particles to ensure reliable estimates [43,44]. In our case, a sufficiently large, fixed set of particles is utilized to process the set of 64 feature maps.
In order to generate the particle set that TDA uses to compute the attention vector, we first normalize the importance value $v_j$ of each feature map by the maximum value across all feature maps, converting it into a probability. This ensures that the most active feature has a probability of 1, as follows:

$$\pi_j = \frac{v_j}{\max_{m} v_m} \qquad (7)$$

Then, the binary particle elements are generated randomly using a Bernoulli distribution, i.e., the $j$th element of a particle is 1 with probability $\pi_j$ and 0 with probability $1-\pi_j$. In this way, TDA initializes the particles by focusing on the most informative features with regard to the feature importance, so that features with a higher $\pi_j$ are more likely to be attended to than those with a lower $\pi_j$.
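A minimal sketch of this initialization step is shown below; the particle count and helper name are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_particles(importance, n_particles=100):
    """Hypothetical sketch of TDA particle initialization (Equation (7)).
    Each particle is a binary D-vector sampled element-wise from a Bernoulli
    distribution whose success probability is the max-normalized importance."""
    probs = importance / (importance.max() + 1e-8)          # pi_j in [0, 1]
    return rng.binomial(1, probs, size=(n_particles, probs.size))

# Example: initialize particles from the PBA importance vector of 64 feature maps.
particles = init_particles(np.random.rand(64))
```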
To calculate each particle’s likelihood, i.e., the immediate reward that is output by the DQN for the given state, the initialized particle state is first normalized.
After updating the state of each particle as described above, we calculate the error between the Q-value predicted under the particle state and the returned target value as follows:

$$e_k = \left( Q_{\hat{p}^{(k)}} - Q_{target} \right)^2 \qquad (9)$$

where $\hat{p}^{(k)}$ is the normalized particle.
In the TDA process, rewards are determined based on the normalized particles. A good particle state has a stronger predictive ability for the target reward. The main objective in this context is to identify the particle state that achieves the highest accuracy among a given set of particles, with the aim of minimizing the error in Equation (9). By minimizing the squared error, the algorithm seeks the particle state that most closely aligns with the desired target reward, which is then used in the next time step.
Particles are updated based on the likelihood of the immediate reward, which is determined by the error value above (Equation (10)).
Once the likelihoods $\ell_k$ are calculated, the normalized particle weights are found as follows:

$$w_k = \frac{\ell_k}{\sum_{m=1}^{K} \ell_m} \qquad (11)$$

where $K$ is the number of particles. Using these weights, the particles are then re-sampled with replacement, and the posterior distribution is updated for the next step.
Finally, the attention vector is reset with the normalized mean of the particle states, i.e., the attention weight of feature map $f_j$ is the normalized average of the $j$th elements of the re-sampled particles.
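The likelihood-weighting, re-sampling, and attention-reset steps can be sketched as follows; the exponential mapping from squared error to likelihood is an assumption on our part:

```python
import numpy as np

rng = np.random.default_rng(1)

def tda_step(particles, errors):
    """Hypothetical sketch of one TDA update: turn per-particle squared errors
    (Equation (9)) into likelihoods, normalize them into weights (Equation (11)),
    re-sample with replacement, and reset the attention vector as the normalized
    mean of the re-sampled particle states."""
    likelihood = np.exp(-errors)                      # lower error -> higher likelihood
    weights = likelihood / likelihood.sum()           # Equation (11)
    idx = rng.choice(len(particles), size=len(particles), p=weights, replace=True)
    resampled = particles[idx]                        # re-sampling with replacement
    attention = resampled.mean(axis=0)                # per-feature attention weights
    attention = attention / (attention.max() + 1e-8)  # normalize to [0, 1]
    return resampled, attention

# Example: 100 binary particles over 64 feature maps with random errors.
particles = rng.binomial(1, 0.5, size=(100, 64))
particles, attention = tda_step(particles, rng.random(100))
```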
4.3. Bottom-Up Attention Refinement (BAR)
BAR enhances the focus by considering the saliency values of the essential features. It aims to improve agent performance by increasing the attention weights of essential features that have lower saliency values (Equation (16)). While bottom-up methods can be useful in identifying salient features, they are not sufficient on their own for identifying task-related features, since they can miss crucial task-related information due to noise or complexity. Our attention refinement therefore improves the learning ability of the agent, allowing task-related information to be captured more completely.
Traditional saliency prediction methods rely on low-level features like color, contrast, and texture [47,48], but they struggle to capture the full range of factors influencing visual saliency maps [49]. BAR utilizes the saliency attentive model (SAM) by Cornia et al. [49], which uses a convolutional long short-term memory to iteratively enhance saliency predictions.
We quantify the relevance score of FDP with respect to specific features within the CNN using LRP [50]. These relevance scores are used to select the pixels related to a specific feature map, namely those whose relevance scores are greater than the average relevance score of the corresponding feature map.
BAR obtains the saliency value corresponding to a specific pixel from the saliency map generated by the SAM model [49].
To calculate the saliency value of a specific feature map from this pixel information, we first normalize the saliency map to the range [0, 1]. Then, BAR calculates the saliency value of a feature map denoted as $j$ by considering only the pixels whose relevance scores are higher than the average relevance of the corresponding feature map. Here, $P$ corresponds to this set of pixels, and the feature saliency value is taken as the average normalized saliency over the pixels in $P$.
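A small sketch of this per-feature-map saliency computation is given below; the min-max normalization and the averaging over the pixel set P are assumptions consistent with the description above:

```python
import numpy as np

def feature_saliency(saliency_map, relevance_map):
    """Hypothetical sketch of the per-feature-map saliency value: take the pixels
    whose LRP relevance exceeds the feature map's average relevance (the set P)
    and average their normalized saliency values."""
    s = (saliency_map - saliency_map.min()) / (np.ptp(saliency_map) + 1e-8)  # to [0, 1]
    mask = relevance_map > relevance_map.mean()       # pixels in P for this feature map
    if not mask.any():
        return 0.0
    return float(s[mask].mean())

# Example: an 84x84 saliency map and one feature map's pixel-level relevance scores.
value = feature_saliency(np.random.rand(84, 84), np.random.randn(84, 84))
```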
The refinement vector is obtained by considering the feature saliency values (Equation (16)): essential features, i.e., those whose attention weight exceeds the threshold, receive an increased refinement weight when their saliency value is low, where the threshold is defined as the average value of the attention vector.
The attention vector $A$ is refined by multiplying it element-wise with the refinement vector $R$, i.e., $\tilde{A} = A \odot R$. Then, each refined element is replicated to align with the dimensions of a feature map, ensuring that the same attentional value is applied uniformly across all spatial locations of that feature map. Finally, this process re-weights the set of feature maps $F$, amplifying them with the attentional weights as in Equation (17):

$$\tilde{F} = F \odot \left( \Uparrow \tilde{A} \right) \qquad (17)$$

Here, ⊙ corresponds to the Hadamard product, and ⇑ denotes upscaling the refined attention vector by replication.
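Putting the refinement and re-weighting together, a hypothetical BAR step might look as follows; the boost factor and the exact refinement rule are illustrative assumptions:

```python
import numpy as np

def bar_refine(feature_maps, attention, saliency_per_map, boost=1.5):
    """Hypothetical sketch of BAR: essential feature maps (attention above its mean)
    with low saliency get their weights boosted, and the refined attention vector is
    upscaled by replication and applied as a Hadamard product (Equation (17))."""
    threshold = attention.mean()                         # average attention value
    low_saliency = saliency_per_map < saliency_per_map.mean()
    refinement = np.where((attention > threshold) & low_saliency, boost, 1.0)
    refined = attention * refinement                     # element-wise refinement
    # Replicate each weight over the spatial dimensions of its feature map.
    upscaled = refined[:, None, None]
    return feature_maps * upscaled                       # re-weighted feature maps

# Example: 64 feature maps of size 7x7 with random attention and saliency values.
out = bar_refine(np.random.rand(64, 7, 7), np.random.rand(64), np.random.rand(64))
```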
4.4. Layer-Wise Relevance Propagation (LRP)-Based Transfer Learning
AADQN employs a transfer learning scheme to enhance the agent’s flexibility by reducing features and improving adaptation to noise and complexity. This scheme is based on LRP by Saraee et al. [50].
LRP propagates the output of the network backward through its CNN layers, assigning relevance scores to each neuron in each layer based on its contribution to the output, as illustrated in Figure 4, according to the LRP rule as in [51], which satisfies

$$R_i^{(l)} = \sum_{j} R_{i \leftarrow j}^{(l,\, l+1)}, \qquad \sum_{i} R_i^{(l)} = \sum_{j} R_j^{(l+1)} \qquad (18)$$

where $R_i^{(l)}$ is the relevance of neuron $i$ at layer $l$, and $R_j^{(l+1)}$ is the relevance of neuron $j$ at the next layer $l+1$. According to the LRP method, the relevance value of the feature map is redistributed in the lower layers, and $R_{i \leftarrow j}^{(l,\, l+1)}$ is defined as the share of $R_j^{(l+1)}$ assigned to neuron $i$ in the lower layer $l$. Back-propagation continues until relevance scores are extracted for all neurons, including the input layer.
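For intuition, relevance redistribution through a single fully connected layer under a basic LRP-0 style rule (an assumption here; the paper follows the rule in [51]) can be sketched as:

```python
import numpy as np

def lrp_linear(activations, weights, relevance_next, eps=1e-6):
    """Hypothetical LRP-0 style redistribution through one linear layer.
    activations: (I,) inputs of the layer; weights: (I, J); relevance_next: (J,)
    relevance of the upper layer. Each R_j is split among inputs i in proportion
    to their contribution a_i * w_ij, preserving total relevance (Equation (18))."""
    contributions = activations[:, None] * weights            # a_i * w_ij, shape (I, J)
    denom = contributions.sum(axis=0)
    denom = denom + eps * np.where(denom >= 0, 1.0, -1.0)     # stabilizer, keeps sign
    shares = contributions / denom                             # fraction of R_j owed to i
    return shares @ relevance_next                             # R_i at the lower layer

# Example: redistribute relevance from 4 upper neurons to 8 lower neurons.
R_lower = lrp_linear(np.random.rand(8), np.random.randn(8, 4), np.random.rand(4))
```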
As described in
Section 4.3 above, the relevance scores for the input layer are then used in BAR to highlight the FDP information in the input data that leads to the output of the feature maps.
When computing the final-layer feature maps of the CNN, multiple internal CNN layers are employed. The vast number of parameters within the mid-layers of the network enables the capture of intermediate features at various levels of abstraction [52]. By harnessing these intermediate features, especially through dual attention mechanisms, the model’s generalization capabilities can be enhanced [15]. To avoid unrecoverable information loss [53], AADQN freezes the unimportant neurons of the inner layers [53,54], which improves the model’s computational efficiency. Moreover, the model becomes more robust to noise and more flexible in complex environments.
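A hypothetical sketch of such freezing for a convolutional layer, using gradient masking over low-relevance channels (the quantile threshold is an illustrative assumption):

```python
import torch
import torch.nn as nn

def freeze_low_relevance_units(conv_layer, relevance_per_channel, quantile=0.2):
    """Hypothetical sketch of freezing unimportant mid-layer units: channels whose
    LRP relevance falls below a chosen quantile get their gradients zeroed, so their
    filters stay fixed during further training."""
    threshold = torch.quantile(relevance_per_channel, quantile)
    frozen = relevance_per_channel < threshold               # boolean mask over channels

    def mask_grad(grad):
        grad = grad.clone()
        grad[frozen] = 0.0                                   # no update for frozen filters
        return grad

    conv_layer.weight.register_hook(mask_grad)
    if conv_layer.bias is not None:
        conv_layer.bias.register_hook(lambda g: torch.where(frozen, torch.zeros_like(g), g))
    return frozen

# Example: freeze the least relevant filters of a 64-channel convolutional layer.
layer = nn.Conv2d(64, 64, kernel_size=3)
mask = freeze_low_relevance_units(layer, torch.rand(64))
```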
5. Experiment Results
In our work, we focus on the inefficiency and inflexibility problems of DRL game agents. In this context, we investigate how applying a visual attention mechanism affects the performance of the game agent. To evaluate the performance of our model, we address the research questions examined in the following subsections.
To compare the two algorithms, we implemented our approach on the Atari 2600 games in the OpenAI Gym environment [1]. From Atari 2600, we selected eight games (Pong, Wizard of Wor, SpaceInvaders, Breakout, Asterix, Seaquest, Beam Rider, Qbert) (Figure 5) with varying complexities and difficulty levels.
All experiments were run on a workstation with an Intel i7-8700 CPU and an Nvidia GTX-1080Ti GPU. The source code for the AADQN algorithm is available at the paper’s GitHub page,
https://github.com/celikcan-cglab/AADQN, accessed on 20 October 2023.
5.1. Comparison with Baseline DQN
In this section, we provide a comparative analysis with respect to DQN in terms of average game score and time steps. The time-step analysis considers the number of steps required to reach a certain average-score threshold.
Table 1 compares the average scores achieved by the AADQN and benchmark DQN agents on the eight tasks after the same final training time for each task, and Figure 6 illustrates the progress of the two agents in terms of achieved scores until the final time step.
In the two relatively simpler games, Pong and Breakout, we observed an increasing average score for both algorithms as the time steps increased. Furthermore, AADQN outperformed the DQN algorithm over the course of training: in the early time steps, the two algorithms performed similarly, but beyond a certain point AADQN started to outperform DQN. This is likely because learning is easier for both networks in such simple environments, so the game’s high-level task is acquired quickly.
In the other, more complex games, we observed better game scores for AADQN after a certain amount of game experience, and the difference between AADQN and DQN grows with increasing time steps of learning. For example, in the Wizard of Wor environment, DQN performs better than our approach during the initial time steps. This is because the particles in the AADQN algorithm were initially not yet different from random states. After that point, however, AADQN outperforms the DQN method for the remainder of the experiment. This clearly shows that once the particles are close to the desired state, which represents a target configuration of the feature maps, AADQN reaches the optimal policy earlier than DQN. We observed similar behavior in the other complex game environments (SpaceInvaders, Seaquest, Beam Rider, and Qbert).
The decrease in score performance of the baseline DQN method during learning is also noteworthy in these complex environments, whereas the AADQN score shows an increasing trend throughout the experiment. For example, in the Wizard of Wor game, DQN’s score actually decreased at several points during learning. These complex environments, including Wizard of Wor, can be non-stationary, meaning the optimal policy may change over time. DQN assumes a stationary environment, and if this assumption is violated, the algorithm may struggle to adapt, leading to a decrease in performance. Moreover, DQN faces a challenge in the exploration-exploitation trade-off. In some stages of training, the agent may prioritize exploration and try different actions to learn about the environment, which can lower the average score; if the exploration strategy is not well tuned, the agent may not explore enough to discover optimal policies.
Overall, these findings provide empirical evidence supporting the claim of faster learning in AADQN with respect to DQN.
5.2. Comparison with Other Algorithms
In this section, we extend the performance comparison of AADQN to other state-of-the-art algorithms in addition to DQN. The results are given in Table 2, where we report the final performance of each analyzed algorithm on the eight games under consideration at the same time step to ensure an equitable comparison.
The results reveal that there are specific environments where AADQN falls slightly behind the other state-of-the-art approaches. For instance, the asynchronous actor-learner architecture of A3C [22], which utilizes parallel agents, achieves a higher average score in the Wizard of Wor game. A3C is recognized for its ability to efficiently explore diverse actions and policies in intricate environments, making it valuable in the “Wizard of Wor” game. Similarly, the dueling network architecture (Dueling DQN) by Wang et al. [10], which separates value and advantage function estimation for Q-value computation, exhibits better performance in the Qbert game. This separation allows Dueling DQN to excel in situations where distinguishing between the value and advantage of different actions is critical, such as in the Qbert game. Nonetheless, in other games, AADQN outperforms these methods. Overall, we can conclude that AADQN exhibits relatively better performance in complex game environments.
5.3. Comparison of Learning Stability
Figure 7 demonstrates the fluctuation in the average game scores of the agent during the start-up and convergence phases for the eight games under consideration. For the sake of a clear comparison between DQN and AADQN, the two plots are overlaid in each sub-figure so that they are shown on the same scale but over different ranges (e.g., in the Asterix game, the AADQN scores lie in the range [1550–1610], whereas the DQN scores lie in the range [1150–1210]).
It is seen that the AADQN and DQN methods exhibit different fluctuation behaviors in the start-up and convergence phases. Significant fluctuations in the start-up phase for both algorithms indicate instability or inconsistent learning. In the convergence phase, AADQN’s average score stabilizes and fluctuates less than that of the DQN algorithm, indicating that the agent has converged to a relatively optimal policy and is consistently performing well. As shown in Figure 6, in the Pong game, AADQN stabilizes sooner than the baseline DQN. Similar stability behavior in the start-up and convergence phases is observed in all games, regardless of game complexity.
5.4. Algorithmic Efficiency
The time required for training DQN can vary significantly depending on the hardware. In this section, we therefore discuss the computational efficiency of AADQN in terms of total time steps in order to provide a fair comparison with the baseline. Our framework is based on DQN together with particle-filter-based top-down and saliency-based bottom-up attention, so the computational overhead on top of DQN comes from these attention mechanisms. The particles can be sampled in parallel, which adds only a single particle’s computation to the system, while the attention mechanism decreases the total time steps required to obtain the same average score by focusing on the significant feature maps. Similarly, the saliency maps can be extracted in parallel with the DQN forward pass, and the LRP relevance scores can be calculated in parallel with the DQN backward pass, requiring no additional time steps. The time-step comparisons of DQN and AADQN are given in Figure 6.
6. Conclusions
Motivated by the lack of efficiency and flexibility of standard DQN, this study proposed a new attention-augmented DQN model (AADQN), which integrates bottom-up and top-down attention mechanisms, surpasses DQN performance in fewer time steps, and demonstrates adaptability in intricate environments.
While the bottom-up method identifies basic lower-level features, by itself it may miss task-related information due to noise or complexity. The particle-filter-based top-down attention mechanism addresses this limitation and enhances DQN’s performance. Additionally, by utilizing LRP, unimportant neurons in the CNN were identified and frozen, leading to an improvement in DQN’s robustness.
Leveraging a combination of the particle filter and gradient descent influences the determination of the gradient direction in a way that accelerates training on a per-time-step basis and improves the average score by about 134.93% across the eight game environments. The particle-filter-based attention also reduces fluctuation and provides more reliable convergence, while alleviating the need for the extensive training data typically required by baseline DQN.
When comparing with other studies, it is worth noting that in certain complex games like ‘Wizard of Wor’ and ‘Qbert’, AADQN was outperformed by A3C and Dueling DQN algorithms, respectively. However, if we exclude these two games from our analysis, we can generally observe that AADQN performs relatively well in complex game environments.
In future research, it would be valuable to conduct more extensive comparisons with other methods, such as Gorila and Double DQN, and to provide a more comprehensive analysis involving other game environments of varying complexity. Future work could also explore the impact of clustering the extracted feature maps of a CNN to reduce the dimensionality of the attention vector. Such a clustering technique may enable game agents to attend to multiple areas simultaneously, thereby enhancing their ability to focus on several regions of interest at once. Furthermore, the inhibitory/negative signals of LRP could also be taken into account, which might further refine its functionality.