1. Introduction
Navigating uncertainty is crucial for adaptive decision-making across various real-life contexts. Uncertainty is inherently challenging because it involves a lack of clear information or ambiguous signals about future events’ probabilities [1], requiring strategies to be tested against unknown circumstances. Various cognitive models suggest that individuals gradually update their knowledge about their environment based on the history of choices and outcomes [2]. Sensitivity to environmental uncertainty is essential for appropriately weighting past and recent experiences, guiding both the exploration of new strategies and the exploitation of optimal options [3]. In real-life situations, such as managing economic volatility, an individual’s approach to uncertainty can shape the strategies of large-scale organizations, including countries (in the case of governments) or companies (in the case of managers) [
The exploration–exploitation dilemma has been used as an experimental framework to understand how people handle uncertainty, leveraging high-level cognitive functions such as metacognition and prospection [4]. Evidence indicates a connection between long-term thinking (operationalized as temporal discounting) and directed exploration, highlighting the long-term benefits of this strategy compared to inflexible decision-making [5]. In cognitive neuroscience, substantial evidence links long-term thinking with prefrontal cortex activity associated with executive processes [7]. Specifically, neuroimaging studies suggest that the dorsal anterior cingulate cortex (dACC) monitors decision uncertainty, whereas the lateral frontopolar cortex (lFPC) is involved in the metacognitive control of decision adjustments, supporting the notion of a prefrontal metacognitive monitoring process independent of the decision process itself [7]. In practical scenarios, such as managers dealing with economic volatility, cognitive control over impulsive behavior is traditionally linked to adaptive behavior [8]. However, recent evidence suggests that impulsivity can be beneficial depending on the available choices, task structure, and volatility [8]. In management, for instance, the negative impact of uncertainty can be mitigated by managers’ tolerance for ambiguity (usually taken as an uncertainty inductor) and flexibility in their decision strategies [
Uncertainty has been traditionally associated with negative states such as fear and anxiety. However, evidence also links uncertainty to positive emotions like surprise, interest, excitement, and enthusiasm, depending on individuals’ expectations about a task [10]. Mood plays a significant role in decision-making and is associated with emotional states elicited by expectations, rewards, and uncertainty [11]. Mood in decision-making can be understood as a subjective experience and physiological changes occurring during task performance [12]. More specifically, mood relates to prolonged experiences marked by valence, motivation, and arousal [13]; positive mood is linked to creativity, originality, and flexibility [12]. The induction of mood varies between experimental approaches, from using music or films to induce mood in participants [15] to the use of task incentives such as money [16]. Mood serves as a way to apprehend the participants’ emotional state when receiving rewards and losses. Within computational modeling, mood is operationalized as the cumulative effect of differences between reward outcomes and expectations [16]. It is thought to reflect the cumulative impact of several outcomes, as opposed to emotional reactions, which are typically related to a single stimulus [17]. In our experimental design, mood is defined as the expected value of the prediction error history (see details below in Materials and Methods). We did not measure the subjective and physiological components of the emotional reaction of mood; instead, we used a computational construct that allows us to capture the influence of the appraisal of an agent’s action. Based on the operationalization of Bennett et al. [18], we propose that the valence of an agent’s mood corresponds to their appraisals of the difference between the value of an action and the value of the environmental state within which the action was taken. In this sense, the computational construct captures the moving average of surprise, that is, the history of how much the expectations and the actual rewards differ.
Emotions can shape how people learn from environmental cues by influencing cognitive processes such as confidence and metacognition. For instance, negative emotions can make people more prone to risk-seeking behavior [21], opening the door to flexible and even creative solutions [10]. Recent research has focused on the influence of confidence on learning under uncertainty and how mood influences these executive processes [11]. Confidence, defined as the belief in the correctness of a decision, is associated with higher decision accuracy in learning tasks [22]. Confidence is typically measured by asking participants to judge their own decisions and then rate their confidence on a scale. Confidence ratings can reflect, together with the probability that a decision is correct, factors such as estimates of noise and evidence magnitude, among others. High confidence can be related to confirmation bias by affecting the post-decision integration of confirmatory and disconfirmatory evidence [23]. Furthermore, confidence involves prefrontal brain activity, which influences learning rates and social interactions, among others [22]. The relationship between mood and confidence has yet to be well explored, with some studies indicating that self-reported mood does not necessarily align with confidence, as the two evolve on different timescales [
Using a learning task (see Rollwage et al. [23], Doll et al. [24], Palminteri et al. [25]), we aimed to explore the relationship between confidence, mood, learning under uncertainty, and exploration–exploitation behavior. In our task, participants had to learn under uncertainty and under different reward states that induced different moods. We hypothesize that mood influences confidence, enabling higher decision flexibility (i.e., accurate switching between exploration and exploitation). This higher decision flexibility will lead to better task performance (higher accuracy). Our results show that, indeed, confidence modulates exploration/exploitation and learning rates, while mood modulates reward perception and confidence. These results suggest that the context of a decision can influence the strategies people use when facing uncertainty by modulating mood, which, in turn, affects their sensitivity to environmental changes and overall task performance. Specifically, we observed that a negative mood reduced confidence, leading to increased exploratory behavior. Conversely, a positive mood enhanced confidence, promoting a focus on exploitation and improving task performance. Taken together, our results suggest that metacognition involves a flexible balance between exploration and exploitation, integrating mood with high-level cognitive processes.
3. Behavioral Modeling
This section describes the behavioral modeling, centered on the Confidence Prediction-Error Momentum (CPEM) model. The model captures the interplay between expected values, rewards, and mood in decision-making by integrating parameters such as the weight of confidence on the learning rate, choice stochasticity, and the mood-biased prediction error (the parameters are defined and operationalized below and listed in Table 1). The result is a computational framework for understanding how cognitive and emotional states influence behavioral outcomes.
The CPEM model quantifies the latent values assigned to each action (a) and state (s) on a trial-by-trial basis. Central to the model is the prediction error, which captures the divergence between expected outcomes and actual rewards. An individual’s mood modifies this error, biasing reward perception and subsequent decisions. The model updates the expected values and mood after each outcome by applying nonlinear functions and learning rates. This iterative learning process refines the behavioral predictions under varying conditions of uncertainty and reward-state contexts. The remainder of this section describes the model and the theoretical grounds that make it useful for psychological and behavioral research.
We fitted reinforcement learning models to the behavioral data. The model space includes a standard Rescorla–Wagner model or Q-learning [30], from now on named RW; a modified version of the RW model that adds mood as a measure of the recent history of prediction errors (named PEM) [31]; and five modified versions of the PEM model that add confidence monitoring of behavior and the influence of mood on that monitoring (named CPEM). The next sections explain the details of each model.
3.1. Basic Model-Free RL
We started with a basic, model-free reinforcement learning algorithm [30] (RW model). Trial by trial, for each pair of stimuli (for instance, A and B), the model estimated the expected value based on the individual's outcome history and used this expected value ($Q$) to make a decision (for instance, taking an action $a$ for A instead of B in state $s$). The expected values were zero before learning, and after each trial $t$, the estimated value of the chosen option was updated as a function of the prediction error, scaled by the learning rate. The updating follows the delta rule [

$$Q_{t+1}(s, a) = Q_t(s, a) + \alpha\,\delta_t \tag{1}$$

and we keep the expected value for the non-chosen option, where $\neg a$ represents the absence of the action $a$, in other words, the non-chosen option:

$$Q_{t+1}(s, \neg a) = Q_t(s, \neg a) \tag{2}$$

Here, $\alpha$ is the learning rate (the size of the update) and $\delta_t$ is the prediction error, defined as the difference between the actual reward ($r_t$) and the expected value ($Q_t(s, a)$). Prediction errors update the value of the chosen option in Equation (1) and are absent for the non-chosen option (Equation (2)), which provides no information to update.
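The update described above can be sketched in Python. This is a minimal illustration rather than the authors' code: the names `rw_update`, `q_values`, and `alpha`, the learning rate of 0.3, and the sequence of choices and rewards are all assumptions chosen for the example.

```python
def rw_update(q_values, action, reward, alpha):
    """Apply one delta-rule update to the chosen option only (hypothetical helper)."""
    delta = reward - q_values[action]        # prediction error for the chosen option
    updated = dict(q_values)                 # copy: the non-chosen option keeps its value
    updated[action] = q_values[action] + alpha * delta
    return updated, delta


# Expected values start at zero before learning.
q = {"A": 0.0, "B": 0.0}

# Illustrative (choice, reward) history; not data from the study.
for choice, reward in [("A", 1.0), ("A", 1.0), ("B", 0.0), ("A", 0.0)]:
    q, delta = rw_update(q, choice, reward, alpha=0.3)
    print(choice, reward, round(delta, 3), q)
```

Only the chosen option changes on each trial, which mirrors the distinction between the update for the chosen option and the carried-over value for the non-chosen one.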
3.2. Mood Effects on Valuation
To include the effect of mood in the valuation of rewards, Eldar and Niv [31] use the perceived rewards, instead of the actual rewards, in the calculation of the prediction error:

$$\delta_t = r_t\, f^{\,m_t} - Q_t(s, a)$$

where $r_t$ is the actual reward and $Q_t(s, a)$ is the expected value of the chosen option. The factor $f^{\,m_t}$ represents the biasing effect of mood on rewards; thus, $m_t$ indicates a positive mood ($0 < m_t < 1$) or a negative mood ($-1 < m_t < 0$), and $f$ is a constant parameter that sets the direction and extent of the mood bias, restricted to a bounded interval in this study. If $f > 1$, mood exerts positive feedback, as reward is perceived as larger in a good mood and as smaller in a bad mood. Conversely, $0 < f < 1$ corresponds to negative feedback, as reward is perceived as smaller under a positive mood and as larger under a negative mood. If $f = 1$, mood does not bias the perception of rewards [31]. Eldar and Niv [31] modeled the effects of unexpected outcomes on mood, assuming that mood represents a recent history of prediction errors ($h_t$) independent of state or context (Prediction Error Mood Model, from now on named PEM), restricted to the interval $(-1, 1)$ by means of a hyperbolic tangent function:

$$h_{t+1} = h_t + \eta_h\,(\delta_t - h_t), \qquad m_t = \tanh(h_t)$$

Here, $\eta_h$ is a constant learning rate (mood update), restricted to the interval $[0, 1]$. Following the RW model, the prediction error $\delta_t$ updates the expected values after each trial with a learning rate of $\alpha$.
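The mood mechanism described in this subsection can be sketched in Python. This is a hedged illustration under stated assumptions, not the fitted model: the multiplicative bias `f ** mood` follows the formulation attributed here to Eldar and Niv [31], while the names `pem_step`, `h`, `eta_h`, and `f` and every numeric value are hypothetical choices for the example.

```python
import math


def pem_step(q, h, reward, alpha, eta_h, f):
    """One trial of a mood-biased value update (sketch; names are assumptions)."""
    mood = math.tanh(h)                  # mood is bounded in (-1, 1)
    perceived = reward * f ** mood       # perceived reward, biased by mood
    delta = perceived - q                # prediction error on the perceived reward
    q_new = q + alpha * delta            # delta-rule update of the expected value
    h_new = h + eta_h * (delta - h)      # running average of recent prediction errors
    return q_new, h_new, mood, delta


# Illustrative run: three rewarded trials followed by two unrewarded ones.
q, h = 0.0, 0.0
for reward in [1.0, 1.0, 1.0, 0.0, 0.0]:
    q, h, mood, delta = pem_step(q, h, reward, alpha=0.3, eta_h=0.2, f=1.5)
    print(f"r={reward:.0f} mood={mood:+.3f} delta={delta:+.3f} q={q:.3f}")
```

With `f` set to 1 the perceived reward equals the actual reward, so the sketch reduces to the plain delta-rule update; values above 1 inflate rewards under positive mood and shrink them under negative mood.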
3.3. Confidence Monitoring
In all CPEM variants, we estimated the expected confidence value trial by trial, adding predictors used in previous studies with two-alternative forced-choice (2AFC) learning tasks [35]. For instance, Lebreton et al. [33] and Somatori and Kunisato [34] modeled confidence by adding as a regressor the difference in the evidence for each option at decision time, which reflects the difficulty of distinguishing the best choice. In an RL model, this can be operationalized as the absolute value of the difference between the expected values of a contingent pair of options. Confidence values are then updated from the prediction errors with a delta rule [35], using a confidence learning rate restricted to a bounded interval.
3.4. Meta-Learning Modulation
The estimated confidence was used to modulate the free parameters of the reinforcement learning models. This was carried out after each outcome, providing information about the precision of the model's value estimations and behavioral strategies. The basic reinforcement learning models have constant parameters: the learning rate and choice stochasticity (which has been linked to exploration and exploitation; for more details, see the subsections “Confidence’s Modulation of Choice Stochasticity” and “Choice Probability” below). Constant parameters limit the capacity to optimize the behavioral policy (the strategy) at the end of the learning blocks, once subjects believe that they have a reasonably good estimate of the contingencies. At this point, the prediction errors should be weighted less and the choices should shift toward a more deterministic exploitation of the learned contingencies [38]. Conversely, when contingencies change (at the beginning of the reversal phase), prediction errors should weigh more and choices should be more exploratory. One way to optimize behavior is to subordinate the reward-learning parameters to a higher control level that monitors performance. Thus, the CPEM models included a metacognitive level that updated confidence to regulate contingency learning and choice stochasticity.
3.5. Confidence’s Modulation of Learning Rate
Confidence can modulate the learning rate. Lower confidence implies higher learning rates because knowledge of the environment is lower [40]. Higher confidence has the opposite effect: learning rates should be lower, since a decision-maker who trusts their decisions has less need to update their value estimations. Learning rates were modulated by confidence and by asymmetrically inducing confirmation bias, that is, a bias that applies when the expected value of the option and the valence of the observed reward match (positive bias) [
Instead of estimating two different learning rates, which would increase the model complexity, we defined a constant that takes one value if the outcome is confirmatory and another otherwise. This constant increases the modulating effect of the weight of confidence on the dynamic learning rate, inducing asymmetry (Equation (12)). The baseline learning rate is the value of the learning rate when the effect of the estimated confidence is absent. The learning rate is dynamic in models CPEM, CPEM1, and CPEM3, and constant in models RW, PEM, CPEM0, and CPEM2.
The selection of this constant follows three criteria. First, the empirical asymmetry in learning rates reported in previous studies suggests that one learning rate is close to 30% of the other [43], which is attainable with values of the constant of approximately 4. Second, we conducted sensitivity analyses testing different values of the confirmatory constant for the CPEM model. Third, we considered the alternative of making the constant an additional free parameter, which increases model complexity (see Figure 3B for the model comparison, and Supplementary Materials, Figure 1, for an extensive comparison).
3.6. Confidence’s Modulation of Choice Stochasticity
Similarly, confidence can modulate global behavioral variables such as choice stochasticity, which refers to how deterministic or random the decision strategies are. There is evidence that uncertainty influences exploration, since lower confidence in the estimated values can signal a shift toward a more exploratory strategy [44]. Dynamic choice stochasticity, that is, the randomness of decisions throughout the task, was modulated so that state-dependent exploration on the next trial was reduced when the current estimated confidence increased (Equation (13)). The equation involves an upper bound for the baseline choice stochasticity (the value of choice stochasticity when the effect of confidence is absent) and the weight of confidence on choice stochasticity.
3.7. Mood’s Modulation of Confidence Update
In all CPEM model variants, the updating of the estimated confidence was controlled by a delta rule and the confidence learning rate. However, mood modulation of this update was included only in the saturated CPEM model (Equation (14)), which combines the confidence learning rate without the mood effect with a mood-dependent term rescaled to a bounded range with a sigmoidal function. For models CPEM0 to CPEM3, the confidence learning rate is constant.
3.8. Choice Probability
The probability of a correct choice was estimated from the difference between the expected values of the options, passed through a sigmoidal function. The choice stochasticity coefficient in this function is dynamic in models CPEM, CPEM1, and CPEM2, and constant in models RW, PEM, CPEM0, and CPEM3.
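As an illustration, a sigmoidal choice rule over the expected-value difference can be sketched in Python. The parameterization is an assumption: here `beta` acts as a temperature that divides the value difference, so larger values give more random choices. The names `choice_probability` and `beta` and the example values are hypothetical and are not taken from the fitted models.

```python
import math


def choice_probability(q_option, q_other, beta):
    """Logistic probability of choosing the first option (sketch; beta is assumed > 0)."""
    return 1.0 / (1.0 + math.exp(-(q_option - q_other) / beta))


# The same value difference yields more deterministic choices as beta shrinks.
for beta in (0.1, 0.5, 2.0):
    p = choice_probability(0.7, 0.3, beta)
    print(f"beta={beta}: p={p:.3f}")
```

In a model with dynamic choice stochasticity, `beta` would be recomputed on every trial from the current confidence estimate; in the constant variants it would stay fixed for each subject.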
3.9. CPEM Model Space
The construction of the CPEM models builds on the PEM model, adding a layer of complexity by incorporating confidence and its interaction with mood, value updating, and choice stochasticity. The various specifications of the CPEM models account for different computational hypotheses regarding the modulation and dynamics of value updating (learning rate) and choice stochasticity by mood and confidence (see Figure 3A).
The CPEM model (the best model) treats the value update and choice stochasticity as dynamic: both are updated trial by trial by confidence (see Figure 3A). The baseline learning rate and the baseline choice stochasticity are idiosyncratic (non-dynamic) parameters for each subject (see Equations (12) and (13), respectively). Additionally, the model considers that confidence updating is dynamically modulated by mood and by the idiosyncratic confidence learning rate of each subject (see Figure 3A and Equation (14)).
The alternative CPEM models (CPEM0, CPEM1, CPEM2, and CPEM3) start from the CPEM model but omit some of its dynamic modulations (see Figure 3A). The CPEM0 model omits the dynamic modulation by confidence of both the value update and choice stochasticity, reducing the modulation to the idiosyncratic parameters. Thus, Equations (12) and (13) simplify to constant parameters. In the same way, the absence of the dynamic modulation of the confidence update by mood simplifies Equation (14) to a constant confidence learning rate. The CPEM1 model omits only the dynamic modulation of the confidence update by mood, which likewise simplifies Equation (14). The CPEM2 and CPEM3 models omit the dynamic modulation by confidence of the value update (CPEM2) and of choice stochasticity (CPEM3), respectively. Both also omit the dynamic modulation of the confidence update by mood, again simplifying Equation (14).
6. Discussion
In this study, we hypothesized that mood influences confidence, enabling higher decision flexibility for accurate switching between exploration and exploitation decision strategies under uncertainty. Our results suggest that reward states influence mood estimations as expected. We observed that the reward states substantially influenced accuracy, confidence, and the updating process of choice values. This suggests that contexts with higher rewards facilitate learning under uncertainty. The difference in the learning rates is associated with higher confidence rates, which influence exploitation during the reversal phase. Moreover, during the low-reward state, subjects updated the choice values more frequently, being more sensitive to the trial-by-trial nature of the task. This suggests that mood might influence the flexibility to explore new strategies and exploit new learning.
Recently, an interesting distinction has been proposed between directed exploration and random exploration, nuancing the dichotomy between exploiting and exploring alternatives when deciding. Directed exploration refers to establishing a goal and is driven by information (or identification of patterns), while random exploration is driven by chance [45]. Evidence has shown a relationship between long-term thinking (operationalized as temporal discounting) and directed exploration, associated with the long-term benefit of this strategy compared to exploiting known alternatives [5]. In line with this idea, mood might influence how participants read environmental cues in critical moments when exploring new strategies and facilitate the metacognitive integration of signals for “closing” the exploration time to benefit from what was recently learned. This is also related to the metacognitive effect observed between the different reward states, in that metacognition was high during the high-reward context, in which confidence had a markedly stronger influence on exploitation. There is evidence showing drops in confidence following errors in a decision task (even without external feedback) [48], and confidence, in turn, has been related to explorative strategies [49]. For instance, in a perceptual manipulation study, Desender et al. [49] found that participants were more prone to seeking additional information when they were less confident in their decisions. The opposite relation has also been established: boosting confidence makes subjects less prone to seeking alternative information in search tasks [50]. Notably, there is evidence relating negative mood to lower metacognition even when this has no impact on decision accuracy [
The differences between the confidence rates in our experiment are consistent with the previous evidence. Participants were more confident in the high-reward state, which influenced the exploitation of the choice with higher expected value. Although overconfidence can be detrimental in specific decision contexts, in this case, higher confidence rates positively influenced the accuracy of the strategy during the reversal phase. The reversal phase presented participants with a sudden change in their decision environment through the two new reward-state contexts and the inversion of the reward probability. This new environment, however, was again stable until the end of the task. In this sense, the cognitive challenge involved correctly reading the change, but also the stability afterward. In decision terms, this involves correct exploration of new alternatives and the exploitation of what was learned. This is consistent with the relation between higher confidence and less exploration previously mentioned, and with the influence of confidence on post-decision processing, biasing the weight of environmental cues over the expected results [23]. We can hypothesize that positive mood, in our case in the form of a higher expected value of the prediction error history of the task, modulates how the changes in the decision environment elicit metacognitive processing, which changes the updating frequency of present values, opening the door to exploration, which quickly turns into a newly defined strategy that is then exploited as confidence grows.
Our results, as we previously mentioned, collectively highlight the complex interplay between reward conditions, mood, and cognitive processes in shaping human decision-making. They suggest that different reward states as mood inductors influence learning under uncertainty. This influence occurs through integration of the history of rewards and the reading of the present environment. Mood, taken as a computational construct in our study, modulates the weight of rewards in behavioral planning, making them more salient in critical moments. Metacognition allows the detection of these critical moments, which might be the key to the correct adaptation to environmental changes in uncertain contexts.
As highlighted by one of the anonymous referees, it is important to acknowledge that these results face limitations associated with the lack of physiological measures of emotional states and a small sample of subjects with no variability in sociodemographic characteristics. Incorporating physiological and subjective measures of mood could provide more definitive evidence regarding the role of emotions in decision-making under uncertainty. These additions may also shed light on individual differences or temporal factors influencing the variables under study. For instance, such measures could help explain why certain participants appear more sensitive to trial-by-trial or historical conditions within the task while also offering deeper insights into other factors, such as attention or fatigue, that may affect decision-making processes.
Moreover, environmental changes necessitate distinct cognitive dynamics for adapting decision strategies, potentially influenced by personality traits. Future research should address these gaps by including additional measures of emotional states, assessments of personality traits, and clinical samples, as well as employing more extensive and more diverse participant groups. Such steps would enhance the precision and applicability of the proposed cognitive models, enabling a more comprehensive understanding of the mechanisms underlying adaptive decision-making in uncertain contexts and yielding more robust and generalizable results.