1. Introduction and Basic Definitions
Prisoner’s dilemma (PD) games involve two players, each with a binary action, typically denoted as cooperate (C) vs. defect (D). A usually symmetrical payoff matrix determines the reward of each player, depending on their combined action. Typically, payoffs are set so that it is most advantageous to defect if the other player cooperates, but the mutual gain is highest if both cooperate (defection is then the Nash equilibrium). PD games have been extensively studied in psychology, partly because they can lead to apparent discrepancies with classical probability theory [1,2,3,4]. In the pioneering study by [4], participants were put in the shoes of one of the players in a PD game and were presented with three kinds of trials: first, trials for which participants were told the other player would defect; second, trials for which participants were told the other player would cooperate; third, trials for which participants were not given information about the other player. Results indicated that $\text{Prob}(D_{\text{participant}}|\text{unknown})$ was outside the bounds of $\text{Prob}(D_{\text{participant}}|\text{known C})$ and $\text{Prob}(D_{\text{participant}}|\text{known D})$, thus violating the law of total probability. Such results are not irreconcilable with classical probability theory, but they do challenge the ubiquity of classical probability theory in cognitive theory [5,6,7,8].
In standard PD paradigms, there is a Nash equilibrium for each participant to defect; that is, neither participant can improve her position by unilaterally changing a D action. In this work, we do not consider such PD paradigms, but rather just two-player interactions based on a payoff matrix without a Nash equilibrium. We refer to such paradigms as PD variants. The surprising hypothesis we are interested in is whether there are PD variants for which choice statistics cannot be modelled with a four-way probability distribution (this statement will be qualified shortly). So, our paradigm reflects a minimal setup of interaction between two agents. While there is a vast literature on game theory, we avoid engaging with this literature so as to focus on our specific objective: are there simple situations of interaction between two agents, as just described, which might confound the straightforward expectation that behavior can be modelled with a four-way probability distribution?
Consider a PD variant, such that each of two players, Alice and Bob, has two binary questions; Alice's questions are a1, a2 and Bob's b1, b2, all having two possible outcomes ±1. A baseline classical expectation is that it is always possible to represent probabilities from such tasks as marginals from a four-way joint probability distribution. More conventionally, we expect that corresponding choice frequencies can be organized in a four-way table. So, our question is, are there PD variants for which participant behavior might be inconsistent with this expectation?
For a pair of binary questions x, y, with $\text{Prob}(x_i, y_j)$ being the probability of the outcome pair $x_i, y_j$ and $v(x_i)$ the value (±1) assigned to outcome $x_i$, expectation values are computed as $E(x,y) = \sum_{i,j} v(x_i)\, v(y_j)\, \text{Prob}(x_i, y_j)$, that is,

$E(x,y) = \text{Prob}(+,+) - \text{Prob}(+,-) - \text{Prob}(-,+) + \text{Prob}(-,-).$

The quantity of interest is then

$S = E(a_1,b_1) + E(a_1,b_2) + E(a_2,b_1) - E(a_2,b_2). \quad (1)$
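To make Equation (1) concrete, here is a minimal sketch in Python (our illustration, not part of the original studies) that computes expectation values and S from 2 × 2 tables of joint outcome probabilities:

```python
import numpy as np

def expectation(p):
    """E(x, y) from a 2x2 joint probability table p, indexed
    [x outcome, y outcome] with index 0 = '+1' and index 1 = '-1'."""
    v = np.array([+1, -1])
    return float(np.sum(np.outer(v, v) * p))

def chsh_s(p_a1b1, p_a1b2, p_a2b1, p_a2b2):
    """S = E(a1,b1) + E(a1,b2) + E(a2,b1) - E(a2,b2), as in Equation (1)."""
    return (expectation(p_a1b1) + expectation(p_a1b2)
            + expectation(p_a2b1) - expectation(p_a2b2))

# Perfectly tuned agents: every question pair fully correlated.
corr = np.array([[0.5, 0.0],
                 [0.0, 0.5]])
print(chsh_s(corr, corr, corr, corr))  # 2.0, exactly Bell's bound
```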
Consider three conditions when computing these expectations. First, locality means that Alice answers her questions without any information about what Bob is doing, and vice versa; operationally, Alice and Bob are separated in space and no communication between them is possible [9]. Second, free choice means that the question asked of Alice is determined independently from the one asked of Bob. Third, realism means that the outcomes to Alice and Bob's questions exist, whether Alice and Bob state them or not. One of the most significant results in theoretical physics is that, with locality, free choice, and realism, the maximum value of S is 2; this upper limit of S is called Bell's bound [10,11,12]. Let us take realism for granted, so henceforth we will focus on locality and free choice.
Note, locality and free choice are properties of the two systems producing the relevant statistics. So, in the example with Alice and Bob, locality means that the two agents are local relative to each other—there is no communication—so that Alice has no information about Bob when making her choices and vice versa. Likewise, free choice means that Alice's choices are not influenced by Bob's.
How can Bell's bound be broken? Consider Alice and Bob perfectly tuned to each other, so that $E(a_1,b_1)=1$, $E(a_1,b_2)=1$, and $E(a_2,b_1)=1$. Given this, if locality and free choice apply, then questions a2, b2 must correlate as well. This is because if a1, b1 perfectly correlate and a1, b2 perfectly correlate, then b1, b2 must perfectly correlate too. This, together with the fact that a2, b1 perfectly correlate with each other, leads to the conclusion that a2, b2 must perfectly correlate as well. But, if $E(a_2,b_2)=1$, then S = 1 + 1 + 1 − 1 = 2, which is the maximum value that S can take, with realism, locality, and free choice. Therefore, the only way we can break Bell's bound is via some kind of sensitivity to context. For example, Bob's answers are sensitive to the context created by Alice's questions.
To explain sensitivity to context, suppose that the b2 question depends on whether Alice considers a1 or a2. If Alice considers a1, then Bob responds to b2 in a way that the two questions correlate with each other, $E(a_1,b_2)=1$. However, if Alice considers the a2 question, then Bob responds to b2 in a way that the outcomes of the two questions anticorrelate, $E(a_2,b_2)=-1$. That is, there is no answer to the b2 question independently of what Alice does. If we accept the possibility of sensitivity to context, then we can easily see that the Bell bound can be exceeded, in that $S = 1 + 1 + 1 - (-1) = 4$. In this simple situation, sensitivity to context means that the original set of questions {a1, a2, b1, b2} is better understood as {a1, a2, b1, $b_2^{a_1}$, $b_2^{a_2}$}, where b2 has two different versions, depending on which question Alice responds to.
Cases when S > 2 reveal a correlation ‘stronger’ than classical correlation. For S > 2, it is not sufficient for pairs of questions to be answered perfectly in tune with each other (this would be a case of perfect, classical correlation). It is also required that responses are sensitive to the questions asked by the other agent. Thus, cases of S > 2 can be said to reflect supercorrelation (noting of course that correlation is a binary relation, whereas supercorrelation is a relation between answers amongst two sets of questions). As noted in the physics literature, the kind of correlation producing S = 4 is called a PR-box and refers to the strongest type of non-local correlation that is non-signaling, in the two-question, two-outcome scenario [13].
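As an illustration (ours, under the same conventions as the sketch above), the PR-box statistics can be written down directly: three question pairs perfectly correlate, the fourth perfectly anticorrelates, and all single-agent marginals remain 0.5, so there is no signaling:

```python
import numpy as np

corr = np.array([[0.5, 0.0], [0.0, 0.5]])  # perfect correlation, E = +1
anti = np.array([[0.0, 0.5], [0.5, 0.0]])  # perfect anticorrelation, E = -1

def expectation(p):
    v = np.array([+1, -1])
    return float(np.sum(np.outer(v, v) * p))

# a1b1, a1b2, a2b1 correlate; a2b2 anticorrelates (the PR-box pattern).
s = (expectation(corr) + expectation(corr)
     + expectation(corr) - expectation(anti))
print(s)  # 4.0, the maximum possible S
# Every row/column sum (each agent's marginal) is 0.5 in both tables,
# so neither agent's statistics reveal the other's question.
```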
Especially in physics, this discussion is complicated by various inter-related notions, such as signaling, disturbance, and communication. Signaling is a statistical notion informing us of whether the choice of measurement on one side affects the statistics on the other side. The idea is that Alice and Bob have some device generating statistics $p(a,b|x,y)$, where a, b indicate outcomes and x, y = 1, 2 are the measurement settings for Alice and Bob respectively. There is signaling if Alice is able to send a meaningful signal to Bob concerning what her setting, x = 1, 2, is. If signaling occurs, then Bob can infer Alice's measurement setting by looking at the statistics on his side, i.e., depending on whether his statistics differ across Alice's measurement settings: $p(b|x=1,y) \neq p(b|x=2,y)$. Let us note that, if Bob does not know the outcome of Alice's measurement, then we have to marginalise across the different possibilities for this outcome, writing, e.g., $p(b|x=1,y) = \sum_a p(a,b|x=1,y)$ when we are interested in x = 1. So, the signaling condition is that $\sum_a p(a,b|x=1,y) \neq \sum_a p(a,b|x=2,y)$, that is, as noted, that Bob can tell whether Alice measures x = 1 or x = 2 by looking at the statistics on his side (later on, in the Signaling section, we offer an equivalent way to compute signaling quantifiers).
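A hedged sketch of the signaling check just described (our own illustration; the array layout p[x, y, a, b] is an assumption of the example, not notation from the text):

```python
import numpy as np

def bob_marginal(p, x, y):
    """p(b | x, y) = sum_a p(a, b | x, y): Bob's statistics for settings x, y."""
    return p[x, y].sum(axis=0)

def signals_to_bob(p, y):
    """True if Bob's marginal for his setting y differs across Alice's
    settings, i.e., the signaling condition described above."""
    return not np.allclose(bob_marginal(p, 0, y), bob_marginal(p, 1, y))

# Example: PR-box statistics, p(a, b | x, y) = 0.5 whenever a XOR b = x AND y.
pr = np.zeros((2, 2, 2, 2))
for x in range(2):
    for y in range(2):
        for a in range(2):
            for b in range(2):
                if (a ^ b) == (x & y):
                    pr[x, y, a, b] = 0.5
print(signals_to_bob(pr, 0), signals_to_bob(pr, 1))  # False False: non-signaling
```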
When there is no signaling, another seminal result, Fine's theorem [14], shows that one condition for the existence of a (four-way) joint probability distribution for four binary random variables is

$E(a_1,b_1) + E(a_1,b_2) + E(a_2,b_1) - E(a_2,b_2) \le 2, \quad (2)$

which is called the Clauser, Horne, Shimony, and Holt (CHSH) inequality [12]. Note, there are four versions of the inequality, depending on which expectation is given a minus sign in Equation (2), and Fine's result states that the bound 2 for all four expressions is the sufficient condition for the existence of a joint probability distribution. When there is signaling, there is a corresponding generalized test of contextuality due to [15]; but see also [9]. Above we referred to sensitivity to context rather than contextuality. We will define sensitivity to context more precisely shortly and offer our rationale for why sensitivity to context is the more appropriate notion for the present work, as opposed to contextuality. Readers should note, however, that there is intense, ongoing debate on these issues.
Presently, what we are interested in is whether there is sensitivity to context, which can be defined as the non-existence of a joint probability distribution—informally, we can say that Alice changes her answer to her question, depending on the question that Bob has. When there is signaling, we can immediately conclude that there is sensitivity to context, regardless of whether $S \le 2$ or $S > 2$. However, sometimes we may want to test for sensitivity to context without considering signaling. For example, this might be because signaling is low and hence our estimate of signaling is not necessarily reliable (for an example in physics, see [16]). In such cases, when $S > 2$, we can conclude that there is sensitivity to context (this follows from the usual proof of the Bell inequalities, based only on the factorization property for conditional probabilities).
Here is the tricky point: Dzhafarov et al.'s [15] generalized test examines sensitivity to context when there is signaling (their expression can be seen as subtracting away the influence from signaling). But in the present case, the only interest is whether Alice employs the available information about what Bob does to demonstrate sensitivity to context (here and throughout as defined in the paragraph above), regardless of whether this is due to signaling or not. So Dzhafarov et al.'s [15] generalized test is not relevant here.
These distinctions are particularly relevant in psychology, since the only systems known to break Bell's bound are physical systems of microscopic particles, obeying the laws of quantum mechanics. By contrast, for macroscopic systems, it is generally (see shortly) accepted that violations of Bell's bound can be accounted for only by communication, disturbance, or some other equivalent mechanism between the two systems [9]. For example, demonstrably classical systems, such as containers with fluids at different levels, connected by tubes, allow the construction of variables which violate Bell's bound. But of course there is nothing peculiar going on and this is just a result of communication or influence between the systems (such examples have been known for a while, e.g., [17,18]). We can say that such systems demonstrate sensitivity to context. Note, there are subtleties to this discussion; for example, see [17,19], who described possible systems for which a measurement (decision) itself can bring about the dependence on context needed for S > 2. An additional subtlety is whether communication is assumed to lead to signaling or not. In [18,19] there is no signaling, but in [17] there is signaling (as [18] note, in general, communication can be taken to be an influence of some sort, but it does not always have to lead to signaling). These ideas are interesting, though we think they do not apply to the present results (this issue is briefly considered in the General Discussion).
2. Psychological Implications and Outline
Bell’s bound has an almost magical quality. Sensitivity to context means impossibility of describing the system in the usual way, via a four-way probability distribution with the marginal distributions representing the observed (conditional) statistics. But what exactly does this mean? Consider Table 1, wherein we assume that all marginal probabilities are 0.5. For the right-hand side, S = 4 and it can be shown that the corresponding probability information is not self-consistent (the same conjunction can be ‘shown’ to be both zero and non-zero, Appendix A). We think that, amongst experimental psychologists at least, it is a baseline expectation that probabilities can be organized in a table of this kind.
We are interested in how these ideas translate to two individuals playing a game corresponding to a Bell scenario (i.e., each individual has two binary questions). Of course, an interaction between two individuals is an extremely common decision situation. With the locality and free choice assumptions, in general it is impossible to break Bell’s bound [19,20,21]. For two agents, the only way Bell’s bound can be exceeded is if at least one of the free choice or locality assumptions is violated. For example, suppose we retain free choice and allow violations of locality. Then, Bob needs to adjust his answers depending on knowledge of which question Alice receives. So, the decision to stay local or not is ‘outsourced’ to Bob—in the experimental paradigm we employ, it is up to the participants (on a trial by trial basis) to decide whether to stay local or not. This is the essence of the paradigm we will shortly present.
So far, while there have been several studies concerning Bell’s bound in psychology, these studies have focused on the thought processes of individual participants. Specifically, there have been several examinations of sensitivity to context for the same participant answering all four questions, a1, a2, b1, b2 (for an early example see [22]). A prominent example concerns compositionality in conceptual combination: the issue of whether the constituent concepts combine in a way that their meaning independently determines the meaning of the composite concept. For example, in considering the novel conceptual combination ‘spring plant’, under a compositionality assumption we would look for some meaning from ‘spring’ and some from ‘plant’, independently combined together. A contrasting hypothesis is that a constituent in a conceptual combination acquires meaning contextually, depending on the other constituent. For example, in the case of boxer-bat, whether we consider a sporting or animal sense for ‘bat’ will impact how we interpret ‘boxer’ [23]. A number of theorists have employed the CHSH inequality or variants to conclude in favor of non-compositionality in conceptual combination [23,24], an issue of considerable significance concerning conceptual representation [25,26,27]. Similar ideas have been pursued in memory associations [28,29] and in decision making [24,30].
There has been no research exploring Bell’s ideas for interacting agents. Our purpose is to develop a paradigm based on a PD variant involving the interaction of a participant with a hypothetical counterpart. The payoff matrices can be set up in a way that optimal performance (relative to overall payoff) requires sensitivity to the counterpart’s choices in some cases, but not others. Allowing participants to choose whether to communicate or not with their counterpart on every trial, we can examine participants’ sensitivity to context and the capacity of different modeling approaches to capture behavior.
We propose two models of choice behavior, based on the models widely employed in physics for Bell paradigms. The classical model (specifically, a local hidden variables one) is based on an assumption of perfect coordination between the interacting agents, but without communication of the questions each agent receives on any trial. It allows for no sensitivity to context. The quantum model is also based on an assumption of perfect coordination between the agents, but, additionally, it allows sensitivity to context up to a certain degree (quantified by Tsirelson’s bound [31]). In physics, such quantum models are interesting, because they allow sensitivity to context, even though there is no obvious physical mechanism violating locality and free choice (and there is no signaling). In psychology, such a quantum model offers a particular hypothesis of the extent to which any communication between participants can translate to sensitivity to context.
Note that we could construct more elaborate classical models, in which the causal role of communication on the observed statistics is included, and such models could (in principle) be reconciled with the sort of paradigm we have outlined above. However, we think it is more interesting to explore a baseline classical model (perfect coordination, but no sensitivity to context) vs. the standard quantum model (perfect coordination and some sensitivity to context), to inform our understanding of the extent to which participants could employ their information resource. We think it is surprising and interesting that, when S > 2, as we shall see, a superficially reasonable classical model cannot offer a good description of behavior. Examining violations of Bell’s bound while allowing for interacting participants to break locality mimics attempts in physics to describe experimental statistics in Bell paradigms, by allowing violations of free choice and locality [32].
More generally, the use of quantum probability theory in cognitive modeling follows an assumption that, in some cases, quantum principles offer better descriptions of human behavior [33,34,35]. Quantum cognitive models have been explored for many kinds of cognitive processes, including decision making, categorization, similarity, perception, and memory. What is common amongst such diverse applications is a handful of characteristics which researchers have taken to be indicative of quantum-like processes. For example, sometimes behavior appears to be subject to interference effects, so that the law of total probability is violated—the PD games and analogous situations in [4] are good examples. In other cases, when participants are asked to make a decision, it appears that the underlying mental state changes. Social psychologists have been aware of such processes for a long time [36]. The added value from quantum models is that in quantum theory there is a specific requirement for how the state ought to change as a result of measurements (in behavior, decisions), and various researchers have taken advantage of these processes to build cognitive models (e.g., [37,38]). Of course, as outlined above, there have also been behavioral results indicative of sensitivity to context, for which the Bell framework and corresponding quantum models have been invoked to construct relevant theory (e.g., [23,24]). Quantum cognitive models have had good generative value, for example, in terms of anticipating biases from prior decisions [38] or a surprising constraint for question order effects [39].
As per our comments on Bell inequality violations above, in quantum cognitive models any quantum processes are epiphenomenal and are underwritten by an assumption of classical neurophysiology [40]. Moreover, there have been some compelling proposals of heuristic models mimicking quantum models [41]. So, why invoke the (unfamiliar) concepts of quantum theory at all? There are two reasons. First, it appears that in some behavioral cases quantum models can offer particularly simple explanations. Such cases tend to be ones for which behavior is sensitive to context (as in the present case) or there are conflicting biases for behavior, which appear to interfere with each other. Second, different quantum models generally employ the same set of principles and so have been used to identify commonalities between findings which, up to that point, had been considered separate [42]. So, even assuming that there is no ‘real’ quantum structure in the brain, and even if there are compelling mimicries between a specific quantum model and models based on other principles (as in [41]), we think there is explanatory value in considering such models.
4. Experiment 2
Experiment 1 showed that participants recognized that there would be different biases for action depending on whether their associate’s question was b1 or b2. In this experiment, we constructed payoff matrices so that the reduced matrix for, e.g., a1 would be the collapsed matrix across the a1b1 and a1b2 possibilities (Table 6).
Whereas previously there were only eight main trials (four question combinations in good and bad versions), for which participants were free to decide whether to check or not check, in this experiment we added eight trials where participants were forced to check and another eight trials where participants were forced to not check (e.g., on some trials they were told that they had to check on their associate). Recall that to use Equation (1), we need probabilities such as $\text{Prob}(+,+|a_1,b_1)$, which is computed by considering the number of times the participant denies when given question a1 together with his/her counterpart denying when given question b1. Trivially, $\text{Prob}(+,+|a_1,b_1) = \frac{n(+,+|a_1,b_1)}{n(a_1,b_1)}$, the relative frequency over the relevant trials. With this approach, we can compute S values for the entire sample, but it is difficult to do so for individual participants, because, e.g., a participant may have not checked in the case of the a1b1 good trial. With the additional trials in this experiment, all relevant probabilities, e.g., $\text{Prob}(+,+|a_1,b_1)$, can be computed within participants, and so S values can be computed within participants (which enables us to conduct some statistical tests). For the example of this probability, $\text{Prob}(+,+|a_1,b_1)$, for a particular participant there would be a maximum of two relevant trials and a minimum of one trial, depending on whether the participant decided to check when he/she had the option to do so.
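For concreteness, a small sketch of the within-participant probability estimates (the trial records are hypothetical, invented for illustration; ‘+’ stands for Deny, ‘−’ for Confess):

```python
def p_joint(trials, qa, qb, oa, ob):
    """Relative-frequency estimate of Prob(oa, ob | qa, qb) from a list of
    (alice_question, bob_question, alice_action, bob_action) records."""
    relevant = [t for t in trials if t[0] == qa and t[1] == qb]
    if not relevant:
        return None  # not estimable for this participant (e.g., never checked)
    hits = sum(1 for t in relevant if (t[2], t[3]) == (oa, ob))
    return hits / len(relevant)

# Hypothetical records for one participant (max two relevant trials per pair):
trials = [("a1", "b1", "+", "+"), ("a1", "b1", "+", "-"),
          ("a1", "b2", "+", "+"), ("a2", "b2", "-", "+")]
print(p_joint(trials, "a1", "b1", "+", "+"))  # 0.5
```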
We also included three questionnaires. First, we included the Toronto Empathy Questionnaire (TEQ, [44]), since the present task is one of guessing what a (hypothetical) associate is planning to do. The questionnaire asks participants to rate 16 questions on a five-point scale, ranging from Never (1), Rarely (2), Sometimes (3), Often (4), to Always (5). Items include, “Other people’s misfortunes do not disturb me a great deal” and “It upsets me to see someone being treated disrespectfully”. Second, we included the 17-item Cognitive Uncertainty (CU) subscale from the Uncertainty Response Scale [45]. The CU asks participants to state how well a series of statements describe them, including, “I like to plan ahead in detail rather than leaving things to chance” and “I like to know exactly what I’m going to do next”, on a four-point scale of Never (1), Sometimes (2), Often (3), and Always (4). This questionnaire assesses the possibility that checking behavior is driven by uncertainty aversion. Finally, we employed the Cognitive Reflection Task (CRT) to test for engagement and reflection with our PD tasks. However, the original CRT has been massively overused [46,47]. To reduce the likelihood that participants had encountered the original CRT in the past, we used three of the word problems presented in the appendices of [47]. Participants read each of the questions and were asked to provide an answer in a text box.
4.1. Participants
Participants were recruited using Prolific Academic and we restricted sampling to UK nationals only. They were paid £4.50 for their involvement. Sample size was set a priori to 100 participants, and we recruited 101 participants: 50 males, 50 females, and 1 participant who self-identified as ‘other’. Participants were between 18 and 78 years old (M = 32.13 years, SD = 12.54). Participants also reported their English fluency on a scale from 1 (extremely uncomfortable) to 5 (extremely comfortable), with the majority of participants reporting 5 (n = 95) and only a few others (n = 6) reporting 4 or lower. None of the participants in this experiment had taken part in Experiment 1.
4.2. Materials and Procedure
In Experiment 2, the payoff matrices were set up so that if the participant did not check, the reduced 2 × 1 payoff matrix would be identical across the two possible question combinations, e.g., a1b1, a1b2 (Table 6). Additionally, there were 24 trials in total: eight choice trials, where the participant could choose whether or not to check on their counterpart (as in Experiment 1), eight trials for which the participant was forced to check, and eight trials for which the participant was forced to not check.
4.3. Results
As expected, when participants did not check, choice proportions were nearly identical across matched pairs of question combinations (e.g., a1b1 good and a1b2 good, Table 7). Once again, we were interested in the extent to which participants checked on their counterpart when they were meant to, notably in the case of a2b1 and a2b2 trials. For this experiment, this analysis only examines the trials when participants could decide whether to check or not. We first confirmed that there was a difference in the overall proportion of trials when participants checked vs. did not check, χ² (1, n = 808) = 148.31, p < 0.001 (Table 3). Moreover, participants were more likely to check on a2b1 and a2b2 trials than on other ones (Table 4).
We next consider the individual differences measures. We computed d′, empathy (TEQ), aversion (CU), and engagement/reflection (CRT) scores, as well as S for each participant, using Equation (1) (focused on the trials when participants could choose whether to check or not). The d′ coefficient was calculated as $d' = z(H) - z(F)$, where H and F are the hit and false alarm rates, respectively, and the $z(\cdot)$ function converts raw proportions to z scores, by inverting the cumulative distribution function of the standard normal distribution (0, 1 mean and standard deviation) [48]. Hits are instances of checking when the participants were meant to be checking (on a2b1 and a2b2 trials) and false alarms instances of checking when there would be no need for participants to check (on a1b1 and a1b2 trials). Note, due to the small number of trials per participant, we had a large number of proportions of 0 or 1, which we corrected by adding 1 to the number of trials and 0.5 to the counts of hits and false alarms ([48], p. 144). Indeed, participants checked more on a2b1 and a2b2 trials (hits) than they did on the other trials (false alarms). This is evident from the mean d′ (M = 0.995, SD = 1.25) being above zero. All measures were then correlated with each other, without a multiple comparisons correction, as the intention was exploratory. There are two notable results. First, there was no relationship between individual participant S scores and d′, r = −0.135, p = 0.18. Second, there was a negative relationship between S and empathy, r = −0.23, p < 0.05. Higher values of S imply higher sensitivity to context, which in this case means that a participant is better at recognizing when he/she should reverse decisions, based on what his/her counterpart is doing. One possible explanation for this result is that participants higher in empathy try to second-guess their counterpart’s action, at the expense of considering the statistical properties of the game. There were no other significant results.
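The d′ computation, with the correction of [48], can be sketched as follows (our illustration; the example counts are hypothetical):

```python
from statistics import NormalDist

def d_prime(hits, n_signal, false_alarms, n_noise):
    """d' = z(H) - z(F), with 0.5 added to each count and 1 to each trial
    total ([48], p. 144), so rates of exactly 0 or 1 remain invertible."""
    h = (hits + 0.5) / (n_signal + 1)
    f = (false_alarms + 0.5) / (n_noise + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(h) - z(f)

# E.g., checking on 3 of 4 a2b1/a2b2 trials (hits) and on 1 of 4
# a1b1/a1b2 trials (false alarms):
print(d_prime(3, 4, 1, 4))  # ~1.05
```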
6. Hidden Variables Classical Model
According to this model, for each of the two agents there is a hidden variable $\vec{\lambda}$ describing each sub-system, such that $\vec{\lambda}_A = -\vec{\lambda}_B$, with $\vec{\lambda}$ uniformly distributed over a 3D sphere. Note, this is an expression of perfect anticorrelation of the hidden variables corresponding to the agents, as opposed to perfect correlation, but this difference is immaterial (this is illustrated for the quantum model in Appendix B, but the case is analogous for the classical model). So, the main assumptions of the model are as follows. First, if the same questions are asked, the participant will always perfectly coordinate in the same way with the counterpart, that is, either always correlate or always anticorrelate; assuming always-correlation, if the participant denies, it is assumed the counterpart will deny as well, etc. Second, there is a specific value for all question outcomes at all times. The implication of this more subtle assumption is that the participant should produce an outcome to her question, independently of which question is asked of her counterpart. In physics, this is the key realism assumption. Third, this model assumes locality and free choice. In the present experiments, we endow participants with a means of violating locality, so if they do this in a certain way, we expect the model to perform poorly. A final, minor assumption is that the participant will generally recognize the optimal action in each trial (corresponding to a lower sentence), and that she will always assume that her associate will also take the optimal action. This assumption is minor because of the way the payoff matrices were constructed, but if it is wrong, the model will just fail (both models will fail). In what follows, instead of a participant and her counterpart, we sometimes talk about two interacting agents, Alice and Bob.
The first agent is measured in two directions, a1, a2, and the second agent is measured in two different directions, b1, b2. In the present psychological context, ‘directions’ just correspond to the steer for action from each question, which is a function of the information in the payoff matrix and the agent’s interpretation of this information (which will depend on his/her personality etc.). Non-trivial algebra shows that (e.g., [10]; note, the assumption concerning the existence of the hidden variable $\vec{\lambda}$ impacts how these probabilities are derived):

$\text{Prob}(+,+|a,b) = \text{Prob}(-,-|a,b) = \frac{\theta_{ab}}{2\pi}, \quad \text{Prob}(+,-|a,b) = \text{Prob}(-,+|a,b) = \frac{1}{2} - \frac{\theta_{ab}}{2\pi}. \quad (3)$

The key parameter in Equation (3) is the angle $\theta_{ab}$, in radians, corresponding to the correlation between a measurement direction a for Alice and b for Bob. So, the joint probability for Alice and Bob to deny for question combination ab depends on the relation between how Alice perceives question a and Bob question b. Note that when $\theta_{ab} = 0$, there is an equal chance for Alice and Bob to anticorrelate in one way (plus, minus) vs. the opposite way (minus, plus), which is just an expression of the assumption $\vec{\lambda}_A = -\vec{\lambda}_B$ in the considered hidden variable model.
Since we have four pairs of measurement directions, a1b1, a1b2, a2b1, a2b2, there are four angles as the parameters of this model. But these parameters are not independent. In the original physics setup they are actual measurement directions—psychologically, there is a corresponding assumption regarding the extent to which the two agents align or not in their consideration of questions. Suppose we have co-planar measurement directions, without much loss of generality. Then, the Figure 1 arrangement is a plausible representation of the four directions. Without loss of generality, we set $\theta_{a_1} = 0$ and $\theta_{b_1}$, $\theta_{b_2}$, and $\theta_{a_2}$ as shown in Figure 1. Then, the four angles needed for the classical model are given as $\theta_{a_1b_1} = \theta_{b_1} \bmod \pi$, $\theta_{a_1b_2} = \theta_{b_2} \bmod \pi$, $\theta_{a_2b_1} = (\theta_{a_2} - \theta_{b_1}) \bmod \pi$, and $\theta_{a_2b_2} = (\theta_{a_2} - \theta_{b_2}) \bmod \pi$. The mod π function simply ensures that the angles for the four question pairs stay within the $[0, \pi]$ limit: it replaces any angle with the equivalent angle between the two directions in this range.
We next consider the S value given this classical model. $\text{Prob}(+,+|a_1,b_1)$ is the probability for both agents to +, when the questions are a1, b1, $\text{Prob}(+,-|a_1,b_1)$ the probability for Alice to + and Bob to −, etc. Each expectation value is given by $E(a,b) = \frac{2\theta_{ab}}{\pi} - 1$, where $\theta_{ab}$ is the angle between the measurement directions a, b. The overall result for the classical model is then:

$S = \frac{2}{\pi}\left(\theta_{a_1b_1} + \theta_{a_1b_2} + \theta_{a_2b_1} - \theta_{a_2b_2}\right) - 2.$

Note, we have mentioned that for this classical model S is bounded by 2. It can be shown that, with the Figure 1 arrangement of directions, the maximum of S is 2 and the minimum is −2, whatever the angle between b1, b2. Together these results deliver the classical limits for S.
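The classical model can be sketched as follows (our illustration; in particular, the mapping of angles to [0, π] is our reading of the mod π function above):

```python
import numpy as np

def angle_mod_pi(x):
    """Map any angle to the equivalent angle between two directions in [0, pi]."""
    x = abs(x) % (2 * np.pi)
    return min(x, 2 * np.pi - x)

def e_classical(theta):
    """Expectation value implied by Equation (3): E = 2*theta/pi - 1."""
    return 2 * theta / np.pi - 1

def s_classical(theta_b1, theta_b2, theta_a2):
    """S for the Figure 1 arrangement (theta_a1 = 0), three free parameters."""
    return (e_classical(angle_mod_pi(theta_b1))
            + e_classical(angle_mod_pi(theta_b2))
            + e_classical(angle_mod_pi(theta_a2 - theta_b1))
            - e_classical(angle_mod_pi(theta_a2 - theta_b2)))

# A coarse sweep over the three parameters never exceeds Bell's bound:
grid = np.linspace(0, np.pi, 25)
print(max(s_classical(b1, b2, a2)
          for b1 in grid for b2 in grid for a2 in grid))  # ~2.0
```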
7. Quantum Model
One of the most significant discoveries in the history of quantum theory has been the capacity of the theory to break the classical S = 2 bound, seemingly without violating either locality or free choice. In the present paradigm, the situation is less philosophically challenging, since we endow the two agents with a communication capacity to break locality. Since the statistics produced by the quantum model are equivalent to classical ones, but with a degree of violation of locality (or free choice; [32]), the quantum model is a reasonable option for the present paradigm. The assumptions of the quantum model are equivalent to those of the classical one, but for two differences. First, instead of the Bayesian probability rules, we employ the probability rules from quantum theory. Second, instead of a hidden variable capturing perfect coordination between the two agents, we have the quantum property of entanglement (see just below). However, this is not true (physical) quantum entanglement, but rather one of a more epiphenomenal flavor [40].
A column vector is denoted as $|x\rangle$, its conjugate transpose as $\langle x|$, and an inner product between two vectors as $\langle x|y\rangle$. Since we are concerned with two systems (agents), we need to employ tensor products to construct the joint state from the individual states, for example $|0\rangle \otimes |1\rangle$, which can be written for brevity as $|01\rangle$. We employ a qubit representation such that 0 means an intention for a ‘−’ (minus) action (Confess) and 1 a ‘+’ action (Deny). States are represented as superpositions of the basis states, e.g., $|\psi\rangle = c_{00}|00\rangle + c_{01}|01\rangle + c_{10}|10\rangle + c_{11}|11\rangle$. Measurements can change the state, so if on measuring $|\psi\rangle$ we obtain x, the new state becomes $|x\rangle$.
We start with the state $|\Phi_+\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, where the tensor structure is such that the first index corresponds to Alice and the second to Bob (the subscript ‘+’ in $\Phi_+$ simply indicates a ‘correlation’ state). So, $|00\rangle$ means that Alice is intending to minus and Bob to minus, etc. Note, in physics, the state used is typically the singlet state, which is an anticorrelation state, $\frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$. However, the predictions from $|\Phi_+\rangle$ are essentially identical but for a fixed rotation of the measurement directions; so, for the purposes of model fitting, this issue is irrelevant (in a way analogous to that for the classical model). The state $|\Phi_+\rangle$ is called entangled and is one of perfect coordination between the two agents, but now using the rules of quantum theory. The predictions from the quantum model are then (Appendix C):

$\text{Prob}(+,+|a,b) = \text{Prob}(-,-|a,b) = \frac{1}{2}\cos^2\left(\frac{\theta_{ab}}{2}\right), \quad \text{Prob}(+,-|a,b) = \text{Prob}(-,+|a,b) = \frac{1}{2}\sin^2\left(\frac{\theta_{ab}}{2}\right).$
As before, the crucial parameter is the angle $\theta_{ab}$ for each pair of measurement directions. The four angles are constrained as for the classical model (Figure 1), so that the quantum model also has three parameters.
We can consider the computation of the Bell bound from the quantum model. We have that the expectation values are given by $E(a,b) = \cos\theta_{ab}$, where $\theta_{ab}$ is the angle between the two measurement directions. Then,

$S = \cos\theta_{a_1b_1} + \cos\theta_{a_1b_2} + \cos\theta_{a_2b_1} - \cos\theta_{a_2b_2}.$

It can be immediately seen that if we set the angles for a1b1, a2b1, a1b2 to $\frac{\pi}{4}$, with the arrangement as in Figure 1, the a2b2 angle is $\frac{3\pi}{4}$. Then $S = 3\frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2} = 2\sqrt{2}$. In fact, though not obvious from the present discussion, a quantum model cannot produce S values greater than $2\sqrt{2}$, and $2\sqrt{2}$ is called Tsirelson’s bound [31].
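The quantum model can be checked directly with a small numerical sketch (ours): represent |Φ+⟩ as a vector, build observables for coplanar measurement directions from the Pauli matrices, and confirm that E(a, b) = cos θ_ab and that the Figure 1-style angles give 2√2:

```python
import numpy as np

sx = np.array([[0., 1.], [1., 0.]])   # Pauli X
sz = np.array([[1., 0.], [0., -1.]])  # Pauli Z

def direction(theta):
    """Observable for a measurement direction at angle theta in the x-z plane."""
    return np.cos(theta) * sz + np.sin(theta) * sx

phi_plus = np.array([1., 0., 0., 1.]) / np.sqrt(2)  # (|00> + |11>)/sqrt(2)

def e_quantum(theta_a, theta_b):
    """E(a, b) = <Phi+| A tensor B |Phi+>, which equals cos(theta_a - theta_b)."""
    op = np.kron(direction(theta_a), direction(theta_b))
    return float(phi_plus @ op @ phi_plus)

# a1 = 0, b1 = pi/4, a2 = pi/2, b2 = -pi/4, so the a2b2 angle is 3*pi/4:
a1, b1, a2, b2 = 0.0, np.pi / 4, np.pi / 2, -np.pi / 4
s = (e_quantum(a1, b1) + e_quantum(a1, b2)
     + e_quantum(a2, b1) - e_quantum(a2, b2))
print(s, 2 * np.sqrt(2))  # both ~2.828..., Tsirelson's bound
```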
10. Fit Results
Table 9 shows observed, classical predicted, and quantum predicted probabilities. Observe that for the a1b1, a1b2, and a2b1 pairs we recorded higher probabilities along the diagonals of the corresponding cells, but for the a2b2 pair the opposite is true. This is the essential signature of supercorrelation and sensitivity to context: participants responded differently to question a2 depending on whether their counterpart received question b1 (correlation) vs. b2 (anticorrelation).
We computed three S values: one for the observed choice probabilities, one for the predicted probabilities based on the classical model, and one for the predicted probabilities based on the quantum model. Note that, for Experiment 2, empirical S was computed on the basis of the trials for which participants could freely choose whether to check on their associate or not. For Experiment 1, the empirical S, best fit classical S, and best fit quantum S were, respectively, 3, 2, and 2.76. For Experiment 2, the corresponding values were 2.46, 2, and 2.65. Bootstrapped 95% confidence intervals for the empirical S values were [2.73, 3.23] for Experiment 1 and [2.23, 2.71] for Experiment 2. The confidence intervals were computed by first calculating individual S values for each participant (only choice trials were used in this computation). Means were then calculated for each of the 1000 bootstrap samples created (each bootstrapped sample was a random choice of N values from the original sample, with replacement, where N = number of values in the sample, i.e., the number of participants). Finally, the bootstrapped means were sorted and the 0.025 and 0.975 quantiles were utilized as the 95% confidence interval for each experiment. In all cases, the empirical data show S > 2, which demonstrates sensitivity to context and the impossibility of explaining the data with a four-way classical probability distribution. The classical model resulted in worse fits than the quantum one, with the latter producing S values closer to the observed ones. Note that while the quantum model is able to capture a certain kind of sensitivity to context, of course it cannot describe arbitrary behavior [31].
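The bootstrap procedure just described can be sketched as follows (our illustration; the per-participant S values below are simulated, not the experimental data):

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_ci_of_mean(s_values, n_boot=1000, alpha=0.05):
    """Percentile bootstrap CI for the mean of per-participant S values:
    resample N values with replacement, take the mean, repeat, take quantiles."""
    s_values = np.asarray(s_values)
    n = len(s_values)
    means = np.array([rng.choice(s_values, size=n, replace=True).mean()
                      for _ in range(n_boot)])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

# Simulated stand-in for 100 per-participant S values:
s_values = rng.normal(loc=2.9, scale=1.2, size=100)
print(bootstrap_ci_of_mean(s_values))  # e.g., roughly [2.7, 3.2]
```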
Using the forced checking and non-checking trials in Experiment 2, we computed S values for checking and non-checking trials for each participant. Note, in this case, it is only checking trials that should allow a violation of the bound—therefore, for non-checking trials, it must be the case that $S \le 2$. When participants were not checking on their associate, S for the good and bad trials respectively was 1.78 and 1.82; when checking, we observed 2.91 and 2.59, respectively. The difference in S between checking (averaged across good, bad matrices: 2.75) and non-checking trials (averaged across good, bad matrices: 1.80) was reliable, Z = −6.44, p < 0.001 (using the Wilcoxon Signed Rank Test, as the normality assumption would be suspect here).
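A paired, non-parametric comparison of this kind can be run as follows (our sketch; the per-participant values are simulated for illustration):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(2)
# Simulated per-participant S values for checking vs. non-checking trials:
s_checking = rng.normal(loc=2.75, scale=0.9, size=101)
s_not_checking = rng.normal(loc=1.80, scale=0.9, size=101)

# Wilcoxon signed-rank test on the paired differences (no normality assumed):
stat, p = wilcoxon(s_checking, s_not_checking)
print(stat, p)
```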
11. Signaling
We finally, briefly, consider the issue of signaling, for completeness. We can define a signaling quantity as:

$\Delta = |E(a_1|b_1) - E(a_1|b_2)| + |E(a_2|b_1) - E(a_2|b_2)| + |E(b_1|a_1) - E(b_1|a_2)| + |E(b_2|a_1) - E(b_2|a_2)|,$

where the expectation values are defined as expected, for example, $E(a_1|b_1) = \text{Prob}(a_1 = +|b_1) - \text{Prob}(a_1 = -|b_1)$. Note, the max value for $\Delta$ is 8, when communication in both directions is considered (this is relevant in evaluating the size of the observed $\Delta$ values). We review a point which may lead to confusion: the probabilities in Table 5 and Table 7 are not exactly the ones appearing in these expectation values. This is because, in Table 5 and Table 7, we counted probabilities separately for the Good and Bad matrices; i.e., the probabilities in Table 5 and Table 7 are, e.g., $\text{Prob}(+,+|a_1b_1, \text{Good})$. Therefore, as seen above too, we need to compute $\text{Prob}(+,+|a_1b_1)$, but recall $\text{Prob}(+,+|a_1b_1) = \text{Prob}(+,+|a_1b_1, \text{Good})\,\text{Prob}(\text{Good}) + \text{Prob}(+,+|a_1b_1, \text{Bad})\,\text{Prob}(\text{Bad})$. So, $\text{Prob}(+,+|a_1b_1) = \frac{1}{2}\,\text{Prob}(+,+|a_1b_1, \text{Good}) + \frac{1}{2}\,\text{Prob}(+,+|a_1b_1, \text{Bad})$, because in the present design $\text{Prob}(\text{Good}) = \text{Prob}(\text{Bad}) = \frac{1}{2}$ (meaning the probability of having a ‘good’ payoff matrix etc.; the same applies for all question combinations). The probabilities $\text{Prob}(+,+|a_1b_1, \text{Good})$ etc. are the ones in Table 5 and Table 7 and so, in computing the expectation values for $\Delta$, all probabilities from Table 5 and Table 7 need to be multiplied by a factor of ½ (the same applies to the calculations for the S values presented in Table 10).
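A sketch of the Δ computation (our illustration; each 2 × 2 table holds the already-marginalized Prob(·,·|ab) values, i.e., the Good/Bad average described above):

```python
import numpy as np

def e_alice(p):
    """E(a | b-question) = Prob(a=+) - Prob(a=-), marginalizing Bob's outcome."""
    pa = p.sum(axis=1)  # rows: Alice's outcome, index 0 = '+', 1 = '-'
    return float(pa[0] - pa[1])

def e_bob(p):
    pb = p.sum(axis=0)  # columns: Bob's outcome
    return float(pb[0] - pb[1])

def delta(p11, p12, p21, p22):
    """Signaling quantity: how much each agent's marginal expectation shifts
    with the other agent's question. Four terms of at most 2 each, so max 8."""
    return (abs(e_alice(p11) - e_alice(p12))    # a1, across b1 vs. b2
            + abs(e_alice(p21) - e_alice(p22))  # a2, across b1 vs. b2
            + abs(e_bob(p11) - e_bob(p21))      # b1, across a1 vs. a2
            + abs(e_bob(p12) - e_bob(p22)))     # b2, across a1 vs. a2

corr = np.array([[0.5, 0.0], [0.0, 0.5]])
anti = np.array([[0.0, 0.5], [0.5, 0.0]])
print(delta(corr, corr, corr, anti))  # 0.0: PR-box statistics are non-signaling
```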
We computed $\Delta$ separately for each experiment and for the checking vs. no checking trials. For Experiment 1, we observed sizeable $\Delta$ values for both the checking and the no checking trials. In Experiment 2, $\Delta$ was clearly larger for the checking than for the no checking trials. The Experiment 2 results are as expected, since there should be more signaling in the checking trials (ostensibly as a result of communication). In Experiment 1, even though for the no checking trials there was no communication, we still observed sizeable signaling. Signaling in Experiment 1 would be the result of the lack of balancing between the payoff matrices (as discussed in detail above). A consideration of signaling is clearly useful as a way to establish whether there might be unintended causal influences in the experimental statistics (as in Experiment 1). However, the non-zero $\Delta$ in Experiment 2 in the no checking trials indicates that signaling may be apparent even when there is no plausible corresponding mechanism, perhaps as a result of noise [16]. This does recommend caution when employing signaling in such experiments, especially when the N is small (as would be the case in behavioral experiments).
The calculation of the signaling quantifiers $\Delta$ allows us to test for contextuality in the sense of [15], which we do here for completeness. According to this work, contextuality is present whenever $S > 2 + \Delta$ (the S here refers to the maximum one amongst the four possible ways to compute it; here, we focused only on the version in Equation (1), which is most relevant to our experimental design). In Table 10, we offer a complete record of relevant S values for the checking/no checking quantifiers separately, for both experiments, as well as the quantities $S - 2 - \Delta$, whose positive values are, as it happens, indicative of contextuality.
12. General Discussion
Sensitivity to context is an important insight concerning the representation of information, whether in physics, data science, or psychology. Outside the physics of microscopic particles, it is assumed that there are no true quantum processes, and the study of sensitivity to context leads one to question the mechanism that supports it. In psychology, some pioneering work has been carried out in which both sets of questions, {a1, a2} and {b1, b2}, would be answered by the same participant, or in any case concern mental processes focused on the individual (e.g., [24,28]). Such approaches cannot be adapted to the interaction between separate agents because, in general, without communication (or without rigging the choice of the questions asked of each agent) there is no possibility of breaking Bell’s bound.
For the first time, in this study we developed an approach enabling the application of the Bell framework to the interaction of two cognitive (and so macroscopic) agents. We considered putative locality violations as an information resource that two interacting agents can employ at will (cf. [32]). We developed a simple empirical paradigm which embodied sensitivity to context in its structure, as a variant of a PD task [2]. Empirical results showed that participants were sensitive to this context and the empirical S values exceeded Bell’s bound. As noted, this is not surprising, given the structure of the payoff matrices we employed. The more surprising implication is that this sensitivity prevented fits by a simple classical model and therefore shows another way in which PD tasks and variants can produce results problematic for baseline expectations from classical probability theory. ‘Baseline’ is a key qualification here since, as noted above, a classical model incorporating communication could be developed to account for the present results. Therefore, the present situation is not unlike most so-called paradoxes in probabilistic inference, for which a baseline classical probability approach appears erroneous, but it is always possible to offer accommodating elaborations (e.g., faced with a result such as Prob(X&Y) > Prob(X), one could write Prob(X&Y|A) > Prob(X|B)).
Theoretically, we fitted two closely matched models, a classical and a quantum one. The latter produced superior fits. This conclusion adds to the body of evidence that quantum theory sometimes offers a good descriptive framework for behavior [33,34]. Elsewhere, we have suggested that this is because quantum theory looks like Bayesian inference, but in a local way [49]. That is, a set of questions for which it is impossible to have a complete joint probability distribution (e.g., because of resource limitations) is divided into subsets, such that within each subset—locally—we have Bayesian inference, but across subsets apparent classical errors arise. The idea that behavior is ‘locally rational’ has a precedent in psychology [50,51].
Note that the immediate availability of locality violations to the participants makes it unlikely that any results showing S > 2 would be due to ‘correlations of the second kind’, as discussed by S. Aerts and D. Aerts [19,21,52]. In Experiment 2, when participants would check on the hypothetical counterpart we observed S > 2, and when they would not, S < 2, showing that any apparent sensitivity to context was not brought about just by the measurements (decisions) themselves.
From the point of view of a physicist, the present results are interpreted as sensitivity to context, due to communication, regardless of whether this sensitivity to context is due to signaling or not. As noted, rather than considering signaling a nuisance influence, in this case we are interested in it as a possible way in which Alice makes use of the information she has about Bob's questions.
There have been several challenges in realizing this project. First, the notion of applying the Bell framework to the interaction of cognitive agents superficially goes against the grain of Bell’s work in physics. To address this problem, we had to formalize a notion of violations of locality or free choice as information resources, which can be adopted or not at will (our formal work on this topic is reported in [32]), as well as consider the distinction between context sensitivity and contextuality (for the latter see [15]). Second, adapting the classical and quantum models developed for systems of microscopic particles in physics to behavioral data required careful consideration of the underlying assumptions of the models and how they could be matched to behavioral situations. Third, the difference between contextuality and sensitivity to context and the restrictive (or not) role of signaling in Bell-type paradigms are highly contentious issues. We think the approach we chose is justified, but equally we have offered additional analyses which we hope will allow researchers of differing opinions to still appreciate the results. Finally, reporting the research was challenging: the primary audience for this work is cognitive scientists, but we also hope to interest physicists and mathematicians familiar with Bell who might be intrigued by applications outside physics. But the mathematics is likely to be unfamiliar and challenging to cognitive scientists, while the details of the behavioral paradigm are unfamiliar to physicists and mathematicians. Overall, interdisciplinary work of this kind, while conceptually exciting and potentially rewarding, is fraught with challenges—we can only hope that we have been at least partly successful in overcoming them.
The present analysis has practical potential. Consider two agents, Alice and Bob, for whom it is in their interest to supercorrelate, but who are not meant to break locality and free choice, e.g., they are not meant to communicate. Alice and Bob might be an employee in a tech firm and a stockbroker considering investment opportunities in that firm, respectively. The present framework could be employed to determine whether Alice and Bob benefit from supercorrelation, either on the basis of violations of locality (which may reveal illegal insider trading) or free choice (which could correspond to Alice and Bob independently being sensitive to market conditions which determine the ‘questions’ each one of them has to respond to, at a given time). Clearly, the applicability of such an analysis depends largely on how the questions for each agent are specified and whether there is an advantage in supercorrelation, which may not often be the case.
In closing, we hope that the present work will further encourage researchers to employ the notion of contextuality and the corresponding technical tools in the study of the interaction between multiple agents.