There are subtleties involved. What exactly is meant by “correlation”? Many experimenters proudly exhibited a negative cosine, but the amplitude of that curve is crucial. The question is: can classical computers generate a full-amplitude negative cosine, where the “correlation” is the average of the product of the ±1-valued outcomes? No post-selection or renormalisation is allowed.
We give a complete probabilistic proof of this impossibility theorem using basic results from Fourier analysis. No knowledge of quantum mechanics is needed.
Bell’s Theorem as a Theorem of Distributed Computing
We would like to paraphrase what is now called Bell’s theorem as the metaphysical statement that quantum mechanics (QM) is incompatible with local realism (LR). More precisely, and following B. Tsirelson’s wonderful Citizendium.org article [9], Bell’s theorem states that conventional quantum mechanics is a mathematical structure incompatible with the conjunction of three mathematical properties:
relativistic local causality (commonly abbreviated to “locality”),
counterfactual definiteness (“realism”) and
no-conspiracy (“freedom”).
Some readers have objected to various parts of this terminology; discussion about it continues to rage in the philosophy of science. By conventional quantum mechanics, we mean quantum mechanics including the Born rule, but with a minimum of further interpretational baggage. Whether the physicist likes to think of probabilities in a Bayesian or in a frequentist sense is up to them. In Many Worlds interpretations (and some other approaches), the Born rule is argued to follow from the deterministic (unitary evolution) part of the theory. Nevertheless, everyone agrees that it is there.
Bell himself sometimes used the phrase “my theorem” to refer to his inequalities: first his (Bell, 1964) three correlations inequality [1], and later what is now called the Bell-CHSH (Clauser, Horne, Shimony, Holt 1969) four correlations inequality [10]. We would rather see those inequalities as simple probabilistic lemmas used in two alternative, conventional, proofs of the same theorem (the incompatibility of QM and LR). Actually, the original proof involves four correlations too, since it also builds on the perfect anti-correlation in the singlet correlations when equal settings are used. Moreover, reference [1] also has a small section in which Bell shows that his conclusions still hold if all those correlations are only approximately true.
The CHSH inequality is, indeed, a very simple result in elementary probability theory. It is a direct consequence of the Boole inequality stating that the probability of the disjunction (the logical “or”) of several propositions is bounded by the sum of their individual probabilities. Something a whole lot stronger is Fine’s (1982) theorem [11]: the satisfaction of all eight one-sided CHSH inequalities together with the four no-signalling equalities is necessary and sufficient for a local hidden variables theory to explain the sixteen conditional probabilities p(x, y | a, b) of pairs of binary outcomes x, y, given pairs of binary settings a, b, in a Bell-CHSH-type experiment. No-signalling is the statement that Alice cannot see from her statistics what Bob is doing: with typical statisticians’ and physicists’ abuse of notation (the abuse being that which function one actually means is indicated only by the names chosen for its arguments; this abuse is nowadays reinforced by being built into some computer languages), p(x | a, b) does not depend on b, and similarly, for Bob, p(y | a, b) does not depend on a.
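To make the elementary character of the CHSH inequality explicit, here is the standard one-line derivation (a sketch in our own notation: x, x′ are Alice’s ±1-valued outcomes under her two settings, y, y′ Bob’s):

```latex
\[
  xy + xy' + x'y - x'y' \;=\; x\,(y + y') + x'\,(y - y') \;=\; \pm 2,
\]
since one of $y + y'$ and $y - y'$ equals $0$ and the other equals $\pm 2$.
Taking expectations over the hidden variable then gives the two-sided
Bell-CHSH inequality
\[
  \bigl|\,E(XY) + E(XY') + E(X'Y) - E(X'Y')\,\bigr| \;\le\; 2 .
\]
```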
The following remarks form a digression into the prehistory of Bell inequalities. With quite some imagination, a version of Fine’s theorem can be recognised in Boole’s (1853) monumental work [12], namely his book “An investigation of the laws of thought, on which are founded the mathematical theories of logic and probabilities”. See the end of Section 11 of Chapter XIX of [12], where the general method of that chapter is applied to Example 7, Case III of Chapter XVIII. Boole derives the conditions on three probabilities p, q and r of three events that must hold in order that a probability space exists on which those three events can be defined with precisely those three probabilities, given certain logical relations between those three events. In modern terminology, let the events be {X = Y}, {Y = Z} and {X = Z}, where X, Y and Z are three random variables. It is impossible that two of the events are true while the third is false. Boole derives the six linear inequalities involving p, q and r whose simultaneous validity is necessary and sufficient for such a probability space to exist. Boole’s proof uses the arithmetic of Boolean logic, taking expectations of elementary linear identities concerning indicator functions (0/1-valued random variables), just as Bell did.
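A minimal numerical illustration of Boole’s question (our own sketch, not Boole’s method): given target probabilities p, q and r, a linear-programming feasibility check decides whether some probability space supports three events with those probabilities, using only the stated logical relation between the events.

```python
# Feasibility check in the spirit of Boole's Example 7, Case III (toy
# illustration only): do probabilities p, q, r of three events admit a
# probability space, given the logical relation that two of the events
# can never be true while the third is false?
from itertools import product
from scipy.optimize import linprog

def boole_feasible(p, q, r):
    # Truth-value patterns of the three events allowed by the logical relation
    # (exclude the "exactly two true" patterns).
    patterns = [t for t in product([0, 1], repeat=3) if sum(t) != 2]
    A_eq = [[1.0] * len(patterns)] + \
           [[float(t[i]) for t in patterns] for i in range(3)]
    b_eq = [1.0, p, q, r]
    res = linprog(c=[0.0] * len(patterns), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, 1.0)] * len(patterns))
    return res.success

print(boole_feasible(0.9, 0.9, 0.9))   # True: e.g. mostly all three events hold
print(boole_feasible(0.9, 0.9, 0.1))   # False: p + q - r > 1 has no model
```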
A similar exploration of necessary and sufficient conditions for a probability space to exist, supporting a collection of probability assignments to various composite events, was given by Vorob’ev (1982) [13]. Vorob’ev starts his impressive but nowadays pretty much forgotten paper with a little three-variable example. He has no further concrete examples, just general theorems; no reference to Bell or to Boole. The fact that Bell’s inequalities have a precursor in Boole’s work was first mentioned (though without giving a precise reference) by Itamar Pitowsky in a number of publications in the early 1980s. This led several later authors to accuse Bell of carelessness and even to suggest plagiarism because Bell does not refer to Boole. At least one cannot blame Bell (1964) for not citing Vorob’ev’s paper. All that this really shows is that Bell’s theorem elicits very strong emotions, both positive and negative, and that lots of physicists and even mathematicians do not know much probability theory.
Having cleared up terminology and attribution, we continue towards the contribution of Gull to the story.
In a Bell-CHSH-type experiment, we have two locations or labs, in which two experimenters, Alice and Bob, can each input a binary setting to a device, which then generates a binary output. The setting corresponds to an intended choice of an angle, but only two angles are considered in each wing of the experiment. This is repeated, say, N times. We will talk about a run of N trials. The settings are externally generated, perhaps by tossing coins or performing some other auxiliary experiment. The N trials of Alice and Bob are somehow synchronised; exactly how does not matter for the purposes of this paper, but in real experiments, the synchronisation is taken care of using clocks, and the spatio-temporal arrangement of the two labs is such that there is no way a signal carrying Alice’s nth setting, sent just before it is inserted into her device, could reach Bob’s lab before his device has generated its nth outcome, even if it were transmitted at the speed of light.
In actual fact, these experiments involve measurements of the “spin” of “quantum spin-half particles” (electrons, for instance), or alternatively, measurements of the polarisation of photons in the plane opposite to their directions of travel. The two settings, both of Alice and Bob, correspond to what are intended to be two directions (spin) or orientations (polarisation), usually in the plane but conceivably in three-dimensional space. Polarisation does not only have an orientation—horizontal or vertical with respect to any direction in the plane—but also a degree of ellipticity (anything from circular to linear), and it can be clockwise or anti-clockwise. Talking about spin: in the so-called singlet state of two maximally entangled spin-half quantum systems, one can conceivably measure each subsystem in any 3D direction whatsoever, and the resulting pair of ±1-valued outcomes (x, y) would have the “correlation” E(XY) = −a · b, minus the cosine of the angle between the two measurement directions a and b. Marginally, they would be completely random: P(X = +1) = P(X = −1) = P(Y = +1) = P(Y = −1) = 1/2. The quantum physics set-up is often called an EPR-B experiment: the Einstein, Podolsky, Rosen (1935) [14] thought experiment, transferred to spin by Bohm and Aharonov (1957) [15].
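For concreteness, here is a small sampler of this joint distribution for in-plane settings a and b in radians (our own sketch; it uses both settings at once, so it merely describes the target statistics and is not a local simulation). The joint law assumed is P(x, y | a, b) = (1 − x·y·cos(a − b))/4, which has uniform marginals and correlation −cos(a − b).

```python
# Sampler for the singlet correlations with in-plane settings a, b (radians).
# Assumed joint law: P(x, y | a, b) = (1 - x*y*cos(a - b)) / 4, x, y in {-1, +1}.
import numpy as np

def sample_singlet(a, b, n, rng=None):
    """Draw n outcome pairs (x, y) in {-1, +1}^2 with correlation -cos(a - b)."""
    rng = np.random.default_rng() if rng is None else rng
    c = np.cos(a - b)
    pairs = np.array([(+1, +1), (+1, -1), (-1, +1), (-1, -1)])
    probs = np.array([(1 - x * y * c) / 4 for x, y in pairs])
    probs = probs / probs.sum()          # guard against rounding
    idx = rng.choice(4, size=n, p=probs)
    return pairs[idx]

xy = sample_singlet(0.0, np.pi / 3, 100_000)
print(xy[:, 0].mean(), xy[:, 1].mean())     # both close to 0 (uniform marginals)
print((xy[:, 0] * xy[:, 1]).mean())         # close to -cos(pi/3) = -0.5
```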
When translated to the polarisation example, this joint probability distribution of two binary variables is often called Malus’ law. We will, however, stick to the spin-half example and call it “the singlet correlations”. Moreover, we will restrict attention to spin measured in directions in the plane. The archetypical example (though itself only a thought experiment) of such an experiment would involve two Stern–Gerlach devices and is a basic example in many quantum physics texts. Present-day experiments use completely different physical systems. Nowadays, anyone can buy time on a quantum computer “in the cloud” and do the experiment themselves on an imperfect two-qubit quantum computer. One can perhaps also look forward to a future quantum internet, connecting two one-qubit quantum computers.
Now, Bell was actually interested in what one nowadays calls (possibly “stochastic”) “local hidden variables theories” (LHV). According to such a theory, the statistics predicted by quantum mechanics, and observed in experiments, are merely the reflection of a more classical underlying theory of an essentially deterministic and local nature. There might be local randomness, i.e., by definition, randomness inside the internal structure of individual particles, completely independent of the local randomness elsewhere. Think of “pseudo-random” processes going on inside particles. Think of modelling the whole of nature as a huge stochastic cellular automaton. Mathematically, one might characterise such theories as claiming the mathematical existence of a classical probability space on which are defined a large collection of random variables X(a) and Y(b), for all directions a and b in the plane, such that each pair (X(a), Y(b)) has the previously described joint probability distribution. Therefore, the question addressed by this paper is: can such a probability space exist? The answer is well-known to be “no”, and the usual proof is via the Bell-CHSH inequality. The impossibility theorem is what we will call Gull’s theorem. We are interested here in a different way to prove it—very different from the usual proofs—based on the outline presented by Gull.
The underlying probability space is usually called Λ instead of Ω, and the elementary outcomes λ ∈ Λ stand for the configuration of all the particles involved in the whole combined set-up of a source connected to two distant detectors, which are fed the settings a and b from outside. Thus X(a) = X(a, λ) stands within the mathematical model for the outcome that Alice would theoretically see if she used the setting a, even if she actually used another, or even if the whole system was destroyed before the outcomes of the actually chosen settings were registered. There is no claim that these variables exist in reality, whatever that means. I am talking about the mathematical existence of a mathematical model with certain mathematical features. I am not entering a debate involving use of the concepts ontological and epistemological. This is not about the properties of the real physical world. It is about mathematical descriptions thereof; mathematical descriptions that could be adequate tools for predictions of statistics (empirical averages) and predictions of probabilities (empirical relative frequencies). If I write about the outcome that Alice would have observed, had, counter to actual fact, Bob’s measurement setting been different from what he actually chose, I use colourful, and hopefully helpful, language about mathematical variables in mathematical models or about variables in computer code in computer simulation programs.
In a sequence of trials, one would suppose that for each trial there is some kind of resetting of the apparatus so that at the nth trial, we see the outcomes corresponding to λ_n, where the sequence λ_1, λ_2, ..., λ_N represents independent draws from the same probability measure on the same probability space Λ. Now suppose we could come up with such a theory and indeed come up with a (classical) Monte-Carlo computer simulation of that theory on a classical PC. Then, we could do the following. Simulate N outcomes λ_1, ..., λ_N of the hidden variable, and simply write them into two computer programs as N constants defined in the preamble to the programs. More conveniently, if they were simulated by a pseudo-random number generator (RNG), then we could write the constants used in the generator, and an initial seed, as just a few constants, and reproduce the RNG itself inside both programs. The programs are to be run on two computers thought of as belonging to Alice and Bob. The two programs are started. They both set up a dialogue (a loop). Initially, n is set to 1. Alice’s computer prints the message, “Alice, this is trial number n. Please input an angle.” Alice’s computer then waits for Alice to type an angle and hit the “enter” key. Bob’s computer does exactly the same thing, repeatedly asking Bob for an angle.
If, on her nth trial, Alice submits the angle a, then the program on her computer evaluates and outputs x_n = X(a, λ_n), increments n by one, and the dialogue is repeated. Alice’s computer doesn’t need Bob’s angle for this – this is where locality comes in! Bob’s does not need Alice’s.
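As a purely illustrative sketch of the structure of Alice’s program, with a placeholder response function standing in for a hypothetical X(a, λ) (no such function can actually reproduce the full singlet correlations; that is the point of the theorem):

```python
# Sketch of Alice's side of the two-computer simulation.  The shared seed and
# the response function local_outcome are the hypothetical ingredients a local
# hidden variables theorist would have to supply; Bob's program is identical
# except that it uses his own response function Y(b, lambda).
import math
import random

SHARED_SEED = 12345          # same constant compiled into both programs
N = 100                      # number of trials in the run

def local_outcome(angle, lam):
    """Hypothetical X(a, lambda): must return +1 or -1 using only Alice's
    angle and this trial's hidden variable -- no access to Bob's setting."""
    return 1 if math.cos(angle - lam) >= 0 else -1   # placeholder rule only

rng = random.Random(SHARED_SEED)                     # identical RNG on both computers
hidden = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N)]   # lambda_1 .. lambda_N

for n in range(1, N + 1):
    a = float(input(f"Alice, this is trial number {n}. Please input an angle: "))
    print(local_outcome(a, hidden[n - 1]))
```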
We have argued that if we could implement a local hidden variables theory in one computer program, then we could simulate the singlet correlations derived from one run of many trials on two completely separate computers, each running its own program, and each receiving its own stream of inputs (settings) and generating its own stream of outputs.
To summarise: for us, a local hidden variables theory for the EPR-B “Gedanken experiment”, including what is often called a stochastic local hidden variables theory, is just a pair of functions X(a, λ), Y(b, λ) taking values ±1, where a and b are directions in the plane, together with a probability distribution over the third variable λ, which can lie in any space of any complexity and which represents everything that determines the final measured outcomes throughout the whole combined system of a source, transmission lines, and two measurement stations, including (local, pseudo-) randomness there. In the theory, the outputs are a deterministic function of the inputs. When implemented as two computer programs, even more is true: given the programs, the nth output of either computer depends only on n and on the nth input setting angle on that computer, not on the earlier input angles. If one reruns either program with an identical input stream, the output stream will be the same, too. In order to obtain different outputs, one would have to change the constants defined in the program. As we mentioned before, but want to emphasise again, we envisage that an identical stream of instances λ_1, λ_2, ..., λ_N of the hidden variable is generated on both computers by the same RNG, initialised by the same seed; that seed is a constant fixed at the start of the programs.
The previous discussion was quite lengthy but is needed to clear up questions about Gull’s assumptions. Actually, quite a lot of careful thought lies behind them.
Gull [4] posed the problem: write those computer programs, or prove that they do not exist. He gave a sketch of a rather pretty proof that such programs could not exist using Fourier theory, and that is what we will turn to next. We will call the statement that such programs do not exist Gull’s theorem.
We will go through Gull’s outline proof but will run into difficulty at the last step. However, it can easily be fixed. In Gull’s argument, there are, after some preparations, two completely separated computers running completely deterministically. We need three networked computers: a third computer supplies a stream of random numbers to the two computers, which represent the two measurement stations in Bell’s theorem. At the end of the day, one can imagine that computer replaced by a cloned, virtual computer, generating the same pseudo-random numbers within each of Alice and Bob’s computers. Gull’s proof then just needs a third step: writing a grand expectation over all randomness as the expectation of a conditional expectation, given the hidden variables.
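The extra step is nothing more than the tower property of conditional expectation; schematically (our notation, with Λ the stream of shared random numbers and S any statistic of the two output streams):

```latex
\[
  E\bigl[\,S\,\bigr] \;=\; E\Bigl[\, E\bigl[\,S \mid \Lambda\,\bigr] \,\Bigr],
\]
so a bound that holds for every fixed value of the shared randomness (the
completely deterministic case treated by Gull's Fourier argument) also holds
on average over that randomness.
```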
A recently posted Stack Exchange discussion [16] also attempts to decode Gull’s proof but, in our opinion, it too is incomplete, becoming stuck at the same point as we did.
In a final section, we will show how Gull’s theorem (with completely separated, deterministically operating computers) can also be proven using a Bell theorem proof variant due to the first author of the present paper, designed specifically for fighting Bell deniers by challenging them to implement their theory as a networked computer experiment. The trick is to use externally created streams of random binary setting choices and derive martingale properties of a suitable game score, treating the physics implemented inside the computers as completely deterministic. Randomness resides only in the streams of settings used, trial by trial, in the experiment.
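To illustrate the flavour of that argument (a toy sketch of ours, not the actual martingale proof): with settings chosen by fair coin tosses on each trial, any pair of deterministic local response functions keeps the empirical CHSH statistic near or below the local bound of 2, however those functions are chosen.

```python
# Toy CHSH run with externally randomised settings and deterministic "boxes".
# Each side's response uses only its own setting and the trial's shared hidden
# value, so the empirical CHSH statistic S = E(1,1) + E(1,2) + E(2,1) - E(2,2)
# stays near or below the local bound of 2 (up to sampling noise).
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

hidden = rng.uniform(0.0, 2.0 * np.pi, size=N)     # shared per-trial hidden value
a_set = rng.integers(1, 3, size=N)                  # Alice's coin: setting 1 or 2
b_set = rng.integers(1, 3, size=N)                  # Bob's coin:   setting 1 or 2

def alice(setting, lam):                            # arbitrary local, deterministic rule
    return 1 if np.cos(lam + 0.3 * setting) >= 0 else -1

def bob(setting, lam):
    return 1 if np.sin(lam - 0.7 * setting) >= 0 else -1

x = np.array([alice(s, l) for s, l in zip(a_set, hidden)])
y = np.array([bob(s, l) for s, l in zip(b_set, hidden)])

S = 0.0
for i, j, sign in [(1, 1, +1), (1, 2, +1), (2, 1, +1), (2, 2, -1)]:
    mask = (a_set == i) & (b_set == j)
    S += sign * (x[mask] * y[mask]).mean()
print("empirical CHSH statistic S =", S, "(local bound 2; quantum up to 2*sqrt(2))")
```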
The message of this paper is that Gull’s theorem is true. Its mathematical formulation is open to several interpretations, needing different proofs, of course. A version certainly can be proven using Fourier theory, and Gull’s Fourier theoretic proof of this version is very pretty and original indeed.
We have referred to the question of computer simulation of Bell experiments, and to Gull’s proof of Bell’s theorem, in recent papers by Gill (2021, 2022). It is good that any doubts as to the validity of Gull’s claim can be dispelled, though his claim does need more precise formulation since his outline proof has a gap. We hope he would approve of how we have bridged the gap.