1. Introduction
The laws of physics, and in particular the laws of dynamics, have traditionally been seen as laws of nature. It is usually believed that such laws are discovered and that they are useful because they reflect reality. The reflection, imperfect though it may be, represents a very direct relation between physics and nature. Here we explore an alternative view in which the relation is considerably more indirect: The laws of physics provide a framework for processing information about nature. From this perspective physical models are mere tools that are partly discovered and partly designed with our own very human purposes in mind. This approach is decidedly pragmatic: when tools happen to be successful we do not say that they are true; we say that they are useful.
Is there any evidence in support of such an unorthodox view? The answer is yes. Indeed, if physics is an exercise in inference then we should expect it to include both ontic and epistemic concepts. The ontic concepts are meant to represent those entities in nature that are the subject of our interest. They include the quantities, such as particle positions and field strengths, that we want to predict, to explain, and to control. The epistemic concepts, on the other hand, are the tools—the probabilities, the entropies, and the (information) geometries—that are used to carry out our inferences. The prominence of these epistemic elements strongly suggests that physics is not a mere mirror of nature; instead physics is an inference framework designed by humans for the purpose of facilitating their interactions with nature. The founders of quantum theory—Bohr, Heisenberg, Born, etc.—were quite aware of the epistemological and pragmatic elements in quantum mechanics (see e.g., [1]) but they wrote at a time when the tools of quantitative epistemology—the Bayesian and entropic methods of inference—had not yet been sufficiently developed.
Entropic Dynamics (ED) provides a framework for deriving dynamical laws as an application of entropic methods. (The principle of maximum entropy as a method for inference can be traced to E. T. Jaynes. For a pedagogical overview of Bayesian and entropic inference and further references see [2].) In ED, the dynamics is driven by entropy subject to the constraints appropriate to the problem at hand. It is through these constraints that the “physics” is introduced. Such a framework is extremely restrictive. For example, in order to adopt an epistemic view of the quantum state ψ it is not sufficient to merely assert that the probability |ψ|² represents a state of knowledge; this is a good start but it is not nearly enough. It is also necessary that the changes or updates of the epistemic ψ—which include both the unitary time evolution described by the Schrödinger equation and the collapse of the wave function during measurement—be derived according to the established rules of inference. Therefore, in a truly entropic dynamics we are not allowed to postulate action principles that operate at some deeper sub-quantum level. Instead, the goal is to derive such action principles from entropy principles with suitably chosen constraints.
In this paper we collect and streamline results that have appeared in several publications (see [3,4,5,6] and references therein) to provide a self-contained overview of three types of ED: (1) standard diffusion; (2) Hamiltonian dynamics; and (3) quantum mechanics. First we tackle the case of a diffusion process, which serves to address the central concern with the nature of time. In ED “entropic” time is a relational concept introduced as a book-keeping device designed to keep track of the accumulation of change. Time is constructed by (a) introducing the notion of “instants”; (b) showing that these instants are “ordered”; and (c) defining a measure of the interval that separates successive instants. The welcome new feature is that an arrow of time is generated automatically; entropic time is intrinsically directional.
The early formulation of ED [3] involved assumptions about auxiliary variables, the metric of configuration space, and the form of the quantum potential. All these assumptions have been subsequently removed. In [6] it was shown how the constraint that the dynamics be non-dissipative leads to a generic form of Hamiltonian dynamics, with its corresponding symplectic structure and action principle. Thus, in the context of ED action principles are derived; they are useful tools but they are not fundamental.
Different Hamiltonians lead to different dynamical laws. We show how considerations of information geometry provide the natural path to Hamiltonians that include the correct form of “quantum potential” and lead to the Schrödinger equation, and we also identify the constraints that describe motion in an external electromagnetic field.
Here, we focus on the derivation of the Schrödinger equation but the ED approach has been applied to several other topics in quantum mechanics that will not be reviewed here. These include the quantum measurement problem [5,7,8]; momentum, angular momentum, their uncertainty relations, and spin [9,10]; relativistic scalar fields [11]; the Bohmian limit [12]; and the extension to curved spaces [13].
There is a vast literature on attempts to reconstruct quantum mechanics and it is inevitable that the ED approach resembles them in one aspect or another—after all, in order to claim success all these approaches must sooner or later converge to the same Schrödinger equation. However, there are important differences. For example, the central concern with the notion of time makes ED significantly different from other approaches that are also based on information theory (see e.g., [14,15,16,17,18,19,20,21,22,23,24]). ED also differs from those approaches that attempt to explain the emergence of quantum behavior as the effective statistical mechanics of some underlying sub-quantum dynamics, which might possibly include some additional stochastic element (see e.g., [25,26,27,28,29,30,31,32,33,34]). Indeed, ED makes no reference to any sub-quantum dynamics whether classical, deterministic, or stochastic.
2. Entropic Dynamics
As with other applications of entropic methods, to derive dynamical laws we must first specify the microstates that are the subject of our inference—the subject matter—and then we must specify the prior probabilities and the constraints that represent the information that is relevant to our problem. (See e.g., [2].) We consider N particles living in a flat Euclidean space X with metric δ_ab. We assume that the particles have definite positions x_n^a and it is their unknown values that we wish to infer. (The index n = 1, …, N denotes the particle and a = 1, 2, 3 its spatial coordinates.) For N particles the configuration space is the 3N-dimensional space X_N = X × ⋯ × X.
In this work ED is developed as a model for the quantum mechanics of particles. The same framework can be deployed to construct models for the quantum mechanics of fields, in which case it is the fields that are objectively “real” and have well-defined, albeit unknown, values [11].
The assumption that the particles have definite positions is in flat contradiction with the standard Copenhagen notion that quantum particles acquire definite positions only as a result of a measurement. For example, in the ED description of the double slit experiment we do not know which slit the quantum particle goes through but it most definitely goes through either one or the other.
We do not explain why motion happens but, given the information that it does, our task is to produce an estimate of what short steps we can reasonably expect. The next assumption is dynamical: we assume that the particles follow trajectories that are continuous. This means that displacements over finite distances can be analyzed as the accumulation of many infinitesimally short steps, and our first task is to find the transition probability density P(x′|x) for a single short step from a given initial position x to an unknown neighboring position x′. Later we will determine how such short steps accumulate to yield a finite displacement.
To find P(x′|x) we maximize the (relative) entropy,

S[P, Q] = − ∫ dx′ P(x′|x) log [ P(x′|x) / Q(x′|x) ] .   (1)

To simplify the notation, in all configuration space integrals we write dx′ for the 3N-dimensional volume element, and so on. Q(x′|x) is the prior probability. It expresses our beliefs—or more precisely, the beliefs of an ideally rational agent—before any information about the motion is taken into account. The physically relevant information about the step is expressed in the form of constraints imposed on P(x′|x)—this is the stage at which the physics is introduced.
The prior—We adopt a prior Q(x′|x) that represents a state of extreme ignorance: knowledge of the initial position x tells us nothing about x′. Such ignorance is expressed by assuming that Q(x′|x) is proportional to the volume element in X_N. Since the space X_N is flat and a mere proportionality constant has no effect on the entropy maximization, we can set Q(x′|x) = 1. The generalization to curved spaces is straightforward [13]. Another possible matter of concern is that the uniform prior is not normalizable, and it is known that improper priors can sometimes be mathematically problematic. Fortunately, in our case no such difficulties arise. For microscopic particles any prior that is sufficiently flat over macroscopic scales turns out to lead to exactly the same physical predictions. We can, for example, use a Gaussian centered at x with a macroscopically large standard deviation and this leads to exactly the same transition probability.
The constraints—The first piece of information is that motion is continuous—motion consists of a succession of infinitesimally short steps. Each individual particle n will take a short step from x_n^a to x′_n^a = x_n^a + Δx_n^a, and we require that for each particle the expected squared displacement,

⟨Δx_n^a Δx_n^b⟩ δ_ab = κ_n ,   (2)

takes some small value κ_n. Infinitesimally short steps are obtained by taking the limit κ_n → 0. We will assume each κ_n to be independent of x to reflect the translational symmetry of X. In order to describe non-identical particles, we assume that the value of κ_n depends on the particle index n.
The N constraints in Equation (2) treat the particles as statistically independent and their accumulation eventually leads to a completely isotropic diffusion. But we know that particles can become correlated and even become entangled. We also know that motion is not normally isotropic; once particles are set in motion they tend to persist in it. This information is introduced through one additional constraint involving a “drift” potential ϕ(x) that is a function in configuration space, X_N. We impose that the expected displacements ⟨Δx^A⟩ along the direction of the gradient of ϕ satisfy

⟨Δx^A⟩ ∂_A ϕ = κ′ ,   (3)

where capitalized indices such as A = (n, a) include both the particle index and its spatial coordinate; ∂_A = ∂/∂x^A = ∂/∂x_n^a; and κ′ is another small, but for now unspecified, position-independent constant.
The introduction of the drift potential ϕ will not be justified at this point. The idea is that we can make progress by identifying the constraints even when their physical origin remains unexplained. This situation is not unlike classical mechanics, where identifying the forces is useful even in situations where their microscopic origin is not understood. We do however make two brief comments. First, in Section 9 we shall see that ED in an external electromagnetic field is described by constraints that are formally similar to Equation (3). There we shall show that the effects of the drift potential ϕ and the electromagnetic vector potential A_a are intimately related—a manifestation of gauge symmetry—suggesting that whatever ϕ might be, it is as “real” as A_a. The second comment is that elsewhere, in the context of a particle with spin, we will see that the drift potential can be given a natural geometric interpretation as an angular variable. This imposes the additional condition that the integral of ϕ over any closed loop is quantized,

∮ dϕ = 2πν ,

where ν is an integer.
Maximizing S[P, Q] in Equation (1) subject to the constraints Equations (2) and (3) plus normalization yields a Gaussian distribution,

P(x′|x) = (1/ζ) exp[ − Σ_n ( (α_n/2) δ_ab Δx_n^a Δx_n^b − α′ Δx_n^a ∂_{na}ϕ ) ] ,   (4)

where ζ is a normalization constant and the Lagrange multipliers α_n and α′ are determined from the constraints they enforce,

⟨Δx_n^a Δx_n^b⟩ δ_ab = κ_n and ⟨Δx^A⟩ ∂_A ϕ = κ′ .   (5)

The distribution P(x′|x) is conveniently rewritten as

P(x′|x) = (1/Z) exp[ − Σ_n (α_n/2) δ_ab (Δx_n^a − ⟨Δx_n^a⟩)(Δx_n^b − ⟨Δx_n^b⟩) ] ,   (6)

where Z is a new normalization constant. A generic displacement Δx^A = x′^A − x^A can be expressed as an expected drift plus a fluctuation,

Δx^A = ⟨Δx^A⟩ + Δw^A ,   (7)

where

⟨Δx^A⟩ = (α′/α_n) δ^{AB} ∂_B ϕ ,   (8)

⟨Δw^A⟩ = 0 and ⟨Δw^A Δw^B⟩ = (1/α_n) δ^{AB} .   (9)

From these equations we can get a first glimpse into the meaning of the multipliers α_n and α′. For very short steps, as α_n → ∞, the fluctuations become dominant: the drift is ⟨Δx^A⟩ = O(1/α_n) while Δw^A = O(1/α_n^{1/2}). This implies that, as in Brownian motion, the trajectory is continuous but not differentiable. In the ED approach a particle has a definite position, but its velocity—the tangent to the trajectory—is completely undefined. We can also see that the effect of α′ is to enhance or suppress the magnitude of the drift relative to the fluctuations—a subject that is discussed in detail in [12]. However, for our current purposes we can absorb α′ into the so far unspecified drift potential, α′ϕ → ϕ, which amounts to setting α′ = 1.
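The relative scaling of drift and fluctuations can be illustrated numerically. The sketch below is not part of the derivation: it assumes units in which the drift and fluctuation scales reduce to ħ = m = 1, and the constant drift-potential gradient and sample sizes are purely illustrative. It samples short steps and confirms that the rms fluctuation dominates the drift as the step size shrinks:

```python
import numpy as np

rng = np.random.default_rng(0)
hbar, m = 1.0, 1.0
grad_phi = 2.0                 # illustrative constant gradient of the drift potential

def ed_step(dt, size):
    """One short entropic step: drift scales like dt, fluctuation like sqrt(dt)."""
    drift = (hbar / m) * grad_phi * dt                     # expected drift
    fluct = rng.normal(0.0, np.sqrt(hbar * dt / m), size)  # Wiener fluctuation
    return drift, fluct

# ratio of rms fluctuation to drift grows like 1/sqrt(dt): fluctuations dominate
ratios = {}
for dt in (1e-2, 1e-4, 1e-6):
    drift, fluct = ed_step(dt, 100_000)
    ratios[dt] = fluct.std() / drift
print(ratios)
```

The ratio grows as the step shortens, which is the numerical face of "continuous but not differentiable": the trajectory has no well-defined tangent.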
4. The Information Metric of Configuration Space
We have assumed that the geometry of the single-particle spaces X is described by the Euclidean metric δ_ab. We can expect that the N-particle configuration space, X_N, will also have a flat geometry, but the relative contribution of different particles to the metric remains undetermined. Should very massive particles contribute the same as very light particles? The answer is provided by information geometry.
To each point x ∈ X_N there corresponds a probability distribution P(x′|x). Therefore X_N is a statistical manifold and, up to an arbitrary global scale factor, its geometry is uniquely determined by the information metric,

γ_AB = C ∫ dx′ P(x′|x) [∂ log P(x′|x)/∂x^A] [∂ log P(x′|x)/∂x^B] ,   (14)

where C is an arbitrary positive constant (see e.g., [2]). A straightforward substitution of Equations (6) and (13) into Equation (14) in the limit of short steps (Δt → 0) yields

γ_AB = (C m_n / ℏΔt) δ_AB .   (15)

We see that γ_AB diverges as Δt → 0. The reason for this is not hard to find. As Δt → 0 the Gaussian distributions P(x′|x) and P(x′|x + Δx) become more sharply peaked, and it is easier to distinguish one from the other, which translates into a greater information distance. In order to define a distance that remains meaningful for arbitrarily small Δt it is convenient to choose C proportional to Δt, say C = ℏΔt. In what follows the metric tensor will always appear in combinations such as γ_AB Δt/C. It is therefore convenient to define the “mass” tensor,

m_AB = (ℏΔt/C) γ_AB = m_n δ_AB .   (16)

Its inverse,

m^{AB} = (C/ℏΔt) γ^{AB} = (1/m_n) δ^{AB} ,   (17)

is called the “diffusion” tensor.
We can now summarize our results so far. The choice Equation (13) of the multipliers, α_n = m_n/(ℏΔt), simplifies the dynamics: P(x′|x) in Equation (6) is a standard Wiener process. A generic displacement, Equation (7), is

Δx^A = b^A Δt + Δw^A ,   (18)

where b^A is the drift velocity,

⟨Δx^A⟩ = b^A Δt with b^A = ℏ m^{AB} ∂_B ϕ ,   (19)

and the fluctuations Δw^A are such that

⟨Δw^A⟩ = 0 and ⟨Δw^A Δw^B⟩ = ℏ m^{AB} Δt .   (20)
We are now ready to comment on the implications of the choice of time scale Δt and of the multipliers α_n, Equation (13). The first remark is on the nature of clocks: In Newtonian mechanics the prototype of a clock is the free particle, and time is defined so as to simplify the motion of free particles—they move equal distances in equal times. In ED the prototype of a clock is a free particle too—for sufficiently short times all particles are free—and time here is also defined to simplify the description of their motion: the particle undergoes equal fluctuations in equal times.
The second remark is on the nature of mass. The particle-specific constants m_n will, in due course, be called “mass” and Equation (20) provides their interpretation: mass is an inverse measure of fluctuations. Thus, up to overall constants, the metric of configuration space is the mass tensor and its inverse is the diffusion tensor. In standard QM there are two mysteries: “Why quantum fluctuations?” and “What is mass?” ED offers some progress in that instead of two mysteries there is only one. Fluctuations and mass are two sides of the same coin.
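The statement that mass is an inverse measure of fluctuations can be checked directly by sampling: by Equation (20) the fluctuation variance scales as ℏΔt/m_n, so a particle one hundred times heavier fluctuates one hundred times less in variance. A minimal sketch (ℏ = 1; the masses, step size, and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
hbar, dt = 1.0, 1e-3
masses = {"light": 1.0, "heavy": 100.0}   # illustrative mass ratio 1:100

# sample fluctuations Delta w ~ N(0, hbar*dt/m) for each particle
var = {name: rng.normal(0.0, np.sqrt(hbar * dt / m), 1_000_000).var()
       for name, m in masses.items()}

# variances scale inversely with mass: the heavy particle fluctuates less
print(var["light"] / var["heavy"])
```

The measured variance ratio reproduces the inverse mass ratio to within sampling error.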
Finally we note the formal similarity to Nelson’s stochastic mechanics [25]. The similarity is to be expected—all theories that converge on the Schrödinger equation must at some point become formally similar—but our epistemic interpretation differs radically from Nelson’s ontic interpretation and avoids the difficulties discussed in [37].
5. Diffusive Dynamics
Equation (10) is the dynamical equation for the evolution of the probability density ρ(x, t). It is written in integral form but it can be rewritten in differential form as a Fokker–Planck (FP) equation (see e.g., [2]),

∂_t ρ = −∂_A (b^A ρ) + (ℏ/2) m^{AB} ∂_A ∂_B ρ ,   (21)

or equivalently as a continuity equation,

∂_t ρ = −∂_A (ρ v^A) ,   (22)

where

v^A = b^A + u^A   (23)

is the velocity of the probability flow or current velocity and

u^A = −ℏ m^{AB} ∂_B log ρ^{1/2}   (24)

is called the osmotic velocity—it represents the tendency for probability to flow down the density gradient. Since both b^A and u^A are gradients, it follows that the current velocity is a gradient too,

v^A = m^{AB} ∂_B Φ with Φ = ℏ (ϕ − log ρ^{1/2}) .   (25)

The FP equation,

∂_t ρ = −∂_A (ρ m^{AB} ∂_B Φ) ,   (26)

can be conveniently rewritten in the alternative form

∂_t ρ = δH̃/δΦ ,   (27)

for some suitably chosen functional H̃[ρ, Φ]. It is easy to check that the appropriate functional is

H̃[ρ, Φ] = ∫ dx (1/2) ρ m^{AB} ∂_A Φ ∂_B Φ + F[ρ] ,   (28)

where the integration constant F[ρ] is some unspecified functional of ρ.
With these results we have demonstrated that a specific form of dynamics—a standard diffusion process—can be derived from principles of entropic inference. This diffusive dynamics can be written in several equivalent ways—Equations (21), (22), (26) and (27). Next we turn our attention to other forms of dynamics, such as quantum or classical mechanics, which require a somewhat different choice of constraints.
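The hallmarks of the diffusive dynamics can be illustrated with a simple discretization. The sketch below (an illustrative finite-difference scheme, not the paper's derivation) evolves a drift-free one-dimensional diffusion and checks the two signatures of a continuity-type FP equation: probability is conserved, and the variance grows linearly at the rate 2Dt:

```python
import numpy as np

# drift-free 1D Fokker-Planck: d(rho)/dt = D * d2(rho)/dx2  (illustrative scheme)
D = 0.5
x = np.linspace(-10.0, 10.0, 401)
dx = x[1] - x[0]
dt = 0.2 * dx**2 / D                 # stable explicit time step
steps = 2000

rho = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # initial Gaussian, variance 1
var0 = np.sum(rho * x**2) * dx

for _ in range(steps):
    lap = (np.roll(rho, 1) - 2 * rho + np.roll(rho, -1)) / dx**2
    rho = rho + dt * D * lap

norm = np.sum(rho) * dx              # probability is conserved
var = np.sum(rho * x**2) * dx        # variance grows as var0 + 2*D*t
print(norm, var, var0 + 2 * D * dt * steps)
```

Conservation of the norm is exact for the discrete scheme (the Laplacian stencil redistributes probability without creating it), while the variance tracks the analytic diffusive growth.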
6. Hamiltonian Dynamics
The previous discussion has led us to a standard diffusion, in which the density ρ evolves under the influence of some externally fixed drift potential ϕ. However, in quantum dynamics we require a second degree of freedom, the phase of the wave function. The extra degree of freedom is introduced into ED by replacing the constraint of a fixed drift potential ϕ by an evolving constraint in which at each time step the potential ϕ is readjusted in response to the evolving ρ.
To find the appropriate readjustment of ϕ we borrow an idea of Nelson’s [38] and impose that the potential ϕ be updated in such a way that a certain functional, later called “energy”, remains constant. The next challenge is to identify the appropriate functional form of this energy, but before this we make two remarks.
The standard procedure in mechanics is to derive the conservation of energy from the invariance of the action under time translations, but here we do not have an action yet. The logic of our derivation runs in the opposite direction: we first identify the conservation of an energy as the piece of information that is relevant to our inferences, and from it we derive Hamilton’s equations and their associated action principle.
Imposing energy conservation appears to be natural because it agrees with our classical preconceptions of what mechanics is like. But ED is not at all like classical mechanics. Indeed, Equation (18) is the kind of equation (a Langevin equation) that characterizes Brownian motion in the limit of infinite friction. Therefore in the ED approach to quantum theory particles seem to be subject to infinite friction while suffering zero dissipation. Such a strange dynamics can hardly be called “mechanics”, much less “classical”.
The Ensemble Hamiltonian—The energy functional that codifies the correct constraint is of the form of Equation (28). We therefore impose that, irrespective of the initial conditions, the potential ϕ will be updated in such a way that the functional H̃[ρ, Φ] in Equation (28) is always conserved,

dH̃/dt = ∫ dx [ (δH̃/δΦ) ∂_t Φ + (δH̃/δρ) ∂_t ρ ] = 0 .   (29)

Using Equation (27) we get

dH̃/dt = ∫ dx [ ∂_t Φ + δH̃/δρ ] ∂_t ρ = 0 .   (30)

We require that this hold for arbitrary choices of the initial values of ρ and Φ. From Equation (26) we see that this amounts to requiring that Equation (30) hold for arbitrary choices of ∂_t ρ. Therefore the requirement that H̃ be conserved for arbitrary initial conditions amounts to imposing that

∂_t Φ = −δH̃/δρ .   (31)

Equations (27) and (31) have the form of a canonically conjugate pair of Hamilton’s equations. The field ρ is a generalized coordinate and Φ is its canonical momentum. The conserved functional H̃[ρ, Φ] in Equation (28) will be called the ensemble Hamiltonian. We conclude that non-dissipative ED leads to Hamiltonian dynamics.
Equation (31) leads to a generalized Hamilton–Jacobi equation,

∂_t Φ = −(1/2) m^{AB} ∂_A Φ ∂_B Φ − δF/δρ .   (32)
The Action, Poisson Brackets, etc.—Now that we have Hamilton’s equations, (27) and (31), we can invert the usual procedure and construct an action principle from which they can be derived. Define the differential

δA = ∫ dt ∫ dx [ (∂_t ρ − δH̃/δΦ) δΦ − (∂_t Φ + δH̃/δρ) δρ ]

and then integrate to get the action

A[ρ, Φ] = ∫ dt ( ∫ dx Φ ∂_t ρ − H̃[ρ, Φ] ) .   (33)

By construction, imposing δA = 0 leads to Equations (27) and (31).
The time evolution of any arbitrary functional f[ρ, Φ] is given by a Poisson bracket,

df/dt = ∫ dx [ (δf/δρ)(δH̃/δΦ) − (δf/δΦ)(δH̃/δρ) ] = {f, H̃} ,   (34)

which shows that the ensemble Hamiltonian H̃ is the generator of time evolution. Similarly, under a spatial displacement ε^a the change in f is

δf = {f, P_a} ε^a ,   (35)

where

P_a = ∫ dx ρ Σ_n ∂Φ/∂x_n^a   (36)

is interpreted as the expectation of the total momentum, and the coordinates of the center of mass are

X^a = (1/M) Σ_n m_n ⟨x_n^a⟩ with M = Σ_n m_n .   (37)
A Schrödinger-Like Equation—We can always combine ρ and Φ to define the family of complex functions,

Ψ_k = ρ^{1/2} exp(iΦ/k) ,   (39)

where k is some arbitrary positive constant. Then the two coupled Equations (27) and (31) can be written as a single complex Schrödinger-like equation,

ik ∂_t Ψ_k = −(k²/2) m^{AB} ∂_A ∂_B Ψ_k + (k²/2) m^{AB} (∂_A ∂_B |Ψ_k| / |Ψ_k|) Ψ_k + (δF/δρ) Ψ_k .   (40)

The reason for the parameter k will become clear shortly, but even at this stage we can already anticipate that k will play the role of ℏ.
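That all members of the family Ψ_k carry the same physics can be checked in one line: the density ρ = |Ψ_k|² is independent of k. A minimal sketch (the density and phase profiles are purely illustrative):

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 201)
rho = np.exp(-x**2) / np.sqrt(np.pi)   # illustrative probability density
Phi = 0.7 * x                          # illustrative phase field

def psi(k):
    """Polar form Psi_k = rho^{1/2} exp(i Phi / k)."""
    return np.sqrt(rho) * np.exp(1j * Phi / k)

# |Psi_k|^2 reproduces rho for every positive k
dev = max(np.max(np.abs(np.abs(psi(k))**2 - rho)) for k in (0.5, 1.0, 2.0))
print(dev)
```

Since (ρ, Φ) carry the physics and the map to Ψ_k is invertible for each k, choosing k is purely a matter of convenience, which is the point of the regraduation below.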
7. Information Geometry and the Quantum Potential
Different choices of the functional F[ρ] in Equation (28) lead to different dynamics. Earlier we invoked information geometry, Equation (14), to define the metric m_AB induced in configuration space by the transition probabilities P(x′|x). To motivate the particular choice of the functional F[ρ] that leads to quantum theory we appeal to information geometry once again.

Consider the family of distributions ρ_θ(x) = ρ(x − θ) that are generated from a distribution ρ(x) by pure translations by a vector θ^A. The extent to which ρ_θ can be distinguished from the slightly displaced ρ_{θ+dθ} or, equivalently, the information distance between ρ_θ and ρ_{θ+dθ}, is given by

dℓ² = g_AB dθ^A dθ^B ,   (41)

where

g_AB(θ) = ∫ dx ρ(x − θ) [∂ log ρ(x − θ)/∂θ^A] [∂ log ρ(x − θ)/∂θ^B] .   (42)

Changing variables x − θ → x yields

g_AB = ∫ dx (∂_A ρ ∂_B ρ)/ρ .   (43)
The Functional F[ρ]—The simplest choice of functional F[ρ] is linear in ρ, F[ρ] = ∫ dx ρ V, where V(x) is some function that will be recognized as the familiar scalar potential. Since ED aims to derive the laws of physics from a framework for inference, it is natural to expect that the Hamiltonian might also contain terms that are of a purely informational nature. We have identified two such tensors: one is the information metric of configuration space m^{AB}, another is g_AB. The simplest nontrivial scalar that can be constructed from them is the trace m^{AB} g_AB. This suggests

F[ρ] = ξ m^{AB} g_AB + ∫ dx ρ V = ξ m^{AB} ∫ dx (∂_A ρ ∂_B ρ)/ρ + ∫ dx ρ V ,   (44)

where ξ is a constant that controls the relative strength of the two contributions. The term m^{AB} g_AB is sometimes called the “quantum” or the “osmotic” potential. This relation between the quantum potential and the Fisher information was pointed out in [39]. From Equation (43) we see that the ξ term is a contribution to the energy such that those states that are more smoothly spread out tend to have lower energy. The case ξ < 0 leads to instabilities and is therefore excluded; the case ξ = 0 leads to a qualitatively different theory and will be discussed elsewhere [12].

With this choice of F[ρ] the generalized Hamilton–Jacobi Equation (32) becomes

∂_t Φ = −(1/2) m^{AB} ∂_A Φ ∂_B Φ − V + 4ξ m^{AB} (∂_A ∂_B ρ^{1/2})/ρ^{1/2} .   (45)
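The quantum-potential term can be computed numerically and compared against its closed form for a Gaussian density. A one-dimensional sketch (anticipating ξ = ℏ²/8, so Q = −(ℏ²/2m)(∂²ρ^{1/2})/ρ^{1/2}; ℏ = m = σ = 1 and the grid parameters are illustrative):

```python
import numpy as np

hbar, m, sigma = 1.0, 1.0, 1.0
x = np.linspace(-4.0, 4.0, 801)
dx = x[1] - x[0]

rho = np.exp(-x**2 / (2 * sigma**2))   # unnormalized Gaussian; Q is scale-free
s = np.sqrt(rho)

# numerical quantum potential Q = -(hbar^2/2m) * (d2 s/dx2) / s
lap = (np.roll(s, 1) - 2 * s + np.roll(s, -1)) / dx**2
Q_num = -(hbar**2 / (2 * m)) * lap / s

# closed form for a Gaussian density
Q_exact = (hbar**2 / (2 * m)) * (1 / (2 * sigma**2) - x**2 / (4 * sigma**4))

err = np.max(np.abs(Q_num[10:-10] - Q_exact[10:-10]))  # ignore wrap-around edges
print(err)
```

Note the sign structure: the quantum potential is highest where the density is most sharply curved, so smoother densities have lower energy, as stated in the text.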
8. The Schrödinger Equation
Substituting Equation (44) into Equation (40) gives a Schrödinger-like equation,

ik ∂_t Ψ_k = −(k²/2) m^{AB} ∂_A ∂_B Ψ_k + V Ψ_k + (k²/2 − 4ξ) m^{AB} (∂_A ∂_B |Ψ_k| / |Ψ_k|) Ψ_k ,   (46)

the beauty of which is severely marred by the non-linear last term.
Regraduation—We can now make good use of the freedom afforded by the arbitrary constant k. Since the physics is fully described by ρ and Φ, the different choices of k in Ψ_k all describe the same theory. Among all these equivalent descriptions, it is clearly to our advantage to pick the k that is most convenient—a process usually known as “regraduation”. Other notable examples of regraduation include the Kelvin choice of absolute temperature, the Cox derivation of the sum and product rules for probabilities, and the derivation of the sum and product rules for quantum amplitudes [2,15].

A quick examination of Equation (46) shows that the optimal k is such that the non-linear term drops out. The optimal choice, which we denote k̂, is

k̂ = (8ξ)^{1/2} .   (47)

We can now identify the optimal regraduated k̂ with Planck’s constant ℏ,

k̂ = (8ξ)^{1/2} = ℏ ,   (48)

and Equation (46) becomes the linear Schrödinger equation,

iℏ ∂_t Ψ = −(ℏ²/2) m^{AB} ∂_A ∂_B Ψ + V Ψ = Σ_n −(ℏ²/2m_n) ∇_n² Ψ + V Ψ ,   (49)

where the wave function is Ψ = ρ^{1/2} e^{iΦ/ℏ}. The constant ξ = ℏ²/8 in Equation (44) turns out to be crucial: it defines the numerical value of what we call Planck’s constant and sets the scale that separates quantum from classical regimes.
The conclusion is that for any positive value of the constant ξ it is always possible to regraduate to a physically equivalent but more convenient description where the Schrödinger equation is linear. From the ED perspective the linear superposition principle and the complex Hilbert spaces are important because they are convenient, but not because they are fundamental.
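The linear Schrödinger equation obtained above can be integrated numerically. A standard split-step Fourier sketch for a free particle (ℏ = m = 1; the grid and step sizes are illustrative) confirms that the evolution is unitary and that an initial Gaussian spreads at the textbook rate ⟨x²⟩(t) = ⟨x²⟩(0) + ⟨p²⟩(0) t²:

```python
import numpy as np

# free-particle Schrodinger equation via split-step Fourier (hbar = m = 1)
N, L = 512, 40.0
dx = L / N
x = (np.arange(N) - N // 2) * dx
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)     # angular wavenumbers
dt, steps = 0.01, 200                       # total time t = 2

psi = (1 / np.pi) ** 0.25 * np.exp(-x**2 / 2)   # Gaussian, <x^2> = 1/2, <p^2> = 1/2

for _ in range(steps):
    # exact free evolution in momentum space: multiply by exp(-i k^2 dt / 2)
    psi = np.fft.ifft(np.exp(-1j * k**2 * dt / 2) * np.fft.fft(psi))

norm = np.sum(np.abs(psi)**2) * dx
var = np.sum(np.abs(psi)**2 * x**2) * dx    # expected: 0.5 + 0.5 * t^2 = 2.5
print(norm, var)
```

Unitarity holds to machine precision because the momentum-space factor is a pure phase; the variance growth reproduces the expected diffusion-like spreading of the probability flow.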
9. ED in an External Electromagnetic Field
In ED the information that is physically relevant for prediction is codified into constraints that reflect that motion is (a) continuous, (b) correlated and directional, and (c) non-dissipative. These constraints are expressed by Equations (2), (3), and (31) respectively. In this section we show that interactions can be introduced by imposing additional constraints.
As an explicit illustration, we show that the effect of an external electromagnetic field is modelled by a constraint on the component of displacements along a certain direction represented by the vector potential A_a(x). For each particle n we impose the constraint

⟨Δx_n^a⟩ A_a(x_n) = κ″_n ,   (50)

where κ″_n is a particle-dependent constant that reflects the strength of the coupling to A_a.
The resemblance between Equation (50) and the drift potential constraint, Equation (3), is very significant—as we shall see shortly it leads to gauge symmetries—but there also are significant differences. Note that Equation (3) is a single constraint acting in the N-particle configuration space—it involves the drift potential ϕ(x) with x ∈ X_N. In contrast, Equation (50) is N constraints acting in the 1-particle space—the vector potential A_a(x_n) is a function in 3D space, x_n ∈ X.
Except for minor changes the development of ED proceeds as before. The transition probability P(x′|x) that maximizes the entropy S[P, Q], Equation (1), subject to Equations (2), (3) and (50) and normalization, is

P(x′|x) = (1/ζ) exp[ − Σ_n ( (α_n/2) δ_ab Δx_n^a Δx_n^b − α′ Δx_n^a ∂_{na}ϕ − β_n Δx_n^a A_a(x_n) ) ] ,   (51)

which includes an additional set of Lagrange multipliers β_n. Next use Equations (13) and (16) to get

P(x′|x) = (1/Z) exp[ − Σ_n (m_n/2ℏΔt) δ_ab (Δx_n^a − ⟨Δx_n^a⟩)(Δx_n^b − ⟨Δx_n^b⟩) ] ,   (52)

where we have absorbed α′ into ϕ, α′ϕ → ϕ, and written A_A(x) = β_n A_a(x_n) as a vector in configuration space. As in Equation (18), a generic displacement can be expressed in terms of an expected drift plus a fluctuation, Δx^A = b^A Δt + Δw^A, but the drift velocity now includes a new term,

b^A = m^{AB} (ℏ ∂_B ϕ − A_B) .   (53)

The fluctuations Δw^A, Equation (20), remain unchanged.
A very significant feature of the transition probability P(x′|x) is its invariance under gauge transformations,

A_a(x_n) → A_a(x_n) + ∂χ(x_n)/∂x_n^a together with ϕ(x) → ϕ(x) + (1/ℏ) Σ_n β_n χ(x_n) .   (54)

Note that these transformations are local in space. (The vector potential A_a(x) and the gauge function χ(x) are functions in space.) They can be written in the N-particle configuration space,

A_A → A_A + ∂_A χ̄ and ℏϕ → ℏϕ + χ̄ ,   (55)

where

χ̄(x) = Σ_n β_n χ(x_n) .
The accumulation of many small steps is described by a Fokker–Planck equation which can be written either as a continuity equation, Equation (22), or in its Hamiltonian form, Equation (27). As might be expected, the current velocity v^A, Equation (25), and the ensemble Hamiltonian H̃[ρ, Φ], Equation (28), must be suitably modified,

v^A = m^{AB} (∂_B Φ − A_B) ,   (56)

and

H̃[ρ, Φ] = ∫ dx [ (1/2) ρ m^{AB} (∂_A Φ − A_A)(∂_B Φ − A_B) + ρ V + (ℏ²/8) m^{AB} (∂_A ρ ∂_B ρ)/ρ ] .   (57)

As a shortcut here, we have adopted the same functional F[ρ] motivated by information geometry, Equation (44), and set ξ = ℏ²/8. The new FP equation now reads,

∂_t ρ = −∂_A [ ρ m^{AB} (∂_B Φ − A_B) ] = δH̃/δΦ .   (62)

The requirement that H̃ be conserved for arbitrary initial conditions amounts to imposing the second Hamilton equation, Equation (31), which leads to the Hamilton–Jacobi equation,

−∂_t Φ = (1/2) m^{AB} (∂_A Φ − A_A)(∂_B Φ − A_B) + V − (ℏ²/2) m^{AB} (∂_A ∂_B ρ^{1/2})/ρ^{1/2} .   (63)

Finally, we combine ρ and Φ into a single wave function, Ψ = ρ^{1/2} e^{iΦ/ℏ}, to obtain the Schrödinger equation,

iℏ ∂_t Ψ = (1/2) m^{AB} (−iℏ∂_A − A_A)(−iℏ∂_B − A_B) Ψ + V Ψ .   (64)
We conclude with two comments. First, the covariant derivative in Equation (64) can be written in the standard notation,

(1/2) m^{AB} (−iℏ∂_A − A_A)(−iℏ∂_B − A_B) Ψ = Σ_n (1/2m_n) (−iℏ∇_n − β_n A(x_n))² Ψ ,   (65)

which is to be compared with the usual form

Σ_n (1/2m_n) (−iℏ∇_n − (e_n/c) A(x_n))² Ψ ,   (66)

where e_n is the electric charge of particle n and c is the speed of light. Comparing Equation (65) with Equation (66) allows us to interpret the Lagrange multipliers in terms of the electric charges, β_n = e_n/c. Thus, in ED electric charge is essentially a Lagrange multiplier that regulates the response to the external electromagnetic potential A_a.
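The gauge invariance discussed above can be verified numerically: under A → A + ∂χ and Ψ → exp(ieχ/ℏc)Ψ, both the density and the gauge-covariant current are unchanged. A one-dimensional sketch (ℏ = m = c = e = 1; the wave function, vector potential, and gauge function are all illustrative):

```python
import numpy as np

hbar = m = c = e = 1.0
x = np.linspace(-5.0, 5.0, 1001)
dx = x[1] - x[0]

psi = np.exp(-x**2 / 2) * np.exp(1j * 0.3 * x)   # illustrative wave function
A = 0.5 * np.sin(x)                               # illustrative vector potential
chi = 0.2 * x**2                                  # illustrative gauge function

def grad(f):
    return np.gradient(f, dx)

def current(psi, A):
    """Gauge-covariant current j = Re[psi* (-i hbar d/dx - (e/c) A) psi] / m."""
    return np.real(np.conj(psi) * (-1j * hbar * grad(psi) - (e / c) * A * psi)) / m

psi_g = np.exp(1j * e * chi / (hbar * c)) * psi   # gauge-transformed wave function
A_g = A + grad(chi)                               # gauge-transformed potential

d_rho = np.max(np.abs(np.abs(psi_g)**2 - np.abs(psi)**2))
d_j = np.max(np.abs(current(psi_g, A_g) - current(psi, A)))
print(d_rho, d_j)
```

The density is invariant exactly (the gauge factor is a pure phase) and the covariant current is invariant up to the discretization error of the finite-difference gradient.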
The second comment is that the derivation above is limited to static external potentials, A_a(x) and V(x), so that energy is conserved. This limitation is easily lifted. For time-dependent potentials the relevant energy condition must take into account the work done by the external sources: we require that the energy increase at the rate

dH̃/dt = ∂H̃/∂t ,

where the right-hand side is the explicit time dependence contributed by A_a(x, t) and V(x, t). The net result is that Equations (62)–(64) remain valid for time-dependent external potentials.
10. Some Remarks and Conclusions
Are there any new predictions?—Our goal has been to derive dynamical laws, and in particular quantum theory, as an example of entropic inference. This means that, to the extent that we succeed and derive quantum mechanics and not some other theory, we should not expect predictions that deviate from those of the standard quantum theory — at least not in the nonrelativistic regime discussed here. However, the motivation behind the ED program lies in the conviction that it will eventually allow us to extend it to other realms, such as gravity or cosmology, where the status of quantum theory is more questionable.
The Wallstrom objection—An important remaining question is whether the Fokker–Planck and the generalized Hamilton–Jacobi equations, Equations (27) and (31), are fully equivalent to the Schrödinger equation. This point was first raised by Wallstrom [40,41] in the context of Nelson’s stochastic mechanics [25] and concerns the single- or multi-valuedness of phases and wave functions. Briefly, the objection is that stochastic mechanics leads to phases Φ and wave functions Ψ that are either both multi-valued or both single-valued. Both alternatives are unsatisfactory: quantum mechanics forbids multi-valued wave functions, while single-valued phases can exclude physically relevant states (e.g., states with non-zero angular momentum). Here we do not discuss this issue in any detail except to note that the objection does not apply once particle spin is incorporated into ED. (As shown by Takabayasi [42], a similar result holds for the hydrodynamical formalism.) The basic idea is that, as mentioned earlier, the drift potential ϕ should be interpreted as an angle. Then the integral of the phase over any closed path is ∮ dΦ = 2πℏν, which is precisely the quantization condition that guarantees that wave functions remain single-valued even for multi-valued phases.
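The quantization condition can be illustrated numerically: for a single-valued wave function Ψ = e^{iνθ} the phase is multi-valued, and accumulating its increments around a closed loop returns exactly 2πν. A minimal sketch (ℏ = 1; the winding number ν is illustrative):

```python
import numpy as np

nu = 3                                       # illustrative winding number
theta = np.linspace(0.0, 2 * np.pi, 1000, endpoint=False)
Psi = np.exp(1j * nu * theta)                # single-valued wave function

# accumulate the phase increments of Psi around the closed loop
nxt = np.r_[1:len(Psi), 0]                   # index of each point's successor
loop_integral = np.sum(np.angle(Psi[nxt] / Psi))   # equals 2*pi*nu
print(loop_integral / (2 * np.pi))
```

Because each local increment is small, no individual step is ambiguous modulo 2π, yet the accumulated total records the integer winding that a naively single-valued phase would miss.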
Epistemology vs. ontology—Dynamical laws have been derived as an example of entropic dynamics. In this model “reality” is reflected in the positions of the particles and our “limited information about reality” is represented in the probabilities as they are updated to reflect the physically relevant constraints.
Quantum non-locality—ED may appear classical because no quantum probabilities were introduced. But this is not so. Probabilities, in this approach, are neither classical nor quantum; they are tools for inference. Phenomena that would normally be considered non-classical, such as non-local correlations, are the natural result of including the quantum potential term in the ensemble Hamiltonian.
On dynamical laws—Action principles are not fundamental; they are convenient ways to summarize the dynamical laws derived from the deeper principles of entropic inference. The requirement that an energy be conserved is an important piece of information (i.e., a constraint) which will probably receive its full justification once a completely relativistic extension of entropic dynamics to gravity is developed.
On entropic vs. physical time—The derivation of laws of physics as examples of inference led us to introduce the informationally motivated notion of entropic time, which includes assumptions about the concepts of instant, simultaneity, ordering, and duration. It is clear that entropic time is useful, but is this the actual, real, “physical” time? The answer is yes. By deriving the Schrödinger equation (from which we can obtain the classical limit) we have shown that the t that appears in the laws of physics is entropic time. Since these are the equations that we routinely use to design and calibrate our clocks, we conclude that what clocks measure is entropic time. No notion of time that is in any way deeper or more “physical” is needed. Most interestingly, the entropic model automatically includes an arrow of time.