1. Introduction
The brain’s perception, body movement, and learning are conjointly organized to ensure the homeostasis and adaptive fitness of the embodied agent in the environment. It is tempting to imagine a neural observer in the brain presiding over the cognitive control of higher animals. Such a homunculus idea is untenable and must be discarded in light of the present-day brain theory [
1]. However, we are still far from a complete scientific understanding of the higher-order functions that emerge from brain matter; such an understanding demands grappling with the profound interplay between the standpoints of scientific reductionism and teleological holism [
2,
3].
The brain-inspired free-energy principle (FEP) is a purposive theory that bridges the gap between top-down teleology and bottom-up scientific constructionism. According to the FEP [
4,
5], all living systems are self-organized such that they tend to avoid atypical states of their environmental niche in order to persist. The FEP adopts the autopoietic hypothesis [6] and scientifically formalizes the abductive rationale that organisms form optimal predictions and actions from uncertain sensory data. To be precise, the FEP suggests an information-theoretic
variational measure of environmental atypicality, termed
free energy (FE). The FE objective is technically defined as a functional of two densities: a probabilistic generative density, which specifies the brain's internal model of how sensory data are generated and how the environment changes, and an online auxiliary density, which actuates variational inference. The inference proceeds in the following sense: the Bayesian brain computes the posterior of the environmental causes of sensory data by minimizing the FE; a detailed continuous-state description can be found in [
7]. For discrete-state models of the FEP with discrete time, we recommend [
8,
9] to readers. FE minimization can also be read as self-evidencing [10]: this follows from the fact that, as we will see below, the negative FE furnishes a lower bound on the log marginal likelihood of sensory inputs, where the marginal likelihood can be read as Bayesian model evidence for a generative density or model.
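For reference, the standard decomposition behind this bound reads, in generic notation of our own (s: sensory data, ϑ: environmental causes, q: variational density, p: generative density):
\[
F[q] \;=\; \int q(\vartheta)\,\ln\frac{q(\vartheta)}{p(s,\vartheta)}\,d\vartheta
\;=\; -\ln p(s) \;+\; D_{\mathrm{KL}}\!\left[\,q(\vartheta)\,\big\|\,p(\vartheta\mid s)\,\right]
\;\ge\; -\ln p(s),
\]
so minimizing the FE simultaneously tightens the bound on the surprisal, -ln p(s), and drives the variational density toward the posterior.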
When a Gaussian probability is employed for the variational density [
11], the FE becomes an L2 (squared-error) norm specified by the Gaussian means and variances, termed the Laplace-encoded FE [7]. Thus, the Laplace-encoded FE provides a principled scientific basis for the L2 objectives that are widely used in machine learning and artificial intelligence. For instance, the optimization function in the predictive-coding framework is proposed to be a sum of the squared (precision-weighted) prediction errors [
12]. In addition, the loss function of a typical artificial neural network (ANN) is often written as a sum of squared differences between the ground truth and the predictive entries from the network [
13]. Furthermore, it is argued that Gaussian sufficient statistics are encoded by biophysical brain variables, which form the brain's low-dimensional representations of environmental states. In this way, the brain gains access to the encoded FE for minimization, since the FE becomes fully specified in terms of the brain's internal states.
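As a minimal illustration of this quadratic structure (our notation; the paper's exact expression may differ in details such as log-determinant terms), the Laplace-encoded FE for a Gaussian variational density with mean μ reduces to a sum of precision-weighted squared prediction errors,
\[
F(\mu;s) \;\approx\; \tfrac{1}{2}\,\bigl(s-g(\mu)\bigr)^{\!\top}\Pi_{s}\,\bigl(s-g(\mu)\bigr)
\;+\; \tfrac{1}{2}\,\bigl(\mu-\eta\bigr)^{\!\top}\Pi_{\mu}\,\bigl(\mu-\eta\bigr) \;+\; \mathrm{const},
\]
where g(μ) is the generative prediction of the data, η the prior mean, and Π_s, Π_μ the corresponding precisions.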
Our research over the years has been devoted to developing a continuous-state implementation of FE minimization in a manner guided by physical laws and principles [
14,
15,
16]. We have endeavored to advance the FEP to the point where it coalesces into a unified principle of top-down architecture and material base. Moreover, to extend the FEP to nonstationary problems, we incorporated the fact that the physical brain operates in a nonequilibrium (NEQ) steady state and is continually driven by nonstationary sensory stimuli. The functional brain must perform the variational Bayesian inversion of nonstationary sensory data to compute the posterior mentioned above. Previously, we accounted for the brain's behaviors of perception and motor control, as described by attractor dynamics, and termed the governing equations
Bayesian mechanics (BM). The BM coordinates the brain’s sensory estimation and motor prediction in neural phase space. In this paper, we make further progress by incorporating the brain’s
synaptic learning into the BM, which we did not accommodate in our earlier studies. Learning constitutes the crucial brain function of consolidating memory (e.g., via Hebbian plasticity) [
17].
In applications of the FEP to generative models, one usually makes a distinction between inference and learning, namely inferring time-dependent states and time-independent parameters, respectively. In this paper, we treat learning as inference by assuming that connection weights (i.e., model parameters) are time-dependent, thereby equipping them with dynamics that are coupled to the state dynamics (i.e., changes in synaptic weights are coupled to changes in synaptic activity and vice versa). On this view, we are concerned with optimizing fluctuations in synaptic plasticity, as opposed to long-term potentiation or memory (in which changes in synaptic weights develop very slowly in relation to synaptic activity).
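Schematically, and in our own notation rather than the paper's, treating the weights as inferred states means working with coupled dynamics of the form
\[
\dot{x} = f(x,w;s) + \xi_{x}, \qquad \dot{w} = h(x,w;s) + \xi_{w},
\]
where x is the postsynaptic state, w the synaptic weight, s the presynaptic input, ξ the random fluctuations, and f, h the respective drifts; inference then targets the joint trajectory (x(t), w(t)) rather than the states alone.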
This study aims to provide a simple but insightful formulation of synaptic learning in the Bayesian brain. Our working premises are that the functional brain operates continuously using continuous environmental representations and that synaptic learning is a cognitive phenomenon that may very well be understood when guided by statistical-physical laws. The notion of cognition throughout this study describes the brain's higher-order capability that involves a top-down internal model. We consider the NEQ brain a problem-solving matter that cognitively interacts with the environment. To quantify synaptic cognition, we specify the generative densities furnishing the Laplace-encoded FE in a manner that respects the NEQ stationarity and present an FE minimization scheme by applying the principle of least action (Hamilton's principle) [
18]. The novel contributions derived in this study are discussed in
Section 7.
In summary, the current work shows how one can formulate variational (Bayesian) inference in the brain, under the FEP, in terms of classical paths of least action. This provides a formal link between the functionalist interpretation of FE minimization as inference and learning and the biophysical implementation that can be described with noisy neuronal dynamics and the ensuing paths of least action. For clarity, we focus on a simple inference problem, namely the postsynaptic response to a presynaptic input. This minimal setup allows us to consider both the postsynaptic neuronal activity and the synaptic weight or efficacy of a generic neuron as encoding the sufficient statistics of (Bayesian) beliefs about the causes of sensory or presynaptic input. This example highlights the intimate relationship between synaptic activity and plasticity in subtending inference and learning, respectively. By dealing with continuous-state spaces and Gaussian random fluctuations, we effectively recover generalized predictive coding by applying Hamilton's principle of least action to noisy synaptic dynamics. Furthermore, the ensuing BM treatment can accommodate fast dynamics, thereby eschewing gradient-descent schemes in terms of generalized coordinates of motion and providing the basis for a direct link with NEQ thermodynamics.
The remainder of the paper is organized as follows: In
Section 2, the single-synapse structure of interest is described. The essence of the FEP is recapitulated, with the revisions needed for synaptic learning, in
Section 3. In
Section 4, an NEQ formulation is presented, which determines the likelihood and prior densities in the physical brain. Next, in
Section 5, the FE objective is identified as a classical action, and the governing equations of synaptic dynamics are derived by applying Hamilton's principle. The utility of our theory is demonstrated in
Section 6, using a simple model. After the discussion in
Section 7, a conclusion is given in
Section 8.
5. Bayesian Mechanics: Computability of Synaptic Learning
The synaptic FE landscape is generally non-static, as the presynaptic input s in the generative densities is explicitly time-dependent. Thus, the usual gradient-descent (GD) implementation on the FE landscape is expected to fail. Accordingly, we formulate the brain's computability under nonstationary conditions, facilitating nonautonomous neural computation.
Here, we substitute the Onsager–Machlup path probabilities [Equations (
11) and (
13)] into Equation (
7), and obtain the mathematical expression for the Laplace-encoded synaptic FE, given in Equation (14), whose integrand, an effective Lagrangian, is expressed in Equation (15).
Please note that in Equation (15), we concretely displayed the autonomous dependence on the postsynaptic state variable and the weight w, and the nonautonomous dependence (explicit time-dependence) on the input s through the generative functions f and h.
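To fix ideas, the expected structure of Equations (14) and (15) can be sketched as follows (our notation; we assume Langevin dynamics with Gaussian noise of strengths σ_x² and σ_w² and omit Jacobian and constant terms):
\[
F \;=\; \int_{0}^{t} L\bigl(x,\dot{x},w,\dot{w};\,s(t')\bigr)\,dt', \qquad
L \;\simeq\; \frac{1}{2\sigma_{x}^{2}}\bigl(\dot{x}-f(x,w;s)\bigr)^{2}
\;+\; \frac{1}{2\sigma_{w}^{2}}\bigl(\dot{w}-h(x,w;s)\bigr)^{2}.
\]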
Equation (14) manifests a specific association of the FE objective F with the integrand of Equation (15); namely, F is given as a time-integral of an effective Lagrangian. This observation is reminiscent of the relation between the action and the Lagrangian in classical mechanics [18]. Accordingly, by analogy, if we identify F as an effective action and the integrand as an effective Lagrangian for the brain's cognitive computation, then FE minimization, which under the FEP is mathematically performed by rendering F stationary, is precisely mapped onto exercising Hamilton's principle. The Euler-Lagrange equations of motion determining the optimal trajectories of the state and weight variables then follow in a straightforward manner, constituting the synaptic BM. Note the temperature dependence of the Lagrangian [Equation (15)] via the noise strengths in Equation (10), which makes it a thermal Lagrangian [28]. See also [31] for a path-integral formulation in generalized coordinates of motion.
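In this reading, FE minimization is the stationarity condition δF = 0, and the corresponding Euler-Lagrange equations (written in the same sketch notation as above) are
\[
\frac{d}{dt}\frac{\partial L}{\partial \dot{x}} \;=\; \frac{\partial L}{\partial x}, \qquad
\frac{d}{dt}\frac{\partial L}{\partial \dot{w}} \;=\; \frac{\partial L}{\partial w},
\]
which jointly govern the optimal state and weight trajectories.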
Here, working in the Hamiltonian description is more suitable for our purposes. To this end, we carried out a Legendre transformation of the effective Lagrangian to derive an effective Hamiltonian; the outcome is given in Equation (16). In the resulting expression, two new variables appear that are mechanically conjugate to the state variable and the weight w, respectively; they are determined from the definitions in Equation (17). Additionally, two constants were defined, which are measures of the respective precisions of the probabilistic generative models, Equations (11) and (13). Equation (10) suggests that the generative precisions are biophysical constants specified by the body temperature and the friction of the brain matter. A few points about the Hamiltonian are noteworthy: the state and weight variables and their conjugate partners correspond to positions and momenta, respectively, and the generative precisions may be interpreted, metaphorically, as a neural mass. The Hamiltonian cannot be decomposed into kinetic and potential energies, because the third and fourth terms on the RHS of Equation (16) are given as products of momentum and position variables. Moreover, the Hamiltonian does not furnish a conservative energy surface, owing to its explicit time-dependence through the presynaptic signal s.
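For the quadratic Lagrangian sketched above (again, our notation, not necessarily the paper's exact Equation (16)), the Legendre transformation gives
\[
p_{x} \equiv \frac{\partial L}{\partial \dot{x}} = \frac{\dot{x}-f}{\sigma_{x}^{2}}, \qquad
p_{w} \equiv \frac{\partial L}{\partial \dot{w}} = \frac{\dot{w}-h}{\sigma_{w}^{2}}, \qquad
H \;=\; p_{x}\dot{x}+p_{w}\dot{w}-L
\;=\; \frac{\sigma_{x}^{2}}{2}\,p_{x}^{2} + \frac{\sigma_{w}^{2}}{2}\,p_{w}^{2} + p_{x}\,f + p_{w}\,h,
\]
which reproduces the features noted in the text: the inverse noise strengths play the role of a neural mass for the momentum terms, the third and fourth terms are momentum-position products, and the explicit time-dependence enters through s(t) in f and h.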
The generative functions f and h for synaptic learning were introduced in Equations (8) and (12) without specifying them; they are the biophysical forces driving synaptic dynamics at the neuronal level. We now specify them with the models given in Equations (19) and (20). The first terms on the RHSs, involving the damping coefficients, prevent the unlimited growth of the state variable and the weight w [32]. The linear damping models may be replaced with a nonlinear alternative; for instance, a suitably modified damping term may be used in Equation (20) [33]. The second term on the RHS of Equation (19) describes the presynaptic input weighted by w. Moreover, the second term in Equation (20) accounts for Hebb's rule; one can explore anti-Hebbian learning by inverting its sign. The extra parameters appearing in the damping terms are the steady-state values of the state variable and the weight w, respectively, in the absence of the driving terms. After substituting Equations (19) and (20) into Equation (15) and evaluating Equation (17), one can determine the neural representations of the momenta; the results are given in Equations (21) and (22). Please note that each momentum represents the discrepancy between the state rate and its prediction from the generative model, which corresponds to a (precision-weighted) prediction error in predictive-coding theory (see the discussion in Section 7).
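One concrete reading consistent with the description above (illustrative only; the coefficients γ_x, γ_w, x_0, w_0, and λ are our labels, and the paper's exact forms may differ) is
\[
f(x,w;s) = -\gamma_{x}\,(x-x_{0}) + w\,s, \qquad
h(x,w;s) = -\gamma_{w}\,(w-w_{0}) + \lambda\,x\,s,
\]
with w s the weighted presynaptic drive and λ x s the Hebbian (pre-times-post) term; the corresponding momenta then read
\[
p_{x} = \frac{1}{\sigma_{x}^{2}}\bigl(\dot{x}-f\bigr), \qquad
p_{w} = \frac{1}{\sigma_{w}^{2}}\bigl(\dot{w}-h\bigr),
\]
i.e., precision-weighted prediction errors on the state and weight dynamics.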
Having specified the synaptic Hamiltonian given in Equation (
16), we now derive Hamilton’s equations of motion by following the standard procedure [
18]. We present only the outcome, without showing intermediate steps, in Equations (23)–(26). These are a set of coupled differential equations for four dynamical variables, namely the postsynaptic state, the weight w, and their two conjugate momenta, subject to the time-dependent input source s; together they constitute the synaptic BM governing the co-evolution of the state and weight variables. In Figure 2, we show the neural circuitry implied by the derived BM. We argue that the functional behavior depicted in the circuitry is generic to every synapse in the brain, much as every cortical column in the neocortex behaves as a sensorimotor system performing the same intrinsic function [
34].
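As a numerical illustration, the following minimal Python sketch integrates Hamilton's equations for the phase-space vector (x, w, p_x, p_w) under the illustrative drifts f and h given above; all parameter values, function names, and the input waveform are ours and merely exemplify the structure of Equations (23)–(26).

import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameters (not taken from the paper)
gx, gw = 2.0, 1.0        # damping coefficients of state and weight
x0, w0 = 0.0, 0.2        # steady-state values in the absence of driving terms
lam = 0.5                # Hebbian coupling strength
sx2, sw2 = 0.1, 0.05     # noise strengths (inverse generative precisions)

def s_in(t):
    """Nonstationary presynaptic input (illustrative)."""
    return 1.0 + 0.5 * np.sin(2.0 * np.pi * 0.2 * t)

def bm_rhs(t, psi):
    """Hamilton's equations for psi = (x, w, px, pw) under the sketched Hamiltonian."""
    x, w, px, pw = psi
    s = s_in(t)
    f = -gx * (x - x0) + w * s          # generative drift of the postsynaptic state
    h = -gw * (w - w0) + lam * x * s    # generative drift of the weight (Hebbian)
    dx = sx2 * px + f                   # dx/dt  =  dH/dpx
    dw = sw2 * pw + h                   # dw/dt  =  dH/dpw
    dpx = gx * px - lam * s * pw        # dpx/dt = -dH/dx
    dpw = -s * px + gw * pw             # dpw/dt = -dH/dw
    return [dx, dw, dpx, dpw]

sol = solve_ivp(bm_rhs, (0.0, 50.0), [0.1, 0.1, 0.0, 0.0], max_step=0.01)
print("final (x, w, px, pw):", sol.y[:, -1])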
For a more compact description, we shall define the cognitive state as a column vector in four-dimensional phase space, collecting the state, the weight, and their two conjugate momenta, where the superscript T denotes the transpose operation. Then, the preceding Equations (23)–(26) can be compactly expressed as the linear matrix equation given in Equation (27), specified by a coefficient matrix and an inhomogeneous source vector. Equation (27) can be formally integrated to yield the solution, in which the first term on the RHS is the homogeneous solution, propagated from the initial condition, and the second term is the inhomogeneous solution, driven by the source. The formal solution represents a continuous path in four-dimensional phase space, which variationally optimizes the FE objective [Equation (14)]. Please note that the trace of the coefficient matrix vanishes identically; accordingly, the sum of its eigenvalues must equal zero, which we use as a consistency condition in the numerical calculation presented in Section 6. In addition, when the presynaptic signal is constant or saturates in time, the fixed point of the dynamics can be obtained analytically.
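A short companion sketch (same illustrative parameters as above, with a constant input) builds the coefficient matrix and source vector of the compact form, verifies the traceless property, and computes the fixed point; again, the matrix entries follow from our illustrative drifts, not from the paper's exact Equations (23)–(26).

import numpy as np

# Same illustrative parameters as in the previous sketch
gx, gw, lam = 2.0, 1.0, 0.5
x0, w0 = 0.0, 0.2
sx2, sw2 = 0.1, 0.05
s = 1.0                                  # constant (saturated) presynaptic input

# d(psi)/dt = M psi + b for psi = (x, w, px, pw) under the sketched Hamiltonian
M = np.array([
    [-gx,      s,    sx2,   0.0],
    [lam * s, -gw,   0.0,   sw2],
    [0.0,     0.0,   gx,   -lam * s],
    [0.0,     0.0,  -s,     gw],
])
b = np.array([gx * x0, gw * w0, 0.0, 0.0])

print("trace(M):", np.trace(M))                       # vanishes identically
print("eigenvalue sum:", np.linalg.eigvals(M).sum())  # consistency check (~0)
psi_star = -np.linalg.solve(M, b)                     # fixed point: M psi* + b = 0
print("fixed point (x*, w*, px*, pw*):", psi_star)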
7. Discussion
The idea that the brain is a neural observer (a reinforcer, or problem-solving matter) capable of perceiving, learning, and acting on the external and internal milieus is implicit in the brain-inspired FEP; this renders the FEP a purposive theory. On the other hand, brain functions must emerge from the brain matter in a manner that obeys physical laws and principles; the neural substrates afford the biological base for the brain's higher-order capabilities. Thus, it is important to recognize that the working FE objective is not a single measure but, instead, an architecture hybridizing the teleological rationale and biophysical realities.
In this study, we continued our endeavor toward a continuous-state implementation of the FEP as a universal biological principle. Specifically, we applied our theory to synaptic learning and exemplified the learning dynamics as an inference problem under the FEP. The noteworthy contributions of our effort are discussed below:
(i) Equation (
14) is the FE objective in our theory; it suggests that the FE conducts itself as a classical (cognitive) action in Hamilton's principle. We obtained this result by deriving the Onsager-Machlup representations for the NEQ generative densities and inserting them into the Laplace-encoded FE. In our previous studies [14,15,16], in contrast, the cognitive action was identified as a time-integral of the FE under the ergodic assumption. Ergodicity asserts that the ensemble average of surprisal (i.e., the negative log sensory evidence) equals the corresponding temporal average; however, it is difficult to justify this assumption for the brain. In the present work, we bypassed the ergodic assumption by using the more physics-grounded NEQ densities and avoided employing the generalized coordinates of motion; this grounds FEP computations in the physics of stochastic dynamical systems under NEQ conditions.
(ii) The weight variables change over time due to biophysical factors, such as the opening of channels at the synapse through which neurotransmitters are transferred in a complex, time-dependent manner. Accordingly, we treated the synaptic weight w as a dynamical variable co-evolving with the state variable in completing the synaptic BM. In contrast, the weights are handled as static parameters in the widely utilized ANNs in machine learning. Furthermore, in ANN frameworks, a nonlinear activation scheme, such as the sigmoidal function or the ReLU (rectified linear unit), rectifies the network output value [
13]. Our biophysics-informed treatment does not use engineering manipulation to regulate the outcome; instead, the learning smoothly follows the continuous BM. We add that one may employ different biophysical models from our Langevin dynamics, such as Izhikevich neurons [
36] at the neuronal level or neural field models on a mesoscopic scale [
37], and apply our framework to derive a desired BM.
(iii) The momentum representations we derived [see Equations (
21) and (22)] match the theoretical construct of prediction error in predictive-coding theory [
12,
38]. Empirical evidence of error neurons, which encode prediction errors, has recently been reported in the mouse auditory cortex [39]. Such a finding provides a neural basis for our theory. However, the differentiation between predictive and error units within a cortical column is still controversial, mainly because of insufficient electrophysiological recordings. Although there is no firm consensus, the compartmental-neuron hypothesis seems to suit this scenario of functional distinguishability [40,41]; it argues that pyramidal neurons in the cortex are functionally organized such that feedback and feed-forward signals are sent to the outer layers (L1) and the middle layers (L5), respectively. In this case, our state representations correspond to feedback channels via the apical dendrites, and our momentum representations correspond to feed-forward channels via the basal dendrites at the soma. The Hebbian sign in Equation (20) can be either positive or negative; one can implement spiking predictive coding with the former and dendritic predictive coding with the latter.
(iv) Data learning via ANNs has become a formidable scientific tool [
42], and much attention is drawn to theoretical questions on how and why they work [
43]. This paper suggested that the brain-inspired FEP underlies the widely used L2 objective in machine learning algorithms. The L2
minimization is typically implemented using GD with respect to the weights connecting layers, i.e., back-propagation of the input-output error in a feed-forward architecture. Strictly speaking, however, the validity of GD updating is limited to situations in which the inputs are static or quasi-static. For continuous nonstationary inputs, such as a video stream, a bidirectional recurrent NN (RNN) is employed [
44]; the RNN sends converted time-series inputs to a pre-structured deep network and performs GD by incorporating a feedback loop to predict the sequential outputs. Our BM formulation, in contrast, handles nonstationary learning in a genuinely continuous manner, offering a fresh perspective. The brain integrates the BM to learn a continuous optimal trajectory in neural phase space by minimizing the FE objective rather than producing a sequential output. We hope that our physics-guided approach will provide further useful insights into the practice of ANN methodologies in continuous time.