1. Introduction
Estimating the effective connectivity between neurons in the brain is not an easy task [1,2,3,4,5]. There are many ways to unveil causal relationships at multiple scales, from neurons to brain regions. Experiments with external stimuli are commonly used for this inference: the spikes and activity of one neuron are related to those of a second, connected neuron whenever the perturbation makes this relationship visible [6]. These procedures focus on the improvement in the prediction of the future activity of the second neuron (the receiver) obtained by incorporating information produced by the past activity of the first neuron (the sender of the perturbation), which is seen as a causal interaction between these neurons [7].
Admittedly, connectivity estimation is not straightforward due to the noisy nature of neuronal signals. Recordings of electrophysiological patterns in vitro and in vivo reveal that neuronal activity is highly irregular and difficult to predict [8,9,10]. Intrinsic variability is apparent in the response of neurons, even to frozen stimulation [11,12]. Experimental data suggest that neurons, synapses, and the network system operate in an inherently stochastic framework [13,14,15]. Accordingly, the mathematical description of neuronal phenomena can be treated in probabilistic terms, i.e., describing the process of spiking as a stochastic process.
Determining which stochastic process is more suitable is a matter of debate. It is, however, reasonable to consider that the probability of a neuron spiking is conditioned by the knowledge of its past temporal response. Hence, this probability is greater the further in the past the last spike of the neuron in question is. This implies that the stochastic process that models the activity of this neuron is not Markovian with full memory, as shown by some works in the literature [16,17,18]. The activity of a neuron could, therefore, be reasonably modeled by a stochastic process whose dependence on the past is variable in scope.
The class of Markov chains with variable-length memory became popular in the statistical and probabilistic community with the work of [19]. The processes in this class are still Markovian of finite order, but their transition probabilities do not depend on a fixed number of past states; rather, they take into account the dependency structure present in the data. These relevant sequences of past states are called contexts, and the set of contexts can be represented as a rooted tree, namely a context tree. When considering a variable-length memory, we obtain more informative models that are flexible and parsimonious compared to Markov chains with full memory.
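To illustrate the idea of contexts, the following Python sketch (the values and tree structure are ours, purely illustrative) encodes a small context tree for a binary spike train whose spiking probability depends only on the time elapsed since the last spike, up to a memory of three bins.

```python
# Toy context tree for a binary chain with memory at most 3: the spiking
# probability depends on the past only through the suffix that reaches back
# to the last spike. Contexts are written in chronological order
# (leftmost = oldest, rightmost = most recent bin).
context_tree = {
    (1,): 0.1,        # the most recent bin had a spike
    (1, 0): 0.3,      # spike followed by one silent bin
    (1, 0, 0): 0.5,   # spike followed by two silent bins
    (0, 0, 0): 0.7,   # three or more silent bins since the last spike
}

def spike_probability(past):
    """Return P(next symbol = 1 | past) by matching the longest suffix of the
    past (at least the last 3 bins must be available) that is a context."""
    for length in range(len(past), 0, -1):
        suffix = tuple(past[-length:])
        if suffix in context_tree:
            return context_tree[suffix]
    raise ValueError("no context matches the given past")

print(spike_probability([0, 1, 0, 0]))  # -> 0.5: two silent bins since the last spike
```

Note that the spiking probability increases with the time elapsed since the last spike, in line with the discussion above, while the number of past states actually used varies from context to context.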
Given a trajectory of the Markov chain with variable-length memory, we can estimate transition probabilities using, for example, a plug-in estimator. A way of estimating connectivity and disambiguating spurious correlations from actual connections is by inferring the information that flows from one neuronal spike train to another. For this, we can use information-theoretic measures [20,21,22], which are functions of these transition probabilities. Thus, the estimation of the transition probabilities is essential. In this work, we consider the modeling of neuronal spike trains by way of Markov chains with variable-length memory and use transfer entropy to understand the transmission of information between neurons over a finite time interval.
Transfer entropy (TE), an information-theoretic measure for quantifying time-directed information transfer between joint processes, was proposed by [20] and independently by [23] as an effective measure of causality. A closely related concept that measures information transport is the transfer entropy rate (TER). These measures can quantify the strength and direction of coupling between simultaneously observed systems [24,25]. Consequently, TE and TER are widely used in neuroscience today to assess connectivity from neuronal datasets [3,26,27,28,29,30,31,32,33]. In this sense, these measures allow us to study both linear and nonlinear causality relations between neuronal spike trains, described as discrete random processes. In this article, we are interested in the application of these measures to the detection of effective connectivity between neurons with variable-length memory. In other words, we aim to test for the absence of causal influence between neurons. Under appropriate conditions, the null hypothesis, which corresponds to the absence of causal influence, is equivalent to the requirement that the transfer entropy rate is equal to zero [34].
To test the statistical significance of a connectivity value and determine whether connectivity is detected, we use the plug-in estimator for the transfer entropy rate, which is identified with the log-likelihood ratio test statistic for the desired test. According to [34,35], this statistic is asymptotically $\chi^2$ distributed under the null hypothesis, facilitating the computation of p-values when used on simulated data. In this work, the test is employed in the analysis of spike trains simulated from a space-time framework inspired by the Galves and Löcherbach model [36], which is built on the simple and biologically plausible assumption that the membrane potential of each neuron is reset every time it spikes. The authors construct a stationary version of the process using probabilistic tools and obtain an explicit upper bound for the correlation between successive inter-spike intervals. This enables the application of the proposed statistical test to samples generated from this model. The effectiveness of the resulting hypothesis test is illustrated on these simulated data, where it identifies interesting and biologically relevant information.
The problem of testing effective connectivity between neurons based on transfer entropy has been considered in the literature using surrogate data [3,37]. In general, generating surrogate data with the same properties as the original data but without dependencies between signals is difficult. Feasibility thus emerges when sufficiently large samples can be collected; when, in addition, the test statistic has a known asymptotic distribution, parametric tests become a good alternative. Recently, parametric tests have been used to detect connectivity between neurons using a test statistic based on the plug-in estimator of directed information, assuming that the bivariate process is Markovian with full memory [38,39]. To the best of our knowledge, this is the first time that testing for causality using transfer entropy has been performed in the more general scenario of Markov chains with variable-length memory, based on a transfer entropy rate plug-in estimator. Thus, this work complements existing studies on transfer entropy estimation and effective connectivity detection between neurons.
The remainder of this article is organized as follows. In the next section, we establish our notations and review preliminary definitions and concepts, particularly those concerning the neuronal network model, transfer entropy, and the estimation of the transfer entropy rate. Section 3 introduces the hypothesis test we use to detect causal influence between stochastic neurons. In Section 4, we apply transfer entropy to the identification of effective connectivity between a pair of stochastic neurons using synthetic data generated from the random network model described in Section 2. Lastly, we end this article with our conclusions in Section 5.
2. Notations, Definitions, and Preliminary Notions
In this paper, we denote random variables by uppercase letters, stochastic chains by uppercase bold letters, and the specific values assumed by them by lowercase letters. Calligraphic letters denote the alphabets in which random variables take values. Subscripts denote the outcome’s position in a sequence; for example, $X_t$ generally indicates the outcome of the process $\mathbf{X}$ at time $t$. For any integers $j$ and $k$ such that $j \leq k$, we use the notation $x_j^k$ for finite sequences $(x_j, \ldots, x_k)$, $x_{-\infty}^k$ for left-infinite sequences $(\ldots, x_{k-1}, x_k)$, and $x_j^{\infty}$ for right-infinite sequences $(x_j, x_{j+1}, \ldots)$. We adopt the convention that if $j > k$, then $x_j^k$ is the empty sequence. We use analogous notations for sequences of random variables.
2.1. Neuronal Spike Trains as Stochastic Processes
Throughout this paper, we assume that we record the neuronal activity over a finite time horizon. The sequence of times at which an individual neuron in the nervous system generates an action potential is termed a spike train. It is useful to consider the times of spike occurrence with a certain degree of accuracy, which is called the bin size [40]. In this sense, the bin size refers to the duration of time over which neural activity is aggregated, or binned, for analysis. For a small enough bin size (10 ms is a typical choice), the spike train may be represented as a binary sequence $x_1, x_2, \ldots$, where $x_t \in \{0,1\}$ for every $t \geq 1$, with $x_t = 1$ indicating that at least one spike occurred in bin $t$. The appropriate bin size to use depends on the specific experimental design and the characteristics of the data being analyzed. In general, the bin size is chosen to strike a balance between capturing relevant details of the neuronal activity and having sufficient statistical power. This typically involves selecting a bin size that is small enough to capture important features of the data but not so small that the resulting spike counts are noisy or unreliable.
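To make the binning step concrete, the following Python sketch (function and variable names are ours, not taken from the article) converts a list of spike times into the binary representation described above; it assumes a 10 ms bin and collapses multiple spikes falling in the same bin into a single 1.

```python
import numpy as np

def bin_spike_train(spike_times_ms, duration_ms, bin_size_ms=10.0):
    """Convert spike times (in ms) into a binary sequence x_1, ..., x_n.

    Each entry is 1 if at least one spike falls in the corresponding bin,
    and 0 otherwise.
    """
    n_bins = int(np.ceil(duration_ms / bin_size_ms))
    x = np.zeros(n_bins, dtype=np.int8)
    bin_indices = (np.asarray(spike_times_ms) // bin_size_ms).astype(int)
    bin_indices = bin_indices[bin_indices < n_bins]  # discard spikes beyond the horizon
    x[bin_indices] = 1                               # multiple spikes in a bin collapse to 1
    return x

# Example: three spikes recorded over a 100 ms horizon with 10 ms bins
print(bin_spike_train([12.3, 18.9, 74.0], duration_ms=100.0))  # -> [0 1 0 0 0 0 0 1 0 0]
```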
Recordings of neuronal activity reveal irregular spontaneous activity of neurons and variability in their response to the same stimulus [41,42,43,44,45]. Thus, the experimental data suggest that spike trains should be modeled from a probabilistic point of view. In this context, and to give a probability measure describing the process of spiking as a sequential process, we assume that the activity of a neuron is described by a discrete-time homogeneous stochastic chain $\mathbf{X} = (X_t)$ defined on a suitable probability space, where $X_t \in \{0,1\}$ for every time bin $t$.
In this paper, we assume that the sample spike train is generated by a stochastic source. This means that at each bin, conditional on the whole past, there is a fixed probability of obtaining a spike. Neurons exhibiting this characteristic share similar biophysical properties and are collectively referred to as stochastic neurons.
The randomness introduced by stochastic neurons can be useful in training neural network models because it can help prevent overfitting and improve the network’s ability to generalize to new data. In this work, we are interested in detecting the effective connectivity between a pair of stochastic neurons using synthetic data generated from such random network models.
2.2. Neuronal Network Model
Let $I$ be a finite set of neurons, and assume that the bins are indexed by the set $\mathbb{Z}$. In this context, the network of neurons is described by a discrete-time homogeneous stochastic chain $\mathbf{X} = \{X_t(i) : i \in I, \, t \in \mathbb{Z}\}$. For each neuron $i \in I$ at each bin $t \in \mathbb{Z}$,
$$X_t(i) = \begin{cases} 1, & \text{if neuron } i \text{ spikes at bin } t,\\ 0, & \text{otherwise.} \end{cases}$$
Moreover, whenever we say time $t$, it should be interpreted as time bin $t$. For notational convenience, we write the configuration of $\mathbf{X}$ at time $t$ as $X_t = (X_t(i), \, i \in I)$ and the path of $\mathbf{X}$ associated with neuron $i$ as $X_j^k(i) = (X_j(i), \ldots, X_k(i))$. We use analogous notation for the observed configuration $x_t$ of $\mathbf{X}$ at time $t$ and the observed path $x_j^k(i)$ associated with a neuron $i \in I$.
In what follows, $P$ denotes the law of the neuronal network $\mathbf{X}$. In this network, the stochastic chain $\mathbf{X}$ has the following dynamics. At each time step, conditional on the whole past, neurons update independently from each other, i.e., for any $t \in \mathbb{Z}$ and any choice $x_t(i) \in \{0,1\}$, $i \in I$, we have
$$P\big(X_t = x_t \mid X_{-\infty}^{t-1} = x_{-\infty}^{t-1}\big) = \prod_{i \in I} P\big(X_t(i) = x_t(i) \mid X_{-\infty}^{t-1} = x_{-\infty}^{t-1}\big), \qquad (1)$$
where $x_{-\infty}^{t-1}$ is a left-infinite configuration of $\mathbf{X}$.
Moreover, the probability that neuron $i$ spikes at bin $t$, conditional on the whole past, is an increasing function of its membrane potential. In other words, for each neuron $i \in I$ at any $t \in \mathbb{Z}$,
$$P\big(X_t(i) = 1 \mid X_{-\infty}^{t-1} = x_{-\infty}^{t-1}\big) = \phi_i\big(V_t(i)\big), \qquad (2)$$
where $V_t(i)$ denotes the membrane potential of neuron $i$ at time $t$ and $\phi_i$ is an increasing function called the spiking rate function.
The membrane potential of a given neuron is affected by the actions of all other neurons interacting with it. More precisely, the membrane potential of a given neuron $i$ depends on the influence received from its presynaptic neurons since its last spiking time. In this sense, the probability of neuron $i$ spiking increases monotonically with its membrane potential. Whenever neuron $i$ fires, its membrane potential is reset to a resting value, and at the same time, postsynaptic current pulses are generated, modifying the membrane potential of all its postsynaptic neurons. When a presynaptic neuron $j$ fires, the membrane potential of neuron $i$ changes. The contribution of neuron $j$ to the membrane potential of neuron $i$ is either excitatory or inhibitory, depending on the sign of the synaptic weight of neuron $j$ on neuron $i$. Moreover, the membrane potential of each neuron in the network is affected by the presence of leakage channels in its membrane, which tend to push its membrane potential toward the resting potential. This spontaneous activity of neurons is observed in biological neuronal networks.
Assuming the above description, we may consider stochastic neurons with several kinds of short-term memory. In this article, we explore a stochastic neuron model inspired by the GL model [36], where neuronal spike trains are prescribed by interacting chains with variable-length memory.
For each neuron $i \in I$ at any bin $t \in \mathbb{Z}$, we can write
$$V_t(i) = \sum_{j \in I \setminus \{i\}} w_{j \to i} \sum_{s = L_t(i)}^{t-1} x_s(j) + \mu_i,$$
where $w_{j \to i}$ is the synaptic weight of neuron $j$ on neuron $i$, $\mu_i$ is the spontaneous activity of neuron $i$, and $L_t(i)$ is the last spike time of neuron $i$ before time $t$, i.e.,
$$L_t(i) = \max\{ s < t : x_s(i) = 1 \}.$$
Therefore, for each neuron $i \in I$ at any $t \in \mathbb{Z}$, we may rewrite (2) in the following way:
$$P\big(X_t(i) = 1 \mid X_{-\infty}^{t-1} = x_{-\infty}^{t-1}\big) = \phi_i\Big( \sum_{j \in I \setminus \{i\}} w_{j \to i} \sum_{s = L_t(i)}^{t-1} x_s(j) + \mu_i \Big). \qquad (3)$$
Observe that the spiking probability of a given neuron depends on the accumulated activity of the system after its last spike time. Here, we adopt the convention that $t - L_t(i) \leq K$, where $K$ is a positive integer that represents the largest memory length of all stochastic neurons considered in the network. This implies that the time evolution of each single neuron looks like a Markov chain with variable-length memory. This structure of variable-length memory is more appropriate from the estimation point of view because it implies that some transition probabilities of the Markov chain of order $K$ are lumped together.
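For illustration, the following Python sketch simulates a two-neuron microcircuit with the dynamics just described. It assumes a logistic spiking rate function and uses the quantities introduced above (synaptic weights, spontaneous activity, memory bound $K$); the parameter values and function names are illustrative choices of ours, not values taken from the article.

```python
import numpy as np

def simulate_two_neurons(n, w_1_to_2, w_2_to_1, mu=(-2.0, -2.0), K=2, seed=0):
    """Simulate a two-neuron microcircuit of GL-inspired stochastic neurons.

    Neuron i spikes at bin t with probability phi(V_t(i)), where V_t(i) adds
    the other neuron's spikes accumulated since neuron i's last spike
    (looking back at most K bins) to a spontaneous-activity term mu[i].
    """
    rng = np.random.default_rng(seed)
    phi = lambda v: 1.0 / (1.0 + np.exp(-v))          # logistic spiking rate (illustrative)
    w = np.array([[0.0, w_1_to_2], [w_2_to_1, 0.0]])  # w[j, i]: weight of neuron j on neuron i
    x = np.zeros((n, 2), dtype=np.int8)
    last_spike = np.zeros(2, dtype=int)
    for t in range(1, n):
        for i in range(2):
            j = 1 - i
            start = max(last_spike[i], t - K)         # memory reset at i's last spike, capped at K
            v = mu[i] + w[j, i] * x[start:t, j].sum()
            x[t, i] = rng.random() < phi(v)
            if x[t, i]:
                last_spike[i] = t
    return x

# Example: strong excitatory connection from neuron 1 to neuron 2, none in the other direction
spikes = simulate_two_neurons(5000, w_1_to_2=3.0, w_2_to_1=0.0)
print(spikes.mean(axis=0))  # empirical firing proportions of the two neurons
```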
One can show the existence and uniqueness of a stationary stochastic chain $\mathbf{X}$ satisfying (1) whose dynamics are given by (3). We refer the interested reader to [36] for a rigorous proof of this result in the GL neuron model.
2.3. Transfer Entropy
In this work, we use transfer entropy to assess connectivity from neuronal datasets. This measure allows us to study causality relations between neuronal spike trains described as discrete random processes. Transfer entropy is a statistical tool used to quantify the directed flow of information between different neurons. Specifically, it measures how much the past of one signal helps predict the future of another signal, after accounting for the past of the target signal itself.
Let $(\mathbf{X}, \mathbf{Y}) = \{(X_t, Y_t)\}_{t}$ be a discrete-time jointly homogeneous stochastic chain taking values on the alphabet $\mathcal{X} \times \mathcal{Y}$ with distribution $P \in \mathcal{M}\big((\mathcal{X} \times \mathcal{Y})^{\mathbb{Z}}\big)$, where $\mathcal{M}\big((\mathcal{X} \times \mathcal{Y})^{\mathbb{Z}}\big)$ is the set of Borelian probability measures defined on the usual sigma-algebra generated by the cylinders of $(\mathcal{X} \times \mathcal{Y})^{\mathbb{Z}}$. For any positive integer $k$, the $k$-block transfer entropy from $\mathbf{X}$ to $\mathbf{Y}$ is defined as
$$T_k(\mathbf{X} \to \mathbf{Y}) = H_k(\mathbf{Y}) - H_k(\mathbf{Y} \,\|\, \mathbf{X}),$$
where $H_k(\mathbf{Y})$ is the conditional $k$-block entropy of $\mathbf{Y}$, which is given by
$$H_k(\mathbf{Y}) = - \sum_{y_1^{k+1}} P\big(Y_1^{k+1} = y_1^{k+1}\big) \log P\big(Y_{k+1} = y_{k+1} \mid Y_1^{k} = y_1^{k}\big),$$
and $H_k(\mathbf{Y} \,\|\, \mathbf{X})$ is the causally conditional $k$-block entropy of $\mathbf{Y}$ on $\mathbf{X}$, defined as
$$H_k(\mathbf{Y} \,\|\, \mathbf{X}) = - \sum_{x_1^{k}, \, y_1^{k+1}} P\big(X_1^{k} = x_1^{k}, \, Y_1^{k+1} = y_1^{k+1}\big) \log P\big(Y_{k+1} = y_{k+1} \mid Y_1^{k} = y_1^{k}, \, X_1^{k} = x_1^{k}\big).$$
Throughout this paper, “log” denotes the natural logarithm, and, by convention, we take $0 \log 0 = 0$.
Unlike mutual information, transfer entropy is, in general, asymmetric, i.e., $T_k(\mathbf{X} \to \mathbf{Y}) \neq T_k(\mathbf{Y} \to \mathbf{X})$. The asymmetry of transfer entropy is characterized by the causally conditional entropy, which quantifies the entropy of $\mathbf{Y}$ conditioned on the causal part of $\mathbf{X}$ in addition to the history of $\mathbf{Y}$. We say that $\mathbf{X}$ has no causal influence on $\mathbf{Y}$ when the causally conditional entropy is equal to the conditional entropy of $\mathbf{Y}$. In this case, the transfer entropy is zero. Therefore, with this measure, we can quantify the strength and direction of the information flow between simultaneously observed systems.
Although transfer entropy is a measure widely used in neuroscience to quantify the amount of information that flows from one spike train to another, it only considers a finite block of states. In this sense, transfer entropy estimation is sensitive to faulty observations, which may lead to the identification of false causality. For a more comprehensive understanding of the system’s behavior, we may consider the estimation of an information flow rate. This idea leads to the following definition of the transfer entropy rate.
Since $(\mathbf{X}, \mathbf{Y})$ is a jointly stationary ergodic finite-alphabet process, we can define the transfer entropy rate from $\mathbf{X}$ to $\mathbf{Y}$ as
$$T(\mathbf{X} \to \mathbf{Y}) = \lim_{k \to \infty} T_k(\mathbf{X} \to \mathbf{Y}).$$
The existence of the limit can be checked as follows:
$$\lim_{k \to \infty} T_k(\mathbf{X} \to \mathbf{Y}) = \lim_{k \to \infty} H_k(\mathbf{Y}) - \lim_{k \to \infty} H_k(\mathbf{Y} \,\|\, \mathbf{X}) = H(\mathbf{Y}) - H(\mathbf{Y} \,\|\, \mathbf{X}),$$
where $H(\mathbf{Y}) = \lim_{k \to \infty} H_k(\mathbf{Y})$ is the entropy rate of the process $\mathbf{Y}$ and $H(\mathbf{Y} \,\|\, \mathbf{X}) = \lim_{k \to \infty} H_k(\mathbf{Y} \,\|\, \mathbf{X})$ is the causally conditional entropy rate of $\mathbf{Y}$ on $\mathbf{X}$, both of which exist by stationarity. Thus,
$$T(\mathbf{X} \to \mathbf{Y}) = H(\mathbf{Y}) - H(\mathbf{Y} \,\|\, \mathbf{X}).$$
Proposition 1. Suppose $(\mathbf{X}, \mathbf{Y})$ is a jointly stationary ergodic finite-alphabet variable-length Markov chain with memory no larger than $k$ and with an arbitrary initial distribution. If, in addition, $\mathbf{Y}$ is also a variable-length Markov chain with memory no larger than $k$, then the transfer entropy rate $T(\mathbf{X} \to \mathbf{Y})$ exists and it equals $T_k(\mathbf{X} \to \mathbf{Y})$.
Proof. Since $(\mathbf{X}, \mathbf{Y})$ is a jointly stationary ergodic finite-alphabet process, the existence of $T(\mathbf{X} \to \mathbf{Y})$ is guaranteed. If $(\mathbf{X}, \mathbf{Y})$ is a variable-length Markov chain with memory no larger than $k$ and with all positive transitions, then
$$H(\mathbf{Y} \,\|\, \mathbf{X}) = H_k(\mathbf{Y} \,\|\, \mathbf{X}).$$
If, in addition, the process $\mathbf{Y}$ is itself a variable-length Markov chain with memory no larger than $k$, then, in a very similar way, we can show that the entropy rate $H(\mathbf{Y})$ is simply $H_k(\mathbf{Y})$. Therefore,
$$T(\mathbf{X} \to \mathbf{Y}) = H(\mathbf{Y}) - H(\mathbf{Y} \,\|\, \mathbf{X}) = H_k(\mathbf{Y}) - H_k(\mathbf{Y} \,\|\, \mathbf{X}) = T_k(\mathbf{X} \to \mathbf{Y}).$$
□
Note that $T(\mathbf{X} \to \mathbf{Y}) = 0$ if and only if each $Y_{t+1}$, given its past $Y_{t-k+1}^{t}$, is conditionally independent of $X_{t-k+1}^{t}$. In other words, the transfer entropy rate is only zero in the absence of causal influence.
2.4. Transfer Entropy Rate Estimation
Since we generally do not have access to the probability distributions of the stationary processes whose possible causality relations are investigated, many methods have been proposed to estimate the transfer entropy rate. This is particularly the case when recording neuronal and network signals, with no external stimulation or with equal stimulation to all neurons so that their activity is stationary, and then inferring causal relationships, especially when dealing with different data formats. For a thorough review, we refer the reader to [20,46,47]. Thus, in this paper, we consider a plug-in estimator for the transfer entropy rate between the jointly stationary ergodic chains $\mathbf{X}$ and $\mathbf{Y}$ (see Section 2.2).
Consider positive integers $k$ and $n$ such that $k < n$, and a given finite sample $(x_1^n, y_1^n)$ from the jointly stationary ergodic chain $(\mathbf{X}, \mathbf{Y})$ with joint distribution $P$. In this context, for any sequences $a_1^{k+1} \in \mathcal{X}^{k+1}$ and $b_1^{k+1} \in \mathcal{Y}^{k+1}$, we define the plug-in estimate of $P$ as
$$\hat{P}_n\big(a_1^{k+1}, b_1^{k+1}\big) = \frac{1}{n-k} \sum_{t=k+1}^{n} \mathbb{1}\big\{ x_{t-k}^{t} = a_1^{k+1}, \, y_{t-k}^{t} = b_1^{k+1} \big\},$$
where $\mathbb{1}$ denotes the indicator function.
Note that $\hat{P}_n$ defines a probability measure on $\mathcal{X}^{k+1} \times \mathcal{Y}^{k+1}$ induced by the sample $(x_1^n, y_1^n)$ from $(\mathbf{X}, \mathbf{Y})$. In this context, if $\hat{P}_n$ is strictly positive, we may define the plug-in estimate for the transfer entropy rate as
$$\hat{T}_n(\mathbf{X} \to \mathbf{Y}) = \hat{H}_k(\mathbf{Y}) - \hat{H}_k(\mathbf{Y} \,\|\, \mathbf{X}),$$
where $\hat{H}_k(\mathbf{Y})$ and $\hat{H}_k(\mathbf{Y} \,\|\, \mathbf{X})$ are obtained by replacing $P$ with $\hat{P}_n$ in the definitions of the conditional and causally conditional $k$-block entropies.
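As an illustration, the following Python sketch computes this plug-in estimate for a pair of binary sequences by replacing probabilities with frequencies over the $n-k$ overlapping windows; the function names and counting details are ours and represent only one possible implementation.

```python
import numpy as np
from collections import Counter

def _conditional_entropy(pair_counts, total):
    """Empirical H(target | context) from counts of (context, target) pairs."""
    context_counts = Counter()
    for (context, _target), c in pair_counts.items():
        context_counts[context] += c
    h = 0.0
    for (context, _target), c in pair_counts.items():
        h -= (c / total) * np.log(c / context_counts[context])  # natural log, as in the text
    return h

def plug_in_te(x, y, k=2):
    """Plug-in estimate of the transfer entropy rate from x to y (binary sequences).

    Empirical version of H_k(Y) - H_k(Y || X), with probabilities replaced by
    frequencies over the n - k overlapping windows of the sample.
    """
    x, y = np.asarray(x), np.asarray(y)
    n = len(y)
    own_past, joint_past = Counter(), Counter()
    for t in range(k, n):
        y_past, x_past = tuple(y[t - k:t]), tuple(x[t - k:t])
        own_past[(y_past, y[t])] += 1              # context: y's own k-step past
        joint_past[((y_past, x_past), y[t])] += 1  # context: the past of both sequences
    total = n - k
    return _conditional_entropy(own_past, total) - _conditional_entropy(joint_past, total)
```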
Since $(\mathbf{X}, \mathbf{Y})$ is a jointly stationary ergodic chain with distribution $P$, we have, by the ergodic theorem,
$$\hat{P}_n\big(a_1^{k+1}, b_1^{k+1}\big) \xrightarrow[n \to \infty]{} P\big(a_1^{k+1}, b_1^{k+1}\big) \quad P\text{-almost surely},$$
for every positive integer $k$ and all sequences $a_1^{k+1}$, $b_1^{k+1}$. Thus, $P$-almost surely,
$$\hat{T}_n(\mathbf{X} \to \mathbf{Y}) \xrightarrow[n \to \infty]{} T_k(\mathbf{X} \to \mathbf{Y}).$$
As discussed in Section III.2 of [48], p. 174, we can take a single limit by considering $k$ as a function of $n$. If $k(n) \to \infty$ whenever $n \to \infty$ and $k(n)$ grows sufficiently slowly with $n$, then the sequence $k(n)$ is admissible for $(\mathbf{X}, \mathbf{Y})$ in the sense that
$$\hat{H}_{k(n)}(\mathbf{Y}) \xrightarrow[n \to \infty]{} H(\mathbf{Y}) \quad \text{and} \quad \hat{H}_{k(n)}(\mathbf{Y} \,\|\, \mathbf{X}) \xrightarrow[n \to \infty]{} H(\mathbf{Y} \,\|\, \mathbf{X}) \quad P\text{-almost surely}.$$
Therefore, $P$-almost surely,
$$\hat{T}_n(\mathbf{X} \to \mathbf{Y}) \xrightarrow[n \to \infty]{} T(\mathbf{X} \to \mathbf{Y}).$$
The asymptotic behavior of $\hat{T}_n(\mathbf{X} \to \mathbf{Y})$ can also be described in terms of its probability distribution. According to [34], if $\mathbf{X}$ does not have a causal influence on $\mathbf{Y}$, equivalently, if $T(\mathbf{X} \to \mathbf{Y}) = 0$, then the scaled estimator $2(n-k)\,\hat{T}_n(\mathbf{X} \to \mathbf{Y})$ has an asymptotic $\chi^2$ distribution, where the number of degrees of freedom $d$ is determined by the alphabet sizes and the memory length $k$.
3. Hypothesis Test
Consider the problem of testing whether the binary time series generated by the process $\mathbf{X}$ has a causal influence on $\mathbf{Y}$. In the present context, this corresponds to testing the null hypothesis that each random variable $Y_{t+1}$ is conditionally independent of $X_{t-k+1}^{t}$ given $Y_{t-k+1}^{t}$, within the larger hypothesis that the joint stationary and ergodic process $(\mathbf{X}, \mathbf{Y})$ is a variable-length Markov chain with order no larger than $k$ and with all positive transitions.
Formally, each positive transition matrix $Q$ for the process $(\mathbf{X}, \mathbf{Y})$ can be indexed by a parameter vector $\theta$ taking values in a finite-dimensional open set $\Theta$. The null hypothesis corresponding to each $Y_{t+1}$ being conditionally independent of $X_{t-k+1}^{t}$ given $Y_{t-k+1}^{t}$ is described by transition matrices $Q$ that can be expressed as
$$Q\big((x_{t+1}, y_{t+1}) \mid x_{t-k+1}^{t}, y_{t-k+1}^{t}\big) = Q_1\big(y_{t+1} \mid y_{t-k+1}^{t}\big)\, Q_2\big(x_{t+1} \mid x_{t-k+1}^{t}, y_{t-k+1}^{t+1}\big). \qquad (4)$$
This collection of transition matrices can be indexed by parameters in a lower-dimensional parameter set $\Theta_0$, which is an open set that can be naturally embedded within $\Theta$ via a map $\theta_0 \mapsto \theta(\theta_0)$, with the property that all induced transition matrices $Q_{\theta(\theta_0)}$ satisfy the conditional independence property in (4).
To test the null hypothesis $\theta \in \Theta_0$ within the general model $\theta \in \Theta$, we employ a likelihood test. The log-likelihood function $\ell_n(\theta)$ of $\theta$ given a sample $(x_1^n, y_1^n)$ from the joint process $(\mathbf{X}, \mathbf{Y})$ can be expressed as
$$\ell_n(\theta) = \log P_{\theta}\big(x_1^n, y_1^n\big),$$
where $P_{\theta}$ denotes the law of $(\mathbf{X}, \mathbf{Y})$ with transition matrix $Q_{\theta}$. Then, the likelihood ratio test statistic is the difference
$$\Lambda_n = \max_{\theta \in \Theta} \ell_n(\theta) - \max_{\theta \in \Theta_0} \ell_n(\theta).$$
For our purposes, a key observation is that the statistic $\Lambda_n$ is exactly equal to $(n-k)$ times the plug-in estimator $\hat{T}_n(\mathbf{X} \to \mathbf{Y})$.
Proposition 2. If $(\mathbf{X}, \mathbf{Y})$ is a variable-length Markov chain of memory no larger than $k$ with all positive transition matrices $Q$ on the finite alphabet and with an arbitrary initial distribution, then $\Lambda_n = (n-k)\,\hat{T}_n(\mathbf{X} \to \mathbf{Y})$. Proof. The first maximum in the definition of $\Lambda_n$
can be expressed as
where the last maximization is over all transition matrices
Q with all positive entries. Thus,
The above minimum is achieved by making
Therefore,
where
.
The computation for the second maximum in the definition of
reduces to two different maximizations. Under the null hypothesis,
Q admits the decomposition in (4), so that
Therefore, by the chain rule,
which, recalling the definition of
, is precisely the claimed result. □
As noted before, under the null hypothesis, that is, when $T(\mathbf{X} \to \mathbf{Y}) = 0$, the distribution of $2(n-k)\,\hat{T}_n(\mathbf{X} \to \mathbf{Y})$ is approximately $\chi^2$ with $d$ degrees of freedom. Therefore, by Proposition 2, twice the likelihood ratio test statistic, $2\Lambda_n$, is approximately $\chi^2$ distributed with $d$ degrees of freedom. Note that this limiting distribution does not depend on the distribution of the underlying process $(\mathbf{X}, \mathbf{Y})$, except through the memory length $k$. Therefore, we can decide whether the data offer strong enough evidence to reject the null hypothesis by examining the value of $2\Lambda_n$. Given a threshold $\alpha \in (0,1)$, if $\lambda$ is the observed value of $2\Lambda_n$ and $P(\chi^2_d > \lambda) < \alpha$, then the null hypothesis of no causal influence can be rejected at the significance level $\alpha$. The algorithm for conducting this hypothesis test is described below in Algorithm 1.
Algorithm 1 Causal influence test.
Input: Data $(x_1^n, y_1^n)$;
Significance level $\alpha$;
Test statistic $\lambda = 2(n-k)\,\hat{T}_n(\mathbf{X} \to \mathbf{Y})$.
Output: Decision: Reject or do not reject the null hypothesis $H_0$ of no causal influence.
1. Compute the observed value $\lambda$ of the test statistic from the data.
2. Compute the p-value $p = P(\chi^2_d > \lambda)$, where $\chi^2_d$ denotes a chi-squared random variable with $d$ degrees of freedom.
3. if $p < \alpha$ then
4.   Reject $H_0$;
5. else
6.   Do not reject $H_0$.
7. end if
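The decision rule can be sketched in Python as follows. It takes a plug-in estimate (for instance, one computed as in the sketch of Section 2.4) together with the sample length, scales it, and compares the resulting p-value with the significance level. The degrees-of-freedom formula used here, $2^k(2^k - 1)$ for a full binary model of order $k$, is our own assumption and not a value quoted from the article.

```python
from scipy import stats

def causal_influence_test(te_hat, n, k=2, alpha=0.05):
    """Decision rule in the spirit of Algorithm 1 (a sketch, not the article's code).

    te_hat : plug-in transfer entropy rate estimate from X to Y
             (e.g., computed with the plug_in_te sketch of Section 2.4);
    n      : sample length (number of bins);
    k      : memory bound; alpha : significance level.
    """
    statistic = 2 * (n - k) * te_hat              # scaled log-likelihood ratio statistic
    dof = (2 ** k) * (2 ** k - 1)                 # assumed d for binary sequences, order k
    p_value = stats.chi2.sf(statistic, df=dof)
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    return decision, statistic, p_value

# Hypothetical example: a sample of n = 40,000 bins with an estimated rate of 0.0008
print(causal_influence_test(0.0008, n=40_000, k=2, alpha=0.05))
```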
Example 1. Consider a sample of length generated from a microcircuit composed of two neurons whose activities are modeled as in the neuronal network model described in Section 2. In this case, we consider the jointly stationary ergodic variable-length Markov chain with memory no larger than . In this microcircuit, there is a strong excitatory connection from neuron to neuron but no connection from neuron to neuron , i.e., and . In Figure 1, we illustrate the signals from the neurons generated by the neural model for this parameter specification. We conduct two hypothesis tests. In the first one, we are interested in testing the following hypotheses: vs. . In this case, the observed value of the test statistic is . Therefore, by setting a significance level of , we obtain . Hence, since , we reject the null hypothesis. On the other hand, in the second test, the hypotheses are as follows: vs. . In this case, the observed value of the test statistic is . Thus, by setting a significance level of , we obtain . Therefore, since , we do not reject the null hypothesis.
A more comprehensive simulation study is given in the next section.
4. Results on Simulated Data
A natural interest of this work is the application of transfer entropy in the study of effective connectivity between a pair of stochastic neurons. For this, we use synthetic data generated from the neuronal network model described in Section 2. To test the statistical significance of a connectivity value, we use the hypothesis test described in Section 3.
For the experiment conducted in this section, we consider two stochastic neurons with variable-length memory whose dynamics are given by the model introduced in Section 2. In this case, the neuronal network is a microcircuit composed of two neurons whose activities are modeled by a jointly stationary ergodic variable-length Markov chain with memory no larger than $k$. We select scenarios where the synaptic weights are either strong or weak. Based on different choices of these synaptic weights, we define the following four distinct cases:
Scenario 1: There is a strong excitatory connection from neuron to neuron but no connection from neuron to neuron , i.e., and . In this case, when is the postsynaptic neuron, we observe, on average, a firing proportion of .
Scenario 2: There is a weak excitatory connection from neuron to neuron but no connection from neuron to neuron , i.e., and . In this case, when is the postsynaptic neuron, we observe, on average, a firing proportion of .
Scenario 3: There is a weak inhibitory connection from neuron to neuron but no connection from neuron to neuron , i.e., and . In this case, when is the postsynaptic neuron, we observe, on average, a firing proportion of .
Scenario 4: There is a strong inhibitory connection from neuron to neuron but no connection from neuron to neuron , i.e., and . In this case, when is the postsynaptic neuron, we observe, on average, a firing proportion of .
In all scenarios, when
is the postsynaptic neuron, we observe, on average, a firing proportion of
. In addition, for each scenario, we consider four different sample sizes:
;
n = 10,000;
n = 20,000; and
n = 40,000, representing n bins of 10 ms. These are typical recording times in electrophysiological experiments. Note that these choices of sample sizes satisfy, for all n, the requirement on $k$ and $n$ that ensures the convergence results of the transfer entropy rate estimator discussed in Section 2.4. For each scenario and sample size, 100 replicates are generated, and the test is conducted on each of them.
Using 100 repetitions of the test on samples of length n = 40,000, the empirical distribution of the test statistic is estimated and, as shown in Figure 2, is in close agreement with the theoretically predicted $\chi^2$ limiting distribution in all scenarios.
In Table 1, we show the fraction of times, out of 100 simulations of the neuronal network model, that the test rejects the null hypothesis of the absence of causal influence for four different significance levels, the smallest being $\alpha$ = 0.1%. We can observe that, as expected (and desired), in scenarios 1 and 4, where there is a strong connection from the presynaptic neuron to the postsynaptic neuron (excitatory and inhibitory, respectively), the test detects the connection in essentially all cases regardless of the sample size and significance level. However, in scenarios 2 and 3, where there is a weak connection (excitatory and inhibitory, respectively), the test struggles to detect the connection with small sample sizes and low significance levels, and its performance improves as we increase the sample size and the significance level. Furthermore, in all scenarios, the test does not reject the null conditional independence hypothesis in the reverse direction, in which there is in fact no connection between the neurons. Therefore, the results are in agreement with the nature of the data.
In Figure 3, we use box plots to display the distributions of the estimated transfer entropy rates for each of the 100 samples of length n = 40,000 generated by the neuronal model. We can observe that in scenarios 1 and 4, where there is a strong connection from the presynaptic neuron to the postsynaptic neuron, the estimated values are similar to each other and greater than those estimated in scenarios 2 and 3, where there is a weak connection. On the other hand, in all scenarios, there is no connection in the reverse direction, and the corresponding estimated values tend to be lower than those obtained in the aforementioned situations.
5. Conclusions
In this work, we studied the effective connectivity between neurons through a hypothesis test whose test statistic is based on the plug-in estimator of the transfer entropy rate. Effective connectivity refers to the causal interactions among distinct neural units, whereas anatomical connectivity and functional connectivity refer to anatomical links or loosely defined statistical dependencies between units, respectively [49]. Our work demonstrates how to properly use transfer entropy to measure the flow of information between sequences and explores its use in determining effective neuronal connectivity.
Understanding effective connectivity has long been a central challenge in neuroscience [1,50,51,52]. The identification of connectivity has garnered significant interest in recent years, primarily due to advancements enabling the simultaneous recording of a vast number of neurons [53,54,55,56]. Essentially, we now exploit the understanding that synaptic connections induce voltage fluctuations capable of triggering postsynaptic action potentials. These subtle effects modulate spike timing within a spike train and are discernible under specific conditions. By iteratively applying inference techniques to extensive datasets, connectivity maps that are crucial for understanding the brain are generated. A successful recent example is the inference of the small central pattern-generating circuit in the stomatogastric ganglion of the crab Cancer borealis [57]. This circuit is known, and so it is amenable to this type of analysis. Yet, it is a challenging circuit because pharmacological manipulations alter the neuronal intrinsic dynamics and synaptic communication, as clearly shown by the authors. However, for the majority of other living systems, challenges persist in this process, including the stochastic nature of neurons originating from a highly random environment, which leads to confounding factors that are difficult to disambiguate, as well as the selection of an appropriate and refined metric capable of addressing such issues.
Our analysis indicates that the hypothesis testing framework described in this paper can be a useful exploratory tool for providing conclusive biologically relevant findings. Here, we showed that this test reliably detected effective connectivity when two signals are generated from a neuronal network model in which neurons are stochastic with variable-length memory.
A second contribution of this work concerns the relationship between the synaptic weight values set for the neuronal network model described in Section 2 and the estimates of the transfer entropy rate (see Figure 3). We observed that synaptic weight values close to zero lead to estimates that are also close to zero. On the other hand, the farther from zero the synaptic weight values, the larger the estimates of the transfer entropy rate. Therefore, empirical transfer entropy effectively translates the types of connections existing between the neurons in the network.
One avenue for future research stemming from this work is the testing and validation of our studies with experimental data. Difficulties may arise, given that not many circuits are completely known to act as ground-truth data, with some exceptions such as Cancer borealis [57] or C. elegans [58]. An intermediate step would, therefore, involve applying simulations with biophysically grounded neurons based on the Hodgkin–Huxley model [59]. The inclusion of a stochastic background at different levels of the model (ion channels, synapses, network) would highlight the advantages of our approach and allow for further extensions.
The findings of this study suggest that the method is both robust and versatile, accurately deducing effective connectivity among neurons possessing various synaptic characteristics. We utilize synaptic values aligned with both strong and weak neuronal connections in the brain. Regarding its versatility, this implies that the technique can be enhanced and extended to more intricate systems, considering the influence of confounding factors such as stimuli or additional spike trains from third parties.