1. Introduction
Living cells use information [1] and energy to maintain a stable, ordered state far from equilibrium, allowing them to self-replicate and form functioning multi-cellular societies [2,3]. Since the information-encoding structure of DNA was demonstrated, virtually all investigation of information dynamics within cells has focused on the genome. Gene networks are typically viewed as the primary regulators of cell function [4], similar to computing networks [5], so that the genome is typically modeled as a cellular central processor. Thus, the nucleus is usually viewed as the cellular “command center” [3] that receives, processes, and responds to all intra- and extra-cellular information.
This conventional model is clearly supported by the essential role of the nucleus in cell function and proliferation and by the unambiguous mechanism for information storage in the triplet code of DNA. This fundamentally motivates the intense and ongoing empirical investigation of the genome and gene networks as the primary governing dynamics in single cells and multicellular organisms. However, the implicit assumption that all cellular information is contained in the genome has not been validated by empirical observations. For example, in 2001, as the Human Genome Project neared completion, a betting pool named Genesweep [6] was formed so that scientists could wager their best estimate of the number of genes in the human genome. More than 400 scientists participated, with estimates ranging from 26,000 to 310,000 and an average of 66,000 [6]. None were correct—the human genome, based on multiple different data sources, is now thought to contain fewer than 19,000 genes [7].
This somewhat embarrassing over-estimation of our species’ genome size was based on the widely accepted assumption that, as the sole repository of cellular information, the genome acts as a central processor [8,9] that exclusively receives, analyzes, and responds to all intracellular and extracellular signals. This concept of the nucleus as the cellular “control center” [10] is implicit in the expectation that the number of genes necessary for each organism must be some increasing function of the size and complexity of that organism. In fact, it is now clear that the human genome has roughly the same number of genes as worms and fewer genes than many species, including the frog and rice [11]. Failure of the prevailing paradigm to accurately predict the observed variations in genome size has led to a number of amendments that generally focus on the complexity of genetic interactions. In essence, the human genome can do more with less through the evolution of increasingly sophisticated gene networks and perhaps a greater role for miRNA [11]. Note, however, that these discussions retain the fundamental assumption that the genome is the site of all information dynamics that govern individual cell function and the formation of cellular societies with precise spatial organization and distribution of function to allow complex anatomic structure and physiological interactions in multicellular organisms.
An alternative hypothesis views the genome as simply one component of a broad multiscale, multimodal information dynamic within cells [12,13,14]. Specifically, the genome, while the sole source of heritable information, also produces the molecular machinery and physiological interactions that permit independent information storage, acquisition, and processing at other sites within the cell (the cell membrane, for example [15,16]). This non-heritable information plays an essential role in the cell’s ability to interact with spatial and temporal variations in its environment. For single cells, this information includes global events, such as changes in temperature, as well as small perturbations that affect just a single region of the cell surface. Similarly, some perturbations may occur and require adaptations over hours, days, or years, while others occur in microseconds and require an equally fast response. In multicellular organisms, individual cells must respond to environmental signals that are, in effect, instructions regarding the location and differentiated state necessary for tissue structure and function. Thus, we might hypothesize that the density and diversity of these non-genomic interactions between the cell membrane and the adjacent tissue environment may be a critical component in the development of size and complexity within multicellular organisms.
This hypothesis suggests that cellular information dynamics form a distributed system that at least involves the cell membrane and nucleus and most likely includes other organelles, such as the mitochondria, centrosome, and endoplasmic reticulum. This model is motivated by observations that the “nucleus as cellular command center” paradigm is subject to a number of practical and theoretical objections. Among them is the simple factor of time. A mammalian cell that is 20 microns in diameter is about 20,000 protein diameters across. Signals traveling from the cell membrane to the nucleus via pathway proteins diffusing in three dimensions inevitably suffer information degradation, since the location at which each protein arrives on the nuclear membrane and the time required for the transit are subject to considerable variation. Furthermore, cell responses to a critical perturbation often must occur in seconds or even microseconds—far too fast to allow communication to the nucleus, processing of information, synthesis of proteins, and transport of proteins to the critical site. In addition, feedback over time regarding the success or failure of the cell’s response requires continuous bidirectional communication, which will suffer the same degradation of information and time limitations noted above. In other words, if the nucleus is the sole location in which the cell can receive, process, and respond to information, the time required to send and receive signals from an acute, external perturbation would be excessively long. In contrast, we hypothesize that evolutionary optimization would select a distributed information system that includes a large repertoire of information receptors and response modules, allowing rapid and spatially specific responses to the diverse external and internal signals and perturbations a cell will receive in its lifetime.
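The timescale argument above can be illustrated with a back-of-the-envelope calculation. The diffusion coefficient (D ≈ 10 µm²/s for a small cytoplasmic protein) and the millisecond channel-response timescale used below are typical literature values assumed for illustration, not figures from the text:

```python
# Back-of-the-envelope comparison of signaling-by-diffusion versus a
# membrane-local response. Assumed values: D ~ 10 um^2/s for a small
# cytoplasmic protein; membrane-to-nucleus distance x ~ 10 um (half of
# a 20 um cell diameter, as in the text).
D = 10.0   # diffusion coefficient, um^2/s (assumed)
x = 10.0   # membrane-to-nucleus distance, um

# Mean time for 3D diffusion over a distance x: t = x^2 / (6 * D)
t_diffusion = x**2 / (6 * D)
print(f"Diffusive transit time: {t_diffusion:.2f} s")

# A membrane-local response (e.g., channel gating) acts on
# microsecond-to-millisecond timescales; assume 1 ms here.
t_channel = 1e-3  # s (assumed)
print(f"Membrane response is ~{t_diffusion / t_channel:.0f}x faster")
```

Even under these generous assumptions, a single diffusive membrane-to-nucleus transit takes on the order of seconds, supporting the argument that responses required within microseconds cannot be routed through the nucleus.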
2. Is the Information Stored in the Genome Sufficient for Cellular Order and Function?
An apparent gap between the information content of a cell and that of the genome and its products has been previously noted [13,17,18]. Over six decades ago, Morowitz [17,18] calculated that the complexity of the three-dimensional structures in a single Escherichia coli requires about 2 × 10^11 bits of information. A similar estimate is obtained using a calorimetric approach. In contrast, the information storage capacity of the E. coli genome is 10^7 bits [17]. Of course, the genomic information is expanded through many translations so that one gene can be amplified to hundreds or thousands of proteins. The total information content of genes and gene products can be estimated because the average molecular composition of E. coli is known. The mRNA content is four-fold larger than the DNA, and the typical protein content of E. coli is 1.6 × 10^−13 g/cell. Assuming the average molecular weight of an amino acid is 110, about 8.4 × 10^8 amino acids are incorporated into the intracellular proteins of E. coli. Using Shannon information, 4.2 bits of information are gained with each amino acid for a total protein information content of about 3.4 × 10^9 bits. Thus, the total information content contributed to E. coli by the genome is about two orders of magnitude smaller than the information content of the cell. In other words, most of the information in a cell is not a gene or gene product [13].
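The bookkeeping above can be reproduced directly. The sketch below uses log2(20) ≈ 4.32 bits per residue rather than the rounded 4.2 used in the text, so its totals differ slightly from the quoted figures while preserving the order-of-magnitude conclusion:

```python
import math

# Order-of-magnitude information bookkeeping for E. coli, using the
# figures from the text: protein content 1.6e-13 g/cell, mean amino
# acid weight ~110 Da, genome capacity ~1e7 bits, and Morowitz's
# structural estimate of ~2e11 bits for the whole cell.
N_A = 6.022e23          # Avogadro's number, 1/mol
protein_mass = 1.6e-13  # g/cell
aa_weight = 110.0       # g/mol per residue

n_aa = protein_mass * N_A / aa_weight  # residues per cell (~8.8e8)
bits_per_aa = math.log2(20)            # ~4.32 bits to specify 1 of 20
protein_bits = n_aa * bits_per_aa      # total protein information

cell_bits = 2e11    # Morowitz's structural estimate
genome_bits = 1e7   # E. coli genome storage capacity

print(f"amino acids per cell: {n_aa:.2e}")
print(f"protein information:  {protein_bits:.2e} bits")
print(f"cell/protein gap:     {cell_bits / protein_bits:.0f}-fold")
```

The computed gap of roughly 50-fold (about two orders of magnitude) matches the conclusion in the text: most cellular information is not carried by genes or gene products.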
Actually, this exercise simply restates the obvious. There are clearly ordered structures in the cell other than proteins and polynucleotides. Membranes, for example, constitute about 60% of the cell mass and contain about 10^9 lipid and glycolipid molecules. In mammalian cells, over 200 different species of lipid molecules contribute to the membrane, and their relative content is precisely controlled—varying among different cell types, different regions of the same cell, and even between the inner and outer layers of the nuclear membrane [13]. The information content of membranes in a “typical” mammalian cell has been calculated to be on the order of 5 × 10^10 bits [13,17]. While proteins catalyze the formation of lipids, this feed-forward dynamic is inherently unstable, and clearly other mechanisms must play a critical role in controlling the lipid distribution.
A somewhat less obvious information-containing structure is the set of transmembrane ion gradients [15]. The asymmetric distribution of Na+, Cl−, Ca2+, and K+ across cell membranes (manifesting as transmembrane potentials in the range of −80 to −40 mV [19]) as well as the H+ gradient across the mitochondrial membrane represent highly ordered (i.e., non-random) structures. The critical role of the transmembrane gradients in the biology of eukaryotic cells is evidenced by observations that up to 40% of the cell’s energy budget is used to maintain them [20].
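The order stored in such a gradient can be made concrete with the Nernst equation, which relates an ion's concentration asymmetry to an equilibrium potential. The K+ concentrations below (~140 mM inside, ~5 mM outside) are typical mammalian values assumed for illustration; they are not given in the text:

```python
import math

# Nernst potential for a single ion species:
#   E = (R * T) / (z * F) * ln(c_out / c_in)
# Typical mammalian K+ concentrations assumed: ~140 mM inside the
# cell, ~5 mM outside.
R = 8.314    # gas constant, J/(mol K)
T = 310.0    # body temperature, K
F = 96485.0  # Faraday constant, C/mol
z = 1        # valence of K+

c_in, c_out = 140e-3, 5e-3  # mol/L

E = (R * T) / (z * F) * math.log(c_out / c_in)
print(f"K+ equilibrium potential: {E * 1000:.0f} mV")
```

The result, roughly −90 mV for K+ alone, bounds the resting potentials of −80 to −40 mV cited in the text, which reflect the combined equilibrium potentials of several ion species. A nonzero Nernst potential is a direct signature of the non-random ion distribution the cell pays to maintain.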
3. What is Information in the Context of Living Systems?
In one of the most famous gedanken (thought) experiments in scientific history [21], Maxwell proposed a remarkably clever set of initial conditions [22]: Two closed boxes at identical temperatures containing the same gas are connected by a channel in which a “demon” operates a frictionless gate. Because the gas molecules have a distribution of velocities, the demon could allow only faster molecules to move from one box to the other and slower molecules in the opposite direction. This would generate an apparently spontaneous heat flow between regions of the same temperature—a violation of the second law of thermodynamics.
In the early 20th century, Szilard [23] and others resolved this conundrum by pointing out that the demon forced the heat flow by using information (i.e., the velocity of the molecules) [24]. This was the birth of information theory, which reached maturity in Shannon’s seminal work [25] on communication that first quantified information in “bits”.
In 1970, Johnson, writing in Science [26], predicted that information theory would become the “calculus of biology”. Although it is clear that information is necessary for living systems, information theory has thus far contributed little to biology and certainly has not become its primary theoretical framework. In fact, even the definition of information in a biological context remains controversial. Since the mathematics of information theory rely on probability distributions, information can be viewed as a deviation from randomness or the degree of “unexpectedness” [13,14]. More importantly, the connection of information to order and stability in living systems remains unclear. In some ways, the challenge is evident in the very metric of information content—bits. The elegant mathematical basis for this metric has been clear for decades, and the predicted link between information [25] and thermodynamic energy has been empirically confirmed [27]. However, while bits provide a metric for the quantity of information, they say nothing about meaning (i.e., context) and provide no clear mechanism by which information is converted to thermodynamic work, reduces cellular entropy, and increases cellular order [2].
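The notions of "unexpectedness" and deviation from randomness invoked above have standard quantitative forms: the surprisal of an outcome, −log2(p), and the Shannon entropy of a distribution. A minimal sketch (the four-outcome distributions are arbitrary examples, not from the text):

```python
import math

# Shannon's measures: the "unexpectedness" of an outcome with
# probability p is its surprisal, -log2(p); the average surprisal of
# a distribution is its entropy, H = -sum(p * log2(p)).
def surprisal(p):
    return -math.log2(p)

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

# A rare event carries more information than a common one:
print(f"p = 0.5:  {surprisal(0.5):.2f} bits")    # 1.00 bits
print(f"p = 1/20: {surprisal(1/20):.2f} bits")   # 4.32 bits (one of 20 amino acids)

# An ordered (peaked) distribution deviates from randomness and has
# lower entropy than a uniform one over the same outcomes:
uniform = [0.25] * 4
peaked = [0.85, 0.05, 0.05, 0.05]
print(f"H(uniform) = {entropy(uniform):.2f} bits")  # 2.00 bits
print(f"H(peaked)  = {entropy(peaked):.2f} bits")
```

As the text notes, these quantities measure only how much information is present; they carry no notion of meaning or of how the bits are converted to work.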
In living systems, information is useful only when it is converted to order or thermodynamic work. Gatenby and Frieden [28] proposed one mechanism by which living systems convert unit-less bits stored in the genome to thermodynamic work; the “meaning” (or specificity) of this information is apparent in proteins that serve as enzymes. The amino acid sequence that forms the protein is encoded unambiguously in the genomic DNA, and this information is converted to a three-dimensional structure as the protein folds into its lowest free energy state. Thus, the 3D configuration is specified by the gene information, since the folding is dependent on the interactions of the amino acids in the protein. We propose this folding process represents an increase in order and information, which is expressed thermodynamically as a release of free energy as the protein assumes a minimum free energy state. Importantly, this conversion process is subject to modification by environmental information in the form of temperature, pH, ion concentrations, and so on. Of particular interest are the effects of variations in the concentrations of mobile cations on protein function. Page and Di Cera [29] demonstrated empirically that the activity of many important intracellular enzymes is dependent on the local concentrations of Na+ and K+. The role of Ca2+ concentrations in kinase activity [30] has been extensively investigated. Other enzymes, hexokinase and phosphofructokinase, for example, are dependent on Mg2+ concentrations [31]. These observations are the outcomes of interactions of cations with negative charges on some amino acids in the protein, but it is likely that similar dynamics will produce many other biologically important changes in macromolecules. Furthermore, as demonstrated in Figure 1, cations shield negative charges on the inner leaf of the cell membrane, while mobile anions can shield positive charges on the nuclear membrane and intracellular structures. Brief ion fluxes have the effect of uncovering these charges, allowing Coulomb interactions with adjacent macromolecules.
In enzymes, the acceleration of a reaction requires the shape of the catalytic pocket to match that of the substrate (like a “key and lock”). As noted above, the enzyme shape is largely dependent on the genetic amino acid sequence but is also subject to fluxes in local environmental factors. The degree of matching of the catalytic pocket to the substrate can be quantified by the Kullback–Leibler (K–L) divergence [32,33,34], which is also a metric of information. Because this spatial match and the corresponding K–L divergence for each enzyme are unique to each substrate, these dynamics add specificity to the information content of the gene. In other words, the information content of each enzyme in bits, as described by the K–L divergence, is not fixed but rather dependent on the substrate with which it is attempting to bind. If the substrate key does not fit into the enzyme lock, there is no informational effect in that transaction. Finally, we note these dynamics allow observation and quantitation of the conversion of information to thermodynamic work through the reduction of the activation energy. Note this does not alter the initial or final energy state of the reaction but, rather, the speed with which the reaction occurs. Remarkably, enzymes can increase the rate of a reaction by as much as 15 orders of magnitude! To support this theoretical model, we note the Arrhenius equation, which was developed solely through empirical observations, can be theoretically derived using the K–L divergence [28].
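The two quantitative ideas in this paragraph can be sketched numerically. The three-feature "shape distributions" below are purely hypothetical stand-ins for a pocket–substrate match (the text does not specify how shapes are discretized), and the 89 kJ/mol activation-energy reduction is an assumed value chosen to reproduce the quoted ~15-orders-of-magnitude rate enhancement:

```python
import math

# Hypothetical illustration: represent the catalytic pocket and a
# substrate as discrete probability distributions over a few shape
# features, then score the match with K-L divergence (in bits).
def kl_divergence(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

pocket         = [0.70, 0.20, 0.10]
good_substrate = [0.65, 0.25, 0.10]  # close match    -> low divergence
poor_substrate = [0.10, 0.20, 0.70]  # key fits badly -> high divergence

print(f"D(pocket || good) = {kl_divergence(pocket, good_substrate):.3f} bits")
print(f"D(pocket || poor) = {kl_divergence(pocket, poor_substrate):.3f} bits")

# Arrhenius: k = A * exp(-Ea / (R * T)). Lowering the activation
# energy by ~89 kJ/mol at 310 K (assumed) yields the ~15 orders of
# magnitude of rate enhancement quoted in the text.
R, T = 8.314, 310.0
delta_Ea = 89e3  # J/mol, assumed reduction in activation energy
speedup = math.exp(delta_Ea / (R * T))
print(f"rate enhancement: {speedup:.1e}")
```

The point of the sketch is the asymmetry of the metric: the same pocket carries a small divergence against a matching substrate and a large one against a poor fit, so the information realized in any binding event depends on the substrate, as argued above.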
4. Information Dynamics in the Cell Membrane
As noted above, the membrane structure and associated transmembrane ion gradients represent a large storehouse of information. How is it used by the cell to maintain order? The answer is observable in a common experience when walking toward the entrance of a heated building on a cold day. Should the door open as you approach, you will feel a rush of warm air from inside the building. Note that, even if you are not looking at the door, the release of warm air along the thermal gradient between the building and the environment is clear evidence that it has opened. If the door were linked to a sensor so that it opened upon some perturbation, the flow of warm air would transmit information arriving at the sensor into the building, where it can be received and processed (by you and others) as a puff of warm air, similar to the ion puffs observed in cells [35]. Note that, assuming the door immediately closes, this flow of information into the building in the form of warm air will be localized and brief, as the temperature gradient rapidly dissipates with distance from the door and with time following the opening.
Cells maintain a transmembrane ion gradient analogous to the thermal gradient described above. Here we will focus on the analogous series of events when the gate on a transmembrane ion channel opens in response to a perturbation, allowing rapid flow of ions along the pre-existing transmembrane concentration gradients constantly maintained by eukaryotes. This process is well known in the propagation of depolarization waves [36] along a neuron (described by the well-known Hodgkin–Huxley equation [37]). This led to the hypothesis that these dynamics represent a specialized application of membrane information processing and transmission found in the plasma membranes of non-neuronal cells [15]. Several hundred different types of gates (e.g., voltage-dependent) for membrane ion channels are encoded in the human genome, and each can act as a signal receptor. When a gate opens, ions flow rapidly through the membrane channel. This represents transmission of the information received by the gate into the cell and has been visualized experimentally as Ca2+ ion puffs [35]. Although these puffs disperse quickly, they briefly alter the local ion concentrations in the cytoplasm adjacent to the channel. This information is then “processed” (Figure 1) as the change in cations alters enzyme function, and the loss of shielding of negative charges on the inner leaf of the membrane can both attract positively charged macromolecules and (possibly) alter the local distribution of phospholipids in the cell membrane. Furthermore, there is the possibility of propagating a depolarization wave along some or all of the adjacent membrane [35].
Mathematical models of ion flow through transmembrane channels have demonstrated this to be a highly efficient process that minimizes the loss of signal [15]. There are several hundred different protein gates that can detect and respond to a wide range of environmental signals. When a gate opens, the large, pre-existing transmembrane ion gradients allow immediate and rapid flow (about 10^7 ions per second [38]) of a specific anion or cation (each channel is highly ion-specific) into or out of the cell. These dynamics are identical to those associated with the propagation of a depolarizing wave along neurons as described by the Hodgkin–Huxley equation [37]. Recently, we have, in fact, demonstrated that the Hodgkin–Huxley equation can be derived from first principles of information theory used to model the flow of information across the membrane of non-neuronal cells [15]. This supports the hypothesis that the transmembrane ion flows that produce a travelling depolarization wave in neurons are simply a specialized adaptation of membrane information dynamics found universally in eukaryotic cells.
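The magnitude of the local signal produced by a single channel opening can be estimated from the ~10^7 ions/s flux quoted above. The 1 ms open time and the 1 µm hemispherical "puff" volume below are assumed values for illustration, as is the ~0.1 µM resting cytoplasmic Ca2+ level used for comparison:

```python
import math

# Rough estimate of the local concentration change from one channel
# opening. From the text: flux ~1e7 ions/s. Assumed: a 1 ms opening
# and a hemispherical puff volume of radius 1 um adjacent to the
# channel, before diffusion disperses the ions.
N_A = 6.022e23   # Avogadro's number, 1/mol
flux = 1e7       # ions/s through one open channel (from the text)
t_open = 1e-3    # s, assumed channel open time
r = 1e-6         # m, assumed puff radius

n_ions = flux * t_open                          # ions delivered (~1e4)
volume_L = (2.0 / 3.0) * math.pi * r**3 * 1e3   # hemisphere, m^3 -> L
delta_c = n_ions / N_A / volume_L               # mol/L

print(f"ions delivered:     {n_ions:.0e}")
print(f"local [ion] change: {delta_c * 1e6:.0f} uM")

# For Ca2+, whose resting cytoplasmic concentration is ~0.1 uM
# (typical assumed value), a puff of several uM is a large, readable,
# but transient local signal.
```

A micromolar-scale, microsecond-to-millisecond puff against a sub-micromolar background is exactly the kind of localized, transient signal that peripheral membrane proteins could read without any nuclear involvement.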
How do cells process and respond to information transmitted by transmembrane ion flows? The key is likely the above-noted sensitivity of proteins to local ion concentrations. By altering the function and location of peripheral membrane proteins at the site of the ion flux, the cell can rapidly and locally respond to many perturbations (Figure 1). It is likely that similar dynamics occur in the extracellular space adjacent to the channel. Furthermore, since changes in the ion concentrations dissipate rapidly over space and time, these dynamics allow the cell to repeatedly assess critical spatial and temporal information. Thus, environmental signals that are time-dependent and/or spatially localized can be resolved within the membrane. For single-cell eukaryotes, this allows rapid response to critical environmental events, such as the presence of a dangerous predator or a potential meal. For eukaryotic cells that are part of a multicellular organism, these dynamics can promote a response to a sudden perturbation, such as an injury, but are also likely required to use spatial and temporal signals from the local tissues that convey instructions on the required cellular location and function within that tissue.
6. Conclusions
The conceptual model of the nucleus as a cell’s central processor and control center is anthropomorphic and, therefore, intuitively appealing, but the temporal and spatial demands of cellular functions suggest this would be a suboptimal organizing principle. To be clear, heritable information in the genome is necessary to generate all components of this information network. That is, gene-encoded proteins form the membrane pumps and detectors that, similar to Maxwell’s demon, generate a large transmembrane ion gradient, as well as the peripheral membrane proteins that process and respond to local signals. However, once formed, the membrane information processor can function as a critical and, at times, independent component within a broad, integrated cellular information network. It is linked to the nucleus and other organelles by a highly sophisticated communication system embedded in the cytoskeleton, allowing a synchronous response to some perturbations. However, the membrane information processor can also process and respond to local perturbations independent of additional input from the nucleus.
Perhaps the best evidence for the critical role of the transmembrane ion gradient in a distributed information system is simply the energy cells use to maintain it. Guppy et al. [20] found that up to 40% of a mammalian cell’s energy budget is devoted to maintaining the transmembrane ion gradient. Since evolution is a relentless optimization process that continuously balances cost and benefit, one can generally assume that the energy devoted to a cellular function is a good estimate of its evolutionary value.
Additional support is found in considering the time requirement imposed by the assumption that the genome is the central processor and the nucleus the cellular control center. External information frequently requires actions on a timescale too short to allow communication from the membrane to the DNA, processing, decision making, and subsequent protein synthesis and delivery to the site of a perturbation. On the other hand, a distributed information system permits parallel “computing” and maximal efficiency. Furthermore, channels in the cell membrane allow eukaryotic cells to acquire spatial and temporal information from the environment with high resolution. This information can be communicated to other organelles through both fine-grained and coarse-grained conduits. This augmentation of the information generated by cell membrane receptors and carried by diffusion through molecular pathways allows single-cell eukaryotes to respond quickly to threats or opportunities. In multicellular organisms, membrane information dynamics may be critical for the proper distribution of cells in spatially complex structures as well as the synchronized function of those cells necessary for optimal tissue function.
Finally, the hypothesized spatial and temporal information in the membrane will likely be critical in determining the robust spatial organization of cells necessary for functioning tissue within multicellular organisms. Indeed, if the spatial organization and function of multicellular tissue is primarily governed by information dynamics in the membrane, the lack of correlation between organism complexity and genomic size is much less surprising. If so, this hypothesis generates a straightforward prediction: The number and distribution of transmembrane ion channels and intracellular signaling networks will increase as the complexity of an organism increases. This prediction can be tested through empirical observations.