1. Introduction
The RNA world hypothesis maintains that RNA molecules, being capable of both performing functions and storing information, were the first self-replicating molecules in the origin of life [
1,
2]. Deamer et al. [
3] have advanced a specific theory that outlines the importance of RNA to the origins of life. Of particular interest is the formation of long RNAs called ribozymes which are capable of catalysis [
4,
5]. A variety of ribozymes have been designed and synthesized [
6], including some capable of catalyzing the reactions necessary for RNA replication [
7], and some capable of replicating other ribozymes [
8,
9]. In theory, collections of ribozymes may form autocatalytic sets, leading to self replication and evolution [
10,
11,
12,
13]. In order for a ribozyme to exist in the first place, non-enzymatic RNA synthesis must have occured. Extensive experiments have been conducted on non-enzymatic RNA polymerization in various settings, including lipid-assisted synthesis, templating, and chemical activation of the phosphate [
14]. Additionally, the effect of wet-dry cycling on RNA polymerization has been studied in simulation [
15,
16,
17] as well as experiment [
18,
19,
20].
RNA polymerization occurs through dehydration synthesis: the ribose unit of one nucleotide bonds with the phosphate unit of another, releasing a single water molecule. This is a classic example of a polycondensation process [
21]; however, classical models of polycondensation are not appropriate for RNA polymerization because they were developed for chemical batch reactors where the reaction product is continuously evacuated to increase yield [
22] (ch. 2). Instead, since the essential processes of life take place in aqueous solution, the condensate is negligibly small in comparison to the solution as a whole. As a result, the external concentration of water is approximately constant, so RNA polymerization is more accurately modeled as a polyaddition process.
Consider an experiment initially consisting of a solution of monomers (e.g., nucleotides) capable of bonding with each other to form polymers. Each monomer can support two bonds, one on its left and one on its right, so that these monomers can link together to form linear polymers of an arbitrary length. A contiguous chain of k monomer units will henceforth be referred to as a k-mer, including the monomer case where k = 1. For the sake of visualization, one can imagine each monomer unit as a puzzle piece with a A terminus and an B terminus. It is important to note that no matter how long a polymer becomes, it always has precisely one unbound A terminus and one unbound B terminus.
We now make the assumption that the system is well-mixed in the sense that all reactants move and interact freely independent of mass, polymerization status, etc. Under these conditions, the polyaddition interaction between A and B termini is described by Hill-Langmuir protein-ligand reaction kinetics; that is, the two termini bind to each other with a reaction rate constant
k+, and bonded A − B pairs separate from each other at a rate
k−. These are assumed to be independent of the configuration of the reacting monomer units; that is, the bonding rate
k+ does not depend on whether each A and B terminus is the endpoint of a long polymer or of a free monomer, nor is the unbonding rate
k− affected by the position of the A − B bond within a polymer. Under these conditions, the reactions affecting each bonding site take the following simple form:
2. Flory-Schulz Polymer Length Distribution
We have assumed that all binding sites behave identically; this implies that each site has the same (potentially time-varying) probability
p of being occupied at any given time. This fact, independent of bonding and un-bonding rates, leads very directly to a geometric distribution of polymer length [
21]. Alternatively, Higgs [
15] provides a proof of the Flory-Schulz distribution as an equilibrium state characterized entirely by bonding and un-bonding rates as opposed to bonding probabilities.
To see how our bonding probability assumption leads to a geometric distribution, we can perform the thought experiment of randomly selecting a
k-mer of any length from the solution. Moving from left to right along the
k-mer, the probability of a bond existing between two consecutive monomer units is
p. In this way, we can view each
k-mer as a sequence of Bernoulli trials, where the length of the
k-mer is the number of trials up to and including the first failure. The result is by definition a geometric distribution with parameter
p, so the probability mass function
ρ(
k) over polymer length is given for positive
k by:
From this probability distribution over polymer length, we would like to find the expected concentration of each
k-mer as a function of our total concentration [U] of monomer units. If we define
to be equal to the total concentration of reactants, including monomers and polymers of all lengths, the expected concentration
n(
k) of
k-mers is given by multiplication with (
2) as follows:
However, we would like to express this result in terms of [U] rather than
because
varies with time as bonds break and reform, whereas [U] is fixed in a closed system. We can find the value of
using conservation of mass:
The distribution of polymer lengths for any bonding probability
p is given by substituting the value of
into (
3):
2.1. Steady-State Bonding Probability from Reaction Rates
We consider a step-growth polymerization process described by (
1), and assume that the total number of reactants is large enough for the law of mass action to apply. Under these conditions, if we introduce the equilibrium constant
, the steady-state concentration of A − B bonds [AB] is given by the Hill-Langmuir equation:
Here [A] is always equal to the total reactant concentration
because each monomer or polymer has exactly one unbound A and one unbound B terminus. Thus we can calculate the steady-state bonding probability
:
Rearranging to solve for
gives a quadratic equation with two real roots for positive values of
. One of these roots is greater than 1 and thus cannot correspond to a probability, so the other must be the solution. We introduce the reduced rate constant
and solve to find a value of
which can be substituted into (
4):
2.2. Thermodynamics of Bonding
We have described the bonding sites as a vast number of non-interacting systems which alternate stochastically between discrete states. This means that the steady-state probability of bonding can be described by Boltzmann statistics if we associate a Gibbs free energy
with the bound state:
For this system, the equilibrium constant
must have units of concentration, meaning that the commonly-employed expression
is dimensionally inconsistent. Solving (
5) for
, then substituting (
7), we find:
The relation defining the Gibbs free energy implies a functional dependence between
and temperature:
, where
and
are the enthalpy and entropy of bonding respectively. This means that depending on the signs of these two quantities, a polymerization reaction may change favorability depending on temperature as shown in Table 1-3 of Voet & Voet [
23]; both
and
will vary with temperature to reflect this. The four cases are compared in
Figure 1 as well as in
Table 1.
We expect intuitively that in most polymerization reactions
would be negative due to the increased order, meaning that polymerization would have to be enthalpically favorable in order to be observed at all. This explains the observation that polymerization is favorable at low temperatures but polymers break down as temperature increases, for example in self-assembly of nanowires [
24].
The suggestion that polymerization ought only to proceed when bond formation is enthalpically favorable may appear to conflict with the fact that the formation of an ester linkage between a sugar and a phosphate group is endergonic under standard conditions as shown in Table 13-4 of Nelson & Cox [
25]. However, this is a consequence of entropic unfavorability—by confining the reactants to the surface of a microdroplet, Nam et al. were able to nearly eliminate the contribution of the
term to the free energy of esterification, revealing a favorable negative value of
[
26]. Other means of reducing the entropic unfavorability of polymerization such as mineral surface adsorption [
27], restriction to small cavities [
28], or the excluded volume effect of crowding [
29] can also increase the favorability of polymerization.
3. Dynamics
In this section, we look at another way of thinking about our chemical system. In particular, we consider a countably infinite family of reaction equations which describe the way in which
i-mers and
j-mers bond to form
-mers, represented with the chemical symbols P
i, P
j, and P
i+j. The chemical equations in this family are of the form:
It is perhaps not immediately obvious that (
8) describes the same system as (
1), but in fact they are two different views of the same chemical process. From the perspective of bond formation, a
k-mer is identical to a monomer in that it has precisely one A terminus and one B terminus. In this view, (
8) is derived from splitting up the single reaction Equation (
1) into separate chemical equations describing the behavior of each possible configuration of A and B termini: the A terminus is the end of an
i-mer, and the B terminus is the end of a
j-mer.
3.1. Continuous Dynamics
We have found a set of chemical equations which describe the interactions of individual k-mers Pk. This is fundamentally a stochastic jump process describing discrete numbers of k-mers, but in the thermodynamic limit as the number of reactants grows very large, we can concern ourselves with the deterministic, continuous evolution of the expected concentration of k-mers.
Our dynamics can be written as a system of differential equations describing the time derivative of . As is usual for deriving mass-action differential equations from systems of chemical equations, we find the time derivative of by a summation over each place where Pk occurs in the system of chemical equations: if it is on the left-hand side, a negative contribution is made to , and if on the right, the contribution is positive.
Any given P
k can appear in all three positions in the chemical Equation (
8). For each equation where P
k appears as the first term on the left side (i.e., for each possible synthesis partner
), we lose P
k at a rate
, but gain it at a rate
. Each of those contributions should also be doubled to handle the functionally identical case where P
k appears as the second term on the left side. Finally, when P
k appears on the right side, for each possible split point
, we gain P
k at a rate
and lose it at a rate
. The facts above can be consolidated into a single differential equation describing the evolution of
as follows:
3.2. Reduction to One Dimension
We consider the special case where the initial condition is a Flory-Schulz distribution (
4) with rate parameter
. For example, the
case would be a solution consisting entirely of monomers, and is particularly relevant as it is a popular experimental initial condition [
18,
19,
20,
30].
The derivation of the Flory-Schulz distribution holds for all time in a well-mixed step-growth polymerization process, even as the distribution parameter
p evolves. This has been predicted theoretically [
31] and demonstrated experimentally [
24,
32,
33]. We would like to calculate the rate at which the distribution parameter
p changes with time. A generalization of this problem was discussed in the context of self-assembling nanoparticles by Gu et al. [
34] (SI 2).
Applying the principle of mass action to (
1) to calculate the time derivative of the total concentration of bonds [AB], then dividing through by [U] gives:
The time evolution of the Flory-Schulz parameter
p according to our closed-form solution of this equation, together with the resulting time evolution of the polymer length distribution, is shown in
Figure 2. The initial condition is
, corresponding to an all-monomer solution.
3.3. Closed-Form Solution
For the special case we just considered where the initial distribution is Flory-Schulz, the system has been reduced to the one-dimensional ODE (
10). We can go one step further: this ODE is separable and admits a closed-form solution. In preparation for this, we will perform some simplifications. First, recall that the steady-state bonding probability
, where
and
. We nondimensionalize the ODE (
10) by setting
, transforming the equation into:
The result is a separable ODE, allowing us to write:
This gives us
as a function of
p, which can be inverted to give a solution to the ODE:
We can fix
to solve for the value of
c:
Finally, we can recover the original time parameterization by replacing
with
, which gives the parameter of the Flory-Schulz distribution as a function of time:
As , the tanh function asymptotically approaches a value of 1 regardless of initial condition, which recovers the previously derived steady-state value .
4. Numerical Treatments
We have derived the rate dynamics of interacting
k-mers (
9) from the family of reaction Equation (
8). However, because the state vectors lie in an infinite-dimensional space, physically realizable numerical methods require us to approximate these dynamics in finitely many dimensions. From our work in
Section 2, we know that the expected number of extremely long polymers tends to be low due to the geometrically-distributed equilibrium state. Therefore, we can achieve very low error by introducing a constraint
d on the maximum length of polymers to be considered. This effectively constrains the system from the infinite-dimensional space
down to the finite-dimensional
.
4.1. Choice of Parameter Values
An experimentalist investigating polymerization in the lab might choose a set of representative conditions in the form of an initial temperature, pH, total concentration of monomer units, and presence of other cofactors such as salts. The dynamics of the system are determined by the rate constants
and
, which are a function of the experimental conditions; although the effect of pH and salt cofactors on hydrolysis have been studied in depth, e.g., by Oivanen et al. [
35], the effects of the same on synthesis rates have remained obscure.
In our setting, we consider conditions common to experiments investigating the hot-spring origins of life hypothesis [
16,
18,
20], with temperature
and pH of approximately 3. This allows us to take the approximate value of
from the experimental results of Oivanen et al. [
35] for similar conditions. As noted in
Section 2.2, although ester formation is endergonic under standard conditions, it is enthalpically favorable and can be made spontaneous by decreasing its entropic unfavorability. We assume that
has been brought down by some means to an illustrative negative value, and compute the corresponding value of
.
4.2. Truncation
Although we seek to truncate the system to a finite dimension
d, we do this not by throwing away polymers which become too large, but rather by eliminating the formation of longer polymers in the first place. This means that we approximate the family of reaction Equation (
8) by prohibiting all reactions which include a reactant of length greater than
d:
The dynamics can be derived from the reaction family (
12) in exactly the same way that (
9) was derived from (
8), the only difference being that the first sum becomes finite due to the truncation. The resulting system of ODEs, describing the evolution of a state vector
whose components
represent the concentration of
k-mers, is given by:
A perhaps more obvious method of truncation would be to keep the exact original form of (
9), but ignore lengths above
d by taking
for
. However, this approach leads to unsatisfactory results because it is equivalent to permanently deleting any
k-mer which forms with
. Since the mass associated with these deleted
k-mers is never returned to the system, mass is continually being lost, so the system asymptotically approaches a steady state at
.
4.3. Simulations
To demonstrate the dynamics of the system and the effects of truncation, we numerically solve (
13) starting from an initial solution of exclusively monomers for the truncation lengths
and
, and plot the concentration of
k-mers up to length 10 over time in
Figure 3. All simulations were run using the DifferentialEquations.jl package [
36] with parameters other than
d held fixed; these values are given in
Table 2. The Github repository containing our simulation code is given in the
Supplementary Materials below.
The expected equilibrium state is the geometric distribution (
4), which would appear uniformly spaced on a logarithmic plot, with the dimer concentration equal to
multiplied by the monomer concentration and so on. In the case where
, this is exactly what we observe; however, when we truncate to
, the distribution goes through an inversion after which
d-mers, rather than monomers, dominate. Since truncation depends on the assumption that longer polymers are negligible, this is obviously nonphysical.
Although each of our simulations converges to some steady-state distribution, the degree of agreement with our theoretical prediction varies depending on the truncation length
d. To visualize this,
Figure 4 plots the steady state distributions for three values of
d compared to the theoretical steady-state Flory-Schulz distribution.
4.4. Error Bound
Since the Flory-Schulz distribution of polymer length which is the solution to the system of reaction Equation (
9) includes a non-zero expected concentration for polymers longer than any finite
d, it is impossible for the truncated probability distribution which is the solution to (
13) to be identical to the infinite-dimensional solution. As noted above, the dynamics of (
13) are exactly the result of constraining the dynamics of (
9) to finite maximum polymer length while preserving conservation of mass. Therefore, the distance between the true solution
and its projection
onto the set of
d-dimensional distributions with the correct total mass provides a lower bound to the error of
any mass-preserving truncation of the reaction family (
8). We can use the “mass operator”
, which counts the total concentration of monomer units in the system, to write this projection as:
We can simplify the objective by separating the portion of
which is allowed to vary from the infinite “tail” which is fixed to zero. Introducing the
d-dimensional truncations
x and
of
n and
respectively, and letting
to capture the error in the tail, we have:
Now we can change variables to
and find the optimal projection using ordinary least squares. In plainer language, the problem being solved is to find the smallest correction
whose total mass is equal to that of the missing “tail”
y. The finite-dimensional version of the mass operator
can be constructed as a
matrix whose entries are ascending integers.
The well-known closed-form solution to the ordinary least squares problem (
15) is:
This can be brought into more elementary terms by calculating:
and
We can now directly calculate the total distance of the projection using (
14).
This provides a lower bound on the sum-squared error of any solution with maximum polymer length
d and the same total mass as the previously proven solution
. In order to make this result more directly comparable between different parameter values, however, we will use the relative error:
4.5. Applying the Error Bound
This result E, which is strictly greater than but asymptotically equal to , provides an absolute lower bound on the error between the instantaneous distribution of polymer lengths and any mass-conserving finite approximation to this distribution. In other words, error terms on the order of arise in any simulation of our chemical system, so long as the simulation (a) produces finite-dimensional results and (b) obeys conservation of mass. These errors can be surprisingly large even for quite reasonable-sounding d as p approaches 1. This result is relatively insensitive to the choice of error metric; although we specifically investigate the case of norm, other metrics which we tested in simulation also produced error on the order of .
It is important to emphasize that this is a lower bound on error; this does not guarantee that a certain choice of
d will produce less than a certain error, which is in fact impossible without being more specific about the method of solution. For example, in
Figure 5, above a certain truncation length of about
, the finite precision of the solver becomes more of a limiting factor than the truncation error. Likewise, early in the dynamical simulation when the instantaneous value of
p is very small, the error bound is practically useless. Instead, this bound guarantees that any simulation which chooses
d too small will produce at least a certain specified error.
As an example of applying this error bound in practice, if we consider a system with
, corresponding to a steady-state bonding probability
, we can numerically solve (
16) for
d to find that a simulation with
cannot have final relative error less than
. The simpler error bound
is even easier to apply: any truncation length
must produce final relative error
. In the above case, this laxer bound is only able to rule out truncations up to
, but the ease of calculation makes this bound probably more useful than the tighter one.
5. Comparison to Experiment
The main value of a theoretical model of any physical process is the predictive and interpretive power it brings to designing and analyzing experimental results. In this section, we present a few commonly measured experimental quantities and the predictions our theory makes about them.
5.1. Critical Concentration
As the total concentration [U] of monomer units increases, the concentration of monomers in the final solution approaches a fixed value
called the critical concentration [
37] (
Section 3.2). This value is a function only of the rate constants, and can be calculated by taking the limit of
as [U] is taken to infinity. The monomer concentration
is given by substituting (
6) into the Flory-Schulz distribution (
4) with
:
As [U] grows to infinity, the reduced equilibrium constant
goes to zero, so we can calculate
as:
The critical concentration provides an alternate route to experimental determination of rate constants; since it is likely easier to measure the monomer concentration than to find the complete length distribution, it is possible to use the critical concentration to find one rate constant given the other.
5.2. Polymer Yield
Experimental studies commonly report the polymer yield, the fraction of mass which is converted to polymers at equilibrium. We can compute this mass conversion efficiency
as:
The analogous quantity derived from concentrations rather than masses is simply equal to , since the monomer concentration ratio is exactly .
5.3. Mass Distribution
It is frequently easier to measure the
mass rather than the concentration of polymers of each given length. For example, the output of high-performance liquid chromatography (HPLC) is a “spectrum” where the height and location of each peak corresponds approximately to the total and per-molecule mass of a reaction product respectively. For comparison with such results, we follow Flory’s treatment of the polymer mass distribution
, which can be defined in terms of the concentration distribution
(
3) as follows [
38]:
Here, each term
is the theoretical total mass of
k-mers, and
is the molar mass of the corresponding monomer. Since
is a single peak distribution we can analyze an experimentally determined mass distribution
by matching the position
of the peak of
to the position
of the peak of
. We find the peak of our theoretical distribution by differentiating
:
The unique zero of this equation is
, so according to our theory, the mode of the mass distribution is related to the bonding probability by:
In other words, given an experimental Flory-Schultz-like mass distribution, one can derive the value for the bonding probability
p. At equilibrium,
, so the Gibbs free energy of bond formation can be calculated using (
7).
As a simple example of how these results may be applied, Monnard et al. used HPLC to measure nonenzymatic polymerization of activated nucleotides [
28]. When multiple bases were mixed, the HPLC results are difficult to interpret, but in the case with pure uridine, the result appears to be a Flory-Schulz mass distribution with a mode of about 2. This corresponds by (
18) to
. Substituting this value into (
17), we expect polymer yield of
, consistent with their reported value of 88%.
We can also estimate the free energy of bond formation under their experimental conditions in this way: when
, the Boltzmann statistics of (
7) give a free energy
, corresponding to a process where bond formation is slightly favorable. Additionally, given the free energy at two different temperatures, we can separate the entropic and enthalpic contributions. Since this experiment was carried out at −18 °C, we can make a first approximation by assuming that our calculated value is directly comparable to the value for sugar-phosphate esterification under standard conditions of 3.3 kcal/mol as shown in Table 13-4 of Nelson & Cox [
25].
The temperature of this experiment was 43 K colder than the standard temperature of 25 °C, and the concomitant free energy change was 3.5 kcal/mol. Since
, a bit of algebra gives
and
. These conditions correspond to the case in
Figure 1 and
Table 1 where polymerization is favorable below a critical temperature
. It seems suspect, however, that
would be so large [
26], suggesting that the role of entropy in bond formation is reduced by the environment of the eutectic ice-water mixture, in agreement with the commentary of Monnard et al. [
28].
5.4. Degree of Polymerization
Another quantity commonly measured in experiments is the average degree of polymerization in obtained solutions, e.g., [
30]. Our model allows the evolution of this parameter to be calculated easily; by Carother’s equation, the number average degree of polymerization is given by
, so we can use (
10) and the chain rule to calculate:
This correctly simplifies in the irreversible case
or when the degree of polymerization
to a linear increase in degree of polymerization
. This linear dependence on initial concentration is in contrast to the quadratic dependence predicted for polycondensation. We are not aware of a study of the time course of
in RNA polymerization, but these dynamics have been observed in the synthesis of nanowires by Gao et al. [
24], in supramolecular polymerization of micelles by Lu et al. [
39], and in simulations of interfacial polymerization by Xing et al. [
40]. In all three cases, the reversibility of the polymerization process is also in evidence due to the slowdown in the rate of increase in degree of polymerization.
6. Discussion
Our model provides an explicit description of the formation of RNA polymers in aqueous prebiotic conditions as is necessary for the RNA-world hypothesis. The mathematical and computational models presented in this paper generalize to all polymers that grow by polyaddition in well-mixed solutions. In these cases, as well as in polycondensation processes where the concentration of the condensate remains approximately constant, our models describe the dynamics of the length distribution as well as the eventual steady state. The time evolution of an initial Flory-Schulz distribution is completely determined by the evolution of the bonding probability, for which we have stated a closed form solution depending only on the forward and back reaction rates and the number of monomer units.
In the simple case of an initial population of RNA monomers in absence of any cofactors, whenever polymerization is favorable in the sense that the Gibbs energy of bond formation is negative, the steady-state concentration of polymers is expected to exceed the concentration of monomers. However, we do not expect to see a population length inversion, sometimes dubbed a “kinetic trap,” in which polymers of certain lengths achieve a greater concentration than any shorter polymer. Any experimental deviation from these predictions with or without the presence of cofactors indicates the presence of significant higher-order effects (e.g., hairpin structures, cyclical polymers, catalysis) and may suggest future directions for mathematical models. The inclusion of the shielding of bonding sites from hydrolysis by virtue of the secondary structure of RNA in the model may increase the lifetime of long polymers in solution, leading to a recovery of kinetic trap-like behavior. Similarly, should the RNA in question be encapsulated in lipid vesicles small enough to introduce finite-size effects, or assisted in polymerization by association with a surface, then the statistics of our model may no longer apply.
Additionally, when considering the hydrothermal origins of life hypothesis advanced by Deamer et al. it becomes important to consider the effects of wet-dry cycling on RNA polymerization [
3]. In the dry phase, polymerization is favorable due to the lack of hydrolysis but the well-mixed assumption is violated. In the wet phase, polymerization is unfavorable due to the presence of hydrolysis but the well-mixed assumption is upheld. Intuitively, this means that alternation between a relatively long dry phase and short wet phase allows for the “well-enough mixing” of the RNA molecules, such that the solution approaches the polymer distribution predicted for a dry phase with mixing [
15]. In short, wet-dry cycling leads to an effective increase in the probability of bonding and therefore increases the concentration of long polymers. Besides wet-dry cycling, another important feature of the hydrothermal hypothesis is the suggestion that biogenesis occurs at a certain optimal temperature: it is well-known that elevated temperature is required to increase the rate of biological reactions [
3], but as temperature increases, polymerization becomes entropically unfavorable, suggesting a Goldilocks effect with regard to the ambient temperature at the origins of life.