Review

On the Roles of Protein Intrinsic Disorder in the Origin of Life and Evolution

by
Vladimir N. Uversky
Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
Life 2024, 14(10), 1307; https://doi.org/10.3390/life14101307
Submission received: 27 September 2024 / Revised: 13 October 2024 / Accepted: 14 October 2024 / Published: 15 October 2024
(This article belongs to the Special Issue What Is Life?)

Abstract
The discussion of the factors that could have contributed to the origin of life and evolution is necessarily speculative, since most of the related hypotheses cannot be tested in practice: the corresponding events not only have already happened, but took place in the very distant past. However, there are a few indisputable facts observable at present, such as the existence of a wide variety of living forms and the abundant presence of intrinsically disordered proteins (IDPs) or hybrid proteins containing ordered domains and intrinsically disordered regions (IDRs) in all living forms. Since it seems that the currently existing living forms originated from a common ancestor, their variety is a result of evolution. Therefore, one could ask a logical question of what role(s) the structureless and highly dynamic but vastly abundant and multifunctional IDPs/IDRs might have in evolution. This study represents an attempt to consider various ideas pertaining to the potential roles of protein intrinsic disorder in the origin of life and evolution.

1. Introduction: Who Are You, Mr. IDP?

For most of its existence, protein science was ruled by the famous “lock-and-key” model proposed in 1894 by the German chemist Hermann Emil Louis Fischer (1852–1919) to describe the molecular mechanisms of enzymatic activity [1]. Here, the unique complementarity of the rigid structures of a substrate and an enzyme was suggested to define the efficiency of catalysis. Therefore, the specific functionality of a given protein was believed to be predetermined by the precise spatial positioning of its amino acid side chains and prosthetic groups, which, in turn, was determined by the defined 3D structure of this protein (the so-called structure–function paradigm). Despite its numerous limitations, this structure–function paradigm, assuming that protein functionality is directly linked to its unique rigid 3D structure, acted as a ‘Big Bang’ that gave rise to the universe of modern protein science [2,3], a universe where ordered proteins with well-defined structures conduct well-defined functions in a “unique sequence → unique structure → unique function” manner.
However, even the most structured proteins, instead of being rigid crystal-like entities, represent dynamic systems with different degrees of conformational flexibility [3]. In fact, the 3D structures of ordered proteins determined by X-ray crystallography and many other ensemble-based techniques represent averaged pictures [4]. This is because proteins constantly undergo structural rearrangements originating from the fact that the conformational forces stabilizing the protein structure are weak and can be broken even at ambient temperatures due to thermal fluctuations [3,5], providing protein groups involved in such interactions with the ability to form new weak interactions with comparable energy [5]. Therefore, ordered proteins exist as dynamic ensembles of interchanging conformations, where structural rearrangements, being of relatively small scale, happen relatively fast (they typically occur on a time scale that is faster than the time required for structure determination by X-ray crystallography and many other physical techniques) [4].
It was also pointed out that not all structures deposited in the Protein Data Bank (PDB) [6] are defined throughout their entire protein lengths but instead contain regions with missing electron densities (i.e., portions of protein sequences missing from the determined structures) [7,8]. These regions of missing electron densities, being flexible or disordered in nature, are incapable of coherent scattering of X-rays. They are very common in the PDB, as fewer than 30% of PDB protein structures are free of them [9]. In addition to ordered proteins possessing different degrees of conformational flexibility and ordered proteins containing malleable/disordered regions of varying lengths, many biologically active proteins are characterized by a complete or almost complete lack of ordered structure under physiological conditions and exist as highly dynamic and heterogeneous conformational ensembles [5,10,11,12,13,14,15]. These IDPs and hybrid proteins containing ordered domains and various IDRs [16] are characterized by remarkable conformational heterogeneity and constitute a significant part of the protein kingdom [17,18,19,20].
Since IDPs/IDRs cannot spontaneously fold under the “physiological” conditions promoting folding of ordered proteins/domains, it was not surprising to find that the universe of protein amino acid sequences can be divided into at least two very different categories: sequences that naturally fold into ordered proteins or domains, and sequences that yield IDPs/IDRs [3,21]. Furthermore, the removal of the restrictions posed by the need to spontaneously fold into an ordered structure to become functional dramatically increased the sequence space available to IDPs/IDRs in comparison with the sequence space available to foldable proteins and domains [3,22]. Therefore, the amino acid sequences of structureless and ordered proteins are dramatically different [10,12,13,23,24,25]. For example, IDPs with extended disorder (so-called native coils and native pre-molten globules) were shown to be characterized by a low content of hydrophobic residues combined with a high content of similarly charged residues [12]. At a finer-grained level, IDPs/IDRs were documented to be significantly depleted in the so-called order-promoting amino acids (Cys, Trp, Tyr, Ile, Phe, Val, Leu, His, Thr, and Asn) and enriched in the disorder-promoting Ala, Gly, Asp, Met, Lys, Arg, Ser, Gln, Pro, and Glu residues [10,13,24,25,26,27,28]. These and other disorder-specific peculiarities of the amino acid sequences were used to design numerous computational tools for the reliable prediction of intrinsic disorder in proteins [10,13,17,29,30,31,32,33,34,35]. The use of those tools has opened a way to evaluate the natural prevalence of protein disorder, revealing that many proteins are expected to contain long IDRs and that the eukaryotic proteomes have a higher fraction of intrinsic disorder than prokaryotic proteomes [17,18,20,36,37,38,39,40].
It was also pointed out that these differences in disorder distribution within the protein universe can be understood by taking into account two facts: IDPs/IDRs have evolved to have specific functions, being commonly involved in regulation, recognition, and signaling (see below), and eukaryotes, especially multicellular eukaryotic organisms, possess complex and well-developed regulation networks that might rely on the capability of IDPs/IDRs to perform the necessary regulatory functions [5,19,41,42]. In fact, being commonly involved in the recognition, regulation, and control of various signaling pathways [41,42,43], IDPs/IDRs have a unique functional arsenal that is parallel and complementary to the catalytic and transport functions of ordered proteins [24,44,45,46].
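The compositional bias described above (depletion in order-promoting residues and enrichment in disorder-promoting ones) can be illustrated with a minimal sketch. The two residue sets are taken from the lists given in the text; the function name `composition_bias` and the example sequences are hypothetical. This is only a crude composition count under those assumptions and is not one of the disorder predictors cited above.

```python
# Residue classes as listed in the text (one-letter codes).
ORDER_PROMOTING = set("CWYIFVLHTN")     # Cys, Trp, Tyr, Ile, Phe, Val, Leu, His, Thr, Asn
DISORDER_PROMOTING = set("AGDMKRSQPE")  # Ala, Gly, Asp, Met, Lys, Arg, Ser, Gln, Pro, Glu


def composition_bias(sequence: str) -> dict:
    """Return the fractions of order- and disorder-promoting residues.

    Characters that are not one of the 20 standard amino acids are ignored.
    """
    known = ORDER_PROMOTING | DISORDER_PROMOTING
    residues = [aa for aa in sequence.upper() if aa in known]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    disorder = sum(aa in DISORDER_PROMOTING for aa in residues) / len(residues)
    return {"disorder_promoting": disorder, "order_promoting": 1.0 - disorder}


if __name__ == "__main__":
    # Hypothetical example: a 20-residue polyglycine chain.
    print(composition_bias("G" * 20))
```

A sequence dominated by residues from the second set scores close to 1 for `disorder_promoting`, whereas published predictors additionally use features such as charge, hydropathy, and sequence context.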

2. Roles of Intrinsic Disorder in the Origin of Life

2.1. Prebiotic Life on Earth: Intrinsic Disorder of Extraterrestrial Peptides

Since glycine, among other molecules, was detected in comets, meteorites (see [47,48,49]), and the interstellar medium [50], and since oligoglycine can be synthesized on the surface of cold solid particles (cosmic dust) [51], one can assume that extraterrestrial biomolecules contributed to the origin of life on Earth [52]. In fact, CO (carbon monoxide), carbon (C), and NH3 (ammonia), which are the three most abundant species in the star-forming interstellar medium, were shown to condense on the surface of cold dust grains and form isomeric glycine monomers in a barrierless manner; these can then polymerize to produce homo-polymeric peptides of different lengths even at low temperatures under astrophysically relevant conditions in the absence of irradiation or water [51]. Therefore, polypeptides of significant lengths, and not just elementary amino acids such as glycine, may be synthesized on rocky planets in the habitable zone and may have served as an important element when life as we know it originated ~4 billion years ago (see [53]).
It is unclear whether more complex heteropeptides can be synthesized via the mechanism proposed for extraterrestrial polyglycine synthesis [51]. However, meteorites (particularly carbonaceous chondrites) were shown to contain various amino acids. For example, 52 different amino acids were found in the Murchison meteorite, among which 33 were unknown in natural materials, while eight were amino acids found in terrestrial proteins [54]. Furthermore, a 4641 Da amino acid polymer predominantly containing glycine and some hydroxy-glycine and alanine [55] and the 2320 Da meteoritic protein hemolithin containing two glycine strands, each 16 residues long, terminated by iron atoms and holding additional oxygen and lithium atoms [56] were found in the carbonaceous chondrite CV3 meteorites Acfer 086 and Allende.
Importantly, isomeric polyglycine-based peptides similar to ones of extraterrestrial origin were strongly predicted to be intrinsically disordered [52]. Therefore, homopolypeptides that can be synthesized extraterrestrially from glycine via the pathway proposed by Krasnokutski et al. [51] or by some other yet unknown mechanisms cannot be ordered. This is not a big surprise, as the glycine included in such polypeptides, besides being the simplest amino acid, is considered a disorder-promoting residue. It was also emphasized that such disordered polypeptides of extraterrestrial origin could have persisted for a long time due to the absence of proteases in the abiotic environment of the primitive Earth [52]. Of course, a peptide bond can be decomposed via uncatalyzed hydrolysis involving direct attack of the peptide bond by water. However, the half-life of such uncatalyzed hydrolysis is expected to be as long as 600 years [57]. Furthermore, since the atmosphere of the primordial Earth was reducing and had no molecular oxygen or other reactive oxides, the primordial ocean did not contain molecular oxygen or other reactive oxides, which further slowed down the rate of spontaneous hydrolysis of the primordial peptides [52]. To conclude this section, it is tempting to hypothesize that extraterrestrial IDPs might have contributed to the prebiotic origin of life on Earth [52]. A more detailed description of this important concept will be provided in the subsequent sections.
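To give a feeling for what the 600-year figure for uncatalyzed hydrolysis quoted above implies, the following sketch converts it into surviving fractions. It rests on assumptions that are not stated in the text: simple first-order decay, the 600-year value applying to each peptide bond, and bonds within a chain breaking independently. The function names are hypothetical, and the numbers are order-of-magnitude illustrations only.

```python
HALF_LIFE_YEARS = 600.0  # per-bond value quoted in the text for uncatalyzed hydrolysis


def fraction_bonds_intact(years: float, half_life: float = HALF_LIFE_YEARS) -> float:
    """Fraction of peptide bonds still intact after `years` (first-order decay assumed)."""
    return 0.5 ** (years / half_life)


def fraction_chains_intact(years: float, n_residues: int,
                           half_life: float = HALF_LIFE_YEARS) -> float:
    """Fraction of chains with all bonds intact.

    A chain of n residues has n - 1 peptide bonds; bonds are assumed to
    hydrolyze independently of each other (an assumption of this sketch).
    """
    if n_residues < 2:
        raise ValueError("a peptide needs at least two residues")
    return fraction_bonds_intact(years, half_life) ** (n_residues - 1)


if __name__ == "__main__":
    for years in (10, 100, 600):
        print(years, fraction_bonds_intact(years), fraction_chains_intact(years, 20))
```

Under these assumptions, longer chains are lost faster than individual bonds, so the persistence argument is strongest for short peptides or for conditions that slow hydrolysis further.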

2.2. Prebiotic Life on Earth: Intrinsic Disorder of Primordial Proteins

The complex 3D structures of modern ordered proteins represent the result of lengthy molecular evolution. What then can one say about the structures of primordial proteins? It is clear that the chances that the first polypeptides that appeared in the primordial soup of the primitive Earth had unique 3D structures are negligibly slim. Instead, with a very high probability, such polypeptides were intrinsically disordered. We can find indirect clues supporting the validity of this hypothesis when looking at some known facts. Although the Earth formed about 4.5 billion years ago and became cool enough to potentially spawn life around 4.2 billion years ago, the first fossils are dated to 3.85 billion years ago, raising the question of what was happening in the years in between. In the 1920s, Alexander I. Oparin (1894–1980) [58] and John Burdon Sanderson Haldane (1892–1964) [59] proposed a model that constitutes a cornerstone of the theory of molecular evolution, according to which some organic molecules could have been synthesized spontaneously from the gases of the primitive Earth’s atmosphere. Such abiotic production of organic molecules would require a reducing atmosphere and an ample supply of energy in the form of lightning and/or ultraviolet light. The validity of this idea was demonstrated thirty years later when Stanley Lloyd Miller (1930–2007) and Harold Clayton Urey (1893–1981) conducted elegant experiments deservedly known now as the Miller–Urey experiments. These experiments showed that placing inorganic compounds such as water vapor, hydrogen, methane, and ammonia, which were believed to represent the major components of the atmosphere of the primordial Earth, into a closed system and running a continuous electric current through the system to simulate lightning storms, believed to be common on the early Earth, resulted in the appearance of various organic molecules, including some amino acids [60,61].
Importantly, only about half of the modern amino acids were synthesized in these Miller–Urey experiments [60,61], suggesting that the first proteins on Earth may have contained only a few amino acids.
In line with these considerations, the biosynthetic theory of the genetic code evolution suggests that the genetic code evolved from a simpler form encoding fewer amino acids [62], likely in parallel with the invention of biosynthetic pathways for new and chemically more complex amino acids [63]. Peculiarities of the redundancy of the standard genetic code, where 20 amino acids are encoded by 64 codons, provide some support to the validity of this hypothesis. Here, despite the fact that the redundant codons encoding one amino acid may differ in any of their three positions, only the third position of some of such codons may be fourfold degenerate, i.e., represents a position where all possible nucleotide changes are synonymous as they do not change the amino acid. If these peculiarities of the modern genetic code reflect its evolution, then it is likely that a doublet code preceded the triplet code, indicating that the third position was not used at all in the early genetic code. This means that this early code used 4 × 4 = 16 codons, thereby encoding 16 or fewer amino acids, if a termination codon is taken into account [64], indicating that evolutionarily old and new amino acids can be potentially discriminated. These and many other observations were used by Edward N. Trifonov to propose the following consensus order of the appearance of the 20 amino acids on the evolutionary scene: G/A, V/D, P, S, E/L, T, R, N, K, Q, I, C, H, F, M, Y, and W [65]. Let us look at this scale from the viewpoint of protein intrinsic disorder, where residues can be arranged based on their order-promoting (or foldability) potential [10,13,24,25,26,27,28]. 
In fact, there are three scales that can provide a ranking of the tendencies of amino acid residues to promote order or disorder: these are the Top-IDP scale (W, F, Y, I, M, L, V, N, C, T, A, G, R, D, H, Q, K, S, E, and P) [23], the DisProt-based scale (C, W, Y, I, F, V, L, H, T, N, A, G, D, M, K, R, S, Q, E, and P) [66], and the scale based on the average number of contacts per residue in the ordered proteins (W, F, V, I, L, M, V, C, H, R, T, Q, N, S, K, E, D, A, P, and G) [67]. Figure 1 represents a comparison of these scales with the amino acid novelty scale proposed by Trifonov and shows that typically, older residues (e.g., G, D, E, P, and S) have a strong tendency to be disorder-promoting, whereas many newer amino acids (e.g., C, W, Y, and F) tend to be order-promoting.
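The comparison between the amino acid novelty scale and the disorder scales can also be put into a single number with a rank correlation. The sketch below computes a Spearman coefficient between Trifonov's consensus order and two of the scales as quoted in the text (Top-IDP and the DisProt-based scale). Treating residues joined by "/" as ties, and all function names, are assumptions made for this illustration; it is not the analysis behind Figure 1.

```python
from math import sqrt

# Trifonov's consensus order as quoted in the text, oldest first.
# Residues joined by "/" in the text are treated as ties (assumption).
TRIFONOV_ORDER = ["GA", "VD", "P", "S", "EL", "T", "R", "N", "K", "Q",
                  "I", "C", "H", "F", "M", "Y", "W"]

# Scales as quoted in the text, from most order-promoting to most disorder-promoting.
TOP_IDP = "WFYIMLVNCTAGRDHQKSEP"
DISPROT = "CWYIFVLHTNAGDMKRSQEP"


def age_ranks(groups):
    """Rank residues from oldest (low) to newest (high); tied residues share the mean rank."""
    ranks, start = {}, 1
    for group in groups:
        mean_rank = start + (len(group) - 1) / 2
        for aa in group:
            ranks[aa] = mean_rank
        start += len(group)
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman_age_vs_disorder(scale: str) -> float:
    """Spearman correlation (Pearson on ranks) between residue age and disorder rank."""
    age = age_ranks(TRIFONOV_ORDER)
    disorder = {aa: i + 1 for i, aa in enumerate(scale)}  # 20 = most disorder-promoting
    residues = sorted(age)
    return pearson([age[aa] for aa in residues], [disorder[aa] for aa in residues])


if __name__ == "__main__":
    print("Top-IDP:", round(spearman_age_vs_disorder(TOP_IDP), 2))
    print("DisProt:", round(spearman_age_vs_disorder(DISPROT), 2))
```

With this ranking convention, a negative coefficient means that evolutionarily older residues tend to sit toward the disorder-promoting end of the scale, which is the direction of the trend described in the text; both scales give a negative value here.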
Figure 2 provides another view of these correlations by showing modern genetic code complemented with information on the early and late codons (shown by light red and light blue colors, respectively) and on corresponding disorder- and order-promoting residues as evaluated based on the DisProt scale (shown by red and blue colors, respectively). Codons with intermediate age and disorder-neutral residues are shown by light pink and pink colors, respectively. This presentation emphasizes that there is relatively good agreement between the “age” of the residue and its disorder-promoting capacity, with early residues being mostly disorder-promoting and the majority of late residues being mostly order-promoting. This conclusion follows from the abundance of matching colors (light red–red, light blue–blue, and light pink–pink). There are only two noticeable exceptions to this rule, V and L, which are early but order-promoting residues.
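The codon-level argument made earlier (a doublet code offering 4 × 4 = 16 codons, and third positions that are fourfold degenerate) can be checked directly against the standard genetic code. The sketch below builds the standard codon table in the conventional TCAG ordering and lists the doublets whose third position is fully synonymous; the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def fourfold_degenerate_doublets() -> dict:
    """Map each doublet (first two codon positions) whose four codons all
    encode the same amino acid to that amino acid."""
    result = {}
    for first, second in product(BASES, repeat=2):
        encoded = {CODON_TABLE[first + second + third] for third in BASES}
        if len(encoded) == 1:
            result[first + second] = encoded.pop()
    return result


if __name__ == "__main__":
    doublets = ["".join(d) for d in product(BASES, repeat=2)]
    fourfold = fourfold_degenerate_doublets()
    print(len(doublets), "doublets in total")
    print(len(fourfold), "doublets with a fourfold degenerate third position:", fourfold)
```

In the standard code, 8 of the 16 doublets (those encoding Ser, Leu, Pro, Arg, Thr, Val, Ala, and Gly) already specify the amino acid without the third position.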
There are also other facts that can provide further support for this idea. Since in the early stages of evolution, the primordial Earth was likely hotter than in the current day, more stable codon–anticodon interactions (in the absence of additional stabilizing interactions) were more favorable under these early conditions with presumably higher temperatures [65]. Therefore, the thermostability of the codons (measured as melting enthalpies (kcal/M) of the dinucleotide stacks corresponding to the first and second codon positions [68]) should have at least some correlation with the amino acid novelty scale. Figure 3A shows that such correlation is indeed observed, as early amino acids are typically encoded by more thermostable codons. Furthermore, Figure 3B shows that there is also an inverse correlation between codon thermostability and the disorder-promoting capability of amino acids, with disorder-promoting residues being encoded by more thermostable codons. One can also add another angle here and bring into consideration residue buriability, which provides a quantitative measure of the driving force for the burial of an amino acid residue in proteins and thereby contributes to the conformational stability of ordered proteins [69]. Figure 3C shows that codon thermostability is inversely correlated with the buriability of the residues encoded by these codons, whereas Figure 3D illustrates the presence of a correlation between the buriability and novelty of residues, where old residues are expected to be less buriable whereas high buriability is characteristic of new residues. Finally, Figure 3E shows that the disorder-promoting residues are less buriable than the order-promoting residues.
Taken together, these observations indicate that the primordial polypeptides were intrinsically disordered, as evolutionarily old amino acids, encoded by more thermostable codons, were less buriable and mostly disorder-promoting. Although it is rather unlikely that these disordered primordial polypeptides possessed high catalytic activity [70], undoubtedly they played important roles in the origin of life and were crucial players in early evolution as well. In fact, as per the RNA world theory, the evolution of enzymatic activity involved the transfer of catalytic power from catalytic RNAs (known as ribozymes) to ribonucleoproteins (RNPs) and then to proteins [72]. An exceptionally illustrative example of a ribozyme is the ribosome, an RNA enzyme that catalyzes the formation of peptide bonds during protein translation and that has been described as “a creature with a hundred of waggly tails”, since its stability is supported by numerous ribosomal proteins, most of which are disordered in the unbound state and fold upon binding to ribosomes [71]. Based on these premises, in an organism that was the first to invent protein synthesis, the first proteins would be IDPs with some nonspecific RNA chaperone activities rather than specific catalysts [70,73]. However, in the RNA world, where misfolding-prone RNA [74,75] was used for both information storage and catalysis [76], the presence of such disordered RNA chaperones would be highly beneficial to their carriers, providing them with a significant selective advantage. Furthermore, the transfer of enzymatic activity from RNAs (ribozymes) to proteins was a logical evolutionary step determined by the higher stability of protein structures than RNA structures and by the dramatic increase in the variability of physicochemical properties of amino acids in comparison with those of nucleotides.
Since a stable structure represents an important prerequisite for the proper spatial arrangement of catalytic residues, which is needed for efficient catalysis [77], transferring catalytic activities to proteins generated strong evolutionary pressure toward proteins with well-folded structures.

3. Roles of Intrinsic Disorder in Evolution

3.1. Wavy Evolution of Intrinsic Disorder: Back to the Future or Blast from the Past

Figure 4 represents a snapshot of the distribution of intrinsic disorder in the modern proteomes [20] and illustrates the well-known fact that IDPs/IDRs are more prevalent in eukaryotes than in less complex organisms [17,18,36,37,38,39,40]. As already pointed out, this plot, representing the dependence of the fraction of disordered residues on the proteome size, has a well-defined gap between the prokaryotes and eukaryotes, as the majority of the prokaryotic species have 27% or fewer disordered residues, whereas almost all eukaryotes are predicted to have 32% or more disordered residues [20]. This observation indicates the existence of a complex stepwise correlation between the increase in organism complexity and the increase in the amount of intrinsic disorder and suggests that the “origination” of intrinsic disorder was crucial for moving from the less complex prokaryotic to more complex eukaryotic cells, which contain many intricate innovations that seemingly arose all at once. Therefore, the sharp jump in the levels of proteome disorder parallels a morphological gap between the prokaryotic and eukaryotic cells, indicating that the increased usage of intrinsic disorder paralleled and likely was crucial to the increase in the morphological complexity of the cell [20].
These observations clearly indicate that IDPs/IDRs, with their ability to control various signaling, recognition, and regulation pathways and networks, act as crucial life maintainers in eukaryotic organisms, especially multicellular eukaryotic organisms [5,19,41,42]. They also seem to suggest that the introduction of intrinsic disorder represents a relatively recent evolutionary “invention” that helped the move from prokaryotes to eukaryotes. However, as was discussed in the previous section, more likely than not, primordial proteins/polypeptides were intrinsically disordered. Therefore, the increased use of intrinsic disorder in eukaryotic organisms clearly represents a blast from the past and can be considered a “back to the future” event. This is illustrated by Figure 5 schematically showing that the pattern of the global evolution of intrinsic disorder is not straight, but wavy. Here, evolution starts with the highly disordered primordial proteins primarily acting as RNA chaperones. Since the competitive advantage of primitive cells was likely defined by the degree of their independence from the fluctuating environmental conditions linked to the ability to catalyze the production of all the constituents necessary for their independent existence, highly disordered RNA chaperones evolved into the ordered enzymes with well-folded unique 3D structures. At the subsequent evolutionary steps, protein intrinsic disorder was reinvented because IDRs/IDPs have specific features crucial for the regulation of complex processes. This prompted the development of more complex organisms from the last universal ancestor (i.e., the most recent organism from which all organisms now living on Earth descend [78,79]), eventually leading to the advent of the highly elaborated eukaryotic cells.

3.2. Intrinsic Disorder and LLPS: From Prebiotic Life to Origin of Cellular Life and Evolution

The aforementioned Miller–Urey experiments demonstrated that simple building blocks (including amino acids) required for the formation of complex macromolecules could form in environments seemingly mimicking early Earth [60,61]. These amino acids could have naturally assembled into polypeptide chains without the need for complex biological machinery. The possibility, in principle, of such prebiotic peptide synthesis has been studied for decades, with researchers investigating different geological settings such as volcanic geothermal fields, hydrothermal fields, sea-floor sediments, and tidal flats [80,81,82,83,84,85] and also looking at the effects of minerals, salts, ions, and pH [80,81,82,83]. Under highly alkaline conditions, peptide synthesis was favored, and 20-mer oligopeptides (Gly20; without a doubt, this was an IDP!) were synthesized [86]. However, such highly alkaline conditions could not support RNA synthesis due to the low stability of this biopolymer. Another attractive possibility was recently demonstrated in the experiments conducted by Yuki Sumie, Keiichiro Sato, Takeshi Kakegawa, and Yoshihiro Furukawa, who have shown that boric acid can catalyze polypeptide synthesis under neutral and acidic conditions, leading to the appearance of 39-residue-long glycine polypeptides (Gly39, an IDP again!) [87]. These observations suggested that on the primordial Earth, polypeptides and proto-proteins could be spontaneously formed from the assembled amino acids in the coastal areas of ancient small continents and islands rich in boric acid [87]. Furthermore, it was indicated that “the same conditions would allow for the formation of RNAs and interactions of primordial proteins and RNAs that could be inherited by RNA-dependent protein synthesis during the evolution of life” [87]. These experiments provided important clues on how early chemistry could have evolved into self-replicating structures.
Importantly, the phase separation of primitive macromolecules into liquid coacervates was proposed in the 1920s by Alexander I. Oparin as the first step in the origin of life [58,88].
Therefore, it is likely that primordial IDPs in general (and polyglycine in particular), liquid–liquid phase separation (LLPS, see below), and membraneless organelles (MLOs, see below) played crucial roles in prebiotic evolution. In fact, it was pointed out that polyglycine, with its ability to phase separate and form membraneless droplets and amyloid accretions, very likely contributed to the organization of the protocell domains, facilitation of the evolution of the genetic code, and the overall transition from pre-life to cellular life [89]. IDPs in the form of extraterrestrial polypeptides or the primordial IDPs abiotically synthesized on the early Earth could cause the emergence of self-organizing systems that evolved over time following natural selection [90,91,92]. Consistent with this hypothesis, a recent study by Matsuo and Kurihara [93] showed that under appropriate conditions, peptide generation and self-assembly occur concurrently and can give rise to a proliferating peptide-based droplet through liquid–liquid phase separation in water. Furthermore, it was observed that the droplets underwent a steady growth–division cycle driven by periodic addition of monomers through autocatalytic self-reproduction [93]. It was also emphasized that LLPS “may represent a primordial mechanism for functional self-assembly of relatively unevolved molecular assemblies in the early stages of the evolution of life” [94].
LLPS-driven primordial coacervate formation did not wane during evolution. Instead, it seems that its fate is similar to that of IDPs. This is reflected in the fact that although different MLOs are found in the cells of all kingdoms of life, the variability of these biomolecular condensates (BCs) is dramatically increased in eukaryotic cells, as most of the 100+ currently known MLOs/BCs are of eukaryotic origin [95]. A very important aspect related to the functionality of IDPs and IDRs is their crucial role in the regulation and control of LLPS, an important process associated with the biogenesis of various MLOs and BCs [94,96,97,98,99,100,101,102,103,104,105,106]. In fact, more than a hundred different MLOs/BCs can be found in the cytoplasm, nucleus, and mitochondria (and chloroplasts) of eukaryotic cells, as well as in the cytoplasm of bacteria and archaea, and, likely, in viruses [95], where they represent “an intricate solution of the cellular need to facilitate and regulate molecular interactions by physically isolating target molecules in specialized compartments in a reversible and controllable way” [102]. IDPs/IDRs are central constituents of all the MLOs investigated so far [98,101,102,107,108,109,110], as their structural plasticity and capability to be involved in multivalent, stochastic, weak, palpation-like interactions are crucial for LLPS, leading to the spontaneous separation of a homogeneous solution into two distinct immiscible liquids or “phases”: a dense phase and a dilute phase, both characterized by high water content and not separated by membranes. As a result, MLOs always contain IDPs despite the fact that they differ from each other in the specific sets of their resident proteins [102]. It seems that the formation of MLOs/BCs often represents a means of intracellular compartmentalization of IDPs/IDRs [101,102,108,111,112].
Being liquid in nature, MLOs are characterized by high levels of internal dynamics [94,96,113,114,115,116,117], thereby representing fluid disorder-based ensembles. Since MLOs can be formed on linear cellular structures such as chromatin and cytoskeleton, or in/on the membranes, or in the bulk of the nucleoplasm/cytoplasm/matrix/stroma, they are classified as 1D, 2D, or 3D assemblages that can influence each other, thus representing an important pathway for intracellular communication and regulation [118].
It is clear that the protein intrinsic disorder, biological phase separation, and MLO phenomena are interlinked [102,106,107,118,119] since LLPS of specific IDPs is required for the formation of many (if not all) MLOs [98,102,111,120,121,122,123,124,125]. It was pointed out that this IDP/IDR-LLPS-MLO interconnection is redefining the organizational principles of living matter from a rather mechanistic model, where functions of proteins are determined by their rigid globular structures and where intracellular processes occur within the rigid membrane-encapsulated organelles, to a new model, where highly dynamic “biological soft matter” (IDPs and MLOs) positioned at the “edge of chaos” represents a critical foundation of life and defines complexity and evolution of the living things [107].

3.3. Intrinsic Disorder in Nucleic Acid-Binding Proteins

The textbook view describes genetic programming in terms of the classic dogma of molecular biology, where genetic information flows from DNA to RNA to protein. However, it is clear now that this straightforward DNA → RNA → protein information flow, being an oversimplification, is mostly applicable to simple organisms. In fact, using it, one can understand how the E. coli genome works, as bacterial genomes mostly contain the information required for making proteins (typically, ~90% of bacterial genomes are responsible for protein coding). However, the eukaryotic genomes are immensely more complex, as reflected in the facts that genes of higher organisms represent complex mosaics of coding (exons) and non-coding sequences (introns that are removed from the messenger RNA during the process of splicing and can be extraordinarily large, accounting for the majority of the DNA sequence in human genes [126]), all of which are transcribed [127,128,129], with exons covering around 2.8% of the human genome [126]. Curiously, although most of the non-coding DNA in the eukaryotic genome was long considered non-functional (and therefore termed “junk DNA” [130,131]), it was eventually shown that the vast majority (at least 80%) of the human and mouse genomes is in fact transcribed and has assigned biochemical functions [132,133]. The majority of the genome sequences conserved between humans and other mammals correspond to the non-coding intergenic and intronic regions, rather than to the protein-coding exons themselves, thereby indicating that these non-coding sequences have critical roles in development and cellular processes. Furthermore, the relative amount of non-coding sequences was shown to increase consistently with the organism’s complexity [133], indicating that bacterial genomes are mostly dedicated to making proteins, whereas eukaryotic genomes are mostly dedicated to the production of non-coding RNAs with various regulatory functions.
Therefore, especially in complex organisms, RNA acts not only as a passive, mostly linear, messenger between DNA and protein but is also actively involved in the regulation of genome organization and gene expression [134]. In doing that, RNA can fold into specific 3D structures that are complex and can be allosterically responsive, and which “can both recruit generic effector proteins and guide the resulting complexes sequence-specifically to other RNAs and DNA” [134].
Obviously, most of the regulatory RNA functions are conducted in close conjunction with the RNA-binding proteins (RBPs), which are intimately involved in the regulation of gene expression, post-transcriptional regulation, and protein synthesis, as well as governing the maturation and fate of their target RNA substrates [135,136]. Furthermore, RBPs establish a specific network complementing a network regulating gene activity and differently organizing RNA transcripts in different tissues. The global importance of RBPs is reflected in the fact that the human proteome contains at least 1542 such proteins [135,136], indicating that RBPs represent the third major protein group in human cells, in addition to soluble globular proteins and membrane proteins. Based on the comprehensive bioinformatics analysis of ~548,000 proteins forming nucleiomes (i.e., sets of nucleic acid-binding proteins) in 1121 species of Archaea, Bacteria, and Eukaryota, it was concluded that the entire nucleiome is enriched in intrinsic disorder, as evidenced by significantly increased intrinsic disorder content in DNA- and RNA-binding proteins relative to other proteins in corresponding proteomes [137]. This global analysis supported conclusions of earlier studies focused on specific families and classes of DNA- or RNA-binding proteins, with some of the illustrative examples of intrinsically disordered DNA- or RNA-binding proteins being histones [138], ribosomal proteins [71], transcription factors [139,140,141], and proteins involved in the biogenesis and action of yeast [142] and human spliceosomes [143]. 
Furthermore, focused bioinformatics analysis of the prevalence of intrinsic disorder in human RBP binding to six common RNA types—messenger RNA (mRNA), transfer RNA (tRNA), small nuclear RNA (snRNA), non-coding RNA (ncRNA), ribosomal RNA (rRNA), and internal ribosome RNA (irRNA)—revealed that although RNA-binding proteins are generally enriched in intrinsic disorder, the disorder propensity is unequally distributed across proteins that bind different RNA types [144]. In fact, although the mRNA-, rRNA-, and snRNA-binding proteins were predicted to be significantly enriched in disorder, the proteins that interact with ncRNA and irRNA were not enriched in disorder, and the tRNA-binding proteins were significantly depleted in disorder [144].

4. Intrinsic Disorder as a Means of Increasing Proteome Complexity

4.1. Alternative Splicing

Alternative splicing is an important process by which two or more mature mRNAs are produced from a single pre-mRNA by the inclusion and omission of different segments [145,146], and which therefore serves as an important mechanism for enhancing protein diversity in multicellular eukaryotes [147]. For example, the tissue specificity of many proteins is achieved via alternative splicing. The process is very common, especially in higher eukaryotes, with between 35 and 60% of human genes yielding protein isoforms by means of alternatively spliced mRNAs [148,149,150]. It was hypothesized that alternative splicing affects the diversity of protein functions, such as protein–protein interactions, ligand binding, and enzymatic activity [151,152,153]. In multicellular organisms, such added protein diversity from alternative splicing is important for tissue-specific signaling and regulatory networks.
The aforementioned fact that the spliceosomal RBPs are enriched in intrinsic disorder [142,143] reflects the crucial importance of IDPs/IDRs in the splicing of the eukaryotic protein-encoding mRNAs, a process by which a spliceosome removes the non-coding regions (introns) from a pre-messenger RNA (pre-mRNA) transcript and joins the coding regions (exons) to create mature mRNA. Since during splicing, exons from the same gene can be joined in different combinations, leading to different, but related, mRNA transcripts, and since these alternatively spliced mRNAs can be translated into different proteins with distinct structures and functions, IDP-containing spliceosomes play crucial roles in the alternative splicing-driven increase in proteome complexity. Furthermore, because of their intrinsically disordered nature, many spliceosomal RBPs possess several unrelated functions, i.e., have the ability to moonlight, whereas some spliceosomal RBPs drive LLPS and the formation of various MLOs via interaction with RNA. To illustrate the disorder status of some of these spliceosomal intrinsically disordered RBPs, Figure 6 presents an AlphaFold-generated 3D structural model of one of the moonlighting RBPs involved in the regulation of alternative splicing in the nervous system, RNA binding protein fox-1 homolog 2 (RBFOX2; UniProt ID: O43251), which besides regulating the alternative splicing events by binding to 5′-UGCAUGU-3′ elements can also act as a negative regulator of human estrogen receptor (ER) signaling and play a role in some ovarian cancers [154]. Figure 6B presents a per-residue intrinsic disorder profile generated by RIDAO and shows that human RBFOX2 is predicted to have high levels of intrinsic disorder, especially in its N-terminal region preceding the RNA recognition motif (RRM, residues 121–197).
Figure 6C shows the disorder profile for the spliceosomal RBP serine/arginine repetitive matrix protein 2 (SRRM2, UniProt ID: Q9UQ35) that serves as a component of the minor spliceosome and is thereby required for pre-mRNA splicing but is also involved in the biogenesis of nuclear speckles (NS), which are among the most prominent biomolecular condensates [155]. Figure 6C leaves no doubt that SRRM2 is an extremely disordered protein. Curiously, the region comprising residues 197–259, which is sufficient for RNA binding, is predicted to be mostly disordered as well.
Importantly, IDPs/IDRs are not only crucial for the control and execution of alternative splicing of precursor pre-mRNAs but also have a vital role in another side of this phenomenon, as protein regions affected by alternative splicing of pre-mRNAs are enriched in intrinsic disorder [158]. The fact that alternatively spliced segments of mRNAs mostly encode IDRs provides an important means for avoiding potential conformational catastrophes. This is because, in ordered proteins capable of spontaneous folding, most of the amino acid sequence contributes to the folding process and is involved in structural stability, as the specific sequence determines which interactions can form between amino acid residues, ultimately shaping the 3D structure of a protein. In other words, the information contained in a protein amino acid sequence determines its unique 3D structure and thereby acts as a specific protein folding code. Therefore, it is likely that the removal of a piece of an amino acid sequence of a foldable protein containing a part of the said folding code (e.g., as a result of alternative splicing of the corresponding mRNA) would distort the capability of a protein to spontaneously fold into the right structure, causing the aforementioned conformational catastrophe reflected in protein misfolding, aggregation, and associated issues. However, no conformational catastrophe is expected if the protein/region is intrinsically disordered, as the removal of a piece with “no structure” would not have much of an effect on the remaining “no structure”. On the other hand, it was proposed that associating alternative splicing with protein disorder enables the time- and tissue-specific modulation of protein function [158]. Since IDRs are frequently utilized in protein-binding regions, having alternative splicing of pre-mRNA coupled to IDRs can define tissue-specific signaling and regulatory diversity [158].
Furthermore, since regulatory and signaling elements of IDPs/IDRs can be as short as just a few residues, and since functionally important segments can be located within the IDRs with a high density, the functionality of IDPs/IDRs can be completely rewired via alternative splicing [158]. Therefore, a linkage between alternative splicing and signaling via IDRs represents one of the possible molecular mechanisms that led to the origin of cell differentiation, which ultimately gave rise to multicellular organisms [158].

4.2. Post-Translational Modifications

In addition to the aforementioned alternative splicing, the complexity of a proteome relative to its encoding genome is known to be dramatically increased via various posttranslational modifications (PTMs) of proteins. These spontaneous or enzymatically catalyzed chemical changes of a polypeptide chain happen after DNA has been transcribed into RNA and translated into protein and can be reversible or irreversible. PTM-related increases in proteome complexity are determined by the capability of PTMs to extend the range of amino acid structures and physicochemical properties, thereby leading to the diversification of protein structures and functions [159]. It is emphasized that because of various PTMs, proteins might contain more than 140 physicochemically different residues despite the fact that 20 primary amino acids are typically encoded by DNA [159]. It was also indicated that there are as many as 300 physiologically relevant PTMs in higher eukaryotes [160]. Although all amino acid side chains can serve as PTM targets, most commonly, protein PTMs are found at side chains that can act as either strong (C, M, S, T, Y, K, H, R, D, and E) or weak (N and Q) nucleophiles, whereas the remaining residues (P, G, L, I, V, A, W, and F) are rarely involved in enzymatically catalyzed covalent modifications of their side chains [159]. Furthermore, since some commonly observed PTMs (e.g., phosphorylation and glycosylation) are readily reversed by the action of specific demodifying enzymes, the interplay between the corresponding modifying and demodifying enzymes provides an important means for rapid and economical control of protein function.
The overall importance of PTMs in various aspects of the cellular “life” of proteins is reflected in the fact that as much as 5% of the eukaryotic genomes are expected to encode PTM-related enzymes [160]. In fact, some PTMs are known to regulate the process of protein folding, whereas other PTMs control protein targeting to specific subcellular compartments and interaction with ligands or other proteins, and still other PTMs manage protein functional states affecting catalytic activity of enzymes or the signaling potential of proteins in various signal transduction pathways [161,162]. It is estimated that phosphorylation/dephosphorylation cycles originating from carefully regulated protein kinase and phosphatase activities control the functions of one-third of eukaryotic proteins [163]. Not surprisingly, eukaryotic protein kinases constitute one of the largest protein families: yeast, mouse, and human kinomes include 119, 540, and ~520 kinases, respectively, and the human genome contains more than 150 genes encoding phosphatases, whereas there are 1019 kinase- and 300 phosphatase-coding genes in Arabidopsis thaliana [163]. The functionality of some proteins is controlled by multiple different PTMs that can act individually or synergistically to fine-tune molecular interactions and modulate overall protein activity and stability [164]. An illustrative example of well-known multi-PTM proteins is given by a family of nuclear IDPs, histones, that are known to undergo acetylation, ADP-ribosylation, methylation, phosphorylation, SUMOylation, and ubiquitylation at different stages of their function [138]. Although for a long time the N-terminal tails of the core histones containing an extraordinary number of different PTMs were known to play important roles in the nucleosome dynamics and related gene expression and transcription [165], over 30 PTMs have been reported in the core domains of these proteins as well [166].
Importantly, most enzymatically catalyzed PTMs have intimate connections to protein intrinsic disorder, as PTM sites targeted by modifying enzymes are commonly placed within IDRs. This is illustrated by phosphorylation, for which bioinformatics analysis revealed that many protein phosphorylation sites were located in regions that were structurally characterized as IDRs [167,168]. Furthermore, there is a high correspondence between the prediction of disorder and the occurrence of phosphorylation [169], and amino acid compositions, sequence change, complexity, and hydrophobicity, as well as many other sequence features of the regions adjacent to phosphorylation sites, are very similar to those of IDRs [169]. In addition to phosphorylation, several other PTM types, such as acetylation, fatty acid acylation, methylation, protease digestion, and ubiquitination, have also been observed to preferentially occur within IDRs [45,167,168,170]. These observations indicate that in eukaryotic cells, the localization of sites targeted for various PTMs shows a strong preference for IDRs, making these sites easily accessible to modifying enzymes and explaining the functional promiscuity of those enzymes, where a single enzyme could bind to and modify a wide variety of protein targets.

4.3. Intrinsic Disorder, Structural Heterogeneity, Multifunctionality, and Binding Promiscuity

Importantly, protein intrinsic disorder has multiple flavors, as proteins have different levels and depths of disorder, and different parts of a protein can be (dis)ordered to different degrees [42]. This heterogeneity of disorder can be summarized by rephrasing the famous opening line of Leo Tolstoy’s novel Anna Karenina: “All ordered proteins are alike; each disordered protein is disordered in its own way”. In fact, IDPs/IDRs can exist in the extended (coil- or pre-molten globule-like) or collapsed (molten globule-like) forms [2,5,12,13,15,27,171,172,173], and an IDP/IDR can be more or less compact and possess smaller or larger amounts of flexible secondary/tertiary structures [2,5,12,13,173,174]. Furthermore, a typical IDP/IDR is not structurally homogeneous and instead might contain a multitude of potentially foldable, partially foldable, differently foldable, or non-foldable structural elements [3,22], indicating that foldability (or structure-coding potential) is non-homogeneously distributed within the amino acid sequences of a protein. One should also keep in mind that this distribution of differently (dis)ordered regions is constantly changing in time, and a given segment of a protein molecule can potentially show different structures or lack of structure at different time points [3,22].
Therefore, protein structure represents a highly dynamic and very heterogeneous entity, where not only the entire protein molecule is expected to be disordered to different degrees, but various protein segments (even rather short ones) can be differently disordered as well [3,22,109,175,176,177]. Such a mosaic structural architecture of a protein molecule can be considered as a set of foldons (regions capable of spontaneous folding), non-foldons (segments that do not fold), semi-foldons (regions that are always in a semi-folded state), inducible foldons (segments that can gain structure (at least partially fold) upon interaction with binding partners), inducible morphing foldons (regions capable of folding to the different structures upon interaction with different binding partners), and unfoldons (important but less stable parts of ordered proteins that must unfold (or undergo order–disorder transition, at least partially) in order to make the protein active) [3,22,109,175,176,177]. The distribution of these variously (dis)ordered segments (foldons, non-foldons, inducible foldons, inducible morphing foldons, semi-foldons, and unfoldons) is constantly changing in time, and the entire protein has a highly dynamic and morphing structure that is not rigid or crystal-like [3,22,109,176,177]. Furthermore, many proteins exist as complex structural hybrids possessing ordered and differently disordered domains, thereby defining another level of structural heterogeneity crucial for their functions [16]. Therefore, it is clear that the classification of proteins as ordered and disordered is an obvious oversimplification, as the structure-disorder space of a protein represents a continuum, with no obvious boundary between order and disorder [3,176].
It is clear that such complex, highly dynamic, mosaic-like structural organization of proteins is also reflected in complex disorder-based functionality of proteins, as all the differently (dis)ordered structural segments of proteins (foldons, non-foldons, inducible foldons, inducible morphing foldons, semi-foldons, and unfoldons) might have very different functions. Furthermore, since all these foldons, semi-foldons, non-foldons, inducible foldons, inducible morphing foldons, and unfoldons can be found within one protein molecule, one can clearly see that a protein with such a heterogeneous structure is inherently multifunctional. Therefore, the aforementioned protein structural continuum defines protein multifunctionality. These considerations constitute the basis for a “protein structure-function continuum” model, where a functional protein exists as a dynamic conformational ensemble characterized by a broad spectrum of structural features possessing different functionalities, and provides a global link between the protein structure and function [178].
Among the important functional features of IDPs/IDRs residing on their lack of stable structure are their ability to serve as hub proteins, i.e., nodes in protein–protein interaction networks that have a very large number of connections to other nodes [179,180,181,182,183,184,185], to bind partners with both high specificity and low affinity [186], to be engaged in promiscuous interactions with unrelated partners such as other proteins, small molecules, and nucleic acids [187], to contain molecular recognition features (MoRFs), which are short binding regions located within longer disordered regions that can fold upon interaction with a partner [179,188,189,190], to adopt different structures upon binding to different partners [10,187,191,192,193,194,195], to form fuzzy complexes, where a significant part of an IDP continues to be disordered even in the bound state outside the binding interface [158,196,197,198,199,200,201], to act as dynamic and sensitive “on-off” switches [198], and to be able to return to their highly dynamic and pliable conformations after the completion of a particular function [3,22].
Disorder-based interactions are commonly combinatorial and promiscuous in nature, and such combinatorial and promiscuous interactivity defines the multifunctionality of IDPs/IDRs. An illustrative example of this concept is given by the GPCR–G-protein signaling system, which in humans includes more than 800 G-protein-coupled receptors (GPCRs) [202,203,204,205] and a large set of intracellularly located guanine nucleotide-binding proteins (G-proteins), which are heterotrimers composed of α, β, and γ subunits, with their Gα subunit being diversified even further, as there are four major families (Gαs, Gαi, Gαq, and Gα12) encoded by 16 human genes [204,206,207]. Furthermore, the complexity of this system goes far beyond a multitude of pairwise ligand–GPCR and GPCR–G-protein interactions, as one GPCR can recognize more than one extracellular signal and interact with more than one G-protein, one ligand can activate more than one GPCR, and multiple GPCRs can couple to the same G-protein [208]. The biological importance of this system cannot be overemphasized, as it recognizes a multitude of extracellular ligands, triggers a variety of intracellular signaling cascades in cellular responses to hormones, neurotransmitters, ions, photons, and other environmental stimuli, and is responsible for vision, olfaction, and taste. In fact, more than 1000 natural and artificial extracellular ligands, ranging from photons to amines, lipids, nucleotides, organic odorants, peptides, and proteins, can interact with and activate GPCRs [205,206], and these signals are used to initiate a wide spectrum of intracellular signaling cascades via interaction of an activated GPCR with a Gα subunit, which is a member of one of the four major Gα families. This results in the activation or modulation of various downstream effector proteins and key secondary messengers [206,209,210].
The combinatorial and promiscuous nature of this system is further reflected in the fact that interactions between the activated GPCRs and Gα proteins are characterized by complex coupling selectivity, where several different GPCRs can pair with the same Gα protein and one GPCR can combine with more than one Gα protein. All these features define the GPCR–G-protein system as a cellular “control panel” capable of detecting an exceptionally diversified set of molecules outside the cell and initiating a broad variety of intracellular signaling cascades in response [211]. This combinatorial promiscuity is further amplified and, in fact, is explained by the presence of intrinsic disorder and the associated high conformational flexibility of the members of this system. In fact, it was shown that the cytoplasmic and extracellular regions of GPCRs encompass numerous IDRs, multiple disorder-based binding sites, and abundant PTM sites, and typically have multiple isoforms generated by alternative splicing [208,212]. Similarly, all human G-proteins contain noticeable levels of functional intrinsic disorder, include numerous sites of various PTMs, include disorder-based interaction sites, and exist as multiple isoforms generated by alternative splicing [208]. Furthermore, both GPCRs and G-proteins often undergo function-associated conformational changes that range from domain motion to binding-induced disorder-to-order transitions. In other words, the multifunctionality of these major players of the GPCR–G-protein system is determined by the fact that all these proteins exist as numerous and highly dynamic conformational/basic, inducible/modified, and functioning proteoforms [208].
It is important to note that combinatorial promiscuity can be used not only to describe the assembly of operating protein systems, but also to define the outputs of action of the corresponding promiscuous reconfigurable signaling networks at the organismal level. This point is illustrated by the action of a family of important chemosensory GPCRs, the olfactory receptors (ORs), which are located in the nasal olfactory epithelium and are responsible for the sense of smell. In humans, ~400 ORs are used to discriminate at least one trillion olfactory stimuli [213]. Obviously, such a situation is incompatible with a scenario in which each dedicated OR recognizes one specific odorant molecule. Instead, ORs of a particular type can display broad sensitivities to different odorants (i.e., one OR can recognize multiple odorants), each odorant can promiscuously bind to receptors of many types (i.e., one odorant is recognized by multiple ORs), and different odorants are recognized by different combinations of ORs [214,215]. Therefore, odorants are discriminated in a combinatorial manner [214], where ORs bind odorants promiscuously with different affinities, and the corresponding combinatorial rules define the output signal sent to the brain.
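The arithmetic behind this combinatorial argument can be sketched in a few lines of Python. The sketch below is a deliberately crude illustration, assuming that each receptor type is simply "on" or "off" for a given odorant (real ORs respond with graded affinities), and the function name `min_receptors_for` is hypothetical. It asks how many binary receptor types would be needed so that the number of distinct activation patterns reaches one trillion.

```python
def min_receptors_for(n_stimuli: int) -> int:
    """Smallest n such that n on/off receptor types give at least n_stimuli patterns.

    Assumption (illustration only): every receptor type is either active or
    inactive for a given odorant, so n receptor types yield 2**n patterns.
    """
    n = 0
    while 2 ** n < n_stimuli:
        n += 1
    return n


ONE_TRILLION = 10 ** 12   # number of discriminable stimuli quoted in the text
N_HUMAN_ORS = 400         # approximate number of human OR types quoted in the text

# One-receptor-one-odorant coding could label at most N_HUMAN_ORS odorants.
one_to_one_capacity = N_HUMAN_ORS

# Combinatorial coding: how many binary receptor types are needed for 10**12 patterns?
needed = min_receptors_for(ONE_TRILLION)

print("one-to-one capacity:", one_to_one_capacity)
print("binary receptor types needed for 10**12 patterns:", needed)
print("400 binary receptors suffice:", 2 ** N_HUMAN_ORS >= ONE_TRILLION)
```

Under these simplifying assumptions, a few dozen receptor types would already provide enough patterns, which is why a repertoire of ~400 promiscuous ORs is compatible with the discrimination of a vastly larger number of odorants.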

5. Protein Intrinsic Disorder and Evolution of Multicellularity

5.1. Intrinsic Disorder and Proteoforms

It is very likely that IDPs played important roles in various stages of the origin of life and evolution, being involved in prebiotic evolution preceding the origin of Tibor Ganti’s Chemoton, a suspected precursor to the first universal common ancestor, and, subsequently, in later stages of evolution, including the early origin of complex multicellularity and the ensuing bilateria during the Cambrian explosion ~541 million years ago [216,217,218,219]. The cornerstone of modern evolutionary theory is the existence of a last universal common ancestor (LUCA), which is a hypothetical common ancestral cell from which the three domains of life (the Bacteria, the Archaea, and the Eukarya) have originated [78,79] and which lived roughly 3.5 billion years ago, as follows from a comprehensive computational analysis using model selection theory without making an assumption that sequence similarity indicates a genealogical relationship [79]. The existence of LUCA is supported by multiple observations [79,220,221,222], such as the following:
  • The agreement between phylogeny and biogeography;
  • The correspondence between phylogeny and the paleontological record;
  • The existence of numerous predicted transitional fossils;
  • The hierarchical classification of morphological characteristics;
  • The marked similarities between biological structures with different functions (that is, homologies); and
  • The congruence of morphological and molecular phylogenies.
Complex multicellularity implies the presence in the organism of multiple differently specialized cells responsible for the formation of tissues and organs. Among the molecular mechanisms required for the development of complex multicellularity are the means of increasing the size of the functional proteome relative to the genome that encodes it, which also represents an important phenomenon behind the observation that the complexities of biological systems are mostly determined by their proteome sizes and not by the dimensions of their genomes [223]. This can be illustrated by the gene–protein relationship in Homo sapiens [224,225,226,227,228], where the number of protein-coding genes ranges between 20,000 and 25,000 [132], but the actual number of functionally different proteins is in the range of a few million [229] to several billion [230]. The required structural and functional diversification of a proteome can be achieved by allelic variations (i.e., single or multiple point mutations (amino acid polymorphisms), indels, single nucleotide polymorphisms (SNPs)), alternative splicing, mRNA editing, and other pre-translational mechanisms affecting mRNAs, as well as by a wide spectrum of various PTMs of a polypeptide chain. As a result, a single gene encodes a set of distinct protein molecules, known as proteoforms [230]. Since all these aforementioned mechanisms are associated with some changes in the physicochemical structure of a polypeptide chain, the resulting proteoforms have induced or modified natures. Importantly, protein structural diversity is further enhanced by intrinsic disorder and functionality, giving rise to the conformational or basic proteoforms and functioning proteoforms, respectively [231].
However, since many PTM sites are preferentially located within the IDRs [169,232], since mRNA regions affected by alternative splicing predominantly encode IDRs [158], since IDPs/IDRs act as highly promiscuous binders [5,11,12,13,14,15,22,24,167,168,174,179,188,198,233,234,235,236,237,238,239], and since IDPs/IDRs are characterized by exceptional spatiotemporal heterogeneity, proteins and protein regions without unique structures represent a very rich source of proteoforms [231].
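To give a feeling for how quickly alternative splicing and PTMs can multiply the number of proteoforms encoded by one gene, the following Python sketch computes a simple upper bound. It rests on strong simplifying assumptions that are not taken from the cited studies: every splice isoform carries the same number of PTM sites, each site independently adopts one of a fixed number of states, and all combinations are allowed. The function name `count_proteoforms` and the example numbers are hypothetical.

```python
def count_proteoforms(n_splice_isoforms: int,
                      n_ptm_sites: int,
                      states_per_site: int = 2) -> int:
    """Upper bound on chemically distinct proteoforms from one gene.

    Assumptions (illustration only): each isoform has n_ptm_sites sites,
    every site independently takes one of states_per_site states
    (e.g., unmodified or phosphorylated), and all combinations occur.
    """
    if n_splice_isoforms < 1 or n_ptm_sites < 0 or states_per_site < 1:
        raise ValueError("counts must be positive (PTM sites may be zero)")
    return n_splice_isoforms * states_per_site ** n_ptm_sites


# Hypothetical gene: 3 splice isoforms, 10 binary (modified/unmodified) PTM sites.
print(count_proteoforms(3, 10))        # 3 * 2**10

# Hypothetical gene with 4 sites that can each be in one of 3 states.
print(count_proteoforms(2, 4, 3))      # 2 * 3**4
```

Even these modest hypothetical numbers give thousands of possible forms per gene, before conformational and functioning proteoforms are taken into account; in reality, not all combinations are expected to be populated.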

5.2. Causal Emergence

Since multicellular organisms represent complex systems, their organization and behavior are driven by causal emergence, where the higher scale of a system has stronger causal relationships than its underlying lower scales, allowing macroscales to reduce noise in causal relationships, thereby leading to stronger causes at the higher scale level [240]. Emergence is defined as the appearance of a multi-part, complex system, the behavior of which cannot be derived, predicted, or understood by looking at the behavior of its parts. It is one of the characteristic features of complex systems, the behavior of which is determined by a set of common rules [241]:
  • Complex systems contain many heterogeneous components involved in nonlinear interactions, where a small perturbation may cause a large effect, a proportional effect, or even no effect at all. Therefore, the behavior of a complex system cannot be expressed as the sum of the behaviors of its parts (or of their multiples);
  • The constituents of a complex system are interdependent;
  • A complex system possesses a structure spanning several scales and may be nested, i.e., the components of a complex system may themselves be complex systems;
  • A complex system is capable of emergent behavior, which is unanticipated behavior shown by the system, for example, the arising of novel and coherent structures, patterns, and properties during the process of self-organization;
  • Complexity involves an interplay between chaos (disorder) and order;
  • Complexity involves an interplay between cooperation and competition, and complex systems contain both positive (amplifying) and negative (damping) feedback;
  • Complex systems may have a memory. In other words, the history of a complex system may be important, since due to their dynamic nature, complex systems change over time, and prior states may have an influence on present states (for example, no two genetically identical mice or even two single cells that share the exact same DNA sequence are absolutely identical because of environmental influences, random variations in gene expression, and epigenetic modifications).
It was emphasized that IDPs/IDRs are complex “edge of chaos” systems, as their behavior obeys the aforementioned regulations. “Heterogeneous nature of IDPs is obvious. In fact, IDPs and IDRs are heterogeneous at multiple levels. Globally, they can be compact or extended and their major structural components are heterogeneous too, giving rise to foldons, induced foldons, semi-foldons and non-foldons. These structural components can be independent or interdependent, and they are able to interact nonlinearly. Functional misfolding represents an illustration of the interplay between cooperation and competition. The spatiotemporal complexity of IDPs/IDPRs is further increased by the fact that they and their structural components are always moving between order and disorder. IDPs are able to sense various stimuli and response to these stimuli via corresponding structural changes, where even smallest environmental perturbations might produce large structural and functional outcomes. IDPs/IDPRs possess emergent behavior, since under some conditions they are able to undergo self-organization via stimuli-induced disorder-to-order transitions. Finally, MoRFs, SLiMs and PreSMos represents a memory of the IDP, since they are transiently populated in the non-bound state and may have a profound influence on IDP binding mechanism and on the resulting bound state. All this supports the hypothesis that IDPs/IDPs are positioned at the edge of chaos” [22].
Since in the case of causal emergence, groups of features influence the future of a system together, rather than separately, this mechanism is crucial for governing reliable large-scale responses, such as determining the fate of a single cell, defining intercellular communication and collaboration to form tissues and organs, and even delineating the behavioral reaction of an organism in response to external stimuli. Although causal emergence was shown to be present in protein–protein interaction (PPI) networks (interactomes) of both prokaryotes and eukaryotes (where a cluster of PPIs can be replaced by a single “macro-node” capable of conducting the same job as the collective), it was more evident in eukaryotes and especially in the complex multicellular eukaryotic organisms [242]. These findings indicated that the more complex organisms tend to more often use higher organization levels of their networks for causal roles, thereby becoming more tolerant to noise and indeterminism of their microscales, as macroscales of interactomes are more resilient than microscales [242]. In this way, their noisy microscales do not serve as primary determinants of the phenotypic outcomes ranging from body structure and body shape to behavior [242]. Importantly, this increase in causal emergence in complex eukaryotes can explain the rather counter-intuitive observation that the effectiveness of protein interactomes decreased in moving from prokaryotes to eukaryotes [242]. Here, effectiveness is measured as the effective information, an information-theoretic network quantity that is based on the entropy of random walker behavior on a network and reflects the certainty (or uncertainty) contained in the connectivity of the analyzed network [243]. It is very likely that the observed increase in causal emergence in complex eukaryotes is linked to the higher levels of intrinsic disorder in their proteomes.
In fact, although the abundant presence of IDPs/IDRs causes eukaryotic interactomes to become noisier, more stochastic, and less effective at their microscales over evolutionary time [242], the formation of macro-nodes that define the macroscale structure of the corresponding interactomes is likely to be driven by protein intrinsic disorder.
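To make the notion of effective information more tangible, the following minimal Python sketch computes it for a small toy network. It assumes the form commonly used in the network literature on this quantity, namely the entropy of the average out-weight distribution minus the average entropy of the individual out-weight distributions, with a random walker choosing uniformly among the outgoing edges of each node. The function names `shannon_entropy` and `effective_information` and the toy graphs are hypothetical illustrations; they are not taken from refs. [242,243] and do not reproduce the analysis of real interactomes.

```python
from math import log2


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


def effective_information(out_edges):
    """Effective information of a network (assumed form, see lead-in).

    out_edges maps every node to the list of nodes it points to.
    A random walker on a node picks one of its outgoing edges uniformly.
    EI = H(average out-distribution) - average of H(out-distribution).
    """
    nodes = sorted(out_edges)
    n = len(nodes)
    index = {node: i for i, node in enumerate(nodes)}
    mean_out = [0.0] * n      # average out-weight vector over all nodes
    mean_entropy = 0.0        # average uncertainty of a single step
    for node in nodes:
        targets = out_edges[node]
        if not targets:
            raise ValueError("this sketch assumes every node has an outgoing edge")
        row = [0.0] * n
        for target in targets:
            row[index[target]] += 1.0 / len(targets)
        mean_entropy += shannon_entropy(row) / n
        for j, weight in enumerate(row):
            mean_out[j] += weight / n
    return shannon_entropy(mean_out) - mean_entropy


# Toy examples (hypothetical, not real interactomes)
ring = {0: [1], 1: [2], 2: [3], 3: [0]}                            # fully deterministic
complete = {i: [j for j in range(4) if j != i] for i in range(4)}  # maximally noisy
print(effective_information(ring), effective_information(complete))
```

In this toy setting, the deterministic four-node ring reaches the maximal value of log2(4) = 2 bits, whereas the fully connected network, in which each step of the walker is highly uncertain, has a much lower value. This mirrors, in a very simplified way, the idea that a noisier microscale corresponds to lower effective information.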
In an attempt to understand what might trigger the transition to multicellularity, the genome and proteome of a unicellular eukaryote, the amoeboid holozoan Capsaspora owczarzaki, which is one of the evolutionarily closest relatives of the first multicellular animals, were investigated [244]. The researchers paid special attention to the genes/proteins involved in transcriptional regulation, as untangling the early evolution of transcription factors (TFs) is critical for understanding the origin of metazoans and animal development [244]. This analysis revealed that C. owczarzaki contains more transcription factors than any other known unicellular organism and that the transcription factors found in this organism are already organized in specific networks that are often found in multicellular animals as well. It was also emphasized that the complexity of the repertoire of transcription factors in C. owczarzaki “is strikingly high, pushing back further the origin of some transcription factors formerly thought to be metazoan specific” [244]. Therefore, it seems that at least some of the means (in the form of the specific TF-containing networks) required for animal development were present even before the appearance of multicellularity, suggesting that the switch to multicellularity was driven by devising new ways of gene regulation rather than by the appearance of additional new genes [244]. Figure 7 illustrates the remarkably high level of global intrinsic disorder content in the C. owczarzaki proteome, which is comparable to that of the human proteome.
Phenotypic changes in animal lineages are linked to the gain, loss, and modification of gene regulatory elements [245]. Often, such regulation is achieved using cis-regulatory conserved non-exonic elements (CNEEs), which are evolutionarily conserved yet do not overlap with any mature transcript, coding or noncoding [245], and which show a strong linkage with trait/disease-associated single nucleotide polymorphisms [246]. By analyzing genome-wide sets of putative regulatory regions in five vertebrates, including humans, to infer the branch on which each CNEE came under selective constraint, it was shown that the evolution of gene regulatory elements is characterized by the presence of three extended periods [245]. Instead of gradual changes in the frequencies of regulatory elements over the past 650 million years, the evolution of CNEEs saw three different eras. The first period, early vertebrate evolution, lasted from the vertebrate ancestor until about 300 million years ago (when mammals split from birds and reptiles) and was characterized by regulatory gains near transcription factors and developmental genes. In the second period, between 300 and 100 million years ago, this trend was replaced by a high frequency of regulatory innovations near extra-cellular signaling genes. The third period, which began 100 million years ago and affected placental mammals, was characterized by an increase in regulatory innovations for genes involved in post-translational protein modification [245]. Although CNEEs, by definition, are non-coding elements, peculiarities in their evolution indicate the crucial roles of regulatory gains of genes mostly encoding proteins with high levels of intrinsic disorder, such as transcription factors and receptors, or proteins mostly acting by modifying the functionality of IDPs/IDRs, such as proteins related to PTM control. 
Therefore, this specific CNEE evolution emphasizes the importance of IDPs/IDRs in animal evolution.

5.3. Intrinsic Disorder, Noise/Stochasticity of Transcriptional Regulation, and Development

The examples in the preceding section illustrate the overall complexity of the disorder-based organizing principles of biological networks, which are inherently noisy and, being promiscuous, rather indiscriminate, and insensitive to fine details, use combinatorial and fuzzy logic to respond to various cellular and organismal cues. Furthermore, all these observations hint at the idea that biological actions are stochastic/noisy, and part of this stochasticity/noisiness is determined by the presence of intrinsic disorder in the acting proteins. Importantly, this biological noisiness represents a key driving factor for development and evolution. This concept can be illustrated by considering the dynamical landscape defining the stochastic determination of cell fate during, for example, the differentiation of mouse hematopoietic stem cells into specialized blood cell types, which proceeds via the initial formation of multipotent progenitor cells. One of these multipotent progenitor cells, the myeloid progenitor cell, can differentiate either into erythrocytes or precursors of certain white blood cells, with the choice between erythroid and myelomonocytic fates being determined by the interplay between the two lineage-determining transcription factors, GATA1 and PU.1 [247]. In this bifurcation, multipotent progenitor cells expressing more GATA1 will end up in the erythroid state, whereas the myeloid state is triggered by higher levels of PU.1 expression. The complexity of the regulation of this relatively simple system is determined by its sensitive feedback, as GATA1 and PU.1 each promote their own expression and inhibit each other’s expression. The dynamics of the resulting binary fate decision system represent an illustration of the phenomenon of “multilineage priming”, where a gene circuit generates stable attractors corresponding to the erythroid and myelomonocytic fates, as well as an uncommitted metastable state characterized by co-expression of both TFs [247]. 
Here, commitment to a particular cell fate occurs in two stages: first, the progenitor state is destabilized in an almost symmetrical bifurcation event, resulting in a poised state at the boundary between the two lineage-specific attractors; second, the cell is driven to the respective, now accessible attractors [247]. It was also shown that another TF, GATA2, which is antagonistic to PU.1 but boosts GATA1 expression, plays an important role in the differentiation of mouse hematopoietic stem cells by adding to the transcription noise [248]. Here, infrequent stochastic bursts of transcription lead to the co-expression of these antagonistic TFs in the majority of hematopoietic stem and progenitor cells, thereby opening a possibility for the cells to reach both target lineages more reliably instead of being stuck on one or another track [248]. In other words, the noisiness of transcriptional regulation represents an important way of keeping all the cell-fate options open, where a system maintains a temporally stable probability of cells in every available transcriptional state [248]. Since the major players in this system are transcription factors, it is not surprising that GATA1, GATA2, and PU.1 are highly disordered, as illustrated by Figure 8. It is tempting to assume that this system serves as an illustration of the utilization of protein intrinsic disorder in the noisy transcriptional regulation required for cell differentiation.
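The qualitative behavior of such a self-promoting, mutually inhibiting two-factor circuit can be sketched with a small deterministic simulation. The Python example below assumes a generic toggle-switch form, in which each factor is produced through a self-activating Hill term and a Hill term repressed by the other factor, and is degraded linearly. The functional form, the parameter values (a = b = k = 1, theta = 0.5, n = 4), the integration settings, and the function names are illustrative assumptions rather than values reported in this review, and transcriptional noise is deliberately omitted.

```python
def hill_up(x, theta, n):
    """Activating Hill term: rises from 0 to 1 as x grows past theta."""
    return x**n / (theta**n + x**n)


def hill_down(x, theta, n):
    """Repressing Hill term: falls from 1 to 0 as x grows past theta."""
    return theta**n / (theta**n + x**n)


def simulate(gata1, pu1, a=1.0, b=1.0, k=1.0, theta=0.5, n=4,
             dt=0.01, steps=5000):
    """Integrate the two-factor circuit with the explicit Euler method.

    gata1, pu1 : initial expression levels (arbitrary units)
    a          : strength of self-activation (assumed value)
    b          : strength of production repressed by the other factor
    k          : first-order degradation rate
    Returns the expression levels after steps * dt time units.
    """
    for _ in range(steps):
        d_gata1 = a * hill_up(gata1, theta, n) + b * hill_down(pu1, theta, n) - k * gata1
        d_pu1 = a * hill_up(pu1, theta, n) + b * hill_down(gata1, theta, n) - k * pu1
        gata1 += dt * d_gata1
        pu1 += dt * d_pu1
    return gata1, pu1


# Three hypothetical starting points: GATA1-biased, PU.1-biased, and balanced
for start in [(1.8, 0.2), (0.2, 1.8), (0.9, 1.1)]:
    final = simulate(*start)
    print(start, "->", tuple(round(v, 3) for v in final))
```

With these assumed parameters, a GATA1-biased start settles in a state with high GATA1 and almost no PU.1, a PU.1-biased start settles in the mirror-image state, and a nearly balanced start relaxes to a state in which both factors are co-expressed. This corresponds, in a schematic way, to the two lineage-specific attractors and the uncommitted co-expressing state described above.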

6. Conclusions

This article analyzes some of the potential implementations of intrinsic disorder in the origin of life and evolution. Clearly, the views presented here are rather personalized and admittedly subjective. With a very high probability, some aspects are incompletely covered, and other aspects related to this subject are missed. However, one message is absolutely clear: neither the origin of life nor evolution would be possible without protein intrinsic disorder. In fact, IDPs, with their highly heterogeneous structural organization, related multifunctionality, and enormous interactivity, seem to be perfect life organizers and evolution drivers. Even in a perfect world of highly ordered biological catalysts (enzymes), intrinsic disorder cannot be dismissed, since primordial IDPs were the entities that started the molecular evolution of modern enzymes. In fact, the chances that a perfect catalyst with a unique 3D structure responsible for a unique catalytic function would spontaneously appear on the primordial Earth are negligible. Instead, one can easily imagine a scenario where an extremely floppy polypeptide capable of lousy substrate recognition could have very sloppy catalytic activity. If the rate of the resulting floppy–sloppy “pseudo catalytic” reaction were even slightly higher than the rate of the corresponding spontaneous, non-catalyzed reaction, one would have an excellent starting point for evolutionary improvement. Obviously, not everything would evolve into highly ordered specialized machines, and numerous modern biological processes are critically dependent on the floppo-sloppiness of IDPs. Life is not something frozen in time and space, and biological processes (especially those in more complex organisms) are not controlled by a precise “chain of command”, being instead stochastic in nature. Acting as crucial constituents of terrestrial life, IDPs are “edge of chaos” systems capable of emergent behavior. 
IDP-driven, IDP-governed, or at least IDP-related emergence is everywhere and has multiple forms and levels. The apparent re-introduction of intrinsic disorder associated with the evolutionary emergence of complex eukaryotic organisms might represent a natural way of addressing the second law of thermodynamics, where the emergence of such organismal complexity is compensated by a noticeable leap in the protein disorder content that provides a necessary increase in the system’s entropy. Evolution is rooted in intrinsic disorder, as IDPs were crucial for the origin of life and the emergence of protocells, drove the split between prokaryotes and eukaryotes, and orchestrated the emergence of multicellularity.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Fischer, E. Einfluss der configuration auf die wirkung der enzyme. Ber. Dt. Chem. Ges. 1894, 27, 2985–2993.
  2. Uversky, V.N. Natively unfolded proteins: A point where biology waits for physics. Protein Sci. 2002, 11, 739–756.
  3. Uversky, V.N. A decade and a half of protein intrinsic disorder: Biology still waits for physics. Protein Sci. 2013, 22, 693–724.
  4. Petsko, G.A.; Ringe, D. Primers in Biology. Protein Structure and Function; New Science Press Ltd.: London, UK; Sinauer Associates, Inc. Publishers: Sunderland, MA, USA; Blackwell Publishing: London, UK, 2004.
  5. Uversky, V.N.; Dunker, A.K. Understanding protein non-folding. Biochim. Biophys. Acta 2010, 1804, 1231–1264.
  6. Bernstein, F.C.; Koetzle, T.F.; Williams, G.J.; Meyer, E.F., Jr.; Brice, M.D.; Rodgers, J.R.; Kennard, O.; Shimanouchi, T.; Tasumi, M. The Protein Data Bank: A computer-based archival file for macromolecular structures. J. Mol. Biol. 1977, 112, 535–542.
  7. Bloomer, A.C.; Champness, J.N.; Bricogne, G.; Staden, R.; Klug, A. Protein disk of tobacco mosaic virus at 2.8 A resolution showing the interactions within and between subunits. Nature 1978, 276, 362–368.
  8. Bode, W.; Schwager, P.; Huber, R. The transition of bovine trypsinogen to a trypsin-like state upon strong ligand binding. The refined crystal structures of the bovine trypsinogen-pancreatic trypsin inhibitor complex and of its ternary complex with Ile-Val at 1.9 A resolution. J. Mol. Biol. 1978, 118, 99–112.
  9. Le Gall, T.; Romero, P.R.; Cortese, M.S.; Uversky, V.N.; Dunker, A.K. Intrinsic disorder in the Protein Data Bank. J. Biomol. Struct. Dyn. 2007, 24, 325–342.
  10. Dunker, A.K.; Garner, E.; Guilliot, S.; Romero, P.; Albrecht, K.; Hart, J.; Obradovic, Z.; Kissinger, C.; Villafranca, J.E. Protein disorder and the evolution of molecular recognition: Theory, predictions and observations. Pac. Symp. Biocomput. 1998, 3, 473–484.
  11. Wright, P.E.; Dyson, H.J. Intrinsically unstructured proteins: Re-assessing the protein structure-function paradigm. J. Mol. Biol. 1999, 293, 321–331.
  12. Uversky, V.N.; Gillespie, J.R.; Fink, A.L. Why are “natively unfolded” proteins unstructured under physiologic conditions? Proteins 2000, 41, 415–427.
  13. Dunker, A.K.; Lawson, J.D.; Brown, C.J.; Williams, R.M.; Romero, P.; Oh, J.S.; Oldfield, C.J.; Campen, A.M.; Ratliff, C.M.; Hipps, K.W.; et al. Intrinsically disordered protein. J. Mol. Graph. Model. 2001, 19, 26–59.
  14. Tompa, P. Intrinsically unstructured proteins. Trends Biochem. Sci. 2002, 27, 527–533.
  15. Daughdrill, G.W.; Pielak, G.J.; Uversky, V.N.; Cortese, M.S.; Dunker, A.K. Natively disordered proteins. In Handbook of Protein Folding; Buchner, J., Kiefhaber, T., Eds.; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2005; pp. 271–353.
  16. Dunker, A.K.; Babu, M.M.; Barbar, E.; Blackledge, M.; Bondos, S.E.; Dosztányi, Z.; Dyson, H.J.; Forman-Kay, J.; Fuxreiter, M.; Gsponer, J.; et al. What’s in a name? Why these proteins are intrinsically disordered. Intrinsically Disord. Proteins 2013, 1, e24157.
  17. Dunker, A.K.; Obradovic, Z.; Romero, P.; Garner, E.C.; Brown, C.J. Intrinsic protein disorder in complete genomes. Genome Inform. Ser. Workshop Genome Inform. 2000, 11, 161–171.
  18. Ward, J.J.; Sodhi, J.S.; McGuffin, L.J.; Buxton, B.F.; Jones, D.T. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol. 2004, 337, 635–645.
  19. Uversky, V.N. The mysterious unfoldome: Structureless, underappreciated, yet vital part of any given proteome. J. Biomed. Biotechnol. 2010, 2010, 568068.
  20. Xue, B.; Dunker, A.K.; Uversky, V.N. Orderly order in protein intrinsic disorder distribution: Disorder in 3500 proteomes from viruses and the three domains of life. J. Biomol. Struct. Dyn. 2012, 30, 137–149.
  21. Wathen, B.; Jia, Z. Folding by numbers: Primary sequence statistics and their use in studying protein folding. Int. J. Mol. Sci. 2009, 10, 1567–1589.
  22. Uversky, V.N. Unusual biophysics of intrinsically disordered proteins. Biochim. Biophys. Acta 2013, 1834, 932–951.
  23. Campen, A.; Williams, R.M.; Brown, C.J.; Meng, J.; Uversky, V.N.; Dunker, A.K. TOP-IDP-scale: A new amino acid scale measuring propensity for intrinsic disorder. Protein Pept. Lett. 2008, 15, 956–963.
  24. Radivojac, P.; Iakoucheva, L.M.; Oldfield, C.J.; Obradovic, Z.; Uversky, V.N.; Dunker, A.K. Intrinsic disorder and functional proteomics. Biophys. J. 2007, 92, 1439–1456.
  25. Romero, P.; Obradovic, Z.; Li, X.; Garner, E.C.; Brown, C.J.; Dunker, A.K. Sequence complexity of disordered protein. Proteins 2001, 42, 38–48.
  26. Garner, E.; Cannon, P.; Romero, P.; Obradovic, Z.; Dunker, A.K. Predicting Disordered Regions from Amino Acid Sequence: Common Themes Despite Differing Structural Characterization. Genome Inform. Ser. Workshop Genome Inform. 1998, 9, 201–213.
  27. Williams, R.M.; Obradovic, Z.; Mathura, V.; Braun, W.; Garner, E.C.; Young, J.; Takayama, S.; Brown, C.J.; Dunker, A.K. The protein non-folding problem: Amino acid determinants of intrinsic order and disorder. Pac. Symp. Biocomput. 2001, 89–100.
  28. Vacic, V.; Uversky, V.N.; Dunker, A.K.; Lonardi, S. Composition Profiler: A tool for discovery and visualization of amino acid composition differences. BMC Bioinform. 2007, 8, 211.
  29. Ferron, F.; Longhi, S.; Canard, B.; Karlin, D. A practical overview of protein disorder prediction methods. Proteins 2006, 65, 1–14.
  30. Bourhis, J.M.; Canard, B.; Longhi, S. Predicting protein disorder and induced folding: From theoretical principles to practical applications. Curr. Protein Pept. Sci. 2007, 8, 135–149.
  31. Dosztanyi, Z.; Sandor, M.; Tompa, P.; Simon, I. Prediction of protein disorder at the domain level. Curr. Protein Pept. Sci. 2007, 8, 161–171.
  32. Dosztanyi, Z.; Tompa, P. Prediction of protein disorder. Methods Mol. Biol. 2008, 426, 103–115.
  33. He, B.; Wang, K.; Liu, Y.; Xue, B.; Uversky, V.N.; Dunker, A.K. Predicting intrinsic disorder in proteins: An overview. Cell Res. 2009, 19, 929–949.
  34. Jin, F.; Liu, Z. Inherent Relationships among Different Biophysical Prediction Methods for Intrinsically Disordered Proteins. Biophys. J. 2013, 104, 488–495.
  35. Romero, P.; Obradovic, Z.; Kissinger, C.R.; Villafranca, J.E.; Garner, E.; Guilliot, S.; Dunker, A.K. Thousands of proteins likely to have long disordered regions. Pac. Symp. Biocomput. 1998, 437–448.
  36. Feng, Z.P.; Zhang, X.; Han, P.; Arora, N.; Anders, R.F.; Norton, R.S. Abundance of intrinsically unstructured proteins in P. falciparum and other apicomplexan parasite proteomes. Mol. Biochem. Parasitol. 2006, 150, 256–267.
  37. Tompa, P.; Dosztanyi, Z.; Simon, I. Prevalent structural disorder in E. coli and S. cerevisiae proteomes. J. Proteome Res. 2006, 5, 1996–2000.
  38. Galea, C.A.; High, A.A.; Obenauer, J.C.; Mishra, A.; Park, C.G.; Punta, M.; Schlessinger, A.; Ma, J.; Rost, B.; Slaughter, C.A.; et al. Large-scale analysis of thermostable, mammalian proteins provides insights into the intrinsically disordered proteome. J. Proteome Res. 2009, 8, 211–226.
  39. Xue, B.; Williams, R.W.; Oldfield, C.J.; Dunker, A.K.; Uversky, V.N. Archaic chaos: Intrinsically disordered proteins in Archaea. BMC Syst. Biol. 2010, 4 (Suppl. S1), S1.
  40. Burra, P.V.; Kalmar, L.; Tompa, P. Reduction in structural disorder and functional complexity in the thermal adaptation of prokaryotes. PLoS ONE 2010, 5, e12069.
  41. Dunker, A.K.; Cortese, M.S.; Romero, P.; Iakoucheva, L.M.; Uversky, V.N. Flexible nets. The roles of intrinsic disorder in protein interaction networks. FEBS J. 2005, 272, 5129–5148.
  42. Uversky, V.N.; Oldfield, C.J.; Dunker, A.K. Showing your ID: Intrinsic disorder as an ID for recognition, regulation and cell signaling. J. Mol. Recognit. 2005, 18, 343–384.
  43. Iakoucheva, L.M.; Brown, C.J.; Lawson, J.D.; Obradovic, Z.; Dunker, A.K. Intrinsic disorder in cell-signaling and cancer-associated proteins. J. Mol. Biol. 2002, 323, 573–584.
  44. Vucetic, S.; Xie, H.; Iakoucheva, L.M.; Oldfield, C.J.; Dunker, A.K.; Obradovic, Z.; Uversky, V.N. Functional anthology of intrinsic disorder. 2. Cellular components, domains, technical terms, developmental processes, and coding sequence diversities correlated with long disordered regions. J. Proteome Res. 2007, 6, 1899–1916.
  45. Xie, H.; Vucetic, S.; Iakoucheva, L.M.; Oldfield, C.J.; Dunker, A.K.; Obradovic, Z.; Uversky, V.N. Functional anthology of intrinsic disorder. 3. Ligands, post-translational modifications, and diseases associated with intrinsically disordered proteins. J. Proteome Res. 2007, 6, 1917–1932.
  46. Xie, H.; Vucetic, S.; Iakoucheva, L.M.; Oldfield, C.J.; Dunker, A.K.; Uversky, V.N.; Obradovic, Z. Functional anthology of intrinsic disorder. 1. Biological processes and functions of proteins with long disordered regions. J. Proteome Res. 2007, 6, 1882–1898.
  47. Wickramasinghe, N.C.; Wickramasinghe, J.; Napier, W. Comets and the Origin of Life; World Scientific: Singapore, 2009.
  48. Nakashima, S.; Kebukawa, Y.; Kitadai, N.; Igisu, M.; Matsuoka, N. Geochemistry and the Origin of Life: From Extraterrestrial Processes, Chemical Evolution on Earth, Fossilized Life’s Records, to Natures of the Extant Life. Life 2018, 8, 39.
  49. Rimola, A.; Balucani, N.; Ceccarelli, C.; Ugliengo, P. Tracing the Primordial Chemical Life of Glycine: A Review from Quantum Chemical Simulations. Int. J. Mol. Sci. 2022, 23, 4252.
  50. Irvine, W.M. Extraterrestrial organic matter: A review. Orig. Life Evol. Biosph. 1998, 28, 365–383.
  51. Krasnokutski, S.; Chuang, K.-J.; Jäger, C.; Ueberschaar, N.; Henning, T. A pathway to peptides in space through the condensation of atomic carbon. Nat. Astron. 2022, 6, 381–386.
  52. Kulkarni, P.; Salgia, R.; Uversky, V.N. Intrinsic disorder, extraterrestrial peptides, and prebiotic life on the earth. J. Biomol. Struct. Dyn. 2023, 41, 5481–5485.
  53. Rivera-Valentin, E.G.; Filiberto, J.; Lynch, K.L.; Mamajanov, I.; Lyons, T.W.; Schulte, M.; Mendez, A. Introduction-First Billion Years: Habitability. Astrobiology 2021, 21, 893–905.
  54. Cronin, J.R.; Pizzarello, S. Amino acids in meteorites. Adv. Space Res. 1983, 3, 5–18.
  55. McGeoch, J.E.; McGeoch, M.W. A 4641Da polymer of amino acids in Acfer 086 and Allende meteorites. arXiv 2017, arXiv:1707.09080.
  56. McGeoch, M.; Dikler, S.; McGeoch, J.E. Hemolithin: A meteoritic protein containing iron and lithium. arXiv 2020, arXiv:2002.11688.
  57. Radzicka, A.; Wolfenden, R. Rates of uncatalyzed peptide bond hydrolysis in neutral solution and the transition state affinities of proteases. J. Am. Chem. Soc. 1996, 118, 6105–6109.
  58. Oparin, A.I. The Origin of Life; Moscow Worker Publisher: Moscow, Russia, 1924. (In Russian)
  59. Haldane, J.B.S. The origin of life. In The Rationalist Annual for the Year 1929; Watts, C.A., Ed.; Watts & Co: London, UK, 1929; pp. 3–10.
  60. Miller, S.L. A production of amino acids under possible primitive earth conditions. Science 1953, 117, 528–529.
  61. Miller, S.L.; Urey, H.C. Organic compound synthesis on the primitive earth. Science 1959, 130, 245–251.
  62. Crick, F.H. The origin of the genetic code. J. Mol. Biol. 1968, 38, 367–379.
  63. Wong, J.T. A co-evolution theory of the genetic code. Proc. Natl. Acad. Sci. USA 1975, 72, 1909–1912.
  64. Jukes, T.H. Possibilities for the evolution of the genetic code from a preceding form. Nature 1973, 246, 22–26.
  65. Trifonov, E.N. Consensus temporal order of amino acids and evolution of the triplet code. Gene 2000, 261, 139–151.
  66. Sickmeier, M.; Hamilton, J.A.; LeGall, T.; Vacic, V.; Cortese, M.S.; Tantos, A.; Szabo, B.; Tompa, P.; Chen, J.; Uversky, V.N.; et al. DisProt: The Database of Disordered Proteins. Nucleic Acids Res. 2007, 35, D786–D793.
  67. Garbuzynskiy, S.O.; Lobanov, M.Y.; Galzitskaya, O.V. To be folded or to be unfolded? Protein Sci. 2004, 13, 2871–2877.
  68. Xia, T.; SantaLucia, J., Jr.; Burkard, M.E.; Kierzek, R.; Schroeder, S.J.; Jiao, X.; Cox, C.; Turner, D.H. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 1998, 37, 14719–14735.
  69. Zhou, H.; Zhou, Y. Quantifying the effect of burial of amino acid residues on protein stability. Proteins 2004, 54, 315–322.
  70. Poole, A.M.; Jeffares, D.C.; Penny, D. The path from the RNA world. J. Mol. Evol. 1998, 46, 1–17.
  71. Peng, Z.; Oldfield, C.J.; Xue, B.; Mizianty, M.J.; Dunker, A.K.; Kurgan, L.; Uversky, V.N. A creature with a hundred waggly tails: Intrinsically disordered proteins in the ribosome. Cell Mol. Life Sci. 2014, 71, 1477–1504.
  72. Jeffares, D.C.; Poole, A.M.; Penny, D. Relics from the RNA world. J. Mol. Evol. 1998, 46, 18–36.
  73. Tompa, P.; Csermely, P. The role of structural disorder in the function of RNA and protein chaperones. FASEB J. 2004, 18, 1169–1175.
  74. Treiber, D.K.; Williamson, J.R. Beyond kinetic traps in RNA folding. Curr. Opin. Struct. Biol. 2001, 11, 309–314.
  75. Cristofari, G.; Darlix, J.L. The ubiquitous nature of RNA chaperone proteins. Prog. Nucleic Acid. Res. Mol. Biol. 2002, 72, 223–268.
  76. Gilbert, W. Origin of life-the RNA world. Nature 1986, 319, 618.
  77. Csermely, P. Proteins, RNAs and chaperones in enzyme evolution: A folding perspective. Trends Biochem. Sci. 1997, 22, 147–149.
  78. Doolittle, W.F. Uprooting the tree of life. Sci. Am. 2000, 282, 90–95.
  79. Theobald, D.L. A formal test of the theory of universal common ancestry. Nature 2010, 465, 219–222.
  80. Lahav, N.; White, D.; Chang, S. Peptide formation in the prebiotic era: Thermal condensation of glycine in fluctuating clay environments. Science 1978, 201, 67–69.
  81. Rodriguez-Garcia, M.; Surman, A.J.; Cooper, G.J.T.; Suarez-Marina, I.; Hosni, Z.; Lee, M.P.; Cronin, L. Formation of oligopeptides in high yield under simple programmable conditions. Nat. Commun. 2015, 6, 8385.
  82. Campbell, T.D.; Febrian, R.; McCarthy, J.T.; Kleinschmidt, H.E.; Forsythe, J.G.; Bracher, P.J. Prebiotic condensation through wet-dry cycling regulated by deliquescence. Nat. Commun. 2019, 10, 4508.
  83. Sakata, K.; Kitadai, N.; Yokoyama, T. Effects of pH and temperature on dimerization rate of glycine: Evaluation of favorable environmental conditions for chemical evolution of life. Geochim. Cosmochim. Acta 2010, 74, 6841–6851.
  84. Imai, E.; Honda, H.; Hatori, K.; Brack, A.; Matsuno, K. Elongation of oligopeptides in a simulated submarine hydrothermal system. Science 1999, 283, 831–833.
  85. Ohara, S.; Kakegawa, T.; Nakazawa, H. Pressure effects on the abiotic polymerization of glycine. Orig. Life Evol. Biosph. 2007, 37, 215–223.
  86. Muller, F.; Escobar, L.; Xu, F.; Wegrzyn, E.; Nainyte, M.; Amatov, T.; Chan, C.Y.; Pichler, A.; Carell, T. A prebiotically plausible scenario of an RNA-peptide world. Nature 2022, 605, 279–284.
  87. Sumie, Y.; Sato, K.; Kakegawa, T.; Furukawa, Y. Boron-assisted abiotic polypeptide synthesis. Commun. Chem. 2023, 6, 89.
  88. Lazcano, A. Historical development of origins research. Cold Spring Harb. Perspect. Biol. 2010, 2, a002089.
  89. Lei, L.; Burton, Z.F. Chaos, order and systematics in evolution of the genetic code. Preprints, 2020, in press.
  90. Kauffman, S.A. The Origins of Order: Self-Organization and Selection in Evolution; Oxford University Press: New York, NY, USA, 1993.
  91. Kacar, B.; Garcia, A.K.; Anbar, A.D. Evolutionary History of Bioessential Elements Can Guide the Search for Life in the Universe. Chembiochem 2021, 22, 114–119.
  92. Matveev, V.V. Cell theory, intrinsically disordered proteins, and the physics of the origin of life. Prog. Biophys. Mol. Biol. 2019, 149, 114–130.
  93. Matsuo, M.; Kurihara, K. Proliferating coacervate droplets as the missing link between chemistry and biology in the origins of life. Nat. Commun. 2021, 12, 5487.
  94. Brangwynne, C.P.; Eckmann, C.R.; Courson, D.S.; Rybarska, A.; Hoege, C.; Gharakhani, J.; Julicher, F.; Hyman, A.A. Germline P granules are liquid droplets that localize by controlled dissolution/condensation. Science 2009, 324, 1729–1732.
  95. Darling, A.L.; Uversky, V.N. Known types of membrane-less organelles and biomolecular condensates. In Droplets of Life: Membrane-Less Organelles, Biomolecular Condensates, and Biological Liquid-Liquid Phase Separation, 1st ed.; Uversky, V.N., Ed.; Elsevier: Amsterdam, The Netherlands, 2023; pp. 271–335.
  96. Brangwynne, C.P. Phase transitions and size scaling of membrane-less organelles. J. Cell Biol. 2013, 203, 875–881.
  97. Brangwynne, C.P.; Tompa, P.; Pappu, R.V. Polymer physics of intracellular phase transitions. Nat. Phys. 2015, 11, 899–904.
  98. Uversky, V.N.; Kuznetsova, I.M.; Turoverov, K.K.; Zaslavsky, B. Intrinsically disordered proteins as crucial constituents of cellular aqueous two phase systems and coacervates. FEBS Lett. 2015, 589, 15–22.
  99. Dundr, M.; Misteli, T. Biogenesis of nuclear bodies. Cold Spring Harb. Perspect. Biol. 2010, 2, a000711.
  100. Zhu, L.; Brangwynne, C.P. Nuclear bodies: The emerging biophysics of nucleoplasmic phases. Curr. Opin. Cell Biol. 2015, 34, 23–30.
  101. Uversky, V.N. Intrinsically disordered proteins in overcrowded milieu: Membrane-less organelles, phase separation, and intrinsic disorder. Curr. Opin. Struct. Biol. 2017, 44, 18–30.
  102. Uversky, V.N. Protein intrinsic disorder-based liquid-liquid phase transitions in biological systems: Complex coacervates and membrane-less organelles. Adv. Colloid. Interface Sci. 2017, 239, 97–114.
  103. Feric, M.; Vaidya, N.; Harmon, T.S.; Mitrea, D.M.; Zhu, L.; Richardson, T.M.; Kriwacki, R.W.; Pappu, R.V.; Brangwynne, C.P. Coexisting Liquid Phases Underlie Nucleolar Subcompartments. Cell 2016, 165, 1686–1697.
  104. Mitrea, D.M.; Kriwacki, R.W. Phase separation in biology; functional organization of a higher order. Cell Commun. Signal 2016, 14, 1.
  105. Martin, E.W.; Holehouse, A.S. Intrinsically disordered protein regions and phase separation: Sequence determinants of assembly or lack thereof. Emerg. Top. Life Sci. 2020, 4, 307–329.
  106. Antifeeva, I.A.; Fonin, A.V.; Fefilova, A.S.; Stepanenko, O.V.; Povarova, O.I.; Silonov, S.A.; Kuznetsova, I.M.; Uversky, V.N.; Turoverov, K.K. Liquid-liquid phase separation as an organizing principle of intracellular space: Overview of the evolution of the cell compartmentalization concept. Cell Mol. Life Sci. 2022, 79, 251.
  107. Turoverov, K.K.; Kuznetsova, I.M.; Fonin, A.V.; Darling, A.L.; Zaslavsky, B.Y.; Uversky, V.N. Stochasticity of Biological Soft Matter: Emerging Concepts in Intrinsically Disordered Proteins and Biological Phase Separation. Trends Biochem. Sci. 2019, 44, 716–728.
  108. Darling, A.L.; Liu, Y.; Oldfield, C.J.; Uversky, V.N. Intrinsically Disordered Proteome of Human Membrane-Less Organelles. Proteomics 2018, 18, e1700193.
  109. Uversky, V.N. Protein intrinsic disorder and structure-function continuum. Prog. Mol. Biol. Transl. Sci. 2019, 166, 1–17.
  110. Uversky, V.N. Recent Developments in the Field of Intrinsically Disordered Proteins: Intrinsic Disorder–Based Emergence in Cellular Biology in Light of the Physiological and Pathological Liquid–Liquid Phase Transitions. Annu. Rev. Biophys. 2021, 50, 135–156.
  111. Meng, F.; Na, I.; Kurgan, L.; Uversky, V.N. Compartmentalization and Functionality of Nuclear Disorder: Intrinsic Disorder and Protein-Protein Interactions in Intra-Nuclear Compartments. Int. J. Mol. Sci. 2015, 17, 24.
  112. Uversky, V.N. The roles of intrinsic disorder-based liquid-liquid phase transitions in the “Dr. Jekyll-Mr. Hyde” behavior of proteins involved in amyotrophic lateral sclerosis and frontotemporal lobar degeneration. Autophagy 2017, 13, 2115–2162.
  113. Brangwynne, C.P.; Mitchison, T.J.; Hyman, A.A. Active liquid-like behavior of nucleoli determines their size and shape in Xenopus laevis oocytes. Proc. Natl. Acad. Sci. USA 2011, 108, 4334–4339.
  114. Li, P.; Banjade, S.; Cheng, H.C.; Kim, S.; Chen, B.; Guo, L.; Llaguno, M.; Hollingsworth, J.V.; King, D.S.; Banani, S.F.; et al. Phase transitions in the assembly of multivalent signalling proteins. Nature 2012, 483, 336–340.
  115. Aggarwal, S.; Snaidero, N.; Pahler, G.; Frey, S.; Sanchez, P.; Zweckstetter, M.; Janshoff, A.; Schneider, A.; Weil, M.T.; Schaap, I.A.; et al. Myelin membrane assembly is driven by a phase transition of myelin basic proteins into a cohesive protein meshwork. PLoS Biol. 2013, 11, e1001577. [Google Scholar] [CrossRef]
  116. Feric, M.; Brangwynne, C.P. A nuclear F-actin scaffold stabilizes ribonucleoprotein droplets against gravity in large cells. Nat. Cell Biol. 2013, 15, 1253–1259. [Google Scholar] [CrossRef]
  117. Wippich, F.; Bodenmiller, B.; Trajkovska, M.G.; Wanka, S.; Aebersold, R.; Pelkmans, L. Dual specificity kinase DYRK3 couples stress granule condensation/dissolution to mTORC1 signaling. Cell 2013, 152, 791–805. [Google Scholar] [CrossRef]
  118. Nesterov, S.V.; Ilyinsky, N.S.; Uversky, V.N. Liquid-liquid phase separation as a common organizing principle of intracellular space and biomembranes providing dynamic adaptive responses. Biochim. Biophys. Acta Mol. Cell Res. 2021, 1868, 119102. [Google Scholar] [CrossRef] [PubMed]
  119. Fonin, A.V.; Antifeeva, I.A.; Kuznetsova, I.M.; Turoverov, K.K.; Zaslavsky, B.Y.; Kulkarni, P.; Uversky, V.N. Biological soft matter: Intrinsically disordered proteins in liquid-liquid phase separation and biomolecular condensates. Essays Biochem. 2022, 66, 831–847. [Google Scholar] [CrossRef] [PubMed]
  120. Nott, T.J.; Petsalaki, E.; Farber, P.; Jervis, D.; Fussner, E.; Plochowietz, A.; Craggs, T.D.; Bazett-Jones, D.P.; Pawson, T.; Forman-Kay, J.D.; et al. Phase transition of a disordered nuage protein generates environmentally responsive membraneless organelles. Mol. Cell 2015, 57, 936–947. [Google Scholar] [CrossRef] [PubMed]
  121. Mitrea, D.M.; Cika, J.A.; Guy, C.S.; Ban, D.; Banerjee, P.R.; Stanley, C.B.; Nourse, A.; Deniz, A.A.; Kriwacki, R.W. Nucleophosmin integrates within the nucleolus via multi-modal interactions with proteins displaying R-rich linear motifs and rRNA. Elife 2016, 5, e13571. [Google Scholar] [CrossRef]
  122. Elbaum-Garfinkle, S.; Kim, Y.; Szczepaniak, K.; Chen, C.C.; Eckmann, C.R.; Myong, S.; Brangwynne, C.P. The disordered P granule protein LAF-1 drives phase separation into droplets with tunable viscosity and dynamics. Proc. Natl. Acad. Sci. USA 2015, 112, 7189–7194. [Google Scholar] [CrossRef]
  123. Lin, Y.; Protter, D.S.; Rosen, M.K.; Parker, R. Formation and Maturation of Phase-Separated Liquid Droplets by RNA-Binding Proteins. Mol. Cell 2015, 60, 208–219. [Google Scholar] [CrossRef]
  124. Toretsky, J.A.; Wright, P.E. Assemblages: Functional units formed by cellular phase separation. J. Cell Biol. 2014, 206, 579–588. [Google Scholar] [CrossRef]
  125. Csizmok, V.; Follis, A.V.; Kriwacki, R.W.; Forman-Kay, J.D. Dynamic Protein Interaction Networks and New Structural Paradigms in Signaling. Chem. Rev. 2016, 116, 6424. [Google Scholar] [CrossRef]
  126. Rigau, M.; Juan, D.; Valencia, A.; Rico, D. Intronic CNVs and gene expression variation in human populations. PLoS Genet. 2019, 15, e1007902. [Google Scholar] [CrossRef]
  127. Berget, S.M.; Moore, C.; Sharp, P.A. Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proc. Natl. Acad. Sci. USA 1977, 74, 3171–3175. [Google Scholar] [CrossRef]
  128. Chow, L.T.; Gelinas, R.E.; Broker, T.R.; Roberts, R.J. An amazing sequence arrangement at the 5′ ends of adenovirus 2 messenger RNA. Cell 1977, 12, 1–8. [Google Scholar] [CrossRef] [PubMed]
  129. Williamson, B. DNA insertions and gene structure. Nature 1977, 270, 295–297. [Google Scholar] [CrossRef]
  130. Eddy, S.R. The C-value paradox, junk DNA and ENCODE. Curr. Biol. 2012, 22, R898–R899. [Google Scholar] [CrossRef] [PubMed]
  131. Palazzo, A.F.; Gregory, T.R. The case for junk DNA. PLoS Genet. 2014, 10, e1004351. [Google Scholar] [CrossRef]
  132. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 2012, 489, 57–74. [Google Scholar] [CrossRef]
  133. Taft, R.J.; Pheasant, M.; Mattick, J.S. The relationship between non-protein-coding DNA and eukaryotic complexity. Bioessays 2007, 29, 288–299. [Google Scholar] [CrossRef]
  134. Morris, K.V.; Mattick, J.S. The rise of regulatory RNA. Nat. Rev. Genet. 2014, 15, 423–437. [Google Scholar] [CrossRef]
  135. Gerstberger, S.; Hafner, M.; Tuschl, T. A census of human RNA-binding proteins. Nat. Rev. Genet. 2014, 15, 829–845. [Google Scholar] [CrossRef]
  136. Van Nostrand, E.L.; Freese, P.; Pratt, G.A.; Wang, X.; Wei, X.; Xiao, R.; Blue, S.M.; Chen, J.Y.; Cody, N.A.L.; Dominguez, D.; et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 2020, 583, 711–719. [Google Scholar] [CrossRef]
  137. Wang, C.; Uversky, V.N.; Kurgan, L. Disordered nucleiome: Abundance of intrinsic disorder in the DNA- and RNA-binding proteins in 1121 species from Eukaryota, Bacteria and Archaea. Proteomics 2016, 16, 1486–1498. [Google Scholar] [CrossRef]
  138. Peng, Z.; Mizianty, M.J.; Xue, B.; Kurgan, L.; Uversky, V.N. More than just tails: Intrinsic disorder in histone proteins. Mol. Biosyst. 2012, 8, 1886–1901. [Google Scholar] [CrossRef] [PubMed]
  139. Bhalla, J.; Storchan, G.B.; MacCarthy, C.M.; Uversky, V.N.; Tcherkasskaya, O. Local flexibility in molecular function paradigm. Mol. Cell. Proteom. 2006, 5, 1212–1223. [Google Scholar] [CrossRef] [PubMed]
  140. Liu, J.; Perumal, N.B.; Oldfield, C.J.; Su, E.W.; Uversky, V.N.; Dunker, A.K. Intrinsic disorder in transcription factors. Biochemistry 2006, 45, 6873–6888. [Google Scholar] [CrossRef] [PubMed]
  141. Minezaki, Y.; Homma, K.; Kinjo, A.R.; Nishikawa, K. Human transcription factors contain a high fraction of intrinsically disordered regions essential for transcriptional regulation. J. Mol. Biol. 2006, 359, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  142. Coelho Ribeiro, M.d.L.; Espinosa, J.; Islam, S.; Martinez, O.; Thanki, J.J.; Mazariegos, S.; Nguyen, T.; Larina, M.; Xue, B.; Uversky, V.N. Malleable ribonucleoprotein machine: Protein intrinsic disorder in the Saccharomyces cerevisiae spliceosome. PeerJ 2013, 1, e2. [Google Scholar] [CrossRef]
  143. Korneta, I.; Bujnicki, J.M. Intrinsic disorder in the human spliceosomal proteome. PLoS Comput. Biol. 2012, 8, e1002641. [Google Scholar] [CrossRef]
  144. Zhao, B.; Katuwawala, A.; Oldfield, C.J.; Hu, G.; Wu, Z.; Uversky, V.N.; Kurgan, L. Intrinsic Disorder in Human RNA-Binding Proteins. J. Mol. Biol. 2021, 433, 167229. [Google Scholar] [CrossRef]
  145. Sambrook, J. Adenovirus amazes at Cold Spring Harbor. Nature 1977, 268, 101–104. [Google Scholar] [CrossRef]
  146. Black, D.L. Mechanisms of alternative pre-messenger RNA splicing. Annu. Rev. Biochem. 2003, 72, 291–336. [Google Scholar] [CrossRef]
  147. Graveley, B.R. Alternative splicing: Increasing diversity in the proteomic world. Trends Genet. 2001, 17, 100–107. [Google Scholar] [CrossRef]
  148. Stamm, S.; Ben-Ari, S.; Rafalska, I.; Tang, Y.; Zhang, Z.; Toiber, D.; Thanaraj, T.A.; Soreq, H. Function of alternative splicing. Gene 2005, 344, 1–20. [Google Scholar] [CrossRef] [PubMed]
  149. Brett, D.; Hanke, J.; Lehmann, G.; Haase, S.; Delbruck, S.; Krueger, S.; Reich, J.; Bork, P. EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett. 2000, 474, 83–86. [Google Scholar] [CrossRef] [PubMed]
  150. Johnson, J.M.; Castle, J.; Garrett-Engele, P.; Kan, Z.; Loerch, P.M.; Armour, C.D.; Santos, R.; Schadt, E.E.; Stoughton, R.; Shoemaker, D.D. Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 2003, 302, 2141–2144. [Google Scholar] [CrossRef] [PubMed]
  151. Minneman, K.P. Splice variants of G protein-coupled receptors. Mol. Interv. 2001, 1, 108–116. [Google Scholar]
  152. Thai, T.H.; Kearney, J.F. Distinct and opposite activities of human terminal deoxynucleotidyltransferase splice variants. J. Immunol. 2004, 173, 4009–4019. [Google Scholar] [CrossRef]
  153. Scheper, W.; Zwart, R.; Baas, F. Alternative splicing in the N-terminus of Alzheimer’s presenilin 1. Neurogenetics 2004, 5, 223–227. [Google Scholar] [CrossRef]
  154. Norris, J.D.; Fan, D.; Sherk, A.; McDonnell, D.P. A negative coregulator for the human ER. Mol. Endocrinol. 2002, 16, 459–468. [Google Scholar] [CrossRef]
  155. Ilik, I.A.; Malszycki, M.; Lubke, A.K.; Schade, C.; Meierhofer, D.; Aktas, T. SON and SRRM2 are essential for nuclear speckle formation. Elife 2020, 9, e60579. [Google Scholar] [CrossRef] [PubMed]
  156. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Zidek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef]
  157. Dayhoff, G.W., 2nd; Uversky, V.N. Rapid prediction and analysis of protein intrinsic disorder. Protein Sci. 2022, 31, e4496. [Google Scholar] [CrossRef]
  158. Romero, P.R.; Zaidi, S.; Fang, Y.Y.; Uversky, V.N.; Radivojac, P.; Oldfield, C.J.; Cortese, M.S.; Sickmeier, M.; LeGall, T.; Obradovic, Z.; et al. Alternative splicing in concert with protein intrinsic disorder enables increased functional diversity in multicellular organisms. Proc. Natl. Acad. Sci. USA 2006, 103, 8390–8395. [Google Scholar] [CrossRef] [PubMed]
  159. Walsh, C.T.; Garneau-Tsodikova, S.; Gatto, G.J., Jr. Protein posttranslational modifications: The chemistry of proteome diversifications. Angew. Chem. Int. Ed. Engl. 2005, 44, 7342–7372. [Google Scholar] [CrossRef]
  160. Witze, E.S.; Old, W.M.; Resing, K.A.; Ahn, N.G. Mapping protein post-translational modifications with mass spectrometry. Nat. Methods 2007, 4, 798–806. [Google Scholar] [CrossRef]
  161. Deribe, Y.L.; Pawson, T.; Dikic, I. Post-translational modifications in signal integration. Nat. Struct. Mol. Biol. 2010, 17, 666–672. [Google Scholar] [CrossRef]
  162. Mann, M.; Jensen, O.N. Proteomic analysis of post-translational modifications. Nat. Biotechnol. 2003, 21, 255–261. [Google Scholar] [CrossRef]
  163. Marks, F. Protein Phosphorylation; VCH Weinheim: New York, NY, USA; Basel, Switzerland; Cambridge, UK; Tokyo, Japan, 1996. [Google Scholar]
  164. Yang, X.J. Multisite protein modification and intramolecular signaling. Oncogene 2005, 24, 1653–1662. [Google Scholar] [CrossRef] [PubMed]
  165. Erler, J.; Zhang, R.; Petridis, L.; Cheng, X.; Smith, J.C.; Langowski, J. The role of histone tails in the nucleosome: A computational study. Biophys. J. 2014, 107, 2911–2922. [Google Scholar] [CrossRef] [PubMed]
  166. Mersfelder, E.L.; Parthun, M.R. The tale beyond the tail: Histone core domain modifications and the regulation of chromatin structure. Nucleic Acids Res. 2006, 34, 2653–2662. [Google Scholar] [CrossRef]
  167. Dunker, A.K.; Brown, C.J.; Lawson, J.D.; Iakoucheva, L.M.; Obradovic, Z. Intrinsic disorder and protein function. Biochemistry 2002, 41, 6573–6582. [Google Scholar] [CrossRef]
  168. Dunker, A.K.; Brown, C.J.; Obradovic, Z. Identification and functions of usefully disordered proteins. Adv. Protein Chem. 2002, 62, 25–49. [Google Scholar] [CrossRef]
  169. Iakoucheva, L.M.; Radivojac, P.; Brown, C.J.; O’Connor, T.R.; Sikes, J.G.; Obradovic, Z.; Dunker, A.K. The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res. 2004, 32, 1037–1049. [Google Scholar] [CrossRef] [PubMed]
  170. Radivojac, P.; Vacic, V.; Haynes, C.; Cocklin, R.R.; Mohan, A.; Heyen, J.W.; Goebl, M.G.; Iakoucheva, L.M. Identification, analysis, and prediction of protein ubiquitination sites. Proteins 2010, 78, 365–380. [Google Scholar] [CrossRef] [PubMed]
  171. Uversky, V.N. Protein folding revisited. A polypeptide chain at the folding-misfolding-nonfolding cross-roads: Which way to go? Cell Mol. Life Sci. 2003, 60, 1852–1871. [Google Scholar] [CrossRef]
  172. Turoverov, K.K.; Kuznetsova, I.M.; Uversky, V.N. The protein kingdom extended: Ordered and intrinsically disordered proteins, their folding, supramolecular complex formation, and aggregation. Prog. Biophys. Mol. Biol. 2010, 102, 73–84. [Google Scholar] [CrossRef]
  173. Uversky, V.N. What does it mean to be natively unfolded? Eur. J. Biochem. 2002, 269, 2–12. [Google Scholar] [CrossRef]
  174. Dunker, A.K.; Obradovic, Z. The protein trinity--linking function and disorder. Nat. Biotechnol. 2001, 19, 805–806. [Google Scholar] [CrossRef]
  175. Uversky, V.N. Paradoxes and wonders of intrinsic disorder: Complexity of simplicity. Intrinsically Disord. Proteins 2016, 4, e1135015. [Google Scholar] [CrossRef] [PubMed]
  176. DeForte, S.; Uversky, V.N. Order, Disorder, and Everything in Between. Molecules 2016, 21, 1090. [Google Scholar] [CrossRef]
  177. Uversky, V.N. Dancing Protein Clouds: The Strange Biology and Chaotic Physics of Intrinsically Disordered Proteins. J. Biol. Chem. 2016, 291, 6681–6688. [Google Scholar] [CrossRef]
  178. Uversky, V.N. Functional roles of transiently and intrinsically disordered regions within proteins. FEBS J. 2015, 282, 1182–1189. [Google Scholar] [CrossRef]
  179. Oldfield, C.J.; Cheng, Y.; Cortese, M.S.; Romero, P.; Uversky, V.N.; Dunker, A.K. Coupled folding and binding with alpha-helix-forming molecular recognition elements. Biochemistry 2005, 44, 12454–12470. [Google Scholar] [CrossRef] [PubMed]
  180. Patil, A.; Nakamura, H. Disordered domains and high surface charge confer hubs with the ability to interact with multiple proteins in interaction networks. FEBS Lett. 2006, 580, 2041–2045. [Google Scholar] [CrossRef] [PubMed]
  181. Ekman, D.; Light, S.; Bjorklund, A.K.; Elofsson, A. What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae? Genome Biol. 2006, 7, R45. [Google Scholar] [CrossRef]
  182. Haynes, C.; Oldfield, C.J.; Ji, F.; Klitgord, N.; Cusick, M.E.; Radivojac, P.; Uversky, V.N.; Vidal, M.; Iakoucheva, L.M. Intrinsic disorder is a common feature of hub proteins from four eukaryotic interactomes. PLoS Comput. Biol. 2006, 2, e100. [Google Scholar] [CrossRef]
  183. Dosztanyi, Z.; Chen, J.; Dunker, A.K.; Simon, I.; Tompa, P. Disorder and sequence repeats in hub proteins and their implications for network evolution. J. Proteome Res. 2006, 5, 2985–2995. [Google Scholar] [CrossRef] [PubMed]
  184. Singh, G.P.; Dash, D. Intrinsic disorder in yeast transcriptional regulatory network. Proteins 2007, 68, 602–605. [Google Scholar] [CrossRef]
  185. Singh, G.P.; Ganapathi, M.; Dash, D. Role of intrinsic disorder in transient interactions of hub proteins. Proteins 2007, 66, 761–765. [Google Scholar] [CrossRef]
  186. Schulz, G.E. Nucleotide Binding Proteins. In Molecular Mechanism of Biological Recognition; Balaban, M., Ed.; Elsevier/North-Holland Biomedical Press: New York, NY, USA, 1979; pp. 79–94. [Google Scholar]
  187. Kriwacki, R.W.; Hengst, L.; Tennant, L.; Reed, S.I.; Wright, P.E. Structural studies of p21Waf1/Cip1/Sdi1 in the free and Cdk2-bound state: Conformational disorder mediates binding diversity. Proc. Natl. Acad. Sci. USA 1996, 93, 11504–11509. [Google Scholar] [CrossRef] [PubMed]
  188. Mohan, A.; Oldfield, C.J.; Radivojac, P.; Vacic, V.; Cortese, M.S.; Dunker, A.K.; Uversky, V.N. Analysis of molecular recognition features (MoRFs). J. Mol. Biol. 2006, 362, 1043–1059. [Google Scholar] [CrossRef]
  189. Cheng, Y.; Oldfield, C.J.; Meng, J.; Romero, P.; Uversky, V.N.; Dunker, A.K. Mining alpha-helix-forming molecular recognition features with cross species sequence alignments. Biochemistry 2007, 46, 13468–13477. [Google Scholar] [CrossRef]
  190. Disfani, F.M.; Hsu, W.L.; Mizianty, M.J.; Oldfield, C.J.; Xue, B.; Dunker, A.K.; Uversky, V.N.; Kurgan, L. MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins. Bioinformatics 2012, 28, i75–i83. [Google Scholar] [CrossRef] [PubMed]
  191. Landsteiner, K. The Specificity of Serological Reactions; Courier Dover Publications: Mineola, NY, USA, 1936. [Google Scholar]
  192. Pauling, L. A theory of the structure and process of formation of antibodies. J. Am. Chem. Soc. 1940, 62, 2643–2657. [Google Scholar] [CrossRef]
  193. Karush, F. Heterogeneity of the binding sites of bovine serum albumin. J. Am. Chem. Soc. 1950, 72, 2705–2713. [Google Scholar] [CrossRef]
  194. Meador, W.E.; Means, A.R.; Quiocho, F.A. Modulation of calmodulin plasticity in molecular recognition on the basis of x-ray structures. Science 1993, 262, 1718–1721. [Google Scholar] [CrossRef]
  195. Uversky, V.N. A protein-chameleon: Conformational plasticity of alpha-synuclein, a disordered protein involved in neurodegenerative disorders. J. Biomol. Struct. Dyn. 2003, 21, 211–234. [Google Scholar] [CrossRef]
  196. Fuxreiter, M.; Tompa, P. Fuzzy complexes: A more stochastic view of protein function. Adv. Exp. Med. Biol. 2012, 725, 1–14. [Google Scholar] [CrossRef]
  197. Tompa, P.; Fuxreiter, M. Fuzzy complexes: Polymorphism and structural disorder in protein-protein interactions. Trends Biochem. Sci. 2008, 33, 2–8. [Google Scholar] [CrossRef]
  198. Uversky, V.N. Multitude of binding modes attainable by intrinsically disordered proteins: A portrait gallery of disorder-based complexes. Chem. Soc. Rev. 2011, 40, 1623–1634. [Google Scholar] [CrossRef]
  199. Permyakov, S.E.; Millett, I.S.; Doniach, S.; Permyakov, E.A.; Uversky, V.N. Natively unfolded C-terminal domain of caldesmon remains substantially unstructured after the effective binding to calmodulin. Proteins 2003, 53, 855–862. [Google Scholar] [CrossRef] [PubMed]
  200. Sigalov, A.; Aivazian, D.; Stern, L. Homooligomerization of the cytoplasmic domain of the T cell receptor zeta chain and of other proteins containing the immunoreceptor tyrosine-based activation motif. Biochemistry 2004, 43, 2049–2061. [Google Scholar] [CrossRef]
  201. Sigalov, A.B.; Zhuravleva, A.V.; Orekhov, V.Y. Binding of intrinsically disordered proteins is not necessarily accompanied by a structural transition to a folded form. Biochimie 2007, 89, 419–421. [Google Scholar] [CrossRef] [PubMed]
  202. Bjarnadottir, T.K.; Gloriam, D.E.; Hellstrand, S.H.; Kristiansson, H.; Fredriksson, R.; Schioth, H.B. Comprehensive repertoire and phylogenetic analysis of the G protein-coupled receptors in human and mouse. Genomics 2006, 88, 263–273. [Google Scholar] [CrossRef] [PubMed]
  203. Anantharaman, V.; Abhiman, S.; de Souza, R.F.; Aravind, L. Comparative genomics uncovers novel structural and functional features of the heterotrimeric GTPase signaling system. Gene 2011, 475, 63–78. [Google Scholar] [CrossRef]
  204. Southan, C.; Sharman, J.L.; Benson, H.E.; Faccenda, E.; Pawson, A.J.; Alexander, S.P.; Buneman, O.P.; Davenport, A.P.; McGrath, J.C.; Peters, J.A.; et al. The IUPHAR/BPS Guide to PHARMACOLOGY in 2016: Towards curated quantitative interactions between 1300 protein targets and 6000 ligands. Nucleic Acids Res. 2016, 44, D1054–D1068. [Google Scholar] [CrossRef]
  205. Fredriksson, R.; Lagerstrom, M.C.; Lundin, L.G.; Schioth, H.B. The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints. Mol. Pharmacol. 2003, 63, 1256–1272. [Google Scholar] [CrossRef]
  206. Flock, T.; Hauser, A.S.; Lund, N.; Gloriam, D.E.; Balaji, S.; Babu, M.M. Selectivity determinants of GPCR-G-protein binding. Nature 2017, 545, 317–322. [Google Scholar] [CrossRef]
  207. Isberg, V.; de Graaf, C.; Bortolato, A.; Cherezov, V.; Katritch, V.; Marshall, F.H.; Mordalski, S.; Pin, J.P.; Stevens, R.C.; Vriend, G.; et al. Generic GPCR residue numbers-aligning topology maps while minding the gaps. Trends Pharmacol. Sci. 2015, 36, 22–31. [Google Scholar] [CrossRef]
  208. Fonin, A.V.; Darling, A.L.; Kuznetsova, I.M.; Turoverov, K.K.; Uversky, V.N. Multi-functionality of proteins involved in GPCR and G protein signaling: Making sense of structure-function continuum with intrinsic disorder-based proteoforms. Cell Mol. Life Sci. 2019, 76, 4461–4492. [Google Scholar] [CrossRef]
  209. Neves, S.R.; Ram, P.T.; Iyengar, R. G protein pathways. Science 2002, 296, 1636–1639. [Google Scholar] [CrossRef]
  210. Marinissen, M.J.; Gutkind, J.S. G-protein-coupled receptors and signaling networks: Emerging paradigms. Trends Pharmacol. Sci. 2001, 22, 368–376. [Google Scholar] [CrossRef]
  211. Latorraca, N.R.; Venkatakrishnan, A.J.; Dror, R.O. GPCR Dynamics: Structures in Motion. Chem. Rev. 2017, 117, 139–155. [Google Scholar] [CrossRef] [PubMed]
  212. Venkatakrishnan, A.J.; Flock, T.; Prado, D.E.; Oates, M.E.; Gough, J.; Madan Babu, M. Structured and disordered facets of the GPCR fold. Curr. Opin. Struct. Biol. 2014, 27, 129–137. [Google Scholar] [CrossRef] [PubMed]
  213. Bushdid, C.; Magnasco, M.O.; Vosshall, L.B.; Keller, A. Humans can discriminate more than 1 trillion olfactory stimuli. Science 2014, 343, 1370–1372. [Google Scholar] [CrossRef]
  214. Malnic, B.; Hirono, J.; Sato, T.; Buck, L.B. Combinatorial receptor codes for odors. Cell 1999, 96, 713–723. [Google Scholar] [CrossRef]
  215. Reddy, G.; Zak, J.D.; Vergassola, M.; Murthy, V.N. Antagonism in olfactory receptor neurons and its implications for the perception of odor mixtures. Elife 2018, 7, e34958. [Google Scholar] [CrossRef]
  216. Gánti, T. Chemoton Theory: Theory of Living Systems; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003. [Google Scholar]
  217. Kulkarni, P.; Bhattacharya, S.; Achuthan, S.; Behal, A.; Jolly, M.K.; Kotnala, S.; Mohanty, A.; Rangarajan, G.; Salgia, R.; Uversky, V. Intrinsically Disordered Proteins: Critical Components of the Wetware. Chem. Rev. 2022, 122, 6614–6633. [Google Scholar] [CrossRef] [PubMed]
  218. Katsnelson, A. Did Disordered Proteins Help Launch Life on Earth? ACS Cent. Sci. 2020, 6, 1854–1857. [Google Scholar] [CrossRef] [PubMed]
  219. Kulkarni, P.; Uversky, V.N. Intrinsically Disordered Proteins: The Dark Horse of the Dark Proteome. Proteomics 2018, 18, e1800061. [Google Scholar] [CrossRef]
  220. Penny, D.; Foulds, L.R.; Hendy, M.D. Testing the theory of evolution by comparing phylogenetic trees constructed from five different protein sequences. Nature 1982, 297, 197–200. [Google Scholar] [CrossRef]
  221. Futuyma, D.J. Evolutionary Biology, 3rd ed.; Sinauer Associates Inc.: Sunderland, MA, USA, 1998. [Google Scholar]
  222. Zuckerkandl, E.; Pauling, L. Evolutionary divergence and convergence in proteins. In Evolving Genes and Proteins; Elsevier: Amsterdam, The Netherlands, 1965; pp. 97–166. [Google Scholar]
  223. Schluter, H.; Apweiler, R.; Holzhutter, H.G.; Jungblut, P.R. Finding one’s way in proteomics: A protein species nomenclature. Chem. Cent. J. 2009, 3, 11. [Google Scholar] [CrossRef]
  224. Uhlen, M.; Bjorling, E.; Agaton, C.; Szigyarto, C.A.; Amini, B.; Andersen, E.; Andersson, A.C.; Angelidou, P.; Asplund, A.; Asplund, C.; et al. A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol. Cell. Proteom. 2005, 4, 1920–1932. [Google Scholar] [CrossRef] [PubMed]
  225. Farrah, T.; Deutsch, E.W.; Omenn, G.S.; Sun, Z.; Watts, J.D.; Yamamoto, T.; Shteynberg, D.; Harris, M.M.; Moritz, R.L. State of the Human Proteome in 2013 as Viewed through PeptideAtlas: Comparing the Kidney, Urine, and Plasma Proteomes for the Biology- and Disease-Driven Human Proteome Project. J. Proteome Res. 2014, 13, 60–75. [Google Scholar] [CrossRef] [PubMed]
  226. Farrah, T.; Deutsch, E.W.; Hoopmann, M.R.; Hallows, J.L.; Sun, Z.; Huang, C.Y.; Moritz, R.L. The State of the Human Proteome in 2012 as Viewed through PeptideAtlas. J. Proteome Res. 2013, 12, 162–171. [Google Scholar] [CrossRef] [PubMed]
  227. Reddy, P.J.; Ray, S.; Srivastava, S. The Quest of the Human Proteome and the Missing Proteins: Digging Deeper. Omics-A J. Integr. Biol. 2015, 19, 276–282. [Google Scholar] [CrossRef]
  228. Kim, M.S.; Pinto, S.M.; Getnet, D.; Nirujogi, R.S.; Manda, S.S.; Chaerkady, R.; Madugundu, A.K.; Kelkar, D.S.; Isserlin, R.; Jain, S.; et al. A draft map of the human proteome. Nature 2014, 509, 575–581. [Google Scholar] [CrossRef]
  229. Ponomarenko, E.A.; Poverennaya, E.V.; Ilgisonis, E.V.; Pyatnitskiy, M.A.; Kopylov, A.T.; Zgoda, V.G.; Lisitsa, A.V.; Archakov, A.I. The Size of the Human Proteome: The Width and Depth. Int. J. Anal. Chem. 2016, 2016, 7436849. [Google Scholar] [CrossRef]
  230. Smith, L.M.; Kelleher, N.L.; Consortium for Top Down Proteomics. Proteoform: A single term describing protein complexity. Nat. Methods 2013, 10, 186–187. [Google Scholar] [CrossRef]
  231. Uversky, V.N. p53 Proteoforms and Intrinsic Disorder: An Illustration of the Protein Structure-Function Continuum Concept. Int. J. Mol. Sci. 2016, 17, 1874. [Google Scholar] [CrossRef]
  232. Pejaver, V.; Hsu, W.L.; Xin, F.; Dunker, A.K.; Uversky, V.N.; Radivojac, P. The structural and functional signatures of proteins that undergo multiple events of post-translational modification. Protein Sci. 2014, 23, 1077–1093. [Google Scholar] [CrossRef]
  233. Dunker, A.K.; Silman, I.; Uversky, V.N.; Sussman, J.L. Function and structure of inherently disordered proteins. Curr. Opin. Struct. Biol. 2008, 18, 756–764. [Google Scholar] [CrossRef]
  234. Dunker, A.K.; Uversky, V.N. Signal transduction via unstructured protein conduits. Nat. Chem. Biol. 2008, 4, 229–230. [Google Scholar] [CrossRef] [PubMed]
  235. Uversky, V.N. Disordered competitive recruiter: Fast and foldable. J. Mol. Biol. 2012, 418, 267–268. [Google Scholar] [CrossRef] [PubMed]
  236. Uversky, V.N.; Dunker, A.K. The case for intrinsically disordered proteins playing contributory roles in molecular recognition without a stable 3D structure. F1000 Biol. Rep. 2013, 5, 1. [Google Scholar] [CrossRef] [PubMed]
  237. Dyson, H.J.; Wright, P.E. Coupling of folding and binding for unstructured proteins. Curr. Opin. Struct. Biol. 2002, 12, 54–60. [Google Scholar] [CrossRef]
  238. Dyson, H.J.; Wright, P.E. Intrinsically unstructured proteins and their functions. Nat. Rev. Mol. Cell Biol. 2005, 6, 197–208. [Google Scholar] [CrossRef]
  239. Vacic, V.; Oldfield, C.J.; Mohan, A.; Radivojac, P.; Cortese, M.S.; Uversky, V.N.; Dunker, A.K. Characterization of molecular recognition features, MoRFs, and their binding partners. J. Proteome Res. 2007, 6, 2351–2366. [Google Scholar] [CrossRef]
  240. Comolatti, R.; Hoel, E. Causal emergence is widespread across measures of causation. arXiv 2022, arXiv:2202.01854. [Google Scholar] [CrossRef]
  241. Baranger, M. Chaos, Complexity, and Entropy; New England Complex Systems Institute: Cambridge, MA, USA, 2000; Volume 7. [Google Scholar]
  242. Klein, B.; Hoel, E.; Swain, A.; Griebenow, R.; Levin, M. Evolution and emergence: Higher order information structure in protein interactomes across the tree of life. Integr. Biol. 2021, 13, 283–294. [Google Scholar] [CrossRef]
  243. Klein, B.; Hoel, E. The emergence of informative higher scales in complex networks. Complexity 2020, 2020, 8932526. [Google Scholar] [CrossRef]
  244. Sebe-Pedros, A.; de Mendoza, A.; Lang, B.F.; Degnan, B.M.; Ruiz-Trillo, I. Unexpected repertoire of metazoan transcription factors in the unicellular holozoan Capsaspora owczarzaki. Mol. Biol. Evol. 2011, 28, 1241–1254. [Google Scholar] [CrossRef]
  245. Lowe, C.B.; Kellis, M.; Siepel, A.; Raney, B.J.; Clamp, M.; Salama, S.R.; Kingsley, D.M.; Lindblad-Toh, K.; Haussler, D. Three periods of regulatory innovation during vertebrate evolution. Science 2011, 333, 1019–1024. [Google Scholar] [CrossRef] [PubMed]
  246. Hindorff, L.A.; Sethupathy, P.; Junkins, H.A.; Ramos, E.M.; Mehta, J.P.; Collins, F.S.; Manolio, T.A. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 2009, 106, 9362–9367. [Google Scholar] [CrossRef] [PubMed]
  247. Huang, S.; Guo, Y.P.; May, G.; Enver, T. Bifurcation dynamics in lineage-commitment in bipotent progenitor cells. Dev. Biol. 2007, 305, 695–713. [Google Scholar] [CrossRef]
  248. Wheat, J.C.; Sella, Y.; Willcockson, M.; Skoultchi, A.I.; Bergman, A.; Singer, R.H.; Steidl, U. Single-molecule imaging of transcription dynamics in somatic stem cells. Nature 2020, 583, 431–436. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Correlations between three foldability scales (the scale based on the average number of contacts per residue in the ordered proteins (Galzitskaya) [67] (A), the DisProt-based scale [66] (B), and the Top-IDP scale [23] (C)) and amino acid novelty scale proposed by Trifonov [65]. Red and blue symbols correspond to disorder- and order-promoting residues as defined by the DisProt-based scale. Pink and cyan squares with error bars show averaged values.
Figure 2. Modern genetic code with information on the early and late codons (shown by light red and light blue colors, respectively) and disorder- and order-promoting residues (shown by red and blue colors, respectively). Codons with intermediate ages (i.e., those located between early and late codons) are shown by the light pink color, whereas disorder-neutral residues are shown by the pink color. Adapted with permission from Ref. [3]. Copyright © 2013. The Protein Society.
Figure 3. Correlations between thermostability of the codons (measured as melting enthalpies (kcal/M) of the dinucleotide stacks corresponding to the first and second codon positions [68]) and amino acid novelty of the corresponding residues (A), thermostability of codons and DisProt foldability of the corresponding residues (B), thermostability of codons and buriability of the corresponding residues (C), buriability of amino acids and their novelty (D), and DisProt foldability and buriability (E). Red and blue symbols correspond to disorder- and order-promoting residues as defined by the DisProt-based scale.
Figure 4. Correlation between intrinsic disorder content and proteome size for 3484 species of viruses, archaea, bacteria, and eukaryotes. Each symbol represents one species. There are six groups of species: viruses expressing one polyprotein precursor (small red circles filled with blue), other viruses (small red circles), bacteria (small green circles), archaea (blue circles), unicellular eukaryotes (brown squares), and multicellular eukaryotes (pink triangles). Each viral polyprotein was analyzed as a single polypeptide chain, without parsing it into individual proteins before prediction. The proteome size is the number of proteins in the proteome of a species and is shown on a logarithmic scale. The average fraction of disordered residues is calculated by averaging the per-sequence fraction of disordered residues over all sequences of that species. Disorder was predicted with PONDR-VSL2B. Adapted with permission from Ref. [3]. Copyright © 2013 The Protein Society.
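The averaging described in this caption can be illustrated with a minimal sketch. It assumes that per-residue disorder scores are already available as lists of numbers and that a residue counts as disordered when its score exceeds 0.5 (the cutoff quoted for PONDR® VSL2 in the Figure 7 caption; the Figure 4 caption itself does not state one). The function names are hypothetical and are not part of any PONDR software.

```python
from statistics import mean

# Assumed cutoff: a residue is called disordered if its score is above 0.5.
DISORDER_THRESHOLD = 0.5


def disordered_fraction(scores, threshold=DISORDER_THRESHOLD):
    """Fraction of residues in one protein whose score exceeds the threshold."""
    if not scores:
        raise ValueError("a protein needs at least one per-residue score")
    return sum(s > threshold for s in scores) / len(scores)


def proteome_disorder_content(proteome, threshold=DISORDER_THRESHOLD):
    """Average of the per-protein disordered fractions over a proteome.

    Each protein contributes equally, regardless of its length.
    """
    return mean(disordered_fraction(scores, threshold) for scores in proteome)


def pooled_disorder_content(proteome, threshold=DISORDER_THRESHOLD):
    """Alternative shown for contrast only: pool all residues first."""
    all_scores = [s for scores in proteome for s in scores]
    return disordered_fraction(all_scores, threshold)


# Toy proteome: one short, fully disordered chain and one longer, ordered chain.
toy_proteome = [
    [0.9, 0.9],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
]
per_protein_average = proteome_disorder_content(toy_proteome)
pooled_average = pooled_disorder_content(toy_proteome)
```

In this toy case the per-protein average is 0.5, whereas pooling all residues gives 0.25, which shows why the caption specifies that fractions are computed per sequence before averaging.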
Figure 5. Wavy pattern of the global evolution of protein intrinsic disorder. The x-axis represents evolutionary time and the y-axis shows disorder content in proteins at a given evolutionary time point. Here, primordial proteins are expected to be mostly disordered (left-hand side of the plot), proteins in LUA likely are mostly structured (center of the plot), whereas many proteins in eukaryotes are either totally disordered or hybrids containing both ordered and disordered regions (right-hand side of the plot). Adapted with permission from Ref. [3]. Copyright © 2013 The Protein Society.
Figure 6. Intrinsic disorder in spliceosomal proteins. (A) The 3D structural model generated by AlphaFold [156] for RBFOX2 (UniProt ID: O43251), one of the moonlighting spliceosomal proteins. The structure is colored according to the model confidence. (B) Per-residue intrinsic disorder profile of RBFOX2 generated by RIDAO [157]. (C) RIDAO-generated per-residue intrinsic disorder profile of the spliceosomal protein SRRM2 (UniProt ID: Q9UQ35), which is involved in the biogenesis of nuclear speckles.
Figure 7. Multifactorial intrinsic disorder analysis of the entire proteome of the amoeboid holozoan Capsaspora owczarzaki, which contains 9794 proteins. (A) PONDR® VSL2 score vs. PONDR® VSL2 (%) analysis. PONDR® VSL2 (%) is the percent of predicted intrinsically disordered residues (PPIDR), i.e., residues with disorder scores above 0.5. PONDR® VSL2 score is the average disorder score (ADS) for a protein. Based on these parameters, query proteins are classified as ordered (PPIDR < 10%; ADS < 0.15), moderately disordered (10% ≤ PPIDR < 30%; 0.15 ≤ ADS < 0.5), and highly disordered (PPIDR ≥ 30%; ADS ≥ 0.5). Color blocks indicate regions in which proteins are mostly ordered (blue and light blue), moderately disordered (pink and light pink), or mostly disordered (red). If the two parameters agree, the corresponding part of the background is dark (blue or pink), whereas light blue and light pink reflect areas in which the two parameters disagree with each other. The boundaries of the colored regions represent arbitrary and accepted cutoffs for ADS (y-axis) and PPIDR (x-axis). For comparison, in the human proteome, 0.4%, 5.1%, 33.7%, 21.0%, and 40.1% of proteins are located within the blue, light blue, pink, light pink, and red segments, respectively. This distribution observed in the human proteome is remarkably close to the distribution reported here for the C. owczarzaki proteins. (B) Charge-Hydropathy and Cumulative Distribution Function (CH-CDF) analysis of C. owczarzaki proteins. The CH-CDF plot is a two-dimensional representation that integrates both the CH plot, which correlates a protein’s net charge and hydrophobicity with its structural order, and the CDF, which cumulates disorder predictions from the N-terminus to the C-terminus of a protein, offering insight into the distribution of disordered residues.
The y-axis (ΔCH) represents the protein’s distance from the CH boundary, indicating the balance between charge and hydrophobicity, while the x-axis (ΔCDF) represents the deviation of a protein’s disorder frequency from the CDF boundary. Proteins are then stratified into four quadrants: Quadrant 1 (bottom right) contains proteins likely to be structured; Quadrant 2 (bottom left) includes proteins that may be in a molten globule state or lack a unique 3D structure; Quadrant 3 (top left) consists of proteins predicted to be highly disordered; Quadrant 4 (top right) captures proteins with a mixed prediction, disordered according to CH but ordered according to CDF. For comparison, 59.1%, 25.5%, 12.3%, and 3.1% of human proteins are located within quadrants Q1, Q2, Q3, and Q4, respectively. This indicates that although the C. owczarzaki and human proteomes contain comparable fractions of ordered proteins, there are noticeably more native molten globules and noticeably fewer highly disordered proteins in the C. owczarzaki proteome.
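The classification rules stated in this caption can be written out as a short sketch. The thresholds are taken from the caption; everything else is an assumption made for illustration: the function names are hypothetical, the input is a plain list of per-residue scores, right and top are taken as the positive directions of ΔCDF and ΔCH, and points lying exactly on an axis are assigned to the right or top quadrant. The sketch does not reproduce how PONDR® VSL2, the CH boundary, or the CDF boundary are computed, and it does not attempt to assign the colored regions of panel A.

```python
def ppidr_and_ads(scores, threshold=0.5):
    """Return (PPIDR in percent, ADS) for one protein.

    PPIDR counts residues with a score above the threshold;
    ADS is the mean per-residue score.
    """
    if not scores:
        raise ValueError("a protein needs at least one per-residue score")
    ppidr = 100.0 * sum(s > threshold for s in scores) / len(scores)
    ads = sum(scores) / len(scores)
    return ppidr, ads


def classify_by_ppidr(ppidr):
    """Panel A classes by PPIDR: cutoffs at 10% and 30%."""
    if ppidr < 10:
        return "ordered"
    if ppidr < 30:
        return "moderately disordered"
    return "highly disordered"


def classify_by_ads(ads):
    """Panel A classes by ADS: cutoffs at 0.15 and 0.5."""
    if ads < 0.15:
        return "ordered"
    if ads < 0.5:
        return "moderately disordered"
    return "highly disordered"


def ch_cdf_quadrant(delta_cdf, delta_ch):
    """Panel B quadrant from the x-axis (delta CDF) and y-axis (delta CH) values.

    Assumption: zero values are treated as right/top.
    """
    right = delta_cdf >= 0
    top = delta_ch >= 0
    if right and not top:
        return "Q1"  # bottom right: likely structured
    if not right and not top:
        return "Q2"  # bottom left: molten globule-like or lacking unique 3D structure
    if not right and top:
        return "Q3"  # top left: highly disordered
    return "Q4"      # top right: disordered by CH, ordered by CDF


# Toy protein in which the two panel A parameters disagree.
toy_ppidr, toy_ads = ppidr_and_ads([0.2, 0.2, 0.6, 0.8])
toy_classes = (classify_by_ppidr(toy_ppidr), classify_by_ads(toy_ads))
```

For the toy protein, half of the residues score above 0.5, so it is highly disordered by PPIDR, while its average score stays below 0.5, so it is only moderately disordered by ADS. This is the kind of disagreement that the lighter background regions of panel A are meant to capture.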
Figure 8. Three-dimensional structural models generated by AlphaFold [156] for mouse GATA1 (UniProt ID: P17679) (A), GATA2 (UniProt ID: O09100) (B), and PU.1 (UniProt ID: P17433) (C) proteins. Structures are colored according to the model confidence, with blue, cyan, yellow, and orange corresponding to regions with very high, high, low, and very low confidence, respectively.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Uversky, V.N. On the Roles of Protein Intrinsic Disorder in the Origin of Life and Evolution. Life 2024, 14, 1307. https://doi.org/10.3390/life14101307
