3.1. Analysis of Literature on Protein Stabilization
At first glance, it seems that in order to confirm or refute the aforementioned hypothesis, it would be sufficient to analyze the available literature data. For example, one might choose publications in which the authors stabilized proteins by introducing artificially engineered disulfide bonds. Then, using programs such as PONDR® FIT, one can find parts of the polypeptide chain that are weakened (predicted as intrinsically disordered) and ask whether there is really a correlation between the stabilization of the protein and the location of the mutation in sites that are predicted as intrinsically disordered. However, it turned out that analysis of the literature data does not help in finding such a correlation, since many factors influence the stability of proteins with artificially introduced disulfide bonds, such as the size of the protein, the specific features of its structure, and the design features of the disulfide bonds. For example, introduced disulfide bonds almost always stabilize small proteins, regardless of their location (see [
4,
31]). In addition, when comparing the effects of different disulfide bonds on protein stability, it is important that the artificial disulfide bonds be designed in the same way. For example, it is not correct to compare the stabilizing effect of a disulfide bond introduced on the surface of a protein with that of one placed in its hydrophobic core. A disulfide bond within the hydrophobic core could lead either to destabilization of the protein (if it disrupts the packing of the core) or to stabilization (if it stably joins different structural parts of the protein). Similar reasoning suggests that it is not quite correct to compare the effects of disulfide bonds introduced in different parts of a large protein if the secondary structure of these parts is very different.
Using the example of several proteins below, we have tried to demonstrate why it is impossible to use literature data to find a correlation between protein stabilization and the location of the disulfide bonds (
Figure 2). At first glance, these studies confirm our hypothesis. However, if we consider the arguments about the influence of different factors on protein stabilization given above, it turns out that no reliable confirmation for the hypothesis can be drawn.
Figure 2 shows the PONDR® FIT-based plots of the predicted intrinsic disorder propensities (PIDP) within the amino acid sequences of several such proteins. In addition to these disorder plots,
Figure 2 shows the positions of cysteine residues introduced by the authors of the cited works and the corresponding increase or decrease in the stability temperature of the protein (in degrees), shown as numbers. From
Figure 2A it can be seen that the N-terminus of lysozyme can be regarded as the weakest spot (PIDP > 0.5) of this protein. This is possibly the reason why the three introduced disulfide bonds "fixing" or "stitching" the N-terminus to the protein globule stabilize the protein [
2]. On the contrary, two disulfide bonds inserted into the well-structured protein region (residues 100–150 with PIDP < 0.5) do not stabilize the protein (
Figure 2A). It would seem that the work on the stabilization of lysozyme [
2] confirms our assumption. However, the different effects of the introduced disulfide bonds on the stability of lysozyme may instead be due to their location in the two domains of lysozyme. The three disulfide bonds that stabilize the protein connect the N- and C-terminal domains of lysozyme. The two disulfide bonds that destabilize the protein are located within the same C-terminal domain.
We must also comment on the results of introducing disulfide bonds into small proteins (examples in
Figure 2C,D [
32,
33]). Disulfide bonds in any part of a small protein (around 100 amino acid residues) are expected to stabilize the overall protein structure. The reason is that small proteins most often fold and unfold without intermediate states. In such a protein, introduction of a disulfide bond in any part of the chain leads to sizable compaction of the unfolded state, thereby stabilizing the native state. It should be recalled here that the free energy of protein stabilization consists of entropic and enthalpic components. For small two-state proteins, it has been shown that the gain in free energy upon introducing disulfide bonds is associated mainly with a decrease in the entropy of the unfolded state of the protein [
31]. That is why, for example, both disulfide bonds in immunoglobulin domain [
33] resulted in the protein stabilization (
Figure 2D). This contradicts what our hypothesis would predict: the disulfide bond introduced into the middle part of the protein (residues 50–70) should not lead to sizable stabilization, because it falls into a region with low intrinsic disorder propensity (
Figure 2D).
Folding of large proteins is accompanied by the formation of several intermediate states, and large proteins contain several structural elements (domains) with different stability. Therefore, for such proteins, it is very important to know in which structural element the disulfide bond is introduced. By stabilizing the weakest of these elements, one can stabilize the protein as a whole.
Figure 2B shows a PONDR® FIT-based plot of the predicted per-residue intrinsic disorder propensity of neutral protease. It can be seen that this protein has two regions with high PIDP values: the N-terminus and the region around residues 200–250. A disulfide bond introduced at the N-terminus of the neutral protease led to protein stabilization (
Figure 2B) [
34]. However, confirmation of our hypothesis would require testing the effects of disulfide bonds introduced into other regions of the protein. Unfortunately, no such data are available.
An interesting result was reported in the work on the engineered disulfide-based stabilization of subtilisin [
35]. None of the substitutions studied in that work led to stabilization of the protein [
35]. On one hand, all the residues chosen for substitutions were located within the protein regions with rather low PIDP (<0.5,
Figure 2E), which is consistent with our hypothesis. On the other hand, other factors could also contribute to these outcomes, such as specific structural features of the protein. Subtilisin is a single-domain protein with a complex structure, in which the elements of secondary structure are packed into four layers [
36] (PDB:2ST1). Therefore, the majority of the disulfide bonds introduced by the authors "fix" or "stitch" together elements of secondary structure that are located within the core of the protein and are clamped by other structural elements of subtilisin.
The aforementioned examples show that the literature data do not allow us to unequivocally confirm or refute our hypothesis, since many factors might affect the stability of proteins with introduced disulfide bonds, such as the design features of the cysteine bridges, the structural features of the target proteins, and their sizes. Therefore, our hypothesis can be validated either with bioinformatics methods (by collecting a large database of proteins with introduced disulfide bonds and searching for correlations between stability and PIDP) or experimentally. We decided on an experimental test, systematically excluding factors associated with the structural features of proteins and the design features of disulfide bonds. There are at least two ways to do this: (1) select one protein that has similar repeating structural elements, calculate its PIDP, introduce a disulfide bond into each of the structural elements, and then evaluate the effect of the mutations on protein stability; (2) select two proteins with different amino acid sequences but the same three-dimensional structure and carry out similar studies.
3.2. Investigation of Green Fluorescent Protein
We have been studying the folding features of green fluorescent protein for a long time [
18,
37,
38,
39,
40,
41,
42]. This protein is a β-barrel consisting of very similar β-strands (
Figure 3). Therefore, one can find similar structural elements in its structure, select pairs of closely spaced amino acid residues, and design disulfide bonds on the surface of the protein (to avoid disrupting its internal packing).
Figure 3A,B show two projections of the GFP β-barrel, whereas
Figure 3C–F represent structural elements selected for the experimental analysis, and
Figure 3G shows the plot of intrinsic disorder propensities for the amino acid sequence of green fluorescent protein calculated by PONDR® FIT and IsUnstruct.
Since these two programs use entirely different approaches to calculate intrinsic disorder propensities (see Materials and Methods), we believe that comparing the results of PONDR® FIT and IsUnstruct should help minimize errors that depend on the choice of program and identify the weakened regions of the proteins more reliably.
Figure 3G shows that the only large protein region predicted as intrinsically disordered (PIDP > 0.5) by both programs is located at the C-terminal part of GFP. Therefore, this C-terminal region could be regarded as weakened. GFP comprises a "barrel" composed of similar β-hairpins. To check our hypothesis, we decided to cross-link both the predicted weakened region (which is a hairpin) and other similar beta-hairpins of the protein with a disorder score below 0.5 by disulfide bonds.
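The consensus step described above can be sketched in code. The following Python fragment is only an illustration: it assumes that the per-residue scores of the two predictors have already been obtained as plain lists of numbers, and the function names (`disordered_regions`, `consensus_regions`) and the `min_len` parameter are hypothetical and are not part of PONDR® FIT or IsUnstruct.

```python
from typing import List, Sequence, Tuple


def disordered_regions(
    scores: Sequence[float], threshold: float = 0.5, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where score > threshold."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = i  # a new run of predicted disorder begins here
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start + 1 >= min_len:
        regions.append((start, len(scores)))
    return regions


def consensus_regions(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    threshold: float = 0.5,
    min_len: int = 1,
) -> List[Tuple[int, int]]:
    """Regions that BOTH predictors score above the threshold."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score profiles must cover the same sequence")
    # min(a, b) exceeds the threshold only if both scores do.
    both = [min(a, b) for a, b in zip(scores_a, scores_b)]
    return disordered_regions(both, threshold, min_len)
```

Taking the per-residue minimum is one simple way to require agreement between the two predictors; the `min_len` filter is an assumption added here to discard isolated residues, since the text speaks of a "large" region.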
Figure 3 shows the 3D structure of GFP, separate views of the β-hairpins, and the amino acid residues substituted with cysteines. In this study, four mutant variants with substitutions V11C–D36C, Q111C–V93C, K162C–Q184C, and S202C–T225C were investigated [
18].
Figure 4 shows the equilibrium urea-induced unfolding curves of the wild type GFP and its four double-cysteine mutants in oxidized forms (V11C–D36C, Q111C–V93C, K162C–Q184C, S202C–T225C) monitored by changes in far-UV CD spectra. It can be seen that only one of the introduced disulfide bonds stabilized the protein. This was the S202C–T225C mutant form. In the previous works [
18,
39,
40,
41], we conducted DSC experiments and described the heat denaturation of wild type GFP and its oxidized and modified mutant forms. In these previous studies, thermal denaturation of GFP was shown to be a rather complex non-equilibrium process that could be described by a model including two consecutive irreversible denaturation stages. Nonetheless, those DSC studies allowed us to conclude that only one mutation, S202C–T225C, stabilized the protein, whereas the other mutations destabilized it, affecting the activation energies and rate constants of heat denaturation.
Therefore, based on these observations, we can conclude that the introduction of a disulfide bond into the weakened region of GFP, which is predicted by PONDR® FIT and IsUnstruct to be intrinsically disordered, leads to stabilization of the protein.
Further support for our hypothesis that predictors of intrinsic disorder can help locate the weakened spots of structured proteins by finding regions with high PIDP came from the ribosomal L1 proteins of different organisms.
3.3. L1 Protein Study
To find convincing confirmation that high intrinsic disorder propensity correlates with weakness of protein tertiary structure, one must try to exclude all other factors that affect protein stability when cysteine bridges are introduced, such as the structural features of the proteins and the positions of the residues chosen for mutation.
To test our assumption correctly, it is necessary to study two proteins with identical three-dimensional structures but quite different amino acid sequences. Similar three-dimensional structures give us the opportunity to introduce amino acid substitutions into similar regions of the proteins with identical secondary structure. Different amino acid sequences give different intrinsic disorder propensity values for structurally identical parts of the proteins. Hence, introducing identical substitutions into similar structural motifs of proteins with different intrinsic disorder propensities allows us to exclude the effect of the secondary structure context of the regions where the disulfide bonds are introduced.
Two ribosomal proteins L1 were chosen for this study, L1 from the halophilic archaeon
Haloarcula marismortui (HmaL1) and L1 from the extremophilic bacterium
Aquifex aeolicus (AaeL1). The three-dimensional structure of AaeL1 has been determined experimentally [
43]; for HmaL1, a homology model was built [
44]. Both proteins have the highly conserved three-dimensional structure typical of ribosomal L1 proteins, which can be divided into two domains. The amino acid sequences of these proteins are only 33% identical. Hence, this pair of proteins satisfies our requirements: they are similar in structure but quite different in sequence.
3.3.1. Comparison of Spatial Structure and Amino Acid Sequences of AaeL1 and HmaL1 Proteins
The three-dimensional structure of L1 proteins is remarkably conserved. It can be divided into two domains (I and II) connected by a flexible linker that allows the mutual orientation of the domains to change. In solution, there are two variants of domain orientation, the "open" and "closed" states [
21].
Figure 5 (left) shows the superposition of the experimentally determined 3D structure of AaeL1 (blue) and the modelled 3D structure of HmaL1 (green). Both proteins have the two-domain structure typical of L1 proteins: the N- and C-terminal tails are located within domain I, which is connected to domain II by a two-strand linker. The basic elements of secondary structure of domain I in these two proteins superpose quite well. Comparing the two structures reveals a specific difference between archaeal and bacterial L1 proteins, namely a different degree of rotation of domain II relative to domain I. However, when only the domain II structures are compared, we also see good superposition (see
Figure 6 right).
Alignment of amino acid sequences of the proteins in BLAST (
https://blast.ncbi.nlm.nih.gov/Blast.cgi) showed 33% amino acid identity between AaeL1 and HmaL1.
Figure 5 (right) shows the alignment of the amino acid sequences of the two proteins; identical amino acid residues are colored gray. The proteins differ in length: AaeL1 consists of 242 a.a. and HmaL1 of 212 a.a.
3.3.2. Mapping of Disordered Regions and Design of Disulfide Bridges in AaeL1 and HmaL1
Disordered regions in the AaeL1 and HmaL1 proteins were mapped using the PONDR® FIT and IsUnstruct programs [
10,
12].
Figure 6 shows the intrinsic disorder propensities for the amino acid sequences of AaeL1 (blue) and HmaL1 (green). The gray square marks the region of the protein sequence belonging to domain II. In this domain, we see a large difference between the intrinsic disorder propensities of the AaeL1 and HmaL1 proteins (the region is underlined by a red line below the plot). For the AaeL1 protein (
Figure 6 left, blue line), this region is predicted as structured (low PIDP values), whereas for HmaL1 (
Figure 6 left, green line) it is predicted as intrinsically disordered (high PIDP values). Therefore, this region is the most suitable for testing our idea. If the results of PONDR® FIT and IsUnstruct can really be interpreted as predictions of structured and weakened regions of the polypeptide chain, then introducing a disulfide bond into the region predicted as disordered in HmaL1 should increase the stability and decrease the conformational mobility of this protein, whereas inserting a disulfide bond into the same region, predicted as structured in AaeL1, should not affect protein stability or should even decrease it. If the local spatial structure is more significant for protein stabilization, then the introduced substitutions should affect both proteins similarly.
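The search for a region that is predicted as disordered in one protein and as structured in the other can be illustrated with a short sketch. It assumes a gapped pairwise alignment (with "-" as the gap character) and per-residue score lists for each ungapped sequence; the function name `discordant_positions` is hypothetical, and the sketch is not the procedure used by the authors.

```python
from typing import List, Sequence, Tuple


def discordant_positions(
    aligned_a: str,
    scores_a: Sequence[float],
    aligned_b: str,
    scores_b: Sequence[float],
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Aligned residue pairs where exactly one protein is predicted disordered.

    Returns 1-based residue numbers (residue_in_a, residue_in_b).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if len(aligned_a.replace("-", "")) != len(scores_a):
        raise ValueError("scores_a does not match the ungapped sequence a")
    if len(aligned_b.replace("-", "")) != len(scores_b):
        raise ValueError("scores_b does not match the ungapped sequence b")

    pairs: List[Tuple[int, int]] = []
    res_a = res_b = 0  # running residue numbers in each ungapped sequence
    for char_a, char_b in zip(aligned_a, aligned_b):
        if char_a != "-":
            res_a += 1
        if char_b != "-":
            res_b += 1
        if char_a == "-" or char_b == "-":
            continue  # no structural counterpart in the other protein
        disordered_a = scores_a[res_a - 1] > threshold
        disordered_b = scores_b[res_b - 1] > threshold
        if disordered_a != disordered_b:
            pairs.append((res_a, res_b))
    return pairs
```

Contiguous runs of such pairs would mark candidate regions like the one underlined in the disorder plot; a structure-based alignment could be substituted for the sequence alignment assumed here.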
It is worth noting that a detailed analysis of
Figure 6 left allows us to understand why it is difficult to select a pair of proteins with the same three-dimensional structure but different PIDPs to test our hypothesis. We examined several pairs of proteins, including ribosomal proteins from different organisms. However, it turned out that either the PIDP values calculated for different proteins are close to each other despite the different amino acid compositions (this is evident, for example, for the regions around residues 150 or 200,
Figure 6 left), or the predictions of different programs are very different. For example, in
Figure 6 left the PIDP values calculated by PONDR® FIT and IsUnstruct differ in the region around residue 25 for AaeL1 and in the region around residue 50 for HmaL1.
Figure 6 (right) shows the superposition of the three-dimensional structures of domain II of the AaeL1 and HmaL1 proteins. The average RMSD calculated from Cα atoms is 2.8 Å. Blue and green colors show the positions of the regions chosen for introduction of disulfide bonds based on the predictions of PONDR® FIT and IsUnstruct, respectively. Yellow spheres show the residues chosen for substitution with cysteine: D101 and K127 in AaeL1, and E82 and D114 in HmaL1. Residues were selected for substitution with cysteine based on two criteria: (1) the distance between the Cβ atoms of the two residues must be ~5 Å; (2) the residues must point toward each other in space.
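The two selection criteria can be expressed as a simple geometric filter. The sketch below is a hedged illustration rather than the authors' procedure: the tolerance around 5 Å (`tol`), the minimum sequence separation (`min_sep`), and the dot-product test used to decide whether two side chains "point toward each other" are all assumptions introduced here, and the coordinates are expected as plain dictionaries keyed by residue number.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def candidate_pairs(
    ca: Dict[int, Vec],
    cb: Dict[int, Vec],
    target: float = 5.0,  # desired CB-CB distance in angstroms (from the text)
    tol: float = 1.0,     # assumed tolerance around the target
    min_sep: int = 3,     # assumed minimum separation along the chain
) -> List[Tuple[int, int, float]]:
    """Residue pairs (i, j, CB-CB distance) satisfying both criteria."""
    found: List[Tuple[int, int, float]] = []
    residues = sorted(set(ca) & set(cb))  # glycine has no CB and is skipped
    for idx, i in enumerate(residues):
        for j in residues[idx + 1:]:
            if j - i < min_sep:
                continue
            diff = _sub(cb[i], cb[j])
            d_cb = math.sqrt(_dot(diff, diff))
            if abs(d_cb - target) > tol:
                continue  # criterion 1: CB-CB distance close to 5 angstroms
            # Criterion 2 (assumed test): each CA->CB vector has a positive
            # projection onto the direction toward the partner residue.
            toward_j = _sub(ca[j], ca[i])
            side_i = _sub(cb[i], ca[i])
            side_j = _sub(cb[j], ca[j])
            if _dot(side_i, toward_j) > 0 and _dot(side_j, toward_j) < 0:
                found.append((i, j, d_cb))
    return found
```

In practice the Cα and Cβ coordinates would be read from a PDB file or a homology model; any surviving pair would still need visual inspection, since this filter ignores burial and side-chain rotamers.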
3.3.3. Investigation of the HmaL1–E82C–D114C Conformational Stability
Heat denaturation of the wild type HmaL1 and HmaL1–E82C–D114C mutant form was studied by CD spectroscopy.
Figure 7 shows the temperature dependence of the fraction of native state for wild type HmaL1, the mutant with the inserted disulfide bond (HmaL1–E82C–D114C), and the mutant modified by iodoacetamide (HmaL1–E82C–D114C mod), monitored by changes in molar ellipticity at 215 nm. HmaL1 and HmaL1–E82C–D114C mod melt at 55 °C, whereas HmaL1–E82C–D114C with an oxidized disulfide bond melts at 65 °C. Therefore, the substitutions to cysteine alone do not contribute significantly to protein stability, whereas the formation of a disulfide bond stabilizes the HmaL1 protein. This confirms our suggestion that a disulfide bond inserted into a region predicted to be disordered (despite being structured in the actual protein) should increase protein stability.
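A melting temperature can be read from a fraction-native curve as the temperature at which half of the protein is native. The sketch below shows one simple way to do this by linear interpolation; the text does not state how the melting points were determined, so this is an assumption, and the function name `melting_midpoint` is hypothetical.

```python
from typing import Sequence


def melting_midpoint(
    temps: Sequence[float],
    fraction_native: Sequence[float],
    level: float = 0.5,
) -> float:
    """Temperature at which the curve first crosses `level` (linear interpolation)."""
    if len(temps) != len(fraction_native):
        raise ValueError("temps and fraction_native must have equal length")
    for k in range(len(temps) - 1):
        f0, f1 = fraction_native[k], fraction_native[k + 1]
        if f0 == level:
            return float(temps[k])
        if (f0 - level) * (f1 - level) < 0:  # the level lies between two points
            t0, t1 = temps[k], temps[k + 1]
            return t0 + (level - f0) * (t1 - t0) / (f1 - f0)
    if len(fraction_native) > 0 and fraction_native[-1] == level:
        return float(temps[-1])
    raise ValueError("curve does not cross the requested level")
```

The difference between the midpoints of two curves then gives the shift in melting temperature. Fitting a two-state model to the whole curve would be more robust to noise than this point-to-point interpolation.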
3.3.4. Investigation of the AaeL1–D101C–K127C Conformational Stability
Heat denaturation of the wild type AaeL1 and its variant with disulfide bond, AaeL1–D101C–K127C, was studied by circular dichroism spectroscopy. One should remember that the AaeL1 protein is from an extremophilic bacterium, and, as a result, it is characterized by a very high conformational stability. Therefore, thermal denaturation of this protein was investigated in the presence of 4 M urea.
Figure 8A shows curves of the temperature dependence of the molar ellipticity at 215 nm. It is clear that the curves coincide for the wild type protein and its mutant form with disulfide bond, AaeL1–D101C–K127C. To further validate the melting points of wild type AaeL1 and AaeL1–D101C–K127C, we studied the melting of these proteins by differential scanning microcalorimetry.
Figure 8B shows the melting curves of AaeL1 and its oxidized double-cysteine mutant form with the disulfide bond in the presence of 4 M urea. The melting points of the wild type and mutant variants coincide at 347 K. Therefore, it follows from the presented data that the introduction of a disulfide bond into a protein region that is predicted by PONDR® FIT and IsUnstruct to be structured does not lead to protein stabilization.
Therefore, studies on L1 and GFP confirmed our hypothesis that programs calculating intrinsic disorder propensities can be used for mapping the weakened regions of proteins. Hence, they can be used for designing stable proteins.
3.4. Stabilization of Gαo
One of the research groups of our institute had a task to obtain crystals of a Gαo protein from
D. melanogaster, but their multiple attempts were unsuccessful. We proposed creating a mutant of this protein, assuming that a stabilized form would be better suited for crystallization than the wild type protein. Gαo consists of two domains. As in the ribosomal L1 protein, the first domain is formed by the N- and C-terminal regions of Gαo (1–64 and 181–345), and domain II comprises its central region (65–180) [
45]. Domain I of Gαo protein possesses GTPase activity, and its amino acid sequence is conserved among various organisms. The sequence of the second domain varies in different organisms, and this variance can be the reason for instability or mobility of domain II of Gαo from
D. melanogaster, resulting in inability to crystallize this protein. Therefore, we decided to stabilize the second domain of Gαo protein using the proposed approach for rational design of artificial disulfide bond.
Figure 9 shows the intrinsic disorder propensities calculated for the amino acid sequence of Gαo. The plots produced by PONDR® FIT and IsUnstruct are evidently similar. The vertical dashed lines in
Figure 9 mark the boundaries of the region of Gαo that forms domain II of this protein (residues 65–180). Both programs predict a rather high propensity for the region of residues 90–120 to be disordered. We chose exactly this "weakened" region for stabilization by introduction of a disulfide bond.
Design of disulfide bond in Gαo from
D. melanogaster and the characterization of the resulting constructs are described in detail in our previous study [
20].
Figure 10 shows the melting curves of Gαo and its mutant form with the introduced disulfide bond, obtained by differential scanning microcalorimetry. Each of the melting curves shows two maxima, which, as we showed [
20], are related to melting of two domains of Gαo.
Figure 10 shows that the position of the first melting peak coincides in the wild type protein and its mutant form (Tm1 ≈ 320 K), whereas the second peak of the melting curve differs (Tm2 = 329 K for wild type Gαo and Tm2 = 333 K for the mutant). Therefore, the introduced disulfide bond stabilized one of the Gαo domains and increased its melting point by 4 degrees.
3.5. Searching for Stable Circular Permutant Variants of the GroEL Apical Domain
When designing complex fusion proteins, it is sometimes necessary to create circular permutants. For this purpose, the N- and C-termini of the protein are joined by a linker, and a cleavage site is selected in the protein structure, defining the new N- and C-termini. This often leads to dramatic destabilization of the protein. How can one select a cleavage site without affecting protein stability? We believe that the aforementioned approach based on the prediction of intrinsic disorder propensity can be used for this purpose too. In particular, intrinsic disorder propensities should be calculated for circular permutants with different cleavage positions, and the position that gives the most ordered structure should be selected. This sub-project was inspired and initiated by our colleagues who studied the apical domain of the GroEL chaperone [
22,
23]. The protein chosen for the practical task of designing a stable circular permutant was the apical domain of GroEL. We started by calculating the intrinsic disorder propensity of GroEL apical domain sequences with different positions of the N- and C-termini.
Figure 11A shows intrinsic disorder propensity plots only for the three proteins that were isolated and studied experimentally. Gro192 is the wild type GroEL apical domain, Gro207 is the domain with the N- and C-termini joined by a linker and a cleavage site at amino acid residue 207, and Gro230 is a circular permutant with the cleavage site at position 230. Intrinsic disorder propensities were calculated in the same manner for all variants of the N- and C-terminus positions with a step of 5 residues. For each such plot, the number of amino acid residues with a disorder score above 0.5 can be calculated. For instance, in Gro192, this value equals 10 for the N-terminal region and 10 for the C-terminal tail; i.e., 20 amino acid residues have a PIDP > 0.5. The same calculation for sequences with different cleavage site positions gives the dependence of the number of residues predicted as disordered on the position of the cleavage site in the circular permutant. An example of such a plot is shown in
Figure 11B. A minimum on the plot corresponds to the lowest number of residues with PIDP > 0.5; i.e., this variant can be regarded as the most stable. A maximum corresponds to the highest number of amino acids with PIDP > 0.5; i.e., this circular permutant variant would be unstable. According to our calculations, the most stable circular permutant would arise if the cleavage site were placed at residue 207. For additional examination, we designed a circular permutant with the cleavage site at residue 230. According to the plot in
Figure 11B, this variant should be similar to the wild type protein or even slightly destabilized.
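The scan over cleavage positions described above can be sketched as follows. This is an illustrative outline under stated assumptions: the disorder predictor is passed in as an arbitrary callable, because the actual programs are external tools; `toy_predict` is a deliberately artificial stand-in used only to make the example runnable and does not imitate PONDR® FIT or IsUnstruct. All function names, the 0-based `cut` index, and the example sequence are hypothetical.

```python
from typing import Callable, Dict, Sequence

Predictor = Callable[[str], Sequence[float]]


def circular_permutant(seq: str, cut: int, linker: str = "") -> str:
    """Join the old termini with `linker` and open the chain at 0-based `cut`."""
    if not 0 <= cut < len(seq):
        raise ValueError("cut must lie inside the sequence")
    if cut == 0:
        return seq  # original termini retained, no linker needed
    return seq[cut:] + linker + seq[:cut]


def count_disordered(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Number of residues with a disorder score above the threshold."""
    return sum(1 for s in scores if s > threshold)


def scan_cut_sites(
    seq: str,
    predict: Predictor,
    linker: str = "",
    step: int = 5,
    threshold: float = 0.5,
) -> Dict[int, int]:
    """Map each tested cut position to its count of predicted-disordered residues."""
    return {
        cut: count_disordered(predict(circular_permutant(seq, cut, linker)), threshold)
        for cut in range(0, len(seq), step)
    }


def best_cut(profile: Dict[int, int]) -> int:
    """Cut position with the fewest residues predicted as disordered."""
    return min(profile, key=profile.get)


def toy_predict(seq: str) -> Sequence[float]:
    """Artificial predictor: flexible residues near a terminus score high."""
    n = len(seq)
    return [
        0.9 if (i < 2 or i >= n - 2) and aa in "GSDEK" else 0.1
        for i, aa in enumerate(seq)
    ]


profile = scan_cut_sites("GSVVLLAAGS", toy_predict, step=5)
print(profile, best_cut(profile))
```

With a real predictor, the returned dictionary corresponds to the plot of the number of disordered residues against the cleavage position, and its minimum marks the candidate expected to be most stable.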
Having isolated and purified the apical domain of GroEL and its two circular permutants, we studied their stability.
Figure 12A shows circular dichroism spectra and melting curves of the proteins. The far-UV CD spectra of Gro192 and Gro207 are almost identical, and their shape and intensity correspond to structured proteins. The shape of the Gro230 far-UV CD spectrum suggests a lower degree of structure in this protein.
Figure 12B shows the melting curves for the Gro192 and Gro207 proteins. Since Gro230 aggregated during heating, it was impossible to record its melting curve. It can be seen from the plots in
Figure 12B that the Gro207 circular permutant is destabilized by only two degrees compared to the wild type protein, which can be considered a clear success of the design of this mutant.
The results of the circular permutant design described above also support our assumption that programs predicting intrinsic disorder propensity can be used to search for weakened or stable protein regions that could be targets for the introduction of various mutations.