Improved Assessment of Globularity of Protein Structures and the Ellipsoid Profile of the Biological Assemblies from the PDB
Abstract
1. Introduction
2. Materials and Methods
2.1. The Protein Database
2.1.1. The 2124 Representatives of Domain Superfamilies
2.1.2. The 3594 Representatives of Biological Assemblies
2.1.3. The 6 Example Proteins
- Endoglucanase A (PDB code: 1IS9) is a round, relatively large monomer. Using the outlier residue detection methods should minimally affect its globularity metrics.
- Uncharacterized Protein (PDB code: 3BPD) is a toroidal homoheptamer. It can be easily put inside a bounding ellipsoid but also has a large cavity in its middle. The presence of this cavity should be decipherable from the globularity metrics.
- dUTPase YncF (PDB code: 4B0H) is a homotrimer with long, disordered C-terminal regions at M119–K144. They protrude from the monomers, but in the complex they wrap around the neighbor chains to secure the entire globular motif.
- HLA-DR Invariant Chain (PDB code: 1IIE) is another homotrimer with disordered fragments of the chains (S181–K192 regions at the C-termini) stretching away from the highly globular center of mass of the molecule. This central “ball” is dense only in the complex; the monomers have a relatively loose tertiary structure.
- Alpha Synuclein (PDB code: 2KKW) is the biologically active (micelle-bound) form of Alpha Synuclein. It is a non-globular chain with a very loose tertiary structure. Its disordered C-terminal region at G101–A140 freely bends out of the plane formed by the two alpha helices that constitute the main body of the molecule.
- Ribosomal Protein L9 (PDB code: 1DIV) is the only multi-domain protein (M1–Q55 and R56–K149) in this group. Its homodimer, shaped like the letter Y, is non-globular.
2.2. The Structure Reference Terminology
2.3. The Ellipsoid Profile Algorithm
2.3.1. Preparation of the Structure
2.3.2. Minimum Volume Enclosing Ellipsoid
2.3.3. Detection of Outlier Residues
2.3.4. Alignment of the Molecule
2.3.5. Generation of the Grid
2.3.6. Voxelization of the Protein
2.3.7. Ellipsoid Indexes
2.3.8. Ellipsoid Profile
2.3.9. Globularity Classes
- the protein appears non-globular (class N) when EI0.3 < 0.3 or EI1.0 < 0.3;
- the protein appears semi-globular (class S) when 0.3 ≤ EI0.3 < 0.5 and EI1.0 ≥ 0.3;
- the protein appears globular (class G) when EI0.3 ≥ 0.5 and EI1.0 ≥ 0.3;
- the protein appears highly globular (class H) when EI0.3 ≥ 0.5 and EI1.0 ≥ 0.5;
- the protein appears unusual (class U, supplemental) when EI0.3 ≤ EI1.0;
- the protein appears elongated (class E, supplemental) when T ≥ 2.
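The thresholds listed above can be sketched as a small Python function. This is only an illustration of the rules as written in the list: the function name `globularity_classes` and its argument names are hypothetical and do not come from the original software. Because classes G and H overlap, and U and E are supplemental, the function returns a set of labels rather than a single class.

```python
def globularity_classes(ei_03: float, ei_10: float, t: float) -> set:
    """Return the set of globularity class labels for one structure.

    ei_03, ei_10 -- the ellipsoid indexes EI0.3 and EI1.0
    t            -- the parameter T used for the supplemental class E
    (hypothetical argument names; thresholds are those from the list above)
    """
    classes = set()
    if ei_03 < 0.3 or ei_10 < 0.3:
        classes.add("N")  # non-globular
    if 0.3 <= ei_03 < 0.5 and ei_10 >= 0.3:
        classes.add("S")  # semi-globular
    if ei_03 >= 0.5 and ei_10 >= 0.3:
        classes.add("G")  # globular
    if ei_03 >= 0.5 and ei_10 >= 0.5:
        classes.add("H")  # highly globular (always together with G)
    if ei_03 <= ei_10:
        classes.add("U")  # unusual, supplemental
    if t >= 2:
        classes.add("E")  # elongated, supplemental
    return classes


if __name__ == "__main__":
    # Values taken from the table of example proteins (with outlier detection).
    print(sorted(globularity_classes(0.589, 0.561, 0.57)))  # 1IS9
    print(sorted(globularity_classes(0.207, 0.344, 0.63)))  # 3BPD
```

With the tabulated values of the example proteins, the function reproduces the reported labels, for instance "G and H" for 1IS9 and "N and U" for 3BPD.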
2.3.10. Areas under the Profile
2.4. Tools and Websites
3. Results and Discussion
3.1. The Improved Ellipsoid Profile Algorithm
3.1.1. The Principal Component Analysis
3.1.2. The Confidence Ellipsoid
3.1.3. The FOD–PCA Algorithm
3.1.4. The Problem with the Kernel Density
- (1) decrease the susceptibility of the MVEE to the features of the protein’s molecular surface;
- (2) isolate significant outliers from the guides in structures where outliers are present;
- (3) do not significantly impact the EP algorithm’s metrics if outliers are not present.
3.1.5. The PCA-Based Outlier Detection Subroutine
3.2. Improved Ellipsoid Profile of the Example Proteins
3.3. Improved Ellipsoid Profile of the Domain Superfamilies
3.4. Improved Ellipsoid Profile of the Biological Assemblies
3.4.1. Creation of the Database
- R and E are the crystallographic resolution and error coefficient, respectively;
- m_i is the number of residues in REMARK 465 for the i-th chain in the assembly;
- u_i is the number of residues in REMARK 475 for the i-th chain in the assembly;
- l_i is the length of the sequence (SEQRES records) of the i-th chain in the assembly;
- c is the total number of chains in the reconstructed assembly (i.e., after the application of all required symmetry operators from REMARK 350, between 1 and 60).
3.4.2. Analysis of the Database
- asymmetric—monomers with one (A1o) or many domains (A1m), dimers (A2), trimers (A3) and complexes of four or more chains (A4+);
- cyclic—homomers with C2 symmetry and one (C2o=) or many domains (C2m=), heteromers with C2 symmetry and one (C2o≠) or many domains (C2m≠), complexes with C3 symmetry (C3) and complexes with C4 or higher order symmetries (C4+);
- dihedral—complexes with D2 (D2), D3 (D3) and D4 or higher order symmetries (D4+);
- complexes with either tetrahedral, octahedral or icosahedral symmetries (TOI).
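The grouping above can be expressed as a short lookup routine. The sketch below is an assumption-laden illustration, not the procedure used to build the database: the function name `assembly_group`, its arguments, and the convention of encoding an asymmetric assembly as point group `C1` are all hypothetical, and "many domains" is passed in as a simple boolean flag.

```python
def assembly_group(symmetry: str, chains: int,
                   homomeric: bool = True, multi_domain: bool = False) -> str:
    """Map an assembly to one of the group labels listed above.

    symmetry     -- point group symbol, e.g. 'C1', 'C2', 'D3', 'T', 'O', 'I'
                    ('C1' standing for "asymmetric" is an assumption)
    chains       -- number of chains in the reconstructed assembly
    homomeric    -- True for homomers, False for heteromers (used for C2 only)
    multi_domain -- True if the chains have many domains (used for A1 and C2)
    """
    sym = symmetry.upper()
    if sym in ("T", "O", "I"):
        return "TOI"  # tetrahedral, octahedral or icosahedral
    kind, order = sym[0], int(sym[1:])
    if kind == "C" and order == 1:  # asymmetric assemblies
        if chains == 1:
            return "A1m" if multi_domain else "A1o"
        return f"A{chains}" if chains <= 3 else "A4+"
    if kind == "C":  # cyclic assemblies
        if order == 2:
            domains = "m" if multi_domain else "o"
            return "C2" + domains + ("=" if homomeric else "≠")
        return "C3" if order == 3 else "C4+"
    if kind == "D":  # dihedral assemblies
        return f"D{order}" if order <= 3 else "D4+"
    raise ValueError(f"unsupported symmetry symbol: {symmetry!r}")


if __name__ == "__main__":
    print(assembly_group("C2", 2, homomeric=False, multi_domain=True))
    print(assembly_group("I", 60))
```

Keeping the label construction in one place makes it easy to check that every assembly falls into exactly one of the fifteen groups.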
4. Conclusions
Supplementary Materials
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Beckerman, M. Fundamentals of Neurodegeneration and Protein Misfolding Disorders. Biological and Medical Physics, Biomedical Engineering; Springer: Cham, Switzerland, 2015.
- Senior, A.W.; Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.; Green, T.; Qin, C.; Žídek, A.; Nelson, A.W.R.; Bridgland, A.; et al. Improved protein structure prediction using potentials from deep learning. Nature 2020, 577, 706–710.
- Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
- Pereira, J.; Simpkin, A.J.; Hartmann, M.D.; Rigden, D.J.; Keegan, R.M.; Lupas, A.N. High-accuracy protein structure prediction in CASP14. Proteins Struct. Funct. Bioinform. 2021, 89, 1687–1699.
- Adcock, S.A.; McCammon, J.A. Molecular Dynamics: Survey of Methods for Simulating the Activity of Proteins. Chem. Rev. 2006, 106, 1589–1615.
- Singh, N.; Li, W. Recent Advances in Coarse-Grained Models for Biomolecules and Their Applications. Int. J. Mol. Sci. 2019, 20, 3774.
- Liwo, A.; Czaplewski, C.; Sieradzan, A.K.; Lipska, A.G.; Samsonov, S.A.; Murarka, R.K. Theory and Practice of Coarse-Grained Molecular Dynamics of Biologically Important Systems. Biomolecules 2021, 11, 1347.
- Onuchic, J.N.; Luthey-Schulten, Z.; Wolynes, P.G. Theory of Protein Folding: The Energy Landscape Perspective. Annu. Rev. Phys. Chem. 1997, 48, 545–600.
- Kuhlman, B.; Bradley, P. Advances in protein structure prediction and design. Nat. Rev. Mol. Cell Biol. 2019, 20, 681–697.
- Kauzmann, W. Some Factors in the Interpretation of Protein Denaturation. Adv. Protein Chem. 1959, 14, 1–63.
- Dill, K.A.; Truskett, T.M.; Vlachy, V.; Hribar-Lee, B. Modeling Water, the Hydrophobic Effect, and Ion Solvation. Annu. Rev. Biophys. Biomol. Struct. 2005, 34, 173–199.
- Bellissent-Funel, M.-C.; Hassanali, A.; Havenith, M.; Henchman, R.; Pohl, P.; Sterpone, F.; van der Spoel, D.; Xu, Y.; Garcia, A.E. Water Determines the Structure and Dynamics of Proteins. Chem. Rev. 2016, 116, 7673–7697.
- Konieczny, L.; Roterman, I. Summary. In From Globular Proteins to Amyloids; Elsevier: Amsterdam, The Netherlands, 2020; pp. 241–252.
- Onufriev, A.V.; Izadi, S. Water models for biomolecular simulations. WIREs Comput. Mol. Sci. 2018, 8, e1347.
- Knight, J.L.; Brooks, C.L. Surveying implicit solvent models for estimating small molecule absolute hydration free energies. J. Comput. Chem. 2011, 32, 2909–2923.
- Konieczny, L.; Roterman, I. Introduction. In From Globular Proteins to Amyloids; Elsevier: Amsterdam, The Netherlands, 2020; pp. xiii–xx.
- Konieczny, L.; Brylinski, M.; Roterman, I. Gauss-function-Based model of hydrophobicity density in proteins. Silico Biol. 2006, 6, 15–22.
- Banach, M.; Konieczny, L.; Roterman, I. The fuzzy oil drop model, based on hydrophobicity density distribution, generalizes the influence of water environment on protein structure and function. J. Theor. Biol. 2014, 359, 6–17.
- Konieczny, L.; Roterman, I. Description of the fuzzy oil drop model. In From Globular Proteins to Amyloids; Elsevier: Amsterdam, The Netherlands, 2020; pp. 1–11.
- Levitt, M. A simplified representation of protein conformations for rapid simulation of protein folding. J. Mol. Biol. 1976, 104, 59–107.
- Banach, M.; Chomilier, J.; Roterman, I. Contribution to the Understanding of Protein–Protein Interface and Ligand Binding Site Based on Hydrophobicity Distribution—Application to Ferredoxin I and II Cases. Appl. Sci. 2021, 11, 8514.
- Banach, M. Assessment of Globularity of Protein Structures via Minimum Volume Ellipsoids and Voxel-Based Atom Representation. Crystals 2021, 11, 1539.
- Khachiyan, L.G. Rounding of Polytopes in the Real Number Model of Computation. Math. Oper. Res. 1996, 21, 307–320.
- Rosenblatt, M. Remarks on Some Nonparametric Estimates of a Density Function. Ann. Math. Stat. 1956, 27, 832–837.
- Parzen, E. On Estimation of a Probability Density Function and Mode. Ann. Math. Stat. 1962, 33, 1065–1076.
- Banach, M. Symmetrization in the Calculation Pipeline of Gauss Function-Based Modeling of Hydrophobicity in Protein Structures. Symmetry 2022, 14, 1876.
- Jolliffe, I.T. Principal Component Analysis. In Springer Series in Statistics; Springer: New York, NY, USA, 2002.
- Jolicoeur, P. The multivariate normal distribution. In Introduction to Biometry; Springer: Boston, MA, USA, 1999; pp. 253–265.
- Fox, N.K.; Brenner, S.E.; Chandonia, J.-M. SCOPe: Structural Classification of Proteins—Extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res. 2014, 42, D304–D309.
- SCOPe. Available online: https://scop.berkeley.edu/astral/subsets (accessed on 8 December 2022).
- Chandonia, J.-M.; Guan, L.; Lin, S.; Yu, C.; Fox, N.K.; Brenner, S.E. SCOPe: Improvements to the structural classification of proteins—Extended database to facilitate variant interpretation and machine learning. Nucleic Acids Res. 2021, 50, D553–D559.
- SCOPe. Available online: https://scop.berkeley.edu (accessed on 8 December 2022).
- Burley, S.K.; Bhikadiya, C.; Bi, C.; Bittrich, S.; Chen, L.; Crichlow, G.V.; Christie, C.H.; Dalenberg, K.; Di Costanzo, L.; Duarte, J.M.; et al. RCSB Protein Data Bank: Powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Res. 2020, 49, D437–D451.
- RCSB PDB. Available online: https://www.rcsb.org (accessed on 8 December 2022).
- Brenner, S.E.; Chothia, C.; Hubbard, T.J.; Murzin, A.G. Understanding protein structure: Using scop for fold interpretation. Methods Enzymol. 1996, 266, 635–643.
- Majumdar, I.; Kinch, L.; Grishin, N.V. A Database of Domain Definitions for Proteins with Complex Interdomain Geometry. PLoS ONE 2009, 4, e5084.
- Xu, Q.; Dunbrack, R.L. Principles and characteristics of biological assemblies in experimentally determined protein structures. Curr. Opin. Struct. Biol. 2019, 55, 34–49.
- Elez, K.; Bonvin, A.M.J.J.; Vangone, A. Biological vs. Crystallographic Protein Interfaces: An Overview of Computational Approaches for Their Classification. Crystals 2020, 10, 114.
- Levy, E.D.; Leal, J.P.; Chothia, C.; Teichmann, S. 3D Complex: A Structural Classification of Protein Complexes. PLOS Comput. Biol. 2006, 2, e155.
- 3Dcomplex. Available online: https://shmoo.weizmann.ac.il/elevy/3dcomplexV6/Home.cgi (accessed on 8 December 2022).
- Schmidt, A.; Gonzalez, A.; Morris, R.J.; Costabel, M.; Alzari, P.M.; Lamzin, V.S. Advantages of high-resolution phasing: MAD to atomic resolution. Acta Crystallogr. Sect. D Biol. Crystallogr. 2002, 58, 1433–1441.
- Eswaramoorthy, S.; Burley, S.K.; Sauder, J.M.; Swaminathan, S. Crystal Structure of an Uncharacterized Protein (O28723_ARCFU) from Archaeoglobus fulgidus. 2008. Available online: https://www.wwpdb.org/pdb?id=pdb_00003bpd (accessed on 8 December 2022).
- García-Nafría, J.; Timm, J.; Harrison, C.; Turkenburg, J.P.; Wilson, K.S. Tying down the arm in Bacillus dUTPase: Structure and mechanism. Acta Crystallogr. Sect. D Biol. Crystallogr. 2013, 69, 1367–1380.
- Jasanoff, A.; Wagner, G.; Wiley, D.C. Structure of a trimeric domain of the MHC class II-associated chaperonin and targeting protein Ii. EMBO J. 1998, 17, 6812–6818.
- Rao, J.N.; Jao, C.C.; Hegde, B.G.; Langen, R.; Ulmer, T.S. A Combinatorial NMR and EPR Approach for Evaluating the Structural Ensemble of Partially Folded Proteins. J. Am. Chem. Soc. 2010, 132, 8657–8668.
- Hoffman, D.; Davies, C.; Gerchman, S.; Kycia, J.; Porter, S.; White, S.; Ramakrishnan, V. Crystal structure of prokaryotic ribosomal protein L9: A bi-lobed RNA-binding protein. EMBO J. 1994, 13, 205–212.
- Bentley, J.L. Multidimensional binary search trees used for associative searching. Commun. ACM 1975, 18, 509–517.
- Hubbard, S.; Thornton, J. NACCESS, Computer Program; Department of Biochemistry Molecular Biology, University College: London, UK, 1993.
- Ribeiro, J.; Ríos-Vera, C.; Melo, F.; Schüller, A. Calculation of accurate interatomic contact surface areas for the quantitative analysis of non-bonded molecular interactions. Bioinformatics 2019, 35, 3499–3501.
- GitHub, Inc. Available online: https://github.com/nioroso-x3/dr_sasa_n (accessed on 8 December 2022).
- The PyMOL Molecular Graphics System; Version 2.0; Schrödinger, LLC: New York, NY, USA, 2017.
- PyMOL by Schrodinger. Available online: https://pymol.org (accessed on 8 December 2022).
- Sullivan, C.; Kaszynski, A. PyVista: 3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK). J. Open Source Softw. 2019, 4, 1450.
- Schroeder, W.J.; Martin, K.M. The Visualization Toolkit. In Visualization Handbook; Elsevier: Amsterdam, The Netherlands, 2005; pp. 593–614.
- Hunter, J.D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 2007, 9, 90–95.
- Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0 Contributors. SciPy 1.0 Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17, 261–272.
- Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array programming with NumPy. Nature 2020, 585, 357–362.
- Pearson, K. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1901, 2, 559–572.
- Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 1933, 24, 498–520.
- Jolliffe, I.T.; Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202.
- Zhao, R.; Collins, E.J.; Bourret, R.B.; Silversmith, R.E. Structure and catalytic mechanism of the E. coli chemotaxis phosphatase CheZ. Nat. Struct. Biol. 2002, 9, 570–575.
- Nawrotek, A.; Knossow, M.; Gigant, B. The Determinants That Govern Microtubule Assembly from the Atomic Structure of GTP-Tubulin. J. Mol. Biol. 2011, 412, 35–42.
- Antonyuk, S.V.; Ellis, M.J.; Strange, R.W.; Bessho, Y.; Kuramitsu, S.; Shinkai, A.; Yokoyama, S.; Hasnain, S.S. Structure of SurE protein from Aquifex aeolicus VF5 at 1.5 Å resolution. Acta Crystallogr. Sect. F Struct. Biol. Cryst. Commun. 2009, 65, 1204–1208.
- Fu, Z.-Q.; Du Bois, G.C.; Song, S.P.; Kulikovskaya, I.; Virgilio, L.; Rothstein, J.L.; Croce, C.M.; Weber, I.T.; Harrison, R.W. Crystal structure of MTCP-1: Implications for role of TCL-1 and MTCP-1 in T cell malignancies. Proc. Natl. Acad. Sci. USA 1998, 95, 3413–3418.
- PDB. Available online: https://www.rcsb.org/stats/summary (accessed on 8 December 2022).
- Krissinel, E.; Henrick, K. Inference of Macromolecular Assemblies from Crystalline State. J. Mol. Biol. 2007, 372, 774–797.
- White, M.D.; Payne, K.A.P.; Fisher, K.; Marshall, S.A.; Parker, D.; Rattray, N.J.W.; Trivedi, D.K.; Goodacre, R.; Rigby, S.E.J.; Scrutton, N.S.; et al. UbiX is a flavin prenyltransferase required for bacterial ubiquinone biosynthesis. Nature 2015, 522, 502–506.
- Chen, J.C.-H.; Krucinski, J.; Miercke, L.J.W.; Finer-Moore, J.S.; Tang, A.H.; Leavitt, A.D.; Stroud, R.M. Crystal structure of the HIV-1 integrase catalytic core and C-terminal domains: A model for viral DNA binding. Proc. Natl. Acad. Sci. USA 2000, 97, 8233–8238.
- MathWorks. Available online: https://www.mathworks.com/matlabcentral/fileexchange/9542-minimum-volume-enclosing-ellipsoid (accessed on 8 December 2022).
- Stack Overflow. Available online: https://stackoverflow.com/questions/14016898/port-matlab-bounding-ellipsoid-code-to-python (accessed on 8 December 2022).
PDB Code | Molecule | Source Organism | Chain Length | SCOP Domain(s) | Quaternary Structure | Resolution | Ref. |
---|---|---|---|---|---|---|---|
1IS9 | Endoglucanase A | Clostridium thermocellum | 363 aa! | a.102.1.2 | monomer | 1.03 Å | [41] |
3BPD | Uncharacterized Protein | Archaeoglobus fulgidus | 100 aa! | d.58.61.1 | homo-7-mer | 2.80 Å | [42] |
4B0H | dUTPase YncF | Bacillus subtilis | 144 aa! | b.85.4.0 | homo-3-mer | 1.18 Å | [43] |
1IIE | HLA-DR Invariant Chain | Homo sapiens | 75 aa | a.109.1.1 | homo-3-mer | NMR (1/20) | [44] |
2KKW | Alpha Synuclein | Homo sapiens | 140 aa | h.7.1.1 | monomer | NMR (1/34) | [45] |
1DIV | Ribosomal Protein L9 | Bacillus stearothermophilus | 149 aa | d.99.1.1 d.100.1.1 | homo-2-mer | 2.60 Å | [46] |
PDB Code | Selected Chains | Effective Atoms (All) | Effective Atoms (Guide) | rx | ry | rz | V | T | EI0.3 | EI1.0 | \|EP\|0.1 | \|EP\|1.0 | Globularity Classes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1IS9 | A | 358 | 354 | 27 | 24 | 23 | 14.9 | 0.57 | 0.589 | 0.561 | 0.541 | 0.590 | G and H |
3BPD | A–G | 638 | 612 | 39 | 39 | 23 | 35.0 | 0.63 | 0.207 | 0.344 | 0.000 | 0.275 | N and U |
4B0H | B | 143 | 111 | 24 | 19 | 13 | 5.9 | 0.75 | 0.597 | 0.498 | 0.526 | 0.560 | G |
1IIE | A–C | 225 | 198 | 22 | 22 | 18 | 8.7 | 0.55 | 0.713 | 0.592 | 0.781 | 0.662 | G and H |
2KKW | A | 140 | 119 | 62 | 23 | 19 | 27.1 | 1.48 | 0.086 | 0.084 | 0.107 | 0.087 | N |
1DIV | A+B | 298 | 244 | 45 | 35 | 22 | 34.6 | 0.79 | 0.300 | 0.198 | 0.493 | 0.261 | N |
PDB Code | Selected Chains | Effective Atoms (All) | Effective Atoms (Guide) | rx | ry | rz | V | T | EI0.3 | EI1.0 | \|EP\|0.1 | \|EP\|1.0 | Globularity Classes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1IS9 | A | 358 | 358 | 29 | 25 | 22 | 15.9 | 0.62 | 0.593 | 0.541 | 0.587 | 0.581 | G and H |
3BPD | A–G | 638 | 638 | 40 | 39 | 29 | 45.2 | 0.59 | 0.176 | 0.297 | 0.002 | 0.237 | N and U |
4B0H | B | 143 | 143 | 34 | 26 | 21 | 18.6 | 0.72 | 0.214 | 0.197 | 0.177 | 0.210 | N! |
1IIE | A–C | 225 | 225 | 44 | 44 | 22 | 42.6 | 0.67 | 0.387 | 0.189 | 0.413 | 0.295 | N! |
2KKW | A | 140 | 140 | 60 | 42 | 22 | 55.4 | 0.94 | 0.031 | 0.047 | 0.004 | 0.039 | N and U! |
1DIV | A+B | 298 | 298 | 50 | 50 | 23 | 57.5 | 0.69 | 0.227 | 0.149 | 0.462 | 0.200 | N |
Domains (cl) | Domains (cf) | Domains (sf) | Effective μ | Effective σ | Guides μ | Guides σ | V μ | V σ | EI0.3 μ | EI0.3 σ | EI1.0 μ | EI1.0 σ | \|EP\|0.1 μ | \|EP\|0.1 σ | \|EP\|1.0 μ | \|EP\|1.0 σ | Class N | Class S | Class G | Class H | Class U | Class E |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a | 290 | 519 | 134 | 90 | 93% | 5% | 6.3 | 5.3 | 0.55 | 0.11 | 0.49 | 0.09 | 0.52 | 0.17 | 0.53 | 0.10 | 23 | 73 | 423 | 309 | 43 | 21 |
b | 179 | 374 | 162 | 101 | 93% | 5% | 7.3 | 5.7 | 0.58 | 0.07 | 0.51 | 0.06 | 0.54 | 0.14 | 0.56 | 0.06 | 9 | 20 | 345 | 265 | 15 | 2 |
c | 147 | 246 | 233 | 105 | 95% | 4% | 10.6 | 5.9 | 0.58 | 0.05 | 0.51 | 0.05 | 0.59 | 0.10 | 0.55 | 0.05 | 2 | 10 | 234 | 168 | 5 | 0 |
d | 395 | 577 | 141 | 78 | 93% | 5% | 6.4 | 4.5 | 0.57 | 0.07 | 0.50 | 0.07 | 0.56 | 0.13 | 0.55 | 0.07 | 18 | 34 | 525 | 376 | 10 | 5 |
e | 73 | 73 | 360 | 204 | 92% | 5% | 20.3 | 16.7 | 0.51 | 0.10 | 0.44 | 0.08 | 0.51 | 0.15 | 0.48 | 0.09 | 5 | 16 | 52 | 17 | 3 | 0 |
f | 69 | 130 | 176 | 144 | 88% | 6% | 9.2 | 8.9 | 0.54 | 0.18 | 0.46 | 0.14 | 0.53 | 0.24 | 0.51 | 0.16 | 16 | 33 | 81 | 42 | 14 | 40 |
g | 98 | 139 | 61 | 31 | 93% | 5% | 2.4 | 2.0 | 0.60 | 0.12 | 0.53 | 0.11 | 0.62 | 0.20 | 0.57 | 0.11 | 4 | 15 | 120 | 98 | 10 | 3 |
h | 6 | 66 | 105 | 83 | 89% | 7% | 7.2 | 14.1 | 0.56 | 0.23 | 0.48 | 0.17 | 0.53 | 0.28 | 0.53 | 0.20 | 10 | 12 | 44 | 31 | 8 | 56 |
all | 1257 | 2124 | 157 | 113 | 93% | 5% | 7.4 | 7.2 | 0.57 | 0.10 | 0.50 | 0.09 | 0.55 | 0.16 | 0.54 | 0.09 | 87 | 213 | 1824 | 1306 | 108 | 127 |
Domains (cl) | Domains (cf) | Domains (sf) | Effective μ | Effective σ | Guides | V μ | V σ | EI0.3 μ | EI0.3 σ | EI1.0 μ | EI1.0 σ | \|EP\|0.1 μ | \|EP\|0.1 σ | \|EP\|1.0 μ | \|EP\|1.0 σ | Class N | Class S | Class G | Class H | Class U | Class E |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a | 290 | 519 | 134 | 90 | 100% | 9.5 | 8.7 | 0.51 | 0.13 | 0.41 | 0.11 | 0.51 | 0.19 | 0.47 | 0.12 | 78 | 106 | 335 | 117 | 37 | 13 |
b | 179 | 374 | 162 | 101 | 100% | 11.1 | 10.8 | 0.54 | 0.10 | 0.43 | 0.09 | 0.54 | 0.15 | 0.50 | 0.10 | 43 | 31 | 300 | 76 | 8 | 1 |
c | 147 | 246 | 233 | 105 | 100% | 14.4 | 8.9 | 0.55 | 0.08 | 0.44 | 0.07 | 0.57 | 0.12 | 0.51 | 0.07 | 10 | 39 | 197 | 56 | 2 | 0 |
d | 395 | 577 | 141 | 78 | 100% | 9.5 | 9.8 | 0.53 | 0.11 | 0.43 | 0.09 | 0.55 | 0.15 | 0.49 | 0.10 | 71 | 64 | 442 | 130 | 9 | 2 |
e | 73 | 73 | 360 | 204 | 100% | 31.4 | 29.4 | 0.46 | 0.11 | 0.36 | 0.09 | 0.48 | 0.15 | 0.42 | 0.10 | 21 | 21 | 31 | 0 | 3 | 0 |
f | 69 | 130 | 176 | 144 | 100% | 15.6 | 15.2 | 0.43 | 0.18 | 0.35 | 0.13 | 0.43 | 0.25 | 0.40 | 0.15 | 46 | 35 | 49 | 13 | 21 | 30 |
g | 98 | 139 | 61 | 31 | 100% | 3.7 | 3.4 | 0.55 | 0.14 | 0.46 | 0.12 | 0.59 | 0.20 | 0.51 | 0.13 | 18 | 19 | 102 | 57 | 5 | 0 |
h | 6 | 66 | 105 | 83 | 100% | 12.3 | 18.8 | 0.44 | 0.23 | 0.37 | 0.18 | 0.42 | 0.30 | 0.41 | 0.21 | 28 | 11 | 27 | 17 | 19 | 46 |
all | 1257 | 2124 | 157 | 113 | 100% | 11.2 | 12.3 | 0.52 | 0.13 | 0.42 | 0.11 | 0.53 | 0.18 | 0.48 | 0.12 | 315 | 326 | 1483 | 466 | 104 | 92 |
Assemblies (Group) | Assemblies (Count) | Effective μ | Effective σ | Guides μ | Guides σ | V μ | V σ | EI0.3 μ | EI0.3 σ | EI1.0 μ | EI1.0 σ | \|EP\|0.1 μ | \|EP\|0.1 σ | \|EP\|1.0 μ | \|EP\|1.0 σ | Class N | Class S | Class G | Class H | Class U | Class E |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A1o | 657 | 187 | 119 | 95% | 4% | 8.6 | 7.1 | 0.58 | 0.06 | 0.51 | 0.06 | 0.56 | 0.13 | 0.56 | 0.06 | 8 | 43 | 606 | 464 | 17 | 7 |
A1m | 527 | 416 | 223 | 93% | 4% | 23.4 | 15.7 | 0.52 | 0.08 | 0.45 | 0.07 | 0.51 | 0.15 | 0.49 | 0.07 | 17 | 132 | 378 | 101 | 17 | 13 |
A2 | 352 | 445 | 227 | 93% | 4% | 25.7 | 17.2 | 0.52 | 0.07 | 0.44 | 0.06 | 0.53 | 0.13 | 0.49 | 0.07 | 10 | 92 | 250 | 51 | 4 | 4 |
A3 | 169 | 607 | 285 | 92% | 3% | 36.9 | 21.5 | 0.49 | 0.06 | 0.41 | 0.05 | 0.48 | 0.13 | 0.46 | 0.05 | 5 | 95 | 69 | 3 | 2 | 2 |
A4+ | 103 | 1589 | 1597 | 92% | 4% | 133.8 | 244.1 | 0.42 | 0.10 | 0.35 | 0.07 | 0.37 | 0.19 | 0.39 | 0.09 | 27 | 54 | 22 | 0 | 10 | 0 |
C2o= | 537 | 392 | 324 | 94% | 4% | 21.5 | 23.6 | 0.53 | 0.08 | 0.46 | 0.06 | 0.46 | 0.15 | 0.51 | 0.07 | 14 | 115 | 408 | 164 | 19 | 2 |
C2m= | 328 | 784 | 401 | 93% | 4% | 48.2 | 28.2 | 0.48 | 0.07 | 0.41 | 0.06 | 0.43 | 0.15 | 0.45 | 0.07 | 19 | 148 | 161 | 10 | 12 | 4 |
C2o≠ | 83 | 807 | 408 | 92% | 4% | 48.0 | 25.0 | 0.49 | 0.07 | 0.41 | 0.05 | 0.42 | 0.16 | 0.46 | 0.06 | 2 | 40 | 41 | 2 | 4 | 0 |
C2m≠ | 81 | 1449 | 1001 | 91% | 5% | 102.2 | 79.5 | 0.43 | 0.08 | 0.36 | 0.07 | 0.36 | 0.18 | 0.41 | 0.08 | 15 | 47 | 19 | 2 | 5 | 2 |
C3 | 184 | 799 | 586 | 94% | 5% | 50.4 | 42.8 | 0.49 | 0.09 | 0.41 | 0.07 | 0.34 | 0.17 | 0.46 | 0.08 | 17 | 68 | 99 | 18 | 17 | 3 |
C4+ | 109 | 1256 | 918 | 96% | 4% | 87.7 | 78.2 | 0.40 | 0.12 | 0.39 | 0.07 | 0.16 | 0.16 | 0.40 | 0.09 | 27 | 55 | 27 | 7 | 48 | 0 |
D2 | 224 | 1172 | 736 | 95% | 4% | 75.5 | 58.3 | 0.47 | 0.08 | 0.41 | 0.06 | 0.28 | 0.16 | 0.45 | 0.07 | 12 | 115 | 97 | 12 | 29 | 1 |
D3 | 134 | 1772 | 1001 | 97% | 3% | 121.9 | 80.9 | 0.43 | 0.10 | 0.39 | 0.06 | 0.16 | 0.13 | 0.41 | 0.08 | 21 | 77 | 36 | 9 | 34 | 1 |
D4+ | 79 | 2323 | 1629 | 98% | 2% | 165.6 | 140.3 | 0.35 | 0.12 | 0.37 | 0.06 | 0.05 | 0.07 | 0.36 | 0.08 | 26 | 49 | 4 | 0 | 43 | 0 |
TOI | 27 | 5080 | 3546 | 99% | 1% | 378.9 | 331.5 | 0.23 | 0.16 | 0.33 | 0.08 | 0.02 | 0.05 | 0.28 | 0.11 | 18 | 8 | 1 | 0 | 21 | 0 |
all | 3594 | 702 | 895 | 94% | 4% | 45.1 | 79.6 | 0.51 | 0.10 | 0.44 | 0.08 | 0.44 | 0.19 | 0.48 | 0.09 | 238 | 1138 | 2218 | 843 | 282 | 39 |
Assemblies (Group) | Assemblies (Count) | Effective μ | Effective σ | Guides | V μ | V σ | EI0.3 μ | EI0.3 σ | EI1.0 μ | EI1.0 σ | \|EP\|0.1 μ | \|EP\|0.1 σ | \|EP\|1.0 μ | \|EP\|1.0 σ | Class N | Class S | Class G | Class H | Class U | Class E |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A1o | 657 | 187 | 119 | 100% | 11.8 | 10.3 | 0.55 | 0.08 | 0.45 | 0.08 | 0.56 | 0.13 | 0.51 | 0.08 | 29 | 85 | 543 | 184 | 10 | 3 |
A1m | 527 | 416 | 223 | 100% | 34.4 | 25.7 | 0.48 | 0.10 | 0.37 | 0.08 | 0.48 | 0.16 | 0.44 | 0.09 | 84 | 169 | 274 | 15 | 10 | 10 |
A2 | 352 | 445 | 227 | 100% | 39.0 | 28.9 | 0.48 | 0.09 | 0.36 | 0.07 | 0.51 | 0.14 | 0.43 | 0.08 | 77 | 96 | 179 | 4 | 5 | 1 |
A3 | 169 | 607 | 285 | 100% | 56.4 | 38.2 | 0.44 | 0.07 | 0.33 | 0.06 | 0.47 | 0.14 | 0.39 | 0.06 | 41 | 93 | 35 | 0 | 1 | 0 |
A4+ | 103 | 1589 | 1597 | 100% | 199.0 | 271.3 | 0.37 | 0.11 | 0.28 | 0.07 | 0.35 | 0.18 | 0.33 | 0.09 | 52 | 44 | 7 | 0 | 6 | 0 |
C2o= | 537 | 392 | 324 | 100% | 30.5 | 34.4 | 0.50 | 0.10 | 0.40 | 0.08 | 0.45 | 0.15 | 0.46 | 0.09 | 75 | 127 | 335 | 43 | 17 | 3 |
C2m= | 328 | 784 | 401 | 100% | 70.8 | 42.9 | 0.44 | 0.09 | 0.34 | 0.07 | 0.41 | 0.15 | 0.40 | 0.07 | 93 | 148 | 87 | 2 | 8 | 4 |
C2o≠ | 83 | 807 | 408 | 100% | 71.9 | 39.5 | 0.44 | 0.08 | 0.34 | 0.06 | 0.43 | 0.16 | 0.40 | 0.07 | 20 | 41 | 22 | 0 | 3 | 0 |
C2m≠ | 81 | 1449 | 1001 | 100% | 170.8 | 146.4 | 0.39 | 0.10 | 0.28 | 0.08 | 0.35 | 0.18 | 0.34 | 0.09 | 49 | 20 | 12 | 1 | 1 | 1 |
C3 | 184 | 799 | 586 | 100% | 71.1 | 64.8 | 0.46 | 0.10 | 0.36 | 0.08 | 0.34 | 0.17 | 0.41 | 0.09 | 49 | 60 | 75 | 6 | 14 | 3 |
C4+ | 109 | 1256 | 918 | 100% | 107.7 | 93.5 | 0.38 | 0.12 | 0.35 | 0.08 | 0.16 | 0.16 | 0.36 | 0.10 | 42 | 47 | 20 | 4 | 39 | 0 |
D2 | 224 | 1172 | 736 | 100% | 100.1 | 78.7 | 0.46 | 0.08 | 0.36 | 0.07 | 0.29 | 0.16 | 0.42 | 0.07 | 51 | 95 | 78 | 2 | 12 | 0 |
D3 | 134 | 1772 | 1001 | 100% | 148.4 | 103.7 | 0.43 | 0.10 | 0.36 | 0.07 | 0.18 | 0.14 | 0.40 | 0.08 | 32 | 70 | 32 | 4 | 24 | 0 |
D4+ | 79 | 2323 | 1629 | 100% | 186.2 | 156.3 | 0.35 | 0.12 | 0.36 | 0.07 | 0.06 | 0.08 | 0.35 | 0.08 | 31 | 44 | 4 | 0 | 41 | 0 |
TOI | 27 | 5080 | 3546 | 100% | 390.6 | 329.4 | 0.23 | 0.16 | 0.33 | 0.08 | 0.03 | 0.05 | 0.28 | 0.11 | 16 | 10 | 1 | 0 | 19 | 0 |
all | 3594 | 702 | 895 | 100% | 61.6 | 95.0 | 0.47 | 0.11 | 0.37 | 0.09 | 0.43 | 0.19 | 0.43 | 0.10 | 741 | 1149 | 1704 | 265 | 210 | 25 |
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Banach, M. Improved Assessment of Globularity of Protein Structures and the Ellipsoid Profile of the Biological Assemblies from the PDB. Biomolecules 2023, 13, 385. https://doi.org/10.3390/biom13020385