Recent Progress of Protein Tertiary Structure Prediction
Abstract
1. Introduction
2. An Overview of Protein Structure Prediction
2.1. Template-Based Modeling (TBM) Methods
2.2. Fragment Assembly Simulation Methods for Free Modeling (FM)
2.3. Contact-Based Protein Structure Prediction
2.4. Distance-Based Protein Structure Prediction
2.5. End-to-End Protein Structure Prediction
2.6. Protein Language Model-Based Protein Structure Prediction
2.7. Multi-Domain Protein Structure Prediction
2.8. CASP and Most Recent CASP Results
2.9. AlphaFold Protein Structure Database (AlphaFold DB)
3. Discussion and Perspective
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
- David, A.; Islam, S.; Tankhilevich, E.; Sternberg, M.J.E. The AlphaFold Database of Protein Structures: A Biologist’s Guide. J. Mol. Biol. 2022, 434, 167336. [Google Scholar] [CrossRef] [PubMed]
- Hou, M.; Jin, S.; Cui, X.; Peng, C.; Zhao, K.; Song, L.; Zhang, G. Protein Multiple Conformation Prediction Using Multi-Objective Evolution Algorithm. Interdiscip. Sci. Comput. Life Sci. 2024. [Google Scholar] [CrossRef]
- Wayment-Steele, H.K.; Ojoawo, A.; Otten, R.; Apitz, J.M.; Pitsawong, W.; Hömberger, M.; Ovchinnikov, S.; Colwell, L.; Kern, D. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 2023. [Google Scholar] [CrossRef] [PubMed]
- del Alamo, D.; Sala, D.; McHaourab, H.S.; Meiler, J. Sampling alternative conformational states of transporters and receptors with AlphaFold2. eLife 2022, 11, e75751. [Google Scholar] [CrossRef] [PubMed]
- Park, S.H.; Ayoub, A.; Lee, Y.-T.; Xu, J.; Kim, H.; Zheng, W.; Zhang, B.; Sha, L.; An, S.; Zhang, Y.; et al. Cryo-EM structure of the human MLL1 core complex bound to the nucleosome. Nat. Commun. 2019, 10, 5540. [Google Scholar] [CrossRef]
- Lee, Y.-T.; Ayoub, A.; Park, S.-H.; Sha, L.; Xu, J.; Mao, F.; Zheng, W.; Zhang, Y.; Cho, U.-S.; Dou, Y. Mechanism for DPY30 and ASH2L intrinsically disordered regions to modulate the MLL/SET1 activity on chromatin. Nat. Commun. 2021, 12, 2953. [Google Scholar] [CrossRef]
- Zhang, H.; Shang, R.; Kim, K.; Zheng, W.; Johnson, C.J.; Sun, L.; Niu, X.; Liu, L.; Zhou, J.; Liu, L.; et al. Evolution of a chordate-specific mechanism for myoblast fusion. Sci. Adv. 2022, 8, eadd2696. [Google Scholar] [CrossRef]
- Wu, S.; Tian, C.; Liu, P.; Guo, D.; Zheng, W.; Huang, X.; Zhang, Y.; Liu, L. Effects of SARS-CoV-2 mutations on protein structures and intraviral protein–protein interactions. J. Med. Virol. 2021, 93, 2132–2140. [Google Scholar] [CrossRef]
- Zheng, W.; Zhang, C.; Li, Y.; Pearce, R.; Bell, E.W.; Zhang, Y. Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations. Cell Rep Methods 2021, 1, 100014. [Google Scholar] [CrossRef] [PubMed]
- Richard, E.; Michael, O.N.; Alexander, P.; Natasha, A.; Andrew, S.; Tim, G.; Augustin, Ž.; Russ, B.; Sam, B.; Jason, Y.; et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv 2022. [Google Scholar] [CrossRef]
- Chen, K.; Zhou, Y.; Wang, S.; Xiong, P. RNA tertiary structure modeling with BRiQ potential in CASP15. Proteins Struct. Funct. Bioinform. 2023, 91, 1771–1778. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Zhang, C.; Feng, C.; Pearce, R.; Lydia Freddolino, P.; Zhang, Y. Integrating end-to-end learning with deep geometrical potentials for ab initio RNA structure prediction. Nat. Commun. 2023, 14, 5745. [Google Scholar] [CrossRef] [PubMed]
- Wang, W.; Feng, C.; Han, R.; Wang, Z.; Ye, L.; Du, Z.; Wei, H.; Zhang, F.; Peng, Z.; Yang, J. trRosettaRNA: Automated prediction of RNA 3D structure with transformer network. Nat. Commun. 2023, 14, 7266. [Google Scholar] [CrossRef]
- Baek, M.; McHugh, R.; Anishchenko, I.; Jiang, H.; Baker, D.; DiMaio, F. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 2024, 21, 117–121. [Google Scholar] [CrossRef]
- Terwilliger, T.C.; Liebschner, D.; Croll, T.I.; Williams, C.J.; McCoy, A.J.; Poon, B.K.; Afonine, P.V.; Oeffner, R.D.; Richardson, J.S.; Read, R.J.; et al. AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination. Nat. Methods 2024, 21, 110–116. [Google Scholar] [CrossRef]
- Xu, D.; Jaroszewski, L.; Li, Z.; Godzik, A. FFAS-3D: Improving fold recognition by including optimized structural features and template re-ranking. Bioinformatics 2014, 30, 660–667. [Google Scholar] [CrossRef] [PubMed]
- Ma, J.; Wang, S.; Wang, Z.; Xu, J. MRFalign: Protein Homology Detection through Alignment of Markov Random Fields. PLoS Comput. Biol. 2014, 10, e1003500. [Google Scholar] [CrossRef]
- Cheng, J.; Li, J.; Wang, Z.; Eickholt, J.; Deng, X. The MULTICOM toolbox for protein structure prediction. BMC Bioinform. 2012, 13, 65. [Google Scholar] [CrossRef]
- Cheng, J. A multi-template combination algorithm for protein comparative modeling. BMC Struct. Biol. 2008, 8, 18. [Google Scholar] [CrossRef]
- Kelley, L.A.; Mezulis, S.; Yates, C.M.; Wass, M.N.; Sternberg, M.J.E. The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protoc. 2015, 10, 845–858. [Google Scholar] [CrossRef] [PubMed]
- Peng, J.; Xu, J. Raptorx: Exploiting structure information for protein alignment by statistical inference. Proteins Struct. Funct. Bioinform. 2011, 79, 161–171. [Google Scholar] [CrossRef]
- Yang, Y.; Faraggi, E.; Zhao, H.; Zhou, Y. Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics 2011, 27, 2076–2082. [Google Scholar] [CrossRef] [PubMed]
- Jones, D.T. Predicting novel protein folds by using FRAGFOLD. Proteins Struct. Funct. Bioinform. 2001, 45, 127–132. [Google Scholar] [CrossRef] [PubMed]
- Rohl, C.A.; Strauss, C.E.M.; Misura, K.M.S.; Baker, D. Protein Structure Prediction Using Rosetta. In Methods in Enzymology; Academic Press: Cambridge, MA, USA, 2004; Volume 383, pp. 66–93. [Google Scholar]
- Mortuza, S.M.; Zheng, W.; Zhang, C.; Li, Y.; Pearce, R.; Zhang, Y. Improving fragment-based ab initio protein structure assembly using low-accuracy contact-map predictions. Nat. Commun. 2021, 12, 5011. [Google Scholar] [CrossRef] [PubMed]
- Pearce, R.; Li, Y.; Omenn, G.S.; Zhang, Y. Fast and accurate Ab Initio Protein structure prediction using deep learning potentials. PLoS Comput. Biol. 2022, 18, e1010539. [Google Scholar] [CrossRef] [PubMed]
- Shen, T.; Wu, J.; Lan, H.; Zheng, L.; Pei, J.; Wang, S.; Liu, W.; Huang, J. When homologous sequences meet structural decoys: Accurate contact prediction by tFold in CASP14—(tFold for CASP14 contact prediction). Proteins Struct. Funct. Bioinform. 2021, 89, 1901–1910. [Google Scholar] [CrossRef]
- Cheng, S.; Zhao, X.; Lu, G.; Fang, J.; Yu, Z.; Zheng, T.; Wu, R.; Zhang, X.; Peng, J.; You, Y. FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours. arXiv 2022, arXiv:2203.00854. [Google Scholar]
- Wang, G.; Fang, X.; Wu, Z.; Liu, Y.; Xue, Y.; Xiang, Y.; Yu, D.; Wang, F.; Ma, Y. HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle. arXiv 2022, arXiv:2207.05477. [Google Scholar]
- Liu, S.; Zhang, J.; Chu, H.; Wang, M.; Xue, B.; Ni, N.; Yu, J.; Xie, Y.; Chen, Z.; Chen, M.; et al. PSP: Million-level Protein Sequence Dataset for Protein Structure Prediction. arXiv 2022, arXiv:2206.12240. [Google Scholar]
- Ruffolo, J.A.; Chu, L.-S.; Mahajan, S.P.; Gray, J.J. Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Nat. Commun. 2023, 14, 2389. [Google Scholar] [CrossRef] [PubMed]
- Jing, X.; Wu, F.; Luo, X.; Xu, J. RaptorX-Single: Single-sequence protein structure prediction by integrating protein language models. bioRxiv 2023. bioRxiv:2023.04.24.538081. [Google Scholar]
- Wang, W.; Peng, Z.; Yang, J. Single-sequence protein structure prediction using supervised transformer protein language models. Nat. Comput. Sci. 2022, 2, 804–814. [Google Scholar] [CrossRef] [PubMed]

Method | Method Type | Domain Type | TM-Score | p-Value | #{TM > 0.5} |
---|---|---|---|---|---|
AlphaFold2 | AlphaFold2-based | All | 0.8871 | - | 88 |
AlphaFold2 | AlphaFold2-based | TBM | 0.9325 | - | 54 |
AlphaFold2 | AlphaFold2-based | FM | 0.8207 | - | 34 |
ColabFold | AlphaFold2-based | All | 0.8846 | 3.29 × 10⁻² | 88 |
ColabFold | AlphaFold2-based | TBM | 0.9255 | 6.65 × 10⁻³ | 54 |
ColabFold | AlphaFold2-based | FM | 0.8250 | 4.25 × 10⁻¹ | 34 |
OpenFold | AlphaFold2-based | All | 0.8692 | 3.17 × 10⁻² | 85 |
OpenFold | AlphaFold2-based | TBM | 0.9199 | 1.57 × 10⁻¹ | 53 |
OpenFold | AlphaFold2-based | FM | 0.7952 | 5.24 × 10⁻² | 32 |
Uni-Fold | AlphaFold2-based | All | 0.8930 | 9.93 × 10⁻¹ | 88 |
Uni-Fold | AlphaFold2-based | TBM | 0.9387 | 9.76 × 10⁻¹ | 54 |
Uni-Fold | AlphaFold2-based | FM | 0.8262 | 9.26 × 10⁻¹ | 34 |
AlphaFold2-Single | PLM-based | All | 0.5165 | 4.06 × 10⁻¹⁶ | 40 |
AlphaFold2-Single | PLM-based | TBM | 0.6609 | 5.15 × 10⁻¹⁰ | 37 |
AlphaFold2-Single | PLM-based | FM | 0.3057 | 2.18 × 10⁻¹¹ | 3 |
ESMFold | PLM-based | All | 0.7206 | 4.64 × 10⁻¹⁴ | 66 |
ESMFold | PLM-based | TBM | 0.8481 | 1.89 × 10⁻⁷ | 50 |
ESMFold | PLM-based | FM | 0.5346 | 1.02 × 10⁻¹⁰ | 16 |
OmegaFold | PLM-based | All | 0.6920 | 4.60 × 10⁻⁹ | 64 |
OmegaFold | PLM-based | TBM | 0.7944 | 5.42 × 10⁻⁶ | 46 |
OmegaFold | PLM-based | FM | 0.5426 | 2.18 × 10⁻⁵ | 18 |

Method | Method Type | TM-Score | p-Value | #{TM > 0.5} |
---|---|---|---|---|
AlphaFold2 | AlphaFold2-based | 0.8514 | - | 60 |
ColabFold | AlphaFold2-based | 0.8461 | 2.37 × 10⁻¹ | 61 |
OpenFold | AlphaFold2-based | 0.8375 | 1.46 × 10⁻¹ | 59 |
Uni-Fold | AlphaFold2-based | 0.8561 | 9.82 × 10⁻¹ | 61 |
AlphaFold2-Single | PLM-based | 0.5164 | 6.88 × 10⁻¹² | 30 |
ESMFold | PLM-based | 0.6676 | 1.80 × 10⁻¹⁰ | 41 |
OmegaFold | PLM-based | 0.6728 | 4.42 × 10⁻⁶ | 43 |
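
The TM-Score and #{TM > 0.5} columns in the tables above can be illustrated with a minimal Python sketch. It uses the standard TM-score definition for one fixed superposition (the real metric maximizes over superpositions, which is not done here), and it assumes that the TM-Score column is the mean over targets, which the tables themselves do not state. The function names, the floor on `d0` for short chains, and the example scores are all hypothetical.

```python
from statistics import mean


def tm_score_from_distances(distances, target_length):
    """TM-score of a model for one fixed superposition.

    distances: C-alpha distances (in angstroms) of the aligned residue pairs.
    target_length: number of residues in the reference (native) structure.
    """
    if target_length > 21:
        # Length-dependent distance scale from the standard TM-score definition.
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        # Assumption of this sketch: a fixed floor for very short chains.
        d0 = 0.5
    # Unaligned residues contribute nothing, so the sum is divided by the
    # full target length rather than by the number of aligned pairs.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def summarize(tm_scores, cutoff=0.5):
    """Mean TM-score and the number of targets strictly above the cutoff."""
    return {
        "mean_tm": mean(tm_scores),
        "n_above_cutoff": sum(score > cutoff for score in tm_scores),
    }


if __name__ == "__main__":
    # Hypothetical per-target scores, not taken from the tables above.
    print(summarize([0.9, 0.4, 0.6]))
```

A TM-score above 0.5 is commonly read as the model sharing the fold of the reference structure, which is why the count at that cutoff is reported next to the average.
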

Method | Advantages | Limitations |
---|---|---|
Template-based modeling (TBM) | These methods can achieve high accuracy and reflect evolutionary relationships well when reliable templates can be identified. | Accuracy decreases significantly when the available templates are only distantly related to the target protein. |
Template-free modeling (FM) | These methods do not depend on the availability of templates and can therefore be applied to any protein. | The statistical and knowledge-based energy potentials used in FM methods can lead to suboptimal performance when they are inaccurate. These potentials also contain little residue–residue interaction information. |
Contact/distance-based methods | The energy potentials derived from deep learning-based restraints (contacts or distances) contain high-quality residue–residue interaction information. | The deep learning-based restraints (contacts or distances) and the final structural models are optimized separately, which can make it difficult to improve overall accuracy. In addition, distance-based methods require MSA inputs, which is a challenge when high-quality MSAs are difficult to obtain. |
End-to-end methods | The deep learning-based restraints (contacts or distances) and the final structural models are optimized together, resulting in high accuracy for single-domain proteins. | These methods are less accurate for multi-domain proteins, especially proteins with few known homologs. |
Protein language model (PLM)-based methods | These methods are highly scalable and computationally efficient because they do not rely on MSA inputs. Their performance is also relatively better for orphan proteins. | PLM-based methods currently have relatively low accuracy in structure prediction. |
Multi-domain protein structure prediction methods | These methods are designed for multi-domain proteins and balance the modeling quality of inter-domain and intra-domain interactions. | These methods face challenges in balancing MSAs for separate domains and full-length proteins, accurately modeling the orientations between domains, and predicting accurate domain boundaries. |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wuyun, Q.; Chen, Y.; Shen, Y.; Cao, Y.; Hu, G.; Cui, W.; Gao, J.; Zheng, W. Recent Progress of Protein Tertiary Structure Prediction. Molecules 2024, 29, 832. https://doi.org/10.3390/molecules29040832