AI-Driven Deep Learning Techniques in Protein Structure Prediction
Abstract
1. Introduction
2. Established Protein Modeling
2.1. Homology Modeling
2.2. Ab Initio Modeling
2.3. Threading
3. Deep Learning-Based Models
3.1. AlphaFold
3.2. RoseTTAFold
3.3. ProteinBERT
3.4. DeepFold
3.5. OmegaFold
3.6. ESMFold
3.7. AI Integration
3.7.1. Swiss-Model
3.7.2. Rosetta
3.7.3. I-TASSER
3.8. Table Summary
3.9. Model Comparison
3.10. Challenges and Limitations
4. Potential Applications
4.1. Drug Design
4.2. Industry
4.3. Education
4.4. Novel Protein
4.5. Future Research
5. Summary
Author Contributions
Funding
Conflicts of Interest
References
- Schulz, G.E.; Schirmer, R.H. Principles of Protein Structure; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
- Petsko, G.A.; Ringe, D. Protein Structure and Function; New Science Press: London, UK, 2004. [Google Scholar]
- Law, J. The development of specialties in science: The case of X-ray protein crystallography. Sci. Stud. 1973, 3, 275–303. [Google Scholar] [CrossRef]
- Smyth, M.; Martin, J. x Ray crystallography. Mol. Pathol. 2000, 53, 8. [Google Scholar] [CrossRef]
- Hu, Y.; Cheng, K.; He, L.; Zhang, X.; Jiang, B.; Jiang, L.; Li, C.; Wang, G.; Yang, Y.; Liu, M. NMR-based methods for protein analysis. Anal. Chem. 2021, 93, 1866–1879. [Google Scholar] [CrossRef] [PubMed]
- Koehler Leman, J.; Künze, G. Recent Advances in NMR Protein Structure Prediction with ROSETTA. Int. J. Mol. Sci. 2023, 24, 7835. [Google Scholar] [CrossRef]
- Markwick, P.R.; Malliavin, T.; Nilges, M. Structural biology by NMR: Structure, dynamics, and interactions. PLoS Comput. Biol. 2008, 4, e1000168. [Google Scholar] [CrossRef] [PubMed]
- Purslow, J.A.; Khatiwada, B.; Bayro, M.J.; Venditti, V. NMR Methods for Structural Characterization of Protein-Protein Complexes. Front. Mol. Biosci. 2020, 7, 9. [Google Scholar] [CrossRef]
- Werner, M.H. Nuclear Magnetic Resonance (NMR) Spectroscopy: Structural Analysis of Proteins and Nucleic Acids; John Wiley & Sons Ltd.: Chichester, UK, 2007; Volume 2. [Google Scholar]
- Kabsch, W.; Rösch, P. Nuclear magnetic resonance: Protein structure determination. Nature 1986, 321, 469–470. [Google Scholar] [CrossRef] [PubMed]
- Namba, K.; Makino, F. Recent progress and future perspective of electron cryomicroscopy for structural life sciences. Microscopy 2022, 71 (Suppl. S1), i3–i14. [Google Scholar] [CrossRef] [PubMed]
- Stock, D.; Perisic, O.; Löwe, J. Robotic nanolitre protein crystallisation at the MRC Laboratory of Molecular Biology. Prog. Biophys. Mol. Biol. 2005, 88, 311–327. [Google Scholar] [CrossRef]
- Chatham, J.C.; Blackband, S.J. Nuclear magnetic resonance spectroscopy and imaging in animal research. ILAR J. 2001, 42, 189–208. [Google Scholar] [CrossRef]
- Bai, X.-C.; McMullan, G.; Scheres, S.H. How cryo-EM is revolutionizing structural biology. Trends Biochem. Sci. 2015, 40, 49–57. [Google Scholar] [CrossRef]
- Egli, M. Diffraction Techniques in Structural Biology: Overview for unit 7 “Biophysical Analysis of Nucleic Acids”. In Current Protocols in Nucleic Acid Chemistry; Beaucage, S.L., Ed.; Wiley: Hoboken, NJ, USA, 2010. [Google Scholar]
- Vénien-Bryan, C.; Li, Z.; Vuillard, L.; Boutin, J.A. Cryo-electron microscopy and X-ray crystallography: Complementary approaches to structural biology and drug discovery. Acta Crystallogr. Sect. F Struct. Biol. Commun. 2017, 73, 174–183. [Google Scholar] [CrossRef]
- Muench, S.P.; Antonyuk, S.V.; Hasnain, S.S. The expanding toolkit for structural biology: Synchrotrons, X-ray lasers and cryoEM. IUCrJ 2019, 6, 167–177. [Google Scholar] [CrossRef]
- Narasimhan, S. Determining Protein Structures Using X-Ray Crystallography. In Plant Functional Genomics: Methods and Protocols; Springer: Berlin/Heidelberg, Germany, 2024; Volume 1, pp. 333–353. [Google Scholar]
- Benjin, X.; Ling, L. Developments, applications, and prospects of cryo-electron microscopy. Protein Sci. 2020, 29, 872–882. [Google Scholar] [CrossRef]
- Floudas, C.A. Computational methods in protein structure prediction. Biotechnol. Bioeng. 2007, 97, 207–213. [Google Scholar] [CrossRef] [PubMed]
- Bertoline, L.M.F.; Lima, A.N.; Krieger, J.E.; Teixeira, S.K. Before and after AlphaFold2: An overview of protein structure prediction. Front. Bioinform. 2023, 3, 1120370. [Google Scholar] [CrossRef] [PubMed]
- Krieger, E.; Nabuurs, S.B.; Vriend, G. Homology modeling. Struct. Bioinform. 2003, 44, 509–523. [Google Scholar]
- Bowie, J.U.; Lüthy, R.; Eisenberg, D. A method to identify protein sequences that fold into a known three-dimensional structure. Science 1991, 253, 164–170. [Google Scholar] [CrossRef]
- Lee, J.; Freddolino, P.L.; Zhang, Y. Ab initio protein structure prediction. Protein Struct. Funct. Bioinform. 2017, 12, 176–181. [Google Scholar]
- Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef]
- Baek, M.; DiMaio, F.; Anishchenko, I.; Dauparas, J.; Ovchinnikov, S.; Lee, G.R.; Wang, J.; Cong, Q.; Kinch, L.N.; Schaeffer, R.D. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021, 373, 871–876. [Google Scholar] [CrossRef] [PubMed]
- Brandes, N.; Ofer, D.; Peleg, Y.; Rappoport, N.; Linial, M. ProteinBERT: A universal deep-learning model of protein sequence and function. Bioinformatics 2022, 38, 2102–2110. [Google Scholar] [CrossRef] [PubMed]
- Pearce, R.; Li, Y.; Omenn, G.S.; Zhang, Y. Fast and accurate Ab Initio Protein structure prediction using deep learning potentials. PLoS Comput. Biol. 2022, 18, e1010539. [Google Scholar] [CrossRef] [PubMed]
- Wu, R.; Ding, F.; Wang, R.; Shen, R.; Zhang, X.; Luo, S.; Su, C.; Wu, Z.; Xie, Q.; Berger, B. High-resolution de novo structure prediction from primary sequence. bioRxiv 2022. bioRxiv:2022.07.21.500999. [Google Scholar]
- Hie, B.; Candido, S.; Lin, Z.; Kabeli, O.; Rao, R.; Smetanin, N.; Sercu, T.; Rives, A. A high-level programming language for generative protein design. bioRxiv 2022. bioRxiv:2022.12.21.521526. [Google Scholar]
- Verkuil, R.; Kabeli, O.; Du, Y.; Wicky, B.I.; Milles, L.F.; Dauparas, J.; Baker, D.; Ovchinnikov, S.; Sercu, T.; Rives, A. Language models generalize beyond natural proteins. bioRxiv 2022. bioRxiv:2022.12.21.521521. [Google Scholar]
- Waterhouse, A.; Bertoni, M.; Bienert, S.; Studer, G.; Tauriello, G.; Gumienny, R.; Heer, F.T.; de Beer, T.A.P.; Rempfer, C.; Bordoli, L.; et al. SWISS-MODEL: Homology modelling of protein structures and complexes. Nucleic Acids Res. 2018, 46, W296–W303. [Google Scholar] [CrossRef] [PubMed]
- Fiser, A.; Šali, A. Modeller: Generation and refinement of homology-based protein structure models. In Methods in Enzymology; Elsevier: Amsterdam, The Netherlands, 2003; Volume 374, pp. 461–491. [Google Scholar]
- Kelley, L.A.; Mezulis, S.; Yates, C.M.; Wass, M.N.; Sternberg, M.J. The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protoc. 2015, 10, 845–858. [Google Scholar] [CrossRef] [PubMed]
- Simons, K.T.; Bonneau, R.; Ruczinski, I.; Baker, D. Ab initio protein structure prediction of CASP III targets using ROSETTA. Proteins 1999, 37 (Suppl. S3), 171–176. [Google Scholar] [CrossRef]
- Xu, D.; Zhang, Y. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins Struct. Funct. Bioinform. 2012, 80, 1715–1735. [Google Scholar] [CrossRef]
- Yang, J.; Zhang, Y. I-TASSER server: New development for protein structure and function predictions. Nucleic Acids Res. 2015, 43, W174–W181. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Y. I-TASSER server for protein 3D structure prediction. BMC Bioinform. 2008, 9, 40. [Google Scholar] [CrossRef] [PubMed]
- Zheng, W.; Zhang, C.; Li, Y.; Pearce, R.; Bell, E.W.; Zhang, Y. Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations. Cell Rep. Methods 2021, 1, 100014. [Google Scholar] [CrossRef] [PubMed]
- Zhou, X.; Zheng, W.; Li, Y.; Pearce, R.; Zhang, C.; Bell, E.W.; Zhang, G.; Zhang, Y. I-TASSER-MTD: A deep-learning-based platform for multi-domain protein structure and function prediction. Nat. Protoc. 2022, 17, 2326–2353. [Google Scholar] [CrossRef] [PubMed]
- Söding, J.; Biegert, A.; Lupas, A.N. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 2005, 33 (Suppl. S2), W244–W248. [Google Scholar] [CrossRef] [PubMed]
- Schwede, T.; Kopp, J.R.; Guex, N.; Peitsch, M.C. SWISS-MODEL: An automated protein homology-modeling server. Nucleic Acids Res. 2003, 31, 3381–3385. [Google Scholar] [CrossRef] [PubMed]
- Biasini, M.; Bienert, S.; Waterhouse, A.; Arnold, K.; Studer, G.; Schmidt, T.; Kiefer, F.; Cassarino, T.G.; Bertoni, M.; Bordoli, L.; et al. SWISS-MODEL: Modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. 2014, 42, W252–W258. [Google Scholar] [CrossRef] [PubMed]
- Bertoni, M.; Kiefer, F.; Biasini, M.; Bordoli, L.; Schwede, T. Modeling protein quaternary structure of homo- and hetero-oligomers beyond binary interactions by homology. Sci. Rep. 2017, 7, 10480. [Google Scholar] [CrossRef]
- Remmert, M.; Biegert, A.; Hauser, A.; Söding, J. HHblits: Lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods 2012, 9, 173–175. [Google Scholar] [CrossRef]
- Benkert, P.; Biasini, M.; Schwede, T. Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics 2010, 27, 343–350. [Google Scholar] [CrossRef]
- The UniProt Consortium. UniProt: The universal protein knowledgebase. Nucleic Acids Res. 2017, 45, D158–D169. [Google Scholar] [CrossRef] [PubMed]
- The UniProt Consortium. UniProt: The universal protein knowledgebase in 2023. Nucleic Acids Res. 2023, 51, D523–D531. [Google Scholar] [CrossRef] [PubMed]
- Burley, S.K.; Berman, H.M.; Kleywegt, G.J.; Markley, J.L.; Nakamura, H.; Velankar, S. Protein Data Bank (PDB): The single global macromolecular structure archive. Protein Crystallogr. Methods Protoc. 2017, 1607, 627–641. [Google Scholar]
- Bradley, P.; Misura, K.M.; Baker, D. Toward high-resolution de novo structure prediction for small proteins. Science 2005, 309, 1868–1871. [Google Scholar] [CrossRef] [PubMed]
- Ołdziej, S.; Czaplewski, C.; Liwo, A.; Chinchio, M.; Nanias, M.; Vila, J.A.; Khalili, M.; Arnautova, Y.A.; Jagielska, A.; Makowski, M.; et al. Physics-based protein-structure prediction using a hierarchical protocol based on the UNRES force field: Assessment in two blind tests. Proc. Natl. Acad. Sci. USA 2005, 102, 7547–7552. [Google Scholar] [CrossRef] [PubMed]
- Jauch, R.; Yeo, H.C.; Kolatkar, P.R.; Clarke, N.D. Assessment of CASP7 structure predictions for template free targets. Proteins 2007, 69 (Suppl. S8), 57–67. [Google Scholar] [CrossRef] [PubMed]
- Abbass, J.; Nebel, J.-C. Rosetta and the Journey to Predict Proteins’ Structures, 20 Years on. Curr. Bioinform. 2020, 15, 611–626. [Google Scholar] [CrossRef]
- Söding, J. Protein homology detection by HMM–HMM comparison. Bioinformatics 2004, 21, 951–960. [Google Scholar] [CrossRef] [PubMed]
- Yang, Y.; Faraggi, E.; Zhao, H.; Zhou, Y. Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics 2011, 27, 2076–2082. [Google Scholar] [CrossRef]
- Peng, J.; Xu, J. RaptorX: Exploiting structure information for protein alignment by statistical inference. Proteins Struct. Funct. Bioinform. 2011, 79, 161–171. [Google Scholar] [CrossRef]
- Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410. [Google Scholar] [CrossRef] [PubMed]
- Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402. [Google Scholar] [CrossRef] [PubMed]
- Rychlewski, L.; Jaroszewski, L.; Li, W.; Godzik, A. Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci. 2000, 9, 232–241. [Google Scholar] [CrossRef] [PubMed]
- Jaroszewski, L.; Rychlewski, L.; Godzik, A. Improving the quality of twilight-zone alignments. Protein Sci. 2000, 9, 1487–1496. [Google Scholar] [CrossRef]
- Ginalski, K.; Rychlewski, L. Detection of reliable and unexpected protein fold predictions using 3D-Jury. Nucleic Acids Res. 2003, 31, 3291–3292. [Google Scholar] [CrossRef] [PubMed]
- Watkins, A.M.; Rangan, R.; Das, R. Chapter Nine—Using Rosetta for RNA homology modeling. In Methods in Enzymology; Hargrove, A.E., Ed.; Academic Press: Cambridge, MA, USA, 2019; Volume 623, pp. 177–207. [Google Scholar]
- Misura, K.M.; Chivian, D.; Rohl, C.A.; Kim, D.E.; Baker, D. Physically realistic homology models built with ROSETTA can be more accurate than their templates. Proc. Natl. Acad. Sci. USA 2006, 103, 5361–5366. [Google Scholar] [CrossRef] [PubMed]
- Bender, B.J.; Marlow, B.; Meiler, J. Improving homology modeling from low-sequence identity templates in Rosetta: A case study in GPCRs. PLoS Comput. Biol. 2020, 16, e1007597. [Google Scholar] [CrossRef] [PubMed]
- Jones, D.T.; Taylor, W.R.; Thornton, J.M. A new approach to protein fold recognition. Nature 1992, 358, 86–89. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Y.; Skolnick, J. The protein structure prediction problem could be solved using the current PDB library. Proc. Natl. Acad. Sci. USA 2005, 102, 1029–1034. [Google Scholar] [CrossRef]
- Zhang, H.; Shen, Y. Template-based prediction of protein structure with deep learning. BMC Genom. 2020, 21, 878. [Google Scholar] [CrossRef]
- Bienkowska, J.; Lathrop, R. Threading algorithms. In Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics; Dunn, M., Ed.; Wiley: Hoboken, NJ, USA, 2005. [Google Scholar]
- Wu, S.; Zhang, Y. LOMETS: A local meta-threading-server for protein structure prediction. Nucleic Acids Res. 2007, 35, 3375–3382. [Google Scholar] [CrossRef] [PubMed]
- Zheng, W.; Zhang, C.; Wuyun, Q.; Pearce, R.; Li, Y.; Zhang, Y. LOMETS2: Improved meta-threading server for fold-recognition and structure-based function annotation for distant-homology proteins. Nucleic Acids Res. 2019, 47, W429–W436. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Y. Interplay of I-TASSER and QUARK for template-based and ab initio protein structure prediction in CASP10. Proteins Struct. Funct. Bioinform. 2014, 82, 175–187. [Google Scholar] [CrossRef] [PubMed]
- Yang, J.; Roy, A.; Zhang, Y. BioLiP: A semi-manually curated database for biologically relevant ligand–protein interactions. Nucleic Acids Res. 2012, 41, D1096–D1103. [Google Scholar] [CrossRef] [PubMed]
- Nam, K.H. AI-based protein models enhance the accuracy of experimentally determined protein crystal structures. Front. Mol. Biosci. 2023, 10, 1208810. [Google Scholar] [CrossRef]
- Alkharusi, H. Categorical variables in regression analysis: A comparison of dummy and effect coding. Int. J. Educ. 2012, 4, 202. [Google Scholar] [CrossRef]
- Wang, Y.; Li, Z.; Zhang, Y.; Ma, Y.; Huang, Q.; Chen, X.; Dai, Z.; Zou, X. Performance improvement for a 2D convolutional neural network by using SSC encoding on protein–protein interaction tasks. BMC Bioinform. 2021, 22, 184. [Google Scholar] [CrossRef]
- Guo, Y.; Wu, J.; Ma, H.; Wang, S.; Huang, J. Bagging msa learning: Enhancing low-quality pssm with deep learning for accurate protein structure property prediction. In Proceedings of the Research in Computational Molecular Biology: 24th Annual International Conference, RECOMB 2020, Padua, Italy, 10–13 May 2020; Springer: Berlin/Heidelberg, Germany, 2020. Proceedings 24. pp. 88–103. [Google Scholar]
- Saini, H.; Raicar, G.; Lal, S.P.; Dehzangi, A.; Imoto, S.; Sharma, A. Protein fold recognition using genetic algorithm optimized voting scheme and profile bigram. J. Softw. 2016, 11, 756–767. [Google Scholar] [CrossRef]
- Barros-Álvarez, X.; Nwokonko, R.M.; Vizurraga, A.; Matzov, D.; He, F.; Papasergi-Scott, M.M.; Robertson, M.J.; Panova, O.; Yardeni, E.H.; Seven, A.B.; et al. The tethered peptide activation mechanism of adhesion GPCRs. Nature 2022, 604, 757–762. [Google Scholar] [CrossRef]
- Pearson, W.R. Using the FASTA Program to Search Protein and DNA Sequence Databases. In Computer Analysis of Sequence Data: Part I; Griffin, A.M., Griffin, H.G., Eds.; Humana Press: Totowa, NJ, USA, 1994; pp. 307–331. [Google Scholar] [CrossRef]
- DeLano, W.L. Pymol: An open-source molecular graphics tool. CCP4 Newsl. Protein Crystallogr. 2002, 40, 82–92. [Google Scholar]
- Senior, A.W.; Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.; Green, T.; Qin, C.; Žídek, A.; Nelson, A.W.R.; Bridgland, A.; et al. Improved protein structure prediction using potentials from deep learning. Nature 2020, 577, 706–710. [Google Scholar] [CrossRef] [PubMed]
- Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the IEEE 2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017; pp. 1–6. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
- Varadi, M.; Anyango, S.; Deshpande, M.; Nair, S.; Natassia, C.; Yordanova, G.; Yuan, D.; Stroe, O.; Wood, G.; Laydon, A.; et al. AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2021, 50, D439–D444. [Google Scholar] [CrossRef] [PubMed]
- Bisong, E. Google Colaboratory. In Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners; Apress: Berkeley, CA, USA, 2019; pp. 59–64. [Google Scholar] [CrossRef]
- Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500. [Google Scholar] [CrossRef] [PubMed]
- Shazeer, N. Glu variants improve transformer. arXiv 2020, arXiv:2002.05202. [Google Scholar]
- Humphreys, I.R.; Pei, J.; Baek, M.; Krishnakumar, A.; Anishchenko, I.; Ovchinnikov, S.; Zhang, J.; Ness, T.J.; Banjade, S.; Bagde, S.R. Computed structures of core eukaryotic protein complexes. Science 2021, 374, eabm4805. [Google Scholar] [CrossRef]
- Lee, C.; Su, B.-H.; Tseng, Y.J. Comparative studies of AlphaFold, RoseTTAFold and Modeller: A case study involving the use of G-protein-coupled receptors. Brief. Bioinform. 2022, 23, bbac308. [Google Scholar] [CrossRef]
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Hiranuma, N.; Park, H.; Baek, M.; Anishchenko, I.; Dauparas, J.; Baker, D. Improved protein structure refinement guided by deep learning based accuracy estimation. Nat. Commun. 2021, 12, 1340. [Google Scholar] [CrossRef]
- Huang, P.-S.; Boyken, S.E.; Baker, D. The coming of age of de novo protein design. Nature 2016, 537, 320–327. [Google Scholar] [CrossRef]
- Anishchenko, I.; Pellock, S.J.; Chidyausiku, T.M.; Ramelot, T.A.; Ovchinnikov, S.; Hao, J.; Bafna, K.; Norn, C.; Kang, A.; Bera, A.K.; et al. De novo protein design by deep network hallucination. Nature 2021, 600, 547–552. [Google Scholar] [CrossRef] [PubMed]
- Tsuchiya, Y.; Tomii, K. Neural networks for protein structure and function prediction and dynamic analysis. Biophys. Rev. 2020, 12, 569–573. [Google Scholar] [CrossRef] [PubMed]
- Bhattacharya, D. refineD: Improved protein structure refinement using machine learning based restrained relaxation. Bioinformatics 2019, 35, 3320–3328. [Google Scholar] [CrossRef] [PubMed]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar]
- Anand, N.; Huang, P. Generative modeling for protein structures. Adv. Neural Inf. Process. Syst. 2018, 31, 7494–7505. [Google Scholar]
- Du, Z.; Su, H.; Wang, W.; Ye, L.; Wei, H.; Peng, Z.; Anishchenko, I.; Baker, D.; Yang, J. The trRosetta server for fast and accurate protein structure prediction. Nat. Protoc. 2021, 16, 5634–5651. [Google Scholar] [CrossRef] [PubMed]
- Zemla, A.T. Local-Global Alignment for Finding 3D Similarities in Protein Structures. U.S. Patent 8,024,127, 20 September 2011. [Google Scholar]
- Elofsson, A. Progress at protein structure prediction, as seen in CASP15. Curr. Opin. Struct. Biol. 2023, 80, 102594. [Google Scholar] [CrossRef] [PubMed]
- Leemann, M.; Sagasta, A.; Eberhardt, J.; Schwede, T.; Robin, X.; Durairaj, J. Automated benchmarking of combined protein structure and ligand conformation prediction. Proteins Struct. Funct. Bioinform. 2023, 91, 1912–1924. [Google Scholar] [CrossRef] [PubMed]
- Robin, X.; Haas, J.; Gumienny, R.; Smolinski, A.; Tauriello, G.; Schwede, T. Continuous Automated Model EvaluatiOn (CAMEO)—Perspectives on the future of fully automated evaluation of structure prediction methods. Proteins Struct. Funct. Bioinform. 2021, 89, 1977–1986. [Google Scholar] [CrossRef]
- Haas, J.; Gumienny, R.; Barbato, A.; Ackermann, F.; Tauriello, G.; Bertoni, M.; Studer, G.; Smolinski, A.; Schwede, T. Introducing “best single template” models as reference baseline for the Continuous Automated Model Evaluation (CAMEO). Proteins Struct. Funct. Bioinform. 2019, 87, 1378–1387. [Google Scholar] [CrossRef]
- Haas, J.; Barbato, A.; Behringer, D.; Studer, G.; Roth, S.; Bertoni, M.; Mostaguir, K.; Gumienny, R.; Schwede, T. Continuous Automated Model EvaluatiOn (CAMEO) complementing the critical assessment of structure prediction in CASP12. Proteins Struct. Funct. Bioinform. 2018, 86, 387–398. [Google Scholar] [CrossRef]
- Haas, J.; Roth, S.; Arnold, K.; Kiefer, F.; Schmidt, T.; Bordoli, L.; Schwede, T. The Protein Model Portal—A comprehensive resource for protein structure and model information. Database 2013, 2013, bat031. [Google Scholar] [CrossRef] [PubMed]
- Mariani, V.; Biasini, M.; Barbato, A.; Schwede, T. lDDT: A local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 2013, 29, 2722–2728. [Google Scholar] [CrossRef]
- Jingcheng, Y.; Zhaoming, C.; Zhaoqun, L.; Mingliang, Z.; Wenjun, L.; He, H.; Qiwei, Y. Code of OpenComplex. 2022. Available online: https://github.com/baaihealth/OpenComplex (accessed on 3 July 2024).
- Ahdritz, G.; Bouatta, N.; Floristean, C.; Kadyan, S.; Xia, Q.; Gerecke, W.; O’Donnell, T.J.; Berenberg, D.; Fisk, I.; Zanichelli, N.; et al. OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat. Methods 2024, 1–11. [Google Scholar] [CrossRef]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 4399. [Google Scholar]
- Salehinejad, H.; Sankar, S.; Barfett, J.; Colak, E.; Valaee, S. Recent advances in recurrent neural networks. arXiv 2017, arXiv:1801.01078. [Google Scholar]
- Zhang, S.; Tong, H.; Xu, J.; Maciejewski, R. Graph convolutional networks: A comprehensive review. Comput. Soc. Netw. 2019, 6, 11. [Google Scholar] [CrossRef] [PubMed]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- AlQuraishi, M. AlphaFold at CASP13. Bioinformatics 2019, 35, 4862–4865. [Google Scholar] [CrossRef]
- Gupta, R.; Gupta, N.; Rathi, P. Bacterial lipases: An overview of production, purification and biochemical properties. Appl. Microbiol. Biotechnol. 2004, 64, 763–781. [Google Scholar] [CrossRef]
- Ingraham, J.B.; Baranov, M.; Costello, Z.; Barber, K.W.; Wang, W.; Ismail, A.; Frappier, V.; Lord, D.M.; Ng-Thow-Hing, C.; Van Vlack, E.R. Illuminating protein space with a programmable generative model. Nature 2023, 623, 1070–1078. [Google Scholar] [CrossRef] [PubMed]
- Romero, P.A.; Arnold, F.H. Exploring protein fitness landscapes by directed evolution. Nat. Rev. Mol. Cell Biol. 2009, 10, 866–876. [Google Scholar] [CrossRef] [PubMed]
- Sampaio, P.S.; Fernandes, P. Machine Learning: A Suitable Method for Biocatalysis. Catalysts 2023, 13, 961. [Google Scholar] [CrossRef]
- Qureshi, R.; Irfan, M.; Ali, H.; Khan, A.; Nittala, A.S.; Ali, S.; Shah, A.; Gondal, T.M.; Sadak, F.; Shah, Z. Artificial intelligence and biosensors in healthcare and its clinical relevance: A review. IEEE Access 2023, 11, 61600–61620. [Google Scholar] [CrossRef]
- Samir, A.; Ashour, F.H.; Hakim, A.A.; Bassyouni, M. Recent advances in biodegradable polymers for sustainable applications. npj Mater. Degrad. 2022, 6, 68. [Google Scholar] [CrossRef]
- Illergård, K.; Ardell, D.H.; Elofsson, A. Structure is three to ten times more conserved than sequence—A study of structural response in protein cores. Proteins Struct. Funct. Bioinform. 2009, 77, 499–508. [Google Scholar] [CrossRef] [PubMed]
- Jiménez, J.; Doerr, S.; Martínez-Rosell, G.; Rose, A.S.; De Fabritiis, G. DeepSite: Protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics 2017, 33, 3036–3042. [Google Scholar] [CrossRef] [PubMed]
- Dill, K.A.; MacCallum, J.L. The protein-folding problem, 50 years on. Science 2012, 338, 1042–1046. [Google Scholar] [CrossRef] [PubMed]
- Ragoza, M.; Hochuli, J.; Idrobo, E.; Sunseri, J.; Koes, D.R. Protein-Ligand Scoring with Convolutional Neural Networks. J. Chem. Inf. Model. 2017, 57, 942–957. [Google Scholar] [CrossRef]
- Kitchen, D.B.; Decornez, H.; Furr, J.R.; Bajorath, J. Docking and scoring in virtual screening for drug discovery: Methods and applications. Nat. Rev. Drug Discov. 2004, 3, 935–949. [Google Scholar] [CrossRef]
- Edelmann, M.J.; Nicholson, B.; Kessler, B.M. Pharmacological targets in the ubiquitin system offer new ways of treating cancer, neurodegenerative disorders and infectious diseases. Expert Rev. Mol. Med. 2011, 13, e35. [Google Scholar] [CrossRef]
- Jin, Y.; Wang, W.; Wang, Q.; Zhang, Y.; Zahid, K.R.; Raza, U.; Gong, Y. Alpha-1-antichymotrypsin as a novel biomarker for diagnosis, prognosis, and therapy prediction in human diseases. Cancer Cell Int. 2022, 22, 156. [Google Scholar] [CrossRef] [PubMed]
- Sosa, D.N.; Neculae, G.; Fauqueur, J.; Altman, R.B. Elucidating the semantics-topology trade-off for knowledge inference-based pharmacological discovery. J. Biomed. Semant. 2024, 15, 5. [Google Scholar] [CrossRef] [PubMed]
- Tunyasuvunakool, K. The prospects and opportunities of protein structure prediction with AI. Nat. Rev. Mol. Cell Biol. 2022, 23, 445–446. [Google Scholar] [CrossRef] [PubMed]
- Chen, Y.; Nielsen, J. Energy metabolism controls phenotypes by protein efficiency and allocation. Proc. Natl. Acad. Sci. USA 2019, 116, 17592–17597. [Google Scholar] [CrossRef] [PubMed]
- Lostao, A.; Lim, K.; Pallarés, M.C.; Ptak, A.; Marcuello, C. Recent advances in sensing the inter-biomolecular interactions at the nanoscale–A comprehensive review of AFM-based force spectroscopy. Int. J. Biol. Macromol. 2023, 238, 124089. [Google Scholar] [CrossRef]
- Baker, K.; Hughes, N.; Bhattacharya, S. An interactive visualization tool for educational outreach in protein contact map overlap analysis. Front. Bioinform. 2024, 4, 1358550. [Google Scholar] [CrossRef]
Method | Advantages | Disadvantages | Examples |
---|---|---|---|
Homology Modeling | 1. Utilizes experimentally determined structures of homologous proteins. 2. Highly accurate when suitable templates are available. 3. Widely used and accessible. | 1. Relies on the availability of suitable templates. 2. Less reliable for unique proteins lacking close relatives in the database. 3. Less effective for exceptionally large or structurally complicated proteins. | Swiss-Model [32], Modeller [33], Phyre2 [34] |
Ab Initio Modeling | 1. Predicts structure solely from amino acid sequence, no need for existing templates. 2. Effective in producing models for proteins with limited sequence identity to known structures. 3. Can explore vast conformational spaces to identify low-energy protein structures. | 1. Computationally intensive. 2. Success depends on the accuracy of energy functions and sampling algorithms. 3. May struggle with proteins with novel folds or significant structural rearrangements. | Rosetta [35], QUARK [36], I-TASSER [37,38,39,40] |
Threading | 1. Considers both sequence and structural information for template selection. 2. Predicts structures for proteins with limited sequence similarity to known structures. 3. Increases the scope of prediction to include proteins with diverse sequences and folds. | 1. Requires significant computational resources for template search and alignment. 2. Relies on the accuracy of threading algorithms and the structural compatibility of templates. 3. May produce inaccurate models if no suitable templates are found. | I-TASSER [37,38,39,40], HHpred [41], Phyre2 [34] |
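
The table above separates the three approaches largely by how much sequence identity a query shares with proteins of known structure. As a rough illustration of that quantity, the following Python sketch computes percent identity for a pair of sequences that have already been aligned. It is a minimal sketch under stated assumptions, not the procedure used by any of the listed tools: the function names, the toy sequences, and the 30% cutoff in `suggest_strategy` are hypothetical, and percent identity is defined here over columns in which both sequences carry a residue (other denominators are also in common use).

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where both sequences have a residue.

    Both inputs must come from the same pairwise alignment, so they have
    equal length and use '-' for gaps.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns where the residues match
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        if q == t:
            identical += 1

    return 100.0 * identical / compared if compared else 0.0


def suggest_strategy(identity: float, cutoff: float = 30.0) -> str:
    """Hypothetical rule of thumb; the cutoff is illustrative only."""
    if identity >= cutoff:
        return "homology modeling"
    return "threading or ab initio modeling"


# Toy alignment (made-up sequences, not real proteins).
query = "MKT-AYIAKQR"
template = "MKTLAYLAK-R"

identity = percent_identity(query, template)
print(f"{identity:.1f}% identity -> {suggest_strategy(identity)}")
```

In practice the alignment itself would come from a search tool such as BLAST or HHblits, and template selection weighs coverage and model-quality estimates as well as identity.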
Model | Environment | Key Architecture |
---|---|---|
AlphaFold2 [25] | Open Source (Python), Cloud (Google Colab) | Transformer-based Deep Learning |
RoseTTAFold [26] | Open Source (Python), Cloud (Google Colab) | Attention-based Neural Network |
ProteinBERT [27] | Open Source (Python), Cloud (Google Colab) | Transformer-based Deep Learning |
DeepFold [28] | Open Source (Python), Web-based | Custom Multi-Stage Deep Learning Pipeline |
OmegaFold [29] | Open Source (Python), Cloud (Google Colab) | Transformer-based Deep Learning |
ESMFold [30,31] | Web-based, Open Source (Python), Cloud (Google Colab) | Transformer-based Deep Learning |
Swiss-Model [42] | Web-based | Homology |
Rosetta [35] | Web-based | Ab Initio, Homology, and Threading |
I-TASSER [37,38,39,40] | Web-based | Threading and Ab Initio |
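
Several entries in the table above list a transformer-based or attention-based architecture. The operation these designs share is scaled dot-product attention, as introduced by Vaswani et al. in the reference list. The pure-Python sketch below shows that generic operation only; it is not the attention module of AlphaFold2, RoseTTAFold, or any other listed model, and the function names and toy "residue" vectors are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Toy example: three "residues", each with a 2-dimensional embedding.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated = scaled_dot_product_attention(embeddings, embeddings, embeddings)
for row in updated:
    print([round(x, 3) for x in row])
```

Each output row is a weighted average of all value vectors, which is how every residue's representation can draw on information from every other position in the sequence.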
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chen, L.; Li, Q.; Nasif, K.F.A.; Xie, Y.; Deng, B.; Niu, S.; Pouriyeh, S.; Dai, Z.; Chen, J.; Xie, C.Y. AI-Driven Deep Learning Techniques in Protein Structure Prediction. Int. J. Mol. Sci. 2024, 25, 8426. https://doi.org/10.3390/ijms25158426