Recent Applications of Deep Learning Methods on Evolution- and Contact-Based Protein Structure Prediction
Abstract
1. Introduction
2. Protein Sequence Homology, 3D Structure, and Deep Learning
2.1. Protein Sequence Homology
2.2. 3D Structural Space of Proteins
2.3. Overview of Deep Learning Methods
3. Prediction of 1D and 2D Protein Structural Annotations
3.1. 1D Prediction
3.2. 2D Prediction
4. Prediction of Protein 3D Structures
4.1. Critical Assessment of Protein Structure Prediction (CASP)
4.2. 3D Structure Prediction Based on Contact Maps
4.3. In Combination with Template-Based Modeling
5. Prediction of Drug–Target Interactions (DTIs)
6. Conclusions and Outlook
Author Contributions
Funding
Conflicts of Interest
Glossary
Sequence homology | the biological resemblance between DNA, RNA, or protein sequences, determined by their shared ancestry during the evolution of life. |
Protein co-evolution | a statistical model in which the energetic interactions between amino acids that contribute to protein structure and function are inferred from correlations between amino acids at pairs of positions across a large set of homologous sequences from a protein family. |
Drug–target interaction (DTI) | the binding of a drug to a target location that results in a change in the target's behavior or function. |
Deep learning | a class of machine learning approaches that use artificial neural networks (ANNs) with many layers of nonlinear processing units to learn data representations. |
Contact map | a two-dimensional matrix encoding the absence/presence or the probability of contact between residue pairs in a given protein. |
Direct coupling analysis (DCA) | a statistical inference framework used to infer direct co-evolutionary couplings among residue pairs in a multiple sequence alignment. |
Global Distance Test (GDT) score | the percentage of amino acid residues (beads in the protein chain) within a threshold distance from the correct position; ranges from 0 to 100; a GDT score of ~90 is informally considered to be competitive with the results obtained from experimental methods. |
Attention algorithm | developed to mimic the way a person might assemble a jigsaw puzzle: first connecting pieces in small clumps (in this case, clusters of amino acids) and then searching for ways to join the clumps into a larger whole. |
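The contact map and GDT score entries above can be made concrete with a small sketch. It is illustrative only and does not reproduce the procedure of any method covered in this review. The function names `contact_map` and `gdt_ts` are hypothetical, and the 8 Å contact threshold is an assumption (a commonly used convention). The toy coordinates are invented. The GDT function assumes the two structures are already superimposed and averages the fraction of residues within 1, 2, 4, and 8 Å; the full GDT procedure additionally searches over superpositions, which is omitted here.

```python
import math


def contact_map(coords, threshold=8.0):
    """Binary contact map: entry (i, j) is 1 when residues i and j
    lie closer than `threshold` angstroms (assumed convention).
    The diagonal is left at 0, since a residue is not its own contact."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = 1
    return cmap


def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-style score (0 to 100) for two structures that are
    ALREADY superimposed: the mean, over the cutoffs, of the percentage
    of residues whose position deviates by at most that cutoff."""
    deviations = [math.dist(m, r) for m, r in zip(model, reference)]
    fractions = [
        sum(d <= c for d in deviations) / len(deviations) for c in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four residues on a straight line, 3.8 angstroms apart
# (invented coordinates, one representative atom per residue).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
# A hypothetical model whose residues deviate by 0, 0.5, 3.0, and 9.0 angstroms.
model = [(0.0, 0.0, 0.0), (3.8, 0.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

cmap = contact_map(reference)
score = gdt_ts(model, reference)
```

In this toy case, residues 1 and 4 are 11.4 Å apart and therefore not in contact, while every other pair is. A probabilistic contact map, as mentioned in the glossary, would hold values between 0 and 1 in place of the binary entries.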
Method/Server | Target | Topology (Incl. Earlier Steps) | Evolutionary Information (Incl. Earlier Steps) | Site |
---|---|---|---|---|
JPred | 1D: SS, SA | FFNN | PSI-BLAST | http://www.compbio.dundee.ac.uk/jpred/ |
SSpro | 1D: SS, SA (ACCpro) | BRNN–CNN | PSI-BLAST | http://scratch.proteomics.ics.uci.edu/ |
DISSPred | 1D: SS, TA | SVM | PSI-BLAST | https://comp.chem.nottingham.ac.uk/disspred/ |
SPIDER3 | 1D: SS, SA, TA, CN | BLSTM | PSI-BLAST, HHblits; None (SPIDER3-single) | https://sparks-lab.org/server/spider3/ |
ProteinUnet | 1D: SS, SA, TA, CN | CNN | None | https://codeocean.com/capsule/2521196/tree/v1 |
NetSurfP-2.0 | 1D: SS, SA, TA, DR | BLSTM | HHblits | https://services.healthtech.dtu.dk/service.php?NetSurfP-2.0 |
IUPred | 1D: DR | Regression | None | https://iupred2a.elte.hu/ |
PSIPRED | 1D: SS (PSIPRED), DR (DISOPRED3); 2D: CM (MetaPSICOV2); 3D: TS (DMPfold) | FFNN | PSI-BLAST, HHblits, jackHMMer | http://bioinf.cs.ucl.ac.uk/psipred/ |
SPOT | 1D: SS, SA, TA, CN (SPOT-1D), DR (SPOT-Disorder); 2D: CM (SPOT-Contact); 3D: TS (SPOT-fold) | Residual CNN, BLSTM, 2D-BLSTM | PSI-BLAST, HHblits | https://sparks-lab.org/service/ |
Distill (Brewery) | 1D: SS (Porter), LM (Porter+), SA (PaleAle), CN (BrownAle); 2D: CM (XX-Stout); 3D: TS (3Distill) | BRNN–CNN, 2D-BRNN | PSI-BLAST, HHblits | http://distillf.ucd.ie/distill/ |
RaptorX | 1D: SS, SA, DR (RaptorX-Property); 2D: CM (RaptorX-Contact); 3D: TS (RaptorX) | CNF, Residual CNN | PSI-BLAST, HHblits | http://raptorx.uchicago.edu/ |
MULTICOM | 2D: CM (DNCON2); 3D: TS | CNN | PSI-BLAST, HHblits, jackHMMer | http://sysbio.rnet.missouri.edu/dncon2/ |
TripletRes | 2D: CM | Residual CNN | HHblits, jackHMMer | https://zhanglab.dcmb.med.umich.edu/TripletRes/ |
DeepContact | 2D: CM | Residual CNN | HHblits, jackHMMer | https://github.com/largelymfs/deepcontact |
DeepCov | 2D: CM | CNN | HHblits | https://github.com/psipred/DeepCov |
PconsC4 | 2D: CM | CNN | HHblits | https://github.com/ElofssonLab/PconsC4 |
DeepCDpred | 2D: MCM | FFNN | HHblits | https://github.com/PeterJamesWinn/DeepCDpred |
AlphaFold2 | 2D: MCM; 3D: TS | Residual CNN | PSI-BLAST, HHblits | AlphaFold: https://github.com/deepmind/deepmind-research/tree/master/alphafold_casp13 |
Rosetta Suite | 2D: MCM (trRosetta); 3D: TS | Residual CNN | PSI-BLAST, HHblits | https://www.rosettacommons.org/ |
EVfold | 3D: TS | FFNN | HHblits, jackHMMer | https://v1.evcouplings.org/complex |
DESTINI | 3D: TS | Residual CNN | HHblits, PSI-BLAST | https://sites.gatech.edu/cssb/destini/ |
ThreaderAI | 3D: TS | Residual CNN | HHblits | https://github.com/ShenLab/ThreaderAI |
NEST | 3D: TS | FFNN | PSI-BLAST | http://honig.c2b2.columbia.edu/nest |
C-I-TASSER | 3D: TS | Residual CNN | PSI-BLAST | https://zhanglab.dcmb.med.umich.edu/C-I-TASSER/ |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Suh, D.; Lee, J.W.; Choi, S.; Lee, Y. Recent Applications of Deep Learning Methods on Evolution- and Contact-Based Protein Structure Prediction. Int. J. Mol. Sci. 2021, 22, 6032. https://doi.org/10.3390/ijms22116032