Enhancing Conformational Sampling for Intrinsically Disordered and Ordered Proteins by Variational Autoencoder
Abstract
1. Introduction
2. Results
2.1. VAEs Perform Better than AEs on Different Data Split
2.2. VAEs Can Be Further Optimized by Adjusting Hyperparameters
2.3. Visualization of VAE-Generated Conformations
2.4. Experimental Validation of VAE-Generated Conformations
2.5. Tests of VAEs on Structured Proteins
3. Discussion
4. Material and Methods
4.1. Molecular Dynamic Simulation
4.2. PDB Data Extraction and Preprocessing
4.3. Variational Autoencoder Design
4.4. VAE Training
4.5. Evaluation Criteria Calculation
4.6. Root-Mean-Square Deviation (RMSD)
4.7. Spearman Correlation Coefficient
4.8. Chemical Shift
4.9. Radius of Gyration
4.10. Refinement and Visualization of Generated Structures
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Model | Encoder | Latent Space | Decoder |
---|---|---|---|
AE | 1024, 512, 256 | | 256, 512, 1024 |
VAE | 1024, 256, 64, 16 | 2 | 16, 64, 256, 1024 |
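To make the VAE layer widths in the table above concrete, the following is a minimal sketch of a forward pass through a network with that shape, written in plain Python. It is not the authors' implementation: the input dimension (`input_dim = 72`), the ReLU activations, the weight initialization, and all function names are assumptions made for illustration. Only the encoder widths (1024, 256, 64, 16), the 2-dimensional latent space, and the decoder widths (16, 64, 256, 1024) come from the table. The sampling step and the KL term follow the standard VAE formulation.

```python
import math
import random

# Layer widths taken from the architecture table (VAE row).
ENCODER_WIDTHS = [1024, 256, 64, 16]
LATENT_DIM = 2
DECODER_WIDTHS = [16, 64, 256, 1024]


def init_dense(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer (assumed uniform init)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu=False):
    """Apply one dense layer to vector x, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def build_vae(input_dim, rng):
    """Stack encoder, latent heads (mean, log-variance), and decoder."""
    enc, n_in = [], input_dim
    for width in ENCODER_WIDTHS:
        enc.append(init_dense(n_in, width, rng))
        n_in = width
    mean_head = init_dense(n_in, LATENT_DIM, rng)
    logvar_head = init_dense(n_in, LATENT_DIM, rng)
    dec, n_in = [], LATENT_DIM
    for width in DECODER_WIDTHS:
        dec.append(init_dense(n_in, width, rng))
        n_in = width
    out_layer = init_dense(n_in, input_dim, rng)
    return {"enc": enc, "mean": mean_head, "logvar": logvar_head,
            "dec": dec, "out": out_layer}


def forward(model, x, rng):
    """Encode x, sample a latent point, and decode it back to input space."""
    h = x
    for layer in model["enc"]:
        h = dense(h, layer, relu=True)
    mean = dense(h, model["mean"])
    logvar = dense(h, model["logvar"])
    # Reparameterization: z = mean + sigma * eps, with eps ~ N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
         for m, lv in zip(mean, logvar)]
    h = z
    for layer in model["dec"]:
        h = dense(h, layer, relu=True)
    return dense(h, model["out"]), mean, logvar


def kl_divergence(mean, logvar):
    """KL(N(mean, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mean, logvar))


rng = random.Random(0)
input_dim = 72  # hypothetical flattened-coordinate length, not from the source
model = build_vae(input_dim, rng)
x = [rng.random() for _ in range(input_dim)]
recon, mean, logvar = forward(model, x, rng)
mse = sum((a - b) ** 2 for a, b in zip(x, recon)) / input_dim
print(len(recon), len(mean), mse + kl_divergence(mean, logvar))
```

The two latent heads are what distinguish the VAE row from the AE row: the encoder outputs a mean and a log-variance for each of the 2 latent dimensions rather than a single deterministic code. New conformations can then be generated by decoding points sampled from the latent space.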
Protein Systems | Length (aa) | Temperature (K) | Ionic Strength (mM) | Time (ns) |
---|---|---|---|---|
Intrinsically Disordered Proteins | | | | |
RS1 | 24 | 298 | 150 | 1000 |
Abeta40 | 40 | 277 | 20 | |
PaaA2 | 71 | 298 | 500 | |
R17 | 100 | 295 | 108 | |
α-synuclein | 140 | 285.5 | 150 | |
Ordered Proteins | | | | |
Ubiquitin | 76 | 298 | 50 | 1000 |
BPTI | 58 | 300 | | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhu, J.-J.; Zhang, N.-J.; Wei, T.; Chen, H.-F. Enhancing Conformational Sampling for Intrinsically Disordered and Ordered Proteins by Variational Autoencoder. Int. J. Mol. Sci. 2023, 24, 6896. https://doi.org/10.3390/ijms24086896