3DCONS-DB: A Database of Position-Specific Scoring Matrices in Protein Structures
Abstract
1. Introduction
2. Results
2.1. Domain and Non-Domain Region Analysis
2.2. Secondary Structure Prediction with 3DCONS-DB
2.3. Residue Contact Number Prediction with 3DCONS-DB
3. Discussion
4. Materials and Methods
4.1. Comparison of Domain and Non-Domain Regions
4.2. Database and Web Server
4.3. The Web Client
Supplementary Materials
Acknowledgments
Author Contributions
Conflicts of Interest
References
- Yang, Y.; Heffernan, R.; Paliwal, K.; Lyons, J.; Dehzangi, A.; Sharma, A.; Wang, J.; Sattar, A.; Zhou, Y. Spider2: A package to predict secondary structure, accessible surface area, and main-chain torsional angles by deep neural networks. Methods Mol. Biol. 2017, 1484, 55–63.
- Wang, S.; Peng, J.; Ma, J.; Xu, J. Protein secondary structure prediction using deep convolutional neural fields. Sci. Rep. 2016, 6, 18962.
- Skwark, M.J.; Raimondi, D.; Michel, M.; Elofsson, A. Improved contact predictions using the recognition of protein like contact patterns. PLoS Comput. Biol. 2014, 10, e1003889.
- Ishida, T.; Kinoshita, K. Prdos: Prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res. 2007, 35, W460–W464.
- Zhou, J.; Xu, R.; He, Y.; Lu, Q.; Wang, H.; Kong, B. Pdnasite: Identification of DNA-binding site from protein sequence by incorporating spatial and sequence context. Sci. Rep. 2016, 6, 27653.
- Wei, L.; Tang, J.; Zou, Q. Local-DPP: An improved DNA-binding protein prediction method by exploring local evolutionary information. Inf. Sci. 2017, 384, 135–144.
- Melo, R.; Fieldhouse, R.; Melo, A.; Correia, J.D.; Cordeiro, M.N.; Gumus, Z.H.; Costa, J.; Bonvin, A.M.; Moreira, I.S. A machine learning approach for hot-spot detection at protein-protein interfaces. Int. J. Mol. Sci. 2016, 17.
- Marchler-Bauer, A.; Derbyshire, M.K.; Gonzales, N.R.; Lu, S.; Chitsaz, F.; Geer, L.Y.; Geer, R.C.; He, J.; Gwadz, M.; Hurwitz, D.I.; et al. CDD: Ncbi’s conserved domain database. Nucleic Acids Res. 2015, 43, D222–D226.
- Finn, R.D.; Coggill, P.; Eberhardt, R.Y.; Eddy, S.R.; Mistry, J.; Mitchell, A.L.; Potter, S.C.; Punta, M.; Qureshi, M.; Sangrador-Vegas, A.; et al. The pfam protein families database: Towards a more sustainable future. Nucleic Acids Res. 2016, 44, D279–D285.
- Letunic, I.; Doerks, T.; Bork, P. Smart: Recent updates, new developments and status in 2015. Nucleic Acids Res. 2015, 43, D257–D260.
- Tatusov, R.L.; Fedorova, N.D.; Jackson, J.D.; Jacobs, A.R.; Kiryutin, B.; Koonin, E.V.; Krylov, D.M.; Mazumder, R.; Mekhedov, S.L.; Nikolskaya, A.N.; et al. The COG database: An updated version includes eukaryotes. BMC Bioinform. 2003, 4, 41.
- Haft, D.H.; Selengut, J.D.; Richter, R.A.; Harkins, D.; Basu, M.K.; Beck, E. Tigrfams and genome properties in 2013. Nucleic Acids Res. 2013, 41, D387–D395.
- Gowri, V.S.; Krishnadev, O.; Swamy, C.S.; Srinivasan, N. Mulpssm: A database of multiple position-specific scoring matrices of protein domain families. Nucleic Acids Res. 2006, 34, D243–D246.
- Shameer, K.; Nagarajan, P.; Gaurav, K.; Sowdhamini, R. 3PFDB—A database of best representative pssm profiles (brps) of protein families generated using a novel data mining approach. BioData Min. 2009, 2, 8.
- Dawson, N.L.; Sillitoe, I.; Lees, J.G.; Lam, S.D.; Orengo, C.A. CATH-Gene3d: Generation of the resource and its use in obtaining structural and functional annotations for protein sequences. Methods Mol. Biol. 2017, 1558, 79–110.
- Oates, M.E.; Stahlhacke, J.; Vavoulis, D.V.; Smithers, B.; Rackham, O.J.; Sardar, A.J.; Zaucha, J.; Thurlby, N.; Fang, H.; Gough, J. The superfamily 1.75 database in 2014: A doubling of data. Nucleic Acids Res. 2015, 43, D227–D233.
- Sillitoe, I.; Lewis, T.E.; Cuff, A.; Das, S.; Ashford, P.; Dawson, N.L.; Furnham, N.; Laskowski, R.A.; Lee, D.; Lees, J.G.; et al. Cath: Comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res. 2015, 43, D376–D381.
- Andreeva, A.; Howorth, D.; Chothia, C.; Kulesha, E.; Murzin, A.G. Scop2 prototype: A new approach to protein structure mining. Nucleic Acids Res. 2014, 42, D310–D314.
- Berman, H.; Henrick, K.; Nakamura, H.; Markley, J.L. The worldwide protein data bank (wwPDB): Ensuring a single, uniform archive of pdb data. Nucleic Acids Res. 2007, 35, D301–D303.
- Dinkel, H.; Van Roey, K.; Michael, S.; Kumar, M.; Uyar, B.; Altenberg, B.; Milchevskaya, V.; Schneider, M.; Kuhn, H.; Behrendt, A.; et al. Elm 2016—Data update and new functionality of the eukaryotic linear motif resource. Nucleic Acids Res. 2016, 44, D294–D300.
- Byun, J.A.; Melacini, G. Disordered regions flanking ordered domains modulate signaling transduction. Biophys. J. 2015, 109, 2447–2448.
- Wright, P.E.; Dyson, H.J. Intrinsically disordered proteins in cellular signalling and regulation. Nat. Rev. Mol. Cell. Biol. 2015, 16, 18–29.
- Williamson, R.M. Information theory analysis of the relationship between primary sequence structure and ligand recognition among a class of facilitated transporters. J. Theor. Biol. 1995, 174, 179–188.
- Jones, D.T. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 1999, 292, 195–202.
- Yuan, Z. Better prediction of protein contact number using a support vector regression analysis of amino acid sequence. BMC Bioinform. 2005, 6, 248.
- Kabsch, W.; Sander, C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22, 2577–2637.
- Hornbeck, P.V.; Zhang, B.; Murray, B.; Kornhauser, J.M.; Latham, V.; Skrzypek, E. Phosphositeplus, 2014: Mutations, ptms and recalibrations. Nucleic Acids Res. 2015, 43, D512–D520.
- Segura, J.; Sanchez-Garcia, R.; Martinez, M.; Cuenca-Alba, J.; Tabas-Madrid, D.; Sorzano, C.O.S.; Carazo, J.M. 3DBIONOTES v2.0: A web server for the automatic annotation of macromolecular structures. Bioinformatics 2017, 33, 3655–3657.
- Tabas-Madrid, D.; Segura, J.; Sanchez-Garcia, R.; Cuenca-Alba, J.; Sorzano, C.O.; Carazo, J.M. 3DBIONOTES: A unified, enriched and interactive view of macromolecular information. J. Struct. Biol. 2016, 194, 231–234.
- Wu, T.J.; Shamsaddini, A.; Pan, Y.; Smith, K.; Crichton, D.J.; Simonyan, V.; Mazumder, R. A framework for organizing cancer-related variations from existing databases, publications and NGS data using a high-performance integrated virtual environment (HIVE). Database 2014, 2014.
- Altschul, S.F.; Madden, T.L.; Schaffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped blast and psi-blast: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402.
- Suzek, B.E.; Wang, Y.; Huang, H.; McGarvey, P.B.; Wu, C.H.; UniProt, C. Uniref clusters: A comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 2015, 31, 926–932.
- Rose, A.S.; Hildebrand, P.W. NGL viewer: A web application for molecular visualization. Nucleic Acids Res. 2015, 43, W576–W579.
Sample Availability: Not available.
Region | Residues (%) 1 | SS (%) 2 | BS (%) 3 | PTM (%) 4 | SLiM (%) 5 | Variants (%) 6 |
---|---|---|---|---|---|---|
Domain | 78 | 81 | 78 | 62 | 63 | 77 |
Non-domain | 22 | 19 | 22 | 38 | 37 | 23 |
Region 1 | Gap Freq. (%) 2 | Entropy 3 | Entropy 4 |
---|---|---|---|
Domain | 1.8 | 1.36 | 1.97 |
Non-domain | 10.5 | 1.11 | 1.62 |
Threshold 1 | 8 Å | 10 Å | 12 Å | 14 Å |
---|---|---|---|---|
Yuan et al. 2 | 0.77 | 0.75 | 0.72 | 0.72 |
3DCONS-DB 3 | 0.62 | 0.64 | 0.68 | 0.69 |
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Sanchez-Garcia, R.; Sorzano, C.O.S.; Carazo, J.M.; Segura, J. 3DCONS-DB: A Database of Position-Specific Scoring Matrices in Protein Structures. Molecules 2017, 22, 2230. https://doi.org/10.3390/molecules22122230