Prediction of Protein–Protein Interaction Sites Using Convolutional Neural Network and Improved Data Sets
Abstract
1. Introduction
- Sequence-based methods. These methods predict protein interaction sites from features extracted from protein sequences. PPiPP [48] uses the position-specific scoring matrix (PSSM) and amino acid composition to predict PPI sites and achieves an area under the receiver operating characteristic (ROC) curve (AUC) of 0.729. DLPred [19], which uses a long short-term memory (LSTM) network to learn from features such as PSSM, physical properties, and hydropathy index, obtains a higher AUC of 0.811. Even so, more information is needed to improve prediction accuracy further.
- Structure-based methods. The three-dimensional (3D) structure of a protein complex provides valuable information about its interaction sites [14], and some PPI site predictors use this structural information for prediction. ProMate combines all the significant interface properties and reaches a success rate of 0.70 [33]. Bradford and Westhead [21] achieved a prediction success rate of 0.76 based on protein structure data.
- Methods based on integrated information. The 3D structures of proteins are far more difficult and expensive to elucidate than protein sequences, so the number of structures in protein structure databases such as the Protein Data Bank (PDB) [49] is much smaller than the number of sequences in protein sequence databases such as UniProt [50]. Therefore, most methods combine structural and sequence information to predict PPI sites. Li et al. [38] use physicochemical properties, sequence conservation, residue disorder, secondary structure, solvent accessibility, and five 3D structural features to train a random forest model to predict PPI sites. SPPIDER [51] uses relative solvent accessibility (RSA) together with sequence and structure features to predict PPI sites, and demonstrates that RSA prediction-based fingerprints of protein interactions significantly improve the discrimination between interacting and noninteracting sites. It yields an overall classification accuracy of about 0.74 and a Matthews correlation coefficient (MCC) of 0.42. IntPred [39] uses 11 sequence and structure features and obtains a specificity of 0.916 and a sensitivity of 0.411. PAIRpred [24] captures sequence and structure information about residue pairs through pairwise kernels that are used to train a support vector machine classifier. With its structure kernel, this method achieves a remarkable AUC of 0.870 and is also evaluated by the rank of the first positive prediction (RFPP) on the 176 complexes in the protein–protein docking benchmark version 4.0 (DBD 4.0) [52].
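The evaluation measures quoted in the list above (sensitivity, specificity, accuracy, MCC, and AUC) can all be computed from predicted labels and scores. The following is a minimal sketch based on their standard textbook definitions; the function names and the toy counts and scores are illustrative assumptions and are not taken from any of the cited methods.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # fraction of true sites recovered
    specificity = tn / (tn + fp)            # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy numbers, chosen only to exercise the formulas.
metrics = confusion_metrics(tp=40, fp=10, tn=90, fn=60)
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(metrics)
print(toy_auc)
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a rank-based formulation would be preferable for large data sets.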
2. Results
2.1. Distribution Tendency of Residues in Proteins
2.2. Residue Binding Propensity
2.3. Positive Samples with High Binding Propensity
2.4. Comparison with Randomly Sampled Data Set
2.5. Comparison with Existing Methods
3. Discussion
4. Materials and Methods
4.1. Data Sets
4.2. Definition of Interacting Residue Pairs
4.3. Distribution Tendency of Residues in Proteins
4.4. Binding Propensity of Residue Pairs
4.5. Features
4.5.1. Amino Acid Encoding
4.5.2. Sequence Features
Profile Features
Amino Acid Physicochemical Properties
4.5.3. Structure Features
4.6. Deep Learning Model
4.6.1. Input
4.6.2. Convolutional Layers
4.6.3. Pooling Layers
4.6.4. Fully Connected Layer
4.6.5. Activation Function and Loss Function
4.6.6. Model Optimization
4.7. Performance Measure
4.8. Validation on Randomly Sampled Data
Supplementary Materials
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
PPIs | Protein–protein interactions |
AR | Abundance of residues |
AIR | Abundance of interacting residues |
RAIR | Relative abundance of interacting residues |
References
- Keskin, O.; Gursoy, A.; Ma, B.; Nussinov, R. Principles of Protein−Protein Interactions: What are the Preferred Ways For Proteins To Interact? Chem. Rev. 2008, 108, 1225–1244.
- Chang, J.W.; Zhou, Y.Q.; Ul Qamar, M.T.; Chen, L.L.; Ding, Y.D. Prediction of Protein-Protein Interactions by Evidence Combining Methods. Int. J. Mol. Sci. 2016, 17, 1946.
- Wang, L.; You, Z.H.; Xia, S.X.; Liu, F.; Chen, X.; Yan, X.; Zhou, Y. Advancing the prediction accuracy of protein-protein interactions by utilizing evolutionary information from position-specific scoring matrix and ensemble classifier. J. Theor. Biol. 2017, 418, 105–110.
- Zhang, J.; Kurgan, L. Review and comparative assessment of sequence-based predictors of protein-binding residues. Brief. Bioinform. 2018, 19, 821–837.
- Zhang, M.; Su, Q.; Lu, Y.; Zhao, M.; Niu, B. Application of Machine Learning Approaches for Protein-protein Interactions Prediction. Med. Chem. 2017, 13, 506–514.
- Clackson, T.; Wells, J. A hot spot of binding energy in a hormone-receptor interface. Science 1995, 267, 383–386.
- Bogan, A.A.; Thorn, K.S. Anatomy of hot spots in protein interfaces. J. Mol. Biol. 1998, 280, 1–9.
- Reichmann, D.; Rahat, O.; Cohen, M.; Neuvirth, H.; Schreiber, G. The molecular architecture of protein-protein binding sites. Curr. Opin. Struct. Biol. 2007, 17, 67–76.
- Moreira, I.S.; Fernandes, P.A.; Ramos, M.J. Hot spots—A review of the protein-protein interface determinant amino-acid residues. Proteins 2007, 68, 803–812.
- Ofran, Y.; Rost, B. Protein Interaction Hotspots Carved into Sequences. PLoS Comput. Biol. 2007, 3, e119.
- Gallet, X.; Charloteaux, B.; Thomas, A.; Brasseur, R. A fast method to predict protein interaction sites from sequences. J. Mol. Biol. 2000, 302, 917–926.
- Chen, X.W.; Jeong, J.C. Sequence-based prediction of protein interaction sites with an integrative method. Bioinformatics 2009, 25, 585–591.
- Lalonde, S.; Ehrhardt, D.W.; Loque, D.; Chen, J.; Rhee, S.Y.; Frommer, W.B. Molecular and cellular approaches for the detection of protein-protein interactions: Latest techniques and current limitations. Plant. J. 2008, 53, 610–635.
- Du, X.; Sun, S.; Hu, C.; Li, X.; Xia, J. Prediction of protein-protein interaction sites by means of ensemble learning and weighted feature descriptor. J. Biol. Res. (Thessalon) 2016, 23, 10.
- Chen, P.; Li, J. Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information. BMC Bioinform. 2010, 11.
- Li, Z.W.; You, Z.H.; Chen, X.; Li, L.P.; Huang, D.S.; Yan, G.Y.; Nie, R.; Huang, Y.A. Accurate prediction of protein-protein interactions by integrating potential evolutionary information embedded in PSSM profile and discriminative vector machine classifier. Oncotarget 2017, 8, 23638–23649.
- Cuendet, M.A.; Michielin, O. Protein-protein interaction investigated by steered molecular dynamics: The TCR-pMHC complex. Biophys. J. 2008, 95, 3575–3590.
- Terashi, G.; Takeda-Shitaka, M.; Takaya, D.; Komatsu, K.; Umeyama, H. Searching for protein-protein interaction sites and docking by the methods of molecular dynamics, grid scoring, and the pairwise interaction potential of amino acid residues. Proteins 2010, 60, 289–295.
- Zhang, B.; Li, J.; Quan, L.; Chen, Y.; Lü, Q. Sequence-based prediction of protein-protein interaction sites by simplified long short-term memory network. Neurocomputing 2019, 357, 86–100.
- Koike, A.; Takagi, T. Prediction of protein–protein interaction sites using support vector machines. Protein Eng. Des. Sel. 2004, 17, 165–173.
- Bradford, J.R.; Westhead, D.R. Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics 2004, 21, 1487–1494.
- Wang, B.; Chen, P.; Huang, D.S.; Li, J.J.; Lok, T.M.; Lyu, M.R. Predicting protein interaction sites from residue spatial sequence profile and evolution rate. FEBS Lett. 2006, 580, 380–384.
- Zellner, H.; Staudigel, M.; Trenner, T.; Bittkowski, M.; Wolowski, V.; Icking, C.; Merkl, R. PresCont: Predicting protein-protein interfaces utilizing four residue properties. Proteins 2012, 80, 154–168.
- Minhas, F.; Geiss, B.J.; Ben-Hur, A. PAIRpred: Partner-specific prediction of interacting residues from sequence and structure. Proteins 2014, 82, 1142–1155.
- Dong, Q.; Wang, X.; Lin, L.; Guan, Y. Exploiting residue-level and profile-level interface propensities for usage in binding sites prediction of proteins. BMC Bioinform. 2007, 8, 147–160.
- Sriwastava, B.K.; Basu, S.; Maulik, U. Protein-protein interaction site prediction in Homo sapiens and E. coli using an interaction-affinity based membership function in fuzzy SVM. J. Biosci. 2015, 40, 809–818.
- Zhou, H.X.; Shan, Y. Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins 2001, 44, 336–343.
- Fariselli, P.; Pazos, F.; Valencia, A.; Casadio, R. Prediction of protein–protein interaction sites in heterocomplexes with neural networks. Eur. J. Biochem. 2002, 269, 1356–1361.
- Ofran, Y.; Rost, B. Predicted protein-protein interaction sites from local sequence information. FEBS Lett. 2003, 544, 236–239.
- Chen, H.; Zhou, H.X. Prediction of interface residues in protein–protein complexes by a consensus neural network method: Test against NMR data. Proteins 2005, 61, 21–35.
- Ofran, Y.; Rost, B. ISIS: Interaction sites identified from sequence. Bioinformatics 2007, 23, e13–e16.
- Singh, G.; Dhole, K.; Pai, P.P.; Mondal, S. SPRINGS: Prediction of protein-protein interaction sites using artificial neural networks. PeerJ PrePrints 2014, 2167–9843.
- Neuvirth, H.; Raz, R.; Schreiber, G. ProMate: A structure based prediction program to identify the location of protein-protein binding sites. J. Mol. Biol. 2004, 338, 181–199.
- Bradford, J.R.; Needham, C.J.; Bulpitt, A.J.; Westhead, D.R. Insights into protein-protein interfaces using a Bayesian network prediction method. J. Mol. Biol. 2006, 362, 365–386.
- Murakami, Y.; Mizuguchi, K. Applying the Naive Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites. Bioinformatics 2010, 26, 1841–1848.
- Geng, H.; Lu, T.; Lin, X.; Liu, Y.; Yan, F. Prediction of Protein-Protein Interaction Sites Based on Naive Bayes Classifier. Biochem. Res. Int. 2015, 2015, 978193.
- Šikić, M.; Tomić, S.; Vlahoviček, K. Prediction of protein–protein interaction sites in sequences and 3D structures by random forests. PLoS Comput. Biol. 2009, 5, e1000278.
- Li, B.Q.; Feng, K.Y.; Chen, L.; Huang, T.; Cai, Y.D. Prediction of protein-protein interaction sites by random forest algorithm with mRMR and IFS. PLoS ONE 2012, 7, e43927.
- Northey, T.; Baresic, A.; Martin, A.C.R. IntPred: A structure-based predictor of protein-protein interaction sites. Bioinformatics 2017, 34, 223–229.
- Wei, Z.S.; Yang, J.Y.; Shen, H.B.; Yu, D.-J. A Cascade Random Forests Algorithm for Predicting Protein-Protein Interaction Sites. IEEE Trans. Nanobiosci. 2015, 14, 1.
- Li, M.H.; Lin, L.; Wang, X.L.; Liu, T. Protein-protein interaction site prediction based on conditional random fields. Bioinformatics 2007, 23, 597–604.
- Wang, D.D.; Wang, R.; Yan, H. Fast prediction of protein–protein interaction sites based on Extreme Learning Machines. Neurocomputing 2014, 128, 258–266.
- Dhole, K.; Singh, G.; Pai, P.P.; Mondal, S. Sequence-based prediction of protein-protein interaction sites with L1-logreg classifier. J. Theor. Biol. 2014, 348, 47–54.
- Deng, L.; Guan, J.; Dong, Q.; Zhou, S. Prediction of protein-protein interaction sites using an ensemble method. BMC Bioinform. 2009, 10, 426.
- Jia, J.; Liu, Z.; Xiao, X.; Liu, B.; Chou, K.C. Identification of protein-protein binding sites by incorporating the physicochemical properties and stationary wavelet transforms into pseudo amino acid composition. J. Biomol. Struct. Dyn. 2016, 34, 1946–1961.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems 25 (NIPS 2012), Lake Tahoe, NV, USA, 3–12 December 2012; pp. 1097–1105.
- Kim, Y. Convolutional neural networks for sentence classification. arXiv 2014, arXiv:1408.5882.
- Ahmad, S.; Mizuguchi, K. Partner-aware prediction of interacting residues in protein-protein complexes from sequence data. PLoS ONE 2011, 6, e29104.
- Burley, S.K.; Berman, H.M.; Bhikadiya, C.; Bi, C.; Chen, L.; Di Costanzo, L.; Christie, C.; Dalenberg, K.; Duarte, J.M.; Dutta, S.; et al. RCSB Protein Data Bank: Biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res. 2019, 47, D464–D474.
- Rolf, A.; Amos, B.; Wu, C.H.; Barker, W.C.; Brigitte, B.; Serenella, F.; Elisabeth, G.; Huang, H.; Rodrigo, L.; Michele, M. UniProt: The Universal Protein knowledgebase. Nucleic Acids Res. 2004, 46, 2699.
- Porollo, A.; Meller, J. Prediction-based fingerprints of protein-protein interactions. Proteins 2007, 66, 630–645.
- Hwang, H.; Vreven, T.; Janin, J.; Weng, Z. Protein-protein docking benchmark version 4.0. Proteins 2010, 78, 3111–3114.
- Vreven, T.; Moal, I.H.; Vangone, A.; Pierce, B.G.; Kastritis, P.L.; Torchala, M.; Chaleil, R.; Jiménez-García, B.; Bates, P.A.; Fernandez-Recio, J. Updates to the Integrated Protein–Protein Interaction Benchmarks: Docking Benchmark Version 5 and Affinity Benchmark Version 2. J. Mol. Biol. 2015, 427, 3031–3041.
- Wei, Z.; Han, K.; Yang, J.; Shen, H.; Yu, D. Protein–protein interaction sites prediction by ensembling SVM and sample-weighted random forests. Neurocomputing 2016, 193, 201–212.
- Faraggi, E.; Zhang, T.; Yang, Y.; Kurgan, L.; Zhou, Y. SPINE X: Improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles. J. Comput. Chem. 2012, 33, 259–267.
- Aumentado-Armstrong, T.T.; Istrate, B.; Murgita, R.A. Algorithmic approaches to protein-protein interaction site prediction. Algorithms Mol. Biol. 2015, 10, 7.
- Kuo, T.H.; Li, K.B. Predicting Protein-Protein Interaction Sites Using Sequence Descriptors and Site Propensity of Neighboring Amino Acids. Int. J. Mol. Sci. 2016, 17, 1788–1806.
- Ofran, Y.; Rost, B. Analysing Six Types of Protein–Protein Interfaces. J. Mol. Biol. 2003, 325, 377–387.
- Samanta, U.; Pal, D.; Chakrabarti, P. Environment of tryptophan side chains in proteins. Proteins 2000, 38, 288–300.
- Liu, T.Y. Easyensemble and feature selection for imbalance data sets. In Proceedings of the 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing (IJCBS 2009), Shanghai, China, 3–5 August 2009; pp. 517–520.
- Kabsch, W.; Sander, C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22, 2577–2637.
- Sander, C.; Schneider, R. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins 1991, 9, 56–68.
- Rost, B.; Sander, C. Conservation and prediction of solvent accessibility in protein families. Proteins 1994, 20, 216–226.
- Li, H.; Pi, D.; Wang, C. The Prediction of Protein-Protein Interaction Sites Based on RBF Classifier Improved by SMOTE. Math. Probl. Eng. 2014, 2014, 1–7.
- Jing, X.; Dong, Q.; Hong, D.C.; Lu, R. Amino acid encoding methods for protein sequences: A comprehensive review and assessment. IEEE/ACM Trans. Comput. Biol. Bioinform. 2019, 1–14.
- Altschul, S.F. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402.
- Du, X.; Sun, S.; Hu, C.; Yao, Y.; Yan, Y.; Zhang, Y. DeepPPI: Boosting Prediction of Protein-Protein Interactions with Deep Neural Networks. J. Chem. Inf. Model. 2017, 57, 1499.
- Jones, S.; Thornton, J.M. Analysis of protein-protein interaction sites using surface patches. J. Mol. Biol. 1997, 272, 121–132.
- Jasti, L.S.; Fadnavis, N.W.; Addepally, U.; Daniels, S.; Deokar, S.; Ponrathnam, S. Comparison of polymer induced and solvent induced trypsin denaturation: The role of hydrophobicity. Colloids Surf. B Biointerfaces 2014, 116, 201–205.
- Chanphai, P.; Bekale, L.; Tajmir-Riahi, H.A. Effect of hydrophobicity on protein–protein interactions. Eur. Polym. J. 2015, 67, 224–231.
- Mihel, J.; Sikic, M.; Tomic, S.; Jeren, B.; Vlahovicek, K. PSAIA–protein structure and interaction analyzer. BMC Struct. Biol. 2008, 8, 21.
Residue | Nw | ARw | Ns | ARs | ARw/ARs |
---|---|---|---|---|---|
A | 9558 | 0.071 | 3599 | 0.058 | 1.22 |
L | 11,587 | 0.087 | 2913 | 0.047 | 1.85 |
I | 6700 | 0.050 | 1462 | 0.024 | 2.08 |
V | 9407 | 0.070 | 2207 | 0.036 | 1.94 |
G | 9930 | 0.074 | 4839 | 0.078 | 0.95 |
K | 8155 | 0.061 | 5656 | 0.092 | 0.66 |
R | 6093 | 0.046 | 3820 | 0.062 | 0.74 |
D | 7561 | 0.057 | 4823 | 0.078 | 0.73 |
E | 8667 | 0.065 | 5700 | 0.092 | 0.71 |
H | 3128 | 0.023 | 1448 | 0.023 | 1.00 |
N | 5812 | 0.043 | 3673 | 0.060 | 0.72 |
Q | 5478 | 0.041 | 3396 | 0.055 | 0.75 |
S | 9575 | 0.072 | 5498 | 0.089 | 0.81 |
T | 8345 | 0.062 | 4069 | 0.066 | 0.94 |
C | 2742 | 0.020 | 518 | 0.008 | 2.50 |
M | 2785 | 0.021 | 791 | 0.013 | 1.62 |
Y | 4811 | 0.036 | 1812 | 0.029 | 1.24 |
W | 2035 | 0.015 | 552 | 0.009 | 1.67 |
F | 5132 | 0.038 | 1101 | 0.018 | 2.11 |
P | 6294 | 0.047 | 3845 | 0.062 | 0.76 |
Total | 133,795 | 1 | 61,722 | 1 | |
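
The AR columns in the table above appear to be plain frequencies (each count divided by its column total), and the last column appears to be the ratio of the two rounded abundances. The following minimal sketch reproduces three rows under that reading; the variable and function names are illustrative assumptions, and the rounding steps are inferred from the tabulated values rather than stated in the text.

```python
# Counts copied from the table above for three residues, plus the column totals.
counts_w = {"A": 9558, "L": 11587, "C": 2742}   # Nw column
counts_s = {"A": 3599, "L": 2913, "C": 518}     # Ns column
TOTAL_W = 133795
TOTAL_S = 61722

def abundance(count, total):
    """Abundance of a residue type: its share of all residues, to 3 decimals."""
    return round(count / total, 3)

def abundance_ratio(residue):
    """ARw/ARs computed from the rounded abundances, to 2 decimals."""
    ar_w = abundance(counts_w[residue], TOTAL_W)
    ar_s = abundance(counts_s[residue], TOTAL_S)
    return round(ar_w / ar_s, 2)

ratios = {res: abundance_ratio(res) for res in counts_w}
print(ratios)  # {'A': 1.22, 'L': 1.85, 'C': 2.5}
```

Computing the ratio from the unrounded frequencies gives slightly different second decimals for some residues, so the tabulated ratios are best compared against the rounded abundances.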
i\j | A | L | I | V | G | K | R | D | E | H | N | Q | S | T | C | M | Y | W | F | P |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | 0.47 | 1.58 | 1.80 | 1.24 | 1.07 | 0.58 | 1.00 | 0.60 | 0.61 | 1.22 | 0.99 | 0.99 | 0.79 | 1.09 | 2.31 | 1.88 | 2.83 | 2.52 | 2.74 | 0.44 |
L | 0.82 | 1.17 | 2.68 | 1.43 | 0.80 | 0.62 | 0.96 | 0.46 | 0.50 | 1.35 | 0.83 | 0.99 | 0.73 | 0.94 | 2.96 | 2.75 | 2.57 | 3.44 | 3.03 | 0.61 |
I | 0.75 | 2.16 | 1.38 | 1.25 | 0.93 | 0.61 | 0.97 | 0.61 | 0.52 | 1.53 | 0.90 | 1.09 | 0.66 | 0.67 | 2.76 | 1.31 | 2.50 | 3.39 | 3.36 | 0.59 |
V | 0.74 | 1.65 | 1.79 | 0.89 | 0.99 | 0.53 | 1.24 | 0.71 | 0.54 | 1.19 | 0.98 | 0.91 | 0.69 | 0.93 | 1.98 | 2.00 | 3.16 | 2.87 | 2.29 | 0.59 |
G | 0.92 | 1.33 | 1.90 | 1.42 | 0.56 | 0.62 | 1.15 | 0.66 | 0.53 | 1.01 | 1.10 | 1.02 | 0.59 | 1.13 | 3.48 | 2.12 | 2.70 | 3.76 | 2.33 | 0.53 |
K | 0.62 | 1.28 | 1.56 | 0.95 | 0.78 | 0.19 | 0.76 | 1.35 | 1.43 | 1.04 | 0.91 | 0.76 | 0.91 | 0.85 | 2.08 | 1.28 | 2.88 | 2.66 | 1.99 | 0.57 |
R | 0.63 | 1.16 | 1.45 | 1.30 | 0.84 | 0.44 | 0.41 | 1.23 | 1.04 | 1.17 | 1.06 | 1.02 | 0.76 | 0.91 | 2.07 | 1.64 | 2.68 | 3.83 | 2.12 | 0.65 |
D | 0.58 | 0.88 | 1.43 | 1.16 | 0.76 | 1.23 | 1.93 | 0.28 | 0.57 | 1.18 | 1.09 | 1.10 | 0.83 | 0.95 | 1.48 | 1.02 | 2.85 | 2.28 | 2.00 | 0.45 |
E | 0.63 | 0.99 | 1.28 | 0.93 | 0.64 | 1.38 | 1.71 | 0.60 | 0.28 | 1.40 | 1.06 | 0.93 | 0.97 | 1.08 | 1.14 | 2.20 | 2.08 | 2.14 | 2.09 | 0.62 |
H | 0.74 | 1.59 | 2.22 | 1.21 | 0.72 | 0.59 | 1.14 | 0.73 | 0.82 | 0.50 | 1.10 | 0.83 | 0.71 | 0.98 | 3.07 | 2.10 | 2.19 | 2.75 | 2.23 | 0.64 |
N | 0.73 | 1.18 | 1.59 | 1.21 | 0.95 | 0.63 | 1.25 | 0.83 | 0.76 | 1.34 | 0.54 | 1.12 | 0.75 | 1.22 | 1.41 | 1.23 | 2.85 | 2.26 | 1.76 | 0.66 |
Q | 0.75 | 1.46 | 1.99 | 1.16 | 0.90 | 0.54 | 1.25 | 0.86 | 0.69 | 1.04 | 1.16 | 0.42 | 0.75 | 1.07 | 1.35 | 1.62 | 2.37 | 2.46 | 2.43 | 0.80 |
S | 0.77 | 1.37 | 1.54 | 1.12 | 0.67 | 0.82 | 1.19 | 0.82 | 0.91 | 1.14 | 0.99 | 0.95 | 0.37 | 1.05 | 3.53 | 2.00 | 2.12 | 2.92 | 2.16 | 0.68 |
T | 0.82 | 1.36 | 1.19 | 1.16 | 0.98 | 0.59 | 1.08 | 0.73 | 0.78 | 1.20 | 1.23 | 1.04 | 0.81 | 0.51 | 1.10 | 1.68 | 2.85 | 3.61 | 2.21 | 0.66 |
C | 0.76 | 1.88 | 2.17 | 1.09 | 1.34 | 0.64 | 1.09 | 0.50 | 0.37 | 1.67 | 0.63 | 0.58 | 1.20 | 0.48 | 0.63 | 2.49 | 2.36 | 2.78 | 3.09 | 0.63 |
M | 0.74 | 2.08 | 1.23 | 1.32 | 0.97 | 0.47 | 1.03 | 0.41 | 0.84 | 1.36 | 0.65 | 0.83 | 0.81 | 0.88 | 2.97 | 1.19 | 2.54 | 1.86 | 3.80 | 0.73 |
Y | 0.80 | 1.41 | 1.70 | 1.50 | 0.89 | 0.76 | 1.22 | 0.83 | 0.57 | 1.03 | 1.10 | 0.88 | 0.62 | 1.08 | 2.03 | 1.84 | 0.70 | 3.03 | 2.82 | 0.91 |
W | 0.62 | 1.63 | 1.99 | 1.18 | 1.08 | 0.61 | 1.51 | 0.57 | 0.51 | 1.11 | 0.75 | 0.79 | 0.74 | 1.19 | 2.07 | 1.16 | 2.62 | 1.67 | 1.81 | 0.92 |
F | 0.83 | 1.78 | 2.43 | 1.16 | 0.82 | 0.56 | 1.03 | 0.62 | 0.62 | 1.11 | 0.72 | 0.96 | 0.67 | 0.90 | 2.84 | 2.94 | 3.01 | 2.23 | 1.12 | 0.84 |
P | 0.49 | 1.31 | 1.58 | 1.11 | 0.69 | 0.59 | 1.15 | 0.52 | 0.68 | 1.18 | 0.99 | 1.16 | 0.78 | 0.98 | 2.13 | 2.09 | 3.59 | 4.17 | 3.09 | 0.35 |

Property | A | L | I | V | G | K | R | D | E | H | N | Q | S | T | C | M | Y | W | F | P |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Polarity ¹ | 8.1 | 4.9 | 5.2 | 5.9 | 9.0 | 11.3 | 10.5 | 13.0 | 12.3 | 10.4 | 11.6 | 10.5 | 9.2 | 8.6 | 5.5 | 5.7 | 6.2 | 5.4 | 5.2 | 8.0 |
Hydrophobicity ¹ | 0.62 | 1.06 | 1.38 | 1.08 | 0.48 | −1.5 | −2.53 | −0.9 | −0.74 | −0.4 | −0.78 | −0.85 | −0.18 | −0.05 | 0.29 | 0.64 | 0.26 | 0.81 | 1.19 | 0.12 |
ARw/ARs ² | 1.22 | 1.85 | 2.08 | 1.94 | 0.95 | 0.66 | 0.74 | 0.73 | 0.71 | 1.00 | 0.72 | 0.75 | 0.81 | 0.94 | 2.50 | 1.62 | 1.24 | 1.67 | 2.11 | 0.76 |
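
The polarity, hydrophobicity, and ARw/ARs rows above invite a comparison across the 20 residue types. One simple way to quantify it is a Pearson correlation coefficient; the sketch below is an illustrative calculation on the tabulated values and is not necessarily the analysis the authors performed. The `pearson` helper is a hypothetical name introduced here.

```python
import math

# Values copied from the three rows above, in the column order
# A L I V G K R D E H N Q S T C M Y W F P.
polarity = [8.1, 4.9, 5.2, 5.9, 9.0, 11.3, 10.5, 13.0, 12.3, 10.4,
            11.6, 10.5, 9.2, 8.6, 5.5, 5.7, 6.2, 5.4, 5.2, 8.0]
hydrophobicity = [0.62, 1.06, 1.38, 1.08, 0.48, -1.5, -2.53, -0.9, -0.74, -0.4,
                  -0.78, -0.85, -0.18, -0.05, 0.29, 0.64, 0.26, 0.81, 1.19, 0.12]
ar_ratio = [1.22, 1.85, 2.08, 1.94, 0.95, 0.66, 0.74, 0.73, 0.71, 1.0,
            0.72, 0.75, 0.81, 0.94, 2.5, 1.62, 1.24, 1.67, 2.11, 0.76]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r_hydrophobicity = pearson(hydrophobicity, ar_ratio)
r_polarity = pearson(polarity, ar_ratio)
print(f"hydrophobicity vs ARw/ARs: {r_hydrophobicity:.2f}")
print(f"polarity vs ARw/ARs: {r_polarity:.2f}")
```

On these values the hydrophobicity correlation comes out positive and the polarity correlation negative, which matches the pattern visible in the table: hydrophobic residues have ARw/ARs above 1 and polar or charged residues below 1.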
Data Set | Method | RFPP 10% | RFPP 25% | RFPP 50% | RFPP 75% | RFPP 90% | RFPP 100% |
---|---|---|---|---|---|---|---|
DBD 4.0 (116 Dimers) | PAIRpred | 1 | 4 | 11 | 53 | 194 | 2861 |
DBD 4.0 (116 Dimers) | OURS (High propensity) | 2 | 8 | 26 | 69 | 169 | 580 |
DBD 5.0 (138 Dimers) | OURS (High propensity) | 2 | 8 | 30 | 82 | 224 | 582 |
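
To read the RFPP table above, it helps to see how such numbers can be produced. The sketch below assumes the usual PAIRpred-style definition: for each complex, the RFPP is the rank of the best-scoring true interacting residue pair, and RFPP at p% is the rank within which p% of the complexes have at least one true positive. The percentile convention in `rfpp_at`, the function names, and the toy complexes are all assumptions made for illustration.

```python
import math

def rfpp(scores, labels):
    """1-based rank of the first true positive when pairs are sorted by score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            return rank
    raise ValueError("complex has no positive pair")

def rfpp_at(ranks, percent):
    """Smallest rank r such that at least `percent`% of complexes have RFPP <= r."""
    ordered = sorted(ranks)
    k = math.ceil(percent / 100 * len(ordered))
    return ordered[max(k, 1) - 1]

# Three hypothetical complexes: (predicted scores, true labels) per residue pair.
complexes = [
    ([0.9, 0.2, 0.7], [1, 0, 0]),          # top prediction is correct
    ([0.1, 0.8, 0.5, 0.6], [1, 0, 1, 0]),  # first correct pair is ranked third
    ([0.3, 0.9], [1, 0]),                  # first correct pair is ranked second
]
ranks = [rfpp(scores, labels) for scores, labels in complexes]
summary = {p: rfpp_at(ranks, p) for p in (10, 25, 50, 75, 90, 100)}
print(ranks)
print(summary)
```

Lower values are better at every percentile; the 100% column is the worst case over all complexes, which is why it can be much larger than the others.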
DBD Version | Total No. of Complexes (Dimers) | Used No. of Complexes (Dimers) |
---|---|---|
4.0 | 175 (117) | 174 ¹ (116 ²) |
5.0 | 230 (139) | (138 ³) |
Version | PDB ID |
---|---|
DBD 4.0 and DBD 5.0 | 1ACB 1AK4 1ATN 1AVX 1AY7 1B6C 1BKD 1BUH 1BVN 1CGI 1CLV 1D6R 1DFJ 1E6E 1E96 1EAW 1EFN 1EWY 1F34 1F6M 1FC2 1FFW 1FLE 1FQ1 1FQJ 1GCQ 1GHQ 1GL1 1GLA 1GPW 1GRN 1GXD 1H1V 1H9D 1HE1 1HE8 1I2M 1IBR 1IRA 1J2J 1JIW 1JK9 1JTG 1KAC 1KTZ 1KXP 1KXQ 1LFD 1M10 1MAH 1MQ8 1NW9 1OC0 1OPH 1PPE 1PVH 1PXV 1QA9 1R0R 1R6Q 1R8S 1S1Q 1SBB 1SYX 1T6B 1TMQ 1UDI 1US7 1WQ1 1XD3 1XQS 1Y64 1YVB 1Z0K 1Z5Y 1ZHH 1ZHI 1ZM4 2A5T 2A9K 2ABZ 2AJF 2AYO 2B42 2BTF 2C0L 2CFH 2FJU 2G77 2H7V 2HLE 2HQS 2HRK 2I25 2I9B 2IDO 2J0T 2J7P 2NZ8 2O3B 2O8V 2OOB 2OT3 2OUL 2OZA 2PCC 2SIC 2SNI 2UUY 2VDB 2Z0E 3CPH 3D5S 3SGQ 4CPA 7CEI |
DBD 5.0 | 1JTD 2A1A 2GAF 2YVJ 3A4S 3K75 3PC8 3VLB 4H03 2GTP 2X9A 3BIW 3H2V 4M76 4FZA 4IZ7 3BX7 3DAW 3S9D 3FN1 1RKE 3F1P |
Residue | Max ASA (Å²) | Residue | Max ASA (Å²) |
---|---|---|---|
A | 106 | E | 194 |
L | 164 | H | 184 |
I | 169 | N | 157 |
V | 142 | Q | 198 |
G | 84 | S | 130 |
K | 205 | T | 142 |
R | 248 | C | 135 |
D | 163 | M | 188 |
Y | 222 | F | 197 |
W | 227 | P | 136 |
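
The maximum ASA values above are the usual normalizers for relative solvent accessibility (RSA), computed as a residue's observed ASA divided by the maximum ASA of its type. The sketch below assumes that standard definition; the function name and the example ASA values are hypothetical.

```python
# Maximum accessible surface area per residue type, copied from the table above.
MAX_ASA = {
    "A": 106, "L": 164, "I": 169, "V": 142, "G": 84,
    "K": 205, "R": 248, "D": 163, "Y": 222, "W": 227,
    "E": 194, "H": 184, "N": 157, "Q": 198, "S": 130,
    "T": 142, "C": 135, "M": 188, "F": 197, "P": 136,
}

def relative_solvent_accessibility(residue, asa):
    """RSA = observed ASA / maximum ASA for that residue type (one-letter code)."""
    return asa / MAX_ASA[residue]

# Hypothetical observed ASA values for two residues.
rsa_ala = relative_solvent_accessibility("A", 53.0)
rsa_arg = relative_solvent_accessibility("R", 62.0)
print(rsa_ala, rsa_arg)  # 0.5 0.25
```

Observed ASA values from structure analysis tools can slightly exceed the tabulated maximum, so some pipelines cap RSA at 1; whether to do so is a design choice not addressed here.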
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Xie, Z.; Deng, X.; Shu, K. Prediction of Protein–Protein Interaction Sites Using Convolutional Neural Network and Improved Data Sets. Int. J. Mol. Sci. 2020, 21, 467. https://doi.org/10.3390/ijms21020467