MDCAN-Lys: A Model for Predicting Succinylation Sites Based on Multilane Dense Convolutional Attention Network
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Collection and Preprocessing
2.2. Information Encoding
2.2.1. Sequence Information
2.2.2. Physicochemical Properties
2.2.3. Structure Information
2.3. MDCAN-Lys Architecture
2.3.1. Dense Convolutional Network for Feature Extraction
2.3.2. CBAM for Adaptive Feature Optimization
2.4. Model Training
2.5. Performance Evaluation
3. Results and Discussion
3.1. Selection of Window Size
3.2. Comparison with Existing Methods
3.3. Ablation Experiments
3.3.1. Feature Combination Ablation Experiment
3.3.2. Model Architecture Ablation Experiment
3.4. Biological Insights into Succinylation Prediction
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Dataset Type | Number of Proteins | Positive Samples | Negative Samples |
---|---|---|---|
Training Set | 2776 | 5885 | 64,140 |
Independent Test Set | 309 | 684 | 6709 |
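The counts in the dataset table imply a strong class imbalance. As a minimal sketch (the dictionary layout and the function name `imbalance_summary` are illustrative assumptions, not part of the original work), the positive fraction and the negative-to-positive ratio of each split can be computed directly from the table. The positive fraction is also a useful reference point for AUPR, since a classifier that ranks sites at random is expected to score an AUPR close to that fraction.

```python
# Sample counts copied from the dataset table above.
splits = {
    "Training Set": {"positive": 5885, "negative": 64140},
    "Independent Test Set": {"positive": 684, "negative": 6709},
}


def imbalance_summary(positive, negative):
    """Return the positive fraction and the number of negatives per positive."""
    total = positive + negative
    return {
        "positive_fraction": positive / total,
        "negatives_per_positive": negative / positive,
    }


summaries = {name: imbalance_summary(**counts) for name, counts in splits.items()}

for name, summary in summaries.items():
    print(
        f"{name}: {100 * summary['positive_fraction']:.2f}% positive, "
        f"{summary['negatives_per_positive']:.2f} negatives per positive"
    )
```

Under these counts, positives make up roughly 8.4% of the training set and 9.3% of the independent test set, which corresponds to about 10.9 and 9.8 negatives per positive sample.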
Method | Sn | Sp | Acc | MCC | Gmean | AUC | AUPR |
---|---|---|---|---|---|---|---|
DeepSuccinylSite a | 69.84 | 68.11 | 69.87 | 38.16 | 68.77 | 75.92 | 72.63 |
DeepSuccinylSite b | 75.69 | 55.62 | 65.66 | 32.03 | 64.80 | 71.66 | 68.93 |
Our Method (k = 4) | 63.69 | 77.48 | 76.32 | 26.11 | 69.76 | 79.36 | 24.28 |
Our Method (k = 6) | 60.58 | 79.59 | 77.99 | 26.48 | 68.65 | 79.73 | 24.96 |
Our Method (k = 8) | 67.26 | 75.71 | 74.99 | 26.80 | 71.03 | 79.94 | 25.33 |
Our Method (k = 10) | 66.81 | 76.75 | 75.91 | 27.36 | 71.37 | 80.35 | 25.88 |
Method | Acc | Sn | Sp | MCC | Gmean | AUC | AUPR |
---|---|---|---|---|---|---|---|
iSuc-PseAAC | 82.34 | 13.29 | 89.23 | 2.32 | 34.44 | - | - |
iSuc-PseOpt | 72.33 | 31.17 | 76.45 | 2.32 | 48.81 | - | - |
SuccinSite | 84.20 | 58.79 | 86.74 | 34.51 | 71.41 | - | - |
pSuc-Lys | 78.39 | 23.22 | 83.89 | 5.47 | 44.14 | - | - |
HybridSucc | 63.00 | 39.00 | 65.40 | 2.65 | 50.50 | - | - |
DeepSuccinylSite a | 56.96 | 68.42 | 55.79 | 14.07 | 61.78 | 66.75 | 16.77 |
DeepSuccinylSite b | 62.63 | 65.35 | 62.35 | 16.37 | 63.83 | 68.67 | 17.48 |
Our Method | 72.96 | 70.32 | 73.23 | 27.33 | 71.76 | 79.03 | 25.24 |
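The columns of the comparison table are linked through the confusion matrix. The sketch below assumes the standard definitions of Sn, Sp, Acc, MCC, and Gmean and checks the "Our Method" row against the independent test set sizes from the dataset table. The integer counts are inferred by rounding and are not reported in the source; the function name `binary_metrics` is hypothetical.

```python
import math

# Independent test set sizes, copied from the dataset table.
N_POS, N_NEG = 684, 6709

# Sn and Sp reported for "Our Method" on the independent test set (percent).
SN_REPORTED, SP_REPORTED = 70.32, 73.23

# Assumption: integer confusion-matrix counts are recovered by rounding;
# the source does not report them.
tp = round(SN_REPORTED / 100 * N_POS)
tn = round(SP_REPORTED / 100 * N_NEG)
fn = N_POS - tp
fp = N_NEG - tn


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics, returned as percentages."""
    sn = tp / (tp + fn)                      # sensitivity (recall)
    sp = tn / (tn + fp)                      # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)    # accuracy
    mcc = (tp * tn - fp * fn) / math.sqrt(   # Matthews correlation coefficient
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    gmean = math.sqrt(sn * sp)               # geometric mean of Sn and Sp
    values = {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc, "Gmean": gmean}
    return {name: 100 * value for name, value in values.items()}


metrics = binary_metrics(tp, fp, tn, fn)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With the inferred counts (TP = 481, FN = 203, TN = 4913, FP = 1796), the recomputed Acc, MCC, and Gmean agree with the reported 72.96, 27.33, and 71.76 to two decimals, which suggests the table follows these standard definitions.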
Feature a | √ |  |  | √ | √ |  | √ |
Feature b |  | √ |  | √ |  | √ | √ |
Feature c |  |  | √ |  | √ | √ | √ |
Acc | 68.93 | 76.64 | 76.06 | 75.59 | 68.24 | 77.12 | 75.91 |
MCC | 26.67 | 24.85 | 4.53 | 26.74 | 23.43 | 26.17 | 27.36 |
AUC | 79.99 | 78.62 | 55.55 | 79.63 | 77.45 | 79.04 | 80.35 |
AUPR | 25.70 | 23.98 | 10.36 | 24.62 | 22.70 | 24.46 | 25.88 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, H.; Zhao, H.; Yan, Z.; Zhao, J.; Han, J. MDCAN-Lys: A Model for Predicting Succinylation Sites Based on Multilane Dense Convolutional Attention Network. Biomolecules 2021, 11, 872. https://doi.org/10.3390/biom11060872