Identification of D Modification Sites by Integrating Heterogeneous Features in Saccharomyces cerevisiae
Abstract
1. Introduction
2. Results
2.1. Performances of Different Features
2.2. Improving Predictive Performance Using Ensemble Learning
3. Materials and Methods
3.1. Benchmark Dataset
3.2. Sequence Encoding Scheme
3.2.1. Nucleotide Physicochemical Property (NPCP)
3.2.2. Pseudo Dinucleotide Composition
3.2.3. Secondary Structure Component (SSC)
3.3. Support Vector Machine
3.4. Ensemble Classifiers
3.5. Performance Evaluation
3.6. Jackknife Cross-Validation
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
Sample Availability: Samples of the compounds are not available from the authors.
Methods | Sn (%) | Sp (%) | Acc (%) | MCC |
---|---|---|---|---|
NPCP-SVM | 67.65 | 100 | 83.82 | 0.59 |
PseDNC-SVM | 73.53 | 77.94 | 75.74 | 0.50 |
SSC-SVM | 70.59 | 75.00 | 72.79 | 0.45 |
Ensemble SVM | 76.47 | 89.71 | 83.09 | 0.62 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Feng, P.; Xu, Z.; Yang, H.; Lv, H.; Ding, H.; Liu, L. Identification of D Modification Sites by Integrating Heterogeneous Features in Saccharomyces cerevisiae. Molecules 2019, 24, 380. https://doi.org/10.3390/molecules24030380