Computational Analysis Predicts Hundreds of Coding lncRNAs in Zebrafish
Abstract
Simple Summary
1. Introduction
2. Materials and Methods
2.1. Rationale for Selection of the Six Bioinformatic Tools
2.2. Robustness of the Bioinformatic Tools
3. Results
3.1. Coding-Potential Assessment Tool (CPAT)
3.2. Coding Potential Calculator 2 (CPC2)
3.3. LGC Web Server
- ORF Length (length of the longest ORF),
- GC Content (GC content of the longest ORF),
- Coding Potential Score for the transcript (protein-coding RNA if greater than 0, ncRNA if less than 0),
- Coding Label (Coding or Non-coding),
- Probability of the ORF for a coding sequence (pc),
- Probability of the ORF for a non-coding sequence (pnc),
- Stop-codon probability for a coding sequence (fc), and
- Stop-codon probability for a non-coding sequence (fnc).
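The first two LGC outputs are plain sequence statistics, and the coding label follows from the sign of the score. The Python sketch below illustrates them under stated assumptions; it is not LGC's implementation. The helper names are hypothetical, only the three forward frames are scanned, an ORF is assumed to run from ATG through the first in-frame stop codon (stop included), and the coding potential score is passed in rather than computed, because its formula is not given in this section.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq: str) -> str:
    """Longest ATG...stop ORF on the forward strand (hypothetical helper).

    Assumptions: only the three forward frames are scanned, an ORF starts at
    the first ATG after the previous stop in that frame, and the stop codon
    is included. Returns "" if no complete ORF is found.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ATG in this frame
    return best

def gc_content(seq: str) -> float:
    """Fraction of G or C bases; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def coding_label(score: float) -> str:
    """Sign rule as listed: > 0 coding, < 0 non-coding."""
    if score > 0:
        return "Coding"
    if score < 0:
        return "Non-coding"
    return "Undetermined"  # a score of exactly 0 is not covered by the rule

orf = longest_orf("CCAUGAAAUUUGGGUAGCC")
print(orf, len(orf), round(gc_content(orf), 3))  # ATGAAATTTGGGTAG 15 0.333
```

Real ORF finders usually also scan the reverse strand and may handle ORFs lacking a stop codon; those cases are deliberately left out here to keep the logic visible.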
3.4. Coding–Non-Coding Identifying Tool (CNIT)
3.5. RNAsamba
3.6. MicroPeptide Identification Tool (MiPepid)
3.7. Robustness of the Prediction of Bioinformatic Tools
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Shapiro, J.A. Revisiting the Central Dogma in the 21st Century. Ann. N. Y. Acad. Sci. 2009, 1178, 6–28.
- Koonin, E.V. Does the central dogma still stand? Biol. Direct. 2012, 7, 17.
- Ørom, U.A.; Derrien, T.; Beringer, M.; Gumireddy, K.; Gardini, A.; Bussotti, G.; Lai, F.; Zytnicki, M.; Notredame, C.; Huang, Q.; et al. Long noncoding RNAs with enhancer-like function in human cells. Cell 2010, 143, 46–58.
- The FANTOM Consortium; Carninci, P.; Kasukawa, T.; Katayama, S.; Gough, J.; Frith, M.C.; Maeda, N.; Oyama, R.; Ravasi, T.; Lenhard, B.; et al. The Transcriptional Landscape of the Mammalian Genome. Science 2005, 309, 1559–1563.
- Uchida, S.; Dimmeler, S. Long Noncoding RNAs in Cardiovascular Diseases. Circ. Res. 2015, 116, 737–750.
- Mowel, W.K.; Kotzin, J.J.; McCright, S.J.; Neal, V.D.; Henao-Mejia, J. Control of Immune Cell Homeostasis and Function by lncRNAs. Trends Immunol. 2018, 39, 55–69.
- Zhang, Y.; Zhang, L.; Xu, Y.; Wu, X.; Zhou, Y.; Mo, J. Immune-related long noncoding RNA signature for predicting survival and immune checkpoint blockade in hepatocellular carcinoma. J. Cell. Physiol. 2020, 235, 9304–9316.
- Sigdel, K.R.; Cheng, A.; Wang, Y.; Duan, L.; Zhang, Y. The Emerging Functions of Long Noncoding RNA in Immune Cells: Autoimmune Diseases. J. Immunol. Res. 2015, 2015, 1–9.
- Lu, S.; Wang, T.; Zhang, G.; He, Q.Y. Understanding the proteome encoded by “non-coding RNAs”: New insights into human genome. Sci. China Life Sci. 2020, 63, 986–995.
- Vitorino, R.; Guedes, S.; Amado, F.; Santos, M.; Akimitsu, N. The role of micropeptides in biology. Cell. Mol. Life Sci. 2021, 78, 3285–3298.
- Wei, L.H.; Guo, J.U. Coding functions of “noncoding” RNAs. Science 2020, 367, 1074–1075.
- Zlotorynski, E. The functions of short ORFs and their microproteins. Nat. Rev. Mol. Cell Biol. 2020, 21, 252–253.
- Chen, J.; Brunner, A.N.; Cogan, J.Z.; Nuñez, J.K.; Fields, A.P.; Adamson, B.; Itzhak, D.N.; Li, J.Y.; Mann, M.; Leonetti, M.D.; et al. Pervasive functional translation of non-canonical human open reading frames. Science 2020, 367, 1140–1146.
- Fu, X.-D. Non-coding RNA: A new frontier in regulatory biology. Natl. Sci. Rev. 2014, 1, 190–204.
- Nam, J.-W.; Choi, S.-W.; You, B.-H. Incredible RNA: Dual Functions of Coding and Noncoding. Mol. Cells 2016, 39, 367–374.
- Hartford, C.C.R.; Lal, A. When Long Noncoding Becomes Protein Coding. Mol. Cell. Biol. 2020, 40, 00519–00528.
- Matsumoto, A.; Pasut, A.; Matsumoto, M.; Yamashita, R.; Fung, J.; Monteleone, E.; Saghatelian, A.; Nakayama, K.I.; Clohessy, J.G.; Pandolfi, P.P. mTORC1 and muscle regeneration are regulated by the LINC00961-encoded SPAR polypeptide. Nature 2017, 541, 228–232.
- Tornesello, M.L.; Faraonio, R.; Buonaguro, L.; Annunziata, C.; Starita, N.; Cerasuolo, A.; Pezzuto, F.; Tornesello, A.L.; Buonaguro, F.M. The Role of microRNAs, Long Non-coding RNAs, and Circular RNAs in Cervical Cancer. Front. Oncol. 2020, 10, 150.
- Wang, Y.; Wu, S.; Zhu, X.; Zhang, L.; Deng, J.; Li, F.; Guo, B.; Zhang, S.; Wu, R.; Zhang, Z.; et al. LncRNA-encoded polypeptide ASRPS inhibits triple-negative breast cancer angiogenesis. J. Exp. Med. 2020, 217, e20190950.
- Anfossi, S.; Calin, G.A. When non-coding is not enough. J. Exp. Med. 2020, 217, 217.
- Pauli, A.; Norris, M.L.; Valen, E.; Chew, G.-L.; Gagnon, J.A.; Zimmerman, S.; Mitchell, A.; Ma, J.; Dubrulle, J.; Reyon, D.; et al. Toddler: An Embryonic Signal That Promotes Cell Movement via Apelin Receptors. Science 2014, 343, 1248636.
- Huang, Y.; Liu, N.; Wang, J.P.; Wang, Y.Q.; Yu, X.L.; Wang, Z.B.; Cheng, X.C.; Zou, Q. Regulatory long non-coding RNA and its functions. J. Physiol. Biochem. 2012, 68, 611–618.
- Zhao, T.; Xu, J.; Liu, L.; Bai, J.; Xu, C.; Xiao, Y.; Li, X.; Zhang, L. Identification of cancer-related lncRNAs through integrating genome, regulome and transcriptome features. Mol. BioSyst. 2014, 11, 126–136.
- Butler, A.A.; Webb, W.M.; Lubin, F.D. Regulatory RNAs and control of epigenetic mechanisms: Expectations for cognition and cognitive dysfunction. Epigenomics 2016, 8, 135–151.
- Pauli, A.; Valen, E.; Lin, M.F.; Garber, M.; Vastenhouw, N.L.; Levin, J.Z.; Fan, L.; Sandelin, A.; Rinn, J.L.; Regev, A.; et al. Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis. Genome Res. 2011, 22, 577–591.
- Choi, S.-W.; Kim, H.-W.; Nam, J.-W. The small peptide world in long noncoding RNAs. Brief. Bioinform. 2019, 20, 1853–1864.
- Wang, L.; Park, H.J.; Dasari, S.; Wang, S.; Kocher, J.-P.; Li, W. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 2013, 41, e74.
- Kong, L.; Zhang, Y.; Ye, Z.-Q.; Liu, X.-Q.; Zhao, S.-Q.; Wei, L.; Gao, G. CPC: Assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007, 35, W345–W349.
- Kang, Y.J.; Yang, D.C.; Kong, L.; Hou, M.; Meng, Y.Q.; Wei, L.; Gao, G. CPC2: A fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res. 2017, 45, W12–W16.
- Wang, G.; Yin, H.; Li, B.; Yu, C.; Wang, F.; Xu, X.; Cao, J.; Bao, Y.; Wang, L.; Abbasi, A.A.; et al. Characterization and identification of long non-coding RNAs based on feature relationship. Bioinformatics 2019, 35, 2949–2956.
- Camargo, A.P.; Sourkov, V.; Pereira, G.A.G.; Carazzolle, M.F. RNAsamba: Neural network-based assessment of the protein-coding potential of RNA sequences. NAR Genom. Bioinform. 2020, 2, lqz024.
- Zhu, M.; Gribskov, M. MiPepid: MicroPeptide identification tool using machine learning. BMC Bioinform. 2019, 20, 1–11.
- Hu, X.; Chen, W.; Li, J.; Huang, S.; Xu, X.; Zhang, X.; Xiang, S.; Liu, C. ZFLNC: A comprehensive and well-annotated database for zebrafish lncRNA. Database 2018, 2018.
- Yates, A.; Akanni, W.; Amode, M.R.; Barrell, D.; Billis, K.; Carvalho-Silva, D.; Cummins, C.; Clapham, P.; Fitzgerald, S.; Gil, L.; et al. Ensembl 2016. Nucleic Acids Res. 2016, 44, D710–D716.
- Xie, C.; Yuan, J.; Li, H.; Li, M.; Zhao, G.; Bu, D.; Zhu, W.; Wu, W.; Chen, R.; Zhao, Y. NONCODEv4: Exploring the world of long non-coding RNA genes. Nucleic Acids Res. 2014, 42, D98–D103.
- Dhiman, H.; Kapoor, S.; Sivadas, A.; Sivasubbu, S.; Scaria, V. zflncRNApedia: A Comprehensive Online Resource for Zebrafish Long Non-Coding RNAs. PLoS ONE 2015, 10, e0129997.
- Guo, J.-C.; Fang, S.-S.; Wu, Y.; Zhang, J.-H.; Chen, Y.; Liu, J.; Wu, B.; Wu, J.-R.; Li, E.-M.; Xu, L.-Y.; et al. CNIT: A fast and accurate web tool for identifying protein-coding and long non-coding transcripts based on intrinsic sequence composition. Nucleic Acids Res. 2019, 47, W516–W522.
- Swift, A.; Heale, R.; Twycross, A. What are sensitivity and specificity? Evid. Based Nurs. 2020, 23, 2–4.
- Antonov, I.V.; Mazurov, E.; Borodovsky, M.; Medvedeva, Y.A. Prediction of lncRNAs and their interactions with nucleic acids: Benchmarking bioinformatics tools. Brief. Bioinform. 2018, 20, 551–564.
- Sun, L.; Luo, H.; Bu, D.; Zhao, G.; Yu, K.; Zhang, C.; Liu, Y.; Chen, R.; Zhao, Y. Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts. Nucleic Acids Res. 2013, 41, e166.
- Makarewich, C.A.; Olson, E.N. Mining for Micropeptides. Trends Cell Biol. 2017, 27, 685–696.
- Yeasmin, F.; Yada, T.; Akimitsu, N. Micropeptides Encoded in Transcripts Previously Identified as Long Noncoding RNAs: A New Chapter in Transcriptomics and Proteomics. Front. Genet. 2018, 9, 144.
Software Suite | Journals Publishing the Software | Year | Applied Species | Input | Output | URLs |
---|---|---|---|---|---|---|
Coding-Potential Assessment Tool (CPAT) [27] | Nucleic Acids Research | 2013 | Homo sapiens, Mus musculus, Danio rerio, Drosophila melanogaster | BED, FASTA | Coding Probability, ORF size, Fickett Score, Hexamer Score, Coding Label | http://lilab.research.bcm.edu/ |
Coding Potential Calculator 2 (CPC2) [28,29] | Nucleic Acids Research | 2017 | Homo sapiens, Mus musculus, Xenopus laevis, Danio rerio and Drosophila melanogaster | FASTA, BED or GTF | Peptide length, Fickett score, Pi, ORF integrity, coding probability, coding label | http://cpc2.gao-lab.org/ http://cpc2.gao-lab.org/batch.php |
LGC web server [30] | Bioinformatics | 2019 | Cross-species (plants to mammals) | FASTA, BED, GTF | ORF Length, GC Content, Coding Potential Score, Coding/Non-coding Label, pc, pnc, fc, fnc | https://bigd.big.ac.cn/lgc/calculator |
Coding-Non-Coding Identifying Tool (CNIT) [37] | Nucleic Acids Research | 2019 | 37 species (11 animal species, 26 plant species) | FASTA, GTF | Gene classification (coding or noncoding) | http://cnit.noncode.org/CNIT/ http://cnit.noncode.org/CNIT/batch |
RNAsamba [31] | NAR Genomics and Bioinformatics | 2019 | Mus musculus, Danio rerio, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana | FASTA (.fasta, .fa, .fna) | Coding score and classification | https://rnasamba.lge.ibi.unicamp.br/ |
MicroPeptide Identification Tool (MiPepid) [32] | BMC Bioinformatics | 2019 | Mus musculus, Danio rerio, Saccharomyces cerevisiae, E. coli and Arabidopsis thaliana | FASTA |  | https://github.com/MindAI/MiPepid |
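Among the CPAT outputs, the Hexamer Score is, according to the CPAT paper [27], a log-likelihood ratio of in-frame hexamer usage: for an ORF split into in-frame hexamers h1 … hm, the score is the mean of log(F(hi)/F′(hi)), where F and F′ are hexamer frequencies in coding and non-coding training sequences. The Python sketch below illustrates that formula only. The frequency tables `TOY_CODING` and `TOY_NONCODING` are hypothetical placeholders, not CPAT's trained zebrafish model, and hexamers missing from either table are simply skipped, a simplification that may differ from how CPAT treats zero frequencies.

```python
import math

# Toy, hypothetical hexamer frequencies; CPAT uses its own trained tables.
TOY_CODING = {"ATGAAA": 0.02, "AAATTT": 0.01, "TTTTAG": 0.005}
TOY_NONCODING = {"ATGAAA": 0.01, "AAATTT": 0.01, "TTTTAG": 0.005}

def hexamer_score(orf: str, coding_freq: dict, noncoding_freq: dict) -> float:
    """Mean log-ratio of coding vs non-coding frequency over in-frame hexamers.

    Hexamers are read in steps of 3 (one codon) from the ORF start.
    Hexamers that are missing or zero in either table are skipped.
    """
    orf = orf.upper()
    log_ratios = []
    for i in range(0, len(orf) - 5, 3):
        hexamer = orf[i:i + 6]
        fc = coding_freq.get(hexamer, 0.0)
        fn = noncoding_freq.get(hexamer, 0.0)
        if fc > 0 and fn > 0:
            log_ratios.append(math.log(fc / fn))
    return sum(log_ratios) / len(log_ratios) if log_ratios else 0.0

score = hexamer_score("ATGAAATTTTAG", TOY_CODING, TOY_NONCODING)
print(round(score, 4))  # 0.231 (= ln 2 / 3)
```

A positive value means the ORF's hexamers look more like the coding table than the non-coding one. CPAT then combines this score with other features, including ORF size and Fickett score, in a logistic regression model [27].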
Robustness | CPC2 | LGC | CNIT | RNAsamba | MiPepid |
---|---|---|---|---|---|
Sensitivity | 0.5709 | 0.4591 | 0.7218 | 0.8780 | 0.1931 |
Specificity | 0.9865 | 0.9317 | 0.8515 | 0.8312 | 0.8244 |
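Sensitivity and specificity in the table above follow the standard definitions [38]: sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP). The Python sketch below computes both from paired true and predicted labels. Which class counts as "positive" is not stated in this section, so the sketch assumes, hypothetically, that coding transcripts are the positive class; the example labels are invented for illustration and are not data from this study.

```python
from typing import Sequence, Tuple

def confusion_counts(truth: Sequence[str], predicted: Sequence[str],
                     positive: str = "coding") -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp); `positive` marks the positive class (assumption)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp

def sensitivity(tp: int, fn: int) -> float:
    # Fraction of true positives that the tool recovers.
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    # Fraction of true negatives that the tool correctly rejects.
    return tn / (tn + fp)

truth = ["coding", "coding", "noncoding", "noncoding", "noncoding"]
predicted = ["coding", "noncoding", "noncoding", "noncoding", "coding"]
tp, fn, tn, fp = confusion_counts(truth, predicted)
print(sensitivity(tp, fn), specificity(tn, fp))  # 0.5 0.6666666666666666
```

Reporting the two metrics side by side matters here: a tool can reach high specificity while missing most positives, which is why both rows are needed to compare tools fairly.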
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mishra, S.K.; Wang, H. Computational Analysis Predicts Hundreds of Coding lncRNAs in Zebrafish. Biology 2021, 10, 371. https://doi.org/10.3390/biology10050371