D-sORF: Accurate Ab Initio Classification of Experimentally Detected Small Open Reading Frames (sORFs) Associated with Translational Machinery
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. Positive Datasets
2.2. Negative Datasets
2.3. Small ORF Datasets
2.4. D-sORF Machine-Learning Framework
2.5. Data Preprocessing and Transformation
2.6. ML Algorithm Selection and Evaluation
2.7. sORF Predictor
3. Results
3.1. Algorithm Evaluation
3.2. Application of D-sORF in Annotating Experimentally Detected sORFs in Genomic Regions Enriched with Coding and Non-Coding Transcripts
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
sORF ID | Length (nt) | Amino acid sequence |
SorfId=andreev_2015:538832 | 300 | MVLRRLLAALLHSPQLVERLSESRPIRRAAQLTAFALLQAQLRGQDAARRLQDLAAGPVGSLCRRAERFRDAFTQELRRGLRGRSGPPPGSQRGPGANI |
SorfId=elkon_2015:50819 | 210 | MLRWLRDFVLPTAACQDAEQPTRYETLFQALDRNGDGVVDIGELQEGLRNLGIPLGQDAEEVGRRRGAA |
SorfId=fritsch_2012:225652 | 180 | MRQLKGKPKKETSKDKKERKQAMQEARQQITTVVLPTLAVVVLLIVVFVYVATRPTITE |
SorfId=andreev_2015:496184 | 213 | MAPWSREAVLSLYRALLRQGRQLRYTDRDFYFASIRREFRKNQKLEDAEARERQLEKGLVFLNGKLGRII |
SorfId=rubio_2014:98637 | 141 | MKESSIDDLMKSLDKNSDQEIDFKEYSVFLTMLCMAYNDFFLEDNK |
SorfId=andreev_2015:629427 | 261 | MTDRYTIHSQLEHLQSKYIGTGHADTTKWEWLVNQHRDSYCSYMGHFDLLNYFAIAENESKARVRFNLMEKMLQPCGPPADKPEEN |
SorfId=crappe_2014:52683 | 249 | MSATWTLSPEPLPPSTGPPVGAGLDAEQRTVFAFVLCLLVVLVLLMVRCVRILLDPYSRMPASSWTDHKEALERGQFDYALV |
SorfId=calviello_2016:870882 | 252 | MEALGSGHYVGGSIRSMAAAALSGLAVRLSRPQGTRGSYGAFCKTLTRTLLTFFDLAWRLRKNFFYFYILASVILNVHLQVYI |
SorfId=andreev_2015:491102 | 189 | MAAATLTSKLYSLLFRRTSTFALTIIVGVMFFERAFDQGADAIYDHINEGVRACAIPDLGPA |
SorfId=calviello_2016:375783 | 189 | MTKAGSKGGNLRDKLDGNELDLSLSDLNEVPVKELVSIVPRGPGAHLRRALAPTWATYFPLL |
SorfId=liu_HEK_2013:259226 | 204 | MLSRLQELRKEEETLLRLKAALHDQLNRLKVTPRPSPAPPPFLPSDHFPPTWASVVTTCCGQFSATH |
SorfId=gonzalez_2014:380995 | 267 | MSIMDHSPTTGVVTVIVILIAIAALGALILGCWCYLRLQRISQSEDEESIVGDGETKEPFLLVQYSAKGPCVERKAKLMTPNGPEVHG |
SorfId=calviello_2016:540138 | 264 | MYCLQWLLPVLLIPKPLNPALWFSHSMFMGFYLLSFLLERKPCTICALVFLAALFLICYSCWGNCFLYHCSDSPLPESAHDPGVVGT |
SorfId=calviello_2016:185382 | 207 | MAPAAAPSSLAVRASSPAATPTSYGVFCKGLSRTLLAFFELAWQLRMNFPYFYVAGSVILNIRLQVHI |
SorfId=andreev_2015:177588 | 294 | MEEISLANLDTNKLEAIAQEIYVDLIEDSCLGFCFEVHRAVKCGYFYLEFAETGSVKDFGIQPVEDKGACRLPLCSLPGEPGNGPDQQLQRSPPEFQ |
SorfId=andreev_2015:409703 | 114 | MITDVQLAIFANMLGVSLFLLVVLYHYVAVNNPKKQE |
SorfId=andreev_2015:37830 | 279 | MTKLAQWLWGLAILGSTWVALTTGALGLELPLSCQEVLWPLPAYLLVSAGCYALGTVGYRVATFHDCEDAARELQSQIQEARADLARRGLRF |
SorfId=andreev_2015:13871 | 279 | MWPVFWTVVRTYAPYVTFPVAFVVGAVGYHLEWFIRGKDPQPVEEEKSISERREDRKLDELLGKDHTQVVSLKDKLEFAPKAVLNRNRPEKN |
SorfId=park_2016:1581934 | 237 | MREENKGMPSGGGSDEGLASAAARGLVEKVRQLLEAGADPNGVNRFGRRAIQVAGAPGPRRQGARERGARPRRIGAGT |
SorfId=gonzalez_2014:153430 | 150 | VRDKKLLNDLNGAVEDAKTARLFNITSSALAASCIILVFIFLRYPLTDY |
SorfId=fritsch_2012:974998 | 183 | MDAVSQVPMEVVLPKHILDIWVIVLIILATIVIMTSLLLCPATAVIIYRMRTHPILSGAV |
SorfId=andreev_2015:137699 | 264 | MSTSVPQGHTWTQRVKKDDEEEDPLDQLISRSGCAASHFAVQECMAQHQDWRQCQPQVQAFKDCMSEQQARRQEELQRRQEQAGAHH |
SorfId=andreev_2015:321927 | 207 | VGVGKNKCLYALEEGIVRYTKEVYVPHPRNTEAVDLITRLPKGAVLYKTFVHVVPAKPEGTFKLVAML |
SorfId=calviello_2016:206180 | 141 | MPAGVPMSTYLKMFAASLLAMCAGAEVVHRYYRPDLVSAGGLLFYF |
SorfId=gonzalez_2014:52803 | 162 | VNIDHFTKDITMKNLVEPSLSSFDMAQKRIHALMEKDSLPRFVRSEFYQELIK |
SorfId=elkon_2015:525198 | 114 | MVVDRLFLWTFIIFTSVGTLVIFLDATYHLPPPDPFP |
SorfId=andreev_2015:272691 | 150 | MDINRRRYPAHLARSSSRKYTELPHGAISEDQAVGPADIPCDSTGQTST |
SorfId=calviello_2016:187121 | 222 | LLYLGRDYPKGADYFKKRLKNIFLKNKDVKNPEKIKELIAQGEFVMKELEALYFLRKYRAMKQRYYSDTNKTN |
SorfId=loayza_puch_2016:211324 | 231 | MDLRRVKEYFSWLYYQYQIISCCAVLEPWERSMFNTILLTIIAMVVYTAYVFIPIHIRLAWEFFSKICGYHSTISN |
SorfId=gonzalez_2014:438290 | 78 | MLDIFILMFFAIIGLVILSYIIYLL |
SorfId=cenik_2015:853557 | 171 | MADVSERTLQLSVLVAFASGVLLGWQANRLRRRYLDWRKRRLQDKLAATQKKLDLA |
SorfId=andreev_2015:169859 | 174 | MPTGKQLADIGYKTFSTSMMLLTVYGGYLCSVRVYHYFQWRRAQRQAAEEQKTSGIM |
SorfId=crappe_2014:5055 | 144 | DVGSLDEKMKSLDVNQDSELKFNEYWRLIGELAKEIRKKKDLKIRKK |
SorfId=calviello_2016:673102 | 225 | MFDIKAWAEYVVEWAAKDPYGFLTTVILALTPLFLASAVLSWKLAKMIEAREKEQKKKQKRQENIAKAKRLKKD |
SorfId=ingolia_2012:368307 | 69 | LGMEEEDVIEVYQEQTGGHSTV |
SorfId=andreev_2015:611100 | 273 | MPKRKAKGDAKGDKAKVKDEPQRRSARLSAKPAPPKPEPRPKKASAKKGEKLPKGRKGKADAGKDGNNPAKNRDASTLQSQKAEGTGDAK |
SorfId=elkon_2015:4281 | 108 | MPQTFSGGRGFELDSGSDDMDPGRPRPPRDPDELH |
SorfId=calviello_2016:40163 | 273 | MRDPVSSQYSSFLFWRMPIPELDLSELEGLGLSDTATYKVKDSSVGKMIGQATAADQEKNPEGDGLLEYSTFNFWRAPIASIHSFELDLL |
SorfId=elkon_2015:564093 | 213 | MPARRLLLLLTLLLPGLGVSDRGAWGGGQLATAGSGPGQRRGAGAGVRAGSATAAARCPVSPAVGGSGRA |
SorfId=andreev_2015:217825 | 87 | VERIKERVEEKEGIPPQQQRLIYSGKQM |
SorfId=calviello_2016:764879 | 126 | VSKCSEEIKNYIEERSGEDPLVKGIPEDKNPFKEKGSCVIS |
SorfId=calviello_2016:642790 | 204 | VKAAFTRAYNKEAHLTPYSLQAIKASRHSTSPSLDSEYNEELNEDDSQSDEKDQDAIETDAMIKVVF |
SorfId=andreev_2015:308992 | 210 | MTTLIPILSTFLFEDFSKASGFKGQRPETLHERLTLVSVYAPYLLIPFILLIFMLRSPYYKYEEKRKKK |
SorfId=gonzalez_2014:172668 | 129 | VSKAAADLMTYCDAHACEDPLITPVPTSENPFREKKFFCALL |
SorfId=andreev_2015:216730 | 162 | VIPKRTNRPGISTTDRGFPRARYRARTTNYNSSRSRFYSGFNSRPRGRVYRSG |
SorfId=gonzalez_2014:645009 | 93 | VFRYSLQKLAYTVSRTGRQVLGERRQRAPN |
SorfId=werner_2015:456667 | 147 | MVITSENDEDRGGQEKESKEESVLAMLGIIGTILNLIVIIFVYIYTTL |
SorfId=iwasaki_2016:288070 | 114 | MAEGNTLISVDYEIFGKVQGVFFRKHTQVCGLQALGW |
SorfId=andreev_2015:619791 | 195 | MMNNGLLQQPSALMLLPCRPVLTSVALNANFVSWKSRTKYTITPVKMRKSGGRDHTGGNKDRGI |
SorfId=werner_2015:555128 | 159 | MGCHSSKSTTVAAESQKLEEEREGREPGLETGTQAADCKDAPLKDGTPEPKS |
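The rows of the table above can be read programmatically. The following is a minimal sketch, not part of the D-sORF software. It assumes that the Length column gives the ORF length in nucleotides including the stop codon, so that the length equals three times the number of residues plus one; this assumption is inferred from the listed values and is not stated in the table. The names `SorfRow`, `parse_row`, and `length_is_consistent` are hypothetical and introduced only for illustration.

```python
from typing import NamedTuple


class SorfRow(NamedTuple):
    """One row of the Appendix A table (hypothetical container)."""
    sorf_id: str      # e.g. "andreev_2015:409703"
    length_nt: int    # value of the Length column
    peptide: str      # amino acid sequence, one-letter code


def parse_row(line: str) -> SorfRow:
    """Parse a pipe-delimited row such as 'SorfId=x:1 | 78 | MLD... |'."""
    # Drop surrounding whitespace and the trailing pipe, then split on pipes.
    fields = [f.strip() for f in line.strip().strip("|").split("|")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    raw_id, length, peptide = fields
    # Remove the 'SorfId=' prefix if present; keep the rest unchanged.
    sorf_id = raw_id.split("=", 1)[-1]
    return SorfRow(sorf_id, int(length), peptide)


def length_is_consistent(row: SorfRow) -> bool:
    """Check the ASSUMED relation: nucleotides = 3 * (residues + stop codon)."""
    return row.length_nt == 3 * (len(row.peptide) + 1)


# Three rows copied verbatim from the table above.
EXAMPLE_ROWS = [
    "SorfId=andreev_2015:409703 | 114 | MITDVQLAIFANMLGVSLFLLVVLYHYVAVNNPKKQE |",
    "SorfId=gonzalez_2014:438290 | 78 | MLDIFILMFFAIIGLVILSYIIYLL |",
    "SorfId=ingolia_2012:368307 | 69 | LGMEEEDVIEVYQEQTGGHSTV |",
]

for line in EXAMPLE_ROWS:
    row = parse_row(line)
    print(row.sorf_id, row.length_nt, len(row.peptide), length_is_consistent(row))
```

A row that fails the consistency check would indicate either a transcription error or that the assumed meaning of the Length column does not hold for that entry.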
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Perdikopanis, N.; Giannakakis, A.; Kavakiotis, I.; Hatzigeorgiou, A.G. D-sORF: Accurate Ab Initio Classification of Experimentally Detected Small Open Reading Frames (sORFs) Associated with Translational Machinery. Biology 2024, 13, 563. https://doi.org/10.3390/biology13080563