Annotation and Characterization of the Zacco platypus Genome
Abstract
1. Introduction
2. Materials and Methods
2.1. Sample Collection and DNA Extraction
2.2. Genome Sequencing
2.3. Genomic Repeat Analysis and RNA Profiling
2.4. Gene Prediction and Annotation
2.5. Functional Gene Annotation
2.6. Gene Family Identification and Phylogenetic Analysis
3. Results and Discussion
3.1. Genome Sequencing and Assembly
3.2. Comparison of Transposable Elements and Non-Coding RNA Profiling
3.3. Gene Prediction and Functional Annotation with Fish Genomes
3.4. Gene Family Identification and Phylogenetic Analysis
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
Statistic | De Novo Assembly (HiFi) | Final Scaffolding (Hi-C) |
---|---|---|
Number of sequences | 336 | 319 |
Number of scaffolds (pseudomolecules) | - | 24 |
Number of contigs over 100 kb | - | 40 |
Number of contigs over 1 Mb | - | 25 |
Total length (bp) | 838,569,924 | 824,428,551 |
Minimum length (bp) | 13,094 | 13,094 |
Maximum length (bp) | 47,596,783 | 47,632,145 |
N50 (bp) | 31,873,553 | 33,346,874 |
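The N50 row summarizes contiguity: it is the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of that calculation follows, assuming the input is simply a list of sequence lengths (for example, read from a FASTA index); the function name `n50` is illustrative and not part of the authors' pipeline.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths (illustrative helper).

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence at which the cumulative sum reaches half the total.
        if 2 * running >= total:
            return length
    # The cumulative sum always equals the total at the last sequence,
    # so the loop returns before this line; kept so every path returns an int.
    return ordered[-1]
```

For the toy lengths 2 to 10 (total 54), the cumulative sum first reaches 27 at the sequence of length 8, so `n50` returns 8. Conventions differ slightly at the 50% boundary (some tools require the cumulative sum to exceed half the total strictly); this sketch uses "at least half".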
BUSCO category (Actinopterygii_odb10) | No. | % |
---|---|---|
Complete BUSCOs | 3572 | 98.10% |
Complete and single-copy | 3521 | 96.70% |
Complete and duplicated | 51 | 1.40% |
Fragmented | 18 | 0.50% |
Missing | 50 | 1.40% |
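Each percentage in the BUSCO table is a share of all BUSCOs searched, and complete BUSCOs are the sum of single-copy and duplicated ones (3521 + 51 = 3572). The counts add up to 3640 (3572 + 18 + 50), which serves as the denominator. The small Python sketch below reproduces that arithmetic; `BuscoCounts` is a hypothetical container for illustration, not BUSCO's own output format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoCounts:
    """Hypothetical container for BUSCO category counts (not BUSCO's API)."""
    single: int       # complete and single-copy
    duplicated: int   # complete and duplicated
    fragmented: int
    missing: int

    @property
    def complete(self) -> int:
        # Complete BUSCOs = single-copy + duplicated.
        return self.single + self.duplicated

    @property
    def total(self) -> int:
        # Every BUSCO searched falls into exactly one of C, F or M.
        return self.complete + self.fragmented + self.missing

    def percentages(self, ndigits: int = 1) -> dict[str, float]:
        """Express each category as a percentage of all BUSCOs searched."""
        def pct(n: int) -> float:
            return round(100 * n / self.total, ndigits)

        return {
            "C": pct(self.complete),
            "S": pct(self.single),
            "D": pct(self.duplicated),
            "F": pct(self.fragmented),
            "M": pct(self.missing),
        }


counts = BuscoCounts(single=3521, duplicated=51, fragmented=18, missing=50)
print(counts.total, counts.percentages())
```

Rounded to one decimal place, these counts give C 98.1%, S 96.7%, D 1.4%, F 0.5% and M 1.4%, matching the table. The same relationships hold for the C/S/D/F/M strings in the transcript-level table, where C equals S plus D and C + F + M sums to 100%.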
Metric | RNA-Seq | Iso-Seq | Final Predicted Genes |
---|---|---|---|
Processing | Trinity assembly | Clustered at 95% identity | Gene models with AED from 0 to 0.99 retained |
Number of sequences | 211,667 | 201,849 | 34,036 |
Total length (bp) | 211,578,101 | 302,217,686 | 48,030,879 |
Minimum length (bp) | 189 | 96 | 102 |
Maximum length (bp) | 40,464 | 11,540 | 92,250 |
Average length (bp) | 1000 | 1497 | 1411 |
GC content (%) | 48.4 | 44.5 | 50.3 |
BUSCO (%) | C: 82.7% [S: 34.2%, D: 48.5%], F: 6.6%, M: 10.7% | C: 84.9% [S: 40.3%, D: 44.6%], F: 2.8%, M: 12.3% | C: 89.4% [S: 87.9%, D: 1.5%], F: 4.3%, M: 6.3% |
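Two operations underlie this table: gene models are filtered by annotation edit distance (AED), which scores agreement between a predicted model and its supporting evidence from 0 (full agreement) to 1 (no support), and each sequence set is then summarized by count, length and GC content. A minimal Python sketch follows, assuming each prediction is available as a hypothetical `GeneModel` record holding its sequence and AED value. The record type and function names are illustrative, the inclusive 0.99 cutoff is a literal reading of the table, and computing GC content over A/C/G/T bases only is an assumption about how the reported values were derived.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical minimal record for one predicted gene model."""
    name: str
    sequence: str
    aed: float  # annotation edit distance: 0 = full evidence support, 1 = none


def keep_supported(models: Iterable[GeneModel], max_aed: float = 0.99) -> List[GeneModel]:
    """Keep models whose AED lies in [0, max_aed], i.e. with some evidence support."""
    return [m for m in models if 0.0 <= m.aed <= max_aed]


def summarize(sequences: Iterable[str]) -> dict:
    """Return count, total/min/max/mean length (bp) and GC content (%)."""
    seqs = [s.upper() for s in sequences]
    if not seqs:
        raise ValueError("no sequences to summarize")
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    # Assumption: GC% uses only unambiguous bases (A/C/G/T) in the denominator.
    acgt = sum(s.count(base) for s in seqs for base in "ACGT")
    return {
        "count": len(seqs),
        "total_bp": total,
        "min_bp": min(lengths),
        "max_bp": max(lengths),
        "mean_bp": total / len(seqs),
        "gc_percent": 100 * gc / acgt if acgt else 0.0,
    }


models = [
    GeneModel("g1", "ATGC", 0.0),
    GeneModel("g2", "ATGCGC", 0.99),
    GeneModel("g3", "AAAA", 1.0),  # no evidence support: filtered out
]
kept = keep_supported(models)
print([m.name for m in kept], summarize(m.sequence for m in kept))
```

The average-length row is total length divided by the number of sequences; for the final gene set, 48,030,879 / 34,036 gives about 1411 bp, as reported.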
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nam, S.-E.; Bae, D.-Y.; Rhee, J.-S. Annotation and Characterization of the Zacco platypus Genome. Diversity 2024, 16, 529. https://doi.org/10.3390/d16090529