Testing Efficacy of Assembly-Free and Alignment-Free Methods for Species Identification Using Genome Skims, with Patellogastropoda as a Test Case
Abstract
1. Introduction
2. Material and Methods
2.1. DNA Extraction and Sequencing
2.2. Genomic Reads Subsampling and Preprocessing
2.3. Distance Calculation and Phylogenetic Placement
3. Results
4. Discussion
4.1. Coverage
4.2. Selection of the Best Matched Species
4.3. Phylogenetic Placement
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

Subclass | Family | Species | Locality |
---|---|---|---|
Patellogastropoda | Nacellidae | Cellana toreuma | Yangjiang, Guangdong, China |
Patellogastropoda | Nacellidae | Cellana toreuma | Wenchang, Hainan, China |
Patellogastropoda | Nacellidae | Cellana nigrolineata | Jeju Island, South Korea |
Patellogastropoda | Nacellidae | Cellana grata | Ningde, Fujian, China |
Patellogastropoda | Patellidae | Scutellastra flexuosa | Sansha, Hainan, China |
Patellogastropoda | Lottiidae | Nipponacmea radula | Weihai, Shandong, China |
Patellogastropoda | Lottiidae | Patelloida ryukyuensis | Weihai, Shandong, China |
Patellogastropoda | Lottiidae | Patelloida saccharina lanx | Wenchang, Hainan, China |
Patellogastropoda | Lottiidae | Patelloida conulus | Weihai, Shandong, China |
Patellogastropoda | Lottiidae | Lottia cassis | Weihai, Shandong, China |
Patellogastropoda | Lottiidae | Lottia goshimai | Qingdao, Shandong, China |
Vetigastropoda | Trochidae | Trochus maculatus | Sanya, Hainan, China |

Species | COI | 0.1 Gb | 0.5 Gb | 1 Gb | 2 Gb | 4 Gb | Largest Data |
---|---|---|---|---|---|---|---|
C. toreuma (GD) | 0.003 | 0.0918 | 0.0095 | 0.0115 | 0.0131 | 0.0145 | 0.0159 |
C. grata | 0.174 | 0.1981 | 0.1272 | 0.1334 | 0.1359 | 0.1382 | 0.1406 |
C. nigrolineata | 0.187 | 0.1987 | 0.1236 | 0.1299 | 0.1337 | 0.1373 | 0.1397 |
P. ryukyuensis | 0.368 | 0.2727 | 0.1986 | 0.2261 | 0.2318 | 0.2531 | 0.2554 |
P. conulus | 0.384 | 0.2630 | 0.1985 | 0.2106 | 0.2254 | 0.2395 | 0.2410 |
L. goshimai | 0.392 | 0.2541 | 0.1883 | 0.1955 | 0.2042 | 0.2174 | 0.2229 |
P. saccharina lanx | 0.399 | 0.2480 | 0.1807 | 0.2041 | 0.2112 | 0.2230 | 0.228 |
L. cassis | 0.676 | 0.2705 | 0.2078 | 0.2311 | 0.2342 | 0.2479 | 0.2609 |
N. radula | 0.684 | 0.2439 | 0.2016 | 0.2112 | 0.2116 | 0.2320 | 0.2389 |

Species | COI | 0.1 Gb | 0.5 Gb | 1 Gb | 2 Gb | 4 Gb | Largest Data |
---|---|---|---|---|---|---|---|
C. grata | 0.216 | 0.2541 | 0.1953 | 0.2138 | 0.2174 | 0.2359 | 0.2408 |
C. toreuma | 0.216 | 0.2439 | 0.1697 | 0.1806 | 0.1974 | 0.2139 | 0.2230 |
C. nigrolineata | 0.224 | 0.2582 | 0.1926 | 0.2077 | 0.2173 | 0.2365 | 0.2470 |
L. goshimai | 0.376 | 0.2503 | 0.1806 | 0.1850 | 0.2037 | 0.2179 | 0.2229 |
P. ryukyuensis | 0.379 | 0.2705 | 0.1905 | 0.2120 | 0.2194 | 0.2384 | 0.2379 |
P. conulus | 0.388 | 0.2480 | 0.1885 | 0.2011 | 0.2112 | 0.2218 | 0.2299 |
P. saccharina lanx | 0.408 | 0.2330 | 0.1637 | 0.1788 | 0.1881 | 0.2003 | 0.2043 |
L. cassis | 0.654 | 0.2727 | 0.1961 | 0.2141 | 0.2407 | 0.2403 | 0.2526 |
N. radula | 0.673 | 0.2480 | 0.1863 | 0.2030 | 0.2217 | 0.2317 | 0.2360 |

Species | COI | 0.1 Gb | 0.5 Gb | 1 Gb | 2 Gb | 4 Gb | Largest Data |
---|---|---|---|---|---|---|---|
P. ryukyuensis | 0.152 | 0.1155 | 0.0501 | 0.0570 | 0.0656 | 0.0729 | 0.0778 |
P. saccharina lanx | 0.234 | 0.1900 | 0.1308 | 0.1375 | 0.1481 | 0.1581 | 0.1627 |
L. goshimai | 0.245 | 0.1744 | 0.1044 | 0.1122 | 0.1216 | 0.1263 | 0.1288 |
L. cassis | 0.306 | 0.2176 | 0.152 | 0.1574 | 0.1666 | 0.1787 | 0.1797 |
N. radula | 0.325 | 0.1951 | 0.1310 | 0.1434 | 0.1528 | 0.1597 | 0.1614 |
C. nigrolineata | 0.359 | 0.2775 | 0.2353 | 0.2434 | 0.2394 | 0.2458 | 0.2524 |
C. grata | 0.362 | 0.2802 | 0.2225 | 0.2487 | 0.2445 | 0.2435 | 0.2501 |
C. toreuma | 0.365 | 0.2630 | 0.1986 | 0.2041 | 0.2300 | 0.2491 | 0.2467 |
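
The best-matched species in the table directly above can be read off column by column. The short Python sketch below does this under one stated assumption: that each value is a distance from a single query sample to the listed reference species, so the smallest value in a column marks the closest reference. The table caption is not preserved here, so that reading is not confirmed by the text. The names `COLUMNS`, `DISTANCES`, and `best_match_per_column` are hypothetical and introduced only for illustration; the numbers are copied from the table.

```python
# Column labels and rows copied from the table directly above.
# Assumption: values are distances from one query sample to each reference.
COLUMNS = ["COI", "0.1 Gb", "0.5 Gb", "1 Gb", "2 Gb", "4 Gb", "Largest Data"]

DISTANCES = {
    "P. ryukyuensis":     [0.152, 0.1155, 0.0501, 0.0570, 0.0656, 0.0729, 0.0778],
    "P. saccharina lanx": [0.234, 0.1900, 0.1308, 0.1375, 0.1481, 0.1581, 0.1627],
    "L. goshimai":        [0.245, 0.1744, 0.1044, 0.1122, 0.1216, 0.1263, 0.1288],
    "L. cassis":          [0.306, 0.2176, 0.152, 0.1574, 0.1666, 0.1787, 0.1797],
    "N. radula":          [0.325, 0.1951, 0.1310, 0.1434, 0.1528, 0.1597, 0.1614],
    "C. nigrolineata":    [0.359, 0.2775, 0.2353, 0.2434, 0.2394, 0.2458, 0.2524],
    "C. grata":           [0.362, 0.2802, 0.2225, 0.2487, 0.2445, 0.2435, 0.2501],
    "C. toreuma":         [0.365, 0.2630, 0.1986, 0.2041, 0.2300, 0.2491, 0.2467],
}


def best_match_per_column(distances, columns):
    """Return, for each column, the species with the smallest distance."""
    best = {}
    for i, column in enumerate(columns):
        # min() over the dict iterates species names; the key picks column i.
        best[column] = min(distances, key=lambda species: distances[species][i])
    return best


best = best_match_per_column(DISTANCES, COLUMNS)
for column in COLUMNS:
    print(f"{column}: {best[column]}")
```

For this table the same reference, P. ryukyuensis, has the smallest value in the COI column and in every data-size column.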
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xu, T.; Kong, L.; Li, Q. Testing Efficacy of Assembly-Free and Alignment-Free Methods for Species Identification Using Genome Skims, with Patellogastropoda as a Test Case. Genes 2022, 13, 1192. https://doi.org/10.3390/genes13071192