Comparison of Long-Read Methods for Sequencing and Assembly of Lepidopteran Pest Genomes
Abstract
1. Introduction
2. Results
2.1. Summary of Raw Data, Assemblies, and Benchmarks
2.2. ONT Genome Assembly
2.3. CLR Genome Assembly
2.4. HiFi Genome Assembly
2.5. Construction and Quality Assessment of Hi-C-Based Chromosome-Level Genomes
2.6. Case
3. Discussion
4. Materials and Methods
4.1. Insect Material
4.2. Genome Sequencing and Creation of Subsets
4.3. De Novo Genome Assembly Workflow
4.4. Hi-C Scaffolding and Gap Filling
4.5. Genome Assembly Evaluation
4.6. Chromosomal Synteny Analysis
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- You, M.; Ke, F.; You, S.; Wu, Z.; Liu, Q.; He, W.; Baxter, S.W.; Yuchi, Z.; Vasseur, L.; Gurr, G.M.; et al. Variation among 532 genomes unveils the origin and evolutionary history of a global insect herbivore. Nat. Commun. 2020, 11, 2321.
- Wu, N.; Zhang, S.; Li, X.; Cao, Y.; Liu, X.; Wang, Q.; Liu, Q.; Liu, H.; Hu, X.; Zhou, X.J.; et al. Fall webworm genomes yield insights into rapid adaptation of invasive species. Nat. Ecol. Evol. 2019, 3, 105–115.
- Chen, Q.; Zhao, H.; Wen, M.; Li, J.; Zhou, H.; Wang, J.; Zhou, Y.; Liu, Y.; Du, L.; Kang, H.; et al. Genome of the webworm Hyphantria cunea unveils genetic adaptations supporting its rapid invasion and spread. BMC Genom. 2020, 21, 242.
- Wan, F.; Yin, C.; Tang, R.; Chen, M.; Wu, Q.; Huang, C.; Qian, W.; Rota-Stabelli, O.; Yang, N.; Wang, S.; et al. A chromosome-level genome assembly of Cydia pomonella provides insights into chemical ecology and insecticide resistance. Nat. Commun. 2019, 10, 4237.
- Benowitz, K.M.; Allan, C.W.; Degain, B.A.; Li, X.; Fabrick, J.A.; Tabashnik, B.E.; Carrière, Y.; Matzkin, L.M. Novel genetic basis of resistance to Bt toxin Cry1Ac in Helicoverpa zea. Genetics 2022, 221, iyac037.
- Edelman, N.B.; Frandsen, P.B.; Miyagi, M.; Clavijo, B.; Davey, J.; Dikow, R.B.; García-Accinelli, G.; Van Belleghem, S.M.; Patterson, N.; Neafsey, D.E.; et al. Genomic architecture and introgression shape a butterfly radiation. Science 2019, 366, 594–599.
- Xia, Q.; Li, S.; Feng, Q. Advances in silkworm studies accelerated by the genome sequencing of Bombyx mori. Annu. Rev. Entomol. 2014, 59, 513–536.
- Kumar, K.R.; Cowley, M.J.; Davis, R.L. Next-Generation Sequencing and Emerging Technologies. Semin. Thromb. Hemost. 2019, 45, 661–673.
- Sohn, J.I.; Nam, J.W. The present and future of de novo whole-genome assembly. Brief Bioinform. 2018, 19, 23–40.
- Mei, Y.; Jing, D.; Tang, S.; Chen, X.; Chen, H.; Duanmu, H.; Cong, Y.; Chen, M.; Ye, X.; Zhou, H.; et al. InsectBase 2.0: a comprehensive gene resource for insects. Nucleic Acids Res. 2022, 50, D1040–D1045.
- Triant, D.A.; Cinel, S.D.; Kawahara, A.Y. Lepidoptera genomes: current knowledge, gaps and future directions. Curr. Opin. Insect Sci. 2018, 25, 99–105.
- Wenger, A.M.; Peluso, P.; Rowell, W.J.; Chang, P.C.; Hall, R.J.; Concepcion, G.T.; Ebler, J.; Fungtammasan, A.; Kolesnikov, A.; Olson, N.D.; et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 2019, 37, 1155–1162.
- van Dijk, E.L.; Jaszczyszyn, Y.; Naquin, D.; Thermes, C. The Third Revolution in Sequencing Technology. Trends Genet. 2018, 34, 666–681.
- Zhang, X.; Liu, C.G.; Yang, S.H.; Wang, X.; Bai, F.W.; Wang, Z. Benchmarking of long-read sequencing, assemblers and polishers for yeast genome. Brief Bioinform. 2022, 23, bbac146.
- Kim, J.; Lee, C.; Ko, B.J.; Yoo, D.A.; Won, S.; Phillippy, A.M.; Fedrigo, O.; Zhang, G.; Howe, K.; Wood, J.; et al. False gene and chromosome losses in genome assemblies caused by GC content variation and repeats. Genome Biol. 2022, 23, 204.
- Ko, B.J.; Lee, C.; Kim, J.; Rhie, A.; Yoo, D.A.; Howe, K.; Wood, J.; Cho, S.; Brown, S.; Formenti, G.; et al. Widespread false gene gains caused by duplication errors in genome assemblies. Genome Biol. 2022, 23, 205.
- Simão, F.A.; Waterhouse, R.M.; Ioannidis, P.; Kriventseva, E.V.; Zdobnov, E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 2015, 31, 3210–3212.
- Chen, Y.; Zhang, Y.; Wang, A.Y.; Gao, M.; Chong, Z. Accurate long-read de novo assembly evaluation with Inspector. Genome Biol. 2021, 22, 312.
- Yamaguchi, K.; Kadota, M.; Nishimura, O.; Ohishi, Y.; Naito, Y.; Kuraku, S. Technical considerations in Hi-C scaffolding and evaluation of chromosome-scale genome assemblies. Mol. Ecol. 2021, 30, 5923–5934.
- Lu, F.; Wei, Z.; Luo, Y.; Guo, H.; Zhang, G.; Xia, Q.; Wang, Y. SilkDB 3.0: visualizing and exploring multiple levels of data for silkworm. Nucleic Acids Res. 2020, 48, D749–D755.
- Gurevich, A.; Saveliev, V.; Vyahhi, N.; Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 2013, 29, 1072–1075.
- Wang, X.; Luan, Y.; Yue, F. EagleC: A deep-learning framework for detecting a full range of structural variations from bulk and single-cell contact maps. Sci. Adv. 2022, 8, eabn9215.
- Murigneux, V.; Rai, S.K.; Furtado, A.; Bruxner, T.J.C.; Tian, W.; Harliwong, I.; Wei, H.; Yang, B.; Ye, Q.; Anderson, E.; et al. Comparison of long-read methods for sequencing and assembly of a plant genome. Gigascience 2020, 9, giaa146.
- Nichuguti, N.; Fujiwara, H. Essential factors involved in the precise targeting and insertion of telomere-specific non-LTR retrotransposon, SART1Bm. Sci. Rep. 2020, 10, 8963.
- Kim, S.W.; Kim, M.J.; Kim, S.R.; Park, J.S.; Kim, K.Y.; Kim, K.H.; Kwak, W.; Kim, I. Whole-genome sequences of 37 breeding line Bombyx mori strains and their phenotypes established since 1960s. Sci. Data 2022, 9, 189.
- Zhang, S.; Shen, S.; Peng, J.; Zhou, X.; Kong, X.; Ren, P.; Liu, F.; Han, L.; Zhan, S.; Huang, Y.; et al. Chromosome-level genome assembly of an important pine defoliator, Dendrolimus punctatus (Lepidoptera; Lasiocampidae). Mol. Ecol. Resour. 2020, 20, 1023–1037.
- Thomas, G.W.C.; Dohmen, E.; Hughes, D.S.T.; Murali, S.C.; Poelchau, M.; Glastad, K.; Anstead, C.A.; Ayoub, N.A.; Batterham, P.; Bellair, M.; et al. Gene content evolution in the arthropods. Genome Biol. 2020, 21, 15.
- Peccoud, J.; Loiseau, V.; Cordaux, R.; Gilbert, C. Massive horizontal transfer of transposable elements in insects. Proc. Natl. Acad. Sci. USA 2017, 114, 4721–4726.
- Li, Y.; Liu, Z.; Liu, C.; Shi, Z.; Pang, L.; Chen, C.; Chen, Y.; Pan, R.; Zhou, W.; Chen, X.X.; et al. HGT is widespread in insects and contributes to male courtship in lepidopterans. Cell 2022, 185, 2975–2987.e10.
- Zhang, X.; Zhang, S.; Zhao, Q.; Ming, R.; Tang, H. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants 2019, 5, 833–845.
- Koren, S.; Walenz, B.P.; Berlin, K.; Miller, J.R.; Bergman, N.H.; Phillippy, A.M. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017, 27, 722–736.
- Li, M.; Sun, C.; Xu, N.; Bian, P.; Tian, X.; Wang, X.; Wang, Y.; Jia, X.; Heller, R.; Wang, M.; et al. De Novo Assembly of 20 Chicken Genomes Reveals the Undetectable Phenomenon for Thousands of Core Genes on Microchromosomes and Subtelomeric Regions. Mol. Biol. Evol. 2022, 39, msac066.
- Ruan, J.; Li, H. Fast and accurate long-read assembly with wtdbg2. Nat. Methods 2020, 17, 155–158.
- Chen, Y.; Nie, F.; Xie, S.Q.; Zheng, Y.F.; Dai, Q.; Bray, T.; Wang, Y.X.; Xing, J.F.; Huang, Z.J.; Wang, D.P.; et al. Efficient assembly of nanopore reads via highly accurate and intact error correction. Nat. Commun. 2021, 12, 1–10.
- Xiao, C.L.; Chen, Y.; Xie, S.Q.; Chen, K.N.; Wang, Y.; Han, Y.; Luo, F.; Xie, Z. MECAT: Fast mapping, error correction, and de novo assembly for single-molecule sequencing reads. Nat. Methods 2017, 14, 1072–1074.
- Nurk, S.; Walenz, B.P.; Rhie, A.; Vollger, M.R.; Logsdon, G.A.; Grothe, R.; Miga, K.H.; Eichler, E.E.; Phillippy, A.M.; Koren, S. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 2020, 30, 1291–1305.
- Cheng, H.; Concepcion, G.T.; Feng, X.; Zhang, H.; Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 2021, 18, 170–175.
- Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 2018, 34, 3094–3100.
- Vaser, R.; Sović, I.; Nagarajan, N.; Šikić, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 2017, 27, 737–746.
- Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30, 2114–2120.
- Li, H.; Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009, 25, 1754–1760.
- Durand, N.C.; Shamim, M.S.; Machol, I.; Rao, S.S.; Huntley, M.H.; Lander, E.S.; Aiden, E.L. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 2016, 3, 95–98.
- Dudchenko, O.; Batra, S.S.; Omer, A.D.; Nyquist, S.K.; Hoeger, M.; Durand, N.C.; Shamim, M.S.; Machol, I.; Lander, E.S.; Aiden, A.P.; et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 2017, 356, 92–95.
- Durand, N.C.; Robinson, J.T.; Shamim, M.S.; Machol, I.; Mesirov, J.P.; Lander, E.S.; Aiden, E.L. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst. 2016, 3, 99–101.
- Xu, M.; Guo, L.; Gu, S.; Wang, O.; Zhang, R.; Peters, B.A.; Fan, G.; Liu, X.; Xu, X.; Deng, L.; et al. TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads. Gigascience 2020, 9, giaa094.
- Marçais, G.; Delcher, A.L.; Phillippy, A.M.; Coston, R.; Salzberg, S.L.; Zimin, A. MUMmer4: A fast and versatile genome alignment system. PLoS Comput. Biol. 2018, 14, e1005944.
- Chen, C.; Chen, H.; Zhang, Y.; Thomas, H.R.; Frank, M.H.; He, Y.; Xia, R. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Mol. Plant 2020, 13, 1194–1202.
Acronym | Description | Reference |
---|---|---|
CLR | PacBio Continuous Long Reads; 48 Gb (110× coverage, E-size = 11,909 bp) | This study |
ONT | Oxford Nanopore Technologies Reads; 70 Gb (160× coverage, E-size = 33,543 bp) | This study |
HIFI | PacBio High Fidelity reads; 27 Gb (60× coverage, E-size = 16,484 bp) | This study |
Hi-C | High-throughput Chromosome conformation capture sequencing; used for scaffolding | Lu et al. (2020) [20] |
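The coverage and E-size figures in the read summary follow from simple arithmetic. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it assumes depth is total yield divided by genome size and that E-size follows the common definition sum(L²) / sum(L), i.e. the expected length of the read that contains a randomly chosen base. All function names are hypothetical.

```python
"""Sequencing depth and read-length E-size (illustrative sketch).

Assumptions, not taken from the paper's methods: depth = total bases /
genome size, and E-size = sum(L_i^2) / sum(L_i).
"""


def depth(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth (x coverage)."""
    return total_bases / genome_size


def implied_genome_size(total_bases: float, coverage: float) -> float:
    """Genome size implied by a stated yield and a stated coverage."""
    return total_bases / coverage


def e_size(read_lengths: list[int]) -> float:
    """Length-weighted mean read length: sum(L^2) / sum(L)."""
    total = sum(read_lengths)
    if total == 0:
        raise ValueError("need at least one non-empty read")
    return sum(length * length for length in read_lengths) / total


if __name__ == "__main__":
    # Yields (Gb) and coverages as listed in the read summary table.
    for name, gb, cov in [("CLR", 48, 110), ("ONT", 70, 160), ("HIFI", 27, 60)]:
        size_mb = implied_genome_size(gb * 1e9, cov) / 1e6
        print(f"{name}: implied genome size ~{size_mb:.0f} Mb")
    # Hypothetical toy read set: one long read dominates the E-size.
    print(e_size([1_000, 1_000, 10_000]))
```

With the table's yields and coverages, the implied genome sizes come out at about 436, 438 and 450 Mb, in the same range as the 446–457 Mb chromosome-level assemblies. The toy example shows why E-size is preferred over a plain mean for long reads: it weights each read by the bases it contributes.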
Assembly | Contigs ᵃ | Contig N50 (kb) | Structural Error | Small-Scale Error (/Mb) | QV ᵇ | Complete Genes ᶜ |
---|---|---|---|---|---|---|
CLR-Canu | 2387.9 | 2161 | 1177.3 | 515.3 | 32 | 989.1 |
CLR-wtdbg2 | 2347.3 | 1236 | 65.2 | 2386.8 | 26.8 | 883.9 |
CLR-wtdbg2_polished | 2206.6 | 1246 | 342 | 765.7 | 30.6 | 1054.1 |
CLR-MECAT2 | 1992 | 2111 | 1310 | 5594.9 | 22.8 | 618 |
CLR-MECAT2_polished | 1948.5 | 2106 | 265.5 | 490.9 | 32.4 | 1182 |
CLR-NextDenovo | 424.2 | 6463 | 125 | 207.4 | 36.7 | 1237.5 |
ONT-wtdbg2 | 9410.5 | 442 | 582.4 | 4488 | 22.1 | 1010.1 |
ONT-wtdbg2_polished | 6526 | 516 | 1473.3 | 788.6 | 27.4 | 1207.6 |
ONT-NECAT | 742.8 | 2656 | 1939.2 | 1691 | 24.3 | 1264.8 |
ONT-NextDenovo | 103.1 | 11,454 | 895.3 | 1093.5 | 27.1 | 1275 |
HIFI-HiCanu | 825.5 | 11,854 | 0.3 | 5.6 | 66.1 | 1342.8 |
HIFI-hifiasm | 163.7 | 13,247 | 2.7 | 9.4 | 50.6 | 1343.7 |
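Contig count and contig N50 summarize contiguity in the assembler comparison. A minimal sketch, assuming the standard N50 definition (the length L such that contigs of length at least L hold at least half of the assembled bases); QUAST reports these values directly, and the contig lengths used here are invented for illustration.

```python
"""Contig count and contig N50 (illustrative sketch, hypothetical data)."""
from itertools import accumulate


def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    # Walk from the longest contig down until half the bases are covered.
    for length, running in zip(ordered, accumulate(ordered)):
        if running >= half:
            return length
    raise ValueError("no contigs")


def summarize(lengths: list[int]) -> dict[str, int]:
    """Contig count, total size and N50, in the units of the input."""
    return {"contigs": len(lengths), "total": sum(lengths), "n50": n50(lengths)}


if __name__ == "__main__":
    # Hypothetical contig lengths in kb.
    fragmented = [900, 700, 400] + [50] * 40  # a few long contigs plus many fragments
    contiguous = [9000, 8000, 7000, 500]      # few, long contigs
    print(summarize(fragmented))
    print(summarize(contiguous))
```

The fragmented toy set has 43 contigs and an N50 of 400 kb, while the contiguous one has 4 contigs and an N50 of 8000 kb, the same kind of contrast seen between ONT-wtdbg2 and HIFI-hifiasm in the table above.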
Metric | N9L-CLR ᵃ | D9L × N4-ONT ᵇ | P50T-HiFi ᶜ | KRSM ᵈ |
---|---|---|---|---|
Size (Mb) | 446.7 | 454.6 | 456.6 | 446.2 |
Scaffold N50 (Mb) | 16.92 | 17.48 | 16.92 | 16.89 |
Contigs | 51 | 29 | 29 | 33 |
Contig N50 (Mb) | 14.38 | 17.48 | 16.92 | 16.89 |
Max contig ᵉ (Mb) | 21.50 | 21.92 | 21.53 | 21.51 |
Structural error | 193 | 852 | 2 | 34 |
Small-scale error (/Mb) | 88.47 | 980.86 | 1.79 | 55.13 |
QV | 36.84 | 27.37 | 56.84 | 41.75 |
BUSCOs ᶠ | 97.2% | 94.8% | 98.8% | 97.5% |
PAR ᵍ | 0.11% | 0.03% | 0.03% | 0.05% |
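The QV rows are on a Phred scale, so a difference of ten QV units corresponds to a tenfold difference in per-base error. A short sketch, assuming the usual relation QV = -10 log10(p) with p the per-base error probability; the exact error model behind each tool's QV may differ, so the conversion is indicative only. Function names are hypothetical.

```python
"""Phred-scaled consensus quality (QV) conversions (illustrative sketch).

Assumes QV = -10 * log10(p); individual evaluation tools may count
errors differently, so treat the numbers as indicative.
"""
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability implied by a Phred-scaled QV."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(p: float) -> float:
    """Phred-scaled QV for a per-base error probability 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def errors_per_mb(qv: float) -> float:
    """Expected erroneous bases per megabase at a given QV."""
    return qv_to_error_rate(qv) * 1e6


if __name__ == "__main__":
    # QVs of the four chromosome-level assemblies compared in this study.
    for name, qv in [("N9L-CLR", 36.84), ("D9L x N4-ONT", 27.37),
                     ("P50T-HiFi", 56.84), ("KRSM", 41.75)]:
        print(f"{name}: QV {qv} ~ {errors_per_mb(qv):.1f} errors/Mb")
```

Under this assumption, QV 56.84 (P50T-HiFi) corresponds to roughly two errors per megabase, against roughly 1,800 per megabase at QV 27.37 (D9L × N4-ONT).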
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, T.; Xing, W.; Wang, A.; Zhang, N.; Jia, L.; Ma, S.; Xia, Q. Comparison of Long-Read Methods for Sequencing and Assembly of Lepidopteran Pest Genomes. Int. J. Mol. Sci. 2023, 24, 649. https://doi.org/10.3390/ijms24010649