Structural Refinement by Direct Mapping Reveals Assembly Inconsistencies near Hi-C Junctions
Abstract
1. Introduction
2. Results
2.1. De Novo Assembly and Structural Validation of the Haematococcus lacustris Genome
2.2. Structural Validation of Four Published Genome Assemblies
3. Discussion
4. Materials and Methods
4.1. Haematococcus lacustris Cultivation
4.2. ONT Sequencing
4.3. Illumina Sequencing
4.4. Optical Genome Mapping
4.5. Hi-C Data Generation
4.6. H. lacustris De Novo Genome Assembly
4.7. Optical Genome Maps: Assembly and Hybrid Scaffolding
4.8. Source of Publicly Available Sequencing Data
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Metric | ONT Assembly | ONT + Hi-C | ONT + Hi-C + OM |
---|---|---|---|
Total assembly length (bp) | 150,233,945 | 150,327,121 | 175,351,177 |
Total scaffold length (bp) | - | 137,192,742 | 162,448,219 |
Number of scaffolds | - | 32 | 1507 |
Scaffold N50 (bp) | - | 4,010,071 | 669,654 |
Scaffold average length (bp) | - | 4,287,273 | 107,796 |
Longest scaffold (bp) | - | 9,907,970 | 4,400,370 |
Shortest scaffold (bp) | - | 1,735,284 | 210 |
Number of gaps | - | 930 | 1120 |
Gap size (bp) | - | 93,000 | 25,118,148 |
Contigs in scaffolds | - | 962 | 1696 |
Remaining contigs | 1799 | 849 | 58 |
Remaining contig total length (bp) | 150,233,945 | 13,134,379 | 12,902,958 |
Remaining contig N50 (bp) | 230,876 | 49,528 | 248,743 |
Remaining contig average length (bp) | 83,510 | 15,470 | 222,465 |
Remaining contig maximum length (bp) | 987,270 | 438,472 | 925,569 |
Remaining contig minimum length (bp) | 501 | 501 | 80,956 |
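The N50 and average-length rows above follow standard definitions, which can be sketched as below. This is a minimal illustration, not the authors' pipeline: the source does not say which tool produced these statistics, the function names `n50` and `mean_length` are hypothetical, and the toy length list is invented for the example. Only the two ONT + Hi-C figures are copied from the table.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty positive lengths")


def mean_length(total_bp: int, count: int) -> float:
    """Average sequence length: total length divided by sequence count."""
    return total_bp / count


# Hypothetical toy lengths (not from the paper), used only to show the definition.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)  # 80 + 70 = 150 reaches half of 300, so the toy N50 is 70

# Values copied from the ONT + Hi-C column of the table above.
hic_mean = mean_length(137_192_742, 32)
```

Rounded to the nearest base pair, `hic_mean` reproduces the reported scaffold average length of 4,287,273 bp.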
Type | Number | Percentage |
---|---|---|
Confirmed error | 50/62 | 81% |
Non-confirmed error | 12/62 | 19% |
Metric | PacBio + Hi-C | PacBio + Hi-C After OM Conflict Resolution | PacBio + Hi-C + OM |
---|---|---|---|
Total assembly length (bp) | 558,896,430 | 558,896,430 | 566,549,843 |
Total scaffold length (bp) | 500,601,628 | 494,945,822 | 506,701,445 |
Number of scaffolds | 67 | 169 | 124 |
Scaffold N50 (bp) | 18,661,206 | 9,318,959 | 11,119,386 |
Scaffold average length (bp) | 7,471,666.09 | 2,928,673.50 | 4,086,301.98 |
Longest scaffold (bp) | 25,248,489 | 19,026,298 | 22,043,127 |
Shortest scaffold (bp) | 11,445 | 4439 | 4439 |
Number of gaps | 1764 | 1764 | 1823 |
Gap size (bp) | 1,532,853 | 1,532,853 | 9,186,266 |
Contigs in scaffolds | 1831 | 1933 | 1947 |
Remaining contigs | 1513 | 1812 | 1798 |
Remaining contig total length (bp) | 58,294,802 | 63,950,608 | 59,848,398 |
Metric | Corvus hawaiiensis | Porphyrio hochstetteri | Alosa sapidissima |
---|---|---|---|
Number of conflicts | 352 | 182 | 825 |
Number of scaffolds cut/Number of chromosomes | 21/48 | 30/71 | 24/29 |
Metric | Corvus hawaiiensis: Hi-C | Corvus hawaiiensis: After OM Conflict Resolution | Corvus hawaiiensis: OM | Porphyrio hochstetteri: Hi-C | Porphyrio hochstetteri: After OM Conflict Resolution | Porphyrio hochstetteri: OM | Alosa sapidissima: Hi-C | Alosa sapidissima: After OM Conflict Resolution | Alosa sapidissima: OM |
---|---|---|---|---|---|---|---|---|---|
Total assembly length (bp) | 1,151,594,481 | 1,151,593,281 | 1,169,606,052 | 1,270,322,674 | 1,270,322,434 | 1,280,192,390 | 903,564,947 | 903,564,347 | 951,311,491 |
Total scaffold length (bp) | 1,089,055,265 | 997,131,828 | 1,136,185,454 | 1,221,472,023 | 1,167,038,074 | 1,229,274,670 | 898,428,452 | 782,348,280 | 924,915,787 |
Number of scaffolds | 48 | 69 | 65 | 71 | 123 | 107 | 29 | 378 | 147 |
Scaffold N50 (bp) | 76,278,832 | 35,694,304 | 65,807,562 | 71,566,193 | 43,238,439 | 71,566,193 | 38,440,066 | 3,604,007 | 35,651,408 |
Scaffold average length (bp) | 22,688,651.35 | 14,451,185.91 | 17,479,776.22 | 17,203,831.31 | 9,488,114.42 | 11,488,548.32 | 30,980,291.45 | 2,069,704.44 | 6,291,944.13 |
Longest scaffold (bp) | 123,451,405 | 105,815,375 | 120,194,775 | 224,114,340 | 105,985,913 | 170,159,902 | 56,504,578 | 18,980,070 | 62,312,241 |
Shortest scaffold (bp) | 86,660 | 17,826 | 17,826 | 25,768 | 2832 | 2832 | 22,192 | 1287 | 1287 |
Number of gaps | 294 | 294 | 384 | 327 | 327 | 406 | 1639 | 1641 | 1983 |
Gap size (bp) | 1,813,227 | 1,813,227 | 19,825,998 | 5,316,249 | 5,316,249 | 15,186,205 | 4,995,812 | 4,995,812 | 52,742,956 |
Contigs in scaffolds | 342 | 363 | 449 | 398 | 442 | 506 | 1668 | 1993 | 2108 |
Remaining contigs | 139 | 472 | 386 | 103 | 242 | 178 | 44 | 522 | 407 |
Remaining contig total length (bp) | 62,539,216 | 154,461,453 | 33,420,598 | 48,850,651 | 103,284,360 | 50,917,720 | 5,136,495 | 121,216,067 | 26,395,704 |
Remaining contig N50 (bp) | 16,938,386 | 9,525,645 | 551,250 | 12,827,660 | 16,562,103 | 12,827,660 | 233,926 | 1,091,412 | 130,353 |
Remaining contig average length (bp) | 449,922.42 | 327,248.84 | 86,581.86 | 474,278.17 | 426,794.88 | 286,054.61 | 116,738.52 | 232,214.69 | 64,854.31 |
Longest remaining contig (bp) | 23,235,100 | 33,421,925 | 7,094,517 | 19,484,417 | 20,414,059 | 19,484,417 | 403,292 | 6,672,834 | 1,756,549 |
Shortest remaining contig (bp) | 2297 | 641 | 641 | 3912 | 504 | 504 | 2376 | 659 | 659 |
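One way to sanity-check the table above is to confirm that, in each column, the total assembly length equals the total scaffold length plus the remaining contig total length. The sketch below does that arithmetic with values copied from the table. It assumes, from the row labels only, that "remaining contigs" are the contigs left outside scaffolds; the helper name `residual` and the short column labels are hypothetical.

```python
# Each entry: (column label, total assembly length, total scaffold length,
# remaining contig total length), all in bp, copied from the table above.
rows = [
    ("Corvus / Hi-C", 1_151_594_481, 1_089_055_265, 62_539_216),
    ("Corvus / after OM conflict resolution", 1_151_593_281, 997_131_828, 154_461_453),
    ("Corvus / OM", 1_169_606_052, 1_136_185_454, 33_420_598),
    ("Porphyrio / Hi-C", 1_270_322_674, 1_221_472_023, 48_850_651),
    ("Porphyrio / after OM conflict resolution", 1_270_322_434, 1_167_038_074, 103_284_360),
    ("Porphyrio / OM", 1_280_192_390, 1_229_274_670, 50_917_720),
    ("Alosa / Hi-C", 903_564_947, 898_428_452, 5_136_495),
    ("Alosa / after OM conflict resolution", 903_564_347, 782_348_280, 121_216_067),
    ("Alosa / OM", 951_311_491, 924_915_787, 26_395_704),
]


def residual(total_bp: int, scaffold_bp: int, contig_bp: int) -> int:
    """Difference between the reported total and the sum of its two parts."""
    return total_bp - (scaffold_bp + contig_bp)


# A residual of zero means the column is internally consistent.
residuals = {label: residual(t, s, c) for label, t, s, c in rows}

for label, value in residuals.items():
    print(f"{label}: residual = {value} bp")
```

A non-zero residual in any column would point to a transcription problem in the table rather than to anything about the assemblies themselves.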
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Marcolungo, L.; Vincenzi, L.; Ballottari, M.; Cecchin, M.; Cosentino, E.; Mignani, T.; Limongi, A.; Ferraris, I.; Orlandi, M.; Rossato, M.; et al. Structural Refinement by Direct Mapping Reveals Assembly Inconsistencies near Hi-C Junctions. Plants 2023, 12, 320. https://doi.org/10.3390/plants12020320