BacSeq: A User-Friendly Automated Pipeline for Whole-Genome Sequence Analysis of Bacterial Genomes
Abstract
1. Introduction
2. Materials and Methods
2.1. Bioinformatics Pipeline
2.2. Requirements
2.3. Pipeline Customization
3. Results and Discussion
3.1. Graphical User Interface (GUI)
3.2. Use Case: Draft Genome Sequences of Acinetobacter baumannii Isolates
3.2.1. Quality Control
3.2.2. Genome Assembly, Assembly Quality Assessments, and Genome Annotation
3.2.3. Antibiotic Resistance, Plasmids, and Virulence Factors
3.2.4. Comparative Analysis
3.2.5. Other Analysis
3.3. Hybrid Library Assembly for Complete Genome Analysis
3.4. Limitation of BacSeq
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Isolate Code | Number of Contigs | Total Length (bp) | %GC | N50 (bp) | N90 (bp) | L50 | L90 |
---|---|---|---|---|---|---|---|
PA020 | 92 | 9,243,789 | 36.80 | 569,197 | 59,780 | 6 | 29 |
PA025 | 66 | 3,906,279 | 38.89 | 152,139 | 41,611 | 6 | 24 |
ST001 | 68 | 4,111,741 | 39.02 | 128,875 | 39,322 | 8 | 28 |
ST009 | 69 | 3,844,585 | 39.01 | 113,438 | 40,316 | 11 | 32 |
ST010 | 68 | 3,872,017 | 38.93 | 123,627 | 42,752 | 10 | 30 |
ST024 | 62 | 4,294,911 | 38.82 | 250,119 | 64,045 | 6 | 18 |
ST028 | 187 | 9,539,281 | 49.62 | 187,251 | 35,730 | 17 | 58 |
ST032 | 67 | 3,844,381 | 39.01 | 122,061 | 42,602 | 11 | 31 |
ST034 | 58 | 4,225,388 | 38.88 | 250,219 | 71,864 | 6 | 16 |
ST035 | 70 | 4,035,126 | 38.99 | 176,611 | 43,801 | 7 | 25 |
ST036 | 818 | 4,329,065 | 38.79 | 65,441 | 1,278 | 15 | 255 |
YL005 | 109 | 3,910,735 | 38.91 | 76,044 | 19,599 | 18 | 55 |
YL006 | 53 | 3,894,856 | 38.99 | 190,977 | 61,096 | 6 | 19 |
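The N50, N90, L50, and L90 columns follow the standard assembly-contiguity definitions: Nx is the length of the shortest contig in the smallest set of longest contigs that together cover at least x% of the assembly, and Lx is the number of contigs in that set. The sketch below illustrates those definitions only. It is not BacSeq's own code, and the function name `nx_lx` and the toy contig lengths are hypothetical.

```python
from typing import List, Tuple


def nx_lx(contig_lengths: List[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx) for the given fraction of the total assembly length.

    Nx: length of the shortest contig in the smallest set of longest contigs
        whose combined length reaches `fraction` of the assembly.
    Lx: number of contigs in that set.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    threshold = sum(contig_lengths) * fraction
    running = 0
    # Walk contigs from longest to shortest, accumulating covered length.
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if running >= threshold:
            return length, count
    raise AssertionError("unreachable: the cumulative sum always reaches the total")


# Hypothetical toy assembly (illustrative values, not taken from the table).
toy_contigs = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy_contigs, 0.5)
n90, l90 = nx_lx(toy_contigs, 0.9)
print(f"N50={n50} L50={l50} N90={n90} L90={l90}")
```

Read this way, a higher N50 with a lower L50 indicates a more contiguous assembly. In the table, ST036 (818 contigs, L90 of 255) is far more fragmented than YL006 (53 contigs, L90 of 19).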
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chukamnerd, A.; Jeenkeawpiam, K.; Chusri, S.; Pomwised, R.; Singkhamanan, K.; Surachat, K. BacSeq: A User-Friendly Automated Pipeline for Whole-Genome Sequence Analysis of Bacterial Genomes. Microorganisms 2023, 11, 1769. https://doi.org/10.3390/microorganisms11071769