BacSPaD: A Robust Bacterial Strains’ Pathogenicity Resource Based on Integrated and Curated Genomic Metadata
Abstract
1. Introduction
1.1. Leveraging Bacterial Omics Data for Pathogenicity Insights and Public Health
1.2. Current Challenges
1.3. Objectives of BacSPaD
2. Materials and Methods
2.1. Data Acquisition
2.2. Data Pre-Processing and Integration
2.3. Quality Control
3. Results
3.1. Pathogenicity Annotation
3.1.1. HP Labeling Workflow
3.1.2. NHP Labeling Workflow
3.1.3. Manual Curation
3.2. Case Studies of HP and Inconclusive Genomes
3.3. Database Overview and Analysis
3.3.1. General Statistics and Distribution
3.3.2. Virulence Factor Analysis
3.3.3. Database Structure
- Data: Integrated dataset with a pathogenicity annotation for each strain. Users can search by keyword across all fields or restrict a search to a single field. Detailed descriptions of each metadata field are available in Supplementary Information: Table S1. Users may download selected genomes or retrieve them in batch, along with other data files such as proteomes and protein families, from the BV-BRC FTP site at https://www.bv-brc.org/docs/quick_references/ftp.html (accessed on 6 August 2024). To make it easier to find strains associated with a specific disease or isolation source category/subcategory, diseases and isolation sources were also categorized, and the resulting fields were added to the dataset as ‘disease category’, ‘disease subcategory’, and ‘isolation source’.
- Dashboard: This section features a range of statistical visualizations, including the top 10 and 50 species, the top 12 families, a location distribution map according to the country of isolation, and interactive visualizations of taxonomy, isolation sources, and disease categories with respective subcategories.
- Molecular Biology: This section includes visualizations of the distributions of plasmid and contig counts, genome lengths in base pairs (‘bp’), GC content percentage, and protein-coding sequences (‘PATRIC CDS’).
- Virulence Factors: Virulence factor information for the most prevalent clinical species, including the gene name; the frequency at which it is found in HP strains; the frequency at which it is found in NHP strains; a list of the BV-BRC genome IDs in which it is found; the species names; and the corresponding number of strains, species, genera, and families.
- About: Summary of the utility of BacSPaD for microbiology research.
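The two search modes described for the Data section (keyword across all fields, and field-specific search) can be illustrated with a minimal sketch over an exported table. The records below are taken from the example table in this article; the column names (`species`, `genome_name`, `isolation_source`, `label`) and both function names are hypothetical and do not reflect the actual BacSPaD schema or interface.

```python
from typing import Dict, List

Record = Dict[str, str]

# Illustrative records based on the example table in this article.
# Column names are hypothetical, not the actual BacSPaD schema.
RECORDS: List[Record] = [
    {"species": "Streptococcus pyogenes",
     "genome_name": "Streptococcus pyogenes strain M75",
     "isolation_source": "",
     "label": "HP"},
    {"species": "Neisseria meningitidis",
     "genome_name": "Neisseria meningitidis strain S4",
     "isolation_source": "",
     "label": "HP"},
    {"species": "Citrobacter koseri",
     "genome_name": "Citrobacter koseri strain MPUCK001",
     "isolation_source": "The skin surface of human (disease: atopic dermatitis) neck",
     "label": "Inconclusive"},
    {"species": "Pseudomonas putida",
     "genome_name": "Pseudomonas putida strain 15420352",
     "isolation_source": "urine",
     "label": "Inconclusive"},
]


def search_any_field(records: List[Record], keyword: str) -> List[Record]:
    """Return records in which any field contains the keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [r for r in records
            if any(needle in value.casefold() for value in r.values())]


def search_field(records: List[Record], field: str, keyword: str) -> List[Record]:
    """Return records whose given field contains the keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [r for r in records if needle in r.get(field, "").casefold()]


if __name__ == "__main__":
    for record in search_any_field(RECORDS, "dermatitis"):
        print(record["genome_name"], "->", record["label"])
    for record in search_field(RECORDS, "label", "HP"):
        print(record["genome_name"])
```

Substring matching is used here only for brevity; a field-specific search on a controlled field such as the label would more sensibly use exact matching.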
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Species Name | Genome Name | Relevant Metadata Field(s) and Content | Label |
---|---|---|---|
Streptococcus pyogenes | Streptococcus pyogenes strain M75 | Comments: “…modern controlled human infection * model, with the aim of safely and successfully causing pharyngitis * in healthy ** adult volunteers” | HP |
Neisseria meningitidis | Neisseria meningitidis strain S4 | Comments: “…ability to cause septicaemic disease * and meningitis * (…) meningococcus is primarily an obligate commensal ** of the human nasopharynx, and it is unclear why the bacterium has evolved exquisite mechanisms to avoid host immunity (…) genome of S4, an invasive * strain of Neisseria meningitidis”. | HP |
Citrobacter koseri | Citrobacter koseri strain MPUCK001 | Isolation source: “The skin surface of human (disease *: atopic dermatitis *) neck” | Inconclusive (after manual revision) |
Pseudomonas putida | Pseudomonas putida strain 15420352 | Isolation source: “urine”; host health: “pulmonary infection *” | Inconclusive (after manual revision) |
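The table pairs free-text metadata with the label each genome received. As a minimal sketch of how such free text could be screened for disease-related terms ahead of manual review, the function below reports which terms occur in a metadata string. The keyword list is an assumption assembled from terms that appear in the table, and the function name is hypothetical; this is not the authors' labeling workflow.

```python
import re
from typing import List, Sequence

# Hypothetical keyword list, drawn from terms that appear in the table above.
DISEASE_TERMS = (
    "infection",
    "pharyngitis",
    "meningitis",
    "septicaemic disease",
    "atopic dermatitis",
    "invasive",
)


def find_disease_terms(text: str, terms: Sequence[str] = DISEASE_TERMS) -> List[str]:
    """Return the terms found in text as whole words or phrases, ignoring case."""
    hits = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(term)
    return hits


if __name__ == "__main__":
    examples = [
        "pulmonary infection",
        "urine",
        "The skin surface of human (disease: atopic dermatitis) neck",
        "an invasive strain of Neisseria meningitidis",
    ]
    for text in examples:
        print(repr(text), "->", find_disease_terms(text))
```

Whole-word matching keeps the species name "meningitidis" from being counted as a hit for "meningitis". The sketch deliberately returns matches without assigning a label: the two rows marked "Inconclusive (after manual revision)" contain disease terms yet were not labeled HP, so a keyword hit alone did not decide the outcome.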
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ribeiro, S.; Chaumet, G.; Alves, K.; Nourikyan, J.; Shi, L.; Lavergne, J.-P.; Mijakovic, I.; de Bernard, S.; Buffat, L. BacSPaD: A Robust Bacterial Strains’ Pathogenicity Resource Based on Integrated and Curated Genomic Metadata. Pathogens 2024, 13, 672. https://doi.org/10.3390/pathogens13080672