StrainIQ: A Novel n-Gram-Based Method for Taxonomic Profiling of Human Microbiota at the Strain Level
Abstract
1. Introduction
2. Materials and Methods
2.1. Site-Specific Reference Genome Sets
2.2. Building n-Gram-Based DNA Signature Element Models (DSEMs)
2.2.1. n-Gram Generation and Encoding
2.2.2. Scoring Function
2.3. Identification of Strains from Metagenomic Sequencing Datasets
2.4. Quantification of Strains
2.5. In Silico and Experimental Validation Using Simulated and Mock Communities
2.5.1. Simulated Datasets for Testing
2.5.2. Mock Community Datasets for Testing
2.6. Comparison against Other Popular Methods
2.7. Statistical Measures Used for Performance Testing
3. Results
3.1. n-Gram-Based Body-Site-Specific DSEMs
3.2. Identification of Optimal Size of an n-Gram for DSEM Building
3.3. DSEM Building from GI Tract Reference Genomes
3.4. Threshold Score Cutoff for Taxa Prediction
3.5. Assessing the Performance of the StrainIQ Algorithm Based on Simulated Datasets
3.6. Assessing the Performance of the StrainIQ Algorithm on Experimental Datasets
3.7. Comparison of StrainIQ Performance with other Popular Methods
3.8. Quantification of the Identified Taxa from the Metagenomic Data
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
DNA | deoxyribonucleic acid |
DSEM | DNA signature element model |
GI tract | gastrointestinal tract |
HMP | Human Microbiome Project |
MAG | metagenome-assembled genome |
NCBI | National Center for Biotechnology Information |
Method | Genus | Species | Strain |
---|---|---|---|
StrainIQ | 0.977 | 0.886 | 0.821 |
KrakenUniq | 0.983 | 0.942 | 0.639 |
MetaPhlAn | 0.914 | 0.719 | NA |
CLARK | 0.887 | 0.719 | NA |
Sets | StrainIQ | KrakenUniq | StrainIQ’s Lead (%) |
---|---|---|---|
Set 1 | 211 | 176 | 11.67 |
Set 2 | 196 | 190 | 2.00 |
Set 3 | 190 | 198 | −2.67 |
Set 4 | 183 | 140 | 21.50 |
Set 5 | 175 | 143 | 16.00 |
Set 6 | 147 | 151 | −2.00 |
Set 7 | 203 | 127 | 38.00 |
Set 8 | 187 | 145 | 21.00 |
Set 9 | 179 | 147 | 16.00 |
Set 10 | 173 | 142 | 15.50 |
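The source does not state how the lead column in the table above is derived. As a minimal sketch, the reported values can be reproduced by dividing the difference between the two counts by a per-set total. The totals used here (300 for Sets 1 to 3, 200 for Sets 4 to 10) are an assumption inferred from the numbers, not figures given in the text, and the names `ROWS`, `ASSUMED_TOTALS`, `lead_percent`, and `check_rows` are hypothetical.

```python
# Rows copied from the comparison table:
# (set name, StrainIQ count, KrakenUniq count, reported lead in %).
ROWS = [
    ("Set 1", 211, 176, 11.67),
    ("Set 2", 196, 190, 2.00),
    ("Set 3", 190, 198, -2.67),
    ("Set 4", 183, 140, 21.50),
    ("Set 5", 175, 143, 16.00),
    ("Set 6", 147, 151, -2.00),
    ("Set 7", 203, 127, 38.00),
    ("Set 8", 187, 145, 21.00),
    ("Set 9", 179, 147, 16.00),
    ("Set 10", 173, 142, 15.50),
]

# ASSUMPTION (not stated in the source): each lead is normalized by a per-set
# total. These totals are inferred only because they reproduce the table.
ASSUMED_TOTALS = {f"Set {i}": (300 if i <= 3 else 200) for i in range(1, 11)}


def lead_percent(strainiq: int, krakenuniq: int, total: int) -> float:
    """Difference between the two counts as a percentage of the assumed total."""
    return 100.0 * (strainiq - krakenuniq) / total


def check_rows(rows, totals, tol=0.005):
    """Return the sets whose reported lead differs from the recomputed one by more than tol."""
    mismatches = []
    for name, strainiq, krakenuniq, reported in rows:
        recomputed = lead_percent(strainiq, krakenuniq, totals[name])
        if abs(recomputed - reported) > tol:
            mismatches.append(name)
    return mismatches


if __name__ == "__main__":
    for name, strainiq, krakenuniq, reported in ROWS:
        recomputed = lead_percent(strainiq, krakenuniq, ASSUMED_TOTALS[name])
        print(f"{name}: reported {reported:.2f}, recomputed {recomputed:.2f}")
    print("mismatching sets:", check_rows(ROWS, ASSUMED_TOTALS))
```

Under these assumed totals every row matches to two decimal places, including the two sets (Set 3 and Set 6) where the lead is negative, i.e., where KrakenUniq has the higher count.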
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pandey, S.; Avuthu, N.; Guda, C. StrainIQ: A Novel n-Gram-Based Method for Taxonomic Profiling of Human Microbiota at the Strain Level. Genes 2023, 14, 1647. https://doi.org/10.3390/genes14081647