What Can We Learn about the Bias of Microbiome Studies from Analyzing Data from Mock Communities?
Abstract
1. Introduction
2. Materials and Methods
2.1. Samples and Sample Preparation
2.2. Statistical Methods
3. Results
3.1. Observed Counts in the Swedish Snus Matrix
3.2. Comparing the Bias of Extraction Methods in Model Community Data
3.3. Effect of Tobacco Matrix on Bias in Model Community Data
3.4. Comparing Bias in ZMC and in ZMC + Matrix Samples to Bias in Real Samples
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Disclaimer
Appendix A
Appendix A.1. Measuring Bias Factors in a Single Model Community Sample Set
Appendix A.2. Comparing Bias Factors in Two Model Community Sample Sets

| | Taxon | Base | Enzymes | Lifeguard® | RNAProtect |
|---|---|---|---|---|---|
| ZMC | | | | | |
| Estimated | Bacillus | 0.744 | 0.530 | 0.751 | 0.872 |
| | Listeria | 0.349 | 0.534 | 0.383 | 0.286 |
| | Staphylococcus | −0.006 | 0.136 | 0.142 | −0.201 |
| | Enterococcus | 0.082 | −0.051 | 0.140 | 0.058 |
| | Lactobacillus | −0.023 | 0.053 | 0.099 | −0.221 |
| | Escherichia/Shigella | −0.399 | −0.433 | −0.572 | −0.302 |
| | Salmonella | −0.414 | −0.450 | −0.578 | −0.318 |
| | Pseudomonas | −0.333 | −0.318 | −0.365 | −0.174 |
| Effect Size | | 0.141 | 0.135 | 0.194 | 0.145 |
| Test of Presence of Bias | | 0.0021 | 0.0017 | 0.0003 | <1 × 10⁻⁴ |
| ZMC + Matrix | | | | | |
| Estimated | Bacillus | 0.730 | 0.013 | 0.842 | 0.735 |
| | Listeria | 0.487 | 0.395 | 0.448 | 0.319 |
| | Staphylococcus | −0.282 | 0.315 | −0.215 | −0.738 |
| | Enterococcus | 0.414 | −0.408 | 0.281 | 0.020 |
| | Lactobacillus | 0.438 | 0.402 | 0.340 | 0.253 |
| | Escherichia/Shigella | −0.632 | −0.285 | −0.590 | −0.233 |
| | Salmonella | −0.665 | −0.292 | −0.621 | −0.249 |
| | Pseudomonas | −0.490 | −0.140 | −0.485 | −0.107 |
| Effect Size | | 0.287 | 0.096 | 0.265 | 0.172 |
| Test of Presence of Bias | | <1 × 10⁻⁴ | 0.0002 | <1 × 10⁻⁴ | <1 × 10⁻⁴ |
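
As a quick consistency check on the table above, the following sketch sums the tabulated bias factors within each protocol column. The sums come out at zero to within rounding, which is what one would expect if the factors are reported on a centered (sum-to-zero) log scale. That interpretation is an assumption, since this excerpt does not state the parametrization. The variable and function names are illustrative only.

```python
# Bias factors copied from the table above, in the taxon order listed there:
# Bacillus, Listeria, Staphylococcus, Enterococcus, Lactobacillus,
# Escherichia/Shigella, Salmonella, Pseudomonas.
ZMC = {
    "Base":       [0.744, 0.349, -0.006, 0.082, -0.023, -0.399, -0.414, -0.333],
    "Enzymes":    [0.530, 0.534, 0.136, -0.051, 0.053, -0.433, -0.450, -0.318],
    "Lifeguard":  [0.751, 0.383, 0.142, 0.140, 0.099, -0.572, -0.578, -0.365],
    "RNAProtect": [0.872, 0.286, -0.201, 0.058, -0.221, -0.302, -0.318, -0.174],
}
ZMC_MATRIX = {
    "Base":       [0.730, 0.487, -0.282, 0.414, 0.438, -0.632, -0.665, -0.490],
    "Enzymes":    [0.013, 0.395, 0.315, -0.408, 0.402, -0.285, -0.292, -0.140],
    "Lifeguard":  [0.842, 0.448, -0.215, 0.281, 0.340, -0.590, -0.621, -0.485],
    "RNAProtect": [0.735, 0.319, -0.738, 0.020, 0.253, -0.233, -0.249, -0.107],
}


def column_sums(table):
    """Return the sum of the bias factors for each extraction protocol."""
    return {protocol: sum(values) for protocol, values in table.items()}


zmc_sums = column_sums(ZMC)
matrix_sums = column_sums(ZMC_MATRIX)

for label, sums in (("ZMC", zmc_sums), ("ZMC + Matrix", matrix_sums)):
    for protocol, total in sums.items():
        # Values are tabulated to three decimals, so allow for rounding error.
        print(f"{label:13s} {protocol:10s} sum = {total:+.3f}")
```

A column whose sum departed noticeably from zero would point to a transcription error in the table rather than to a property of the data.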

Upper triangle: ZMC + Matrix (Overall: 0.152 (p < 1 × 10⁻⁴)). Lower triangle: ZMC (Overall: 0.029 (p = 0.1646)).

| | Base | Enzymes | Lifeguard® | RNAProtect |
|---|---|---|---|---|
| Base | | 0.242 (p < 1 × 10⁻⁴) | 0.006 (p = 0.4384) | 0.113 (p = 0.0006) |
| Enzymes | 0.016 (p = 0.4212) | | 0.221 (p < 1 × 10⁻⁴) | 0.231 (p < 1 × 10⁻⁴) |
| Lifeguard® | 0.012 (p = 0.5032) | 0.019 (p = 0.3092) | | 0.098 (p = 0.0016) |
| RNAProtect | 0.018 (p = 0.3907) | 0.054 (p = 0.0738) | 0.053 (p = 0.0797) | |

| | Base | Enzymes | Lifeguard | RNAProtect | Overall |
|---|---|---|---|---|---|
| Effect Size | 0.070 | 0.081 | 0.029 | 0.069 | 0.062 |
| p-value (10,000 permutation replicates) | 0.0134 | 0.0130 | 0.2126 | 0.0118 | 0.0032 |

| Protocol Pairs | ZMC | ZMC + Matrix | Tobacco |
|---|---|---|---|
| Base vs. Enzymes | 0.2141 | 0.3645 | 0.3564 |
| Base vs. Lifeguard | 0.1897 | 0.0583 | 0.1776 |
| Base vs. RNAProtect | 0.2270 | 0.2490 | 0.1830 |
| Enzymes vs. Lifeguard | 0.2319 | 0.3483 | 0.3769 |
| Enzymes vs. RNAProtect | 0.3970 | 0.3558 | 0.3700 |
| Lifeguard vs. RNAProtect | 0.3940 | 0.2321 | 0.1704 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, M.; Tyx, R.E.; Rivera, A.J.; Zhao, N.; Satten, G.A. What Can We Learn about the Bias of Microbiome Studies from Analyzing Data from Mock Communities? Genes 2022, 13, 1758. https://doi.org/10.3390/genes13101758