Step-by-Step Metagenomics for Food Microbiome Analysis: A Detailed Review
Abstract
1. Introduction
2. Metagenomics
3. Sampling and Storage
4. Sequencing
5. Bioinformatic Processing
5.1. Quality Assessment and Filtering of Reads
5.2. Contig Assembly De Novo
5.3. Contig Clustering
5.4. Quality Assessments of MAGs
5.5. Defining and Analyzing the Pangenome
5.6. Taxonomic Profiling
5.7. Construction of Phylogenetic Trees from Metagenomic Data
5.8. Determining Gene Functions and MAG Metabolic Profiles
5.9. Integrating Metagenomic Data with Metadata
6. Software for Comprehensive Metagenomic Analyses
7. Conclusions
8. Glossary
- ASCII (American Standard Code for Information Interchange) is a character encoding system used in computers and communication devices to represent text, with each character assigned a distinct integer value in the range 0–127.
- FASTA is a file format employed for the storage of DNA, RNA, and protein sequences.
- Deep learning is a branch of machine learning, itself a field of artificial intelligence (AI), that focuses on developing and training multi-layer neural networks capable of learning to perform tasks from data, without task-specific rules being explicitly programmed.
- HTML (Hypertext Markup Language) is a markup language utilized for the construction of websites, serving as the foundational language for structuring and presenting content on the web.
- HTS (high-throughput screening) involves high-throughput techniques for screening vast quantities of substances, leveraging automation and miniaturization to analyze numerous substances simultaneously. Various detection methods are employed, such as chemical reactions, absorbance, fluorescence, and bioluminescence, to identify the substances being tested.
- Kbp (kilobase pair) is a unit of measurement in molecular biology equivalent to 1000 nucleobase pairs.
- A k-mer is a nucleotide subsequence of length k in DNA or RNA, composed of the four nucleotides adenine, guanine, cytosine, and either thymine (in DNA) or uracil (in RNA).
- A contig is a contiguous nucleotide sequence of a genome, assembled by merging overlapping DNA sequence reads.
- MAG (metagenome-assembled genome) denotes a genome reconstructed from the combined genetic material present in a sample containing taxonomically diverse organisms from a specific environment.
- NGS (next-generation sequencing) encompasses advanced sequencing methodologies that facilitate rapid and simultaneous reading of multiple DNA fragments.
- OTU (operational taxonomic unit) is a taxonomic grouping for nucleotide sequences based on their sequence similarity.
- PHRED is a computational tool that evaluates the quality of DNA sequences acquired during sequencing by estimating the probability that each nucleotide was read incorrectly. The resulting quality value, known as the PHRED score (Q), is expressed on a logarithmic scale, Q = −10 log10(P), where P is the error probability. Scores of 20, 40, and 60 therefore correspond to error probabilities of 1 in 100, 1 in 10,000, and 1 in 1,000,000, and higher values indicate greater accuracy in reading.
- Scaffolds denote extended sequences comprising ordered and linked contigs, representing a segment of the genome not assigned to a particular chromosome.
- TSV (tab-separated values) is a text file format where values are delimited by the tab character (TAB), facilitating the storage and transmission of data in tabular form.
- WMS (whole-metagenome sequencing) encompasses complete sequencing of the metagenome, enabling the analysis of all genetic material within a metagenomic sample.
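Several of the glossary terms above (PHRED score, ASCII, k-mer) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not part of any tool discussed in this review: the function names are hypothetical, the score formula is the standard Q = −10 log10(P), and the ASCII offset of 33 is the convention commonly used for quality strings in FASTQ files, which the glossary itself does not describe.

```python
import math
from collections import Counter


def phred_to_error_probability(q):
    """Convert a PHRED score Q into the probability that a base call is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert an error probability P (0 < P <= 1) into a PHRED score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


def decode_quality_string(quality, offset=33):
    """Decode ASCII-encoded quality characters into integer PHRED scores.

    Assumption: each character stores Q + offset as its ASCII code
    (offset 33 is the common FASTQ convention).
    """
    return [ord(ch) - offset for ch in quality]


def count_kmers(sequence, k):
    """Count every overlapping k-mer (substring of length k) in a sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    # Q20 means a 1-in-100 chance that the base was read incorrectly.
    print(phred_to_error_probability(20))
    # 'I' has ASCII code 73, '5' has 53, '!' has 33.
    print(decode_quality_string("I5!"))  # [40, 20, 0]
    # The 3-mers of ATGATG are ATG, TGA, GAT, ATG.
    print(count_kmers("ATGATG", 3))
```

A sequence of length n contains n − k + 1 overlapping k-mers, which is why the loop in `count_kmers` stops at `len(sequence) - k + 1`.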
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Cocolin, L.; Alessandria, V.; Dolci, P.; Gorra, R.; Rantsiou, K. Culture independent methods to assess the diversity and dynamics of microbiota during food fermentation. Int. J. Food Microbiol. 2013, 167, 29–43. [Google Scholar] [CrossRef]
- Pogačić, T.; Kelava, N.; Zamberlin, Š.; Dolenčić-Špehar, I.; Samaržija, D. Methods for culture-independent identification of lactic acid bacteria in dairy products. Food Technol. Biotechnol. 2010, 48, 3–11. [Google Scholar]
- Capozzi, F.; Bordoni, A. Foodomics: A new comprehensive approach to food and nutrition. Genes Nutr. 2013, 8, 1–4. [Google Scholar] [CrossRef] [PubMed]
- Cifuentes, A. Food analysis and foodomics. J. Chromatogr. A 2009, 1216, 7109. [Google Scholar] [CrossRef] [PubMed]
- Alhoshy, M.; Shehata, A.I.; Habib, Y.J.; Abdel-Latif, H.M.; Wang, Y.; Zhang, Z. Nutrigenomics in crustaceans: Current status and future prospects. Fish Shellfish. Immunol. 2022, 129, 1–12. [Google Scholar] [CrossRef] [PubMed]
- Marcum, J.A. Nutrigenetics/nutrigenomics, personalized nutrition, and precision healthcare. Curr. Nutr. Rep. 2020, 9, 338–345. [Google Scholar] [CrossRef]
- Ordovas, J.M.; Ferguson, L.R.; Tai, E.S.; Mathers, J.C. Personalised nutrition and health. BMJ 2018, 361, bmj.k2173. [Google Scholar] [CrossRef]
- Dong, Z.; Chen, Y. Transcriptomics: Advances and approaches. Sci. China Life Sci. 2013, 56, 960–967. [Google Scholar] [CrossRef]
- Allard, M.W.; Bell, R.; Ferreira, C.M.; Gonzalez-Escalona, N.; Hoffmann, M.; Muruvanda, T.; Ottesen, A.; Ramachandran, P.; Reed, E.; Sharma, S. Genomics of foodborne pathogens for microbial food safety. Curr. Opin. Biotechnol. 2018, 49, 224–229. [Google Scholar] [CrossRef]
- Van Eck, N.; Waltman, L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 2010, 84, 523–538. [Google Scholar] [CrossRef]
- Gilbert, J.A.; Hughes, M. Gene expression profiling: Metatranscriptomics. In High-Throughput Next Generation Sequencing: Methods and Applications; Springer: Berlin/Heidelberg, Germany, 2011; pp. 195–205. [Google Scholar]
- Dollive, S.; Peterfreund, G.L.; Sherrill-Mix, S.; Bittinger, K.; Sinha, R.; Hoffmann, C.; Nabel, C.S.; Hill, D.A.; Artis, D.; Bachman, M.A. A tool kit for quantifying eukaryotic rRNA gene sequences from human microbiome samples. Genome Biol. 2012, 13, 1–13. [Google Scholar] [CrossRef]
- Tickle, T.L.; Segata, N.; Waldron, L.; Weingart, U.; Huttenhower, C. Two-stage microbial community experimental design. ISME J. 2013, 7, 2330–2339. [Google Scholar] [CrossRef] [PubMed]
- Staniszewski, A.; Kordowska-Wiater, M. Probiotic Yeasts and How to Find Them—Polish Wines of Spontaneous Fermentation as Source for Potentially Probiotic Yeasts. Foods 2023, 12, 3392. [Google Scholar] [CrossRef] [PubMed]
- Qin, J.; Li, R.; Raes, J.; Arumugam, M.; Burgdorf, K.S.; Manichanh, C.; Nielsen, T.; Pons, N.; Levenez, F.; Yamada, T. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 2010, 464, 59–65. [Google Scholar] [CrossRef] [PubMed]
- Zinger, L.; Gobet, A.; Pommier, T. Two decades of describing the unseen majority of aquatic microbial diversity. Mol. Ecol. 2012, 21, 1878–1896. [Google Scholar] [CrossRef] [PubMed]
- Kable, M.E.; Srisengfa, Y.; Xue, Z.; Coates, L.C.; Marco, M.L. Viable and total bacterial populations undergo equipment-and time-dependent shifts during milk processing. Appl. Environ. Microbiol. 2019, 85, e00270-19. [Google Scholar] [CrossRef]
- Barcenilla, C.; Cobo-Díaz, J.F.; De Filippis, F.; Valentino, V.; Cabrera Rubio, R.; O’Neil, D.; Mahler de Sanchez, L.; Armanini, F.; Carlino, N.; Blanco-Míguez, A. Improved sampling and DNA extraction procedures for microbiome analysis in food-processing environments. Nat. Protoc. 2024, 19, 1–20. [Google Scholar] [CrossRef] [PubMed]
- Knight, R.; Jansson, J.; Field, D.; Fierer, N.; Desai, N.; Fuhrman, J.A.; Hugenholtz, P.; Van Der Lelie, D.; Meyer, F.; Stevens, R. Unlocking the potential of metagenomics through replicated experimental design. Nat. Biotechnol. 2012, 30, 513–520. [Google Scholar] [CrossRef]
- Yilmaz, P.; Kottmann, R.; Field, D.; Knight, R.; Cole, J.R.; Amaral-Zettler, L.; Gilbert, J.A.; Karsch-Mizrachi, I.; Johnston, A.; Cochrane, G. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat. Biotechnol. 2011, 29, 415–420. [Google Scholar] [CrossRef]
- Salazar, J.K.; Carstens, C.K.; Ramachandran, P.; Shazer, A.G.; Narula, S.S.; Reed, E.; Ottesen, A.; Schill, K.M. Metagenomics of pasteurized and unpasteurized gouda cheese using targeted 16S rDNA sequencing. BMC Microbiol. 2018, 18, 1–13. [Google Scholar] [CrossRef]
- Soyuçok, A.; Yurt, M.N.Z.; Altunbas, O.; Ozalp, V.C.; Sudagidan, M. Metagenomic and chemical analysis of Tarhana during traditional fermentation process. Food Biosci. 2021, 39, 100824. [Google Scholar] [CrossRef]
- Mancini, A.; Rodriguez, M.C.; Zago, M.; Cologna, N.; Goss, A.; Carafa, I.; Tuohy, K.; Merz, A.; Franciosi, E. Massive survey on bacterial–bacteriophages biodiversity and quality of natural whey starter cultures in Trentingrana cheese production. Front. Microbiol. 2021, 12, 678012. [Google Scholar] [CrossRef] [PubMed]
- Kaashyap, M.; Cohen, M.; Mantri, N. Microbial diversity and characteristics of kombucha as revealed by metagenomic and physicochemical analysis. Nutrients 2021, 13, 4446. [Google Scholar] [CrossRef] [PubMed]
- Fabricio, M.F.; Mann, M.B.; Kothe, C.I.; Frazzon, J.; Tischer, B.; Flôres, S.H.; Ayub, M.A.Z. Effect of freeze-dried kombucha culture on microbial composition and assessment of metabolic dynamics during fermentation. Food Microbiol. 2022, 101, 103889. [Google Scholar] [CrossRef] [PubMed]
- Treviso, R.L.; Sant’Anna, V.; Fabricio, M.F.; Ayub, M.A.Z.; Brandelli, A.; Hickert, L.R. Time and temperature influence on physicochemical, microbiological, and sensory profiles of yerba mate kombucha. J. Food Sci. Technol. 2024, 1–10. [Google Scholar] [CrossRef]
- González-Orozco, B.D.; García-Cano, I.; Escobar-Zepeda, A.; Jiménez-Flores, R.; Álvarez, V.B. Metagenomic analysis and antibacterial activity of kefir microorganisms. J. Food Sci. 2023, 88, 2933–2949. [Google Scholar] [CrossRef] [PubMed]
- Nejati, F.; Capitain, C.C.; Krause, J.L.; Kang, G.-U.; Riedel, R.; Chang, H.-D.; Kurreck, J.; Junne, S.; Weller, P.; Neubauer, P. Traditional Grain-Based vs. Commercial Milk Kefirs, How Different Are They? Appl. Sci. 2022, 12, 3838. [Google Scholar] [CrossRef]
- Qu, T.; Wang, P.; Zhao, X.; Liang, L.; Ge, Y.; Chen, Y. Metagenomics reveals differences in the composition of bacterial antimicrobial resistance and antibiotic resistance genes in pasteurized yogurt and probiotic bacteria yogurt from China. J. Dairy Sci. 2024, 107, 3451–3467. [Google Scholar] [CrossRef] [PubMed]
- Luzzi, G.; Brinks, E.; Fritsche, J.; Franz, C.M. Microbial composition of sweetness-enhanced yoghurt during fermentation and storage. AMB Express 2020, 10, 1–7. [Google Scholar] [CrossRef]
- Kim, E.; Cho, E.-J.; Yang, S.-M.; Kim, M.-J.; Kim, H.-Y. Novel approaches for the identification of microbial communities in kimchi: MALDI-TOF MS analysis and high-throughput sequencing. Food Microbiol. 2021, 94, 103641. [Google Scholar] [CrossRef]
- Hwang, H.; Lee, H.J.; Lee, M.-A.; Sohn, H.; Chang, Y.H.; Han, S.G.; Jeong, J.Y.; Lee, S.H.; Hong, S.W. Selection and characterization of Staphylococcus hominis subsp. hominis WiKim0113 isolated from kimchi as a starter culture for the production of natural pre-converted nitrite. Food Sci. Anim. Resour. 2020, 40, 512. [Google Scholar] [CrossRef]
- Jeong, C.-H.; Sohn, H.; Hwang, H.; Lee, H.-J.; Kim, T.-W.; Kim, D.-S.; Kim, C.-S.; Han, S.-G.; Hong, S.-W. Comparison of the probiotic potential between Lactiplantibacillus plantarum isolated from kimchi and standard probiotic strains isolated from different sources. Foods 2021, 10, 2125. [Google Scholar] [CrossRef] [PubMed]
- Tlais, A.Z.A.; Lemos Junior, W.J.F.; Filannino, P.; Campanaro, S.; Gobbetti, M.; Di Cagno, R. How microbiome composition correlates with biochemical changes during sauerkraut fermentation: A focus on neglected bacterial players and functionalities. Microbiol. Spectr. 2022, 10, e00168-22. [Google Scholar] [CrossRef]
- Zhang, J.; Song, H.S.; Zhang, C.; Kim, Y.B.; Roh, S.W.; Liu, D. Culture-independent analysis of the bacterial community in Chinese fermented vegetables and genomic analysis of lactic acid bacteria. Arch. Microbiol. 2021, 203, 4693–4703. [Google Scholar] [CrossRef]
- Frank, J.A.; Pan, Y.; Tooming-Klunderud, A.; Eijsink, V.G.; McHardy, A.C.; Nederbragt, A.J.; Pope, P.B. Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data. Sci. Rep. 2016, 6, 25373. [Google Scholar] [CrossRef] [PubMed]
- Nicholls, S.M.; Quick, J.C.; Tang, S.; Loman, N.J. Ultra-deep, long-read nanopore sequencing of mock microbial community standards. Gigascience 2019, 8, giz043. [Google Scholar] [CrossRef]
- Ardui, S.; Ameur, A.; Vermeesch, J.R.; Hestand, M.S. Single molecule real-time (SMRT) sequencing comes of age: Applications and utilities for medical diagnostics. Nucleic Acids Res. 2018, 46, 2159–2168. [Google Scholar] [CrossRef] [PubMed]
- Wick, R.R.; Judd, L.M.; Holt, K.E. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 2019, 20, 1–10. [Google Scholar] [CrossRef]
- Kothe, C.I.; Mohellibi, N.; Renault, P. Revealing the microbial heritage of traditional Brazilian cheeses through metagenomics. Food Res. Int. 2022, 157, 111265. [Google Scholar] [CrossRef]
- Suárez, N.; Weckx, S.; Minahk, C.; Hebert, E.M.; Saavedra, L. Metagenomics-based approach for studying and selecting bioprotective strains from the bacterial community of artisanal cheeses. Int. J. Food Microbiol. 2020, 335, 108894. [Google Scholar] [CrossRef]
- Kothe, C.I.; Bolotin, A.; Kraïem, B.-F.; Dridi, B.; Team, F.M.; Renault, P. Unraveling the world of halophilic and halotolerant bacteria in cheese by combining cultural, genomic and metagenomic approaches. Int. J. Food Microbiol. 2021, 358, 109312. [Google Scholar] [CrossRef]
- Bellassi, P.; Rocchetti, G.; Nocetti, M.; Lucini, L.; Masoero, F.; Morelli, L. A combined metabolomic and metagenomic approach to discriminate raw milk for the production of hard cheese. Foods 2021, 10, 109. [Google Scholar] [CrossRef] [PubMed]
- Pradhan, S.; Prabhakar, M.R.; Karthika Parvathy, K.; Dey, B.; Jayaraman, S.; Behera, B.; Paramasivan, B. Metagenomic and physicochemical analysis of Kombucha beverage produced from tea waste. J. Food Sci. Technol. 2023, 60, 1088–1096. [Google Scholar] [CrossRef] [PubMed]
- Góes-Neto, A.; Kukharenko, O.; Orlovska, I.; Podolich, O.; Imchen, M.; Kumavath, R.; Kato, R.B.; de Carvalho, D.S.; Tiwari, S.; Brenig, B. Shotgun metagenomic analysis of kombucha mutualistic community exposed to mars-like environment outside the international space station. Environ. Microbiol. 2021, 23, 3727–3742. [Google Scholar] [CrossRef] [PubMed]
- Yang, J.; Lagishetty, V.; Kurnia, P.; Henning, S.M.; Ahdoot, A.I.; Jacobs, J.P. Microbial and chemical profiles of commercial kombucha products. Nutrients 2022, 14, 670. [Google Scholar] [CrossRef] [PubMed]
- Landis, E.A.; Fogarty, E.; Edwards, J.C.; Popa, O.; Eren, A.M.; Wolfe, B.E. Microbial diversity and interaction specificity in kombucha tea fermentations. mSystems 2022, 7, e00157-22. [Google Scholar] [CrossRef]
- Liu, S.; Lu, S.-Y.; Qureshi, N.; Enshasy, H.A.E.; Skory, C.D. Antibacterial property and metagenomic analysis of milk kefir. Probiotics Antimicrob. Proteins 2022, 14, 1170–1183. [Google Scholar] [CrossRef] [PubMed]
- Biçer, Y.; Telli, A.E.; Sönmez, G.; Turkal, G.; Telli, N.; Uçar, G. Comparison of commercial and traditional kefir microbiota using metagenomic analysis. Int. J. Dairy Technol. 2021, 74, 528–534. [Google Scholar] [CrossRef]
- Walsh, L.H.; Coakley, M.; Walsh, A.M.; Crispie, F.; O’Toole, P.W.; Cotter, P.D. Analysis of the milk kefir pan-metagenome reveals four community types, core species, and associated metabolic pathways. Iscience 2023, 26, 108004. [Google Scholar] [CrossRef]
- Aydin, S.; Erözden, A.A.; Tavşanlı, N.; Müdüroğlu, A.; Çalışkan, M.; Kara, İ. Anthocyanin Addition to Kefir: Metagenomic Analysis of Microbial Community Structure. Curr. Microbiol. 2022, 79, 327. [Google Scholar] [CrossRef]
- Qiu, S.; Zeng, H.; Yang, Z.; Hung, W.L.; Wang, B.; Yang, A. Dynamic metagenome-scale metabolic modeling of a yogurt bacterial community. Biotechnol. Bioeng. 2023, 120, 2186–2198. [Google Scholar] [CrossRef] [PubMed]
- Suh, S.H.; Kim, M.K. Microbial communities related to sensory characteristics of commercial drinkable yogurt products in Korea. Innov. Food Sci. Emerg. Technol. 2021, 67, 102565. [Google Scholar] [CrossRef]
- Samelis, J.; Doulgeraki, A.I.; Bikouli, V.; Pappas, D.; Kakouri, A. Microbiological and metagenomic characterization of a retail delicatessen Galotyri-like fresh acid-curd cheese product. Fermentation 2021, 7, 67. [Google Scholar] [CrossRef]
- Le Roy, C.I.; Kurilshikov, A.; Leeming, E.R.; Visconti, A.; Bowyer, R.C.; Menni, C.; Falchi, M.; Koutnikova, H.; Veiga, P.; Zhernakova, A. Yoghurt consumption is associated with changes in the composition of the human gut microbiome and metabolome. BMC Microbiol. 2022, 22, 39. [Google Scholar]
- Oh, Y.-J.; Park, Y.-R.; Hong, J.; Lee, D.-Y. Metagenomic, Metabolomic, and Functional Evaluation of Kimchi Broth Treated with Light-Emitting Diodes (LEDs). Metabolites 2021, 11, 472. [Google Scholar] [CrossRef] [PubMed]
- Park, D.H. Effects of carbon dioxide on metabolite production and bacterial communities during kimchi fermentation. Biosci. Biotechnol. Biochem. 2018, 82, 1234–1242. [Google Scholar] [CrossRef]
- Gaudioso, G.; Weil, T.; Marzorati, G.; Solovyev, P.; Bontempo, L.; Franciosi, E.; Bertoldi, L.; Pedrolli, C.; Tuohy, K.M.; Fava, F. Microbial and metabolic characterization of organic artisanal sauerkraut fermentation and study of gut health-promoting properties of sauerkraut brine. Front. Microbiol. 2022, 13, 929738. [Google Scholar] [CrossRef] [PubMed]
- Huang, W.; Peng, H.; Chen, J.; Yan, X.; Zhang, Y. Bacterial diversity analysis of Chaozhou Sauerkraut based on high-throughput sequencing of different production methods. Fermentation 2023, 9, 282. [Google Scholar] [CrossRef]
- Zhang, S.; Zhang, Y.; Wu, L.; Zhang, L.; Wang, S. Characterization of microbiota of naturally fermented sauerkraut by high-throughput sequencing. Food Sci. Biotechnol. 2023, 32, 855–862. [Google Scholar] [CrossRef]
- Thriene, K.; Hansen, S.S.; Binder, N.; Michels, K.B. Effects of fermented vegetable consumption on human gut microbiome diversity—A pilot study. Fermentation 2022, 8, 118. [Google Scholar] [CrossRef]
- Falgueras, J.; Lara, A.J.; Fernández-Pozo, N.; Cantón, F.R.; Pérez-Trabado, G.; Claros, M.G. SeqTrim: A high-throughput pipeline for pre-processing any type of sequence read. BMC Bioinform. 2010, 11, 1–12. [Google Scholar] [CrossRef]
- Aronesty, E. Ea-Utils: Command-Line Tools for Processing Biological Sequencing Data; Expression Analysis: Durham, NC, USA, 2011. [Google Scholar]
- Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30, 2114–2120. [Google Scholar] [CrossRef] [PubMed]
- Xu, Z.; Mai, Y.; Liu, D.; He, W.; Lin, X.; Xu, C.; Zhang, L.; Meng, X.; Mafofo, J.; Zaher, W.A. Fast-bonito: A faster deep learning based basecaller for nanopore sequencing. Artif. Intell. Life Sci. 2021, 1, 100011. [Google Scholar] [CrossRef]
- Zeng, J.; Cai, H.; Peng, H.; Wang, H.; Zhang, Y.; Akutsu, T. Causalcall: Nanopore basecalling using a temporal convolutional network. Front. Genet. 2020, 10, 1332. [Google Scholar] [CrossRef] [PubMed]
- Leggett, R.M.; Heavens, D.; Caccamo, M.; Clark, M.D.; Davey, R.P. NanoOK: Multi-reference alignment analysis of nanopore sequencing data, quality and error profiles. Bioinformatics 2016, 32, 142–144. [Google Scholar] [CrossRef] [PubMed]
- Simpson, J.T.; Pop, M. The theory and practice of genome sequence assembly. Annu. Rev. Genom. Hum. Genet. 2015, 16, 153–172. [Google Scholar] [CrossRef] [PubMed]
- Schwartz, D.C.; Waterman, M.S. New generations: Sequencing machines and their computational challenges. J. Comput. Sci. Technol. 2010, 25, 3. [Google Scholar] [CrossRef]
- Zaheer, R.; Noyes, N.; Ortega Polo, R.; Cook, S.R.; Marinier, E.; Van Domselaar, G.; Belk, K.E.; Morley, P.S.; McAllister, T.A. Impact of sequencing depth on the characterization of the microbiome and resistome. Sci. Rep. 2018, 8, 1–11. [Google Scholar] [CrossRef]
- Howe, A.; Chain, P.S. Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial). Front. Microbiol. 2015, 6, 678. [Google Scholar] [CrossRef]
- Peng, Y.; Leung, H.C.; Yiu, S.-M.; Chin, F.Y. Meta-IDBA: A de Novo assembler for metagenomic data. Bioinformatics 2011, 27, i94–i101. [Google Scholar] [CrossRef]
- Mapleson, D.; Drou, N.; Swarbreck, D. RAMPART: A workflow management system for de novo genome assembly. Bioinformatics 2015, 31, 1824–1826. [Google Scholar] [CrossRef] [PubMed]
- Peng, Y.; Leung, H.C.; Yiu, S.-M.; Chin, F.Y. IDBA-UD: A de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 2012, 28, 1420–1428. [Google Scholar] [CrossRef]
- Bankevich, A.; Nurk, S.; Antipov, D.; Gurevich, A.A.; Dvorkin, M.; Kulikov, A.S.; Lesin, V.M.; Nikolenko, S.I.; Pham, S.; Prjibelski, A.D. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012, 19, 455–477. [Google Scholar] [CrossRef]
- Li, D.; Liu, C.-M.; Luo, R.; Sadakane, K.; Lam, T.-W. MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 2015, 31, 1674–1676. [Google Scholar] [CrossRef]
- Boisvert, S.; Raymond, F.; Godzaridis, É.; Laviolette, F.; Corbeil, J. Ray Meta: Scalable de novo metagenome assembly and profiling. Genome Biol. 2012, 13, 1–13. [Google Scholar] [CrossRef] [PubMed]
- Sato, K.; Sakakibara, Y. An extended genovo metagenomic assembler by incorporating paired-end information. PeerJ 2013, 1, e196. [Google Scholar]
- Kim, M.; Zhang, X.; Ligo, J.G.; Farnoud, F.; Veeravalli, V.V.; Milenkovic, O. MetaCRAM: An integrated pipeline for metagenomic taxonomy identification and compression. BMC Bioinform. 2016, 17, 1–13. [Google Scholar] [CrossRef] [PubMed]
- Wood, D.E.; Salzberg, S.L. Kraken: Ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014, 15, 1–12. [Google Scholar] [CrossRef]
- Yang, X.; Charlebois, P.; Gnerre, S.; Coole, M.G.; Lennon, N.J.; Levin, J.Z.; Qu, J.; Ryan, E.M.; Zody, M.C.; Henn, M.R. De novo assembly of highly diverse viral populations. BMC Genom. 2012, 13, 1–13. [Google Scholar] [CrossRef]
- Brady, A.; Salzberg, S. PhymmBL expanded: Confidence scores, custom databases, parallelization and more. Nat. Methods 2011, 8, 367. [Google Scholar] [CrossRef]
- Cock, P.J.; Chilton, J.M.; Grüning, B.; Johnson, J.E.; Soranzo, N. NCBI BLAST+ integrated into Galaxy. Gigascience 2015, 4, s13742-015. [Google Scholar] [CrossRef]
- Gregor, I.; Dröge, J.; Schirmer, M.; Quince, C.; McHardy, A.C. PhyloPythiaS+: A self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes. PeerJ 2016, 4, e1603. [Google Scholar] [CrossRef]
- Haider, B.; Ahn, T.-H.; Bushnell, B.; Chai, J.; Copeland, A.; Pan, C. Omega: An overlap-graph de novo assembler for metagenomics. Bioinformatics 2014, 30, 2717–2722. [Google Scholar] [CrossRef]
- Ruby, J.G.; Bellare, P.; DeRisi, J.L. PRICE: Software for the targeted assembly of components of (Meta) genomic sequence data. G3 Genes Genomes Genet. 2013, 3, 865–880. [Google Scholar] [CrossRef] [PubMed]
- Jiang, Y.; Wang, J.; Xia, D.; Yu, G. EnSVMB: Metagenomics fragments classification using ensemble SVM and BLAST. Sci. Rep. 2017, 7, 9440. [Google Scholar] [CrossRef] [PubMed]
- Myers, E.W.; Sutton, G.G.; Delcher, A.L.; Dew, I.M.; Fasulo, D.P.; Flanigan, M.J.; Kravitz, S.A.; Mobarry, C.M.; Reinert, K.H.; Remington, K.A. A whole-genome assembly of Drosophila. Science 2000, 287, 2196–2204. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z.; Huang, P.; You, R.; Sun, F.; Zhu, S. MetaBinner: A high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities. Genome Biol. 2023, 24, 1. [Google Scholar] [CrossRef]
- Kelley, D.R.; Salzberg, S.L. Clustering metagenomic sequences with interpolated Markov models. BMC Bioinform. 2010, 11, 1–12. [Google Scholar] [CrossRef]
- Strous, M.; Kraft, B.; Bisdorf, R.; Tegetmeyer, H.E. The binning of metagenomic contigs for microbial physiology of mixed cultures. Front. Microbiol. 2012, 3, 410. [Google Scholar] [CrossRef]
- Kang, D.D.; Li, F.; Kirton, E.; Thomas, A.; Egan, R.; An, H.; Wang, Z. MetaBAT 2: An adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 2019, 7, e7359. [Google Scholar] [CrossRef]
- Imelfort, M.; Parks, D.; Woodcroft, B.J.; Dennis, P.; Hugenholtz, P.; Tyson, G.W. GroopM: An automated tool for the recovery of population genomes from related metagenomes. PeerJ 2014, 2, e603. [Google Scholar] [CrossRef]
- Alneberg, J.; Bjarnason, B.S.; de Bruijn, I.; Schirmer, M.; Quick, J.; Ijaz, U.Z.; Loman, N.J.; Andersson, A.F.; Quince, C. CONCOCT: Clustering contigs on coverage and composition. arXiv 2013, arXiv:1312.4038. [Google Scholar]
- Albertsen, M.; Hugenholtz, P.; Skarshewski, A.; Nielsen, K.L.; Tyson, G.W.; Nielsen, P.H. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 2013, 31, 533–538. [Google Scholar] [CrossRef] [PubMed]
- Jiang, Z.; Li, X.; Guo, L. Binning Metagenomic Contigs Using Unsupervised Clustering and Reference Databases. In Interdisciplinary Sciences: Computational Life Sciences; Springer: Berlin/Heidelberg, Germany, 2022; Volume 14, pp. 795–803. [Google Scholar]
- Parks, D.H.; Imelfort, M.; Skennerton, C.T.; Hugenholtz, P.; Tyson, G.W. CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015, 25, 1043–1055. [Google Scholar] [CrossRef] [PubMed]
- Tettelin, H.; Riley, D.; Cattuto, C.; Medini, D. Comparative genomics: The bacterial pan-genome. Curr. Opin. Microbiol. 2008, 11, 472–477. [Google Scholar] [CrossRef] [PubMed]
- Mosquera-Rendón, J.; Rada-Bravo, A.M.; Cárdenas-Brito, S.; Corredor, M.; Restrepo-Pineda, E.; Benítez-Páez, A. Pangenome-wide and molecular evolution analyses of the Pseudomonas aeruginosa species. BMC Genom. 2016, 17, 1–14. [Google Scholar] [CrossRef] [PubMed]
- Lapierre, P.; Gogarten, J.P. Estimating the size of the bacterial pan-genome. Trends Genet. 2009, 25, 107–110. [Google Scholar] [CrossRef] [PubMed]
- Jordan, I.K.; Makarova, K.S.; Spouge, J.L.; Wolf, Y.I.; Koonin, E.V. Lineage-specific gene expansions in bacterial and archaeal genomes. Genome Res. 2001, 11, 555–565. [Google Scholar] [CrossRef] [PubMed]
- Overbeek, R.; Olson, R.; Pusch, G.D.; Olsen, G.J.; Davis, J.J.; Disz, T.; Edwards, R.A.; Gerdes, S.; Parrello, B.; Shukla, M. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014, 42, D206–D214. [Google Scholar] [CrossRef]
- Seemann, T. Prokka: Rapid prokaryotic genome annotation. Bioinformatics 2014, 30, 2068–2069. [Google Scholar] [CrossRef]
- Tabari, E.; Su, Z. PorthoMCL: Parallel orthology prediction using MCL for the realm of massive genome availability. Big Data Anal. 2017, 2, 1–5. [Google Scholar] [CrossRef] [PubMed]
- Emms, D.M.; Kelly, S. OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biol. 2019, 20, 1–14. [Google Scholar] [CrossRef] [PubMed]
- Chaudhari, N.M.; Gupta, V.K.; Dutta, C. BPGA-an ultra-fast pan-genome analysis pipeline. Sci. Rep. 2016, 6, 24373. [Google Scholar] [CrossRef] [PubMed]
- Yu, J.; Blom, J.; Glaeser, S.; Jaenicke, S.; Juhre, T.; Rupp, O.; Schwengers, O.; Spänig, S.; Goesmann, A. A review of bioinformatics platforms for comparative genomics. Recent developments of the EDGAR 2.0 platform and its utility for taxonomic and phylogenetic studies. J. Biotechnol. 2017, 261, 2–9. [Google Scholar] [CrossRef] [PubMed]
- Contreras-Moreira, B.; Vinuesa, P. GET_HOMOLOGUES, a versatile software package for scalable and robust microbial pangenome analysis. Appl. Environ. Microbiol. 2013, 79, 7696–7701. [Google Scholar] [CrossRef] [PubMed]
- Pantoja, Y.; Pinheiro, K.; Veras, A.; Araújo, F.; Lopes de Sousa, A.; Guimarães, L.C.; Silva, A.; Ramos, R.T. PanWeb: A web interface for pan-genomic analysis. PLoS ONE 2017, 12, e0178154. [Google Scholar] [CrossRef] [PubMed]
- Zhao, Y.; Wu, J.; Yang, J.; Sun, S.; Xiao, J.; Yu, J. PGAP: Pan-genomes analysis pipeline. Bioinformatics 2012, 28, 416–418. [Google Scholar] [CrossRef] [PubMed]
- Page, A.J.; Cummins, C.A.; Hunt, M.; Wong, V.K.; Reuter, S.; Holden, M.T.; Fookes, M.; Falush, D.; Keane, J.A.; Parkhill, J. Roary: Rapid large-scale prokaryote pan genome analysis. Bioinformatics 2015, 31, 3691–3693. [Google Scholar] [CrossRef]
- Calle, M.L. Statistical analysis of metagenomics data. Genom. Inform. 2019, 17, e6. [Google Scholar] [CrossRef]
- Segata, N.; Börnigen, D.; Morgan, X.C.; Huttenhower, C. PhyloPhlAn is a new method for improved phylogenetic and taxonomic placement of microbes. Nat. Commun. 2013, 4, 2304. [Google Scholar] [CrossRef]
- Darling, A.E.; Jospin, G.; Lowe, E.; Matsen IV, F.A.; Bik, H.M.; Eisen, J.A. PhyloSift: Phylogenetic analysis of genomes and metagenomes. PeerJ 2014, 2, e243. [Google Scholar] [CrossRef]
- Wu, Y.-W. ezTree: An automated pipeline for identifying phylogenetic marker genes and inferring evolutionary relationships among uncultivated prokaryotic draft genomes. BMC Genom. 2018, 19, 7–16. [Google Scholar] [CrossRef] [PubMed]
- Lee, M.D. GToTree: A user-friendly workflow for phylogenomics. Bioinformatics 2019, 35, 4162–4164. [Google Scholar] [CrossRef] [PubMed]
- Wu, M.; Eisen, J.A. A simple, fast, and accurate method of phylogenomic inference. Genome Biol. 2008, 9, 1–11. [Google Scholar] [CrossRef]
- Marçais, G.; Delcher, A.L.; Phillippy, A.M.; Coston, R.; Salzberg, S.L.; Zimin, A. MUMmer4: A fast and versatile genome alignment system. PLoS Comput. Biol. 2018, 14, e1005944. [Google Scholar] [CrossRef]
- Edgar, R.C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32, 1792–1797. [Google Scholar] [CrossRef]
- Xie, R.; Zan, X.; Chu, L.; Su, Y.; Xu, P.; Liu, W. Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage. BMC Bioinform. 2023, 24, 111. [Google Scholar] [CrossRef] [PubMed]
- Garriga, E.; Di Tommaso, P.; Magis, C.; Erb, I.; Mansouri, L.; Baltzis, A.; Floden, E.; Notredame, C. Multiple sequence alignment computation using the t-coffee regressive algorithm implementation. In Multiple Sequence Alignment: Methods and Protocols; Springer: Berlin/Heidelberg, Germany, 2021; pp. 89–97. [Google Scholar]
- Wheeler, T.J.; Kececioglu, J.D. Multiple alignment by aligning alignments. Bioinformatics 2007, 23, i559–i568.
- Mirarab, S.; Nguyen, N.; Warnow, T. PASTA: Ultra-large multiple sequence alignment. In Proceedings of the Research in Computational Molecular Biology: 18th Annual International Conference, RECOMB 2014, Pittsburgh, PA, USA, 2–5 April 2014; Proceedings 18. pp. 177–191.
- Nguyen, N.-P.; Mirarab, S.; Kumar, K.; Warnow, T. Ultra-large alignments using ensembles of hidden Markov models. In Proceedings of the Research in Computational Molecular Biology: 19th Annual International Conference, RECOMB 2015, Warsaw, Poland, 12–15 April 2015; Proceedings 19. pp. 259–260.
- Liu, K.; Linder, C.R.; Warnow, T. RAxML and FastTree: Comparing two methods for large-scale maximum likelihood phylogeny estimation. PLoS ONE 2011, 6, e27731.
- Price, M.N.; Dehal, P.S.; Arkin, A.P. FastTree: Computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 2009, 26, 1641–1650.
- Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 2014, 30, 1312–1313.
- Kozlov, A.M.; Darriba, D.; Flouri, T.; Morel, B.; Stamatakis, A. RAxML-NG: A fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics 2019, 35, 4453–4455.
- Mirarab, S.; Warnow, T. ASTRAL-II: Coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 2015, 31, i44–i52.
- Vachaspati, P.; Warnow, T. ASTRID: Accurate species trees from internode distances. BMC Genom. 2015, 16, 1–13.
- Nguyen, L.-T.; Schmidt, H.A.; Von Haeseler, A.; Minh, B.Q. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 2015, 32, 268–274.
- Kapli, P.; Yang, Z.; Telford, M.J. Phylogenetic tree building in the genomic age. Nat. Rev. Genet. 2020, 21, 428–444.
- Silva, M.; Machado, M.P.; Silva, D.N.; Rossi, M.; Moran-Gilad, J.; Santos, S.; Ramirez, M.; Carrico, J.A. chewBBACA: A complete suite for gene-by-gene schema creation and strain identification. Microb. Genom. 2018, 4, e000166.
- The UniProt Consortium. UniProt: A hub for protein information. Nucleic Acids Res. 2015, 43, D204–D212.
- Kanehisa, M.; Sato, Y.; Morishima, K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J. Mol. Biol. 2016, 428, 726–731.
- Gevers, D.; Knight, R.; Petrosino, J.F.; Huang, K.; McGuire, A.L.; Birren, B.W.; Nelson, K.E.; White, O.; Methé, B.A.; Huttenhower, C. The Human Microbiome Project: A community resource for the healthy human microbiome. PLoS Biol. 2012, 10, e1001377.
- Barbera, P.; Kozlov, A.M.; Czech, L.; Morel, B.; Darriba, D.; Flouri, T.; Stamatakis, A. EPA-ng: Massively parallel evolutionary placement of genetic sequences. Syst. Biol. 2019, 68, 365–369.
- Schloss, P.D.; Westcott, S.L.; Ryabin, T.; Hall, J.R.; Hartmann, M.; Hollister, E.B.; Lesniewski, R.A.; Oakley, B.B.; Parks, D.H.; Robinson, C.J.; et al. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 2009, 75, 7537–7541.
- Bolyen, E.; Rideout, J.R.; Dillon, M.R.; Bokulich, N.A.; Abnet, C.C.; Al-Ghalith, G.A.; Alexander, H.; Alm, E.J.; Arumugam, M.; Asnicar, F.; et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 2019, 37, 852–857.
Product | Sampling Detail | DNA Isolation | References |
---|---|---|---|
Cheese | 1 g sample | PowerFood Microbial DNA Isolation Kit | [21,22,23] |
Kombucha | 50 mL sample | PureLink Microbiome DNA Purification Kit | [24,25,26] |
Kefir | 3–5 g sample | DNeasy PowerSoil Kit | [27,28] |
Yogurt | 10–20 mL sample | QIAamp Fast DNA Stool Mini Kit | [29,30] |
Kimchi | 3 mL sample | QIAamp Fast DNA Stool Mini Kit | [31,32,33] |
Sauerkraut | 1.2 mL sample | FastDNA™ SPIN Kit for Soil | [34,35] |
Product | Target | Sequencing Platform | References |
---|---|---|---|
Cheese | 16S rRNA V3-V4; 16S rRNA V4; MGS | Illumina MiSeq; Illumina MiSeq; Illumina HiSeq | [40,41,42,43] |
Kombucha | 16S rRNA V1-V9; MGS; MGS; 16S rRNA V1-V9 | Oxford Nanopore Technologies MinION; Illumina HiSeq; Illumina NovaSeq 6000; Illumina NextSeq 500 | [44,45,46,47] |
Kefir | MGS; 16S rRNA V3-V4 | Illumina HiSeq; Illumina MiSeq | [48,49,50,51] |
Yogurt | MGS; 16S rRNA V2, V4, V6, V7, V8, V9; 16S rRNA V2-4-8, V3-7-9 | Illumina HiSeq; Ion GeneStudio S5; Ion Torrent PGM | [52,53,54,55] |
Kimchi | 16S rRNA V3-V4; 16S rRNA V1-V3 | Illumina MiSeq; Roche 454 GS-FLX Plus | [56,57] |
Sauerkraut | 16S rRNA V3-V4 | Illumina MiSeq; Illumina NovaSeq | [58,59,60,61] |
Program | Method (dBG—de Bruijn Graph; OLC—Overlap Layout Consensus) | Characteristic Feature | Publications |
---|---|---|---|
Genovo | OLC | It employs deep learning; randomly selects contigs for matching reads | [78] |
IDBA-UD | dBG | It breaks down the graph locally at each depth. | [72,74] |
MEGAHIT | dBG | K-mers split based on identification with reference genomes. | [76] |
Omega | OLC | Scaffolding using long reads; unmatched contigs are grouped based on coverage. | [85] |
Price | Hybrid | Identical reads are assembled first, followed by less similar ones. | [86] |
Ray | dBG | Distributed program connected to the network; profiles the microbiome based on unique labeled k-mers. | [77] |
SPAdes | dBG | The metaSPAdes extension utilizes stream processing to resolve the graph. | [75] |
Software | Orthology Analysis | Pangenome Construction | References |
---|---|---|---|
BPGA | CD-HIT, OrthoMCL | Power-law regression | [106] |
EDGAR 2.0 | Score ratio values | Heaps’ law | [107] |
GET_HOMOLOGUES | COGtriangles, OrthoMCL | plot_pancore_matrix.pl | [108] |
PanWeb | PGAP | PGAP | [109] |
PGAP | MultiParanoid, Gene Family | Heaps’ law | [110] |
Roary | CD-HIT, BLAST, MCL | (Not mentioned) | [111] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sadurski, J.; Polak-Berecka, M.; Staniszewski, A.; Waśko, A. Step-by-Step Metagenomics for Food Microbiome Analysis: A Detailed Review. Foods 2024, 13, 2216. https://doi.org/10.3390/foods13142216