Methodological Considerations in Longitudinal Analyses of Microbiome Data: A Comprehensive Review
Abstract
1. Motivation
2. Methods That Account for General Distribution Characteristics of Microbiome Data
2.1. Non-Negative Counts and Compositional Data
2.2. Zero-Inflation
2.3. Overdispersion
2.4. High Dimensionality
3. Challenges in Longitudinal Analysis of Microbiome Data
3.1. Challenges in Temporal Study Design and Sample Collection
3.2. Challenges in Appropriate Handling of Longitudinal Features in Microbiome Data
4. Methods for Preprocessing
4.1. Normalization
4.2. Variable Selection and Dimensionality Reduction
4.3. Interpolation for Dealing with Irregular Longitudinal Data
5. Statistical Models Suitable for Longitudinal Microbiome Data
5.1. Mixed Effect Models
5.2. ARIMA Models
5.3. State Space Models
5.4. Principal Trend Analysis Models
5.5. Generalized Lotka–Volterra Models
5.6. Bayesian Models
6. Downstream Analysis of Longitudinal Microbiome Data
6.1. Temporal Differential Abundance Testing
6.2. Time Series Clustering
6.3. Dynamic Interaction Network Analysis
6.4. Classification of Participants in Longitudinal Microbiome Studies
6.5. Other Microbiome Time-Series Related Analyses
7. Conclusions and Discussion
Author Contributions
Funding
Conflicts of Interest
References
1. Stewart, C.J.; Embleton, N.D.; Clements, E.; Luna, P.N.; Smith, D.P.; Fofanova, T.Y.; Nelson, A.; Taylor, G.; Orr, C.H.; Petrosino, J.F.; et al. Cesarean or vaginal birth does not impact the longitudinal development of the gut microbiome in a cohort of exclusively preterm infants. Front. Microbiol. 2017, 8, 1008.
2. Lloyd-Price, J.; Arze, C.; Ananthakrishnan, A.N.; Schirmer, M.; Avila-Pacheco, J.; Poon, T.W.; Andrews, E.; Ajami, N.J.; Bonham, K.S.; Brislawn, C.J.; et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 2019, 569, 655–662.
3. Zhou, Y.; Shan, G.; Sodergren, E.; Weinstock, G.; Walker, W.A.; Gregory, K.E. Longitudinal analysis of the premature infant intestinal microbiome prior to necrotizing enterocolitis: A case-control study. PLoS ONE 2015, 10, e0118632.
4. Dashper, S.; Mitchell, H.; Lê Cao, K.A.; Carpenter, L.; Gussy, M.; Calache, H.; Gladman, S.; Bulach, D.; Hoffmann, B.; Catmull, D.; et al. Temporal development of the oral microbiome and prediction of early childhood caries. Sci. Rep. 2019, 9, 19732.
5. Toivonen, L.; Schuez-Havupalo, L.; Karppinen, S.; Waris, M.; Hoffman, K.L.; Camargo, C.A., Jr.; Hasegawa, K.; Peltola, V. Antibiotic treatments during infancy, changes in nasal microbiota, and asthma development: Population-based cohort study. Clin. Infect. Dis. 2021, 72, 1546–1554.
6. Salosensaari, A.; Laitinen, V.; Havulinna, A.S.; Meric, G.; Cheng, S.; Perola, M.; Valsta, L.; Alfthan, G.; Inouye, M.; Watrous, J.D.; et al. Taxonomic signatures of cause-specific mortality risk in human gut microbiome. Nat. Commun. 2021, 12, 2671.
7. Cho, H.; Ren, Z.; Divaris, K.; Roach, J.; Lin, B.M.; Liu, C.; Azcarate-Peril, M.A.; Simancas-Pallares, M.A.; Shrestha, P.; Orlenko, A.; et al. Selenomonas sputigena acts as a pathobiont mediating spatial structure and biofilm virulence in early childhood caries. Nat. Commun. 2023, 14, 2919.
8. Sun, Z.; Pan, X.F.; Li, X.; Jiang, L.; Hu, P.; Wang, Y.; Ye, Y.; Wu, P.; Zhao, B.; Xu, J.; et al. The Gut Microbiome Dynamically Associates with Host Glucose Metabolism throughout Pregnancy: Longitudinal Findings from a Matched Case-Control Study of Gestational Diabetes Mellitus. Adv. Sci. 2023, 10, 2205289.
9. Bosch, A.A.; Piters, W.A.d.S.; van Houten, M.A.; Chu, M.L.J.; Biesbroek, G.; Kool, J.; Pernet, P.; de Groot, P.K.C.; Eijkemans, M.J.; Keijser, B.J.; et al. Maturation of the infant respiratory microbiota, environmental drivers, and health consequences. A prospective cohort study. Am. J. Respir. Crit. Care Med. 2017, 196, 1582–1590.
10. Weiss, S.; Xu, Z.Z.; Peddada, S.; Amir, A.; Bittinger, K.; Gonzalez, A.; Lozupone, C.; Zaneveld, J.R.; Vázquez-Baeza, Y.; Birmingham, A.; et al. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome 2017, 5, 27.
11. Yang, L.; Chen, J. A comprehensive evaluation of microbial differential abundance analysis methods: Current status and potential solutions. Microbiome 2022, 10, 130.
12. Kodikara, S.; Ellul, S.; Lê Cao, K.A. Statistical challenges in longitudinal microbiome data analysis. Briefings Bioinform. 2022, 23, bbac273.
13. Gloor, G.B.; Wu, J.R.; Pawlowsky-Glahn, V.; Egozcue, J.J. It’s all relative: Analyzing microbiome data as compositions. Ann. Epidemiol. 2016, 26, 322–329.
14. Lutz, K.C.; Jiang, S.; Neugent, M.L.; De Nisco, N.J.; Zhan, X.; Li, Q. A survey of statistical methods for microbiome data analysis. Front. Appl. Math. Stat. 2022, 8, 884810.
15. Faust, K.; Lahti, L.; Gonze, D.; De Vos, W.M.; Raes, J. Metagenomics meets time series analysis: Unraveling microbial community dynamics. Curr. Opin. Microbiol. 2015, 25, 56–66.
16. Qu, Y.; Lyu, R.; Wang, D.; Butler, C.; Yap, P.T.; Zhu, H.; Dashper, S.; Ribeiro, A.A.; Divaris, K.; Wu, D. BGOB: A Novel Interpolation Model for Irregularly-Sampled Microbiome Data Based on ODE-Related Deep Learning Methods. 2023. Available online: https://github.com/Rachel-Lyu/BGOB_n_test (accessed on 27 December 2023).
17. Prentice, R.L. Design issues in cohort studies. Stat. Methods Med. Res. 1995, 4, 273–292.
18. Gloor, G.B.; Macklaim, J.M.; Pawlowsky-Glahn, V.; Egozcue, J.J. Microbiome datasets are compositional: And this is not optional. Front. Microbiol. 2017, 8, 2224.
19. Aitchison, J. The statistical analysis of compositional data. J. R. Stat. Soc. Ser. B Methodol. 1982, 44, 139–160.
20. Gibson, T.; Gerber, G. Robust and scalable models of microbiome dynamics. In Proceedings of the International Conference on Machine Learning. PMLR, Stockholm, Sweden, 10–15 June 2018; pp. 1763–1772.
21. Cho, H.; Qu, Y.; Liu, C.; Tang, B.; Lyu, R.; Lin, B.M.; Roach, J.; Azcarate-Peril, M.A.; Aguiar Ribeiro, A.; Love, M.I.; et al. Comprehensive evaluation of methods for differential expression analysis of metatranscriptomics data. Briefings Bioinform. 2023, 24, bbad279.
22. Tsilimigras, M.C.; Fodor, A.A. Compositional data analysis of the microbiome: Fundamentals, tools, and challenges. Ann. Epidemiol. 2016, 26, 330–335.
23. Martin, T.G.; Wintle, B.A.; Rhodes, J.R.; Kuhnert, P.M.; Field, S.A.; Low-Choy, S.J.; Tyre, A.J.; Possingham, H.P. Zero tolerance ecology: Improving ecological inference by modelling the source of zero observations. Ecol. Lett. 2005, 8, 1235–1246.
24. Martín-Fernández, J.A.; Barceló-Vidal, C.; Pawlowsky-Glahn, V. Dealing with zeros and missing values in compositional data sets using nonparametric imputation. Math. Geol. 2003, 35, 253–278.
25. Chen, E.Z.; Li, H. A two-part mixed-effects model for analyzing longitudinal microbiome compositional data. Bioinformatics 2016, 32, 2611–2617.
26. Zhang, X.; Yi, N. NBZIMM: Negative binomial and zero-inflated mixed models, with application to microbiome/metagenomics data analysis. BMC Bioinform. 2020, 21, 488.
27. Zhang, X.; Yi, N. Fast zero-inflated negative binomial mixed modeling approach for analyzing longitudinal metagenomics data. Bioinformatics 2020, 36, 2345–2351.
28. Rapaport, F.; Khanin, R.; Liang, Y.; Pirun, M.; Krek, A.; Zumbo, P.; Mason, C.E.; Socci, N.D.; Betel, D. Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome Biol. 2013, 14, R95.
29. Gardner, W.; Mulvey, E.P.; Shaw, E.C. Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. Psychol. Bull. 1995, 118, 392.
30. Liu, J.; Zhong, W.; Li, R. A selective overview of feature screening for ultrahigh-dimensional data. Sci. China Math. 2015, 58, 2033–2054.
31. Shaw, G.T.W.; Pao, Y.Y.; Wang, D. MetaMIS: A metagenomic microbial interaction simulator based on microbial community profiles. BMC Bioinform. 2016, 17, 488.
32. Treangen, T.J.; Ondov, B.D.; Koren, S.; Phillippy, A.M. The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. Genome Biol. 2014, 15, 524.
33. Faust, K.; Raes, J. Microbial interactions: From networks to models. Nat. Rev. Microbiol. 2012, 10, 538–550.
34. Björk, J.R.; Dasari, M.; Grieneisen, L.; Archie, E.A. Primate microbiomes over time: Longitudinal answers to standing questions in microbiome research. Am. J. Primatol. 2019, 81, e22970.
35. Huttenhower, C.; Gevers, D.; Knight, R.; Abubucker, S.; Badger, J.H.; Chinwalla, A.T.; Creasy, H.H.; Earl, A.M.; FitzGerald, M.G.; Fulton, R.S.; et al. Structure, function and diversity of the healthy human microbiome. Nature 2012, 486, 207.
36. Qin, J.; Li, R.; Raes, J.; Arumugam, M.; Burgdorf, K.S.; Manichanh, C.; Nielsen, T.; Pons, N.; Levenez, F.; Yamada, T.; et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 2010, 464, 59–65.
37. Claesson, M.J.; Cusack, S.; O’Sullivan, O.; Greene-Diniz, R.; de Weerd, H.; Flannery, E.; Marchesi, J.R.; Falush, D.; Dinan, T.; Fitzgerald, G.; et al. Composition, variability, and temporal stability of the intestinal microbiota of the elderly. Proc. Natl. Acad. Sci. USA 2011, 108, 4586–4591.
38. Faith, J.J.; Guruge, J.L.; Charbonneau, M.; Subramanian, S.; Seedorf, H.; Goodman, A.L.; Clemente, J.C.; Knight, R.; Heath, A.C.; Leibel, R.L.; et al. The long-term stability of the human gut microbiota. Science 2013, 341, 1237439.
39. Flores, G.E.; Caporaso, J.G.; Henley, J.B.; Rideout, J.R.; Domogala, D.; Chase, J.; Leff, J.W.; Vázquez-Baeza, Y.; Gonzalez, A.; Knight, R.; et al. Temporal variability is a personalized feature of the human microbiome. Genome Biol. 2014, 15, 531.
40. Caporaso, J.G.; Lauber, C.L.; Costello, E.K.; Berg-Lyons, D.; Gonzalez, A.; Stombaugh, J.; Knights, D.; Gajer, P.; Ravel, J.; Fierer, N.; et al. Moving pictures of the human microbiome. Genome Biol. 2011, 12, R50.
41. Divaris, K.; Shungin, D.; Rodríguez-Cortés, A.; Basta, P.V.; Roach, J.; Cho, H.; Wu, D.; Ferreira Zandoná, A.G.; Ginnis, J.; Ramamoorthy, S.; et al. The supragingival biofilm in early childhood caries: Clinical and laboratory protocols and bioinformatics pipelines supporting metagenomics, metatranscriptomics, and metabolomics studies of the oral microbiome. Odontogenesis: Methods Protoc. 2019, 1922, 525–548.
42. Gerber, G.K. The dynamic microbiome. FEBS Lett. 2014, 588, 4131–4139.
43. Dakos, V.; Beninca, E.; van Nes, E.H.; Philippart, C.J.; Scheffer, M.; Huisman, J. Interannual variability in species composition explained as seasonally entrained chaos. Proc. R. Soc. B Biol. Sci. 2009, 276, 2871–2880.
44. Costello, E.K.; Stagaman, K.; Dethlefsen, L.; Bohannan, B.J.; Relman, D.A. The application of ecological theory toward an understanding of the human microbiome. Science 2012, 336, 1255–1262.
45. Duncan, G.J.; Kalton, G. Issues of design and analysis of surveys across time. Int. Stat. Rev. 1987, 55, 97–117.
46. Vuran, M.C.; Akan, Ö.B.; Akyildiz, I.F. Spatio-temporal correlation: Theory and applications for wireless sensor networks. Comput. Netw. 2004, 45, 245–259.
47. Silverman, J.D.; Durand, H.K.; Bloom, R.J.; Mukherjee, S.; David, L.A. Dynamic linear models guide design and analysis of microbiota studies within artificial human guts. Microbiome 2018, 6, 202.
48. Äijö, T.; Müller, C.L.; Bonneau, R. Temporal probabilistic modeling of bacterial compositions derived from 16S rRNA sequencing. Bioinformatics 2018, 34, 372–380.
49. Joseph, T.A.; Pasarkar, A.P.; Pe’er, I. Efficient and accurate inference of mixed microbial population trajectories from longitudinal count data. Cell Syst. 2020, 10, 463–469.
50. Coenen, A.R.; Hu, S.K.; Luo, E.; Muratore, D.; Weitz, J.S. A primer for microbiome time-series analysis. Front. Genet. 2020, 11, 310.
51. Lin, H.; Peddada, S.D. Analysis of microbial compositions: A review of normalization and differential abundance analysis. NPJ Biofilms Microbiomes 2020, 6, 60.
52. Fernandes, A.D.; Reid, J.N.; Macklaim, J.M.; McMurrough, T.A.; Edgell, D.R.; Gloor, G.B. Unifying the analysis of high-throughput sequencing datasets: Characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome 2014, 2, 15.
53. Yang, Y.H.; Dudoit, S.; Luu, P.; Speed, T.P. Normalization for cDNA microarray data. In Microarrays: Optical Technologies and Informatics; SPIE: Bellingham, WA, USA, 2001; Volume 4266, pp. 141–152.
54. Zhou, H.; He, K.; Chen, J.; Zhang, X. LinDA: Linear models for differential abundance analysis of microbiome compositional data. Genome Biol. 2022, 23, 95.
55. Robinson, M.D.; Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010, 11, R25.
56. Paulson, J.N.; Stine, O.C.; Bravo, H.C.; Pop, M. Differential abundance analysis for microbial marker-gene surveys. Nat. Methods 2013, 10, 1200–1202.
57. Willis, A.D. Rarefaction, alpha diversity, and statistics. Front. Microbiol. 2019, 10, 2407.
58. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
59. Meier, L.; Van De Geer, S.; Bühlmann, P. The group lasso for logistic regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 2008, 70, 53–71.
60. Simon, N.; Friedman, J.; Hastie, T.; Tibshirani, R. A sparse-group lasso. J. Comput. Graph. Stat. 2013, 22, 231–245.
61. Bak, S. Generalized Linear Regression Model with LASSO, Group LASSO, and Sparse Group LASSO Regularization Methods for Finding Bacteria Associated with Colorectal Cancer Using Microbiome Data. Ph.D. Thesis, University of Guelph, Guelph, ON, Canada, 2017.
62. Borcard, D.; Legendre, P. All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecol. Model. 2002, 153, 51–68.
63. Bodein, A.; Chapleur, O.; Droit, A.; Lê Cao, K.A. A generic multivariate framework for the integration of microbiome longitudinal studies with other data types. Front. Genet. 2019, 10, 963.
64. Tataru, C.A.; David, M.M. Decoding the language of microbiomes using word-embedding techniques, and applications in inflammatory bowel disease. PLoS Comput. Biol. 2020, 16, e1007859.
65. Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
66. Oh, M.; Zhang, L. DeepMicro: Deep representation learning for disease prediction based on microbiome data. Sci. Rep. 2020, 10, 6026.
67. Shields-Cutler, R.R.; Al-Ghalith, G.A.; Yassour, M.; Knights, D. SplinectomeR enables group comparisons in longitudinal microbiome studies. Front. Microbiol. 2018, 9, 785.
68. Luo, D.; Ziebell, S.; An, L. An informative approach on differential abundance analysis for time-course metagenomic sequencing data. Bioinformatics 2017, 33, 1286–1292.
69. Chen, R.T.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D. Neural ordinary differential equations. arXiv Prepr. 2018, arXiv:1806.07366.
70. Bokulich, N.A.; Dillon, M.R.; Zhang, Y.; Rideout, J.R.; Bolyen, E.; Li, H.; Albert, P.S.; Caporaso, J.G. q2-longitudinal: Longitudinal and paired-sample analyses of microbiome data. mSystems 2018, 3, e00219-18.
71. Ridenhour, B.J.; Brooker, S.L.; Williams, J.E.; Van Leuven, J.T.; Miller, A.W.; Dearing, M.D.; Remien, C.H. Modeling time-series data from microbial communities. ISME J. 2017, 11, 2526–2537.
72. Chen, I.; Kelkar, Y.D.; Gu, Y.; Zhou, J.; Qiu, X.; Wu, H. High-dimensional linear state space models for dynamic microbial interaction networks. PLoS ONE 2017, 12, e0187822.
73. Wang, C.; Hu, J.; Blaser, M.J.; Li, H. Microbial trend analysis for common dynamic trend, group comparison, and classification in longitudinal microbiome study. BMC Genom. 2021, 22, 667.
74. Bucci, V.; Tzen, B.; Li, N.; Simmons, M.; Tanoue, T.; Bogart, E.; Deng, L.; Yeliseyev, V.; Delaney, M.L.; Liu, Q.; et al. MDSINE: Microbial Dynamical Systems INference Engine for microbiome time-series analyses. Genome Biol. 2016, 17, 121.
75. Stein, R.R.; Bucci, V.; Toussaint, N.C.; Buffie, C.G.; Rätsch, G.; Pamer, E.G.; Sander, C.; Xavier, J.B. Ecological modeling from time-series inference: Insight into dynamics and stability of intestinal microbiota. PLoS Comput. Biol. 2013, 9, e1003388.
76. Kuntal, B.K.; Gadgil, C.; Mande, S.S. Web-gLV: A web based platform for Lotka-Volterra based modeling and simulation of microbial populations. Front. Microbiol. 2019, 10, 288.
77. Xia, Y.; Sun, J. Linear Mixed-Effects Models for Longitudinal Microbiome Data. In Bioinformatic and Statistical Analysis of Microbiome Data: From Raw Sequences to Advanced Modeling with QIIME 2 and R; Springer: Cham, Switzerland, 2023; pp. 557–586.
78. Gałecki, A.; Burzykowski, T. Linear Mixed-Effects Models Using R; Springer: New York, NY, USA, 2013.
79. Chen, C.; Liu, L.M. Forecasting time series with outliers. J. Forecast. 1993, 12, 13–35.
80. Chen, Z.; Brown, E.N. State space model. Scholarpedia 2013, 8, 30868.
81. Zhang, Y.; Davis, R. Principal trend analysis for time-course data with applications in genomic medicine. Ann. Appl. Stat. 2013, 7, 2205–2228.
82. Jeganathan, P.; Callahan, B.J.; Proctor, D.M.; Relman, D.A.; Holmes, S.P. The block bootstrap method for longitudinal microbiome data. arXiv Prepr. 2018, arXiv:1809.01832.
83. Benincà, E.; Pinto, S.; Cazelles, B.; Fuentes, S.; Shetty, S.; Bogaards, J.A. Wavelet clustering analysis as a tool for characterizing community structure in the human microbiome. Sci. Rep. 2023, 13, 8042.
84. Jover, L.F.; Romberg, J.; Weitz, J.S. Inferring phage–bacteria infection networks from time-series data. R. Soc. Open Sci. 2016, 3, 160654.
85. Ruiz-Perez, D.; Lugo-Martinez, J.; Bourguignon, N.; Mathee, K.; Lerner, B.; Bar-Joseph, Z.; Narasimhan, G. Dynamic Bayesian networks for integrating multi-omics time series microbiome data. mSystems 2021, 6, e01105-20.
86. Ai, D.; Li, X.; Liu, G.; Liang, X.; Xia, L.C. Constructing the Microbial Association Network from large-scale time series data using Granger causality. Genes 2019, 10, 216.
87. Mainali, K.; Bewick, S.; Vecchio-Pagan, B.; Karig, D.; Fagan, W.F. Detecting interaction networks in the human microbiome with conditional Granger causality. PLoS Comput. Biol. 2019, 15, e1007037.
88. Metwally, A.A.; Yu, P.S.; Reiman, D.; Dai, Y.; Finn, P.W.; Perkins, D.L. Utilizing longitudinal microbiome taxonomic profiles to predict food allergy via long short-term memory networks. PLoS Comput. Biol. 2019, 15, e1006693.
89. Sharma, D.; Xu, W. phyLoSTM: A novel deep learning model on disease prediction from longitudinal microbiome data. Bioinformatics 2021, 37, 3707–3714.
90. Shi, Y.; Zhang, L.; Peterson, C.B.; Do, K.A.; Jenq, R.R. Performance determinants of unsupervised clustering methods for microbiome data. Microbiome 2022, 10, 25.
91. Kohonen, T. The self-organizing map. Proc. IEEE 1990, 78, 1464–1480.
92. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the KDD, Portland, OR, USA, 2–4 August 1996; Volume 96, pp. 226–231.
93. Caliński, T.; Harabasz, J. A dendrite method for cluster analysis. Commun. Stat.-Theory Methods 1974, 3, 1–27.
94. Kaufman, L.; Rousseeuw, P.J. Finding Groups in Data: An Introduction to Cluster Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2009.
95. Kassambara, A. Practical Guide to Cluster Analysis in R: Unsupervised Machine Learning; STHDA, 2017; Volume 1.
96. Liao, T.W. Clustering of time series data—A survey. Pattern Recognit. 2005, 38, 1857–1874.
97. Kim, T.; Chen, I.R.; Lin, Y.; Wang, A.Y.Y.; Yang, J.Y.H.; Yang, P. Impact of similarity metrics on single-cell RNA-seq data clustering. Briefings Bioinform. 2019, 20, 2316–2326.
98. Holmes, I.; Harris, K.; Quince, C. Dirichlet multinomial mixtures: Generative models for microbial metagenomics. PLoS ONE 2012, 7, e30126.
99. McGeachie, M.J.; Chang, H.H.; Weiss, S.T. CGBayesNets: Conditional Gaussian Bayesian network learning and inference with mixed discrete and continuous data. PLoS Comput. Biol. 2014, 10, e1003676.
100. Steele, J.A.; Countway, P.D.; Xia, L.; Vigil, P.D.; Beman, J.M.; Kim, D.Y.; Chow, C.E.T.; Sachdeva, R.; Jones, A.C.; Schwalbach, M.S.; et al. Marine bacterial, archaeal and protistan association networks reveal ecological linkages. ISME J. 2011, 5, 1414–1425.
101. Gilbert, J.A.; Steele, J.A.; Caporaso, J.G.; Steinbrück, L.; Reeder, J.; Temperton, B.; Huse, S.; McHardy, A.C.; Knight, R.; Joint, I.; et al. Defining seasonal marine microbial community dynamics. ISME J. 2012, 6, 298–308.
102. Lo, C.; Marculescu, R. MetaNN: Accurate classification of host phenotypes from metagenomic data using neural networks. In Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Washington, DC, USA, 29 August–1 September 2018; pp. 608–609.
103. Zhou, Y.H.; Gallins, P. A review and tutorial of machine learning methods for microbiome host trait prediction. Front. Genet. 2019, 10, 579.
104. O’Shea, K.; Nash, R. An introduction to convolutional neural networks. arXiv 2015, arXiv:1511.08458.
105. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
106. Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306.
107. Fisher, C.K.; Mehta, P. Identifying keystone species in the human gut microbiome from metagenomic timeseries using sparse linear regression. PLoS ONE 2014, 9, e102451.
108. Hernández Medina, R.; Kutuzova, S.; Nielsen, K.N.; Johansen, J.; Hansen, L.H.; Nielsen, M.; Rasmussen, S. Machine learning and deep learning applications in microbiome research. ISME Commun. 2022, 2, 98.
109. Wang, Q.; Wang, K.; Wu, W.; Giannoulatou, E.; Ho, J.W.; Li, L. Host and microbiome multi-omics integration: Applications and methodologies. Biophys. Rev. 2019, 11, 55–65.
110. Park, S.Y.; Ufondu, A.; Lee, K.; Jayaraman, A. Emerging computational tools and models for studying gut microbiota composition and function. Curr. Opin. Biotechnol. 2020, 66, 301–311.
Method | Category | Brief Description | Use Cases & Limitations |
---|---|---|---|
ALDEx2 [52] | Log-ratio | Converts observed abundances to log-ratios with a reference taxon. | Good for compositional data; dependent on choice of reference taxon. |
CLR (Centered Log-Ratio) [53] | Log-ratio | Subtracts the log of the geometric mean abundance within a sample from the log of each taxon's abundance. | Popular in microbiome analysis; assumes constant sum across samples. |
Z-score (Standard Score) | Scaling | Normalizes data to mean 0 and standard deviation 1. | Useful for Gaussian distribution assumptions; sensitive to outliers. |
MED (Median Normalization) [53] | Scaling | Uses median intensity within a sample for scaling. | Robust against outliers; simple and effective. |
UQ (Upper Quartile) | Scaling | Scales based on the 75th percentile (Q3) intensity. | Useful for range variation; does not adjust for compositional data. |
TMM (Trimmed Mean of M-values) [55] | Scaling | Adjusts gene expression ratios, trimming extreme values for mean calculation. | Good for RNA-Seq data; may not be ideal for all microbiome data types. |
TSS (Total Sum Scaling) | Scaling | Standardizes total feature intensities to a fixed value T across samples. | Ideal for studies focusing on total microbial load or gene expression levels; can be skewed by high-abundance features. |
CSS (Cumulative Sum Scaling) [56] | Scaling | Scales data to a consistent cumulative sum (C) based on the dataset characteristics. Adjusts for varying signal intensities/sample depths. | Suitable for datasets with varying sequencing depths; preserves relative differences in lower abundance features. Does not account for the compositional nature of the data. |
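As a rough illustration of three of the transforms listed in the table above (TSS, CLR, and the Z-score), the following is a minimal sketch in plain Python. The function names, the example counts, and the pseudocount of 0.5 used to handle zeros before taking logarithms are all assumptions made for this illustration; they are not taken from the review or from any particular package.

```python
import math
from statistics import mean, pstdev

def tss(counts, total=1.0):
    """Total sum scaling: rescale one sample so its features sum to `total`."""
    s = sum(counts)
    return [total * c / s for c in counts]

def clr(counts, pseudocount=0.5):
    """Centered log-ratio for one sample.

    Each log abundance is centered by the mean log abundance, which equals
    the log of the geometric mean. The pseudocount is an assumption of this
    sketch so that zero counts do not produce log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    center = mean(logs)
    return [v - center for v in logs]

def zscore(values):
    """Standard score: shift to mean 0 and scale to (population) SD 1."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical counts for four taxa in one sample.
sample = [120, 30, 0, 50]
print(tss(sample))
print(clr(sample))
print(zscore(sample))
```

By construction, the CLR values of a sample sum to zero and the TSS values sum to the chosen total, which is one quick way to sanity-check an implementation.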
Models | Brief Description | Examples |
---|---|---|
Mixed effect models | Models handling population-level trends (fixed effects) and individual variations (random effects) | Bokulich et al. [70], Chen et al. [25], Zhang et al. [26], Zhang et al. [27] |
ARIMA models | Models combining autoregressive (AR) terms, differencing (I), and moving average (MA) terms | Ridenhour et al. [71] |
State space models | Probabilistic graphical models that describe the probabilistic dependence between latent state variables and observed measurements | Chen et al. [72] |
Principal trend analysis models | Models that are used to identify and assess the main trends in a dataset over time or across different conditions | Wang et al. [73] |
Generalized Lotka–Volterra models | Models expressed as in Equation (2), allowing dynamic interactions among taxa | Bucci et al. [74], Stein et al. [75], Gibson et al. [20], Shaw et al. [31], Kuntal et al. [76] |
Bayesian models | Models using Bayesian methods for parameter inference | Äijö et al. [48], Silverman et al. [47], Joseph et al. [49] |
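To make the "dynamic interaction among taxa" in the generalized Lotka–Volterra row more concrete, here is a minimal simulation sketch. It assumes the common textbook form dx_i/dt = x_i (r_i + Σ_j A_ij x_j), which may differ in notation from the equation the table refers to. The forward-Euler integrator, the function names, and the two-taxon growth rates and interaction matrix are hypothetical choices for illustration only, not values from any cited study.

```python
def glv_step(x, r, A, dt):
    """One forward-Euler step of dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j)."""
    n = len(x)
    new_x = []
    for i in range(n):
        interaction = sum(A[i][j] * x[j] for j in range(n))
        growth = x[i] * (r[i] + interaction)
        # Clamp at zero so numerical error cannot yield negative abundances.
        new_x.append(max(x[i] + dt * growth, 0.0))
    return new_x

def simulate_glv(x0, r, A, dt=0.01, steps=5000):
    """Return the trajectory (list of abundance vectors) from x0."""
    traj = [list(x0)]
    for _ in range(steps):
        traj.append(glv_step(traj[-1], r, A, dt))
    return traj

# Hypothetical two-taxon competitive system.
r = [1.0, 0.5]                 # intrinsic growth rates
A = [[-1.0, -0.2],             # A[i][j]: effect of taxon j on taxon i
     [-0.1, -0.5]]
traj = simulate_glv([0.1, 0.1], r, A)
print(traj[-1])
```

A fixed-step Euler scheme is used only to keep the sketch dependency-free; an adaptive ODE solver would be a more robust choice for stiff or larger systems.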
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lyu, R.; Qu, Y.; Divaris, K.; Wu, D. Methodological Considerations in Longitudinal Analyses of Microbiome Data: A Comprehensive Review. Genes 2024, 15, 51. https://doi.org/10.3390/genes15010051