Leveraging Scheme for Cross-Study Microbiome Machine Learning Prediction and Feature Evaluations
Abstract
1. Introduction
2. Materials and Methods
2.1. Microbial Data Processing
2.2. Evaluation Scheme
2.2.1. Source and Target Data Handling
2.2.2. Training and Testing Data
2.2.3. Machine Learning Scheme and Evaluations
2.3. Machine Learning Models
2.3.1. Logistic Regression
2.3.2. Random Forest
2.3.3. Support Vector Machine
2.4. Statistical Analysis
2.5. Evaluation Metrics
2.6. Machine Learning Model Baseline
3. Results
3.1. Microbial Differences among Studies and Diagnoses
3.2. Microbial Inter- and Intra-Diversity among Studies and Phenotypes
3.3. Impact of Percentage of Shared Features/Taxa between Source and Target
3.4. Impact of Percentage of Target in Source Data Set
3.5. Machine Learning Performance
3.6. Random Forest Top Predictors
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Wilkinson, J.E.; Franzosa, E.A.; Everett, C.; Li, C.; Bae, S.; Berzansky, I.; Bhosle, A.; Bjørnevik, K.; Brennan, C.A.; Cao, Y.G.; et al. A Framework for Microbiome Science in Public Health. Nat. Med. 2021, 27, 766–774.
- Lloyd-Price, J.; Arze, C.; Ananthakrishnan, A.N.; Schirmer, M.; Avila-Pacheco, J.; Poon, T.W.; Andrews, E.; Ajami, N.J.; Bonham, K.S.; Brislawn, C.J.; et al. Multi-Omics of the Gut Microbial Ecosystem in Inflammatory Bowel Diseases. Nature 2019, 569, 655–662.
- Glassner, K.L.; Abraham, B.P.; Quigley, E.M.M. The Microbiome and Inflammatory Bowel Disease. J. Allergy Clin. Immunol. 2020, 145, 16–27.
- Caparrós, E.; Wiest, R.; Scharl, M.; Rogler, G.; Gutiérrez Casbas, A.; Yilmaz, B.; Wawrzyniak, M.; Francés, R. Dysbiotic Microbiota Interactions in Crohn’s Disease. Gut Microbes 2021, 13, 1949096.
- Gevers, D.; Kugathasan, S.; Denson, L.A.; Vázquez-Baeza, Y.; Van Treuren, W.; Ren, B.; Schwager, E.; Knights, D.; Song, S.J.; Yassour, M.; et al. The Treatment-Naive Microbiome in New-Onset Crohn’s Disease. Cell Host Microbe 2014, 15, 382–392.
- Baxter, N.T.; Ruffin, M.T., IV; Rogers, M.A.M.; Schloss, P.D. Microbiota-Based Model Improves the Sensitivity of Fecal Immunochemical Test for Detecting Colonic Lesions. Genome Med. 2016, 8, 37.
- Zeller, G.; Tap, J.; Voigt, A.Y.; Sunagawa, S.; Kultima, J.R.; Costea, P.I.; Amiot, A.; Böhm, J.; Brunetti, F.; Habermann, N.; et al. Potential of Fecal Microbiota for Early-Stage Detection of Colorectal Cancer. Mol. Syst. Biol. 2014, 10, 766.
- Cai, C.; Zhang, X.; Liu, Y.; Shen, E.; Feng, Z.; Guo, C.; Han, Y.; Ouyang, Y.; Shen, H. Gut Microbiota Imbalance in Colorectal Cancer Patients, the Risk Factor of COVID-19 Mortality. Gut Pathog. 2021, 13, 70.
- Zhou, Y.-H.; Sun, G. Improve the Colorectal Cancer Diagnosis Using Gut Microbiome Data. Front. Mol. Biosci. 2022, 9, 921945.
- Routy, B.; Le Chatelier, E.; Derosa, L.; Duong, C.P.M.; Alou, M.T.; Daillère, R.; Fluckiger, A.; Messaoudene, M.; Rauber, C.; Roberti, M.P.; et al. Gut Microbiome Influences Efficacy of PD-1–Based Immunotherapy against Epithelial Tumors. Science 2018, 359, 91–97.
- Peters, B.A.; Wilson, M.; Moran, U.; Pavlick, A.; Izsak, A.; Wechter, T.; Weber, J.S.; Osman, I.; Ahn, J. Relating the Gut Metagenome and Metatranscriptome to Immunotherapy Responses in Melanoma Patients. Genome Med. 2019, 11, 61.
- Gopalakrishnan, V.; Spencer, C.N.; Nezi, L.; Reuben, A.; Andrews, M.C.; Karpinets, T.V.; Prieto, P.A.; Vicente, D.; Hoffman, K.; Wei, S.C.; et al. Gut Microbiome Modulates Response to Anti–PD-1 Immunotherapy in Melanoma Patients. Science 2018, 359, 97–103.
- Matson, V.; Fessler, J.; Bao, R.; Chongsuwat, T.; Zha, Y.; Alegre, M.-L.; Luke, J.J.; Gajewski, T.F. The Commensal Microbiome Is Associated with Anti–PD-1 Efficacy in Metastatic Melanoma Patients. Science 2018, 359, 104–108.
- Frankel, A.E.; Coughlin, L.A.; Kim, J.; Froehlich, T.W.; Xie, Y.; Frenkel, E.P.; Koh, A.Y. Metagenomic Shotgun Sequencing and Unbiased Metabolomic Profiling Identify Specific Human Gut Microbiota and Metabolites Associated with Immune Checkpoint Therapy Efficacy in Melanoma Patients. Neoplasia 2017, 19, 848–855.
- Zhou, Y.-H.; Gallins, P. A Review and Tutorial of Machine Learning Methods for Microbiome Host Trait Prediction. Front. Genet. 2019, 10, 579.
- Song, K.; Wright, F.A.; Zhou, Y.-H. Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction. Front. Mol. Biosci. 2020, 7, 610845.
- Carrieri, A.P.; Haiminen, N.; Parida, L. Host Phenotype Prediction from Differentially Abundant Microbes Using RoDEO. In Computational Intelligence Methods for Bioinformatics and Biostatistics, Proceedings of the 13th International Meeting, CIBB 2016, Stirling, UK, 1–3 September 2016; Springer International Publishing: Cham, Switzerland, 2017; Volume 10477, pp. 27–41.
- Mo, Z.; Huang, P.; Yang, C.; Xiao, S.; Zhang, G.; Ling, F.; Li, L. Meta-Analysis of 16S rRNA Microbial Data Identified Distinctive and Predictive Microbiota Dysbiosis in Colorectal Carcinoma Adjacent Tissue. mSystems 2020, 5, e00138-20.
- Zhou, Y.-H.; Brooks, P.; Wang, X. A Two-Stage Hidden Markov Model Design for Biomarker Detection, with Application to Microbiome Research. Stat. Biosci. 2018, 10, 41–58.
- Hu, T.; Gallins, P.; Zhou, Y.-H. A Zero-Inflated Beta-Binomial Model for Microbiome Data Analysis. Stat 2018, 7, e185.
- Kokol, P.; Kokol, M.; Zagoranski, S. Machine Learning on Small Size Samples: A Synthetic Knowledge Synthesis. Sci. Prog. 2022, 105, 003685042110297.
- Roguet, A.; Eren, A.M.; Newton, R.J.; McLellan, S.L. Fecal Source Identification Using Random Forest. Microbiome 2018, 6, 185.
- Ai, D.; Pan, H.; Han, R.; Li, X.; Liu, G.; Xia, L.C. Using Decision Tree Aggregation with Random Forest Model to Identify Gut Microbes Associated with Colorectal Cancer. Genes 2019, 10, 112.
- Gao, Y.; Zhu, Z.; Sun, F. Increasing Prediction Performance of Colorectal Cancer Disease Status Using Random Forests Classification Based on Metagenomic Shotgun Sequencing Data. Synth. Syst. Biotechnol. 2022, 7, 574–585.
- Thomas, A.M.; Manghi, P.; Asnicar, F.; Pasolli, E.; Armanini, F.; Zolfo, M.; Beghini, F.; Manara, S.; Karcher, N.; Pozzi, C.; et al. Metagenomic Analysis of Colorectal Cancer Datasets Identifies Cross-Cohort Microbial Diagnostic Signatures and a Link with Choline Degradation. Nat. Med. 2019, 25, 667–678.
- Wiens, J.; Guttag, J.; Horvitz, E. A Study in Transfer Learning: Leveraging Data from Multiple Hospitals to Enhance Hospital-Specific Predictions. J. Am. Med. Inform. Assoc. 2014, 21, 699–706.
- Gong, J.J.; Sundt, T.M.; Rawn, J.D.; Guttag, J.V. Instance Weighting for Patient-Specific Risk Stratification Models. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; ACM: New York, NY, USA, 2015; pp. 369–378.
- Song, K.; Zhou, Y.-H. C3NA: Correlation and Consensus-Based Cross-Taxonomy Network Analysis for Compositional Microbial Data. BMC Bioinform. 2022, 23, 468.
- Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
- Kaul, A.; Mandal, S.; Davidov, O.; Peddada, S.D. Analysis of Microbiome Data in the Presence of Excess Zeros. Front. Microbiol. 2017, 8, 2114.
- Bokulich, N.A.; Kaehler, B.D.; Rideout, J.R.; Dillon, M.; Bolyen, E.; Knight, R.; Huttley, G.A.; Gregory Caporaso, J. Optimizing Taxonomic Classification of Marker-Gene Amplicon Sequences with QIIME 2’s Q2-Feature-Classifier Plugin. Microbiome 2018, 6, 90.
- Callahan, B.J.; McMurdie, P.J.; Rosen, M.J.; Han, A.W.; Johnson, A.J.A.; Holmes, S.P. DADA2: High-Resolution Sample Inference from Illumina Amplicon Data. Nat. Methods 2016, 13, 581–583.
- Quast, C.; Pruesse, E.; Yilmaz, P.; Gerken, J.; Schweer, T.; Yarza, P.; Peplies, J.; Glöckner, F.O. The SILVA Ribosomal RNA Gene Database Project: Improved Data Processing and Web-Based Tools. Nucleic Acids Res. 2013, 41, D590–D596.
- Bolyen, E.; Rideout, J.R.; Dillon, M.R.; Bokulich, N.A.; Abnet, C.C.; Al-Ghalith, G.A.; Alexander, H.; Alm, E.J.; Arumugam, M.; Asnicar, F.; et al. Reproducible, Interactive, Scalable and Extensible Microbiome Data Science Using QIIME 2. Nat. Biotechnol. 2019, 37, 852–857.
- Limeta, A. Meta Analysis of Gut Microbiome Composition in Patients Undergoing Immunotherapy. Available online: https://github.com/angelolimeta/Gut-microbiome-immunotherapy (accessed on 18 November 2022).
- Navas-Molina, J.A.; Peralta-Sánchez, J.M.; González, A.; McMurdie, P.J.; Vázquez-Baeza, Y.; Xu, Z.; Ursell, L.K.; Lauber, C.; Zhou, H.; Song, S.J.; et al. Advancing Our Understanding of the Human Microbiome Using QIIME. Methods Enzymol. 2013, 531, 371.
- Lin, H.; Peddada, S.D. Analysis of Compositions of Microbiomes with Bias Correction. Nat. Commun. 2020, 11, 3514.
- Kursa, M.B.; Rudnicki, W.R. Feature Selection with the Boruta Package. J. Stat. Softw. 2010, 36, 1–13.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Wright, M.N.; Ziegler, A. Ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. J. Stat. Softw. 2017, 77, 1–17.
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2018; Available online: https://www.R-project.org/ (accessed on 18 November 2022).
- Fan, R.-E.; Chang, K.-W.; Hsieh, C.-J.; Wang, X.-R.; Lin, C.-J. LIBLINEAR: A Library for Large Linear Classification. J. Mach. Learn. Res. 2008, 9, 1871–1874.
- Oksanen, J.; Blanchet, F.G.; Kindt, R.; Legendre, P.; Minchin, P.R.; O’Hara, R.B.; Simpson, G.L.; Solymos, P.; Stevens, M.H.H.; Wagner, H. Vegan: Community Ecology Package. R Package Version 2.0-10. J. Stat. Softw. 2013. Available online: https://github.com/vegandevs/vegan (accessed on 18 November 2022).
- McMurdie, P.J.; Holmes, S. Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PLoS ONE 2013, 8, e61217.
- Michie, M.G. Use of the Bray-Curtis Similarity Measure in Cluster Analysis of Foraminiferal Data. J. Int. Assoc. Math. Geol. 1982, 14, 661–667.
- Anderson, M.J. Distance-Based Tests for Homogeneity of Multivariate Dispersions. Biometrics 2006, 62, 245–253.
- Warton, D.I.; Wright, S.T.; Wang, Y. Distance-Based Multivariate Analyses Confound Location and Dispersion Effects. Methods Ecol. Evol. 2012, 3, 89–101.
- Kuhn, M. Caret: Classification and Regression Training, ascl-1505; Astrophysics Source Code Library: Houghton, MI, USA, 2015.
- Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J.-C.; Müller, M. pROC: An Open-Source Package for R and S+ to Analyze and Compare ROC Curves. BMC Bioinform. 2011, 12, 77.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Cao, Q.; Sun, X.; Rajesh, K.; Chalasani, N.; Gelow, K.; Katz, B.; Shah, V.H.; Sanyal, A.J.; Smirnova, E. Effects of Rare Microbiome Taxa Filtering on Statistical Analysis. Front. Microbiol. 2021, 11, 607325.
- Tindall, B.J.; Rosselló-Móra, R.; Busse, H.-J.; Ludwig, W.; Kämpfer, P. Notes on the Characterization of Prokaryote Strains for Taxonomic Purposes. Int. J. Syst. Evol. Microbiol. 2010, 60, 249–266.
- Nikolaidis, M.; Mossialos, D.; Oliver, S.G.; Amoutzias, G.D. Comparative Analysis of the Core Proteomes among the Pseudomonas Major Evolutionary Groups Reveals Species-Specific Adaptations for Pseudomonas Aeruginosa and Pseudomonas Chlororaphis. Diversity 2020, 12, 289.
- Nikolaidis, M.; Hesketh, A.; Mossialos, D.; Iliopoulos, I.; Oliver, S.G.; Amoutzias, G.D. A Comparative Analysis of the Core Proteomes within and among the Bacillus Subtilis and Bacillus Cereus Evolutionary Groups Reveals the Patterns of Lineage- and Species-Specific Adaptations. Microorganisms 2022, 10, 1720.
- Sun, C.; Li, B.; Wang, B.; Zhao, J.; Zhang, X.; Li, T.; Li, W.; Tang, D.; Qiu, M.; Wang, X.; et al. The Role of Fusobacterium Nucleatum in Colorectal Cancer: From Carcinogenesis to Clinical Management. Chronic Dis. Transl. Med. 2019, 5, 178–187.
- Abed, J.; Maalouf, N.; Manson, A.L.; Earl, A.M.; Parhi, L.; Emgård, J.E.M.; Klutstein, M.; Tayeb, S.; Almogy, G.; Atlan, K.A.; et al. Colon Cancer-Associated Fusobacterium Nucleatum May Originate From the Oral Cavity and Reach Colon Tumors via the Circulatory System. Front. Cell. Infect. Microbiol. 2020, 10, 400.
- Xu, J.; Yang, M.; Wang, D.; Zhang, S.; Yan, S.; Zhu, Y.; Chen, W. Alteration of the abundance of Parvimonas micra in the gut along the adenoma-carcinoma sequence. Oncol. Lett. 2020, 20, 106.
- Zhao, L.; Zhang, X.; Zhou, Y.; Fu, K.; Lau, H.C.-H.; Chun, T.W.-Y.; Cheung, A.H.-K.; Coker, O.O.; Wei, H.; Wu, W.K.-K.; et al. Parvimonas Micra Promotes Colorectal Tumorigenesis and Is Associated with Prognosis of Colorectal Cancer Patients. Oncogene 2022, 41, 4200–4210.
- Cheng, Y.; Ling, Z.; Li, L. The Intestinal Microbiota and Colorectal Cancer. Front. Immunol. 2020, 11, 3100.
- Mu, W.; Jia, Y.; Chen, X.; Li, H.; Wang, Z.; Cheng, B. Intracellular Porphyromonas Gingivalis Promotes the Proliferation of Colorectal Cancer Cells via the MAPK/ERK Signaling Pathway. Front. Cell. Infect. Microbiol. 2020, 10, 584798.
- Okumura, S.; Konishi, Y.; Narukawa, M.; Sugiura, Y.; Yoshimoto, S.; Arai, Y.; Sato, S.; Yoshida, Y.; Tsuji, S.; Uemura, K.; et al. Gut Bacteria Identified in Colorectal Cancer Patients Promote Tumourigenesis via Butyrate Secretion. Nat. Commun. 2021, 12, 5674.
- Olendzki, B.; Bucci, V.; Cawley, C.; Maserati, R.; McManus, M.; Olednzki, E.; Madziar, C.; Chiang, D.; Ward, D.V.; Pellish, R.; et al. Dietary Manipulation of the Gut Microbiome in Inflammatory Bowel Disease Patients: Pilot Study. Gut Microbes 2022, 14, 2046244.
- Takahashi, K.; Nishida, A.; Fujimoto, T.; Fujii, M.; Shioya, M.; Imaeda, H.; Inatomi, O.; Bamba, S.; Andoh, A.; Sugimoto, M. Reduced Abundance of Butyrate-Producing Bacteria Species in the Fecal Microbial Community in Crohn’s Disease. Digestion 2016, 93, 59–65.
- Moustafa, A.; Li, W.; Anderson, E.L.; Wong, E.H.M.; Dulai, P.S.; Sandborn, W.J.; Biggs, W.; Yooseph, S.; Jones, M.B.; Venter, C.J.; et al. Genetic Risk, Dysbiosis, and Treatment Stratification Using Host Genome and Gut Microbiome in Inflammatory Bowel Disease. Clin. Transl. Gastroenterol. 2018, 9, e132.
- Rapozo, D.C.M.; Bernardazzi, C.; de Souza, H.S.P. Diet and Microbiota in Inflammatory Bowel Disease: The Gut in Disharmony. World J. Gastroenterol. 2017, 23, 2124.
- Wright, E.K.; Kamm, M.A.; Wagner, J.; Teo, S.-M.; Cruz, P.D.; Hamilton, A.L.; Ritchie, K.J.; Inouye, M.; Kirkwood, C.D. Microbial Factors Associated with Postoperative Crohn’s Disease Recurrence. J. Crohn’s Colitis 2017, 11, 191–203.
- Forbes, J.D.; Chen, C.; Knox, N.C.; Marrie, R.-A.; El-Gabalawy, H.; de Kievit, T.; Alfa, M.; Bernstein, C.N.; Van Domselaar, G. A Comparative Study of the Gut Microbiota in Immune-Mediated Inflammatory Diseases—Does a Common Dysbiosis Exist? Microbiome 2018, 6, 221.
- Thomas, J.P.; Modos, D.; Rushbrook, S.M.; Powell, N.; Korcsmaros, T. The Emerging Role of Bile Acids in the Pathogenesis of Inflammatory Bowel Disease. Front. Immunol. 2022, 13, 829525.
- Cook, D.P.; Gysemans, C.; Mathieu, C. Lactococcus Lactis As a Versatile Vehicle for Tolerogenic Immunotherapy. Front. Immunol. 2018, 8, 1961.
- Lee, B.; Lee, J.; Woo, M.-Y.; Lee, M.J.; Shin, H.-J.; Kim, K.; Park, S. Modulation of the Gut Microbiota Alters the Tumour-Suppressive Efficacy of Tim-3 Pathway Blockade in a Bacterial Species- and Host Factor-Dependent Manner. Microorganisms 2020, 8, 1395.
- Aarnoutse, R.; Ziemons, J.; Penders, J.; Rensen, S.S.; de Vos-Geelen, J.; Smidt, M.L. The Clinical Link between Human Intestinal Microbiota and Systemic Cancer Therapy. Int. J. Mol. Sci. 2019, 20, 4145.
- Zheng, Y.; Wang, T.; Tu, X.; Huang, Y.; Zhang, H.; Tan, D.; Jiang, W.; Cai, S.; Zhao, P.; Song, R.; et al. Gut Microbiome Affects the Response to Anti-PD-1 Immunotherapy in Patients with Hepatocellular Carcinoma. J. Immunother. Cancer 2019, 7, 193.
- Huang, J.; Liu, D.; Wang, Y.; Liu, L.; Li, J.; Yuan, J.; Jiang, Z.; Jiang, Z.; Hsiao, W.W.; Liu, H.; et al. Ginseng Polysaccharides Alter the Gut Microbiota and Kynurenine/Tryptophan Ratio, Potentiating the Antitumour Effect of Antiprogrammed Cell Death 1/Programmed Cell Death Ligand 1 (Anti-PD-1/PD-L1) Immunotherapy. Gut 2022, 71, 734–745.
| Phenotype | Study and Design Information | Taxa Count | Phylum | Class | Order | Family | Genus | Species | Stacked-Taxa |
|---|---|---|---|---|---|---|---|---|---|
| Colorectal Cancer | Baxter et al. (Source), N = 261 | Unique | 19 | 32 | 73 | 135 | 373 | 387 | 1019 |
| | | Filtered | 11 | 16 | 41 | 68 | 170 | 75 | 381 |
| | Zeller et al. (Target), N = 91 | Unique | 22 | 39 | 91 | 159 | 400 | 384 | 1095 |
| | | Filtered | 13 | 19 | 53 | 92 | 207 | 107 | 491 |
| | Zeller et al. and Baxter et al. | Shared | 11 | 16 | 40 | 66 | 156 | 65 | 354 |
| Crohn’s Disease | Gevers et al. (Source), N = 1052 | Unique | 34 | 72 | 180 | 311 | 727 | 554 | 1878 |
| | | Filtered | 9 | 12 | 34 | 53 | 117 | 34 | 259 |
| | IBDMDB (Target), N = 128 | Unique | 35 | 70 | 162 | 247 | 489 | 315 | 1318 |
| | | Filtered | 12 | 17 | 40 | 68 | 142 | 44 | 323 |
| | IBDMDB and Gevers et al. | Shared | 9 | 12 | 34 | 53 | 113 | 31 | 252 |
| Immunotherapy Responses | Routy et al. (Source), N = 127 | Unique | 14 | 31 | 49 | 83 | 191 | 595 | 963 |
| | | Filtered | 12 | 23 | 33 | 55 | 110 | 251 | 484 |
| | Peters et al. (Target), N = 27 | Unique | 13 | 26 | 41 | 69 | 155 | 409 | 713 |
| | | Filtered | 12 | 24 | 35 | 60 | 110 | 261 | 502 |
| | Peters et al. and Routy et al. | Shared | 12 | 22 | 32 | 53 | 97 | 221 | 437 |
| | Gopalakrishnan et al. (Target), N = 25 | Unique | 13 | 27 | 38 | 64 | 133 | 342 | 617 |
| | | Filtered | 9 | 17 | 25 | 42 | 89 | 195 | 377 |
| | Gopalakrishnan et al. and Routy et al. | Shared | 9 | 17 | 25 | 40 | 82 | 180 | 353 |
| | Matson et al. (Target), N = 39 | Unique | 13 | 24 | 39 | 68 | 154 | 482 | 780 |
| | | Filtered | 12 | 22 | 34 | 60 | 113 | 261 | 502 |
| | Matson et al. and Routy et al. | Shared | 12 | 20 | 30 | 51 | 96 | 204 | 413 |
| | Frankel et al. (Target), N = 39 | Unique | 14 | 28 | 43 | 75 | 175 | 533 | 868 |
| | | Filtered | 11 | 23 | 36 | 60 | 129 | 317 | 576 |
| | Frankel et al. and Routy et al. | Shared | 11 | 20 | 30 | 51 | 100 | 216 | 428 |
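In every row of the table above, the Stacked-Taxa count equals the sum of the Phylum through Species counts, which is consistent with stacked-taxa features being the concatenation of all six taxonomic levels. The short Python check below reproduces that arithmetic for the colorectal cancer and Crohn’s disease studies and also computes one possible reading of the "percentage of shared features" examined in Section 3.3: shared taxa divided by the target study's filtered taxa. That denominator is an assumption rather than the paper's stated definition, and `stacked_count` and `shared_fraction` are hypothetical helper names.

```python
# Filtered and shared taxa counts per taxonomic level, copied from the table above.
LEVELS = ("Phylum", "Class", "Order", "Family", "Genus", "Species")

# phenotype -> (filtered source, filtered target, shared between target and source)
COUNTS = {
    "Colorectal cancer (Baxter et al. -> Zeller et al.)": (
        [11, 16, 41, 68, 170, 75],
        [13, 19, 53, 92, 207, 107],
        [11, 16, 40, 66, 156, 65],
    ),
    "Crohn's disease (Gevers et al. -> IBDMDB)": (
        [9, 12, 34, 53, 117, 34],
        [12, 17, 40, 68, 142, 44],
        [9, 12, 34, 53, 113, 31],
    ),
}


def stacked_count(per_level):
    """Stacked-taxa features concatenate every level, so their count is the sum."""
    return sum(per_level)


def shared_fraction(shared, target):
    """Hypothetical metric: shared taxa as a fraction of the target's filtered taxa."""
    return stacked_count(shared) / stacked_count(target)


genus = LEVELS.index("Genus")
for name, (source, target, shared) in COUNTS.items():
    print(
        f"{name}: stacked source={stacked_count(source)}, "
        f"target={stacked_count(target)}, shared={stacked_count(shared)}, "
        f"shared/target={shared_fraction(shared, target):.3f}"
    )
    # Genus-level fraction under the same assumed definition, for comparison.
    print(f"  genus shared/target={shared[genus] / target[genus]:.3f}")
```

For colorectal cancer this reproduces the 381, 491 and 354 stacked features listed in the table and gives a stacked-taxa shared fraction of about 0.721 under the assumed definition.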
AUROC per model, with the AUROC improvement from the baseline models in parentheses:

| Target Samples in Source Data | Disease of Interest | Taxonomic Level | Logistic | Logistic-L2 | Logistic-L2 with Weighted Instance (Euclidean) | Logistic-L2 with Weighted Clusters (k = 3) | Random Forest with Case Weights (WI: Euclidean) | Random Forest with Feature Selections | Random Forest | Support Vector Machine |
|---|---|---|---|---|---|---|---|---|---|---|
| 50% | Colorectal Cancer | Genus | 0.621(+0.031) | 0.605(+0.112) | 0.602(+0.109) | 0.576(+0.083) | 0.712(+0.075) | 0.753(+0.020) | 0.789(+0.042) | 0.697(+0.050) |
| | | Stacked-Taxa | 0.574(+0.077) | 0.607(+0.096) | 0.606(+0.095) | 0.610(+0.098) | 0.743(+0.055) | 0.773(+0.022) | 0.807(+0.031) | 0.718(+0.043) |
| | Crohn’s Disease | Genus | 0.756(+0.012) | 0.515(+0.024) | 0.545(+0.055) | 0.554(+0.064) | 0.783(+0.073) | 0.842(+0.068) | 0.841(+0.064) | 0.784(+0.050) |
| | | Stacked-Taxa | 0.750(+0.034) | 0.500(0.000) | 0.567(+0.067) | 0.500(0.000) | 0.775(+0.086) | 0.825(+0.075) | 0.830(+0.074) | 0.825(+0.043) |
| | Immunotherapy Dataset 1 | Genus | 0.601(+0.056) | 0.500(+0.010) | 0.499(+0.009) | 0.500(+0.010) | 0.503(+0.003) | 0.683(+0.040) | 0.649(+0.015) | 0.591(+0.039) |
| | | Stacked-Taxa | 0.595(+0.043) | 0.498(−0.005) | 0.506(+0.003) | 0.499(−0.004) | 0.498(+0.003) | 0.686(+0.103) | 0.631(−0.017) | 0.624(+0.104) |
| | Immunotherapy Dataset 2 | Genus | 0.534(+0.022) | 0.500(−0.004) | 0.500(−0.004) | 0.497(−0.007) | 0.500(+0.000) | 0.564(+0.044) | 0.651(+0.058) | 0.615(+0.079) |
| | | Stacked-Taxa | 0.604(+0.025) | 0.500(+0.006) | 0.508(+0.015) | 0.531(+0.037) | 0.500(+0.000) | 0.605(+0.066) | 0.627(+0.089) | 0.710(−0.007) |
| | Immunotherapy Dataset 3 | Genus | 0.581(+0.034) | 0.500(+0.000) | 0.501(+0.001) | 0.507(+0.007) | 0.528(+0.028) | 0.581(+0.064) | 0.594(+0.052) | 0.567(+0.040) |
| | | Stacked-Taxa | 0.588(+0.030) | 0.500(+0.005) | 0.523(+0.028) | 0.527(+0.032) | 0.492(−0.003) | 0.589(+0.041) | 0.577(+0.066) | 0.576(+0.007) |
| | Immunotherapy Dataset 4 | Genus | 0.580(+0.050) | 0.497(−0.005) | 0.488(−0.014) | 0.490(−0.012) | 0.476(−0.024) | 0.606(+0.004) | 0.600(−0.006) | 0.579(+0.012) |
| | | Stacked-Taxa | 0.564(+0.026) | 0.475(−0.024) | 0.445(−0.054) | 0.446(−0.053) | 0.512(+0.018) | 0.610(+0.014) | 0.583(−0.009) | 0.572(+0.010) |
| 25% | Colorectal Cancer | Genus | 0.598(+0.007) | 0.592(+0.099) | 0.585(+0.092) | 0.571(+0.078) | 0.689(+0.052) | 0.745(+0.011) | 0.772(+0.025) | 0.670(+0.023) |
| | | Stacked-Taxa | 0.548(+0.051) | 0.604(+0.092) | 0.592(+0.080) | 0.604(+0.093) | 0.723(+0.036) | 0.765(+0.014) | 0.795(+0.020) | 0.709(+0.035) |
| | Crohn’s Disease | Genus | 0.754(+0.010) | 0.515(+0.024) | 0.538(+0.048) | 0.544(+0.054) | 0.749(+0.039) | 0.812(+0.038) | 0.812(+0.035) | 0.765(+0.030) |
| | | Stacked-Taxa | 0.745(+0.029) | 0.500(0.000) | 0.557(+0.057) | 0.500(0.000) | 0.737(+0.048) | 0.791(+0.041) | 0.798(+0.043) | 0.810(+0.029) |
| | Immunotherapy Dataset 1 | Genus | 0.557(+0.012) | 0.500(+0.010) | 0.500(+0.010) | 0.500(+0.010) | 0.493(−0.006) | 0.643(+0.000) | 0.630(−0.003) | 0.566(+0.013) |
| | | Stacked-Taxa | 0.572(+0.020) | 0.500(−0.003) | 0.498(−0.005) | 0.496(−0.007) | 0.502(+0.007) | 0.630(+0.047) | 0.622(−0.026) | 0.568(+0.048) |
| | Immunotherapy Dataset 2 | Genus | 0.530(+0.017) | 0.500(−0.004) | 0.500(−0.004) | 0.499(−0.005) | 0.500(0.000) | 0.568(+0.048) | 0.626(+0.034) | 0.570(+0.034) |
| | | Stacked-Taxa | 0.563(−0.016) | 0.500(+0.006) | 0.502(+0.008) | 0.524(+0.030) | 0.500(0.000) | 0.567(+0.028) | 0.594(+0.056) | 0.693(−0.024) |
| | Immunotherapy Dataset 3 | Genus | 0.555(+0.008) | 0.501(+0.001) | 0.500(+0.000) | 0.502(+0.002) | 0.516(+0.016) | 0.560(+0.043) | 0.566(+0.025) | 0.555(+0.028) |
| | | Stacked-Taxa | 0.571(+0.012) | 0.500(+0.005) | 0.506(+0.011) | 0.519(+0.024) | 0.505(+0.010) | 0.554(+0.006) | 0.548(+0.037) | 0.547(−0.022) |
| | Immunotherapy Dataset 4 | Genus | 0.580(+0.050) | 0.499(−0.003) | 0.495(−0.007) | 0.498(−0.004) | 0.487(−0.013) | 0.592(−0.010) | 0.591(−0.015) | 0.554(−0.013) |
| | | Stacked-Taxa | 0.548(+0.011) | 0.490(−0.009) | 0.473(−0.026) | 0.474(−0.025) | 0.504(+0.010) | 0.577(−0.020) | 0.586(−0.006) | 0.552(−0.010) |
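The text defining the leveraging scheme and the baseline models is not part of this section, so the Python sketch below is only an assumption-based reading of the table's row labels. It assumes that "X% of Target Samples in Source Data" means a stratified X% of the target samples is added to the source training set and the model is evaluated on the remaining target samples, and that the baseline is the same model trained on source samples alone. `leverage_split` and `auroc_with_leverage` are hypothetical helpers, scikit-learn's `RandomForestClassifier` stands in for the study's random forest implementation, and the data are synthetic, so the printed numbers say nothing about the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def leverage_split(y_target, frac, seed):
    """Hypothetical helper: stratified split of target sample indices into a part
    moved into the source training set and a held-out part used for testing."""
    idx = np.arange(len(y_target))
    moved, held_out = train_test_split(
        idx, train_size=frac, stratify=y_target, random_state=seed
    )
    return moved, held_out


def auroc_with_leverage(X_src, y_src, X_tgt, y_tgt, frac, seed=0):
    """Return (baseline AUROC, leveraged AUROC) on the same held-out target samples.

    Assumption: the baseline is trained on source samples only, while the
    leveraged model is trained on source samples plus `frac` of the target."""
    moved, held_out = leverage_split(y_tgt, frac, seed)
    X_test, y_test = X_tgt[held_out], y_tgt[held_out]

    baseline = RandomForestClassifier(n_estimators=200, random_state=seed)
    baseline.fit(X_src, y_src)

    leveraged = RandomForestClassifier(n_estimators=200, random_state=seed)
    leveraged.fit(
        np.vstack([X_src, X_tgt[moved]]),
        np.concatenate([y_src, y_tgt[moved]]),
    )

    # Column 1 of predict_proba is the probability of class 1 (classes_ = [0, 1]).
    auc_base = roc_auc_score(y_test, baseline.predict_proba(X_test)[:, 1])
    auc_lev = roc_auc_score(y_test, leveraged.predict_proba(X_test)[:, 1])
    return auc_base, auc_lev


# Synthetic stand-in data: a "target study" whose features are shifted and whose
# label depends on an extra feature, mimicking cross-study heterogeneity.
rng = np.random.default_rng(42)
X_src = rng.normal(size=(150, 30))
y_src = (X_src[:, 0] + 0.5 * rng.normal(size=150) > 0).astype(int)
X_tgt = rng.normal(loc=0.5, size=(60, 30))
y_tgt = (X_tgt[:, 0] + X_tgt[:, 1] + 0.5 * rng.normal(size=60) > 0.5).astype(int)

for frac in (0.5, 0.25):
    auc_base, auc_lev = auroc_with_leverage(X_src, y_src, X_tgt, y_tgt, frac)
    print(f"{frac:.0%} of target in source: AUROC {auc_lev:.3f} "
          f"(improvement {auc_lev - auc_base:+.3f} over baseline)")
```

Read the same way, each parenthesized value in the table is the leveraged AUROC minus the baseline AUROC, so the Random Forest entry 0.789(+0.042) for colorectal cancer at the genus level with 50% of target samples would correspond to a baseline AUROC of 0.747. Scoring both models on the same held-out target samples keeps that difference from being driven by a change of test set.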
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Song, K.; Zhou, Y.-H. Leveraging Scheme for Cross-Study Microbiome Machine Learning Prediction and Feature Evaluations. Bioengineering 2023, 10, 231. https://doi.org/10.3390/bioengineering10020231