Identification of Marker Genes in Infectious Diseases from ScRNA-seq Data Using Interpretable Machine Learning
Abstract
1. Introduction
2. Results
2.1. Clinical Characteristics of the Cohort
2.2. Using scRNA-seq Transcripts as an Input to Classify Diagnostics
2.3. Using scRNA-seq Transcripts as an Input to Classify Cell Types
2.4. GO Analysis of Shortlisted Candidate Genes
2.5. Summary of Cellular Expression Patterns for Various Diagnostics in a Two-Dimensional Format
3. Discussion
4. Materials and Methods
4.1. Patient Enrollment and ScRNA-seq
4.2. ScRNA-seq Data Analysis
4.3. Machine Learning Classification
4.4. Gene Ontology Analysis
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest

Characteristic | Sepsis | Mild COVID-19 * | Septic Shock | Severe COVID-19 |
---|---|---|---|---|
Age (mean) | 57.25 | 76.33 | 61.25 | 65.25 |
Weight (kg) | 86.25 | 76.27 | 100 | 82.35 |
Sex at birth [m (n, %); f (n, %)] | m (3, 18.75%); f (1, 6.25%) | m (2, 12.5%); f (1, 6.25%) | m (3, 18.75%); f (1, 6.25%) | m (2, 12.5%); f (2, 12.5%) |
PaO2/FiO2 (worst) | 15 | 29 | 15 | 17 |
Temperature °C (highest) | 37.88 | 37.47 | 39.7 | 37.3 |
CRRT (%) | 25 | 0 | 25 | 50 |
Lactate (mmol/L) | 2.3 | 2.77 | 2.86 | 2.43 |
Creatinine (µmol/L) | 152.5 | 95.67 | 92.5 | 182 |
Bilirubin (mg/dL) | 21 | 17 | 12 | 18 |
HB (g/dL) | 10.33 | 11.67 | 7.1 | 12.35 |
APTT (seconds) | 42.73 | 31.67 | 30.2 | 54.5 |
CRP (mg/L) | 229.7 | 96.8 | 341.89 | 74.35 |
Neutrophils (10⁹/L) | 25.43 | 5.17 | 14.9 | 10.85 |
Lymphocytes (10⁹/L) | 0.38 | 0.77 | 0.8 | 0.98 |
APACHE (mean) | 37 | 9 | 41 | 21 |
SOFA (mean) | 9.75 | 1.3 | 12.75 | 7 |
Hospital length of stay (days) | 124 | 31 | 149 | 17 |
Survivors (%) | 75 | 66 | 75 | 75 |
ICU length of stay (days) | 24.25 | 0 | 38.5 | 24.5 |
Bacterial pathogens identified | K. oxytoca; S. aureus | NA | S. aureus; S. epidermidis | S. epidermidis; E. coli; K. pneumoniae |

Model | Accuracy | Precision | Recall | Specificity |
---|---|---|---|---|
Support Vector Machine | 0.81 | 0.62 | 0.47 | 0.82 |
Random Forest | 0.82 | 0.69 | 0.49 | 0.83 |
XGBoost | 0.88 | 0.73 | 0.71 | 0.91 |
Logistic Regression | 0.69 | 0.63 | 0.15 | 0.95 |
Gradient Boosting | 0.77 | 0.71 | 0.50 | 0.90 |
K-Nearest Neighbors | 0.76 | 0.64 | 0.57 | 0.84 |
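
The four metric columns in the table above all derive from confusion-matrix counts. As a minimal sketch, assuming each diagnostic class is scored one-vs-rest and the per-class values are macro-averaged (the averaging scheme is an assumption, since the table does not state it), the metrics could be computed as follows. The labels and predictions are toy values for illustration only, not data from the study, and the names `Confusion`, `binary_metrics`, `one_vs_rest`, and `macro_report` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Confusion:
    """Counts for one class treated as 'positive' against all others."""
    tp: int
    fp: int
    fn: int
    tn: int


def _ratio(num: int, den: int) -> float:
    # Return 0.0 for an empty denominator instead of raising.
    return num / den if den else 0.0


def binary_metrics(c: Confusion) -> Dict[str, float]:
    """Accuracy, precision, recall and specificity from raw counts."""
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "recall": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


def one_vs_rest(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> Confusion:
    """Collapse a multiclass problem to 'label' versus everything else."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == label and truth == label:
            tp += 1
        elif pred == label:
            fp += 1
        elif truth == label:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def macro_report(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Macro-averaged precision/recall/specificity plus overall accuracy."""
    labels = sorted(set(y_true))
    per_class = [binary_metrics(one_vs_rest(y_true, y_pred, lab)) for lab in labels]
    report = {
        key: sum(m[key] for m in per_class) / len(per_class)
        for key in ("precision", "recall", "specificity")
    }
    # Overall accuracy is the fraction of samples predicted correctly.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    report["accuracy"] = _ratio(correct, len(y_true))
    return report


if __name__ == "__main__":
    # Toy labels only; assumption for illustration, not study data.
    y_true = ["sepsis", "sepsis", "septic_shock", "septic_shock", "severe_covid"]
    y_pred = ["sepsis", "septic_shock", "septic_shock", "septic_shock", "severe_covid"]
    for name, value in macro_report(y_true, y_pred).items():
        print(f"{name}: {value:.2f}")
```

Reading the table with these definitions in mind helps explain rows such as Logistic Regression, where a low recall alongside a high specificity indicates a model that rarely predicts the positive class.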

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sganzerla Martinez, G.; Garduno, A.; Toloue Ostadgavahi, A.; Hewins, B.; Dutt, M.; Kumar, A.; Martin-Loeches, I.; Kelvin, D.J. Identification of Marker Genes in Infectious Diseases from ScRNA-seq Data Using Interpretable Machine Learning. Int. J. Mol. Sci. 2024, 25, 5920. https://doi.org/10.3390/ijms25115920