Graph Random Forest: A Graph Embedded Algorithm for Identifying Highly Connected Important Features
Abstract
1. Introduction
2. Materials and Methods
2.1. Graph Random Forest
2.2. Evaluation of Feature Importance
2.3. Details of Model Setting
2.4. Simulation Setting
Synthetic Data Generation
2.5. Real Datasets
3. Results
3.1. Simulation Results
3.2. Non-Small Cell Lung Cancer Data Results
3.3. Human Embryonic Stem Cell Data Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Methods | GRF 1 | RF 1 |
---|---|---|
Mean accuracy | 0.9457 (0.0116) | 0.9483 (0.0097) |
Number of connected components | 20.65 (3.63) | 94.9 (1.92) |
Size of the largest connected component | 73.75 (6.63) | 3.7 (1.22) |
Average distance | 4.29 (0.25) | 1.38 (0.38) |
Average distance in the largest component | 4.31 (0.25) | 1.53 (0.33) |
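
The connectivity rows in the table above describe the sub-graph formed by the selected features within the feature network. The following is a minimal sketch of how such statistics could be computed, not the authors' implementation. It assumes the network is an undirected edge list, that the sub-graph is the one induced by the selected nodes, and that "average distance" means the mean shortest-path length (in hops) over pairs of selected nodes that are connected within that sub-graph. The function names `induced_adjacency`, `bfs_distances`, and `connectivity_summary` are hypothetical.

```python
from collections import deque


def induced_adjacency(edges, selected):
    """Adjacency sets of the sub-graph induced by the selected nodes."""
    keep = set(selected)
    adj = {node: set() for node in keep}
    for u, v in edges:
        # Keep an edge only if both endpoints were selected (self-loops ignored).
        if u in keep and v in keep and u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def connectivity_summary(edges, selected):
    """Component count, largest component size, and mean distances."""
    adj = induced_adjacency(edges, selected)

    # Collect connected components by repeated breadth-first search.
    seen, components = set(), []
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            components.append(component)

    def mean_distance(nodes):
        # Assumption: average over connected (ordered) pairs only;
        # pairs in different components are skipped.
        total, pairs = 0, 0
        for s in nodes:
            for t, d in bfs_distances(adj, s).items():
                if t != s:
                    total += d
                    pairs += 1
        return total / pairs if pairs else 0.0

    largest = max(components, key=len) if components else set()
    return {
        "n_components": len(components),
        "largest_size": len(largest),
        "avg_distance": mean_distance(adj),
        "avg_distance_largest": mean_distance(largest),
    }


if __name__ == "__main__":
    # Toy network and a hypothetical set of selected features.
    toy_edges = [(1, 2), (2, 3), (3, 4), (5, 6), (7, 8)]
    print(connectivity_summary(toy_edges, [1, 2, 3, 5, 6, 9]))
```

Under these assumptions, a method that selects features scattered across the network yields many small components and short within-component distances, while a method that selects connected features yields few components and one large component. This matches the pattern of the GRF and RF columns.
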
GOBPID 1 | Adj-p 2 | Term |
---|---|---|
GO:0051052 | | regulation of DNA metabolic process |
GO:0000723 | | telomere maintenance |
GO:0032069 | | regulation of nuclease activity |
GO:0032200 | | telomere organization |
GO:0032211 | | negative regulation of telomere maintenance via telomerase |
GO:0051098 | | regulation of binding |
GO:0048598 | | embryonic morphogenesis |
GO:1904357 | | negative regulation of telomere maintenance via telomere lengthening |
GO:0042098 | | T cell proliferation |
GO:0006303 | | double-strand break repair via nonhomologous end joining |
Methods | GRF 1 | RF 1 |
---|---|---|
Mean accuracy | 0.9280 (0.0089) | 0.9301 (0.008) |
Number of connected components | 31.15 (4.83) | 83.85 (3.73) |
Size of the largest connected component | 67.00 (6.10) | 7.95 (3.32) |
Average distance | 3.67 (0.30) | 2.17 (0.62) |
Average distance in the largest component | 3.68 (0.30) | 2.54 (0.62) |
GOBPID 1 | Adj-p 2 | Term |
---|---|---|
GO:0010718 | 0.0002 | positive regulation of epithelial to mesenchymal transition |
GO:0050808 | 0.0002 | synapse organization |
GO:0010717 | 0.0002 | regulation of epithelial to mesenchymal transition |
GO:0034109 | 0.0002 | homotypic cell–cell adhesion |
GO:0007178 | 0.0002 | transmembrane receptor protein serine/threonine kinase signaling pathway |
GO:0070527 | 0.0003 | platelet aggregation |
GO:0048667 | 0.0003 | cell morphogenesis involved in neuron differentiation |
GO:0048812 | 0.0003 | neuron projection morphogenesis |
GO:1903706 | 0.0003 | regulation of hemopoiesis |
GO:0001837 | 0.0003 | epithelial to mesenchymal transition |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tian, L.; Wu, W.; Yu, T. Graph Random Forest: A Graph Embedded Algorithm for Identifying Highly Connected Important Features. Biomolecules 2023, 13, 1153. https://doi.org/10.3390/biom13071153