Elucidating Cancer Subtypes by Using the Relationship between DNA Methylation and Gene Expression
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Preparation
2.2. sCClust: Sparse Canonical Correlation Analysis with Clustering
2.3. Kaplan–Meier Plots and Minimum Hazard Ratio
3. Results
3.1. Survival Analysis
3.2. Comparison with Single- and Multi-Omics Methods
3.3. Pathway Over-Representation Analysis
3.4. Interpretation of the Canonical Weights of Genes
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
TCGA | The Cancer Genome Atlas |
GBM | glioblastoma multiforme |
COAD | colon adenocarcinoma |
LSCC | lung squamous cell carcinoma |
sCCA | sparse canonical correlation analysis |
Data | Method | p-Value | Minimum Hazard Ratio | SEP |
---|---|---|---|---|
GBM | sCClust | 0.00004 | 1.4907 | 1.2889 |
GBM | SNF | 0.00769 | 1.3509 | 1.2299 |
GBM | OTRIMLE | 0.00462 | 1.1796 | 1.1496 |
GBM | NEMO | 0.00256 | 1.3513 | 1.2665 |
GBM | PINTMF | 0.00646 | 1.1594 | 1.1880 |
LSCC | sCClust | 0.00031 | 1.6106 | 1.7026 |
LSCC | SNF | 0.01612 | 1.2356 | 1.4667 |
LSCC | OTRIMLE | 0.02126 | 1.1345 | 1.1756 |
LSCC | NEMO | 0.00107 | 1.3859 | 1.5254 |
LSCC | PINTMF | 0.00444 | 1.4136 | 1.4793 |
COAD | sCClust | 0.00041 | 8.7327 | 1041.07 |
COAD | SNF | 0.03923 | 4.8915 | 356.462 |
COAD | OTRIMLE | 0.03914 | 2.8059 | 724.132 |
COAD | NEMO | 0.01260 | 5.7714 | 872.090 |
COAD | PINTMF | 0.00341 | 4.1594 | 358.442 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Jilani, M.; Degras, D.; Haspel, N. Elucidating Cancer Subtypes by Using the Relationship between DNA Methylation and Gene Expression. Genes 2024, 15, 631. https://doi.org/10.3390/genes15050631