Identification of Breast Cancer Metastasis Markers from Gene Expression Profiles Using Machine Learning Approaches
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Preparation
2.2. Data Preprocessing
2.3. XGBoost Modeling
2.4. Characterizing Metastasis Marker Genes
Algorithm 1. Inner product between the feature importance and the AUC performance of the trained models.
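The caption of Algorithm 1 describes an inner product between feature importance and AUC performance. A minimal sketch of that idea follows, assuming each trained model yields a per-gene importance value and a single AUC, so that a gene's score is the sum over models of importance times AUC. The function names `metastasis_scores` and `rank_genes`, the gene labels, and all numbers are hypothetical illustrations, not values or code from the paper; any normalization the authors apply is not recoverable from this text.

```python
from typing import Dict, List, Sequence


def metastasis_scores(
    importances: Sequence[Dict[str, float]],
    aucs: Sequence[float],
) -> Dict[str, float]:
    """Score each gene as sum over models of importance[model][gene] * auc[model].

    Assumption: a gene absent from a model's importance dict contributes 0
    for that model.
    """
    if len(importances) != len(aucs):
        raise ValueError("need exactly one AUC per trained model")
    scores: Dict[str, float] = {}
    for model_importance, auc in zip(importances, aucs):
        for gene, weight in model_importance.items():
            # Accumulate the AUC-weighted importance for this gene.
            scores[gene] = scores.get(gene, 0.0) + weight * auc
    return scores


def rank_genes(scores: Dict[str, float]) -> List[str]:
    """Return gene names ordered from highest to lowest score."""
    return sorted(scores, key=lambda gene: scores[gene], reverse=True)


# Hypothetical toy input: two trained models, three placeholder genes.
toy_importances = [
    {"GENE_A": 0.5, "GENE_B": 0.5},
    {"GENE_A": 0.25, "GENE_C": 0.75},
]
toy_aucs = [0.8, 0.6]

toy_scores = metastasis_scores(toy_importances, toy_aucs)
ranked = rank_genes(toy_scores)
```

Weighting by AUC means that importances from better-performing models count more toward a gene's score, which is one plausible reading of why the two quantities are combined in a single inner product.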
3. Results and Evaluations
3.1. Metastasis Marker Genes
3.2. Comparing with Known Metastasis Markers
3.3. Enrichment Tests on Metastasis-Related Processes
3.4. Survival Analysis
3.5. Literature Evidence
3.5.1. Metastasis Marker Genes with the Highest Metastasis Score
3.5.2. Metastasis Marker Genes Not Identified by Statistical Analysis
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jung, J.; Yoo, S. Identification of Breast Cancer Metastasis Markers from Gene Expression Profiles Using Machine Learning Approaches. Genes 2023, 14, 1820. https://doi.org/10.3390/genes14091820