Integrating Multi-Omics Using Bayesian Ridge Regression with Iterative Similarity Bagging
Abstract
1. Introduction
- Bayesian Ridge Regression (BRR) is introduced as a supervised domain-oriented feature selection method to reduce omics complexity and dimensionality. Features are selected based on domain contexts, such as drug response and cancer classification.
- A new method, Iterative Similarity Bagging (ISB), is presented to dynamically reduce dimensionality and complexity while preserving the original biological measurements of the omics data. Some transformation-based integration approaches lose these measurements.
2. Related Work
3. Materials and Methods
3.1. Datasets
3.2. Bayesian Ridge Regression
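For orientation, the standard textbook form of Bayesian ridge regression is summarized below. The symbols are assumptions of this summary rather than notation taken from the paper: $X$ is the samples-by-features omics matrix, $y$ the response vector (for example, drug response), $\alpha$ the noise precision, and $\lambda$ the precision of the weight prior.

```latex
\begin{aligned}
y &= Xw + \varepsilon, \qquad \varepsilon \sim \mathcal{N}\!\left(0,\ \alpha^{-1} I\right) \\
p(w \mid \lambda) &= \mathcal{N}\!\left(w \mid 0,\ \lambda^{-1} I\right) \\
p(w \mid X, y, \alpha, \lambda) &= \mathcal{N}\!\left(w \mid m,\ S\right),
\qquad S = \left(\lambda I + \alpha X^{\top} X\right)^{-1},
\qquad m = \alpha\, S X^{\top} y
\end{aligned}
```

Because the zero-mean Gaussian prior shrinks the weights toward zero, the posterior mean $m$ gives a regularized estimate of each feature's contribution to the response. It equals the ridge solution with penalty $\lambda / \alpha$.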
3.3. Iterative Similarity Bagging
Algorithm 1: Iterative Similarity Bagging method
Input:
• Iterations count: i
• Bag size (columns in the bag): k
• Single-omics dataset: data
Output:
• Selected features list after all iterations: selected_features
Begin
    Declare List selected_features = []
    // Increment value after each iteration
    Declare integer increment value: c
    Set index = 0    // Starting column to select genes in the bag
    Set c = k
    For index = 0 : i
        For j = 0 : range(0, len(data.columns))
            IF index < len(data.columns):
                df_bag = SELECT_COLUMNS(data, index : k)
                df = TRANSPOSE(df_bag)
                df_Sim = compute_similarity(df)      // Euclidean distance
                threshold = get_threshold(df_Sim)    // Half-mean threshold
                iteration_selected_features = df_Sim[col] > threshold
                selected_features += list(unique(iteration_selected_features))
                index = index + c
                k = k + c
            End IF
        End For
    End For
End
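Algorithm 1 can be sketched in plain Python as shown below. This is an illustration under stated assumptions, not the authors' implementation. The pseudocode leaves `col` and the exact scoring undefined, so the sketch assumes that (a) Euclidean distance d is converted to a similarity 1 / (1 + d), (b) each feature is scored by its mean similarity to the other features in its bag, (c) the half-mean threshold is half the mean of those scores, and (d) each iteration re-bags the features that survived the previous one. All function names are hypothetical.

```python
import math
from statistics import mean


def select_from_bag(bag):
    """Return the feature names in one bag whose score exceeds the threshold.

    bag: dict mapping feature name -> list of per-sample values.
    Score (assumption): mean of 1 / (1 + Euclidean distance) to the other
    features in the bag. Threshold (assumption): half the mean score.
    """
    names = list(bag)
    if len(names) < 2:
        return names  # a single feature has nothing to be compared with
    scores = {}
    for a in names:
        sims = [1.0 / (1.0 + math.dist(bag[a], bag[b])) for b in names if b != a]
        scores[a] = mean(sims)
    threshold = 0.5 * mean(scores.values())
    return [n for n in names if scores[n] > threshold]


def iterative_similarity_bagging(data, bag_size, iterations):
    """Repeatedly split the features into consecutive bags and filter each bag.

    data: dict mapping feature name -> list of per-sample values; the dict
    order plays the role of the column order in the pseudocode.
    """
    current = dict(data)
    for _ in range(iterations):
        names = list(current)
        kept = []
        for start in range(0, len(names), bag_size):
            bag = {n: current[n] for n in names[start:start + bag_size]}
            kept.extend(select_from_bag(bag))
        if len(kept) == len(names):
            break  # no feature was removed, so later passes would not change anything
        current = {n: current[n] for n in kept}
    return list(current)


if __name__ == "__main__":
    toy = {
        "g1": [1.0, 2.0, 3.0],
        "g2": [1.0, 2.0, 3.0],
        "g3": [1.0, 2.0, 3.0],
        "g4": [1.0, 2.0, 3.0],
        "g5": [100.0, 200.0, 300.0],
    }
    print(iterative_similarity_bagging(toy, bag_size=5, iterations=3))
```

The selected features keep their original values, so nothing is transformed or projected. With real omics matrices a pandas or NumPy implementation would be far faster; the pure-Python form is used here only to keep the sketch dependency-free.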
3.4. Drug Response Prediction Using Graph Convolutional Network and Convolutional Neural Network
3.5. Evaluation Metrics
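The results tables report RMSE, PCC, and R2 for the training, validation, and testing splits. A minimal sketch of the three metrics follows, assuming that PCC is the Pearson correlation coefficient and that R2 is the coefficient of determination (1 minus the ratio of residual to total sum of squares). The function names are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted responses."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pcc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted responses."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(rmse(observed, predicted), pcc(observed, predicted), r2(observed, predicted))
```

Lower RMSE and higher PCC and R2 indicate better agreement between predicted and observed drug responses.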
3.6. Experimental Setup
4. Results and Discussion
4.1. Genomic Features Selected by BRR
4.2. Genomic Features Selected by ISB
4.3. Effectiveness of BRR-ISB in Drug Response Prediction
4.4. Comparison with Related Works
- WGRMF [92]: Weighted Graph Regularized Matrix Factorization was used to predict the responses of cell lines to anti-cancer drugs. The model used the CCLE dataset, which has 491 cell lines and 23 drugs with 10,870 known responses, and took gene expression and drug fingerprints as inputs.
- EBSRMF [81]: Ensemble-based Similarity-Regularized Matrix Factorization is a bagging-based technique proposed to improve drug response prediction accuracy on the CCLE dataset. The dataset comprises 24 drugs and 363 cell lines. The model used gene expression profiles and chemical structures.
- DeepDSC [80]: A stacked deep autoencoder extracted cell-line features from gene expression data. The gene expression features were then combined with chemical structure information to predict drug response. DeepDSC used the Cancer Cell Line Encyclopedia (CCLE), which has 491 cell lines and 23 drugs, along with 10,870 documented responses.
- SRMF [93]: A Similarity-Regularized Matrix Factorization model predicted drug response by combining gene expression data with chemical structures. The CCLE dataset used has 10,870 known responses across 491 cell lines and 23 drugs.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Subramanian, I.; Verma, S.; Kumar, S.; Jere, A.; Anamika, K. Multi-Omics Data Integration, Interpretation, and Its Application. Bioinform. Biol. Insights 2020, 14, 117793221989905.
2. Chen, C.; Wang, J.; Pan, D.; Wang, X.; Xu, Y.; Yan, J.; Wang, L.; Yang, X.; Yang, M.; Liu, G. Applications of Multi-omics Analysis in Human Diseases. MedComm 2023, 4, e315.
3. Kreitmaier, P.; Katsoula, G.; Zeggini, E. Insights from Multi-Omics Integration in Complex Disease Primary Tissues. Trends Genet. 2023, 39, 46–58.
4. Chong, J.; Soufan, O.; Li, C.; Caraus, I.; Li, S.; Bourque, G.; Wishart, D.S.; Xia, J. MetaboAnalyst 4.0: Towards More Transparent and Integrative Metabolomics Analysis. Nucleic Acids Res. 2018, 46, W486–W494.
5. López de Maturana, E.; Alonso, L.; Alarcón, P.; Martín-Antoniano, I.A.; Pineda, S.; Piorno, L.; Calle, M.L.; Malats, N. Challenges in the Integration of Omics and Non-Omics Data. Genes 2019, 10, 238.
6. Cai, Z.; Poulos, R.C.; Liu, J.; Zhong, Q. Machine Learning for Multi-Omics Data Integration in Cancer. iScience 2022, 25, 103798.
7. Picard, M.; Scott-Boyer, M.-P.; Bodein, A.; Périn, O.; Droit, A. Integration Strategies of Multi-Omics Data for Machine Learning Analysis. Comput. Struct. Biotechnol. J. 2021, 19, 3735–3746.
8. Hasin, Y.; Seldin, M.; Lusis, A. Multi-Omics Approaches to Disease. Genome Biol. 2017, 18, 83.
9. Lander, E.S.; Linton, L.M.; Birren, B.; Nusbaum, C.; Zody, M.C.; Baldwin, J.; Devon, K.; Dewar, K.; Doyle, M.; FitzHugh, W.; et al. Initial Sequencing and Analysis of the Human Genome. Nature 2001, 409, 860–921.
10. Reel, P.S.; Reel, S.; Pearson, E.; Trucco, E.; Jefferson, E. Using Machine Learning Approaches for Multi-Omics Data Analysis: A Review. Biotechnol. Adv. 2021, 49, 107739.
11. Almutiri, T.; Alomar, K.; Alganmi, N. Predicting Drug Response on Multi-Omics Data Using a Hybrid of Bayesian Ridge Regression with Deep Forest. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 470–482.
12. Nicora, G.; Vitali, F.; Dagliati, A.; Geifman, N.; Bellazzi, R. Integrated Multi-Omics Analyses in Oncology: A Review of Machine Learning Methods and Tools. Front. Oncol. 2020, 10, 1030.
13. Xuan, P.; Sun, C.; Zhang, T.; Ye, Y.; Shen, T.; Dong, Y. Gradient Boosting Decision Tree-Based Method for Predicting Interactions Between Target Genes and Drugs. Front. Genet. 2019, 10, 459.
14. Yue, X.; Wang, Z.; Huang, J.; Parthasarathy, S.; Moosavinasab, S.; Huang, Y.; Lin, S.M.; Zhang, W.; Zhang, P.; Sun, H. Graph Embedding on Biomedical Networks: Methods, Applications and Evaluations. Bioinformatics 2020, 36, 1241–1251.
15. Ma, T.; Zhang, A. Affinity Network Fusion and Semi-Supervised Learning for Cancer Patient Clustering. Methods 2018, 145, 16–24.
16. Gligorijević, V.; Barot, M.; Bonneau, R. DeepNF: Deep Network Fusion for Protein Function Prediction. Bioinformatics 2018, 34, 3873–3881.
17. Wen, Y.; Song, X.; Yan, B.; Yang, X.; Wu, L.; Leng, D.; He, S.; Bo, X. Multi-Dimensional Data Integration Algorithm Based on Random Walk with Restart. BMC Bioinform. 2021, 22, 97.
18. Zhang, Y.; Li, A.; Peng, C.; Wang, M. Improve Glioblastoma Multiforme Prognosis Prediction by Using Feature Selection and Multiple Kernel Learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 2016, 13, 825–835.
19. He, Z.; Zhang, J.; Yuan, X.; Zhang, Y. Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods. Front. Genet. 2021, 11, 632901.
20. Ammad-ud-din, M.; Khan, S.A.; Malani, D.; Murumägi, A.; Kallioniemi, O.; Aittokallio, T.; Kaski, S. Drug Response Prediction by Inferring Pathway-Response Associations with Kernelized Bayesian Matrix Factorization. Bioinformatics 2016, 32, i455–i463.
21. Costello, J.C.; Heiser, L.M.; Georgii, E.; Gönen, M.; Menden, M.P.; Wang, N.J.; Bansal, M.; Ammad-ud-din, M.; Hintsanen, P.; Khan, S.A.; et al. A Community Effort to Assess and Improve Drug Sensitivity Prediction Algorithms. Nat. Biotechnol. 2014, 32, 1202–1212.
22. Vahabi, N.; Michailidis, G. Unsupervised Multi-Omics Data Integration Methods: A Comprehensive Review. Front. Genet. 2022, 13, 854752.
23. Gligorijević, V.; Pržulj, N. Methods for Biological Data Integration: Perspectives and Challenges. J. R. Soc. Interface 2015, 12, 20150571.
24. Wang, B.; Mezlini, A.M.; Demir, F.; Fiume, M.; Tu, Z.; Brudno, M.; Haibe-Kains, B.; Goldenberg, A. Similarity Network Fusion for Aggregating Data Types on a Genomic Scale. Nat. Methods 2014, 11, 333–337.
25. Efendi, A.; Effrihan, E. A Simulation Study on Bayesian Ridge Regression Models for Several Collinearity Levels. AIP Conf. Proc. 2017, 1913, 020031.
26. Yassen, M.F.; Al-Duais, F.S.; Almazah, M. Ridge Regression Method and Bayesian Estimators under Composite LINEX Loss Function to Estimate the Shape Parameter in Lomax Distribution. Comput. Intell. Neurosci. 2022, 2022, 1200611.
27. Flavin, T.; Steiner, T.; Mitra, B.; Nagaraju, V. Bayesian Ridge Regression Based Model to Predict Fault Location in HVdc Network. In Proceedings of the 2022 IEEE Power & Energy Society General Meeting (PESGM), Denver, CO, USA, 17–21 July 2022; pp. 1–5.
28. Ngo, G.; Beard, R.; Chandra, R. Evolutionary Bagging for Ensemble Learning. Neurocomputing 2022, 510, 1–14.
29. Toloşi, L.; Lengauer, T. Classification with Correlated Features: Unreliability of Feature Ranking and Solutions. Bioinformatics 2011, 27, 1986–1994.
30. Jain, I.; Jain, V.K.; Jain, R. Correlation Feature Selection Based Improved-Binary Particle Swarm Optimization for Gene Selection and Cancer Classification. Appl. Soft Comput. 2018, 62, 203–215.
31. Darst, B.F.; Malecki, K.C.; Engelman, C.D. Using Recursive Feature Elimination in Random Forest to Account for Correlated Variables in High Dimensional Data. BMC Genet. 2018, 19, 65.
32. Misra, B.B.; Langefeld, C.; Olivier, M.; Cox, L.A. Integrated Omics: Tools, Advances and Future Approaches. J. Mol. Endocrinol. 2019, 62, R21–R45.
33. Wörheide, M.A.; Krumsiek, J.; Kastenmüller, G.; Arnold, M. Multi-Omics Integration in Biomedical Research—A Metabolomics-Centric Review. Anal. Chim. Acta 2021, 1141, 144–162.
34. Park, M.; Kim, D.; Moon, K.; Park, T. Integrative Analysis of Multi-Omics Data Based on Blockwise Sparse Principal Components. Int. J. Mol. Sci. 2020, 21, 8202.
35. Xie, G.; Dong, C.; Kong, Y.; Zhong, J.; Li, M.; Wang, K. Group Lasso Regularized Deep Learning for Cancer Prognosis from Multi-Omics and Clinical Features. Genes 2019, 10, 240.
36. Xie, M.; Lei, X.; Zhong, J.; Ouyang, J.; Li, G. Drug Response Prediction Using Graph Representation Learning and Laplacian Feature Selection. BMC Bioinform. 2022, 23, 532.
37. Chu, T.; Nguyen, T.T.; Hai, B.D.; Nguyen, Q.H.; Nguyen, T. Graph Transformer for Drug Response Prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2023, 20, 1065–1072.
38. Malik, V.; Kalakoti, Y.; Sundar, D. Deep Learning Assisted Multi-Omics Integration for Survival and Drug-Response Prediction in Breast Cancer. BMC Genom. 2021, 22, 214.
39. Wang, Z.; Li, H.; Carpenter, C.; Guan, Y. Challenge-Enabled Machine Learning to Drug-Response Prediction. AAPS J. 2020, 22, 106.
40. Bühlmann, P.; Van De Geer, S. Statistics for High-Dimensional Data: Methods, Theory and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011; ISBN 364220192X.
41. Bøvelstad, H.M.; Nygård, S.; Størvold, H.L.; Aldrin, M.; Borgan, Ø.; Frigessi, A.; Lingjærde, O.C. Predicting Survival from Microarray Data—A Comparative Study. Bioinformatics 2007, 23, 2080–2087.
42. Natekin, A.; Knoll, A. Gradient Boosting Machines, a Tutorial. Front. Neurorobot. 2013, 7, 21.
43. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
44. Partin, A.; Brettin, T.; Evrard, Y.A.; Zhu, Y.; Yoo, H.; Xia, F.; Jiang, S.; Clyde, A.; Shukla, M.; Fonstein, M. Learning Curves for Drug Response Prediction in Cancer Cell Lines. BMC Bioinform. 2021, 22, 252.
45. Chang, Y.; Park, H.; Yang, H.-J.; Lee, S.; Lee, K.-Y.; Kim, T.S.; Jung, J.; Shin, J.-M. Cancer Drug Response Profile Scan (CDRscan): A Deep Learning Model That Predicts Drug Effectiveness from Cancer Genomic Signature. Sci. Rep. 2018, 8, 8857.
46. Zhu, Y.; Brettin, T.; Evrard, Y.A.; Partin, A.; Xia, F.; Shukla, M.; Yoo, H.; Doroshow, J.H.; Stevens, R.L. Ensemble Transfer Learning for the Prediction of Anti-Cancer Drug Response. Sci. Rep. 2020, 10, 18040.
47. Sotudian, S.; Paschalidis, I.C. Machine Learning for Pharmacogenomics and Personalized Medicine: A Ranking Model for Drug Sensitivity Prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 19, 2324–2333.
48. Roder, J.; Oliveira, C.; Net, L.; Tsypin, M.; Linstid, B.; Roder, H. A Dropout-Regularized Classifier Development Approach Optimized for Precision Medicine Test Discovery from Omics Data. BMC Bioinform. 2019, 20, 325.
49. Xiaolin, X.; Xiaozhi, L.; Guoping, H.; Hongwei, L.; Jinkuo, G.; Xiyun, B.; Zhen, T.; Xiaofang, M.; Yanxia, L.; Na, X. Overfit Deep Neural Network for Predicting Drug-Target Interactions. iScience 2023, 26, 107646.
50. Iorio, F.; Knijnenburg, T.A.; Vis, D.J.; Bignell, G.R.; Menden, M.P.; Schubert, M.; Aben, N.; Gonçalves, E.; Barthorpe, S.; Lightfoot, H.; et al. A Landscape of Pharmacogenomic Interactions in Cancer. Cell 2016, 166, 740–754.
51. Kurilov, R.; Haibe-Kains, B.; Brors, B. Assessment of Modelling Strategies for Drug Response Prediction in Cell Lines and Xenografts. Sci. Rep. 2020, 10, 2849.
52. Barretina, J.; Caponigro, G.; Stransky, N.; Venkatesan, K.; Margolin, A.A.; Kim, S.; Wilson, C.J.; Lehár, J.; Kryukov, G.V.; Sonkin, D.; et al. The Cancer Cell Line Encyclopedia Enables Predictive Modelling of Anticancer Drug Sensitivity. Nature 2012, 483, 603–607.
53. Yang, W.; Soares, J.; Greninger, P.; Edelman, E.J.; Lightfoot, H.; Forbes, S.; Bindal, N.; Beare, D.; Smith, J.A.; Thompson, I.R. Genomics of Drug Sensitivity in Cancer (GDSC): A Resource for Therapeutic Biomarker Discovery in Cancer Cells. Nucleic Acids Res. 2012, 41, D955–D961.
54. Xu, X.; Gu, H.; Wang, Y.; Wang, J.; Qin, P. Autoencoder Based Feature Selection Method for Classification of Anticancer Drug Response. Front. Genet. 2019, 10, 233.
55. Kim, S.; Thiessen, P.A.; Bolton, E.E.; Chen, J.; Fu, G.; Gindulyte, A.; Han, L.; He, J.; He, S.; Shoemaker, B.A. PubChem Substance and Compound Databases. Nucleic Acids Res. 2016, 44, D1202–D1213.
56. O’Boyle, N.M. Towards a Universal SMILES Representation—A Standard Method to Generate Canonical SMILES Based on the InChI. J. Cheminform. 2012, 4, 22.
57. Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
58. Kearnes, S.; McCloskey, K.; Berndl, M.; Pande, V.; Riley, P. Molecular Graph Convolutions: Moving beyond Fingerprints. J. Comput. Aided Mol. Des. 2016, 30, 595–608.
59. Goh, G.B.; Siegel, C.; Vishnu, A.; Hodas, N. Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 302–310.
60. Landrum, G. Rdkit: Open-Source Cheminformatics Software. 2016. Volume 149. p. 650. Available online: http://www.rdkit.org/ (accessed on 24 June 2024).
61. Ramsundar, B.; Eastman, P.; Walters, P.; Pande, V. Deep Learning for the Life Sciences: Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2019; ISBN 1492039780.
62. Nguyen, T.; Nguyen, G.T.T.; Nguyen, T.; Le, D.-H. Graph Convolutional Networks for Drug Response Prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 19, 146–154.
63. Wu, Z.; Ramsundar, B.; Feinberg, E.N.; Gomes, J.; Geniesse, C.; Pappu, A.S.; Leswing, K.; Pande, V. MoleculeNet: A Benchmark for Molecular Machine Learning. Chem. Sci. 2018, 9, 513–530.
64. Fernández, I.; Frenking, G.; Merino, G. Aromaticity of Metallabenzenes and Related Compounds. Chem. Soc. Rev. 2015, 44, 6452–6463.
65. Tipping, M.E. Sparse Bayesian Learning and the Relevance Vector Machine. J. Mach. Learn. Res. 2001, 1, 211–244.
66. Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4.
67. Hoerl, A.E.; Kennard, R.W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 1970, 12, 55–67.
68. Neal, R.M. Bayesian Learning for Neural Networks; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 118, ISBN 1461207452.
69. MacKay, D.J.C. Bayesian Interpolation. Neural Comput. 1992, 4, 415–447.
70. Ozdemir, S.; Susarla, D. Feature Engineering Made Easy: Identify Unique Features from Your Dataset in Order to Build Powerful Machine Learning Systems; Packt Publishing Ltd.: Birmingham, UK, 2018; ISBN 1787286479.
71. Tancredi, A.; Anderson, C.; O’Hagan, A. Accounting for Threshold Uncertainty in Extreme Value Estimation. Extremes 2006, 9, 87–106.
72. Goodspeed, A.; Heiser, L.M.; Gray, J.W.; Costello, J.C. Tumor-Derived Cell Lines as Molecular Models of Cancer Pharmacogenomics. Mol. Cancer Res. 2016, 14, 3–13.
73. Gambardella, V.; Tarazona, N.; Cejalvo, J.M.; Lombardi, P.; Huerta, M.; Roselló, S.; Fleitas, T.; Roda, D.; Cervantes, A. Personalized Medicine: Recent Progress in Cancer Therapy. Cancers 2020, 12, 1009.
74. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907.
75. Joseph, V.R. Optimal Ratio for Data Splitting. Stat. Anal. Data Min. ASA Data Sci. J. 2022, 15, 531–538.
76. Dunford, R.; Su, Q.; Tamang, E. The Pareto Principle. 2014. Available online: https://core.ac.uk/download/pdf/200202097.pdf (accessed on 24 June 2024).
77. Nti, I.K.; Nyarko-Boateng, O.; Aning, J. Performance of Machine Learning Algorithms with Different K Values in K-Fold Cross-Validation. Int. J. Inf. Technol. Comput. Sci. 2021, 13, 61–71.
78. Wong, T.-T.; Yeh, P.-Y. Reliable Accuracy Estimates from K-Fold Cross Validation. IEEE Trans. Knowl. Data Eng. 2019, 32, 1586–1594.
79. Liu, Q.; Hu, Z.; Jiang, R.; Zhou, M. DeepCDR: A Hybrid Graph Convolutional Network for Predicting Cancer Drug Response. Bioinformatics 2020, 36, i911–i918.
80. Li, M.; Wang, Y.; Zheng, R.; Shi, X.; Li, Y.; Wu, F.-X.; Wang, J. DeepDSC: A Deep Learning Method to Predict Drug Sensitivity of Cancer Cell Lines. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 18, 575–582.
81. Shahzad, M.; Tahir, M.A.; Khan, M.A.; Jiang, R.; Malick, R.A.S. EBSRMF: Ensemble Based Similarity-Regularized Matrix Factorization to Predict Anticancer Drug Responses. J. Intell. Fuzzy Syst. 2022, 43, 3443–3452.
82. Golbraikh, A.; Tropsha, A. Beware of Q2! J. Mol. Graph. Model. 2002, 20, 269–276.
83. Zhao, L.; Qiu, T.; Jiang, D.; Xu, H.; Zou, L.; Yang, Q.; Chen, C.; Jiao, B. SGCE Promotes Breast Cancer Stem Cells by Stabilizing EGFR. Adv. Sci. 2020, 7, 1903700.
84. Zhang, S.; Wang, H.; Liu, A. Identification of ATP1B1, a Key Copy Number Driver Gene in Diffuse Large B-Cell Lymphoma and Potential Target for Drugs. Ann. Transl. Med. 2022, 10, 1136.
85. Katuwal, N.B.; Kang, M.S.; Ghosh, M.; Hong, S.D.; Jeong, Y.G.; Park, S.M.; Kim, S.-G.; Sohn, J.; Kim, T.H.; Moon, Y.W. Targeting PEG10 as a Novel Therapeutic Approach to Overcome CDK4/6 Inhibitor Resistance in Breast Cancer. J. Exp. Clin. Cancer Res. 2023, 42, 325.
86. Xu, Z.; Xiang, L.; Peng, L.; Gu, H.; Wang, Y. Comprehensive Analysis of the Immune Implication of AKAP12 in Stomach Adenocarcinoma. Comput. Math. Methods Med. 2022, 2022, 3445230.
87. Lodi, M.; Voilquin, L.; Alpy, F.; Molière, S.; Reix, N.; Mathelin, C.; Chenard, M.-P.; Tomasetto, C.-L. STARD3: A New Biomarker in HER2-Positive Breast Cancer. Cancers 2023, 15, 362.
88. Shen, R.; Mo, Q.; Schultz, N.; Seshan, V.E.; Olshen, A.B.; Huse, J.; Ladanyi, M.; Sander, C. Integrative Subtype Discovery in Glioblastoma Using ICluster. PLoS ONE 2012, 7, e35236.
89. Bishop, C.M.; Tipping, M.E. Bayesian Regression and Classification. Nato Sci. Ser. Sub Ser. III Comput. Syst. Sci. 2003, 190, 267–288.
90. Ying, X. An Overview of Overfitting and Its Solutions. J. Phys. Conf. Ser. 2019, 1168, 022022.
91. Zhang, Z.; Zhang, Y.; Li, Z. Removing the Feature Correlation Effect of Multiplicative Noise. Adv. Neural Inf. Process. Syst. 2018, 31. Available online: https://papers.nips.cc/paper_files/paper/2018/hash/e7b24b112a44fdd9ee93bdf998c6ca0e-Abstract.html (accessed on 24 June 2024).
92. Guan, N.-N.; Zhao, Y.; Wang, C.-C.; Li, J.-Q.; Chen, X.; Piao, X. Anticancer Drug Response Prediction in Cell Lines Using Weighted Graph Regularized Matrix Factorization. Mol. Ther. Nucleic Acids 2019, 17, 164–174.
93. Wang, L.; Li, X.; Zhang, L.; Gao, Q. Improved Anticancer Drug Response Prediction in Cell Lines Using Matrix Factorization with Similarity Regularization. BMC Cancer 2017, 17, 513.
94. Kohavi, R.; John, G.H. Wrappers for Feature Subset Selection. Artif. Intell. 1997, 97, 273–324.
95. Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
Type | Raw Data | Processed |
---|---|---|
Drugs | 24 | 24 |
Cell lines | 1061 | 363 |
Gene expression | 20,049 | 19,389 |
Copy number alteration | 24,960 | 24,960 |
Single-nucleotide mutation | 1667 | 1667 |
Type | Raw Data | Processed |
---|---|---|
Drugs | 98 | 98 |
Cell lines | 1124 | 555 |
Gene expression | 11,833 | 11,712 |
Copy number alteration | 24,960 | 24,959 |
Single-nucleotide mutation | 70 | 54 |
Type | Genes |
---|---|
Gene expression | TFPI2, SGCE, PPIC, ATP1B1, DSP, PEG10, MAGEA4, C1S, CPVL, GATA6 |
Copy number alteration | RASSF8-AS1, MIR4302, CCNE1, RASSF8, LMNTD1, LOC102724958, STARD3, LINC00906, KRAS, LYRM5 |
Single-nucleotide mutation | AKAP12, TP53, NLRP3, ATRX, OBSCN, CARD10, KRAS, ATR, FZD1, GPR112 |
Type | Genes |
---|---|
Gene expression | TFPI2, SGCE, ATP1B1, DSP, PEG10, MAGEA4, C1S, CPVL, GATA6, RP11-490M8.1 |
Copy number alteration | RASSF8-AS1, STARD3, PPP1R1B, SLC35E3, ZNF536, SOX5, TRIT1, TMEM75, ZNF879, ST8SIA1 |
Single-nucleotide mutation | AKAP12, TP53, NLRP3, ATRX, OBSCN, CARD10, KRAS, ATR, FZD1, GPR112 |
Method | Input Features | Model Features | Training RMSE | Training PCC | Training R2 | Validation RMSE | Validation PCC | Validation R2 | Testing RMSE | Testing PCC | Testing R2 | Time ([h:]mm:ss) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Baseline | 46,016 | 46,016 | 0.088 | 0.935 | 0.873 | 0.12 | 0.864 | 0.744 | 0.13 | 0.13 | 0.737 | 4:03:11 |
Iterative Similarity Bagging (ISB) | ||||||||||||
ISB bag size = 50, iterations = 5 | 46,016 | 13,844 | 0.087 | 0.935 | 0.875 | 0.118 | 0.871 | 0.755 | 0.126 | 0.87 | 0.754 | 10:17 |
ISB bag size = 50, iterations = 10 | 46,016 | 12,270 | 0.103 | 0.909 | 0.824 | 0.127 | 0.847 | 0.716 | 0.13 | 0.858 | 0.736 | 9:16 |
ISB bag size = 100, iterations = 10 | 46,016 | 5926 | 0.08 | 0.946 | 0.894 | 0.117 | 0.87 | 0.755 | 0.129 | 0.862 | 0.74 | 5:50 |
ISB bag size = 200, iterations = 10 | 46,016 | 2390 | 0.091 | 0.929 | 0.863 | 0.116 | 0.875 | 0.764 | 0.124 | 0.873 | 0.76 | 3:51 |
ISB bag size = 300, iterations = 10 | 46,016 | 2261 | 0.091 | 0.929 | 0.863 | 0.119 | 0.866 | 0.747 | 0.126 | 0.868 | 0.75 | 3:48 |
ISB bag size = 400, iterations = 10 | 46,016 | 2119 | 0.1 | 0.917 | 0.837 | 0.119 | 0.866 | 0.75 | 0.127 | 0.865 | 0.747 | 3:42 |
Bayesian Ridge Regression with Iterative Similarity Bagging (BRR-ISB) | ||||||||||||
BRR | 23,683 | 23,683 | 0.087 | 0.937 | 0.877 | 0.117 | 0.872 | 0.758 | 0.125 | 0.87 | 0.754 | 15:43 |
BRR-ISB bag size = 50, iterations = 5 | 23,683 | 5740 | 0.093 | 0.926 | 0.857 | 0.115 | 0.876 | 0.766 | 0.124 | 0.872 | 0.759 | 5:48 |
BRR-ISB bag size = 50, iterations = 10 | 23,683 | 4822 | 0.094 | 0.924 | 0.854 | 0.121 | 0.86 | 0.739 | 0.127 | 0.865 | 0.748 | 5:20 |
BRR-ISB bag size = 100, iterations = 10 | 23,683 | 2133 | 0.099 | 0.917 | 0.84 | 0.117 | 0.871 | 0.758 | 0.125 | 0.869 | 0.754 | 3:57 |
BRR-ISB bag size = 200, iterations = 10 | 23,683 | 1273 | 0.097 | 0.919 | 0.845 | 0.114 | 0.879 | 0.771 | 0.121 | 0.879 | 0.77 | 3:50 |
BRR-ISB bag size = 300, iterations = 10 | 23,683 | 1245 | 0.099 | 0.918 | 0.84 | 0.116 | 0.872 | 0.76 | 0.122 | 0.877 | 0.768 | 3:47 |
BRR-ISB bag size = 400, iterations = 10 | 23,683 | 1260 | 0.097 | 0.921 | 0.846 | 0.116 | 0.874 | 0.763 | 0.127 | 0.867 | 0.749 | 3:50 |
Similarity Network Fusion | ||||||||||||
SNF | 46,016 | 363 | 0.13 | 0.854 | 0.721 | 0.13 | 0.841 | 0.699 | 0.134 | 0.848 | 0.716 | 3:02 |
BRR-SNF | 23,683 | 363 | 0.126 | 0.859 | 0.738 | 0.127 | 0.844 | 0.713 | 0.138 | 0.847 | 0.7 | 3:02 |
Method | Input Features | Model Features | Training RMSE | Training PCC | Training R2 | Validation RMSE | Validation PCC | Validation R2 | Testing RMSE | Testing PCC | Testing R2 | Time ([h:]mm:ss) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Baseline | 36,725 | 36,725 | 0.023 | 0.934 | 0.872 | 0.032 | 0.879 | 0.771 | 0.03 | 0.89 | 0.791 | 19:38:19 |
Iterative Similarity Bagging (ISB) | ||||||||||||
ISB bag size = 50, iterations = 5 | 36,725 | 11,987 | 0.022 | 0.943 | 0.889 | 0.031 | 0.881 | 0.776 | 0.03 | 0.895 | 0.8 | 59:48 |
ISB bag size = 50, iterations = 10 | 36,725 | 10,367 | 0.023 | 0.935 | 0.874 | 0.031 | 0.88 | 0.774 | 0.03 | 0.895 | 0.799 | 51:44 |
ISB bag size = 100, iterations = 10 | 36,725 | 4211 | 0.024 | 0.933 | 0.867 | 0.031 | 0.887 | 0.783 | 0.029 | 0.9 | 0.808 | 32:52 |
ISB bag size = 200, iterations = 10 | 36,725 | 1174 | 0.026 | 0.919 | 0.843 | 0.031 | 0.881 | 0.775 | 0.03 | 0.894 | 0.798 | 26:07 |
ISB bag size = 300, iterations = 10 | 36,725 | 974 | 0.025 | 0.92 | 0.846 | 0.031 | 0.883 | 0.78 | 0.029 | 0.896 | 0.803 | 24:31 |
ISB bag size = 400, iterations = 10 | 36,725 | 956 | 0.025 | 0.92 | 0.846 | 0.031 | 0.882 | 0.777 | 0.029 | 0.896 | 0.801 | 24:43 |
Bayesian Ridge Regression with Iterative Similarity Bagging (BRR-ISB) | ||||||||||||
BRR | 36,725 | 18,392 | 0.024 | 0.93 | 0.866 | 0.032 | 0.878 | 0.771 | 0.03 | 0.894 | 0.798 | 1:23:11 |
BRR-ISB bag size = 50, iterations = 5 | 18,392 | 5369 | 0.023 | 0.938 | 0.879 | 0.031 | 0.885 | 0.783 | 0.029 | 0.899 | 0.807 | 36:19 |
BRR-ISB bag size = 50, iterations = 10 | 18,392 | 4509 | 0.024 | 0.928 | 0.859 | 0.031 | 0.882 | 0.775 | 0.03 | 0.894 | 0.797 | 32:07 |
BRR-ISB bag size = 100, iterations = 10 | 18,392 | 1681 | 0.026 | 0.915 | 0.835 | 0.031 | 0.881 | 0.774 | 0.03 | 0.892 | 0.794 | 23:28 |
BRR-ISB bag size = 200, iterations = 10 | 18,392 | 606 | 0.026 | 0.916 | 0.838 | 0.031 | 0.883 | 0.777 | 0.029 | 0.896 | 0.801 | 19:58 |
BRR-ISB bag size = 300, iterations = 10 | 18,392 | 549 | 0.028 | 0.904 | 0.817 | 0.032 | 0.878 | 0.771 | 0.03 | 0.892 | 0.796 | 21:08 |
BRR-ISB bag size = 400, iterations = 10 | 18,392 | 566 | 0.028 | 0.903 | 0.815 | 0.032 | 0.877 | 0.769 | 0.03 | 0.892 | 0.796 | 21:33 |
Similarity Network Fusion | ||||||||||||
SNF | 36,725 | 555 | 0.029 | 0.896 | 0.802 | 0.033 | 0.87 | 0.756 | 0.031 | 0.884 | 0.782 | 20:23 |
BRR-SNF | 18,392 | 555 | 0.029 | 0.894 | 0.799 | 0.033 | 0.869 | 0.756 | 0.031 | 0.884 | 0.781 | 20:19 |
Model | RMSE | PCC | R2 |
---|---|---|---|
WGRMF | 0.56 | 0.72 | - |
EBSRMF | 0.21 | 0.86 | - |
DeepDSC | 0.23 | - | 0.78 |
SRMF | 0.57 | 0.71 | - |
BRR-ISB (Proposed) | 0.12 | 0.879 | 0.77 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Almutiri, T.M.; Alomar, K.H.; Alganmi, N.A. Integrating Multi-Omics Using Bayesian Ridge Regression with Iterative Similarity Bagging. Appl. Sci. 2024, 14, 5660. https://doi.org/10.3390/app14135660