Bi-EB: Empirical Bayesian Biclustering for Multi-Omics Data Integration Pattern Identification among Species
Abstract
1. Introduction
2. Materials and Methods
2.1. Materials
2.2. Methods
The Empirical Bayes Biclustering (Bi-EB) Model
2.3. Extracting Members of the Bicluster
2.4. Bicluster-Searching Algorithm after the Bi-EB Algorithm
2.5. Performance Comparisons among Biclustering Algorithms
2.5.1. Synthetic Data Generation
2.5.2. Evaluation Measurements
2.5.3. The Bi-EB Algorithm on the Three Synthetic Datasets
2.5.4. Comparison Based on Evaluation Measurements
3. Results
Bi-EB Targets the Module Detection of Common mRNA Expression/Protein Amount on Breast Cancer
- (i) The luminal A/B subtype
- (ii) The basal-like subtype
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
References
- Saber, H.B.; Elloumi, M. DNA microarray data analysis: A new survey on biclustering. Int. J. Comput. Biol. 2015, 4, 21–37.
- Cheng, Y.; Church, G.M. Biclustering of expression data. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2000, 8, 93–103.
- Pontes, B.; Giráldez, R.; Aguilar-Ruiz, J.S. Biclustering on expression data: A review. J. Biomed. Inform. 2015, 57, 163–180.
- Lazzeroni, L.; Owen, A. Plaid models for gene expression data. Stat. Sin. 2002, 12, 61–86.
- Sheng, Q.; Moreau, Y.; De Moor, B. Biclustering microarray data by Gibbs sampling. Bioinformatics 2003, 19 (Suppl. S2), ii196–ii205.
- Gu, J.; Liu, J.S. Bayesian biclustering of gene expression data. BMC Genom. 2008, 9, S4.
- Amar, D.; Yekutieli, D.; Maron-Katz, A.; Hendler, T.; Shamir, R. A hierarchical Bayesian model for flexible module discovery in three-way time-series data. Bioinformatics 2015, 31, i17–i26.
- Kirk, P.; Griffin, J.E.; Savage, R.S.; Ghahramani, Z.; Wild, D.L. Bayesian correlated clustering to integrate multiple datasets. Bioinformatics 2012, 28, 3290–3297.
- Chekouo, T.; Murua, A. The penalized biclustering model and related algorithms. J. Appl. Stat. 2015, 42, 1255–1277.
- Liu, J.; Lichtenberg, T.; Hoadley, K.A.; Poisson, L.M.; Lazar, A.J.; Cherniack, A.D.; Kovatich, A.J.; Benz, C.C.; Levine, D.A.; Lee, A.V.; et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 2018, 173, 400–416.e11.
- Ghandi, M.; Huang, F.W.; Jané-Valbuena, J.; Kryukov, G.V.; Lo, C.C.; McDonald, R., III; Barretina, J.; Gelfand, E.T.; Bielski, C.M.; Li, H.; et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 2019, 569, 503–508.
- Domcke, S.; Sinha, R.; Levine, D.A.; Sander, C.; Schultz, N. Evaluating cell lines as tumour models by comparison of genomic profiles. Nat. Commun. 2013, 4, 2126.
- Jiang, G.L.; Zhang, S.J.; Yazdanparast, A.; Li, M.; Vikram Pawar, A.; Liu, Y.L.; Inavolu, S.M.; Cheng, L.J. Comprehensive comparison of molecular portraits between cell lines and tumors in breast cancer. BMC Genom. 2016, 17 (Suppl. 7), 525.
- Fragomeni, S.M.; Sciallis, A.; Jeruss, J.S. Molecular subtypes and local-regional control of breast cancer. Surg. Oncol. Clin. N. Am. 2018, 27, 95–120.
- Sørlie, T.; Perou, C.M.; Tibshirani, R.; Aas, T.; Geisler, S.; Johnsen, H.; Hastie, T.; Eisen, M.B.; van de Rijn, M.; Jeffrey, S.S.; et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. USA 2001, 98, 10869–10874.
- Lehmann, B.D.; Bauer, J.A.; Chen, X.; Sanders, M.E.; Chakravarthy, A.B.; Shyr, Y.; Pietenpol, J.A. Identification of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies. J. Clin. Investig. 2011, 121, 2750–2767.
- Lehmann, B.D.; Jovanović, B.; Chen, X.; Estrada, M.V.; Johnson, K.N.; Shyr, Y.; Moses, H.L.; Sanders, M.E.; Pietenpol, J.A. Refinement of triple-negative breast cancer molecular subtypes: Implications for neoadjuvant chemotherapy selection. PLoS ONE 2016, 11, e0157368.
- Charafe-Jauffret, E.; Ginestier, C.; Monville, F.; Finetti, P.; Adélaïde, J.; Cervera, N.; Fekairi, S.; Xerri, L.; Jacquemier, J.; Birnbaum, D.; et al. Gene expression profiling of breast cell lines identifies potential new basal markers. Oncogene 2006, 25, 2273–2284.
- Kao, J.; Salari, K.; Bocanegra, M.; Choi, Y.; Girard, L.; Gandhi, J.; Kwei, K.A.; Hernandez-Boussard, T.; Wang, P.; Gazdar, A.F.; et al. Molecular profiling of breast cancer cell lines defines relevant tumor models and provides a resource for cancer gene discovery. PLoS ONE 2009, 4, e6146.
- Tseng, G.C.; Oh, M.K.; Rohlin, L.; Liao, J.C.; Wong, W.H. Issues in cDNA microarray analysis: Quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res. 2001, 29, 2549–2557.
- Li, X.; Rouchka, E.C.; Brock, G.N.; Yan, J.; O’Toole, T.E.; Tieri, D.A.; Cooper, N.G. A combined approach with gene-wise normalization improves the analysis of RNA-seq data in human breast cancer subtypes. PLoS ONE 2018, 13, e0201813.
- Murali, T.M.; Kasif, S. Extracting conserved gene expression motifs from gene expression data. In Proceedings of the Pacific Symposium on Biocomputing 2003, Kauai, HI, USA, 3–7 January 2003; pp. 77–88.
- Prelić, A.; Bleuler, S.; Zimmermann, P.; Wille, A.; Bühlmann, P.; Gruissem, W.; Hennig, L.; Thiele, L.; Zitzler, E. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 2006, 22, 1122–1129.
- Kluger, Y.; Basri, R.; Chang, J.T.; Gerstein, M. Spectral biclustering of microarray data: Coclustering genes and conditions. Genome Res. 2003, 13, 703–716.
- Hochreiter, S.; Bodenhofer, U.; Heusel, M.; Mayr, A.; Mitterecker, A.; Kasim, A.; Khamiakova, T.; van Sanden, S.; Lin, D.; Talloen, W.; et al. FABIA: Factor analysis for bicluster acquisition. Bioinformatics 2010, 26, 1520–1527.
- Li, G.; Ma, Q.; Tang, H.; Paterson, A.H.; Xu, Y. QUBIC: A qualitative biclustering algorithm for analyses of gene expression data. Nucleic Acids Res. 2009, 37, e101.
- Eren, K.; Deveci, M.; Küçüktunç, O.; Çatalyürek, Ü.V. A comparative analysis of biclustering algorithms for gene expression data. Brief. Bioinform. 2012, 14, 279–292.
- Sun, P.; Speicher, N.K.; Röttger, R.; Guo, J.; Baumbach, J. Bi-Force: Large-scale bicluster editing and its application to gene expression data biclustering. Nucleic Acids Res. 2014, 42, e78.
- Yazdanparast, A.; Li, L.; Radovich, M.; Cheng, L. Signal translational efficiency between mRNA expression and antibody-based protein expression for breast cancer and its subtypes from cell lines to tissue. Int. J. Comput. Biol. Drug Des. 2018, 11, 67–89.
- Foulkes, W.D.; Smith, I.E.; Reis-Filho, J.S. Triple-negative breast cancer. N. Engl. J. Med. 2010, 363, 1938–1948.
- Luo, Y.; Wang, F.; Szolovits, P. Tensor factorization toward precision medicine. Brief. Bioinform. 2017, 18, 511–514.
- Serra, A.; Fratello, M.; Fortino, V.; Raiconi, G.; Tagliaferri, R.; Greco, D. MVDA: A multi-view genomic data integration methodology. BMC Bioinform. 2015, 16, 261.
- Meng, C.; Helm, D.; Frejno, M.; Kuster, B. moCluster: Identifying joint patterns across multiple omics data sets. J. Proteome Res. 2015, 15, 755–765.
- Cheng, L. Challenges and strategies for differential transcriptome analysis from microarray to deep sequencing in statistics. Ann. Biom. Biostat. 2015, 2, 1014.
- Shen, R.; Olshen, A.B.; Ladanyi, M. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 2009, 25, 2906–2912.
- Mo, Q.; Wang, S.; Seshan, V.E.; Olshen, A.B.; Schultz, N.; Sander, C.; Powers, R.S.; Ladanyi, M.; Shen, R. Pattern discovery and cancer gene identification in integrated cancer genomic data. Proc. Natl. Acad. Sci. USA 2013, 110, 4245–4250.
Inputs: Observed data matrix |
Data preprocessing: Remove batch effects, normalize the data, and impute missing values |
Fitting the empirical Bayes biclustering model using the EM algorithm: Set initial values of the model parameters. For iteration t ∈ {1, 2, …, N}: evaluate the probabilities of belonging to a bicluster (E-step); update the parameter estimates (M-step); repeat until the stopping criterion is met |
Searching for a specific bicluster: Set a seed (such as a druggable target gene) for the initial search and set the search parameters, including Ac; sort gene set i and sample set j in decreasing order by number of 1s and 0s; arrange the bicluster based on the search parameters |
Output: All biclusters, B1, B2, …, Bi |
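The fitting and searching steps listed above can be illustrated with a small sketch. It is not the authors' implementation: the Bi-EB likelihood is not reproduced in this section, so the code assumes a two-component one-dimensional Gaussian mixture as a stand-in model, a hypothetical posterior cutoff of 0.5 for turning probabilities into 0/1 membership, and hypothetical function names (`fit_membership_em`, `binary_membership`, `sort_by_ones`, `make_demo_matrix`). The seed gene and the Ac parameter are left out because their definitions are not given here.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_mean_sd(values, weights):
    """Weighted mean and standard deviation (sd floored to avoid collapse)."""
    total = max(sum(weights), 1e-12)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, values)) / total
    return mean, max(math.sqrt(var), 1e-3)


def fit_membership_em(values, max_iter=200, tol=1e-8):
    """EM for a two-component 1-D Gaussian mixture (assumed stand-in model).

    Returns posteriors[k]: the estimated probability that values[k] comes
    from the higher-mean ("bicluster") component. Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    # Assumed initialisation: lower half = background, upper half = bicluster.
    mean_bg = sum(ordered[:half]) / half
    mean_bc = sum(ordered[half:]) / (len(ordered) - half)
    sd_bg = sd_bc = max((ordered[-1] - ordered[0]) / 4.0, 1e-3)
    weight_bc = 0.5
    posteriors = [0.5] * len(values)
    prev_loglik = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability of the bicluster component.
        loglik = 0.0
        for k, x in enumerate(values):
            p_bc = weight_bc * normal_pdf(x, mean_bc, sd_bc)
            p_bg = (1.0 - weight_bc) * normal_pdf(x, mean_bg, sd_bg)
            total = max(p_bc + p_bg, 1e-300)
            posteriors[k] = p_bc / total
            loglik += math.log(total)
        # M-step: re-estimate the mixing weight, means and sds.
        weight_bc = min(max(sum(posteriors) / len(values), 1e-6), 1.0 - 1e-6)
        mean_bc, sd_bc = weighted_mean_sd(values, posteriors)
        mean_bg, sd_bg = weighted_mean_sd(values, [1.0 - p for p in posteriors])
        # Stop once the log-likelihood no longer changes noticeably.
        if abs(loglik - prev_loglik) < tol:
            break
        prev_loglik = loglik
    return posteriors


def binary_membership(matrix, cutoff=0.5):
    """Threshold the posteriors into a 0/1 matrix (cutoff is an assumption)."""
    n_cols = len(matrix[0])
    post = fit_membership_em([x for row in matrix for x in row])
    return [[1 if post[i * n_cols + j] > cutoff else 0 for j in range(n_cols)]
            for i in range(len(matrix))]


def sort_by_ones(binary):
    """Row and column orders, decreasing by number of 1s (ties keep order)."""
    n_cols = len(binary[0])
    row_order = sorted(range(len(binary)), key=lambda i: -sum(binary[i]))
    col_order = sorted(range(n_cols), key=lambda j: -sum(r[j] for r in binary))
    return row_order, col_order


def make_demo_matrix():
    """Toy 6 x 5 matrix: elevated block on rows 1, 3, 4 and columns 0, 2."""
    rows, cols = {1, 3, 4}, {0, 2}
    return [[(5.0 if i in rows and j in cols else 0.0)
             + ((7 * i + 3 * j) % 5 - 2) * 0.2  # small deterministic noise
             for j in range(5)] for i in range(6)]


if __name__ == "__main__":
    membership = binary_membership(make_demo_matrix())
    print(sort_by_ones(membership))
```

Sorting by the count of 1s moves the rows and columns with the most bicluster members to the front, so a dense block can then be read off the top-left corner of the reordered matrix. The toy data are synthetic and chosen only so that the two components are well separated.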
Algorithm Name | Year | Parameters | Available Software | Reference |
---|---|---|---|---|
Cheng and Church | 2000 | The optimization threshold (δ) and the number of biclusters | R | [2] |
xMOTIFs | 2003 | The optimization threshold, the size of the bicluster threshold, the number of gene thresholds per iteration, and the number of genes in the initial bicluster | R | [24] |
BiMAX | 2006 | The size of biclusters | R, Java | [25] |
Plaid | 2002 | The number of biclusters, the number of iterations, and the probability of including or excluding a gene during the clustering process | R | [4] |
Spectral | 2003 | The number of biclusters, the optimization threshold, and the size of the bicluster threshold | R | [26] |
FABIA | 2010 | The number of biclusters, the optimization threshold, the number of iterations, and the model-based parameter | R | [27] |
QUBIC | 2009 | The number of biclusters, the optimization threshold, and the overlap threshold for obtained biclusters | R, C | [28] |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yazdanparast, A.; Li, L.; Zhang, C.; Cheng, L. Bi-EB: Empirical Bayesian Biclustering for Multi-Omics Data Integration Pattern Identification among Species. Genes 2022, 13, 1982. https://doi.org/10.3390/genes13111982