Integrating Multiple Interaction Networks for Gene Function Inference
Abstract
1. Introduction
2. Results
2.1. Experimental Data Set
2.2. Evaluation Metrics
2.3. Impact of Feature Dimension on Performance
2.4. Performance Evaluation of Multinetwork Integration
2.5. Comparison of Different Integrative Methods
2.6. Case Study: ESR1
3. Multinetwork Integration Based on gcForest
3.1. gcForest
3.1.1. Cascade Forest
3.1.2. Multigrained Scanning
3.2. Network Feature Extraction
3.3. Training and Prediction of ReprsentConcat
Algorithm 1: ReprsentConcat Algorithm

Input:
    network_files: paths to adjacency list files
    n: number of genes in input networks
    d: number of output dimensions
    onttype: which type of annotations to use
    early_stopping_rounds: number of layers without improvement after which training stops

Output:
    opt_pred_results: prediction results

    for i = 1 : length(network_files)
        A = load_network(network_files(i), n)
        Q = rwr(A, 0.5)
        R = ln(Q + 1/n)
        U, Σ, V = svd(R)
        X = hstack(X, X_cur)
    end for
    Y = load_annotation(onttype)  // load annotations
    // split the data into train data and test data
    X_train, Y_train, X_test, Y_test = train_test_split(X, Y)
    layer_id = 0
    while 1
        if layer_id == 0
            X_cur_train = zeros(X_train)
            X_cur_test = zeros(X_test)
        else
            X_cur_train = X_proba_train.copy()
            X_cur_test = X_proba_test.copy()
        end if
        X_cur_train = hstack(X_cur_train, X_train)
        X_cur_test = hstack(X_cur_test, X_test)
        for estimator in n_randomForests
            // train each forest through k-fold cross validation
            y_probas = estimator.fit_transform(X_cur_train, Y_train)
            y_train_proba_li += y_probas
            y_test_probas = estimator.predict_proba(X_cur_test)
            y_test_proba_li += y_test_probas
        end for
        y_train_proba_li /= length(n_randomForests)
        y_test_proba_li /= length(n_randomForests)
        train_avg_F1 = calc_F1(Y_train, y_train_proba_li)  // calculate the F1 value
        test_avg_F1 = calc_F1(Y_test, y_test_proba_li)
        test_F1_list.append(test_avg_F1)
        opt_layer_id = get_opt_layer_id(test_F1_list)
        if opt_layer_id == layer_id
            opt_pred_results = [y_train_proba_li, y_test_proba_li]
        end if
        if layer_id - opt_layer_id >= early_stopping_rounds
            return opt_pred_results
        end if
        layer_id += 1
    end while
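Two steps of Algorithm 1 can be illustrated with a minimal, dependency-free Python sketch: the diffusion step (`Q = rwr(A, 0.5)` followed by `R = ln(Q + 1/n)`) and the early-stopping rule that picks the best cascade layer. The listing does not define `rwr` or `get_opt_layer_id`, so the code below makes two assumptions: `rwr` is the standard random walk with restart on a row-normalized adjacency matrix, and `get_opt_layer_id` returns the index of the highest test F1 (the earliest layer on ties). The helper names `log_transform` and `select_layer` are hypothetical, and the SVD step and forest training are deliberately left out.

```python
import math


def rwr(adj, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on a weighted adjacency matrix (list of lists).

    Row i of the result approximates the stationary distribution of a walker
    that starts at node i and jumps back to i with probability `restart`.
    Assumption: standard RWR; the source listing does not spell this out.
    """
    n = len(adj)
    # Row-normalize edge weights into transition probabilities.
    # Assumption: an isolated node keeps the walker in place.
    trans = []
    for i, row in enumerate(adj):
        total = sum(row)
        if total > 0:
            trans.append([w / total for w in row])
        else:
            trans.append([1.0 if j == i else 0.0 for j in range(n)])

    result = []
    for start in range(n):
        state = [1.0 if j == start else 0.0 for j in range(n)]
        for _ in range(max_iter):
            # One step: s <- (1 - restart) * s P + restart * e_start
            nxt = [0.0] * n
            for u in range(n):
                if state[u]:
                    for v in range(n):
                        nxt[v] += (1.0 - restart) * state[u] * trans[u][v]
            nxt[start] += restart
            delta = sum(abs(a - b) for a, b in zip(nxt, state))
            state = nxt
            if delta < tol:
                break
        result.append(state)
    return result


def log_transform(q):
    """Apply R = ln(Q + 1/n) element-wise, as in Algorithm 1."""
    n = len(q)
    return [[math.log(x + 1.0 / n) for x in row] for row in q]


def select_layer(test_f1_per_layer, early_stopping_rounds):
    """Replay the early-stopping rule over a sequence of per-layer test F1 values.

    Returns (opt_layer_id, layers_trained). Training stops once
    layer_id - opt_layer_id >= early_stopping_rounds.
    Assumption: the optimal layer is the argmax of test F1, earliest on ties.
    """
    scores = []
    opt_layer_id = 0
    for layer_id, f1 in enumerate(test_f1_per_layer):
        scores.append(f1)
        opt_layer_id = max(range(len(scores)), key=scores.__getitem__)
        if layer_id - opt_layer_id >= early_stopping_rounds:
            return opt_layer_id, layer_id + 1
    return opt_layer_id, len(scores)
```

For example, `log_transform(rwr(adj))` yields one log-scaled diffusion profile per gene for a single network, and `select_layer(scores, early_stopping_rounds)` shows which layer's predictions would be kept in `opt_pred_results` for a given sequence of test F1 values.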
4. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Sample Availability: Samples of the compounds are not available from the authors.

Network | Human | Yeast |
---|---|---|
coexpression | 788,166 | 314,014 |
co-occurrence | 18,064 | 2,664 |
database | 159,502 | 33,486 |
experimental | 309,287 | 219,995 |
fusion | 1,880 | 1,361 |
neighborhood | 52,479 | 45,610 |

Ontology | 11–30 | 31–100 | 101–300 |
---|---|---|---|
BP | 262 | 100 | 28 |
MF | 153 | 72 | 18 |
CC | 82 | 46 | 18 |

Rank | GO Term | GO Name |
---|---|---|
1 | GO:0000122* | negative regulation of transcription by RNA polymerase II |
2 | GO:0071495# | cellular response to endogenous stimulus |
3 | GO:0016265 | obsolete death |
4 | GO:0048878* | chemical homeostasis |
5 | GO:0051241 | negative regulation of multicellular organismal process |
6 | GO:0051098 | regulation of binding |
7 | GO:0008284 | positive regulation of cell population proliferation |
8 | GO:0007399 | nervous system development |
9 | GO:0006259* | DNA metabolic process |
10 | GO:0009057* | macromolecule catabolic process |
11 | GO:0010564 | regulation of cell cycle process |
12 | GO:0043900 | regulation of multi-organism process |
13 | GO:0002520 | immune system development |
14 | GO:0006928 | movement of cell or subcellular component |
15 | GO:0006325* | chromatin organization |
16 | GO:0018130# | heterocycle biosynthetic process |
17 | GO:0016192 | vesicle-mediated transport |
18 | GO:0031647 | regulation of protein stability |
19 | GO:0003008 | system process |
20 | GO:0008283 | cell population proliferation |
21 | GO:0051259 | protein complex oligomerization |
22 | GO:0030111 | regulation of Wnt signaling pathway |
23 | GO:0006629 | lipid metabolic process |
24 | GO:0034622 | cellular protein-containing complex assembly |
25 | GO:0010608 | posttranscriptional regulation of gene expression |
26 | GO:0055085 | transmembrane transport |
27 | GO:0016311 | dephosphorylation |
28 | GO:0007186 | G protein-coupled receptor signaling pathway |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Zhang, J.; Deng, L. Integrating Multiple Interaction Networks for Gene Function Inference. Molecules 2019, 24, 30. https://doi.org/10.3390/molecules24010030