Positive-Unlabeled Learning for Network Link Prediction
Abstract
:1. Introduction
2. Materials and Methods
2.1. Problem Definition
2.2. Overview
2.3. Network Representation Module
2.4. Positive-Unlabeled Learning Module
2.4.1. Positive-Unlabeled Learning Using Traditional Classifier Estimation
2.4.2. Positive-Unlabeled Learning Using Bagging Strategy
Algorithm 1. Bagging positive-unlabeled learning |
Input: Positive sample set , unlabeled set , sub-sample size , the number of sub-samples . Output: Function :. Steps: % Initialization: generate subsample from U; For = 1 to do Randomly select a subsample of size from ; Train a classifier using ; End For % Aggregate the prediction results of the above-trained classifiers; The mean of the predicted scores of the instance on each classifier: |
2.4.3. Positive-Unlabeled Learning Using Reliable Negative Sampling
3. Results
3.1. Datasets
3.2. Results
3.3. Parameter Analysis
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Dong, Y.; Chawla, N.V.; Swami, A. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 135–144. [Google Scholar]
- Nasiri, E.; Berahmand, K.; Samei, Z.; Li, Y. Impact of centrality measures on the common neighbors in link prediction for multiplex networks. Big Data 2022, 10, 138–150. [Google Scholar] [CrossRef]
- Nasiri, E.; Berahmand, K.; Li, Y. A new link prediction in multiplex networks using topologically biased random walks. Chaos Solitons Fractals 2021, 151, 111230. [Google Scholar] [CrossRef]
- Zamiri, M.; Yazdi, H.S. Image annotation based on multi-view robust spectral clustering. J. Vis. Commun. Image Represent. 2021, 74, 103003. [Google Scholar] [CrossRef]
- Fortunato, S. Community detection in graphs. Phys. Rep. 2010, 486, 75–174. [Google Scholar] [CrossRef]
- Tamassia, R. Handbook of Graph Drawing and Visualization; CRC Press: Boca Raton, FL, USA, 2013. [Google Scholar]
- Liben-Nowell, D.; Kleinberg, J. The link prediction problem for social networks. In Proceedings of the Twelfth International Conference on Information and Knowledge Management, New Orleans, LA, USA, 3–8 November 2003; pp. 556–559. [Google Scholar]
- Nasiri, E.; Berahmand, K.; Li, Y. Robust graph regularization nonnegative matrix factorization for link prediction in attributed networks. Multimed. Tools Appl. 2022, 1–24. [Google Scholar] [CrossRef]
- Martínez, V.; Berzal, F.; Cubero, J.C. A survey of link prediction in complex networks. ACM Comput. Surv. 2016, 49, 1–33. [Google Scholar] [CrossRef]
- Jaskie, K.; Spanias, A. Positive and unlabeled learning algorithms and applications: A survey. In Proceedings of the 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA), Patras, Greece, 15–17 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–8. [Google Scholar]
- Bekker, J.; Davis, J. Estimating the class prior in positive and unlabeled data through decision tree induction. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
- Li, G. A survey on postive and unlabelled learning. Comput. Inf. Sci. 2013. [Google Scholar]
- Liu, B.; Dai, Y.; Li, X.; Lee, W.S.; Yu, P.S. Building text classifiers using positive and unlabeled examples. In Proceedings of the Third IEEE International Conference on Data Mining, Melbourne, FL, USA, 22–22 November 2003; IEEE: Piscataway, NJ, USA, 2003; pp. 179–186. [Google Scholar]
- Li, X.L.; Liu, B. Learning from positive and unlabeled examples with different data distributions. In European Conference on Machine Learning; Springer: Berlin/Heidelberg, Germany, 2005; pp. 218–229. [Google Scholar]
- Denis, F.; Gilleron, R.; Letouzey, F. Learning from positive and unlabeled examples. Theor. Comput. Sci. 2005, 348, 70–83. [Google Scholar] [CrossRef]
- Elkan, C.; Noto, K. Learning classifiers from only positive and unlabeled data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; pp. 213–220. [Google Scholar]
- Mordelet, F.; Vert, J.P. A bagging SVM to learn from positive and unlabeled examples. Pattern Recognit. Lett. 2014, 37, 201–209. [Google Scholar] [CrossRef]
- Du Plessis, M.; Niu, G.; Sugiyama, M. Convex formulation for learning from positive and unlabeled data. In International Conference on Machine Learning; PMLR: Lille, France, 2015; pp. 1386–1394. [Google Scholar]
- Liu, B.; Lee, W.S.; Yu, P.S.; Li, X. Partially supervised classification of text documents. ICML 2002, 2, 387–394. [Google Scholar]
- Peng, T.; Zuo, W.; He, F. SVM based adaptive learning method for text classification from positive and unlabeled documents. Knowl. Inf. Syst. 2008, 16, 281–301. [Google Scholar] [CrossRef]
- Li, X.; Liu, B. Learning to classify texts using positive and unlabeled data. IJCAI 2003, 3, 587–592. [Google Scholar]
- Li, X.L.; Liu, B.; Ng, S.K. Negative training data can be harmful to text classification. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Cambridge, MA, USA, 9–11 October 2010; pp. 218–228. [Google Scholar]
- Lu, F.; Bai, Q. Semi-supervised text categorization with only a few positive and unlabeled documents. In Proceedings of the 2010 3rd International Conference on Biomedical Engineering and Informatics, Yantai, China, 16–18 October 2010; IEEE: Piscataway, NJ, USA, 2010; Volume 7, pp. 3075–3079. [Google Scholar]
- Kaboutari, A.; Bagherzadeh, J.; Kheradmand, F. An evaluation of two-step techniques for positive-unlabeled learning in text classification. Int. J. Comput. Appl. Technol. Res. 2014, 3, 592–594. [Google Scholar] [CrossRef]
- Lee, W.S.; Liu, B. Learning with positive and unlabeled examples using weighted logistic regression. ICML 2003, 3, 448–455. [Google Scholar]
- Khan, S.S.; Madden, M.G. One-class classification: Taxonomy of study and review of techniques. Knowl. Eng. Rev. 2014, 29, 345–374. [Google Scholar] [CrossRef]
- Chapelle, O.; Scholkopf, B.; Zien, A. Semi-Supervised Learning (Chapelle, O. et al., Eds.; 2006) [Book Reviews]. IEEE Trans. Neural Netw. 2009, 20, 542. [Google Scholar] [CrossRef]
- Manevitz, L.M.; Yousef, M. One-class SVMs for document classification. J. Mach. Learn. Res. 2001, 2, 139–154. [Google Scholar]
- Lü, L.; Zhou, T. Link prediction in complex networks: A survey. Phys. A Stat. Mech. Appl. 2011, 390, 1150–1170. [Google Scholar] [CrossRef]
- Goyal, P.; Ferrara, E. Graph embedding techniques, applications, and performance: A survey. Knowl.-Based Syst. 2018, 151, 78–94. [Google Scholar] [CrossRef]
- Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864. [Google Scholar]
- Ou, M.; Cui, P.; Pei, J.; Zhang, Z.; Zhu, W. Asymmetric transitivity preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1105–1114. [Google Scholar]
- Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; Mei, Q. Line: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18–22 May 2015; pp. 1067–1077. [Google Scholar]
- Wang, D.; Cui, P.; Zhu, W. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1225–1234. [Google Scholar]
- Perozzi, B.; Al-Rfou, R.; Skiena, S. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 701–710. [Google Scholar]
- Liu, S.; Zhai, S.; Zhu, L.; Zhu, F.; Zhang, Z.M.; Zhang, W. Efficient network representations learning: An edge-centric perspective. In International Conference on Knowledge Science, Engineering and Management; Springer: Cham, Switzerland, 2019; pp. 373–388. [Google Scholar]
- Cen, Y.; Zou, X.; Zhang, J.; Yang, H.; Zhou, J.; Tang, J. Representation learning for attributed multiplex heterogeneous network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 1358–1368. [Google Scholar]
- Yang, C.; Xiao, Y.; Zhang, Y.; Sun, Y.; Han, J. Heterogeneous network representation learning: A unified framework with survey and benchmark. IEEE Trans. Knowl. Data Eng. 2020, 34, 4854–4873. [Google Scholar] [CrossRef]
- Dong, Y.; Hu, Z.; Wang, K.; Sun, Y.; Tang, J. Heterogeneous Network Representation Learning. IJCAI 2020, 20, 4861–4867. [Google Scholar]
- Xie, Y.; Yu, B.; Lv, S.; Zhang, C.; Wang, G.; Gong, M. A survey on heterogeneous network representation learning. Pattern Recognit. 2021, 116, 107936. [Google Scholar] [CrossRef]
- Cui, P.; Wang, X.; Pei, J.; Zhu, W. A survey on network embedding. IEEE Trans. Knowl. Data Eng. 2018, 31, 833–852. [Google Scholar] [CrossRef]
- Zhang, D.; Yin, J.; Zhu, X.; Zhang, C. Network representation learning: A survey. IEEE Trans. Big Data 2018, 6, 3–28. [Google Scholar] [CrossRef]
- Yu, H.; Han, J.; Chang, K.C.C. PEBL: Web page classification without negative examples. IEEE Trans. Knowl. Data Eng. 2004, 16, 70–81. [Google Scholar]
- Wishart, D.S.; Knox, C.; Guo, A.C.; Shrivastava, S.; Hassanali, M.; Stothard, P.; Chang, Z.; Woolsey, J. DrugBank: A comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006, 34 (Suppl. S1), D668–D672. [Google Scholar] [CrossRef]
- Zachary, W.W. An information flow model for conflict and fission in small groups. J. Anthropol. Res. 1977, 33, 452–473. [Google Scholar] [CrossRef] [Green Version]
- Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; Eliassi-Rad, T. Collective classification in network data. AI Mag. 2008, 29, 93. [Google Scholar] [CrossRef]
- Ahmed, A.; Shervashidze, N.; Narayanamurthy, S.; Josifovski, V.; Smola, A.J. Distributed large-scale natural graph factorization. In Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 37–48. [Google Scholar]
- Bekker, J.; Davis, J. Learning from positive and unlabeled data: A survey. Mach. Learn. 2020, 109, 719–760. [Google Scholar] [CrossRef] [Green Version]
Dataset | Node | Link | Average Degree | Clustering Coefficient | Assortativity Coefficient | Type |
---|---|---|---|---|---|---|
DrugBank | 812 | 165,802 | 408.3793 | 0.6469 | −0.2031 | Bioinformatic network |
Karate | 34 | 156 | 9.1765 | 0.5706 | −0.4756 | Social network |
Cora | 2708 | 10,556 | 7.7962 | 0.2407 | −0.0659 | Citation network |
Classifier | Methods | AUPR | AUC | F1 | Accuracy |
---|---|---|---|---|---|
RF | N/A | 0.8857 | 0.8527 | 0.8254 | 0.9246 |
Standard-PU | 0.9353 | 0.9471 | 0.9247 | 0.9624 | |
Bagging-PU | 0.9085 | 0.9506 | 0.8999 | 0.9463 | |
TwoStep-PU | 0.9210 | 0.9399 | 0.9097 | 0.9545 | |
SVM | N/A | 0.4842 | 0.6136 | 0.3918 | 0.7094 |
Standard-PU | 0.6230 | 0.5468 | 0.4288 | 0.3436 | |
Bagging-PU | 0.6252 | 0.4993 | 0.4018 | 0.2514 | |
TwoStep-PU | 0.5782 | 0.6783 | 0.5160 | 0.7129 | |
LR | N/A | 0.6465 | 0.6214 | 0.3961 | 0.8016 |
Standard-PU | 0.6902 | 0.7804 | 0.6419 | 0.7814 | |
Bagging-PU | 0.6913 | 0.7817 | 0.6409 | 0.7765 | |
TwoStep-PU | 0.6838 | 0.7463 | 0.5753 | 0.6519 | |
DT | N/A | 0.7815 | 0.7786 | 0.7012 | 0.8743 |
Standard-PU | 0.8056 | 0.8269 | 0.7619 | 0.8898 | |
Bagging-PU | 0.9030 | 0.9452 | 0.8935 | 0.9430 | |
TwoStep-PU | 0.8259 | 0.8345 | 0.7793 | 0.8997 | |
NB | N/A | 0.6786 | 0.7697 | 0.6288 | 0.7735 |
Standard-PU | 0.6789 | 0.7700 | 0.6284 | 0.7716 | |
Bagging-PU | 0.6795 | 0.7705 | 0.6262 | 0.7650 | |
TwoStep-PU | 0.6822 | 0.7694 | 0.6143 | 0.7345 |
Methods | Models | AUPR | AUC | F1 | Accuracy |
---|---|---|---|---|---|
Standard-PU | Deepwalk | 0.8880 | 0.9252 | 0.8752 | 0.9350 |
GF | 0.7952 | 0.8597 | 0.7697 | 0.8752 | |
Node2vec | 0.8877 | 0.9268 | 0.8752 | 0.9345 | |
LINE | 0.9149 | 0.9281 | 0.8998 | 0.9505 | |
SDNE | 0.9353 | 0.9471 | 0.9247 | 0.9624 | |
Bagging-PU | Deepwalk | 0.8917 | 0.9378 | 0.8806 | 0.9357 |
GF | 0.8090 | 0.8826 | 0.7831 | 0.8747 | |
Node2vec | 0.8905 | 0.9388 | 0.8793 | 0.9345 | |
LINE | 0.9051 | 0.9452 | 0.8958 | 0.9446 | |
SDNE | 0.9085 | 0.9506 | 0.8999 | 0.9463 | |
TwoStep-PU | Deepwalk | 0.8856 | 0.9049 | 0.8658 | 0.9340 |
GF | 0.8017 | 0.8494 | 0.7736 | 0.8856 | |
Node2vec | 0.8860 | 0.9078 | 0.8675 | 0.9343 | |
LINE | 0.9082 | 0.9265 | 0.8937 | 0.9470 | |
SDNE | 0.9210 | 0.9399 | 0.9097 | 0.9545 |
Dataset | Methods | AUPR | AUC | F1 | Accuracy |
---|---|---|---|---|---|
DrugBank | RF | 0.8857 | 0.8527 | 0.8254 | 0.9246 |
Standard-PU(RF) | 0.9353 | 0.9471 | 0.9247 | 0.9624 | |
Bagging-PU(RF) | 0.9085 | 0.9506 | 0.8999 | 0.9463 | |
TwoStep-PU(RF) | 0.9210 | 0.9399 | 0.9097 | 0.9545 | |
Karate | RF | 0.6226 | 0.6170 | 0.3759 | 0.8904 |
Standard-PU(RF) | 0.6745 | 0.8429 | 0.6281 | 0.8681 | |
Bagging-PU(RF) | 0.6344 | 0.8215 | 0.5506 | 0.8119 | |
TwoStep-PU(RF) | 0.6527 | 0.7884 | 0.6203 | 0.8841 | |
Cora | RF | 0.5810 | 0.5687 | 0.2414 | 0.9566 |
Standard-PU(RF) | 0.5926 | 0.8202 | 0.5697 | 0.9487 | |
Bagging-PU(RF) | 0.5267 | 0.8350 | 0.3430 | 0.8409 | |
TwoStep-PU(RF) | 0.5155 | 0.7372 | 0.5026 | 0.9504 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Gan, S.; Alshahrani, M.; Liu, S. Positive-Unlabeled Learning for Network Link Prediction. Mathematics 2022, 10, 3345. https://doi.org/10.3390/math10183345
Gan S, Alshahrani M, Liu S. Positive-Unlabeled Learning for Network Link Prediction. Mathematics. 2022; 10(18):3345. https://doi.org/10.3390/math10183345
Chicago/Turabian StyleGan, Shengfeng, Mohammed Alshahrani, and Shichao Liu. 2022. "Positive-Unlabeled Learning for Network Link Prediction" Mathematics 10, no. 18: 3345. https://doi.org/10.3390/math10183345
APA StyleGan, S., Alshahrani, M., & Liu, S. (2022). Positive-Unlabeled Learning for Network Link Prediction. Mathematics, 10(18), 3345. https://doi.org/10.3390/math10183345