Mean Received Resources Meet Machine Learning Algorithms to Improve Link Prediction Methods
Abstract
:1. Introduction
- We proposed a new parameterized link prediction metric that grants the user or system the full control of metric. Note that the proposed metric enhances the performance of the state-of-the-art metrics;
- We compared the performance of the proposed metric against the state-of-the-art metrics using the AUC;
- We studied the impact of using the parameter on each enhanced version of link prediction;
- We studied the correlation between the parameter and the network features;
- We used machine learning algorithms to confirm the efficiency of the proposed method.
2. Methods
2.1. Related Works
2.2. The Proposed Metric
- The path through the nodes A and B:
- The path through the nodes C and B:
- The path through the nodes C and D:
3. Evaluation
3.1. Methodology
3.2. Evaluation Criterion
3.3. Datasets
- USAir [33] is the US air transportation network. The nodes represent airports, and links indicate routes.
- Florida [34]—in this network, the nodes are compartments and edges represent directed carbon exchange in the Florida bay.
- Football [35] represents the American football games between Division IA colleges during regular season in Fall 2000.
- Political network [31] is a directed network of hyperlinks between political blogs about politics in the United States of America. Note that we considered this network as undirected.
- Les Misérables [36] is an undirected network that contains co-occurrences of characters in Victor Hugo’s novel ’Les Misérables’. The nodes represent a character and edges show that two characters appeared in the same chapter of the book.
- Zachary Karate Club [31] in this network, a node represents a member of the club (Zachary Club), and each edge represents a tie between two members of the club. The network is undirected.
4. Results
5. Link Prediction Using Machine Learning Algorithms
5.1. Methodology
5.2. Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Esslimani, I.; Brun, A.; Boyer, A. Densifying a behavioral recommender system by social networks link prediction methods. Soc. Netw. Anal. Min. 2011, 1, 159–172. [Google Scholar] [CrossRef]
- Chen, H.; Li, X.; Huang, Z. Link prediction approach to collaborative filtering. In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’05), Denver, CO, USA, 7–11 June 2005; IEEE: Piscataway, NJ, USA, 2005; pp. 141–142. [Google Scholar]
- Folino, F.; Pizzuti, C. Link prediction approaches for disease networks. In Proceedings of the International Conference on Information Technology in Bio-and Medical Informatics, Vienna, Austria, 4–5 September 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 99–108. [Google Scholar]
- Wasserman, S.; Faust, K. Social Network Analysis: Methods and Applications; Cambridge University Press: Cambridge, UK, 1994. [Google Scholar]
- Aziz, F.; Cardoso, V.R.; Bravo-Merodio, L.; Russ, D.; Pendleton, S.C.; Williams, J.A.; Acharjee, A.; Gkoutos, G.V. Multimorbidity prediction using link prediction. Sci. Rep. 2021, 11, 16392. [Google Scholar] [CrossRef] [PubMed]
- Liben-Nowell, D.; Kleinberg, J. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol. 2007, 58, 1019–1031. [Google Scholar] [CrossRef] [Green Version]
- Adamic, L.A.; Adar, E. Friends and neighbors on the web. Soc. Netw. 2003, 25, 211–230. [Google Scholar] [CrossRef] [Green Version]
- Jaccard, P. Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull. Soc. Vaudoise Sci. Nat. 1901, 37, 547–579. [Google Scholar]
- Martínez, V.; Berzal, F.; Cubero, J.C. Adaptive degree penalization for link prediction. J. Comput. Sci. 2016, 13, 1–9. [Google Scholar] [CrossRef]
- Jibouni, A.; Lotfi, D.; El Marraki, M.; Hammouch, A. A novel parameter free approach for link prediction. In Proceedings of the 2018 6th International Conference on Wireless Networks and Mobile Communications (WINCOM), Marrakesh, Morocco, 16–19 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6. [Google Scholar]
- Ayoub, J.; Lotfi, D.; El Marraki, M.; Hammouch, A. Accurate link prediction method based on path length between a pair of unlinked nodes and their degree. Soc. Netw. Anal. Min. 2020, 10, 9. [Google Scholar] [CrossRef]
- Gu, S.; Chen, L. Link Prediction Based on Precision Optimization. In Proceedings of the International Conference on Geo-Informatics in Resource Management and Sustainable Ecosystem, Hong Kong, China, 18–20 November 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 131–141. [Google Scholar]
- Han, S.; Xu, Y. Link Prediction in Microblog Network Using Supervised Learning with Multiple Features. J. Comput. 2016, 11, 72–82. [Google Scholar] [CrossRef] [Green Version]
- Wang, Z.; Zhou, Y.; Hong, L.; Zou, Y.; Su, H. Pairwise Learning for Neural Link Prediction. arXiv 2021, arXiv:2112.02936. [Google Scholar]
- Matek, T.; Zebec, S.T. GitHub open source project recommendation system. arXiv 2016, arXiv:1602.02594. [Google Scholar]
- Ahmad, I.; Akhtar, M.U.; Noor, S.; Shahnaz, A. Missing link prediction using common neighbor and centrality based parameterized algorithm. Sci. Rep. 2020, 10, 1–9. [Google Scholar] [CrossRef] [PubMed]
- Kumar, A.; Singh, S.S.; Singh, K.; Biswas, B. Link prediction techniques, applications, and performance: A survey. Phys. Stat. Mech. Its Appl. 2020, 553, 124289. [Google Scholar] [CrossRef]
- Li, B.; Han, L. Distance weighted cosine similarity measure for text classification. In Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning, Hefei, China, 20–23 October 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 611–618. [Google Scholar]
- Barabâsi, A.L.; Jeong, H.; Néda, Z.; Ravasz, E.; Schubert, A.; Vicsek, T. Evolution of the social network of scientific collaborations. Phys. Stat. Mech. Its Appl. 2002, 311, 590–614. [Google Scholar] [CrossRef] [Green Version]
- Newman, M.E. Clustering and preferential attachment in growing networks. Phys. Rev. E 2001, 64, 025102. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Ravasz, E.; Somera, A.L.; Mongru, D.A.; Oltvai, Z.N.; Barabási, A.L. Hierarchical organization of modularity in metabolic networks. Science 2002, 297, 1551–1555. [Google Scholar] [CrossRef] [Green Version]
- Leicht, E.A.; Holme, P.; Newman, M.E. Vertex similarity in networks. Phys. Rev. E 2006, 73, 026120. [Google Scholar] [CrossRef] [Green Version]
- Zhu, Y.X.; Lü, L.; Zhang, Q.M.; Zhou, T. Uncovering missing links with cold ends. Phys. Stat. Mech. Its Appl. 2012, 391, 5769–5778. [Google Scholar] [CrossRef] [Green Version]
- Salton, G.; Mcgill, M. Introduction to Modern Information Retrieval; McGraw-Hill, Inc.: New York, NY, USA, 1986; 400p. [Google Scholar]
- Sorensen, T.A. A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons. Biol. Skar. 1948, 5, 1–34. [Google Scholar]
- Zhou, T.; Lü, L.; Zhang, Y.C. Predicting missing links via local information. Eur. Phys. J. B 2009, 71, 623–630. [Google Scholar] [CrossRef] [Green Version]
- Fishburn, P.C. Letter to the editor—Additive utilities with incomplete product sets: Application to priorities and assignments. Oper. Res. 1967, 15, 537–542. [Google Scholar] [CrossRef]
- Han, J.; Moraga, C. The influence of the sigmoid function parameters on the speed of backpropagation learning. In Proceedings of the International Workshop on Artificial Neural Networks, Perth, Australia, 27 November–1 December 1995; Springer: Berlin/Heidelberg, Germany, 1995; pp. 195–201. [Google Scholar]
- Bu, D.; Zhao, Y.; Cai, L.; Xue, H.; Zhu, X.; Lu, H.; Zhang, J.; Sun, S.; Ling, L.; Zhang, N.; et al. Topological structure analysis of the protein–protein interaction network in budding yeast. Nucleic Acids Res. 2003, 31, 2443–2450. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Nakai, K.; Kanehisa, M. Expert system for predicting protein localization sites in Gram negative bacteria. Proteins Struct. Funct. Bioinform. 1991, 11, 95–110. [Google Scholar] [CrossRef]
- Kunegis, J. Konect: The koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 1343–1350. [Google Scholar]
- Watts, D.J.; Strogatz, S.H. Collective dynamics of ’small-world’networks. Nature 1998, 393, 440–442. [Google Scholar] [CrossRef] [PubMed]
- Batagelj, V.; Mrvar, A. Pajek Datasets. USAir97. Net. 2006. Available online: http://vlado.fmf.uni-lj.si/pub/networks/data/ (accessed on 10 November 2021).
- Ulanowicz, R.E.; DeAngelis, D.L. Network analysis of trophic dynamics in south florida ecosystems. Geol. Surv. Program South Fla. Ecosyst. 2005, 114, 45. [Google Scholar]
- Girvan, M.; Newman, M.E. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 2002, 99, 7821–7826. [Google Scholar] [CrossRef] [Green Version]
- Rossi, R.; Ahmed, N. The network data repository with interactive graph analytics and visualization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015. [Google Scholar]
- Latora, V.; Marchiori, M. Efficient behavior of small-world networks. Phys. Rev. Lett. 2001, 87, 198701. [Google Scholar] [CrossRef] [Green Version]
- Newman, M.E. Assortative mixing in networks. Phys. Rev. Lett. 2002, 89, 208701. [Google Scholar] [CrossRef] [Green Version]
- Snijders, T.A. The degree variance: An index of graph heterogeneity. Soc. Netw. 1981, 3, 163–174. [Google Scholar] [CrossRef]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
- Goldberger, J.; Hinton, G.E.; Roweis, S.; Salakhutdinov, R.R. Neighbourhood components analysis. Adv. Neural Inf. Process. Syst. 2004. Available online: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.449.1850&rep=rep1&type=pdf (accessed on 10 November 2021).
- Wu, T.F.; Lin, C.J.; Weng, R.C. Probability estimates for multi-class classification by pairwise coupling. J. Mach. Learn. Res. 2004, 5, 975–1005. [Google Scholar]
- Hkdh, B. Neural networks in materials science. ISIJ Int. 1999, 39, 966–979. [Google Scholar]
- Kleinbaum, D.G.; Dietz, K.; Gail, M.; Klein, M.; Klein, M. Logistic Regression; Springer: Berlin/Heidelberg, Germany, 2002. [Google Scholar]
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
Metric Name | Equation |
---|---|
Preferential attachment (PA) [19] | |
Common neighbor (CN) [20] | |
Hub promoted (HP) [21] | |
Hub depressed (HD) [21] | |
Jaccard coefficient (JA) [8] | |
Leicht–Holme–Nerman (LHN) [22] | |
Parameter dependent (PD) [23] | |
Adamic/Adar (AA) [7] | |
Salton [24] | |
Sorensen [25] | |
Resource allocation (RA) [26] |
Network | N | M | C | r | Average Degree | D | H | e |
---|---|---|---|---|---|---|---|---|
YeastS | 2284 | 6646 | 0.134 | −0.099 | 5819 | 4.37 | 2.84 | 0.233 |
Power Grid | 4941 | 6594 | 0.08 | 0.003 | 2669 | 18,989 | 1.45 | 0.063 |
USAir | 332 | 2126 | 0.625 | −0.207 | 12,807 | 2738 | 3,463 | 0.406 |
Florida | 128 | 2075 | 0.334 | −0.111 | 32,421 | 1776 | 1237 | 0.622 |
Football | 115 | 613 | 0.403 | 0.162 | 10.66 | 2508 | 1006 | 0.45 |
Political Network | 105 | 440 | 0.481 | −0.132 | 8.38 | 3092 | 1.41 | 0.396 |
Les Misérables | 77 | 253 | 0.559 | −0.163 | 6571 | 2651 | 1829 | 0.434 |
Zachary | 34 | 77 | 0.485 | −0.478 | 4529 | 2424 | 1668 | 0.489 |
Les Misérables | Political Network | Football | USAir | Power Grid | Zachary | Florida | YeastS | |
---|---|---|---|---|---|---|---|---|
AA | 0.895 | 0.902 | 0.819 | 0.941 | 0.59 | 0.643 | 0.625 | 0.715 |
Improved AA | 0.938 | 0.925 | 0.89 | 0.947 | 0.733 | 0.843 | 0.624 | 0.754 |
jaccard | 0.83 | 0.882 | 0.833 | 0.898 | 0.59 | 0.464 | 0.536 | 0.712 |
Improved jaccard | 0.884 | 0.923 | 0.905 | 0.911 | 0.733 | 0.629 | 0.549 | 0.751 |
AP | 0.74 | 0.695 | 0.271 | 0.87 | 0.446 | 0.721 | 0.743 | 0.771 |
Improved AP | 0.834 | 0.852 | 0.864 | 0.756 | 0.7 | 0.714 | 0.501 | 0.794 |
Hub D | 0.826 | 0.869 | 0.83 | 0.89 | 0.59 | 0.443 | 0.529 | 0.712 |
Improved Hub D | 0.876 | 0.916 | 0.908 | 0.9 | 0.733 | 0.629 | 0.546 | 0.751 |
Hub P | 0.814 | 0.886 | 0.831 | 0.874 | 0.582 | 0.564 | 0.542 | 0.713 |
Improved Hub P | 0.868 | 0.916 | 0.905 | 0.876 | 0.733 | 0.714 | 0.553 | 0.751 |
LHN | 0.787 | 0.848 | 0.832 | 0.778 | 0.584 | 0.45 | 0.4 | 0.711 |
Improved LHN | 0.828 | 0.909 | 0.91 | 0.785 | 0.733 | 0.614 | 0.431 | 0.751 |
PD | 0.878 | 0.89 | 0.832 | 0.929 | 0.585 | 0.536 | 0.608 | 0.714 |
Improved PD | 0.934 | 0.932 | 0.902 | 0.942 | 0.733 | 0.721 | 0.622 | 0.753 |
RA | 0.9 | 0.904 | 0.819 | 0.949 | 0.59 | 0.657 | 0.63 | 0.714 |
Improved RA | 0.934 | 0.925 | 0.89 | 0.955 | 0.733 | 0.857 | 0.631 | 0.754 |
CN | 0.882 | 0.893 | 0.82 | 0.928 | 0.59 | 0.593 | 0.621 | 0.714 |
Improved CN | 0.93 | 0.932 | 0.902 | 0.941 | 0.733 | 0.721 | 0.622 | 0.753 |
Salton | 0.834 | 0.886 | 0.833 | 0.905 | 0.584 | 0.479 | 0.539 | 0.712 |
Improved Salton | 0.88 | 0.925 | 0.905 | 0.92 | 0.733 | 0.671 | 0.551 | 0.752 |
Sorensen | 0.83 | 0.882 | 0.833 | 0.898 | 0.59 | 0.464 | 0.536 | 0.712 |
Improved Sorensen | 0.884 | 0.923 | 0.905 | 0.911 | 0.733 | 0.629 | 0.549 | 0.751 |
Correlation | Best for AA | Best for Jaccard | Best for PA | Best for HD | Best for HP | Best for LHN | Best for PD | Best for RA | Best for CN | Best for Salton | Best for Sorensen |
---|---|---|---|---|---|---|---|---|---|---|---|
N | −0.347 | −0.175 | −0.328 | 0.097 | 0.052 | −0.137 | −0.183 | 0.113 | −0.183 | −0.183 | −0.499 |
M | 0.044 | 0.055 | 0.055 | 0.343 | 0.361 | 0.22 | −0.12 | 0.1 | −0.12 | −0.12 | 0.055 |
C | 0.034 | 0.143 | 0.123 | −0.15 | −0.209 | −0.032 | −0.229 | −0.446 | −0.229 | −0.229 | 0.143 |
r | 0.161 | 0.521 | 0.197 | 0.541 | 0.555 | 0.6 | 0.219 | 0.291 | 0.219 | 0.219 | 0.521 |
Average Degree | 0.695 | 0.285 | 0.594 | 0.098 | 0.426 | 0.394 | 0.251 | −0.237 | 0.251 | 0.251 | 0.285 |
D | −0.511 | −0.277 | −0.509 | −0.048 | −0.107 | −0.218 | −0.277 | −0.078 | −0.277 | −0.277 | −0.277 |
H | 0.267 | 0.295 | 0.431 | 0.389 | 0.313 | 0.325 | −0.547 | −0.439 | −0.547 | −0.547 | 0.295 |
e | 0.42 | 0.062 | 0.41 | −0.234 | −0.047 | 0.008 | 0.289 | −0.124 | 0.289 | 0.289 | 0.062 |
Football dataset | Les Misérables dataset |
Political Network dataset | Power Grid dataset |
USAir dataset | YeastS dataset |
Zachary dataset | Florida dataset |
Football dataset | Les Misérables dataset |
Political Network dataset | Power Grid dataset |
USAir dataset | YeastS dataset |
Zachary dataset | Florida dataset |
Football dataset | Les Misérables dataset |
Political Network dataset | Power Grid dataset |
USAir dataset | YeastS dataset |
Zachary dataset | Florida dataset |
Football dataset | Les Misérables dataset |
Political Network dataset | Power Grid dataset |
USAir dataset | YeastS dataset |
Zachary dataset | Florida dataset |
Football dataset | |
Les Misérables dataset | |
Political Network dataset | |
Power Grid dataset | |
USAir dataset | |
YeastS dataset | |
Zachary dataset | |
Florida dataset |
Les Misérables | Political Network | Football | USAir | Power Grid | Zachary | Florida | YeastS | |
---|---|---|---|---|---|---|---|---|
SVM with PSI | 0.902 | 0.858 | 0.756 | 0.942 | 0.995 | 0.677 | 0.763 | 0.808 |
SVM without PSI | 0.892 | 0.852 | 0.756 | 0.892 | 0.638 | 0.677 | 0.760 | 0.801 |
neural network with PSI | 0.922 | 0.862 | 0.768 | 0.968 | 0.999 | 0.774 | 0.778 | 0.834 |
neural network without PSI | 0.912 | 0.852 | 0.760 | 0.913 | 0.643 | 0.710 | 0.761 | 0.819 |
decision tree with PSI | 0.971 | 0.875 | 0.959 | 0.969 | 0.998 | 0.903 | 0.740 | 0.985 |
decision tree without PSI | 0.873 | 0.841 | 0.760 | 0.885 | 0.641 | 0.742 | 0.761 | 0.806 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ayoub, J.; Lotfi, D.; Hammouch, A. Mean Received Resources Meet Machine Learning Algorithms to Improve Link Prediction Methods. Information 2022, 13, 35. https://doi.org/10.3390/info13010035
Ayoub J, Lotfi D, Hammouch A. Mean Received Resources Meet Machine Learning Algorithms to Improve Link Prediction Methods. Information. 2022; 13(1):35. https://doi.org/10.3390/info13010035
Chicago/Turabian StyleAyoub, Jibouni, Dounia Lotfi, and Ahmed Hammouch. 2022. "Mean Received Resources Meet Machine Learning Algorithms to Improve Link Prediction Methods" Information 13, no. 1: 35. https://doi.org/10.3390/info13010035
APA StyleAyoub, J., Lotfi, D., & Hammouch, A. (2022). Mean Received Resources Meet Machine Learning Algorithms to Improve Link Prediction Methods. Information, 13(1), 35. https://doi.org/10.3390/info13010035