Network Community Detection on Metric Space
Abstract
:1. Introduction
- A detailed study of the community detection algorithms.
- Transforming a graph to a metric space, preserving its structural properties.
- Studying the complex properties of real-world networks on induced metric space.
- Developing community detection algorithms on induced metric space.
- Analyzing the results and complexities of the developed algorithms.
- Comparing the community detection algorithms with other existing methods.
2. Network Community Detection
2.1. Community Evaluation
- Modularity: The notion of modularity is the most popular for network community detection purposes. The modularity index assigns high scores to communities whose internal edges are more than that expected in a random network model, which preserves the degree distribution of the given network.
- Internal density: Density is defined by the number of edges () in subset S divided by the total number of possible edges between all nodes . The “2” is there to cancel out duplicated edges. Internal density .
- Edges inside: This is somewhat useless by itself, since it is not related to any other attributes of subset S; the total number of edges () present in subset S. Edges inside .
- Average degree: This is the average internal degree across all nodes () in subset S. Average degree .
- The fraction over the median degree: This determines the number of nodes that have an internal degree greater than the median degree of nodes in subset S.
- Triangle Participation Ratio: The best measure for density, cohesiveness, and clustering within the goodness scales. Robust under random and expand perturbations. The fraction of nodes in S that belong to a triad. TPR = (number of nodes belonging to a triad).
- Expansion: This measure of separability gives the average number of external connections () per node () in subset S with graph G. It can be thought of as the external degree. Expansion .
- Cut ratio: This metric is a measure of separability and can be thought of as external density. It is the fraction of external edges () of subset Sout of the total number of possible edges in graph G.
- Conductance: This is the ratio of edges inside the cluster to the number of edges leaving the cluster (captures the surface area to volume ratio). It measures best in separability (goodness scale), measuring well-separated non-overlapping communities. It is robust under node swap and shrink perturbation. Community-like sets of nodes have lower conductance.
- Normalized cut: This represents how well subset S is separated from graph G. It sums up the fraction of external edges over all edges in subset S (conductance) with the fraction of external edges over all non-community edges.
- Maximum out degree fraction: This metric first finds the fraction of external connections to internal connections for each node () in S. It then returns the fraction with the highest value.
- Average out degree fraction: This is the sum of the individual fraction of edges outside of the community over the total connections of a node in subset S. It is then divided by the total number of nodes () in subset S.
- Flake out degree fraction: This is a fraction of the number of nodes that have fewer internal connections than external connections to the number of nodes () in subset S.
2.2. Popular Algorithms
- Walktrap algorithm: This algorithm by Pons and Latapy [15] uses a hierarchical agglomerative method. Here, the distance between two nodes is defined in terms of a random walk process. The basic idea is that if two nodes are in the same community, the probability to get to a third node located in the same community through a random walk should not be very different. The distance is constructed by summing these differences over all nodes, with a correction for degree.
- Eigenvector algorithm: This algorithm by Newman [24] is modularity based, and it uses an optimization method inspired by graph partitioning techniques. It relies on the eigenvectors of a so-called modularity matrix, instead of the graph Laplacian traditionally used in graph partitioning.
- Label propagation algorithm: This algorithm by Raghavan et al. [13] uses the concept of node neighborhood and the diffusion of information in the network to identify communities. Initially, each node is labeled with a unique value. Then, an iterative process takes place, where each node takes the label that is the most spread in its neighborhood. This process goes on until one of several conditions is met, for instance no label change. The resulting communities are defined by the last label values.
- Spinglass algorithm: This algorithm by Reichardt and Bornholdt [25] is an optimization method relying on an analogy between the statistical mechanics of complex networks and physical spinglass models.
Author | Ref. | Cat.(No.) | Order |
---|---|---|---|
Van Dongen | (Graph clustering, 2000 [33]) | GT(1) | , parameter |
Eckmann and Moses | (Curvature, 2002 [34]) | GT(2) | |
Girvan and Newman | (Modularity, 2002 [35]) | SDP(1) | |
Zhou and Lipowsky | (Vertex proximity, 2004 [36]) | GT(3) | |
Reichardt and Bornholdt | (Spinglass, 2004 [25]) | SDP(2) | parameter dependent |
Clauset et al. | (Fast greedy, 2004 [23]) | SDP(3) | |
Newman and Girvan | (Eigenvector, 2004 [8]) | SP(1) | |
Wu and Huberman | (Linear time, 2004 [37]) | GT(4) | |
Fortunato et al. | (Infocentrality, 2004 [38]) | SDP | |
Radicchi et al. | (Radicchi et al., 2004 [4]) | SP(2) | |
Donetti and Munoz | (Donetti and Munoz, 2004 [39]) | SDP(4) | |
Guimera et al. | (Simulated annealing, 2004 [40]) | SDP(5) | parameter dependent |
Capocci et al. | (Capocci et al., 2004 [41]) | SP(3) | |
Latapy and Pons | (Walktrap, 2004 [15]) | SP(4) | |
Duch and Arenas | (Extremal optimization, 2005 [42]) | GT(5) | |
Bagrow and Bollt | (Local method, 2005 [43]) | SDP(6) | |
Palla et al. | (overlapping community, 2005 [44]) | GT(6) | |
Raghavan et al. | (label propagation, 2007 [13]) | GT(7) | |
Rosvall and Bergstrom | (Infomap, 2008 [45]) | SP(5) | |
Ronhovde and Nussinov | (Multiresolution community, 2009 [46]) | GT(8) | , |
2.3. Observations and Motivations
3. Graph to Metric Space Transformation
3.1. Graph to Metric Space Algorithm
3.1.1. Construction
3.1.2. Function Selection
3.1.3. Choice of λ and
Algorithm 1 Mapping a graph into the metric space. |
4. Community Detection on Induced Metric Space
4.1. k-Partitioning
4.2. k Selection
4.3. Initialization for k-Partitioning
- Input: graph , with the node similarity defined on it,
- Output: A partition of the nodes into k communities ,
- Objective function: Maximize the minimum intra-community similarity:
Algorithm 2 k-center partitioning algorithm. |
Require: M(V, d) Ensure: T = {C1,C2,...,Ck} with minimum cost(T) 1: Initialize centers z1,...,zk ∈ Rn and clusters T = {C1,C2,...,Ck} 2: repeat 3: for i = 1 to k do 4: for j = 1 to k do 5: Ci ← {x ∈ V s.t. |zi − x| ≤ |zj − x|} 6: end for 7: end for 8: for j = 1 to k do 9: zi ← mean(Ci) 10: end for 11: until |cost(Tt) − cost(Tt + 1)| = 0 12: return T = {C1,C2,...,Ck} |
4.4. Convergence
4.5. Data Complexity
5. Experiments and Results
5.1. Experimental Designs
5.2. Performance Indicator
5.3. Datasets
Name | Type | No. of Nodes | No. of Edges | Diameter | DCC | k |
---|---|---|---|---|---|---|
U | 4039 | 88,234 | 4.7 | 0.72498 | 164 | |
Gplus | D | 107,614 | 13,673,453 | 3 | 0.50073 | 457 |
D | 81,306 | 1,768,149 | 4.5 | 0.57072 | 213 | |
Epinions1 | D | 75,879 | 508,837 | 5 | 0.14001 | 128 |
LiveJournal1 | D | 4,847,571 | 68,993,773 | 6.5 | 0.27432 | 117 |
Pokec | D | 1,632,803 | 30,622,564 | 5.2 | 0.10971 | 246 |
Slashdot0811 | D | 77,360 | 905,468 | 4.7 | 0.05884 | 81 |
Slashdot0922 | D | 82,168 | 948,464 | 4.7 | 0.06340 | 87 |
Friendster | U | 65,608,366 | 1,806,067,135 | 5.8 | 0.16231 | 833 |
Orkut | U | 3,072,441 | 117,185,083 | 4.8 | 0.16689 | 756 |
Youtube | U | 1,134,890 | 2,987,624 | 6.5 | 0.08090 | 811 |
DBLP | U | 317,080 | 1,049,866 | 8 | 0.63307 | 268 |
Arxiv-AstroPh | U | 18,772 | 396,160 | 5 | 0.65841 | 23 |
web-Stanford | D | 281,903 | 2,312,497 | 9.7 | 0.60034 | 69 |
Amazon0601 | D | 403,394 | 3,387,388 | 7.6 | 0.41890 | 92 |
P2P-Gnutella31 | D | 62,586 | 147,892 | 6.5 | 0.00710 | 35 |
RoadNet-CA | U | 1,965,206 | 5,533,214 | 500 | 0.40458 | 322 |
Wiki-Vote | D | 7115 | 103,689 | 3.8 | 0.17048 | 21 |
5.4. Computational Results
Name | Spectral | SDP | GT | Metric |
---|---|---|---|---|
0.0097(5) | 0.1074(3) | 0.1044(7) | 0.1082 | |
Gplus | 0.0119(5) | 0.1593(3) | 0.1544(7) | 0.1602 |
0.0035(5) | 0.0480(3) | 0.0465(7) | 0.0483 | |
Epinions1 | 0.0087(5) | 0.1247(6) | 0.1208(7) | 0.1254 |
LiveJournal1 | 0.0039(5) | 0.0703(6) | 0.0680(7) | 0.0706 |
Pokec | 0.0009(4) | 0.0174(3) | 0.0168(7) | 0.0175 |
Slashdot0811 | 0.0005(5) | 0.0097(6) | 0.0094(7) | 0.0098 |
Slashdot0922 | 0.0007(4) | 0.0138(3) | 0.0133(5) | 0.0138 |
Friendster | 0.0012(5) | 0.0273(1) | 0.0263(7) | 0.0273 |
Orkut | 0.0016(5) | 0.0411(3) | 0.0397(7) | 0.0412 |
Youtube | 0.0031(5) | 0.0869(3) | 0.0838(7) | 0.0871 |
DBLP | 0.0007(4) | 0.0210(3) | 0.0203(7) | 0.0211 |
Arxiv-AstroPh | 0.0024(5) | 0.0929(6) | 0.0895(7) | 0.0931 |
web-Stanford | 0.0007(5) | 0.0320(1) | 0.0308(7) | 0.0320 |
Amazon0601 | 0.0018(5) | 0.0899(6) | 0.0865(7) | 0.0900 |
P2P-Gnutella31 | 0.0009(5) | 0.0522(6) | 0.0503(7) | 0.0523 |
RoadNet-CA | 0.0024(5) | 0.1502(3) | 0.1445(7) | 0.1504 |
Wiki-Vote | 0.0026(5) | 0.1853(6) | 0.1783(7) | 0.1855 |
Name | Spectral | SDP | GT | Metric |
---|---|---|---|---|
0.4487(1) | 0.5464(4) | 0.5434(5) | 0.5472 | |
Gplus | 0.2573(1) | 0.4047(3) | 0.3998(5) | 0.4056 |
0.3261(3) | 0.3706(1) | 0.3691(7) | 0.3709 | |
Epinions1 | 0.0280(1) | 0.1440(3) | 0.1401(5) | 0.1447 |
LiveJournal1 | 0.0791(1) | 0.1455(5) | 0.1432(5) | 0.1458 |
Pokec | 0.0129(3) | 0.0294(1) | 0.0288(5) | 0.0295 |
Slashdot0811 | 0.0038(1) | 0.0130(4) | 0.0127(7) | 0.0131 |
Slashdot0922 | 0.0045(1) | 0.0176(5) | 0.0171(5) | 0.0176 |
Friendster | 0.0275(4) | 0.0536(5) | 0.0526(7) | 0.0536 |
Orkut | 0.0294(3) | 0.0689(4) | 0.0675(5) | 0.0690 |
Youtube | 0.0096(1) | 0.0934(2) | 0.0903(5) | 0.0936 |
DBLP | 0.4011(5) | 0.4214(1) | 0.4207(5) | 0.4215 |
Arxiv-AstroPh | 0.4174(3) | 0.5079(3) | 0.5045(5) | 0.5081 |
web-Stanford | 0.3595(5) | 0.3908(4) | 0.3896(7) | 0.3908 |
Amazon0601 | 0.1768(1) | 0.2649(4) | 0.2615(7) | 0.2650 |
P2P-Gnutella31 | 0.0009(1) | 0.0522(2) | 0.0503(5) | 0.0523 |
RoadNet-CA | 0.0212(3) | 0.1690(4) | 0.1633(5) | 0.1692 |
Wiki-Vote | 0.0266(1) | 0.2093(1) | 0.2023(5) | 0.2095 |
Algorithm | Spectral | SDP | GT | Metric |
---|---|---|---|---|
Minimum Time | 884 | 910 | 871 | 869 |
Maximum Time | 1386 | 1725 | 1641 | 869 |
Average Time | 917 | 981 | 1338 | 869 |
5.5. Parameter Settings
5.6. Results Analysis and Achievements
6. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
References
- Freeman, L.C. Centrality in social networks conceptual clarification. Soc. Netw. 1978, 1, 215–239. [Google Scholar] [CrossRef]
- Carrington, P.J.; Scott, J.; Wasserman, S. Models and Methods in Social Network Analysis; Cambridge University Press: Cambridge, UK, 2005. [Google Scholar]
- Newman, M. The structure and function of complex networks. SIAM Rev. 2003, 45, 167–256. [Google Scholar] [CrossRef]
- Radicchi, F.; Castellano, C.; Cecconi, F.; Loreto, V.; Parisi, D. Defining and identifying communities in networks. Proc. Natl. Acad. Sci. 2004, 101, 2658. [Google Scholar] [CrossRef] [PubMed]
- Weiss, R.; Jacobson, E. A method for the analysis of complex organisations. Am. Sociol. Rev. 1955, 20, 661–668. [Google Scholar] [CrossRef]
- Schaeffer, S.E. Graph clustering. Comput. Sci. Rev. 2007, 1, 27–64. [Google Scholar] [CrossRef]
- Fortunato, S. Community detection in graphs. Phys. Rep. 2010, 486, 75–174. [Google Scholar] [CrossRef]
- Newman, M.E.J.; Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E 2004, 69, 026113. [Google Scholar] [CrossRef]
- Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416. [Google Scholar] [CrossRef]
- Danon, L.; Diaz-Guilera, A.; Duch, J.; Arenas, A. Comparing community structure identification. J. Stat. Mech. Theory Exp. 2005, 9, P09008. [Google Scholar] [CrossRef]
- Coscia, M.; Giannotti, F.; Pedreschi, D. A classification for community discovery methods in complex networks. Stat. Anal. Data Min. 2011, 4, 512–546. [Google Scholar] [CrossRef]
- Leskovec, J.; Lang, K.J.; Mahoney, M.W. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th International Conference on World Wide Web, New York, NY, USA, 26–30 April 2010; pp. 631–640.
- Raghavan, U.N.; Albert, R.; Kumara, S. Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 2007, 76, 036106. [Google Scholar] [CrossRef]
- Yang, J.; Leskovec, J. Defining and evaluating network communities based on ground-truth. Knowl. Inf. Syst. 2012, 42, 181–213. [Google Scholar] [CrossRef]
- Pons, P.; Latapy, M. Computing communities in large networks using random walks. In Computer and Information Sciences - ISCIS 2005; Springer Berlin Heidelberg: Berlin, Germany, 2005; pp. 284–293. [Google Scholar]
- Duch, J.; Arenas, A. Community detection in complex networks using Extremal Optimization. Phys. Rev. E 2005, 72, 027104. [Google Scholar] [CrossRef]
- Chakrabarti, D. AutoPart: Parameter-free graph partitioning and outlier detection. In Knowledge Discovery in Databases: PKDD; Boulicaut, J.F., Esposito, F., Giannotti, F., Pedreschi, D., Eds.; Springer Berlin Heidelberg: Berlin, Germany, 2004; pp. 112–124. [Google Scholar]
- Macropol, K.; Singh, A.K. Scalable discovery of best clusters on large graphs. Proc. VLDB Endow 2010, 3, 693–702. [Google Scholar] [CrossRef]
- Levorato, V.; Petermann, C. Detection of communities in directed networks based on strongly p-connected components. In Proceedings of the 2011 International Conference on Computational Aspects of Social Networks (CASoN), Salamanca, Spain, 19–21 October 2011; pp. 211–216.
- Brandes, U.; Gaertler, M.; Wagner, D. Experiments on graph clustering algorithms. In Algorithms—ESA 2003; Battista, G.D., Zwick, U., Eds.; Springer Berlin Heidelberg: Berlin, Germany, 2003; pp. 568–579. [Google Scholar]
- Bullmore, E.; Sporns, O. Complex brain networks: Graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 2009, 10, 186–198. [Google Scholar] [CrossRef] [PubMed]
- Newman, M. Fast algorithm for detecting community structure in networks. Phys. Rev. E 2004, 69, 066133. [Google Scholar] [CrossRef]
- Clauset, A.; Newman, M.E.J.; Moore, C. Finding community structure in very large networks. Phys. Rev. E 2004, 60, 066111. [Google Scholar] [CrossRef]
- Newman, M.E. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 2006, 103, 8577–8582. [Google Scholar] [CrossRef] [PubMed]
- Reichardt, J.; Bornholdt, S. Detecting fuzzy community structures in complex networks with a Potts model. Phys. Rev. Lett. 2004, 93, 218701. [Google Scholar] [CrossRef] [PubMed]
- Deritei, D.; Lazar, Z.I.; Papp, I.; Jarai-Szabo, F.; Sumi, R.; Varga, L.; Regan, E.R.; Ercsey-Ravasz, M. Community detection by graph Voronoi diagrams. New J. Phys. 2014, 16, 063007. [Google Scholar] [CrossRef]
- Zarei, M.; Samani, K.A.; Omidi, G.R. Complex eigenvectors of network matrices give better insight into the community structure. J. Stat. Mech.Theory Exp. 2009, 2009, P10018. [Google Scholar] [CrossRef]
- Pan, G.; Zhang, W.; Wu, Z.; Li, S. Online community detection for large complex networks. PLOS ONE 2014, 9, e102799. [Google Scholar] [CrossRef] [PubMed]
- Lee, C.; Cunningham, P. Community detection: Effective evaluation on large social networks. J. Complex Netw. 2013, 2, 19–37. [Google Scholar] [CrossRef]
- Aldecoa, R.; Marin, I. Exploring the limits of community detection strategies in complex networks. Sci. Rep. 2013, 3. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- De Meo, P.; Ferrara, E.; Fiumara, G.; Provetti, A. Mixing local and global information for community detection in large networks. J. Comput. Syst. Sci. 2014, 80, 72–87. [Google Scholar] [CrossRef]
- Abou-Moustafa, K.T.; Schuurmans, D.; Ferrie, F.P. Learning a metric space for neighbourhood topology estimation: Application to manifold learning. In Proceedings of the ACML 2013, Canberra, ACT, Australia, 13–15 November 2013; pp. 341–356.
- A cluster algorithm for graphs. Available online: http://oai.cwi.nl/oai/asset/4463/04463D.pdf (accessed on 21 August 2015).
- Eckmann, J.P.; Moses, E. Curvature of co-links uncovers hidden thematic layers in the world wide web. Proc. Natl. Acad. Sci. 2002, 99, 5825–5829. [Google Scholar] [CrossRef] [PubMed]
- Girvan, M.; Newman, M.E.J. Community structure in social and biological networks. Proc. Natl. Acad. Sci. 2002, 99, 7821–7826. [Google Scholar] [CrossRef] [PubMed]
- Zhou, H.; Lipowsky, R. Network brownian motion: A new method to measure vertex-vertex proximity and to identify communities and subcommunities. In Computational Science - ICCS 2004; Bubak, M., van Albada, G.D., Sloot, P.M.A., Dongarra, J., Eds.; Springer Berlin Heidelberg: Berlin, Germany, 2004; pp. 1062–1069. [Google Scholar]
- Wu, F.; Huberman, B. Finding communities in linear time: A physics approach. The Eur. Phys. J. B Condens. Matter Complex Syst. 2004, 38, 331–338. [Google Scholar] [CrossRef]
- Fortunato, S.; Latora, V.; Marchiori, M. Method to find community structures based on information centrality. Phys. Rev. E 2004, 70, 056104. [Google Scholar] [CrossRef]
- Donetti, L.; Muñoz, M.A. Detecting network communities: A new systematic and efficient algorithm. J. Stat. Mech. Theory Exp. 2004, 2004, P10012. [Google Scholar] [CrossRef]
- Guimera, R.; Amaral, L.A.N. Functional cartography of complex metabolic networks. Nature 2005, 433, 895–900. [Google Scholar] [CrossRef] [PubMed]
- Capocci, A.; Servedio, V.D.P.; Caldarelli, G.; Colaiori, F. Detecting communities in large networks. Phys. A Stat. Mech. Its Appl. 2004, 352, 669–676. [Google Scholar] [CrossRef]
- Duch, J.; Arenas, A. Community detection in complex networks using extremal optimization. Phys. Rev. E 2005, 72, 027104. [Google Scholar] [CrossRef]
- Bagrow, J.P.; Bollt, E.M. Local method for detecting communities. Phys. Rev. E 2005, 72, 046108. [Google Scholar] [CrossRef]
- Palla, G.; Derenyi, I.; Farkas, I.; Vicsek, T. Uncovering the overlapping community structure of complex networks in nature and society. Nature 2005, 435, 814–818. [Google Scholar] [CrossRef] [PubMed]
- Rosvall, M.; Bergstrom, C.T. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. 2008, 105, 1118–1123. [Google Scholar] [CrossRef] [PubMed]
- Ronhovde, P.; Nussinov, Z. Multiresolution community detection for megascale networks by information-based replica correlations. Phys. Rev. E 2009, 80, 016109. [Google Scholar] [CrossRef]
- Leskovec, J.; Lang, K.J.; Dasgupta, A.; Mahoney, M.W. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Math. 2008, 6, 29–123. [Google Scholar] [CrossRef]
- Gonzalez, T.F. Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci. 1985, 38, 293–306. [Google Scholar] [CrossRef]
© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Saha, S.; Ghrera, S.P. Network Community Detection on Metric Space. Algorithms 2015, 8, 680-696. https://doi.org/10.3390/a8030680
Saha S, Ghrera SP. Network Community Detection on Metric Space. Algorithms. 2015; 8(3):680-696. https://doi.org/10.3390/a8030680
Chicago/Turabian StyleSaha, Suman, and Satya P. Ghrera. 2015. "Network Community Detection on Metric Space" Algorithms 8, no. 3: 680-696. https://doi.org/10.3390/a8030680
APA StyleSaha, S., & Ghrera, S. P. (2015). Network Community Detection on Metric Space. Algorithms, 8(3), 680-696. https://doi.org/10.3390/a8030680