A Survey on the Recent Advances of Deep Community Detection
Abstract
1. Introduction
- We describe and analyze the most prominent new strategies for community detection.
- We organize the existing schemes in groups, depending on the technique being used.
- We outline several points of interest based on the study of each family of community detection papers.
- We present some possible future trends.
2. Preliminaries and Definitions Used in This Survey
- Overlapping Communities: Overlapping communities are communities that have nodes in common.
- Normalized-Cut: A measure that computes the cut cost as a fraction of the total edge connections to all the nodes in the graph.
- Seed nodes: Key nodes of high influence over the network.
- Graph embedding: A process that learns a mapping from a network to a vector space, while preserving relevant network properties.
- Node embedding: The process of mapping each node into a low-dimensional space, in order to gain insight into its similarity to other nodes and into the network structure.
- Random walks: Given a network, a random walk is a process used to obtain random sequences of vertices rooted at each vertex. Each walk is grown by repeatedly moving to a randomly chosen neighbor of the last visited vertex.
- Normalized Mutual Information (NMI): A measure used to evaluate how closely a partition produced by a community detection scheme matches a reference partition.
- Modularity: A measure of the strength of a network's division into clusters or communities (a minimal computational sketch illustrating random walks, modularity, and NMI follows this list).
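To make the last three definitions concrete, the following is a minimal, illustrative sketch, not drawn from any of the surveyed papers: it samples random walks from a small graph, detects communities by greedy modularity maximization (the Clauset-Newman-Moore heuristic), and evaluates the result with modularity and NMI. The use of NetworkX, scikit-learn, the karate-club example network, and the walk parameters are our own assumptions.

```python
# Illustrative sketch (assumptions: NetworkX, scikit-learn, karate-club graph).
import random
import networkx as nx
from sklearn.metrics import normalized_mutual_info_score

def random_walks(G, walk_length=10, walks_per_node=5, seed=0):
    """Sample vertex sequences rooted at every vertex; at each step the walk
    moves to a uniformly chosen neighbor of the last visited vertex."""
    rng = random.Random(seed)
    walks = []
    for root in G.nodes():
        for _ in range(walks_per_node):
            walk = [root]
            while len(walk) < walk_length:
                nbrs = list(G.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

G = nx.karate_club_graph()
walks = random_walks(G)                      # training sequences for embedding models

# Detect communities by greedy modularity maximization and compute modularity Q.
communities = nx.algorithms.community.greedy_modularity_communities(G)
Q = nx.algorithms.community.modularity(G, communities)

# NMI against the known two-group ground truth of the karate-club network.
truth = [G.nodes[v]["club"] for v in G.nodes()]
pred = [next(i for i, c in enumerate(communities) if v in c) for v in G.nodes()]
nmi = normalized_mutual_info_score(truth, pred)
print(f"modularity={Q:.3f}, NMI={nmi:.3f}")
```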
3. Deep Community Detection Strategies
3.1. AE-Based Community Detection Strategies
Comments on the AE Based Community Detection Schemes
- The two kinds of information considered in community detection produce limited and inefficient models, and auto-encoders provide a solution: Topology and node content information can be treated as objective functions that are combined in a linear form. However, this linearity is not guaranteed in real-world networks, which makes such models limited. In addition, a network representation can be produced by combining the two types of information, but deciding on the appropriate ratio between them makes the model inefficient. Finally, the high-dimensional feature space of the input data can lead to very complex neural network architectures and can increase the number of trainable parameters, which may reach millions. The auto-encoder is a good way to overcome the efficiency problems of linear optimization while still combining the two types of information.
- Auto-encoders are the dominant method for mapping data points into lower-dimensional spaces: The main disadvantage of high network dimensionality is the very large computational cost that community detection methods suffer. Thus, a method is required for converting high-dimensional graphs into a space with fewer dimensions in which the important information about the network structure and the node features is still maintained. Graphs are generally represented by adjacency matrices, which store the direct node connections but not the proximity information for nodes that are not directly connected. The dominant solution to this problem is the auto-encoder (AE).
- Sparsity of network samples: Network sampling techniques such as random walk sampling can be used to derive vertex sequences as training datasets. However, the sampled data represent only a small portion of all possible vertex sequences. Alternatively, these structures can be encoded in a continuous code space. The idea is to adaptively find the optimal representation by adding a sparsity constraint to the auto-encoder (sparse auto-encoders). An example is the work of Fei et al. [45], which employs a similarity matrix to reveal indirect connections between nodes and a sparse auto-encoder to lower the dimensionality and extract the complex network structure (a minimal sketch of this idea follows this list).
- Community Overlapping: The majority of the strategies described do not deal with overlapping communities; this can be a subject of future concern. An exception is the auto-encoder-based scheme of [27], which handles overlapping communities.
- Complexity: The high dimensionality of the feature space of the input data normally increases the number of trainable parameters and makes the models rather complex. Most of the schemes achieve linear time complexity by employing division policies over the network and/or the data space to handle large amounts of data efficiently (for examples, see [26,41]).
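The sketch below illustrates the sparse auto-encoder idea referred to in the comment on sparsity above (cf. [45]). It is a minimal, hypothetical example rather than the implementation of any surveyed scheme: the layer sizes, the L1 sparsity weight, and the use of k-means on the learned codes are our own illustrative choices.

```python
# Minimal sketch of sparse-auto-encoder-based community detection (assumption:
# not the exact architecture of any surveyed paper). The encoder compresses a
# node-similarity matrix S (n x n) into low-dimensional codes; an L1 penalty on
# the codes encourages sparsity; k-means groups the codes into communities.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def detect_communities(S, code_dim=32, n_communities=5,
                       sparsity_weight=1e-3, epochs=200, lr=1e-3):
    X = torch.as_tensor(S, dtype=torch.float32)
    n = X.shape[0]
    encoder = nn.Sequential(nn.Linear(n, 256), nn.ReLU(), nn.Linear(256, code_dim))
    decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(), nn.Linear(256, n))
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)

    for _ in range(epochs):
        opt.zero_grad()
        codes = encoder(X)                  # low-dimensional node representations
        recon = decoder(codes)              # reconstruction of the similarity rows
        # Reconstruction error plus an L1 penalty that keeps the codes sparse.
        loss = ((recon - X) ** 2).mean() + sparsity_weight * codes.abs().mean()
        loss.backward()
        opt.step()

    with torch.no_grad():
        codes = encoder(X).numpy()
    # Cluster the learned codes; the cluster labels are the detected communities.
    return KMeans(n_clusters=n_communities, n_init=10).fit_predict(codes)
```

A typical call would pass a row-normalized node-similarity matrix as a NumPy array, e.g. `labels = detect_communities(S, n_communities=10)`.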
3.2. CNN and GNN Based Approaches
3.2.1. CNN-Based Community Detection
3.2.2. GNN-Based Community Detection
3.2.3. Comments on the GNN/CNN Based Community Detection Schemes
- The CNN/GNN approaches consider both the data structure and the node features: Recent deep clustering strategies combine feature extraction and dimensionality reduction in a model that enables deep neural networks to learn suitable representations based on the criteria of the clustering module being used. These strategies usually focus on the data features and not on the structure of the data during representation learning, even though this structure reveals the similarity among samples and is therefore very important for representation learning. The emerging CNN/GNN approaches encode both the graph structure and the node features into the representation.
- The CNN/GNN based community detection strategies are basically semi-supervised or supervised learning strategies: Typically, real networks are unique in the sense that the training data of one network cannot properly be used on a network with a different structure. When communities are located, the only data available for analysis is that of the same network. Semi-supervised learning strategies can be employed so that a network's node label information is used to predict community features for the remaining non-labeled nodes of the same network. These strategies are sometimes very expensive. In this regard, there are CNN/GNN strategies that exploit the strength of auto-encoders to develop automatic feature learning [24,49].
- Many real-world networks are characterized, at a large scale, by skewed class distributions: This is because different network parts may evolve without restrictions. For example, a node may be topologically connected with multiple other nodes, so its class assignment largely depends on its connected nodes. Representation learning with accurately identified boundaries for each class cannot be easily achieved, because each class is highly affected by the neighboring classes in the graph. The problem of imbalanced class distributions is hardly addressed in most of the papers, which assume or purposefully provide balanced label distributions for each class. A solution is given in [64], where a two-layered graph network is used to extract node representations trained on imbalanced class labels.
- Performance degrades as graph convolutional layers are added: As the depth of the convolutional network increases, performance declines, and so does the quality of neighbor propagation. This is because adding many convolutional layers is similar to applying multiple Laplacian smoothing steps, which smooths the node features until nodes from different clusters become indistinguishable (their attribute values converge). The proposed strategies therefore keep the number of layers relatively small (up to three; some examples are [48,49,57,60]); the ladder-shaped architecture proposed in [65] is another idea to overcome this issue. A minimal sketch of the underlying propagation rule follows this list.
- Real-world graphs are formed by a combination of many latent factors (for example, shared likes, hobbies, professions, etc.), and disentangling these factors is a difficult procedure: Learning node representations free from such factors may require extracting subsets of a neighborhood in a quite complex network. Moreover, some factors that connect two people overlap (being a football fan and being a fan of a specific football club); without distinguishing them (or making them independent), redundant representations appear. Therefore, special attention should be given to techniques that support representation learning which discovers independent latent factors in a graph network; Ref. [66] is such an example.
- Convolutional networks are integrated with undirected graphical models: Such graphical models include Markov Random Fields (MRFs) [48,49]. Convolutional networks generally do not consider community properties, so the embeddings they construct are not community oriented. On the other hand, undirected graphical models do not take the node information into account, and their learning requires complex computations. In this regard, convolutional networks and graphical models such as MRFs are complementary and form a good combination for CNN/GNN based community detection.
- Overlapping and complexity analysis: Most of the strategies do not consider the fact that communities may overlap; an exception is [57]. Moreover, many strategies do not provide a comprehensive complexity analysis, as can be seen in Table 2. For those that do, we observe linear complexity in terms of the number of nodes, edges, and samples being used.
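For reference, the sketch below shows the standard GCN-style propagation rule that most of the above schemes build upon; it is a generic illustration, not the model of any specific surveyed paper. It also makes the over-smoothing comment tangible: every layer multiplies the features by the same normalized adjacency, so stacking many layers repeatedly smooths the node features. The random toy graph, feature dimensions, and argmax community assignment are illustrative assumptions.

```python
# Generic GCN-style propagation sketch (standard formulation; not the exact
# model of any surveyed scheme).
import numpy as np

def normalized_adjacency(A):
    """Compute A_hat = D^{-1/2} (A + I) D^{-1/2}: self-loops plus symmetric normalization."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gcn_forward(A, X, weights):
    """Apply a stack of graph-convolutional layers H <- ReLU(A_hat H W)."""
    A_hat = normalized_adjacency(A)
    H = X
    for layer, W in enumerate(weights):
        H = A_hat @ H @ W
        if layer < len(weights) - 1:        # no non-linearity after the last layer
            H = np.maximum(H, 0.0)
    return H                                # node embeddings / community scores

# Usage sketch with a tiny random graph and two layers (the depth most schemes keep to).
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(6, 6))
A = np.triu(A, 1); A = A + A.T              # undirected toy graph without self-loops
X = rng.normal(size=(6, 8))                 # node feature matrix (6 nodes, 8 features)
W0 = rng.normal(size=(8, 16))
W1 = rng.normal(size=(16, 3))               # 3 assumed communities
Z = gcn_forward(A, X, [W0, W1])
communities = Z.argmax(axis=1)              # assign each node to its highest-scoring community
print(communities)
```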
3.3. GAN Based Approaches
Comments on the GAN-Based Community Detection Schemes
- Regularizations: Deep learning strategies (AE- or GNN-based) focus on preserving the structural relationships and generally ignore the latent data distribution of the representation. With GANs, the newly generated data follow the same distribution as the real data, which is a powerful tool for analyzing network data. For example, in [67], a form of regularization is introduced on the latent codes, forcing them to follow a prior data distribution (a minimal sketch of this idea follows this list).
- Dense Overlapping Issues: In real-world datasets, vertices can simultaneously belong to numerous communities, and clustering algorithms cannot handle such dense overlapping. A weakness in performing overlapping community detection and network embedding simultaneously has been reported (see also our comments in the presentation of the AE and CNN/GNN-based schemes). One idea is to combine the effective GANs with affiliation graph models (AGMs), which can model densely overlapping community structures. Thus, the performance of GANs and the direct vertex-community membership representation of AGMs join forces to solve the dense overlapping issues [68]. In addition, the scheme presented in [69] also addresses overlaps.
- Indistinguishable real and fake embedding combinations: Most GANs compare the generated fake data, which are produced by sampling a Gaussian distribution, to the real data. In adversarial learning approaches, the Gaussian distribution alone cannot be aligned with the real data, so the results are not really beneficial for the network embeddings. Approaches like JANE [71] can be used to separate real and fake combinations of embeddings, topology information, and node features. Because the topology information and the node features are combined, the Gaussian distribution becomes capable of capturing the semantic variations in the latent space; thus, the overall embedding results become more useful and the performance of network analysis is greatly improved.
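The following sketch illustrates the adversarial regularization idea from the first comment above. It is a deliberately simplified, hypothetical example, not the architecture of [67], CommunityGAN [68], or JANE [71]: a discriminator learns to separate samples drawn from a Gaussian prior from encoder outputs, while the encoder is also trained to fool the discriminator, which pushes the embedding distribution toward the prior. All module sizes and hyperparameters are our own choices.

```python
# Minimal sketch of adversarially regularized embeddings (a simplification of
# the idea discussed above; not the design of any surveyed paper).
import torch
import torch.nn as nn

def adversarial_regularization_step(encoder, discriminator, X,
                                    opt_enc, opt_disc, code_dim=16):
    bce = nn.BCEWithLogitsLoss()
    n = X.shape[0]

    # 1) Discriminator step: Gaussian prior samples labeled 1, encoder outputs labeled 0.
    z_fake = encoder(X).detach()
    z_real = torch.randn(n, code_dim)             # samples from the Gaussian prior
    logits = discriminator(torch.cat([z_real, z_fake]))
    labels = torch.cat([torch.ones(n, 1), torch.zeros(n, 1)])
    d_loss = bce(logits, labels)
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # 2) Encoder (generator) step: make the embeddings look like prior samples.
    z = encoder(X)
    g_loss = bce(discriminator(z), torch.ones(n, 1))
    opt_enc.zero_grad(); g_loss.backward(); opt_enc.step()
    return d_loss.item(), g_loss.item()

# Usage sketch with toy modules (dimensions are illustrative assumptions).
encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16))
discriminator = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-3)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
X = torch.randn(128, 64)                          # toy node features
adversarial_regularization_step(encoder, discriminator, X, opt_enc, opt_disc)
```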
4. Future Trends
- Large scale of modern networks: The analysis of modern social networks becomes rather cumbersome as their size and number keep growing. Older methods tried to employ some level of parallelism, but they suffered from low speedups. Generally, there is still a growing need for algorithms that scale with the increasing number of users and the growing complexity of their relationships. Moreover, the existing schemes presented in this work have generally not been tested on graphs with billions of vertices, even on more powerful machines.
- Dynamic Community Detection: Dynamic community detection requires that changes in user relationships and in the overall network topology also be considered. This means that models able to extract spatio-temporal features of social networks have to be developed. Generally, the irregular topologies, the frequent changes of relationships among users, and the network updates place a burden on the processing of such dynamic social networks, and additional computational effort is required to accommodate these changes.
- Better use of computational resources: Certain computations necessarily require large quantities of computational resources. For example, in [45], the authors admit that the similarity matrix is obtained by processing the adjacency matrix, and the calculations involved require large memory consumption and powerful experimental equipment. Thus, strategies that consume large amounts of resources, or even exhaust the available ones, should be equipped with some type of decomposition strategy, possibly accompanied by a well-designed parallelism scheme.
- Community overlapping: As mentioned during the description of the most recent schemes in Section 3, most of the strategies presented do not consider the situation of community overlaps (some exceptions have been mentioned in the text). In the future, most of these works should be expanded so that this very important issue is taken into account.
- Meaningful semantic representations of communities: Community detection is a well-studied problem, and the first papers were published back at the beginning of this century. However, the datasets used as inputs to construct communities contain rather discrete information, like commonly used words, symbols, forms of categorization for professions or other attributes, or short phrases. Better datasets should be developed and used in deep learning strategies to extract and even predict communities. Generally, the vast amounts of information produced by social networks should be used in a better and more informative way in order to produce more meaningful semantic representations of communities. In addition, the better interpretation of specific communities (medical, sports, politics, etc.) is closely related to this direction.
- Natural Language Processing (NLP) embeddings: A recent trend is to compute node embeddings using second-order random walks [72]. These embeddings aim at keeping similar nodes close in their representations. Simplicial complexes represent a promising way of representing embeddings, but community extraction on them remains a challenge to be dealt with [73]. Other future trends include ego networks, streaming community detection (graph streams), and temporal graphs.
- More straightforward comparisons are needed: During our study, we noticed a lack of straightforward comparisons among the presented strategies. Perhaps tools like NetworKit [74] can be helpful in this regard; NetworKit is a growing open-source toolkit for large-scale network analysis that already contains novel algorithms from recently published research.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Liu, F.; Xue, S.; Wu, J.; Zhou, C.; Hu, W.; Paris, C.; Nepal, S.; Yang, J.; Yu, P.S. Deep Learning for Community Detection: Progress, Challenges and Opportunities. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), Yokohama, Japan, 11–17 January 2021; pp. 4981–4987. [Google Scholar]
- Jin, D.; Wang, K.; Zhang, G.; Jiao, P.; He, D.; Fogelman-Soulie, F.; Huang, X. Detecting communities with multiplex semantics by distinguishing background, general and specialized topics. IEEE Trans. Knowl. Data Eng. 2020, 32, 2144–2158. [Google Scholar] [CrossRef]
- Satuluri, V.; Wu, Y.; Zheng, X.; Qian, Y.; Wichers, B.; Dai, Q.; Tang, G.M.; Jiang, J.; Lin, J. Simclusters: Community-based representations for heterogeneous recommendations at twitter. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, Online, 23–27 August 2020; pp. 3183–3193. [Google Scholar]
- Keyvanpour, M.R.; Shirzad, M.B.; Ghaderi, M. AD-C: A new node anomaly detection based on community detection in social networks. Int. J. Electron. Bus. 2020, 15, 199–222. [Google Scholar] [CrossRef]
- Chen, Y.; Li, X.; Xu, J. Convexified modularity maximization for degree-corrected stochastic block models. Ann. Stat. 2018, 46, 1573–1602. [Google Scholar] [CrossRef] [Green Version]
- Pizzuti, C.; Socievole, A. Multiobjective optimization and local merge for clustering attributed graphs. IEEE Trans. Cybern. 2019, 50, 4997–5009. [Google Scholar] [CrossRef]
- Gulikers, L.; Lelarge, M.; Massoulié, L. A spectral method for community detection in moderately sparse degree-corrected stochastic block models. Adv. Appl. Probab. 2017, 49, 686–721. [Google Scholar] [CrossRef] [Green Version]
- Dhilber, M.; Bhavani, S.D. Community Detection in Social Networks Using Deep Learning. In Proceedings of the 16th International Conference, ICDCIT 2020, Bhubaneswar, India, 9–12 January 2020. [Google Scholar]
- Souravlas, S.; Sifaleras, A.; Katsavounis, S. A Parallel Algorithm for Community Detection in Social Networks, Based on Path Analysis and Threaded Binary Trees. IEEE Access 2019, 7, 20499–20519. [Google Scholar] [CrossRef]
- Souravlas, S.; Sifaleras, A.; Katsavounis, S. Hybrid CPU-GPU Community Detection in Weighted Networks. IEEE Access 2020, 8, 57527–57551. [Google Scholar] [CrossRef]
- Jin, D.; Yu, Z.; Jiao, P.; Pan, S.; Yu, P.S.; Zhang, W. A Survey of Community Detection Approaches: From Statistical Modeling to Deep Representation. arXiv 2021, arXiv:2101.01669. [Google Scholar]
- Leskovec, J.; Krevl, A. SNAP Datasets: Stanford Large Network Dataset Collection; Tech. Rep.; University Stanford: Stanford, CA, USA, 2014; Available online: http://snap.stanford.edu/data (accessed on 26 July 2021).
- Citation Network Dataset: DBLP+Citation, ACM Citation Network. Available online: https://www.aminer.org/citation (accessed on 26 July 2021).
- Email-EU. Available online: https://paperswithcode.com/dataset/email-eu (accessed on 8 May 2012).
- Lu, Z.; Sun, X.; Wen, Y.; Cao, G.; Porta, T.L. Algorithms and applications for community detection in weighted networks. IEEE Trans. Parallel Distrib. Syst. 2015, 26, 2916–2926. [Google Scholar] [CrossRef]
- Qiao, S.; Han, N.; Gao, Y.; Li, R.-H.; Huang, J.; Guo, J.; Gutierrez, L.A.; Wu, X. A fast parallel community discovery model on complex networks through approximate optimization. IEEE Trans. Knowl. Data Eng. 2018, 30, 1638–1651. [Google Scholar] [CrossRef]
- Matin, P.; Zhan, J. Optimized Label Propagation Community Detection on Big Data Networks. In Proceedings of the 2018 International Conference on Big Data and Education (ICBDE’18), Honolulu, HI, USA, 9–11 March 2018; pp. 57–62. [Google Scholar]
- Dusan, D.; Aloise, D.; Mladenovic, N. Ascent–descent variable neighborhood decomposition search for community detection by modularity maximization. Ann. Oper. Res. 2018, 272, 273–287. [Google Scholar]
- Fan, S.; Wang, X.; Shi, C.; Lu, E.; Lin, K.; Wang, B. One2Multi Graph Autoencoder for Multi-view Graph Clustering. In Proceedings of the Web Conference 2020 (WWW ’20), Taipei, Taiwan, 20–24 April 2020; pp. 3070–3076. [Google Scholar] [CrossRef]
- Luo, D.; Ni, J.; Wang, S.; Bian, Y.; Yu, X.; Zhang, X. Deep Multi-Graph Clustering via Attentive Cross-Graph Association. In Proceedings of the 13th International Conference on Web Search and Data Mining, (WSDM 2020), Houston, TX, USA, 3–7 February 2020; pp. 393–401. [Google Scholar]
- Sarkar, A.; Mehta, N.; Rai, P. Graph Representation Learning via Ladder Gamma Variational Autoencoders. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), New York, NY, USA, 7–12 February 2020; pp. 5604–5611. [Google Scholar]
- Shi, H.; Fan, H.; Kwok, J.T. Effective Decoding in Graph Auto-Encoder Using Triadic Closure. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), New York, NY, USA, 7–12 February 2020; pp. 906–913. [Google Scholar]
- Shang, J.; Wang, C.; Xin, X.; Ying, X. Community detection algorithm based on deep sparse autoencoder. J. Softw. 2017, 28, 648–662. [Google Scholar]
- Wu, L.; Zhang, Q.; Chen, C.; Guo, K.; Wang, D. Deep Learning Techniques for Community Detection in Social Networks. IEEE Access 2020, 8, 96016–96026. [Google Scholar] [CrossRef]
- Cao, J.; Jin, D.; Dang, J. Auto-encoder based community detection with adaptive integration of network topology and node contents. In Proceedings of the International Conference on Knowledge Science, Engineering and Management, Changchun, China, 17–19 August 2018; pp. 184–196. [Google Scholar]
- Cao, J.; Jin, D.; Yang, L.; Dang, J. Incorporating network structure with node contents for community detection on large networks using deep learning. Neurocomputing 2018, 297, 71–81. [Google Scholar] [CrossRef]
- Bhatia, V.; Rani, R. A distributed overlapping community detection model for large graphs using autoencoder. Future Gener. Comput. Syst. 2019, 94, 16–26. [Google Scholar] [CrossRef]
- Shao, M.; Li, S.; Ding, Z.; Fu, Y. Deep linear coding for fast graph clustering. In Proceedings of the 24th International Conference on Artificial Intelligence, Virtually, Online, 13–15 April 2021; pp. 3798–3804. [Google Scholar]
- Song, C.; Liu, F.; Huang, Y.; Wang, L.; Tan, T. Auto-encoder based data clustering. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications; Springer: Berlin/Heidelberg, Germany, 2013; pp. 117–124. [Google Scholar]
- Tian, F.; Gao, B.; Cui, Q.; Chen, E.; Liu, T.-Y. Learning deep representations for graph clustering. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; pp. 1293–1299. [Google Scholar]
- Yann, L.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar]
- Cavallari, S.; Zheng, V.W.; Cai, H.; Chang, K.C.-C.; Cambria, E. Learning community embedding with community detection and node embedding on graphs. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 377–386. [Google Scholar]
- Yu, W.; Zheng, C.; Cheng, W.; Aggarwal, C.C.; Song, D.; Zong, B.; Chen, H.; Wang, W. Learning deep network representations with adversarially regularized autoencoders. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 2663–2671. [Google Scholar]
- Xu, R.; Che, Y.; Wang, X.; Hu, J.; Xie, Y. Stacked auto-encoder based community detection method via an ensemble clustering framework. Inf. Sci. 2020, 526, 151–165. [Google Scholar] [CrossRef]
- Ivannikova, E.; Park, H.; Hamalainen, T.; Lee, K. Revealing community structures by ensemble clustering using group diffusion. Inform. Fusion 2018, 42, 24–36. [Google Scholar] [CrossRef]
- Choong, J.J.; Liu, X.; Murata, T. Optimizing Variational Graph Autoencoder for Community Detection with Dual Optimization. Entropy 2020, 22, 197. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Choong, J.J.; Liu, X.; Murata, T. Learning community structure with variational autoencoder. In Proceedings of the IEEE International Conference on Data Mining, Singapore, 17–20 November 2018; pp. 69–78. [Google Scholar]
- Chen, Z.; Chen, C.; Zhang, Z.; Zheng, Z.; Zou, Q. Variational graph embedding and clustering with Laplacian eigenmaps. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, 10–16 August 2019; pp. 2144–2150. [Google Scholar]
- Dai, D.; Yu, L.; Wei, H. Parameters sharing in residual neural networks. Neural Process. Lett. 2020, 51, 1393–1410. [Google Scholar] [CrossRef]
- Aich, S.; Yamazaki, M.; Taniguchi, Y.; Stavness, I. Multi-scale weight sharing network for image recognition. Pattern Recognit. Lett. 2020, 131, 348–354. [Google Scholar] [CrossRef] [Green Version]
- Al-Andoli, M.; Cheah, W.P.; Tan, S.C. Deep learning-based community detection in complex networks with network partitioning and reduction of trainable parameters. J. Ambient. Intell. Human Comput. 2021, 12, 2527–2545. [Google Scholar] [CrossRef]
- Cui, G.; Zhou, J.; Yang, C.; Liu, Z. Adaptive Graph Encoder for Attributed Graph Embedding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, Online, 23–27 August 2020; pp. 976–985. [Google Scholar] [CrossRef]
- Clauset, A.; Newman, M.E.; Moore, C. Finding community structure in very large networks. Phys. Rev. E 2004, 70, 066111. [Google Scholar] [CrossRef] [Green Version]
- Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 10, P10008. [Google Scholar] [CrossRef] [Green Version]
- Fei, R.; Sha, J.; Xu, Q.; Hu, B.; Wang, K.; Li, S. A new deep sparse autoencoder for community detection in complex networks. EURASIP J. Wirel. Commun. Netw. 2020, 2020, 91. [Google Scholar] [CrossRef]
- Chen, Z.; Li, X.; Bruna, J. Supervised Community Detection with Line Graph Neural Networks. In Proceedings of the 7th International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- He, D.; You, X.; Feng, Z.; Jin, D.; Yang, X.; Zhang, W. A Network-Specific Markov Random Field Approach to Community Detection. In Proceedings of the 32th AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
- Jin, D.; Liu, Z.; Li, W.; He, D.; Zhang, W. Graph Convolutional Networks Meet Markov Random Fields: Semi-Supervised Community Detection in Attribute Networks. In Proceedings of the Thirty-Third Conference on Artificial Intelligence (AAAI-19), Honolulu, HI, USA, 27 January–1 February 2019; pp. 152–159. [Google Scholar]
- He, D.; Song, Y.; Jin, D.; Feng, Z.; Zhang, B.; Yu, Z.; Zhang, W. Community-Centric Graph Convolutional Network for Unsupervised Community Detection. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), Yokohama, Japan, 11–17 July 2020; pp. 3515–3521. [Google Scholar]
- Sperlí, G. A deep learning based community detection approach. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, Limassol, Cyprus, 8–12 April 2019; pp. 1107–1110. [Google Scholar]
- Cai, B.; Wanga, Y.; Zeng, L.; Hua, Y.; Li, H. Edge classification based on Convolutional Neural Networks for community detection in complex network. Phys. A Stat. Mech. Appl. 2020, 556, 124826. [Google Scholar] [CrossRef]
- Xu, Y.; Chi, Y.; Tian, Y. Deep convolutional neural networks for feature extraction of images generated from complex networks topologies. Wireless Pers. Commun. 2018, 103, 327–338. [Google Scholar] [CrossRef]
- Xu, J.; Li, M.; Jiang, J.; Ge, B.; Cai, M. Early prediction of scientific impact based on multi-bibliographic features and convolutional neural network. IEEE Access 2019, 7, 92248–92258. [Google Scholar] [CrossRef]
- Hanocka, R.; Hertz, A.; Fish, N.; Giryes, R.; Fleishman, S.; Cohen-Or, D. MeshCNN: A network with an edge. ACM Trans. Graph. 2019, 38, 1–12. [Google Scholar] [CrossRef] [Green Version]
- Gao, W.; Zhou, P. Customized high performance and energy efficient communication networks for AI chips. IEEE Access 2019, 7, 69434–69446. [Google Scholar] [CrossRef]
- Wang, C.; Pan, S.; Hu, R.; Long, G.; Jiang, J.; Zhang, C. Attributed Graph Clustering: A Deep Attentional Embedding Approach. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, 10–16 August 2019; pp. 3670–3676. [Google Scholar]
- Shchur, O.; Günnemann, S. Overlapping Community Detection with Graph Neural Networks. In Proceedings of the First International Workshop on Deep Learning on Graphs (In Conjunction with the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining), Anchorage, AK, USA, 4–8 August 2019; pp. 1–7. [Google Scholar]
- Levie, R.; Monti, F.; Bresson, X.; Bronstein, M.M. CayleyNets: Graph Convolutional Neural Networks With Complex Rational Spectral Filters. IEEE Trans. Signal Process. 2019, 67, 97–109. [Google Scholar] [CrossRef] [Green Version]
- Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. Adv. Neural Inf. Process. Syst. 2016, 29, 3844–3852. [Google Scholar]
- Bo, D.; Wang, X.; Shi, C.; Zhu, M.; Lu, E.; Cui, P. Structural Deep Clustering Network. In Proceedings of the Web Conference, Taipei, Taiwan, 20–24 April 2020; pp. 1400–1410. [Google Scholar]
- Cheng, J.; Wang, Q.; Tao, Z.; Xie, D.; Gao, Q. Multi-View Attribute Graph Convolution Networks for Clustering. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), Yokohama, Japan, 11–17 July 2020; pp. 2973–2979. [Google Scholar]
- Zhang, X.; He, L.; Chen, K.; Luo, Y.; Zhou, J.; Wang, F. Multi-view graph convolutional network and its applications on neuroimage analysis for parkinson’s disease. In Proceedings of the AMIA Annual Symposium Proceedings, San Francisco, CA, USA, 3–7 November 2018; p. 1147. [Google Scholar]
- Ma, T.; Xiao, C.; Zhou, J.; Wang, F. Drug similarity integration through attentive multiview graph auto-encoders. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13–19 July 2018; pp. 3477–3483. [Google Scholar]
- Shi, M.; Tang, Y.; Zhu, X.; Wilson, D.; Liu, J. Multi-Class Imbalanced Graph Convolutional Network Learning. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), Yokohama, Japan, 11–17 July 2020; pp. 2879–2985. [Google Scholar]
- Hu, R.; Pan, S.; Long, G.; Lu, Q.; Zhu, L.; Jiang, J. Going Deep: Graph Convolutional Ladder-Shape Networks. In Proceedings of the Thirty-Fourth Conference on Artificial Intelligence (AAAI-20), New York, NY, USA, 7–12 February 2020; pp. 2838–2845. [Google Scholar]
- Liu, Y.; Wang, X.; Wu, S.; Xiao, Z. Independence Promoted Graph Disentangled Networks. In Proceedings of the Thirty-Fourth Conference on Artificial Intelligence (AAAI-20), New York, NY, USA, 7–12 February 2020; pp. 4916–4923. [Google Scholar]
- Pan, S.; Hu, R.; Fung, S.-F.; Long, G.; Jiang, J.; Zhang, C. Learning Graph Embedding With Adversarial Training Methods. IEEE Trans. Cybern. 2020, 50, 2475–2487. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Jia, Y.; Zhang, Q.; Zhang, W.; Wang, X. CommunityGAN: Community Detection with Generative Adversarial Nets. In Proceedings of the WWW ’19: The World Wide Web Conference, San Francisco, CA, USA, 13 May 2019; pp. 784–794. [Google Scholar]
- Zhang, Y.; Xiong, Y.; Ye, Y.; Liu, T.; Wang, W.; Zhu, Y.; Yu, P.S. SEAL: Learning Heuristics for Community Detection with Generative Adversarial Networks. In Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Online, 6–10 July 2020; pp. 1103–1113. [Google Scholar]
- Wang, H.; Wang, J.; Wang, J.; Zhao, M.; Zhang, W.; Zhang, F.; Li, W.; Xie, X.; Guo, M. Learning Graph Representation with Generative Adversarial Nets. IEEE Trans. Knowl. Data Eng. 2019, 33, 1–13. [Google Scholar] [CrossRef]
- Yang, L.; Wang, Y.; Gu, J.; Wang, C.; Cao, X.; Guo, Y. JANE: Jointly Adversarial Network Embedding. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), Yokohama, Japan, 11–17 July 2020; pp. 1381–1387. [Google Scholar]
- Kulkarni, S.; Katariya, J.K.; Potika, K. GloVeNoR: GloVe for Node Representations with Second Order Random Walks. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), The Hague, The Netherlands, 7–10 December 2020; pp. 536–543. [Google Scholar] [CrossRef]
- Billings, J.C.W.; Hu, M.; Lerda, G.; Medvedev, A.N.; Mottes, F.; Onicas, A.; Santoro, A.; Petri, G. Simplex2Vec embeddings for community detection in simplicial complexes. arXiv 2019, arXiv:1906.09068. [Google Scholar]
- Staudt, C.L.; Meyerhenke, H. Engineering parallel algorithms for community detection in massive networks. IEEE Trans. Parallel Distrib. Syst. 2016, 27, 171–184. [Google Scholar] [CrossRef] [Green Version]
Table 1. AE-based community detection schemes.

| Paper | Year | Strategy | Complexity | Basic Experimental Results |
|---|---|---|---|---|
| [26] | 2018 | Combines the models of modularity and normalized-cut via an auto-encoder. | | Outperforms classic community detection schemes like the CNM [43] or Louvain [44] method, as well as similar approaches that use only content or topological information. |
| [27] | 2019 | Two-phased approach: (1) the layered auto-encoder approach is used to initialize candidate seed nodes; (2) the seed nodes and clusters are refined in the last encoder layer. | | The selection of seed nodes reduces the number of iterations. The strategy scales up linearly and produces better-quality clusters. |
| [32] | 2017 | Community embedding framework that jointly solves community embedding, community detection, and node embedding. | | Tested on real-world datasets, it improves graph visualization and outperforms other schemes in application tasks like community detection and node classification. |
| [33] | 2018 | Generative adversarial training process used as a complementary regularizer. | | Effective on a variety of tasks, including network reconstruction, link prediction, and multi-label classification. |
| [34] | 2020 | Combines transfer learning and a stacked auto-encoder to produce a low-dimensional feature representation of complex networks. | | Experiments on artificial and real networks show that it is superior to existing methods and has great potential in solving community detection problems. |
| [36] | 2020 | A dual optimization procedure, which aims at guiding the optimization process. | | The performance gain for community detection is marginal, but the number of learnable parameters is reduced; thus, faster convergence and training are achieved. |
| [41] | 2020 | Based on partitioning complex networks and on techniques for the reduction and sharing of trainable parameters. | Not inferred | Improves training speed and efficiency, which increase with deeper levels of partitioning. |
| [42] | 2020 | A Laplacian smoothing filter combined with an adaptive encoder. | Not inferred | Experiments on four datasets have shown that the scheme considerably outperforms other graph embedding methods on node clustering and link prediction tasks. |
Table 2. CNN- and GNN-based community detection schemes.

| Paper | Year | Strategy | Complexity | Basic Experimental Results |
|---|---|---|---|---|
| [48] | 2019 | CNN-based: the MRF model and its inference are transformed into a convolutional layer, which is added as the third (and last) layer of the basic two-layer convolutional network architecture. | | Superior performance over state-of-the-art methods in terms of accuracy, normalized mutual information (NMI), and runtime; in addition, high scalability on several large benchmark problems. |
| [49] | 2021 | CNN-based with AE: MRFasGCN [48] is cast as an encoder, and the node community membership is then determined in the hidden layer of the encoder. | Not inferred | Improved performance in terms of accuracy, normalized mutual information (NMI), and runtime; in addition, high scalability on several large benchmark problems. |
| [50] | 2019 | CNN-based: uses two convolutional steps, each composed of a convolution and a max-pooling layer, for topology learning of the input network, plus a fully connected layer to refine the results. | Not inferred | Improved execution-time efficiency (in terms of the density variation of the sparse matrix), especially for larger numbers of nodes, and improved prediction accuracy. |
| [51] | 2020 | CNN-based: uses an edge-to-image model that can map the edge network structure to an image structure. | Not inferred | Increases the accuracy of community structure evaluation for both computer-generated networks and large-scale real-world networks. |
| [24] | 2020 | CNN-based with AE: based on three procedures, namely a matrix reconstruction method, a spatial feature extraction method (via an auto-encoder), and a community detection strategy. | Not inferred | Offers higher modularity; thus, it can efficiently detect high-quality communities in social networks. |
| [56] | 2019 | GNN-based: uses an attention network to capture the importance of the neighboring nodes to a target node and achieve a compact representation, on which a decoder is trained to reconstruct the graph structure. | Not inferred | Better performance compared to other strategies in terms of Accuracy (ACC), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI). |
| [57] | 2019 | GNN-based: overlapping community detection; combines a GNN with a Bernoulli–Poisson model. | | Recovers communities in graphs with known ground-truth communities better than other state-of-the-art schemes. |
| [58] | 2019 | GNN-based: a spectral GNN architecture, based on complex rational Cayley filters, which can detect narrow frequency bands of importance during training. | | Experiments have shown improved performance compared to spectral-domain convolutional approaches in terms of image classification accuracy, vertex classification, and matrix completion tasks. |
| [60] | 2020 | GNN-based: a delivery operator transfers the representations learned by an auto-encoder to the corresponding layer, combined with a dual self-supervised mechanism that guides the updates of the whole model. | Linear in the number of nodes N and the number of samples d | Better clustering accuracy and better clustering quality compared to other state-of-the-art schemes. |
| [61] | 2020 | GNN-based: based on two-pathway encoders that map graph embedding features and learn view-consistency information. | Not inferred | Better performance compared to other strategies in terms of Accuracy (ACC), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI). |
| [64] | 2020 | Imposes two types of regularization to handle representation learning: a class-conditioned training process and a process that makes the unlabeled nodes follow the same latent distribution as the labeled nodes. | Not inferred | Effective in handling graph data with imbalanced class distributions. |
Table 3. GAN-based community detection schemes.

| Paper | Year | Strategy | Complexity | Basic Experimental Results |
|---|---|---|---|---|
| [67] | 2020 | Two combined modules: a graph convolutional auto-encoder, which reconstructs a given graph from a latent representation, and an adversarial network, which is trained to discriminate whether a sample is generated from the embedding. | | Good average node clustering performance in terms of Accuracy (ACC), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI), and good link prediction with different prior distributions, compared to a large number of competitive schemes. |
| [68] | 2019 | Combines AGM and GAN in a unified framework, which achieves good GAN performance and the direct vertex-community membership representation of AGM. The Graph AGM generates the most likely motifs with graph structure awareness. | Not easy to infer due to the random walk involved in the design of the AGM. | Experiments on synthetic data and real-world tasks show improved performance over other state-of-the-art schemes for different clique sizes and good quality of overlapping communities in terms of the NMI and F1-score metrics. |
| [69] | 2020 | Learns heuristics for community detection from training data with generative adversarial networks. The discriminator predicts whether a community is real or fake, and the generator tries to fool the discriminator by generating communities that fit the characteristics of real ones. | Not inferred | Good performance in terms of community detection evaluation metrics like the bi-matching F1 and Jaccard scores. |
| [70] | 2019 | GraphGAN, a unified framework that designs models via adversarial training in a minimax game. The graph softmax implements the generator, to resolve the limitations of the traditional softmax function and satisfy the properties of normalization, graph structure awareness, and computational efficiency. | | Achieves important gains in graph reconstruction, link prediction, node classification, and visualization over state-of-the-art schemes. |
| [71] | 2021 | A Jointly Adversarial Network Embedding (JANE) framework with pluggable components that lets embedding methods benefit from the adversarial mechanism. JANE distinguishes real and fake combinations of the embeddings, topology information, and node features. | Not inferred | Superiority on link prediction and node clustering in terms of F1 score. |