Self-Supervised Graph Representation Learning via Information Bottleneck
Abstract
1. Introduction
- Introducing the information bottleneck into contrastive learning, thereby achieving self-supervised learning.
- Considering that previous studies neglect redundant information in the data, this paper proposes using the information bottleneck as the objective function of the optimization model. The information bottleneck is applied by maximizing the mutual information between the node-level representation of one view and the graph-level representation of the other view, while minimizing the mutual information between the node-level representations of the two views (see the sketch after this list).
- A variety of network analysis experiments, including node classification and node clustering, were conducted on three public datasets and two large-scale datasets. Extensive experiments show that the proposed method outperforms existing state-of-the-art methods. In addition, an in-depth analysis of the model shows that SGIB can alleviate the over-smoothing problem to a certain extent.
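To make the objective above concrete, here is a minimal PyTorch sketch of an SGIB-style loss, assuming two augmented views with node-level representations `h1`, `h2` and graph-level (readout) summaries `g1`, `g2`. The bilinear discriminators and the Jensen-Shannon estimator follow DGI-style contrastive MI estimators; all names (`BilinearDiscriminator`, `sgib_loss`, `beta`) are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearDiscriminator(nn.Module):
    """Scores representation pairs with a learned bilinear form."""
    def __init__(self, dim):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, nodes, summary):
        # nodes: (N, d), summary: (d,) -> one score per node
        return nodes @ self.weight @ summary

def jsd_mi(disc, nodes, summary, corrupted):
    """Jensen-Shannon MI lower bound: positives vs. corrupted negatives."""
    pos = disc(nodes, summary)
    neg = disc(corrupted, summary)
    return (-F.softplus(-pos)).mean() - F.softplus(neg).mean()

def pairwise_jsd_mi(disc, a, b):
    """Same bound for aligned node pairs across the two views."""
    perm = torch.randperm(a.size(0))
    pos = (a @ disc.weight * b).sum(-1)        # matched node pairs
    neg = (a[perm] @ disc.weight * b).sum(-1)  # shuffled pairs as negatives
    return (-F.softplus(-pos)).mean() - F.softplus(neg).mean()

def sgib_loss(disc_ng, disc_nn, h1, h2, g1, g2, beta=0.1):
    """Maximize cross-view node-graph MI, penalize node-node MI."""
    perm = torch.randperm(h1.size(0))
    # Maximize I(h1; g2) + I(h2; g1): nodes of one view against the
    # graph-level summary of the other view (row-shuffled negatives).
    cross = jsd_mi(disc_ng, h1, g2, h1[perm]) + jsd_mi(disc_ng, h2, g1, h2[perm])
    # Minimize I(h1; h2): redundancy between the two views' node-level
    # representations, weighted by beta.
    redundancy = pairwise_jsd_mi(disc_nn, h1, h2)
    return -cross + beta * redundancy
```

In a typical training loop, `h1` and `h2` would come from encoding the two augmented views, `g1` and `g2` from a mean-pooling readout, and the loss would be optimized jointly over the encoder and both discriminators.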
2. Related Work
3. Preliminaries
3.1. Homogeneous Graph
3.2. Mutual Information Estimation
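Since this subsection concerns estimating mutual information with neural networks, the standard bounds it presumably reviews (MINE's Donsker-Varadhan bound and the Jensen-Shannon variant used in DGI-style objectives) can be recapped in generic notation, with $T_\theta$ a learned critic:

```latex
% Donsker--Varadhan (MINE) lower bound on I(X;Z):
I(X;Z) \;\ge\; \sup_{\theta}\;
  \mathbb{E}_{p(x,z)}\!\big[T_\theta(x,z)\big]
  - \log \mathbb{E}_{p(x)p(z)}\!\big[e^{T_\theta(x,z)}\big]

% Jensen--Shannon variant, with sp(t) = log(1 + e^t):
\hat{I}^{\mathrm{JSD}}(X;Z) =
  \mathbb{E}_{p(x,z)}\!\big[-\mathrm{sp}(-T_\theta(x,z))\big]
  - \mathbb{E}_{p(x)p(z)}\!\big[\mathrm{sp}(T_\theta(x,z))\big]
```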
3.3. Information Bottleneck
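For reference, the classical information bottleneck of Tishby et al. compresses an input $X$ into a representation $Z$ while preserving information about a target $Y$; in standard notation (not necessarily the paper's), with $\beta$ trading off compression against prediction:

```latex
\min_{p(z \mid x)} \; I(X;Z) \;-\; \beta\, I(Z;Y)
```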
4. Self-Supervised Graph Representation Learning via Information Bottleneck
4.1. Self-Supervised Information Bottleneck
4.2. Encoders
4.3. Training and Optimization
5. Experimental Analysis and Results
5.1. Datasets and Implementation Details
5.2. Node Classification
5.3. Node Clustering
5.4. Ablation Experiment
5.5. Node Classification with Various Depths
5.6. Limited Labeled Training
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
- Hamilton, W.L.; Ying, R.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the 30th Advances in Neural Information Processing Systems Conference, Long Beach, CA, USA, 4–9 December 2017; pp. 1024–1034.
- Pan, S.; Wu, J.; Zhu, X.; Long, G.; Zhang, C. Task sensitive feature exploration and learning for multitask graph classification. IEEE Trans. Cybern. 2016, 47, 744–758.
- Chen, D.; Nie, M.; Zhang, H.; Wang, Z.; Wang, D. Network embedding algorithm taking in variational graph autoencoder. Mathematics 2022, 10, 485.
- Pan, S.; Hu, R.; Long, G.; Jiang, J.; Yao, L.; Zhang, C. Adversarially regularized graph autoencoder for graph embedding. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 2609–2615.
- Zhang, Y.; Pal, S.; Coates, M.; Ustebay, D. Bayesian graph convolutional neural networks for semi-supervised classification. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 5829–5836.
- Qiu, J.; Tang, J.; Ma, H.; Dong, Y.; Wang, K.; Tang, J. DeepInf: Social influence prediction with deep learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 2110–2119.
- Bai, Y.; Ding, H.; Bian, S.; Chen, T.; Sun, Y.; Wang, W. SimGNN: A neural network approach to fast graph similarity computation. In Proceedings of the 12th ACM International Conference on Web Search and Data Mining, Melbourne, VIC, Australia, 11–15 February 2019; pp. 384–392.
- Sun, F.Y.; Hoffmann, J.; Verma, V.; Tang, J. InfoGraph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. In Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020.
- Liben-Nowell, D.; Kleinberg, J. The link-prediction problem for social networks. J. Assoc. Inf. Sci. Technol. 2007, 58, 1019–1031.
- Zhang, M.H.; Chen, Y.X. Link prediction based on graph neural networks. In Proceedings of the 31st Advances in Neural Information Processing Systems Conference, Montréal, QC, Canada, 3–8 December 2018; pp. 5171–5181.
- Velickovic, P.; Fedus, W.; Hamilton, W.L.; Liò, P.; Bengio, Y.; Hjelm, R.D. Deep graph infomax. In Proceedings of the 7th International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
- Peng, Z.; Huang, W.; Luo, M.; Zheng, Q.; Rong, Y.; Xu, T.; Huang, J. Graph representation learning via graphical mutual information maximization. In Proceedings of The Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 259–270.
- Hassani, K.; Khasahmadi, A.H. Contrastive multi-view representation learning on graphs. In Proceedings of the 37th International Conference on Machine Learning, Virtual Event, 13–18 July 2020; Volume 119, pp. 4116–4126.
- Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; Wang, L. Deep graph contrastive representation learning. arXiv 2020, arXiv:2006.04131.
- Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; Wang, L. Graph contrastive learning with adaptive augmentation. In Proceedings of The Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 2069–2080.
- Yang, L.; Wu, F.; Zheng, Z.; Niu, B.; Gu, J.; Wang, C.; Cao, X.; Guo, Y. Heterogeneous graph information bottleneck. In Proceedings of the 30th International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 19–27 August 2021; pp. 1638–1645.
- Yu, J.; Xu, T.; Rong, Y.; Bian, Y.; Huang, J.; He, R. Recognizing predictive substructures with subgraph information bottleneck. arXiv 2021, arXiv:2103.11155.
- Yu, J.; Cao, J.; He, R. Improving subgraph recognition with variational graph information bottleneck. arXiv 2021, arXiv:2112.09899.
- Belghazi, M.I.; Baratin, A.; Rajeswar, S.; Ozair, S.; Bengio, Y.; Courville, A.; Hjelm, R.D. Mutual information neural estimation. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 530–539.
- Achille, A.; Soatto, S. Emergence of invariance and disentanglement in deep representations. J. Mach. Learn. Res. 2018, 19, 1947–1980.
- Tishby, N.; Pereira, F.C.; Bialek, W. The information bottleneck method. arXiv 2000, arXiv:physics/0004057.
- Alemi, A.A.; Fischer, I.; Dillon, J.V.; Murphy, K. Deep variational information bottleneck. In Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
- Federici, M.; Dutta, A.; Forré, P.; Kushman, N.; Akata, Z. Learning robust representations via multi-view information bottleneck. In Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020.
- Rong, Y.; Huang, W.; Xu, T.; Huang, J. DropEdge: Towards deep graph convolutional networks on node classification. In Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020.
- Nowozin, S.; Cseke, B.; Tomioka, R. f-GAN: Training generative neural samplers using variational divergence minimization. In Proceedings of the 29th Advances in Neural Information Processing Systems Conference, Barcelona, Spain, 5–10 December 2016; pp. 271–279.
- Oord, A.; Li, Y.; Vinyals, O. Representation learning with contrastive predictive coding. arXiv 2018, arXiv:1807.03748.
- Zhu, X.; Ghahramani, Z.; Lafferty, J.D. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, USA, 21–24 August 2003; pp. 912–919.
- Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 29th Advances in Neural Information Processing Systems Conference, Barcelona, Spain, 5–10 December 2016; pp. 3837–3845.
- Monti, F.; Boscaini, D.; Masci, J.; Rodola, E.; Svoboda, J.; Bronstein, M.M. Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5424–5434.
- Mavromatis, C.; Karypis, G. Graph InfoClust: Maximizing coarse-grain mutual information in graphs. In Advances in Knowledge Discovery and Data Mining, Proceedings of the 25th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Virtual Event, 11–14 May 2021; Karlapalem, K., Cheng, H., Ramakrishnan, N., Agrawal, R.K., Reddy, P.K., Srivastava, J., Chakraborty, T., Eds.; Springer: Cham, Switzerland, 2021; Volume 12712, pp. 541–553.
- Cao, S.; Lu, W.; Xu, Q. Deep neural networks for learning graph representations. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 1145–1152.
- Chang, J.; Blei, D. Relational topic models for document networks. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, Clearwater Beach, FL, USA, 16–18 April 2009; Volume 5, pp. 81–88.
- Xia, R.; Pan, Y.; Du, L.; Yin, J. Robust multi-view spectral clustering via low-rank and sparse decomposition. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; pp. 2149–2155.
- Yang, C.; Liu, Z.; Zhao, D.; Sun, M.; Chang, E. Network representation learning with rich text information. In Proceedings of the 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; pp. 2111–2117.
- Kipf, T.N.; Welling, M. Variational graph auto-encoders. arXiv 2016, arXiv:1611.07308.
Table 1. Statistics of the datasets.

Datasets | Nodes | Edges | Features | Classes | Train/Val/Test Nodes
---|---|---|---|---|---
Cora | 2708 | 5429 | 1433 | 7 | 140/500/1000
Citeseer | 3327 | 4732 | 3703 | 6 | 120/500/1000
Pubmed | 19,717 | 44,338 | 500 | 3 | 60/500/1000
Coauthor-CS | 18,333 | 81,894 | 6805 | 15 | 450/450/17,433
Coauthor-Phy | 34,493 | 247,962 | 8415 | 5 | 150/150/34,193
Table 2. Node classification accuracy (%); X, A, and Y denote node features, adjacency matrix, and labels, respectively. OOM = out of memory.

Setting | Method | Input | Cora | Citeseer | Pubmed | Coauthor-CS | Coauthor-Phy
---|---|---|---|---|---|---|---
Supervised | MLP | X, Y | 58.2 ± 2.1 | 59.1 ± 2.3 | 70.0 ± 2.1 | 88.3 ± 0.7 | 88.9 ± 1.1
 | LogReg | X, A, Y | 57.1 ± 2.3 | 61.0 ± 2.2 | 64.1 ± 3.1 | 86.4 ± 0.9 | 86.7 ± 1.5
 | LP | A, Y | 68.0 ± 0.2 | 45.3 ± 0.2 | 63.0 ± 0.5 | 74.3 ± 0.0 | 90.2 ± 0.5
 | Chebyshev | X, A, Y | 81.2 ± 0.5 | 69.8 ± 0.5 | 74.4 ± 0.3 | 91.5 ± 0.0 | 92.1 ± 0.3
 | GCN | X, A, Y | 81.5 ± 0.2 | 70.3 ± 0.3 | 79.0 ± 0.4 | 91.8 ± 0.1 | 92.6 ± 0.7
 | GAT | X, A, Y | 83.0 ± 0.7 | 72.5 ± 0.7 | 79.0 ± 0.3 | 90.5 ± 0.7 | 91.3 ± 0.6
 | MoNet | X, A, Y | 81.3 ± 1.3 | 71.2 ± 2.0 | 78.6 ± 2.3 | 90.8 ± 0.6 | 92.5 ± 0.9
Unsupervised | DGI | X, A | 81.7 ± 0.6 | 71.5 ± 0.7 | 77.3 ± 0.6 | 90.0 ± 0.3 | 91.3 ± 0.4
 | GMI | X, A | 80.7 ± 0.7 | 71.1 ± 0.2 | 78.0 ± 1.0 | 91.0 ± 0.0 | OOM
 | GRACE | X, A | 80.0 ± 0.4 | 71.7 ± 0.6 | 79.5 ± 1.1 | 90.1 ± 0.8 | 92.3 ± 0.6
 | GCA | X, A | 80.5 ± 0.5 | 71.3 ± 0.4 | 78.6 ± 0.6 | 91.3 ± 0.4 | 93.1 ± 0.3
 | GIC | X, A | 81.7 ± 1.5 | 71.9 ± 1.4 | 77.3 ± 1.9 | 89.4 ± 0.4 | 93.1 ± 0.7
 | MVGRL | X, A | 82.8 ± 1.0 | 72.7 ± 0.5 | 79.6 ± 0.8 | 91.0 ± 0.6 | 93.2 ± 1.0
 | SGIB | X, A | 83.3 ± 0.7 | 71.7 ± 0.8 | 80.4 ± 0.6 | 92.2 ± 0.5 | 93.8 ± 0.8
Table 3. Node clustering results (F1, NMI, ARI) on Cora, Citeseer, and Pubmed.

Methods | Cora F1 | Cora NMI | Cora ARI | Citeseer F1 | Citeseer NMI | Citeseer ARI | Pubmed F1 | Pubmed NMI | Pubmed ARI
---|---|---|---|---|---|---|---|---|---
K-means | 0.368 | 0.321 | 0.230 | 0.409 | 0.305 | 0.279 | 0.195 | 0.001 | 0.002
Spectral | 0.318 | 0.127 | 0.031 | 0.299 | 0.056 | 0.010 | 0.271 | 0.042 | 0.002
DeepWalk | 0.392 | 0.327 | 0.243 | 0.270 | 0.088 | 0.092 | 0.670 | 0.279 | 0.299
DNGR | 0.340 | 0.318 | 0.142 | 0.300 | 0.180 | 0.044 | 0.467 | 0.155 | 0.054
RTM | 0.307 | 0.230 | 0.169 | 0.342 | 0.239 | 0.203 | 0.444 | 0.194 | 0.148
RMSC | 0.331 | 0.255 | 0.090 | 0.320 | 0.139 | 0.049 | 0.421 | 0.255 | 0.222
TADW | 0.481 | 0.441 | 0.332 | 0.414 | 0.291 | 0.228 | 0.335 | 0.001 | 0.001
GAE | 0.595 | 0.429 | 0.347 | 0.327 | 0.176 | 0.124 | 0.660 | 0.277 | 0.279
VGAE | 0.609 | 0.436 | 0.346 | 0.308 | 0.156 | 0.093 | 0.634 | 0.229 | 0.213
DGI | 0.707 | 0.544 | 0.472 | 0.714 | 0.479 | 0.485 | 0.667 | 0.307 | 0.277
GMI | 0.701 | 0.542 | 0.495 | 0.667 | 0.419 | 0.418 | 0.644 | 0.239 | 0.225
SGIB | 0.714 | 0.546 | 0.505 | 0.716 | 0.487 | 0.487 | 0.673 | 0.307 | 0.279
Table 4. Ablation study: node classification accuracy (%) with and without DropEdge.

Methods | Cora | Citeseer | Pubmed
---|---|---|---
DGI | 82.3 | 71.8 | 76.8
DGI + DropEdge | 82.9 | 72.0 | 79.5
GMI | 80.7 | 71.1 | 78.0
GMI + DropEdge | 81.9 | 69.7 | 78.2
SGIB | 83.3 | 71.7 | 80.4
Table 5. Node classification accuracy (%) at encoder depths of 1 to 32 layers ("+ D" denotes DropEdge).

Data | Methods | 1 | 2 | 4 | 8 | 16 | 32
---|---|---|---|---|---|---|---
Cora | DGI | 82.30 | 79.36 | 73.10 | 21.87 | 20.01 | 16.43
Cora | DGI + D | 82.90 | 79.00 | 72.60 | 36.48 | 21.60 | 16.05
Cora | GMI | — | 80.70 | 74.06 | 37.81 | 16.22 | 16.06
Cora | GMI + D | — | 81.95 | 77.07 | 38.01 | 15.75 | 16.15
Cora | MVGRL | 82.80 | 81.74 | 78.20 | 28.38 | 22.04 | 16.82
Cora | SGIB | 83.32 | 80.80 | 79.07 | 65.54 | 23.80 | 20.86
Citeseer | DGI | 71.80 | 70.24 | 62.34 | 28.18 | 20.39 | 17.10
Citeseer | DGI + D | 72.02 | 70.51 | 64.85 | 32.59 | 21.25 | 17.07
Citeseer | GMI | — | 71.10 | 58.80 | 38.18 | 20.70 | 17.06
Citeseer | GMI + D | — | 69.70 | 54.76 | 39.60 | 24.41 | 19.83
Citeseer | MVGRL | 72.70 | 69.28 | 60.29 | 52.96 | 33.02 | 18.32
Citeseer | SGIB | 71.73 | 71.29 | 67.50 | 58.11 | 35.51 | 20.11
Pubmed | DGI | 76.80 | 73.80 | 65.23 | 50.56 | 45.21 | 34.97
Pubmed | DGI + D | 79.48 | 71.83 | 64.94 | 51.20 | 41.84 | 34.89
Pubmed | GMI | — | 78.00 | 75.50 | 61.41 | 44.36 | 34.67
Pubmed | GMI + D | — | 78.18 | 74.62 | 61.32 | 36.39 | 34.81
Pubmed | MVGRL | 79.60 | 75.30 | 67.78 | 36.28 | 34.40 | 34.12
Pubmed | SGIB | 80.44 | 80.10 | 75.50 | 61.58 | 45.60 | 43.66
Share and Cite
Gu, J.; Zheng, Z.; Zhou, W.; Zhang, Y.; Lu, Z.; Yang, L. Self-Supervised Graph Representation Learning via Information Bottleneck. Symmetry 2022, 14, 657. https://doi.org/10.3390/sym14040657