Explicit and Implicit Feature Contrastive Learning Model for Knowledge Graph Link Prediction
Abstract
1. Introduction
- (1) We develop a novel framework, EIFCL, for the knowledge graph link prediction task that effectively exploits both the implicit semantic and explicit structural features of each entity, providing high-quality self-supervised signals for contrastive learning from these two complementary information sources (a minimal sketch of such a two-view objective follows this list).
- (2) We explore potential associations among entities, modeling and encoding their implicit semantic features based on clustering characteristics in the latent space, which enables high-order yet informative entities to contribute to our model.
- (3) We design a feature extraction module that preserves local contextual information; the explicit structural feature is encoded through a subgraph mechanism, which allows the representations of neighboring entities to adapt dynamically to different contexts.
- (4) We conduct experiments on three benchmark knowledge graph link prediction datasets. The results show that our model outperforms state-of-the-art methods, including both traditional and GNN-based approaches to link prediction.
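As a concrete illustration of the two-view objective referenced in contribution (1), the sketch below shows a generic symmetrized InfoNCE loss in which the explicit (structural) and implicit (semantic) encodings of the same entity form a positive pair and all other in-batch entities act as negatives. This is a minimal sketch under our own naming; the paper's actual loss, encoders, and negative-sampling scheme may differ.

```python
import torch
import torch.nn.functional as F

def info_nce(z_explicit, z_implicit, temperature=0.5):
    """Two-view contrastive loss; row i of each matrix encodes entity i.

    z_explicit: (N, d) structural-view embeddings (hypothetical encoder output)
    z_implicit: (N, d) semantic-view embeddings (hypothetical encoder output)
    """
    z1 = F.normalize(z_explicit, dim=1)
    z2 = F.normalize(z_implicit, dim=1)
    logits = z1 @ z2.t() / temperature      # (N, N) scaled cosine similarities
    labels = torch.arange(z1.size(0))       # positives sit on the diagonal
    # Symmetrize so that each view learns to retrieve its counterpart.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

z_s = torch.randn(32, 64)  # toy explicit structural features
z_c = torch.randn(32, 64)  # toy implicit semantic features
print(info_nce(z_s, z_c).item())
```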
2. Related Work
2.1. Link Prediction Task
2.2. Traditional Methods
2.3. Contrastive Learning in Knowledge Graph Link Prediction
3. Methods
3.1. Notation and Preliminaries
3.2. Explicit Structural Feature Extraction
3.3. Implicit Semantic Feature Extraction
3.4. Training Objective
- Subtraction (Sub): $\phi(\mathbf{e}_h, \mathbf{e}_r) = \mathbf{e}_h - \mathbf{e}_r$;
- Multiplication (Mult): $\phi(\mathbf{e}_h, \mathbf{e}_r) = \mathbf{e}_h \odot \mathbf{e}_r$;
- Rotation (Rot): $\phi(\mathbf{e}_h, \mathbf{e}_r) = \mathbf{e}_h \circ \mathbf{e}_r$, where $\circ$ denotes element-wise rotation in complex space;
- Circular-correlation (Corr): $\phi(\mathbf{e}_h, \mathbf{e}_r) = \mathbf{e}_h \star \mathbf{e}_r$, where $[\mathbf{e}_h \star \mathbf{e}_r]_k = \sum_{i=0}^{d-1} [\mathbf{e}_h]_i\, [\mathbf{e}_r]_{(i+k) \bmod d}$.
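The four composition operators listed above follow the conventions of TransE, DistMult, RotatE, and HolE, respectively. The NumPy sketch below shows one common way to realize them; the function names and the pairing of dimensions into complex numbers in `comp_rot` are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def comp_sub(h, r):
    # Subtraction (TransE-style): phi(h, r) = h - r
    return h - r

def comp_mult(h, r):
    # Element-wise multiplication (DistMult-style): phi(h, r) = h ⊙ r
    return h * r

def comp_rot(h, r):
    # Rotation (RotatE-style): interpret consecutive dimension pairs of h as
    # complex numbers and rotate each by a phase taken from r.
    hc = h[::2] + 1j * h[1::2]
    phase = np.exp(1j * r[::2])
    out = hc * phase
    return np.stack([out.real, out.imag], axis=-1).reshape(-1)

def comp_corr(h, r):
    # Circular correlation (HolE-style) via the FFT identity:
    # h ⋆ r = IFFT(conj(FFT(h)) * FFT(r))
    return np.real(np.fft.ifft(np.conj(np.fft.fft(h)) * np.fft.fft(r)))

h, r = np.random.randn(8), np.random.randn(8)
for comp in (comp_sub, comp_mult, comp_rot, comp_corr):
    print(comp.__name__, comp(h, r))
```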
3.5. The Time Complexity
4. Experiments
4.1. Experimental Settings
- TransE [11]: It is the most representative translation-based model, which enforces the tail embedding to lie close to the sum of the head and relation embeddings (a minimal scoring sketch follows this list).
- TransR [12]: It is a classical translation-based model that defines a relation-specific matrix for each relation and maps entities from the entity space to the relation space.
- ComplEx-N3 [39]: It is a tensor decomposition model that improves the ComplEx model in the aspect of regularization.
- TypeComplex [40]: It is a tensor decomposition model that improves tensor-decomposition-based (TDB) methods with additional type information.
- SANS [41]: It is a tensor decomposition model based on DistMult with structure-aware negative samples and a self-adversarial approach.
- PairRE [42]: It is a translation-based model that employs paired vector representations to accommodate diverse and complex relations.
- Node2vec [35]: It is a random-walk-based graph embedding method that combines random walks with word embedding techniques to capture the structural and contextual relationships between nodes.
- CompGCN [15]: It is a GNN-based method which jointly aggregates entity and relation embeddings.
- SMiLE [34]: It is a state-of-the-art GCL method that introduces schema as priors to enable high-quality negative sampling.
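To ground the translation-based baselines above (see the note in the TransE item), here is a minimal sketch of the TransE scoring function; the L1 norm is an illustrative default, not necessarily the setting used in these experiments.

```python
import numpy as np

def transe_score(h, r, t, p=1):
    # TransE scores a triple (h, r, t) by how well t ≈ h + r holds:
    # the smaller the translation distance ||h + r - t||_p, the more
    # plausible the triple, so we negate it to get a score.
    return -np.linalg.norm(h + r - t, ord=p)

rng = np.random.default_rng(0)
h, r, t = (rng.normal(size=50) for _ in range(3))
print(transe_score(h, r, t))  # higher (closer to 0) = more plausible
```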
4.2. Experimental Results and Analysis
- (1) Compared with the other methods, our model achieves competitive results. In particular, it outperforms the traditional methods (such as ComplEx-N3, PairRE, and TransR); we attribute this to its ability to capture both explicit entity connections and implicit entity associations across the whole knowledge graph, which is more effective for link prediction tasks.
- (2) The graph-based methods outperform the traditional models on most datasets, which supports the effectiveness of exploiting graph features: shared, similar features can be propagated along entity paths. Node2vec performs well on most datasets, indicating that random walks can learn continuous feature representations of entities within a subgraph, enhancing the effectiveness and robustness of the framework. Surprisingly, CompGCN performs worse than the traditional models, probably because it models only relational connections, so the information within a triple is too limited; as a result, the entity embedding vectors overlap and become indistinguishable. This also confirms the necessity of considering implicit association information in our model.
- (3) Among the self-supervised methods, our model outperforms the state-of-the-art SMiLE. Unlike SMiLE, which constrains contrastive learning at the instance level, our model uncovers implicit concept-level associations between entities and enhances both the cohesiveness within clusters of homophilous nodes and the separability between clusters. With this extra information, the generated entity representations are more informative and discriminative.
- (4) Finally, the experimental results indicate that the proposed model consistently outperforms the others, which we attribute to the feature-enriched contrastive learning objectives. Its gains are smaller on the HumanWiki dataset, however, because of an insufficient number of concepts: entities in FB15k have 6 to 7 concepts on average, whereas entities in HumanWiki have only 2 to 3, which may bias entity associations at the concept level. Moreover, larger improvements are obtained on sparse datasets such as FB15k; we speculate that mining entity relevance signals to enrich entity representations enables missing entities to be inferred more accurately for sparsely connected entities, which is also significant in real-world knowledge graph link prediction scenarios. (A sketch of how the reported Micro-F1 and AUC-ROC metrics are typically computed follows this list.)
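The Micro-F1 and AUC-ROC figures discussed above are standard triple-classification metrics. The snippet below sketches how they are typically computed from model scores over observed triples and sampled negatives; the scikit-learn calls are standard, but the toy data and the 0.5 decision threshold are our assumptions, not the paper's evaluation protocol.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# y_true: 1 for observed (positive) triples, 0 for sampled negatives.
# y_score: model plausibility scores squashed into [0, 1], e.g. by a sigmoid.
y_true = np.array([1, 1, 1, 0, 0, 0])
y_score = np.array([0.9, 0.7, 0.4, 0.6, 0.2, 0.1])

print("AUC-ROC :", roc_auc_score(y_true, y_score))                    # threshold-free
print("Micro-F1:", f1_score(y_true, y_score >= 0.5, average="micro"))  # thresholded
```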
4.3. Ablation Study
4.4. Parameter Sensitivity
4.5. Analysis of Sparse Entities
4.6. Visualization
4.7. Large Language Models (LLMs) Comparison and Analysis
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Chhetri, T.R.; Kurteva, A.; Adigun, J.G.; Fensel, A. Knowledge graph based hard drive failure prediction. Sensors 2022, 22, 985.
2. Sakurai, K.; Togo, R.; Ogawa, T.; Haseyama, M. Controllable music playlist generation based on knowledge graph and reinforcement learning. Sensors 2022, 22, 3722.
3. Bakhshi, M.; Nematbakhsh, M.; Mohsenzadeh, M.; Rahmani, A.M. Data-driven construction of SPARQL queries by approximate question graph alignment in question answering over knowledge graphs. Expert Syst. Appl. 2020, 146, 113205.
4. Cai, X.; Xie, L.; Tian, R.; Cui, Z. Explicable recommendation based on knowledge graph. Expert Syst. Appl. 2022, 200, 117035.
5. Zhang, J.; Huang, J.; Gao, J.; Han, R.; Zhou, C. Knowledge graph embedding by logical-default attention graph convolution neural network for link prediction. Inf. Sci. 2022, 593, 201–215.
6. Tang, X.; Chi, G.; Cui, L.; Ip, A.W.H.; Yung, K.L.; Xie, X. Exploring research on the construction and application of knowledge graphs for aircraft fault diagnosis. Sensors 2023, 23, 5295.
7. He, Q.; Xu, S.; Zhu, Z.; Wang, P.; Li, K.; Zheng, Q.; Li, Y. KRP-DS: A Knowledge Graph-Based Dialogue System with Inference-Aided Prediction. Sensors 2023, 23, 6805.
8. Kazemi, S.M.; Poole, D. SimplE embedding for link prediction in knowledge graphs. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2018; Volume 31.
9. Wang, S.; Fu, K.; Sun, X.; Zhang, Z.; Li, S.; Jin, L. Hierarchical-aware relation rotational knowledge graph embedding for link prediction. Neurocomputing 2021, 458, 259–270.
10. Ferrari, I.; Frisoni, G.; Italiani, P.; Moro, G.; Sartori, C. Comprehensive analysis of knowledge graph embedding techniques benchmarked on link prediction. Electronics 2022, 11, 3866.
11. Bordes, A.; Usunier, N.; García-Durán, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Proceedings of the Advances in Neural Information Processing Systems 26, Lake Tahoe, NV, USA, 5–8 December 2013; pp. 2787–2795.
12. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 29.
13. Yang, B.; Yih, S.W.t.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
14. Guo, S.; Wang, Q.; Wang, L.; Wang, B.; Guo, L. Knowledge graph embedding with iterative guidance from soft rules. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
15. Vashishth, S.; Sanyal, S.; Nitin, V.; Talukdar, P. Composition-based Multi-Relational Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 26–30 April 2020.
16. Zhang, Y.; Yao, Q. Knowledge Graph Reasoning with Relational Directed Graph. arXiv 2021, arXiv:2108.06040.
17. Luo, Z.; Xu, W.; Liu, W.; Bian, J.; Yin, J.; Liu, T.Y. KGE-CL: Contrastive Learning of Tensor Decomposition Based Knowledge Graph Embeddings. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 2598–2607.
18. Tan, Z.; Chen, Z.; Feng, S.; Zhang, Q.; Zheng, Q.; Li, J.; Luo, M. KRACL: Contrastive learning with graph context modeling for sparse knowledge graph completion. In Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 2548–2559.
19. Wu, Y.; Wang, Y.; Li, Y.; Zhu, X.; Wu, X. Top-k self-adaptive contrast sequential pattern mining. IEEE Trans. Cybern. 2021, 52, 11819–11833.
20. Wu, Y.; Meng, Y.; Li, Y.; Guo, L.; Zhu, X.; Fournier-Viger, P.; Wu, X. COPP-Miner: Top-K Contrast Order-Preserving Pattern Mining for Time Series Classification. IEEE Trans. Knowl. Data Eng. 2023, 36, 2372–2387.
21. Liu, G.; Jin, C.; Shi, L.; Yang, C.; Shuai, J.; Ying, J. Enhancing Cross-Lingual Entity Alignment in Knowledge Graphs through Structure Similarity Rearrangement. Sensors 2023, 23, 7096.
22. Pandithawatta, S.; Ahn, S.; Rameezdeen, R.; Chow, C.W.K.; Gorjian, N.; Kim, T.W. Development of a Knowledge Graph for Automatic Job Hazard Analysis: The Schema. Sensors 2023, 23, 3893.
23. Zhou, X.; Yi, Y.; Jia, G. Path-RotatE: Knowledge Graph Embedding by Relational Rotation of Path in Complex Space. In Proceedings of the 10th IEEE/CIC International Conference on Communications in China (ICCC 2021), Xiamen, China, 28–30 July 2021; pp. 905–910.
24. Ebisu, T.; Ichise, R. TorusE: Knowledge graph embedding on a Lie group. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
25. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex embeddings for simple link prediction. In Proceedings of the International Conference on Machine Learning (ICML), New York, NY, USA, 20–22 June 2016; pp. 2071–2080.
26. Luo, Y.; Yang, C.; Li, B.; Zhao, X.; Zhang, H. CP tensor factorization for knowledge graph completion. In Knowledge Science, Engineering and Management; Springer: Cham, Switzerland, 2022; pp. 240–254.
27. Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; Van Den Berg, R.; Titov, I.; Welling, M. Modeling relational data with graph convolutional networks. In Proceedings of the Semantic Web: 15th International Conference (ESWC 2018), Heraklion, Crete, Greece, 3–7 June 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 593–607.
28. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2D knowledge graph embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
29. Li, R.; Cao, Y.; Zhu, Q.; Bi, G.; Fang, F.; Liu, Y.; Li, Q. How does knowledge graph embedding extrapolate to unseen data: A semantic evidence view. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 5781–5791.
30. Zhang, X.; Zhang, C.; Guo, J.; Peng, C.; Niu, Z.; Wu, X. Graph attention network with dynamic representation of relations for knowledge graph completion. Expert Syst. Appl. 2023, 219, 119616.
31. You, Y.; Chen, T.; Sui, Y.; Chen, T.; Wang, Z.; Shen, Y. Graph contrastive learning with augmentations. Adv. Neural Inf. Process. Syst. 2020, 33, 5812–5823.
32. Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; Wang, L. Graph contrastive learning with adaptive augmentation. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 12–23 April 2021; pp. 2069–2080.
33. Wang, X.; Liu, N.; Han, H.; Shi, C. Self-supervised Heterogeneous Graph Neural Network with Co-contrastive Learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), Singapore, 14–18 August 2021; pp. 1726–1736.
34. Peng, M.; Liu, B.; Xie, Q.; Xu, W.; Wang, H.; Peng, M. SMiLE: Schema-augmented Multi-level Contrastive Learning for Knowledge Graph Link Prediction. In Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 4165–4177.
35. Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864.
36. Miyahara, H.; Aihara, K.; Lechner, W. Quantum expectation-maximization algorithm. Phys. Rev. A 2020, 101, 012326.
37. Lin, S.; Liu, C.; Zhou, P.; Hu, Z.Y.; Wang, S.; Zhao, R.; Zheng, Y.; Lin, L.; Xing, E.; Liang, X. Prototypical graph contrastive learning. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 2747–2758.
38. Rosso, P.; Yang, D.; Ostapuk, N.; Cudré-Mauroux, P. RETA: A schema-aware, end-to-end solution for instance completion in knowledge graphs. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 12–23 April 2021; pp. 845–856.
39. Lacroix, T.; Usunier, N.; Obozinski, G. Canonical Tensor Decomposition for Knowledge Base Completion. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden, 10–15 July 2018; Proceedings of Machine Learning Research, Volume 80; pp. 2869–2878.
40. Jain, P.; Kumar, P.; Chakrabarti, S. Type-sensitive knowledge base inference without explicit type supervision. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Melbourne, Australia, 15–20 July 2018; pp. 75–80.
41. Ahrabian, K.; Feizi, A.; Salehi, Y.; Hamilton, W.L.; Bose, A.J. Structure Aware Negative Sampling in Knowledge Graphs. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), Online, 16–20 November 2020; pp. 6093–6101.
42. Chao, L.; He, J.; Wang, T.; Chu, W. PairRE: Knowledge Graph Embeddings via Paired Relation Vectors. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Virtual Event, 1–6 August 2021; pp. 4360–4369.
43. Wang, P.; Agarwal, K.; Ham, C.; Choudhury, S.; Reddy, C.K. Self-supervised learning of contextual embeddings for link prediction in heterogeneous networks. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 12–23 April 2021; pp. 2946–2957.
| Component | FB15k (s) | FB15k-237 (s) | HumanWiki (s) |
|---|---|---|---|
| Total pre-training time | 11,972.88 | 13,077 | 8945.52 |
| Total fine-tuning time | 16,629 | 18,681.6 | 12,779.32 |
| Evaluation time | 49.88 | 42.22 | 44.28 |
| Dataset | #Entities | #Relations | #Concepts | #Edges | #Triples |
|---|---|---|---|---|---|
| FB15k-237 | 14,541 | 237 | 583 | 248,611 | 310,116 |
| FB15k | 14,579 | 1208 | 588 | 117,580 | 154,916 |
| HumanWiki | 38,949 | 221 | 388 | 105,688 | 108,199 |
| Model | FB15k Micro-F1 | FB15k AUC-ROC | FB15k-237 Micro-F1 | FB15k-237 AUC-ROC | HumanWiki Micro-F1 | HumanWiki AUC-ROC |
|---|---|---|---|---|---|---|
| TransE | 0.5036 | 0.5013 | 0.4778 | 0.4818 | 0.4906 | 0.4931 |
| TransR | 0.7196 | 0.7696 | 0.6719 | 0.7076 | 0.6156 | 0.6654 |
| ComplEx-N3 | 0.4963 | 0.4963 | 0.5019 | 0.5034 | 0.5453 | 0.5286 |
| TypeComplex | 0.8809 | 0.9390 | 0.5005 | 0.5025 | 0.8017 | 0.8558 |
| SANS | 0.8897 | 0.9459 | 0.5003 | 0.5026 | 0.7818 | 0.8369 |
| PairRE | 0.8827 | 0.9267 | 0.4962 | 0.4930 | 0.8007 | 0.8768 |
| Node2vec | 0.8023 | 0.8891 | 0.8369 | 0.8977 | 0.8013 | 0.8754 |
| CompGCN | 0.6035 | 0.6359 | 0.6539 | 0.7201 | 0.5688 | 0.4009 |
| SMiLE | 0.9076 | 0.9653 | 0.8875 | 0.9492 | 0.9340 | 0.9792 |
| EIFCL (ours) | 0.9285 | 0.9776 | 0.9023 | 0.9620 | 0.9448 | 0.9856 |
| Implicit | Explicit | FB15k Micro-F1 | FB15k AUC-ROC | FB15k-237 Micro-F1 | FB15k-237 AUC-ROC | HumanWiki Micro-F1 | HumanWiki AUC-ROC |
|---|---|---|---|---|---|---|---|
| ✓ | | 0.9139 | 0.9650 | 0.9010 | 0.9623 | 0.9311 | 0.9701 |
| | ✓ | 0.9113 | 0.9644 | 0.8863 | 0.9505 | 0.9330 | 0.9752 |
| ✓ | ✓ | 0.9285 | 0.9776 | 0.9023 | 0.9620 | 0.9448 | 0.9856 |
| Parameter | FB15k Micro-F1 | FB15k AUC-ROC | HumanWiki Micro-F1 | HumanWiki AUC-ROC |
|---|---|---|---|---|
| l = 2 | 0.9250 | 0.9737 | 0.9287 | 0.9814 |
| l = 4 | 0.9285 | 0.9776 | 0.9448 | 0.9856 |
| l = 6 | 0.8849 | 0.9507 | 0.8896 | 0.9571 |
| a = 4 | 0.9285 | 0.9776 | 0.9448 | 0.9856 |
| a = 8 | 0.9211 | 0.9742 | 0.9405 | 0.9853 |
| a = 12 | 0.9241 | 0.9750 | 0.9385 | 0.9831 |
| a = 16 | 0.9216 | 0.9744 | 0.9319 | 0.9817 |
| = 6 | 0.9285 | 0.9776 | 0.9448 | 0.9856 |
| = 8 | 0.9203 | 0.9743 | 0.9387 | 0.9836 |
| = 10 | 0.9231 | 0.9743 | 0.9399 | 0.9863 |
| = 12 | 0.9254 | 0.9763 | 0.9339 | 0.9813 |