1. Introduction
Knowledge graphs (KGs) are instrumental in various intelligence applications, including recommendation systems [1,2], information retrieval [3,4], and question answering [5,6]. To facilitate these knowledge-driven applications, numerous types of KGs have been developed over the past decades. KGs comprise structured, human-understandable knowledge, typically represented as triplets [7]. Each triplet, denoted as $(h, r, t)$, comprises a head entity $h$, a tail entity $t$, and a relation $r$. By leveraging this structured representation, KGs can be viewed as multi-relational graphs, with entities acting as nodes and relations serving as edges [8]. This graphical representation facilitates the exploration and analysis of interconnections and semantic relationships within the knowledge domain, enabling sophisticated knowledge-driven applications to leverage the rich information network within KGs [9].
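To make the triplet view concrete, the following minimal Python sketch represents a KG as a set of $(h, r, t)$ triplets and builds the corresponding multi-relational adjacency; the entity and relation names are purely illustrative and are not drawn from any benchmark dataset used in this paper.

```python
# A minimal sketch of a KG as a set of (head, relation, tail) triplets,
# viewed as a multi-relational graph. Names are illustrative placeholders.
from collections import defaultdict

triplets = [
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
    ("Paris", "located_in", "France"),
]

# Adjacency indexed by relation: entities are nodes, relations label edges.
adjacency = defaultdict(list)
for head, relation, tail in triplets:
    adjacency[relation].append((head, tail))

print(adjacency["located_in"])  # [('France', 'Europe'), ('Paris', 'France')]
```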
KGs are usually constructed from various data sources (e.g., structured databases or text corpora) through methods such as relation extraction [10], or by having domain experts manually verify facts [11,12]. Given the quality of existing facts and the constant need for newly added knowledge, KGs are typically incomplete and sparse, which limits their utility in various applications [13,14,15]. To address this challenge, knowledge graph embedding (KGE) models have emerged as a powerful solution [16,17,18,19]. These models learn low-dimensional vector representations, or embeddings, of the relations and entities within KGs. Using these embeddings, KGE models can effectively predict missing links and thereby complete the graph [20].
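As a concrete instance of this idea, the sketch below scores a candidate triplet in the style of a translational model such as TransE [22], where a triplet is plausible if the head embedding plus the relation embedding lands near the tail embedding. The embeddings here are randomly initialized purely for illustration; in a real model they are learned from training data.

```python
# Sketch of TransE-style link scoring: score(h, r, t) = -||e_h + e_r - e_t||.
# Embeddings are random here; in practice they are learned by minimizing
# a margin-based ranking loss over observed vs. corrupted triplets.
import numpy as np

rng = np.random.default_rng(0)
dim = 200
entity_emb = {e: rng.normal(size=dim) for e in ["A", "B", "C"]}
relation_emb = {r: rng.normal(size=dim) for r in ["rel1"]}

def score(head, relation, tail):
    # Higher (less negative) scores indicate more plausible triplets.
    return -np.linalg.norm(entity_emb[head] + relation_emb[relation] - entity_emb[tail])

# Rank candidate tails for the query (A, rel1, ?).
candidates = sorted(entity_emb, key=lambda t: score("A", "rel1", t), reverse=True)
print(candidates)
```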
KGs consist of human-understandable knowledge organized as incomplete graphs, and hence contain undiscovered relations between entities [21]. Because of this incompleteness and sparsity, most KGE models learn representations from insufficient information [22,23,24]. More precisely, these methods complete KGs mainly based on observable correlations between entities and overlook potential semantic correlations. Figure 1 illustrates that even unconnected entities may have probable relationships. Furthermore, interactions within networks are not limited to pairs but rather occur in larger groups [25]: atom-groups in a molecule represent specific chemical functionalities, and user-groups in a social network serve particular sociological purposes [26]. Similarly, triplets in KGs only depict observable entity-pairs, while unobservable correlations reside in entity-groups (e.g., triangles). Entity-groups in KGs serve as semantic patterns, and slight structural variations (e.g., the absence of a triplet) within these patterns intricately shape distinct characteristics. Although many KGE models have achieved significant success, capturing these unobserved relations remains challenging.
Graph convolutional networks (GCNs) offer powerful graph-learning capabilities and can be employed to extract structural characteristics beyond pairwise relations [27,28,29,30]; they are also effective in various other domains (e.g., traffic flow prediction [31]). R-GCN [32] introduced relation-specific transformations to incorporate relational information during neighbor aggregation. Shang et al. considered the influence of relation weights and proposed SACN [33]. Vashishth et al. designed CompGCN [38] to improve relation modeling by jointly embedding entities and relations. HRAN [34] uses heterogeneous relation-attention networks to improve relation modeling and prediction in complex and heterogeneous KGs. All of these works demonstrate the effectiveness of GCNs in capturing higher-order interactions and dependencies between entities, providing valuable insights for relation modeling and prediction tasks. Through their message-passing scheme, GCNs effectively propagate semantic information within KGs, and they perform well in relation modeling and link-prediction tasks, showcasing their ability to handle complex relational data. However, GCNs struggle to aggregate the limited neighborhood information available in sparse KGs; consequently, GCN-based KGE models also have difficulty capturing potential semantic correlations.
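The following sketch illustrates the relation-specific aggregation idea behind R-GCN [32]. The shapes, toy graph, and variable names are illustrative assumptions, not the authors' implementation: each relation has its own weight matrix, and a node averages the transformed features of its incoming neighbors before adding a self-loop term.

```python
# Sketch of R-GCN-style message passing: each relation has its own weight
# matrix, and a node aggregates transformed neighbor features per relation.
import numpy as np

rng = np.random.default_rng(1)
num_entities, num_relations, dim = 5, 2, 4

features = rng.normal(size=(num_entities, dim))     # entity features
W_rel = rng.normal(size=(num_relations, dim, dim))  # one W_r per relation
W_self = rng.normal(size=(dim, dim))                # self-loop transform
edges = [(0, 0, 1), (2, 0, 1), (3, 1, 1)]           # (head, relation, tail)

def rgcn_layer(features):
    messages = np.zeros_like(features)
    counts = np.zeros(num_entities)
    for head, rel, tail in edges:
        messages[tail] += features[head] @ W_rel[rel]  # relation-specific message
        counts[tail] += 1
    counts = np.maximum(counts, 1.0)                   # avoid division by zero
    messages /= counts[:, None]                        # mean over incoming edges
    return np.maximum(messages + features @ W_self, 0.0)  # add self-loop, ReLU

print(rgcn_layer(features).shape)  # (5, 4)
```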
To address this problem, we propose a KGE model, namely semantic-enhanced knowledge graph completion (SE-KGC). The core of SE-KGC is a semantic enhancement module that captures potential correlations through semantic patterns. Because these patterns behave differently across domains, we predefine them by selecting several higher-order structures (i.e., motifs) and then adapting an attention mechanism. The enhancement module operates like a sliding window over the KG, propagating information along underlying relations by mimicking convolution operations. We then apply a multi-relational GCN encoder to learn powerful representations from the enriched local neighborhood, and a scoring decoder to evaluate triplets. In summary, the main contributions of this paper are as follows:
We propose SE-KGC, a knowledge graph embedding model that captures potential semantic correlations between entities.
We develop an entity feature enhancement module that incorporates both semantic and structural information for inherently sparse KGs, which adaptively enriches local neighborhoods for the GCN encoder.
We conduct experiments on several real-world KGs from different domains, demonstrating the effectiveness of SE-KGC. We thoroughly analyze the learned weights of different semantic patterns and find that those with higher connectivity are more vital.
The subsequent sections of this paper are organized as follows. Section 2 reviews previous research relevant to our study. Section 3 introduces some preliminaries. Section 4 elaborates on the details of our proposed model, highlighting its key components and functionalities. Section 5 presents our experimental design, including the experiments conducted and the corresponding results. Finally, Section 6 summarizes our contributions, emphasizing their significance and potential implications.
5. Experiments
We evaluated SE-KGC against multiple baseline methods in the link-prediction task. Furthermore, we performed an extensive analysis of different semantic patterns to gain deeper insights into their effectiveness and contributions.
5.1. Datasets and Baselines
5.1.1. Datasets
To validate the effectiveness of our approach, extensive experiments are conducted on the following benchmark datasets: WN18RR [35], Kinship [42], and Nations [43]. Each dataset is divided into training, validation, and test sets. The statistics of these datasets are presented in Table 2, and further details are provided below.
WN18RR [35]: WN18RR is a subset of WN18, built by removing reversible relations from the WN18 dataset. It contains 40,559 entities and 11 relations.
Kinship [42]: Aboriginal Australian kinship systems are important in traditional Aboriginal cultures as customary laws governing social interactions among kin, particularly marriages. The Alyawarre system from Central Australia comprises 104 entities and 25 relations.
Nations [43]: Nations is a small knowledge graph focusing on countries and their political relations, providing valuable insights into international affairs and diplomatic connections.
5.1.2. Baselines
We evaluate our model with three different combination operations in Equations (4)–(6), denoted as SE-KGC-attention, SE-KGC-replace, and SE-KGC-concat, respectively. We then compare their performance with several methods developed in recent years:
Translational distance models: TransE [22], RotatE [44].
Bilinear matching models: DISTMULT [23], ComplEx [24].
CNN-based models: ConvE [35], ConvR [36], ConvKB [37], LTE-ConvE [45].
GCN-based models: R-GCN [32], SACN [33], CompGCN [38], HRAN [34], KGEL [39], DRGI [40].
By comparing with these existing models, we aim to assess the effectiveness and competitiveness of our proposed approach in knowledge graph representation and link-prediction tasks.
5.2. Experimental Setup
5.2.1. Implementation Details
Each model is trained for 500 epochs using the Adam optimizer with a learning rate of 0.001. Adam is well suited to training knowledge graph models owing to its handling of sparse gradients, scalability, adaptive learning rates, and fast convergence. To prevent overfitting, training stops early once the loss no longer decreases. To accommodate the dataset sizes, balance efficiency against performance, and further mitigate overfitting, we opt for a single GCN layer with a hidden state dimension of 200; aligning this dimension with those of the entity and relation embeddings also eases the integration of information across model components. The same hyperparameters are used across all datasets for consistency. All experiments are conducted on an Intel(R) Xeon(R) Gold 5120 CPU @ 2.20 GHz (Intel Corporation, Santa Clara, CA, USA) and a Tesla V100-SXM2-32 GB GPU (NVIDIA Corporation, Santa Clara, CA, USA).
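A minimal PyTorch sketch of this training setup is given below. The model and the loss are hypothetical stand-ins (the real model and objective are described in Section 4); only the optimizer choice, learning rate, epoch budget, and early-stopping rule come from the text above.

```python
# Sketch of the training configuration: Adam with lr=0.001, up to 500 epochs,
# stopping early when the loss no longer decreases. The model and loss here
# are placeholders, not the actual SE-KGC objective.
import torch

model = torch.nn.Embedding(1000, 200)  # stand-in for the actual KGE model
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

best_loss = float("inf")
for epoch in range(500):
    optimizer.zero_grad()
    loss = model.weight.pow(2).mean()  # placeholder for the real training loss
    loss.backward()
    optimizer.step()
    if loss.item() >= best_loss:       # stop once the loss stops decreasing
        break
    best_loss = loss.item()
```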
5.2.2. Metrics
Consistent with most baseline methods, we evaluate performance using the ranks of triplets, conducting both head and tail evaluations for every correct triplet. Taking the tail evaluation of a triplet as an example, we first construct corrupted triplets by replacing its tail entity with every other entity. The model then predicts the scores of the correct and corrupted triplets using the graph decoder (ConvE), and these scores are sorted in descending order to determine the corresponding ranks. Based on these ranks, we assess performance using several rank-based metrics: Mean Rank (MR), Mean Reciprocal Rank (MRR), and Hits@n, the proportion of correct entities ranked in the top n (with n = 1, 3, 10).
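A small self-contained sketch of this tail-evaluation procedure follows; `score_fn` stands in for the decoder, and the toy scoring rule is arbitrary, used only to make the example runnable.

```python
# Sketch of tail evaluation: corrupt the tail of a test triplet, score all
# candidates, and record the 1-based rank of the true tail. `score_fn`
# stands in for the decoder (ConvE in this paper).
def tail_rank(head, relation, true_tail, entities, score_fn):
    scores = {t: score_fn(head, relation, t) for t in entities}
    # Sort candidates by descending score; rank is the position of the truth.
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking.index(true_tail) + 1

# Toy usage with an arbitrary scoring rule over three entities.
entities = ["A", "B", "C"]
rank = tail_rank("A", "rel1", "B", entities, lambda h, r, t: hash((h, r, t)) % 10)
print(rank)
```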
Given the set of test triplets $S$, let $\mathrm{rank}_i$ denote the predicted rank of the $i$-th triplet; an indicator function determines whether a rank falls within the top $n$. MR quantifies the average rank assigned to each test example, MRR captures how quickly the model identifies the correct answer within the ranking, and Hits@n measures whether the correct triplet appears among the top $n$ predictions. Higher MRR and Hits@n values indicate better performance, while a higher MR indicates poorer performance. The procedure for head evaluation mirrors that for tails, and the final result is obtained by averaging the head and tail evaluations. In real-world KGs, relations between entities are typically bidirectional, and it is not known in advance whether the head or the tail entity is missing; to be useful for link prediction, a model must therefore be able to predict both. As an example, consider a KG comprising entities A, B, and C. Given the correct triplet (A, Relation1, B), head evaluation fixes Relation1 and entity B and asks the model to predict the missing head entity A; accurate prediction indicates that the model has captured Relation1 and the dependencies between A and B. The same principle applies to tail evaluation. By considering head and tail predictions jointly, the model gains a more comprehensive view of the interrelationships among entities, improving its accuracy in predicting missing entities.
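For reference, the standard definitions of these metrics, as commonly used in the KGC literature and consistent with the description above, are:

```latex
\mathrm{MR} = \frac{1}{|S|}\sum_{i=1}^{|S|} \mathrm{rank}_i,\qquad
\mathrm{MRR} = \frac{1}{|S|}\sum_{i=1}^{|S|} \frac{1}{\mathrm{rank}_i},\qquad
\mathrm{Hits@}n = \frac{1}{|S|}\sum_{i=1}^{|S|} \mathbb{1}\!\left[\mathrm{rank}_i \le n\right].
```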
5.3. Link Prediction
The experimental results are shown in Table 3, Table 4 and Table 5. SE-KGC outperforms all other methods on the three datasets. Notably, on the WN18RR dataset, SE-KGC exhibits significant improvements over SACN and CompGCN, with MRR increased by 11.3% and 11.7% and Hits@10 increased by 17.9% and 19.2%, respectively. This effectiveness stems from the semantic enhancement step. In addition, the Hits@1 and MRR of SE-KGC-concat are better than those of SE-KGC-replace; it is worth noting that SE-KGC-replace, which does not use the original features, still performs competitively. Compared to the single-decoder model ConvE, SE-KGC demonstrates significant improvements across all metrics, confirming the effectiveness of our encoder and the semantic enhancement approach. Overall, these results show that SE-KGC produces effective embeddings for entities and relations in link-prediction tasks, that semantic enhancement improves performance, and that attending to potential semantic correlations between entities is necessary.
We also examine the performance of SE-KGC on two small datasets (Kinship and Nations). On Kinship, SE-KGC improves Hits@1 by 1.6% over CompGCN. On Nations, SE-KGC achieves larger gains over CompGCN on Hits@3 and Hits@1, of 7% and 7.9%, respectively. The small size of these two datasets limits the attainable improvement. Nevertheless, the MRR of SE-KGC-attention is 24.4% higher than that of ConvE, which confirms the effectiveness of our encoder and shows that the neighborhood information and semantic enhancement aggregated by SE-KGC-attention are valuable.
The best results on WN18RR are obtained with the replace and concat operations, as they effectively preserve the enhanced and initial features. In contrast to WN18RR, Kinship and Nations perform better with SE-KGC-attention than with SE-KGC-replace; we contend that this is because the trainable attention weights of SE-KGC-attention fit small datasets easily.
5.4. Analysis of Semantic Patterns
As shown in Figure 3, SE-KGC-replace and SE-KGC-concat exhibit similar distributions of the learned weights over the six motifs on the WN18RR dataset. The same two motifs occupy the two highest proportions in both SE-KGC variants, which can be explained by the semantic patterns encoded in their structures: these two motifs connect every entity to all of the other entities. The semantic pattern behind such structures implies that entities sharing the same neighbors are also likely to be related. This observation highlights the semantic patterns ingrained within the structures of these motifs, reinforcing the importance of considering such patterns in link-prediction tasks.
The distribution for the SE-KGC-attention model can be seen in Figure 4. Among the third-order motifs, the more densely connected motif holds greater significance, and the same ordering appears within each pair of fourth-order motifs. This observation can be attributed to the semantic patterns determined by the motifs: entities with higher connectivity tend to play a more crucial role in characterizing semantic information. These results demonstrate the effectiveness of motif-guided semantic enhancement, with different motif structures standing for different semantic patterns.
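To illustrate how motif-based semantic patterns can be weighted, the sketch below counts triangle (third-order motif) participation per entity and combines several such motif features with softmax attention weights. This is a simplified stand-in for the paper's enhancement module, with all names, the toy graph, and the attention values assumed for illustration.

```python
# Illustrative sketch: per-entity motif-count features combined by softmax
# attention weights, mimicking (in simplified form) motif-guided enhancement.
import numpy as np

edges = {(0, 1), (1, 2), (0, 2), (2, 3)}            # undirected entity pairs
num_entities = 4

def triangle_counts(edges, n):
    # Count, for each entity, the triangles (fully connected 3-motifs) it joins.
    counts = np.zeros(n)
    for a in range(n):
        for b in range(a + 1, n):
            for c in range(b + 1, n):
                if {(a, b), (a, c), (b, c)} <= edges:
                    counts[[a, b, c]] += 1
    return counts

# Stack motif features: triangles plus plain degree as a trivial second "motif".
degree = np.zeros(num_entities)
for a, b in edges:
    degree[[a, b]] += 1
motif_feats = np.stack([triangle_counts(edges, num_entities), degree], axis=1)

attn_logits = np.array([1.5, 0.2])                   # learned in the real model
attn = np.exp(attn_logits) / np.exp(attn_logits).sum()
enhanced = motif_feats @ attn                         # attention-weighted combination
print(enhanced)
```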
5.5. Parameter Sensitivity
We conducted extensive experiments on the Nations dataset to investigate the sensitivity of four important parameters: the learning rate and the settings of attention dropout, hidden dropout, and GCN dropout. By systematically varying these parameters and analyzing their effects on the experimental outcomes, we aimed to gain insight into their influence and identify the optimal configuration for our model.
5.5.1. The Value of Learning Rate
The learning rate is a crucial hyperparameter that determines the step size for updating model parameters during training; it directly influences how quickly the model converges when minimizing the loss via gradient descent. Setting an appropriate learning rate is essential for both accuracy and convergence speed. We conducted experiments with learning rates of 0.01, 0.001, and 0.0001 to ascertain the most suitable value; the results are shown in Figure 5a. Our model achieved the best results with a learning rate of 0.001. When the learning rate is too small, the model converges sluggishly; conversely, with a large learning rate, the model may fail to converge or may converge to a suboptimal solution.
5.5.2. The Settings of Dropout
Attention dropout, GCN dropout, and hidden dropout are regularization techniques widely employed in deep learning to mitigate overfitting and improve generalization. Attention dropout zeroes a proportion of attention weights during training, reducing the model's reliance on specific attention areas. Hidden dropout randomly drops neurons during training, reducing dependence on specific features and promoting robustness by encouraging the remaining neurons to contribute collectively to the learning process. GCN dropout operates on the adjacency matrix of the graph during training, reducing dependence on particular nodes or edges and encouraging the model to consider a wider range of node and edge relationships. The experimental results, depicted in Figure 5b–d, confirm the effectiveness of these techniques, with the optimal results obtained at the att_dropout, GCN_dropout, and hid_dropout settings indicated there.
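A minimal PyTorch sketch of where these three dropout types could act in a model like ours is shown below; the module structure and rates are illustrative assumptions, since the tuned values are those reported in Figure 5.

```python
# Sketch of the three dropout sites: on attention weights, on the graph
# structure (edges), and on hidden representations. Rates are placeholders.
import torch

att_dropout = torch.nn.Dropout(p=0.1)   # applied to attention weights
hid_dropout = torch.nn.Dropout(p=0.3)   # applied to hidden entity features

def gcn_dropout(edge_index, p=0.2):
    # Randomly drop edges from the (2, num_edges) index tensor during training.
    keep = torch.rand(edge_index.size(1)) >= p
    return edge_index[:, keep]

attn = torch.softmax(torch.randn(4, 4), dim=-1)
attn = att_dropout(attn)                 # zero out some attention weights
hidden = hid_dropout(torch.randn(4, 200))
edges = gcn_dropout(torch.tensor([[0, 1, 2], [1, 2, 3]]))
```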
6. Conclusions
We have proposed a KGE model, namely SE-KGC, that captures diverse semantic information within knowledge graphs. The key idea is to perform semantic enhancement under different semantic patterns, predefined by a group of typical higher-order structures that incorporate human-understandable knowledge. This enhancement not only captures potential semantic correlations between entities but also enriches the local neighborhood for our GCN encoder; a decoder then evaluates triplet scores for link-prediction tasks. Experimental results demonstrate that SE-KGC outperforms other models on three datasets, and we comprehensively analyze the contributions of the different semantic patterns.
Capturing semantic information in KGs is crucial for advancing various information-retrieval applications. Although large language models have achieved significant success in semantic understanding, their underlying mechanisms remain opaque and difficult to interpret, so exploring human-understandable higher-order semantic patterns holds significant value for model transparency. In this paper, we formulated such semantic patterns for knowledge graph completion and achieved significant improvements. Follow-up research could pay more attention to higher-order complex semantic information, which should benefit practical applications in fields such as graph mining and data science.