1. Introduction
Concepts, an important part of an ontology, together with instances constitute the entities in knowledge bases such as WordNet [1], YAGO [2], and Freebase [3]. Knowledge Graph Embedding (KGE) encodes entities and relations into continuous low-dimensional vector spaces while capturing their semantics and preserving the inherent structure of the Knowledge Graph (KG). Compared to traditional knowledge graph embeddings, KGE methods with concept embeddings offer significant improvements in semantic understanding, generalization, and integration with language models, making them highly effective for downstream applications such as relation extraction [4], question answering [5], dialogue agents [6], and concept-enhanced pre-trained models [7].
Some studies have achieved promising performance by differentiating and jointly embedding concepts and instances. Depending on the representation form of concepts, these methods can be classified into vectorial methods (e.g., JOIE [8]) and geometric methods (e.g., TransC [9]). Vectorial methods map both instances and concepts into vectors; this uniform representation makes it difficult for models to distinguish their different characteristics. In comparison, geometric methods represent each concept as a geometric region in a high-dimensional space and each instance as a vector, explicitly distinguishing the two. Furthermore, geometric methods exploit the relative positions between embedding representations to model isA relations (instanceOf and subClassOf), thus providing an interpretation that aligns with human intuition while maintaining the transitivity of isA relations.
Nevertheless, existing geometric methods do not explicitly model the unique hierarchical tree structure of concepts in the embedding space. This structure naturally forms through the subClassOf relation between concepts and their sub-concepts (or super-concepts), as illustrated on the left of Figure 1. The concept tree structure effectively demonstrates the hierarchical relationships between concepts: higher-level concepts are typically more abstract and general, while lower-level concepts are more specific and detailed. Moreover, a concept can be reified by all of its actual or potential instances [9]. The instanceOf relation between instances and different levels of concepts in a concept tree reflects the hierarchy of classification and the scope of concepts (i.e., higher-level concepts contain more instances than lower-level concepts). Therefore, establishing a hierarchical tree structure in vector space can link the symbolic and vector representations in knowledge graphs. This provides an intuitive explanation for concept reasoning and the transitivity of isA relations, and aids in positioning concept and instance representations within the semantic space.
Inspired by JoSH [10] and geometric methods, and to preserve the hierarchical tree structure between concepts and perform logical inference in a vector space, we represent each concept embedding as a hyperspherical cone region and each instance as a vector point in a high-dimensional unit spherical space. Specifically, a hyperspherical cone region is determined by a concept central vector serving as the axis and a half-vertex angle. As shown in Figure 1b, higher-level concepts, represented as larger hyperspherical cone regions, iteratively encompass their sub-concepts, which are represented as smaller hyperspherical cone regions, thus illustrating the hierarchical structure of concepts in the embedding space. Starting from the root vector and connecting the central vectors of these concept regions layer by layer reconstructs the tree structure of concepts in the knowledge graph in Figure 1a.
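To make the geometry concrete, the following minimal sketch (our own illustration, not code from the paper) checks cone membership and cone containment with NumPy. The function names and the boolean form are assumptions for illustration; the underlying tests—comparing the angle between an instance vector and the concept axis with the half-vertex angle, and checking that a sub-cone fits inside a super-cone—follow the construction described above.

```python
import numpy as np

def in_concept_cone(instance: np.ndarray, axis: np.ndarray, half_angle: float) -> bool:
    """Check whether a unit instance vector lies inside a hyperspherical cone.

    The cone is defined by a unit central (axis) vector and a half-vertex angle;
    an instance belongs to the concept region if the angle between the instance
    vector and the axis does not exceed the half-vertex angle.
    """
    # Normalize defensively so the arccos argument stays in [-1, 1].
    instance = instance / np.linalg.norm(instance)
    axis = axis / np.linalg.norm(axis)
    angle = np.arccos(np.clip(np.dot(instance, axis), -1.0, 1.0))
    return angle <= half_angle

def cone_contains_cone(axis_super, angle_super, axis_sub, angle_sub) -> bool:
    """A sub-concept cone is contained in its super-concept cone when the angle
    between their axes plus the sub-concept's half-angle does not exceed the
    super-concept's half-angle; this is what preserves subClassOf transitivity."""
    axis_super = axis_super / np.linalg.norm(axis_super)
    axis_sub = axis_sub / np.linalg.norm(axis_sub)
    between = np.arccos(np.clip(np.dot(axis_super, axis_sub), -1.0, 1.0))
    return between + angle_sub <= angle_super
```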
In contrast to existing geometric methods, HCCE models the anisotropy of concept embeddings by allowing the lengths of the spherical cap sections of the hyperspherical cone to vary across dimensions. Spherical regions [9] ignore this anisotropy. Compared to the cuboid and ellipsoid [11] and the box [12], HCCE requires fewer parameters and simpler calculations. Compared to hyperbolic embeddings for modeling hierarchical structures [13], modeling high-dimensional spherical cones in Euclidean space makes the model easier to understand and to apply to downstream tasks while reducing computational complexity. Unlike existing conical modeling methods [14], HCCE represents each concept as a high-dimensional spherical cone region composed of all dimensions, which not only strengthens the connections between dimensions but also provides better interpretability and simpler computation.
Models that do not differentiate between concepts and instances can only embed entities in a single embedding space. Distinguishing instances from concepts, however, provides the option of embedding them in two different vector spaces. Following JOIE [8] and DGS [15], we not only explore embedding concepts and instances in the same space (HCCE-SS) but also model them in different spaces to capture their distinct structures (HCCE-DS). In our models, the primary distinction between these two variants lies in the modeling of the instanceOf relation. In the same-space setting, the instanceOf relation is modeled by requiring instances to fall within the concept region. In the different-space setting, we first use non-linear transformations to map instances from the instance embedding space to the concept embedding space, and then determine whether the transformed instance vector falls within the concept region.
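As a rough illustration of how the two variants differ, the sketch below scores an instanceOf triple by how far the (possibly mapped) instance vector falls outside the concept cone. The single tanh layer standing in for the non-linear transformation of HCCE-DS and the hinge-style penalty are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def angle(u, v):
    """Angle between two vectors after normalization."""
    u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def score_instanceof_ss(instance, concept_axis, half_angle):
    # Same-space variant: penalize the amount by which the instance vector
    # lies outside the concept cone (zero when it already falls inside).
    return max(0.0, angle(instance, concept_axis) - half_angle)

def score_instanceof_ds(instance, concept_axis, half_angle, W, b):
    # Different-space variant: first map the instance into the concept space
    # with a non-linear transformation (here a single tanh layer as a stand-in),
    # then apply the same containment test.
    mapped = np.tanh(W @ instance + b)
    return max(0.0, angle(mapped, concept_axis) - half_angle)
```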
Additionally, based on the tree structure of concepts, two special types of concept pairs naturally occur in knowledge graphs. To the best of our knowledge, we are the first to propose and define these concept pairs, as follows: (1) Overlapping concept pairs: the two concepts share at least one instance, but no subClassOf relation holds between them. For example, in Figure 1 the concepts “American artists” and “American television writers” share the instance “Alice”. (2) Disjoint concept pairs: the two concepts belong to the same super-concept, but they are not related by subClassOf and share no instances. In Figure 1, the concepts “Chinese artists” and “American artists” belong to the same super-concept “artist”. These two types of concept pairs can be seen as intrinsic enhancement samples for concepts: they accelerate the formation of concept regions, improve concept representation, and enhance performance on the subClassOf relation. However, existing geometric methods do not account for these two kinds of concept pairs.
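The following sketch shows one way the relative positions of two cones could be turned into score functions for these pairs: overlapping pairs are pushed to intersect without one cone swallowing the other, while disjoint pairs are pushed apart. The specific hinge forms and function names are our own illustrative assumptions; the actual score functions are defined in the method section.

```python
import numpy as np

def _angle(u, v):
    u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def score_overlapping_pair(axis_i, half_i, axis_j, half_j):
    """Encourage two concept cones to intersect without one containing the other,
    i.e. |half_i - half_j| < angle(axes) < half_i + half_j."""
    between = _angle(axis_i, axis_j)
    not_intersecting = max(0.0, between - (half_i + half_j))  # cones too far apart
    nested = max(0.0, abs(half_i - half_j) - between)         # one cone inside the other
    return not_intersecting + nested

def score_disjoint_pair(axis_i, half_i, axis_j, half_j):
    """Encourage two cones under the same super-concept not to intersect,
    i.e. the angle between their axes should exceed the sum of half-angles."""
    between = _angle(axis_i, axis_j)
    return max(0.0, (half_i + half_j) - between)
```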
In this paper, we propose a novel concept representation method, HCCE, in which each concept is encoded as a hyperspherical cone and each instance as a vector point. We explicitly model the hierarchical tree structure of concepts in the embedding space while preserving the transitivity of isA relations, as geometric methods do. Additionally, we utilize relative positions to design score functions for overlapping concept pairs and disjoint concept pairs, thus enhancing the model's ability to capture relationships between concepts and improving concept representations. We explore the impact of spatial configuration on HCCE by modeling concepts and instances in both the same space (SS) and different spaces (DS). Finally, we perform exhaustive empirical experiments on three benchmark datasets, which demonstrate the superiority of HCCE. We use t-SNE visualizations to show how our models embed the concept tree structure, concepts, and instances in the embedding space. Additionally, by illustrating the relationship between the size of the concept embedding regions and the number of concepts and instances, we validate the model's rationality and effectiveness, providing a basis for geometric modeling methodologies.
Our contributions can be summarized as follows:
To the best of our knowledge, our method, HCCE, is the first to explicitly model the hierarchical tree structure of concepts by representing each concept as a hyperspherical cone region in the embedding space.
We propose two methods, HCCE-SS and HCCE-DS, to explore the impact of embedding instances and concepts in the same or different embedding spaces.
Based on the tree structure of concepts, we identify two types of concept pairs—overlapping concept pairs and disjoint concept pairs—and design two score functions for them by utilizing relative positions to enhance the representation of concept embeddings.
Through exhaustive experiments on three benchmark datasets, HCCE achieves the best performance on the subClassOf relation and competitive performance on instance-related relations.
5. Experimental Results and Discussion
In this section, we first evaluate the performance of our methods on two standard tasks for Knowledge Graph Embedding (KGE): link prediction [23] and triple classification [38]. Next, we assess the impact of overlapping and disjoint concept pairs through ablation experiments. Then, we present visualizations of concept embeddings and their corresponding instance embeddings to illustrate the hierarchical tree structure of concepts in the embedding space. Additionally, we analyze the distribution of the half-vertex angles to validate the rationality of our models. Finally, we discuss the theoretical implications, practical implications, and limitations of our methods.
5.1. Triple Classification
Triple classification is a binary classification task that judges whether a given triple is correct or not. We conduct classification experiments on three datasets: YAGO39K, M-YAGO39K, and DB99K-242. We train on three types of triples—instance triples, instanceOf triples, and subClassOf triples—simultaneously and report their performance separately.
Evaluation protocol
Following most previous works [9,38], we employ micro-averaged evaluation on four metrics—Accuracy, Precision, Recall, and F1-Score—for the triple classification task. This approach captures the overall effectiveness of the embedding model across all relations without being biased by class size on the three datasets YAGO39K, M-YAGO39K, and DB99K-242. The decision rule for classification is simple: we set a relation-specific threshold for each relation, covering the instance relations, the instanceOf relation, and the subClassOf relation. For a triple, if its score is below the threshold of its relation, the triple is predicted as positive; otherwise, it is predicted as negative. Each relation-specific threshold is determined by maximizing classification accuracy on the validation set. The optimal configurations, likewise determined by validation accuracy, are shown in Table 3.
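A minimal sketch of this decision rule is given below, assuming a dictionary of per-relation validation scores and gold labels; the helper names are hypothetical, but the logic—sweep candidate thresholds, keep the one that maximizes validation accuracy, and classify a triple as positive when its score falls below that threshold—mirrors the protocol described above.

```python
import numpy as np

def fit_thresholds(valid_scores, valid_labels):
    """Pick one decision threshold per relation by maximizing validation accuracy.

    valid_scores: dict relation -> np.ndarray of triple scores (lower = more plausible)
    valid_labels: dict relation -> np.ndarray of {0, 1} gold labels (1 = positive)
    """
    thresholds = {}
    for rel, scores in valid_scores.items():
        labels = valid_labels[rel]
        best_acc, best_thr = -1.0, None
        # Candidate thresholds: the distinct score values observed on the validation set.
        for thr in np.unique(scores):
            pred = (scores < thr).astype(int)   # below threshold -> predicted positive
            acc = (pred == labels).mean()
            if acc > best_acc:
                best_acc, best_thr = acc, thr
        thresholds[rel] = best_thr
    return thresholds

def classify(score, threshold):
    return score < threshold  # True = predicted positive triple
```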
Result
Table 4 shows the experimental results of subClassOf triple classification on the YAGO39K and M-YAGO39K datasets. Compared with recent competitive models, HCCE-SS and HCCE-DS demonstrate clear improvements in almost all metrics on both datasets, which indicates that using hyperspherical cones to model concepts enhances the representation of concept embeddings. EIKE-UNP-EYE and TransEllipsoid achieve the highest recall and precision, respectively, but at a significant cost to the complementary metric. This likely stems from the design of each model: TransEllipsoid restricts concept regions more tightly, while EIKE-UNP-EYE allows broader concept regions. Specifically, compared to our method (HCCE-DS), EIKE-UNP-EYE shows a slight improvement in recall (1% on YAGO39K and 0.85% on M-YAGO39K) but a substantial decrease in precision (12.23% lower on YAGO39K and 12.59% lower on M-YAGO39K). Similarly, TransEllipsoid achieves higher precision than HCCE-DS (1.1% higher on YAGO39K and 2.23% higher on M-YAGO39K) but exhibits a notable decline in recall (7.9% lower on YAGO39K and 9.48% lower on M-YAGO39K). As a result, HCCE-DS achieves higher overall F1-Scores, surpassing TransEllipsoid by 2.83% on YAGO39K and 2.9% on M-YAGO39K, and outperforming EIKE-UNP-EYE by 7.64% on YAGO39K and 7.55% on M-YAGO39K. These results suggest that HCCE effectively balances the classification metrics and thereby improves subClassOf triple classification. The superior performance of HCCE-DS compared to HCCE-SS indicates that modeling concepts separately in the concept embedding space helps capture the hierarchical tree structure of concepts, thereby improving the effectiveness of concept embeddings. The improved performance of HCCE on M-YAGO39K compared to YAGO39K indicates that HCCE can effectively model the transitivity of the subClassOf relation.
Table 5 displays the experimental results of instanceOf triple classification on the YAGO39K and M-YAGO39K datasets. On YAGO39K, TransEllipsoid and EIKE-UNP-EYE outperform HCCE-SS and HCCE-DS, potentially because those models have more parameters: according to Table 1 and Table 2, TransEllipsoid and EIKE-UNP-EYE have nearly 50% more parameters than HCCE-SS. Compared to TransC, which has a similar number of parameters, HCCE-SS achieves better results on all four metrics. On M-YAGO39K, HCCE-SS outperforms all baselines on almost all metrics, indicating that using hyperspherical cones for concept modeling enhances performance on the instanceOf relation. Similar to the results for the subClassOf relation, EIKE-UNP-EYE performs slightly better than HCCE-SS in terms of precision but falls short on the other three metrics, which further demonstrates that HCCE effectively balances the classification metrics. These results therefore suggest that HCCE improves instanceOf triple classification while significantly reducing the parameter count. Moreover, on both datasets, HCCE-SS outperforms HCCE-DS. We speculate that the mapping functions struggle to accurately place every instance inside its corresponding concept region, so modeling instances and concepts in the same space avoids errors introduced by the spatial mapping and thereby benefits the modeling of the instanceOf relation.
Table 6 shows the experimental results of subClassOf and instanceOf triple classification on the DB99K-242 dataset. On this dataset, HCCE achieves only competitive results across all metrics, without the outstanding performance seen on M-YAGO39K and YAGO39K. This may be because DB99K-242 has fewer concepts, fewer subClassOf and instanceOf triples, and a lower ratio of concepts to instances than the other two datasets. As a result, the advantages of HCCE in concept modeling are not fully realized, and HCCE may overfit the structure of the instances due to their larger quantity. This is also reflected in the instance triple classification experiments.
Table 7 shows the instance triple classification results on the YAGO39K and DB99K-242 datasets. On YAGO39K, HCCE achieves competitive results on all metrics compared to state-of-the-art models. Compared with TransE, HCCE-SS yields better results on all metrics, illustrating that modeling the instanceOf and subClassOf relations via relative positions can indirectly benefit instance embedding. Compared to TransEllipsoid and EIKE-UNP-EYE, HCCE-SS also achieves better results; together with the performance on the instanceOf and subClassOf relations, this suggests that hyperspherical cones capture the spatial distribution of concepts more accurately than ellipsoids, improving instance embedding through the instanceOf relation. TransFG and JECI achieve better performance than HCCE-SS and HCCE-DS because TransFG enhances instance modeling beyond TransE, while JECI does not consider subClassOf relations. On DB99K-242, HCCE-SS and HCCE-DS outperform the other models on three metrics, which echoes the previous results of HCCE on subClassOf and instanceOf triple classification.
Additionally, we observe that HCCE-DS and HCCE-SS perform similarly across the three datasets with slight differences; for example, HCCE-DS performs better on subClassOf, while HCCE-SS excels on instanceOf, indicating that different space configurations affect the performance of HCCE.
5.2. Instance Link Prediction
Link prediction [65] aims to predict the missing head entity or tail entity of a triple. For each test instance triple, we follow the method proposed in [66]: we replace the head or tail instance with instances from the instance set to construct corrupted triples, and then rank these triples in descending order of the scores given by the score function in Equation (9). We conduct link prediction experiments only for instance triples on the YAGO39K and DB99K-242 datasets, because the instanceOf and subClassOf relations are not suitable for this task [11].
Evaluation protocol Following most previous works [9,65], we adopt two evaluation metrics: (1) the mean reciprocal rank of all correct instances (MRR); and (2) the proportion of correct instances ranked no larger than N (Hits@N). Higher MRR and Hits@N values indicate better performance. Note that a corrupted triple may already exist in the knowledge graph, in which case it should be regarded as a correct prediction. The “Raw” and “Filter” settings differ in whether the impact of such already-existing corrupted triples is taken into account.
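For reference, the sketch below computes filtered MRR and Hits@N for tail prediction under these conventions (head prediction is symmetric). It assumes a generic score_fn where lower scores mean more plausible triples, and it is an illustration of the protocol rather than the evaluation code used in our experiments.

```python
import numpy as np

def rank_metrics(test_triples, all_true, score_fn, entities, hits_at=(1, 3, 10)):
    """Filtered MRR and Hits@N for tail prediction.

    test_triples: iterable of (h, r, t)
    all_true:     set of every (h, r, t) known in the KG (train/valid/test); the
                  'Filter' setting skips corrupted triples that are actually true
    score_fn:     callable (h, r, t) -> score, lower meaning more plausible
    entities:     list of candidate tail entities
    """
    reciprocal_ranks, hits = [], {n: 0 for n in hits_at}
    for h, r, t in test_triples:
        true_score = score_fn(h, r, t)
        # Rank = 1 + number of corrupted candidates scoring strictly better.
        rank = 1
        for e in entities:
            if e == t or (h, r, e) in all_true:   # 'Filter': ignore other true triples
                continue
            if score_fn(h, r, e) < true_score:
                rank += 1
        reciprocal_ranks.append(1.0 / rank)
        for n in hits_at:
            hits[n] += int(rank <= n)
    mrr = float(np.mean(reciprocal_ranks))
    return mrr, {n: hits[n] / len(reciprocal_ranks) for n in hits_at}
```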
Result Table 8 shows the experimental results of link prediction on the YAGO39K and DB99K-242 datasets. On YAGO39K, TransEllipsoid and EIKE perform better than HCCE, likely because these models use more parameters to enhance performance. However, with the same number of parameters, HCCE-SS outperforms TransC, which indicates that using hyperspherical cones to model concept embeddings is more effective than spheres for improving embedding quality and instance inference. DistMult achieves the best performance in MRR, possibly because we determine the best configurations based solely on Hits@10, which may lead to a lower MRR. The performance of HCCE-DS is better than that of HCCE-SS, which may indicate that an independent embedding space for instances is beneficial for instance inference. On DB99K-242, the performance of HCCE is similar to that on YAGO39K, except that HCCE-DS achieves the best performance in Hits@10.
5.3. Ablation
In this section, to verify the effectiveness of disjoint concept pairs and overlapping concept pairs, we implement three sets of experiments by separately removing the overlapping concept pairs, the disjoint concept pairs, and both, on the YAGO39K and M-YAGO39K datasets.
Table 9 shows the results. (1) When we exclude overlapping concept pairs, the performance on subClassOf and instanceOf shows a slight decline. This shows that overlapping concept pairs are important and promote the effectiveness of jointly embedding concepts and instances. (2) When we exclude disjoint concept pairs, the performance degrades again, with a particularly significant drop for the subClassOf relation (more than 10% in F1-Score). This may be because the entities involved in disjoint concept pairs are all concepts, so incorporating them mainly benefits the subClassOf relation. Additionally, incorporating disjoint concept pairs allows the model to separate multiple concept regions that belong to the same super-concept in the embedding space, which helps the model discriminate the subClassOf and instanceOf triples related to these concepts. (3) When we simultaneously exclude overlapping concept pairs and disjoint concept pairs, this configuration performs worst among all settings, which further confirms the effectiveness of overlapping and disjoint concept pairs in improving the performance of jointly embedding concepts and instances.
5.4. Tree Structure Visualization
To understand how concepts and instances are distributed in the embedding space and how the hierarchical tree structure of concepts is modeled, we apply t-SNE [67] on YAGO39K to visualize the embedding space in Figure 3. We use t-SNE to depict both the instance embeddings and the concept embeddings (central vectors). For HCCE-DS, each instance is mapped to an embedding point in the concept vector space.
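A simplified sketch of this visualization pipeline is shown below, assuming scikit-learn's TSNE and matplotlib; the argument names (instance vectors, concept central vectors, and a parent map for drawing super-concept edges) are illustrative rather than the exact plotting code used for Figure 3.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_concept_tree(instance_vecs, instance_concept, concept_axes, parent):
    """Project instance vectors and concept central vectors to 2-D with t-SNE
    and draw super-concept -> sub-concept edges to reconstruct the tree."""
    names = list(concept_axes)
    X = np.vstack([instance_vecs] + [concept_axes[c] for c in names])
    Y = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)
    inst2d, conc2d = Y[:len(instance_vecs)], Y[len(instance_vecs):]
    pos = dict(zip(names, conc2d))

    # Instances as small colored points, concepts as stars, tree edges in gray.
    plt.scatter(inst2d[:, 0], inst2d[:, 1], s=4, c=instance_concept, cmap="tab20")
    for c in names:
        plt.scatter(*pos[c], marker="*", s=120)
        if parent.get(c) in pos:                     # edge to the super-concept
            plt.plot(*zip(pos[c], pos[parent[c]]), lw=0.5, color="gray")
    plt.show()
```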
In Figure 3(a3,b3), we observe the following: (1) Instances surround their corresponding concepts. In the visualizations, each star representing a concept is surrounded by points of the same color, which represent its instances. (2) Sub-concepts surround their super-concepts, forming a concept tree structure. To clearly illustrate this structure, we connect super-concepts and sub-concepts layer by layer from the “ROOT” concept, reconstructing the hierarchical tree structure of concepts in the knowledge graph. Furthermore, sub-concepts belonging to the same super-concept lie closer together, for example, the sub-concepts of “party” and of “hockey team”. To clarify the distribution of these two concepts along with their respective sub-concepts and instances, we separately depict their two subtree structures, as shown in Figure 3(a1,a2,b1,b2). From these visualizations, we observe the following: (1) Instances belonging to the super-concepts fill the entire space, with black dots spread throughout the sub-figure (e.g., Figure 3(a1)). (2) Instances belonging to different sub-concepts form distinct clusters. In Figure 3(a1), dots of three different colors are concentrated in the upper-left, upper-right, and lower-left parts of the sub-figure. These visualizations demonstrate that HCCE can distinguish and model different sub-concepts belonging to the same super-concept. Importantly, as anticipated, both HCCE-SS and HCCE-DS yield similar t-SNE visualization results, affirming that HCCE can universally model concepts and instances in either the same or different spaces.
An interesting observation is that some sub-concepts belonging to different super-concepts are embedded close together, e.g., “American artists” under “artist” and “American television writers” under “writer”. For the concepts “American artists” and “American television writers”, we can easily identify their commonality: both are “American”. Furthermore, in the YAGO39K dataset, these two concepts neither share common instances nor have the same super-concept (except “ROOT”). This shows that HCCE not only models the concept tree structure but also captures the semantic correlations among concepts by jointly training concept, instance, and relation embeddings.
5.5. Overlapping Concept Pair Visualization
To demonstrate the impact of overlapping concept pairs on Knowledge Graph Embedding, we employ t-SNE to illustrate their distributions in the embedding space, as depicted in Figure 4. For HCCE-DS, each instance is mapped to an embedding point in the concept vector space. According to the definition of overlapping concept pairs in the introduction, we mark instances shared by the two concepts with ×. By observing the distribution of these instances, we examine the overlap between the concept regions.
In each sub-figure, instance vectors form two independent clusters according to the concepts to which they belong, and the instance vectors belonging to both concepts are located at the intersection of these clusters. This indicates that the regions of the two concepts are independent but partially overlap due to the shared instances, confirming the model design described in Section 3.2.3. Additionally, in some sub-figures (Figure 4(a2,b2)), the clusters of stars and instance points of the same color are widely separated, which seems unreasonable at first glance. However, the instance vectors marked with × should be considered together with the instance vectors of the same color as the concept, since they all belong to that concept. When viewed together, the concept is located within the cluster formed by these instances, which aligns with our expectations. Furthermore, the visualizations of HCCE-SS and HCCE-DS exhibit similar distributions, suggesting that HCCE can universally embed concepts and instances in either the same or different spaces.
5.6. Analysis of the Half-Vertex Angle
In the preceding visualization sections, we illustrated the spatial distribution of concept vectors. However, accurately representing the half-vertex angles of high-dimensional cones in lower-dimensional projections poses challenges. Therefore, we depict the frequency distribution of all concept half-vertex angles, as shown in Figure 5. Here, we partition the entire range of angles into 1000 equal intervals based on the maximum and minimum angles. For HCCE-SS, the half-vertex angles of concepts are concentrated between 0.3 and 1.1. HCCE-DS, on the other hand, exhibits an approximately Gaussian distribution, with half-vertex angles predominantly falling between 0.3 and 0.8. In both spatial scenarios, concepts with very small or very large angles are relatively rare. We attribute small angles to a limited number of instances or to fine granularity (concepts closer to leaf nodes in the concept tree). Conversely, concepts with large angles are likely to be closer to the root node of the concept tree (and therefore inherently less frequent), and, due to spatial constraints, it is reasonable for such concepts to be few.
To further validate our hypothesis, we compute the average number of instances of the concepts within each angle interval, normalized by the number of concepts in that interval, as shown in Figure 6. In both HCCE-SS and HCCE-DS, concepts encompassing more instances tend to have larger half-vertex angles, indicating a broader conceptual domain. This observation aligns with our hypothesis and modeling assumptions, as well as human intuition.
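The statistics behind Figures 5 and 6 can be reproduced with a short sketch like the one below, which bins the half-vertex angles into 1000 equal intervals, counts concepts per interval, and averages the per-concept instance counts within each interval; the function name and plotting details are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def angle_statistics(half_angles, instances_per_concept, n_bins=1000):
    """Frequency distribution of concept half-vertex angles and, per interval,
    the average number of instances of the concepts falling in that interval."""
    half_angles = np.asarray(half_angles, dtype=float)
    counts = np.asarray(instances_per_concept, dtype=float)
    edges = np.linspace(half_angles.min(), half_angles.max(), n_bins + 1)
    which = np.clip(np.digitize(half_angles, edges) - 1, 0, n_bins - 1)

    freq = np.bincount(which, minlength=n_bins)                      # Figure-5-style histogram
    inst_sum = np.bincount(which, weights=counts, minlength=n_bins)
    avg_inst = np.divide(inst_sum, freq, out=np.zeros(n_bins), where=freq > 0)  # Figure-6-style curve

    centers = (edges[:-1] + edges[1:]) / 2
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))
    ax1.plot(centers, freq)
    ax1.set(xlabel="half-vertex angle", ylabel="number of concepts")
    ax2.plot(centers, avg_inst)
    ax2.set(xlabel="half-vertex angle", ylabel="avg. instances per concept")
    plt.tight_layout()
    plt.show()
```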
5.7. Theoretical and Practical Implications
In this section, we provide a scientifically detailed analysis of the theoretical and practical implications of our HCCE model, which is supported by mathematical modeling and experimental evidence.
From a theoretical perspective, our model addresses a gap in knowledge graph embedding by introducing high-dimensional spherical cones to represent concept embeddings, enabling the modeling of hierarchical and relational structures such as instanceOf, subClassOf, and instance relations. As detailed in Section 3, our mathematical framework produces anisotropic embeddings that capture both hierarchy and relationships within knowledge graphs. This approach allows for a refined representation of overlapping and disjoint concept pairs, providing a more nuanced geometry than traditional embeddings. By leveraging these high-dimensional cones, HCCE encapsulates hierarchical tree structures directly within the embedding space, making it, to our knowledge, the first KGE model to achieve this level of hierarchy-preserving detail.
Experimental results in Section 5.1 validate the efficacy of HCCE, demonstrating that our model maintains anisotropy while significantly reducing the number of parameters, thereby offering a compact yet expressive representation of concepts. Furthermore, HCCE significantly outperforms prior models on concept-related triples and achieves competitive performance on instance-related triples. Experimental results in Section 5.6 demonstrate the validity of our design and provide evidence for using geometric regions to model concept embeddings.
From a practical perspective, our method has far-reaching implications in various applications. HCCE’s explicit geometric representation of hierarchical and relational structures enhances knowledge graph construction by enabling precise, hierarchy-aware concept embeddings. Furthermore, this concept-based approach can benefit concept-based question answering systems, where accurate hierarchical and relational inferences are essential. The model’s capability to represent structured knowledge makes it a valuable asset for large language models, providing them with concept-aware knowledge that improves both generalization and interpretability.
In summary, our HCCE model advances the theoretical understanding of hierarchical embeddings in KGEs and offers practical tools for applications that rely on structured, concept-aware knowledge.
5.8. Limitations
Our model employs an interpretable geometric framework that focuses on leveraging the structural information among instances, concepts, and relations. Concepts are abstract descriptions of entities and carry a wealth of semantic information. However, our model does not exploit the textual descriptions of concepts to enhance concept embedding representations.
Our model primarily focuses on the design of concept representations while retaining the original instance embeddings. Experimental results indicate that HCCE can implicitly improve performance on instance-related triples through the instanceOf relation, although the effect is limited. Future work could explore modifying the instance embeddings, for example, by employing alternative approaches or fine-tuning the mapping functions, to enhance the model's performance on instance-related triples.