Article

Integrating Transformer Architecture and Householder Transformations for Enhanced Temporal Knowledge Graph Embedding in DuaTHP

Yutong Chen, Xia Li, Yang Liu and Tiangui Hu

1 Department of Biostatistics, City University of Hong Kong, Hong Kong 999077, China
2 Yangtze River Delta Research Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
3 Institute of Integrated Circuit Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611700, China
* Authors to whom correspondence should be addressed.
Symmetry 2025, 17(2), 173; https://doi.org/10.3390/sym17020173
Submission received: 27 December 2024 / Revised: 16 January 2025 / Accepted: 22 January 2025 / Published: 24 January 2025
(This article belongs to the Section Computer)

Abstract

The rapid advancement of knowledge graph (KG) technology has led to the emergence of temporal knowledge graphs (TKGs), which represent dynamic relationships over time. Temporal knowledge graph embedding (TKGE) techniques are commonly employed for link prediction and knowledge graph completion, among other tasks. However, existing TKGE models mainly rely on basic arithmetic operations, such as addition, subtraction, and multiplication, which limits their capacity to capture complex, non-linear relationships between entities. Moreover, many neural network-based TKGE models focus on static entities and relationships, overlooking the temporal dynamics of entity neighborhoods and their potential for encoding relational patterns, which can result in significant semantic loss. To address these limitations, we propose DuaTHP, a novel model that integrates Transformer blocks with Householder projections in the dual quaternion space. DuaTHP utilizes Householder projections to map head-to-tail entity relations, effectively capturing key relational patterns. The model incorporates two Transformer blocks: the entity Transformer, which models entity–relationship interactions, and the context Transformer, which aggregates relational and temporal information. Additionally, we introduce a time-restricted neighbor selector, which focuses on neighbors interacting within a specific time frame to enhance domain-specific analysis. Experimental results demonstrate that DuaTHP significantly outperforms existing methods in link prediction and knowledge graph completion, effectively addressing both semantic loss and time-related issues in TKGs.

1. Introduction

Recent years have seen significant growth in the field of knowledge graphs (KGs). However, a common challenge identified in these advanced knowledge graphs is their inherent incompleteness. Detailed studies have shown that a large percentage of individuals and entities are missing critical information; for instance, in Freebase, one of the most widely used KGs in research, over 70% of profiles do not include places of birth, and more than 99% lack ethnicity data [1]. In practical applications, time is closely linked to the evolution of factual data, spanning various domains like disease trajectories and political developments. Temporal knowledge graphs (TKGs) play a crucial role in adding temporal context to facts, capturing both temporal specifics and the dynamic nature of real-world information. Similar to static KGs, TKGs are also incomplete, and recent years have seen increased interest in the TKG completion task within the KG field. Temporal knowledge graph embedding (TKGE) initially relied on static knowledge graph embedding (KGE) models for TKG completion and information retrieval. However, static KGE models have limited modeling capacity and cannot handle temporal facts well, resulting in subpar performance. As a result, researchers have developed various TKGE models, each with significant improvements in performance. For example, TeRo [2] incorporated complex vector space modeling into the construction of TKGE, while RotateQVS [3] extended the capabilities of TeRo [2] by utilizing quaternion vector space to represent entities, relations, and temporal dimensions. Despite these advancements, existing methodologies still fall short in modeling diverse relationship patterns and relation mapping properties (RMPs), so further refinements are needed for optimal performance. For example, consider the relationships among Phil Jefferson, Kobe, and Gasol: Gasol joined the Los Angeles Lakers in 2008 and the Chicago Bulls in 2014, with Phil Jefferson coaching the Lakers during that period and Kobe playing for the Lakers. Only a few existing TKGE models can capture such complex relationships. Moreover, most TKGE models based on geometric transformations treat knowledge facts independently, without utilizing the structural information in TKGs to enhance the fidelity of embedded representations for entities and relations, and their reliance on additive or multiplicative operations limits their expressive capacity.
Additionally, drawing inspiration from CapsE [4], Fu et al. were the pioneers in applying a capsule neural network to the construction of TKGE models. Their creation, TempCaps [5], was designed to deeply analyze the interaction between entity relations and temporal aspects. Subsequent models, DuCape [6] and BiQCap [7], innovatively incorporated dual quaternion and biquaternion spaces, respectively, merging these concepts with the principles of capsule networks. However, many of these approaches tend to focus primarily on basic or less sophisticated structural frameworks, often overlooking the rich insights that can be derived from the temporal dynamics of entity contexts. This omission limits their ability to effectively discern and map inherent relational patterns and properties. In recent years, the advent of large language models (LLMs) has led to significant advancements across various fields of research. Some researchers [8,9,10,11,12,13] have applied these models to the completion of TKGs, aiming to provide LLMs with temporally relevant logical inputs for decision-making. However, these models primarily focus on the semantic information between quadruples without adequately considering the temporal relationship patterns within the quadruples, resulting in suboptimal performance.
In this work, we introduce a new method, DuaTHP, which integrates Householder projections and Transformer blocks into the realm of dual quaternions. This model initially utilizes Householder projections to represent the relational mapping properties (RMPs) between entities. Subsequently, a neighbor selector is employed to isolate the neighboring nodes associated with the head entity within a specific timeframe. These nodes, along with the head entity and its relational counterpart, serve as inputs to a pair of Transformer layers. Contrasting with previous models, DuaTHP employs a unique design featuring only two layers of Transformer blocks, each with multiple attention heads, which significantly reduces parameter complexity. The first Transformer layer focuses on parameter reduction, while the second excels in collating contextual information from the graph. Additionally, our model includes a refined decoder, balancing high-performance outcomes with scalability. Finally, we apply a logistic sigmoid function to the interaction of computed outputs and the transformed tail entities in the dual quaternion space to obtain probability scores. The primary contributions of our work are as follows:
  • Our approach blends Transformer block transformations with the TKGE model architecture, allowing for nuanced capture of entity and relationship characteristics and providing valuable context within the TKG.
  • We innovate by incorporating Householder projections into the dual quaternion space, enabling our model to simultaneously map vital relational patterns and complex mapping properties in the TKG.
  • Our model’s efficacy is demonstrated through comparisons with leading KGE and TKGE models in both link and time prediction tasks across five TKGs, consistently outperforming established benchmarks.

2. Related Work

2.1. Static KG Embeddings

The array of methodologies for link prediction in knowledge graphs falls into two core categories: those based on geometric embeddings and those based on neural network architectures. Within the domain of geometric embedding, the TransE model [14] stands as a quintessential example. This model depicts entities and relationships in a knowledge graph via vector representations, anchored in the principles of vector space stability, and succinctly encapsulates each triple as a vectorial translation from the head entity to the tail entity. Subsequently, aiming to capture relation mapping properties, scholars developed various deformation models [15,16,17] building upon TransE [14]. Inspired by Euler's identity, RotatE [18] transposes entities and relationships into a complex vector space, portraying each relationship as a rotation from the head entity to the tail entity; this addresses TransE's [14] limitations in modeling symmetric relation patterns. Extending this notion, Rotate3D [19] and QuatE [20] transpose entity relationships into quaternion space. Further expanding the embedding dimensionality, HousE [21] incorporates Householder rotations and projections for modeling purposes. Neural network methodologies have also made salient contributions, exemplified by ConvE [22], a trailblazer in integrating deep neural networks into KGE, departing from the shallow, fast-to-train architectures of prior models. CapsE [4] further innovates KGE architectures by integrating capsule networks, using capsules to encode specific input features and thus maintaining feature recognition without compromising spatial integrity.

2.2. Temporal Knowledge Graph Embeddings

The field of temporal link prediction has seen a diverse array of methodological approaches. Scholars have extended static models to TKGE, as in TTransE [23] and TA-TransE [24], both of which integrate a temporal dimension into TransE [14]. HyTE [25] learns time-aware embeddings by associating each timestamp with its corresponding hyperplane within the entity-relation space. ATiSE [26] employs additive time series decomposition to capture the evolution of KG representations, embedding temporal data into entity and relation representations while accounting for their evolutionary uncertainty. TA-DistMult [24] extends DistMult [27] into a deep evolutionary knowledge framework that uses time-dependent nonlinear entity representations; it excels in TKG inference, particularly in predicting temporal aspects of relational facts without explicit temporal data. In this framework, fact emergence is construed as a multivariate point process, guided by an intensity function based on fact scores from entity embeddings. DE-SimplE [28], extending SimplE [29], incorporates temporal data into transient entity embeddings to model relational patterns: it characterizes entities at any given time, integrating transient embedding functions into the static model. TeRo [2] evolves from TransE [14] and RotatE [18] to propose dual temporal inference methodologies. RotateQVS [3] and 3DRTE [30] expand QuatE [20] into a quaternion vector space for entities, relations, and time. ChronoR [31] embeds entity relations in a k-dimensional vector space, employing a versatile knowledge graph inference system for dense representations; the model learns k-dimensional relational and temporal rotation transformations, aligning the head entity of each fact closely with its tail entity in vector space. Models like DYERNIE [32], ATTH [33], and HERCULES [34] project TKGs into hyperbolic space to model hierarchical temporal relationship patterns, and HyIE [35] uses Householder transformations in a mixed vector space. These geometry-based TKGE models capture intrinsic temporal relation patterns and mapping properties but have limited modeling capacities. Inspired by CapsE [4], TempCaps [5] enhances TKGE with a dynamic routing aggregator derived from capsule neural networks; however, TempCaps [5] operates on relatively shallow or suboptimal structures, constraining its ability to fully exploit complex entity and relation interactions. Subsequent models, namely DuCape [6] and BiQCap [7], amalgamate dual quaternion and biquaternion spaces, respectively, with the foundational principles of capsule networks. Nonetheless, many of these methodologies emphasize elementary structural frameworks and underutilize the insights that can be gleaned from the temporal dynamics of entity contexts, which restricts their efficacy in discerning and mapping the intrinsic relational patterns and properties inherent in these contexts.

3. Dual Quaternion and Householder Projections

3.1. Dual Quaternion

In quaternion algebra, a quaternion $q$ is defined as $q = a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}$, where $q \in \mathbb{H}$ and the coefficients $a, b, c, d \in \mathbb{R}$. The quaternion units satisfy the identities $\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{i}\mathbf{j}\mathbf{k} = -1$. An alternative representation of the quaternion $q$ is the scalar–vector pair $q = [a, \mathbf{v}]$, where $a \in \mathbb{R}$ and $\mathbf{v} \in \mathbb{R}\mathbf{i} + \mathbb{R}\mathbf{j} + \mathbb{R}\mathbf{k}$.
In dual quaternion theory, a dual quaternion $\hat{q} \in \mathbb{Q}$ is expressed as $\hat{q} = q_1 + q_2 \xi$, where $\xi$ is a nilpotent element satisfying $\xi^2 = 0$. Here, $q_1$ and $q_2$ are quaternions that represent the real and dual parts, respectively.
To describe a unit dual quaternion encoding both rotation and translation, it is written as
$$\hat{q} = r + \frac{1}{2} u r \xi,$$
where $r$ is a unit quaternion encoding the rotational component, defined by $r = \left[\cos\frac{\theta}{2},\, \sin\frac{\theta}{2}\,\mathbf{v}\right]$. This quaternion represents a rotation around the unit vector $\mathbf{v} \in \mathbb{R}\mathbf{i} + \mathbb{R}\mathbf{j} + \mathbb{R}\mathbf{k}$ by an angle $\theta$. Additionally, $u$ is the translation quaternion, given by $u = 1 + \frac{\xi}{2}(u_0\mathbf{i} + u_1\mathbf{j} + u_2\mathbf{k})$, which corresponds to a translation by the vector $(u_0, u_1, u_2)$.
For a 3D vector $(v_0, v_1, v_2)$, the associated unit dual quaternion is $\hat{v} = 1 + \xi(v_0\mathbf{i} + v_1\mathbf{j} + v_2\mathbf{k})$. The transformation of the vector $(v_0, v_1, v_2)$ by a dual quaternion $\hat{q}$ is then succinctly expressed as $\hat{q}\,\hat{v}\,\overline{\hat{q}^{*}}$.
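To ground these definitions, the following is a minimal NumPy sketch (ours, not from the paper's released code) of the Hamilton product, dual quaternion multiplication with $\xi^2 = 0$, and the point transformation $\hat{q}\,\hat{v}\,\overline{\hat{q}^{*}}$; it rotates the point $(1, 0, 0)$ by 90° about the z-axis and translates it by $(1, 2, 3)$.

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def qconj(q):
    """Quaternion conjugate q* = (w, -x, -y, -z)."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def dqmul(p, q):
    """Dual quaternion product: (a + b xi)(c + d xi) = ac + (ad + bc) xi."""
    return (qmul(p[0], q[0]), qmul(p[0], q[1]) + qmul(p[1], q[0]))

# Unit dual quaternion q_hat = r + (1/2) u r xi: rotation r, translation u.
theta = np.pi / 2
r = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])  # about z-axis
u = np.array([0.0, 1.0, 2.0, 3.0])                              # shift (1, 2, 3)
q_hat = (r, 0.5 * qmul(u, r))

# Embed the point (1, 0, 0) as v_hat = 1 + xi(v0 i + v1 j + v2 k) and apply
# q_hat v_hat conj(q_hat), where conj conjugates both quaternion parts and
# negates the dual part (the bar-star conjugate in the text).
v_hat = (np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0]))
c = (qconj(q_hat[0]), -qconj(q_hat[1]))
out = dqmul(dqmul(q_hat, v_hat), c)
print(out[1][1:])  # -> [1. 3. 3.]: rotated to (0, 1, 0), then translated
```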

3.2. Householder Projections

Given a unit vector $p \in \mathbb{R}^k$ and a scalar $\tau \in \mathbb{R}$, the $k \times k$ modified Householder matrix, denoted $M(p, \tau)$, is defined by
$$M(p, \tau) = I - \tau\, p p^{\top},$$
where $\|p\|_2^2 = 1$ and $I$ is the $k \times k$ identity matrix. Notably, the modified Householder matrix $M(p, \tau)$ has $k-1$ eigenvalues equal to $1$ and a single eigenvalue equal to $1 - \tau$. Consequently, $M(p, \tau)$ is invertible unless $\tau = 1$.
For a sequence of real scalars $T = \{\tau_c\}_{c=1}^{m}$ and unit vectors $P = \{p_c\}_{c=1}^{m}$, where $m$ is a positive integer and $p_c \in \mathbb{R}^k$, the cumulative mapping is given by
$$\mathcal{P}(P, T) = \prod_{c=1}^{m} M(p_c, \tau_c).$$
The matrix produced by $\mathcal{P}(P, T)$ invariably remains invertible, since a product of invertible matrices is itself invertible. This mapping, termed the Householder projection, represents a sequence of $m$ modified Householder reflections. Whereas standard Householder reflections are distance-preserving, the modified form can methodically alter the relative distances between data points in a reversible fashion. This property renders Householder projections exceptionally suitable for modeling relational dynamics, as they maintain the capability to encapsulate complex relational patterns without distortion.
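As a quick numerical check of these properties (a sketch under our own naming, independent of the authors' implementation), the snippet below builds a modified Householder matrix, verifies its spectrum, and confirms that a composed projection remains invertible whenever every $\tau_c \neq 1$:

```python
import numpy as np

def modified_householder(p, tau):
    """M(p, tau) = I - tau * p p^T for a unit vector p in R^k."""
    p = p / np.linalg.norm(p)  # enforce ||p||_2^2 = 1
    return np.eye(len(p)) - tau * np.outer(p, p)

rng = np.random.default_rng(0)
k, m, tau = 5, 3, 0.7

M = modified_householder(rng.normal(size=k), tau)
eig = np.linalg.eigvals(M).real
print(np.isclose(eig, 1.0).sum())          # -> 4, i.e., k - 1 eigenvalues of 1
print(np.any(np.isclose(eig, 1.0 - tau)))  # -> True: the remaining eigenvalue

# Householder projection: the ordered product of m modified reflections.
H = np.eye(k)
for _ in range(m):
    H = modified_householder(rng.normal(size=k), rng.uniform(-1.0, 0.9)) @ H
print(abs(np.linalg.det(H)) > 1e-12)       # invertible, since every tau_c != 1
```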

4. Construction of the DuaTHP Model

In the ensuing segment, we expound upon the structural design of DuaTHP, an innovative paradigm rooted in the amalgamation of Transformer blocks and Householder projections. DuaTHP leverages the neighbor selector methodology to methodically ascertain relevant adjacent nodes within the designated temporal intervals. Thereafter, it engages in the task of encoding entity and relationship representations in the temporal knowledge graph, integrating pivotal information from the proximate graph neighbors. A graphical depiction of this procedure is illustrated in Figure 1 to enhance understanding.

4.1. Notations

In the context of this research, each quadruple represents a genuine event occurring within a specific temporal window in the real world. Formally, a TKG is defined as a collection of quadruples, denoted by $\mathrm{TKG} = \{(h, r, o, t)\}$. Each quadruple consists of a head entity $h \in \varsigma$, a relation $r \in \mathcal{R}$, a tail entity $o \in \varsigma$, and a timestamp $t \in \mathcal{T}$. Here, $\varsigma$ represents the set of all possible entities, $\mathcal{R}$ encompasses the full range of relations, and $\mathcal{T}$ denotes the set of all potential temporal instances. The primary goal of TKGE is to predict and complete missing links within the TKG. This task requires the transformation of entities, relations, and timestamps into distributed vector representations, which form the foundation for constructing a scoring function designed to assess the plausibility of each quadruple.
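For readers who prefer code to notation, a small sketch of the data structure these symbols describe is given below (the type and field names are our own, illustrated with the Gasol example from the Introduction):

```python
from typing import NamedTuple, Set

class Quadruple(NamedTuple):
    head: str      # h, drawn from the entity set
    relation: str  # r, drawn from the relation set
    tail: str      # o, drawn from the entity set
    time: int      # t, a discrete timestamp

# A TKG is simply a set of such quadruples.
tkg: Set[Quadruple] = {
    Quadruple("Gasol", "playsFor", "Lakers", 2008),
    Quadruple("Gasol", "playsFor", "Bulls", 2014),
    Quadruple("Kobe", "playsFor", "Lakers", 2008),
}

# TKGE learns a scoring function f(h, r, o, t) -> plausibility, so that
# completion can rank every candidate tail o for a query (h, r, ?, t).
```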

4.2. Application of Householder Projections

In the context of dual quaternion spaces, consider a quadruple $(h, r, o, t)$. Within this framework, the embeddings of the head entity $h$ and the tail entity $o$ are represented as $\mathbf{h} \in \mathbb{Q}^d$ and $\mathbf{o} \in \mathbb{Q}^d$, respectively, where $d$ denotes the dimensionality of the entity embeddings. During the phase involving relational Householder projections, the model utilizes the advantages of Householder projections to transform both the head entity $\mathbf{h}$ and the tail entity $\mathbf{o}$ within the dual quaternion space. To facilitate this transformation, two sets of parameters are introduced to define the embeddings associated with the relation $r$: specifically, the axes $P_r \in \mathbb{Q}^{d \times m}$ and the scalars $T_r \in \mathbb{Q}^{d \times m}$, where $m$ is a positive integer. It is crucial to note that each row of $P_r$ contains $m$ unit dual quaternions. Formally, for each quadruple $(h, r, o, t)$, the DuaTHP model governs the transformation of the head entity $\mathbf{h}$ and the tail entity $\mathbf{o}$ via relation-specific Householder projections, expressed as
$$\mathbf{h}' = \mathcal{P}_1(P, T)\,\mathbf{h} = \prod_{c=1}^{m} M_1(p_c, \tau_c)\,\mathbf{h}, \qquad \mathbf{o}' = \mathcal{P}_2(P, T)\,\mathbf{o} = \prod_{c=1}^{m} M_2(p_c, \tau_c)\,\mathbf{o}.$$
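A simplified sketch of this relation-specific transformation appears below. For illustration, we flatten each dual quaternion embedding into a plain real vector of dimension k; the actual model applies the chains coordinate-wise in the dual quaternion space, so the shapes and names here are assumptions:

```python
import numpy as np

def projection(P, T):
    """Compose the chain prod_c (I - tau_c p_c p_c^T) into one matrix."""
    k = P.shape[1]
    out = np.eye(k)
    for p, tau in zip(P, T):
        out = (np.eye(k) - tau * np.outer(p, p)) @ out
    return out

rng = np.random.default_rng(1)
k, m = 8, 2                      # k: real dimension of the flattened embedding
h, o = rng.normal(size=k), rng.normal(size=k)

# Each relation r owns two chains of axes/scalars: one for heads, one for tails.
P1 = rng.normal(size=(m, k)); P1 /= np.linalg.norm(P1, axis=1, keepdims=True)
P2 = rng.normal(size=(m, k)); P2 /= np.linalg.norm(P2, axis=1, keepdims=True)
T1, T2 = rng.uniform(-1, 0.9, size=m), rng.uniform(-1, 0.9, size=m)

h_proj = projection(P1, T1) @ h  # h' = P_1(P, T) h
o_proj = projection(P2, T2) @ o  # o' = P_2(P, T) o
```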

4.3. Transformer Blocks

In the proposed model architecture, two Transformer blocks are instantiated. Following the Householder projection applied to the head entity, the transformed head entity $\mathbf{h}'$ is concatenated with the relation embedding $\mathbf{r}$. Initially, within each Transformer block, a self-attention mechanism is performed on the embedding matrix of the pair $(\mathbf{h}', \mathbf{r})$. This matrix is treated as the query $Q \in \mathbb{Q}^{2 \times d}$, the key $K \in \mathbb{Q}^{2 \times d}$, and the value $V \in \mathbb{Q}^{2 \times d}$, which serve as inputs to the Transformer block. For a basic single-head attention mechanism, the output representation matrix of the attention block for the head entity and the relation is given by
$$Z(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V.$$
The attention mechanism is then extended across multiple attention heads, where each query $Q_i = Q W_i^Q$, key $K_i = K W_i^K$, and value $V_i = V W_i^V$, with the weight matrices $W_i^Q \in \mathbb{Q}^{d \times d_k}$, $W_i^K \in \mathbb{Q}^{d \times d_k}$, and $W_i^V \in \mathbb{Q}^{d \times d_v}$. The per-head value vectors are concatenated and linearly projected. Assuming there are $n$ attention heads, the process can be described as follows:
$$Z_i(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i, \qquad \mathrm{MulZ}(Q, K, V) = [Z_1, \ldots, Z_n]\, W^{o},$$
where $i \in \{1, \ldots, n\}$ and $W^{o} \in \mathbb{Q}^{n d_v \times d}$ is the projection matrix.
The resulting output, MulZ ( Q , K , V ) , is subsequently passed to the next stage of the encoder, known as the feedforward layer. This layer consists of two components: the first applies the ReLU activation function, while the second employs a linear transformation. The mathematical expression for the feedforward operation is given by
$$\hat{\mathbf{h}} = \mathrm{FFN}(\mathrm{MulZ}(Q, K, V)) = \max(0,\, \mathrm{MulZ}(Q, K, V)\, W_1 + b_1)\, W_2 + b_2,$$
where $\max(\cdot)$ denotes the ReLU activation function. The parameters $W_1 \in \mathbb{Q}^{d \times d_n}$, $b_1 \in \mathbb{Q}^{d_n}$, $W_2 \in \mathbb{Q}^{d_n \times d}$, and $b_2 \in \mathbb{Q}^{d}$ are trainable. The feedforward layer maps the features to a higher-dimensional space via the first transformation and then projects them back to a lower-dimensional space through the second linear transformation. While this simplified Transformer encoder is effective in link prediction tasks, learning temporal knowledge graph embedding (TKGE) from quaternions alone may not fully capture the rich structural information embedded within the temporal context.
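The block just described can be sketched in PyTorch as follows. This is a real-valued illustration under our own naming: the paper's version operates on dual quaternion coordinates, which we collapse into a single real dimension d, and we omit residual connections and layer normalization because the equations above do not include them.

```python
import torch
import torch.nn as nn

class EntityTransformerBlock(nn.Module):
    """Self-attention over the pair (h', r) followed by a ReLU feedforward."""
    def __init__(self, d: int, n_heads: int, d_ff: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, d_ff), nn.ReLU(), nn.Linear(d_ff, d))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z, _ = self.attn(x, x, x)  # Q = K = V = embeddings of (h', r)
        return self.ffn(z)         # max(0, Z W1 + b1) W2 + b2

d = 64
block = EntityTransformerBlock(d, n_heads=4, d_ff=256)
h_proj = torch.randn(1, 1, d)             # projected head entity h'
r_emb = torch.randn(1, 1, d)              # relation embedding r
pair = torch.cat([h_proj, r_emb], dim=1)  # sequence of length 2
h_hat = block(pair)                       # -> shape (1, 2, d)
```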

4.4. Neighbor Selector

In the development of the temporal knowledge graph embedding (TKGE) model, we implement the neighbor selector technique to identify the relevant neighbor entities within the TKGE framework. For a given quadruple $(h, r, o, t)$, the neighbor entities of $h$ within a temporal window of width $2 t_\Delta$ centered at $t$ are collected. The set of entities satisfying these conditions across all periods is denoted as
$$E(h) = \{\, o' \mid (h, r', o', t') \,\},$$
where $(t - t_\Delta) \le t' \le (t + t_\Delta)$.
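In code, the selector is a simple filter over the quadruple set; the sketch below (our own helper, with an illustrative toy TKG) keeps only tail entities linked to $h$ inside the window $[t - t_\Delta,\, t + t_\Delta]$:

```python
def select_neighbors(quadruples, h, t, t_delta):
    """Tail entities o' of facts (h, r', o', t') with |t' - t| <= t_delta."""
    return {o2 for (h2, _r2, o2, t2) in quadruples
            if h2 == h and (t - t_delta) <= t2 <= (t + t_delta)}

facts = [("Gasol", "playsFor", "Lakers", 2008),
         ("Gasol", "playsFor", "Bulls", 2014),
         ("Kobe", "playsFor", "Lakers", 2008)]
print(select_neighbors(facts, "Gasol", 2008, 3))  # -> {'Lakers'}; 2014 is outside
```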
The transformation of the selected neighbor entities via the multi-head attention mechanism is mathematically expressed as
$$Y_i(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i, \qquad \mathrm{MulN}(Q, K, V) = [Y_1, \ldots, Y_h]\, W^{o},$$
where $i \in \{1, \ldots, h\}$ (here $h$ denotes the number of attention heads) and $W^{o} \in \mathbb{Q}^{h d_v \times d}$ is the projection matrix. Additionally, the query, key, and value matrices for each head are defined as $Q_i = Q W_i^Q$, $K_i = K W_i^K$, and $V_i = V W_i^V$, where $W_i^Q \in \mathbb{Q}^{d \times d_k}$, $W_i^K \in \mathbb{Q}^{d \times d_k}$, and $W_i^V \in \mathbb{Q}^{d \times d_v}$, respectively.
The output MulN ( Q , K , V ) is subsequently processed by the feedforward transformation, defined as
$$\tilde{\mathbf{h}} = \mathrm{FFN}(\mathrm{MulN}(Q, K, V)) = \max(0,\, \mathrm{MulN}(Q, K, V)\, W_1 + b_1)\, W_2 + b_2,$$
where $\max(\cdot)$ denotes the ReLU activation function. The trainable parameters are $W_1 \in \mathbb{Q}^{d \times d_h}$, $b_1 \in \mathbb{Q}^{d_h}$, $W_2 \in \mathbb{Q}^{d_h \times d}$, and $b_2 \in \mathbb{Q}^{d}$. In the second Transformer block, the outputs $\hat{\mathbf{h}}$ and $\tilde{\mathbf{h}}$ are integrated, yielding the final output $T_e$. This process aims to enrich the head entity with relational context derived from its neighboring entities.
Subsequently, the plausibility score for the quadruple is computed as the dual quaternion product between $T_e$ and the transformed tail entity embedding $\mathbf{o}'$. The score is represented as
$$\phi(h, r, o, t) = T_e \otimes \mathbf{o}',$$
where $\otimes$ denotes the dual quaternion product.
To validate the quadruple $(h, r, o, t)$, the probability of $o$ being the correct tail entity in the tuple $(h, r, ?, t)$ is computed by applying a logistic sigmoid function to the score $\phi(h, r, o, t)$:
$$f(h, r, o, t) = \sigma(\phi(h, r, o, t)),$$
where $\sigma$ represents the sigmoid function.
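The scoring step can be sketched as follows. The paper does not spell out how the d coordinate-wise dual quaternion products are reduced to a single scalar, so the component-sum reduction below is an assumption, as are the array layout and function names.

```python
import numpy as np

def qmul(a, b):
    """Hamilton product, (w, x, y, z) layout."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def dq_score(te, o):
    """phi = T_e (x) o': coordinate-wise dual quaternion product, summed."""
    (tr, td), (o_r, o_d) = te, o
    real = sum(qmul(a, b).sum() for a, b in zip(tr, o_r))
    dual = sum((qmul(a, b) + qmul(c, d)).sum()
               for a, b, c, d in zip(tr, o_d, td, o_r))  # the ad + bc term
    return real + dual

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
d = 4  # embedding dimension in dual quaternions (each = 2 quaternions)
te = (rng.normal(size=(d, 4)), rng.normal(size=(d, 4)))  # encoder output T_e
o_ = (rng.normal(size=(d, 4)), rng.normal(size=(d, 4)))  # projected tail o'
print(sigmoid(dq_score(te, o_)))  # f(h, r, o, t) in (0, 1)
```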

4.5. Loss Function

We employ the Adam optimizer [36] to train our model, with the primary objective of minimizing the loss function, as previously formulated in [37]:
$$L = \sum_{(h, r, o, t) \in T \cup T'} \log\!\left(1 + \exp\!\left(-y_o\, f(h, r, o, t)\right)\right) + \lambda \|W\|_2^2,$$
where $T$ represents the collection of valid quadruples, while $T'$ is derived by introducing perturbations to the valid quadruples in $T$. To counteract the propensity for model overfitting, we incorporate $L_2$ regularization on the weight vector $W$, with $\lambda$ designating the associated regularization weight. Notably, $y_o$ takes the value $1$ when $(h, r, o, t) \in T$, and the value $-1$ when $(h, r, o, t) \in T'$.
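A sketch of this objective in PyTorch is shown below; the function name and the use of raw scores for $f$ are our assumptions, and in practice the regularized weights and the Adam step would come from the full model.

```python
import torch

def duathp_loss(scores_pos, scores_neg, weights, lam=1e-3):
    """log(1 + exp(-y * f)) summed over T (y = +1) and T' (y = -1),
    plus L2 regularization on the weight vector."""
    f = torch.cat([scores_pos, scores_neg])
    y = torch.cat([torch.ones_like(scores_pos), -torch.ones_like(scores_neg)])
    return torch.log1p(torch.exp(-y * f)).sum() + lam * weights.pow(2).sum()

# Toy usage: scores for 3 valid and 3 corrupted quadruples.
pos = torch.tensor([2.1, 1.7, 0.9])
neg = torch.tensor([-0.4, 0.3, -1.2])
w = torch.randn(10, requires_grad=True)
loss = duathp_loss(pos, neg, w)
loss.backward()  # gradients flow into w; an Adam step would follow
```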

5. Experiment

5.1. Datasets and Baselines

This study conducts a comparative evaluation of the DuaTHP framework relative to leading state-of-the-art models, leveraging five widely recognized benchmark datasets for link prediction to assess its performance. The five key temporal knowledge graph (TKG) benchmarks used in this experimental analysis are ICEWS14, ICEWS05-15, YAGO11k, Wikidata12k, and GDELT. A comprehensive summary of the key characteristics of these datasets is provided in Table 1 for clarity.
In the domain of temporal knowledge graph completion (TKGC), the evaluation employs standard metrics, namely mean reciprocal rank (MRR) and hits at N (Hit@N). The MRR metric quantifies the average reciprocal rank of correctly predicted quadruples, while Hit@N indicates the proportion of correct quadruples found within the top N ranked predictions; Hit@N is computed for $N \in \{1, 3, 10\}$. Higher values of both MRR and Hit@N reflect superior model performance.
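Both metrics reduce to simple statistics over the rank of the correct entity for each test query, as the short sketch below (with our helper names) makes explicit:

```python
import numpy as np

def mrr_and_hits(ranks, ns=(1, 3, 10)):
    """ranks[i] is the rank of the correct entity for query i (1 = best).
    MRR averages 1/rank; Hit@N is the fraction of ranks <= N."""
    ranks = np.asarray(ranks, dtype=float)
    return (1.0 / ranks).mean(), {n: float((ranks <= n).mean()) for n in ns}

mrr, hits = mrr_and_hits([1, 4, 2, 15, 1])
print(round(mrr, 3), hits)  # -> 0.563 {1: 0.4, 3: 0.6, 10: 0.8}
```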
Our model is benchmarked against a variety of static KGE models, such as TransE [14], DistMult [27], RotatE [18], QuatE [20], and HousE [21]. Concurrently, we compare our model with various TKGE models, including TTransE [23], ATiSE [26], TA-DistMult [24], HyTE [25], DE-SimplE [28], TeRo [2], ChronoR [31], DYERNIE [32], ATTH [33], HERCULES [34], RotateQVS [3], 3DRTE [30], TempCaps [5], HyIE [35], DuCape [6], and BiQCap [7]. To assess the impact of integrating graph context and of excluding the true tail entity from the neighbor set, we establish two ablated variants: DuaTHPR, which omits the graph context, and DuaTHPA, which retains the actual tail entity among the neighbors.

5.2. Implementation Details

During the training phase, our methodology adheres to a maximum of 1000 iterations, consistent with the experimental protocol outlined in RotatE [18]. To enhance the model's performance, we utilize negative sampling aimed at minimizing the cross-entropy loss. Specifically, negative samples are drawn exclusively from invalid objects, selected uniformly, with the total number of such samples set to $n = 500$. The batch size $b$ is maintained at 512 across all datasets. Simultaneously, the embedding dimension $d$ is systematically evaluated within the range $\{100, 200, 500, 1000, 1500\}$, and the allocation of parameters for representing a single entity, $d \times k$, is fixed at 1500. Additionally, the learning rate is iteratively tested across the interval from 0.01 to 1, while the margin $\gamma$ is constrained to the values $\{6, 9, 12, 24, 30\}$. The number of Householder reflections $m$ used in the Householder projections is tuned within the set $\{1, 2, 3, 4, 6, 8\}$. For hyperparameter tuning, the number of attention heads is examined within $\{20, 30, 40, 50, 64\}$, while the remaining architectural hyperparameters are varied by adjusting $d_k$ and $d_v$ within $\{32, 50, 64\}$ and $d_h$ within $\{100, 512, 1024, 2048\}$.
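The search space above can be captured as a plain grid; the sketch below is illustrative only (the names are ours, and `train_and_validate` is a placeholder rather than a function from a released codebase).

```python
from itertools import product

grid = {
    "embedding_dim":   [100, 200, 500, 1000, 1500],
    "margin":          [6, 9, 12, 24, 30],
    "householder_m":   [1, 2, 3, 4, 6, 8],
    "attention_heads": [20, 30, 40, 50, 64],
    "d_k":             [32, 50, 64],
    "d_v":             [32, 50, 64],
    "d_h":             [100, 512, 1024, 2048],
}
fixed = {"max_iters": 1000, "batch_size": 512, "negative_samples": 500}

for combo in product(*grid.values()):
    config = {**dict(zip(grid, combo)), **fixed}
    # train_and_validate(config)  # placeholder for the actual training loop
    break  # the full sweep is shown for structure only
```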

5.3. Result and Analysis

Table 2 and Table 3 delineate the performance outcomes of our proposed model. Notably, our model demonstrates exceptional efficacy across various datasets. Specifically, it attains state-of-the-art results on ICEWS14, YAGO11k, and Wikidata12k. This achievement underscores the potency of our model, synergistically enhanced by the integration of relation mapping properties (RMPs) and domain-specific insights, particularly in the intricately structured ICEWS14 dataset. The experimental outcomes underscore our model's superior performance over contemporary models on ICEWS14, primarily attributable to its higher density (as indicated in Table 1), coupled with its intricate entity and relational data patterns, which provide a more conducive environment for our model to leverage its strengths. Conversely, on the ICEWS05-15 and GDELT datasets, our model exhibits a marginal underperformance compared to the state-of-the-art models DYERNIE [32] and HyIE [35]. This is largely ascribed to the prevalence of hierarchical relational patterns within these datasets, which are inherently more amenable to models grounded in hyperbolic geometry. Moreover, the superior performance of DuaTHP in comparison to DuaTHPR and DuaTHPA across all five datasets further validates the necessity of incorporating contextual information and of excluding the actual tail entity. This aspect is crucial, particularly since context significantly influences the model's performance, akin to the application of static KGE methodologies in TKGE contexts.

5.4. Ablation Experiment

5.4.1. The Value of Householder Projections

The Householder projection is an instrumental tool capable of effecting reversible transformations in the relative distance between two points. It furnishes a theoretical foundation for modeling intricate RMPs while maintaining the capacity to capture and characterize relational patterns. Prior research efforts, including TransH [15], TransR [16], and TransD [17], have endeavored to devise projection operations that ensure distinct entity representations under varying relations. However, these models have historically faltered in their ability to effectively encapsulate fundamental relational patterns, including inverse relation patterns.
In order to substantiate the efficacy of the Householder projection proposed in this paper, we devised four distinct variants of DuaTHP. Three of these involve substituting the Householder projection with alternative techniques, resulting in DuaTHPH, DuaTHPR, and DuaTHPD; in the fourth, DualT, the Householder projection is omitted entirely. Our evaluation encompasses rigorous testing across the ICEWS14, ICEWS05-15, and GDELT datasets. Notably, the findings in Table 4 reveal that the performances of DuaTHPH, DuaTHPR, and DuaTHPD exhibit negligible improvements or, in some instances, degradations compared to the DualT model, which is devoid of additional projections. This observed decline in performance is primarily attributed to the introduction of asymmetric projections, which inadvertently compromise the models' original modeling capabilities. Conversely, Table 4 underscores a notable enhancement in the performance of DuaTHP compared to DualT and the other deformation models across the three datasets. This substantial improvement underscores the distinct advantage conferred by the invertible Householder projections embedded within the DuaTHP model.

5.4.2. The Number of Attention Heads

The core concept underpinning the initial layer of the Transformer block is the application of self-attention mechanisms, which are pivotal for capturing complex interactions and dependencies between entities and relationships. This mechanism serves as a fundamental component for obtaining robust and informative representations. Notably, a sufficiently large number of attention heads enhances expressiveness, surpassing the benefits offered by merely increasing the number of encoder blocks, as suggested by prior research [38].
To empirically assess the impact of attention head count on model performance, experiments were conducted on the ICEWS14 and GDELT datasets. As illustrated in Figure 2, the findings unequivocally demonstrate that increasing the number of attention heads improves the quality of learned representations while effectively mitigating the risk of overfitting. However, it is essential to recognize that this increase in attention heads also introduces greater model complexity, which consequently reduces the scalability of the model.
Given this trade-off, the present study adopted a strategy that balances performance and complexity. Specifically, 60 attention heads were selected for the ICEWS14 dataset and 30 for the GDELT dataset. These values were carefully chosen to optimize model performance while maintaining a manageable model size and complexity, thus ensuring an optimal balance.

5.4.3. The Number of Modified Householder Matrices

As exemplified in Equation (2), the Householder projection encompasses an assembly of m modified Householder matrices. We extensively investigated the impact of varying the number of modified Householder matrices, represented as m, on model performance across multiple datasets, including ICEWS14, ICEWS05-15, and YAGO11k. As delineated in Figure 3, it becomes evident that augmenting the value of m engenders an improvement in the performance of DuaTHP, followed by a subsequent decline across all three datasets. This observed trend can be primarily attributed to the constructive relationship between projection capability and the value of m; however, it is imperative to recognize that excessively intricate projections may induce model overfitting.
Furthermore, the disparate graph densities characterizing these datasets introduce another layer of complexity. ICEWS14 and ICEWS05-15 exhibit higher graph densities, while a lower graph density characterizes YAGO11k. Consequently, to effectively capture and model the richer graph information prevalent in ICEWS14 and ICEWS05-15, the optimal value of m associated with peak performance significantly surpasses that of m within the context of YAGO11k. This adaptive approach to selecting m values ensures that the model can flexibly accommodate the varying graph characteristics of different datasets while achieving optimal performance.
This study also investigates the impact of the visible time window length on model performance, specifically using the ICEWS14 dataset. As illustrated in Figure 4, the optimal visible time window spans 6 days, i.e., $t_\Delta = 3$ days on either side of $t$. This is primarily attributed to the fact that a shorter time window may provide insufficient information, while an excessively long time window tends to introduce excessive noise, ultimately undermining the model's performance.

5.4.4. The Neighbor Selector and Transformer Components

To investigate the effect of the neighbor selector and Transformer components on model performance, we constructed DualH1 and DualH2 by removing the neighbor selector and the Transformer component, respectively, from the original model. Analysis of Table 5 shows that DuaTHP outperforms both DualH1 and DualH2, confirming that both components contribute to the model's performance. Furthermore, DualH1 outperforms DualH2, indicating that the Transformer component influences the model's performance more strongly than the neighbor selector component.

6. Conclusions

In this manuscript, we introduce DuaTHP, a TKGE model that synergizes Transformer blocks with Householder projection techniques within a dual quaternion space. DuaTHP employs Householder projections to map the relational attributes between head and tail entities in dual quaternion space. Concurrently, it constructs two Transformer blocks: the first, the entity Transformer, is dedicated to capturing the dynamic interplay between entity–relationship pairs. A novel approach to time-restricted neighbor selection is also presented, involving the incorporation of a temporal window; this method selectively accrues domain information within a specific timeframe, focusing exclusively on neighbors that interact with the primary entity within this window. The second Transformer block, designated the context Transformer, amalgamates relational and temporal insights derived from the first block's outputs. Demonstrably, DuaTHP achieves substantial enhancements in link prediction tasks across five benchmark datasets, marking a notable advancement over preceding methodologies.

Author Contributions

Y.C.: investigation, methodology, software, and writing—original draft; X.L.: investigation, data curation, visualization, validation, and writing—original draft; Y.L.: investigation, data curation, visualization, validation, and writing—original draft; T.H.: conceptualization, methodology, software, investigation, visualization, writing—original draft, and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Sichuan Key Research Fund under project No. 2023YFG0001, the Quzhou Joint Fund of the Zhejiang Provincial Natural Science Foundation of China under grant No. QZQN25F050009, and the Municipal Government of Quzhou under grant No. 2023D021.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bollacker, K.D.; Evans, C.; Paritosh, P.K.; Sturge, T.; Taylor, J. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, 10–12 June 2008; Wang, J.T., Ed.; ACM: New York, NY, USA, 2008; pp. 1247–1250. [Google Scholar]
  2. Xu, C.; Nayyeri, M.; Alkhoury, F.; Yazdi, H.S.; Lehmann, J. TeRo: A Time-aware Knowledge Graph Embedding via Temporal Rotation. In Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), 8–13 December 2020; Scott, D., Bel, N., Zong, C., Eds.; International Committee on Computational Linguistics: New York, NY, USA, 2020; pp. 1583–1593. [Google Scholar]
  3. Chen, K.; Wang, Y.; Li, Y.; Li, A. RotateQVS: Representing Temporal Information as Rotations in Quaternion Vector Space for Temporal Knowledge Graph Completion. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, 22–27 May 2022; Muresan, S., Nakov, P., Villavicencio, A., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 5843–5857. [Google Scholar]
  4. Nguyen, D.Q.; Vu, T.; Nguyen, T.D.; Nguyen, D.Q.; Phung, D.Q. A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (Long and Short Papers). Burstein, J., Doran, C., Solorio, T., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 2180–2189. [Google Scholar]
  5. Fu, G.; Meng, Z.; Han, Z.; Ding, Z.; Ma, Y.; Schubert, M.; Tresp, V.; Wattenhofer, R. TempCaps: A Capsule Network-based Embedding Model for Temporal Knowledge Graph Completion. In Proceedings of the Sixth Workshop on Structured Prediction for NLP, SPNLP@ACL 2022, Dublin, Ireland, 27 May 2022; Vlachos, A., Agrawal, P., Martins, A.F.T., Lampouras, G., Lyu, C., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 22–31. [Google Scholar]
  6. Zhang, S.; Liang, X.; Tang, H.; Zheng, X.; Zhang, A.X.; Ma, Y. DuCape: Dual Quaternion and Capsule Network-Based Temporal Knowledge Graph Embedding. ACM Trans. Knowl. Discov. Data 2023, 17, 104:1–104:19. [Google Scholar] [CrossRef]
  7. Zhang, S.; Liang, X.; Li, Z.; Feng, J.; Zheng, X.; Wu, B. BiQCap: A Biquaternion and Capsule Network-Based Embedding Model for Temporal Knowledge Graph Completion. In Proceedings of the Database Systems for Advanced Applications—28th International Conference, DASFAA 2023, Tianjin, China, 17–20 April 2023; Wang, X., Sapino, M.L., Han, W., Abbadi, A.E., Dobbie, G., Feng, Z., Shao, Y., Yin, H., Eds.; Proceedings, Part II; Lecture Notes in Computer Science. Springer: Cham, Switzerland, 2023; Volume 13944, pp. 673–688. [Google Scholar]
  8. Yang, J.; Ying, X.; Shi, Y.; Xing, B. Tensor decompositions for temporal knowledge graph completion with time perspective. Expert Syst. Appl. 2024, 237, 121267. [Google Scholar] [CrossRef]
  9. Liu, R.; Yin, G.; Liu, Z.; Tian, Y. Reinforcement learning with time intervals for temporal knowledge graph reasoning. Inf. Syst. 2024, 120, 102292. [Google Scholar] [CrossRef]
  10. Dai, Y.; Guo, W.; Eickhoff, C. Wasserstein adversarial learning based temporal knowledge graph embedding. Inf. Sci. 2024, 659, 120061. [Google Scholar] [CrossRef]
  11. Zhang, F.; Chen, H.; Shi, Y.; Cheng, J.; Lin, J. Joint framework for tensor decomposition-based temporal knowledge graph completion. Inf. Sci. 2024, 654, 119853. [Google Scholar] [CrossRef]
  12. Chen, B.; Yang, K.; Tai, W.; Cheng, Z.; Liu, L.; Zhong, T.; Zhou, F. Interpreting Temporal Knowledge Graph Reasoning (Student Abstract). In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2024), Vancouver, BC, Canada, 20–27 February 2024; AAAI Press: Washington, DC, USA, 2024; pp. 23451–23453. [Google Scholar] [CrossRef]
  13. Pan, J.; Nayyeri, M.; Li, Y.; Staab, S. HGE: Embedding Temporal Knowledge Graphs in a Product Space of Heterogeneous Geometric Subspaces. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2024), Vancouver, BC, Canada, 20–27 February 2024; AAAI Press: Washington, DC, USA, 2024; pp. 8913–8920. [Google Scholar] [CrossRef]
  14. Bordes, A.; Usunier, N.; García-Durán, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, NV, USA, 5–8 December 2013; Burges, C.J.C., Bottou, L., Ghahramani, Z., Weinberger, K.Q., Eds.; Curran Associates Inc.: Red Hook, NY, USA, 2013; pp. 2787–2795. [Google Scholar]
  15. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; Brodley, C.E., Stone, P., Eds.; AAAI Press: Washington, DC, USA, 2014; pp. 1112–1119. [Google Scholar]
  16. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Bonet, B., Koenig, S., Eds.; AAAI Press: Washington, DC, USA, 2015; pp. 2181–2187. [Google Scholar]
  17. Ji, G.; He, S.; Xu, L.; Liu, K.; Zhao, J. Knowledge Graph Embedding via Dynamic Mapping Matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, Beijing, China, 26–31 July 2015; Volume 1: Long Papers. The Association for Computer Linguistics: Stroudsburg, PA, USA, 2015; pp. 687–696. [Google Scholar]
  18. Sun, Z.; Deng, Z.; Nie, J.; Tang, J. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  19. Gao, C.; Sun, C.; Shan, L.; Lin, L.; Wang, M. Rotate3D: Representing relations as rotations in three-dimensional space for knowledge graph embedding. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual, 19–23 October 2020; pp. 385–394. [Google Scholar]
  20. Zhang, S.; Tay, Y.; Yao, L.; Liu, Q. Quaternion Knowledge Graph Embeddings. In Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019; Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R., Eds.; Curran Associates Inc.: Red Hook, NY, USA, 2019; pp. 2731–2741. [Google Scholar]
  21. Li, R.; Zhao, J.; Li, C.; He, D.; Wang, Y.; Liu, Y.; Sun, H.; Wang, S.; Deng, W.; Shen, Y.; et al. HousE: Knowledge Graph Embedding with Householder Parameterization. In Proceedings of the International Conference on Machine Learning, ICML 2022, Baltimore, MD, USA, 17–23 July 2022; Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S., Eds.; PMLR: London, UK, 2022; Volume 162, pp. 13209–13224. [Google Scholar]
  22. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2D Knowledge Graph Embeddings. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, LA, USA, 2–7 February 2018; McIlraith, S.A., Weinberger, K.Q., Eds.; AAAI Press: Washington, DC, USA, 2018; pp. 1811–1818. [Google Scholar]
  23. Leblay, J.; Chekol, M.W. Deriving Validity Time in Knowledge Graph. In Proceedings of the Companion of the Web Conference 2018 on The Web Conference 2018, WWW 2018, Lyon, France, 23–27 April 2018; Champin, P., Gandon, F., Lalmas, M., Ipeirotis, P.G., Eds.; ACM: New York, NY, USA, 2018; pp. 1771–1776. [Google Scholar]
  24. García-Durán, A.; Dumancic, S.; Niepert, M. Learning Sequence Encoders for Temporal Knowledge Graph Completion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 4816–4821. [Google Scholar]
  25. Dasgupta, S.S.; Ray, S.N.; Talukdar, P.P. HyTE: Hyperplane-based Temporally aware Knowledge Graph Embedding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 2001–2011. [Google Scholar]
  26. Xu, C.; Nayyeri, M.; Alkhoury, F.; Lehmann, J.; Yazdi, H.S. Temporal Knowledge Graph Embedding Model based on Additive Time Series Decomposition. arXiv 2019, arXiv:1911.07893. [Google Scholar]
  27. Yang, B.; Yih, W.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015; Conference Track Proceedings. Bengio, Y., LeCun, Y., Eds.; 2015. [Google Scholar]
  28. Goel, R.; Kazemi, S.M.; Brubaker, M.A.; Poupart, P. Diachronic Embedding for Temporal Knowledge Graph Completion. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, 7–12 February 2020; AAAI Press: Washington, DC, USA, 2020; pp. 3988–3995. [Google Scholar]
  29. Kazemi, S.M.; Poole, D. SimplE Embedding for Link Prediction in Knowledge Graphs. In Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, Montréal, QC, Canada, 3–8 December 2018; Bengio, S., Wallach, H.M., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R., Eds.; Curran Associates Inc.: Red Hook, NY, USA, 2018; pp. 4289–4300. [Google Scholar]
  30. Wang, J.; Zhang, W.; Chen, X.; Lei, J.; Lai, X. 3DRTE: 3D Rotation Embedding in Temporal Knowledge Graph. IEEE Access 2020, 8, 207515–207523. [Google Scholar] [CrossRef]
  31. Sadeghian, A.; Armandpour, M.; Colas, A.; Wang, D.Z. ChronoR: Rotation Based Temporal Knowledge Graph Embedding. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, 2–9 February 2021; AAAI Press: Washington, DC, USA, 2021; pp. 6471–6479. [Google Scholar]
  32. Han, Z.; Ma, Y.; Chen, P.; Tresp, V. Dyernie: Dynamic evolution of riemannian manifold embeddings for temporal knowledge graph completion. arXiv 2020, arXiv:2011.03984. [Google Scholar]
  33. Chami, I.; Wolf, A.; Juan, D.-C.; Sala, F.; Ravi, S.; Ré, C. Low-dimensional hyperbolic knowledge graph embeddings. arXiv 2020, arXiv:2005.00545. [Google Scholar]
  34. Montella, S.; Rojas-Barahona, L.M.; Heinecke, J. Hyperbolic Temporal Knowledge Graph Embeddings with Relational and Time Curvatures. In Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online, 1–6 August 2021; pp. 3296–3308. [Google Scholar]
  35. Zhang, S.; Liang, X.; Tang, H.; Guan, Z. Hybrid Interaction Temporal Knowledge Graph Embedding Based on Householder Transformations. In Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, Ottawa, ON, Canada, 29 October–3 November 2023; El-Saddik, A., Mei, T., Cucchiara, R., Bertini, M., Vallejo, D.P.T., Atrey, P.K., Hossain, M.S., Eds.; ACM: New York, NY, USA, 2023; pp. 8954–8962. [Google Scholar]
  36. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2015, arXiv:1412.6980. [Google Scholar]
  37. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex embeddings for simple link prediction. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; PMLR: London, UK, 2016; pp. 2071–2080. [Google Scholar]
  38. Baghershahi, P.; Hosseini, R.; Moradi, H. Self-attention presents low-dimensional knowledge graph embeddings for link prediction. Knowl. Based Syst. 2023, 260, 110124. [Google Scholar] [CrossRef]
Figure 1. The architecture of DuaTHP.
Figure 2. (a) MRR of DuaTHP on ICEWS14 for different numbers of attention heads; (b) MRR of DuaTHP on GDELT for different numbers of attention heads.
Figure 3. The MRR results of our model with different numbers of modified Householder matrices on ICEWS14, ICEWS05-15, and GDELT. (a) m on ICEWS14; (b) m on ICEWS05-15; (c) m on GDELT.
Figure 4. Time window size (days). Experiments were performed on the ICEWS14 dataset. (a) MRR vs. t; (b) Hit@1 vs. t; (c) Hit@3 vs. t; (d) Hit@10 vs. t.
Table 1. Statistics for the various experimental datasets.

| Dataset | No. of Entities | No. of Relations | No. of Training | No. of Validation | No. of Test | Time Span |
|---|---|---|---|---|---|---|
| ICEWS14 | 6869 | 230 | 72,826 | 8941 | 8963 | 2014 |
| ICEWS05-15 | 10,094 | 251 | 368,962 | 46,275 | 46,092 | 2005–2015 |
| YAGO11k | 10,623 | 10 | 16,406 | 2050 | 2051 | −453–2844 |
| Wikidata12k | 12,554 | 24 | 32,497 | 4062 | 4062 | 1479–2018 |
| GDELT | 500 | 20 | 2,735,685 | 341,961 | 341,961 | 31 March 2015–31 March 2016 |
Table 2. Link prediction results on ICEWS14 and ICEWS05-15. The best results are in bold.

| Model | ICEWS14: MRR | Hit@1 | Hit@3 | Hit@10 | ICEWS05-15: MRR | Hit@1 | Hit@3 | Hit@10 |
|---|---|---|---|---|---|---|---|---|
| TransE | 0.280 | 0.094 | - | 0.637 | 0.294 | 0.090 | - | 0.663 |
| DistMult | 0.439 | 0.323 | - | 0.672 | 0.456 | 0.337 | - | 0.691 |
| RotatE | 0.418 | 0.291 | 0.478 | 0.690 | 0.304 | 0.164 | 0.355 | 0.595 |
| QuatE | 0.471 | 0.353 | 0.530 | 0.712 | 0.482 | 0.370 | 0.529 | 0.727 |
| HousE | 0.427 | 0.367 | 0.537 | 0.769 | 0.451 | 0.342 | 0.527 | 0.731 |
| TTransE | 0.255 | 0.074 | - | 0.601 | 0.271 | 0.084 | - | 0.616 |
| TA-TransE | 0.275 | 0.095 | - | 0.625 | 0.299 | 0.096 | - | 0.668 |
| HyTE | 0.297 | 0.108 | 0.416 | 0.655 | 0.316 | 0.116 | 0.445 | 0.681 |
| TA-DistMult | 0.477 | 0.363 | - | 0.686 | 0.474 | 0.346 | - | 0.728 |
| DE-SimplE | 0.526 | 0.418 | 0.592 | 0.725 | 0.513 | 0.392 | 0.578 | 0.748 |
| ATiSE | 0.550 | 0.436 | 0.629 | 0.750 | 0.519 | 0.378 | 0.606 | 0.794 |
| TeRo | 0.562 | 0.468 | 0.621 | 0.732 | 0.586 | 0.469 | 0.668 | 0.795 |
| ChronoR | 0.625 | 0.547 | 0.669 | 0.773 | 0.675 | 0.593 | 0.723 | 0.820 |
| DYERNIE | 0.588 | 0.498 | 0.638 | 0.761 | 0.687 | 0.618 | 0.728 | 0.825 |
| HERCULES | 0.612 | 0.543 | 0.647 | 0.741 | 0.685 | **0.621** | 0.720 | 0.809 |
| ATTH | 0.617 | 0.545 | 0.654 | 0.754 | 0.685 | 0.620 | 0.719 | 0.806 |
| RotateQVS | 0.591 | 0.507 | 0.642 | 0.754 | 0.633 | 0.529 | 0.709 | 0.813 |
| HyIE | 0.631 | 0.543 | 0.671 | 0.786 | 0.684 | 0.615 | 0.728 | 0.831 |
| TempCaps | 0.489 | 0.388 | 0.544 | 0.679 | 0.521 | 0.423 | 0.576 | 0.705 |
| DuCape | 0.587 | 0.549 | 0.661 | 0.776 | 0.686 | 0.609 | 0.726 | 0.821 |
| BiQCap | 0.592 | 0.548 | 0.664 | 0.780 | 0.681 | 0.612 | **0.730** | 0.825 |
| DuaTHPR | 0.495 | 0.412 | 0.547 | 0.729 | 0.531 | 0.435 | 0.611 | 0.741 |
| DuaTHPA | 0.611 | 0.528 | 0.661 | 0.768 | 0.655 | 0.578 | 0.719 | 0.827 |
| DuaTHP | **0.637** | **0.553** | **0.674** | **0.788** | **0.689** | 0.618 | 0.726 | **0.835** |
Table 3. Link prediction results on YAGO11k, Wikidata12k, and GDELT. The best results are in bold.

| Model | YAGO11k: MRR | Hit@1 | Hit@3 | Hit@10 | Wikidata12k: MRR | Hit@1 | Hit@3 | Hit@10 | GDELT: MRR | Hit@1 | Hit@3 | Hit@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TransE | 0.100 | 0.015 | 0.138 | 0.244 | 0.178 | 0.100 | 0.192 | 0.339 | 0.132 | 0.000 | - | 0.158 |
| DistMult | 0.158 | 0.107 | 0.161 | 0.268 | 0.222 | 0.119 | 0.238 | 0.460 | 0.196 | 0.117 | 0.208 | 0.348 |
| RotatE | 0.167 | 0.103 | 0.167 | 0.305 | 0.221 | 0.116 | 0.236 | 0.461 | - | - | - | - |
| QuatE | 0.164 | 0.107 | 0.148 | 0.270 | 0.230 | 0.125 | 0.243 | 0.416 | - | - | - | - |
| HousE | 0.158 | 0.089 | 0.124 | 0.264 | 0.269 | 0.147 | 0.271 | 0.416 | 0.207 | 0.169 | 0.241 | 0.367 |
| TTransE | 0.108 | 0.020 | 0.150 | 0.251 | 0.172 | 0.096 | 0.184 | 0.329 | 0.115 | 0.000 | 0.160 | 0.318 |
| TA-TransE | 0.127 | 0.027 | 0.160 | 0.326 | 0.178 | 0.030 | 0.267 | 0.429 | - | - | - | - |
| HyTE | 0.105 | 0.015 | 0.143 | 0.272 | 0.180 | 0.098 | 0.197 | 0.333 | 0.118 | 0.000 | 0.165 | 0.326 |
| TA-DistMult | 0.161 | 0.103 | 0.171 | 0.292 | 0.218 | 0.122 | 0.232 | 0.447 | 0.206 | 0.124 | 0.219 | 0.365 |
| DE-SimplE | - | - | - | - | - | - | - | - | 0.230 | 0.141 | 0.248 | 0.403 |
| ATiSE | 0.170 | 0.110 | 0.171 | 0.288 | 0.280 | 0.175 | 0.317 | 0.481 | - | - | - | - |
| TeRo | 0.187 | 0.121 | 0.197 | 0.319 | 0.299 | 0.198 | 0.329 | 0.507 | 0.245 | 0.154 | 0.264 | 0.420 |
| DYERNIE | - | - | - | - | - | - | - | - | 0.289 | **0.192** | 0.307 | 0.467 |
| HERCULES | - | - | - | - | - | - | - | - | **0.294** | 0.187 | 0.305 | 0.464 |
| RotateQVS | 0.189 | 0.124 | 0.199 | 0.323 | - | - | - | - | 0.270 | 0.175 | 0.293 | 0.458 |
| HyIE | 0.191 | 0.121 | 0.201 | 0.326 | 0.301 | 0.197 | 0.328 | 0.506 | 0.272 | 0.182 | 0.292 | **0.468** |
| TempCaps | - | - | - | - | - | - | - | - | 0.258 | 0.180 | 0.277 | 0.404 |
| DuCape | 0.183 | 0.121 | 0.201 | 0.324 | 0.272 | 0.181 | 0.315 | 0.469 | - | - | - | - |
| BiQCap | 0.186 | **0.129** | 0.198 | 0.325 | 0.283 | 0.184 | 0.314 | 0.476 | - | - | - | - |
| DuaTHPR | 0.169 | 0.115 | 0.152 | 0.294 | 0.246 | 0.141 | 0.262 | 0.453 | 0.232 | 0.148 | 0.265 | 0.421 |
| DuaTHPA | 0.187 | 0.121 | 0.197 | 0.321 | 0.298 | 0.197 | 0.319 | 0.487 | 0.263 | 0.171 | 0.291 | 0.452 |
| DuaTHP | **0.195** | 0.123 | **0.207** | **0.329** | **0.304** | **0.209** | **0.331** | **0.509** | 0.275 | 0.183 | **0.309** | 0.462 |
Table 4. Link prediction results on ICEWS14, ICEWS05-15, and GDELT. The best results are in bold.

| Model | ICEWS14: MRR | Hit@1 | Hit@3 | Hit@10 | ICEWS05-15: MRR | Hit@1 | Hit@3 | Hit@10 | GDELT: MRR | Hit@1 | Hit@3 | Hit@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DualT | 0.631 | 0.548 | 0.667 | 0.781 | 0.684 | 0.617 | 0.725 | 0.823 | 0.261 | 0.172 | 0.292 | 0.452 |
| DuaTHPH | 0.630 | 0.550 | 0.664 | 0.780 | 0.681 | 0.614 | 0.723 | 0.815 | 0.265 | 0.174 | 0.293 | 0.451 |
| DuaTHPR | 0.632 | 0.547 | 0.667 | 0.780 | 0.682 | 0.611 | 0.719 | 0.821 | 0.267 | 0.175 | 0.301 | 0.459 |
| DuaTHPD | 0.631 | 0.549 | 0.668 | 0.782 | 0.682 | 0.613 | 0.718 | 0.823 | 0.266 | 0.175 | 0.296 | 0.454 |
| DuaTHP | **0.637** | **0.553** | **0.674** | **0.788** | **0.689** | **0.618** | **0.726** | **0.835** | **0.275** | **0.183** | **0.309** | **0.462** |
Table 5. Link prediction results on ICEWS14. The best results are in bold.

| Model | MRR | Hit@1 | Hit@3 | Hit@10 |
|---|---|---|---|---|
| DualH1 | 0.628 | 0.542 | 0.661 | 0.776 |
| DualH2 | 0.621 | 0.539 | 0.658 | 0.772 |
| DuaTHP | **0.637** | **0.553** | **0.674** | **0.788** |
