4.1. Embedding Propagation
Embedding propagation is knowledge propagation enriched with the collaborative signal. Knowledge is propagated along the links in the knowledge graph, so the user's long-distance interests are captured through higher-order connection information in the KG. Furthermore, the user-item collaborative graph, with its interactions between users and items, directly contains a collaborative signal, including the relationships between the original users and items. So, the collaborative signal is extracted by learning the interactions in the CG. Concretely, embedding propagation is composed of two components: collaborative signal extraction and knowledge propagation. First, the collaborative signal is extracted as the initial entity sets for users and items according to the users' historical interaction data with items. Second, knowledge is propagated layer by layer by aggregating information from each layer's neighborhood.
Collaborative signal extraction. In user-item interactions, it is generally assumed that users with similar access behaviors may prefer similar items, and that users may like items that are similar to their preferences. Thus, it makes sense for the collaborative signal to be embedded in the feature representations of users and items and to participate in propagation. In particular, the user's collaborative signal is represented by the items that the user has already indicated they prefer. The collection of items with an interaction score of 1 for a user is named the initial set of the user, formulated for user $u$ as $\mathcal{V}_u = \{v \mid y_{uv} = 1\}$. For an item, if different items are all associated with the same user, there may be a correlation and similarity between these items. Therefore, the users who have interacted with the item $v$ are found first, and then the other items that interact with the same users are treated as collaborative neighbors. The set of an item's collaborative neighbors is called the initial set of the item, denoted $\mathcal{V}_v$.
For example, in Figure 1, user $u_1$ prefers items $v_1$ and $v_2$, so the initial set of the user includes $v_1$ and $v_2$. For item $v_1$, the users interacting with $v_1$ are $u_1$ and $u_2$; $u_1$ interacts with $v_2$ and $u_2$ interacts with $v_3$. Thus, the initial set of $v_1$ contains $v_2$ and $v_3$.
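The extraction of the two initial sets can be sketched as follows. This is an illustrative reading of the procedure above, not released code: function and variable names are hypothetical, and interactions are assumed to be given as (user, item) pairs with an implicit score of 1.

```python
# Hypothetical sketch of collaborative signal extraction: the initial set
# of a user (items the user interacted with) and the initial set of an
# item (other items co-interacted by the same users).
from collections import defaultdict

def build_initial_sets(interactions):
    """interactions: iterable of (user, item) pairs with implicit score 1."""
    user_items = defaultdict(set)   # u -> items u interacted with
    item_users = defaultdict(set)   # v -> users who interacted with v
    for u, v in interactions:
        user_items[u].add(v)
        item_users[v].add(u)

    # Initial set of a user: the items the user has interacted with.
    init_user = {u: set(items) for u, items in user_items.items()}

    # Initial set of an item: other items sharing at least one user.
    init_item = {}
    for v, users in item_users.items():
        neighbors = set()
        for u in users:
            neighbors |= user_items[u]
        neighbors.discard(v)        # an item is not its own collaborative neighbor
        init_item[v] = neighbors
    return init_user, init_item

# Worked example mirroring the Figure 1 pattern described above:
pairs = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v3")]
init_user, init_item = build_initial_sets(pairs)
# init_user["u1"] -> {"v1", "v2"}; init_item["v1"] -> {"v2", "v3"}
```

The example reproduces the text: u1's initial set is its interacted items, while v1's initial set gathers the other items of u1 and u2.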
Notice that the items in the user-item collaborative graph are aligned with the entities in the knowledge graph to obtain the initial entity set of the user and the initial entity set of the item, respectively. The initial entity set of the user is shown below:
$$\mathcal{E}_u^0 = \{e \mid (v, e) \in \mathcal{A},\ v \in \mathcal{V}_u\}$$
The initial entity set of the item is shown below:
$$\mathcal{E}_v^0 = \{e \mid (v', e) \in \mathcal{A},\ v' \in \mathcal{V}_v\}$$
where $\mathcal{A}$ denotes the set of item-entity alignment pairs, and $\mathcal{V}_u$ and $\mathcal{V}_v$ are the initial sets of the user and the item defined above.
Knowledge propagation. The knowledge graph contains a large amount of side information, such as item attributes and correlation information. In the knowledge graph, entities are connected through relations, and the neighborhood of an entity has an essential influence on its features. So, the feature representation of an entity is extended layer by layer through neighborhood knowledge along the links between entities. This method is able to capture high-order structural information.
Since the initial entity sets of users and items are subsets of the entity set of the knowledge graph, the feature representations of users and items are extended and learned starting from their initial entity sets. In other words, both users and items carry out embedding propagation and feature extraction in the form of entities on the knowledge graph. Thus, for convenience, the propagation of an entity e is used to illustrate the specific propagation process.
Specifically, the entities directly connected to an entity $e$ are the first-order neighbors of the entity, denoted $\mathcal{N}_e^1$. Furthermore, given a triple $(h, r, t)$, the tail entity $t$ is one of the first-order neighbors of the head entity $h$. By analogy, the $l$-order neighbor set of an entity at layer $l$ is defined as $\mathcal{N}_e^l$. The details are as follows:
$$\mathcal{N}_e^l = \{t \mid (h, r, t) \in \mathcal{G},\ h \in \mathcal{N}_e^{l-1}\}, \quad l = 1, 2, \dots, L,$$
where $\mathcal{N}_e^{l-1}$ denotes the $(l-1)$-order neighbors of the entity $e$ at layer $l-1$, $\mathcal{G}$ is the set of triples in the knowledge graph, and $\mathcal{N}_e^0 = \{e\}$. The triple set composed of all neighbors at layer $l$ for entity $e$ is shown below:
$$\mathcal{S}_e^l = \{(h, r, t) \mid (h, r, t) \in \mathcal{G},\ h \in \mathcal{N}_e^{l-1}\}.$$
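The layer-by-layer expansion of neighbor sets and triple sets can be sketched as below, assuming the knowledge graph is given as a plain list of (head, relation, tail) triples; all names and the toy graph are illustrative.

```python
# Sketch of layer-by-layer neighbor expansion over KG triples, with the
# 0-order neighborhood taken to be the seed entity set itself.
def neighbor_sets(triples, seed_entities, num_layers):
    """Return per-layer neighbor sets and triple sets for a seed set."""
    heads = seed_entities          # plays the role of N^{l-1}
    layers, triple_sets = [], []
    for _ in range(num_layers):
        # S^l: all triples whose head lies in the previous layer's neighbors.
        layer_triples = [(h, r, t) for (h, r, t) in triples if h in heads]
        tails = {t for (_, _, t) in layer_triples}   # N^l: their tails
        triple_sets.append(layer_triples)
        layers.append(tails)
        heads = tails              # the next layer expands from these tails
    return layers, triple_sets

# Toy knowledge graph: e1 -r1-> e2, e2 -r2-> e3, e2 -r1-> e4.
kg = [("e1", "r1", "e2"), ("e2", "r2", "e3"), ("e2", "r1", "e4")]
layers, tsets = neighbor_sets(kg, {"e1"}, 2)
# layers[0] -> {"e2"}; layers[1] -> {"e3", "e4"}
```

Starting from the seed {e1}, the first layer reaches e2, and the second layer reaches e3 and e4 through e2's outgoing triples, matching the recursive definition.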
In real scenarios, different tail entities make different contributions to the feature representation of an entity under different conditions. For example, a user prefers the movie Crouching Tiger, Hidden Dragon, which is both a martial arts film and a love story. However, the user actually prefers this movie because he likes martial arts films. Therefore, martial arts, as an attribute of the film, should have a higher weight than love story, and an attention mechanism is designed to learn the weights of different tail entities. The global embedding of the triple set at the $l$-th layer for the entity is defined as follows:
$$\mathbf{e}_{\mathcal{S}}^l = \sum_{(h, r, t) \in \mathcal{S}_e^l} \pi(h, r, t)\, \mathbf{e}_t,$$
where $\mathbf{e}_t$ is the embedding representation of the tail entity $t$, and $\pi(h, r, t)$ is a scoring function used to compute the attention weight, which controls how much information from the tail entity is propagated to the head entity $h$ under the relation $r$.
The scoring function $\pi(h, r, t)$ is implemented via an attention network, which is formulated as follows:
$$\hat{\pi}(h, r, t) = \sigma\!\left(\mathbf{W}_3\, \mathrm{ReLU}\!\left(\mathbf{W}_2\, \mathrm{ReLU}\!\left(\mathbf{W}_1 [\mathbf{e}_h \,\|\, \mathbf{e}_r \,\|\, \mathbf{e}_t] + \mathbf{b}_1\right) + \mathbf{b}_2\right) + \mathbf{b}_3\right),$$
which is a multi-layer neural network. $\mathbf{W}_1$, $\mathbf{W}_2$ and $\mathbf{W}_3$ are the trainable weight matrices of each layer, respectively, and $\mathbf{b}_1$, $\mathbf{b}_2$ and $\mathbf{b}_3$ are the biases. $\sigma$ is the nonlinear activation function of the last layer, which is Sigmoid; ReLU is the activation function of all layers except the last. Then, the softmax function is applied to normalize the coefficients across all triples, as follows:
$$\pi(h, r, t) = \frac{\exp(\hat{\pi}(h, r, t))}{\sum_{(h', r', t') \in \mathcal{S}_e^l} \exp(\hat{\pi}(h', r', t'))}.$$
The attention score function is capable of paying more attention to the neighboring tail entities that have more influence, so the method distills more accurate association information.
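The attention-weighted aggregation can be sketched as follows. To keep the mechanics visible, the multi-layer scoring network is replaced here by a hypothetical toy scorer; in the model itself the scores would come from the learned attention network.

```python
# Minimal sketch of attention aggregation: normalize raw triple scores
# with softmax, then take the weighted sum of the tail embeddings.
import math

def softmax(scores):
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def aggregate(triple_set, emb, score_fn):
    """Weighted sum of tail embeddings: sum over pi(h,r,t) * e_t."""
    raw = [score_fn(h, r, t) for (h, r, t) in triple_set]
    pi = softmax(raw)                        # normalize across all triples
    dim = len(next(iter(emb.values())))
    out = [0.0] * dim
    for w, (_, _, t) in zip(pi, triple_set):
        out = [o + w * x for o, x in zip(out, emb[t])]
    return out

emb = {"t1": [1.0, 0.0], "t2": [0.0, 1.0]}
# Toy scorer: pretend t1 is the more relevant attribute
# (e.g. "martial arts" vs. "love story" in the movie example).
score = lambda h, r, t: 2.0 if t == "t1" else 0.0
vec = aggregate([("h", "r", "t1"), ("h", "r", "t2")], emb, score)
# vec[0] > vec[1]: the higher-scored tail dominates the aggregation
```

Because the softmax weights sum to 1, the output stays a convex combination of the tail embeddings, with the influential tail contributing most.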
Finally, the representations of the entity at the other layers are computed according to Equation (5), and the representation set from layer 1 to layer $L$ is obtained for the entity $e$:
$$\{\mathbf{e}^1, \mathbf{e}^2, \dots, \mathbf{e}^L\}.$$
So, the representation sets of the user and the item are obtained as $\{\mathbf{e}_u^1, \dots, \mathbf{e}_u^L\}$ and $\{\mathbf{e}_v^1, \dots, \mathbf{e}_v^L\}$. However, there is still an issue to note: the importance of the initial entity set should not be ignored. In particular, the initial entity set of the user is directly related to the user and represents the user's preferred items, so it directly reflects user interest. For items, the initial entity set is a set of collaborative neighbors, which contains collaboration information and strong interaction information. Thus, the initial entity sets of users and items are treated as their 0-order neighborhoods, respectively. The average value of the embedding representations of the entities in the initial entity set of the user and the item is regarded as the 0-th layer embedding of the user and the item:
$$\mathbf{e}_u^0 = \frac{1}{|\mathcal{E}_u^0|}\sum_{e \in \mathcal{E}_u^0} \mathbf{e}, \qquad \mathbf{e}_v^0 = \frac{1}{|\mathcal{E}_v^0|}\sum_{e \in \mathcal{E}_v^0} \mathbf{e}.$$
Finally, the representation sets of the user and the item, which contain the attention weights, are as follows:
$$\{\mathbf{e}_u^0, \mathbf{e}_u^1, \dots, \mathbf{e}_u^L\} \quad \text{and} \quad \{\mathbf{e}_v^0, \mathbf{e}_v^1, \dots, \mathbf{e}_v^L\}.$$
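The 0-th layer embedding described above is simply the mean of the initial entity set's embeddings; a small sketch with illustrative names and values:

```python
# Sketch of the 0-th layer embedding: the element-wise mean of the
# embeddings of the entities in the initial entity set.
def zero_layer_embedding(initial_entities, emb):
    dim = len(next(iter(emb.values())))
    acc = [0.0] * dim
    for e in initial_entities:
        acc = [a + x for a, x in zip(acc, emb[e])]
    return [a / len(initial_entities) for a in acc]

# Two toy entities in the initial set; their mean is the layer-0 vector.
emb = {"e1": [2.0, 0.0], "e2": [0.0, 2.0]}
e0 = zero_layer_embedding({"e1", "e2"}, emb)
# e0 -> [1.0, 1.0]
```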
4.2. Transductive Learning
The long tail problem is a serious challenge for personalized recommendation. In real recommendation scenarios, most entities have only a little interaction data, which results in insufficient data being available for training. However, general methods require a large amount of training data to achieve accurate results. Therefore, a transductive learning strategy is proposed to alleviate the long tail problem. Transductive learning is applied to each layer of the item individually. For ease of understanding, we describe the strategy as executed at layer $l$; the other layers are handled in the same way. The details are as follows.
First, the representations of items are updated based on the representations of unknown items already learned in the previous step. Then, considering the small number of triples available to entities and the high uncertainty of unknown items, the novel representations of items are fed into two individual knowledge propagation layers to model the relations of unknown items. So, the number of nodes and the layer depth of knowledge propagation in transductive learning are consistent with those in embedding propagation. Then, the transductive learning layer is designed to re-parameterize the model by combining the output representations of the two knowledge propagation layers. The specific calculation process is as follows:
$$\tilde{\mathbf{e}}_v^l = f_1(\mathbf{e}_v^l) + f_2(\mathbf{e}_v^l),$$
where $\mathbf{e}_v^l$, the feature embedding of the item that has been updated in the previous step, denotes the representation of item $v$ at the $l$-th layer. $f_1(\cdot)$ and $f_2(\cdot)$ represent the processes through the two individual knowledge propagation layers, respectively. Meanwhile, $f_1(\cdot)$ applies if there is an item $v$ and an entity $h$ that satisfy $(h, r, v) \in \mathcal{G}$. Similarly, $f_2(\cdot)$ applies if there is an item $v$ and an entity $t$ that satisfy $(v, r, t) \in \mathcal{G}$. $\tilde{\mathbf{e}}_v^l$ is defined as the embedding of item $v$ at the $l$-th layer, which contains the relationships between unknown items that have been modeled.
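A hedged sketch of this two-branch update, under the assumption stated above that each branch contributes only when the item actually occurs on that side of some triple; `f1` and `f2` are stand-ins for the learned propagation layers, shown here as toy linear maps.

```python
# Sketch of the transductive re-parameterization: the item embedding is
# passed through two separate propagation transforms and the outputs are
# summed into the new layer-l embedding.
def transductive_update(e_v, as_tail, as_head, f1, f2):
    """as_tail: some (h, r, v) exists; as_head: some (v, r, t) exists."""
    out = [0.0] * len(e_v)
    if as_tail:                      # branch for triples with v as tail
        out = [o + x for o, x in zip(out, f1(e_v))]
    if as_head:                      # branch for triples with v as head
        out = [o + x for o, x in zip(out, f2(e_v))]
    return out

# Toy linear "propagation layers"; real ones would be learned.
f1 = lambda v: [0.5 * x for x in v]
f2 = lambda v: [0.1 * x for x in v]
new_e = transductive_update([1.0, 2.0], True, True, f1, f2)
```

With both branches active, the toy update scales the input by 0.5 + 0.1 in each coordinate; the point is only the structure of the combination, not the particular transforms.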
Transductive learning predicts the relationships between unknown entities and constructs connections for them. As a result, when a new item is added to the knowledge graph, reliable connection information for that entity can also be obtained, even though the new entity has no triples. This approach extends the knowledge graph and also eases the cold start problem.
Furthermore, unlike users, who only appear in the user-item collaborative graph, items appear both in the collaborative graph and as entities in the knowledge graph. As a result, the item has an initial embedding in the knowledge graph. Since the embedding of the item in the original knowledge graph is closely related to the item, this initial embedding is added to the representation set.
Performing the above procedure at each layer results in new embedding representations of the item at each layer with richer connection information. Then, the representation set of the item is updated again to $\{\tilde{\mathbf{e}}_v^0, \tilde{\mathbf{e}}_v^1, \dots, \tilde{\mathbf{e}}_v^L\}$. The representation set of the user remains $\{\mathbf{e}_u^0, \mathbf{e}_u^1, \dots, \mathbf{e}_u^L\}$. For convenience, let $\tilde{\mathbf{e}}_u^l$ be equal to $\mathbf{e}_u^l$. Therefore, the final embeddings of the user and the item, aggregated over all layers, are shown below:
$$\mathbf{e}_u = \sum_{l=0}^{L} \tilde{\mathbf{e}}_u^l, \qquad \mathbf{e}_v = \sum_{l=0}^{L} \tilde{\mathbf{e}}_v^l.$$
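Assuming the layer representations are combined by summation (one common choice for this kind of aggregation; the exact aggregator used by the model may differ), the final step can be sketched as:

```python
# Sketch of final aggregation: sum the per-layer representation vectors
# (layers 0..L) into a single embedding for the user or item.
def final_embedding(layer_reps):
    dim = len(layer_reps[0])
    out = [0.0] * dim
    for rep in layer_reps:
        out = [o + x for o, x in zip(out, rep)]
    return out

# Illustrative layer vectors for one user (layers 0, 1, 2).
user_layers = [[1.0, 1.0], [0.5, 0.0], [0.25, 0.5]]
e_u = final_embedding(user_layers)
# e_u -> [1.75, 1.5]
```

The resulting vector fuses the 0-th layer collaborative signal with the higher-order knowledge-propagation layers in a single embedding.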