2.5.2. Concrete Implementation of Recommendation Model Based on RippleNet
The recommendation problem based on the meteorology KG is formulated as follows. In the precise maize variety recommendation system, we define a set of maize varieties $U = \{u_1, u_2, \ldots\}$ and a set of planting areas $V = \{v_1, v_2, \ldots\}$. $Y = \{y_{uv} \mid u \in U, v \in V\}$ is the interaction matrix between the approved maize varieties and the planting areas, where $y_{uv}$ is the suitability result of maize variety $u$ in test planting area $v$; its value rules are shown in Equation (1).
In addition, the meteorology KG needs to be defined in this study. The meteorology KG is represented by $G$ and contains a large number of triples $(h, r, t)$, in which $h \in E$, $r \in R$ and $t \in E$ represent the head entity, relation and tail entity, respectively, and $E$ and $R$ represent the sets of entities and relations in the meteorology KG. $\hat{y}_{uv} = F(u, v; \Theta)$ refers to a prediction function, where $\hat{y}_{uv}$ is the probability that maize variety $u$ will be recommended in planting area $v$, and $\Theta$ denotes the model parameters of function $F$.
The recommendation model is based on RippleNet [23]. The framework of the recommendation model is shown in Figure 5. The model takes maize variety $u$ and planting area $v$ as input, and outputs the probability of recommending maize variety $u$ in planting area $v$. The seed set in the meteorology KG is generally composed of the field test data $V_u$ of maize variety $u$. The ripple sets $S_u^k$ ($k = 1, 2, \ldots, H$) of maize variety $u$ are formed when the seed set is expanded along the links of the KG: the set of knowledge triples that are k hop(s) away from the seed set $V_u$ constitutes the ripple set $S_u^k$. These ripple sets interact iteratively with the planting area embedding (the light yellow block) to obtain the responses of the maize variety to the planting area (the green blocks). The recommendation model obtains the final embedding (the dark gray block) of the maize variety by combining all the responses. Finally, the probability $\hat{y}_{uv}$ that planting area $v$ is suitable for planting maize variety $u$ is calculated from the embedding of maize variety $u$ and the embedding of planting area $v$.
The meteorology KG contains rich facts and relationships between entities. To express the hierarchical meteorological preferences of maize varieties in the meteorology KG, we recursively define the set of k-hop relevant entities of a maize variety in the recommendation model based on RippleNet, as shown below:
Definition 1 explains the relevant entities in detail, and Definition 2 describes how the recommendation model forms a ripple set with the help of the meteorology KG.
Definition 1. Given the “variety–plant station” interaction matrix $Y$ and the meteorology knowledge graph $G$, the set of k-hop relevant entities $E_u^k$ of maize variety $u$ is defined as Equation (2).
$E_u^0 = V_u$ in Equation (2) refers to the set of areas where the maize variety has been tested in the field and where its yield increase percentage over the control variety is larger than the specified threshold; it can be regarded as the seed set of the maize variety in the meteorology KG.
Definition 2. The seed (the historical planting records of a maize variety) in the meteorology KG extends along the links to form the relevant entities of the maize variety. Given the definition of relevant entities, the k-hop ripple set $S_u^k$ of maize variety $u$ is defined as the set of knowledge triples whose heads lie in $E_u^{k-1}$, as given in Equation (3).
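The two definitions can be illustrated with a short sketch. It assumes the sets follow the original RippleNet formulation [23], that is, $E_u^k = \{t \mid (h, r, t) \in G,\ h \in E_u^{k-1}\}$ and $S_u^k = \{(h, r, t) \mid (h, r, t) \in G,\ h \in E_u^{k-1}\}$ with $E_u^0 = V_u$. The function name, the parameter `n_hops` (standing in for $H$) and the toy entity and relation names are hypothetical and do not come from this study.

```python
from collections import defaultdict

def build_ripple_sets(triples, seed_areas, n_hops):
    """Return [S_u^1, ..., S_u^H] for one maize variety.

    triples    : iterable of (head, relation, tail) tuples from the KG
    seed_areas : the seed set V_u (areas where the variety tested well)
    n_hops     : H, the maximum hop count (hypothetical parameter name)
    """
    # Index the KG by head entity so each hop is a dictionary lookup.
    by_head = defaultdict(list)
    for h, r, t in triples:
        by_head[h].append((h, r, t))

    ripple_sets = []
    frontier = set(seed_areas)              # E_u^0 = V_u
    for _ in range(n_hops):
        # S_u^k: all triples whose head is in E_u^{k-1}
        hop = [tr for h in sorted(frontier) for tr in by_head.get(h, [])]
        ripple_sets.append(hop)
        # E_u^k: the tails of those triples
        frontier = {t for _, _, t in hop}
    return ripple_sets

# Toy KG with made-up entity and relation names (illustrative only).
toy_kg = [
    ("station_A", "has_accumulated_temp", "temp_band_3"),
    ("station_A", "has_rainfall", "rain_band_2"),
    ("temp_band_3", "observed_at", "station_B"),
    ("station_C", "has_rainfall", "rain_band_1"),
]
ripples = build_ripple_sets(toy_kg, seed_areas={"station_A"}, n_hops=2)
```

In this toy graph the 1-hop ripple set holds the two triples headed by `station_A`, and the 2-hop ripple set holds the single triple headed by `temp_band_3`; `station_C` is never reached because it is not connected to the seed.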
In this study, preference propagation is the core technique of the recommendation model based on RippleNet. It not only models the interaction between maize variety and planting area in finer detail, but also explores the potential interests of maize varieties in the ripple sets.
The embedding of each planting area $v$ is $\mathbf{v} \in \mathbb{R}^d$, where $d$ refers to the dimension of the embeddings (Figure 5). Each triple $(h_i, r_i, t_i)$ in the 1-hop ripple set $S_u^1$ of maize variety $u$ is assigned a relevance probability by comparing the planting area embedding $\mathbf{v}$ to head $h_i$ and relation $r_i$ in this triple, as given in Equation (4).
In Equation (4), the embeddings of relation $r_i$ and head $h_i$ are expressed by $\mathbf{R}_i \in \mathbb{R}^{d \times d}$ and $\mathbf{h}_i \in \mathbb{R}^d$, respectively. In the recommendation model, we regard the relevance probability $p_i$ as the similarity of planting area $v$ and entity $h_i$. To implement preference propagation, this similarity must be measured in the embedding space of relation $r_i$ (that is, through $\mathbf{R}_i$).
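For reference, the relevance probability in the original RippleNet formulation [23] takes the softmax form below. It is reproduced here under the assumption that Equation (4) follows RippleNet unchanged, with $\mathbf{v}$, $\mathbf{R}_i$ and $\mathbf{h}_i$ denoting the planting area, relation and head embeddings:

```latex
p_i = \operatorname{softmax}\!\left(\mathbf{v}^{\top}\mathbf{R}_i\mathbf{h}_i\right)
    = \frac{\exp\!\left(\mathbf{v}^{\top}\mathbf{R}_i\mathbf{h}_i\right)}
           {\sum_{(h,r,t)\in S_u^1}\exp\!\left(\mathbf{v}^{\top}\mathbf{R}\,\mathbf{h}\right)}
\tag{4}
```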
According to the relevance probabilities obtained in Equation (4), the second step of preference propagation is realized: each relevance probability is used as a weight that controls the direction of preference propagation. We obtain the vector $\mathbf{o}_u^1$ by taking the sum of the tails in $S_u^1$ weighted by the corresponding relevance probabilities (Equation (5)). Here, $\mathbf{t}_i \in \mathbb{R}^d$ is the embedding of tail $t_i$ from the knowledge triple $(h_i, r_i, t_i)$, and the vector $\mathbf{o}_u^1$ is the 1-order response of maize variety $u$’s history record $V_u$ with respect to planting area $v$. Equations (4) and (5) are the main steps of the preference propagation technique of this recommendation model. Through these two steps, the interests of a maize variety are transferred along the links in $S_u^1$: the set of relevant entities $E_u^1$ inherits the interests of the maize variety from its history set $V_u$. After the preference propagation in $S_u^1$ is completed, $\mathbf{o}_u^1$ replaces $\mathbf{v}$ in Equation (4), and Equations (4) and (5) are repeated on the ripple set $S_u^2$ to return the second-order response vector $\mathbf{o}_u^2$. When preference propagation iterates over maize variety $u$’s ripple sets $S_u^k$ ($k = 1, \ldots, H$), we obtain one response vector per ripple set, and these vectors carry the meteorological preferences of the maize variety at different levels. To fully reflect these preferences and make the subsequent recommendation more precise, we obtain the embedding of maize variety $u$ by combining the response vectors of all orders. This embedding represents the preference of maize variety $u$ for planting area $v$. The specific calculation is shown in Equation (6).
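Assuming Equations (5) and (6) keep the form used in the original RippleNet paper [23], the weighted sum of tails and the combination of responses can be written as follows, where $H$ denotes the maximum hop count:

```latex
\begin{align}
\mathbf{o}_u^1 &= \sum_{(h_i, r_i, t_i)\in S_u^1} p_i\,\mathbf{t}_i \tag{5}\\
\mathbf{u} &= \mathbf{o}_u^1 + \mathbf{o}_u^2 + \cdots + \mathbf{o}_u^H \tag{6}
\end{align}
```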
Finally, the predicted recommendation probability is obtained from the inner product of the maize variety embedding and the planting area embedding:
$$\hat{y}_{uv} = \sigma\!\left(\mathbf{u}^{\top}\mathbf{v}\right) \tag{7}$$
In Equation (7), $\sigma(x) = 1/(1 + \exp(-x))$ is the sigmoid function.
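The propagation and prediction steps described in Equations (4)–(7) can be sketched in NumPy as below. This is a minimal illustration, not the implementation used in this study: it assumes the softmax relevance score $\mathbf{v}^{\top}\mathbf{R}_i\mathbf{h}_i$ of RippleNet [23], a plain sum of responses, and randomly initialized toy embeddings. The function names and array layouts are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def predict(area_vec, hops):
    """Forward pass for one (variety, area) pair.

    area_vec : (d,) planting area embedding v
    hops     : list with one (heads, rels, tails) tuple per hop, where
               heads and tails have shape (n, d) and rels has shape (n, d, d)
    Returns the predicted probability and the variety embedding u.
    """
    query = area_vec
    responses = []
    for heads, rels, tails in hops:
        # Eq. (4): score_i = query^T R_i h_i, normalised over the ripple set
        scores = np.einsum("d,nde,ne->n", query, rels, heads)
        probs = softmax(scores)
        # Eq. (5): response = sum_i p_i * t_i
        response = probs @ tails
        responses.append(response)
        query = response                    # next hop uses o_u^k in place of v
    variety_vec = np.sum(responses, axis=0)                 # Eq. (6)
    prob = 1.0 / (1.0 + np.exp(-(variety_vec @ area_vec)))  # Eq. (7)
    return prob, variety_vec

rng = np.random.default_rng(0)
d, n = 4, 5                                 # toy sizes, chosen arbitrarily
v = rng.normal(size=d)
toy_hops = [(rng.normal(size=(n, d)), rng.normal(size=(n, d, d)),
             rng.normal(size=(n, d))) for _ in range(2)]
y_hat, u_vec = predict(v, toy_hops)
```

Because each hop reuses the previous response as the query, the loop mirrors the replacement of $\mathbf{v}$ by $\mathbf{o}_u^1$ described above.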
The preceding paragraphs introduced the relevant definitions and the recommendation process of the model. Next, we derive its learning algorithm.
In the RippleNet-based recommendation model for suitable maize planting areas, we intend to maximize the posterior probability of the model parameters $\Theta$ given the meteorology KG $G$ and the “variety–test station” implicit feedback matrix $Y$. According to Bayes’ theorem, this is equivalent to maximizing the expression in Equation (8).
In Equation (8), the model parameters $\Theta$ include the embeddings of all entities, relations and planting areas. Further, $p(\Theta)$ is the prior probability of the model parameters $\Theta$, which is set as a Gaussian distribution with zero mean and a diagonal covariance matrix according to [24] (Equation (9)).
The term $p(G \mid \Theta)$ in Equation (8) is the likelihood function of the observed meteorology knowledge graph $G$ given $\Theta$. In the recommendation model based on RippleNet, the likelihood function for knowledge graph embedding (KGE) is defined by the three-way tensor factorization method (Equation (10)).
The indicator $I_{h,r,t}$ in Equation (10) equals 1 when $(h, r, t) \in G$ and 0 otherwise. Based on the definition in Equation (10), the scoring functions of entity–entity pairs in KGE and planting area–entity pairs in preference propagation can be unified under the same calculation model. The term $p(Y \mid \Theta, G)$ in Equation (8) is the likelihood function of the observed “variety–test station” implicit feedback given the model parameters $\Theta$ and the meteorology KG. Based on Equations (2)–(7), $p(Y \mid \Theta, G)$ is defined as a product of Bernoulli distributions (Equation (11)).
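Under the assumption that the derivation follows the original RippleNet paper [23] term by term, the factorization, prior and two likelihoods described above would read as follows. The precision symbols $\lambda_1$ and $\lambda_2$ are borrowed from RippleNet and are not named in the surrounding text:

```latex
\begin{align}
p(\Theta \mid G, Y) &\propto p(\Theta)\, p(G \mid \Theta)\, p(Y \mid \Theta, G) \tag{8}\\
p(\Theta) &= \mathcal{N}\!\left(\mathbf{0},\ \lambda_1^{-1}\mathbf{I}\right) \tag{9}\\
p(G \mid \Theta) &= \prod_{(h,r,t)\in E\times R\times E}
    \mathcal{N}\!\left(I_{h,r,t} - \mathbf{h}^{\top}\mathbf{R}\,\mathbf{t},\ \lambda_2^{-1}\right) \tag{10}\\
p(Y \mid \Theta, G) &= \prod_{(u,v)\in Y}
    \sigma\!\left(\mathbf{u}^{\top}\mathbf{v}\right)^{y_{uv}}
    \left(1-\sigma\!\left(\mathbf{u}^{\top}\mathbf{v}\right)\right)^{1-y_{uv}} \tag{11}
\end{align}
```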
In the subsequent algorithm, we take the negative logarithm of Equation (8) as the loss function for the recommendation model based on RippleNet; see Equation (12) for details.
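Taking the negative logarithm of the three factors and dropping additive constants gives a loss of the following shape. This is a reconstruction that assumes the RippleNet form [23], with $\lambda_1$ and $\lambda_2$ as assumed weighting symbols, $\mathbf{V}$ and $\mathbf{E}$ the planting area and entity embedding matrices, and $\mathbf{I}_r$ the slice of the indicator tensor for relation $r$:

```latex
\begin{align}
\mathcal{L} &= -\log\bigl(p(Y \mid \Theta, G)\, p(G \mid \Theta)\, p(\Theta)\bigr) \notag\\
&= \sum_{(u,v)\in Y} -\Bigl(y_{uv}\log\sigma\!\left(\mathbf{u}^{\top}\mathbf{v}\right)
   + (1-y_{uv})\log\bigl(1-\sigma\!\left(\mathbf{u}^{\top}\mathbf{v}\right)\bigr)\Bigr) \notag\\
&\quad + \frac{\lambda_2}{2}\sum_{r\in R}\bigl\lVert \mathbf{I}_r - \mathbf{E}^{\top}\mathbf{R}\,\mathbf{E}\bigr\rVert_2^2
   + \frac{\lambda_1}{2}\Bigl(\lVert\mathbf{V}\rVert_2^2 + \lVert\mathbf{E}\rVert_2^2
   + \sum_{r\in R}\lVert\mathbf{R}\rVert_2^2\Bigr) \tag{12}
\end{align}
```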
In Equation (12), $\mathbf{V}$ and $\mathbf{E}$ are the embedding matrices for all planting areas and entities, respectively. For relation $r$, $\mathbf{I}_r$ is the corresponding slice of the indicator tensor $\mathbf{I}$ in the meteorology KG, and $\mathbf{R}$ is the embedding matrix of relation $r$. The first term of Equation (12) measures the cross-entropy loss between the ground truth of interactions $Y$ and the value predicted by the recommendation model based on RippleNet. The second term measures the squared error between the ground truth of the meteorology KG $\mathbf{I}_r$ and the reconstructed indicator matrix $\mathbf{E}^{\top}\mathbf{R}\mathbf{E}$. The third term is the regularizer for preventing over-fitting.
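The three terms can be made concrete with a small NumPy sketch. It is illustrative only: the function name, the argument layout (one embedding per row) and the default weights `lam_reg` and `lam_kge` are assumptions standing in for the regularization and KGE weights, which the text does not specify.

```python
import numpy as np

def ripplenet_loss(y_true, y_prob, indicator, ent_emb, rel_emb, area_emb,
                   lam_reg=1e-2, lam_kge=1e-2):
    """Three-term loss: cross-entropy + KGE squared error + L2 regulariser.

    y_true    : (m,) 0/1 suitability labels
    y_prob    : (m,) predicted probabilities from the model
    indicator : (n_rel, n_ent, n_ent) 0/1 tensor, 1 where (h, r, t) is in the KG
    ent_emb   : (n_ent, d) entity embeddings, one row per entity
    rel_emb   : (n_rel, d, d) relation embedding matrices
    area_emb  : (n_area, d) planting area embeddings
    lam_reg, lam_kge : hypothetical weights (assumed, not from the source)
    """
    eps = 1e-12                                   # guards against log(0)
    p = np.clip(y_prob, eps, 1.0 - eps)
    cross_entropy = -np.sum(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    # Reconstruct every relation slice: recon[r, h, t] = e_h^T R_r e_t
    recon = np.einsum("hd,rde,te->rht", ent_emb, rel_emb, ent_emb)
    kge_error = 0.5 * lam_kge * np.sum((indicator - recon) ** 2)

    l2 = 0.5 * lam_reg * (np.sum(area_emb ** 2) + np.sum(ent_emb ** 2)
                          + np.sum(rel_emb ** 2))
    return cross_entropy + kge_error + l2
```

Keeping the terms separate makes it easy to inspect how much of the loss comes from the interaction data and how much from the knowledge graph reconstruction.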
To minimize the loss function, this study uses the stochastic gradient descent (SGD) algorithm. Following the negative sampling strategy in [32], each iteration randomly samples a minibatch of positive/negative interactions from $Y$ and true/false triples from $G$. Then the gradients of the loss $\mathcal{L}$ with respect to the model parameters $\Theta$ are calculated, and all parameters are updated by back-propagation based on the sampled minibatch. The detailed process of the recommendation model algorithm is shown in Algorithm 1.
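As a rough illustration of the sampling step, the sketch below draws true triples from the KG and pairs each with a false triple obtained by corrupting its tail. Tail corruption is one common negative sampling scheme and is assumed here; the exact strategy of [32] is not described in the text, and the function name and arguments are hypothetical.

```python
import random

def sample_triple_batch(true_triples, entities, batch_size, seed=None):
    """Return (triple, label) pairs: label 1 for KG triples, 0 for corrupted ones."""
    rng = random.Random(seed)
    known = set(true_triples)
    batch = []
    # Draw up to batch_size distinct true triples.
    for h, r, t in rng.sample(sorted(known), min(batch_size, len(known))):
        batch.append(((h, r, t), 1))
        # Corrupt the tail with an entity that does not form a known triple.
        candidates = [e for e in entities if (h, r, e) not in known]
        if candidates:
            batch.append(((h, r, rng.choice(candidates)), 0))
    return batch

# Toy data with made-up names (illustrative only).
toy_triples = [("station_A", "has_rainfall", "rain_band_2"),
               ("station_B", "has_rainfall", "rain_band_1")]
toy_entities = ["station_A", "station_B", "rain_band_1", "rain_band_2"]
toy_batch = sample_triple_batch(toy_triples, toy_entities, batch_size=2, seed=0)
```

Positive and negative interactions from $Y$ could be sampled in the same way by treating each (variety, area) pair as the item to keep or corrupt.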
Algorithm 1: Learning algorithm for the recommendation model based on RippleNet |
Input: the field test data $Y$ and the meteorology knowledge graph $G$ |
Output: $F(u, v; \Theta)$, a prediction function for recommending non-test planting areas for maize varieties |
1: Initialize all the recommendation model parameters; |
2: Calculate the ripple sets $\{S_u^k\}_{k=1}^{H}$ for each maize variety $u$ from the meteorology knowledge graph; |
3: for number of training iterations do |
4: Sample a minibatch of positive and negative interactions from $Y$ and true and false triples from $G$; |
5: Calculate the gradients $\partial\mathcal{L}/\partial\mathbf{V}$, $\partial\mathcal{L}/\partial\mathbf{E}$ and $\{\partial\mathcal{L}/\partial\mathbf{R}\}_{r \in R}$ on the minibatch by back-propagation according to Equations (4)–(12); |
6: Update $\mathbf{V}$, $\mathbf{E}$ and $\{\mathbf{R}\}_{r \in R}$ by gradient descent with learning rate $\eta$; |
7: end for |
8: return the prediction function $F(u, v; \Theta)$ |