Article

A Network Representation Learning Model Based on Multiple Remodeling of Node Attributes

1
School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
2
The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining 810008, China
3
School of Computer, Qinghai Normal University, Xining 810008, China
4
Graduate School of Engineering, Nagasaki Institute of Applied Science, Nagasaki 851-0123, Japan
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(23), 4788; https://doi.org/10.3390/math11234788
Submission received: 23 October 2023 / Revised: 23 November 2023 / Accepted: 24 November 2023 / Published: 27 November 2023

Abstract

Current network representation learning models mainly use matrix factorization-based and neural network-based approaches, and most models still focus only on the local neighbor features of nodes. Knowledge representation learning aims to learn low-dimensional dense representations of entities and relations from structured knowledge graphs, and most models use triplets to capture the semantic, logical, and topological features between entities and relations. In order to extend the generalization capability of network representation learning models, this paper proposes a network representation learning algorithm based on multiple remodeling of node attributes, named MRNR. The model constructs knowledge triplets from the textual association relationships between nodes. Meanwhile, a novel co-occurrence word training method is proposed. Multiple remodeling of node attributes can significantly improve the effectiveness of network representation learning. At the same time, MRNR introduces the attention mechanism to learn weights for key co-occurrence words and triplets, which further models the semantic and topological features between entities and relations and makes the network embedding more accurate with better generalization ability.

1. Introduction

With the popularity of Internet applications such as social networks, e-commerce platforms, online news, and search engines, various types of data (e.g., text, images, audio, etc.) are constantly emerging, and useful features must be extracted from these data to support decision-making. In networks, nodes usually represent entities (e.g., people, objects, events, etc.), and the edges between nodes represent relationships between those entities. For network data such as social networks and knowledge graphs, the nodes' text features and the network structure features contain rich entity attributes and inter-entity relationship features, while the edges between nodes encode association relationships between entities. Therefore, how to use node text features and network structure features for network representation learning [1,2,3,4,5] has become a hot research direction in machine learning; its main goal is to map nodes and network structures into a low-dimensional embedding space to facilitate learning and reasoning in various downstream tasks. Network representation learning has become an important part of several fields, such as graph embedding and knowledge graph representation learning. In terms of development history, network representation learning can be divided into two stages: matrix factorization-based methods and deep learning-based methods. Early network representation learning methods are mainly based on matrix factorization, e.g., on singular value decomposition or probabilistic matrix factorization. Although these methods can learn low-dimensional representation vectors of nodes, they usually cannot handle nonlinear relationships in the network well because of the high sparsity and low semantics inherent in matrix factorization. In particular, the high sparsity of the data brings high computational complexity and poor parallelism to traditional machine learning algorithms, which limits the use of statistical learning methods for different network data mining tasks. With the rise of deep learning, network representation learning methods based on neural networks have received increasing attention from researchers and have become the mainstream; they mainly include methods based on autoencoders [6], methods based on convolutional neural networks, and methods based on recurrent neural networks [7]. There are also methods based on Graph Convolutional Networks (GCNs) [8], whose main idea is to aggregate the features of each node with the features of its neighboring nodes through convolution operations, learning the low-dimensional embedding vectors of the nodes.
Most current methods for learning network representations are based on the local neighbor features of nodes, which allows nodes to be embedded in a low-dimensional space to obtain a compact and meaningful representation. However, this approach ignores global network structural features and the global textual features of nodes, potentially leading to missing features or interference from noise. In order to better capture the semantic and contextual features of nodes, researchers have begun to focus on how to utilize node text features together with network structural features for network representation learning. In a social network, each node carries rich attribute information, and text is one type of attribute; making full use of the text attributes of nodes has become one of the main directions of innovation.
On the other hand, knowledge representation learning aims to learn low-dimensional dense representations of entities and relations from structured knowledge bases. Most methods use triplets to capture the semantic, logical, and topological features between entities and relations; however, most current approaches treat the triplet only as a constraint item in modeling. Network representation learning uses neural networks to learn low-dimensional dense representations of nodes and edges that reflect the proximity and similarity between them, typically with methods such as DeepWalk [12] and node2vec [13]. Knowledge representation learning likewise adopts neural networks to learn entity and relation representations, with methods such as TransE [9,10] and TransH [11]. Integrating the two kinds of representation learning can further enhance the modeling of the semantic and topological features between entities and relations.
In order to improve network representation learning, this paper introduces additional features and proposes a model named MRNR, which models network structural features, text features, and knowledge triplet features in a unified learning framework, as shown in Figure 1.
Firstly, for the text features, different words are considered to contribute differently to the model, so the attention mechanism is introduced to give different weights to the co-occurring words of the nodes: important co-occurring words are emphasized, while unimportant co-occurring words have their weights reduced. In this way, high-quality feature inputs can be filtered from the text of the network nodes. Then, the node text relationship triplet is constructed to define the text-based relationships between nodes, i.e., if the text features of two nodes contain common words, the two nodes are textually correlated with each other. In addition, the attention mechanism is applied to the triplets, which makes better use of the semantic features of the knowledge triplets, learns the contribution of different triplets to the model, and improves the effect of the network representation learning model. Finally, the MRNR algorithm jointly models the structural features, textual features, and triplet features of the network and uses the attention mechanism to optimize the modeling process, so that the final learned representation vectors contain more feature factors, which extends the generalization ability of the representation vectors and improves performance in downstream machine learning tasks. The method can also be applied to a wide range of fields, such as social networks, recommender systems, and natural language processing, which gives it important practical significance.
In summary, the main contributions of this paper are as follows:
(1)
An unsupervised network embedding learning model named MRNR is proposed, which not only utilizes network structural features and node text features but also adds triplet features constructed from the text relationships between nodes. The representation vectors obtained from the learning procedure therefore contain more feature factors and better reflect the multifaceted features of nodes, improving the accuracy and robustness of node representation. By adding the attention mechanism to the triplets and co-occurrence words, different weights can be assigned based on importance and similarity, improving the expressiveness and differentiation of the node representation.
(2)
Compared with existing graph attention network models, the MRNR algorithm proposed in this paper directly computes different weights for co-occurrence words and triplets in the text; this type of attention weight is fine-grained attention information. In contrast, existing network representation learning algorithms with attention mechanisms often take text features only as the initial representation vectors of nodes and then adjust the element values of the representation vectors inside the neural network, so the text features participate insufficiently in the training process.
(3)
The MRNR algorithm has a reasonable framework and clear objectives, and it is an unsupervised machine learning method that models multiple features in a unified framework. Compared with existing graph neural networks, the MRNR algorithm proposed in this paper can be applied to unlabeled networks as well as to labeled networks.

2. Related Work

Network representation learning methods mainly include methods based on matrix factorization, methods based on random walks, and methods based on graph convolutional neural networks. Random walk-based methods mainly include DeepWalk, node2vec, and LINE [14]. The DeepWalk algorithm uses random walks to generate node sequences in the graph and learns the representation vectors of the nodes using the Skip-Gram [12] or CBOW model. The node2vec algorithm introduces weight control over the edges traversed in the random walks, which enhances the learning of local features while retaining the structural features of the network. The SGCN [15] algorithm proposes a directed graph convolutional neural network based on random walks for learning node embeddings in directed graphs. The GCLA [16] algorithm proposes a graph representation learning model based on contrastive learning and data augmentation, which improves the robustness and generalization ability of the model through augmentations such as random walks on graphs and node deletion. GraRep [17] proposes a graph representation learning method based on global structural features, which captures global structure by singular value decomposition of adjacency matrices of different orders and then combines the resulting representation vectors to learn the low-dimensional embedding vectors of the nodes. The HAN [18] algorithm improves heterogeneous graph representation learning and proposes a heterogeneous graph embedding method based on a multi-head attention mechanism, which can effectively learn the representation vectors of different types of nodes and edges. These methods treat the sequence of nodes as a text sequence and use a Word2Vec-like model to learn the relationships between nodes.
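As a concrete illustration of the random walk plus Skip-Gram pipeline that these DeepWalk-style methods share, the following sketch generates truncated random walks over an adjacency-list graph and feeds them to Word2Vec (a minimal sketch, assuming gensim >= 4; the function and variable names are illustrative and not taken from the cited papers):

```python
import random
from gensim.models import Word2Vec  # assumes gensim >= 4 is installed

def random_walks(adj, walk_length=40, num_walks=10, seed=0):
    """Generate truncated random walks; adj maps node -> list of neighbors."""
    rng = random.Random(seed)
    walks = []
    nodes = list(adj)
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append([str(n) for n in walk])  # Word2Vec expects string tokens
    return walks

# Toy graph: a small undirected network given as adjacency lists.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adj)
model = Word2Vec(walks, vector_size=100, window=5, min_count=0, sg=1, workers=1, epochs=5)
embedding_of_node_2 = model.wv["2"]  # 100-dimensional representation vector of node 2
```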
Methods based on graph convolutional neural networks utilize graph structure features to extend convolution operations to graphs, such as GCN, GAT [19], and GraphSAGE [20]. The GCN algorithm proposes a convolutional neural network architecture that performs convolution operations on a graph and learns the representation vectors of the nodes in a semi-supervised manner. The GraphSAGE algorithm samples node neighbors and learns node representations with an aggregation operation implemented by a convolutional neural network. The GAT algorithm is a classic graph neural network that models the relationships between nodes through attention mechanisms to improve performance on graph tasks; it can also be regarded as a network representation learning algorithm. However, GAT is a semi-supervised algorithm, whereas the MRNR algorithm proposed in this paper is unsupervised.
These methods can effectively employ semi-supervised mechanisms to learn node representation vectors. Graph Attention Network-based methods introduce an attention mechanism to further improve the model performance, such as AGNN [21], GATNE [22], etc. The AGNN algorithm is a node classification method based on a graph neural network algorithm, which integrates the node’s own features and neighboring features to learn the node representation through an adaptive neighbor aggregation strategy. The GATNE algorithm is a multivariate network representation learning method based on a graph neural network, which can simultaneously consider the features of different types of nodes and edges for representation learning. The DynGCN [23] algorithm proposes a representation learning method based on temporal dynamic graphs, which extends the representation learning of graphs to temporal dynamic scenarios. Dong et al. [24] propose a graph node representation learning model by maximizing the mutual information between the current node and neighbor nodes. The EIGAT [25] algorithm proposes a method of incorporating global features into local attention for knowledge representation learning, which combines global features appropriately into the GAT model family by using scaled entity importance, which is computed by an attention-based global random walk strategy. The MEGNN [26] algorithm proposes a meta path extraction GNNs for heterogeneous graphs, which combine different bipartite graphs related to edge types into a new trainable graph structure. Sun et al. [27] propose a graph autoencoder algorithm based on a dual decoder, embedding graph topology and node attributes into a compact vector and reconstructing vertex attributes and graph structure simultaneously.
The classical approach in knowledge representation learning is TransE, which proposes a translation-based strategy that models multi-relational knowledge graphs by mapping entities and relations into a low-dimensional embedding space. The DistMult [28] algorithm proposes a model based on matrix factorization, which computes scores between entities and relations by matrix multiplication to model knowledge graphs. The ConvE [29] algorithm is based on a convolutional neural network: it represents entities and relations as two-dimensional matrices and uses a convolutional neural network to model their relationships, with higher prediction accuracy and interpretability. Yu et al. [30] propose a knowledge graph completion model that integrates entity descriptions and network structures, studying how to improve knowledge graph completion by using both. KANE [31] proposes a knowledge graph embedding model based on a knowledge graph attention network with attribute-based feature enhancement, which captures the higher-order structure and attribute features of knowledge graphs by considering both relation triplets and attribute triplets in a graph convolutional network framework.
Graph contrastive learning, as an effective training scheme, can reduce popularity bias and improve noise robustness in graph tasks. Wei et al. [32] proposed CGI, which constructs an optimized graph structure by adaptively deleting nodes and edges, providing a basis for alleviating popularity bias. In order to effectively discard information unrelated to downstream recommendations, CGI innovatively integrates the information bottleneck into the multi-view contrastive learning process for recommendation. Cai et al. [33] proposed a lightweight and robust graph contrastive learning framework named LightGCL, which alleviates the problems caused by inaccurate contrastive signals by utilizing global collaborative relationships. Compared with existing GCL-based methods, LightGCL has higher training efficiency.
Based on the above background, this paper proposes a network representation learning model that jointly models textual features, network structural features, and knowledge triplet features to learn low-dimensional representations of nodes, so as to better support graph tasks such as node classification and link prediction. The MRNR algorithm proposed in this paper is an unsupervised machine learning algorithm that can be applied to various social network tasks, affective computing tasks, recommendation systems, etc.

3. Algorithm Design

3.1. Definitions

This section defines some variables that are commonly used in this paper. We use G = (V, E) to define a network, where V denotes the set of nodes, E denotes the set of edges, and |V| denotes the number of nodes. $R_v \in \mathbb{R}^{k}$ represents the trained network representation vector of node v; stacking these vectors gives a $|V| \times k$ matrix in which each row is a node's k-dimensional representation vector, where k is much smaller than |V|. att is the attention parameter, and T is the label of node v. The symbol table is shown in Table 1.

3.2. MRNR Modeling

In this paper, text is used as the input of the relationship model to construct node relationships: if the texts of two nodes contain the same word, the word is regarded as one kind of relationship between the two nodes, and the more words the two node texts share, the higher the node similarity. In order to meet the requirements of large-scale network embedding learning tasks, a simple and efficient joint learning model is needed. Therefore, this paper proposes a novel joint learning framework for network representation, named MRNR, which consists of three parts: network node relationship modeling, node text relationship modeling, and knowledge triplet relationship modeling. MRNR is improved at two levels. Firstly, an unsupervised network embedding joint learning framework is proposed, which not only uses network structure features and node text features but also adds knowledge triplet features between nodes, so that the learned representation vectors contain multiple feature factors, improving the accuracy and robustness of node representation. Secondly, by adding the attention mechanism for co-occurrence words and knowledge triplets, different weights are automatically learned for different co-occurring words and knowledge triplets (the text features between nodes are thus learned twice), which improves the expressiveness and differentiation of the node representation.
With the above improvements, the MRNR model is expected to solve the problem of joint modeling of network structural features, textual features, and knowledge triplets, which can help obtain better-quality representation vectors. The framework of the MRNR algorithm is specifically shown in Figure 2.
As shown in Figure 2, the network node relationship modeling learns the network structure feature representation, the node’s text feature relationship modeling learns the node’s text feature representation, and the triplet relationship modeling learns the node’s triplet relationship feature representation. The model ultimately integrates each node’s neighbor information and text features to obtain better-quality vectors. It should be noted that the knowledge triplet introduced in this paper is essentially a reuse of text features. Therefore, the MRNR algorithm models the node’s textual features by two different methods. At the same time, an attention mechanism is introduced to optimize the model modeling process, which provides the possibility of efficient use of text features.
Network Node Relationship Modeling: in order to have closer vector space distances between node pairs with connected edges and make node pairs with multi-hop edges or no edges have farther vector space distances between them, network node relationship modeling constantly adjusts the values in the node representation vectors.
Node’s textual relationship modeling: relationships between nodes are modeled by considering textual features between nodes. When there are common word features between nodes, the nodes containing this relationship get more learning from the model, which makes these types of nodes have a closer vector space distance between each other. In order to make the node pairs with the same important words get more learning opportunities between them, an attention mechanism is introduced in the node’s text relationship modeling, which enables the training of important words in the model to have higher weights. Ultimately, the parameters learned from the model are more reasonable.
Triplet relationship modeling: This model learns the triplet relationship features between nodes and introduces an attention mechanism to distinguish the importance and participation of different triplets. This allows the model to be given higher weights for important triplets and lower weights for unimportant triplets to affect and optimize the model training process. Triplet relationship modeling is a secondary modeling of text features, which is an efficient modeling of text features within a unified framework.
In summary, the MRNR model exchanges features between nodes through shared vectors, which enables the MRNR algorithm to obtain valuable feature factors from neighboring nodes, node text features, and triplets during the modeling and learning process, giving it stronger generalization ability in all kinds of tasks. At the same time, the shared node vectors provide a good solution idea for the joint learning model. In this paper, we propose a learning model that combines network structural features, text features, and triplet features, and its overall objective function is
$$L(v) = L_c(v) + L_w(v) + L_r(v) = \sum_{v \in C} \Big[ \log g_c(v) + \rho \log g_w(v) + \beta \log g_R(v) \Big].$$
where $\log g_c(v)$ is the network node relationship modeling part, $\rho \log g_w(v)$ is the part that learns the node's text word features with the attention mechanism, $\beta \log g_R(v)$ is the part that learns the triplet features with the attention mechanism, and $\rho$ and $\beta$ are reconciliation coefficients that balance the three models. The goal of this paper is to maximize Equation (1) using stochastic gradient ascent to update each parameter, and the three parts have the same form of parameter update. The first term is an objective function of the CBOW type, the second term is the objective function of the textual features with the attention mechanism (essentially a CBOW model with attention added), and the third term models the triplet relationships with the attention mechanism. For the speed of modeling, the optimization method of the second and third parts follows that of the first part.
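As a toy illustration of how the three parts are combined in Equation (1), the following sketch simply forms the weighted sum of per-node log-likelihoods (the function name and the placeholder values are hypothetical; the three sub-models that produce these values are derived in the remainder of this section):

```python
def joint_objective(log_gc, log_gw, log_gr, rho=0.5, beta=0.5):
    """L = sum over the corpus of [ log g_c(v) + rho * log g_w(v) + beta * log g_R(v) ]."""
    return sum(c + rho * w + beta * r for c, w, r in zip(log_gc, log_gw, log_gr))

# Placeholder per-node log-likelihoods from the three sub-models.
print(joint_objective([-1.2, -0.8], [-2.0, -1.5], [-0.9, -1.1]))
```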
(1)
Network node relationship modeling
Inspired by the CBOW model, this subsection proposes a network node relationship modeling approach, which is essentially a negative sampling-based CBOW model. For the current node $v$, the context of $v$ is $v_{nb}$; node $v$ itself is the positive sample, and the other nodes are negative samples. The negative sample set of node $v$ is $NEG(v)$, with $NEG(v) \neq \varnothing$. For $u \in D$, where $D$ is the set of nodes, the sampling label of $u$ with respect to $v$ is defined as follows:
$$L^v(u) = \begin{cases} 1, & u = v, \\ 0, & u \neq v, \end{cases}$$
where $L^v(u)$ is 1 for the positive sample and 0 otherwise.
For node $v$, its context $v_{nb}$, and the positive sample pair $(v, v_{nb})$, the goal is to maximize the following probability:
$$g_c(v) = \prod_{v_i \in \{v\} \cup NEG(v)} p(v_i \mid v_{nb}).$$
and
$$p(v_i \mid v_{nb}) = \sigma(e_v^{\mathrm{T}} \theta^{v_i})^{L^v(v_i)} \cdot \big[ 1 - \sigma(e_v^{\mathrm{T}} \theta^{v_i}) \big]^{1 - L^v(v_i)},$$
where $\sigma(x)$ is the Sigmoid function, $e_v$ is the sum of the node vectors in $v_{nb}$, and $\theta^{v_i}$ is the auxiliary vector to be trained that corresponds to node $v_i$.
Therefore, $g_c(v)$ can be rewritten based on Equations (2) and (3):
$$g_c(v) = \sigma(e_v^{\mathrm{T}} \theta^{v}) \prod_{v_i \in NEG(v)} \big[ 1 - \sigma(e_v^{\mathrm{T}} \theta^{v_i}) \big].$$
Then, the overall optimization objective function is as follows:
$$G = \prod_{v \in C} g_c(v).$$
For computational convenience, taking the logarithm of $G$, the objective function of the negative sampling-based CBOW model is obtained by applying the logarithm to Equation (5):
$$L_c(v) = \log G = \log \prod_{v \in C} g_c(v) = \sum_{v \in C} \log g_c(v) = \sum_{v \in C} \sum_{v_i \in \{v\} \cup NEG(v)} \Big[ \big(1 - L^v(v_i)\big) \cdot \log\big(1 - \sigma(e_v^{\mathrm{T}} \theta^{v_i})\big) + L^v(v_i) \cdot \log \sigma(e_v^{\mathrm{T}} \theta^{v_i}) \Big],$$
and
$$L(v, v_i) = \big(1 - L^v(v_i)\big) \cdot \log\big(1 - \sigma(e_v^{\mathrm{T}} \theta^{v_i})\big) + L^v(v_i) \cdot \log \sigma(e_v^{\mathrm{T}} \theta^{v_i}),$$
we give the gradient of the function $L(v, v_i)$ with respect to $\theta^{v_i}$ as follows:
$$\frac{\partial L(v, v_i)}{\partial \theta^{v_i}} = \frac{\partial}{\partial \theta^{v_i}} \Big[ \big(1 - L^v(v_i)\big) \cdot \log\big(1 - \sigma(e_v^{\mathrm{T}} \theta^{v_i})\big) + L^v(v_i) \cdot \log \sigma(e_v^{\mathrm{T}} \theta^{v_i}) \Big].$$
Utilizing the derivatives of $\log \sigma(x)$ and $\log\big(1 - \sigma(x)\big)$ to simplify Equation (8), we obtain
$$\frac{\partial L(v, v_i)}{\partial \theta^{v_i}} = L^v(v_i) \big[1 - \sigma(e_v^{\mathrm{T}} \theta^{v_i})\big] e_v - \big[1 - L^v(v_i)\big] \sigma(e_v^{\mathrm{T}} \theta^{v_i}) e_v = \Big[ L^v(v_i) \big(1 - \sigma(e_v^{\mathrm{T}} \theta^{v_i})\big) - \big(1 - L^v(v_i)\big) \sigma(e_v^{\mathrm{T}} \theta^{v_i}) \Big] e_v = \big[ L^v(v_i) - \sigma(e_v^{\mathrm{T}} \theta^{v_i}) \big] e_v.$$
In Equation (6), $C$ is the node sequence corpus obtained after the random walks over the nodes, and the objective is optimized using stochastic gradient ascent to obtain the update equation for each parameter. The update equation of $\theta^{v_i}$ is
$$\theta^{v_i} := \theta^{v_i} + \mu \big[ L^v(v_i) - \sigma(e_v^{\mathrm{T}} \theta^{v_i}) \big] e_v,$$
where μ denotes the learning rate.
Considering the gradient of $e_v$ in $L(v, v_i)$ and using the symmetry of $e_v$ and $\theta^{v_i}$, the update equation of the embedding vector of each node $v_u$ in the context can be obtained by
$$v_u := v_u + \mu \sum_{v_i \in \{v\} \cup NEG(v)} \big[ L^v(v_i) - \sigma(e_v^{\mathrm{T}} \theta^{v_i}) \big] \cdot \theta^{v_i},$$
where $u \neq v_i$ and $e_v$ is the sum of the vectors of the nodes in the context. How do we update the representation vector of each node in the context? Following Word2Vec, the correction term $\mu \sum_{v_i \in \{v\} \cup NEG(v)} \big[ L^v(v_i) - \sigma(e_v^{\mathrm{T}} \theta^{v_i}) \big] \cdot \theta^{v_i}$ is contributed directly to the representation vector of each node in the context; this paper adopts the same strategy and method. Finally, Equation (6) can be simplified as
$$L_c(v) = \sum_{v \in C} \sum_{v_i \in \{v\} \cup NEG(v)} \log p(v_i \mid v_{nb}).$$
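A minimal numpy sketch of one negative-sampling update step of the kind derived above (the dictionary-based storage of the node vectors and auxiliary vectors, the toy nodes, and the choice of NEG(v) are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbow_neg_sampling_step(node_vecs, theta, context, target, negatives, mu=0.05):
    """One stochastic gradient ascent step on log g_c(v).

    node_vecs, theta: dicts node -> k-dim array (input and auxiliary vectors).
    context: list of context nodes v_nb; target: positive node v;
    negatives: list of sampled negative nodes NEG(v)."""
    e_v = np.sum([node_vecs[u] for u in context], axis=0)   # sum of context vectors
    grad_e = np.zeros_like(e_v)
    for v_i, label in [(target, 1.0)] + [(n, 0.0) for n in negatives]:
        q = sigmoid(e_v @ theta[v_i])
        g = mu * (label - q)            # mu * (L^v(v_i) - sigma(e_v^T theta^{v_i}))
        grad_e += g * theta[v_i]        # accumulated correction for the context vectors
        theta[v_i] += g * e_v           # update of the auxiliary vector theta^{v_i}
    for u in context:                   # each context node receives the same correction
        node_vecs[u] += grad_e

# Tiny usage example with random 100-dimensional vectors.
rng = np.random.default_rng(0)
node_vecs = {n: rng.normal(scale=0.1, size=100) for n in range(6)}
theta = {n: np.zeros(100) for n in range(6)}
cbow_neg_sampling_step(node_vecs, theta, context=[1, 2, 3], target=0, negatives=[4, 5])
```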
(2)
Node’s text Relationship Modeling
With reference to the above model, this paper constructs a node text relationship model that maximizes the probability of the occurrence of the target node by modeling the probability of the occurrence of words in the context nodes. The objective function of the node's textual relationship modeling is as follows:
$$L_w(v) = \rho \cdot \sum_{v \in C} \sum_{u \in w(v)} \log g_w(u) = \rho \cdot \sum_{v \in C} \sum_{u \in w(v)} \sum_{v_i \in \{v\} \cup NEG(u)} \Big[ L^{v_i}(u) \cdot \log \sigma(e_u \cdot \theta^{v_i}) + \big(1 - L^{v_i}(u)\big) \cdot \log\big(1 - \sigma(e_u \cdot \theta^{v_i})\big) \Big],$$
and
$$f_1 = L^{v_i}(u) \cdot \log \sigma(e_u \cdot \theta^{v_i}) + \big(1 - L^{v_i}(u)\big) \cdot \log\big(1 - \sigma(e_u \cdot \theta^{v_i})\big).$$
In Equation (13), the target node is the positive sample, and the other nodes are negative samples. Then, taking the partial derivatives of $f_1$ with respect to $\theta^{v_i}$ and $e_u$, the parameter update equation is obtained as
$$\theta^{v_i} := \theta^{v_i} + \mu \big[ L^{v_i}(u) - \sigma(e_u \cdot \theta^{v_i}) \big] \cdot e_u.$$
The embedding vector of each word $w_u$ in the context is updated with the equation
$$w_u := w_u + \mu \sum_{v_i \in \{v\} \cup NEG(v)} \big[ L^{v_i}(u) - \sigma(e_u \cdot \theta^{v_i}) \big] \cdot \theta^{v_i},$$
where $\mu$ denotes the learning rate. After obtaining the updated forms of $\theta^{v_i}$ and $e_u$, the objective function can be optimized iteratively. The weight $\rho$ needs to be multiplied into Equation (14) before $\mu$ to balance the network structure model and the text feature model. $e_u$ denotes the sum of the word representation vectors, and $w_u$ represents the representation vector of a word. The attention function $att$ is used to balance the contribution rate of different co-occurrence words to the model.
When context words act as context nodes, the sum of the context vectors is $e_u$, which is calculated as follows:
$$e_u = \sum_{j=1}^{S} att(w_j) \cdot d_j,$$
where $att(w_j)$ is the attention weight of the word $w_j$ for the target node, $d_j$ is the representation vector of the word $w_j$, and $S$ is the number of words in the node's text.
The attention function $att$ is calculated as follows:
$$att(w_j) = \frac{\exp(d_j \cdot w_u)}{\sum_{k=1}^{|S|} \exp(d_k \cdot w_u)}.$$
Attention weights are computed from representation vectors of co-occurrence words and representation vectors of target nodes.
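A small numpy sketch of this word-level attention (the word vectors $d_j$ and the target-node vector $w_u$ are random placeholders used only for illustration):

```python
import numpy as np

def word_attention(word_vecs, node_vec):
    """att(w_j) = softmax_j(d_j . w_u); returns the weights and the weighted sum e_u."""
    scores = np.array([d @ node_vec for d in word_vecs])
    scores -= scores.max()                      # numerical stabilization of the softmax
    att = np.exp(scores) / np.exp(scores).sum()
    e_u = np.sum(att[:, None] * np.asarray(word_vecs), axis=0)
    return att, e_u

rng = np.random.default_rng(0)
word_vecs = rng.normal(size=(4, 100))   # d_j for four co-occurrence words
node_vec = rng.normal(size=100)         # w_u for the target node
att, e_u = word_attention(word_vecs, node_vec)   # att sums to 1, e_u is 100-dimensional
```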
(3)
Triplet relationship modeling
Inspired by the translation mechanism of the TransE algorithm, we represent the interaction between nodes as a translation operation in the embedding space. The TransE algorithm, as a classical model of relation embedding in knowledge graphs, maps entities and relations to representation vectors in a low-dimensional space and predicts the relationships between entities from the distances between the vectors; it also provides an interpretable and easy-to-understand relation representation. In this paper, we use the co-occurring words in the texts of the network nodes to construct the triplets: if the texts of two nodes contain a co-occurring word, the word is treated as a relation between the two nodes. For example, if the texts of Node 1 and Node 2 both contain "Network", the triplet (Node1, Network, Node2) can be constructed. The triplets constructed in this way are not independent of each other but are interrelated across the whole network, and these relationships can portray the semantic features of the target network structure to a certain extent. We refer to these interrelated triplets as knowledge triplets.
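A minimal sketch of how such knowledge triplets can be extracted from node texts (the tokenization, lower-casing, and stop-word list are illustrative assumptions rather than the authors' exact preprocessing):

```python
from itertools import combinations

def build_triplets(node_texts, stop_words=frozenset({"a", "the", "of", "as"})):
    """Return triplets (node_i, word, node_j) for every word shared by two node texts."""
    vocab = {n: {w.lower() for w in text.split() if w.lower() not in stop_words}
             for n, text in node_texts.items()}
    triplets = []
    for (n1, w1), (n2, w2) in combinations(vocab.items(), 2):
        for word in w1 & w2:                      # each co-occurring word is one relation
            triplets.append((n1, word, n2))
    return triplets

node_texts = {
    "Node1": "Network Representation Learning",
    "Node2": "Network Embedding Learning",
}
print(build_triplets(node_texts))
# [('Node1', 'network', 'Node2'), ('Node1', 'learning', 'Node2')]  (word order may vary)
```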
Define the set of knowledge triplet relations associated with the target word $w_i$ as follows:
$$C(w_i) = \big\{ (r, t) \mid (r, t, w_i) \in G \big\},$$
For example, if the text of Node 1 is “Network Representation Learning” and the text of Node 2 is “Network Embedding Learning”, we can construct two triplet relationships: (Node1, Network, Node2) and (Node1, Learning, Node2). By using a knowledge triplet containing the target word w i , the likelihood of w i appearing in a given context can be predicted. As relational triplets continue to appear, the representation vectors of the head entity (h), relationship (r), and tail entity (t) in the triplet network are continuously adjusted. It can be found that the knowledge triplet incorporates the co-occurring features of the text of the network nodes, which helps the model to express the semantic similarity of the text between the network nodes.
In the TransE model, the representation learning model defines a scoring function $f(h, r, t) = \| h + r - t \|$ to assess the credibility of a triplet: the higher the score, the greater the probability that the triplet is a factual feature and the higher its credibility. For the triplet set $(h, r, w_i)$ constructed from the target word $w_i$, if a triplet is a factual feature, then $h + r \approx w_i$, which means that the vector $h + r$ should be close to $w_i$. In this paper, knowledge triplets are introduced into the objective function of the model, and the probability of an existing factual triplet is
$$f(h, r, t) = P\big( (h, r, t) \mid C(h, r, t) \big).$$
We factorize this equation with the chain rule of conditional probability as follows:
$$f(h, r, t) = P\big(h \mid C(h, r, t)\big) \cdot P\big(t \mid C(h, r, t), h\big) \cdot P\big(r \mid C(h, r, t), h, t\big).$$
The first part of the above equation, $P(h \mid C(h, r, t))$, represents the conditional probability of the head entity $h$ given the knowledge triplet $C(h, r, t)$. Due to the direct correlation between the head entity and its neighbors, $P(h \mid C(h, r, t)) \approx P(h \mid C_n(h))$, which, inspired by reference [9], can be represented as a Softmax function:
$$P\big(h \mid C_n(h)\big) = \frac{\exp\big(att \cdot d_n(h, C_n(h))\big)}{\sum_{\hat h \in V} \exp\big(att \cdot d_n(\hat h, C_n(\hat h))\big)},$$
where $d_n(\cdot, \cdot)$ is used to measure the degree of association between $h$ and $C_n(h)$, and $d_n(h, C_n(h))$ is defined as follows:
$$d_n\big(h, C_n(h)\big) = \frac{1}{|C_n(h)|} \sum_{(r, t) \in C_n(h)} \| h \circ r - t \|,$$
where "$\circ$" is the Hadamard product, $|C_n(h)|$ denotes the number of knowledge triplets, and $\| \cdot \|$ denotes the first-order or second-order norm. It has been shown in the literature [3] that combining the vectors of entities and relations multiplicatively works better than combining them additively, so in this paper we use the Hadamard product to measure the degree of similarity between them. Finally, the attention mechanism is applied to all triplet relations in $h$ and $C_n(h)$, and the learned attention parameter is used to adjust the weight of each triplet relation.
The second part of Equation (22), $P(t \mid C(h, r, t), h)$, represents the conditional probability of the occurrence of the tail entity $t$ given the head entity $h$ and the knowledge triplet $C(h, r, t)$. This paper uses the co-occurrence words between entities to measure the degree of association between head and tail entities and sets $P(t \mid C(h, r, t), h) \approx P(t \mid C_n(h, t), h)$, where $C_n(h, t)$ represents the knowledge triplets of the entity pair $(h, t)$ and $P(t \mid C_n(h, t), h)$ is represented as a Softmax function:
$$P\big(t \mid C_n(h, t), h\big) = \frac{\exp\big(att \cdot d_n(t, C_n(h, t))\big)}{\sum_{r \in E,\, \hat t \in V} \exp\big(att \cdot d_n(\hat t, C_n(h, \hat t))\big)},$$
where $d_n(\cdot, \cdot)$ is used to measure the degree of association between $t$ and $C_n(h, t)$, and $d_n(t, C_n(h, t))$ is defined as follows:
$$d_n\big(t, C_n(h, t)\big) = \frac{1}{|C_n(h, t)|} \sum_{r \in C_n(h, t)} \| h \circ r - t \|,$$
where $|C_n(h, t)|$ denotes the number of knowledge triplets. Finally, the attention mechanism is applied to all triplet relations in $C_n(h, t)$, and the learned attention parameters are used to adjust the weight of each triplet relation, which is regarded as the degree of association between $t$ and $C_n(h, t)$.
$P(r \mid C(h, r, t), h, t)$ represents the conditional probability of the occurrence of the relation $r$ given the head entity $h$, the tail entity $t$, and the knowledge triplet $C(h, r, t)$. Since the entities $h$ and $t$ have been determined and the knowledge triplet has already been introduced, $C(h, r, t)$ can be omitted from $P(r \mid C(h, r, t), h, t)$, which is formalized as
$$P(r \mid h, t) = \frac{\exp\big(att \cdot d_n(h, r, t)\big)}{\sum_{\hat r \in E} \exp\big(att \cdot d_n(h, \hat r, t)\big)},$$
where $d_n(\cdot, \cdot, \cdot)$ is used to measure the degree of association between $r$ and the entity pair $(h, t)$, defined as follows:
$$d_n(h, r, t) = \| h \circ r - t \|.$$
Bringing the knowledge triplet features into the model, Equation (21) can be approximated as
$$f(h, r, t) \approx P\big(h \mid C_n(h)\big) \cdot P\big(t \mid C_n(h, t), h\big) \cdot P(r \mid h, t).$$
The MRNR model uses the triplet establishment probability as a triplet scoring function to maximize the joint probability of all triplets. We define the objective function as follows:
$$g_R(v) = \prod_{(h, r, t) \in G} f(h, r, t).$$
The objective function is still optimized by negative sampling so that the likelihood function of the triplet model is
$$\begin{aligned} g_R(v) &= \prod_{(h, r, t) \in G} f(h, r, t) \approx \prod_{(h, r, t) \in G} P\big(h \mid C_n(h)\big) \cdot P\big(t \mid C_n(h, t), h\big) \cdot P(r \mid h, t) \\ &\approx \prod_{(h, r, t) \in G} \Big[ \sigma\big(d_n(h, C_n(h))\big) \cdot \prod_{h' \in H} \sigma\big(-d_n(h', C_n(h'))\big) \Big] \cdot \prod_{(h, r, t) \in G} \Big[ \sigma\big(d_n(t, C_n(h, t))\big) \cdot \prod_{t' \in T} \sigma\big(-d_n(t', C_n(h, t'))\big) \Big] \\ &\quad \cdot \prod_{(h, r, t) \in G} \Big[ \sigma\big(d_n(h, r, t)\big) \cdot \prod_{r' \in R} \sigma\big(-d_n(h, r', t)\big) \Big], \end{aligned}$$
where $h'$, $r'$, and $t'$ are negative samples of the head entity, relation, and tail entity in the triplet, respectively, and $H$, $R$, and $T$ are the corresponding sets of negative samples. $h'$ is obtained by replacing the head entity of a triplet with any other entity, and $\sigma(x)$ is the Sigmoid function. To facilitate parameter optimization, Equation (30) is converted to logarithmic form.
$$\begin{aligned} L_r(v) = \beta \log g_R(v) &= \beta \sum_{(h, r, t) \in G} \log f(h, r, t) \\ &\approx \beta \sum_{(h, r, t) \in G} \Big[ \log P\big(h \mid C_n(h)\big) + \log P\big(t \mid C_n(h, t), h\big) + \log P(r \mid h, t) \Big] \\ &\approx \beta \sum_{(h, r, t) \in G} \Big[ \log \sigma\big(d_n(h, C_n(h))\big) + \sum_{h' \in H} \log \sigma\big(-d_n(h', C_n(h'))\big) \Big] \\ &\quad + \beta \sum_{(h, r, t) \in G} \Big[ \log \sigma\big(d_n(t, C_n(h, t))\big) + \sum_{t' \in T} \log \sigma\big(-d_n(t', C_n(h, t'))\big) \Big] \\ &\quad + \beta \sum_{(h, r, t) \in G} \Big[ \log \sigma\big(d_n(h, r, t)\big) + \sum_{r' \in R} \log \sigma\big(-d_n(h, r', t)\big) \Big]. \end{aligned}$$
Equation (31) is the objective function of the network node triplet relationship modeling. The update equation for each parameter is computed in the same way as in the network node relationship modeling and the node text relationship modeling, and it is optimized using stochastic gradient ascent.
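As a rough sketch of the triplet-scoring ingredients used in this subsection, the following code assumes the reconstructed form $d_n(h, r, t) = \|h \circ r - t\|$ with the Hadamard product, an L1 norm, and uniform corruption of head entities for negative sampling; the function names and placeholder embeddings are illustrative rather than the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_n(h, r, t):
    """Association score of a triplet: norm of h o r - t (Hadamard product form)."""
    return np.linalg.norm(h * r - t, ord=1)

def triplet_log_likelihood(emb, triplet, negative_heads, att=1.0):
    """log sigma(att * d_n) for the observed triplet plus negative-sample terms."""
    h, r, t = (emb[x] for x in triplet)
    ll = np.log(sigmoid(att * d_n(h, r, t)))
    for h_neg in negative_heads:                       # corrupted heads h'
        ll += np.log(sigmoid(-att * d_n(emb[h_neg], r, t)))
    return ll

rng = np.random.default_rng(0)
emb = {name: rng.normal(size=100) for name in ["Node1", "Node2", "Node3", "network"]}
ll = triplet_log_likelihood(emb, ("Node1", "network", "Node2"), negative_heads=["Node3"])
```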

4. Experimentation

4.1. Data Sets

In order to evaluate the effectiveness of MRNR, this paper uses three citation network datasets: the academic network datasets Citeseer, DBLP, and SDBLP. SDBLP is obtained from DBLP by removing the nodes with fewer than three citations. After operations such as deleting isolated nodes from the original data, the statistics of the datasets are as shown in Table 2.
It can be seen that the network density increases successively across the three datasets. We use Citeseer, DBLP, and SDBLP to simulate three types of network datasets with different properties in order to verify the generalization ability of the MRNR algorithm.

4.2. Introduction to the Comparison Algorithms

(1)
DeepWalk
DeepWalk originates from the Word2Vec algorithm. The DeepWalk algorithm is the most classic network embedding learning algorithm based on neural networks, and many subsequent network embedding learning algorithms are mainly based on the DeepWalk algorithm.
(2)
LINE
LINE is an unsupervised learning graph embedding algorithm proposed by Tang et al. in 2015, which aims to embed nodes into low-dimensional vector spaces for analysis in machine learning and data mining tasks. The LINE algorithm has two main variants: The first one is based on the neighboring relationship between nodes, which learns the similarity between neighboring nodes by maximizing their first-order similarity. The second one is based on the neighboring relationships, which learns the second-order similarity of nodes by maximizing the similarity between second-order neighboring nodes. Ultimately, the LINE algorithm combines the first-order and second-order similarities of the nodes to produce the final embedding vectors of the nodes.
(3)
GraRep
GraRep is a graph embedding algorithm based on higher-order proximity. Unlike traditional methods based on first-order proximity (DeepWalk, LINE, etc.), GraRep considers higher-order proximity between nodes. The model uses the features of neighbors at different hop counts to construct higher-order adjacency matrices and performs Singular Value Decomposition (SVD) on these matrices to generate embedding vectors. GraRep's advantage is its ability to capture complex relationships between nodes by exploiting higher-order proximity, generating richer embedding representations (a simplified sketch of this higher-order SVD idea is given after this list of comparison algorithms).
(4)
MFDW
The MFDW algorithm computes the representation vector of each node by recasting the idea of DeepWalk as matrix factorization; it can be regarded as a simplified, matrix factorization form of DeepWalk.
(5)
Text Feature (TF)
The Text Feature algorithm is a graph embedding algorithm based on text features. The Text Feature algorithm first converts the text features of each node into a text feature matrix using a text feature extraction technique, and then it maps the text feature matrix into a low dimensional space using the SVD algorithm to obtain the representation vector of each node.
(6)
TADW
TADW is a graph embedding algorithm that combines topology and node attributes. TADW is a matrix factorization algorithm that combines a matrix ( M ) with the text matrix ( T ) for joint factorization, which improves the expressiveness and generalization of the embedding vectors.
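As referenced in the GraRep description above, the higher-order-proximity idea can be sketched as follows. This is a simplified, illustrative procedure on a toy adjacency matrix: the log-shifted probability matrices and other details of the published GraRep algorithm are omitted, so it should be read as an approximation rather than the exact method.

```python
import numpy as np

def k_step_embeddings(A, k_max=3, dim=2):
    """SVD of the k-step transition matrices, concatenated over k = 1..k_max."""
    P = A / A.sum(axis=1, keepdims=True)      # row-normalized transition matrix
    blocks, P_k = [], np.eye(A.shape[0])
    for _ in range(k_max):
        P_k = P_k @ P                          # k-step transition probabilities
        U, S, _ = np.linalg.svd(P_k)
        blocks.append(U[:, :dim] * np.sqrt(S[:dim]))
    return np.hstack(blocks)                   # one row per node

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
emb = k_step_embeddings(A)                     # shape (4, 6) for k_max=3, dim=2
```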

4.3. Experimental Setup

This paper evaluates the MRNR algorithm and the comparison algorithms on a network node classification task with Liblinear as the baseline classifier. In order to test the generalization ability of the algorithms, we set 9 training set ratios from 0.1 to 0.9 and use the remaining nodes as the test set. The network embedding vectors obtained by all algorithms are uniformly set to 100 dimensions. We set the length of the random walks to 40, the number of random walks to 10, the window size to 5, the number of negative samples to 5, the minimum node frequency to 5, and the learning rate of the neural network to 0.05. In the experiments, we repeat the training 10 times and take the average of the results as the final result. We fix $\rho$ and $\beta$ to 0.5 in the node classification task.
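The evaluation protocol can be reproduced approximately as follows, using scikit-learn's liblinear-backed logistic regression as a stand-in for the Liblinear classifier; the embeddings and labels below are random placeholders for the trained representation vectors:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def evaluate(embeddings, labels, train_ratio, repeats=10, seed=0):
    """Average node-classification accuracy over several random splits."""
    scores = []
    for i in range(repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            embeddings, labels, train_size=train_ratio, random_state=seed + i,
            stratify=labels)
        clf = LogisticRegression(solver="liblinear").fit(X_tr, y_tr)
        scores.append(accuracy_score(y_te, clf.predict(X_te)))
    return float(np.mean(scores))

# Placeholder data: 200 nodes, 100-dimensional embeddings, 4 classes.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 100))
labels = rng.integers(0, 4, size=200)
for ratio in (0.1, 0.5, 0.9):
    print(ratio, evaluate(embeddings, labels, ratio))
```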

4.4. Analysis of Experimental Results

In this paper, we use real network datasets Citeseer, DBLP, and SDBLP as evaluation datasets. We will take 10% to 90% of the data as the training set and the remaining data as the test set. Table 3 shows the accuracy of the comparison algorithm and MRNR algorithm in network node classification tasks under 3 datasets and 9 different training set ratios.
Based on the given three datasets, the experimental results show that the MRNR algorithm achieves the best performance at all proportions, and its accuracy is higher than other compared methods at different data proportions. For example, for the Citeseer dataset, MRNR achieves an accuracy of 79.93% at 90% data proportion, while the highest accuracy of other comparison methods is only 64.30% (MFDW). On the DBLP and SDBLP datasets, the accuracy of MRNR achieves 85.33% and 85.63% at 80% data proportion, which are higher than the highest accuracy of other comparison methods.
Taking the Citeseer dataset as an example, it can be found that the MRNR algorithm achieves a better performance under different proportions of the training set. Especially under the proportion of 90% of the training set, MRNR’s accuracy reaches 79.93%, which is much higher than other algorithms. In the comparison algorithms, MFDW is a matrix factorization form of DeepWalk and its accuracy reaches 64.30%; the LINE algorithm has the worst network node classification performance. The MRNR algorithm optimizes the process of modeling network structural features and adds node’s text features and triplet features, which outperforms the DeepWalk algorithm.
In the DBLP dataset, the classification performance of DeepWalk is slightly inferior to LINE and GraRep, and the MFDW algorithm has a better classification performance than DeepWalk. As the proportion of the training set increases, the network node classification performance of Text Feature is increasingly superior to DeepWalk. The MRNR still performs best in high-proportion training sets, and its accuracy reaches 84.89%.
In the SDBLP dataset, the MRNR algorithm also performs well with different proportions of training sets; especially with a high proportion of training data, the MRNR algorithm's accuracy reaches 84.53%. Among the comparison algorithms, both the GraRep algorithm and the MRNR algorithm obtain good performance because the network is dense.
Overall, the MRNR algorithm performs well across different datasets and training set ratios and consistently outperforms the comparison algorithms. This shows that the algorithm proposed in this paper has good potential for node classification problems. We compare representation learning methods that use the network structure with representation learning methods that use text features. The experimental results show that MRNR significantly improves its node classification performance after adding text features. The MRNR algorithm proposed in this paper exhibits excellent node classification performance on sparse networks such as Citeseer. The main reason is that MRNR introduces the text features of nodes and assigns different weights to different co-occurrence words and triplets in the text, allowing more features to participate in learning during modeling and training. On dense datasets such as SDBLP, the network structure features are very rich, so the performance differences between different algorithms are relatively small.
As shown in Figure 3, the span of network node classification accuracy values of the five comparative algorithms on the Citeseer and DBLP datasets is greater than their values on the SDBLP dataset. On the SDBLP dataset, the network node classification accuracies of the five comparison algorithms show a significant upward trend. However, on the Citeseer and DBLP datasets, the accuracy curves show a relatively slow upward trend. The main reason for this phenomenon is that on sparse networks, there is a large variability in the features acquired between different algorithms. If a certain algorithm can acquire features that more fully reflect the network structure, its network classification performance becomes better. On such sparse networks, the network representation learning algorithms based on joint learning can compensate for the insufficient training problem due to the sparsity of the network. Regarding dense networks, different algorithms can obtain effective network structure features from sufficiently connected edges, and thus, the differences in classification performance are smaller.

4.5. Network Embedding Visualization

The main purpose of network representation visualization is to examine whether the trained representation vectors exhibit a clear clustering phenomenon, which indicates whether the network representation has learned the community features of the network. If the community division based on the network representations is more accurate, the representations are more reliable for network node classification tasks. This paper randomly selects four classes of nodes from the Citeseer dataset and 150 nodes from each class. We use the t-SNE algorithm to reduce the dimensionality of the network representation vectors and visualize them. The specific results are shown in Figure 4.
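A minimal sketch of this visualization step with scikit-learn's t-SNE and matplotlib (the vectors and labels here are random placeholders for the trained representation vectors):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
vectors = rng.normal(size=(600, 100))          # 4 classes x 150 nodes, 100-dim vectors
labels = np.repeat(np.arange(4), 150)

points = TSNE(n_components=2, random_state=0).fit_transform(vectors)
plt.scatter(points[:, 0], points[:, 1], c=labels, s=8)   # colors encode node classes
plt.savefig("tsne_visualization.png", dpi=200)
```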
Figure 4 visualizes some of the network representation vectors obtained from DeepWalk and from the MRNR model trained on the three datasets Citeseer, DBLP, and SDBLP. From Table 3, it can be seen that DeepWalk performs relatively poorly in the node classification task, and DeepWalk exhibits the same behavior in the visualization task. The MRNR model shows a significant improvement in representation and classification performance on all three datasets; accordingly, the representation vectors of the selected node classes exhibit obvious clustering phenomena and clear cluster boundaries.

4.6. Case Analysis

In order to verify the feasibility of the model, the target node is randomly selected in the Citeseer dataset. The text of this node is “Quantum Field Theory as Dynamical System”. We get the three nodes with the highest similarity to the target node. Then, we analyze the correlation between the text of the target node and the text of the three nodes. As shown in Table 4, the DeepWalk model only considers the network structure feature similarity, and it does not consider the node’s text feature similarity. Therefore, the texts of the three nodes do not all contain the same words as the target node text. The TADW model and MRNR consider both the network structure feature and the node’s text features, so the returned similar nodes all have the same co-occurring words. This case cannot explain which algorithm has better machine learning performance, and it only explains that the MRNR algorithm is able to model both network structural features and textual features.
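The nearest-neighbour lookup used in this case study can be expressed as a simple cosine-similarity query (a sketch with placeholder vectors; the representation matrix would come from the trained model):

```python
import numpy as np

def top_k_similar(vectors, query_idx, k=3):
    """Indices of the k nodes whose vectors are most cosine-similar to the query node."""
    V = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = V @ V[query_idx]
    sims[query_idx] = -np.inf                 # exclude the query node itself
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 100))         # placeholder representation matrix
print(top_k_similar(vectors, query_idx=0))    # indices of the 3 most similar nodes
```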

5. Summary

In this paper, we propose a network representation learning algorithm (MRNR) that jointly models network structure features, node text features, and knowledge triplet features. Its goal is to introduce textual features as supervision information into a training process based on structural feature modeling, so that the learned node representation vectors also capture textual similarity. If only the structural relationships of the nodes are considered in network modeling, the learned representation vectors ignore the semantic relationships between the nodes; in order to account for these semantic relationships, text features and knowledge triplet features are introduced into the proposed model. Meanwhile, we introduce an attention mechanism so that more valuable features receive higher weights. In different machine learning tasks, the performance of the MRNR algorithm, which is based on multiple remodeling of node attributes, is significantly better than that of network representation learning algorithms using a single feature. For texts and triplets, adding attention mechanisms on top of network structural features can further improve the machine learning performance of the network representations. The MRNR algorithm proposed in this paper is currently unable to handle missing node text features; therefore, future work will mainly address this type of problem and study new text feature embedding algorithms.

Author Contributions

Methodology, Z.Y.; Supervision, Z.L.; Writing—original draft, W.Z. and B.C. All authors have read and agreed to the published version of the manuscript.

Funding

This article is supported by the National Key Research and Development Program of China (No. 2020YFC1523300), the Innovation Platform Construction Project of Qinghai Province (No. 2022-ZJ-T02) and Research Fund of Qinghai Normal University: 2023QZR007.

Data Availability Statement

Data available on request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tu, C.C.; Yang, C.; Liu, Z.Y.; Sun, M.S. Network representation learning: An overview. Sci. Sin. Informationis 2017, 47, 980–996. [Google Scholar]
  2. Zhang, D.K.; Yin, J.; Zhu, X.Q.; Zhang, C.Q. Network representation learning: A survey. IEEE Trans. Big Data 2018, 6, 3–28. [Google Scholar] [CrossRef]
  3. Yang, C.; Xiao, Y.X.; Zhang, Y.; Sun, Y.Z.; Han, J.W. Heterogeneous network representation learning: A unified framework with survey and benchmark. IEEE Trans. Knowl. Data Eng. 2020, 34, 4861–4867. [Google Scholar] [CrossRef]
  4. Cui, P.; Wang, X.; Pei, J.; Zhu, W.W. A survey on network embedding. IEEE Trans. Knowl. Data Eng. 2018, 31, 833–852. [Google Scholar] [CrossRef]
  5. Han, Z.M.; Liu, D.; Zheng, C.Y. Coupling network vertex representation learning based on network embedding method. Chin. Sci. Inf. Sci. 2020, 50, 1197–1216. [Google Scholar]
  6. Wang, D.X.; Cui, P.; Zhu, W.W. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016. [Google Scholar]
  7. Mikolov, T.; Karafiát, M.; Burget, L.; Černocký, J.; Khudanpur, S. Recurrent neural network based language model. In Proceedings of the 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, 26–30 September 2010. [Google Scholar]
  8. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  9. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, Nevada, 5–10 December 2013. [Google Scholar]
  10. Liu, Z.Y.; Sun, M.S.; Lin, X.K.; Xie, R.B. Knowledge representation learning: A review. J. Comput. Res. Dev. 2016, 2, 247–261. [Google Scholar]
  11. Feng, J. Knowledge Graph Embedding by Translating on Hyperplanes; American Association for Artificial Intelligence: Beijing, China, 2014. [Google Scholar]
  12. Perozzi, B.; Al-Rfou, R.; Skiena, S. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014. [Google Scholar]
  13. Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016. [Google Scholar]
  14. Tang, J.; Qu, M.; Wang, M.Z.; Zhang, M.; Yan, J. Line: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18–22 May 2015. [Google Scholar]
  15. Derr, T.; Ma, Y.; Tang, J. Signed Graph Convolutional Network. arXiv 2018, arXiv:1808.06354. [Google Scholar]
  16. You, Y.; Chen, T.; Sui, Y.; Chen, T.; Wang, Z.; Shen, Y. Graph Contrastive Learning with Augmentations. arXiv 2020, arXiv:2010.13902. [Google Scholar]
  17. Cao, S.S.; Lu, W.; Xu, Q.K. GraRep: Learning graph representations with global structural information. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, Melbourne, Australia, 18–23 October 2015. [Google Scholar]
  18. Wang, X.; Ji, H.Y.; Shi, C.; Wang, B.; Ye, Y.F.; Cui, P. Heterogeneous graph attention network. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019. [Google Scholar]
  19. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  20. Hamilton, W.L.; Ying, R.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the Thirty-First Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  21. Thekumparampil, K.K.; Wang, C.; Oh, S.; Li, L.J. Attention-Based Graph Neural Network for Semi-Supervised Learning. arXiv 2018, arXiv:1803.03735. [Google Scholar]
  22. Cen, Y.K.; Zou, X.; Zhang, J.W.; Yang, H.X.; Zhou, J.R.; Tang, J. Representation learning for attributed multiplex heterogeneous network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019. [Google Scholar]
  23. Li, J.; Liu, Y.; Zou, L. DynGCN: A dynamic graph convolutional network based on spatial-temporal modeling. In Proceedings of the 21st International Conference, Amsterdam, The Netherlands, 20–24 October 2020; Springer: Cham, Switzerland, 2020; Volume 12, pp. 83–95. [Google Scholar]
  24. Dong, W.; Wu, J.S.; Luo, Y.; Ge, Z.Y.; Wang, P. Node representation learning in graph via node-to-neighbourhood mutual information maximization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
  25. Zhao, Y.; Feng, H.L.; Zhou, H.; Yang, Y.R.; Chen, X.Y.; Xie, R.B.; Zhuang, F.Z.; Li, Q. EIGAT: Incorporating global information in local attention for knowledge representation learning. Knowl.-Based Syst. 2022, 237, 107909. [Google Scholar] [CrossRef]
  26. Chang, Y.M.; Chen, C.; Hu, W.B.; Zheng, Z.B.; Zhou, X.C.; Chen, S.Z. Megnn: Meta-path extracted graph neural network for heterogeneous graph representation learning. Knowl.-Based Syst. 2022, 235, 107611. [Google Scholar] [CrossRef]
  27. Sun, D.D.; Li, D.S.; Ding, Z.L.; Zhang, X.Y.; Tang, J. Dual-decoder graph autoencoder for unsupervised graph representation learning. Knowl.-Based Syst. 2021, 234, 107564. [Google Scholar] [CrossRef]
  28. Yang, B.; Yih, W.T.; He, X.; Gao, J.; Deng, L. Embedding entities and relations for learning and inference in knowledge bases. arXiv 2015, arXiv:1412.6575. [Google Scholar]
  29. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2D knowledge graph embeddings. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
  30. Yu, C.; Zhang, Z.; An, L.; Li, G. A knowledge graph completion model integrating entity description and network structure. Aslib J. Inf. Manag. 2023, 75, 500–522. [Google Scholar] [CrossRef]
  31. Liu, W.Q.; Cai, H.Y.; Cheng, X.; Xie, S.F.; Yu, Y.P.; Zhang, D.K. Learning high-order structural and attribute information by knowledge graph attention networks for enhancing knowledge graph embedding. Knowl.-Based Syst. 2022, 250, 109002. [Google Scholar] [CrossRef]
  32. Wei, H.Y.; Liang, J.; Liu, D.; Wang, F. Contrastive graph structure learning via information bottleneck for recommendation. In Proceedings of the Advances in Neural Information Processing Systems 35 (NeurIPS 2022), New Orleans, LA, USA, 6–14 December 2021. [Google Scholar]
  33. Cai, X.H.; Huang, C.; Xia, L.H.; Ren, X.B. LightGCL: Simple yet effective graph contrastive learning for recommendation. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
Figure 1. Learning target of the MRNR algorithm model.
Figure 2. Model of the MRNR algorithm.
Figure 3. Performance comparisons of six algorithms on three datasets.
Figure 4. Visualization results of the 3 datasets on DeepWalk and MRNR (different colors represent different node labels).
Table 1. Symbol table.

Symbol | Meaning
G = (V, E) | the input network
V | the set of nodes
E | the set of edges
|V| | the number of nodes
$R_v$ | the trained network representation vector
att | the attention parameter
T | the label of node v
$\log g_c(v)$ | the loss of the network node relationship modeling
$\rho \log g_w(v)$ | the loss of the node's text word features and attention mechanism learning
$NEG(v)$ | the negative sample set of node v
$L_c(v)$ | the sampling label of node v
$v_{nb}$ | the context of node v
D | the set of nodes
$\sigma(x)$ | the Sigmoid function
C | the node sequence corpus
$\mu$ | the learning rate
$e_u$ | the sum of the word representation vectors
$w_u$ | the word representation vector
$att(w_j)$ | the attention weight of the word $w_j$
h | head entity
t | tail entity
r | relation
$C(h, r, t)$ | knowledge triplet
Table 2. Dataset statistics.

Data Set | Number of Nodes | Number of Edges | Average Degree | Training Set | Validation Set | Test Set
Citeseer | 4610 | 5923 | 2.57 | 1333 | 667 | 2610
DBLP | 17,725 | 105,781 | 11.926 | 3333 | 1667 | 12,725
SDBLP | 3119 | 39,516 | 25.339 | 666 | 334 | 3119
Table 3. Comparison results on the node classification task (accuracy by percentage of training data).

Data Set | Method | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90%
Citeseer | DeepWalk | 55.89 | 59.30 | 60.89 | 61.48 | 62.19 | 62.30 | 62.62 | 62.33 | 63.95
Citeseer | LINE | 42.64 | 47.06 | 48.04 | 49.57 | 50.43 | 51.02 | 51.18 | 53.07 | 53.63
Citeseer | GraRep | 39.38 | 53.09 | 57.85 | 59.75 | 59.97 | 61.05 | 61.57 | 62.09 | 60.89
Citeseer | MFDW | 57.62 | 60.79 | 62.33 | 63.05 | 62.96 | 63.00 | 63.00 | 63.48 | 64.30
Citeseer | Text Feature | 57.69 | 61.30 | 62.76 | 63.05 | 63.48 | 63.30 | 62.87 | 62.19 | 63.95
Citeseer | MRNR | 75.76 | 77.92 | 78.56 | 79.15 | 79.62 | 79.12 | 79.18 | 79.22 | 79.93
DBLP | DeepWalk | 62.26 | 64.34 | 65.42 | 65.98 | 66.24 | 66.18 | 66.60 | 67.03 | 66.77
DBLP | LINE | 64.49 | 66.53 | 67.49 | 67.87 | 67.98 | 68.30 | 69.03 | 68.89 | 68.86
DBLP | GraRep | 58.92 | 65.92 | 67.26 | 67.92 | 68.77 | 68.88 | 69.26 | 69.56 | 69.79
DBLP | MFDW | 65.02 | 74.68 | 74.88 | 75.02 | 75.05 | 75.13 | 75.22 | 74.57 | 75.51
DBLP | Text Feature | 66.17 | 69.46 | 70.49 | 71.15 | 71.29 | 71.44 | 71.54 | 71.57 | 71.83
DBLP | MRNR | 81.96 | 81.82 | 83.96 | 84.04 | 84.88 | 85.02 | 85.37 | 85.33 | 84.89
SDBLP | DeepWalk | 79.76 | 80.65 | 81.88 | 81.49 | 82.56 | 82.35 | 82.73 | 82.71 | 83.37
SDBLP | LINE | 73.79 | 77.01 | 78.11 | 81.49 | 79.31 | 78.97 | 79.63 | 78.82 | 78.77
SDBLP | GraRep | 80.99 | 82.52 | 84.14 | 84.78 | 84.97 | 84.17 | 85.36 | 85.27 | 84.95
SDBLP | MFDW | 79.79 | 83.08 | 84.38 | 84.12 | 84.53 | 84.29 | 84.70 | 84.55 | 84.53
SDBLP | Text Feature | 65.03 | 71.23 | 72.64 | 73.86 | 74.54 | 75.07 | 75.14 | 76.00 | 75.33
SDBLP | MRNR | 82.02 | 82.66 | 83.85 | 84.34 | 84.66 | 84.56 | 84.68 | 85.63 | 85.37
Table 4. Case analysis (Citeseer dataset).

Algorithm | Vertex Title
DeepWalk | Statistics Localization Regions and Modular Symmetries in Quantum Field Theory
DeepWalk | Extensions of Conformal Nets and Super Selection Structures
DeepWalk | Modular Covariance Pct Spin and Statistics
TADW | Remarks on Causality in Relativistic Quantum Field Theory
TADW | Modular Covariance Pct Spin and Statistics
TADW | An Algebraic Spin and Statistics Theorem
MRNR | Higher-dimensional Algebra and Topological Quantum Field Theory
MRNR | Statistics Localization Regions and Modular Symmetries in Quantum Field Theory
MRNR | Quantum Field Theory as Eigenvalue Problem Manuscript
