1. Introduction
Graph embedding is a powerful technique for representing graphs with a wide range of applications across domains. For instance, node-clustering tasks group together similar news articles [1], link prediction tasks anticipate associations between users in social networks [2], and product recommendation tasks reveal users' preferences on shopping platforms [3]. However, most current research on graph embedding focuses on context-free methods that rely solely on node representations based on local or global features [4,5,6]. For example, Perozzi et al. [7] introduced the DeepWalk method, which employs random walks to capture neighborhood structures and maximizes the probability of neighboring nodes appearing in the walk sequence. Sheikh et al. [8] investigated using attributes and structures to learn node representations. Pan et al. [9] proposed a method for jointly learning optimal node representations by leveraging network structure, node content, and node labels. Importantly, in numerous cases, particularly for sparse graphs, context-free graph-embedding methods may omit crucial details, consequently diminishing the performance of network analysis tasks.
The incompleteness of a single, context-free representation per node has prompted researchers to propose context-sensitive graph embedding. This approach captures the contexts in which a node appears and learns distinct representations of the node in different scenarios, i.e., the representation of the node changes according to the target task. Existing context-sensitive representation methods rely on additional textual information, the graph structure itself, or information from the neighbors. For example, Tu et al. [10] were the first to propose a mutual attention mechanism based on textual embedding. This mechanism emphasizes the interactions between a node and its neighbors and advocates separate embeddings for nodes during these interactions. Gracious et al. [11] combined mutual and topological attention mechanisms and proposed a diffusion graph for text graph embedding, which captures the semantic relatedness between texts by leveraging the global structural information of the graph. Epasto et al. [12] encoded multiple representations of a node by leveraging its roles in distinct local communities through ego-network decomposition. Kefato et al. [13] utilized an attentional pooling network to determine the personalized importance of a node's neighbors. Qin et al. [14] proposed using graph convolutional networks (GCNs) to assist in obtaining high-quality node representations.
We aim to generate high-quality context-sensitive node representations by focusing solely on the structural information of nodes. Our work builds on the Graph Attentional Pooling network (GAP) [13] and can be viewed as a generalization of it. GAP derives the source node's representation through a mutual attention mechanism that responds to changes in the target node, thereby achieving context sensitivity. To illustrate how GAP works, consider the example graph in Figure 1: node 5 and node 6 each have their own sets of first-order neighbors. During the interaction between nodes 5 and 6, the model focuses primarily on their common neighbors while ignoring, or paying little attention to, the remaining neighbors. Although relationships also exist between node 6's non-shared neighbors and node 5's neighbors, they are not considered during model training. Therefore, the mutual attention mechanism between nodes could cause the model to miss information necessary for accurate node representation, such as implicit topological structure information.
We propose Graph Embedding with Similarity Metric Learning (GESML), a more expressive mutual attention method that captures richer topological information about node neighborhoods. GESML employs an attention-based symmetric similarity measurement function from metric learning to learn the similarity matrix between pairs of nodes. This allows the method to capture the optimal graph structure and establish the relationship between node representations and the target task. GESML consists of three modules: a graph structure learning module, a graph representation module, and a prediction module. The graph structure learning module samples from the graph to obtain embedding representations of the source and target nodes and then generates the similarity matrix with the attention-based symmetric similarity measurement function. The graph representation module encodes the node pair from the perspectives of the source node and the target node to generate their embedding representations. The prediction module calculates a soft alignment score from the embedding representations of the source and target nodes.
We achieve state-of-the-art performance in link prediction and node clustering tasks on three datasets. Furthermore, the results show that introducing the attention-based symmetric similarity measurement function and the top-k pooling operation significantly enhances embedding quality. In summary, our contributions are two-fold:
We propose a simple embedding representation method that combines an attention-based symmetric similarity measurement function with top-k pooling.
Experimental results on three datasets show that GESML significantly outperforms state-of-the-art (SOTA) methods.
2. Related Work
Most existing research on graph embedding still focuses on context-free graph-embedding methods, i.e., using only local or global features to accomplish node representation. For example, Perozzi et al. [7] introduced the DeepWalk method to address the challenge of defining neighborhood structures. This method utilizes random walks to capture neighbors' structures and enhances the likelihood of neighboring nodes appearing in each random walk sequence by implementing the Skip-Gram approach. Building upon the foundations of DeepWalk, Grover et al. [15] introduced the Node2Vec method, which establishes a versatile notion of node graph neighborhoods. Node2Vec formulates a second-order random walk strategy to sample neighboring nodes, enabling a seamless transition between breadth-first sampling (BFS) and depth-first sampling (DFS). In addition to addressing neighborhood structures, Tang et al. [16] proposed LINE, a large-scale network embedding model that preserves first-order and second-order proximities. To overcome the limited expressive power of LINE, Wang et al. [17] presented a deep model called SDNE for graph embedding, which also endeavors to capture both first-order and second-order proximities. Additionally, numerous works concentrate on fusing graph structure with node attributes and other side information. For example, Yang et al. [18] introduced TADW, which incorporates rich node information (such as text) while acquiring low-dimensional representations. Pan et al. [9] devised a coupled deep model encompassing graph structure, node attributes, and node labels within graph-embedding techniques. Huang et al. [19] put forth attribute graph embedding (for both nodes and features) with matrix factorization subject to joint constraints. Kipf et al. [4] developed a graph autoencoder that naturally integrates graphs' node attributes and structural information.
The adequacy of a single representation in context-free graph embeddings has been debated in recent works, leading to the proposal of context-sensitive approaches. Context-sensitive graph embedding captures the different scenarios a node is in and learns the node's representation in each scenario; that is, for the same node, the representation changes as the target task changes. For example, Epasto et al. [12] devised a method for acquiring multiple representations of nodes through ego-network decomposition, which enables the encoding of node roles within distinct local communities. Kefato et al. [13] employed an attentional pooling network to learn the personalized importance of neighboring nodes, allowing the model to concentrate exclusively on relevant neighbor information and ignore irrelevant neighbor information during interaction processes. Li et al. [20] introduced a personalized PageRank method to acquire larger receptive fields and higher-quality representations of text embeddings. Several works employ side information to aid in the learning of node representations. Tu et al. [10] proposed a mutual attention mechanism that integrates both structural and textual information of nodes; this mechanism takes into account contextual information and interaction relationships in node representations. Zhang et al. [21] introduced a diffusion graph model for text graph embedding that utilizes the global structural information of the graph to capture semantic correlations between texts. Wang et al. [22] put forth a text-focused graph-embedding approach that models both text information and network topology. Expanding upon the research by Kefato et al. [13], we propose a novel graph-embedding model that employs a symmetric similarity measurement function to encode a greater amount of topological structural information of nodes, thereby obtaining high-quality node embedding representations.
3. Preliminaries
We first briefly review the essential components of the prior work GAP [13] before delving into the details of GESML. A graph $G$ can be represented as $G=(V,E)$, where $V$ represents the set of nodes and $E$ denotes the set of edges. The first step in the GAP method involves defining a neighborhood sampling function $f: V \rightarrow 2^{V}$. Here, $f$ maps each node $u \in V$ in the graph to a set of nodes $N(u) \subseteq V$. One straightforward way of implementing $f$ is to sample the first-order neighbors of node $u$ ($N(u)=\{v \mid (u,v) \in E\}$). Given the neighbor sequences $N(s)=(s_1,\dots,s_m)$ and $N(t)=(t_1,\dots,t_m)$ corresponding to a node pair $(s,t)$, where $m$ denotes the number of sampled neighbors, we can apply word-embedding techniques to generate the embedding vectors $\mathbf{s}_i$ and $\mathbf{t}_j$ for each $s_i \in N(s)$ and $t_j \in N(t)$.
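To make the sampling function $f$ concrete, the following Python sketch draws a fixed number $m$ of first-order neighbors for a node. It is a minimal illustration, not the authors' code; in particular, the padding strategy for nodes with fewer than $m$ neighbors is our assumption.

```python
# Minimal sketch (not the authors' code) of first-order neighbor sampling f(u)
# with a fixed sample size m; the graph is assumed to be given as an adjacency list.
import random

def sample_neighbors(adj, u, m, seed=None):
    """Return m first-order neighbors of node u (resampling if deg(u) < m)."""
    rng = random.Random(seed)
    neighbors = list(adj[u])            # first-order neighbors of u
    if len(neighbors) >= m:
        return rng.sample(neighbors, m)
    # assumption: pad by resampling when the node has fewer than m neighbors
    return neighbors + [rng.choice(neighbors) for _ in range(m - len(neighbors))]

# Example: adj = {0: [1, 2], 1: [0], 2: [0]}; sample_neighbors(adj, 0, 4, seed=7)
```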
Figure 2 illustrates the model architecture of GAP.
The processing of GAP can be described as follows:
- 1.
Utilize the sampling function to perform first-order neighbor sampling and employ graph embedding to obtain representations for the source node and target node, denoted as $S=[\mathbf{s}_1,\dots,\mathbf{s}_m]^{\top}\in\mathbb{R}^{m\times d}$ and $T=[\mathbf{t}_1,\dots,\mathbf{t}_m]^{\top}\in\mathbb{R}^{m\times d}$.
- 2.
Employ a trainable parameter matrix $P\in\mathbb{R}^{d\times d}$ to compute the attention matrix between the node pair $(s,t)$. The attention matrix is calculated as $A=\tanh(S P T^{\top})$.
- 3.
Perform pooling operations on the rows and columns of $A$ separately to obtain the unnormalized attention vectors: $a_s=\operatorname{pool}_{\mathrm{row}}(A)$, $a_t=\operatorname{pool}_{\mathrm{col}}(A)$.
- 4.
Use the $\operatorname{softmax}$ function to normalize the attention vectors and then apply $z_s=S^{\top}\operatorname{softmax}(a_s)$ and $z_t=T^{\top}\operatorname{softmax}(a_t)$ to obtain the node representations for the source node $s$ and the target node $t$.
- 5.
Compute $z_s^{\top} z_t$ to obtain the score of the node pair $(s,t)$.
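For concreteness, the following PyTorch sketch restates steps 1 to 5 in the notation above. It is a minimal illustration under our reading of GAP (max pooling over rows and columns, a single trainable matrix $P$), not the reference implementation.

```python
# PyTorch sketch of GAP-style mutual attention pooling (steps 2-5 above).
# S, T: (m, d) embeddings of the sampled neighbors of the source / target node.
import torch
import torch.nn.functional as F

class MutualAttentionPooling(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        p = torch.empty(dim, dim)
        torch.nn.init.xavier_uniform_(p)
        self.P = torch.nn.Parameter(p)          # trainable parameter matrix P

    def forward(self, S, T):
        A = torch.tanh(S @ self.P @ T.t())      # step 2: attention matrix A
        a_s = A.max(dim=1).values               # step 3: row-wise pooling
        a_t = A.max(dim=0).values               #         column-wise pooling
        alpha_s = F.softmax(a_s, dim=0)         # step 4: normalize attention
        alpha_t = F.softmax(a_t, dim=0)
        z_s = S.t() @ alpha_s                   #         source representation z_s
        z_t = T.t() @ alpha_t                   #         target representation z_t
        return z_s, z_t, torch.dot(z_s, z_t)    # step 5: score of the pair (s, t)
```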
Applying GAP reveals that the mutual attention mechanism between nodes focuses solely on their common neighbors. This can cause the model to overlook information essential for accurate node representation, such as implicit topological and structural information.
4. Graph Embedding with Similarity Metric Learning
Figure 3 illustrates the model architecture of the GESML. It comprises three modules: the graph structure learning module, the graph representation learning module, and the prediction module. The graph structure learning module aims to sample and obtain embedded representations of the source and target nodes from the graph and generate a similarity matrix using an attention-based symmetric similarity metric function. The graph representation learning module encodes the source and target nodes separately to generate embedded representations for node pairs. The prediction module utilizes the embedded representations of the source and target nodes to compute soft alignment scores.
4.1. The Graph Structure Learning Module
The primary objective of the graph structure learning module is to sample and obtain embedded representations of the source and target nodes from the graph and then generate a similarity matrix using an attention-based symmetric similarity metric function. Similar to GAP [13], the model initially defines a neighbor-sampling function $f: V \rightarrow 2^{V}$ that maps each node $u \in V$ in the graph to a set of nodes $N(u) \subseteq V$. One simple method to implement $f$ is to sample the first-order neighbors of node $u$ ($N(u)=\{v \mid (u,v) \in E\}$). Secondly, the neighbor sets for the node pair $(s,t)$ are obtained by applying the first-order neighbor-sampling function $f$, resulting in $N(s)=(s_1,\dots,s_m)$ and $N(t)=(t_1,\dots,t_m)$, where $m$ denotes the number of sampled neighbor nodes. Furthermore, for each $s_i \in N(s)$ and $t_j \in N(t)$, we can generate the embedding vectors $\mathbf{s}_i$ and $\mathbf{t}_j$, collected into the matrices $S$ and $T$. Finally, a trainable parameter matrix $W$ and an attention-based symmetric similarity metric function are introduced to calculate the similarity matrix $\Lambda$ between the node pair $(s,t)$ using $\Lambda=\operatorname{softmax}\big(\operatorname{ReLU}(S W T^{\top})\big)$. In this case, $\operatorname{ReLU}(\cdot)$ refers to a rectified linear unit function employed to enforce sparsity in the output similarity matrix, while $\operatorname{softmax}(\cdot)$ is applied to obtain a row-normalized similarity matrix.
The similarity matrix represents the degree of association between node pairs. The $i$-th row vector of $\Lambda$ is associated with the source node $s$: the element $\lambda_{ij}$ of $\Lambda$ encodes the similarity between the embedding on the source side and the neighbor $t_j$ of the target node $t$. Similarly, the $j$-th column vector of $\Lambda$ is associated with the target node $t$: the element $\lambda_{ij}$ encodes the similarity between the embedding on the target side and the neighbor $s_i$ of the source node $s$. Thus, $\Lambda$ can be interpreted as a weighted adjacency matrix among the sampled nodes. Additionally, using $\operatorname{ReLU}$ to enforce sparsity in the output similarity matrix is beneficial, because fully connected graphs incur high computational costs and can introduce noise, such as irrelevant edges; enhancing the sparsity of the learned graph structure in this manner is therefore advantageous. Furthermore, applying $\operatorname{softmax}$ for row normalization of $\Lambda$ satisfies the input requirement of the asymmetric encoder in the graph representation learning module.
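A minimal PyTorch sketch of this module follows, assuming the metric takes the form $\Lambda=\operatorname{softmax}(\operatorname{ReLU}(S W T^{\top}))$ reconstructed above; the exact parameterization (e.g., whether $W$ is constrained to be symmetric) is our assumption rather than the authors' specification.

```python
# Sketch of the assumed attention-based similarity metric: a shared trainable
# matrix W scores neighbor pairs, ReLU zeroes out negative scores (sparsity),
# and softmax row-normalizes the resulting similarity matrix.
import torch
import torch.nn.functional as F

class SimilarityMetric(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        w = torch.empty(dim, dim)
        torch.nn.init.xavier_uniform_(w)
        self.W = torch.nn.Parameter(w)       # trainable parameter matrix W

    def forward(self, S, T):
        # S, T: (m, d) neighbor embeddings of the source and target nodes
        W_sym = 0.5 * (self.W + self.W.t())  # assumption: symmetrize W so the metric is symmetric
        scores = F.relu(S @ W_sym @ T.t())   # sparsified pairwise similarities
        return F.softmax(scores, dim=1)      # row-normalized similarity matrix Lambda
```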
4.2. The Graph Representation Learning Module
The goal of the graph representation learning module is to independently encode the source and target nodes and generate embedded representations for the node pair. Firstly, for each row (or column) of the similarity matrix $\Lambda$, we compute the unnormalized attention vectors $a_s$ and $a_t$ using top-$k$ pooling operations. These operations identify, for each neighbor of the source node $s$, the top-$k$ target-side entries with the highest similarity scores; the same method is applied from the side of the target node $t$. This allows the neighborhood sequences of the source and target nodes to influence each other mutually, facilitating the learning of context-sensitive representations for both. Next, the $\operatorname{softmax}$ function is applied to obtain the normalized attention vectors $\alpha_s=\operatorname{softmax}(a_s)$ and $\alpha_t=\operatorname{softmax}(a_t)$. Finally, the context-sensitive representations for the source and target nodes are computed as $z_s=S^{\top}\alpha_s$ and $z_t=T^{\top}\alpha_t$.
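The sketch below continues the previous one, reusing the neighbor matrices $S$, $T$ and the similarity matrix $\Lambda$; averaging the top-$k$ scores in each row and column to form the unnormalized attention vectors is our assumption about the aggregation.

```python
# Sketch of the graph representation learning step: top-k pooling over the
# similarity matrix Lam (m x m), softmax normalization, and attention-weighted
# aggregation of the neighbor embeddings S and T (both of shape (m, d)).
import torch
import torch.nn.functional as F

def context_sensitive_representations(S, T, Lam, k):
    a_s = Lam.topk(k, dim=1).values.mean(dim=1)   # one score per source neighbor
    a_t = Lam.topk(k, dim=0).values.mean(dim=0)   # one score per target neighbor
    alpha_s = F.softmax(a_s, dim=0)               # normalized attention vectors
    alpha_t = F.softmax(a_t, dim=0)
    z_s = S.t() @ alpha_s                         # context-sensitive source repr.
    z_t = T.t() @ alpha_t                         # context-sensitive target repr.
    return z_s, z_t
```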
4.3. The Prediction Module
The model aims to predict edge similarity by leveraging the dot product of the representations of the source node and the target node, denoted as $z_s^{\top} z_t$, to assess the probability of a connection. The model utilizes the edge margin loss function
$$\mathcal{L}=\mathbb{E}_{(s,t)\in E,\;(s,t^{-})\notin E}\left[\max\left(0,\;\gamma - z_s^{\top} z_t + z_s^{\top} z_{t^{-}}\right)\right],$$
where $\gamma$ is the margin, $z_{t^{-}}$ denotes the representation of a target node obtained through negative sampling ($(s,t^{-})\notin E$), and $\mathbb{E}[\cdot]$ computes the average of the results. The model strives to learn context-sensitive node embeddings in an unsupervised manner, such that positive edges $(s,t)\in E$ are ranked above negative edges $(s,t^{-})\notin E$. Finally, the computational complexity of GESML is proportional to the number of edges in the graph, since each edge is treated as a pair of inputs.
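In code, the loss can be sketched as below; the default margin value and the batching convention are assumptions, not the authors' settings.

```python
# Sketch of the edge margin (ranking) loss: observed edges (s, t) should score
# higher than negatively sampled pairs (s, t-) by a margin (assumed gamma = 1).
import torch

def edge_margin_loss(z_s, z_t_pos, z_t_neg, margin=1.0):
    # z_s, z_t_pos, z_t_neg: (batch, d) representations of source nodes, true
    # targets, and negatively sampled targets, respectively.
    pos = (z_s * z_t_pos).sum(dim=-1)       # scores of positive edges
    neg = (z_s * z_t_neg).sum(dim=-1)       # scores of negative samples
    return torch.clamp(margin - pos + neg, min=0.0).mean()
```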
5. Experiments
We conduct extensive experiments on three datasets to evaluate the performance of GESML and employ visualizations to supplement the analysis of the experimental results.
5.1. Datasets and Benchmarking Methods
Following a similar method to the work by Kefato et al. [13], GESML is evaluated on three datasets: Cora [23], Zhihu [11], and Email [24]. Cora is a citation network dataset that models relationships between papers through citations: nodes represent papers, labels represent topics, node features correspond to the words in the papers, and edges between nodes represent citation relationships. Zhihu, China's largest online question-and-answer community, focuses on modeling relationships among registered users, with user identification based on their posts. Email represents the communication network of a prominent European research institution, primarily modeling communication relationships among users within the institution.
Table 1 presents the primary statistical information for these three benchmark datasets.
We aim to facilitate method comparison by initially categorizing all benchmark methods.
Table 2 illustrates the division of 16 benchmark methods into four categories based on two criteria: context sensitivity and the utilization of structural information. Moreover, the models’ performance evaluation primarily involves two types of experiments: link prediction and node clustering. Lastly, all results represent the average of 20 model runs to ensure experimental fairness.
The following is a brief introduction to the benchmark methods:
DeepWalk [7] captures the neighborhood structure by performing random walks and maximizes the probability of neighboring nodes appearing in the walk sequences using the Skip-Gram approach.
Node2Vec [15] employs a second-order random walk strategy to sample neighboring nodes in the graph, smoothly interpolating between breadth-first and depth-first sampling. Nodes are treated as words, and the Word2Vec algorithm from NLP is used to train the node embeddings.
WalkLets [25] is a multi-scale representation learning method for network nodes that generates multi-scale relationships by subsampling short random walks on the nodes of the graph.
Attentive Walk [27] treats the probability distribution of neighbors in a random walk as trainable network parameters and learns them jointly with the embeddings.
LINE [16] embeds large-scale information networks into low-dimensional vector spaces and designs a method that preserves both local (first-order proximity) and global (second-order proximity) network structures.
TriDNR [9] proposes a coupled neural network algorithm that utilizes the relationships between nodes, the relevance of node content, and label content to obtain the optimal representation for each node.
TADW [18] demonstrates the equivalence of DeepWalk to matrix factorization and proposes a network representation learning method that incorporates text information.
CENE [26] treats text content as a specific node type and performs node embedding using both node-node links and node-content links.
SPLITTER [12] introduces a technique that learns multiple embeddings for individual nodes, enabling a better description of networks with overlapping communities.
GAP [13] presents a context-aware graph-embedding algorithm that utilizes attention pooling networks to focus on different parts of the node neighborhood.
PPR [20] introduces personalized PageRank methods to obtain larger receptive fields and higher-quality text embedding representations.
GOAT [24] is a context-aware algorithm inspired by node information propagation and mutual attention mechanisms, which obtains high-quality context-aware node representations relying solely on graph structure.
CANE [10] introduces a mutual attention mechanism that combines node structural information and text information, so that node representations account for contextual information and different interaction relationships.
DMTE [11] learns low-dimensional vector representations for nodes that contain rich text information relevant to the network.
VHE [22] is a text-focused graph-embedding approach that models both text information and network topology.
ACNE [11] uses an adversarial mechanism with a text-embedding discriminator and a structure-embedding generator to learn effective representations.
5.2. Link Prediction
This section evaluates the performance of GESML on the three datasets for link prediction. Link prediction is a fundamental task in graph embedding that aims to predict missing or future links between pairs of nodes. We use part of the edges to train the model parameters and the remaining edges to test the model; the proportion of edges used for training is varied over a range of values in fixed increments. A fixed number of neighbor samples is employed across the three datasets. Each node is represented by a 200-dimensional vector, and Table 3 provides comprehensive information on the other experimental parameters. Notably, the results of this experiment are quantified with the AUC score, which indicates the probability that a randomly selected connected node pair $(s,t)\in E$ obtains a higher similarity score than a randomly selected unconnected node pair $(s,t^{-})\notin E$. In addition, numbers marked in bold indicate state-of-the-art results.
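For reference, the AUC score defined above can be computed with scikit-learn as in the following sketch; the function name and the score arrays are illustrative.

```python
# Illustrative AUC computation for link prediction with scikit-learn.
from sklearn.metrics import roc_auc_score

def link_prediction_auc(pos_scores, neg_scores):
    """Probability that a held-out true edge scores higher than a non-edge."""
    labels = [1] * len(pos_scores) + [0] * len(neg_scores)
    return roc_auc_score(labels, list(pos_scores) + list(neg_scores))

# Example: link_prediction_auc([0.9, 0.8, 0.7], [0.2, 0.4, 0.6]) returns 1.0
```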
In general, GESML outperforms the baseline methods on the three datasets.
A notable improvement in link prediction results is observed when the proportion of edges in the training set is relatively small, which suggests that the attention-based similarity measurement function can learn more effective graph structures.
The magnitude of the improvement varies considerably across the three datasets; on Cora, the gain is relatively modest. This can be attributed to the differing numbers of nodes and edges among the three datasets (see Table 1): the Cora dataset contains relatively little structural information.
5.3. Node Clustering
Node clustering is the task of grouping nodes into classes based on their similarities. This section evaluates the model's performance on the Email dataset, specifically comparing it with baseline methods that leverage structural information. As in the link prediction experiments, we train the model parameters using a subset of edges and test on the remaining edges; the proportion of training edges is varied over a range of values in fixed increments while keeping other settings constant. To evaluate the experimental results, we measure the consistency between the given ground-truth labels $y$ and the predicted labels $\hat{y}$ using the NMI and AMI metrics, where NMI denotes the normalized mutual information and AMI denotes the adjusted mutual information [28].
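Both consistency metrics are available in scikit-learn; the toy labels in the sketch below are illustrative only.

```python
# Illustrative NMI / AMI computation with scikit-learn.
from sklearn.metrics import adjusted_mutual_info_score, normalized_mutual_info_score

y_true = [0, 0, 1, 1, 2, 2]   # ground-truth communities (toy example)
y_pred = [0, 0, 1, 2, 2, 2]   # predicted cluster assignments (toy example)

nmi = normalized_mutual_info_score(y_true, y_pred)
ami = adjusted_mutual_info_score(y_true, y_pred)
print(f"NMI = {nmi:.3f}, AMI = {ami:.3f}")
```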
Table 7 reveals that GESML outperforms the baseline methods. These results reinforce the idea that the model first learns an optimal graph structure through the attention-based similarity measurement function and then leverages top-k pooling to capture the topological information of nodes and their neighborhoods simultaneously. Consequently, this method encodes rich topological information from node neighborhoods, leading to enhanced performance.
5.4. Hyperparameter Analysis
In this section, we first analyze the influence of the number of sampled neighbors on the performance of GESML. We conduct link prediction experiments on the three datasets to evaluate how the number of sampled neighbors affects the model's performance. The experimental results are illustrated in Figure 4, Figure 5 and Figure 6.
Figure 4, Figure 5 and Figure 6 reveal that the model's performance remains largely unaffected by variation in the number of sampled neighbors, irrespective of changes in the percentage of edges in the training set across the three datasets.
This section also conducts clustering experiments on the Email dataset to ensure experimental comprehensiveness. These experiments investigate the effect of the number of sampled neighbors on the model's performance. Figure 7 illustrates the relationship between the number of sampled neighbors and the model's performance, with the proportion of edges in the training set held fixed.
The analysis of Figure 7 indicates that variations in the number of sampled neighbors have a negligible impact on the NMI and AMI consistency metrics in the node clustering task. Based on the link prediction and node clustering results regarding the influence of the number of sampled neighbors, we infer that the model is insensitive to this hyperparameter.
Furthermore, we explored the influence of different values of k in top-k pooling. Specifically, we utilized the Cora dataset, with a fixed proportion of the edges used for training, to examine various values of k.
The model's performance improves as k increases within a certain range, which suggests that encoding more topological structural information enhances the model's performance.
Beyond that range, the model is insensitive to k.
5.5. Model Limitations Analysis
This section aims to identify several limitations inherent in the current model.
Global Structure of Nodes: The current model primarily emphasizes the first-order neighborhood structure of nodes while neglecting the broader global structure. By incorporating the global network, the model’s receptive field would expand, enabling a more comprehensive acquisition of structural information.
Selection of Metric Learning Methods: The current model incorporates an attention-based similarity measurement function that promotes mutual attention among nodes. However, this approach may not align optimally with the model's primary objective of maximizing the preservation of structural information. An alternative, structure-aware metric learning approach could address this concern: it would facilitate the extraction of implicit graph structures from the data, capturing intricate structural patterns more effectively.
6. Conclusions
Achieving context-sensitive graph embeddings has been the subject of numerous attempts, yet existing methods have limitations: some need supplementary information, some rely on community algorithms to capture multiple contexts, and others fail to capture sufficient structural information effectively. This paper presents GESML, a novel graph-embedding model with similarity metric learning that alleviates these limitations. GESML leverages an attention-based symmetric similarity metric function and a top-k pooling operation to acquire high-quality node representations, integrating node and structural information from the graph neighborhood as input features. Finally, we assess the effectiveness of GESML through extensive experiments on three datasets, focusing on its performance in link prediction and node-clustering tasks. The experimental results demonstrate that GESML outperforms the baseline methods in both tasks. In the future, we will explore cross-fusion among neighboring nodes to further enhance the capabilities of GESML and unlock even more promising results.