Article

An Attribute Graph Embedding Algorithm for Sensing Topological and Attribute Influence

Software College, Northeastern University, Shenyang 110819, China
*
Author to whom correspondence should be addressed.
Mathematics 2024, 12(23), 3644; https://doi.org/10.3390/math12233644
Submission received: 21 October 2024 / Revised: 18 November 2024 / Accepted: 20 November 2024 / Published: 21 November 2024
(This article belongs to the Special Issue Deep Representation Learning for Social Network Analysis)

Abstract

The unsupervised attribute graph embedding technique aims to learn low-dimensional node embeddings from neighborhood topology and attribute information without labeled data. Current unsupervised models are mostly based on graph self-encoders, but full-batch training limits their scalability, and reconstructing only the topology ignores the integrity of attribute information. To solve these problems while preserving unsupervised learning and making full use of node information, this paper proposes a graph neural network architecture based on a graph self-encoder to capture the nonlinearity of attribute graph data, and an attribute graph embedding algorithm that explicitly models the influence of neighborhood information using a multi-level attention mechanism. Specifically, the proposed algorithm fuses topology and attribute information with a lightweight sampling strategy, constructs an unbiased graph self-encoder on the sampled graph, performs topology aggregation and attribute aggregation separately, models the correlation between topology embeddings and attribute embeddings, and optimizes multi-level loss terms.

1. Introduction

Graph embedding maps the nodes and edges of graph data into a low-dimensional Euclidean vector space, mining latent information to quantify entities and their relationships. It underpins tasks such as node classification [1], node clustering [2], link prediction [3], community detection [4], and visualization [5]. Node classification determines node category labels based on labeled data and topology. Network embedding (also known as representation learning) is the process of assigning such an ambient space (called the latent or embedding space) to a network [6]. Node clustering divides the nodes into tightly linked subsets based on their similarity in the vector space; the output of graph embedding serves as input to clustering algorithms such as the k-means algorithm [7], hierarchical clustering [8], and spectral clustering [9]. Link prediction estimates the likelihood of links between nodes by learning node relationships. Community detection identifies tightly linked subgraphs in graph data and partitions the data into communities. Visualization presents the network structure to help understand complex relationships.
Attribute graph embedding computes node embeddings based on the topology information and attribute information of the neighborhood. Existing unsupervised attribute graph embedding algorithms based on graph neural networks use a neighborhood aggregation mechanism to smooth node information and preserve local similarity between nodes, but they do not explicitly differentiate the different impacts of the neighborhood's topology information and attribute information on node embedding, which limits them. The oversmoothing problem limits the depth of these models, that is, it limits their ability to aggregate the features of higher-order neighbor nodes [10]; note that the number of required nodes may still grow slowly with increasing depth [11]. To make full use of node information, the TAGE model proposed in this paper uses API-GNN as its encoder, which consists of an attribute aggregation module, a topology aggregation module, and an interaction module; the first two compute neighborhood attribute embeddings and neighborhood topology embeddings, respectively, and the interaction module measures the correlation between them in order to compute the node embeddings.

2. Related Work

Graph data exist widely in the real world, and graph embedding is widely used in graph analytics, recommender systems, and other fields; it maps graph nodes and edges into a low-dimensional vector space, preserving topological and attribute information and quantifying node similarity. Graph embedding algorithms can be classified into three categories: those based on matrix decomposition, random walks, and deep learning.
Matrix decomposition factorizes a matrix into a product of several matrices so that node embeddings and node context information are captured in the factor matrices [12]. Laplacian eigenmaps achieve dimensionality reduction through an eigenvalue decomposition of the graph Laplacian, preserving the first-order similarity of directly linked nodes and constructing a symmetric normalized weight matrix; they are suitable for both transductive and inductive learning that emphasizes global consistency. HOPE [13] accounts for the asymmetry of information transfer in directed graphs through source and target node embeddings, approximates a higher-order similarity matrix using a generalized SVD, and is extensible to undirected graph embedding. Cauchy graph embedding [14] improves on Laplacian eigenmaps by using the Cauchy distribution and the inverse of the Euclidean distance to measure node similarity, and introduces a regularization term that better preserves local topological structure, handling high-dimensional sparse graph data.
Random walks, as a neighborhood sampling strategy, generate node sequences by moving from a node according to fixed rules; encoded with a graph embedding algorithm, they mitigate the effect of randomness, preserve neighborhood information, apply to large-scale and partially linked graph data, and effectively capture the transitivity between nodes. DeepWalk [15], inspired by word2vec, generates node sequences by random walks and exploits the Skip-gram model to maximize the node co-occurrence probability, capturing local topology information, preserving higher-order node similarity, and performing well on sparse graph data. Node2vec [16] improves DeepWalk by controlling the probability of visiting a node through the parameters p and q, implementing smooth interpolation between BFS and DFS, accelerating training with SGD and negative sampling, and balancing local and global topological information, but it cannot capture additional attribute information.
Deep learning-based graph embedding algorithms transform the topological and attribute information of graph data into low-dimensional vectors for information aggregation and similarity measurement. GCN [18] uses convolutional neural networks for semi-supervised node classification, aggregating neighborhood information through spectral graph convolutional layers, avoiding explicit graph Laplacian regularization in the loss, adjusting the mapping function, and predicting the labels of unlabeled nodes with two GCN layers. GraphSage [19] is an inductive unsupervised graph embedding algorithm that combines topological and attribute information to generalize to unseen nodes and new subgraphs through fixed-size neighborhood sampling and a trainable aggregation function, using negative sampling to optimize the objective function.

3. Attribute Graph Embedding Algorithms for Perceptual Topology and Attribute Influence

The graph self-encoder is an unsupervised learning model that generates low-dimensional embeddings while preserving the topological and attribute information of graph data. It uses an encoder to map nodes into a low-dimensional space and a decoder to reconstruct the topology, optimizing reconstruction errors to improve performance. However, existing methods primarily model node similarity from a local perspective, overlooking the strong correlation between topology and attribute information, especially in sparse or noisy graphs. Additionally, sampling strategies may introduce bias and fail to fully utilize node information, neglecting the varying influences of different neighborhoods and their attributes on target node embeddings.
To solve the above problems, this paper proposes the Attribute Graph Embedding Algorithm for Sensing Topology and Attribute Influence (the TAGE model), which performs unsupervised learning while making full use of node information. The sampler considers both the topological and the attribute information of the attribute graph; an attribute-sensitive unbiased graph self-encoder performs topology aggregation and attribute aggregation independently in an unsupervised manner, uses a multi-level attention mechanism to explicitly model the influence of neighbors on node embeddings, and models the relevance of topology embeddings and attribute embeddings; the objective function reconstructs the topology information, attribute information, and community information of the sampled graphs to preserve local and global information, and the sampled graphs are fused to obtain the target node embeddings.

3.1. Definition of the Problem and Description of Relevant Symbols

To keep the description of the TAGE model self-contained, the symbols used in this section and their descriptions are shown in Table 1.
Given an undirected attribute graph G, V is the set of nodes, E is the set of edges, and A is the adjacency matrix of the attribute graph, where A_ij = 1 indicates a link between nodes v_i and v_j. Let N and F be the number of nodes and the number of features, respectively; X is the N × F feature matrix, whose rows are the feature vectors of the nodes. Let H^(l) denote the node embedding matrix of all nodes at the lth layer, and f^(l) denote the embedding dimension of the lth layer.
The graph self-encoder aims to learn a mapping function f(·) that computes the node embedding Z = f(A, X) from the adjacency matrix A and the feature matrix X. We define the variables of the TAGE model as follows:
Attribute information: for node i, its attribute information is the node's input feature vector x_i and its linear or nonlinear transformation ϕ(x_i).
Neighborhood information: For node i, its neighborhood information h_N(i) is obtained from the node's neighborhood N(i) through a neighborhood aggregation operation, which aggregates the attributes of every neighboring node j ∈ N(i) into h_N(i). In essence, neighborhood information is the input feature matrix X propagated across the layers; it serves as an information transfer mechanism of the input feature matrix over the topology, and its computation can be viewed as an iterative operation.
Topology information: The process of calculating neighborhood information relies on the topology information and attribute information of the attribute graph. To model the effect of global topology on node embedding, in addition to the first-order neighborhood information, it is necessary to calculate additional topology information.
Problem definition: for unsupervised attribute graph embedding algorithms, the goal is to optimize their graph self-encoder architecture from both local and global perspectives to retain as much topology information and attribute information of the attribute graph as possible.

3.2. The TAGE Model

The TAGE model optimizes the graph self-encoder architecture to compute attribute graph node embeddings through unsupervised learning, solves the gradient explosion problem using a graph sampling approach, and reconstructs the topology and attribute information of subgraphs by combining an attribute and topology aggregation module with an attention mechanism, as detailed in Figure 1.

3.2.1. Graph Sampling

The difference between supervised and unsupervised attribute graph embedding algorithms lies in whether or not data labeling is performed and how the objective function is chosen. For the supervised attribute graph embedding algorithm, a graph convolutional network is used to perform convolutional operations directly on the attribute graph to achieve neighborhood aggregation in a supervised learning manner. For the unsupervised attribute graph embedding algorithm, the node embeddings are computed in an unsupervised learning manner based on the graph self-encoder architecture using graph convolutional networks [19]. The input adjacency matrix and feature matrix are encoded to minimize the reconstruction error of the matrix [20].
Since full-batch training of a graph self-encoder needs to compute every node of every layer of the graph convolutional network, it increases the time and space complexity of the model. To improve scalability, the TAGE model performs subgraph sampling on the original attribute graph. The sampler of the TAGE model is similar to that of SAGES [21]: both use graph-sampling-based subgraph sampling, which produces higher similarity between nodes of the same subgraph. For node v_i, its neighborhood is sampled with conditional probability p(v_j | v_i) ∝ cos θ_ij, where cos θ_ij is the cosine similarity between nodes v_i and v_j; compared with a neighborhood node v_k, if cos θ_ij > cos θ_ik, node v_j is more likely to influence node v_i. The TAGE model performs a linear K-hop graph convolutional transform on the input feature matrix X to update the feature matrix with the local information of the nodes: X̃ = Q^K X, where Q = D̃^(-1/2) Ã D̃^(-1/2), Ã = A + I, I is the identity matrix, and D̃ is the diagonal degree matrix of Ã. According to the updated feature matrix X̃, the sampling probability of node v_j is p(v_j | v_i) ∝ cos⟨x̃_i, x̃_j⟩. As shown by GraphSAINT, when training a graph convolutional network on mini-batches, a normalization technique that eliminates the variance introduced by the sampler can improve the accuracy of the model to a certain extent and accelerate its convergence.
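As a concrete illustration, the following is a minimal sketch of the linear K-hop propagation X̃ = Q^K X described above, written with a dense numpy adjacency matrix; the function name and the dense-matrix setting are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def khop_propagate(A, X, K=2):
    """Linear K-hop propagation X_tilde = Q^K X with
    Q = D_tilde^{-1/2} (A + I) D_tilde^{-1/2} (dense-matrix sketch)."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)                   # add self-loops
    deg = A_tilde.sum(axis=1)                 # degrees of A_tilde
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # D_tilde^{-1/2}
    Q = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization
    X_tilde = X.copy()
    for _ in range(K):                        # K rounds of feature smoothing
        X_tilde = Q @ X_tilde
    return X_tilde
```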
Algorithm 1 summarizes the graph sampling algorithm for the TAGE model.
Algorithm 1 Graph sampling algorithm for the TAGE model
Inputs: attribute graph G(V, E), feature matrix X̃, sampling metric M, batch size b, step size d
Output: sampled graph G_s(V_s, E_s)
1: V_root ← b root nodes sampled randomly according to the similarity metric M from V of G
2: V_s ← V_root
3: for v_j ∈ V_root do
4:    v_i ← v_j
5:    for s = 1 to d do
6:       P_nei(v_k) := softmax(cos(x̃_j, x̃_k)/temp), ∀ v_k ∈ N_j
7:       v_j ← node sampled from N_j according to P_nei
8:       V_s ← V_s ∪ {v_j}
9:    end for
10: end for
11: return G_s ← node-induced subgraph of G induced by V_s
The graph sampling algorithm of the TAGE model uses a random-walk-based sampler, sampling b nodes as root nodes of the random walk according to the sampling metric M, which can be quantified in different ways: for uniform sampling, nodes are selected randomly on the attribute graph and each node has the same sampling probability; for degree sampling, neighborhood nodes are sampled in proportion to their degree, with sampling probability P(v_i) ∝ ‖Ã_:,v_i‖², where ‖Ã_:,v_i‖² = Σ_{j=1..N} Ã²_{j,v_i}, so the sampling probability of a node is proportional to its degree.
The random walk starts from a root node and proceeds step by step to obtain the walk path. At each step, a neighborhood node of the current node is selected as the next node according to the similarity measure, and the process is repeated until the sampler has completed the prescribed number of steps or a predefined termination condition is reached, where P_nei is the similarity measure between the node and its neighborhood and temp is the temperature parameter of the softmax function used to smooth the probability distribution over the neighborhood nodes [22].
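A minimal sketch of the sampling loop in Algorithm 1 is shown below, assuming an adjacency-list representation and uniformly sampled root nodes; the helper names (cosine, sample_subgraph) and default values are illustrative, not the authors' code.

```python
import numpy as np

def cosine(u, v, eps=1e-8):
    # Cosine similarity between two feature vectors.
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps)

def sample_subgraph(adj_list, X_tilde, b=4, d=3, temp=1.0, rng=None):
    """Root-based random-walk sampler: each step picks a neighbor with
    probability softmax(cos(x_j, x_k) / temp); returns the sampled node set."""
    if rng is None:
        rng = np.random.default_rng()
    N = len(adj_list)
    roots = rng.choice(N, size=min(b, N), replace=False)   # uniform root sampling
    V_s = set(int(r) for r in roots)
    for root in roots:
        v = int(root)                                       # current walker position
        for _ in range(d):
            nbrs = adj_list[v]
            if not nbrs:
                break
            sims = np.array([cosine(X_tilde[v], X_tilde[k]) for k in nbrs])
            probs = np.exp(sims / temp)
            probs /= probs.sum()                            # softmax over neighbors
            v = int(rng.choice(nbrs, p=probs))
            V_s.add(v)
    return V_s                                              # the node-induced subgraph is built from V_s
```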

3.2.2. Attribute Aggregation Module

Attribute graph embedding computes node embeddings based on the topological structure information and attribute information of the neighborhood. Existing unsupervised attribute graph embedding algorithms based on graph neural networks use a neighborhood aggregation mechanism to smooth node information and preserve the local similarity between nodes, but they do not explicitly differentiate the different impacts of the neighborhood's topology information and attribute information on node embedding, which is a limitation. To make full use of node information, the TAGE model uses API-GNN as its encoder, consisting of an attribute aggregation module, a topology aggregation module, and an interaction module, which compute the neighborhood attribute embeddings and neighborhood topology embeddings and measure the correlation between them in order to calculate the node embeddings.
For the attribute aggregation module, the TAGE model aims to model the effect of attribute information on node embeddings; its output is the neighborhood attribute embedding of the target node, which preserves the original attribute information of the attribute graph. Considering the comprehensiveness of attribute information, this effect is divided into two cases: the effect of the multiple attributes of a single neighborhood node on node embedding, and the effect of different neighborhood nodes on node embedding. The former is modeled with an attribute-level attention mechanism that performs a trainable linear transformation on the input feature matrices. Specifically, a layer-specific attribute transformation matrix S_F^(l) maps each attribute vector of a neighborhood node into the semantic space of the target node's global embedding, defining the projection operation as x_jr^(l) = S_F^(l) · x_jr, where v_j ∈ N_i is a neighborhood node of node v_i, x_jr is the rth attribute vector of node v_j, and x_jr^(l) is its projected attribute vector. The self-attention mechanism of GAT is then used to compute the attention weight between the lth-layer global embedding h_i^(l) of node v_i and each projected attribute vector x_jr^(l) of node v_j, quantifying the contribution of each attribute vector of v_j to the global embedding h_i^(l) of v_i:
$$e_{ijr}^{(l)} = \mathrm{LeakyReLU}\!\left( a_F^{T} \cdot \left[\, W_F^{(l)} h_i^{(l)} \,\big\|\, W_F^{(l)} x_{jr}^{(l)} \,\right] \right)$$
where a_F is the attribute-level attention vector, W_F^(l) is the attribute-level weight coefficient matrix of the lth layer, and LeakyReLU is the nonlinear activation function. To avoid ignoring the topological structure, the TAGE model uses a masked attention mechanism that masks non-adjacent nodes, so that the attention weights are computed only over the neighborhood nodes. To make the attention weights of different attributes comparable, a softmax function is used to normalize them.
For the lth layer, a linear combination of all projected attribute vectors of the neighborhood node v_j with the corresponding attention weights yields the attribute embedding x_ij^(l) of v_j with respect to the target node v_i, where σ is the nonlinear activation function:
$$x_{ij}^{(l)} = \sigma\!\left( \sum_{r=1}^{R} \frac{\exp\!\left(e_{ijr}^{(l)}\right)}{\sum_{r'=1}^{R} \exp\!\left(e_{ijr'}^{(l)}\right)} \cdot x_{jr}^{(l)} \right)$$
Considering the different effects of the attribute information of different neighborhood nodes on node embedding, a node-level attention mechanism computes the attention weight between the lth-layer global embedding h_i^(l) of node v_i and the attribute embeddings x_ij^(l) of its neighborhood nodes. The computation is similar to that of the attribute-level attention mechanism:
$$e_{ij}^{(l)} = \mathrm{LeakyReLU}\!\left( a_N^{T} \cdot \left[\, W_N^{(l)} h_i^{(l)} \,\big\|\, W_N^{(l)} x_{ij}^{(l)} \,\right] \right)$$
$$\varphi_{ij}^{(l)} = \mathrm{softmax}\!\left( e_{ij}^{(l)} \right) = \frac{\exp\!\left(e_{ij}^{(l)}\right)}{\sum_{v_k \in N_i} \exp\!\left(e_{ik}^{(l)}\right)}$$
where a_N is the node-level attention vector, W_N^(l) is the node-level weight coefficient matrix of the lth layer, and φ_ij^(l) is the normalized node-level attention weight. Aggregating the attribute embeddings x_ij^(l) of the neighborhood nodes with these attention weights yields the neighborhood attribute embedding of the target node v_i:
$$hf_i^{(l)} = \sigma\!\left( \sum_{v_j \in N_i} \varphi_{ij}^{(l)} \cdot x_{ij}^{(l)} \right)$$
Since the computation of the neighborhood attribute embedding of the target node v_i involves only the aggregation of the original attribute information of the neighborhood nodes and does not use their topology information, the TAGE model is able to preserve the original attribute information of the nodes.
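To make the two attention levels concrete, the following PyTorch sketch computes the neighborhood attribute embedding of one target node under simplifying assumptions (a list of per-neighbor attribute matrices, sigmoid as the nonlinearity σ); all class and parameter names are illustrative rather than the authors' API-GNN code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeAggregation(nn.Module):
    """Attribute-level attention over the R attribute vectors of each neighbor,
    followed by node-level attention over the neighbors (single-node sketch)."""
    def __init__(self, attr_dim, hid_dim):
        super().__init__()
        self.S_F = nn.Linear(attr_dim, hid_dim, bias=False)  # layer-specific projection S_F
        self.W_F = nn.Linear(hid_dim, hid_dim, bias=False)
        self.a_F = nn.Parameter(torch.randn(2 * hid_dim))    # attribute-level attention vector
        self.W_N = nn.Linear(hid_dim, hid_dim, bias=False)
        self.a_N = nn.Parameter(torch.randn(2 * hid_dim))    # node-level attention vector

    def forward(self, h_i, neigh_attrs):
        # h_i: (hid_dim,) global embedding of the target node
        # neigh_attrs: list of (R, attr_dim) attribute matrices, one per neighbor
        x_ij = []
        for attrs in neigh_attrs:
            proj = self.S_F(attrs)                                            # (R, hid_dim)
            cat = torch.cat([self.W_F(h_i).expand_as(proj), self.W_F(proj)], dim=-1)
            e = F.leaky_relu(cat @ self.a_F)                                  # attribute-level scores
            alpha = torch.softmax(e, dim=0)                                   # normalize over attributes
            x_ij.append(torch.sigmoid(alpha @ proj))                          # attribute embedding of the neighbor
        x_ij = torch.stack(x_ij)                                              # (num_neighbors, hid_dim)
        cat = torch.cat([self.W_N(h_i).expand_as(x_ij), self.W_N(x_ij)], dim=-1)
        e = F.leaky_relu(cat @ self.a_N)                                      # node-level scores
        phi = torch.softmax(e, dim=0)                                         # normalize over neighbors
        return torch.sigmoid(phi @ x_ij)                                      # neighborhood attribute embedding hf_i
```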

3.2.3. Topology Aggregation Module

For the topology aggregation module, the TAGE model aims to model the effect of topology information on node embeddings; its output is the neighborhood topology embedding of the target node. Existing attribute graph embedding algorithms update node embeddings by iteratively aggregating topological structure information and attribute information, so pure topological information cannot be isolated, which may limit some graph-related downstream tasks. To solve this problem, the TAGE model embeds the raw attribute vector of each node into a low-dimensional vector space with a two-layer self-encoder so that the dimension of the node's raw attribute embedding matches that of its global embedding, defining the encoder as:
$$x_i = x_{i1} \,\big\|\, x_{i2} \,\big\|\, \cdots \,\big\|\, x_{iR}$$
$$of_i^{(l)} = \sigma\!\left( W_2 \cdot \sigma\!\left( W_1 \cdot x_i + b_1 \right) + b_2 \right)$$
where x_i is the raw attribute vector obtained by concatenating all the raw attribute vectors of node v_i, W_1, W_2, b_1, and b_2 are the trainable parameters of the encoder, and of_i^(l) is the raw attribute embedding of the lth-layer node v_i. The decoder and the objective function of this self-encoder are defined as:
$$\hat{x}_i = \sigma\!\left( W_4 \cdot \sigma\!\left( W_3 \cdot of_i^{(l)} + b_3 \right) + b_4 \right)$$
$$L_{auto} = \frac{1}{N} \sum_{i=1}^{N} \left\| x_i - \hat{x}_i \right\|^2$$
According to this objective function, the raw attribute embedding of_i^(1) of node v_i can be obtained; it is used together with the global embedding h_i^(1) for initialization. Considering that each iteration of the graph self-encoder updates both the global embedding and the raw attribute embedding of a node, we design a layer-specific transformation matrix S_T^(l) that converts the raw attribute embeddings of the previous layer into those of the current layer. Subtracting the transformed raw attribute embedding from the node's global embedding at each layer yields the topology embedding tp_i^(l) of the lth-layer node v_i:
$$tp_i^{(l)} = h_i^{(l)} - S_T^{(l-1)} \cdot of_i^{(l-1)}$$
For the first layer of the graph self-encoder, the topology embedding tp_i^(1) is set to the zero vector. Since the topology embeddings of different neighborhood nodes affect the global embedding of the target node differently, the variability among topology embeddings is modeled, similarly to the computation of neighborhood attribute embeddings, using the following attention mechanism:
$$\delta_{ij}^{(l)} = \mathrm{LeakyReLU}\!\left( a_T^{T} \cdot \left[\, W_T^{(l)} h_i^{(l)} \,\big\|\, W_T^{(l)} tp_j^{(l)} \,\right] \right)$$
$$\Psi_{ij}^{(l)} = \mathrm{softmax}\!\left( \delta_{ij}^{(l)} \right) = \frac{\exp\!\left(\delta_{ij}^{(l)}\right)}{\sum_{v_k \in N_i} \exp\!\left(\delta_{ik}^{(l)}\right)}$$
$$ht_i^{(l)} = \sigma\!\left( \sum_{v_j \in N_i} \Psi_{ij}^{(l)} \cdot tp_j^{(l)} \right)$$
where a_T is the topology-level attention vector, W_T^(l) is the topology-level weight coefficient matrix of the lth layer, Ψ_ij^(l) is the normalized attention weight of the lth-layer neighborhood node v_j, and ht_i^(l) is the neighborhood topology embedding of the lth-layer target node v_i.
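A corresponding PyTorch sketch of the topology aggregation module is given below: the raw attribute vector is compressed by a two-layer autoencoder, the topology embedding is taken as the residual h_i − S_T · of_i, and neighbor topology embeddings are fused with attention. Class and parameter names are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopologyAggregation(nn.Module):
    """Raw-attribute autoencoder plus attention over neighbor topology embeddings
    (single-node sketch; sigmoid plays the role of the nonlinearity sigma)."""
    def __init__(self, raw_dim, hid_dim):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(raw_dim, hid_dim), nn.Sigmoid(),
                                 nn.Linear(hid_dim, hid_dim), nn.Sigmoid())
        self.dec = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.Sigmoid(),
                                 nn.Linear(hid_dim, raw_dim), nn.Sigmoid())
        self.S_T = nn.Linear(hid_dim, hid_dim, bias=False)   # layer-specific transform S_T
        self.W_T = nn.Linear(hid_dim, hid_dim, bias=False)
        self.a_T = nn.Parameter(torch.randn(2 * hid_dim))    # topology-level attention vector

    def autoencode(self, x_raw):
        of = self.enc(x_raw)                                  # raw attribute embedding of_i
        loss = F.mse_loss(self.dec(of), x_raw)                # L_auto reconstruction term
        return of, loss

    def forward(self, h_i, h_neigh, of_neigh):
        # h_i: (hid,) target global embedding; h_neigh: (n, hid) neighbor global embeddings
        # of_neigh: (n, hid) neighbors' raw attribute embeddings from the previous layer
        tp = h_neigh - self.S_T(of_neigh)                     # neighbor topology embeddings tp_j
        cat = torch.cat([self.W_T(h_i).expand_as(tp), self.W_T(tp)], dim=-1)
        psi = torch.softmax(F.leaky_relu(cat @ self.a_T), dim=0)  # attention over neighbors
        return torch.sigmoid(psi @ tp)                        # neighborhood topology embedding ht_i
```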

3.2.4. Interaction Module

The attribute aggregation module and topology aggregation module model the effect of attribute information and topology information of neighborhood nodes on the global embedding of the target node, outputting neighborhood attribute embeddings and neighborhood topology embeddings, respectively. For the interaction module, the TAGE model aims to model the correlation between the current layer neighborhood attribute embeddings and neighborhood topology embeddings, and the global embeddings of the nodes in the next layer can be obtained:
$$h_i^{(l+1)} = W_{\varphi}^{(l)} \cdot hf_i^{(l)} + W_{\Psi}^{(l)} \cdot ht_i^{(l)} + S_T^{(l)} \cdot h_i^{(l)}$$
where W_φ^(l) and W_Ψ^(l) are the weight coefficient matrices of the neighborhood attribute embedding hf_i^(l) and the neighborhood topology embedding ht_i^(l), respectively, S_T^(l) is the layer-specific transformation matrix of the topology aggregation module, and h_i^(l+1) is the global embedding of the (l+1)th-layer target node v_i. After several iterations of the TAGE model, the global embeddings of the nodes in the last layer can be obtained; the L-layer global embedding matrix H_s^(L) ∈ R^(N×d) is taken as the output node embedding Z_s.
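The interaction step itself is a simple weighted combination, as in the minimal sketch below (illustrative names, same assumptions as the previous sketches):

```python
import torch.nn as nn

class InteractionModule(nn.Module):
    """Next-layer global embedding from the neighborhood attribute embedding,
    the neighborhood topology embedding, and the transformed current embedding."""
    def __init__(self, hid_dim):
        super().__init__()
        self.W_phi = nn.Linear(hid_dim, hid_dim, bias=False)  # weight for hf_i
        self.W_psi = nn.Linear(hid_dim, hid_dim, bias=False)  # weight for ht_i
        self.S_T = nn.Linear(hid_dim, hid_dim, bias=False)    # layer-specific transform

    def forward(self, hf_i, ht_i, h_i):
        return self.W_phi(hf_i) + self.W_psi(ht_i) + self.S_T(h_i)
```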

3.2.5. Graph Decoder

The TAGE model considers topology-level, attribute-level, and community-level loss terms in the objective function, aiming to reconstruct the topology information, attribute information, and community information of the sampled graph in order to preserve the local and global information of the attribute graph. An inner product decoder Â_s = sigmoid(Z_s Z_s^T) reconstructs the adjacency matrix A_s, predicting the probability of a link between two nodes from the inner product of their embeddings, and the reconstruction error of the adjacency matrix is defined as L_A = loss(A_s, Â_s). For each edge (v_i, v_j) ∈ E_s, the reconstruction error is:
$$L_{ij} = -\log \frac{1}{1 + \exp\!\left( -z_i^{T} z_j \right)}$$
To ensure the completeness of node embeddings, the TAGE model also reconstructs the feature matrix in the decoder. Specifically, an attribute-sensitive decoder ϕ(A_s, Z_s; Θ_ϕ) = X̂_s reconstructs the feature matrix from the latent node embedding Z_s, so that the input feature matrix X_s and the reconstructed feature matrix X̂_s are as close as possible. The decoder ϕ is a single-layer graph attention network whose attention weights γ_s ∈ R^(N_s × N_s) are computed as follows:
$$\gamma_s = \mathrm{softmax}\!\left( \mathrm{sigmoid}\!\left( Z_i + Z_j \right) \right)$$
$$Z_i = A_s\, v_i \odot \mathrm{ReLU}\!\left( W_s Z_s \right)$$
$$Z_j = A_s\, v_j \odot \mathrm{ReLU}\!\left( W_s Z_s \right)$$
where ⊙ is the Hadamard product, ReLU is the nonlinear activation function, and W_s ∈ R^(F×F), v_i ∈ R^(1×d^(k)), and v_j ∈ R^(1×d^(k)) are the trainable parameters of the decoder ϕ. Based on these attention weights, the decoder ϕ is defined as follows:
$$\phi\!\left( A_s, Z_s \right) = \sigma\!\left( \gamma_s Z_s \right)$$
For a node v_i ∈ V_s and the input feature matrix X_s, the reconstruction errors of its feature vector and of the feature matrix are defined, respectively, as:
$$L_{x_i} = \left\| x_i - \hat{x}_i \right\|^2$$
$$L_X = \mathrm{loss}\!\left( X_s, \hat{X}_s \right)$$
To address the inconsistent sampling probabilities of nodes and edges, the TAGE model processes the reconstruction error of the feature matrix L_X and the reconstruction error of the adjacency matrix L_A with the attribute normalization coefficient λ_i = 1/p_i and the link normalization coefficient λ_ij = 1/p_ij to eliminate the bias introduced by the sampler. The reconstruction error of the feature matrix L_X is updated to:
$$L_X = \frac{1}{N_s} \sum_{v_i \in V_s} \lambda_i L_{x_i}$$
where N_s is the number of nodes of the sampled graph G_s. Based on the reconstruction error of each edge (v_i, v_j) ∈ E_s and the normalization coefficient λ_ij = 1/p_ij, the reconstruction error of the adjacency matrix L_A is updated to:
$$L_A = \frac{1}{M_s} \sum_{(v_i, v_j) \in E_s} \lambda_{ij} L_{ij}$$
where M_s = |E_s| is the number of edges of the sampled graph G_s. For the sampled graph G_s, the node-level objective function simultaneously minimizes the reconstruction errors of the feature matrix and the adjacency matrix: L_N = L_X + ω_s L_A, where ω_s ≥ 0 is a hyperparameter balancing the two reconstruction errors. Since p_i and p_ij cannot be obtained directly from a theoretical analysis, the TAGE model pre-estimates the sampling probabilities of nodes and edges. Specifically, the attribute graph is repeatedly sampled to obtain a set of subgraphs {G_1, G_2, …, G_t}, and counters Cal_i and Cal_ij record how many times node v_i and edge (v_i, v_j) occur in this subgraph collection. The ratios of these counts to the number of sampled subgraphs are taken as the approximate sampling probabilities, p_i = Cal_i / t and p_ij = Cal_ij / t.
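The pre-estimation of the sampling probabilities can be sketched as follows, assuming a sample_fn that returns the node set and edge set of one sampled subgraph; the function and variable names are illustrative.

```python
import numpy as np

def estimate_normalization(sample_fn, num_nodes, t=50):
    """Approximate p_i = Cal_i / t and p_ij = Cal_ij / t by repeated pre-sampling,
    then return the normalization coefficients lambda_i and lambda_ij."""
    node_count = np.zeros(num_nodes)
    edge_count = {}
    for _ in range(t):
        V_s, E_s = sample_fn()
        for v in V_s:
            node_count[v] += 1
        for e in E_s:
            edge_count[e] = edge_count.get(e, 0) + 1
    p_node = np.clip(node_count / t, 1e-6, None)           # avoid division by zero
    lam_node = 1.0 / p_node                                 # attribute normalization lambda_i
    lam_edge = {e: t / c for e, c in edge_count.items()}    # link normalization lambda_ij = 1 / p_ij
    return lam_node, lam_edge
```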
To achieve full utilization of node information, the TAGE model uses a variant of the DGI [24] architecture to fuse community structure information into the node embeddings, maximizing the mutual information between node embeddings and the community embedding. The node embeddings Z_s are obtained from the encoder, and a readout function R: R^(N_s × F) → R^F computes the community embedding c_s as follows:
$$c_s = R\!\left( Z_s \right) = \mathrm{sigmoid}\!\left( \frac{1}{N_s} \sum_{v_i \in V_s} z_i^{(s)} \right)$$
where z_i^(s) is the embedding of node v_i in the sampled graph G_s, and R(·) summarizes the node embeddings into the community embedding. For positive samples, a discriminator D: R^F × R^F → R quantifies the correlation between nodes and different communities in the form of node–community pairs, producing higher scores for nodes paired with their own community. We define the discriminator as follows:
$$D\!\left( z_i^{(s)}, c_s \right) = \mathrm{sigmoid}\!\left( c_s^{T} W_D\, z_i^{(s)} \right)$$
where W_D is a matrix of weight coefficients. Since negative samples must be obtained by sampling from a noise distribution, a corruption function C is applied to the adjacency matrix A_s and the feature matrix X_s: C(A_s, X_s) = (Ã_s, X̃_s).
The corruption function keeps the adjacency matrix A_s of the sampled graph unchanged and shuffles the rows of the feature matrix X_s to obtain the noisy feature matrix X̃_s. The node embeddings corresponding to the negative samples are computed with the encoder, Z̃_s = ξ(Ã_s, X̃_s; Θ_ξ). Given the node embeddings Z_s, the negative-sample embeddings Z̃_s, and the community embedding c_s, the noise-contrastive binary cross-entropy [23] is used as the objective function of community embedding, maximizing the mutual information between node embeddings and the community embedding in an unsupervised manner:
$$L_C = -\left( \sum_{v_i \in V_s} \log D\!\left( z_i^{(s)}, c_s \right) + \sum_{v_j \in V_s} \log\!\left( 1 - D\!\left( \tilde{z}_j^{(s)}, c_s \right) \right) \right)$$
Optimizing the node-level objective function L_N and the community-level objective function L_C simultaneously, the objective function of the TAGE model is defined as L = L_N + ω L_C, where ω is a hyperparameter balancing L_N and L_C.
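A compact sketch of the community-level term, following the DGI-style readout, bilinear discriminator, and row-shuffling corruption described above, is given below; tensor shapes and names are illustrative assumptions.

```python
import torch

def community_loss(Z, Z_neg, W_D):
    """Readout c_s = sigmoid(mean of node embeddings), bilinear discriminator
    D(z, c) = sigmoid(c^T W_D z), and binary cross-entropy over positive and
    corrupted (negative) samples."""
    c = torch.sigmoid(Z.mean(dim=0))                 # community embedding c_s
    pos = torch.sigmoid(Z @ W_D @ c)                 # scores of positive samples
    neg = torch.sigmoid(Z_neg @ W_D @ c)             # scores of corrupted samples
    return -(torch.log(pos + 1e-8).mean() + torch.log(1.0 - neg + 1e-8).mean())

def corrupt(X):
    """Corruption function: shuffle feature rows, keep the adjacency matrix fixed."""
    return X[torch.randperm(X.size(0))]
```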
The pseudo-code for the TAGE model is shown in Algorithm 2.
Algorithm 2 Attribute graph embedding algorithm for perceiving topological and attribute influence
Inputs: attribute graph G(V, E), adjacency matrix A, feature matrix X, sampler Sample
Output: node embeddings of the attribute graph
1: G ← construct training graph of G
2: Preprocessing: calculate the influence matrix X̃
3: Set up the Sample parameters
4: Compute normalization coefficients λ
5: for each mini-batch do
6:    G_s(V_s, E_s) ← Sample(G)
7:    GAE construction on G_s
8:    Attribute aggregation module:
9:       x_ij^(l) ← calculate influences of different attributes of v_j according to the attribute-level attention weights
10:      hf^(l) ← calculate influences of different neighbors' attributes according to φ_ij^(l)
11:   Topology aggregation module:
12:      tp_i^(l) ← topology information distilled from the attribute vector of v_i
13:      ht^(l) ← calculate influences of neighbor topology according to tp_i^(l)
14:   Interaction module:
15:      h^(l+1) ← calculate interaction between hf^(l) and ht^(l)
16:   Z_s ← forward propagation of (A_s, X_s) through the encoder of TAGE
17:   Â_s, X̂_s ← reconstruct A_s and X_s from Z_s with the attribute-sensitive decoder
18:   L_A ← compute λ_ij-normalized topology loss(A_s, Â_s)
19:   L_X ← compute λ_i-normalized attribute loss(X_s, X̂_s)
20:   L_C ← compute community-level loss
21:   Backward propagation according to L_A, L_X, and L_C
22:   Update weights
23: end for

4. Experiments

4.1. Experimental Setup

4.1.1. Experimental Dataset

The experimental environment was configured with an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10 GHz, 128.0 GB of system memory, an NVIDIA GeForce RTX 2080 Ti graphics card, the Windows 10 operating system, and the Python development language.
To validate the effectiveness of the TAGE model, node classification experiments and node clustering experiments were conducted on the ACM, DBLP, Amazon, Flickr datasets, and comparative experiments were conducted with the baseline algorithms in order to evaluate the unsupervised attribute graph embedding algorithm. Table 2 summarizes the statistical information of the experimental datasets.
ACM dataset: The ACM dataset is an academic literature dataset in the field of computer science, covering recent research results, technological advances, theoretical discussions, and practical experience in computer science and related fields; it is supported by the ACM Digital Library, whose website provides the required data and literature information.
DBLP dataset: The DBLP dataset contains journal articles and conference papers in computer science and related fields, covering all aspects of these fields, and provides a search function that retrieves the required literature based on the metadata of the academic literature.
Amazon dataset: The Amazon dataset is an e-commerce dataset from the Amazon.com website, where nodes denote items, edges denote co-rating relationships between items, and node labels denote item categories. This paper used the Amazon metadata set Amazon-clothing.
Flickr dataset: The Flickr dataset is an image dataset from the online image sharing platform Flickr, covering image resources and related metadata, where nodes represent uploaded images and an edge is created between two images if they share similar metadata.

4.1.2. Baseline Algorithm

Considering that the node embeddings obtained from the TAGE model may be used for different graph-related downstream tasks, node classification experiments and node clustering experiments were carried out to evaluate the model performance, and some representative attribute graph embedding algorithms were selected as baseline algorithms to be used for comparison experiments with the TAGE model.
DeepWalk [15]: DeepWalk performs truncated random walks on the input attribute graph to preserve the local topology information of the nodes, where nodes and node sequences can be regarded as words and sentences in the field of NLP and the collection of node sequences as a corpus; it learns the distributed representation of the nodes using the Skip-gram model, which maximizes the co-occurrence probability between nodes, and uses hierarchical softmax as a classifier to improve training efficiency.
Node2vec [16]: Node2vec can be regarded as an extension of DeepWalk whose advantage is a biased random-walk strategy; based on node homophily as well as topology similarity, its parameters interpolate smoothly between BFS and DFS to balance the local and global topological information of the nodes, and the objective function is optimized using SGD, but it cannot preserve the attribute information of nodes and edges.
SDNE [17]: SDNE jointly optimizes the first-order and second-order similarity between nodes, modeling the second-order similarity with unsupervised learning using stacked self-encoders and the first-order similarity with supervised learning using Laplacian eigenmaps, and applies a higher penalty factor to the reconstruction error of non-zero elements in the objective function to account for the sparsity of the attribute graph.
DGI [24]: DGI designs a graph self-encoder architecture based on graph convolutional networks to compute node embeddings in an unsupervised manner, with an objective function that maximizes the mutual information between the local embeddings of the nodes and the global embedding of the attribute graph, and uses contrastive learning to train the model to discriminate between positive and negative samples, making DGI applicable to both transductive and inductive learning.
GAT [25]: GAT follows the self-attention mechanism by stacking a series of graph attention layers to perform neighborhood aggregation operations, allowing each node to dynamically adjust its attention weights according to the importance of its neighborhood nodes. Attention weights and neighborhood embeddings are weighted and summed in the process of neighborhood aggregation to distinguish the different impacts of different neighborhood nodes on the target node, preserving both local and global information of the attribute graph.
SAGES [21]: Considering the computational complexity of unsupervised attribute graph embedding algorithms, similar to GraphSAINT, SAGES performs subgraph sampling operations in the form of graph sampling, resulting in higher similarity between nodes of the same subgraph, constructs an unbiased graph self-encoder based on the sampled subgraphs, reconstructs the adjacency and feature matrices in a decoder, and removes bias introduced by the sampler using normalization techniques.
K-means [26]: K-means is an unsupervised clustering algorithm based on heuristic rules; for a given dataset, K nodes are randomly selected as cluster centers, and the dataset is divided into K different clusters so that the distance between each node and the cluster center node is minimized; for each cluster, the mean of all its data points is calculated, the cluster center is updated to the mean, and the above process is repeated until the algorithm converges.
TADW [27]: TADW analyses the DeepWalk algorithm from another perspective and proves its equivalence with the matrix decomposition-based graph embedding algorithm. TADW pays attention to both topology information and attribute information of the attribute graph. Considering the incompleteness of the information, in order to balance the efficiency and accuracy of model training, it uses an inductive learning-based matrix complementation method that complements the missing information based on the adjacency matrix and feature matrix.
GAE and VGAE [28]: GAE takes the adjacency matrix as input, uses graph convolutional networks to encode both topological and attribute information, reconstructs the adjacency matrix in an inner product decoder, and minimizes the negative cross-entropy loss of the adjacency matrix in order to train the model. VGAE assumes that the a priori probability follows a standard normal distribution, uses variational inference to approximate the multi-dimensional Gaussian distribution to obtain the a posteriori probability on top of the GAE, and optimizes the lower bound of the variational in order to quantify the reconstruction loss.
ARGA and ARVGA [29]: both ARGA and ARVGA are graph self-encoder-based attribute graph embedding algorithms, which can be considered as extensions of GAE and VGAE, respectively, with the difference being the introduction of adversarial networks to quantify the difference between node embedding and random noise, and the consideration of reconstruction losses from both graph self-encoders and adversarial regularization in the objective function.
GALA [30]: GALA designs a fully symmetric graph self-encoder architecture; for the encoder, node embeddings are computed using graph convolutional networks, and for the decoder, Laplacian sharpening is used as the counterpart of the encoder's Laplacian smoothing. To ensure the numerical stability of the graph self-encoder, GALA introduces signed graphs, which further give a numerically stable form of Laplacian sharpening.

4.1.3. Hyperparameter Settings

For the TAGE model as well as the baseline algorithm described above, the general setup of graph neural networks was followed, with an initialized learning rate of 0.001, which was adaptively learned using the Adam optimizer. Based on model convergence as well as validation set accuracy, an early stopping strategy was used to guide the training process of the models, and all models were iterated for 20 epochs. For the graph self-encoder-based baseline algorithm, a two-layer graph convolutional network architecture was used, with the node embedding dimension set to 512. For the other baseline algorithms, their default parameter settings were used, with no additional parameter tuning.

4.2. TAGE Model Node Classification Experiment

For the node classification task, the ACM, DBLP, Amazon, and Flickr datasets were divided into a training set, validation set, and test set in the ratio of 80%, 10%, and 10%, using accuracy as the evaluation metric for the prediction results of the classification model; the closer the value is to one, the better the performance of the model. Based on the experimental settings of unsupervised or semi-supervised learning, the node classification results of the TAGE model and the baseline algorithms DeepWalk, Node2vec, SDNE, DGI, GAT, and SAGES are shown in Figure 2, Figure 3, Figure 4 and Figure 5, with the TAGE model and the baseline algorithms on the horizontal axis and the accuracy values on the vertical axis.
According to the results of the node classification comparison experiments shown in Figure 2, Figure 3, Figure 4 and Figure 5, for the ACM, DBLP, Amazon, and Flickr datasets, the TAGE model proposed in this paper improved the accuracy of the node classification task in most cases. Specifically, compared with the best baseline algorithm, the TAGE model improved the accuracy by 4.3%, −1%, 3.8%, and 2.2% on the ACM, DBLP, Amazon, and Flickr datasets, respectively. To improve scalability, the TAGE model computes node embeddings on the sampled graph; it is therefore sensitive to the choice of sampler and the size of the sampled graph, and since the DBLP dataset is a dense graph, the sampled graph cannot retain the complete topology and attribute information, which may be one of the reasons why the node classification accuracy of the TAGE model was lower than that of the best baseline algorithm on DBLP. The TAGE model and all baseline algorithms used an unsupervised or semi-supervised experimental setup for the node classification task. DeepWalk and Node2vec are random-walk-based graph embedding algorithms that model only the topological information of the attribute graph and cannot model its attribute information. GAT follows the self-attention mechanism, allowing each node to dynamically adjust the attention weights of its neighborhood nodes according to their importance; while it was the best baseline algorithm in most cases, it showed some volatility, especially on the Amazon dataset, on which it performed poorly, suggesting that this dataset has multiple types of links that may require dedicated modeling. DGI aims to maximize the mutual information between the local embeddings of the nodes and the global embedding of the attribute graph and needs full-batch training, in which every node of every layer of the graph neural network is computed, so it is not well suited to large-scale datasets such as Flickr. Compared with SAGES, the TAGE model further subdivides the effect of topological structure information and attribute information on node embedding. In summary, the comparative experimental results on the ACM, DBLP, Amazon, and Flickr datasets demonstrate the advantages of the TAGE model in the node classification task. The TAGE model is based on graph sampling, uses the self-attention mechanism to compute the topology embedding and attribute embedding of a node's neighborhood separately, and considers the correlation between topology embedding and attribute embedding in the interaction module, which enables full utilization of node information.

4.3. TAGE Model Node Clustering Experiments

For the node clustering task, accuracy (ACC), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI) were used as evaluation metrics to assess the consistency between the clustering results and the true labels, with higher values indicating better clustering. Since some attribute graph embedding algorithms, such as TADW, do not specify a clustering algorithm, spectral clustering was performed on the node embeddings; the best epoch was selected according to DBI [31], and min–max normalization was used to scale the node embeddings to the [0, 1] interval. The other parameters and experimental settings were consistent with the node classification task. The node clustering comparison results of the TAGE model and the K-means, spectral clustering, TADW, GAE, VGAE, ARGA, ARVGA, and GALA baseline algorithms on the ACM, DBLP, Amazon, and Flickr datasets are shown in Table 3.
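As an illustration of this evaluation protocol, the sketch below scales the embeddings, runs spectral clustering, and computes ACC (via Hungarian matching), NMI, and ARI with scikit-learn and scipy; it assumes integer labels in the range 0..K−1 and is not the authors' evaluation script.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score
from scipy.optimize import linear_sum_assignment

def cluster_and_score(Z, y_true, n_clusters):
    """Min-max scale embeddings to [0, 1], cluster with spectral clustering,
    and report ACC (Hungarian matching), NMI, and ARI."""
    Z = (Z - Z.min(axis=0)) / (Z.max(axis=0) - Z.min(axis=0) + 1e-8)
    y_pred = SpectralClustering(n_clusters=n_clusters,
                                affinity="nearest_neighbors",
                                random_state=0).fit_predict(Z)
    # Confusion matrix between predicted clusters and true labels.
    cm = np.zeros((n_clusters, n_clusters), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[p, t] += 1
    row, col = linear_sum_assignment(-cm)           # best cluster-to-label matching
    acc = cm[row, col].sum() / len(y_true)
    nmi = normalized_mutual_info_score(y_true, y_pred)
    ari = adjusted_rand_score(y_true, y_pred)
    return acc, nmi, ari
```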
According to the results of the node clustering comparison experiments shown in Table 3, for the ACM, DBLP, Amazon, and Flickr datasets, the TAGE model proposed in this paper outperformed the other baseline algorithms in most cases. GALA is a graph self-encoder-based attribute graph embedding algorithm that uses a numerically stable form of Laplacian sharpening, which enhances the high-frequency components of the graph signals and helps predict potential links in the attribute graph; this may be one of the reasons why GALA achieved better node clustering than the TAGE model on the Amazon dataset.
Intuitively, attribute graph embedding algorithms that consider both topological and attribute information achieve better clustering results: TADW, GAE, VGAE, ARGA, ARVGA, GALA, and the TAGE model proposed in this paper outperformed the baseline algorithms K-means and spectral clustering, which use only one side of the information, on the ACM dataset. This suggests that topology information and attribute information optimize the node clustering task from different perspectives and highlights the importance of modeling the interaction between topological structure information and attribute information. It is worth noting that the objective function of GAE, VGAE, ARGA, and ARVGA is only the reconstruction error of the adjacency matrix.
In summary, the results of the node clustering experiments on the ACM, DBLP, Amazon, and Flickr datasets demonstrate the superiority of the TAGE model. In terms of the architectural design of the graph neural network, the subgraph sampling strategy enables the TAGE model to obtain the community structure of the input attribute graph without additional computation, and the objective function reconstructs the topological structure information, attribute information, and community information.

4.4. TAGE Model Ablation Experiments

The TAGE model computes the reconstruction errors of topological structure information, attribute information, and community information in its objective function to achieve full utilization of node information. To distinguish the effect of the reconstruction errors of these different types of information on node embeddings, variants of the TAGE model, TAGE-RA, TAGE-RX, TAGE-RC, TAGE-RA+RX, TAGE-RA+RC, and TAGE-RX+RC, were designed for ablation experiments on the ACM, DBLP, Amazon, and Flickr datasets, with hyperparameters taking the same values as in the node classification task. RA, RX, and RC denote reconstructing only the topology information, attribute information, and community information of the attribute graph in the decoder, respectively, and combinations are defined analogously. The results of the ablation experiments for the TAGE model and its variants are shown in Figure 6, Figure 7, Figure 8 and Figure 9, with the TAGE model and its variants on the horizontal axis and accuracy (ACC), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI) on the vertical axis.
According to the results of the ablation experiments shown in Figure 6, Figure 7, Figure 8 and Figure 9, compared with the variants that reconstruct only part of the information, the TAGE model, which reconstructs the topology information, attribute information, and community information of the attribute graph simultaneously in the decoder from both local and global perspectives, performed better on the node clustering task on the ACM, DBLP, Amazon, and Flickr datasets. This illustrates that topology information, attribute information, and community information jointly affect node embedding and verifies the effectiveness of the objective function design of the TAGE model.

5. Conclusions

In this paper, we proposed the TAGE model, an attribute graph embedding algorithm for topology and attribute influence, using a graph self-encoder architecture with subgraph sampling for scalability. The encoder, featuring attribute and topology aggregation and an interaction module, employed self-attention to measure fine-grained neighborhood effects on node embeddings. The decoder’s objective function preserved local and global information. Experiments on the ACM, DBLP, Amazon, and Flickr datasets demonstrated TAGE’s consistent advantages in node classification and clustering, with ablation experiments validating the objective function design.

Author Contributions

Conceptualization, D.C.; formal analysis, S.Z. and D.W.; funding acquisition, D.C.; methodology, D.C., S.Z. and Y.Z.; project administration, D.C. and D.W.; resources, D.W.; software, Y.Z.; supervision, D.C. and M.X.; validation, S.Z.; visualization, S.Z. and Y.Z.; writing—original draft, Y.Z.; writing—review and editing, D.C., S.Z. and M.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Key Technologies Research and Development Program of Liaoning Province in China under Grant No. 2021JH1/10400079, in part by the Applied Basic Research Project of Liaoning Province under Grant No. 2023JH2/101300185, in part by the Natural Science Foundation of Liaoning Provincial Department of Science and Technology under Grant No. 2022-KF-11-04, in part by the Fundamental Research Funds for the Central Universities under Grant No. 2024GFZD03 and in part by the Key Technologies Research and Development Program of Liaoning Province under Grant No. 2024JH2/102400072.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wu, H.; Ng, M.K. Hypergraph Convolution on Nodes-Hyperedges Network for Semi-Supervised Node Classification. ACM Trans. Knowl. Discov. Data 2022, 16, 1–19.
2. Wu, Y.; Zhong, Z.; Xiong, W.; Chen, L.; Jing, E. An efficient attribute graph clustering method. Ph.D. Thesis, National University of Defense Technology, Changsha, China, 2013.
3. Huang, L.; Li, D.; Ma, Y. A meta path-based link prediction model for heterogeneous information networks. Chin. J. Comput. 2014, 37, 848–858.
4. Hu, W.; Peng, C.; Liang, J. A social network event detection method based on link prediction. J. Softw. 2015, 26, 2339–2355.
5. Goyal, P.; Ferrara, E. Graph embedding techniques, applications, and performance: A survey. Knowl.-Based Syst. 2018, 151, 78–94.
6. Baptista, A.; Sanchez-Garcia, R.; Baudot, A.; Bianconi, G. Zoo guide to network embedding. J. Phys. Complex. 2023, 4, 042001.
7. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Los Angeles, CA, USA, 1967; Volume 1, pp. 281–297.
8. Johnson, S.C. Hierarchical clustering schemes. Psychometrika 1967, 32, 241–254.
9. Ng, A.; Jordan, M.; Weiss, Y. On spectral clustering: Analysis and an algorithm. Adv. Neural Inf. Process. Syst. 2001, 14, 849–856.
10. Yang, F.; Zhang, H.; Tao, S.; Fan, X. Simple hierarchical PageRank graph neural networks. J. Supercomput. 2023, 80, 5509–5539.
11. Zhang, H.; Zhu, Y.; Li, X. Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 7451–7462.
12. Makarov, I.; Kiselev, D.; Nikitinsky, N.; Subelj, L. Survey on graph embeddings and their applications to machine learning problems on graphs. PeerJ Comput. Sci. 2021, 7, e357.
13. Ou, M.; Cui, P.; Pei, J.; Zhang, Z.; Zhu, W. Asymmetric transitivity preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1105–1114.
14. Luo, D.; Nie, F.; Huang, H.; Ding, C.H. Cauchy graph embedding. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA, 28 June–2 July 2011; pp. 553–560.
15. Perozzi, B.; Al-Rfou, R.; Skiena, S. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 701–710.
16. Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864.
17. Wang, D.; Cui, P.; Zhu, W. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1225–1234.
18. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
19. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 2017, 30, 1025–1035.
20. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for quantum chemistry. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2017; pp. 1263–1272.
21. Wang, J.; Qu, X.; Bai, J.; Li, Z.; Zhang, J.; Gao, J. SAGES: Scalable attributed graph embedding with sampling for unsupervised learning. IEEE Trans. Knowl. Data Eng. 2022, 35, 5216–5229.
22. Schmidt, J.; Marques, M.R.G.; Botti, S.; Marques, M.A.L. Recent advances and applications of machine learning in solid-state materials science. Npj Comput. Mater. 2019, 5, 83.
23. Hjelm, R.D.; Fedorov, A.; Lavoie-Marchildon, S.; Grewal, K.; Bachman, P.; Trischler, A.; Bengio, Y. Learning deep representations by mutual information estimation and maximization. arXiv 2018, arXiv:1808.06670.
24. Veličković, P.; Fedus, W.; Hamilton, W.L.; Liò, P.; Bengio, Y.; Hjelm, R.D. Deep graph infomax. arXiv 2018, arXiv:1809.10341.
25. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
26. Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–137.
27. Yang, C.; Liu, Z.; Zhao, D.; Sun, M.; Chang, E.Y. Network representation learning with rich text information. In Proceedings of the IJCAI, Buenos Aires, Argentina, 25–31 July 2015; pp. 2111–2117.
28. Kipf, T.N.; Welling, M. Variational graph auto-encoders. arXiv 2016, arXiv:1611.07308.
29. Pan, S.; Hu, R.; Long, G.; Jiang, J.; Yao, L.; Zhang, C. Adversarially regularized graph autoencoder for graph embedding. arXiv 2018, arXiv:1802.04407.
30. Park, J.; Lee, M.; Chang, H.J.; Lee, K.; Choi, J.Y. Symmetric graph convolutional autoencoder for unsupervised graph representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6519–6528.
31. Davies, D.L.; Bouldin, D.W. A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, PAMI-1, 224–227.
Figure 1. Model diagram of the TAGE algorithm.
Figure 2. Experimental results of node classification on the ACM dataset.
Figure 3. Experimental results of node classification on the DBLP dataset.
Figure 4. Experimental results of node classification on the Amazon dataset.
Figure 5. Experimental results of node classification on the Flickr dataset.
Figure 6. Results of ablation experiments on the ACM dataset.
Figure 7. Results of ablation experiments on the DBLP dataset.
Figure 8. Results of ablation experiments on the Amazon dataset.
Figure 9. Results of ablation experiments on the Flickr dataset.
Table 1. Description of relevant notations.

Notation | Definition
G | The undirected attribute graph
V | The node set of the attribute graph G
E | The edge set of the attribute graph G
A | The adjacency matrix of the attribute graph G
X | The node attribute (feature) matrix of the attribute graph G
Y | The node label matrix of the attribute graph G
T | The label set
Z | The node embedding matrix obtained from the graph self-encoder
H^(l) | The node embedding matrix of the layer-l encoder
N_i | The set of neighbors of node i
N, M, R, L | The numbers of nodes, edges, features, and graph neural network layers
F | The dimension of the node feature matrix X
F′ | The dimension of the latent node embedding
f^(l) | The node embedding dimension of the layer-l encoder/decoder
S_F^(l), S_T^(l) | Layer-specific transformation matrices
ϵ_i | The neighborhood embedding of node i
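To make the notation in Table 1 concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) that builds the main objects for a toy attribute graph with assumed sizes N = 4 nodes, M = 4 edges, and F = 3 features, using NumPy and SciPy; all values are placeholders chosen only for illustration.

```python
import numpy as np
import scipy.sparse as sp

# Toy undirected attribute graph G = (V, E, X): N = 4 nodes, M = 4 edges, F = 3 features.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]            # edge set E (assumed toy edges)
N, M, F = 4, len(edges), 3

# A: symmetric adjacency matrix of G (N x N).
rows, cols = zip(*edges)
A = sp.coo_matrix((np.ones(M), (rows, cols)), shape=(N, N))
A = A + A.T                                          # symmetrize, since G is undirected

# X: node attribute (feature) matrix of G (N x F), random placeholder values.
X = np.random.rand(N, F).astype(np.float32)

# Y / T: node labels and the label set (used only for evaluation in the unsupervised setting).
Y = np.array([0, 0, 1, 1])
T = np.unique(Y)

# Z: node embedding matrix produced by the graph self-encoder (N x F'); a zero placeholder here.
F_prime = 2
Z = np.zeros((N, F_prime), dtype=np.float32)

print(A.toarray(), X.shape, Z.shape)
```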
Table 2. Statistical information on the datasets.

Dataset | Nodes | Edges | Labels | Features
ACM | 3025 | 6454 | 3 | 1870
DBLP | 3890 | 2,347,227 | 4 | 334
Amazon | 5004 | 27,313 | 5 | 354
Flickr | 89,250 | 899,756 | 7 | 500
Table 3. Experimental results for node clustering on the datasets.

Model | ACM (ACC / NMI / ARI) | DBLP (ACC / NMI / ARI) | Amazon (ACC / NMI / ARI) | Flickr (ACC / NMI / ARI)
K-means | 0.511 / 0.309 / 0.242 | 0.482 / 0.452 / 0.321 | 0.565 / 0.324 / 0.328 | 0.431 / 0.485 / 0.209
Spectral | 0.513 / 0.256 / 0.204 | 0.407 / 0.415 / 0.317 | 0.509 / 0.332 / 0.312 | 0.418 / 0.479 / 0.206
TADW | 0.529 / 0.484 / 0.347 | 0.417 / 0.346 / 0.275 | 0.536 / 0.344 / 0.312 | 0.437 / 0.412 / 0.070
GAE | 0.619 / 0.363 / 0.663 | 0.515 / 0.382 / 0.319 | 0.589 / 0.361 / 0.358 | 0.529 / 0.399 / 0.378
VGAE | 0.571 / 0.459 / 0.415 | 0.535 / 0.406 / 0.324 | 0.692 / 0.398 / 0.346 | 0.596 / 0.465 / 0.291
ARGA | 0.642 / 0.435 / 0.566 | 0.537 / 0.409 / 0.331 | 0.715 / 0.402 / 0.353 | 0.598 / 0.467 / 0.314
ARVGA | 0.658 / 0.497 / 0.391 | 0.668 / 0.563 / 0.479 | 0.693 / 0.359 / 0.345 | 0.635 / 0.397 / 0.214
GALA | 0.692 / 0.536 / 0.496 | 0.711 / 0.616 / 0.542 | 0.716 / 0.410 / 0.420 | 0.702 / 0.440 / 0.316
TAGE | 0.702 / 0.544 / 0.513 | 0.728 / 0.621 / 0.529 | 0.697 / 0.425 / 0.436 | 0.616 / 0.410 / 0.324
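As context for Table 3, the snippet below is a minimal, hypothetical evaluation sketch (not the authors' released code): it clusters learned node embeddings Z with k-means [7,26] and scores the partition with ACC (via Hungarian matching), NMI, and ARI, using SciPy and scikit-learn; the random arrays at the end are placeholders standing in for real embeddings and labels.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one mapping between predicted clusters and true labels (Hungarian matching)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1                               # count label/cluster co-occurrences
    row_ind, col_ind = linear_sum_assignment(-cost)   # negate to maximize matched pairs
    return cost[row_ind, col_ind].sum() / y_true.size

def evaluate_embeddings(Z, y_true, n_clusters):
    """Cluster node embeddings Z with k-means and report ACC / NMI / ARI."""
    y_pred = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(Z)
    return (clustering_accuracy(y_true, y_pred),
            normalized_mutual_info_score(y_true, y_pred),
            adjusted_rand_score(y_true, y_pred))

# Example usage with random placeholders in place of real embeddings and labels:
Z = np.random.rand(100, 16)
y = np.random.randint(0, 3, size=100)
print(evaluate_embeddings(Z, y, n_clusters=3))
```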