1. Introduction
Recommender systems significantly impact people’s daily lives; they filter information and recommend items that users may be interested in [1]. Such systems are widely used in fields such as e-commerce [2], job markets [3], and streaming media [4]. A key scenario is session-based recommendation. A session refers to a sequence of actions carried out by a user, such as clicking buttons, viewing pages, or making purchases [5]; these actions occur within a short time and are considered a single visit [6]. A session-based recommender system can provide personalized suggestions to users within a specific session, typically without requiring long-term user profiles or histories [7]. These systems are widely used in content streaming [8], news websites [9], and other online platforms [10].
Numerous methods have been developed for session-based recommender systems. Attention-based methods are crucial in deep learning [11] and in session-based recommendation alike. For instance, Zhang et al. [12] utilize a vanilla attention mechanism to combine heterogeneous features for extracting the transition patterns between items, and Yuan et al. [13] propose a dual sparse attention network to mitigate the effect of irrelevant items on user preference. Topic models have also emerged as a recent area of focus: Sottocornola et al. [14] train topic models to track changes in users’ interest in news over time, and ref. [15] is a music recommender system in which a session consists of music tracks; this work regards the music list as a set of topics, helping to uncover latent user patterns. Beyond these two families, there are many other methods, such as collaborative filtering-based methods [16], content-based methods [17], and matrix factorization-based methods [8].
Recently, graph neural networks (GNNs) have become the most popular approach [18,19,20]. Many GNN methods for session-based recommender systems have emerged, such as TMI-GNN [21], RN-GNN [22], and SR-GNN [23]. Compared to other recommender systems, GNN-based recommender systems can learn intricate item transitions and effectively model user behavior within a specific session graph, aspects that other methods ignore. Specifically, GNN-based recommender systems model the input session as a graph, in which nodes refer to the items within the session and edges refer to transitions or relations between items. GNNs are capable of effective feature learning for the items in the session graph. By fully modeling the complex inter-relationships between items, these GNN-based methods can understand user behaviors and preferences during recommendation.
However, the methods mentioned above are designed for next-item recommendation and might not be effective at recommending new items. This ineffectiveness arises from the lack of user interaction with new items, which leaves new items disconnected from the session graph and therefore hard for GNNs to learn. To tackle this issue, a GNN for session-based new item recommendation (NirGNN) has been proposed recently [24]. NirGNN incorporates a dual-intent network that mimics user decision-making processes through a soft-attention mechanism and a distribution mechanism. To address the challenge of new items being hard to learn with GNNs, NirGNN draws inspiration from zero-shot learning and infers the embedding of new items based on their attributes. Consequently, NirGNN calculates recommendation scores and prioritizes new items with higher scores for user recommendations. Nonetheless, this method may encounter three limitations:
Lack of Time Sensitivity: The previous research models session graphs simply as directed graphs, failing to consider the temporal aspect of item interactions. This omission is a critical flaw as the timing of an item’s appearance in a session can greatly influence its relevance and importance. Without considering this temporal dimension, the session graph falls short of mirroring the dynamic nature of actual user interactions and preferences.
Sparsity of Sessions: Individual sessions often contain only a handful of interactions. This scarcity of data poses a significant challenge for accurately learning user intent in graph neural networks. Sparse sessions can lead to incomplete or biased interpretations of user preferences, as the graph neural network has limited interaction data to analyze and learn from.
Flawed Attention Mechanism: Previous works usually use attention mechanisms to learn users’ preferences. However, the existing attention mechanism disproportionately increases the preference weight of the last item visited by the user. This approach can lead to a skewed understanding of user preferences as it assumes the most recent interaction is the most significant for users. Such a mechanism neglects the possibility that earlier items in the session might hold equal or greater relevance to the user’s preferences. Consequently, this results in recommendations that do not accurately reflect the user’s overall preferences, focusing narrowly on their most recent activity.
As a remedy for these shortcomings, we introduce a time-sensitive method for session-based new item recommendation, called the Time-Sensitive Graph Neural Network (TSGNN). (1) To address the limitation on time sensitivity, we propose an innovative modeling technique for constructing session graphs that incorporates a temporal element. Specifically, we model the session graph according to the sequence of node visits. To fully express the temporal relationship, we apply a time-sensitive weight to each edge. Consequently, GNN-based models are enabled to learn the temporal information of nodes through these edge weights. With this graph modeling technique, the model considers the temporal sequence of item interactions, ensuring that the timing of each item’s appearance is taken into account and that the graph reflects user interactions and preferences over time. (2) To tackle the challenge of sparse sessions, we incorporate a graph augmentation technique into the session graph. This technique alters the original graphs, allowing TSGNN to generate a multitude of augmented graphs. This significantly enriches the session data by providing a set of informative graphs for the GNN encoder, effectively mitigating the impact of graph sparsity and enhancing the overall performance of the model. (3) To provide a comprehensive attention mechanism, we propose a new attention mechanism, called the time-aware attention network. The time-aware attention mechanism emphasizes the influence of temporal aspects on the user preference learning process. By incorporating this approach, it mitigates the excessive emphasis often placed on the most recently visited item. Instead, this attention network amplifies the temporal impact on attention allocation, ensuring a more accurate and nuanced interpretation of user preferences.
By focusing on the temporal aspect, TSGNN aims to capture more accurate and comprehensive user preferences than previous works, leading to more relevant recommendations. Our enhancements to session-based new item recommendation can be encapsulated in the following key contributions:
We highlight and address the problem that previous research ignores the influence of time in session graph modeling. The inclusion of time sensitivity ensures a dynamic representation of user interactions, aligning the model closely with actual user behavior and preferences.
We incorporate graph augmentation techniques into the session-based new item recommendation process. This innovation significantly reduces the sparsity of session graphs, making them easier for GNNs to learn.
We propose a novel attention mechanism specifically designed for learning user preferences from a time-aware perspective. This method adjusts the focus of the attention mechanism so that it accounts for the temporal aspects of user interactions. By doing so, it achieves a more accurate understanding of user preferences over time and reduces the overemphasis on the most recent interactions, which has been a drawback of previous models.
The structure of this paper is outlined as follows: Section 2 presents the related work relevant to our study. In Section 3, we detail our method, TSGNN, highlighting several innovative techniques, including the time-sensitive weight method, session graph augmentation, and time-aware attention networks. Section 4 describes a series of experiments conducted to validate the efficacy of our proposed methods. Finally, in Section 5, we conclude this paper and offer directions for future research.
3. Method
This section introduces the preliminary concepts for this paper; provides an overview of our model, TSGNN; and illustrates each component of TSGNN.
3.1. Preliminary
A user’s session consists of the items they have visited, e.g., $s = \{v_1, v_2, \ldots, v_n\}$. Each $v_i$ represents one item, and $n$ represents how many items there are. Every two adjacent items in a session are ordered, which represents the user’s actions. The session can be constructed as a directed graph $G = (V, X, A)$, where $V$ denotes the set of nodes and $X \in \mathbb{R}^{n \times d}$ represents the features of the nodes with dimension $d$. $A$ is the adjacency matrix, with $A \in \mathbb{R}^{n \times n}$. An edge in the directed graph encodes the order of two items in the session.
GNN session-based new item recommendation aims to design a GNN-based method for recommending new items to users. Specifically, GNN-based methods for new item recommendation often learn user preferences from session graphs. As the new items are not directly learnable by GNNs, these methods often introduce external knowledge, such as item attributes, to create simulated embeddings for the new items.
3.2. Overview
Figure 1 shows the overview of TSGNN. TSGNN obtains input from user sessions. In step (a), a time-sensitive session graph is constructed from the input session. Subsequently, in step (b), the session graph undergoes augmentation through a graph augmentation function, resulting in two augmented graphs. In step (c), the two augmented graphs are fed into two weight-sharing GNN encoders, ensuring that all learned embeddings lie in the same space. The encoding process generates representations for both graphs, which are then fed into a time-aware attention network. This network is tailored to capture user preferences, taking into account the impact of time on the session graph. In step (d), TSGNN calculates the compatibility scores between user preferences and new items, then identifies and recommends the most appropriate item for the user based on these scores. A detailed explanation of each component is presented in the subsequent sections.
3.3. Time-Sensitive Session Graph Construction
Given the limited consideration of the impact of time on session graph modeling in previous research, we reconsider this aspect in session graph construction. A session is a directed sequence consisting of items (nodes) and directed edges between them. The order of the directed edges represents the time sequence in which the nodes are visited. We follow the strategy utilized by SR-GNN [23], which transforms a session into a directed session graph. If a node is visited multiple times, we do not create a new vertex for each visit; instead, we directly connect the repeatedly visited node to its subsequent node (or preceding node). A directed session graph has two adjacency matrices: an incoming adjacency matrix $A^{in}$ and an outgoing adjacency matrix $A^{out}$ [23]. Suppose we have an edge $e_{ij}$ with nodes $v_i$ and $v_j$, where $v_i$ starts the edge and $v_j$ ends it. The weight of each edge can be defined as:

$$A^{out}_{ij} = \frac{1}{d_{out}(v_i)}, \qquad A^{in}_{ij} = \frac{1}{d_{in}(v_j)},$$

where $d_{in}(v_j)$ calculates the incoming degree of node $v_j$, while $d_{out}(v_i)$ calculates the outgoing degree of node $v_i$. Building upon the weighting method of SR-GNN, we incorporate a temporal influence by assigning time weights to each edge and constructing a time-sensitive adjacency matrix. Specifically, edges are sorted by their appearance timestamps, and we assign a rank score $r_{ij} = \mathrm{rank}(e_{ij})$ as a weight for the edge $e_{ij}$; $\mathrm{rank}(e_{ij})$ is the order in which the edge $e_{ij}$ appears. For example, the first edge encountered in the session is assigned a weight of 1, the second edge a weight of 2, and so forth, with the $n$-th visited edge receiving a weight of $n$. The item visited last may reflect the user’s preference more strongly than items visited earlier. By assigning weights in this manner, different edges are given different degrees of importance based on their time of visit, and recently visited edges are given a higher weight. Consequently, the rank score $r_{ij}$ effectively captures the temporal influence. The time-sensitive weighting method can be defined as:

$$\tilde{A}^{out}_{ij} = \frac{\mathrm{rank}(e_{ij})}{d_{out}(v_i)}, \qquad \tilde{A}^{in}_{ij} = \frac{\mathrm{rank}(e_{ij})}{d_{in}(v_j)},$$
where $\mathrm{rank}(\cdot)$ is a ranking method. We provide an example of how to calculate an outgoing time-sensitive adjacency matrix. As illustrated in Figure 2a, considering a session with 5 items, our objective is to construct a time-sensitive session graph (b) by calculating the outgoing and incoming adjacency matrices (c). For edge $e_{12}$, the in-degree of $v_2$ is 1, the out-degree of $v_1$ is 2, and the rank of $e_{12}$ is 1. Consequently, the weight assigned to this edge in the outgoing matrix is $\frac{1}{2}$. The calculation of the incoming adjacency matrix proceeds in the same way.
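To make the construction concrete, the following sketch (our illustration, not the authors’ implementation) builds time-sensitive outgoing and incoming adjacency matrices from a session sequence; the function name and the toy session are hypothetical:

```python
import numpy as np

def time_sensitive_adjacency(session, n_items):
    """Build time-sensitive outgoing/incoming adjacency matrices.

    Each edge (v_i -> v_j) receives the weight rank(e_ij) / degree,
    where rank is the 1-based order in which the edge appears.
    """
    edges = list(zip(session[:-1], session[1:]))  # consecutive clicks
    out_deg = np.zeros(n_items)
    in_deg = np.zeros(n_items)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    A_out = np.zeros((n_items, n_items))
    A_in = np.zeros((n_items, n_items))
    for rank, (src, dst) in enumerate(edges, start=1):  # rank encodes time
        A_out[src, dst] = rank / out_deg[src]
        A_in[src, dst] = rank / in_deg[dst]
    return A_out, A_in

# Toy session revisiting item 0: edges (0,1), (1,0), (0,2) with ranks 1, 2, 3.
A_out, A_in = time_sensitive_adjacency([0, 1, 0, 2], n_items=3)
# Edge (0,1): rank 1, out-degree of item 0 is 2 -> outgoing weight 1/2.
```

Note that a revisited item reuses its vertex rather than creating a new one, matching the construction above.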
3.4. Session Graph Augmentation and Embedding
Using the mechanism mentioned above, a time-sensitive session graph can be constructed. To address the sparsity of session graphs, we adapt two commonly used graph augmentation methods to the time-sensitive graph. Drawing on established methods [51], we implement ‘drop feature’ and ‘drop edge’ techniques. For node features, we employ a random probability, denoted as $p$, to selectively drop features [60], e.g., $\tilde{X} = \mathrm{drop}(X, p)$. Similarly, we apply this random function to the adjacency matrix, such that $\tilde{A} = \mathrm{drop}(A, p)$. To learn the embedding $h_i$ for the vertex $v_i$, we use a GNN as an encoder, such as the gated graph neural network (GGNN) [61]. In the case of a node $v_i$, the embedding $h_i^t$ takes the following form [62]:

$$a_i^t = \big[\tilde{A}^{in}_{i:}\,;\, \tilde{A}^{out}_{i:}\big]\,[h_1^{t-1}, \ldots, h_n^{t-1}]^\top W_a + b,$$
$$z_i^t = \sigma(W_z a_i^t + U_z h_i^{t-1}), \qquad r_i^t = \sigma(W_r a_i^t + U_r h_i^{t-1}),$$
$$\hat{h}_i^t = \tanh\!\big(W_o a_i^t + U_o (r_i^t \odot h_i^{t-1})\big), \qquad h_i^t = (1 - z_i^t) \odot h_i^{t-1} + z_i^t \odot \hat{h}_i^t,$$
where $\tilde{A}^{in}_{i:}$ and $\tilde{A}^{out}_{i:}$ denote the $i$-th rows of the incoming and outgoing adjacency matrices, respectively. The variable $t$ symbolizes the training step, $\sigma$ signifies the sigmoid function, and $\odot$ indicates element-wise multiplication. $W_a$, $W_z$, $W_r$, $W_o$ and $U_z$, $U_r$, $U_o$ are parameters that can be adjusted during the learning process. A GGNN acquires the embeddings of items within a session graph $G$ by propagating information among neighboring nodes.
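As an illustrative sketch (not the paper’s code), the ‘drop feature’ and ‘drop edge’ augmentations of this section can be implemented as follows; the function name `augment` and the use of a single shared drop probability `p` are our assumptions:

```python
import numpy as np

def augment(X, A, p, rng):
    """Return an augmented (features, adjacency) pair.

    Drop feature: zero out each feature dimension with probability p.
    Drop edge: zero out each adjacency entry with probability p.
    """
    feat_mask = rng.random(X.shape[1]) >= p  # keep a feature column if True
    X_aug = X * feat_mask                    # mask broadcasts over nodes
    edge_mask = rng.random(A.shape) >= p
    A_aug = A * edge_mask                    # dropped edges become 0
    return X_aug, A_aug

rng = np.random.default_rng(0)
X = np.ones((4, 8))  # 4 nodes, 8-dimensional features
A = np.eye(4)        # toy adjacency matrix
X1, A1 = augment(X, A, p=0.3, rng=rng)  # first augmented view
X2, A2 = augment(X, A, p=0.3, rng=rng)  # second view for the contrastive loss
```

Generating two independent views of the same session graph supplies the positive and negative pairs used later in the contrastive objective.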
3.5. Time-Aware Attention Network
This section introduces a temporal attention network for learning user preferences. Some previous studies overemphasize the user’s preference for the last item visited [23], overlooking the influence of timestamps on user preferences. To address this limitation, we introduce a time-aware attention network that learns user preferences under temporal influence. Given that each node’s representation includes temporal information, we utilize the cosine similarity between $h_i$ and $h_j$ to assess the time influence of node $v_j$ on node $v_i$. By splicing together the similarities between the current node and the other nodes, user preferences can be tracked over time: through this concatenation of time influences, node $v_i$ embodies the information from the other items, implicitly reflecting user preferences. The significance of each node is then determined using the attention mechanism. Thus, the overall time-aware attention $\alpha_i$ is as follows:

$$\alpha_i = \mathrm{softmax}\Big(W \big[\cos(h_i, h_1) \,\|\, \cos(h_i, h_2) \,\|\, \cdots \,\|\, \cos(h_i, h_n)\big]\Big),$$

where $W$ represents the learnable weights of the item embedding vectors, $\|$ denotes concatenation, and $\cos(\cdot, \cdot)$ is the cosine similarity function. The user preference $I$ influenced by time can be defined as:

$$I = \sum_{i=1}^{n} \alpha_i h_i.$$
Time-aware attention networks take into account the temporal order of items, recognizing that the order in which users view items is crucial to learning their preferences.
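As a sketch of one plausible reading of this mechanism (the paper’s exact formulation may differ), each node’s attention score is a learned projection of its cosine similarities to all nodes, and the preference vector is the attention-weighted sum of node embeddings; all names here are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def time_aware_attention(H, W):
    """H: (n, d) node embeddings; W: (n,) learnable projection weights.

    alpha = softmax of W applied to each node's cosine-similarity profile,
    preference I = sum_i alpha_i * h_i.
    """
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    Hn = H / np.clip(norms, 1e-12, None)
    S = Hn @ Hn.T            # S[i, j] = cos(h_i, h_j), temporal influence
    scores = S @ W           # one scalar score per node
    alpha = softmax(scores)  # attention over nodes
    return alpha @ H         # user preference vector I

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 8))  # 5 items, 8-dimensional embeddings
W = rng.normal(size=5)
I = time_aware_attention(H, W)
```

Because the score of each node depends on its similarity to every other node, no single position (such as the last click) dominates by construction.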
3.6. Optimization
After learning the user preference, we employ the following function to compute the recommendation score [24]:

$$\hat{y}_k = I^\top v^{new}_k,$$

where $v^{new}_k$ is the candidate new item embedding.
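A minimal sketch of the scoring and ranking step, assuming an inner-product compatibility between the preference vector and candidate new-item embeddings (the exact score function in [24] may differ; names are illustrative):

```python
import numpy as np

def recommend_new_items(I, new_item_embs, top_k=1):
    """Score each candidate new item by its inner product with the
    user preference I and return the indices of the top-k candidates."""
    scores = new_item_embs @ I            # (num_candidates,)
    return np.argsort(scores)[::-1][:top_k]

I = np.array([1.0, 0.0])                  # toy preference vector
cands = np.array([[0.1, 0.9],             # item 0: weak alignment with I
                  [0.8, 0.1]])            # item 1: strong alignment with I
best = recommend_new_items(I, cands, top_k=1)  # -> item 1 ranks first
```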
To optimize the entire model effectively, we integrate a variety of loss functions into the overall loss. In particular, since we have added graph augmentation to the model, we employ the InfoNCE loss as a foundational component for model optimization. The InfoNCE loss is formulated as follows [63]:

$$\mathcal{L}_{\mathrm{InfoNCE}} = -\log \frac{\exp\big(\mathrm{sim}(h_i, h_i^{+})\big)}{\exp\big(\mathrm{sim}(h_i, h_i^{+})\big) + \sum_{j} \exp\big(\mathrm{sim}(h_i, h_j^{-})\big)}.$$

Here, $\mathrm{sim}(\cdot, \cdot)$ denotes the similarity between two node embeddings, $h_i^{+}$ is the corresponding node in the augmented graph for $v_i$, and $h_j^{-}$ represents non-corresponding nodes. The essence of this loss is to pull corresponding nodes (those that are augmented versions of the same node) closer together and to push non-corresponding nodes apart in the embedding space. Following the augmentation method described in Section 3.4, we obtain two augmented graphs $G^1$ and $G^2$. In $G^1$, we select node $v_i$ as the anchor node, and $v_i'$ is the corresponding node in $G^2$. In both graphs $G^1$ and $G^2$, the node $v_i'$ is treated as a positive sample of $v_i$, whereas the other nodes (e.g., $v_j'$ with $j \neq i$) are treated as negative samples [60]. Building on this, we can express our contrastive loss function as follows:

$$\mathcal{L}_{\mathrm{con}} = -\log \frac{\exp\big(\mathrm{sim}(h_i, h_i')/\tau\big)}{\sum_{j=1}^{n} \exp\big(\mathrm{sim}(h_i, h_j')/\tau\big)},$$

where $\exp(\cdot)$ represents the exponential function and $\tau$ represents a temperature coefficient. This loss optimizes the graph encoder by making corresponding nodes more similar and non-corresponding nodes more dissimilar.
In addition, we incorporate two essential loss functions of the recommendation system: a cross-entropy loss $\mathcal{L}_{ce}$ to optimize the recommendation process, and a new item learning loss $\mathcal{L}_{new} = \mathrm{dist}(v^{new}, v^{gt})$ to learn new item embeddings, where $\mathrm{dist}(\cdot, \cdot)$ is a distance function. As a result, the final loss function can be formulated as follows:

$$\mathcal{L} = \mathcal{L}_{ce} + \lambda\,(\mathcal{L}_{new} + \mathcal{L}_{con}),$$

where $v^{gt}$ represents the ground-truth item embedding, and $\lambda$ is employed to balance the new item embedding learning loss and the contrastive loss for augmented graph learning. Given that $\mathcal{L}_{ce}$ is the primary loss for recommendation, we do not assign a trade-off parameter to it.