TempoGRAPHer: Aggregation-Based Temporal Graph Exploration

Tsoukanara, Evangelia; Koloniari, Georgia; Pitoura, Evaggelia

doi:10.3390/info16010046

Open AccessArticle

TempoGRAPHer: Aggregation-Based Temporal Graph Exploration

by

Evangelia Tsoukanara

^1,*

,

Georgia Koloniari

¹

and

Evaggelia Pitoura

²

¹

Department of Applied Informatics, University of Macedonia, 546 36 Thessaloniki, Greece

²

Department of Computer Science & Engineering, University of Ioannina, 451 10 Ioannina, Greece

^*

Author to whom correspondence should be addressed.

Information 2025, 16(1), 46; https://doi.org/10.3390/info16010046

Submission received: 18 November 2024 / Revised: 28 December 2024 / Accepted: 8 January 2025 / Published: 13 January 2025

(This article belongs to the Special Issue Advances in Data and Network Sciences Applied to Computational Social Science)

Download

Browse Figures

Versions Notes

Abstract

:

Graphs offer a generic abstraction for modeling entities and the interactions and relationships between them. Most real-world graphs, such as social and cooperation networks, evolve over time, and exploring their evolution may reveal important information. In this paper, we present TempoGRAPHer, a system for analyzing and visualizing the evolution of temporal attributed graphs. TempoGRAPHer supports both temporal and attribute aggregation. It also allows graph exploration by identifying periods of significant growth, shrinkage, or stability. Temporal exploration is supported by two complementary strategies, namely skyline- and interaction-based exploration. Skyline-based exploration provides insights into the overall trends in the evolution, while interaction-based exploration offers a closer look at specific parts of the graph evolution history where significant changes occurred. We present experimental results demonstrating the efficiency of TempoGRAPHer. Additionally, we showcase the usefulness of our system in understanding graph evolution by presenting detailed scenarios, including exploring the evolution of a real contact network between primary school students and analyzing the collaborations in a co-authorship network between authors of the same gender over time.

Keywords:

temporal graph; exploration; aggregation; temporal evolution

1. Introduction

Graphs are ubiquitous as they provide a generic model for entities and their interactions or relationships. They are used in several real-world applications and problems. For example, graphs can successfully capture collaborations in a co-authorship network or interactions between people in a social network. Most real-world graphs change with time, and thus considering the time dimension is vital. An interesting problem in this context is studying the evolution of graphs over time and detecting important events in their history, such as time periods of significant stability, growth, or shrinkage.

When studying the evolution of a graph, we are more interested in detecting general patterns and trends in the evolution of the graph and not necessarily on updates that concern individual nodes or edges. Entities in real networks are usually described by attributes that represent characteristics of a node. We support graph aggregation, that is, grouping nodes based on their attribute values, while respecting network structure. We also support temporal aggregation, that is, aggregating graph updates in specific time intervals using appropriate temporal operators. Graph aggregation combined with temporal aggregation allows us to view graph evolution at a higher level, thus enabling us to focus on the evolution during specific time periods of the relationships between groups of nodes having attributes with specific values.

By combining graph and temporal aggregation, we may detect evolution patterns that can lead us to discover hidden associations with external factors. Conversely, if we are aware of external factors that potentially affect the evolution of the graph, we may direct our analysis to specific time periods and attributes.

As a motivating scenario, consider a face-to-face proximity network in a school, where nodes represent students and edges the physical interactions between them. We would like to study the evolution of this network to take preventive measures against the spread of infectious diseases. Detecting important events in the evolution of the network may reveal patterns that will allow us to build targeted mitigation strategies against disease spread. For such patterns to be effective, they should appear at a global level and not at the level of individual students. For example, reporting an increase in the interactions between students of different classes during breaks can be considered as a parameter affecting the spread of an infectious disease in the community. By taking advantage of such insights, we can plan appropriate strategies to reduce the risk of disease spread such as limiting the break time or having breaks at different times for each class.

As an alternative scenario, consider an academic conference interested in analyzing collaboration patterns among its attendees over multiple conference editions. Attendees are associated with attributes such as their gender, location, and research field. Suppose that the conference has adopted a policy aiming at fostering the contribution of women in research. By examining the growth of collaborations between women in the years prior to and after the adopted policy, the conference organizers can gain valuable insights to determine whether the initiative had the expected impact or further adjustments are necessary to enhance its effectiveness. Additionally, by considering the location of attendees, the conference can seek to promote collaborations between authors from specific geographical areas. This strategic approach can enhance networking opportunities and foster collaboration within targeted regions, contributing to a more diverse and interconnected academic community. Finally, the evolution of the research fields provides information into the dynamics of interdisciplinary collaboration within the conference community. This information can inform decisions related to program planning and support efforts to promote collaboration and knowledge exchange across diverse academic communities.

In this paper, we present TempoGRAPHer, a system that allows users to interactively explore the evolution of attributed graphs through time. TempoGRAPHer provides support for both temporal and graph aggregation. The interactive exploration of the graph evolution is guided by two complementary exploration strategies: skyline-based and interaction-based. Both strategies assist users in discovering important events in the evolution of the graph, where an event is defined as an interval of increased activity (shrinkage, growth), or lack thereof (stability).

Skylines are widely used in several domains to identify dominating entities in case of multi-criteria selection problems [1]. In our setting, we define skylines based on the significance of an event (i.e., number of affected graph elements) and the length of the associated interval. Skyline-based exploration provides an overview of the graph’s evolution, revealing trends associated with specific attribute values.

The interaction-based approach allows users to focus on distinctive parts of the graph’s evolution by setting a threshold on the significance of the event. This enables users to identify specific periods in the evolution where significant events occur, based on the threshold selected.

The two strategies can be applied independently or in combination. For example, by first performing skyline-based exploration, peaks in the graph evolution are found that inform users about the level of significance of the events through time. Then, by exploiting this output, users can determine an appropriate threshold value for the interaction-based approach so as to study in a more detailed level the corresponding time periods.

TempoGRAPHer builds on our theoretical models for exploring the evolution of temporal graphs [2,3]. Specifically, the theoretical formulation of the interaction-based exploration strategy was first introduced in [2], while the theoretical formulation of the skyline-based exploration was first introduced in [3]. A preliminary version of TempoGRAPHer without support for the skyline-based exploration was demonstrated in [4].

In this paper, we provide an in-depth analysis of the various components of the TempoGRAPHer system, which is extended with the skyline-based exploration strategy, and also, a detailed description of the exploration procedure that incorporates both the skyline-based and interaction-based approaches. Moreover, we propose ways of combining the two exploration approaches, allowing one strategy to leverage the results of the other to provide a more effective analysis of temporal graph evolution. Furthermore, we provide a complexity analysis of all our algorithms, as well as a series of additional experiments to evaluate their performance. Finally, we include two thorough case studies, where we showcase the applicability of our approach.

The real-world case studies include one using a proximity network between students in a primary school and another based on a collaboration network between scientists. These examples illustrate the diverse ways in which TempoGRAPHer can uncover valuable insights into the evolution of graphs.

The remainder of this paper is structured as follows. Section 2 summarizes related work. In Section 3, we introduce the necessary concepts and models that our system builds on. Section 4 provides details on the implementation of our algorithms, while Section 5 presents the TempoGRAPHer system and its functionality in detail. In Section 6, we report experimental results to evaluate the performance of our algorithms and present two case studies where we show how TempoGRAPHer highlights important aspects in the evolution of a graph. Finally, Section 7 offers conclusions and directions for future work.

2. Related Work

In the following section, we review a collection of related works, focusing specifically on research related to graph exploration, temporal graphs, skylines, and graph visualization.

Graph exploration. Graph exploration is a generic term that can be viewed as supported by three tasks: (1) profiling and structural summarization, which refers to computing various statistical measures, e.g., [5], and providing a concise representation of the most interesting parts of the graph, for example, by extracting the schema of the graph, e.g., [6,7], or by mining frequent substructures, e.g., [8]; (2) exploratory search, which aims at helping a user that has vague information and needs to retrieve an output that is useful and relevant, often through an iterative process, for example, by searching by example, e.g., [9,10], or by exploiting query suggestion and refinement, e.g., [11,12]; and (3) exploratory analytics, which provides multi-dimensional analysis and statistical information about the graph, e.g., [13,14].

Although various approaches have been proposed for all three types of exploration, these approaches either assume static graphs or model time as one of the node or edge features, thus offering no specific support for exploring the evolution of the graph over time. Our temporal exploration strategy based on skylines and interactions on aggregated evolution graphs provides a novel form of summarizing, analyzing, and searching the evolution of the graph.

Temporal graphs. Regarding previous work on temporal graphs and their evolution, there is some previous research in defining temporal operators and time and attribute aggregation. T-GQL [15] is an extension of GQL with temporal operators to handle temporal paths; evolution is modeled through continuous paths. TGraph [16] uses temporal algebraic operators such as temporal selection for nodes and edges and traversal with temporal predicates on a temporal property graph. TGraph is extended with attribute and time aggregation, which allows viewing a graph in different resolutions [17], but only stability is studied. GRADOOP [18,19] introduces various extensions for supporting temporal property graphs such as temporal operators for grouping and pattern matching. The system provides different graph visualizations, i.e., the temporal graph view, the grouped graph view, and the difference graph view, which illustrates new, stable, and deleted elements between two graph snapshots. Unlike our work, which facilitates a complete exploration strategy that concerns the history of the graph, the system is driven by user queries and provides no exploration strategy.

Another approach is versioning (i.e., maintaining previous graph snapshots), which does not utilize temporal operators. In the EvOLAP graph [20], versioning is used both for attributes and graph structures to enable analytics on changing graphs. In [21], a conceptual model is presented with explicit labeling of graph elements to support analytical operations over evolving graphs, and particularly time-varying attributes, while in [22] a conceptual model is designed for capturing changes in the topology, the set of attributes, and the attribute values.

In addition to these approaches, other techniques such as stochastic actor-based models (SABMs) and parameter-evolution analysis are commonly used in analyzing temporal graphs. SABMs, as discussed by [23,24], define stochastic models for actor behaviors and interactions over time, providing insights into network dynamics, while parameter-evolution analysis, as explored by [25], tracks changes in network characteristics and tie strengths over time, shedding light on structural shifts within evolving networks.

Skylines. In this paper, we have proposed exploration based on skylines for identifying time periods that dominate other time periods in terms of increased activity (shrinkage, growth) or lack thereof (stability). To the best of our knowledge, this is a novel application of skylines. There has been a lot of previous work on skylines. Since its introduction [26], the skyline operator has been utilized in several domains to identify dominating entities in multi-criteria selection problems [1]. Although skyline queries are very popular for multi-dimensional data, there is not much work on skylines over graphs. A domain where skylines were first used is road networks, where the best detours based on a given route [27] or the best places to visit [28] are detected using distances among other possible criteria. In [29], skylines of routes based on multiple criteria, such as distance, and cost, are also defined. The network is modeled as a multi-attribute graph and a vector of different optimization criteria is stored for each edge. In [30], the authors explore the concept of skyline path queries in the context of location-based services, where, given a pick-up point and a destination point, the system applies skyline queries based on a set of features so as to determine the most useful routes.

Besides road networks, skylines have been defined for graphs using the shortest path distances between nodes [31]. Specifically, given a set of query nodes, a node u dominates a node v if u is at least as close as v to all query points and u is closer than v to at least one query point. A different approach that does not rely on distances is defined in [32], where a skyline consists of subgraphs that best match a given user query, also represented as a subgraph. Matching relies on isomorphisms and uses appropriate encoding schemes that capture both structural and numeric features of the graph nodes. Finally, skylines on knowledge graphs are defined in [33]. The focus is on supporting skyline queries over entities in an RDF graph through SPARQL queries and the efficient evaluation of such queries, but while the data are modeled as a graph, skylines are defined on node attributes and do not take into account graph structure.

Graph visualization. In this work, we use visualization to present the results of our skyline-based and interaction-based exploration. In general, there is a considerable amount of research related to graph visualization [34,35]. Many of these studies are specifically dedicated to temporal graphs. In the following, we position our work within the context of graph visualization taxonomies that identify time as the major distinguishing feature.

In [36], the authors provide a task taxonomy for temporal graph visualization by extending the generic Andrienko Task Framework [37], which distinguishes tasks based on data items, to include temporal graph data. Specifically, they create a design space that involves two dimensions. The first dimension includes task categories, that is, lookup, comparison, and relation seeking, and the second dimension refers to the data items that participate in the tasks, i.e., graph elements, graph subsets, time points, and time intervals. Our model aligns with the category of inverse lookup or pattern search on graph subsets and time intervals as we identify patterns occurring within parts of the graph defined using temporal operators. The authors in [38] provide a hierarchical taxonomy for dynamic graph visualizations that identifies time as the primary feature in distinguishing approaches. They assert two major categories: animation (time-to-time mapping), where dynamic graphs are represented as animated node–link diagrams, and timeline (time-to-space mapping), where visualizations correspond to static graphs based on a timeline. The methods described rely on user observation to detect evolution patterns, while our approach quantifies evolution using time and attribute aggregation, providing an automated view of the events unveiled in the graph.

There is also a substantial amount of research on graph visualization that does not explicitly focus on the temporal dimension but encompasses it or enables its integration into the discussed categories. In [39], the visualization of multilayer networks is studied, where time slices of dynamic graphs are interpreted as layers representing different temporal aspects. The authors in [40] offer a high-level categorization of faceted graph visualization and describe compositions of the graph layout with representations of the four common facets of partitions, attributes, time, and space. Our work can be regarded as a combination of graph structure, time, and attributes facets as it addresses the structural and semantic changes happening in the graph over time.

In addition, there are works that focus on different aspects within graph visualization. For example, ref. [41] tackles the issue of large graph sizes in graph visualization. To address this challenge in our work, we implement sampling, as well as aggregation, thus achieving a significant reduction in the number of nodes and edges. This strategy offers an effective and efficient visualization, ensuring graph readability. Another area of focus is addressed in [42], where an in-depth analysis of graph interpretation, summarization, and visualization (SIV) strategies employing graph-based learning methods is provided, with a particular emphasis on deep learning approaches. In our paper, we adopt a data-centric approach, performing summarization by aggregating time and attributes and interpreting evolution using skylines, providing visualization for various supported tasks.

3. The TempoGRAPHer Framework

In this section, we provide an overview of the underlying model of TempoGRAPHer that builds on the GraphTempo model [2,3].

3.1. Temporal Aggregation

We adopt an interval-based model of temporal graphs, where nodes and edges are associated with intervals of validity and each node is associated with a set of attributes.

We define a temporal attributed graph

G (V, E, τ u, τ e)

in a time interval T, as a graph

G [T]

where each node

u \in V

and edge

e \in E

is associated with timestamps

τ u (u) \in T

and

τ e (e) \in T

, which are sets of intervals during which u and e are valid in G. For each node u and time instance

t \in τ u (u)

, there is a n-dimensional tuple,

A (u, t) = {A_{1} (u, t), A_{2} (u, t) \dots A_{n} (u, t)}

, where

A_{i} (u, t)

is the value of u at time t

\in τ u (u)

on the i-th attribute.

We collectively refer to graph nodes and edges as graph elements.

Our model supports both static attributes with values that remain the same through time and time-varying ones with values that may change. In the following examples, for ease of illustration, the nodes in the graphs only include static attributes. Figure 1a depicts an example of a temporal attributed graph defined in interval

T = [1, 3]

. Both nodes and edges are annotated with timestamps corresponding to the time instances in T during which they are valid. Each node representing a student has two static attributes,

g e n d e r

, with values

{f, m}

, and

c l a s s

, with values

{1 A, 1 B, 1 C}

. In the following, an interval that includes a single time point, e.g.,

[1, 1]

, is simply denoted as

[1]

.

We define a set of temporal operators, that, given a temporal attributed graph and time intervals, produces a new temporal attributed graph. First, we define union, where given a graph

G [T]

, and intervals

T_{1},

and

T_{2}

, the union (∪) operator

G [T_{1}] \cup G [T_{2}]

generates a new temporal attributed graph with graph elements that appear either in

G [T_{1}]

or

G [T_{2}]

. The intersection (∩) operator, denoted as

G [T_{1}] \cap G [T_{2}]

, where given a graph

G [T]

, and intervals

T_{1},

and

T_{2}

, produces a new temporal attributed graph, whose elements appear in both

G [T_{1}]

and

G [T_{2}]

. Finally, the difference (−) operator

G [T_{1}] - G [T_{2}]

, where given a graph

G [T]

, and intervals

T_{1},

and

T_{2}

, outputs a temporal attributed graph that includes graph elements that appear in

G [T_{1}]

but not in

G [T_{2}]

.

Given the temporal attributed graph in Figure 1a and time intervals

T_{1} = [1]

and

T_{2} = [2]

, Figure 1b presents the union graph

G [1] \cup G [2]

, Figure 1c the intersection graph

G [1] \cap G [2]

, and Figure 1d the difference graph

G [1] - G [2]

.

3.2. Graph Evolution and Aggregation

We want to study the evolution of a graph between two consequent intervals

T_{1}

and

T_{2}

so as to capture important events. Specifically, we want to determine whether graph elements remained stable in both intervals (called the stability event), new graph elements appeared in the most recent interval (called the growth event), or existing graph elements disappeared in the most recent one (called the shrinkage event).

To model graph evolution with respect to these three types of events, we exploit the temporal operators to define the following three event graphs. For two time intervals

T_{1}

,

T_{2}

where

T_{2}

follows

T_{1}

: (a) the stability graph (s-graph),

G_{s} [(T_{2}, T_{1})]

, is defined as

G [T_{1}] \cap G [T_{2}]

and captures graph elements that appear in both

T_{1}

and

T_{2}

, (b) the shrinkage graph (h-graph),

G_{h} [(T_{2}, T_{1})]

, is defined as

G [T_{1}] - G [T_{2}]

and captures elements that disappear from

T_{1}

to

T_{2}

and (c) the growth graph (g-graph),

G_{g} [(T_{2}, T_{1})]

, is defined as

G [T_{2}] - G [T_{1}]

and captures graph elements that are new to

T_{2}

. We use

γ

to denote any of the three events.

Besides studying graph evolution at the individual node and edge level, we want to study the evolution of a graph at a higher granularity level. For example, instead of studying stable interactions between individuals, we also want to study stable interactions between individuals of the same gender or class. To this end, we use graph aggregation, where nodes are grouped based on one or more of their attribute values while respecting network structure.

Given a graph

G [T]

and a subset C of its n attributes, the graph aggregated by C in T is a weighted graph

G [T, C]

, where there is a node

u^{'}

in

G [T, C]

for each value combination of the C attributes and the weight of

u^{'}

,

w (u^{'})

is equal to the number of distinct nodes u in

G [T]

that have the specific attribute value combination. There is an edge

e^{'}

between two nodes

u^{'}

and

v^{'}

in

G [T, C]

, if there is an edge between the corresponding nodes in

G [T]

and its weight

w (e^{'})

is equal to the number of such edges. Figure 2a shows the graph aggregated by gender in time instance 3,

G [[3], {g e n d e r}]

, of the graph in Figure 1a. There are two nodes that represent male and female authors, with values m, f. The weight of the male nodes is 3 and denotes the number of nodes of the graph in time instance 3 with the attribute male. Similarly, female nodes have weight equal to 1, corresponding to the number of nodes with the attribute female for the graph in time instance 3. There is one edge between male nodes with weight 2 and one edge between male and female nodes with weight 2 that represent the number of collaborations between nodes having corresponding attributes.

We provide two types of aggregation: distinct aggregation, where the weight of an aggregate node in

G [T, C]

counts the distinct nodes in

G [T]

that have the corresponding attribute value combination, and non-distinct aggregation, where each appearance of an attribute value combination is counted in the weight each time it appears either on the same or on different nodes.

Aggregation is also applied to event graphs. For example, Figure 2b–d shows the aggregated event graphs for the evolution of the graph of Figure 1a from time instance 2 to 3 when distinct aggregation by gender is applied.

To evaluate the significance of an event, we measure the number of affected graph elements. We focus on events regarding edges. Let c and

c^{'}

be two attribute value combinations for the attributes C selected for the aggregation. For an event

γ

, we use

c o u n t (G_{γ} [(T_{2}, T_{1}), C], c, c^{'})

to denote the number of edges from nodes labeled with c to nodes labeled with

c^{'}

in the aggregated

γ

-graph

G_{γ} [(T_{2}, T_{1}), C]

, which is equal to the weight of such edges. For example, in Figure 2c, with respect to the growth event,

c o u n t (G_{g} [([3], [2]), {g e n d e r}], f, m) = 2

measures the number of new edges between female and male nodes that appear in

G [3]

with respect to

G [2]

.

3.3. Graph Exploration

We are interested in identifying time points in the history of the graph where the count of events is high, indicating interesting points in the evolution of the graph. Concretely, we want to determine time points, termed reference points,

t_{r}

, where the stability, shrinkage or growth counts,

c o u n t (G_{γ} [(t_{r}, T_{r}), C], c, c^{'})

, are high with respect to an immediate preceding interval

T_{r}

.

To locate such points, for each reference point

t_{r}

, we perform our search as follows. We initialize

T_{r}

to be the time point immediately preceding

t_{r}

, and then we gradually extend

T_{r}

to the past. We extend

T_{r}

employing two types of semantics: (a) intersection-inspired strict (∩) semantics, where we want a graph element to appear in all time instances of the extended interval, and (b) union-inspired loose (∪) semantics, where we want a graph element to appear in at least one time instance in the extended interval. The strict semantics model includes persistent occurrences of graph elements, while loose semantics also includes transient changes.

In case of steep changes, such as growth or shrinkage, we are interested in finding the shortest interval

T_{r}

with a high event count. In contrast, for stability, we are interested in the longest interval

T_{r}

with a high event count so that we capture the longest period that the elements existed. Thus, we define minimal and maximal properties.

Given

G_{γ} [(t_{r}, T_{r}), C]

and attribute value combinations c and

c^{'}

, we say that

T_{r}

is: (i) minimal if ∄

T_{r}^{'}

such that

T_{r}^{'} \subset T_{r}

and

c o u n t (G_{γ} [(t_{r}, T_{r}^{'}), C], c, c^{'}) \geq c o u n t (G_{γ} [(t_{r}, T_{r}), C], c,

c^{'})

, and (ii) maximal if ∄

T_{r}^{'}

such that

T_{r} \subset T_{r}^{'}

and

c o u n t (G_{γ} [(t_{r}, T_{r}^{'}), C], c, c^{'})

\geq c o u n t (G_{γ} [(t_{r},

T_{r}) C], c, c^{'})

.

We say that a count for an event

γ

is increasing if for any

T_{r} \subseteq T_{r}^{'}

,

c o u n t (G_{γ} [(t_{r}, T_{r}^{'}), C], c,

c^{'})

≥

c o u n t (G_{γ} [(t_{r}, T_{r}), C], c, c^{'})

, for any set of attributes C, and any combination of attributes values c and

c^{'}

. Similarly, we define a decreasing count for an event

γ

if for any

T_{r} \subseteq T_{r}^{'}

,

c o u n t (G_{γ} [(t_{r}, T_{r}^{'}), C], c, c^{'})

≤

c o u n t (G_{γ} [(t_{r}, T_{r}), C], c, c^{'})

, for any set of attributes C, and any combination of attributes values c and

c^{'}

.

For increasing counts, we are interested in minimal intervals, while for decreasing counts in maximal ones.

Next, we introduce two types of exploration, skyline-based and interaction-based.

3.3.1. Skyline-Based Exploration

Skylines are often used in multi-criteria optimization problems to retrieve the items that have the best value in at least one of the criteria, and are not worse in the other criteria [33,43,44,45]. In particular, for multidimensional data, we say that an item x dominates another item

x^{'}

if x is as good as

x^{'}

in all dimensions and x is better that

x^{'}

in at least one dimension. A skyline query returns all non-dominated items in a dataset.

We define two criteria to evaluate the interestingness of an event

γ

in an aggregated graph

G_{γ} [(t_{r}, T_{r}), C]

with respect to attribute value combinations c and

c^{'}

of C: (1) the length

l_{T_{r}}

of interval

T_{r}

defined as the number of time instances in

T_{r}

and (2) the event count, i.e.,

c o u n t (G_{γ} [(t_{r}, T_{r}), C], c, c^{'})

.

Given an event

γ

, a graph

G [T, C]

and pair of attribute values, c and

c^{'}

of C, the evolution skyline is defined as the set of all non-dominated tuples

(t_{r}, T_{r}, w)

where

t_{r}

∈T,

T_{r}

⊂T and w =

c o u n t (G_{γ} [(t_{r}, T_{r}), C]

,

c, c^{'})

.

For the case of an increasing count, we say that a tuple

(t_{r}, T_{r}, w)

dominates a tuple

(t_{r}^{'}, T_{r}^{'}, w^{'})

, if it holds

(l_{T_{r}} < l_{T_{r}^{'}})

and

(w \geq w^{'})

, or

(l_{T_{r}} \leq l_{T_{r}^{'}})

and

(w > w^{'})

. Respectively, for decreasing counts, we say that a tuple

(t_{r}, T_{r}, w)

dominates a tuple

(t_{r}^{'}, T_{r}^{'}, w^{'})

, if it holds

(l_{T_{r}} > l_{T_{r}^{'}})

and

(w \geq w^{'})

, or

(l_{T_{r}} \geq l_{T_{r}^{'}})

and

(w > w^{'})

.

Skylines may include numerous results, especially in the case of graphs evolving over long periods of time. To address the potentially large size of the evolution skyline, we define the domination degree of a tuple

(t_{r}, T_{r}, w)

,

d o d (t_{r}, T_{r}, w)

to be the number of other tuples that this tuple dominates. We rank the

(t_{r}, T_{r}, w)

tuples in the evolution skyline according to their domination degree and return only the top-k ones.

3.3.2. Interaction-Based Exploration

Skyline-based exploration provides a general overview of the evolution of the graph, which can help us identify trends in the evolution of the graph. When we need to focus on specific parts of the graph where at least a given number of changes have occurred, we employ the interaction-based exploration. In the interaction-based approach, we adopt a threshold-based technique and search for intervals where at least

θ

events occurred in terms of interactions; i.e., at least

θ

edges were stable, added or removed. The value of

θ

is a user-defined parameter.

In exploring the graph, given an event

γ

, a graph

G [T, C]

, a pair of attribute values, c and

c^{'}

of C, and a threshold

θ

, we are looking to identify pairs (

t_{r}, T_{r}

) where

t_{r}

is the reference point and

T_{r}

an interval preceding

t_{r}

such that it holds:

c o u n t (G_{γ} [(t_{r}, T_{r}), C]

,

c, c^{'})

≥

θ

. Again, for increasing counts, we are interested in the minimal

T_{r}

, while for decreasing ones, we are interested in the maximal

T_{r}

.

For simplicity, in the rest of the paper we denote the event count as

c o u n t (G_{γ} [T_{r}, C], c, c^{'})

identifying

t_{r}

as the point in time next to the end point of

T_{r}

.

Initialization of $θ$ . Our framework provides an interactive exploration strategy that enables users to adjust $θ$ based on the output they receive for different values of $θ$ . We also offer a default strategy to establish an appropriate $θ$ . To this end, we leverage the skyline-based exploration strategy.

Specifically, to compute

θ

for an event

γ

, a graph

G [T, C]

, and pair of attribute values, c and

c^{'}

of C, we first compute the corresponding skyline, R. Then, we obtain the minimum and the maximum values of the counts from the tuples in R and set

θ

equal to the average of these values. The rationale is that for an event to be considered interesting, it must have a count that is at the least equal to the estimated average. Subsequently, users can interactively fine-tune this value and find an appropriate

θ

.

More formally, the estimated value of

θ

is formulated as follows:

θ_{e s t} = \frac{min_{(t_{r}, T_{r}, w) \in R} w + max_{(t_{r}, T_{r}, w) \in R} w}{2}

4. Implementation

In this section, we present information about the implementation of our framework. First, we describe the basic data structures, then the algorithms for temporal and graph aggregation, and finally, the algorithms for graph evolution. Further details are provided in [2,3].

4.1. Data Structures

To store a temporal attributed graph

G [T]

, we maintain separate structures, V for the nodes in V,

E

for the edges in E,

S

for the static attributes S in A, and

A_{i}

for the i-th time-varying attribute in A. Structure

V

is a labeled array with

| V |

rows and

| T |

columns, where each row is labeled with a node id,

v \in V

, and each column with a time point

t \in T

.

V [v, t]

, is equal to 1 if v appears at t, and 0 otherwise. A similar structure is employed for the edges, where

E

has

| E |

rows and

| T |

columns. Each row is labeled with a pair of node ids corresponding to the edge end points and each column with time point

t \in T

.

Since static attributes remain constant over time, we maintain a dedicated structure,

S

, that retains information for all available attributes without considering the dimension of time. This approach enhances the efficiency of our algorithms.

S

is a labeled array with

| V |

rows labeled with the node id, and

| S |

columns labeled by the corresponding attribute name

S_{i} \in S

.

V [v, S_{i}]

corresponds to the value of

S_{i}

for v. For the i-th time-varying attribute, we maintain a labeled array

A_{i}

. Rows are labeled with node ids, and columns with the time point

t \in T

, and

A_{i} [v, t]

is equal to the value of attribute

A_{i}

of v at time point t.

4.2. Temporal Operators and Aggregation

Temporal operators. We provide different algorithms for the union, intersection, and difference operators. We present here in detail the algorithm for intersection; the algorithms for union and difference are similar. Given a temporal attributed graph $G [T]$ and a pair of intervals $T_{1}, T_{2}$ , the algorithm ensures that only nodes and edges that appear in every time point of $T_{1}, T_{2}$ are included in the resulting graph. The output graph corresponds to a temporal attributed graph.

As depicted in Algorithm 1, the algorithm takes as input graph

G [T]

, represented by the labeled arrays

V, E, S, s e t (A_{i})

, and intervals

T_{1}, T_{2}

, and outputs labeled arrays

V^{'}, E^{'}, S^{'},

s e t (A_{i}^{'})

, that maintain the information for the intersection graph. We initialize

V^{'}, E^{'}, s e t (A_{i}^{'})

as empty arrays with one column for each t in

T_{1}

and

T_{2}

, and

S^{'}

as an empty array with one column for each static attribute (line 1). Next, we restrict the input graph

G [T]

, by filtering arrays

V, E, s e t (A_{i})

to include only the columns associated to the given set of intervals

T_{1}, T_{2}

(line 2). For each node

v \in V

, the corresponding row,

V [v, :]

, is accessed and if all values in the

V [v, :]

are equal to 1 (

V [v, t] = 1, \forall t \in (T_{1}, T_{2})

),

V [v, :]

is inserted into

V^{'}

. The corresponding entries for the attributes of v are also updated. That is, the row corresponding to v in

S

,

S [v, :]

, and for time-varying attribute i the corresponding rows from arrays

A_{i}

,

A_{i} [v, :]

, are retrieved and copied in

S^{'}

and

A_{i}^{'}

respectively (lines 3–7). Similarly, for each edge

e \in E

the corresponding row

E [e, :]

is accessed, and if all values

E [e, :]

are equal to 1,

E [e, :]

is inserted into

E^{'}

(lines 8–10).

Algorithm 1 Temporal operator (intersection)

Input:

G [T]

represented by

V, E, S, s e t (A_{i})

, intervals

T_{1}, T_{2}

Output: Intersection graph

G^{'} = G [T_{1}] \cap G [T_{2}]

1:: Initialize $V^{'}, E^{'}, s e t (A_{i}^{'})$ in $(T_{1}, T_{2})$ , and $S^{'}$
2:: Restrict $V, E, s e t (A_{i})$ in $(T_{1}, T_{2})$
3:: for each node v do
4:: if $V [v, t] = 1$ $\forall t$ then ▹ $t \in (T_{1}, T_{2})$
5:: $V^{'} \leftarrow V [v, :]$
6:: $S^{'} \leftarrow S [v, :]$
7:: $A_{i}^{'} \leftarrow A_{i} [v, :]$ ▹ $\forall i$ attribute of v
8:: for each edge e do
9:: if $E [e, t] = 1$ $\forall t$ then ▹ $t \in (T_{1}, T_{2})$
10:: $E^{'} \leftarrow E [e, :]$
11:: return $V^{'}, E^{'}, S^{'}, s e t (A_{i}^{'})$

Aggregation. Algorithm 2 presents distinct aggregation for static attributes. The algorithm takes as input a temporal attributed graph $G [T]$ represented by $V, E, S$ , and the set C of the aggregation attributes, and outputs the labeled vectors $V^{'}, E^{'}$ representing the aggregated graph $G [T, C]$ . First, we initialize $V^{'}, E^{'}$ as empty arrays (line 1). For each attribute $S_{i}$ in C the corresponding column $S [:, S_{i}]$ , holding the values of $S_{i}$ for each node in $V$ , is accessed and inserted in $V^{'}$ . Next, columns in $V^{'}$ are merged (lines 2–4). Therefore, $V^{'}$ corresponds to a vector, where for each node v, the value $V^{'} [v]$ corresponds to the combined attribute values of v. To build $E^{'}$ , we traverse $E$ and for each edge $(u, v)$ , we lookup the edge endpoints u and v in $V^{'}$ to retrieve their values in attributes C. Each value pair is inserted into $E^{'}$ (lines 5–8). Next, we group rows in $V^{'}$ and $E^{'}$ based on their values. We set these values as the group (i.e., produced row) labels, and we count their appearances to calculate the corresponding weights (lines 9–10). The algorithm returns $V^{'}$ vector labeled with attribute value(s) that maintains the weight of the aggregated nodes and $E^{'}$ labeled with pairs of attribute values that maintain the weight of the aggregated edges (line 11).

Algorithm 2 Aggregation (static attributes)

Input:

G [T]

represented by

V, E, S

, attributes set C

Output: Aggregate graph

G [T, C]

represented by

V^{'}, E^{'}

1:: Initialize $V^{'}, E^{'}$
2:: for each $S_{i} \in C$ do
3:: $V^{'} \leftarrow S [:, S_{i}]$
4:: Merge columns in $V^{'}$
5:: for each $(u, v) \in E$ do
6:: $c \leftarrow$ lookup u in $V^{'}$
7:: $c^{'} \leftarrow$ lookup v in $V^{'}$
8:: $E^{'} [(u, v)] \leftarrow (c, c^{'})$
9:: $V^{'} . G r o u p b y (c) . C o u n t (w e i g h t)$
10:: $E^{'} . G r o u p b y ((c, c^{'})) . C o u n t (w e i g h t)$
11:: return $V^{'}, E^{'}$

We also support non-distinct aggregation. In this case, instead of iterating in line 2 for each attribute, we iterate for each node and attribute so as to account for multiple node and edge appearances. To improve efficiency, static and time-varying attributes are treated differently. The algorithm for time-varying attributes follows a similar structure, but it is more involved.

4.3. Graph Exploration

We now present the algorithms for the skyline-based and the interaction-based exploration. Both algorithms use the data structures and algorithms for temporal operators and aggregation presented in the previous section. However, for notation simplicity, here we use a simplified graph representation.

Skyline-based exploration. The algorithm takes as input an event $γ$ , a temporal attributed graph $G [T]$ , attribute(s) C, and attribute value combinations $c, c^{'}$ of C, and outputs the tuples that belong in the skyline, stored in R based on the length of the detected interval, along with their domination degree stored in D. In Algorithm 3, we present in detail the skyline-based computation process for the event of stability ∩.

We consider all reference points in T, denoted as

t_{r}

, and initially pair each of them with its corresponding past interval

T_{r}

of maximum length. For each such

(t_{r}, T_{r})

pair, first we create the intersection graph and then we compute the aggregation for the requested C. Next, for the given values

c, c^{'}

, we retrieve the weight of the corresponding edges in the aggregate graph and form the corresponding tuple

p = (t_{r}, T_{r}, w_{p})

(lines 2–8). Each such tuple is then compared against the tuples already in the skyline to determine whether it should be added or not to R. To avoid redundant comparisons, we discern cases based on the length of the compared tuples. For each tuple

q = (t_{r}^{'}, T_{r}^{'}, w_{q})

in R with

l_{T_{r}} \leq l_{T_{r}^{'}}

, we check the dominance relationship between p and q, by comparing both the intervals length

l_{T_{r}}, l_{T_{r}^{'}}

and the corresponding weights

w_{p}

and

w_{q}

. If p dominates q, p replaces q in R and D is correspondingly updated with p’s domination degree. If q dominates p, then p is discarded and q’s domination degree is updated. Finally, if there is a tie, p is also inserted in R. For each tuple q with

l_{T_{r}} > l_{T_{r}^{'}}

, we only need to check if

w_{p}

is at least equal to

w_{q}

. If this condition is met, q is dominated by p. Otherwise, a tie exists. Subsequently, R and D are updated accordingly (lines 9–23). At the end of this procedure, we reduce the inspected interval

T_{r}

and update its length (lines 24–25) and repeat the described procedure for the new

T_{r}

(and the same reference point). The procedure terminates when all reference points are considered against all corresponding past intervals.

Depending on the event’s counts, we adopt a different approach for selecting the initial interval for each reference point in our exploration so as to reduce comparisons and replacements in the skyline tuples. Thus, when considering events with decreasing counts, such as stability with strict semantics as presented here or growth with loose semantics, our algorithm begins with the

T_{r}

with the greatest length and gradually reduces it. Respectively, for events with increasing counts, such as shrinkage with loose semantics, we search for minimal intervals, and thus we start our exploration with the

T_{r}

with the smallest length and we gradually increase it.

Algorithm 3 Skyline-based exploration (stability (∩))

Input:

G [T]

, attribute(s) C, attribute values

c, c^{'}

Output: Skyline set represented by R, dominance set represented by D

1:: Initialize $R, D$ ▹ $R : l_{T_{r}} \to s e t ((t_{r}, T_{r}, w)), D : (t_{r}, T_{r}, w) \to | d |, d$ : dominated tuples
2:: for each reference point $t_{r}$ do
3:: $T_{r} =$ longest interval for $t_{r}$
4:: while $l_{T_{r}} \geq 1$ do
5:: Compute intersection for $G [T]$ on $(t_{r}, T_{r})$ (Algorithm 1)
6:: Compute aggregation on C (Algorithm 2)
7:: Get count $w_{p}$ for $c, c^{'}$
8:: Current tuple $p = (t_{r}, T_{r}, w_{p})$
9:: for each tuple $q = (t_{r}^{'}, T_{r}^{'}, w_{q}) \in R$ do
10:: if $l_{T_{r}} \leq l_{T_{r}^{'}}$ then
11:: if $w_{p} > w_{q}$ and $l_{T_{r}} = l_{T_{r}^{'}}$ then ▹p dominates q
12:: Remove q from R
13:: Insert p in R
14:: Update $D [p]$
15:: else if $w_{p} < w_{q}$ then ▹q dominates p
16:: Update $D [q]$
17:: else ▹ tie
18:: Insert p in R
19:: else if $l_{T_{r}} > l_{T_{r}^{'}}$ then
20:: if $w_{p} \geq w_{q}$ then ▹p dominates q
21:: Execute lines 12–14
22:: else ▹ tie
23:: Execute line 18
24:: Reduce $T_{r}$
25:: $l_{T_{r}} - -$
26:: return $R, D$

Interaction-based exploration. The algorithm takes as input an event $γ$ , a temporal attributed graph $G [T]$ , attribute(s) C, attribute value combinations $c, c^{'}$ of C, and threshold $θ$ and outputs a set Z with tuples corresponding to maximal or minimal intervals where at least $θ$ events occur. We present the implementation of the interaction-based exploration for the event of stability (∩) as depicted in Algorithm 4. As stability has decreasing counts, the algorithm finds maximal intervals. In contrast to skyline-based exploration, here we consider first all reference points $t_{r}$ , along with their corresponding shortest past interval $T_{r}$ . We then, similarly to Algorithm 3, build the current tuple p (lines 2–9). Next, we check whether $w_{p} \geq θ$ , and if so, p is inserted in Z and $T_{r}$ is extended. If $w_{p} < θ$ , $t_{r}$ does not need to be considered with any other intervals and it is pruned. This is because the event counts are decreasing and therefore, extending $T_{r}$ will not increase the count of interest (lines 10–14). The process ends after examining all reference points.

For an event with increasing counts, we search for minimal intervals and begin with intervals of the greatest length and then reduce them gradually.

Complexity. We now analyze the complexity of our algorithms. Specifically, for Algorithm 1, which constructs the stability (intersection) graph, complexity depends on the number of nodes, edges, attributes, and length $τ$ of interval $(T_{1}, T_{2})$ . Thus, the cost of constructing the stability graph is $O (| V | \times | A | \times τ + | V | \times | S | + | E | \times τ)$ , where the first two terms account for the processing of time-varying and static node attributes, respectively. Complexity for constructing the other event graphs is similar to that of stability.

Concerning Algorithm 2, which presents aggregation on static attributes, complexity depends on the number of nodes, attributes, edges, and merging, lookup, and group-by operations. The cost of merging is

O (V)

, the cost of each lookup

O (1)

, and the cost of group-by

O (| V | + | E |))

. Thus, the factor that dominates the construction of the aggregate graph is the processing of the node attributes, which amounts to

O (| V | \times | C | + | E |))

.

Algorithm 4 Interaction-based exploration (stability (∩))

Input:

G [T]

, attribute(s) C, attribute values

c, c^{'}

, threshold

θ

Output: Result set Z

1:: Initialize Z
2:: for each reference point $t_{r}$ do
3:: $T_{r} =$ shortest interval for $t_{r}$
4:: $r e p e a t = T r u e$
5:: while $r e p e a t = T r u e$ do
6:: Compute intersection for $G [T]$ on $(t_{r}, T_{r})$ (Algorithm 1)
7:: Compute aggregation on C (Algorithm 2)
8:: Get count $w_{p}$ for $c, c^{'}$
9:: Current tuple $p = (t_{r}, T_{r}, w_{p})$
10:: if $w_{p} \geq θ$ then
11:: $Z \leftarrow p$
12:: Extend $T_{r}$
13:: else
14:: $r e p e a t = F a l s e$
15:: return Z

For the skyline-based exploration algorithm, Algorithm 3, we denote the length of interval T as

τ

. For each reference point within the interval, we consider all immediately preceding intervals. Hence, for each of the

τ

reference points, we examine

τ - 1

past intervals. The total number of pairwise intervals, and thus the number of graphs defined on them, is then

O (τ^{2})

, since for each of the

τ

reference points, the algorithm considers

τ - 1

intervals. We have already discussed the cost of constructing the stability (or any other event) and aggregate graphs. Each of the

τ^{2}

graphs produces one candidate skyline tuple that we need to check against all existing tuples in the skyline set R. Thus, each pair of intervals leads to an update of the skyline set R by at most one tuple at a time. Therefore, the complexity for skyline computation, excluding, for simplicity, the cost of constructing the graphs, is

O (τ^{2} \times | R |)

.

For the interaction-based exploration algorithm, Algorithm 4, the overall complexity is dominated by the number of reference points

τ

and the number of interval extensions. Each reference point may require up to

τ - 1

interval extensions in the worst case (when the threshold is always met and no pruning is performed), resulting in a total of

O (τ^{2})

constructed stability and aggregate graphs and corresponding result tuples that are compared against the threshold. Thus, the overall complexity is

O (τ^{2})

, where the complexity for stability and aggregate graph construction is omitted.

5. The TempoGRAPHer System

TempoGRAPHer supports three main functionalities: (1) an overview of the temporal graph at specific time points, (2) aggregation of the temporal graph on one or more of its attributes and at various time granularities, and (3) exploration of the history of the temporal graph for identifying intervals of significant changes with both skyline-based and interaction-based approaches.

The overall architecture of the TempoGRAPHer system is depicted in Figure 3. The system comprises three basic components that implement its three main functionalities, overview, aggregation, and exploration.

Next, we describe in detail each of the main components and the functionality they provide. To this end, we will use as case study the Primary School dataset [46,47], which is a contact network describing the face-to-face proximity of 232 students and 10 teachers in a primary school in Lyon, France. Each interaction denotes a 20-second contact between two individuals. Each individual has two static attributes, gender, and class. The school has five grades, 1 to 5, with two classes each (i.e., 1A, 1B, 2A, 2B, etc.), plus teachers, and three values for the gender attribute, female, male, and unspecified. Our dataset covers a period of 17 h. Table 1 shows the total number of nodes and edges and the density for each time point for the graph of Primary School.

Graph Overview. The main input for all system components is a temporal attributed graph. The overview component, as its name indicates, provides a general overview of the given graph by illustrating at specific time points the maximum connected component of the graph colored based on a selected attribute. This allows the user to derive insights on the distribution of this attribute among the nodes of the graph at the selected time point. If the original graph is too large to display (more than 100 nodes), graph sampling is first applied utilizing the Snowball sampler [48].

First, the system enables users to load their own graph or choose among existing graphs that are preloaded in the system, such as Primary School. Then, the user selects the preferred time point and a single attribute, and in the graph displayed, each node is assigned a color based on the value of the selected attribute. By hovering the mouse over a node, the user can be informed about the id of the node. Figure 4 depicts the graph overview for Primary School for the first time point. Nodes are colored based on the class attribute. The system also provides some general statistics, showing the number of nodes in the graph and the number of distinct values of the attribute at the given time point.

Graph Aggregation. TempoGRAPHer facilitates the aggregation of the original graph on both the time and the attribute dimensions. As illustrated in Figure 3, given a time interval and a temporal operator (i.e., project, union, intersection, difference, or evolution), first temporal aggregation is performed. The derived graph is then aggregated based on a set of specified attributes either using distinct or non-distinct aggregation, and the produced weighted graph is then visualized.

The system interface allows the user to select all required parameters, that is, the preferred time period of interest, the temporal operator of interest, the type of aggregation, distinct/non-distinct, and the set of attributes based on which graph aggregation is performed. For example, Figure 5 shows the distinct aggregation of the gender and class attributes on the intersection graph for intervals [1, 2] and [3, 4]. Each attribute value combination is colored using a different color and the weights of both nodes and edges are displayed. Furthermore, the attribute value combination of each node is shown by hovering over the node.

Graph Exploration. The third functionality of TempoGRAPHer is exploration, which enables users to visually discover parts of the graph where significant events have occurred. TempoGRAPHer explores stability using strict semantics, and growth and shrinkage utilizing loose semantics.

The exploration component provides the two exploration strategies, both skyline- and interaction-based exploration. For both exploration strategies, all pairs of reference point and interval need to be considered iteratively. Thus, given an event type, i.e., stability, growth, or shrinkage, at each iteration, i.e, for the current reference point and interval pair, the corresponding event graph is first constructed. Then, the event graph is aggregated based on a set of selected attributes. The event counts are then derived from the aggregate event graph for the specified values of the selected attributes. The output of this procedure is a tuple consisting of an interval, a reference point, and the computed count. These three steps are repeated for all pairs of reference point and interval as we mentioned above and each derived tuple is sequentially forwarded according to the strategy used to the corresponding subcomponent as shown in Figure 3.

In particular, for skyline-based exploration, the system computes the skyline result by checking if each derived tuple is dominated by any other or whether it should be added in the skyline. The output depicts the most important pairs of intervals for the specified event considering the length of the interval and the number of interactions for the requested attribute values. The user can visualize the output in two formats, a 3D plot and a 2D plot. In the 3D plot, the x-axis represents the past interval, the y-axis depicts the reference point, and the z-axis displays the counts. Additionally, the plot depicts the top three results, which are colored in blue. The 2D plot offers a distinct representation for each reference point. The x-axis represents the past interval and the y-axis displays the counts.

If the interaction-based exploration is selected, the system compares the count of each derived tuple against the specified threshold

θ

and eliminates all tuples that do not satisfy the given constraint. Pruning is applied to make the procedure more efficient. The output is then visualized as a plot, illustrating for each reference point all maximal or minimal interval pairs, depending on the type of the event.

Figure 6 shows the visualization of the results for the skyline-based exploration for stability of interactions among female students. The user specifies the event of interest, stability in this example; the attributes for graph aggregation; gender; and the specific attribute value combination, i.e., F-F edges. The skyline includes long periods of stability and periods of high stability. Pair ([12], [11]) represents the most important result considering the number of interactions, while pair ([17], [2, 16]) is the most important result from the interval length perspective. Figure 6a depicts the 3D visualization of the skyline. The bars colored with blue depict the top three derived results according to the domination degree. Figure 6b depicts the 2D visualization for the same settings. The results for each reference point are represented individually. By hovering over a line, the corresponding count appears.

Similarly, Figure 7 depicts the visualization results for stability among female student interactions using interaction-based exploration. In addition to the parameters specified for skyline-based exploration, the user also specifies the threshold value

θ

equal to 30 in the presented example. The result includes all maximal intervals where at least

θ = 30

stable interactions have persisted between girls (F-F edges). The highest stability is observed for reference point 12 with the maximal interval pair being ([12], [7, 11]).

6. Evaluation

For our experiments, besides Primary School (described in Section 5), we use two additional real-world datasets, namely DBLP, and MovieLens.

DBLP is a collaboration network extracted from DBLP [49]. The dataset covers a period of 21 years from 2000 to 2020, where each year denotes a time point. There is an edge when two authors co-author a paper. There are two attributes for the authors, one static, gender with two values, and one time-varying, #publications per year with three categorical values (low, average, high). Gender is determined utilizing an appropriate tool [50].

MovieLens is a movie ratings dataset that builds on the benchmark MovieLens [51]. The dataset covers a period of 6 months from 1 May to 31 October 2000, and each month represents a time point. There is an edge between two users when they have both rated the same movie. Each user has three static attributes, gender with 2 values, age with 7 categorical values, occupation with 21 values, and one time-varying, average rating of the user per month with 38 values.

Table 2 and Table 3 depict, respectively, the size and the density of the DBLP and MovieLens graphs per time point.

Our methods are implemented in Python 3.11 and our experiments are conducted on a macOS 14.2.1 M2 with 16 GB RAM. Our code and data are publicly available [52]. TempoGRAPHer can be accessed online [53].

6.1. Performance Evaluation

We perform experiments across various graph sizes to study the performance behavior and scalability of our algorithms. We use strict semantics for the operator of intersection, and loose semantics for the operators of growth and shrinkage. We report the execution time for each of our algorithms for graphs defined on increasing intervals, depicted on the y-axis for all cases. The time presented corresponds to the average time based on five iterations.

First, we study the performance behavior of our temporal operators. In Figure 8, we present the execution time for building the intersection, union, and difference graphs of DBLP, MovieLens, and Primary School. The x-axis represents a pair of intervals within the specified period. For instance, “up to 2005” refers to the pair [2000, 2004], [2005]. Since strict semantics are used for the intersection operator, the graph on [2000, 2004], [2005] is computed by directly applying intersection on the graph defined on [2000, 2005] and outputs a graph with nodes and edges that are present on each year within [2000, 2005]. Similarly to intersection, for union and loose semantics, the union operator is applied for [2000, 2005], resulting in a graph with elements being present in any of the years in the interval. For difference, first we need to compute the graph on [2000, 2004] employing the union operator (loose semantics) and then apply the difference operator. Difference captures nodes and edges existing in 2005 that are not present in any of the past years in [2000, 2004]. We observe that intersection is the fastest to complete, while difference is the most costly overall due to the fact that the difference operator involves union and, consequently, corresponds to two distinct operations.

In particular, for DBLP, we report a linear increase in execution time as the graph size increases. We observe a steady increase in the execution time of all three operators. For instance, the difference operator needs 0.01 s to complete for the smallest input graph of size 4 K and reaches up to 0.09 s for the largest input graph of size 250 K. For MovieLens, which has only six time points, the difference operator shows a significant spike in August, with an execution time of 0.5 s. This spike is due to a sharp increase in the number of nodes and edges (and density, which is the second highest across all time points) in August compared to the previous interval. For the union and intersection operators, the execution times remain close to constant despite the input graph reaching 1 M nodes and edges. All three operators perform the fastest for the Primary School graph, with the most time-consuming operator requiring only 0.002 s. Despite the rather high number of time points, the small number of nodes and edges (about 9 K) for the input graph results in almost constant execution times. In general, we can safely deduce that execution time depends on both the size of the input graphs and the size of the output graph.

Next, we analyze the performance characteristics of our algorithm for aggregation. Figure 9 illustrates the execution time for applying aggregation on the intersection graph for various attribute combinations. The x-axis represents an interval, e.g., [2000, 2005] corresponds to “up to 2005”, and the input of the algorithm is the resulting graph when applying the intersection operator on this interval. Concerning the static attributes, G is present across all graphs and denotes gender, and Cl denotes the class of Primary School. Regarding the time-varying attributes, P represents #publications of DBLP, and R represents rating for MovieLens. Static attributes are the fastest to complete requiring at most 0.002 s, while time-varying attributes and combination of attributes take up to 0.007 s to execute. This is due to the dynamic nature of time-varying attributes, where capturing all attribute value appearances in the interval impacts the performance. We observe a constant drop in the execution time when considering longer periods of time, which can be attributed to the decrease in the input graph, that is, the intersection graph. We derive that execution time for aggregation on the intersection graph mainly depends on the size of the input graph and the type of attribute.

In the following, we analyze the aggregation performance given as input the corresponding union graph. The aggregation on the union graph is expected to exhibit an inverse behavior compared to that of intersection, as by considering longer periods, the union graph increases. In Figure 10, we present the execution time for aggregation on the union graph for various attribute combinations. The x-axis represents an interval, e.g., [2000, 2005] corresponds to “up to 2005”, and the input of the algorithm is the graph generated when applying the union operator on this interval. We notice a general pattern of time increase as the interval increases. Also, aggregation on static attributes is the fastest to compute. When considering combinations of attributes, we observe that they are the most time-consuming for the graphs of DBLP and Primary School. However, MovieLens exhibits a different behavior with aggregation on rating proving faster than that on the combination of gender and rating. For the “up to oct” scenario, applying the union operator results in about 980 k edges to be aggregated. When aggregating on rating alone, the graph consists of 660 aggregate edges, while the combination of gender and rating yields about 2 k aggregate edges. This indicates that in the former case, there are fewer groups but a higher number of grouped edges, whereas in the latter case, there are more groups but fewer grouped edges, which apparently affects the execution time. Additionally, we observe that our algorithm achieves constant execution time for the union aggregation of static attributes, suggesting that the runtime remains stable regardless of the size of the input graph. For the cases of time-varying and combined attributes, the performance of our algorithm scales proportionally with the size of the input graph.

Next, we evaluate the performance for our skyline-based exploration algorithm. Figure 11 illustrates the skyline execution time on exploring the stable connections between females, between female and male, and between males. Female is denoted as F and male is denoted as M. The x-axis denotes a reference point, defining the exploration space where the pairs of (reference point, interval) are formed. For example, for “2005” we form the pair [2000, 2004], [2005], and all the pairs that derive by gradually reducing [2000, 2004], that is, pair [2001, 2004], [2005], continuing up to pair [2004], [2005]. The graphs generated for these pairs serve as input to the algorithm. We report high performance across all graphs, with MovieLens exhibiting the best overall, needing at most 0.14 s to complete the exploration. Despite the significant imbalance in the number of (F, F), (F, M), (M, M) edges across all graphs, we observe that execution time remains unaffected. Additionally, exploration takes the longest time to complete for DBLP, requiring up to 1.8 s. This is due to DBLP having the highest number of time points among the three graphs, as well as more edges than Primary School though fewer than MovieLens. This indicates that skyline-based exploration time primarily depends on the number of time points and then on the number of the nodes and edges in the graph. In conclusion, skyline exploration demonstrates effective performance, with execution time increasing linearly with the size of the input graph, while the number of the edges matching the pairs of attribute values requested for exploration has no significant impact on performance.

Finally, we analyze the second type of exploration, which is based on the number of interactions. Figure 12 depicts the execution time of the interaction-based exploration for stable edges between females, between female and male, and between males. The x-axis shares the same notation as the skyline-based approach. However, the pairs defined for each reference point are derived in reverse, starting from the minimum interval and gradually increasing it. We implement the default strategy to determine

θ

as the average of the minimum and maximum counts recorded during skyline exploration. For each type of edge, we have a different

θ

value. For example, in DBLP,

θ

is 16 for (F, F), 232 for (F, M), and 880 for (M, M) for the case “up to 2020”. Our algorithm achieves high performance for any of the graphs, with MovieLens exploration being the fastest to complete, requiring at most 0.07 s. Time appears to increase linearly, while the number of the edges being explored does not affect the performance. Undoubtedly, pruning plays a crucial role in efficiently computing the exploration result by enabling us to skip exploring intervals that do not meet our criteria. Moreover, it is important to set an appropriate

θ

threshold as it can potentially eliminate unnecessary computations and further enhance the performance of our algorithm.

Overall, both approaches demonstrate high performance, with the interaction-based exploration being particularly efficient due to its reliance on pruning and the ability to appropriately select a

θ

threshold. These approaches can function independently or complementarily. Users can start with the skyline-based approach to gain a general overview of the graph evolution. Subsequently, they can utilize the skyline output to focus on specific intervals using the interaction-based approach, setting a suitable threshold.

Our experimental results confirm the theoretical complexity of our algorithms. Specifically, the observed performance aligns with the expected scaling behavior based on the number of nodes, edges, attributes, and the length of the intervals, as outlined in the complexity analysis.

6.2. Qualitative Evaluation

In this section, we present two case studies to showcase how the TempoGRAPHer system helps users reveal underlying information and discover interesting aspects in the history of a graph.

6.2.1. Case Study: Targeting Transmission Pathways in Primary School

Our first case study uses the Primary School graph as described in Section 5.

Suppose that we want to design a policy for containing disease spread. Thus, we need to study the school’s evolution graph to identify factors that influence disease spread. In particular, we need to detect groups of students that are more active or have less stable interactions and are therefore more likely to facilitate disease spread. Also, we need to detect time periods in which disease spread is more likely to happen. In the following, we only show our results as the system interfaces are described in the previous section through Figure 4, Figure 5, Figure 6 and Figure 7.

To gain a first understanding about the attributes of the input graph and its evolution, we first deploy a graph overview to visualize the graph on various time points and gain an idea of how the graph changes through time. Figure 13a depicts the 12th time point and the gender attribute, where girls are colored with light blue, boys with orange, and unspecified gender with green. At least three well-separated clusters are formed in the illustrated graph showing student interactions. However, the clusters do not seem to depend on gender. To determine whether the clusters depend on the other attribute of the nodes, in Figure 13b we use the graph overview again for the same time point but select the class attribute this time. As we can see, there are six distinct class values at this time point, 1A, 1B, 4A, 4B, 5A, and 5B classes and teachers. By selecting class for coloring our graph, we can clearly notice that actually there are five clusters formed based on this attribute, which is a first indication that student interactions depend on their class.

We continue with the next time point and notice a change at the 13th time point, shown in Figure 14. Compared to the 12th time point, here we notice a more complex view, with students interacting with each other regardless of gender (Figure 14a) or class (Figure 14b). Thus, we can assume that the 12th time point corresponds to interactions during lessons held at the 12th hour of school, while the 13th time point shows break time, in which classes are not separated and students interact regardless of the class they belong to. Viewing the graphs at specific time points helps us recognize lessons and breaks and showcases that during class time, as expected, interactions are mostly limited among members of the same class, while during breaks interactions, they are not contained within class limits. However, this first analysis provides limited information about the volume or the duration of student interactions.

Thus, we proceed with distinct aggregation so as to quantify student interactions at lessons and breaks depending on different attribute values. Figure 15a and Figure 16b show the aggregate graphs on gender at the 12th and 13th time point respectively. Firstly, in none of the graphs can we infer any clear dependence of the interactions on the gender of the students. However, when comparing the two, we observe that boys are much more active during breaks compared to girls, while during class, both genders exhibit similar behavior. Furthermore, we can see that while fewer people interact during breaks, the interactions per person are significantly increased compared to class time. Thus, we deduce that the students that do interact during breaks are much more active compared to class time independent of their gender, as we expected based on our finding from the graph overview.

Aggregating by class for the 12th time point, we notice in Figure 15b that most interactions are confined within students of the same class. For the 13th time point, in Figure 16b, we observe that in most cases students have more interactions with students of other classes rather than with students of their own class. For instance, students from class 5A depicted with yellow-green have 41 interactions with students of the same class and 244 interactions in total with students of other classes. Consequently, unsurprisingly we again may deduce that breaks during school time pose a risk for an increase in disease spread.

To gain further insights on the nature of the interactions between the students, we use exploration to investigate their stability and see whether we can rely on this fact to limit disease spread.

We start by studying the stability of interactions between the students of the same gender using skyline-based exploration. Comparing the interactions between girls in Figure 17a and boys in Figure 17b, we report 10 results for the girls’ and 13 results for the boys’ contacts. We observe that periods with high stability have a rather limited length, while there are shorter periods in which a very high degree of stable interactions is observed. In the skylines for both girls and boys, intervals with length longer than seven maintain at most nine stable interactions, which can be seen as a bound on the duration and size of isolation bubbles. On the other hand, we observe an interval of duration four on ([12], [7, 11]) with stability of at least 32 for both girls and boys, indicating a potential lower risk zone for disease spread. Also, as both figures illustrate, most skyline results for both boys and girls are reported on the 12th time point, which indicates that 12 is the reference point with the highest stability.

Based on the first conclusions drawn from the skyline-based exploration, we next deploy interaction-based exploration for the interactions between students of the same gender. To set

θ

, we start by initializing it using the default strategy and then fine-tune and obtain

θ = 35

. Figure 18 depicts the interaction-based exploration results with

θ = 35

for girls (Figure 18a) and boys (Figure 18b). We notice longer intervals of stability for boys’ contacts with an average length of

1.88

compared to those for girls, where the average length of the stable intervals is

1.19

. The longest stability period reported for boys is on ([12], [7, 11]) with interval length four, a result that is also included in the skyline of the stable interactions between boys. On the other hand, for girls, the highest stability is on ([12], [8, 11]) and ([11], [7, 10]) with interval length three. While ([12], [8, 11]) was also discovered through the skyline exploration, ([11], [7, 10]) is not part of the skyline. Thus, we can see in this example how the interaction-based exploration allows us to discover more results of significant stability, where significance is determined through the user-defined threshold

θ

.

After exploring the interactions based on gender, we focus on the class attribute and first present the results of the exploration of stable contacts between students of junior class 1A and between students of senior class 5A. First, we run the skyline-based exploration, where we report 10 results for the stable contacts between students of class 1A, as shown in Figure 19a, while for students of class 5A we report 14 results (Figure 19b). In general, the results for the senior class also exhibit longer intervals and higher counts compared to the results for the junior class, indicating that senior students build more stable connections compared to junior ones.

Using a similar approach to the one we used with same-gender interactions, we apply the interaction-based exploration with

θ = 15

. We report 14 results for the students of class 1A, as shown in Figure 20a, with the longest stability periods on ([4], [1, 3]), ([9], [6, 8]), ([11], [8, 10]) and ([12], [9, 11]) with interval length two, while Figure 20b reports 16 results for the students of class 5A, where ([12], [6, 11]) corresponds to the longest stability period with length five. We can see that the interaction-based exploration is again able to reveal more results compared to the skyline-based approach that satisfy a minimum user-defined threshold of stability, giving us more opportunities to detect periods that seem of lower risk with respect to disease spread.

To conclude our investigation, we contrast our results of interactions between students of the same class, with an analysis of the relationships of students between students of different classes but the same grade using skyline-based exploration. In particular, we explore the contacts of students of two different classes that belong to the same junior grade as depicted in Figure 21a for students of 1A and 1B and also to the same senior grade as shown in Figure 21b for students of 5A and 5B. Despite selecting students of the same grade, as we expect students of the same age to interact more among themselves, we still observe that the skyline output is rather poor with only two and four results, respectively, and we observe no significantly high stability counts nor long stability periods. For the junior students, the longest stable interval only reaches length two. Consequently, we may safely deduce that the class attribute seems to be the most important attribute that determines the stable interactions between students, showing us opportunities for limiting disease spread by creating isolation bubbles within the limits of each class at least.

In summary, our analysis indicates that while gender does not seem to be a significant factor governing student interactions, boys still show greater stability on their relationships at school compared to girls. Boys are also much more active than girls during breaks. Regarding the class attribute, we observe that most stable interactions remain within the limits of a class, and older students seem to have established more stable contacts compared to younger ones. Finally, while during lessons most of the interactions are between students of the same class, during breaks, a large number of interactions are between students of different classes showing us that in order to limit disease spread, breaks should be limited or at different times among classes.

The Primary School dataset was originally created to study important properties on the contact patterns and the relevance of these patterns on modeling propagation of a disease [47]. Our results confirm the findings of this work about the importance of age and class in shaping student contact patterns. Furthermore, our findings indicate that gender is a factor that can affect interactions, and also, we identify cases of significant stability in the student contacts, corresponding to potential safety zones.

6.2.2. Case Study: Growth of Same Gender Collaborations in DBLP

In this case study, we investigate the evolution over the years of collaborations in DBLP between authors of the same gender, focusing on the growth event. To this end, we apply graph exploration using the skyline-based method.

In Figure 22a, we report the skyline result for new collaborations between female authors. Recall that the skyline includes the most important intervals with respect to the duration and volume of new collaborations. Notably, seven out of eight results appear in 2019, with at least 640 collaborations reported in any of the intervals. The highest count is 669 new collaborations in 2019 that did not exist in 2018. The longest period is [2000, 2018] with a count of 640, meaning that there are 640 new collaborations in 2019 that did not exist in any of the previous years. Only one result is reported for 2020, with 629 new collaborations and period [2000, 2019], meaning that none of them existed in any of the previous years. As the majority of the results are concentrated in 2019, we consider 2019 as the most important reference point, signifying the greatest growth for female collaborations.

In Figure 22b, we report the skyline result for new collaborations between male authors. We notice results are reported only for 2020, indicating it as the most important year for new collaborations between male authors, with more than 17 K new collaborations. Specifically, the highest count reported is 18,370 new collaborations that did not exist in 2019, and the longest interval reported is [2000, 2018] with a count of 17,400, which means there are 17,400 new male collaborations that did not exist before.

The results show a significant difference in the number of new collaborations between female and male authors, reflecting a significant gender gap in research. Although this is an important indicator, it would be useful to investigate a different aspect, focusing on the percentage of new collaborations in relation to the total collaborations reported.

We focus on the important reference points for growth, i.e., years 2019 and 2020, to study how the overall collaborations that appear in these years are related to growth. Thus, we apply distinct aggregation by gender. Figure 23a,b report the results. Orange nodes represent male authors and blue nodes female authors. For the year 2019, which is the most important year in terms of growth of female collaborations, there are overall 706 F-F collaborations. Among these collaborations, 640 appear for the first time, meaning that about 90% of the female collaborations appearing in 2019 are new ones. For the year 2020, which is the most important year in terms of growth of male collaborations, there are 20130 M-M collaborations in total, of which 17400 appear for the first time, that is, the percentage of new male collaborations in 2020 is 86%. This means that we see a slightly higher percentage of growth for female collaborations. Additionally, in 2019 we report 6164 collaborations involving female authors (F-F, F-M), indicating that only 11% of these collaborations consist exclusively of female authors, while 77% of collaborations involving male authors (M-M, F-M) are exclusively male collaborations. For 2020, the ratio remains 11% for collaborations involving female authors, while for male authors it slightly increases to 78%. It should be noted that this significant disparity in collaborations between female authors and male authors mainly derives from the lower number of female authors compared to male authors. Nevertheless, it confirms the overall findings that underscore the limited representation of women in research.

In conclusion, our analysis underscores gender inequality in research. Furthermore, incorporating additional attributes, such as the author affiliation or research field, relevant analyses can be applied to identify trends in collaborations, focusing for example, on interdisciplinary research. By carefully analyzing the evolution of collaborations, policymakers can identify patterns to facilitate proactive actions. For example, identifying the years of important growth or shrinkage in specific types of collaborations, targeted actions may be undertaken.

Our case study aligns with research on the temporal evolution of academic collaboration networks, particularly regarding gender disparities. A steady increase in female participation in the INFORMS network since 1980 is reported in [54], though women still represent a smaller proportion of high-publication authors. Gender dynamics are also explored within the software engineering academic community in [55], revealing significant gender biases in academic promotions, with women tending to form less tightly knit co-authorship clusters than men.

7. Conclusions

In this paper, we described the TempoGRAPHer system for visualizing, aggregating, and exploring temporal graphs. TempoGRAPHer offers two complementary strategies for exploring the evolution of a graph, namely skyline-based and interaction-based exploration. Skyline-based exploration identifies the peaks in the evolution of the graph, while interaction-based exploration offers a closer look at the time intervals of significant change, where significant change is determined by the number

θ

of events (stable, new or disappeared interactions) as specified by the user.

As future work, we plan to extend the skyline-based and the interaction-based exploration to consider several types of interactions (i.e., value combinations) for each attribute at the same time. For instance, in our running example, exploration will be supported for not just one value combination for gender (e.g., girl–girl) at a time but for all possible ones. In this case, besides the temporal interval dimension, our skylines will have as many dimensions as the possible value combinations for the attribute. Analogously, the user will provide multiple thresholds, one for each type of interaction.

Additionally, we will explore techniques for identifying the most important attributes and attribute values and direct our exploration on them. In terms of user friendliness, we aim to conduct user studies to further enhance the usability of our system. We also plan to explore alternative, more scalable visualization techniques to improve the user experience.

We also plan to extend TempoGRAPHer to handle streaming temporal data, enabling the system to process and visualize dynamically evolving graphs in real time. This extension will improve TempoGRAPHer’s applicability to time-sensitive applications, such as monitoring rapidly changing networks. By including dynamic exploration, where exploration results are automatically refreshed as new data arrive, the system will be able to provide continuous insights into a graph’s evolution. To achieve this, we will explore performance optimization techniques, such as incremental graph updates and parallel processing, to handle the increased computational load efficiently.

Finally, we aim to expand our interest to various applications of temporal graph analysis, such as social influence dynamics and rumor propagation [56,57], by integrating learning-based approaches, such as graph embedding techniques [58,59].

Author Contributions

Conceptualization, E.T., G.K. and E.P.; methodology, E.T., G.K. and E.P.; software, E.T.; validation, E.T.; formal analysis, E.T., G.K. and E.P.; investigation, E.T., G.K. and E.P.; resources, E.T.; data curation, E.T.; writing—original draft preparation, E.T.; writing—review and editing, E.T., G.K. and E.P.; visualization, E.T.; supervision, G.K. and E.P.; project administration, E.P.; funding acquisition, E.P. All authors have read and agreed to the published version of the manuscript.

Funding

Research work supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “1st Call for H.F.R.I. Research Projects to Support Faculty Members & Researchers and Procure High-Value Research Equipment” (Project Number: HFRI-FM17-1873, GraphTempo).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available in GitHub at https://github.com/etsoukanara/graphtempo-demo, accessed on 1 February 2022.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Kalyvas, C.; Tzouramanis, T. A Survey of Skyline Query Processing. arXiv 2017, arXiv:1704.01788. [Google Scholar]
Tsoukanara, E.; Koloniari, G.; Pitoura, E. GraphTempo: An aggregation framework for evolving graphs. In Proceedings of the 26th International Conference on Extending Database Technology, EDBT 2023, Ioannina, Greece, 28–31 March 2023. [Google Scholar]
Tsoukanara, E.; Koloniari, G.; Pitoura, E. Skyline-Based Temporal Graph Exploration. In Proceedings of the Advances in Databases and Information Systems—27th European Conference, ADBIS 2023, Barcelona, Spain, 4–7 September 2023. [Google Scholar]
Tsoukanara, E.; Koloniari, G.; Pitoura, E. TempoGRAPHer: A Tool for Aggregating and Exploring Evolving Graphs. In Proceedings of the 26th International Conference on Extending Database Technology, EDBT 2023, Ioannina, Greece, 28–31 March 2023. [Google Scholar]
Abedjan, Z.; Grütze, T.; Jentzsch, A.; Naumann, F. Profiling and mining RDF data with ProLOD++. In Proceedings of the IEEE 30th International Conference on Data Engineering, ICDE 2014, Chicago, IL, USA, 31 March–4 April 2014. [Google Scholar]
Cebiric, S.; Goasdoué, F.; Kondylakis, H.; Kotzinos, D.; Manolescu, I.; Troullinou, G.; Zneika, M. Summarizing semantic graphs: A survey. VLDB J. 2019, 28, 295–327. [Google Scholar] [CrossRef]
Lbath, H.; Bonifati, A.; Harmer, R. Schema Inference for Property Graphs. In Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021, Nicosia, Cyprus, 23–26 March 2021. [Google Scholar]
Preti, G.; Lissandrini, M.; Mottin, D.; Velegrakis, Y. Mining patterns in graphs with multiple weights. Distrib. Parallel Databases 2021, 39, 281–319. [Google Scholar] [CrossRef]
Lissandrini, M.; Mottin, D.; Palpanas, T.; Papadimitriou, D.; Velegrakis, Y. Unleashing the Power of Information Graphs. SIGMOD Rec. 2014, 43, 21–26. [Google Scholar] [CrossRef]
Lissandrini, M.; Mottin, D.; Palpanas, T.; Velegrakis, Y. Data Exploration Using Example-Based Methods. In Synthesis Lectures on Data Management; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
Mottin, D.; Bonchi, F.; Gullo, F. Graph Query Reformulation with Diversity. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015. [Google Scholar]
Lissandrini, M.; Mottin, D.; Palpanas, T.; Velegrakis, Y. Graph-Query Suggestions for Knowledge Graph Exploration. In Proceedings of the WWW ’20: The Web Conference 2020, Taipei, Taiwan, 20–24 April 2020. [Google Scholar]
Gallinucci, E.; Golfarelli, M.; Rizzi, S.; Abelló, A.; Romero, O. Interactive multidimensional modeling of linked data for exploratory OLAP. Inf. Syst. 2018, 77, 86–104. [Google Scholar] [CrossRef]
Varga, J.; Etcheverry, L.; Vaisman, A.A.; Romero, O.; Pedersen, T.B.; Thomsen, C. QB2OLAP: Enabling OLAP on Statistical Linked Open Data. In Proceedings of the 32nd IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, Finland, 16–20 May 2016. [Google Scholar]
Debrouvier, A.; Parodi, E.; Perazzo, M.; Soliani, V.; Vaisman, A.A. A model and query language for temporal graph databases. VLDB J. 2021, 30, 825–858. [Google Scholar] [CrossRef]
Moffitt, V.Z.; Stoyanovich, J. Temporal graph algebra. In Proceedings of the 16th International Symposium on Database Programming Languages, DBPL 2017, Munich, Germany, 1 September 2017. [Google Scholar]
Aghasadeghi, A.; Moffitt, V.Z.; Schelter, S.; Stoyanovich, J. Zooming Out on an Evolving Graph. In Proceedings of the 23rd International Conference on Extending Database Technology, EDBT 2020, Copenhagen, Denmark, 30 March–2 April 2020. [Google Scholar]
Rost, C.; Gómez, K.; Fritzsche, P.; Thor, A.; Rahm, E. Exploration and Analysis of Temporal Property Graphs. In Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021, Nicosia, Cyprus, 23–26 March 2021. [Google Scholar]
Rost, C.; Gómez, K.; Täschner, M.; Fritzsche, P.; Schons, L.; Christ, L.; Adameit, T.; Junghanns, M.; Rahm, E. Distributed temporal graph analytics with GRADOOP. VLDB J. 2022, 31, 375–401. [Google Scholar] [CrossRef]
Guminska, E.; Zawadzka, T. EvOLAP Graph—Evolution and OLAP-Aware Graph Data Model. In Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety, Proceedings of the 14th International Conference, BDAS 2018, Poznan, Poland, 18–20 September 2018; Communications in Computer and Information Science; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
Ghrab, A.; Skhiri, S.; Jouili, S.; Zimányi, E. An Analytics-Aware Conceptual Model for Evolving Graphs. In Proceedings of the Data Warehousing and Knowledge Discovery—15th International Conference, DaWaK 2013, Prague, Czech Republic, 26–29 August 2013. [Google Scholar]
Andriamampianina, L.; Ravat, F.; Song, J.; Vallès-Parlangeau, N. Graph data temporal evolutions: From conceptual modelling to implementation. Data Knowl. Eng. 2022, 139, 102017. [Google Scholar] [CrossRef]
Snijders, T.; van de Bunt, G.; Steglich, C. Introduction to stochastic actor-based models for network dynamics. Soc. Netw. 2010, 32, 44–60. [Google Scholar] [CrossRef]
Krivitsky, P.N.; Handcock, M.S. A separable model for dynamic networks. J. R. Stat. Soc. Ser. B Stat. Methodol. 2014, 76, 29–46. [Google Scholar] [CrossRef] [PubMed]
Holme, P.; Saramäki, J. Temporal networks. Phys. Rep. 2012, 519, 97–125. [Google Scholar] [CrossRef]
Börzsönyi, S.; Kossmann, D.; Stocker, K. The Skyline Operator. In Proceedings of the 17th International Conference on Data Engineering, Heidelberg, Germany, 2–6 April 2001. [Google Scholar]
Huang, X.; Jensen, C.S. In-Route Skyline Querying for Location-Based Services. In Proceedings of the Web and Wireless Geographical Information Systems, 4th InternationalWorkshop, W2GIS 2004, Goyang, Republic of Korea, 26–27 November 2004. [Google Scholar]
Jang, S.; Yoo, J. Processing Continuous Skyline Queries in Road Networks. In Proceedings of the International Symposium on Computer Science and its Applications, Hobart, TAS, Australia, 13–15 October 2008. [Google Scholar]
Kriegel, H.P.; Renz, M.; Schubert, M. Route skyline queries: A multi-preference path planning approach. In Proceedings of the 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), Long Beach, CA, USA, 1–6 March 2010. [Google Scholar]
Chowdhury, N.; Arefin, M.S. Skyline Path Queries for Location-based Services. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 436–444. [Google Scholar] [CrossRef]
Zou, L.; Chen, L.; Özsu, M.T.; Zhao, D. Dynamic Skyline Queries in Large Graphs. In Proceedings of the Database Systems for Advanced Applications, 15th International Conference, DASFAA 2010, Tsukuba, Japan, 1–4 April 2010. [Google Scholar]
Zheng, W.; Lian, X.; Zou, L.; Hong, L.; Zhao, D. Online Subgraph Skyline Analysis over Knowledge Graphs. IEEE Trans. Knowl. Data Eng. 2016, 28, 1805–1819. [Google Scholar] [CrossRef]
Keles, I.; Hose, K. Skyline Queries over Knowledge Graphs. In Proceedings of the Semantic Web—ISWC 2019, Auckland, New Zealand, 26–30 October 2019. [Google Scholar]
Velitchko, F.; Alessio, A.; Silvia, M. Are We There Yet? A Roadmap of Network Visualization from Surveys to Task Taxonomies. Comput. Graph. Forum 2023, 42, e14794. [Google Scholar]
Vehlow, C.; Beck, F.; Weiskopf, D. Visualizing Group Structures in Graphs: A Survey. Comput. Graph. Forum 2017, 36, 201–225. [Google Scholar] [CrossRef]
Kerracher, N.; Kennedy, J.; Chalmers, K. A Task Taxonomy for Temporal Graph Visualisation. IEEE Trans. Vis. Comput. Graph. 2015, 21, 1160–1172. [Google Scholar] [CrossRef] [PubMed]
Andrienko, N.V.; Andrienko, G.L. Exploratory Analysis of Spatial and Temporal Data—A Systematic Approach; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
Beck, F.; Burch, M.; Diehl, S.; Weiskopf, D. A Taxonomy and Survey of Dynamic Graph Visualization. Comput. Graph. Forum 2017, 36, 133–159. [Google Scholar] [CrossRef]
McGee, F.; Ghoniem, M.; Melançon, G.; Otjacques, B.; Pinaud, B. The State of the Art in Multilayer Network Visualization. Comput. Graph. Forum 2019, 38, 125–149. [Google Scholar] [CrossRef]
Hadlak, S.; Schumann, H.; Schulz, H. A Survey of Multi-faceted Graph Visualization. In Proceedings of the 17th Eurographics Conference on Visualization, EuroVis 2015—State of the Art Reports, Cagliari, Italy, 25–29 May 2015. [Google Scholar]
von Landesberger, T.; Kuijper, A.; Schreck, T.; Kohlhammer, J.; van Wijk, J.J.; Fekete, J.; Fellner, D.W. Visual Analysis of Large Graphs: State-of-the-Art and Future Research Challenges. Comput. Graph. Forum 2011, 30, 1719–1749. [Google Scholar] [CrossRef]
Mishra, P.; Kumar, S.; Chaube, M.K. Graph Interpretation, Summarization and Visualization Techniques: A Review and Open Research Issues. Multim. Tools Appl. 2023, 82, 8729–8771. [Google Scholar] [CrossRef]
Zhu, X.; Wu, J.; Chang, W.; Wang, G.; Liu, Q. Authentication of Skyline Query over Road Networks. In Proceedings of the Security, Privacy, and Anonymity in Computation, Communication, and Storage, Melbourne, NSW, Australia, 11–13 December 2018. [Google Scholar]
Banerjee, S.; Pal, B.; Jenamani, M. DySky: Dynamic Skyline Queries on Uncertain Graphs. In Proceedings of the Web Information Systems Engineering—WISE 2020, Amsterdam, The Netherlands, 20–24 October 2020. [Google Scholar]
Liu, J.; Li, X.; Ren, K.; Song, J. Parallelizing uncertain skyline computation against n-of-N data streaming model. Concurr. Comput. Pract. Exp. 2018, 31, e4848. [Google Scholar] [CrossRef]
Gemmetto, V.; Barrat, A.; Cattuto, C. Mitigation of infectious disease at school: Targeted class closure vs school closure. BMC Infect. Dis. 2014, 14, 695. [Google Scholar] [CrossRef]
Stehlé, J.; Voirin, N.; Barrat, A.; Cattuto, C.; Isella, L.; Pinton, J.; Quaggiotto, M.; den Broeck, W.V.; Régis, C.; Lina, B.; et al. High-resolution measurements of face-to-face contact patterns in a primary school. PLOS ONE 2011, 6, e23176. [Google Scholar] [CrossRef] [PubMed]
Rozemberczki, B.; Kiss, O.; Sarkar, R. Little Ball of Fur: A Python Library for Graph Sampling. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM ’20), Online, 19–23 October 2020. [Google Scholar]
DBLP: Computer Science Bibliography. Available online: https://dblp.uni-trier.de/ (accessed on 1 February 2022).
Gender Guesser. Available online: https://github.com/lead-ratings/gender-guesser (accessed on 1 February 2022).
Harper, F.M.; Konstan, J.A. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 2016, 5, 19:1–19:19. [Google Scholar] [CrossRef]
Tsoukanara, E. GraphTempo Demo. Available online: https://github.com/etsoukanara/graphtempo-demo (accessed on 1 February 2022).
Tsoukanara, E. GraphTempo Demo Streamlit App. Available online: https://etsoukanara-graphtempo-demo-main-ul7qp1.streamlit.app/ (accessed on 1 February 2022).
Hermsdorff, G.B.; Felso, V.; Ray, E.; Gunderson, L.M.; Helander, M.E.; Maria, J.; Niv, Y. Gender and collaboration patterns in a temporal scientific authorship network. Appl. Netw. Sci. 2019, 4, 112. [Google Scholar] [CrossRef]
D’Angelo, A.; d’Aloisio, G.; Marzi, F.; Marco, A.D.; Stilo, G. Uncovering gender gap in academia: A comprehensive analysis within the software engineering community. J. Syst. Softw. 2024, 217, 112162. [Google Scholar] [CrossRef]
Sun, X.; Cheng, H.; Liu, B.; Li, J.; Chen, H.; Xu, G.; Yin, H. Self-Supervised Hypergraph Representation Learning for Sociological Analysis. IEEE Trans. Knowl. Data Eng. 2023, 35, 11860–11871. [Google Scholar] [CrossRef]
Sun, X.; Yin, H.; Liu, B.; Meng, Q.; Cao, J.; Zhou, A.; Chen, H. Structure Learning Via Meta-Hyperedge for Dynamic Rumor Detection. IEEE Trans. Knowl. Data Eng. 2023, 35, 9128–9139. [Google Scholar] [CrossRef]
Goyal, P.; Ferrara, E. Graph Embedding Techniques, Applications, and Performance: A Survey. Knowl. Based Syst. 2017, 151, 78–94. [Google Scholar] [CrossRef]
Hayat, M.K.; Xue, S.; Wu, J.; Yang, J. Heterogeneous Hypergraph Embedding for Node Classification in Dynamic Networks. IEEE Trans. Artif. Intell. 2024, 5, 5465–5477. [Google Scholar] [CrossRef]

Figure 1. (a) A temporal attributed graph

G [T]

,

T = [1, 3]

, (b)

G [1] \cup G [2]

, (c)

G [1] \cap G [2]

, and (d)

G [1] - G [2]

.

Figure 1. (a) A temporal attributed graph

G [T]

,

T = [1, 3]

, (b)

G [1] \cup G [2]

, (c)

G [1] \cap G [2]

, and (d)

G [1] - G [2]

.

Figure 2. (a) Aggregated graph on gender on 3,

G [[3], {g e n d e r}]

and aggregated evolution graphs for the graph of Figure 1a defined on 2 to 3 (b)

G_{s} [([3], [2]), {g e n d e r}]

, (c)

G_{g} [([3], [2]), {g e n d e r}]

, (d)

G_{h} [([3], [2]), {g e n d e r}]

.

Figure 2. (a) Aggregated graph on gender on 3,

G [[3], {g e n d e r}]

and aggregated evolution graphs for the graph of Figure 1a defined on 2 to 3 (b)

G_{s} [([3], [2]), {g e n d e r}]

, (c)

G_{g} [([3], [2]), {g e n d e r}]

, (d)

G_{h} [([3], [2]), {g e n d e r}]

.

Figure 3. The TempoGRAPHer system architecture.

Figure 4. Graph overview example for Primary School and class on time point 1. Nodes are colored based on their class value. Hovering over a node displays its id. The class value of each node is displayed below the node.

Figure 5. Graph aggregation example for Primary School based on (gender, class) at the intersection graph of intervals [1, 2] and [3, 4]. Each node represents a combination of attribute values. The node weight indicates the number of original nodes and edges with the corresponding attribute values. Hovering over a node reveals its attribute value combination.

Figure 6. Skyline-based graph exploration example with (a) 3D and (b) 2D visualizations for the stability event, gender, and (F, F) interactions. In the 3D plot, each bar represents a skyline result, with the x-axis showing past intervals, the y-axis displaying reference points, and the z-axis representing the count of the requested interactions. Blue bars correspond to the top-3 ranked results. In the 2D plot, each reference point is shown in a separate plot, with distinct color lines representing different results. The x-axis represents past intervals, and the y-axis shows the count of the requested interactions.

Figure 7. Interaction-based graph exploration example for the stability event, gender, (F, F) interactions, and

θ =

30 interactions. The x-axis represents past intervals, and the y-axis shows reference points. Each result denotes that at least 30 (F, F) interactions appear during the displayed interval.

Figure 7. Interaction-based graph exploration example for the stability event, gender, (F, F) interactions, and

θ =

30 interactions. The x-axis represents past intervals, and the y-axis shows reference points. Each result denotes that at least 30 (F, F) interactions appear during the displayed interval.

Figure 8. Operator execution time of (a) DBLP, (b) MovieLens, (c) Primary School.

Figure 9. Aggregation execution time for intersection of (a) DBLP, (b) MovieLens, (c) Primary School.

Figure 10. Aggregation execution time for union of (a) DBLP, (b) MovieLens, (c) Primary School.

Figure 11. Skyline-based exploration execution time for stability of (a) DBLP, (b) MovieLens, (c) Primary School.

Figure 12. Interaction-based exploration execution time for stability of (a) DBLP, (b) MovieLens, (c) Primary School.

Figure 13. Graph overview on the 12th time point of Primary School for (a) gender and (b) class. Nodes are colored based on their attribute value. Clusters formed in the graph seem to depend only on the class attribute.

Figure 14. Graph overview on the 13th time point of Primary School for (a) gender and (b) class. Nodes are colored based on their attribute value. The connections seem random, no clusters are formed.

Figure 15. Graph aggregation of Primary School based on (a) gender and (b) class on time point 12. Each node represents a value of the respective attribute. Most interactions occur between students of the same class.

Figure 16. Graph aggregation of Primary School based on (a) gender and (b) class on time point 13. Each node represents a value of the respective attribute. Significant increase in the interactions between boys (orange nodes) compared to the previous time point. Most interactions occur between students of different classes.

Figure 17. Skyline-based exploration results for stable interactions between (a) girls and (b) boys of Primary School. Higher number of results are reported for boys compared to girls. Time point 12 indicates the highest stability for both girls and boys.

Figure 18. Interaction-based exploration results for stability between (a) girls and (b) boys of Primary School with at least 35 interactions. Longer stability intervals are detected for boys’ interactions.

Figure 19. Skyline-based exploration results for stable interactions between students of Primary School for (a) 1A and (b) 5A classes. Higher number of results with longer intervals and higher counts are reported for class 5A students.

Figure 20. Interaction-based exploration results for stability between (a) 1A and (b) 5A students of Primary School with at least 15 interactions. More results with longer intervals are detected for interactions between 5A students.

Figure 21. Skyline-based exploration results for stable interactions between students of Primary School of the same seniority (age): (a) 1A, 1B and (b) 5A, 5B classes. In both cases, the results are poor, with interactions not appearing to be strongly influenced by age.

Figure 22. Skyline-based exploration results for new collaborations between (a) female and (b) male authors of DBLP. 2019 marks the key reference point for new female collaborations, while 2020 is significant for male collaborations.

Figure 23. Graph aggregation for DBLP by gender of authors in (a) 2019 and (b) 2020. Orange nodes represent male authors and blue nodes female authors. The low number of female authors participating in collaborations, despite most being new and indicating growth, highlights the ongoing underrepresentation of women in research.

Table 1. Primary School graph.

	1	2	3	4	5	6
Nodes	228	231	233	220	118	217
Edges	857	2124	1765	1890	1253	1560
Density	0.0331	0.0800	0.0653	0.0785	0.1815	0.0666
	7	8	9	10	11	12
Nodes	215	232	238	235	235	236
Edges	1051	1971	1170	1230	2039	1556
Density	0.0457	0.0736	0.0415	0.0447	0.0742	0.0561
	13	14	15	16	17
Nodes	147	119	211	175	187
Edges	1654	1336	1457	1065	1767
Density	0.1541	0.1903	0.0658	0.0700	0.1016

Table 2. DBLP graph.

	2000	2001	2002	2003	2004	2005	2006
Nodes	1708	2165	1761	2827	3278	4466	4730
Edges	2036	2565	2004	3485	4070	6031	6156
Density	0.0014	0.0011	0.0013	0.0009	0.0008	0.0006	0.0006
	2007	2008	2009	2010	2011	2012	2013
Nodes	5193	5501	5363	6236	6535	6769	7457
Edges	6391	7160	7331	8627	8530	10,185	11,083
Density	0.0005	0.0005	0.0005	0.0004	0.0004	0.0004	0.0004
	2014	2015	2016	2017	2018	2019	2020
Nodes	7035	8581	8966	9660	11,037	12,377	12,996
Edges	10,292	13,782	14,787	16,312	18,806	24,790	26,404
Density	0.0004	0.0004	0.0004	0.0003	0.0003	0.0003	0.0003

Table 3. MovieLens graph.

	May	Jun	Jul	Aug	Sep	Oct
Nodes	486	508	778	1309	575	498
Edges	87,969	76,883	176,715	530,862	65,570	42,793
Density	0.7464	0.5970	0.5847	0.6201	0.3973	0.3458

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Tsoukanara, E.; Koloniari, G.; Pitoura, E. TempoGRAPHer: Aggregation-Based Temporal Graph Exploration. Information 2025, 16, 46. https://doi.org/10.3390/info16010046

AMA Style

Tsoukanara E, Koloniari G, Pitoura E. TempoGRAPHer: Aggregation-Based Temporal Graph Exploration. Information. 2025; 16(1):46. https://doi.org/10.3390/info16010046

Chicago/Turabian Style

Tsoukanara, Evangelia, Georgia Koloniari, and Evaggelia Pitoura. 2025. "TempoGRAPHer: Aggregation-Based Temporal Graph Exploration" Information 16, no. 1: 46. https://doi.org/10.3390/info16010046

APA Style

Tsoukanara, E., Koloniari, G., & Pitoura, E. (2025). TempoGRAPHer: Aggregation-Based Temporal Graph Exploration. Information, 16(1), 46. https://doi.org/10.3390/info16010046

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

TempoGRAPHer: Aggregation-Based Temporal Graph Exploration

Abstract

1. Introduction

2. Related Work

3. The TempoGRAPHer Framework

3.1. Temporal Aggregation

3.2. Graph Evolution and Aggregation

3.3. Graph Exploration

3.3.1. Skyline-Based Exploration

3.3.2. Interaction-Based Exploration

4. Implementation

4.1. Data Structures

4.2. Temporal Operators and Aggregation

4.3. Graph Exploration

5. The TempoGRAPHer System

6. Evaluation

6.1. Performance Evaluation

6.2. Qualitative Evaluation

6.2.1. Case Study: Targeting Transmission Pathways in Primary School

6.2.2. Case Study: Growth of Same Gender Collaborations in DBLP

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI