Article

Spatial–Temporal-Correlation-Constrained Dynamic Graph Convolutional Network for Traffic Flow Forecasting

1 Shaanxi Transportation Holding Group Co., Ltd., Xi’an 710000, China
2 Operation Management Branch of Shaanxi Transportation Holding Group Co., Ltd., Xi’an 710000, China
3 Shaanxi Expressway Testing & Measuring Co., Ltd., Xi’an 710000, China
4 School of Economics, Renmin University of China, Beijing 100872, China
5 School of Civil Engineering and Architecture, Xi’an University of Technology, Xi’an 710048, China
6 School of Materials Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(19), 3159; https://doi.org/10.3390/math12193159
Submission received: 6 September 2024 / Revised: 3 October 2024 / Accepted: 8 October 2024 / Published: 9 October 2024

Abstract:
Accurate traffic flow prediction in road networks is essential for intelligent transportation systems (ITS). Since traffic data are collected from road networks with both spatial topology and time series structure, traffic flow prediction is regarded as a spatial–temporal prediction task. With their powerful ability to model non-Euclidean data, graph convolutional network (GCN)-based models have become the mainstream framework for traffic forecasting. However, existing GCN-based models either use a manually predefined graph structure to capture the spatial features, ignoring the heterogeneity of road networks, or simply perform 1-D convolution with a fixed kernel to capture the temporal dependencies of traffic data, resulting in insufficient long-term temporal feature extraction. To solve these issues, a spatial–temporal-correlation-constrained dynamic graph convolutional network (STC-DGCN) is proposed for traffic flow forecasting. In STC-DGCN, a spatial–temporal embedding encoder module (STEM) is first constructed to encode the dynamic spatial relationships of road networks at different time steps. Then, a temporal feature encoder module with heterogeneous time series correlation modeling (TFE-HCM) and a spatial feature encoder module with dynamic multi-graph modeling (SFE-DCM) are designed to generate dynamic graph structures for effectively capturing the dynamic spatial and temporal correlations. Finally, a spatial–temporal feature fusion module based on a gating fusion mechanism (STM-GM) is proposed to effectively learn and leverage the inherent spatial–temporal relationships for traffic flow forecasting. Experimental results on three real-world traffic flow datasets demonstrate the superior performance of the proposed STC-DGCN compared with state-of-the-art traffic flow forecasting models.

1. Introduction

Accurate traffic flow forecasting plays a crucial role in modern traffic management and services, serving as a powerful tool for mitigating traffic congestion, optimizing road utilization, enhancing traffic management, improving traffic safety, and facilitating route planning [1,2]. Traffic flow forecasting presents a typical spatial–temporal data prediction challenge due to its reliance on historical road network data that encompass both spatial topological and time series characteristics. With the complex spatial–temporal dependencies contained in road networks, it is still a challenging task to accurately predict the traffic flow [3,4].
Currently, mainstream traffic flow forecasting methods are mainly divided into three categories: traditional statistical methods, machine learning methods, and deep learning methods. The traditional statistical methods, such as the auto-regressive integrated moving average model (ARIMA) [5] and the vector auto-regressive model (VAR) [6], utilize the statistical characteristics of historical traffic data to predict future traffic flow. These models are suitable for relatively stable traffic patterns but have limitations in capturing complex nonlinear relationships and effectively handling the spatial dependencies of road network data. The machine-learning-based methods, such as support vector regression (SVR) [7], random forest (RFR) [8], and K-nearest neighbor (KNN) [9], primarily rely on stationarity assumptions [10] that only capture the time series features and overlook complex spatial–temporal dependencies, thereby limiting their effectiveness in predicting highly nonlinear traffic flow relationships. In contrast, the deep-learning-based methods have emerged as the mainstream traffic flow prediction models due to their robust ability to extract spatial–temporal features and model nonlinear relationships, e.g., long short-term memory networks (LSTM) [11], gated recurrent units (GRU) [12], recurrent neural networks (RNN) [13], convolutional neural networks (CNN) [14], and graph convolutional networks (GCN) [15].
Given its effective ability to model the spatial–temporal dependencies of road networks, the GCN demonstrates superior performance for traffic flow forecasting, especially the spatial–temporal graph convolutional network (ST-GCN) model [16], which combines the advantages of graph structure and time series modeling. Despite the good performance of ST-GCN-based models in improving traffic flow prediction accuracy, they still have some limitations. Firstly, most existing GCN-based models rely on a manually predefined graph structure to capture the spatial features. However, road segments are heterogeneous, with different types of roads (such as main roads and branch roads) exhibiting distinct characteristics and being dynamically correlated over time. Moreover, the connection strength or type between nodes changes dynamically. The predefined graph structure limits the model’s ability to learn dynamic spatial correlations and cannot effectively handle disturbances from random events like traffic accidents. Secondly, for the periodic features of traffic flow, existing GCN-based methods extract the temporal features through a CNN or an RNN. Due to the fixed size of their convolution kernels, these methods struggle to capture the temporal dependencies of long-distance nodes and ignore the dynamic similarity relationships between nodes at different periods, resulting in insufficient temporal feature extraction [17].
To solve those issues, a spatial–temporal-correlation-constrained dynamic graph convolutional network (STC-DGCN) for traffic flow forecasting is proposed as shown in Figure 1. To effectively encode the dynamic changes of spatial relationships for road networks at different time steps, a spatial–temporal embedding encoder module (STEM) is devised to embed the periodic traffic signal. Then, a temporal feature encoder module with heterogeneous time series correlation modeling (TFE-HCM) is proposed to capture the dynamic correlation of different nodes over time. To overcome the limitation of the GCN-based models that rely on manually predefined graph structure to capture the spatial features, a spatial feature encoder module with dynamic multi-graph modeling (SFE-DCM) is devised to explore and quantify inter-node influence in the spatial dimension. By alternately stacking TFE-HCM and SFE-DCM modules to generate dynamic graph structures, the proposed model can effectively capture the dynamic spatial and temporal correlations of traffic signals. Finally, a spatial–temporal feature fusion module based on gating fusion mechanism (STM-GM) is proposed to effectively learn and leverage the inherent spatial–temporal relationships for traffic flow forecasting.
The main objectives of our work are as follows:
(1) To effectively capture the long-term temporal features, a TFE-HCM module is proposed to capture the dynamic correlation of different nodes over time with a GTF block and an LTF block. Specifically, the GTF block utilizes an attention mechanism to capture the global dynamic correlation of various nodes across different time steps, while the LTF block relies on the attention scores to encode the impact of temporally adjacent nodes for future predictions.
(2) To overcome the limitation of the GCN-based models using the manually predefined graph structure to capture the spatial features, an SFE-DCM is devised to extract dynamic spatial correlations with an SDM block and an ASM block. The SDM block uses an attention mechanism to explore and quantify the influence of nodes in the spatial dimension, and the ASM block is used to capture the dynamic correlations for all nodes at each time step.
(3) An STM-GM module is designed to effectively fuse the spatial features and temporal features, where a gated fusion unit is explored to effectively learn and leverage the inherent spatial–temporal relationships within the traffic data.

2. Related Works

2.1. Traffic Flow Forecasting

Traffic flow forecasting is a classic time series prediction problem that utilizes historical road network data containing both spatial topological and time series characteristics. The mainstream traffic flow forecasting methods are mainly classified into three categories. The traditional statistics-based methods [5,6], such as ARIMA [5] and VAR [6], primarily focus on modeling the historical temporal traffic signal for future traffic flow forecasting. However, these methods have limitations in capturing complex nonlinear relationships and often result in low prediction accuracy due to their reliance on linear relationships within the historical traffic data. The machine-learning-based methods [7,8,9], such as SVR [7] and KNN [9], rely on stationarity assumptions to capture temporal information for prediction. Since these methods ignore the spatial–temporal dependencies of traffic signals, they show limitations for traffic flow prediction with highly nonlinear relationships. The deep-learning-based methods [10,11,12,13,14,15,16,17,18], such as LSTM [11], CNN [14], GRU [12], and GCN [15], have emerged as the leading traffic flow prediction models due to their powerful ability to extract spatial–temporal features and model nonlinear relationships. Since road network data contain both spatial topological and time series characteristics, the GCN-based traffic flow forecasting methods attract more attention with their ability to model non-Euclidean data.

2.2. GCN-Based Traffic Forecasting

The GCN-based traffic forecasting methods typically leverage the spectral GCN to capture the spatial correlation of traffic data and rely on RNNs (e.g., RNN, LSTM, GRU), CNNs, and self-attention mechanisms to model the temporal correlations. DCRNN [18] combines a diffusion GCN with a GRU to extract both spatial and temporal dependencies of traffic signals. ST-GCN performs convolution operations to capture temporal correlations and employs spectral convolution to capture spatial correlations for traffic flow forecasting. Since the road network is a dynamically changing system, the manually predefined graph structure in ST-GCN cannot effectively encode those dynamic changes. To address this limitation, Graph WaveNet [19] introduces an adaptive GCN to capture the spatial dynamic correlations. To effectively model the long-term spatial–temporal correlations, ASTGCN [20] incorporates an attention mechanism constraint, with specially designed temporal and spatial attention modules to capture the spatial–temporal dependencies in traffic data. Furthermore, to overcome the over-smoothing issue in GCN and effectively process time series sequences, Choi et al. introduced neural controlled differential equations into the GCN and proposed the STG-NCDE model [21] to capture the long-term spatial–temporal dependencies for prediction.
Compared to previous related works, the proposed STC-DGCN model designs a temporal feature encoder module with heterogeneous time series correlation modeling (TFE-HCM) and a spatial feature encoder module with dynamic multi-graph modeling (SFE-DGM). This allows the model to effectively learn the dynamic spatial–temporal relationships of road networks. By incorporating the attention-mechanism-constrained GTF block and LTF block within TFE-HCM, our model can effectively encode the dynamic temporal heterogeneity of time series traffic signals. Moreover, the SDM block and ASM block within SFE-DGM enable the proposed model to effectively capture the dynamic inter-node correlations in the spatial dimension.

3. Spatial–Temporal-Correlation-Constrained Dynamic Graph Convolutional Network

3.1. Preliminaries

Traffic network: By considering the spatial topological and time series characteristics of traffic flow data, a traffic network is represented as a graph G = (V, E, A), where V is the node set with N nodes that denote the sensors in the road network, and E is the edge set that denotes the connectivity between road nodes. A = (α_{ij})_{N×N} ∈ ℝ^{N×N} is the adjacency matrix that encodes the connection relationships between road nodes for graph G, and α_{ij} is the connection weight for the edge between nodes v_i and v_j.
Traffic signal: A traffic signal X_{t,n,c} ∈ ℝ^{T×N×C} represents the collected traffic data for node n at time t, where 1 ≤ t ≤ T, 1 ≤ n ≤ N, 1 ≤ c ≤ C; T is the length of the time series, N is the number of sensors in the road network, and C is the number of traffic data types collected by the sensors, including traffic flow, speed, density, etc. C = 1 in our work since we only consider traffic flow forecasting. X_t = [X_{t,1}, …, X_{t,n}, …, X_{t,N}] ∈ ℝ^{N×1} represents the traffic data collected by all sensors in the traffic network G at time step t.
Traffic flow forecasting: Given the traffic network G and its historical traffic signals X_{(t−T_h+1):t} = [X_{t−T_h+1}, …, X_{t−1}, X_t] ∈ ℝ^{T_h×N×1}, the goal of traffic flow forecasting is to construct a traffic flow prediction model to predict the future traffic signals Y_{(t+1):(t+T_P)} = [Y_{t+1}, Y_{t+2}, …, Y_{t+T_P}] ∈ ℝ^{T_P×N×1} as follows:
Y_{(t+1):(t+T_P)} = P(X_{(t−T_h+1):t}, G)    (1)
where P(·) is the traffic flow prediction model, and T_h and T_P are the lengths in time steps of the historical and future traffic signals, respectively.
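To make the shapes in this formulation concrete, the sketch below fixes N = 307 sensors (as in PEMS04) and T_h = T_p = 12 and uses a naive repeat-last-observation baseline as a stand-in for the prediction model P(·); it is illustrative only and is not the paper’s model.

```python
import numpy as np

# Shapes follow the problem formulation: X in R^{T_h x N x 1}, Y in R^{T_p x N x 1}.
T_h, T_p, N = 12, 12, 307

def predict(X_hist: np.ndarray) -> np.ndarray:
    """Stand-in for P(.): repeat the most recent observation T_p times."""
    assert X_hist.shape == (T_h, N, 1)
    last = X_hist[-1]                                # (N, 1), latest reading per sensor
    return np.repeat(last[None, :, :], T_p, axis=0)  # (T_p, N, 1)

X = np.random.rand(T_h, N, 1)   # dummy historical traffic signals
Y = predict(X)
print(Y.shape)                  # (12, 307, 1)
```

Any real model P(·), including STC-DGCN, consumes and produces tensors of exactly these shapes.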

3.2. The Proposed Model

To effectively capture the dynamic spatial correlations and temporal heterogeneity of traffic signals, a spatial–temporal-correlation-constrained dynamic graph convolutional network (STC-DGCN) is proposed for traffic flow forecasting as shown in Figure 1. The proposed STC-DGCN consists of four modules stacked with different functions: the spatial–temporal embedding encoder module (STEM) is constructed to encode the dynamic spatial relationships for road networks across different time steps; the temporal feature encoder module with heterogeneous time series correlation modeling (TFE-HCM) is devised to capture dynamic temporal correlation for each node, utilizing both a global contextual temporal feature encoder block and local temporal feature encoder block, as shown in Figure 1c; the spatial feature encoder module with dynamic multi-graph modeling (SFE-DCM) is designed to capture the dynamic inter-node correlations in the spatial dimension; and the spatial–temporal feature fusion module based on gating fusion mechanism is proposed to effectively learn and leverage the inherent spatial–temporal relationships within the traffic signals for traffic flow forecasting.

3.2.1. Spatial–Temporal Embedding Encoder Module (STEM)

To effectively encode the dynamic changes in spatial relationships for road networks at various time steps, a spatial–temporal embedding encoder module (STEM) is devised as shown in Figure 1b.
For a traffic road network G, the raw traffic signal is initially mapped to a high-dimensional space with a fully connected layer FC(·) in order to preserve its native information as follows:
E_f = FC(X_{(t−T_h+1):t})    (2)
where E_f ∈ ℝ^{T_h×N×d_f} is the embedding feature, and d_f is the dimension of E_f.
Due to the cyclical impact of human activities such as daily commuting, traffic signals exhibit significant cyclic dynamic correlations; thus, effectively embedding the periodic traffic signal is useful for improving the model’s accuracy. For a traffic road network G with N nodes, the sensors collect traffic data N_d times per day over N_w days per week, and the daily and weekly traffic signals are represented as X_d = [X_{t−(T_d−1)τ}, X_{t−(T_d−2)τ}, …, X_{t−τ}] ∈ ℝ^{T_d×N×d_f} and X_w = [X_{t−(T_w−1)·7τ}, X_{t−(T_w−2)·7τ}, …, X_{t−7τ}] ∈ ℝ^{T_w×N×d_f}, respectively, where T_d and T_w are the numbers of time steps for X_d and X_w, τ is the sampling frequency of the traffic signals, and d_f is the dimension after performing Equation (2). For the current traffic signal X_t, we follow [17] to represent the learnable daily embedding dictionary as T_t^d ∈ ℝ^{N_d×d_f} and the weekly embedding dictionary as T_t^w ∈ ℝ^{N_w×d_f}, with N_d = 288 (24 h per day with traffic signals collected every 5 min) and N_w = 7. Then, we extract the daily embedding E_t^d ∈ ℝ^{T_d×d_f} and the weekly embedding E_t^w ∈ ℝ^{T_w×d_f} from T_t^d and T_t^w as in [22]. Finally, a 1 × 1 convolution is used to transform the dimensions of E_t^d and E_t^w to match E_f, and thus the periodic embedded spatial–temporal feature X_E for the time series traffic signal is represented as follows:
X_E = cat(E_f, E_t^d, E_t^w) ∈ ℝ^{T×N×D}    (3)
where cat(·) is the concatenation operation, and D = 3d_f is the dimension of X_E.
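As a rough sketch of the STEM embedding in Equation (3), the code below looks up daily and weekly dictionary rows per time step and concatenates them with the signal embedding. The dictionary sizes N_d = 288 and N_w = 7 follow the paper, but the dictionaries, the FC embedding, and the time-index arithmetic are random or assumed stand-ins for the learned components; the paper’s 1 × 1 convolution is omitted because the toy dimensions already match.

```python
import numpy as np

d_f, N_d, N_w = 12, 288, 7   # embedding dim, slots per day, days per week
T, N = 12, 4                 # toy sizes: 12 time steps, 4 nodes

rng = np.random.default_rng(0)
T_d = rng.normal(size=(N_d, d_f))   # daily embedding dictionary (learnable in the paper)
T_w = rng.normal(size=(N_w, d_f))   # weekly embedding dictionary
E_f = rng.normal(size=(T, N, d_f))  # stand-in for FC(X), Eq. (2)

slot = np.arange(T) % N_d           # time-of-day slot per step (assumed indexing)
day = (np.arange(T) // N_d) % N_w   # day-of-week index per step

E_d = np.broadcast_to(T_d[slot][:, None, :], (T, N, d_f))  # shared across nodes
E_w = np.broadcast_to(T_w[day][:, None, :], (T, N, d_f))
X_E = np.concatenate([E_f, E_d, E_w], axis=-1)  # Eq. (3): (T, N, 3*d_f)
print(X_E.shape)
```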

3.2.2. Temporal Feature Encoder Module with Heterogeneous Time Series Correlation Modeling

With the significant temporal heterogeneity of time series traffic signals, the relevance of these signals for future predictions varies across different time steps. Moreover, traffic signals demonstrate inter-temporal correlations (e.g., periodicity, trends) across different time slices. For instance, traffic signals that are temporally proximate to the current time step typically manifest higher correlation and exert a stronger influence on future traffic flow prediction. Thus, to effectively capture the intrinsic temporal correlations among traffic signals, a temporal feature encoder module with heterogeneous time series correlation modeling (TFE-HCM) is devised to explore the dynamic temporal correlation for each node in the road network, as shown in Figure 1c.
(1) Global contextual temporal feature encoder block with attention mechanism (GTF): Since the temporal dependencies of nodes exhibit variability across different time steps, we utilize the attention mechanism to calculate the temporal correlation matrix for the input traffic signals, aiming to capture the global dynamic correlation of different nodes over time. For node v_n in G, the corresponding embedding features are denoted as X_{:n:}^E ∈ X_E. X_{:n:}^E is first mapped to the query vector Q_n^h, key vector K_n^h, and value vector V_n^h by linear transformations as shown in Equation (4). Then, we calculate the temporal correlation T_n^h of node v_n over all time steps as shown in Equation (5) and obtain the h-th attention score as shown in Equation (6). Finally, we concatenate all H attention values to obtain the temporal correlation as shown in Equation (7).
Q_n^h = X_{:n:}^E W_{Q_n}^h,  K_n^h = X_{:n:}^E W_{K_n}^h,  V_n^h = X_{:n:}^E W_{V_n}^h    (4)
T_n^h = Q_n^h (K_n^h)^T / √D    (5)
A_n^h = softmax(T_n^h)    (6)
X_{T,GTF}^{(l+1)}(Q_n^h, K_n^h, V_n^h) = cat_{n=1}^{N}(A_n^h × V_n^h),  h = 1, …, H    (7)
where W_{Q_n}^h ∈ ℝ^{D×D}, W_{K_n}^h ∈ ℝ^{D×D}, and W_{V_n}^h ∈ ℝ^{D×D} are trainable parameter matrices, H is the number of attention heads, and cat(·) is the concatenation operation. As demonstrated in Equation (7), the GTF block calculates the dynamic temporal correlations for all nodes at all time steps with different attention scores, which effectively captures the global dynamic correlation of different nodes over time.
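The GTF attention in Equations (4)–(7) can be sketched for a single node and a single head as follows; the dimensions and the random weight matrices are illustrative stand-ins, not trained parameters.

```python
import numpy as np

T, D = 12, 8                         # time steps and embedding dimension (toy)
rng = np.random.default_rng(1)
X_n = rng.normal(size=(T, D))        # embedded signal of one node v_n

W_Q, W_K, W_V = (rng.normal(size=(D, D)) for _ in range(3))
Q, K, V = X_n @ W_Q, X_n @ W_K, X_n @ W_V   # Eq. (4): linear projections

scores = Q @ K.T / np.sqrt(D)        # Eq. (5): (T, T) temporal correlations
A = np.exp(scores - scores.max(axis=-1, keepdims=True))
A = A / A.sum(axis=-1, keepdims=True)  # Eq. (6): row-wise softmax
out = A @ V                            # Eq. (7): attended temporal features
print(out.shape)
```

Stacking H such heads and concatenating the outputs over all nodes would give the module output X_{T,GTF}^{(l+1)}.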
(2) Local temporal feature encoder block with attention-constrained temporal convolution (LTF): Since traffic signals that are temporally proximate to the current time step typically exert a stronger influence on future traffic flow prediction, we propose an LTF block that emphasizes the impact of temporally close nodes for future prediction. Following the naïve temporal convolution operation, we first define the temporal adjacency matrix A_T as in [23], where the elements of A_T are assigned a value of 1 when the temporal distance between two time steps is less than a predefined threshold λ, and 0 otherwise. Then, with the temporal attention score computed in Equation (6) for each node at all time steps, a local temporal correlation matrix A_T^{att} is defined as follows:
A_T^{att} = A_T ⊙ M_{cat,T}^{Att},  M_{cat,T}^{Att} = cat_{n=1}^{N}(A_n^h)    (8)
where ⊙ is the Hadamard product.
Then, using the attention scores to measure the importance of each node for future prediction, the LTF block aggregates local temporal features as follows:
X_{T,LTF}^{(l+1)}(A_T, M_{cat,T}^{Att}) = σ(W X^{(l)} A_T^{att})    (9)
where W is the learnable weight matrix for the temporal graph convolution, and A_T^{att} represents the attention-constrained temporal connection relationships in the temporal dimension.
After performing the global contextual temporal feature encoder and the local temporal feature encoder, the two kinds of heterogeneous time series features are fused to obtain the temporal representation as follows:
X_T^{(l+1)} = X_{T,GTF}^{(l+1)} + X_{T,LTF}^{(l+1)}    (10)
Finally, a residual connection is added to the aggregated features X_T^{(l+1)} in the temporal dimension as the output of the TFE-HCM.
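A minimal sketch of the LTF masking in Equation (8): the binary matrix A_T keeps only pairs of time steps whose distance is below the threshold λ, and the Hadamard product restricts (stand-in) attention scores to that local window. The values of T and λ below are arbitrary toy choices.

```python
import numpy as np

T, lam = 6, 2                        # toy horizon and locality threshold
idx = np.arange(T)
# A_T[i, j] = 1 when |i - j| < lam, else 0 (local temporal window)
A_T = (np.abs(idx[:, None] - idx[None, :]) < lam).astype(float)

rng = np.random.default_rng(2)
att = rng.random((T, T))             # stand-in for the GTF attention scores
A_T_att = A_T * att                  # Eq. (8): Hadamard product keeps the band
print(A_T_att.round(2))              # zeros outside the local band
```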

3.2.3. Spatial Feature Encoder Module with Dynamic Multi-Graph Modeling (SFE-DGM)

The traffic conditions of a road network are influenced by other roads to varying degrees, and this influence exhibits a highly dynamic correlation over time. Simply applying a naïve GCN to capture spatial features of neighboring nodes in a road network may overlook the similarities between nodes at different time periods, thus failing to effectively capture the dynamic inter-node correlations. To address this issue, a spatial feature encoder module with dynamic multi-graph modeling (SFE-DGM) is designed to extract dynamic spatial correlations. As illustrated in Figure 1d, the proposed SFE-DGM module comprises three main components: a spatial feature encoder block with dynamic adjacent correlation modeling (SDM), an attention-mechanism-constrained spatial feature encoder block (ASM), and a residual connection.
(1) Spatial feature encoder block with dynamic adjacent correlation modeling (SDM): To effectively encode the spatial correlation between adjacent roads, a graph G_a = (V, E, M_a) is constructed to encode the correlation between each pair of nodes at every time step. M_a is the weight matrix that measures the spatial correlation between nodes as follows:
M_a =
[ 1              m_a(v_1, v_2)   ⋯   m_a(v_1, v_N)
  m_a(v_2, v_1)  1               ⋯   m_a(v_2, v_N)
  ⋮              ⋮               ⋱   ⋮
  m_a(v_N, v_1)  m_a(v_N, v_2)   ⋯   1 ],
m_a(v_i, v_j) = (d_{v_i,v_j} − d_min) / (d_max − d_min)    (11)
where d_{v_i,v_j} is the spatial distance between the two nodes, and d_max and d_min are the maximum and minimum distances between nodes in graph G_a.
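The weight matrix M_a of Equation (11) can be computed directly from a pairwise distance matrix, as sketched below with made-up distances for three nodes.

```python
import numpy as np

# Toy pairwise distances between three road nodes (symmetric, zero diagonal).
dist = np.array([[0.0, 2.0, 5.0],
                 [2.0, 0.0, 3.0],
                 [5.0, 3.0, 0.0]])

off = dist[~np.eye(3, dtype=bool)]   # off-diagonal distances only
d_min, d_max = off.min(), off.max()

M_a = (dist - d_min) / (d_max - d_min)  # min-max normalisation, Eq. (11)
np.fill_diagonal(M_a, 1.0)              # self-correlation fixed to 1
print(M_a)
```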
Since the influence among different nodes in a road network exhibits significant variability, some nodes, despite being spatially distant, may demonstrate similar historical traffic patterns due to comparable functional characteristics, such as both being located near educational institutions. To effectively explore and quantify this differential influence among nodes, an attention mechanism is used to measure the inter-node influence. For all nodes in G at time t, the corresponding embedding features are denoted as X_{t::}^E ∈ X_E. X_{t::}^E is first mapped to the query vector Q_t^h, key vector K_t^h, and value vector V_t^h by linear transformations as shown in Equation (12).
Q_t^h = X_{t::}^E W_{Q_t}^h,  K_t^h = X_{t::}^E W_{K_t}^h,  V_t^h = X_{t::}^E W_{V_t}^h    (12)
where W_{Q_t}^h ∈ ℝ^{D×D}, W_{K_t}^h ∈ ℝ^{D×D}, and W_{V_t}^h ∈ ℝ^{D×D} are trainable parameter matrices, and h indexes the attention heads.
We then obtain the spatial attention matrix similarly to Equation (5), as follows:
S_t^h = Q_t^h (K_t^h)^T / √D    (13)
With the spatial attention matrix to measure the influence of each node in space, the dynamic adjacent correlation is formulated as follows:
A_S^{Att} = Concat(softmax(S_t^h ⊙ M_a)),  h = 1, …, H    (14)
Finally, the SDM block aggregates spatial features as follows:
X_{S,SDM}^{(l+1)}(S_t^h, M_a) = σ(W X^{(l)} A_S^{Att})    (15)
where W is the learnable weight matrix for spatial feature.
(2) Attention-mechanism-constrained spatial feature encoder (ASM): In the topological graph G a of road networks, some nodes lack direct connectivity but are accessible through multi-step paths. These indirectly connected nodes often exhibit significant correlations that the pre-defined adjacency matrix in G a fails to capture. For instance, as illustrated in Figure 2, while nodes B and D are not directly linked, congestion at node B can propagate and impact node D since nodes B and D share common resources (both are schools) or paths (reached in several steps) in the road network. Such intermediate dependencies within the road network are crucial and should not be disregarded. Thus, we employ a spatial attention mechanism [24] to capture the dynamic correlations for all nodes at each time step.
With the spatial attention matrix S t h computed in Equation (13), we first obtain the attention score as follows:
A_S^h = softmax(S_t^h)    (16)
Then the attention-constrained spatial feature extraction is calculated as:
X_{S,SAM}^{(l+1)}(Q_t^h, K_t^h, V_t^h) = cat_{n=1}^{N}(A_S^h × V_n^h),  h = 1, …, H    (17)
After constructing the SDM block and the ASM block, the two kinds of spatial features are fused to obtain the spatial representation as follows:
X_S^{(l+1)} = X_{S,SDM}^{(l+1)} + X_{S,SAM}^{(l+1)}    (18)
Finally, a residual connection is added to the aggregated features X_S^{(l+1)} in the spatial dimension as the output of the SFE-DGM.

3.2.4. Spatial–Temporal Feature Fusion Module Based on Gating Fusion Mechanism (STM-GM)

Since the traffic state at any given time step is influenced by both its historical traffic signals and the adjacent road nodes, directly predicting the traffic flow by relying solely on either the spatial or the temporal feature extraction module may not adequately capture the intricate spatial–temporal dependencies inherent in traffic flow signals. These dependencies are crucial for improving prediction accuracy. Thus, a spatial–temporal feature fusion module based on a gating fusion mechanism (STM-GM) is designed to effectively fuse the spatial and temporal features extracted in Section 3.2.2 and Section 3.2.3. As shown in Figure 1e, the proposed STM-GM module employs a gated fusion unit to effectively learn and leverage the inherent spatial–temporal relationships within the traffic data.
For the layer-l outputs X_T^{(l+1)} of TFE-HCM and X_S^{(l+1)} of SFE-DCM, the two feature tensors are fused to obtain the updated features as follows:
X^{(l+1)} = μ_t X_T^{(l+1)} + (1 − μ_t) X_S^{(l+1)}    (19)
μ_t = σ(linear(X^{(l)}) + linear((X^{(l)})^T))    (20)
where σ(·) is the sigmoid activation function, and linear(·) is a linear mapping. X^{(l)} and X^{(l+1)} are the input and output at time step t for layer l, and X^{(0)} is the initial node feature X_E of the road network G. Furthermore, since the dimensions of X_T^{(l)} and X_S^{(l)} extracted by TFE-HCM and SFE-DCM are different, X^{(l)} is first transposed before being fed into the linear layer, as shown in Figure 1e. After performing the nonlinear activation, the gated fusion weight μ_t is obtained by an element-wise addition operation as in Equation (20).
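As a toy illustration of the gated fusion in Equations (19) and (20), the sketch below blends temporal and spatial features with a sigmoid gate. For simplicity, the gate here is computed from the two feature tensors directly rather than from X^{(l)} and its transpose, and all weights are random stand-ins for the trained linear layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
T, N, D = 4, 3, 8                       # toy sizes
X_T = rng.normal(size=(T, N, D))        # temporal features from TFE-HCM
X_S = rng.normal(size=(T, N, D))        # spatial features from SFE-DCM
W1, W2 = rng.normal(size=(D, D)), rng.normal(size=(D, D))

mu = sigmoid(X_T @ W1 + X_S @ W2)       # gate in (0, 1), cf. Eq. (20)
X_fused = mu * X_T + (1.0 - mu) * X_S   # element-wise blend, Eq. (19)
print(X_fused.shape)
```

The convex combination lets the model decide, per element, how much to trust the temporal branch versus the spatial branch.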
By performing the STM-GM module for each layer l, the outputs of all layers are concatenated to obtain the fused features as follows:
Z = Concat(X^{(l)}),  l = 1, …, L    (21)
Finally, the fused feature Z is fed into a two-layer fully connected network for the final prediction as follows:
Ỹ = ReLU(FC(Z))    (22)
where ReLU(·) is the activation function, and FC(·) is a fully connected layer.

4. Experiments

To evaluate the effectiveness of the proposed STC-DGCN, three real traffic flow benchmark datasets [25], namely the PEMS04, PEMS07, and PEMS08 datasets, are used to compare the prediction performance of our STC-DGCN with some state-of-the-art traffic flow forecasting methods. Those datasets are widely used for traffic flow forecasting as they provide a good representation of diverse traffic conditions and patterns across different geographical locations. Ablation studies are conducted to extensively verify the contribution of each module of STC-DGCN on PEMS04.

4.1. Datasets and Evaluation Metrics

Datasets: As shown in Table 1, the PEMS04, PEMS07, and PEMS08 datasets contain traffic signals collected at a frequency of 30 s and then aggregated into 5 min intervals by the Caltrans Performance Measurement System (PeMS) [26]. Each sensor collects 12 traffic signals per hour, totaling 288 time steps per day. PEMS04 recorded traffic flow data from 307 sensors in the San Francisco Bay Area from 1 January 2018 to 28 February 2018, PEMS07 recorded traffic flow data from 883 sensors in the Los Angeles area from 1 May 2017 to 31 August 2017, and PEMS08 recorded traffic flow data from 170 sensors in the San Bernardino area from 1 July 2016 to 31 August 2016. Since the goal of our STC-DGCN is to predict the traffic flow, the type of traffic signal used is traffic flow.
Evaluation metrics: To evaluate the traffic flow prediction performance, three widely used metrics are adopted to measure the difference between the predicted results and true values, namely, mean absolute error (MAE), root mean squared error (RMSE) and mean absolute percent error (MAPE).
MAE = (1/N) Σ_{n=1}^{N} |Y_n − Ỹ_n|,  RMSE = √((1/N) Σ_{n=1}^{N} (Y_n − Ỹ_n)^2),  MAPE = (100%/N) Σ_{n=1}^{N} |(Y_n − Ỹ_n)/Y_n|    (23)
where Ỹ_n and Y_n are the prediction and the ground truth for sample n, and N is the number of test samples.
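The three metrics follow directly from the definitions above; the ground truth is assumed strictly positive so that MAPE is well defined.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

y = np.array([100.0, 200.0, 400.0])     # toy ground-truth flows
yp = np.array([110.0, 190.0, 400.0])    # toy predictions
print(mae(y, yp), rmse(y, yp), mape(y, yp))
```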

4.2. Experimental Settings

Following [16,18,19,27,28], we split the PEMS04, PEMS07, and PEMS08 datasets into a 6:2:2 ratio for training, validation, and testing, and Z-score normalization was applied to pre-process the traffic signals in those datasets. All experiments were conducted in Python 3.6 under the PyTorch framework with two Nvidia RTX 2080Ti GPUs. The proposed model was trained using the Adam optimizer, with MAE as the loss function. The initial learning rate is 0.004 and the number of training epochs is 200. The number of layers in the TFE-HCM and SFE-DGM modules is L = 3. The embedding dimension in the STEM module is d_f = 12. The number of attention heads in the TFE-HCM and SFE-DGM modules is H = 4. The STC-DGCN model uses T_h = 12 historical traffic signals as input to predict the traffic flow for the next hour with T_p = 12.
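The pre-processing described above can be sketched as a chronological 6:2:2 split followed by Z-score normalization. Computing the normalization statistics on the training portion only is a common convention that the paper does not spell out, so it is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy traffic series: 1000 time steps for 5 sensors.
series = rng.normal(loc=50.0, scale=10.0, size=(1000, 5))

n = len(series)
n_tr, n_va = int(n * 0.6), int(n * 0.2)   # 6:2:2 split points
train, val, test = series[:n_tr], series[n_tr:n_tr + n_va], series[n_tr + n_va:]

mean, std = train.mean(), train.std()     # stats from the training split only
z = lambda x: (x - mean) / std            # Z-score normalization
train_z, val_z, test_z = z(train), z(val), z(test)
print(len(train), len(val), len(test))
```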

4.3. Some State-of-the-Art Methods

(1)
ARIMA [5] (auto-regressive integrated moving average): an auto-regressive model that utilizes auto-regression and a moving average model for time series data prediction.
(2)
VAR [29] (vector auto-regression): an auto-regressive model that uses a statistical method to model the time correlation of traffic data.
(3)
LSTM [30] (long short-term memory): a recurrent neural network (RNN)-based traffic flow forecasting model that utilizes multi-layer LSTM and an auto-regressive model to capture time series dependency of traffic signals.
(4)
GRU-ED [31] (gated recurrent unit encoder–decoder): a variant of the LSTM model that constructs an encoder–decoder model with multiple GRU units for time series forecasting.
(5)
ST-GCN [16] (spatial–temporal graph convolution network): a graph convolution network (GCN)-based model that combines the gated 1-D convolution layer and spectral GCN to capture the spatial and temporal dependencies of traffic signals.
(6)
DCRNN [18] (diffusion convolutional recurrent neural network): a GCN-based model that adopts a seq-to-seq framework for forecasting; it combines diffusion GCN with GRU to extract the spatial and temporal dependencies of traffic signals.
(7)
GWN [19] (Graph WaveNet): an adaptive GCN-based model that combines spatial GCN and gated TCN with an adaptive adjacency matrix to capture the spatial and temporal dependencies.
(8)
AGCRN [27] (adaptive graph convolutional recurrent network): an adaptive GCN-based model that combines the adaptive graph and GRU units with a node-adaptive parameter learning module for capturing the spatial–temporal correlation of traffic flow data.
(9)
DSANet [32] (dual self-attention network): an attention-based model that utilizes a dual self-attention mechanism to capture spatial dependencies and extracts temporal correlations using a convolution with a kernel size of 3.
(10)
ASTGCN [20] (attention-based spatial–temporal graph convolutional network): an attention-based model that combines a self-attention mechanism with GCN to capture spatial–temporal dependencies for traffic data.
(11)
STG-NCDE [21] (spatial–temporal graph neural controlled differential equation): an ST-GCN-based model that combines the neural controlled differential equations with GCN for processing the time series data.
(12)
STGODE [28] (spatial–temporal graph ordinary differential equation networks): an ST-GCN-based model that utilizes ordinary differential equations to overcome the over-smooth issue in GCN, where a spatial adjacency matrix, semantic adjacency matrix, and two temporal dilation convolutions are designed to capture the long-term temporal dependencies of traffic flow data.
For each compared model described above, we either ran the released code with the parameters given in the original paper to train the model and report the resulting metrics, or directly quoted the results published in the original paper, for fairness.

4.4. Ablation Study

4.4.1. Ablation Study for STC-DGCN

To further analyze how each component module of the proposed STC-DGCN model affects the prediction results, ablation studies were conducted on the PEMS04 dataset. We have designed four variants of the STC-DGCN model to verify the influence of different components of STC-DGCN on prediction accuracy as follows:
(1)
w/o STEM: to verify the effects of STEM, we remove the STEM module, only using the original traffic data for prediction.
(2)
w/o TFE-HCM: the TFE-HCM module is replaced by the traditional TCN for capturing temporal correlation.
(3)
w/o SFE-DGM: the SFE-DGM module is replaced by a pre-defined graph structure to capture the spatial features, letting the model rely only on a fixed adjacency matrix to extract neighborhood node features.
(4)
w/o STM-GM: removing the STM-GM module and directly using the last layer output of the STC-DGCN model for traffic flow forecasting.
The prediction results of the MAE, RMSE, and MAPE metrics for each variant model are shown in Figure 3. STC-DGCN outperforms w/o STEM, indicating that the STEM module effectively encodes the periodic dynamic correlation of traffic flow data and plays an important role in improving the model’s performance. Among the variant models, the performance of w/o TFE-HCM degrades most significantly. This verifies that the designed TFE-HCM module is crucial for the model’s performance, since it effectively models the short-term and long-term dynamic temporal correlations through the devised GTF and LTF blocks with attention mechanisms. The performance also deteriorates for w/o SFE-DGM, which yields 23.62 MAE, 37.19 RMSE, and 15.66% MAPE. These results demonstrate that the SFE-DGM module likewise plays a significant role in improving the model’s performance: because it models the dynamic spatial correlation through a multi-graph structure with the SDM and ASM blocks, it overcomes the limitation of a GCN that relies only on a fixed adjacency matrix to extract spatial node features. Finally, removing the STM-GM module (w/o STM-GM) increases the MAE, RMSE, and MAPE. This further demonstrates that the STM-GM module effectively learns and leverages the inherent spatial–temporal relationships within the traffic data and is crucial for improving the model’s performance.

4.4.2. The Effect of TFE-HCM

(1)
w/o TCN: this variant uses the traditional TCN for capturing temporal correlation instead of the TFE-HCM module.
(2)
w/o GTF: the TFE-HCM module is replaced by only using the designed GTF block to capture the temporal correlation of nodes in the road network.
(3)
w/o LTF: the TFE-HCM module is replaced by only using the designed LTF block to capture temporal correlation.
As shown in Figure 4, TFE-HCM outperforms w/o TCN, w/o GTF, and w/o LTF, indicating the effectiveness of the TFE-HCM module. Since w/o TCN relies only on a TCN that uses a 1D-CNN to capture temporal features, it cannot effectively capture long-term temporal correlations due to its fixed convolution kernel, and thus performs worst among these models. Because the GTF block utilizes an attention mechanism to encode global contextual temporal features, the w/o GTF model performs better than the w/o TCN model. This verifies that the GTF block can effectively capture the global dynamic correlation of different nodes over time and helps improve the model’s performance. With attention scores measuring the importance of each node for future prediction, the LTF block can effectively aggregate local temporal features; accordingly, the MAE, RMSE, and MAPE of the w/o LTF model decrease significantly, verifying that the local temporal features of the traffic signal are also crucial to the model’s performance.
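The contrast drawn above between a fixed-kernel 1-D convolution and attention over time can be illustrated with a small sketch. This is our own simplification in NumPy with random weights, not the paper's actual GTF/LTF blocks: the convolution output at each step sees only a 3-step window, while one attention layer lets every step weight all T = 12 historical steps.

```python
import numpy as np

rng = np.random.default_rng(2)
T, d = 12, 8                          # 12 historical steps, feature dim 8
x = rng.normal(size=(T, d))           # feature sequence of a single node

# (a) 1-D convolution, kernel size 3: each output step only aggregates its
#     3-step neighbourhood, so long-range dependencies require many layers.
kernel = rng.normal(size=(3, d))
conv_out = np.array([(x[t - 1:t + 2] * kernel).sum() for t in range(1, T - 1)])

# (b) scaled dot-product self-attention: every step attends to all T steps
#     in a single layer, capturing long-term correlations directly.
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
scores = (x @ Wq) @ (x @ Wk).T / np.sqrt(d)
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)   # (T, T) weights, each row sums to 1
print(conv_out.shape, attn.shape)         # (10,) (12, 12)
```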

4.4.3. The Effect of SFE-DGM

(1)
w/o SCN: this variant uses the traditional pre-defined graph structure with spectral convolution to capture spatial features instead of the SFE-DGM module, where M a in Equation (11) measures the spatial correlation between nodes.
(2)
w/o SDM: the SFE-DGM module is replaced by only using the designed SDM block to capture spatial correlation of nodes in the road network.
(3)
w/o ASM: the SFE-DGM module is replaced by only using the designed ASM block.
The results are shown in Figure 5. SFE-DGM has superior performance compared to w/o SCN, w/o SDM, and w/o ASM. The good performance of SFE-DGM verifies that it can effectively model the spatial correlation of traffic signals and plays a significant role in improving the model’s performance. Unlike the other variants, the SFE-DGM module captures the dynamic spatial correlation through a multi-graph structure with the SDM and ASM blocks. With an attention mechanism introduced into the pre-defined graph structure to explore and quantify inter-node influence, w/o SDM overcomes the limitation of a GCN that relies only on a fixed adjacency matrix to extract spatial node features, and thus performs better than w/o SCN. The w/o ASM model employs a spatial attention mechanism to capture the dynamic correlations of all nodes at each time step, and it also performs well compared to w/o SCN.
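The idea of a data-driven graph replacing a fixed adjacency matrix can be sketched in a few lines. This is a generic adaptive-adjacency construction we assume for illustration, not the exact SDM/ASM formulation: pairwise similarities of node embeddings are normalized row-wise into edge weights.

```python
import numpy as np

# Generic adaptive-adjacency sketch (illustrative, not the paper's exact
# SDM/ASM blocks): learnable node embeddings induce a dense, data-driven
# graph instead of a fixed pre-defined adjacency matrix.
rng = np.random.default_rng(3)
num_nodes, d = 5, 4
E = rng.normal(size=(num_nodes, d))   # node embeddings (learned in practice)

scores = (E @ E.T) / np.sqrt(d)       # scaled pairwise similarity
A = np.exp(scores - scores.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)     # softmax per row -> adjacency weights
print(A.shape)                        # (5, 5); each row sums to 1
```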

4.5. Comparison with SOTA Methods

4.5.1. Results on PEMS04

We compare our STC-DGCN model with SOTA methods including statistical-based models [5,29], RNN-based models [30,31], GCN-based models [16,18], adaptive GCN-based models [19,27], attention-mechanism-based models [20,32], and differential-equation-controlled GCN-based models [21,28]. As shown in Table 2, the prediction accuracy of our proposed STC-DGCN model outperforms all of the compared models, with 18.27 MAE, 30.37 RMSE, and 12.12% MAPE on the PEMS04 dataset. The typical statistical time series forecasting models, such as ARIMA and VAR, perform worse than the neural-network-based methods; this proves that statistical models are insufficient for modeling the complex nonlinear patterns hidden in dynamic time series data. RNN-based models such as LSTM and GRU-ED perform better than the statistical methods, since they utilize RNNs to effectively model the nonlinear temporal relations in traffic data; however, they still underperform the GCN-based methods because they model only temporal features. The GCN-based methods such as ST-GCN and DCRNN show good performance, since these models combine temporal convolution with spatial GCN to model the temporal and spatial correlations of traffic data, which indicates that spatial correlations are important for traffic flow prediction. The adaptive GCN-based models, such as GWN and AGCRN, introduce an adaptive adjacency matrix into GCN together with gated temporal convolution to capture the spatial and temporal dependencies of traffic data, and they achieve better performance than the GCN-based methods. This indicates that the pre-defined graph structure in GCN is limited in capturing the rich semantic correlations between nodes, which greatly affects the model’s performance.
Compared with the attention-mechanism-based models, such as DSANet and ASTGCN, the performance of the GCN-based models is significantly inferior. This is because the GCN-based methods and the adaptive GCN-based models encounter an over-smoothing problem as the number of graph convolution layers increases, limiting their ability to model the rich spatial–temporal dependencies of the nodes in the road network. Furthermore, models relying on a TCN that uses a 1D-CNN to capture temporal features are also limited in capturing long-term temporal correlations due to the fixed convolution kernel. The differential-equation-controlled GCN-based models, such as STG-NCDE and STGODE, introduce differential equations into GCN to overcome the over-smoothing problem, and they perform better than the attention-mechanism-based models. STG-NCDE and STGODE return lower MAE, RMSE, and MAPE metrics than DSANet and ASTGCN on the PEMS04 dataset (19.21 vs. 22.79 MAE, 31.09 vs. 35.77 RMSE, and 12.76% vs. 16.03% MAPE for STG-NCDE and DSANet, respectively; 20.84 vs. 22.42 MAE, 32.82 vs. 34.25 RMSE, and 13.77% vs. 15.87% MAPE for STGODE and ASTGCN, respectively). This further validates that differential equations can effectively handle the dynamic changes of spatial–temporal time series signals, improving model performance.

4.5.2. Results on PEMS07

As shown in Table 2, our proposed STC-DGCN model achieves superior prediction accuracy compared with the SOTA models, with 19.36 MAE, 32.79 RMSE, and 8.06% MAPE on the PEMS07 dataset. The proposed STC-DGCN is a GCN-based traffic flow forecasting model that utilizes spatial graph convolution to model the spatial dependence in a road network; thus, it outperforms the statistical-based and RNN-based models, which are limited either in extracting nonlinear temporal dependence or in capturing spatial correlation between nodes in time series forecasting tasks. Additionally, since the proposed STC-DGCN model stacks the STEM module and TFE-HCM module, it can effectively encode the dynamic spatial–temporal relationships of road networks at different time steps, and thus performs better than the GCN-based models [16,18] and attention-mechanism-based models. Moreover, the designed spatial–temporal feature fusion module based on a gating fusion mechanism effectively learns and leverages the inherent spatial–temporal relationships within the traffic signals, yielding better performance than the models that treat spatial and temporal features separately (i.e., DCRNN, ST-GCN, GWN, and AGCRN). This further validates that the spatial–temporal dependencies of nodes in a road network are key to time series forecasting tasks. Although STG-NCDE and STGODE introduce differential equations into GCN to overcome the over-smoothing problem, they still model spatial relationships with a static graph structure and do not account for the dynamic changes of the time series data. Hence, these differential-equation-controlled GCN-based models perform worse than our STC-DGCN model.

4.5.3. Results on PEMS08

The prediction error metrics on the PEMS08 dataset also demonstrate that the proposed model outperforms the compared methods. Although the proposed model, ST-GCN, DCRNN, GWN, AGCRN, and ASTGCN all model spatial and temporal dependencies of traffic data, ST-GCN and DCRNN perform worse than the other methods, since GWN and AGCRN introduce adaptive graph structures and GRU units to effectively capture the dynamic spatial–temporal correlation of traffic data. Furthermore, with an additional attention mechanism to model the dynamic changes of time series data, AGCRN and ASTGCN exhibit superior performance compared with ST-GCN, DCRNN, and GWN. The proposed model not only uses the designed STEM module to effectively encode the dynamic spatial–temporal relationships of nodes in road networks, but also utilizes the TFE-HCM module, which leverages an attention mechanism to capture dynamic temporal correlations over both the long term and the short term. The SFE-DGM module is also designed to model dynamic spatial relationships through a multi-graph structure. Thus, with these modules stacked, the experimental results on PEMS08 show that our model achieves the best performance, indicating that it effectively models the spatial–temporal correlations of traffic data, though there is still room to further improve the forecasting performance.

4.5.4. Visualization of the Prediction Error with Different Time Steps

To further analyze the performance of traffic flow forecasting influenced by the length of time steps, Figure 6 shows the prediction error metrics at each step (10 min) on the PEMS04 dataset.
As shown in Figure 6, the MAE, RMSE, and MAPE metrics for all methods increase with the length of the forecasting horizon, since the forecasting task becomes more difficult, while the prediction error of the statistical-based models, e.g., VAR, increases significantly faster as the horizon lengthens. This is because statistical-based models cannot effectively model the complex nonlinear patterns hidden in dynamic time series data and are therefore insufficient for future traffic flow forecasting. LSTM and GRU-ED extract only the temporal features of traffic flow data while ignoring the spatial inter-node correlations, so the long-term prediction accuracy of these models also decreases significantly compared with the GCN-based models. Since DCRNN and GWN combine diffusion GCN and TCN to capture spatial and long-term temporal correlations, these two models perform well compared with LSTM and GRU-ED as the forecasting horizon increases. Although ST-GCN, DCRNN, and GWN capture both spatial and temporal correlations for forecasting, they use only a 1D-CNN to extract temporal correlations, ignoring that traffic flow forecasting is inherently a spatial–temporal forecasting task, and their long-term prediction performance is worse than that of attention-based models such as DSANet and ASTGCN. Since attention mechanisms can effectively model long-term temporal correlations, the MAE, RMSE, and MAPE of DSANet and ASTGCN grow slowly compared with ST-GCN and DCRNN as the time steps increase. It is worth noting that the long-term prediction performance of STG-NCDE and STGODE is superior to that of the AGCRN model: their prediction errors grow more slowly than AGCRN’s as the horizon increases. This shows that the differential equations introduced into GCN can effectively handle the dynamic changes of spatial–temporal time series signals, improving model performance.
Our STC-DGCN model effectively encodes the dynamic long-term spatial–temporal relationships of nodes for road networks with an attention mechanism and dynamic graph structure, and thus has better long-term forecasting capabilities, as shown in Figure 6.

5. Conclusions

In this paper, a spatial–temporal-correlation-constrained dynamic graph convolutional network (STC-DGCN) is proposed for traffic flow forecasting. The STEM module is devised to effectively embed the periodic traffic signal and encode the dynamic changes of spatial relationships in road networks at different time steps. The TFE-HCM module and SFE-DGM module are alternately stacked to generate dynamic graph structures, enabling the STC-DGCN model to effectively capture the dynamic spatial and temporal correlations of traffic signals. Finally, the STM-GM module is designed to effectively learn and leverage the inherent spatial–temporal relationships for traffic flow forecasting. Experimental results on three real-world traffic flow datasets demonstrate the effectiveness of the proposed STC-DGCN. In future work, we plan to decompose traffic signals into normal and abnormal components for modeling, incorporating additional factors, such as accidents and weather conditions, to improve the model’s robustness. Furthermore, we will incorporate algorithmic optimizations, such as parallel and distributed processing and model pruning, to further enhance the model’s computational capability and reduce computational load, memory usage, and processing time.

Author Contributions

Conceptualization, Y.G.; Methodology, Y.G., B.Z., F.P., Y.Z. and M.L.; Software, J.W., J.M. and M.L.; Validation, Y.G., J.W., F.P., J.M., C.Y. and Y.Z.; Formal analysis, J.W. and J.M.; Investigation, Y.G., B.Z., F.P., C.Y. and Y.Z.; Resources, J.W., C.Y. and M.L.; Data curation, B.Z., F.P. and Y.Z.; Writing—original draft, Y.G., J.M. and M.L.; Project administration, M.L.; Funding acquisition, M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by a grant from a research project of the Shaanxi Provincial Transportation Department (No. 23-90X).

Data Availability Statement

The data used in this manuscript are publicly available at https://pems.dot.ca.gov/ and include detailed instructions for using those datasets. The source code supporting this paper will be available from the corresponding author upon request.

Conflicts of Interest

Author Yajun Ge was employed by Shaanxi Transportation Holding Group Co., Ltd.; Jiannan Wang, Bo Zhang, and Fan Peng were employed by the Operation Management Branch of Shaanxi Transportation Holding Group Co., Ltd.; and Jing Ma was employed by Shaanxi Expressway Testing & Measuring Co., Ltd. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Wang, F.; Liang, Y.; Lin, Z.; Zhou, J.; Zhou, T. SSA-ELM: A Hybrid Learning Model for Short-Term Traffic Flow Forecasting. Mathematics 2024, 12, 1895. [Google Scholar] [CrossRef]
  2. Yang, H.H.; Wen, J.M.; Wu, X.J.; He, L. An Efficient Edge Artificial Intelligence Multi-pedestrian Tracking Method with Rank Constraint. IEEE Trans. Ind. Inform. 2019, 15, 4178–4188. [Google Scholar] [CrossRef]
  3. Zhao, Y.; Lin, Y.; Wen, H.; Wei, T.; Jin, X.; Wan, H. Spatial-Temporal Position-Aware Graph Convolution Networks for Traffic Flow Forecasting. IEEE Trans. Intell. Transp. Syst. 2023, 24, 8650–8666. [Google Scholar] [CrossRef]
  4. Liu, X.; Wang, W. Deep Time Series Forecasting Models: A Comprehensive Survey. Mathematics 2024, 12, 1504. [Google Scholar] [CrossRef]
  5. Smith, B.L.; Williams, B.M.; Oswald, R.K. Comparison of parametric and nonparametric models for traffic flow forecasting. Transp. Res. C Emerg. Technol. 2002, 10, 303–321. [Google Scholar] [CrossRef]
  6. Chandra, S.R.; Al-Deek, H. Predictions of freeway traffic speeds and volumes using vector autoregressive models. J. Intell. Transp. Syst. 2009, 13, 53–72. [Google Scholar] [CrossRef]
  7. Lippi, M.; Bertini, M.; Frasconi, P. Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning. IEEE Trans. Intell. Transp. Syst. 2013, 14, 871–882. [Google Scholar] [CrossRef]
  8. Johansson, U.; Boström, H.; Löfström, T.; Linusson, H. Regression conformal prediction with random forests. Mach. Learn. 2014, 97, 155–176. [Google Scholar] [CrossRef]
  9. Zheng, Z.; Su, D. Short-term traffic volume forecasting: A k-nearest neighbor approach enhanced by constrained linearly sewing principal component algorithm. Transp. Res. Part C Emerg. Technol. 2014, 43, 143–157. [Google Scholar] [CrossRef]
  10. Liu, A.; Zhang, Y. Spatial–Temporal Dynamic Graph Convolutional Network with Interactive Learning for Traffic Forecasting. IEEE Trans. Intell. Transp. Syst. 2024, 25, 7645–7660. [Google Scholar] [CrossRef]
  11. Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; Wang, Y. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transp. Res. Part C Emerg. Technol. 2015, 54, 187–197. [Google Scholar] [CrossRef]
  12. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. In Proceedings of the 2014 NIPS Workshop on Deep Learning, San Diego, CA, USA, 8–11 December 2014; pp. 1–9. [Google Scholar]
  13. Cui, Z.; Ke, R.; Pu, Z.; Wang, Y. Stacked bidirectional and unidirectional LSTM recurrent neural network for forecasting network-wide traffic state with missing values. Transp. Res. Part C Emerg. Technol. 2020, 118, 102674. [Google Scholar] [CrossRef]
  14. Yao, H.; Tang, X.; Wei, H.; Zheng, G.; Li, Z. Revisiting spatial temporal similarity: A deep learning framework for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5668–5675. [Google Scholar]
  15. Jiang, W.W.; Luo, J.Y.; He, M.; Gu, W. Graph Neural Network for Traffic Forecasting: The research progress. ISPRS Int. J. Geo-Inf. 2023, 12, 100. [Google Scholar] [CrossRef]
  16. Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; ACM: New York, NY, USA, 2018; pp. 3634–3640. [Google Scholar]
  17. Weng, W.C.; Fan, J.; Wu, H.F.; Hu, Y.J.; Tian, H.; Zhu, F.; Wu, J. A Decomposition Dynamic graph convolutional recurrent network for traffic forecasting. Pattern Recognit. 2023, 142, 109670. [Google Scholar] [CrossRef]
  18. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018; pp. 1–16. [Google Scholar]
  19. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Zhang, C. Graph wavenet for deep spatial-temporal graph modeling. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence IJCAI, Macao, China, 10–16 August 2019. [Google Scholar]
  20. Guo, S.; Lin, Y.; Feng, N.; Song, C.; Wan, H. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 922–929. [Google Scholar]
  21. Choi, J.; Choi, H.; Hwang, J.; Park, N. Graph neural controlled differential equations for traffic forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 6367–6374. [Google Scholar]
  22. Li, Z.; Ren, Q.; Chen, L.; Sui, X.; Li, J. Multi-Hierarchical Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting. In Proceedings of the 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 21–25 August 2022; pp. 4913–4919. [Google Scholar]
  23. Lea, C.; Flynn, M.D.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal convolutional networks for action segmentation and detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 156–165. [Google Scholar]
  24. Liu, Z.; Lin, Y.T.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002. [Google Scholar] [CrossRef]
  25. Song, C.; Lin, Y.; Guo, S.; Wan, H. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 914–921. [Google Scholar]
  26. Chen, C.; Petty, K.; Skabardonis, A.; Varaiya, P.; Jia, Z. Freeway performance measurement system: Mining loop detector data. Transp. Res. Rec. 2001, 1748, 96–102. [Google Scholar] [CrossRef]
  27. Bai, L.; Yao, L.; Li, C.; Wang, X.; Wang, C. Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting. In Proceedings of the 34th Neural Information Processing Systems (NIPS), San Diego, CA, USA, 6–12 December 2020; pp. 17804–17815. [Google Scholar]
  28. Fang, Z.; Long, Q.; Song, G.; Xie, K. Spatial-Temporal Graph ODE Networks for Traffic Flow Forecasting. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Singapore, 14–18 August 2021; pp. 364–373. [Google Scholar]
  29. Zivot, E.; Wang, J. Vector autoregressive models for multivariate time series. In Modeling Financial Time Series with S-PLUS®; Springer: New York, NY, USA, 2006; pp. 385–429. [Google Scholar]
  30. Cui, Z.; Ke, R.; Pu, Z.; Wang, Y. Deep bidirectional and unidirectional lstm recurrent neural network for network-wide traffic speed prediction. arXiv 2018, arXiv:1801.02143. [Google Scholar]
  31. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. arXiv 2014, arXiv:1409.3215. [Google Scholar] [CrossRef]
  32. Huang, S.; Wang, D.; Wu, X.; Tang, A. Dsanet: Dual Self-attention Network for Multivariate Time Series Forecasting. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 2129–2132. [Google Scholar]
Figure 1. Framework of the proposed STC-DGCN.
Figure 2. An example of a road network.
Figure 3. Ablation experiment results for each component of the proposed STC-DGCN.
Figure 4. Ablation experiment results for each component of the TFE-HCM.
Figure 5. Ablation experiment results for each component of the SFE-DGM.
Figure 6. Prediction error with different time steps on PEMS04.
Table 1. Dataset details.

Dataset          PEMS04             PEMS07            PEMS08
Nodes            307                883               170
Time steps       16,992             28,224            17,856
Time interval    5 min              5 min             5 min
Start time       1 January 2018     1 May 2017        1 July 2016
End time         28 February 2018   31 August 2017    31 August 2016
Type             traffic flow       traffic flow      traffic flow
Area             San Francisco      San Francisco     San Bernardino
Table 2. Comparison of our model with SOTA methods using benchmark datasets.

Models                                                PEMS04                 PEMS07                 PEMS08
                                                      MAE    RMSE   MAPE/%   MAE    RMSE   MAPE/%   MAE    RMSE   MAPE/%
Statistical-based models              ARIMA           33.73  48.80  24.18    38.17  59.27  19.46    31.09  44.32  22.73
                                      VAR             24.54  38.61  17.24    50.22  75.63  32.22    19.19  29.81  13.10
RNN-based models                      LSTM            26.81  40.74  22.33    29.71  45.32  14.14    22.19  33.59  18.74
                                      GRU-ED          23.68  39.27  16.44    27.66  43.49  12.20    22.00  36.22  13.33
GCN-based models                      ST-GCN          21.16  34.89  13.83    25.33  39.34  11.21    17.50  27.09  11.29
                                      DCRNN           22.74  36.58  14.75    23.64  36.52  12.28    18.18  28.18  11.23
Adaptive GCN-based models             GWN             24.89  39.66  17.29    26.39  41.50  11.97    18.28  30.05  —
                                      AGCRN           19.83  32.26  12.97    22.37  36.55  9.12     15.95  25.22  10.09
Attention-mechanism-based models      DSANet          22.79  35.77  16.03    31.36  49.11  14.43    17.14  26.96  11.32
                                      ASTGCN          22.42  34.25  15.87    21.22  34.10  9.05     15.07  24.80  9.51
Differential-equation-controlled      STG-NCDE        19.21  31.09  12.76    20.53  33.84  8.80     15.45  24.81  9.92
GCN-based models                      STGODE          20.84  32.82  13.77    22.59  37.54  10.14    16.81  25.97  10.62
Our model                             STC-DGCN        18.27  30.37  12.12    19.36  32.79  8.06     14.32  23.74  9.34
