4.1.1. Semi-Supervised Road Flow Estimation Model Based on Graph Convolutional Network
In recent years, graph-based semi-supervised learning has found a wide range of applications in areas such as social network analysis, recommender systems, and cybersecurity. It uses the structure of relationships between data (represented as graphs) to improve the performance of learning algorithms, especially when labeled data are scarce. Moreover, Kipf and Welling [
37] proposed a simple and effective graph convolutional network (GCN) for semi-supervised representation learning. GCN can extract representative data features by automatically learning feature information and structural information of graph data simultaneously [
38]. It can achieve high accuracy with only a small amount of labeled data, which is currently one of the best choices for semi-supervised learning tasks. In this paper, a semi-supervised path flow estimation model based on GCN is developed, as shown in
Figure 5. We abstract each path in the arterial as a node. The yellow nodes in
Figure 5 indicate path samples with real traffic flow labels, and the blue nodes indicate path samples without real traffic flow labels. The connected edges between nodes represent the correlations between paths, expressed as an adjacency matrix
(i.e., the structural information of the graph). In the input layer, we use the connected vehicles’ current and historical period flow information of each path (including labeled path samples and unlabeled path samples) as input features. In the hidden layer, based on the structure of the graph, the information about the path itself and the information about the neighboring path nodes are aggregated to generate a new representation of each path node. Finally, at the output layer, we can obtain the estimated traffic flow for each path. By comparing the estimated and true flows of the labeled path samples, we can calculate the estimated loss of the model. Backpropagation optimization of the model parameters is performed based on the estimated loss, and the optimized model can be used to estimate the traffic flow of unlabeled path samples. In the following, we specify the details of the GCN-based semi-supervised path flow estimation model.
The graph structure of an arterial with
paths can be denoted as
.
and
denote the adjacency matrix of the graph structure information and the feature matrix of dimension
, respectively. Here, the graph structure information of adjacency matrix
is used to reflect the relationship between arterial paths. We denote the adjacency matrix
with the following:
where
Since the graph is an undirected graph, the adjacency matrix
is a symmetric matrix (triangular matrix). The feature matrix
is denoted as:
denotes the connected vehicle flow observed in the current time period and last time periods of path . Here, the feature dimension . and are the data in the input layer of the graph convolutional network.
According to the propagation rule for graph convolutional networks, the propagation rule for the
pth layer is as follows:
The mapping function
by the graph convolutional network learning is the output feature representation of the
th layer of the graph convolutional network,
.
, where
is the unit matrix, and
is the adjacency matrix add the self-cycling, indicating that the feature information of the paths themselves is taken into account in the graph convolutional calculation.
denotes the degree matrix of
,
.
is the
th layer weight network parameter of the graph convolutional network. By propagating through the layers, each node has access to more higher-order information. Taking the example of a two-layer graph convolutional network, the model can be represented as:
where
. Neighborhood aggregation and information propagation occurs via two-layer graph convolution, and
is the estimated output value of the model (i.e., it is an estimate of the real flow for each path in the arterial). As shown in
Figure 5, due to the sparsity of AVI detection devices, we only have the real observed flows for some of the paths (the paths represented by nodes 2,
, and
in
Figure 5). Therefore, we only use the labeled path simples to compute the model loss. The loss of the GCN model is denoted as:
where
is the set of labeled path simples, and
is the computed loss value between the estimated and true values of the path flow.
4.1.2. Multi-Knowledge Graphs Construction
As there are specific correlations between different paths in arterials [
36], with respect to different transportation and domain knowledge, such correlations can be multifaceted (e.g., topological connectivity relationships, temporal pattern similarity, non-functional dependencies, etc.). It is difficult to adequately represent these complex associations using a single-graph structure. Therefore, we may need to build multiple knowledge-based graph structures to express these associative relationships. Multi-graph-based learning methods can mine more potential information to model reasonable contextual relationships [
39]. Due to the consideration of more potential information, multi-graph-based learning method approaches are usually superior to single-graph-based approaches.
We define three knowledge-based graphs to represent different types of correlations between different paths in the arterials and fuse the multi-knowledge graphs by relational graph convolutional network (RGCN). The nodes in the three graphs represent the paths in the arterial, and the edges of the adjacency matrix represent the topological connectivity, temporal similarity, and potential correlation among the paths, respectively. The representation and specific details for each knowledge graph are shown below.
(1) Topology Connectivity Graph
Figure 6 shows a simple example of a signalized arterial with two intersections, lending to the illustration of our problem. To model the physical topology of paths in a signalized artery for path
and
that originate from the same entrance, since they have the same origin, we define that there is a departure topology connectivity between them when the weight of the adjacent edges between them is 1, i.e.,
. For example, for the paths starting from entrance 2 in
Figure 6, the weights of the adjacent edges between them are all 1. Similarly, for the paths m and n arriving at the same entrance, since their destinations are the same, we define that there exists an arrival topology connectivity between them, in which case the weights of the adjacent edges between them are 1. For example, for the paths arriving at entrance 2 in
Figure 6, the weights of the adjacent edges between them are both 1. For the paths m and n, which have neither a departure topology connection nor an arrival topology connection, the weights of the adjacent edges between them are 0, i.e.,
. To summarize, for the topology connection graph, for any two paths in the arterial, the adjacent edge weight between them is defined as:
(2) Temporal Similarity Graph
There may be similar time series patterns between different paths in an arterial. To better capture the temporal similarity between paths, we use a similarity assessment algorithm to calculate the temporal similarity between paths. The traditional Euclidean distance can only reflect the scalar distance between two points, i.e., the numerical distance between two time series, and cannot be used to accurately describe the degree of similarity between the shapes of two time series curves. The dynamic time warping (DTW) method [
40] can accurately describe the similarity of time series curves by warping the time axis for matching. Given the historical flow time series
and
of any two paths in the arterial, of lengths
and
, respectively. Using the Euclidean distance to calculate the distance between any two points in the time series
and
, i.e.,
The distance matrix
:
The DTW algorithm is to find a single continuous path that minimizes the cumulative distance from the upper left corner to the lower right corner. A smaller DTW indicates a higher similarity between the historical flow time series of the two paths. Therefore, we construct a time similarity graph based on the DTW values between paths as their adjacent edge weights. In the temporal similarity graph, for any two paths
and
in the arterial, the adjacent edge weight between them is defined as:
Since the adjacency matrix requires that the higher the temporal similarity between paths, the higher the weight of the adjacency edges (the closer to 1). Therefore, to make the adjacency matrix
more computationally tractable, we introduce an exponential function
to map the values of DTW to the interval range [0, 1]. We define this as follows:
When the temporal similarity of the two paths is higher, this means that the value of
tends to be 1 for smaller values of
. When the temporal similarity of the two paths is smaller, which means that the value of
tends to be 0 for larger values of
. We use
as the new adjacent edge weight, i.e.,:
(3) Potential Correlation Graph
The two predefined knowledge-based graphs may not be enough to capture all possible correlations between paths, and there could be hidden correlations that cannot be explicitly represented. Therefore, we need to further obtain the potential correlation between signalized arterial paths. Traditional correlation measures such as Pearson and Spearman can only obtain the linear correlation between paths and have additional requirements on the data distribution (e.g., Pearson’s algorithm requires that the data obey a normal distribution). The potential correlation between paths is complex, and there is not only a linear correlation. Therefore, the above algorithms cannot meet our requirements. Compared with other correlation measures, the maximal information coefficient (MIC) algorithm can widely measure the dependency relationships between variables, such as linear, non-linear relationships, and even non-functional dependencies (such as dependencies consisting of more than one function) that cannot be represented by a single function. Therefore, we use
to define the potential correlation between two arterial paths, which takes the range of [0, 1]. A larger
value indicates a higher potential correlation between two paths. When the potential correlation between the two paths is higher, the value of
is larger, and the value of
tends to be 1. When the potential correlation between the two paths is lower, the value of
is smaller, and the value of
tends to be close to 0. Therefore, we construct the potential correlation graph based on the
values between paths as their adjacent edge weight. In the potential correlation graph, for any two paths
and
in the arterial, the adjacent edge weight between them is defined as:
4.1.3. Multi-Knowledge Graph Integration Based on Relational Graph Convolutional Network
In the previous section, we built multiple knowledge-based graph structures based on multiple associative relationships, and in this section, we use a relational graph convolutional network (RGCN) [
41] to integrate multiple knowledge-based graphs in order to learn a unified feature representation. In RGCN, nodes first aggregate neighboring node features from a single graph and then fuse the node features aggregated on multiple graphs again for a fused representation.
For different relation types of multiple graphs (e.g.,
Figure 7), RGCN does the following:
In the
th layer of the convolution, we use
to denote the linear transformation function of the knowledge graph
. For each of the three knowledge graphs we propose, each of them has its own linear transformation function, which is responsible for transforming the features of the neighboring nodes on the edges of the corresponding relationship graph:
where
denotes the state feaatures of the
th layer of path
.
is the set of knowledge graphs.
is the normalization weight coefficient which is responsible for assigning weights to each knowledge graph with a value of the number of knowledge graphs.
denotes the set of neighboring nodes of path
in the knowledge graph
, and
is the activation function.
According to Equation (19), after aggregating the node features of different knowledge graphs, R-GCN also needs to add the features of its own nodes, and finally, the node features after fusing multiple knowledge graphs can be obtained through an activation function. We established a path flow estimation model based on a multi-knowledge graph, and the model structure is shown in
Figure 8. Firstly, we transport the input features into a topological connectivity graph, temporal similarity graph, and potential correlation graph to extract the features. Then, we fuse the features of multiple graphs based on the multi-layer RGCN. For the results of multi-graph fusion, we transport them into a fully connected network to finally obtain the estimated traffic flow of each path. When training the model, we back-propagate the error by computing the estimated loss of the model only from the real traffic labels of some of the observed paths. Thus, the model training process is semi-supervised.