1. Introduction
With the continuous development of intelligent transportation technology, the Electronic Toll Collection (ETC) gantry system has been widely used on highways [
1,
2]. As of the end of 2022, a total of 28,328 physical gantries and 37,947 dedicated ETC lanes have been established in China, with nearly one billion ETC transactions recorded daily [
3]. However, since the system involves a large number of devices and data transmission, the occurrence of system failures may lead to problems such as traffic congestion and loss of fees. Therefore, the efficient dynamic anomaly detection of ETC gantry can help maintenance personnel to quickly detect and solve system anomalies, ensuring the normal operation of the ETC gantry system, which is of great significance, and can also provide a data basis and guarantee for a series of research [
4,
5,
6,
7] on expanding the application of smart highways based on ETC transaction data. The original ETC gantry anomaly detection mainly uses manual inspection or remote detection. However, these methods have some problems: the human resources’ investment is large, and traditional abnormality detection mainly relies on the experience of personnel, making it prone to omission or misdiagnosis, and it is difficult to meet the requirements of real-time operation and accuracy. Highway companies in various provinces and cities in China have also carried out a series of studies on the intelligent operation and maintenance of ETC gantries. Zheng et al. [
8] improved and optimized the modules of the system control cabinet. This provides a safe and stable operating environment for the ETC gantry system in terms of front-end sensing, storage, power supply, and communication. Jin et al. [
9] briefly described intelligent detection and operation and maintenance, focusing on two aspects, big data fusion analysis and detection, and operation and maintenance full life-cycle management, in order to improve the detection, operation, and maintenance business capacity of highways. Wei et al. [
10] developed a toll system key power, network cloud detection, and warning system; the system can raise alarms for abnormalities caused by the interruption of the gantry cable or the tripping of any switch.
The operation and maintenance of the gantry system presented above are based on the indicators of the hardware equipment, which are to detect faults and then realize the operation and maintenance management of the gantry. However, this method is rather one-sided and only analyzes the local anomalies in the system power and network. Moreover, most use experience and theory to design the operation and maintenance system, which have poor reliability and cannot effectively meet the needs of gantry operation and maintenance in a timely manner. Therefore, this paper summarizes the research work in the field of equipment anomaly detection in recent years and, on this basis, combines the characteristics of highway ETC transaction data, and proposes a highway ETC gantry anomaly dynamic detection method based on the combination of spatio-temporal characteristics. The main contributions are specified as follows:
The paper proposes, for the first time, the use of gantry transaction data for gantry anomaly detection, which provides a comprehensive assessment of gantry anomalies at the macro-level and defines four categories of gantry anomalies of different grades, and this categorization system allows for the continuous observation of potentially anomalous risky gantries.
The method uses a combination of a Graph Convolutional Network (GCN), contextual attention mechanism, and Gate Recurrent Unit (GRU) to not only consider the spatial distribution of gantries, but also capture the temporal dynamic properties of gantry transaction data. The time-series characteristics of the gantries are fully considered, leading to better detection and updating of the gantries’ status.
This paper carries out a large number of experiments using real ETC data, and achieves more than 99% accuracy in all kinds of evaluation indexes, showing a better performance than other baseline models, and the model shows good performance with different data volumes, which proves the effectiveness of this paper’s method.
The rest of the paper is organized as follows:
Section 2 provides an overview and summary of related work in the field of anomaly detection.
Section 3 presents a problem description and related definitions for the dynamic detection of gantry anomalies.
Section 4 presents the proposed model and methodology.
Section 5 conducts detailed experiments on the developed model and presents the experimental results and the performance of the proposed model. Finally,
Section 6 summarizes the work of this paper and provides an outlook for future work.
2. Literature Review
ETC gantry is a system of multiple hardware devices integration, which mainly contains front-end devices, such as industrial controllers, power adapters, RSU antennas, license plate recognition devices, and back-end devices, such as servers and switches. Current research on hardware device anomaly detection is broadly categorized into three main groups: signal processing methods, multivariate statistics-based methods, and machine-learning-based methods.
Among the methods of signal processing, time-frequency analysis and sparse filtering can improve the performance of fault diagnosis models [
11,
12,
13,
14]. Li [
15] analyzed the spectrum of the signal by applying dimensionality reduction to the spectral image obtained by fast Fourier transform for fault diagnosis. The proposed wavelet transform overcomes the problem of localized characteristics and is used in fault diagnosis at a non-smooth multiscale, but cannot further decompose signals in high-frequency bands. Among the multivariate statistical analysis methods, Principal Component Analysis (PCA) has an excellent advantage for data dimensionality reduction processing. Liang et al. [
16] proposed an anomaly detection method based on generalized mutual entropy principal element analysis and applied it to the Tennyson–Eastman process for anomaly detection; the method shows good performance in handling non-Gaussian anomaly detection, with low false-alarm and miss rates. Machine learning methods have received more and more attention in anomaly detection and diagnosis techniques due to their good self-learning, recognition, and classification capabilities. Shao et al. [
17] proposed an AM-Relief feature selection using an integration (support vector machine (SVM)) diagnostic algorithm; the proposed method increases the fault diagnosis accuracy from 83.0% to 98.9%, providing a new idea for research on the mechanical fault diagnosis of high-voltage circuit breakers. Yang et al. [
18] proposed a bearing fault diagnosis model based on multi-sensor information fusion for aero-engine bearings; the accuracy of this method is improved by 36.92% compared to the classification and identification of faults using SVM. Liu et al. [
19] proposed a train speed anomaly detection method that combines extreme gradient boosting and anomaly verification methods; the method has high performance and meets the requirements of the real-time detection of train operations. However, the use of the XGboost model in the research methodology may have some limitations, such as a strong dependence on parameter settings, which may require more tuning and validation. The equipment anomaly detection methods mentioned above are mainly applied to industrial processes, machinery and equipment. They have not been fully applied in the anomaly detection of ETC gantry systems and are limited by the limitations of each system device, which can only be detected individually through the design of specific algorithms for each device.
The transaction data generated by the ETC gantry system contain time-series properties, and the topology consisting of vehicle trajectories extracted from the transaction data has graph network structure properties. Gantry anomalies can cause missing transaction data, which manifest as point anomalies in time series anomalies and structural anomalies in topological road networks, so gantry anomalies can be investigated using dynamic graph anomaly detection methods based on time series and the structural properties of road network structural characteristics graph network structure characteristics. Dynamic graph anomaly detection is mainly categorized into isolated individual anomaly detection, anomaly group detection and event anomaly detection.
The main types of dynamic graph-oriented anomalous individual detection are node anomalies, anomalous edges, and anomalous moments. Ranshous et al. [
20] proposed a multidimensional count min sketch (CM-Sketch) approach to maintain the count of edges, which mainly focuses on the detection of anomalies on the edge flow, modeling the dynamic network as edges arriving over time and detecting the anomalous edges on this basis. However, empirical metrics are used, making the method less able to generalize. Eswaran et al. [
21] considered edges connecting sparsely connected regions occurring in emergency situations as anomalous edges and compared the connectivity between nodes around the edges before and after the addition of edges, such that edges with increased connectivity have higher anomaly values. However, this method requires extra space to maintain the sample and does not detect exceptions in the general case. Dynamic graph anomaly population detection requires consideration of the time dimension, and different areas of concern for the type of anomalous groups are different, such as those that can be analyzed in traffic or used to analyze anomalous groups in a social network. Zhang et al. [
22] proposed the FGN detection method based on graph neural network to resolve the problem of abnormal user detection in social networks. The article proposes HNR, a graph node representation learning method incorporating hyperbolic geometry, which reduces the training complexity and improves the information transfer process, preserving higher-order neighbor information. However, the HNR in the article only utilizes the node’s own attributes and does not make full use of the neighbors’ attribute information, meaning that it has a limited characterization learning ability. Event anomaly detection is a method of anomaly detection that looks for points in time series data that are different from most other points in time, and then delves deeper by using static graphs of network structure data for the points in time. Neural network anomalous event detection is widely used, especially the graph neural network, which is developed for deep learning on data with a graph structure, which play an important role in many directions [
23,
24,
25]. Liu et al. [
26] proposed the TADDY framework to represent a dynamic graph as a series of discrete graph snapshots, and defined the anomaly detection problem as one that provides an anomaly score for each edge at each timestamp. However, the overall training efficiency of the method is low, and it is not friendly to large-scale dynamic graphs. Ding et al. [
27] proposed AnoGLA, a network anomaly detection framework based on graph neural networks and long- and short-term memory networks. The framework views network traffic as a subgraph, extracts graph structural features using GCN, learns traffic variation patterns over time using long- and short-term memory networks, and finally generates anomaly scores for each traffic flow through end-to-end training using a negative sampling training framework. However, the paper only considered a small number of attributes for node coding, did not fully explore the node information, and was not effective for attribute-rich datasets.
In summary, the current research related to the abnormal detection of ETC gantry equipment is still relatively shallow; most of the research is more focused on engineering technology, and there is still a lot of room for academic research. Graph-based dynamic anomaly detection has shown great advantages in the field of big data, is capable of dealing with heterogeneous structures and time dependencies in dynamic networks, and is now widely used in such fields as finance, Internet security, social relationship mining, and telemarketing fraud detection. However, no scholars have applied the method to the dynamic detection of gantry equipment to date. Therefore, to address the shortcomings of the current research related to the abnormal dynamic detection of gantries, as well as the characteristics of the gantry transaction data and the properties of the graph network model, this paper proposes the GCN-GRU gantry anomaly dynamic detection method based on the contextual attention mechanism.
3. Relevant Definitions and Problem Descriptions
Definition 1. (highway segment QD): Two adjacent gantries on a highway form a QD, with gantry as the start node of the QD and as the end node:
Definition 2. (highway section LD): Highway segments consist of multiple adjacent QDs:where the starting node of is called the starting point of the section, the terminating node of is called the end point of the section, and the end point of the previous QD is the starting point of the next QD. Definition 3. (Freeway Road Network LW): All segments of the LD within the freeway study area comprise the freeway road network:
Definition 4. Effective transaction topology: In Figure 1, Figure 1a shows the local LW, and Figure 1b–d are the three sets of locally effective transaction topologies. In Figure 1b there are two ordered trading sections, <K, H, I> and <K, H, G, F, E>. As shown in Figure 1e, if the same license plate number appears in the transaction data of both gantry G and gantry E in the time series, but there is no same license plate number in gantry F, this means that gantry F has a transaction failure. Definition 5. Gantry Anomaly: A pattern of trading in which the gantry exhibits significantly different trading behavior from normal trading behavior during a specific time window. Based on the continuity of anomaly occurrence, gantry anomalies can be subdivided into the following four categories.
Burst Anomaly (BA): refers to multiple discrete transaction failures for different vehicle transactions on the gantry within a time window. Defined as more than B discrete transaction failures on the gantry within a time window T, where B is a predetermined threshold.
Mild Continuous Anomaly (MCA): refers to a time window in which consecutive different vehicles on the gantry have failed to trade, where is a predetermined threshold.
Moderate Continuous Anomaly (MoCA): refers to a time window in which there are consecutive different vehicles on the gantry that have failed to trade, where .
Severe Continuous Anomaly (SCA): refers to a time window in which or more consecutively different vehicles on the gantry have failed to trade, where .
In this paper, we distinguish the severity of anomalies by defining the thresholds B, , , and for the number of failed transactions, and capture the temporary or persistent nature of the anomaly based on the time window T.
Problem description:
Combining the topology of the highway network, the gantry nodes are taken as the nodes of the graph, the gantry transactions are taken as the edges of the graph, the dynamic detection of the gantry anomalies is converted to the dynamic anomaly detection of the graph nodes, and based on the four types of anomalies defined above, the problem is finally transformed into the dynamic anomaly detection based on the node classification task. The set of n time slices is divided according to the temporal characteristics of the transaction data and the zone travel time, the set of edges E of the subgraph is the transactions between the gantries, E is the set of edges of the subgraph, and the adjacency matrix of the graph G is denoted as . Using the information of the transaction data as attribute features of the nodes, the feature matrix of the graph stream is denoted as , where P is the dimension of the attribute characterizing the node. Then, the feature graph at time step t is denoted as ; the embedding of the node at time step t in is denoted as . The anomaly scoring function is used to compute the anomaly scores S of the node at different time steps, and finally, based on the anomaly scores and the threshold c, whether the node v contains the anomaly in Definition 5 is determined.
4. Methodology
The gantry anomaly dynamic detection algorithm proposed in this paper mainly includes three parts: graph convolutional neural network, attention mechanism, and GRU, and the overall framework is shown in
Figure 2. The algorithm first uses the GCN model to model the transaction data and adjacency of the gantry, learns the connection relationship between the gantry nodes and the influence of the transaction data through the GCN, and fuses the node features with the graph structure to generate a new node representation. Then, time-series modeling is performed with the GRU model to capture the time-dependence of the same gantry transactions, which allows for dynamic adjustment of the hiding state and the detection of various anomalies through the computation of reset gates and update gates. Finally, the classification results are obtained through the fully connected layer combined with the SoftMax function.
4.1. Graph Convolutional Neural Network Module
The main focus of this study is on gantry anomalies, which are primarily characterized by missed transactions. As defined in
Section 3, it is evident that the determination of gantry missed transactions is influenced by the transaction history of neighboring gantry nodes. Furthermore, ETC gantries on highways exhibit varying topological relationships in different sections. It is clear that the mutual influence between ETC gantries with different topological relationships will vary accordingly. If we can effectively extract and utilize the topological relationships between ETC gantries, the detection of gantry anomalies will be more accurate.
Conventional Convolutional Neural Networks (CNNs) can only handle spatial data with regular Euclidean structures and are incapable of processing irregular non-Euclidean spatial data [
28]. Therefore, the GCN model was proposed to handle non-Euclidean spatial data. The spatial distribution of ETC gantries on highways constitutes a non-Euclidean geometric structure. Hence, we employ the GCN model to model the spatial distribution of ETC gantries on highways and extract gantry node features.
To begin with, we construct a transaction graph for each time interval using the gantry transaction data, where gantries are considered as nodes, and if there is a transaction between the same vehicle and two gantries, then there is an edge between those two gantries. Assuming we have a transaction dataset Tr = (
vi,
ci,
ti), where vi represents the gantry number, ci represents the license plate number, and t
i represents the transaction time. The gantry transaction graph is represented as a graph G = (V, E), where there are n gantry nodes, and m transaction edges between gantries. Here, V = {
v1,
v2, …,
vn} represents the set of gantry nodes, and E = {(
vi,
vj,
wk)} represents the set of edges extracted from the transaction dataset D, where
vi and
vj denote adjacent gantry numbers. The gantry transaction network using an adjacency matrix:
where
indicates whether there is a transaction edge between gantry nodes i and j. If a transaction edge exists, then
; otherwise,
. Construct the gantry transaction graph according to Algorithm 1:
Algorithm 1: Time Slice Transaction Graph Construction Method |
Input: Gantry topology dataset, vehicle transaction dataset; Output: Graph data 1. # Define the number of gantry nodes and the feature vector dimension 2. num_nodes = 6, feat_dim = 32 3. node_list, edge_list = extract_node_and_edge_info(RSU_data, OBU_data) 4. # Construct the gantry node feature vectors 5. node_feats = {} 6. for i in range(num_nodes): 7. node_feats[i] = np.zeros(feat_dim) 8. # Define the adjacency matrix of a graph 9. adj_mat = np.zeros((num_nodes, num_nodes)) 10. # Read gantry transaction data 11. transactions = read_transactions() 12. # Iterate over all transaction data and update the adjacency matrix and node feature # vectors 13. for trans in transactions: 14. node_id = trans[‘node_id’] # Get node numbers 15. node_feats[node_id][trans[‘feat_index’]] = trans[‘feat_value’] # update the node feature vectors 16. for adj_node_id in trans[‘adj_nodes’]: # Iterate over neighboring nodes 17. adj_mat[node_id][adj_node_id] = 1 # Updating the adjacency matrix 18. # Building a gantry trade chart 19. graph = Graph(adj_mat, node_feats) |
In the above gantry transaction graph construction process, this paper uses the normalization method [
29] to process the adjacency matrix so that it maintains symmetry in information transfer. The symmetric normalization of the adjacency matrix is performed as follows:
where
is the symmetrically normalized adjacency matrix, and during the process of capturing adjacent node features,
prevents the current node’s feature information from being unable to propagate.
is the degree matrix, where
represents the degree of gantry node
, i.e., the number of edges connected to it.
After generating the adjacency matrix as well as the feature matrix , of the time-slice transaction graph, we learn the representation of the graph using GCN. For each time slice, the neighbor information of the node is aggregated in the output vector. The GCN can be trained by a back propagation algorithm and the relationships between node features are learned in the network.
Taking a time-slice transaction graph as an example, the GCN computes the embedding vectors of each node by performing a weighted average of the features of the neighboring nodes. This process can be viewed as a series of convolution operations. For each node i, its embedding vector can be expressed as:
where
is the set of neighboring nodes of node
,
is the normalization factors of the edges between nodes
and
,
is the parameter matrix from layer
to layer
,
is the activation function ReLU.
As shown in
Figure 3, each node in the figure represents the highway ETC gantry, which demonstrates that the GCN model captures the features of the current node and its neighboring nodes through a convolutional layer, and ultimately outputs the current node’s state matrix
that aggregates information of the neighboring nodes,
represents the L-layer GCN, and the specific convolutional process of
is as follows:
4.2. Attention Mechanism
After obtaining the information of the aggregated neighbor nodes of each time transaction graph, in order to capture the dynamic information of the transaction graphs of different time slices, in this paper, we propose a localized proximity time window state, construction based on a contextual attention mechanism, as shown in
Figure 4. In this attention mechanism, each time step t is represented as three aspects:
The query vector is used to represent the focus or information of interest at the current time step . It captures the feature representation of the current time step and is used as a query for similarity calculation with the keys of other time steps.
The key vector is used to represent the information of other time steps. It contains the target for similarity computation with the query vector for the current time step . The key vector can be regarded as a representation of other time steps for comparison with the query of the current time step.
The value vector contains the feature representation or information for each time step. It comprises a sequence of vectors used to compute the attention weights. The value vector can contain important information related to the transaction graph.
At each time step t, the similarity between the query vector and the key vector is usually computed using the dot product, and additive attention (using fully connected layers) method. After the similarity is calculated, it is normalized by softmax function to obtain the attention weight ; these weights indicate the most relevant position to the query at the current position t. Weighted summation of a sequence of value vectors using attention weights yields the context vector . contains information about the query vector at the current time step t, which is used as a feature representation of the current time step to capture information about the dynamics of the transaction graph during a locally proximate time window.
Based on the above framework principle of the attention mechanism,
corresponds to Query in the graph, which is used to map the approaching time window state to the feature of attention at the current location.
corresponds to
Key in the graph denoting the approaching time window state of the current node
at time step
t, which is used to compute the relevance degree. The result Value processed by the mapping function represents the approaching time window state of the current node
at time step
. The feature representation obtained after processing through the mapping function
is used to generate the context representation. The specific calculation process is as follows:
where
denotes the hidden state of the i-th node.
is the size of the window for capturing the proximity time-slice, which determines how many hidden states from the previous time-steps are to be considered to construct the proximity state of the current node.
is the hidden state of multiple time-slice of the
-th node that is mapped to a new vector by a mapping operation for the subsequent attentional weight computation.
and
denote the parameters that are used to optimize the context-based attention mechanism parameters of the optimization based contextual attention mechanism.
is a vector of attentional weights computed by softmax function in vector
. This attentional weight is used to represent the importance of the hidden states of different time slices.
4.3. Dynamic Update Mechanism Based on GRU
In order to realize the dynamic detection of gantry anomalies, we need to capture the transaction characteristics of the gantry topology in different time slices, and the anomalies of the gantry will also change over time. GRU is a kind of Recurrent Neural Network (RNN) that solves the problems of Long Short Term Memory (LSTM) dependence and gradient vanishing in the back-propagation, and is similar to replacing the forgetting gates in LSTM. The input gates in LSTM are replaced by update gates, which makes the GRU network have fewer parameters and improves the training efficiency to a large extent. Therefore, in this paper, we use GRU to model the sequence changes, and capture the node information of the neighboring time slices to the current time slice, as well as the node long-term dependence information. The structure is shown in
Figure 5.
GRU contains two gating units: reset gate (
) can be combined with tanh to change the original information (upper position output and this position input) to obtain the result after processing the original information. Update gate (
) will contain the result after processing the reset gate selected by the update gate, and the weight of the upper position output is summed up to obtain the current position output. The specific computation of the GRU process is as follows:
where
is the activation function
,
is the state representation of the gantry node at the current time slice t, obtained by the GCN model aggregation in
Section 4.1,
is the state representation of the node’s approaching time window, obtained by the contextual attention mechanism in
Section 4.2, Equation (16) denotes the current time-heavy storage information, and Equation (17) denotes the calculation of the update gates
for the current time, the hidden state of the approaching time window
, and the weighted average of the three current pieces of time-weighted storage information
. Finally, the hidden state matrix
of the node at timestamp t is computed.
Based on the above three subsections, we obtained the state matrix
for node t of the time slice and, according to the problem definition, we categorized the n node states in the matrix into five categories, N (normal), BA, MCA, MoCA, and SCA, i.e., C = 5, and computed the score for each node:
where
is the hidden state of the
-th node,
W is the category weight matrix
,
is the weight vector of the
-th category, and
is the bias term of the
-th category. for each node
, the Softmax function is used to compute the score on the category to which it belongs, and to obtain the probability distribution of each category.