Next Article in Journal
Personalized Fair Split Learning for Resource-Constrained Internet of Things
Previous Article in Journal
Application of AI for Short-Term PV Generation Forecast
Previous Article in Special Issue
Multi-Classification and Tree-Based Ensemble Network for the Intrusion Detection System in the Internet of Vehicles
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Dynamic Tensor Modeling for Missing Data Completion in Electronic Toll Collection Gantry Systems

1
School of Transportation, Southeast University, Nanjing 211189, China
2
Joint Research Institute on Internet of Mobility, Southeast University and University of Wisconsin-Madison, Southeast University, Nanjing 211189, China
*
Author to whom correspondence should be addressed.
Sensors 2024, 24(1), 86; https://doi.org/10.3390/s24010086
Submission received: 14 November 2023 / Revised: 19 December 2023 / Accepted: 21 December 2023 / Published: 23 December 2023
(This article belongs to the Special Issue Regeneration Control, Sensing and Digital Twin of Eco-Environment)

Abstract

:
The deployment of Electronic Toll Collection (ETC) gantry systems marks a transformative advancement in the journey toward an interconnected and intelligent highway traffic infrastructure. The integration of these systems signifies a leap forward in streamlining toll collection and minimizing environmental impact through decreased idle times. To solve the problems of missing sensor data in an ETC gantry system with large volumes and insufficient traffic detection among ETC gantries, this study constructs a high-order tensor model based on the analysis of the high-dimensional, sparse, large-volume, and heterogeneous characteristics of ETC gantry data. In addition, a missing data completion method for the ETC gantry data is proposed based on an improved dynamic tensor flow model. This study approximates the decomposition of neighboring tensor blocks in the high-order tensor model of the ETC gantry data based on tensor Tucker decomposition and the Laplacian matrix. This method captures the correlations among space, time, and user information in the ETC gantry data. Case studies demonstrate that our method enhances ETC gantry data quality across various rates of missing data while also reducing computational complexity. For instance, at a less than 5% missing data rate, our approach reduced the RMSE for time vehicle distance by 0.0051, for traffic volume by 0.0056, and for interval speed by 0.0049 compared to the MATRIX method. These improvements not only indicate a potential for more precise traffic data analysis but also add value to the application of ETC systems and contribute to theoretical and practical advancements in the field.

1. Introduction

Electronic Toll Collection (ETC) systems signify a remarkable advancement in vehicular management, enabling swift identification and seamless toll transactions. ETC gantry data are characterized by their large volume, diverse modalities, rapid acquisition, coexistence of genuine and erroneous data, and rich potential for transportation applications. These systems are primarily classified into two categories: toll plazas equipped with dedicated ETC lanes and overpass gantries [1,2]. Notably, China has taken a pioneering role in this technological transformation since 2019, systematically phasing out provincial boundary toll stations in favor of the widespread deployment of ETC gantries [3]. This bold initiative has led to the establishment of a robust network, supported by over 230 million ETC users, marking the dawn of a new era in toll collection [4].
ETC systems represent a significant technological advancement in traffic management, characterized by high-quality, extensive coverage and the widespread accessibility of data. These data, renowned for their comprehensiveness, play a crucial role in enabling precise traffic analyses [5,6,7]. However, operating such advanced systems comes with a unique set of challenges, primarily influenced by factors such as system equipment performance, the installation of image automatic capture and recognition systems, weather and lighting conditions, and the proper functioning of equipment [8].
These factors often lead to common issues within ETC gantry system, such as unrecorded license plates, misidentifications, and delayed data uploads [9]. Unrecorded license plates can result from various factors, like poor visibility, suboptimal camera angles, or technical glitches in recognition software, significantly impacting the accuracy and reliability of the data, which, in turn, affect toll collection and traffic monitoring processes. Misidentification of license plates often occurs because of limitations in the optical character recognition technology, especially under challenging conditions, such as low light, adverse weather, or high vehicle speeds. Such errors can lead to incorrect toll charges, causing inconvenience to users and operational complexities for the authorities. Additionally, delayed data uploads pose a significant challenge in systems designed for real-time or near-real-time processing. Delays in data transmission can disrupt the efficient functioning of traffic management systems, especially in situations in which immediate data availability is critical for decision making, like in traffic congestion management or emergency response scenarios. Another noteworthy challenge is the significant variation in distances between ETC gantries, which creates substantial obstacles for seamless traffic monitoring. This irregular spacing can lead to inconsistencies in data collection, with shorter distances potentially resulting in redundant data capture and longer distances risking the omission of important traffic pattern changes.
The complexities and challenges inherent in ETC gantry systems have become pivotal areas for research and innovation. Addressing these challenges necessitates the development of sophisticated solutions that can effectively manage these complexities and enhance the overall efficacy of ETC systems. Our study aims to contribute to this field by introducing a novel approach that not only addresses the technical challenges but also significantly improves the operational aspects of ETC systems, ensuring more accurate and reliable traffic management.
We introduce high-order tensor modeling methods for ETC hub data, as well as ETC hub data completion methods based on improved tensor decomposition techniques. The structure of this article is as follows: We briefly summarize the research results of previous researchers in Section 2. In Section 3, we introduce the high-order tensor model we adopted, including the high-order tensor model for ETC hub data and the data completion method based on improved tensor decomposition technology. In particular, a detailed introduction to high-order tensor models for ETC hub data is provided in Section 3.1, including subtensor block models and extensions of tensor block models. Section 3.2 explains the ETC hub data completion method based on improved tensor decomposition techniques, including a dynamic tensor decomposition model and a sparse control Laplace matrix. Section 4 provides our experimental results and analysis, including the preprocessing of ETC license plate data, the accuracy evaluation, and the computational complexity evaluation. Section 5 summarizes the main findings and contributions of this study and discusses future research directions.

2. Related Works

A comprehensive analysis of prior research will enable us to identify the challenges and gaps that this study seeks to address, making an exploration of related work essential. Facing these challenges, missing data completion has emerged as a vital research frontier. Established methods encompass statistical, deep learning, and tensor-based approaches. On the basis of statistical methods, regression analysis methods using techniques such as linear regression, polynomial regression, and regression trees, are suitable for situations in which there are few missing data but that do not capture the complex relationships among data [10,11,12]. Clustering methods can group the samples in a dataset and then use other samples within the same cluster to estimate the missing values, but clustering methods need to pre-specify the number of clusters, which is limited for high-dimensional data analysis [13]. Methods based on moment decomposition include Singular Value Decomposition (SVD) and Nonnegative Matrix Factorization (NMF), which are widely used in recommender systems and image processing. However, SVD and NMF may be sensitive to outliers, may not work well when dealing with sparse matrices, and are usually used for numerical data, not for categorical data [14]. Bayesian methods allow for the introduction of uncertainty into the estimation of missing values and provide estimates of probability distributions. Markov Chain Monte Carlo (MCMC) and Bayesian networks are part of the Bayesian approach. However, in the case of high-dimensional and large-scale data, the computational complexity of Bayesian methods increases and requires a priori knowledge to specify the probability distribution [15]. Deep learning techniques, such as Recurrent Neural Networks (RNNs) and transformers, have made significant progress in data completion tasks. They can handle complex data patterns and long-term dependencies. However, complex machine learning models may require more computational resources and a large amount of labeled data for training, which, in some cases, may not be easily accessible [16,17,18,19,20]. For tensor-based complementation methods applied to high-dimensional tensors, the increase in dimensionality may lead to dimensionality catastrophe, i.e., the data become very sparse, increasing the probability of missing data points, increasing the difficulty in the complementation problem, leading to an increase in uncertainty in the results of the complementation, and the model suffers from overfitting or difficulty in generalizing to new data [21,22,23].
The intricacy of traffic flow data, with its multifaceted correlations and patterns, necessitates analytical approaches that can adeptly manage their complexity. Tensor models have risen to prominence in this context, offering a robust framework for capturing high-order correlations and multimodal data interactions inherent in traffic systems [24,25]. These models are particularly valued for their ability to exploit the intrinsic low-rank properties of traffic data, thus enhancing the precision and interpretability of the analyses. A significant contribution in this domain has been made by Chen et al., who utilized Gaussian regular polynomial decomposition to unravel the latent structures within the data, applying Bayesian networks to fill in the gaps of incomplete datasets [15]. Their work stands out for integrating probabilistic models with tensor decomposition, providing a powerful tool for data restoration and uncertainty quantification in traffic flow analysis. Expanding upon these foundational models, subsequent research has delved into dynamic tensor flow models that aim to capture the spatiotemporal dynamics of traffic flow. These models are sophisticated, as they are designed to analyze data across both macroscales, which consider long-term trends and cyclic patterns, and microscales, which focus on the minute-by-minute fluctuations of traffic flow [26,27,28]. By doing so, they provide a granular view of traffic dynamics, offering insight into the temporal progression and spatial distribution of traffic congestion, vehicle speeds, and density. However, the application of these models is not without challenges. One critical issue is the computational and storage inefficiencies that arise from the redundant calculations within tensor windows, which often result in increased processing time and memory requirements [29]. This redundancy is particularly problematic when dealing with large-scale traffic datasets, where the efficiency of computation can significantly impact the timeliness and usability of the analysis.
To address these inefficiencies, there has been a push toward optimizing tensor calculations, such as employing more sophisticated tensor decomposition techniques that reduce redundancy and accelerate computation. These techniques include the use of block term decomposition, which partition the tensor into smaller, more manageable blocks, and the implementation of parallel computing strategies that distribute the workload across multiple processors [30]. Furthermore, advancements in hardware, such as the use of GPUs for tensor operations, have also contributed to alleviating the computational burden [30]. In summary, while existing tensor models have laid a solid foundation for traffic data analysis, there is a continual need for innovation to overcome the computational challenges associated with these complex models. Our research contributes to this field by proposing a novel tensor model that not only captures the high-order correlations and spatiotemporal dynamics of traffic flow but also improves computational efficiency through optimized decomposition techniques and algorithmic enhancements.
To combat the aforementioned deficiencies, particularly with large volumes of ETC gantry data, our study introduces an innovative method for missing data completion that leverages an enhanced dynamic tensor decomposition technique. This method not only acknowledges the high-dimensional nature of ETC gantry data but also pioneers the use of a high-order tensor model, refined through an improved tensor Tucker decomposition process. The main contributions of our study are manifold: (1) A dynamic tensor flow model was proposed for decomposing the high-order tensor model of ETC gantry data after analyzing the characteristics of ETC gantry data. (2) The issue of high sparsity in ETC gantry data was addressed through the introduction of an adaptive tensor window to analyze incremental tensor blocks in dynamic tensor flow. Furthermore, an improvement was made to the incremental processing method of the dynamic tensor flow by incorporating a Laplacian matrix. (3) An approximate computation approach was adopted to handle the decomposition of neighboring tensor blocks within the same group, leveraging the characteristics of the gantry data, leading to the computational speed enhancement of the high-order tensor Tucker decomposition.

3. Materials and Methods

3.1. High-Order Tensor Model for ETC Gantry Data

ETC gantry data on highways exhibit multifaceted characteristics, including complex structural patterns, rapid update frequencies, and varied density values across different data types. To address these complexities, this section introduces a high-order tensor model framework specifically designed for ETC gantry data. This framework aims to filter out attributes that do not pertain to traffic analysis and to streamline the dimensionality of the data. The extraction methodology is depicted in Figure 1.
In Figure 1, we provide an overview of our proposed high-order tensor model for processing ETC gantry data. The figure illustrates the architecture of the model, which consists of two main components: the high-order tensor model for ETC gantry data on the left and the traffic data calculation process based on the model on the right. The left component encapsulates ETC gantry data in the form of tensor blocks, indicating the multidimensional properties of data collected from various gantry structures. This is further decomposed into subtensor blocks for detailed analysis and processing. The right component shows the sequence of our improved dynamic tensor decomposition model, which starts from a high-order tensor model and is decomposed through dynamic tensor flow decomposition, combined with the Laplace matrix for sparsity control, and reaches its climax through the incremental approximate decomposition of the tensor blocks. The interaction between the subtensor blocks and improved dynamic decomposition model is the core of achieving precise traffic data computation, which is the ultimate goal of our research.

3.1.1. Multidimensional Tensor Construction for ETC Gantry Data Analysis

We construct a three-dimensional tensor model that intricately integrates time, gantry space, and vehicle user dimensions. We measure the time dimension in seconds, setting the minimum counting unit at one second, and tailoring the parameters to reflect the dynamics of traffic flow accurately. The statistical duration determines the time dimension’s granularity. The gantry space dimension captures the driving direction, gantry number, and functionality, providing a spatial context for our analysis. We define the vehicle user dimension through parameters such as license plate information, vehicle model, and transaction status, ensuring a comprehensive categorization. Our tensor model meticulously records vehicle passage times, offering granular insight into the temporal flow, while the spatial dimension is marked by specific gantry identifiers. However, the model’s expansion across time, space, and vehicle diversity leads to increased computational complexity and tensor size—a challenge compounded by significant sparsity. Large periods and areas lacking vehicle transactions result in an unnecessarily inflated tensor, complicating data storage and computation. A three-dimensional tensor model of the ETC gantry data is shown in Figure 2.
Figure 2 visually represents the three-dimensional tensor model for ETC gantry data. In this model, the x-axis represents time, segmented into units of 4 s each, capturing the temporal aspect of traffic flow. The y-axis corresponds to the gantry ID, delineating the specific location of each gantry along the highway. The z-axis indicates the presence or absence of vehicles in different lanes at any given time. Distinct colors represent different lanes for ease of interpretation: orange for the first lane, yellow for the second, and green for the third. This color-coding aids in visualizing the distribution and movement of traffic across various lanes. The diagram exemplifies the model’s capacity to encapsulate detailed information about vehicle flow, including temporal and spatial dynamics. However, it also highlights the challenge of tensor inflation due to large periods and areas lacking vehicle transactions, which can complicate data storage and computational processes.
To address these issues, our analysis employs innovative techniques to compress the tensor effectively. These methods enable us to discern significant patterns within the sparse dataset, preserving the fidelity of the temporal and spatial details. Our approach ensures that the model remains both manageable and reflective of the intricate patterns within traffic data.

3.1.2. ETC Gantry Tensor Block and Subtensor Block Models

This study introduces gantry tensor block models to encapsulate the traffic statistics of a gantry over one-minute intervals. These models facilitate streamlined calculations through a methodically organized sequence. In alignment with high-order tensor models, these blocks adopt time, space, and vehicle user dimensions as their fundamental structure. Each time dimension within a tensor block corresponds to 1 min, comprising 60 distinct time components. The spatial dimension is bifurcated into 2 key factors: gantry number and gantry malfunction. The vehicle user dimension mirrors that of the high-order tensor model, ensuring consistency. Crucially, the process of dimensionality reduction applied to ETC gantry data retains the intrinsic properties of the original high-dimensional dataset within a more manageable, low-dimensional framework. For practical implementation, attributes such as ETC gantry number, capture time, and license plate identification were meticulously selected to construct a subtensor block model, as follows:
X L P R R   I t i m e × I s p a c e × I c a r I t i m e = t p i c t i m e I s p a c e = s i d I c a r = c l i c e n s e
Herein, X L P R represents the subtensor block of the ETC gantry plate recognition data, encompassing the capture time, I t i m e = t p i c t i m e ; gantry number, I s p a c e = s i d ; and identified license plate, I c a r = c l i c e n s e .
Similarly, the ETC gantry transaction data are synthesized into a subtensor block model, effectively enabling the assessment of transactional patterns:
X t r a d e R   I t i m e × I s p a c e × I c a r I t i m e = t t r a d e t i m e I s p a c e = s i d I c a r = c l i c e n s e , c t y p e , c m a t c h
Herein, X t r a d e represents the subtensor block of the ETC gantry transaction data, encompassing the capture time, I t i m e = t t r a d e t i m e ; gantry number, I s p a c e = s i d ; and vehicle license plate recognition information, vehicle type, and transaction outcome, I c a r = c l i c e n s e , c t y p e , c m a t c h .

3.1.3. Extension of the Subtensor Block Models to the Tensor Block Model

The subtensor block models are integrated into a higher-order tensor space; components within the same dimension and of identical order are merged, preserving the unique orders. The amalgamation of subtensor blocks is executed as follows:
A R I × J × K I = i 1 , i 2 , i 3 ,   J = j 1 , j 2 , j 3 , K = k 1 , k 2 B R I × J , I = i 1 , i 2 , i 4 , J = j 1 , j 2 f : A × B C ,   C R I × J × K I = i 1 , i 2 , i 3 , i 4 , J = j 1 , j 2 , j 3 , K = k 1 , k 2
Equation (3) delineates the method for merging subtensor blocks. In this context, A R I × J × K and B R I × J represent tensor blocks with different dimensions, where A R I × J × K has tensor dimensions I = i 1 , i 2 , i 3 , J = j 1 , j 2 , j 3 ,   a n d   K = k 1 , k 2 , and B R I × J has tensor dimensions I = i 1 , i 2 , i 4   a n d   J = j 1 , j 2 . A R I × J × K and B R I × J are superimposed to form tensor C R I × J × K with dimensions I = i 1 , i 2 , i 3 , i 4 , J = j 1 , j 2 , j 3 ,   a n d   K = k 1 , k 2 .
By expanding each subtensor block model, we construct a tensor block model, which is then sequentially organized along the timeline to form a composite, higher-order tensor model. Parameters of identical order within each dimension constitute the foundational attribute parts, while those of varying orders are categorized as extensible attribute parts, facilitating a modular approach to model construction.

3.1.4. Traffic Data Calculation Based on the High-Order Tensor Model

Traffic data extraction was performed through calculations using a high-order tensor model. Figure 3a depicts this model, which utilizes 30 min traffic statistics from two ETC gantries, each with three lanes. In this model, the vehicle types within tensor blocks are color-coded, enhancing the model’s visual interpretation. This tensor model is adaptable to various spatial component transformations—including expansion, reflection, shear, projection, and rotation—enabling complex, multidimensional traffic data calculations.
For instance, Figure 3b illustrates the calculation of traffic data for lane 2 at the G005032001000110010 gantry. By segmenting a slice of the tensor model for this specific gantry, the model yields comprehensive traffic information for lane 2. This slice encompasses tensor blocks representing different time intervals and crucial traffic parameters, such as traffic volume, average time headway, vehicle type distribution, and space occupancy rate. Additionally, the process of retrieving vehicle-level information is demonstrated in Figure 3c. By entering the license plate details of a specific vehicle, the model directly locates the corresponding tensor block. Subsequently, traffic parameters, including interval speed and time headway for that vehicle, are accurately computed.

3.2. ETC Gantry Data Completion Method Based on Improved Tensor Decomposition

This study introduces an Improved Dynamic Tensor Decomposition (IDTD) model, specifically designed to extract high-dimensional and dynamic traffic features with enhanced accuracy and efficiency. To counter the sparsity issue prevalent in high-dimensional data, a Laplacian matrix was employed to regulate parameter sparsity. Furthermore, approximate calculations were conducted to ascertain the tensor core and factor matrices, significantly reducing computational overhead.

3.2.1. Improved Dynamic Tensor Decomposition Model

The sparse nature of traffic data presents significant challenges. The “vehicle user-time stamp” information matrix typically exhibits low density, which becomes increasingly problematic as the volume of highway vehicles and the frequency of traffic record timestamps surge, thereby diminishing the information density related to vehicles. The addition of a spatial dimension exacerbates this sparsity. Our approach involves analyzing tensor flow within a stipulated timeframe and introducing a tensor window to the data tensor flow. The tensor flow for a fixed period with a tensor window size of w is denoted as:
D t , w = X T w + 1 , , X T X t R I 1 × I 2 × , . . . , I N + 1
As depicted in Figure 4a, the tensor flow X t R I 1 × I 2 × , . . . , I N , 1 t T performs local processing using a sliding tensor window of size w . Moreover, dynamic traffic flow information is captured using the dynamic tensor model. The decomposition of the Nth-order tensor flow X t R I 1 × I 2 × , . . . , I N , 1 t T , is articulated as:
X t G t × 1 U 1 × 2 U 2 × × N U N + E t G t R J 1 × J 2 , . . . , × J N
where U n represents the factor matrix corresponding to mode n (the principal component of mode n ); G t is the core tensor, encapsulating independent features that articulate the interactions among various pattern principal components; and U 1 , U 2 , a n d   U 3 are the factor matrices for modes 1, 2, and 3, respectively. The constituents of the core tensor stream, G t , mirror the interplay between principal components and the temporal dynamics of this tensor series. The intricacies of tensor flow decomposition are exemplified in Figure 4b.

3.2.2. Laplacian Matrix for Sparsity Control

The dynamic tensor flow decomposition model for ETC gantry data is illustrated in Figure 5. This model postulates that a segment of vehicle passage information is represented by the tuple c , e , v , t , where the passage information v for vehicle c at gantry e is recorded in period t , ( t = 1 , . . . , T ) . The ETC gantry data, structured as a time-series, form a dynamic tensor sequence X t R n c × n e × n v , where n c is the count of vehicles passing within the chosen time, n e denotes the quantity of ETC gantries, and n v represents the volume of passage data points. Leveraging Tucker decomposition, the dynamic tensor sequence decomposes as:
X t Y t × c C t × e E t × v V t
where Y t R r c × r e × r v is the core tensor sequence encapsulating dynamic behavioral patterns that describe the interplay among vehicles, gantries, and passage information. It signifies the likelihood of the occurrence for passage information from group j v at group j e gantries for vehicles belonging to group j c before time t .
The matrix C t R n c × r c serves as the vehicle factor matrix up until time t , where C t i c , j c indicates the probability that the i c t h vehicle belongs to the j c t h vehicle group.
The matrix E t R n e × r e is the factor matrix for the ETC gantries before time t , and E t i e , j e illustrates the likelihood that the i e t h gantry belongs to the j e t h gantry group.
The matrix V t R n v × r v represents the factor matrix for passage information until time t , and V t i v , j v denotes the probability that the i v t h passage belongs to the j v t h passage information group.
The tensor model for ETC gantry traffic data is inherently intricate. While large tensor windows yield strong classification capabilities, they concomitantly escalate computational complexity [31]. To maintain the precision of tensor approximation decomposition within homogenous tensor groups, we introduced variable tensor window sizes. The interrelation among discrete tensor blocks was quantified using the Pearson correlation coefficient r i j , setting the tensor window size predicated on the threshold r i j > 0.95 . The Pearson correlation coefficient and the corresponding tensor window size are computed as follows:
r i j = C O V X i , X j σ X i σ X j
w i = j = 1 T sgn ( r i j ) sgn ( r i j ) = 1 , r i j 0.95 0 , r i j < 0.95
Figure 6 illustrates the range of maximum and minimum tensor window values, which vary according to different correlation coefficients. To minimize the computational load while preserving the integrity of the tensor decomposition approximations, this study synthesized the established discriminative threshold of the Pearson correlation coefficient with the concrete context of the ETC gantry data [32]. A correlation coefficient exceeding 0.95 r i j > 0.95   i s indicative of a strong correlation, prompting the adjustment of the tensor window size to this stringent standard.
To address the challenges posed by high data sparsity, the Laplacian matrices L c , L e ,   a n d   L v were employed, quantifying the similarity among individual entities and their collective groups, namely, passing vehicles, ETC gantries, and transit information. The Laplacian matrix is defined as:
L = D W
where D is the degree matrix, and W is the similarity matrix. The i , j t h matrix element represents the similarity between the i t h and j t h entities. The similarity degrees for vehicles, gantries, and transit data are tailored according to parameters like car model, geographic proximity, and interval headway time. Direct comparisons among entities are facilitated by numerical differentials, with vehicle model specifics calibrated by the conversion factor detailed in Table 1.
The elements of the similarity matrix, W , along each column are aggregated to yield n values, and these n values are then positioned on the main diagonal to construct the degree matrix, D , as an n × n diagonal matrix, leaving the nondiagonal entries at zero. The derived Laplacian matrices, L c , L e ,   a n d   L v , are symmetric and positive semi-definite, each with its minimum eigenvalue being zero. This characteristic is instrumental for the execution of incremental approximate decomposition of the tensor block, simplifying the computational process involved in the analysis of the ETC gantry data.

3.2.3. Calculation of the Incremental Approximate Decomposition of the Tensor Block

The ETC gantry data exhibit a substantial spatial correlation. Given the minor variations within neighboring tensor blocks, these correlations can be harnessed to conserve computational resources. The sparse increment at time t , denoted by Δ X t = X t + 1 X t , facilitates a more efficient dynamic tensor decomposition.
  • Initialization of the Core Tensor and Constraint Terms:
Utilizing Tucker decomposition, we acquire the initial tensor X 1 , alongside the constraint term L m | m = 1 M R n m × n m , within the M-dimensional tensor sequence X t R I 1 × I 2 × . . . × I m .
2.
Initial Tensor Covariance Matrix Calculation:
At time t = 1 , the covariance matrix for the m t h dimension of the initial tensor X 1 is defined as:
C 1 m = X 1 m X 1 m T + μ m L m
where X 1 ( m ) is the unfolded matrix of X 1 along the m t h dimension, and μ ( m ) is the weight for L m | m = 1 M R n m × n m .
3.
Factor Matrix and Core Tensor Calculation:
The factor matrix U 1 m R n m × r m comprises the leading r m eigenvectors of the covariance matrix, C 1 m | m = 1 M . The core tensor Y 1 R r 1 × × r M is computed correspondingly. This procedure is outlined in Table 2.
4.
Update of the Factor Matrix U t + 1 m
The revised factor matrix, U t + 1 m , is realized through the calculation of the update of the eigenvalue, λ t + 1 , i m , and the eigenvector, u t + 1 , i m , as C t m = X t m X t m T + μ m L m , whose eigenvalue is λ t , i m with an eigenvector of u t , i m ; the size of the change in X t is Δ X t .
λ t + 1 , i m = λ t , i m + Δ λ t , i m
u t + 1 , i m = u t , i m + Δ u t , i m
Bringing the expressions for λ t + 1 , i m and u t + 1 , i m into C t + 1 m and omitting t simplifies the equation.
[ ( X m + Δ X m ) ( X m + Δ X m ) T + μ m L m ] ( u i m + Δ u i m ) = ( λ i m + Δ λ i m ) ( u i m + Δ u i m )
By focusing only on first-order variable terms, we simplify to:
X m X m T Δ u i m + ( X m Δ X m T + Δ X m X m T ) u i m + μ m L m Δ u i m   = λ i m Δ u i m + Δ λ i m u i m
The orthogonality of eigenvectors permits the characterization of the changes in Δ u i m using the current eigenvectors:
Δ u i m j = 1 r m α i j u j m
where α i j is the constant to be calculated and substituted to obtain:
( X m X m T + μ m L m ) j = 1 r m α i j u j m + X m Δ X m T + Δ X m X m T u i m   = λ i m j = 1 r m α i j u j m + Δ λ i m u i m
The above equation can be further simplified by multiplying the left-hand side of the equation by u k m T , and the above equation can be further simplified:
λ k m α i k + u k m T ( X m Δ X m T + Δ X m X m T ) u i m = λ i m α i k
α i k = u k m T ( X m Δ X m T + Δ X m X m T ) u i m λ i m λ k m
The eigenvectors satisfy orthogonal unitization, which yields the following:
( u i m + Δ u i m ) T ( u i m + Δ u i m ) = 1 1 + 2 ( u i m ) T Δ u i m + ο ( | | Δ u i m | | 2 ) = 1
Eliminating the higher-order regression term leads to α i i = 0 .
Δ u i m = j i u j m T ( X m Δ X m T + Δ X m X m T ) u i m λ i m λ j m u j m
5.
Updating the Core Tensor, Y t + 1 R r 1 × . . . × r M
The core tensor can be updated according to the updating of the factor matrix, U t + 1 m R n m × r m , and the calculation process for the core tensor Y t + 1 R r 1 × . . . × r M is summarized in Table 3.

3.2.4. ETC Gantry Data Completion Based on Improved Tensor Dynamic Decomposition

Restoration of missing ETC gantry data leverages the correlation among inter-tensor block traffic data. We approximate the kernel tensor for missing data by analyzing the decomposition of surrounding tensor blocks.

3.2.5. Description of the ETC Gantry Data Completion Issue

It is assumed that there exists a higher-order tensor block, X R I 1 × I 2 × I 3 , with missing values and a set of observable data within X ,   w h i c h is Φ [ I 1 ] × [ I 2 ] × [ I 3 ] . Given that x i j k i , j , k Φ , the elemental values outside the observable dataset should be calculated as rapidly as possible while ensuring accuracy. Therefore, the observable dataset within X is projected onto each spatial dimension of the tensor block as follows: if ( i , j , k ) Φ , then the observed value of element P Φ ( x ) is x i j k ; otherwise, the elemental value is zero.
Ρ Φ X = x i j k , i , j , k Φ 0 , else
where P Φ is the projection operator.

3.2.6. Construction of the Objective Function

The method adopted for ETC gantry data completion, rooted in Tucker decomposition, addresses the associated optimization problem.
f G , U 1 , U 2 , U 3 = argmin 1 2 Ρ Φ G × 1 U ( 1 ) × 2 U ( 2 ) × 3 U ( 3 ) 2 2
To minimize the function, the core tensor G and factor matrices U ( 1 ) , U ( 2 ) ,   a n d   U ( 3 ) are the factor matrices to be completed. The current function can be minimized by obtaining the core tensor and factor matrix.

3.2.7. Solution to the Objective Function

  • Initial Solution:
Taking into account the properties of adjacent tensor blocks, the initial solution for the objective function is estimated using the tensor core and factor matrices, represented as G 0 , U 0 ( 1 ) , U 0 ( 2 ) ,   a n d   U 0 ( 3 ) .
2.
Optimization of Solutions:
The solution is refined through iterative computation of the approximate gradients for each variable. This iterative process ensures the optimal update of the objective function, guided by the following update rule:
G k = argmin G f G k 1 , U ~ , G G k 1 + L G 2 G G k 1 F 2 + λ i G 1 = max 0 , G k 1 1 L G G f ( G k 1 , U ~ ) λ G L G
U k ( i ) = argmin U ( i ) f ( G ~ , U ~ ( j i ) , U ( i ) ) , U U k 1 ( i ) + L U ( i ) 2 U U k 1 ( i ) F 2 + λ i U ( i ) 1 = max 0 , U k 1 ( i ) 1 L U ( i ) U ( i ) f ( G ~ , U ~ ( j i ) , U ( i ) ) λ i L i
In the above equations, G f G k 1 , U ~   a n d   U ( i ) f ( G ~ , U ~ ( j i ) , U ( i ) ) are the gradients of f ( G , U ( 1 ) , U ( 2 ) , U ( 3 ) ) to G and U ( i ) , respectively, and L G and L U ( i ) are the Lipschitz constants of the corresponding gradient function, respectively, satisfying the Lipschitz conditions.
f G k 1 1 , U ~ f G k 1 2 , U ~ L G G 1 G 2 F , G 1 , G 2
f G ~ , U ~ j i , U k 1 i 1 f G ~ , U ~ j i , U k 1 i 2 L U i U i 1 U i 2 F , U i 1 , U i 2
The flow of the ETC gantry data completion calculation, based on tensor decomposition, is presented in Table 4.

4. Results

4.1. ETC Gantry Plate Data Preprocessing

Our empirical analysis centered on a segment of the Shanghai–Chongqing Expressway, specifically the G50 Nanxun to Jingze Interchange (G50 Nanjing section). Over the 30 days from 1–30 January 2022, we collated 24 sets of ETC gantry data via the toll system on this stretch. These datasets comprise both static information (including vehicle, toll station, and pass card details, alongside toll system specifics) and dynamic records (detailing travel, toll transactions, and system validations). This study focused on the investigation and analysis of ETC gantry plate recognition data for a specific road section. Three main types of anomalies in ETC gantry plate recognition data were identified: incorrect plate recognition data, missing plate recognition data, and duplicate plate recognition data. The causes and forms of these anomalies are outlined in Table 5:
Directly applying abnormal ETC gantry plate recognition data in the computation of traffic parameters can severely disrupt model results and analyses. The preprocessing of abnormal plate recognition data in this section includes the identification of ETC gantry abnormal plate recognition data based on classification concepts, preliminary repair of abnormal plate recognition data, and evaluation of the integrity of ETC gantry plate recognition data.
Initially, by establishing a normal plate recognition data model using standards and norms, one can predetermine the characteristics of normal data within the data table. This includes correct data formats/rules and distribution models for each attribute. The ETC gantry plate recognition data table is then checked against these standards, including data format, degree of fit within distribution models, repetition of vehicle passage information through each gantry, and vehicle travel direction. Differences between the tested data and the normal data characteristics are used to identify anomalies.
In the second step, the prematch of abnormal ETC gantry plate recognition data involves the correction of erroneous, duplicate, and missing data through deletion, expansion, completion, and transformation, thus enhancing the data usability. As indicated above, different processing methods can be selected based on the cause and form of the anomalies. For incorrect plate recognition data, identified errors are deleted from the standard data table and then reclassified as missing data, moving into the repair process for missing plate recognition data. The issue of missing data is resolved using automated completion and repair methods based on auxiliary analysis data tables, ETC gantry toll transaction tables, and ETC gantry abnormal event logs, which are manually processed, to ensure the accuracy of the data and assist in the completion of missing ETC gantry plate recognition data. For instance, if part of the data for a vehicle are missing in the gantry plate recognition data table, one can use the complete information available for that vehicle within the table to search the entire vehicle passage record in the auxiliary analysis tables, utilizing the correlation of the same vehicle’s short-term operational status to extract usable data for completion. In cases where multiple identical entries appear in the ETC gantry plate recognition data table, the uniqueness of the billing transaction identification number is used as a standard to retain one instance of the duplicate plate recognition data and discard the rest.
Finally, this paper quantitatively describes errors, omissions, and duplications in plate recognition data information and evaluates the quality of ETC gantry plate recognition data. A commonly used data quality evaluation index formula is as follows:
T D I N = N U M v a l i d N U M a l l
Within the formula, N U M v a l i d represents the amount of valid data in the original ETC gantry plate recognition data table that is nonredundant and complete. N U M a l l is the theoretical total amount of data that should be in the ETC gantry plate recognition data table. T D I N is the evaluation index for the quality of the ETC gantry plate recognition data.
Issues such as network transmission delays can lead to delayed uploads of plate recognition data. The ETC gantry system tolerates delays in data upload to a certain extent, but uploads exceeding a specific time range can affect highway toll transactions. Therefore, this section sets t as the acceptable time range. When the data upload time is within this range, it is marked as a normal upload; otherwise, it is considered delayed. The data evaluation index formula is adjusted as follows:
T D I N = N U M o b t a i n N U M d u p l i c a t e N U M d e l a y N U M a l l
Here, N U M d e l a y represents the volume of delayed uploaded data. N U M d u p l i c a t e is the volume of duplicate data in the ETC gantry plate recognition data table. N U M o b t a i n is the actual total volume of data obtained by the ETC gantry. After identifying and initially repairing the abnormal data, the quality of the ETC gantry plate recognition data is evaluated. The data quality of the G50 Nanjing section ETC gantry for January is illustrated in Figure 7.
The quality of the ETC gantry data was generally high, with the monthly quality level exceeding 85% and an average data quality level of 90.4% for the month. There were 11 days in the month when the ETC gantry data quality fell below 90% (referred to as poor data quality), which is already superior to the detection effects of most detectors on highways. Utilizing meteorological data from the ETC gantry system’s built-in weather detection equipment and referencing weather data, an investigation was conducted on the meteorological conditions in January 2022 in the area of the G50 Nanjing section (Suzhou). The region experienced minimal variation in January temperatures, with an average temperature of 7.93 °C, including an outlier of 1.4 °C. The region’s precipitation for the month was 74.8 mm over 9 days of rainfall, with the specific weather conditions shown in Figure 8 below.
Given the minor fluctuations in regional temperatures and its location in the south, where there are no cold temperatures, it can be inferred that temperature has a minimal impact on the ETC gantry’s detection ability. This paper primarily analyzed the quality of ETC gantry data under poor visibility conditions, such as overcast and rainy days. By comparing the meteorological conditions on dates when the ETC gantry data quality was above 90% with those below 90%, it was found that the proportion of rainy days with poor quality was 77.78% of the total number of rainy days and 63.64% of the total number of days with poor quality. This suggests that the quality of the ETC gantry data is affected by adverse weather conditions such as rainfall.
The accuracy and reliability of the ETC plate recognition data directly impact the operation of highway toll collection. An analysis of the actual usage of the G50 Nanjing section’s ETC gantry data reveals the following advantages: (1) Quick updates, with a large volume of data accumulating over a short period of time. Millions of vehicle passage data can be generated daily. Moreover, the data have high consistency, strong structure, and low update costs. (2) Rich dimensions of ETC gantry data provide multiple perspectives for traffic data analysis. The ETC gantry system includes static data, such as vehicle information, toll station information, pass card information, and toll system information, as well as dynamic data, like travel information, charging information, and system verification information.
The collection, transmission, and storage of plate recognition data require a multitude of devices and network services, which introduces the possibility of missed captures and identification errors: (1) Evasion tactics such as covering license plates, along with unstable equipment conditions, can result in missing fields and data anomalies within ETC gantry data, making it impossible to guarantee data quality. Concurrently, improper use by drivers, such as excessive speed, loss of ETC electronic tags, the application of special windshield films, and small vehicles being obscured by larger vehicles in tow, can lead to detection failures at the gantry, impacting data accuracy and leading to issues such as transaction anomalies, nonrecognition of vehicles, and missing vehicle statistics in the G50 Nanjing section ETC gantry. (2) The ETC gantry span in the G50 Nanjing section reaches up to 9 km, with insufficient monitoring of traffic flow between gantries. Additionally, the uneven and sparse distribution of ETC gantry detectors may lead to biases in the description of vehicle behavior. For instance, a vehicle with a license plate ending in 375 recorded passage times at two gantries, G005032001000310010 and G005032001000410010, approximately 2 min apart. With the gantries about 6 km apart, this would calculate to a speed of 180 km/h, which is inconsistent with road traffic rules. (3) ETC gantry video detectors are used for vehicle plate data collection. These video detectors rely on optical principles, and factors such as dust, shadows, weather, and lighting can affect the detection results; hence, natural elements like rainy weather and nighttime conditions impact the accuracy of gantry detection. (4) The minimum distance between gantries in the same direction on the G50 Nanjing section is set at 400 m, with two ETC gantries spaced 500 m apart, and three road segments have relatively small distances between them. Figure 9 below shows the proportion of duplicate vehicle detection data for January among ETC gantry segments, with blue, red, and green representing the three closely spaced segments, and grey representing other segments. Duplicate detection issues are more significant in the three closely spaced segments.
For tensor model construction, PyTorch 1.11 was the framework of choice. The subtensor block models were formulated using data fields from each ETC gantry within the G50 Nanjing section, encompassing aspects such as token recognition, transactions, foundational information, and atypical state indicators. These subtensor components were methodically categorized per dimension into basic and expandable attributes. The expandable attributes were then amalgamated and preserved based on the foundational attributes of each dimension to establish a three-dimensional, high-order tensor block model for the ETC gantry system.
The method’s efficacy was gauged by its accuracy and computational complexity. We contrasted this approach with a matrix decomposition-based completion model (MATRIX) and a prediction model founded on Long Short-Term Memory (LSTM) neural networks. The MATRIX model relies on the low-rank nature of matrices and known elements to decompose the target matrix into two lower-rank matrices, thereby reconstructing the missing entries [33]. Conversely, the LSTM-based model capitalizes on the predictive capabilities of neural networks for ETC gantry data restoration. Preliminary processing of the ETC gantry data revealed notable instances of missing information for January, which included discrepancies arising from incorrect data removal and delays in data transmission. These instances are encapsulated in Table 6.
In each missing data situation, two days of missing data were selected as the original data input, and the average time headway, h t i , j ; traffic volume, q i ; and mean interval speed, v s ¯ ( i , i + n ) were the traffic data to be completed. After the data are calibrated using the transaction data sheet, the data can be considered accurate and set to the true value. The Root Mean Square Error (RMSE) was selected as the evaluation index for this data supplementation, and the equation is as follows:
R M S E = i = 1 n ( x i ^ x i ) 2 n
where x i ^ is the value of complementary traffic data, x i is the true value of traffic data, and n is the number of complementary traffic data points.

4.2. Accuracy Evaluation

The traffic data completion error values for the MATRIX, LSTM, and IDTD models under various missing data scenarios are presented in Table 7, with a comparative analysis depicted in Figure 10.
On the basis of the above results, it can be seen that IDTD outperformed the other two algorithms for all types of traffic data completion under different missing data situations:
  • All three algorithms increased the completion error with an increase in the missing rate, among which MATRIX had the greatest effect on the completion with the missing rate, and IDTD has the least effect with the missing rate and could better adapt to severe missing data;
  • Among the selected traffic data, the time headway completion error was the largest, whereas the traffic volume and interval average speed completion errors were similar. The time headway was influenced by the driving environment and independent choices of the driver. It was less effective for the low-dimensional MATRIX and only useful for the time-series LSTM.

4.3. Evaluation of the Computational Complexity

The evaluation of the computational complexity in this study focused on the static tensor calculation model used for real-time update passage records into the historical data and global data calculations. For our ETC gantry tensor model, the number of gantries is denoted as m , the number of vehicles recorded historically is n , the number of updated vehicles is Δ n , and the number of core tensor and factor matrix iterations in the static high-order ETC gantry tensor model is ο ( m 2 ( n + Δ n ) 2 ) . The daily traffic volume on the G50 Nanjing highway is more than 20,000 vehicles; therefore, the calculation volume under the static tensor model is extremely large. If D ( i ) is the number of features of each point in the i t h -dimensional space in the improved dynamic tensor decomposition model for ETC gantries, the high sparsity of the ETC gantry data leads to D ( i ) n i . Thus, the computational complexity of the factor matrix increment in each dimension in the conventional calculation is ο ( n ( i ) D ( i ) ) , whereas the computational complexity of the factor matrix increment in the approximate tensor decomposition is only ο ( D ( i ) ) . The computational time complexity for updating the core tensor corresponding to the static high-order tensor decomposition and IDTD are ο ( T i = 1 3 ( D ( i ) ( n ( i ) ) 2 + ( n ( i ) ) 3 ) ) and ο ( T i = 1 3 r ( i ) ( n ( i ) + 1 ) D ( i ) ) , respectively, which show that the computational complexity is significantly reduced.

4.4. Discussion

This paper’s analysis indicates that the abnormal phenomenon of the ETC gantry data is not only accidental but also indicates systematic problems from technical failures to deliberate avoidance strategies. We identified and corrected data anomalies based on the completion model. The initial repair strategy, including deletion, expansion, completion, and transformation, greatly improved the availability of data, ensuring that the subsequent analysis was based on as accurate data as possible.
The impact of weather and other environmental conditions on data quality cannot be ignored. The research results of G50 indicate that adverse weather conditions, especially rainfall, can have a negative impact on the quality of ETC gantry data. This is a key area of future research, as developing more powerful systems that can maintain high data quality regardless of environmental conditions is crucial for the reliability of traffic data analysis.
The IDTD model was used for tensor decomposition and compared with the MATRIX and LSTM models, demonstrating the potential of advanced computing techniques to address the challenges of data sparsity and complexity. The excellent performance of the IDTD model in accuracy and reducing computational load indicates its potential applicability in a range of scenarios where large-scale sparse datasets are prevalent. This study lays the foundation for future research. It is necessary to further improve data preprocessing techniques, develop more flexible data collection systems for environmental factors, and explore the scalability of IDTD models.

5. Conclusions

This paper introduced a high-order tensor model tailored for ETC gantry data that adeptly captures temporal and spatial structural information. Through the utilization of an Improved Dynamic Tensor Decomposition (IDTD) approach, we successfully addressed the challenges posed by the considerable sparsity of ETC gantry data. By employing a Laplacian matrix within the IDTD framework, we effectively mitigated computational burdens and preserved essential traffic characteristics in the data completion process. Our comparative analysis, illustrated through a case study, confirms the superior accuracy and reduced computational complexity of the IDTD-based algorithm over traditional methods. The IDTD method, therefore, holds substantial promise for comprehensive ETC gantry data completion and has the potential for broader application across diverse datasets and scenarios. Future research will explore extending the reach of IDTD in handling large-scale ETC gantry datasets and its adaptability to other data-intensive domains.

Author Contributions

Methodology, Y.R. and Y.Z.; validation, Y.R. and Y.Z.; data curation, W.L. and C.W.; writing—original draft preparation, Y.R. and Y.Z.; writing—review and editing, Y.R., Y.Z., W.L. and C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Key R&D Program of Shandong Province, China. No. 2020CXGC010118.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Randriamasy, M.; Cabani, A.; Chafouk, H.; Fremont, G. Geolocation process to perform the electronic toll collection using the ITS-G5 technology. IEEE Trans. Veh. Technol. 2019, 68, 8570–8582. [Google Scholar] [CrossRef]
  2. Lin, M.Y.; Chen, Y.C.; Lin, D.Y.; Hwang, B.F.; Hsu, H.T.; Cheng, Y.H.; Liu, Y.T.; Tsai, P.J. Effect of implementing electronic toll collection in reducing highway particulate matter pollution. Environ. Sci. Technol. 2020, 54, 9210–9216. [Google Scholar] [CrossRef] [PubMed]
  3. Hong, Y.; Sha, Y.; Ding, F.; Chen, Z.; Zhang, D. Resource allocation optimization and performance evaluation for 5G cellular vehicle-to-everything (C-V2X). J. Automot. Saf. Energy 2023, 14, 346–354. [Google Scholar] [CrossRef]
  4. Chen, H.; An, B.; Sharon, P.; Miao, C.; Soh, Y. DyETC: Dynamic Electronic Toll Collection for Traffic Congestion Alleviation. Thirty-Second. AAAI Conf. Artif. Intell. (AAAI-18) 2018, 32, 757–765. [Google Scholar] [CrossRef]
  5. Chen, Z.; Wu, B.; Li, B.; Ruan, H. Expressway exit traffic flow prediction for ETC and MTC charging system based on entry traffic flows and LSTM model. IEEE Access 2021, 9, 54613–54624. [Google Scholar] [CrossRef]
  6. Zhao, Y.; Lu, W.; Rui, Y.; Ran, B. Classification of the Traffic Status Subcategory with ETC Gantry Data: An Improved Support Tensor Machine Approach. J. Adv. Transp. 2023, 3, 2765937. [Google Scholar] [CrossRef]
  7. Wang, J.; Zhu, R.; Li, T.; Gao, F.; Wang, Q.; Xiao, Q. ETC-oriented efficient and secure blockchain: Credit-based mechanism and evidence framework for vehicle management. IEEE Trans. Veh. Technol. 2021, 70, 11324–11337. [Google Scholar] [CrossRef]
  8. Ding, F.; Mi, G.; Tong, E.; Zhang, N.; Bao, J.; Zhang, G. Multi-channel high-resolution network and attention mechanism fusion for vehicle detection model. J. Automot. Saf. Energy 2022, 13, 122–130. [Google Scholar] [CrossRef]
  9. Zhao, Y.; Zhang, B.; Cao, Y.; Rui, Y.; Ran, B. Application of data fusion based on clustering-neural network for ETC gantry flow capacity correction. In CICTP; ASCE Library: Reston, VA, USA, 2022; pp. 239–249. [Google Scholar]
  10. Wu, Y.; Tan, H.; Li, Y.; Zhang, J.; Chen, X. A fused CP factorization method for incomplete tensors. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 751–764. [Google Scholar] [CrossRef]
  11. Chiou, J.M.; Zhang, Y.C.; Chen, W.H.; Chang, C.W. A functional data approach to missing value imputation and outlier detection for traffic flow data. Transp. B Transp. Dyn. 2014, 2, 106–129. [Google Scholar] [CrossRef]
  12. Li, L.; Li, Y.; Li, Z. Efficient missing data imputing for traffic flow by considering temporal and spatial dependence. Transp. Res. Part C Emerg. Technol. 2013, 34, 108–120. [Google Scholar] [CrossRef]
  13. Zhu, Z.; Xu, M.; Wang, K.; Lei, C.; Xia, Y.; Chen, X. A non-local grouping tensor train decomposition model for travel demand analysis concerning categorical independent variables. Transp. Res. Part C Emerg. Technol. 2023, 157, 104–124. [Google Scholar] [CrossRef]
  14. Chen, X.; He, Z.; Wang, J. Spatial-temporal traffic speed patterns discovery and incomplete data recovery via SVD-combined tensor decomposition. Transp. Res. Part C Emerg. Technol. 2018, 86, 59–77. [Google Scholar] [CrossRef]
  15. Chen, X.; He, Z.; Sun, L. A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transp. Res. Part C Emerg. Technol. 2019, 98, 73–84. [Google Scholar] [CrossRef]
  16. Li, L.; Zhang, J.; Wang, Y.; Ran, B. Missing value imputation for traffic-related time series data based on a multi-view learning method. IEEE Trans. Intell. Transp. Syst. 2018, 20, 2933–2943. [Google Scholar] [CrossRef]
  17. Zhang, Y.F.; Thorburn, P.J.; Xiang, W.; Fitch, P. SSIM—A deep learning approach for recovering missing time series sensor data. IEEE Internet Things J. 2019, 6, 6618–6628. [Google Scholar] [CrossRef]
  18. Chen, Z.; Lu, Z.; Chen, Q.; Zhong, H.; Zhang, Y.; Xue, J.; Wu, C. A spatial-temporal short-term traffic flow prediction model based on a dynamical-learning graph convolution mechanism. arXiv 2022, arXiv:2205.04762. [Google Scholar] [CrossRef]
  19. Han, L.; Zheng, K.; Zhao, L.; Wang, X.; Wen, H. Content-aware traffic data completion in ITS based on generative adversarial nets. IEEE Trans. Veh. Technol. 2020, 69, 11950–11962. [Google Scholar] [CrossRef]
  20. Yang, B.; Kang, Y.; Yuan, Y.; Huang, X.; Li, H. ST-LBAGAN: Spatio-temporal learnable bidirectional attention generative adversarial networks for missing traffic data imputation. Knowl. Based Syst. 2021, 215, 106705. [Google Scholar] [CrossRef]
  21. Lu, W.; Zhou, T.; Li, L.; Gu, Y.; Rui, Y.; Ran, B. An improved Tucker decomposition-based imputation method for recovering lane-level missing values in traffic data. IET Intell. Transp. Syst. 2022, 16, 363–379. [Google Scholar] [CrossRef]
  22. Deng, L.; Liu, X.Y.; Zheng, H.; Feng, X.; Chen, Y. Graph spectral regularized tensor completion for traffic data imputation. IEEE Trans. Intell. Transp. Syst. 2021, 99, 10996–11010. [Google Scholar] [CrossRef]
  23. Wu, H.; Yang, C.; Xie, W.; Zhang, W. Joint matrix decomposition-based missing data completion in low-voltage area. Math. Probl. Eng. 2021, 2021, 4170064. [Google Scholar] [CrossRef]
  24. Tan, H.; Feng, G.; Feng, J.; Wang, W.; Zhang, Y.-J.; Li, F. A tensor-based method for missing traffic data completion. Transp. Res. Part C Emerg. Technol. 2013, 28, 15–27. [Google Scholar] [CrossRef]
  25. Zhang, X.; Zhang, Y.; Wei, X.; Hu, Y.; Yin, B. Traffic forecasting with missing data via low rank dynamic mode decomposition of tensor. IET Intell. Transp. Syst. 2022, 16, 1164–1176. [Google Scholar] [CrossRef]
  26. Wang, J.; Wu, J.; Wang, Z.; Gao, F.; Xiong, Z. Understanding urban dynamics via context-aware tensor factorization with neighboring regularization. IEEE Trans. Knowl. Data Eng. 2020, 32, 2269–2283. [Google Scholar] [CrossRef]
  27. Dong, H.; Ding, F.; Tan, H.; Zhang, H. Laplacian integration of graph convolutional network with tensor completion for traffic prediction with missing data in inter-city highway network. Phys. A Stat. Mech. Appl. 2022, 586, 126474. [Google Scholar] [CrossRef]
  28. Liu, C.; Wu, T.; Li, Z.; Wang, B. Individual traffic prediction in cellular networks based on tensor completion. Int. J. Commun. Syst. 2021, 34, e4952. [Google Scholar] [CrossRef]
  29. Liu, C.; Wu, T.; Li, Z.; Ma, T.; Huang, J. Robust online tensor completion for IoT streaming data recovery. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 10178–10192. [Google Scholar] [CrossRef]
  30. Li, L.; Lin, X.; Liu, H.; Lu, W.; Zhou, B.; Zhu, J. Displacement Data Imputation in Urban Internet of Things System Based on Tucker Decomposition with L2 Regularization. IEEE Internet Things J. 2022, 8, 13315–13326. [Google Scholar] [CrossRef]
  31. Gong, C.; Zhang, Y. Urban traffic data imputation with detrending and tensor decomposition. IEEE Access 2020, 8, 11124–11137. [Google Scholar] [CrossRef]
  32. Qin, J.; Ma, Q.; Yu, X.; Wang, L. On synchronization of dynamical systems over directed switching topologies: An algebraic and geometric perspective. IEEE Trans. Autom. Control 2020, 65, 5083–5098. [Google Scholar] [CrossRef]
  33. Cui, Z.; Henrickson, K.; Ke, R.; Wang, Y. Traffic graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting. IEEE Trans. Intell. Transp. Syst. 2020, 21, 4883–4894. [Google Scholar] [CrossRef]
Figure 1. Traffic information extraction process for ETC gantry data.
Figure 1. Traffic information extraction process for ETC gantry data.
Sensors 24 00086 g001
Figure 2. Diagram of the three-dimensional tensor model of ETC gantry data.
Figure 2. Diagram of the three-dimensional tensor model of ETC gantry data.
Sensors 24 00086 g002
Figure 3. Diagrammatic representations of traffic data calculations based on the high-order tensor model: (a) layout of high-order tensor model; (b) dissection of the high-order tensor model to analyze traffic data (c) process of extracting specific vehicle information within the high-order tensor mode for traffic.
Figure 3. Diagrammatic representations of traffic data calculations based on the high-order tensor model: (a) layout of high-order tensor model; (b) dissection of the high-order tensor model to analyze traffic data (c) process of extracting specific vehicle information within the high-order tensor mode for traffic.
Sensors 24 00086 g003aSensors 24 00086 g003b
Figure 4. Diagrams of the dynamic tensor flow decomposition: (a) tensor window diagram; (b) diagram of the tensor flow decomposition.
Figure 4. Diagrams of the dynamic tensor flow decomposition: (a) tensor window diagram; (b) diagram of the tensor flow decomposition.
Sensors 24 00086 g004
Figure 5. Dynamic tensor processing of ETC gantry traffic data.
Figure 5. Dynamic tensor processing of ETC gantry traffic data.
Sensors 24 00086 g005
Figure 6. Maximum and minimum tensor window values based on varying criteria.
Figure 6. Maximum and minimum tensor window values based on varying criteria.
Sensors 24 00086 g006
Figure 7. The data quality of the G50 Nanjing section’s ETC gantry for January.
Figure 7. The data quality of the G50 Nanjing section’s ETC gantry for January.
Sensors 24 00086 g007
Figure 8. Meteorological conditions in the Suzhou Area of the G50 Nanjing Section.
Figure 8. Meteorological conditions in the Suzhou Area of the G50 Nanjing Section.
Sensors 24 00086 g008
Figure 9. Proportion of duplicate detection data within ETC gantry segments.
Figure 9. Proportion of duplicate detection data within ETC gantry segments.
Sensors 24 00086 g009
Figure 10. Traffic data completion comparison under different missing data scenarios.
Figure 10. Traffic data completion comparison under different missing data scenarios.
Sensors 24 00086 g010
Table 1. Numerical conversion factors for vehicle types.
Table 1. Numerical conversion factors for vehicle types.
Vehicle TypeNumber
Minibus1.0
LargeBus1.5
Minivan2.0
Medium-sized truck2.5
Large truck4.0
Extra-large truck5.0
Table 2. Initial core tensor and factor matrix calculation process.
Table 2. Initial core tensor and factor matrix calculation process.
Input:  X 1 , L m | m = 1 M R n m × n m
Output:  Y 1 ,     U 1 m | m = 1 M ,   λ   1 m | m = 1 M
Begin
1for m = 1 : M
2 C 1 m u 1 , i m = λ 1 , i m u 1 , i m
## Calculate the first r m eigenvalues, λ 1 , i m , with eigenvector u 1 , i m of C 1 m .
3   U 1 ( m ) = d i a g ( u 1,1 m , , u 1 , m m )
4end for
5Return Y 1 ,     U 1 m | m = 1 M ,   λ   1 m | m = 1 M
End
Table 3. Calculation flow for updating the core tensor.
Table 3. Calculation flow for updating the core tensor.
Input:  X t ,   Δ X t ,   U t m , λ t m  
Output:  Y t + 1 , U t + 1 m | m = 1 M , λ t + 1 m | m = 1 M
Begin
1for m = 1 : M
2  for i = 1 :   r m
3     λ t + 1 , i m = λ t , i m + Δ λ t , i m
    ## update the factor matrix
4     u t + 1 , i m = u t , i m + Δ u t , i m
5     U t + 1 ( m ) = d i a g ( u t + 1,1 m , , u t + 1 , m m )
6  end for
7end for
8       Y t + 1 = ( X t + Δ X t ) m = 1 M × ( m ) U t + 1 m T
9Return Y t + 1 , U t + 1 m | m = 1 M , λ t + 1 m | m = 1 M
End
Table 4. Computational process for ETC gantry data completion via tensor decomposition.
Table 4. Computational process for ETC gantry data completion via tensor decomposition.
Input: X R I 1 × I 2 × I 3 , Dynamic Tensor Flow W T ,
Iteration error ε 0 = 1 0 6
Output: X N E W
Begin
1Calculate Ρ Φ X ;
2Apply W T approximate tensor decomposition to obtain the initial solution G 0 ,   U 0 ( 1 ) ,   U 0 ( 2 ) ,   U 0 ( 3 ) ; ## initialization of the objective function solution
3Calculate ε = X X N E W F X F ; ## Error calculation
4While ε > ε 0 , do ## Set iteration conditions
5   G k = max 0 , G k 1 1 L G G f ( G k 1 , U ~ ) λ G L G ;
6   U k ( i ) = max 0 , U k 1 i 1 L U i U i f ( G ~ , U ~ j i U ( i ) ) λ i L i ;
7end for
8 X N E W = G k × 1 U k ( 1 ) × 2 U k ( 2 ) × 3 U k ( 3 )
9Return X N E W = G k × 1 U k ( 1 ) × 2 U k ( 2 ) × 3 U k ( 3 )
End
Table 5. Causes and forms of anomalies in ETC gantry plate recognition data.
Table 5. Causes and forms of anomalies in ETC gantry plate recognition data.
CausesForms
Incorrect Plate RecognitionDetector malfunctionUnrecognized characters (e.g., “A00000_9” as a default)
Upload failure or nonstandard storageVehicle plate numbers with fewer digits than normal, as well as other data format errors; sudden changes in driving parameters, vehicle plate numbers with fewer digits than normal
Vehicle evasionParts of the plate not existing in reality, indicative of toll evasion
Missing Plate RecognitionHigh traffic flow parameters, weather, operator error, and detector failure High traffic flow parameters, weather, operator errorAbsence of vehicle passage information in the data table; Absence of vehicle passage information
Communication network failures, power supply issuesComplete or partial absence of vehicle passage information; Complete or partial absence of vehicle passage information
Duplicate Plate RecognitionVehicle plate data transmission faultA single vehicle detected by an ETC gantry within a short interval (typically within 10 s)
Adjacent detectors duplicate detection (i.e., interference between gantries)Multiple ETC gantries recording the passage information of the same vehicle within a short time frame (typically within 10 s)
Table 6. Overview of missing ETC gantry data in January.
Table 6. Overview of missing ETC gantry data in January.
Missing Data < 5 % 5 10 % 10 15 %
Number of days71211
Table 7. RMSE values for traffic data completion.
Table 7. RMSE values for traffic data completion.
Missing Situation<5%5–10%10–15%<5%5–10%10–15%<5%5–10%10–15%
MATRIXLSTM IDTD
h t i , j 0.01010.01170.01640.00910.00980.01350.00500.00770.0105
q i 0.00860.01090.01360.00640.00790.00950.00300.00560.0069
v s ¯ ( i , i + n ) 0.00850.01020.01490.00590.00840.01130.00340.00490.0066
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Rui, Y.; Zhao, Y.; Lu, W.; Wang, C. Dynamic Tensor Modeling for Missing Data Completion in Electronic Toll Collection Gantry Systems. Sensors 2024, 24, 86. https://doi.org/10.3390/s24010086

AMA Style

Rui Y, Zhao Y, Lu W, Wang C. Dynamic Tensor Modeling for Missing Data Completion in Electronic Toll Collection Gantry Systems. Sensors. 2024; 24(1):86. https://doi.org/10.3390/s24010086

Chicago/Turabian Style

Rui, Yikang, Yan Zhao, Wenqi Lu, and Can Wang. 2024. "Dynamic Tensor Modeling for Missing Data Completion in Electronic Toll Collection Gantry Systems" Sensors 24, no. 1: 86. https://doi.org/10.3390/s24010086

APA Style

Rui, Y., Zhao, Y., Lu, W., & Wang, C. (2024). Dynamic Tensor Modeling for Missing Data Completion in Electronic Toll Collection Gantry Systems. Sensors, 24(1), 86. https://doi.org/10.3390/s24010086

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop