1. Introduction
In transportation systems, which are growing in complexity and increasingly include shared mobility options, the problems of traffic congestion, accidents, and environmental pollution are becoming more pronounced [1,2,3]. In the United States, the cost of congestion rose from USD 166 billion in 2014 to USD 190 billion in 2019 [4,5,6]. Traffic simulation has emerged as a pivotal tool for addressing these challenges, which span traffic congestion, safety, pollution, and energy consumption [7,8,9]. For safety analysis, a calibrated microscopic traffic simulation model serves as the foundation of a safety assessment model [10,11]. Furthermore, Hishikawa et al. [7] evaluated the safety of mixed traffic that includes pedestrians and personal motor vehicles, using a traffic simulation model to analyze the impact of geometric and traffic conditions.
The research and development of traffic simulation software is a vital component of the broader field of traffic simulation research [12,13,14]. Among the notable simulation models, CORridor SIMulation (CORSIM) is a microscopic traffic simulation model tailored to the analysis of both highways and urban roads; it uses a safety distance model as its car-following (CF) model [8,15]. In contrast, Simulation of Urban Mobility (SUMO) (Version 1.19.0), an open-source, multimodal microscopic traffic simulation package, adopts an extension of the Gipps model as its CF model [9]. Park et al. [16] conducted a comprehensive study on the calibration and validation of a microscopic traffic simulation model, using the Verkehr In Städten–SIMulationsmodell (VISSIM) simulation model to analyze a coordinated actuated signal system. Notably, traffic simulation software fundamentally relies on specific and intricate analytical mathematical models to simulate vehicle traffic [17]. However, these analytical models have inherent limitations, as they can account for only a limited number of the factors that influence driving behavior [18,19]. Furthermore, these models simulate different traffic behaviors in isolation, leading to significant disparities between simulation outcomes based on analytical models and real-world observations [20,21,22]. Wastavino et al. pointed out the inherent limitations of analytical models in fully capturing the intricacies of driving behavior [23]. Subsequently, Toledo et al. developed and tested a driving behavior framework that integrates both the CF model and the lane-changing (LC) model; however, the complexity and inflexibility of this data-driven framework posed challenges [24].
Conversely, some scholars have proposed an alternative approach, contending that the various driving behavior models are inherently embedded within vehicle trajectories. They suggest that the challenging task of modeling CF and LC behavior concurrently can be replaced by directly modeling vehicle trajectories [25,26,27]. For instance, Tomar employed a multilayer perceptron to predict the future trajectories of discrete vehicles [28]. Cui et al. developed a deep convolutional neural network for predicting future vehicle trajectories [29]. Additionally, Zhang et al. introduced the hybrid retraining constrained (HRC)–long short-term memory (LSTM) method for the simultaneous prediction of vehicle trajectories [30]. Although extensive research on trajectory prediction has validated its effectiveness in jointly modeling CF and LC behaviors, its simulation efficiency in large-scale networks remains questionable. This is primarily because such models focus on predicting the trajectory of an individual vehicle without adequately considering the future movements of surrounding vehicles. Consequently, simulating an entire traffic scene in a large-scale network, particularly with high-density traffic flow, requires iterating over every vehicle in the scene, which significantly reduces computational efficiency. Given these challenges, a simulation framework that can simulate all vehicles in a large-scale network simultaneously while maintaining high computational efficiency is essential.
Among data-driven machine learning models, Li et al. proposed a multivariate ensembles-based hierarchical linkage strategy that fuses the benefits of multivariate ensemble models into the hierarchical linkage technique [31]. By using a hierarchical leveling strategy to decompose the system and then applying a mapping engine model to synchronously map the outputs of the subsystems, this approach constructs a multi-level system reliability framework. Abdelaty et al. developed and evaluated seven data-driven modeling techniques, spanning both machine learning and statistical models, and provided a comprehensive analysis of each model’s applicability [32].
To address the aforementioned limitations, we present a novel data-driven traffic simulation framework that uses deep neural networks as its core data-driven kernel. The framework learns decision-making from human driving data, allowing it to model various vehicle driving behaviors simultaneously while considering both vehicle-specific factors and environmental influences. Unlike conventional model-driven methods and individual vehicle trajectory prediction approaches, our framework simulates an entire road segment at once rather than simulating vehicles one at a time. For a comprehensive comparison, we also establish model-driven and trajectory prediction frameworks and calibrate their respective parameters. To improve the training efficiency and performance of the data-driven framework, we modify the mean-squared error (MSE) loss function and adopt the resulting TransMSELoss as the loss function for network training. In single-step prediction, the data-driven framework achieves an accuracy of 97.22% on the training set and 95.76% on the test set. Over five consecutive prediction steps, the data-driven framework consistently outperforms the model-driven framework, with accuracy that is generally 5% to 10% higher. Furthermore, we examine the scalability of the simulation frameworks and explore promising avenues for enhancing their capabilities. Finally, a transferability analysis shows that the data-driven framework transfers well to a different dataset. This work has significant implications, not only for achieving more accurate simulations but also for making large-scale road network simulation feasible.
The rest of this paper is organized as follows:
Section 2 introduces the model-driven traffic simulation, trajectory prediction, and data-driven traffic simulation frameworks.
Section 3 describes the parameter configuration of the models.
Section 4 gives the comparison results and verifies the performance of the frameworks.
Section 5 studies the transferability of the model.
Section 6 concludes the paper.
2. Methodology
In this section, we describe the frameworks in detail, starting with the model-driven traffic simulation framework and the trajectory prediction framework, which uses the LSTM architecture. We then examine the structure of the data-driven traffic simulation framework, including a detailed breakdown of the two neural network structures employed within it.
2.1. Model-Driven Traffic Simulation Framework
Typically, existing traffic simulation software relies on various microscopic traffic simulation models to simulate vehicle operations. These models describe vehicle driving behavior and focus primarily on CF and LC models, as outlined by Toledo [33].
2.1.1. CF Model
CF models, such as the intelligent driver model (IDM), describe how a vehicle follows the preceding vehicle within a single lane. These models use analytical formulas to characterize the vehicle’s longitudinal behavior on the roadway. CF models are widely applied in microscopic traffic simulation, driving behavior analysis, capacity assessment, and traffic safety [34]. In this paper, we adopt the IDM as our benchmark model [35].
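The IDM is given by Equations (1) and (2) and provides a unified description of the transition of vehicles from free-flow conditions to congested flow. The model describes the vehicle’s following motion as the acceleration produced by a set of social forces, comprising a driving force and a resistance. The driving force stems from the driver’s psychological inclination to reach a desired speed, while the resistance arises from the influence of the leading vehicle on the following vehicle.
$$a_i(t) = a_{\max,i}\left[1 - \left(\frac{v_i(t)}{v_0}\right)^{\delta} - \left(\frac{s^*\left(v_i(t), \Delta v_i(t)\right)}{s_i(t)}\right)^{2}\right] \tag{1}$$
$$s^*\left(v_i(t), \Delta v_i(t)\right) = s_0 + s_1\sqrt{\frac{v_i(t)}{v_0}} + v_i(t)\,T + \frac{v_i(t)\,\Delta v_i(t)}{2\sqrt{a_{\max,i}\,b}} \tag{2}$$
where $a_i(t)$ is the acceleration of vehicle $i$ at time $t$; $a_{\max,i}$ denotes the maximum acceleration of vehicle $i$; $v_i(t)$ is the speed of vehicle $i$ at time $t$; $v_0$ is the free-flow speed; $\delta$ is the acceleration exponent; $s_i(t)$ is the spacing between vehicle $i$ and vehicle $i-1$; $\Delta v_i(t) = v_i(t) - v_{i-1}(t)$ is the approaching rate of vehicle $i$ relative to the vehicle ahead of it; $s^*$ is the desired spacing; $s_0$ is the stationary safety distance; $s_1$ is a coefficient usually taken as 0; $T$ is the safe time headway; and $b$ is the comfortable deceleration.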
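To make the IDM concrete, the following minimal Python sketch evaluates the acceleration in Equations (1) and (2) for a single follower. The function name `idm_acceleration` and all default parameter values are illustrative assumptions, not the calibrated values used in this study. The desired spacing is deliberately left unclamped to match the equations; some IDM implementations additionally clamp it at the standstill distance.

```python
import math


def idm_acceleration(v, v_lead, gap, *, a_max=1.5, v0=30.0, delta=4.0,
                     s0=2.0, s1=0.0, T=1.5, b=2.0):
    """Hypothetical IDM helper in SI units (m, m/s, m/s^2).

    v      : speed of the following vehicle i (must be >= 0)
    v_lead : speed of the leading vehicle i-1
    gap    : spacing s_i between the two vehicles (must be positive)
    Default parameters are placeholders, not calibrated values.
    """
    if gap <= 0:
        raise ValueError("gap must be positive")
    dv = v - v_lead  # approaching rate: positive when closing in on the leader
    # Desired spacing s*: standstill distance + s1 term + headway term + braking term
    s_star = (s0 + s1 * math.sqrt(v / v0) + v * T
              + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term minus interaction term
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


# Usage: a vehicle at 20 m/s following a slightly slower leader 40 m ahead
accel = idm_acceleration(20.0, 18.0, 40.0)
print(f"acceleration = {accel:.2f} m/s^2")
```

On an empty road at standstill the sketch returns the maximum acceleration, and at the free-flow speed it returns approximately zero, which is a quick sanity check on any calibrated parameter set.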
2.1.2. LC Model
The Minimizing Overall Braking Induced by Lane changes (MOBIL) model uses vehicle acceleration to define utilities and assesses an LC decision by comparing the utilities in the current lane and the target lane for the LC vehicle and the locally affected vehicles, both before and after the lane change [36].
The MOBIL model posits that acceleration is an intuitive indicator of how advantageous a lane choice is. In essence, a driver deciding whether to change lanes selects the lane that offers the prospect of higher acceleration. According to the MOBIL model, the decision-maker weighs the benefit gained from changing lanes against the consequences for the vehicles in both the original lane and the target lane. A lane change is executed only when it improves the overall utility of the system.
The MOBIL model also emphasizes that, during a lane change maneuver, the main safety concerns center on the LC vehicle and the following vehicle in the target lane. If either of them must decelerate substantially, the lane change is unsafe. Therefore, the safety criteria specified in Equation (5) must be met during the lane change process.
$$U_\alpha^{k} = a_\alpha^{k} \tag{3}$$
$$\tilde{a}_\alpha - a_\alpha + p\left[\left(\tilde{a}_n - a_n\right) + \left(\tilde{a}_o - a_o\right)\right] > \Delta a_{\mathrm{th}} + a_{\mathrm{bias}} \tag{4}$$
$$g_n > 0, \qquad \tilde{a}_n \geq -b_{\mathrm{safe}}, \qquad \tilde{a}_\alpha \geq -b_{\mathrm{safe}} \tag{5}$$
where $a_\alpha^{k}$ is the acceleration calculated by the CF model when vehicle $\alpha$ selects lane $k$; $U_\alpha^{k}$ is the utility of vehicle $\alpha$ selecting lane $k$; $\tilde{a}_\alpha$ is the following acceleration of vehicle $\alpha$ after it changes lanes, while $a_\alpha$ is its following acceleration before changing lanes; $\tilde{a}_n$ and $a_n$ are the accelerations of the following vehicle in the target lane after and before vehicle $\alpha$ changes lanes; $\tilde{a}_o$ and $a_o$ are the accelerations of the following vehicle in the original lane after and before vehicle $\alpha$ changes lanes; $g_n$ denotes the gap between vehicle $\alpha$ and the following vehicle in the target lane; $p$ is the altruistic factor; $\Delta a_{\mathrm{th}}$ is the threshold for the improvement of the overall utility of changing lanes; $a_{\mathrm{bias}}$ is the asymmetric lane change parameter; and $b_{\mathrm{safe}}$ denotes the absolute value of the maximum deceleration that ensures driving safety.
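The MOBIL incentive and safety checks described above can be sketched in Python as follows. The sketch assumes that the accelerations before and after the tentative lane change have already been computed with the CF model (for example, the IDM). The `LaneChangeAccelerations` container, the function name, and the default values of the altruistic factor, threshold, bias, and safe deceleration are illustrative assumptions rather than the calibrated parameters of this study.

```python
from dataclasses import dataclass


@dataclass
class LaneChangeAccelerations:
    """Hypothetical container for CF-model accelerations (m/s^2)."""
    self_before: float          # lane-changing vehicle, before the change
    self_after: float           # lane-changing vehicle, after the change
    new_follower_before: float  # follower in the target lane, before
    new_follower_after: float   # follower in the target lane, after
    old_follower_before: float  # follower in the original lane, before
    old_follower_after: float   # follower in the original lane, after


def mobil_should_change(acc, gap_to_new_follower, *, p=0.3, a_th=0.2,
                        a_bias=0.0, b_safe=4.0):
    """Return True if the lane change is both safe and worthwhile."""
    # Safety criterion: positive gap and no hard braking for the new
    # follower or for the lane-changing vehicle itself
    if gap_to_new_follower <= 0.0:
        return False
    if acc.new_follower_after < -b_safe or acc.self_after < -b_safe:
        return False
    # Incentive criterion: own gain plus p times the followers' gains
    own_gain = acc.self_after - acc.self_before
    others_gain = ((acc.new_follower_after - acc.new_follower_before)
                   + (acc.old_follower_after - acc.old_follower_before))
    return own_gain + p * others_gain > a_th + a_bias


# Usage: the vehicle gains 1 m/s^2, the new follower loses 0.5 m/s^2,
# and the old follower gains 0.5 m/s^2
acc = LaneChangeAccelerations(0.0, 1.0, 0.5, 0.0, -0.2, 0.3)
print(mobil_should_change(acc, gap_to_new_follower=12.0))
```

Checking safety before incentive mirrors the text: an unsafe lane change is rejected no matter how large the utility gain would be.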
2.2. Trajectory Prediction
LSTM, a specialized recurrent neural network, can effectively model time series with long time intervals and autonomously determine the optimal prediction time lag [37]. LSTM networks allow us to learn from human driving trajectory data and thereby model CF and LC driving behaviors jointly [38,39].
LSTM differs from traditional recurrent neural networks in that it governs the cell state through three parameterized control gates: the forget gate, the input gate, and the output gate.
In the first stage of LSTM, a sigmoid layer known as the “forget gate” decides which information to discard from the cell state. This gate takes as inputs $h_{t-1}$ (the previous output) and $x_t$ (the current input) and outputs a value between 0 and 1 for each element of the previous cell state $C_{t-1}$. A value of 1 signifies complete retention of the information, while a value of 0 signifies its complete removal.
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
The next stage determines which new information to store in the cell state, in two steps. First, a sigmoid layer referred to as the “input gate” determines which values should be updated. Then, a hyperbolic tangent (tanh) layer constructs a candidate vector $\tilde{C}_t$, which is incorporated into the cell state.
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$
$$C_t = f_t \circ C_{t-1} + i_t \circ \tilde{C}_t$$
Finally, the output gate controls the output of the cell unit.
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \circ \tanh\left(C_t\right)$$
where $\sigma$ is the sigmoid layer; $x_t$ is the input data at time $t$; $f_t$, $i_t$, and $o_t$ are the forget gate, input gate, and output gate at time $t$, respectively; $h_t$ is the hidden state; $C_t$ is the cell state at time $t$; $W$ denotes the weight coefficients; $b$ denotes the biases; and $\circ$ denotes the Hadamard (element-wise) product.
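The gate sequence above can be traced in a short NumPy sketch of a single LSTM time step. This is a textbook illustration, not the trained trajectory prediction model used in this study; the helper names, the weight layout (each matrix acting on the concatenation of the previous output and the current input), and the feature and hidden sizes in the usage example are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W : dict with keys 'f', 'i', 'c', 'o'; each matrix has shape
        (hidden, hidden + input) and acts on the concatenation [h_prev, x_t]
    b : dict with the same keys; each bias vector has shape (hidden,)
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate vector
    c_t = f_t * c_prev + i_t * c_tilde        # element-wise (Hadamard) update
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate
    h_t = o_t * np.tanh(c_t)                  # new hidden state
    return h_t, c_t


# Usage: 4 hypothetical input features per step, 8 hidden units
rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8
W = {k: rng.normal(0.0, 0.1, (n_hidden, n_hidden + n_in)) for k in "fico"}
b = {k: np.zeros(n_hidden) for k in "fico"}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(5, n_in)):          # a 5-step input sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape)
```

Setting a large positive forget-gate bias and a large negative input-gate bias makes the cell state pass through unchanged, which illustrates the "complete retention" case described in the text.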
2.3. Data-Driven Traffic Simulation Framework
2.3.1. ConvLSTM
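Convolutional long short-term memory (ConvLSTM), a derivative of LSTM, replaces the fully connected layers in LSTM with convolutional layers and changes the model’s input from a vector to a 3D tensor. This modification addresses the inherent limitation of LSTM in encoding spatial information during state transitions [40,41]. ConvLSTM is expressed as follows, where the ∗ symbol signifies the convolution operator and the ∘ symbol denotes the Hadamard product:
$$i_t = \sigma\left(W_{xi} \ast X_t + W_{hi} \ast H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right)$$
$$f_t = \sigma\left(W_{xf} \ast X_t + W_{hf} \ast H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right)$$
$$C_t = f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} \ast X_t + W_{hc} \ast H_{t-1} + b_c\right)$$
$$o_t = \sigma\left(W_{xo} \ast X_t + W_{ho} \ast H_{t-1} + W_{co} \circ C_t + b_o\right)$$
$$H_t = o_t \circ \tanh\left(C_t\right)$$
where $\sigma$ is the sigmoid layer; $X_t$ is the input data at time $t$; $f_t$, $i_t$, and $o_t$ are the forget gate, input gate, and output gate at time $t$, respectively; $H_t$ is the hidden state; $C_t$ is the cell state at time $t$; $W$ denotes the weight coefficients; and $b$ denotes the biases.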
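The ConvLSTM update above can be sketched with NumPy as below. This is an unoptimized, single-sample implementation written for readability, not the network configuration used in this study; the helper names, the 3×3 kernel, the random initialization, and the 2-channel, 3-lane by 20-cell road-section grid in the usage example are all assumptions. As in common deep learning libraries, the ∗ operator is implemented as zero-padded cross-correlation, which keeps the spatial size of the hidden state equal to that of the input.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x, w):
    """Zero-padded 'same' 2D cross-correlation.

    x : input of shape (C_in, H, W); w : kernel of shape (C_out, C_in, k, k), k odd
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    height, width = x.shape[1:]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, height, width))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + height, dx:dx + width]            # (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out


def init_convlstm(c_in, c_hidden, height, width, k=3, seed=0):
    """Random parameters; peephole weights W_c* act element-wise (Hadamard)."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "ifco":
        p["Wx" + g] = rng.normal(0.0, 0.1, (c_hidden, c_in, k, k))
        p["Wh" + g] = rng.normal(0.0, 0.1, (c_hidden, c_hidden, k, k))
        p["b" + g] = np.zeros((c_hidden, 1, 1))
    for g in "ifo":
        p["Wc" + g] = rng.normal(0.0, 0.1, (c_hidden, height, width))
    return p


def convlstm_step(x_t, h_prev, c_prev, p):
    def conv_terms(g):  # W_xg * X_t + W_hg * H_{t-1}
        return conv2d_same(x_t, p["Wx" + g]) + conv2d_same(h_prev, p["Wh" + g])

    i_t = sigmoid(conv_terms("i") + p["Wci"] * c_prev + p["bi"])
    f_t = sigmoid(conv_terms("f") + p["Wcf"] * c_prev + p["bf"])
    c_t = f_t * c_prev + i_t * np.tanh(conv_terms("c") + p["bc"])
    o_t = sigmoid(conv_terms("o") + p["Wco"] * c_t + p["bo"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


# Usage: a hypothetical road-section grid with 2 channels, 3 lanes, and 20 cells
c_in, c_hidden, lanes, cells = 2, 4, 3, 20
params = init_convlstm(c_in, c_hidden, lanes, cells)
h = np.zeros((c_hidden, lanes, cells))
c = np.zeros_like(h)
sequence = np.random.default_rng(1).random((5, c_in, lanes, cells))
for x in sequence:
    h, c = convlstm_step(x, h, c, params)
print(h.shape)
```

Because every gate is a convolution over the grid, each cell’s update depends on its neighboring lanes and cells, which is how the model encodes surrounding-vehicle information.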
2.3.2. ConvGRU
The gated recurrent unit (GRU), another variant of LSTM, dispenses with the separate memory cell of LSTM and merges the input gate and forget gate into a single component called the “update gate” [42,43]. It also introduces a “reset gate”. The convolutional gated recurrent unit (ConvGRU) combines these gated recurrent units with convolutional layers, enabling it to process sequential data with spatial dimensions effectively. By regulating the information flow through its gates, ConvGRU can capture and model long-range dependencies, which makes it valuable across diverse domains. At each new input, the reset gate determines how much of the previous state to discard, while the update gate governs the extent to which new information is incorporated into the state.
$$z_t = \sigma\left(W_{xz} \ast X_t + W_{hz} \ast H_{t-1}\right)$$
$$r_t = \sigma\left(W_{xr} \ast X_t + W_{hr} \ast H_{t-1}\right)$$
$$\tilde{H}_t = f\left(W_{xh} \ast X_t + W_{hh} \ast \left(r_t \circ H_{t-1}\right)\right)$$
$$H_t = \left(1 - z_t\right) \circ H_{t-1} + z_t \circ \tilde{H}_t$$
where $\sigma$ is the sigmoid layer; $f$ is the activation function; $X_t$ is the input data at time $t$; $z_t$, $r_t$, $H_t$, and $\tilde{H}_t$ are the update gate, reset gate, memory state, and new information at time $t$, respectively; and $W$ denotes the weight coefficients.
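A corresponding NumPy sketch of one ConvGRU step follows. It is illustrative only: the activation f is assumed to be tanh, no bias terms are used (matching the symbols listed above), and the helper names, 3×3 kernels, and grid sizes in the usage example are hypothetical rather than the configuration used in this study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x, w):
    """Zero-padded 'same' 2D cross-correlation; x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    height, width = x.shape[1:]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, height, width))
    for dy in range(k):
        for dx in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx],
                             xp[:, dy:dy + height, dx:dx + width])
    return out


def convgru_step(x_t, h_prev, p, act=np.tanh):
    """One ConvGRU step; `act` plays the role of the activation f (tanh assumed)."""
    z_t = sigmoid(conv2d_same(x_t, p["Wxz"]) + conv2d_same(h_prev, p["Whz"]))  # update gate
    r_t = sigmoid(conv2d_same(x_t, p["Wxr"]) + conv2d_same(h_prev, p["Whr"]))  # reset gate
    h_new = act(conv2d_same(x_t, p["Wxh"])
                + conv2d_same(r_t * h_prev, p["Whh"]))                     # new information
    return (1.0 - z_t) * h_prev + z_t * h_new                               # memory state


def init_convgru(c_in, c_hidden, k=3, seed=0):
    rng = np.random.default_rng(seed)
    p = {}
    for g in "zrh":
        p["Wx" + g] = rng.normal(0.0, 0.1, (c_hidden, c_in, k, k))
        p["Wh" + g] = rng.normal(0.0, 0.1, (c_hidden, c_hidden, k, k))
    return p


# Usage: a hypothetical 2-channel, 3-lane by 20-cell road-section grid
params = init_convgru(c_in=2, c_hidden=4)
h = np.zeros((4, 3, 20))
for x in np.random.default_rng(1).random((5, 2, 3, 20)):
    h = convgru_step(x, h, params)
print(h.shape)
```

Compared with the ConvLSTM sketch, ConvGRU keeps a single state tensor and needs three convolution pairs per step instead of four, which is one reason it is often cheaper to run.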
2.3.3. Data-Driven Traffic Simulation Framework Structure
The structure of the data-driven traffic simulation framework is depicted in Figure 1. In this framework, the training data comprise vehicle traffic data from a particular scenario, and the central component consists of various deep neural networks. Once trained, these networks are used to make predictions on the test set.
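As an illustration of how vehicle records might be arranged into road-section tensors for such networks, the sketch below rasterizes each time step of a road section into a lanes-by-cells grid with an occupancy channel and a speed channel. The function names, the 5 m cell length, and the two-channel layout are hypothetical assumptions and may differ from the encoding actually used in this study.

```python
import numpy as np


def rasterize_section(vehicles, n_lanes, length_m, cell_m=5.0):
    """Encode one time step of a road section as a (2, n_lanes, n_cells) tensor.

    vehicles : iterable of (lane_index, position_m, speed_mps) tuples
    Channel 0 marks occupied cells; channel 1 stores the speed of the vehicle
    in that cell. If two vehicles fall in the same cell, the later one
    overwrites the earlier, so cell_m should be shorter than typical spacing.
    """
    n_cells = int(np.ceil(length_m / cell_m))
    grid = np.zeros((2, n_lanes, n_cells))
    for lane, pos, speed in vehicles:
        if 0 <= lane < n_lanes and 0.0 <= pos < length_m:
            cell = int(pos // cell_m)
            grid[0, lane, cell] = 1.0
            grid[1, lane, cell] = speed
    return grid


def section_sequence(frames, n_lanes, length_m, cell_m=5.0):
    """Stack per-step grids into a (T, 2, n_lanes, n_cells) input sequence."""
    return np.stack([rasterize_section(f, n_lanes, length_m, cell_m) for f in frames])


# Usage: two time steps of a hypothetical 3-lane, 100 m section
frames = [
    [(0, 12.0, 20.0), (2, 47.5, 15.0)],
    [(0, 14.0, 20.0), (2, 49.0, 15.0)],
]
seq = section_sequence(frames, n_lanes=3, length_m=100.0)
print(seq.shape)
```

With this kind of encoding, each time step is a 3D tensor (channels × lanes × cells), which is the input form that ConvLSTM and ConvGRU expect.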
4. Result Analysis
To examine the prediction accuracy of various frameworks, we conducted experiments for both single-step prediction and continuous prediction. Additionally, this section includes schematic diagrams illustrating the predictions made by different frameworks.
4.1. Performance Analysis
To evaluate the performance of the framework, ConvLSTM and ConvGRU are employed as its data-driven models, with the parameters specified in Table 3. In parallel, the calibrated IDM and MOBIL models and the trained LSTM model are applied to the same dataset to compare one-step prediction performance. The results are presented in Table 4.
As indicated in Table 4, the model-driven traffic simulation (IDM and MOBIL) achieves a position accuracy rate of 79.30% after parameter calibration. This relatively low accuracy is attributed to the inherent intertwining of the CF and LC processes of vehicles: the model-driven traffic simulation uses IDM to capture longitudinal behavior and the MOBIL model to capture lateral behavior, and splitting real driving behavior between these two separate models introduces prediction errors. Furthermore, because it is not a neural network model, there is no distinction between its training and test sets.
On the training set, the position accuracy rate of the trajectory prediction model is similar to that of the model-driven traffic simulation, but on the test set it drops below 60%. The training set performance reflects LSTM’s recurrent neural network structure, which excels at integrating historical driving information and is therefore well suited to trajectory prediction. However, the training and test set errors recorded during model training are very similar, indicating that the trained LSTM is not overfitted. One significant reason for the notable gap between LSTM’s training and test set performance is that the LSTM trajectory prediction model focuses predominantly on individual vehicle trajectories and overlooks information about surrounding vehicles. In addition, the LSTM may lack the capacity to fully capture the intricate dynamics and dependencies inherent in the trajectory prediction task.
By contrast, the position accuracy rates of the simulation framework exceed 95% on the training set and remain above 90% on the test set, regardless of whether ConvLSTM or ConvGRU is employed. These results indicate that the framework predicts accurately and fits the data closely. Both deep neural networks achieve high prediction accuracy on the training and test sets because their inputs and outputs are 3D tensors representing time series of road sections. This means that the model can capture not only the historical characteristics of vehicle driving behavior but also information about the surroundings. Moreover, the framework predicts an entire road section rather than the driving behavior of an individual vehicle, which may further improve prediction accuracy.
4.2. Prediction Display
Figure 3 shows the real road section and the road sections predicted by IDM, ConvLSTM, and ConvGRU. The section predicted by IDM still differs significantly from the real one, whereas the sections predicted by ConvLSTM and ConvGRU closely match it.
4.3. Continuous Prediction Comparison
To assess the framework’s performance in traffic simulation, its continuous prediction capability was also tested on the test set, as depicted in Figure 4. The three curves, which show the continuous prediction results of ConvLSTM, ConvGRU, and IDM, clearly indicate that ConvLSTM and ConvGRU are significantly more accurate than the IDM model. Note that continuous prediction with LSTM was omitted because of its relatively poor prediction performance. Additionally, ConvLSTM outperforms ConvGRU in the first step but performs slightly worse in subsequent continuous predictions. Detailed results are given in Table 5.
4.4. Scalability Analysis
This section compares the scalability of the three aforementioned frameworks for large-scale simulation. Unlike the trajectory prediction framework, whose input and output are vehicle information, the simulation framework operates on road section data. Consequently, in large-scale traffic flow simulations, the time complexity of the simulation framework depends only on the overall road section size and not on the number of vehicles [45]. In other words, the time complexity of tracking vehicle trajectories is $O(n)$, where $n$ is the number of vehicles, whereas that of the simulation framework is constant, $O(1)$, with respect to $n$ for a given road section size.
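The contrast between per-vehicle cost, which grows as $O(n)$, and constant per-section cost can be made concrete by counting model evaluations per simulation step. In the toy sketch below, which illustrates the complexity argument rather than any of the frameworks evaluated here, `per_vehicle_step` stands in for model-driven or trajectory prediction updates that call a model once per vehicle, while `whole_section_step` stands in for a data-driven update that calls a predictor once on a fixed-size road-section tensor. The constant-velocity update, the identity predictor, and the grid shape are placeholder assumptions.

```python
import numpy as np


def per_vehicle_step(positions, speeds, dt=0.1):
    """One step that evaluates a model once per vehicle (cost grows with n)."""
    evaluations = 0
    new_positions = np.empty_like(positions)
    for k in range(len(positions)):
        # Placeholder for a per-vehicle CF/LC or LSTM evaluation
        new_positions[k] = positions[k] + speeds[k] * dt
        evaluations += 1
    return new_positions, evaluations


def whole_section_step(section, predictor):
    """One step that evaluates a predictor once on the whole section tensor."""
    return predictor(section), 1


# Hypothetical channels x lanes x cells; fixed for the road section
grid_shape = (2, 3, 400)
for n in (10, 100, 1000):
    pos = np.linspace(0.0, 1999.0, n)
    _, per_vehicle_calls = per_vehicle_step(pos, np.full(n, 20.0))
    out, section_calls = whole_section_step(np.zeros(grid_shape), lambda s: s)
    print(n, per_vehicle_calls, section_calls, out.shape)
```

The number of per-vehicle evaluations tracks the vehicle count, while the section-level step always makes one call on a tensor whose size is set by the road section; the cost of that single call grows with the grid size, which is consistent with the rise in time cost reported below when the scene size increases.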
To assess the scalability of the frameworks, we conducted two sets of comparative experiments on computing time. In the first set, the simulation scale was held fixed while the density varied; in the second, the density was held fixed while the simulation scale varied. The inputs to the data-driven framework are scenes of varying sizes, while the inputs to the model-driven and trajectory prediction frameworks are different numbers of vehicles. The simulation results are presented in Table 6 and Table 7. To ensure the reliability of the experiment, the scene size in each column of Table 6 corresponds exactly to the number of vehicles. Note that “Large” in the “Scale” column refers to the simulation scale, which is explored in greater detail in Table 7. The simulation experiments were run on an Intel Core i5-10400F CPU with 16 GB of 2666 MHz RAM and an NVIDIA GeForce GTX 1660 SUPER GPU with 6 GB of memory.
The time cost of the data-driven framework remains unchanged as density increases, in stark contrast to the other two frameworks, whose time cost increases nearly linearly with density. When the scale is small or the density is low, the data-driven framework shows no significant computational advantage over the other two frameworks; in scenarios with larger scale or higher density, however, its advantage becomes markedly evident.
Regarding the scale results, as the size increases from normal scale to large, a noticeable phenomenon emerges: the time cost of the data-driven framework rises significantly. This can be attributed to the growth in the number of hidden-layer parameters of the deep neural network (DNN) as the size expands, which reduces computational efficiency. In contrast, the other two frameworks behave similarly to each other, with computation time growing nearly linearly with size.
Furthermore, for large-scale scenes (extra-large and ultra-large), the data-driven framework is significantly more computationally efficient than the other two frameworks. Notably, the linear growth observed in the normal-scale case breaks down, mainly because of the substantial increase in computational load for the other two frameworks. The data-driven framework, however, remains relatively insensitive to scene size in these large-scale scenarios, and its computation time does not increase significantly. (Note that the extra-large scene was simulated with the SUMO software, and each simulation step took 7.149835 s.)
In summary, the data-driven framework offers distinct advantages over the other two frameworks for large-scale simulation and has the potential to overcome existing limitations in large-scale simulation.
6. Conclusions and Future Work
Traditional traffic simulation has predominantly depended on complex analytical models to mimic traffic behaviors accurately. This study, however, turns to a novel data-driven traffic simulation framework built on deep neural networks. By using road section data, the framework models and forecasts the dynamics of entire road sections, enabling a more nuanced traffic simulation. ConvLSTM and ConvGRU serve as the primary models within this framework. Their efficacy was tested using data from the I-80 road section, and their performance was compared with that of benchmark models, namely the IDM-based model-driven framework and the LSTM trajectory prediction model. Additionally, the framework’s adaptability was evaluated by applying it to the US-101 road section data. The salient findings of this study include the following:
Our data-driven framework significantly outperforms the traditional models in predictive accuracy, achieving 97.22% on the training set and 95.76% on the test set, compared with 79.30% for the model-driven framework. Although the trajectory prediction model performed comparably to the model-driven framework on the training set, its accuracy dropped markedly on the test set.
In continuous prediction, our framework consistently outperformed the traditional models, even as accuracy naturally declined over five prediction steps. ConvLSTM performed better than ConvGRU in single-step forecasts but slightly worse in multi-step predictions.
The modified loss function, TransMSELoss, which incorporates a loss computed on softmax-transformed tensors into the MSE loss, accelerated network convergence and improved prediction accuracy without causing overfitting.
Scalability analyses underscored the framework’s strong performance in large-scale simulations, although it showed no marked advantage in smaller setups. ConvLSTM, in particular, emerged as a better choice than ConvGRU for extensive simulation tasks.
Transferability tests confirmed the framework’s utility across datasets: a pre-trained model applied to the US-101 data achieved an accuracy of 93.48%, comparable to its accuracy on the original dataset.
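The one-sentence description of TransMSELoss leaves room for interpretation, so the NumPy sketch below shows one possible reading, offered purely as an assumption: the standard MSE between prediction and target plus a weighted MSE between their softmax-normalized versions. The function names, the weighting factor `weight`, and the choice of the softmax axis are hypothetical and are not taken from the authors’ implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def trans_mse_loss_sketch(pred, target, weight=1.0, axis=-1):
    """Hypothetical TransMSELoss: MSE plus a weighted MSE on softmax-transformed tensors."""
    return mse(pred, target) + weight * mse(softmax(pred, axis), softmax(target, axis))


# Usage on a hypothetical (channels, lanes, cells) road-section tensor
rng = np.random.default_rng(0)
target = rng.random((2, 3, 20))
pred = target + rng.normal(0.0, 0.05, target.shape)
print(round(trans_mse_loss_sketch(pred, target), 6))
```

Under this reading, setting `weight=0` recovers the plain MSE, and the softmax term compares relative rather than absolute magnitudes along the chosen axis, so a prediction that is off by a constant shift is penalized only by the MSE term.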
In the burgeoning field of connected and automated vehicles, this data-driven framework has many potential applications, opening new opportunities for vehicle communication and operational efficiency. Because the framework is powered by deep neural networks, high-quality data are critical to fully harnessing its predictive capabilities. However, the reliance on the NGSIM dataset confines the scope of this study primarily to basic highway traffic modeling; more intricate traffic scenarios, such as signalized intersections or highway merge areas, are not covered. A truly holistic approach to traffic simulation that encompasses a variety of scenarios will require further model development. Moreover, although the proposed framework has significant advantages over traditional model-driven frameworks in computational complexity and efficiency, it still encounters challenges such as memory overflow in ultra-large-scale simulations. Developing corresponding memory management optimization mechanisms on the basis of this research is therefore necessary to further enhance computational efficiency. This research represents a pioneering step towards a novel methodology for traffic flow simulation and suggests substantial opportunities for future advancements. Integrating self-learning mechanisms into the model could transform it into a highly dynamic and adaptive tool for traffic management, enabling the system to autonomously refine and optimize its predictive accuracy over time. Such advancements could significantly improve the framework’s ability to anticipate and respond to complex traffic scenarios, thereby enhancing safety and efficiency on the road.