1. Introduction
Traffic demand describes the spatial distribution characteristics of travel of passengers and goods, which reflects the basic information of motor vehicles’ movement between traffic zones. Traffic demand is the most basic input parameter in urban long-term transportation planning and short-term traffic management and control. Urban traffic route guidance, speed limits, congestion control and other measures require an accurate origin-destination (OD) demand matrix. Traffic demand not only provides a decision-making basis for the balance of supply and demand of urban road networks and government construction investment, but also provides data support for the development of urban motorization. Only by accurately estimating the OD demand can we grasp the traffic characteristics of the current road network and provide more targeted and effective strategies to ease traffic congestion in the processes of urban planning and transportation planning.
With the development of urban traffic information acquisition technology, we can collect observation data from a variety of traffic detectors. Detector types include loop-coil detection, ultrasonic detection, and image recognition detection. The detection data types mainly include road flow, speed, density, travel time, vehicle trajectory, etc. This paper aims to estimate traffic demand using these multisource data.
In recent years, scholars have used a variety of theories to estimate traffic demand. The main research methods are listed below. (1) The generalized least squares method [1,2]. This method adjusts the OD demand matrix to be estimated so as to minimize the sum of squared errors between the observed variables and the estimated variables. Because its mathematical formulation is simple and easy to understand, it is widely used in OD estimation. (2) The OD estimation method based on information entropy [3]. Whereas the objective of the least squares method is to minimize the distance between the observed and estimated variables, the information entropy model estimates the OD demand by maximizing entropy or minimizing information. (3) Maximum likelihood estimation [4,5]. This method estimates the OD matrix by maximizing a likelihood function based on the observed flow and the historical OD matrix. (4) The Bayesian estimation method [6]. Like classical statistical methods, this method treats the observed variables as random variables; in addition, it treats the parameters to be estimated as random variables. The principle is to specify a prior distribution for all the variables to be estimated and then derive their posterior distribution from the collected sample information, which yields the estimated values and the corresponding confidence intervals. Deriving the posterior distribution often requires complex integrals, so Monte Carlo simulation is typically used to sample from the posterior distribution and compute the estimates and confidence intervals. OD estimation models for traffic networks that use newer types of observations include models based on smart card data [7], cellular data [8], license plate recognition data [9] and mobile location data [10].
As a subproblem of traffic demand estimation, the network detector placement problem identifies detector locations either to uniquely determine the unobserved link flows or to optimize the quality of the estimated traffic demand. To uniquely determine the unobserved link flows, He [11] used graph theory to derive the optimal detector layout strategy. Other scholars have built network detector layout models [12] using algebraic methods based on the link, path and node incidence matrices. To optimize the quality of the estimated parameters, Simonelli et al. [13] used the trace of the covariance of the posterior OD matrix to determine the optimal detector placement strategy based on Bayesian statistics. Yang and Zhou [14] further proposed the maximum coverage criterion of path flow. Salari et al. [15] analyzed the effect of sensor failure on the detector layout strategy.
Although different methods have been used to study the detector layout strategy and traffic demand estimation, the following deficiencies exist: (1) Traffic demand estimation using multisource data fusion has seldom been studied in the existing literature. (2) Research on the layout of traffic detectors does not consider the coverage information of road sections and path flows together. (3) An analytical relationship between traffic demand and traffic flow, derived from traffic network theory, is lacking, although such a relationship would help to establish a maximum likelihood traffic demand estimation model.
To address these deficiencies, this paper proposes an integrated model of detector layout and maximum likelihood traffic demand estimation. The detector layout model calculates the optimal number and location of detectors by maximizing the road section and path coverage information. The demand estimation model uses the minimum variance weighted average technique to fuse the detected multisource data and then estimates the traffic demand by the maximum likelihood method. A successive detector identification algorithm and an iterative algorithm are designed to solve the detector layout and traffic demand estimation models, and the Nguyen Dupuis and Sioux Falls networks are used to test the models and algorithms [16,17]. The main contributions of this paper are as follows: (1) an integrated model of detector layout and maximum likelihood traffic demand estimation is proposed; (2) a successive detector identification algorithm and an iterative algorithm are designed.
2. Detector Layout and Traffic Demand Estimation
2.1. Detector Layout Model
Traffic demand estimation can be regarded as a parameter estimation problem: samples (such as road flow, speed, density and travel time) are observed to estimate the parameters of the overall distribution (traffic demand). Deciding which road samples to observe under the current budget, so as to estimate the most reliable or most practical traffic demand, is the prerequisite problem to be solved in this paper.
We determine the optimal number and location of detectors by studying the detector layout model. Existing detector layout models maximize either the link coverage information or the path coverage information alone. Under the given budget, this paper maximizes the coverage information of links and paths at the same time:

$$\max Z = \lambda \sum_{a \in A} \bar{v}_a z_a + (1-\lambda) \sum_{w \in W} \sum_{k \in K_w} \bar{f}_k^w y_k^w \tag{1}$$

subject to

$$\sum_{a \in A} c_a z_a \le B \tag{2}$$

$$\sum_{a \in A} z_a = n \tag{3}$$

$$y_k^w \le \sum_{a \in A} \delta_{a,k}^w z_a, \quad \forall k \in K_w,\ w \in W \tag{4}$$

$$z_a \in \{0, 1\}, \quad \forall a \in A \tag{5}$$

$$y_k^w \in \{0, 1\}, \quad \forall k \in K_w,\ w \in W \tag{6}$$

where $W$ is the set of all OD pairs in the transportation network and $w$ is one of those OD pairs; $K_w$ is the set of all paths between OD pair $w$, $k \in K_w$; $\lambda$ is the weight coefficient; $\bar{v}_a$ represents the prior flow on segment $a$; $\bar{f}_k^w$ is the prior flow on path $k$ between OD pair $w$; $y_k^w$ is the coverage factor of path $k$ between OD pair $w$, which is 1 if at least one road section on path $k$ has a detector and 0 otherwise; $n$ is the number of detectors; $c_a$ is the cost of installing a detector on road section $a$; $B$ is the total budget; $z_a$ is the coverage factor of road section $a$, which is 1 if road section $a$ has a detector and 0 otherwise; $A$ is the set of road sections, $a \in A$; and $\delta_{a,k}^w$ is the path link association parameter, which is 1 if road section $a$ is on path $k$ between OD pair $w$ and 0 otherwise.

Objective function (1) maximizes the coverage information of road sections and paths. The term $\sum_{a \in A} \bar{v}_a z_a$ is the sum of the prior traffic volume of the road sections where detectors are deployed; that is, detectors are preferentially deployed on the road sections with high prior flow. The term $\sum_{w \in W} \sum_{k \in K_w} \bar{f}_k^w y_k^w$ is the sum of the traffic volume of the paths covered by detectors. The difference between the two is that the value calculated by the former is larger because it counts a path flow repeatedly, once for each detected section on the path, while the latter accounts for the overlap between paths. Equation (2) is the cost constraint of the detector layout; Equation (3) is the constraint on the total number of road detectors; Equation (4) is the relationship between the road section and path coverage information; and Equations (5) and (6) define the road section coverage factor and the path coverage factor, respectively.
2.2. Multisource Observation Variable Fusion
According to the detector layout model, the number and location of detectors are calculated, and the traffic flow, speed, density and travel time are observed at these locations. Because the observed variables have different dimensions, traffic flow theory is used to convert them to a common dimension and thereby unify the observed multisource road information. In this paper, the Greenshields speed-density relation is used to convert the multisource observations into link flow:

$$v_a = \rho_a^{j}\, \frac{l_a}{t_a} \left(1 - \frac{l_a}{u_a^{f}\, t_a}\right) \tag{7}$$

where $v_a$ is the traffic volume on segment $a$, $t_a$ is the estimated travel time on segment $a$, $l_a$ is the length of segment $a$, $\rho_a^{j}$ is the jam density on segment $a$, and $u_a^{f}$ is the maximum speed on segment $a$. Road travel time can be derived from road length and vehicle speed, so that the speed on segment $a$ is $l_a / t_a$.
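The unit conversion described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the linear Greenshields speed-density relation and the identity flow = density × speed; the function names, argument names and units (km, h, veh/km) are hypothetical and are not taken from the paper.

```python
def flow_from_speed(speed, free_flow_speed, jam_density):
    """Link flow (veh/h) implied by an observed speed (km/h)."""
    if not 0.0 <= speed <= free_flow_speed:
        raise ValueError("speed must lie between 0 and the free-flow speed")
    # Greenshields: speed falls linearly with density, so invert for density.
    density = jam_density * (1.0 - speed / free_flow_speed)
    # Fundamental relation of traffic flow: flow = density * speed.
    return density * speed


def flow_from_density(density, free_flow_speed, jam_density):
    """Link flow (veh/h) implied by an observed density (veh/km)."""
    if not 0.0 <= density <= jam_density:
        raise ValueError("density must lie between 0 and the jam density")
    speed = free_flow_speed * (1.0 - density / jam_density)
    return density * speed


def flow_from_travel_time(travel_time, length, free_flow_speed, jam_density):
    """Link flow (veh/h) implied by a travel time (h) over a link length (km)."""
    # Travel time and length give the mean speed on the link.
    return flow_from_speed(length / travel_time, free_flow_speed, jam_density)
```

With a free-flow speed of 60 km/h and a jam density of 120 veh/km, an observed speed of 30 km/h and an observed density of 60 veh/km both map to the same flow, which is what makes the converted observations comparable before fusion.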
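After the observations are converted to link traffic volume, the minimum variance weighted average method is used to fuse the multisource link flows so that they comprehensively reflect the traffic status of the road. The minimum variance weighted average is expressed as follows:

$$\tilde{\mathbf{v}} = \sum_{m=1}^{M} \omega_m \mathbf{v}^m \tag{8}$$

The flow weight coefficient $\omega_m$ of each type of observation is as follows:

$$\omega_m = \frac{1/\sigma_m^2}{\sum_{j=1}^{M} 1/\sigma_j^2} \tag{9}$$

where $\tilde{\mathbf{v}}$ is the fused link flow; $m$ is the link flow type, i.e., the flow derived from speed, density, time, etc., and $m = 1, \dots, M$; $v_a^m$ is the $m$-th type of flow on road section $a$, and $\mathbf{v}^m$ is its vector form, i.e., $\mathbf{v}^m = (\dots, v_a^m, \dots)^T$; $\omega_m$ is the weight coefficient of the $m$-th type, and $\boldsymbol{\omega}$ is its vector form, i.e., $\boldsymbol{\omega} = (\omega_1, \dots, \omega_M)^T$; from Formula (9), $\sum_{m=1}^{M} \omega_m = 1$; and $\sigma_m^2$ is the variance of the $m$-th type of observed flow.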
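The fusion step can be illustrated with a short Python sketch. It assumes that the errors of the different observation types are independent, in which case the minimum variance weights are proportional to the inverse of each type's variance; the function names and the list-based data layout are hypothetical choices made for this example.

```python
def min_variance_weights(variances):
    """Weights proportional to 1 / variance, normalized to sum to one."""
    inverse = [1.0 / s for s in variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def fuse_link_flows(flows_by_type, variances):
    """Fuse several flow vectors (one per observation type) link by link.

    flows_by_type[m][a] is the flow on link a derived from observation type m;
    variances[m] is the variance attached to type m.
    """
    weights = min_variance_weights(variances)
    n_links = len(flows_by_type[0])
    return [
        sum(w * flows[a] for w, flows in zip(weights, flows_by_type))
        for a in range(n_links)
    ]
```

For example, two flow vectors with variances 100 and 400 receive weights of about 0.8 and 0.2, so the more precise source dominates the fused value without the noisier one being discarded.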
2.3. Maximum Likelihood Traffic Demand Estimation Model
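Because the arrival of vehicles in unit time is random, this paper uses a Poisson distribution to describe the random traffic demand $\mathbf{Q}$. If the traffic volume is not very low, the traffic demand approximately follows a multivariate normal distribution:

$$\mathbf{Q} \sim \mathrm{MVN}(\mathbf{q}, \boldsymbol{\Sigma}_q) \tag{10}$$

where $\mathbf{q}$ is the mean vector of traffic demand, i.e., $\mathbf{q} = (\dots, q_w, \dots)^T$. This paper assumes that the demands of different OD pairs are independent of each other, so the covariance matrix of traffic demand is $\boldsymbol{\Sigma}_q = \mathrm{diag}(\mathbf{q})$.

In the stochastic user equilibrium (SUE) model, the relationship between path flow and OD demand is as follows:

$$f_k^w = q_w \Pr(k \mid \mathbf{t}) = q_w\, p_k^w \tag{11}$$

where $f_k^w$ is the flow on path $k$ between OD pair $w$; $\Pr(\cdot)$ is the probability operator; $\mathbf{t}$ is the vector form of the path travel times $t_k^w$, i.e., $\mathbf{t} = (\dots, t_k^w, \dots)^T$; and $p_k^w$ is the route selection probability of travelers on path $k$ between OD pair $w$, which can be obtained from the following logit formula:

$$p_k^w = \frac{\exp(-\theta\, t_k^w)}{\sum_{j \in K_w} \exp(-\theta\, t_j^w)} \tag{12}$$

where $\theta$ is the dispersion parameter, which measures the degree of perception error of road travelers; $t_k^w$ is the estimated travel time on path $k$ between OD pair $w$; and $K_w$ is the set of paths between OD pair $w$. Therefore, the path flow $\mathbf{F}$ also follows a multivariate normal distribution:

$$\mathbf{F} \sim \mathrm{MVN}\left(\mathbf{P}\mathbf{q},\ \mathbf{P}\boldsymbol{\Sigma}_q\mathbf{P}^T\right) \tag{13}$$

where $\mathbf{P}$ is the matrix form of the path selection probabilities $p_k^w$. Because the road flow is the sum of the path flows, the road flow $\mathbf{V}$ also follows a multivariate normal distribution:

$$\mathbf{V} \sim \mathrm{MVN}\left(\boldsymbol{\Delta}\mathbf{P}\mathbf{q},\ \boldsymbol{\Delta}\mathbf{P}\boldsymbol{\Sigma}_q\mathbf{P}^T\boldsymbol{\Delta}^T\right) \tag{14}$$

where $\mathbf{v} = \boldsymbol{\Delta}\mathbf{P}\mathbf{q}$ is the mean vector of the flow on all sections, i.e., $\mathbf{v} = (\dots, v_a, \dots)^T$, $a \in A$; $v_a$ is the traffic flow on road section $a$; $A$ is the section set; and $\boldsymbol{\Delta}$ is the path link incidence matrix.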
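The chain from OD demand to mean link flow can be sketched as follows. This is an illustrative Python example, assuming a multinomial logit route choice with given path travel times; the dictionary-based data layout and all names are hypothetical, and the sketch computes only the mean flows, not their covariance.

```python
import math


def logit_probabilities(path_times, theta):
    """Multinomial logit choice probabilities for the paths of one OD pair."""
    # Subtracting the minimum time avoids overflow and leaves the result unchanged.
    t_min = min(path_times)
    weights = [math.exp(-theta * (t - t_min)) for t in path_times]
    total = sum(weights)
    return [w / total for w in weights]


def mean_link_flows(demand, path_times, path_links, theta):
    """Mean link flows obtained by loading each OD demand onto its paths.

    demand[od]     : mean demand of the OD pair
    path_times[od] : list of path travel times for that OD pair
    path_links[od] : list (same order) of the links forming each path
    """
    flows = {}
    for od, q in demand.items():
        probabilities = logit_probabilities(path_times[od], theta)
        for p, links in zip(probabilities, path_links[od]):
            for link in links:
                # A link carries the flow of every path that traverses it.
                flows[link] = flows.get(link, 0.0) + q * p
    return flows
```

With one OD pair of 100 trips and two equally long paths that share a first link, the shared link carries all 100 trips and each of the other links carries 50, which mirrors the role of the path link incidence matrix in the text.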
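Combining the fused observed link flow (Equation (8)) with the analytical relationship between link flow and traffic demand (Equation (14)), the likelihood function of traffic demand is established as follows:

$$L(\mathbf{q}) = g(\tilde{\mathbf{v}} \mid \mathbf{q}) \tag{15}$$

where $g(\cdot)$ is the probability density function. As shown above, the road section flow $\mathbf{V}$ follows a multivariate normal distribution, so its log-likelihood function is as follows:

$$\ln L(\mathbf{q}) = -\frac{N}{2}\ln(2\pi) - \frac{1}{2}\ln\left|\boldsymbol{\Sigma}_v\right| - \frac{1}{2}\left(\tilde{\mathbf{v}} - \boldsymbol{\Delta}\mathbf{P}\mathbf{q}\right)^T \boldsymbol{\Sigma}_v^{-1} \left(\tilde{\mathbf{v}} - \boldsymbol{\Delta}\mathbf{P}\mathbf{q}\right) \tag{16}$$

where $\tilde{\mathbf{v}}$ is the fused observed link flow; $N$ is the number of road sections in $\tilde{\mathbf{v}}$; $\mathbf{q}$ is the mean traffic demand; $\mathbf{P}$ is the path selection probability matrix; $\boldsymbol{\Delta}$ is the path link incidence matrix; and $\boldsymbol{\Sigma}_v$ is the covariance matrix of the link flow given by Equation (14).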
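A multivariate normal log-likelihood of this kind can be evaluated as sketched below. The sketch is written in plain Python and makes several assumptions that go beyond the text: the matrix `share[a][w]` (hypothetical name) holds the fraction of OD demand $w$ that uses observed link $a$, the demand variance equals its mean as for a Poisson variable, and the resulting covariance is positive definite, which in general requires no more observed links than OD pairs.

```python
import math


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance is not positive definite")
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return lower


def link_flow_log_likelihood(observed, share, demand):
    """Gaussian log-likelihood of observed link flows for a given mean demand."""
    n_links, n_od = len(share), len(demand)
    # Mean link flow: each link collects its share of every OD demand.
    mean = [sum(share[a][w] * demand[w] for w in range(n_od)) for a in range(n_links)]
    # Covariance: share * diag(demand) * share^T (variance equals the mean).
    cov = [
        [sum(share[a][w] * demand[w] * share[b][w] for w in range(n_od))
         for b in range(n_links)]
        for a in range(n_links)
    ]
    lower = cholesky(cov)
    residual = [o - m for o, m in zip(observed, mean)]
    # Forward substitution solves lower * z = residual.
    z = []
    for i in range(n_links):
        z.append((residual[i] - sum(lower[i][k] * z[k] for k in range(i))) / lower[i][i])
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(n_links))
    quadratic = sum(x * x for x in z)
    return -0.5 * (n_links * math.log(2.0 * math.pi) + log_det + quadratic)
```

An upper-level solver would call such a function repeatedly while adjusting the demand vector; the Cholesky factorization is used here only to avoid an explicit matrix inverse.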
Based on the fused link flow obtained from Equations (8) and (9), the maximum likelihood traffic demand estimation model is formulated as a bi-level program. The upper level uses the maximum likelihood method to solve for the traffic demand, and the lower level uses the SUE model to solve for the path selection probability:
upper level:

$$\max_{\mathbf{q}} \ \ln L(\mathbf{q}) \tag{17}$$

where $\ln L(\mathbf{q})$ is the log-likelihood function of the traffic demand $\mathbf{q}$ given above.
lower level: The SUE model (Equation (11)) is used to assign the traffic demand to the network to obtain the route selection probability.
3. Algorithm Design
To identify the number and location of detectors, a successive detector identification algorithm is designed to solve the detector layout model: in each step, one optimal detector location is identified and added to the detector set, and the path coverage information is updated. This method is simple and easy to apply in engineering practice. The maximum likelihood traffic demand estimation model is solved within an iterative framework that alternates between the upper-level and lower-level models. The upper-level maximum likelihood model is solved by the steepest descent method, and the lower-level SUE model is solved by a successive average algorithm. The detector layout algorithm is as follows (Algorithm 1):
Algorithm 1: Successive detector identification algorithm |
step 1.1 | Initialization: set the initial number of detectors $n$; the weight coefficient $\lambda$; the prior traffic demand $\bar{\mathbf{q}}$; the initial detector placement strategy $S$; and the initial path set $R$. |
step 1.2 | Solve the SUE model with the successive average algorithm to obtain the road flows $\bar{v}_a$, the path flows $\bar{f}_k^w$ and the path link association parameters $\delta_{a,k}^w$; according to Equation (2), calculate the maximum number of detectors $n_{\max}$. |
step 1.3 | Link model coefficient: for each road section $a$, calculate the detector layout coefficient based on road section flow, $\lambda \bar{v}_a$. |
step 1.4 | Path model coefficient: over the whole road network, remove the flows of the paths in the path set $R$ to obtain the new link flows $\bar{v}_a'$; then, for each road section $a$, calculate the detector layout coefficient based on path flow, $(1-\lambda)\bar{v}_a'$. |
step 1.5 | Identify the new detector position $a^*$ (excluding the sections already in $S$) that maximizes the objective function, that is, $a^* = \arg\max_{a \notin S} \left[\lambda \bar{v}_a + (1-\lambda)\bar{v}_a'\right]$, and add it to the detector layout strategy $S$. In addition, add all paths through section $a^*$ to the path set $R$. |
step 1.6 | Convergence test: set $n = n + 1$; if $n = n_{\max}$, stop the calculation, and $S$ is the optimal detector placement strategy; otherwise, go to step 1.4. |
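The successive identification idea can be sketched in Python as a greedy loop. This is an illustration under stated assumptions rather than the authors' implementation: the score of a candidate section is taken as a weighted sum of its total prior flow and the flow of the not yet covered paths through it, the budget is expressed directly as a maximum number of detectors, and all names (`greedy_detector_placement`, `path_flows`, `path_links`) are hypothetical.

```python
def greedy_detector_placement(path_flows, path_links, weight, max_detectors, existing=()):
    """Pick detector links one at a time by a combined link/path coverage score.

    path_flows[p] : prior flow on path p
    path_links[p] : collection of links forming path p
    weight        : weight of the link-flow term (1 - weight goes to the path term)
    """
    # Prior link flow is the sum of the flows of all paths using the link.
    link_flow = {}
    for path, flow in path_flows.items():
        for link in path_links[path]:
            link_flow[link] = link_flow.get(link, 0.0) + flow

    chosen = list(existing)
    covered = {p for p in path_flows if any(a in path_links[p] for a in chosen)}

    while len(chosen) < max_detectors:
        best_link, best_score = None, float("-inf")
        for link in sorted(link_flow):
            if link in chosen:
                continue
            # Only paths that no detector covers yet add new path information.
            uncovered_flow = sum(
                f for p, f in path_flows.items()
                if p not in covered and link in path_links[p]
            )
            score = weight * link_flow[link] + (1.0 - weight) * uncovered_flow
            if score > best_score:
                best_link, best_score = link, score
        if best_link is None:
            break  # every link already carries a detector
        chosen.append(best_link)
        covered.update(p for p in path_flows if best_link in path_links[p])
    return chosen, covered
```

On a toy network with paths `p1` (links 1, 2; flow 100), `p2` (links 1, 3; flow 80) and `p3` (link 4; flow 90), a pure link-flow criterion (`weight=1.0`) with two detectors selects links 1 and 2 and leaves `p3` unobserved, whereas `weight=0.5` selects links 1 and 4 and covers all three paths.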
According to the number and location of detectors obtained by Algorithm 1, observe traffic flow, speed, density, travel time, etc. on these sections. Then, Formulas (8) and (9) are used to fuse these observation variables, and traffic demand is estimated by Algorithm 2.
Algorithm 2: Iterative algorithm for traffic demand estimation model |
step 2.1 | Initialization: set the iteration counter $i$; the convergence accuracy $\varepsilon$; the mean of traffic demand $\mathbf{q}$, so that the covariance matrix of traffic demand is $\boldsymbol{\Sigma}_q = \mathrm{diag}(\mathbf{q})$; the initial traffic demand $\mathbf{q}^{(i)} = \mathbf{q}$; and the link flow after fusion $\tilde{\mathbf{v}}$. |
step 2.2 | Solve the lower-level model: use the successive average algorithm to solve the SUE model, i.e., assign the demand $\mathbf{q}^{(i)}$, to obtain the path selection probability $\mathbf{P}^{(i)}$. |
step 2.3 | Solve the upper-level model: given the path selection probability $\mathbf{P}^{(i)}$, use the steepest descent method to solve for the auxiliary OD demand $\mathbf{y}^{(i)}$. |
step 2.4 | Update the traffic demand: set $\mathbf{q}^{(i+1)} = \mathbf{q}^{(i)} + \frac{1}{i}\left(\mathbf{y}^{(i)} - \mathbf{q}^{(i)}\right)$. |
step 2.5 | Convergence test: if $\left\|\mathbf{q}^{(i+1)} - \mathbf{q}^{(i)}\right\| \le \varepsilon$, stop the calculation, and $\mathbf{q}^{(i+1)}$ is the optimal traffic demand; otherwise, set $i = i + 1$ and go to step 2.2. |
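The lower-level step of Algorithm 2 relies on a successive average algorithm for the logit SUE model. A minimal Python sketch for a single OD pair is given below. It assumes a BPR-type link travel time function with coefficients 0.15 and 4, which the text does not specify, and all names are hypothetical; it is not the MATLAB implementation used in the paper.

```python
import math


def logit_split(path_times, theta):
    """Logit path choice probabilities for one OD pair."""
    t_min = min(path_times)
    weights = [math.exp(-theta * (t - t_min)) for t in path_times]
    total = sum(weights)
    return [w / total for w in weights]


def msa_logit_sue(demand, paths, free_time, capacity, theta, iterations=100):
    """Method of successive averages for a logit SUE with one OD pair.

    paths      : list of paths, each a list of link identifiers
    free_time  : free-flow travel time per link
    capacity   : capacity per link (used by the assumed BPR function)
    """
    link_flow = {a: 0.0 for a in free_time}
    probabilities = []
    for n in range(1, iterations + 1):
        # Assumed BPR link performance function: t = t0 * (1 + 0.15 * (v / c)^4).
        link_time = {
            a: free_time[a] * (1.0 + 0.15 * (link_flow[a] / capacity[a]) ** 4)
            for a in free_time
        }
        path_times = [sum(link_time[a] for a in path) for path in paths]
        probabilities = logit_split(path_times, theta)
        # Stochastic network loading gives the auxiliary link flows.
        auxiliary = {a: 0.0 for a in free_time}
        for path, p in zip(paths, probabilities):
            for a in path:
                auxiliary[a] += demand * p
        # Successive averaging with step size 1 / n.
        for a in link_flow:
            link_flow[a] += (auxiliary[a] - link_flow[a]) / n
    # The probabilities correspond to the link times of the last iteration.
    return link_flow, probabilities
```

The returned path choice probabilities are what the upper-level step would take as fixed while it adjusts the demand; in the outer loop the two steps alternate until the demand stops changing.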
The designed algorithms are programmed in MATLAB R2016a and tested on a notebook computer with an Intel(R) Core(TM) i7-5600U CPU at 2.60 GHz and 8 GB of memory.
5. Conclusions
To obtain the basic data of urban road network planning and traffic control, a detector layout model and a maximum likelihood traffic demand estimation model are established by using the multisource data observed by modern information technology. The former provides the location and quantity of data monitoring for the latter, and the minimum variance weighted average technology is used to fuse the multisource data. Furthermore, the bilevel programming method is used to estimate the traffic demand. We design a successive detector recognition algorithm and an iterative algorithm to solve the two models.
We use the Nguyen Dupuis and Sioux Falls networks to test the performance of the proposed model and algorithm. The results show that the detector placement scheme that considers the coverage information of both road sections and paths is more reliable, and that the designed detector layout model can determine the optimal layout strategy for newly installed and refitted detectors. Disturbances of the input parameters have a significant impact on traffic demand estimation, and the designed algorithm can achieve an accuracy of $10^{-3}$ in 20 s. The traffic demand estimation method based on multisource data fusion makes full use of the observed data to estimate a traffic demand close to the actual value. In a follow-up study, we will estimate the dynamic traffic demand in congested networks, use the estimated traffic demand to design appropriate traffic management policies, and apply the proposed model to a real urban traffic network.