1. Introduction
Intelligent vehicles help reduce traffic accidents and ease traffic congestion, and are an important direction of automobile technology development [1]. Accurate target detection is an important condition for intelligent vehicles to drive normally in the increasingly complex road environment. However, the sensors currently used for target detection all have shortcomings to different degrees, such as a limited detection range and poor adaptability to climate and light, leading to incorrect detection information and other problems. In general, a combination of multiple sensors can expand the detection range and improve the detection reliability and robustness [2]. Therefore, sensor fusion technology can be used to solve the problem of target detection.
Currently, the sensors commonly used for target detection on the market include lidar, millimeter-wave (MMW) radar, camera, and ultrasonic radar [3]. Among them, MMW radar can work in all weather, and its detection of distance and speed is relatively accurate. The camera has a wide detection range and the ability to recognize target types [4]. In addition, both sensors are inexpensive. Therefore, the combination of an MMW radar and a camera has become a mainstream scheme for intelligent vehicles.
According to the differences in the data processing methods, sensor fusion can be divided into three levels, namely: data level, feature level, and target level [5]. Data level fusion gathers the original data of each sensor and integrates them at the level of original information to obtain environmental perception results. Feature level fusion extracts feature information from the original data output by the sensors and then fuses that feature information. Target level fusion requires each sensor to calculate the position, speed, and contour of the targets according to its own detection information, and to then conduct fusion according to the target information. Data level and feature level fusion generally occur in the target detection stage, while target level fusion occurs in the target tracking stage. For the fusion of an MMW radar and a camera, because the two sensors' data differ greatly and a high communication capability is required, the fusion effect at the data level and feature level is not ideal. Therefore, the target level is more suitable [6].
The key to target level fusion lies in data association and fusion track state estimation [7]. Scholars have done some research on these issues. The authors of [8] used a fuzzy clustering method to associate information from different sensors, and realized fusion estimation based on the parallel filtering method of a centralized fusion structure. The authors of [9,10] developed the weighted track association algorithm to complete the association of local tracks from different sensors, and used the cross-covariance fusion algorithm to obtain the state estimation of the fusion track. The authors of [11,12] utilized the covariance intersection method to estimate the fusion track state. The authors of [13] associated the information of different sensors with the global nearest neighbor method, obtained a virtual measurement based on the weighted fusion of measurement errors, and then carried out filtering processing. The authors of [14] associated the local tracks of different sensors based on Dempster–Shafer evidence theory, obtained virtual measurements based on maximum likelihood estimation, and processed them with filtering. The authors of [15] adopted a centralized extended Kalman filter to solve the problem of multi-sensor fusion in single-target tracking. The authors of [16] used the joint probabilistic data association algorithm to complete the association of different sensor tracks and the state estimation of the fusion track. The authors of [17] proposed a new sensor fusion method based on an information matrix, but it did not involve data association.
The analysis of the above literature shows that current data association methods mostly adopt a single rule: based on the target information, a statistical value is obtained according to the set rule and compared with a threshold that is set in advance to determine whether there is an association. However, in the actual environment, traffic targets are complex and changeable, so it is difficult to select an appropriate threshold that ensures a good association effect. In addition, the state estimation of the fusion track suffers from a large computational burden. Therefore, based on the above studies, this paper further explores the target level fusion technology of an MMW radar and a camera. This paper designs a fusion algorithm framework based on a distributed structure, and divides the fusion algorithm into two tracking processing modules and one fusion center module. Each tracking processing module is divided into four parts, namely: pretreatment, data association, track management, and state estimation. In the fusion center module, the temporal–spatial alignment is completed, and a two-level association structure combining regional collision association and weighted track association is designed to associate the local tracks output by the two tracking processing modules. For the fusion tracks, the state estimation is completed based on the no-reset federated filter. Finally, the global track information obtained by the sensor fusion is output.
The paper is structured as follows. Section 2 designs the fusion algorithm framework and defines the different modules. Section 3 designs the tracking processing module, and Section 4 designs the fusion center module. Section 5 verifies the feasibility of the proposed algorithm through experiments. Finally, the concluding remarks and future work are presented in Section 6.
2. Algorithm Framework
Common processing structures for target level fusion include centralized and distributed structures [18]. The centralized structure has only one data processing module, which is also the fusion center. The measurement information detected by each sensor is transmitted to the fusion center, which associates the measurements with the fusion tracks and updates the fusion track states with the measurements. The distributed structure has multiple tracking processing modules and a fusion center module. The measurement information of each sensor is transmitted to the corresponding tracking processing module, and the tracking processing module outputs the local tracks of the sensor. The local tracks of each sensor enter the fusion center, and the fusion center processes the local tracks and obtains the global tracks, which are the final result of the fusion algorithm.
Compared with the centralized structure, the distributed structure has good stability and low requirements on the communication ability and computing speed of the system. Therefore, the distributed structure is selected as the basic structure of the fusion algorithm in this paper, and the designed fusion algorithm framework for an MMW radar and a camera is shown in Figure 1. The framework divides the fusion algorithm into two tracking processing modules and one fusion center module. The tracking processing module receives the sensor measurement information and carries out multi-target tracking processing, which is divided into several parts, including pretreatment, data association, track management, and state estimation. The fusion center module is mainly divided into several parts, including temporal–spatial alignment, data association, fusion track state estimation, and global track generation. When the system works, the MMW radar and camera separately output measurement information through CAN communication, and the measurement information enters their respective tracking processing modules. The tracking processing module runs a multi-target tracking algorithm and outputs local track information, including track state and state covariance, which then enters the fusion center module. The fusion center module first registers the local tracks of the two sensors in time and space, and then performs the association between the local tracks. The local tracks are divided into two parts by the track–track association algorithm. One part is the unsuccessfully associated tracks, which are called residual tracks, and their state information is the same as that of the local tracks. The other part is the successfully associated tracks, which generate the fusion tracks, whose states are then estimated according to the corresponding local tracks. The fusion tracks and residual tracks constitute the global tracks, which are the output information of the algorithm framework. Section 3 and Section 4 will design the tracking processing module and the fusion center module, respectively.
3. Tracking Processing Module
Figure 2 shows the algorithm structure of the tracking processing module. After the measurement information enters the tracking processing module, the measurement targets that influence the running of the ego vehicle are first selected by combining the status information of the ego vehicle, and the pretreatment is completed. Then, the measurement is associated with the current local tracks. We assume that there is a local track $i$ whose measurement prediction is $\hat{z}_i(k|k-1)$ at time $k$, and define

$$d_{ij}^2(k) = \left[ z_j(k) - \hat{z}_i(k|k-1) \right]^T S_i(k)^{-1} \left[ z_j(k) - \hat{z}_i(k|k-1) \right]$$

where $z_j(k)$ is the value of measurement target $j$; $S_i(k)$ is the covariance matrix of the innovation; and $d_{ij}^2(k)$ is the weighted norm of the innovation vector, which can be understood as the statistical distance between the measurement prediction information of the local track and the measurement target.
The statistical distance is taken as the association reference, the Kuhn–Munkres algorithm is taken as the allocation method, and the association between track and measurement is completed according to the global nearest neighbor idea [19]. The relationship between the measurement and track after the association is completed can be divided into three categories, namely: the measurement and track are successfully associated; the track is not associated with any measurement; and the measurement is not associated with any track. These association results are fed into the track management section.
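To make the association step concrete, the following sketch implements global nearest neighbor assignment with the Kuhn–Munkres algorithm via SciPy's linear_sum_assignment; the gate value and all function names are illustrative assumptions, not taken from the paper.

import numpy as np
from scipy.optimize import linear_sum_assignment

GATE = 9.21  # assumed chi-square gate (99% quantile for 2 degrees of freedom)

def gnn_associate(dist, gate=GATE):
    """Global-nearest-neighbor assignment of measurements to tracks.

    dist -- (n_tracks, n_meas) matrix of statistical (Mahalanobis) distances.
    Returns (pairs, unassigned_tracks, unassigned_measurements).
    """
    n_trk, n_meas = dist.shape
    # The Kuhn-Munkres (Hungarian) algorithm minimizes the total distance.
    rows, cols = linear_sum_assignment(dist)
    pairs = [(r, c) for r, c in zip(rows, cols) if dist[r, c] < gate]
    paired_trk = {r for r, _ in pairs}
    paired_meas = {c for _, c in pairs}
    free_trk = [t for t in range(n_trk) if t not in paired_trk]
    free_meas = [m for m in range(n_meas) if m not in paired_meas]
    return pairs, free_trk, free_meas

The three return values correspond exactly to the three association categories described above.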
The main function of track management is to manage the generation, maintenance, and disappearance of tracks. Track management can solve the problems of false measurements and missed target detections [20]. A measurement target that is not associated with any track generates a temporary track. A temporary track that is successfully associated over multiple continuous frames can be considered real, and a confirmed track is generated, which is also a local track. For a confirmed track, if there is no associated measurement over multiple continuous frames, the track can be considered dead and discarded [21]. After the rule judgment, it is necessary to update the states of the confirmed tracks and temporary tracks to obtain the optimal state estimation. A track that is determined to be dead is deleted from the track list without any state update.
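These confirm/delete rules can be sketched as a simple M/N counting scheme; the paper does not state its exact frame thresholds, so M_CONFIRM and N_DELETE below are assumed placeholder values.

M_CONFIRM = 3  # assumed: consecutive associated frames before confirmation
N_DELETE = 5   # assumed: consecutive missed frames before deletion

class Track:
    def __init__(self, state, cov):
        self.state, self.cov = state, cov
        self.hits, self.misses = 1, 0
        self.confirmed = False

    def update_management(self, associated):
        """Apply the rule judgment for one frame; return False if the track dies."""
        if associated:
            self.hits += 1
            self.misses = 0
            if self.hits >= M_CONFIRM:
                self.confirmed = True  # temporary -> confirmed (local track)
        else:
            self.misses += 1
        return self.misses < N_DELETE  # False -> track is dead, delete it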
State estimation is divided into state prediction, measurement prediction, and state update, which is the same as the Kalman filter [22]. Assuming that the acceleration of the target is constant over a short time, a motion model with constant acceleration can be established. The target state vector is

$$X = [\,p\ \ v\ \ a\,]^T = [\,x\ \ y\ \ v_x\ \ v_y\ \ a_x\ \ a_y\,]^T$$

where $p = (x, y)$ is the position vector, $v = (v_x, v_y)$ is the velocity vector, and $a = (a_x, a_y)$ is the acceleration vector. Then, the motion state model can be obtained as

$$X(k+1) = F X(k) + \Gamma w(k)$$

where

$$F = \begin{bmatrix} I_2 & T I_2 & \tfrac{T^2}{2} I_2 \\ 0 & I_2 & T I_2 \\ 0 & 0 & I_2 \end{bmatrix}, \quad \Gamma = \begin{bmatrix} \tfrac{T^3}{6} I_2 \\ \tfrac{T^2}{2} I_2 \\ T I_2 \end{bmatrix}$$

with $T$ being the sampling period and $I_2$ the 2 × 2 identity matrix. $w(k) = [w_x(k)\ \ w_y(k)]^T$ is the white noise sequence in the discrete model, and $w_x$ and $w_y$ correspond to the target's noise "jerk" along the x- and y-axis, respectively.
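As a minimal sketch of the constant-acceleration model above, the following NumPy code builds $F$ and $\Gamma$ for a given sampling period; the state ordering follows the reconstruction above, and the function name is illustrative.

import numpy as np

def ca_model(T):
    """Constant-acceleration model for the state X = [x y vx vy ax ay]^T.

    T is the sampling period; returns the transition matrix F and the noise
    input matrix Gamma mapping the jerk noise w = [wx wy]^T into the state.
    """
    I2 = np.eye(2)
    Z2 = np.zeros((2, 2))
    F = np.block([
        [I2, T * I2, 0.5 * T**2 * I2],
        [Z2, I2, T * I2],
        [Z2, Z2, I2],
    ])
    Gamma = np.vstack([T**3 / 6 * I2, T**2 / 2 * I2, T * I2])
    return F, Gamma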
The MMW radar can detect the target distance, azimuth, and relative velocity, and its measurement vector can be expressed as follows:

$$z_r = [\,r\ \ \theta\ \ \dot{r}\,]^T$$

The corresponding measurement model is as follows:

$$z_r(k) = h\left(X(k)\right) + v_r(k), \quad h(X) = \begin{bmatrix} \sqrt{x^2 + y^2} \\ \arctan(y/x) \\ (x v_x + y v_y)/\sqrt{x^2 + y^2} \end{bmatrix}$$

where $v_r(k)$ represents the measurement white noise sequence. Because of the nonlinearity of the measurement model, an extended Kalman filter is used to estimate the track state of the MMW radar.
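The extended Kalman filter update needs the measurement function and its Jacobian; the sketch below assumes the state ordering $X = [x\ y\ v_x\ v_y\ a_x\ a_y]^T$ used above, with illustrative function names.

import numpy as np

def radar_h(X):
    """Nonlinear radar measurement h(X) = [range, azimuth, range-rate]^T."""
    x, y, vx, vy = X[0], X[1], X[2], X[3]
    r = np.hypot(x, y)
    return np.array([r, np.arctan2(y, x), (x * vx + y * vy) / r])

def radar_jacobian(X):
    """Jacobian of radar_h, used in the EKF update of the radar track."""
    x, y, vx, vy = X[0], X[1], X[2], X[3]
    r2 = x * x + y * y
    r = np.sqrt(r2)
    H = np.zeros((3, 6))
    H[0, 0], H[0, 1] = x / r, y / r            # d(range)/d(x, y)
    H[1, 0], H[1, 1] = -y / r2, x / r2         # d(azimuth)/d(x, y)
    H[2, 0] = (vx * r2 - x * (x * vx + y * vy)) / r**3
    H[2, 1] = (vy * r2 - y * (x * vx + y * vy)) / r**3
    H[2, 2], H[2, 3] = x / r, y / r            # d(range-rate)/d(vx, vy)
    return H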
A camera can detect the target distance, and its measurement vector can be expressed as follows:

$$z_c = [\,x\ \ y\,]^T$$

The corresponding measurement model is as follows:

$$z_c(k) = H_c X(k) + v_c(k)$$

where $v_c(k)$ represents the measurement white noise sequence and $H_c$ is the measurement matrix

$$H_c = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \end{bmatrix}$$

Both the motion state model and the measurement model are linear, so the track state of the camera is estimated through a linear Kalman filter.
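A minimal sketch of the camera's linear Kalman filter update, assuming the same state ordering; $H_c$ simply selects the position components, and the function name is illustrative.

import numpy as np

# Camera measurement matrix: observes the position components of
# the state X = [x y vx vy ax ay]^T.
H_C = np.hstack([np.eye(2), np.zeros((2, 4))])

def kf_update(x_pred, P_pred, z, R):
    """Standard linear Kalman filter update for the camera track."""
    S = H_C @ P_pred @ H_C.T + R           # innovation covariance
    K = P_pred @ H_C.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H_C @ x_pred)
    P_new = (np.eye(6) - K @ H_C) @ P_pred
    return x_new, P_new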
4. Fusion Center Module
In the fusion center module, the camera detection cycle is taken as the fusion time node. In other words, during each camera detection cycle, the fusion center module starts to run after the camera tracking processing module is completed. First, the local track information of the two sensors is transformed into the same coordinate system to ensure spatial registration. In this paper, the motion coordinate system of the MMW radar is used as the fusion track coordinate system, so we only need to convert the local track information of the camera, which only involves the conversion of a two-dimensional Cartesian coordinate system and will not be described in detail here. Moreover, every time the camera outputs a set of CAN messages, the track information obtained from the MMW radar in the previous N cycles is fitted with a quadratic curve according to the least squares method. Then, the fitted curve is extrapolated to the current time node of the camera in order to obtain the estimated value of the MMW radar track information [23]. The temporal–spatial alignment is thereby completed. The following parts mainly design the track–track association and fusion track state estimation.
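The temporal alignment step can be sketched with NumPy's least-squares polynomial fit; the function name, and treating each track quantity as an independent scalar series, are assumptions for illustration.

import numpy as np

def extrapolate_radar(t_hist, y_hist, t_cam):
    """Fit the last N radar samples with a quadratic (least squares) and
    extrapolate to the camera time node t_cam.

    t_hist, y_hist -- times and values of one radar track quantity over the
    previous N cycles; t_cam -- current camera fusion time node.
    """
    coeffs = np.polyfit(t_hist, y_hist, deg=2)  # least-squares quadratic fit
    return np.polyval(coeffs, t_cam)            # extrapolated estimate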
4.1. Track–Track Association
4.1.1. Association Algorithm Design
The measurement information of the two sensors is processed by their respective tracking processing modules in order to obtain effective targets, namely the local tracks. Intuitively, if the effective targets of the two sensors are close enough together, the two targets can be considered to be associated. This is an association method based on a location threshold. For the MMW radar and the camera used in intelligent vehicles, in general, the longitudinal distance measurement of the MMW radar is relatively accurate while its lateral distance measurement is relatively rough, and the camera is just the opposite. This leads to a large deviation between the position of the MMW radar target and the camera target, and the greater the distance between the target and the ego vehicle, the greater the deviation.
It is difficult to obtain good association results depending only on the position threshold, so the motion state information of the target can be further considered. The tracking processing module outputs the local track information, which can be used to compare the similarity of the motion states of the different sensor targets. Commonly-used methods that determine the degree of association from the track information include weighted track association [24]. However, when the environment is complex and there are many targets, the performance of the weighted track association method decreases, and there are many errors and omissions in the associated tracks.
After comprehensive consideration, this paper designs a two-level association structure, as shown in Figure 3. Firstly, the regional collision association algorithm is designed based on the idea of a location threshold. Then, the unassociated local tracks are input into the weighted track association part. Because one level of association has already been completed, the number of targets that need to be associated is decreased, the environmental complexity is reduced, and the weighted track association can play a better role.
4.1.2. Regional Collision Association
The selection of the position threshold is related to the state uncertainty of the local track, which is expressed by the state covariance in the state estimation process. In this paper, the rotation in the target motion process is ignored, and a rectangular uncertain region is established with the current local track position as the center, as shown in Figure 4. The length and width of the uncertain region are related to the position standard deviation of the local track in the longitudinal and lateral directions, respectively. For the local track $i$, with state $X_i$ and state covariance $P_i$, the length and width of the uncertain region are, respectively, as follows:

$$l_i = \eta_1 \sqrt{P_i(1,1)}, \quad w_i = \eta_2 \sqrt{P_i(2,2)}$$

where $\eta_1$ and $\eta_2$ are constants, and $P_i(1,1)$ and $P_i(2,2)$ are the position variances in the longitudinal and lateral directions. Because there is a two-level association, the first-level association can select a smaller threshold to ensure that the association is valid.
Regional collision association means that if the uncertain regions of two local tracks belonging to different sensors intersect, the two local tracks can be determined to be associated. The pseudocode to execute the regional collision association is shown in Algorithm 1, where $(x_i, y_i)$ and $(x_j, y_j)$ are the region centers of the two local tracks.

Algorithm 1 Regional Collision Association
1: if |x_i − x_j| > (l_i + l_j)/2 then
2:     return false
3: else if |y_i − y_j| > (w_i + w_j)/2 then
4:     return false
5: return true
In order to quantify the degree of association between two local tracks, the concept of the Jaccard coefficient is introduced. The Jaccard coefficient refers to the ratio between the intersection and the union of two sets. Here, the uncertain region area of the local track is used as the set, and the expression can be obtained as follows:

$$J = \frac{|S_r \cap S_c|}{|S_r \cup S_c|}$$

where $J$ is called the association similarity index, which represents the association degree of two local tracks, and $S_r$ and $S_c$ represent the uncertain region areas of the local tracks of the MMW radar and camera, respectively.
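The regional collision test (Algorithm 1) and the association similarity index $J$ can be sketched for axis-aligned rectangles as follows; the names and the (center, dimensions) parameterization are illustrative choices.

def collide(c1, c2, dim1, dim2):
    """Regional collision test for two axis-aligned uncertain regions.

    c = (x, y) region center; dim = (length, width). Mirrors Algorithm 1.
    """
    if abs(c1[0] - c2[0]) > (dim1[0] + dim2[0]) / 2:
        return False
    if abs(c1[1] - c2[1]) > (dim1[1] + dim2[1]) / 2:
        return False
    return True

def jaccard(c1, c2, dim1, dim2):
    """Association similarity index J = intersection / union of the regions."""
    dx = min(c1[0] + dim1[0] / 2, c2[0] + dim2[0] / 2) \
        - max(c1[0] - dim1[0] / 2, c2[0] - dim2[0] / 2)
    dy = min(c1[1] + dim1[1] / 2, c2[1] + dim2[1] / 2) \
        - max(c1[1] - dim1[1] / 2, c2[1] - dim2[1] / 2)
    inter = max(dx, 0.0) * max(dy, 0.0)
    union = dim1[0] * dim1[1] + dim2[0] * dim2[1] - inter
    return inter / union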
FusionLife is set up to indicate the stability of the fusion track. When the fusion track is initially formed, the FusionLife value is 0. If the local tracks corresponding to the fusion track are all associated in the subsequent continuous period, then the FusionLife value is accumulated. When FusionLife reaches the set threshold FusionLifeMax, it indicates that the fusion track is relatively stable. Then, at the later fusion time nodes, the corresponding local tracks no longer need to participate in the association algorithm and can be directly used to update the state of the fusion track. If FusionLife is less than FusionLifeMax at a certain fusion time node and the corresponding two local tracks are not associated, the fusion track dies out.
For the fusion track obtained through regional collision association, the FusionLife value accumulates as follows:

$$FusionLife(k) = FusionLife(k-1) + \lambda J$$

where $\lambda$ is a constant coefficient, so that a larger association similarity index $J$ contributes a faster accumulation.
4.1.3. Weighted Track Association
It is assumed that, for an MMW radar and a camera, there are local tracks $i$ and $j$, respectively. Through the previous tracking processing modules, the state estimations of the two local tracks are $\hat{X}_i$ and $\hat{X}_j$, and their state covariances are $P_i$ and $P_j$. The state estimation difference of the two tracks is expressed as follows:

$$\Delta \hat{X}_{ij} = \hat{X}_i - \hat{X}_j$$
The null hypothesis and alternative hypothesis are established, and the track association problem is transformed into a hypothesis testing problem:

$H_0$: $\hat{X}_i$ and $\hat{X}_j$ are the track state estimations of the same target, namely, tracks $i$ and $j$ are associated;

$H_1$: $\hat{X}_i$ and $\hat{X}_j$ are not the track state estimations of the same target, namely, tracks $i$ and $j$ are not associated.
It is assumed that the state estimation errors of the local tracks of the same target are statistically independent. Under the $H_0$ assumption, the covariance of the state estimation difference of tracks $i$ and $j$ can be expressed as

$$C_{ij} = P_i + P_j$$

The statistical value of the weighted track association is as follows:

$$\alpha_{ij} = \Delta \hat{X}_{ij}^T C_{ij}^{-1} \Delta \hat{X}_{ij}$$

Under the $H_0$ assumption, the state estimation difference obeys a Gaussian distribution, and the statistical value $\alpha_{ij}$ obeys a chi-square distribution. The chi-square distribution association threshold $\delta$ is selected. When $\alpha_{ij}$ is less than $\delta$, the hypothesis $H_0$ is accepted, and tracks $i$ and $j$ are considered to be associated. Otherwise, we accept the hypothesis $H_1$ that tracks $i$ and $j$ are unassociated.
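A sketch of the weighted track association test, assuming independent estimation errors as above; the significance level is an assumed tuning value, and SciPy's chi-square quantile function supplies the threshold $\delta$.

import numpy as np
from scipy.stats import chi2

def weighted_track_association(x_r, P_r, x_c, P_c, alpha=0.05):
    """Chi-square hypothesis test on the state estimation difference.

    Assumes independent estimation errors, so the difference covariance is
    P_r + P_c; the significance level alpha is an assumed tuning value.
    """
    d = x_r - x_c                               # state estimation difference
    C = P_r + P_c                               # its covariance under H0
    stat = d @ np.linalg.solve(C, d)            # weighted statistical value
    threshold = chi2.ppf(1 - alpha, df=d.size)  # chi-square association gate
    return stat < threshold                     # True -> accept H0 (associated)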
For the fusion track obtained through the weighted track association, the FusionLife value is accumulated in the form of

$$FusionLife(k) = FusionLife(k-1) + c$$

where $c$ is a constant. The association quality is not evaluated here, so only a set constant $c$ is used as the added value of FusionLife.
4.2. Fusion Track State Estimation
A federated filter is suitable for the distributed fusion structure [25], and can be used to build the connection between the state estimation parts of the tracking processing modules and the fusion center module. A federated filter generally has four basic structures, namely: fusion-reset mode, zero-reset mode, no-reset mode, and rescale mode. In the no-reset mode, there is no information reset from the master filter to the sub-filters, so the sub-filters do not pollute each other. The no-reset mode is fast in computation and strong in fault tolerance. This paper designs a fusion track state estimation method based on the no-reset federated filter structure, as shown in Figure 5. The figure only shows the fusion state estimation for a single target, where $z_r$ and $z_c$ are the radar measurement and camera measurement associated with the target, respectively; $\hat{X}_r$ and $\hat{X}_c$ represent the state estimation outputs of the two sub-filters; and $\hat{X}_g$ represents the state estimation output of the master filter, which is also the state information of the fusion track. The extended Kalman filter corresponds to the state estimation of the MMW radar track, and the Kalman filter corresponds to the state estimation of the camera track. They have been designed in the tracking processing module.
The workflow of the federated filter includes the initial information determination, information allocation, time update, measurement update, and information fusion. Among them, the time update and measurement update of the two sub-filters belong to the state estimation part of the tracking processing modules, which will not be detailed here.
The target motion model adopted by the MMW radar and the camera is the same, and the target state format output by the sub-filters is the same. Therefore, the output state estimation information of the two sub-filters can be integrated in the master filter. At the initial time of fusion, the system needs to determine the initial information. The global estimation error covariance $P_g(0)$ and system process noise $Q_g(0)$ at the initial moment can be calculated as

$$P_g^{-1}(0) = P_r^{-1}(0) + P_c^{-1}(0), \quad Q_g^{-1}(0) = Q_r^{-1}(0) + Q_c^{-1}(0)$$

where the subscripts $r$ and $c$ denote the MMW radar and camera sub-filters, respectively.
For a no-reset federated filter, the information is allocated only at the initial time. The initial information is generally distributed evenly. However, because of the different measurement accuracies of the millimeter-wave radar and camera, if the initial information is distributed equally, the global estimation accuracy will be reduced. Now, $s_r$ and $s_c$ are set to represent the sums of the state covariance singular values of the two sub-filters, respectively, and the singular values are used to calculate the two information allocation coefficients:

$$\beta_r = \frac{s_c}{s_r + s_c}, \quad \beta_c = \frac{s_r}{s_r + s_c}, \quad \beta_r + \beta_c = 1$$

$\beta_r$ and $\beta_c$ represent the initial information allocation coefficients of the millimeter-wave radar and camera local tracks, respectively, so that the more accurate sub-filter is allocated more information. The coefficients are used to assign information to the sub-filters and to update their initial information:

$$P_r^{-1}(0) = \beta_r P_g^{-1}(0), \quad Q_r^{-1}(0) = \beta_r Q_g^{-1}(0)$$

$$P_c^{-1}(0) = \beta_c P_g^{-1}(0), \quad Q_c^{-1}(0) = \beta_c Q_g^{-1}(0)$$

where the subscript $r$ represents the local track information of the millimeter-wave radar and the subscript $c$ represents the local track information of the camera. In the information fusion part, the local state estimation information obtained by the two independent sub-filters is fused to obtain the global optimal estimation:

$$P_g^{-1} = P_r^{-1} + P_c^{-1}, \quad \hat{X}_g = P_g \left( P_r^{-1} \hat{X}_r + P_c^{-1} \hat{X}_c \right)$$
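The allocation and fusion steps can be sketched as follows, under the reconstruction above; the singular-value-based coefficient formula is the assumed form, while the master-filter fusion is the standard information-weighted combination.

import numpy as np

def allocation_coefficients(P_r, P_c):
    """Information allocation from the sums of state covariance singular values.

    The more accurate sub-filter (smaller singular-value sum) receives the
    larger coefficient; beta_r + beta_c = 1.
    """
    s_r = np.linalg.svd(P_r, compute_uv=False).sum()
    s_c = np.linalg.svd(P_c, compute_uv=False).sum()
    return s_c / (s_r + s_c), s_r / (s_r + s_c)

def fuse(x_r, P_r, x_c, P_c):
    """Master-filter information fusion of the two sub-filter outputs."""
    I_r, I_c = np.linalg.inv(P_r), np.linalg.inv(P_c)
    P_g = np.linalg.inv(I_r + I_c)        # global estimation covariance
    x_g = P_g @ (I_r @ x_r + I_c @ x_c)   # global state estimate
    return x_g, P_g

Because the mode is no-reset, fuse only reads the sub-filter outputs; nothing is written back to them, which matches the fault-tolerance argument above.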