1. Introduction
In recent years, the development of advanced driving assistance systems (ADAS) has attracted a large amount of research and funding from major automakers and universities. The key issues of ADAS include on-road object detection, anti-collision technology, and parking assist systems. Three kinds of sensors (i.e., radar, Lidar, and camera) are widely adopted for object detection in front of vehicles [1,2,3,4,5]. Since single sensors have limitations, multi-sensor fusion technology can be used to compensate for the disadvantages of each individual sensor [6,7].
In reference [8], by using background subtraction and a Haar wavelet transform, the foreground image was transformed into a second-order feature space. Then, based on the concept of the histogram of oriented gradients (HOG), horizontal and vertical high-frequency components were obtained. With a hierarchical SVM classifier architecture, the proposed system can classify pedestrians, automobiles, and two-wheeled vehicles effectively. Yang et al. [9] used an optical flow method to calculate the motion vectors of the objects. Subsequently, the focus of expansion (FOE) of each object was found by voting. By using the concept of a hierarchical decision tree, false detections (e.g., shadows or ground marking lines) can be avoided. Finally, the collision time was calculated by using the motion vectors of the objects.
Millimeter-wave (MMW) radars detect objects by transmitting electromagnetic waves toward the objects and analyzing the reflected waves, and they are not affected by lighting or weather conditions. These radars can measure the relative distances and speeds of objects. However, millimeter-wave radars are susceptible to noise and environmental interference. To address the issues related to radar noise, Park et al. [10] proposed applying a statistical model to the radar data and using a hybrid particle filter to track the preceding vehicle.
The laser range finder is an electronic measuring instrument that uses a laser to accurately measure the distance to a target, and it exhibits the advantages of high measurement accuracy and good stability. Nashashibi et al. [4] developed a method to detect, track, and classify multiple vehicles by means of a laser range finder mounted on a vehicle. The classification was based on different criteria: sensor specifications, geometric configuration, occlusion reasoning, and tracking information. The system was tested on highways and in urban centers with three different laser range finders.
In contrast with range finder sensors, camera sensors are not only cost-effective but can also provide other useful information. Many novel vision-based algorithms for detecting objects in front of vehicles have been proposed in the past decade. Vehicle detection and vehicle distance estimation systems were proposed in reference [11]. By using the histogram of oriented gradients (HOG) feature and a support vector machine (SVM) classifier, the authors segmented the road area and identified the shadow area under the vehicle in order to detect the vehicle position. Guo et al. [12] used a two-stage algorithm for pedestrian detection. First, candidate regions were determined from the foreground image; then, the edge features of the objects were identified in the second stage. The experimental results verified the accuracy of the proposed method.
Despite the advantages exhibited by all sensors, they have limitations that affect their object detection abilities. For instance, cameras are susceptible to light and environmental factors, and the radar stability is affected by the relative speed and surrounding environment. Hence, a sensor fusion mechanism is developed to compensate for the deficiencies of relying on a single sensor.
A series-type fusion architecture based on laser and vision sensors was addressed in reference [13]. The proposed system can quickly find the regions of interesting objects without a huge amount of computation time. Another advantage was that, after the verification and comparison of each sensor, the overall false alarm rate was reduced. Wang et al. [14] proposed a system scheme for on-road obstacle detection that fuses an MMW radar and a monocular vision sensor. An experimental method to investigate the radar-vision point alignment was proposed. In addition, a region searching method for potential target detection was proposed to reduce the image processing time. Wang et al. [15] proposed a tandem sensor fusion of series-connection architecture that uses MMW radar to obtain the candidate positions of detected objects. The position coordinates are converted into image coordinates that are considered as regions of interest to reduce the number of window searches. The image is then used to recognize and track the vehicle in the candidate areas. A Kalman filter is used to compare the tracking trajectories of the radar and camera to improve the vehicle detection rate and reduce the false positive rate.
In the aforementioned references, using a single sensor to detect objects significantly reduces the cost of the detection system; however, system stability is still a challenge under adverse weather conditions. The main purpose of using a series architecture in sensor fusion is to rapidly determine the candidate area via radar or Lidar and accelerate the image search process. Another advantage of using a second-layer sensor is to reduce noise interference after verification and comparison. However, the entire tandem architecture system will fail when one of the sensors fails.
This paper extends our earlier vision-based research work [16] and proposes a set of MMW radar and camera fusion strategies based on a parallel architecture that can compensate for the failure of a single sensor and enhance the system detection rate using the complementary characteristics of the sensors. The radar subsystem provides noise filtering, tracking, and credibility analysis. The two-stage vision detection subsystem can rapidly identify the candidate area from the image. The fusion strategy of the parallel architecture system depends on the confidence index of each sensor. Three kinds of scenario conditions (daytime, nighttime, and rainy-day) are implemented in an urban environment to verify the proposed system.
The contributions of this study include the following:
To overcome the shortcomings of each single sensor, we integrated the two sensor systems using sensor fusion technology and improved the reliability of the overall system.
In a series-type fusion architecture, any single sensor failure causes the whole system to fail. The proposed parallel architecture depends on the confidence index of each sensor, so the sensors can compensate for each other and the limitations of the series fusion architecture are avoided.
Three kinds of scenario conditions (daytime, nighttime, and rainy-day) were implemented in an urban environment to verify the proposed system's viability. The experimental results can provide a baseline for comparison in future research.
2. System Architecture
This study proposes a sensor fusion technology that integrates an MMW radar and a camera for front object detection. The proposed system consists of three subsystems: a radar-based detection subsystem, a vision-based recognition subsystem, and a sensor fusion subsystem.
The image captured by the camera can easily be affected by lighting and weather conditions. Furthermore, the estimated distance of the front object derived from the camera image has a low precision. A sufficiently large velocity relative to the front object is necessary for the MMW radar to stably detect it. Accordingly, these two sensor subsystems were combined in a parallel connection to compensate for the limitations of each sensor and improve the robustness of the detection system. The overall architecture of the proposed detection and recognition system is shown in
Figure 1.
A clustering algorithm and a particle filter were applied to the MMW radar data to achieve noise removal and multi-object tracking. The objects detected in the radar coordinate system were then converted into image coordinates. For the image data, two-stage classifiers were implemented for foreground segmentation and object recognition, respectively, from which the object information was obtained. Finally, a radial basis function neural network (RBFNN) was used to fuse the detected object information from the MMW radar and camera.
3. Radar-Based Object Detection
A 24 GHz short-range radar was adopted for front-end environment detection, and a radar-based multi-object tracking method was proposed. This method can track multiple objects simultaneously and remove noise signals, which are considered non-real objects. The flow chart of the proposed radar-based detection subsystem is shown in
Figure 2. First, the radar data were divided into different clusters using a clustering algorithm. A particle filter was then used for signal filtering and target tracking. Two kinds of probability scores are evaluated in the particle filter process. The convergence of the particle swarm reflects the quality of the tracking: for stably tracked objects, the particles around the object have a higher weighting in the importance sampling step and therefore a higher probability of survival in the resampling step. We define the range probability as the survival probability of the particles within a radius of 1 m around the object and use it to evaluate the quality of the tracking. On the other hand, the diversity of the particle swarm should cover all of the possible states of the object. We define the available probability as the survival probability of the particles after the resampling step. During the tracking process, the system adjusts the percentage of particles that are resampled in line with the value of the available probability to ensure the diversity of the particle swarm. In addition, the confidence index of the target object was derived from the range probability and the survival probability. This confidence index determines the credibility of the detected object. The relative velocity and distance between the vehicle and the front object are provided by this subsystem.
3.1. Radar Data Pre-Processing
The MMW radar signals are electromagnetic waves. Both reflection and refraction occur when the electromagnetic waves strike a medium, so in addition to the reflected wave from the object itself, noise signals from non-real objects are also prone to appear. The relationship between relative distance and echo intensity was statistically analyzed using a large amount of data collected during the experiments. The statistical results are shown in
Figure 3. The signal distributions indicate that real objects and noise each show their own concentration, and only a small part of the two distributions overlaps. Accordingly, a noise filtering operation was performed. As shown in
Figure 3a, after the signals on the left side of the red curve were filtered out, the subsequent target tracking and particle filter algorithm were performed. The density-based spatial clustering of applications with noise (DBSCAN) algorithm [17] was used to cluster the radar data, and the number of possible front objects was estimated.
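To make the pre-processing concrete, the following is a minimal sketch of the intensity-based noise filtering followed by DBSCAN clustering, assuming the radar returns are available as position, intensity, and range arrays; the threshold curve and the clustering parameters (eps, min_samples) are illustrative placeholders, not the values used in the experiments.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def preprocess_radar(points, intensities, ranges, eps=1.5, min_samples=2):
    """Filter weak echoes and group the remaining returns into object candidates.

    points      : (N, 2) array of lateral/longitudinal positions in meters
    intensities : (N,) echo strengths
    ranges      : (N,) relative distances in meters
    The intensity threshold curve and clustering parameters are illustrative only.
    """
    # Hypothetical distance-dependent intensity threshold (standing in for the
    # red curve of Figure 3a): weaker echoes are tolerated at longer ranges.
    threshold = 20.0 - 0.5 * ranges
    keep = intensities > threshold
    filtered = points[keep]
    if len(filtered) == 0:
        return []

    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(filtered)
    clusters = [filtered[labels == k] for k in set(labels) if k != -1]  # label -1 = noise
    return [c.mean(axis=0) for c in clusters]  # one candidate position per cluster
```

The number of returned cluster centroids serves as the estimate of the number of possible front objects passed on to the particle filter.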
3.2. Particle Filter
A particle filter [18] is widely used in many fields, including object tracking, signal processing, and automatic control. In this study, particle filtering was used to filter the radar signal and track the objects in front of the vehicle. The particle filter algorithm uses a finite number of particles to represent the posterior probability of a stochastic process with partial observations. Each particle has a respective weight value that represents the probability of the particle being sampled from the probability density function. The procedure used to implement the particle filter algorithm in this study is roughly divided into the following four steps:
3.2.1. Particle Initialization
To cover all the potential object positions, n particles were randomly distributed within the radar detection area. Each particle represents a potential position of a real object, and the weight of the particle indicates the probability that the object is at this location.
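A minimal sketch of this initialization step is given below; the state layout [x, y, vx, vy] follows the motion model of the next subsection, and the detection-area bounds are placeholder values rather than the sensor's actual field of view.

```python
import numpy as np

def init_particles(n, x_range=(-10.0, 10.0), y_range=(0.0, 50.0)):
    """Spread n particles uniformly over the radar detection area.

    Each particle is a candidate state [x, y, vx, vy]; the area bounds
    used here are placeholders.
    """
    particles = np.zeros((n, 4))
    particles[:, 0] = np.random.uniform(*x_range, size=n)  # lateral position (m)
    particles[:, 1] = np.random.uniform(*y_range, size=n)  # longitudinal position (m)
    weights = np.full(n, 1.0 / n)                           # equal initial weights
    return particles, weights
```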
3.2.2. State Prediction
The state of the object changes over time. Discrete time was used to calculate the object state: the state of each particle at the next moment was predicted from its state and the motion model at time $k-1$, giving the prior probability $p(\mathbf{X}_k \mid \mathbf{X}_{k-1})$. The equation used to predict the object state is expressed as follows [19]:

$$\mathbf{X}_k = \mathbf{A}\,\mathbf{X}_{k-1} + \mathbf{w}_{k-1},\qquad(1)$$

with

$$\mathbf{X}_k = \begin{bmatrix} x_k & y_k & v_{x,k} & v_{y,k} \end{bmatrix}^{T},\qquad
\mathbf{A} = \begin{bmatrix} 1 & 0 & T & 0\\ 0 & 1 & 0 & T\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix},\qquad(2)$$

where $T$ is the sampling time of the radar sensor, $\mathbf{X}_k$ denotes the state vector, $x_k$ and $x_{k-1}$ denote the relative lateral distances between the target object and the sensor at the current time and the previous moment, respectively, $y_k$ and $y_{k-1}$ are the relative longitudinal distances between the target object and the sensor at the current time and the previous moment, respectively, $v_{x,k}$ and $v_{y,k}$ represent the lateral and longitudinal relative speeds of the target and the sensor, respectively, and $\mathbf{w}_{k-1}$ is zero-mean Gaussian white noise.
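Assuming the constant-velocity model of Equations (1) and (2), the prediction step can be sketched as follows; the sampling time and the process-noise level are placeholder values.

```python
import numpy as np

def predict(particles, T=0.05, sigma=0.2):
    """Propagate each particle with the constant-velocity model X_k = A X_{k-1} + w.

    particles : (n, 4) array of states [x, y, vx, vy]
    T         : radar sampling time in seconds (placeholder)
    sigma     : standard deviation of the zero-mean Gaussian process noise (placeholder)
    """
    A = np.array([[1, 0, T, 0],
                  [0, 1, 0, T],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]])
    noise = np.random.normal(0.0, sigma, particles.shape)
    return particles @ A.T + noise
```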
3.2.3. Importance Sampling
This step is based on the concept of a Bayesian filter. The particles obtained during the state prediction stage and the information obtained from the MMW radar are used to estimate the target position; the Bayesian theorem is used to update the prior probability and obtain the posterior probability. In this step, each particle is assigned a weight. Assume that the radar measurement area is divided into blocks of unit size. The measurement model of the radar sensor is expressed by Equation (3),

$$z_i = h_i(x, y) + n_i,\qquad(3)$$

where $n_i$ is the measurement noise in the $i$-th block, modeled as Gaussian white noise with mean equal to 0 and variance $\sigma^2$, while $h_i(x, y)$ is the signal strength of the object in the $i$-th block, whose point spread function [20] is expressed as follows:

$$h_i(x, y) = \frac{\Delta_x \Delta_y\, I}{2\pi\Sigma^2}\exp\!\left(-\frac{(x_i - x)^2 + (y_i - y)^2}{2\Sigma^2}\right),\qquad(4)$$

where $\Delta_x$ and $\Delta_y$ are the block sizes, $(x_i, y_i)$ is the center of the $i$-th block, $I$ is the echo strength of the MMW radar, and $\Sigma$ is the blurring degree of the sensor. The weight value of each particle can be obtained as the likelihood of the radar measurement given the particle state:

$$w_k^{(j)} = p(\mathbf{z}_k \mid \mathbf{X}_k^{(j)}).\qquad(5)$$

The weight of each particle in the space region is then normalized by dividing it by the sum of all particle weights, as shown by Equation (6):

$$\tilde{w}_k^{(j)} = \frac{w_k^{(j)}}{\sum_{m=1}^{n} w_k^{(m)}}.\qquad(6)$$

After the weight of each particle is obtained, the relative position of the object detected by the MMW radar can be estimated. The expected value of the target estimate is expressed as follows:

$$\hat{\mathbf{X}}_k = \sum_{j=1}^{n} \tilde{w}_k^{(j)}\,\mathbf{X}_k^{(j)}.\qquad(7)$$
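The weighting and estimation of Equations (5)-(7) can be sketched as below, with a generic Gaussian likelihood standing in for the radar measurement model of Equations (3) and (4); the measurement-noise level is a placeholder.

```python
import numpy as np

def update_weights(particles, measurement, sigma_z=1.0):
    """Importance sampling: weight each particle by the likelihood of the radar measurement.

    particles   : (n, 4) states [x, y, vx, vy]
    measurement : (2,) measured position [x, y] of the candidate object
    sigma_z     : measurement-noise standard deviation (placeholder)
    Returns the normalized weights (Equation (6)) and the weighted state estimate (Equation (7)).
    """
    d2 = np.sum((particles[:, :2] - measurement) ** 2, axis=1)
    weights = np.exp(-d2 / (2.0 * sigma_z ** 2))   # likelihood p(z | X) up to a constant
    weights /= np.sum(weights) + 1e-12              # normalization, Equation (6)
    estimate = weights @ particles                  # expected value, Equation (7)
    return weights, estimate
```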
3.2.4. Resampling
The method of estimating according to the weight of each particle is referred to as the sequential importance sampling (SIS) particle filter [18]. However, this method suffers from particle degeneracy: after several iterative operations, the weights of most particles become insignificant, which causes the system to perform unnecessary calculations on these particles, and the real target position may no longer be covered by the remaining particles. The resampling method was used to address this issue. In each iteration, the particles with smaller weights were discarded and replaced by particles with larger weights. After resampling, the weights of all particles were set to $1/n$, and the next iteration was performed with the new particles. The expected value of the target estimate then becomes

$$\hat{\mathbf{X}}_k = \frac{1}{n}\sum_{j=1}^{n} \mathbf{X}_k^{(j)}.\qquad(8)$$
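One common way to realize this step is systematic (low-variance) resampling, sketched below under the same array conventions as the earlier snippets; after the call, all particles implicitly carry equal weights of 1/n.

```python
import numpy as np

def resample(particles, weights):
    """Systematic resampling: particles with larger weights are duplicated,
    particles with negligible weights are discarded, and the returned set
    carries equal weights of 1/n."""
    n = len(particles)
    positions = (np.arange(n) + np.random.uniform()) / n  # evenly spaced pointers
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                                   # guard against rounding error
    indices = np.searchsorted(cumulative, positions)
    return particles[indices].copy(), np.full(n, 1.0 / n)
```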
3.3. Experimental Verification
A considerable amount of object information is lost when the MMW radar data are processed by the sensor's internal algorithms. Therefore, the original unprocessed data were obtained from the MMW radar in this study, and the proposed particle filter algorithm was used to track the front object and address the issue of information loss.
To verify the feasibility of the algorithm proposed in this study, a high-precision laser range finder was used to record the center position of the frontal object. The measurement error of the adopted laser range finder was
mm. The experimental equipment installed to verify the radar tracking system is shown in
Figure 4. Three verification conditions were set to avoid dark objects and a lack of relative speed, which can cause the laser range finder and radar information to be lost, as follows: metal and light-colored moving objects, a relative velocity of
km/h or more, and objects moving from far away to nearby.
The position of the object measured by the laser range finder is considered as the ground truth, which is illustrated by the blue line seen in
Figure 5. The red line represents the tracking result obtained by the proposed particle filtering algorithm. The result of the internal algorithm of the radar sensor is illustrated by the green line. An offset between the detected and actual positions of the object may be observed owing to the characteristics of the radar sensor.
The error and standard deviation of the proposed particle filter tracking algorithm and of the internal algorithm of the radar sensor were compared with the ground truth to verify the tracking results. The error is defined as the absolute value of the difference between the position estimated by the algorithm and the ground truth, and the average error is the sum of the errors divided by the number of detections. As shown in
Table 1, the proposed algorithm performed better in terms of the average error, the maximum error, and the standard deviation of the error in both the longitudinal and lateral directions. In addition, the number of times the proposed algorithm effectively detected objects was also greater than that obtained with the sensor's internal algorithm.
5. Sensor Fusion and Decision Mechanism
A single sensor system can operate independently; however, a parallel architecture was adopted in this study to fuse two different sensors, the main purpose being to improve the detection rate beyond what can be achieved by a single sensor. The sensor fusion was divided into three parts. First, the two-dimensional coordinate information of the MMW radar was converted into image coordinates, so that the information obtained by the two sensors was integrated into the same coordinate system. Next, the object information was matched to determine whether the same object had been detected by both the MMW radar and the camera, and the detection results of the two systems were integrated. Finally, the trusted sensor was determined based on the confidence index of each sensor.
5.1. Coordinate Transformation
A supervised learning algorithm was used to learn the relationship between the MMW radar coordinate system and the image coordinate system. Before the coordinate transformation, the radar coordinates (x, y) and the image coordinates (u, v) needed to be recorded synchronously to serve as training samples for offline learning. An MMW radar uses electromagnetic waves as its medium and exhibits better reflective properties for metal objects. Hence, a triangular metal reflector was used as the target object to gather data from the radar and the camera, as shown in
Figure 8. The metal reflector was randomly placed in a straight lane at distances ranging from 1 m to 12 m in front of the experimental vehicle, and a total of 280 training samples were collected.
The camera was installed at an angle parallel to the horizon. When the target object moved from far away to nearby, the position of its center point changed only slightly near the center of the image in the vertical direction; thus, the variation in the v-direction coordinate of the image was not obvious. Therefore, the fusion system primarily enabled the neural network to learn the relationship between the MMW radar coordinates (x, y) and the horizontal image coordinate u.
From the collected training samples, the longitudinal and lateral distances from the radar were taken as the inputs of the RBFNN, and the corresponding horizontal u coordinate in the image was taken as the output. This network architecture allows the coordinate conversion relationship between the two sensors to be obtained. The network architecture is shown in
Figure 9.
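As an illustration of this coordinate-transformation network, the following sketch trains a simple RBF network (k-means hidden-layer centers plus a linear least-squares readout) that maps the radar coordinates (x, y) to the horizontal image coordinate u. The number of centers, the kernel width, and the training-array names are placeholder assumptions standing in for the 280 collected samples, not the configuration used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

class SimpleRBFNN:
    """Minimal RBF network: Gaussian hidden units with a linear output layer."""

    def __init__(self, n_centers=20, width=2.0):
        self.n_centers = n_centers
        self.width = width

    def _features(self, X):
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        phi = np.exp(-d2 / (2.0 * self.width ** 2))
        return np.hstack([phi, np.ones((len(X), 1))])  # Gaussian units plus a bias term

    def fit(self, X, y):
        # Hidden-layer centers from k-means on the radar coordinates.
        self.centers = KMeans(n_clusters=self.n_centers, n_init=10).fit(X).cluster_centers_
        # Output-layer weights from linear least squares.
        self.w, *_ = np.linalg.lstsq(self._features(X), y, rcond=None)
        return self

    def predict(self, X):
        return self._features(X) @ self.w

# Usage: radar_xy is a (280, 2) array of radar coordinates and image_u the matching
# horizontal image coordinates (hypothetical names for the collected training samples).
# net = SimpleRBFNN().fit(radar_xy, image_u)
# u_pred = net.predict(radar_xy)
```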
5.2. Object Match
The MMW radar detection and image recognition systems operate independently, and each system obtains its own information about the detected objects. To fuse the information of the two systems, the object information must first be matched to determine whether the same object has been detected by the two sensors. A coordinate shown in the image may correspond to several different radar coordinates, as illustrated by the green points shown in
Figure 10. In addition, the distance estimated from the image coordinates may be inaccurate owing to bumpy road surfaces that cause the vehicle to shake; thus, it is difficult to match the object information and effectively determine whether the same object is detected.
Another RBFNN is used to match the object information and determine whether the same object is detected by the two sensors. Six factors that affect the object match are used as the network inputs: the image coordinate u, the object width, the object height, the object distance estimated from the image, the object distance measured by the radar, and the u coordinate converted from the radar to the image. The network output is either "match" or "non-match".
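A hedged sketch of this matching step is shown below, reusing the SimpleRBFNN class from the previous sketch as a binary classifier over the six input factors; the feature ordering, the dictionary field names, and the 0.5 decision threshold are assumptions for illustration only.

```python
import numpy as np

def build_match_features(img_u, width, height, dist_img, dist_radar, radar_u):
    """Assemble the six match factors described above into one feature vector."""
    return np.array([img_u, width, height, dist_img, dist_radar, radar_u], dtype=float)

def is_same_object(match_net, img_obj, radar_obj, threshold=0.5):
    """Return True if the network predicts that the camera and radar detections
    correspond to the same physical object.

    match_net : a trained model with a predict() method (e.g., the SimpleRBFNN
                above trained on 0/1 match labels)
    img_obj   : dict with keys 'u', 'width', 'height', 'distance' (assumed layout)
    radar_obj : dict with keys 'distance', 'u' (u converted from radar coordinates)
    """
    features = build_match_features(img_obj['u'], img_obj['width'], img_obj['height'],
                                    img_obj['distance'], radar_obj['distance'], radar_obj['u'])
    score = float(match_net.predict(features[None, :])[0])
    return score >= threshold
```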
5.3. Decision Strategy
If a single sensor in a cascade-architecture sensor fusion fails, then the entire system inevitably fails. In contrast, a parallel-architecture sensor fusion determines which sensor should be trusted based on a decision mechanism. Even if one of the sensors misses an object or raises a false alarm, the object can still be handled correctly as long as the other sensor detects it: the confidence index of each sensor is calculated via a scoring mechanism, and the credible subsystem is determined based on the confidence index.
The confidence index of the radar subsystem was calculated as follows:
where
is the number of times the object has been tracked by the particle filter and
is a constant.
The confidence index of the image subsystem was calculated as follows:
where
denotes the distance from the input data point to the SVM hyperplane,
is the number of times the object has been tracked in the image subsystem, and
and
are constants.
The confidence index of the sensor fusion system was expressed as follows:
When the confidence index is greater than the set threshold, the reliability of the system is extremely high, and the output result obtained by the system represents the real situation. If the confidence index of each subsystem is greater than the threshold, then the subsystem with the highest score is responsible for the decision making of the entire system.
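Under this decision rule, the fusion logic reduces to the sketch below: each subsystem reports its own confidence index, and the subsystem whose index exceeds the threshold with the highest score drives the final output. The threshold value and the dictionary field names are placeholders, not the values used in the experiments.

```python
def fuse_decision(radar_result, vision_result, threshold=0.7):
    """Parallel-architecture decision: trust the subsystem with the highest
    confidence index above the threshold.

    radar_result / vision_result : dicts such as {'confidence': 0.85, 'object': {...}},
                                   or None if the subsystem produced no detection.
    threshold : confidence threshold (placeholder value).
    Returns the trusted detection, or None if neither subsystem is credible.
    """
    candidates = [r for r in (radar_result, vision_result)
                  if r is not None and r['confidence'] > threshold]
    if not candidates:
        return None                                   # neither subsystem is credible
    return max(candidates, key=lambda r: r['confidence'])['object']
```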