1. Introduction
The tracking of moving targets within surveillance areas plays a crucial role in various fields such as automotive driving and robotic systems [
1]. With the increasing demand for accurate tracking, high-resolution sensors such as Radar and Lidar have gained significant importance. In traditional tracking scenarios, e.g., air traffic control, each target produces at most one detection. However, in many recent applications, such as autonomous driving, this is not the case, because the resolution of the sensor is high enough that the detected object may occupy multiple resolution cells and thus generate multiple measurements. In the literature [
2], such scenarios are defined as the extended object tracking (EOT) problem, which is also referred to as the extended target tracking (ETT) problem. In the ETT problem, the shape of the target is not negligible, requiring the tracking algorithm to estimate both its shape and kinematic states. Moreover, the data association and the track management in the ETT problem are far more complicated than in the point target tracking scenario. The ETT has become a key problem in the automotive system. Given its importance in automotive systems, extensive research has been conducted on the ETT problem, utilizing the random finite set (RFS) theory and nonlinear estimation methods [
3,
4,
5,
6]. Recently, deep learning-based methods have also been proposed to address the ETT problem [
7,
8]. A comprehensive tutorial for the ETT problem is given in [
1].
The ETT problem can be divided into two parts: the non-point target measurement model and the multiple-target tracking (MTT) problem. Some commonly used non-point target measurement models are the random matrix (RM) [
9,
10] model and the random hypersurface model (RHM) [
11]. The RM model, first introduced in [
9], assumes that the measurements of extended targets are distributed over the entire ellipse surface, following a spatial Gaussian distribution. A semi-positive definite (SPD) matrix is employed to describe the contours of the ellipse. However, the original RM model neglects the measurement errors caused by the sensor, which can accumulate and lead to an overestimation of the object’s shape. To address this limitation, Koch enhances the measurement model in [
10] by taking the measurement error into account. In addition, the interacting multiple model (IMM) is integrated into the RM model, enabling the filter to track maneuvering targets. Further investigation into the IMM model’s utilization for maneuvering extended target tracking is presented in [
12]. A joint detection, tracking, and classification (JDTC) approach for multiple extended objects is proposed in [
13]. This also uses an RM to model the ellipse. In many situations, it is not accurate enough to model the target as an ellipse. Therefore, Lan [
14] proposes an approach that uses multiple sub-ellipses to model the extended targets with complex shapes. Another common method assumes that extended targets are star-convex and use the RHM to derive the measurement model. RHM-based methods assume that the measurement sources have certain spatial distributions and are located on differently scaled inner surfaces of the target. The implicit measurement model is then updated with a non-linear filter such as an extended Kalman filter (EKF). Based on the RHM, the radial function is introduced in [
15] to explicitly describe the surface of the non-point target, and the Fourier series expansion is used to approximate the radial function. Sun et al. [
16] propose a filter to track number-variable maneuvering extended targets using the RHM and the IMM. The B-spline model is another approach to modeling star-convex targets and is first introduced by [
17]. Compared to the RHM and the RM model, the B-spline model is capable of describing arbitrary shape extended targets using the control points on the B-spline. In [
18], the B-spline model is used under the Poisson multi-Bernoulli mixture (PMBM) filter framework to track non-ellipsoidal targets. The B-spline model is also used to track elongated targets such as trains and pedestrians [
19]. The Gaussian process (GP) has been widely used in machine learning and other fields because of its easy-to-compute posterior probability and excellent analytical properties [
20]. It is first proposed in [
21] to use the GP regression to estimate the radial function. The GP model can describe arbitrary shaped extended targets and is computationally more efficient than the B-spline model. At the same time, the estimation of the target shape is more accurate than the ellipse approximation models. A new temporal covariance kernel for GP regression is proposed in [
22] that could improve the accuracy of object shape estimation. The dependency of connected extended targets is utilized in [
23] and the GP model is used to estimate the shape of the extended target.
The MTT problem is challenging due to the track associations, miss detections, etc. However, the RFS theory [
24] proposed by Mahler provides a rigorous treatment of the MTT. Their method is completely free of explicit data associations, unlike traditional MTT techniques such as the joint probabilistic data association (JPDA) filter and the multiple hypothesis tracker (MHT). Furthermore, with the improvement of hardware [
25,
26,
27,
28], it is possible to employ algorithms based on the RFS theory. The optimal Bayesian filter derived from the RFS theory involves high-dimensional integrals. Therefore, Mahler proposes several theoretical approximation methods, such as the PHD filter [
29] and the MB filter [
24]. Among these methods, the PHD filter stands out as the most computationally efficient and memory-friendly, making it a practical choice given current computational resources. The first attempt to address the ETT problem within the framework of the PHD filter is proposed by [
2]. A Gaussian mixture implementation of [
2] is given in [
30] called the extended target Gaussian mixture PHD filter (ET-GM-PHD). Nonetheless, both methods in [
2,
30] only estimate the kinematic states of the target and omit the shape which contains crucial information about the target. In [
31], the RM approach is first used in the PHD framework to estimate the shape of an unknown number of extended targets in the presence of clutter and miss detections. Granström [
32] further considers target spawning and combination and proposes the GGIW-PHD filter, which is the state-of-the-art ETT tracker and has been successfully tested in a real marine surveillance scenario [
33].
Although 3D data, generated by sensors such as depth cameras and Lidars, are becoming increasingly accessible, there have been few attempts to address the ETT problem in 3D space. The relationship between the ellipse and symmetrical SPD matrix is analyzed and the ellipse fitting approach (EFA) is proposed [
34]. In [
35], a generalization of the RHM to 3D space is proposed; however, only cylinder targets are suitable for this method. Current approaches for 3D ETT usually use simple geometries such as cylinders and ellipses and are not able to model complexly shaped extended targets. The GP model approach to tracking 3D targets is proposed in [
36,
37], but it only considers the single-target situation. Therefore, to solve the 3D ETT problem, the combination of a complex-target model and multiple-target tracking filters, such as the PHD filter, is required.
In this paper, we propose a method that uses a 3D radial function to describe the shape of the target. The GP regression model is adopted to estimate the value of the radial function. We transform the regression problem into a state estimation problem. This approach allows us to use the Bayesian filter paradigm to estimate the unknown shape of the target and can be integrated into the kinematic estimation process. The computationally efficient PHD filter is combined with the measurement model to track an unknown number of extended targets in the presence of clutter. The kinematic states and the shape are then inferred by the EKF. The proposed filter is capable of tracking multiple extended targets with complex shapes in the presence of clutter. To demonstrate the performance of the algorithms, we simulate point cloud measurements in MATLAB for two dynamic objects with different shapes in the presence of clutter. We evaluate the performance of the filter using both the optimal sub-pattern assignment (OSPA) [
38] and the intersection-over-union (IoU) value. The proposed method outperforms the traditional GGIW-PHD filter in the simulation experiment. We also test the stability of the two filters at different measurement rates. It is shown that the proposed filter performs better than the GGIW-PHD filter under low-measurement-rate conditions.
3. Gaussian Process Regression Measurement Model
The Gaussian process has been widely used in tasks such as classification and regression. A Gaussian process
can be defined by its mean function
and the correlation function
, which are given as
Therefore, the GP
can be denoted as
The Gaussian process could be regarded as the generalization in the infinite dimension of a joint Gaussian distribution, and so the function values of
input
are Gaussian distributed as
where
We assume that a training set is given, whose input is
and output is
. The measurement model of the training set is
The Gaussian process regression is used to estimate the output
of some given test set inputs
. The training set, together with the measurement model, leads to the following distribution
The conditional distribution of the joint Gaussian distribution is still a Gaussian distribution that gives
where
The measurements are collected recursively rather than in the batch form in the multiple-target tracking problem, and so the Gaussian process regression needs to be improved in order to estimate the output of the test set recursively. Ref. [
39] introduces basis points which are considered to be the inputs of the training set. This article adopts the same approach as [
39], where a state-space model is also introduced to estimate the output recursively.
and
denote the input and the output of the test set, respectively, and so the state-space model is
where
is the measurement at time
. We assume
is sufficient statistics for
. Under this assumption
the recursive optimal Bayesian filter gives the posterior probability density function (PDF)
as
According to Equation (17), the joint distribution of
and
is
Therefore, the likelihood function and the prior PDF are
where
The advantage of the method is that the extent states can be inserted into the kinematic states and be estimated using a single-state space model.
Measurement Model
In this article, we assume that the measurement sources are all on the object’s surface. A 3D radial function
is used to model the shape of the target, whose inputs
are the azimuth and the elevation. For simplicity, we use
to denote the pair
. The function value
of the radial function is the distance between the basis points and the center of the extended target. The radial function is assumed to be a GP, denoted as
The mean function of the GP model is assumed to be some unknown constant
, which is Gaussian-distributed, and we assume that
as in [
37]. The correlation function in [
37] is used, which is
where
represents the prior variance,
is the length scale, and
calculates the relative distance between two inputs, given as follows
The output range of the distance function is , and any coincident input is mapped to , while the opposite pair is mapped to . The employed distance is specified to imply a higher correlation for closer regions compared to separated regions.
Together with the radial function and the GP model, both the kinematic model and measurement model can be derived. We denote the extended target state as
, where
is the extended target kinematic state including the translation vector
and the orientation vector
. The values of the radial function are denoted as
. The kinematic model of the extended target is
The constant velocity (CV) model is considered for the translation of the extended target in this article, which gives
A
orthogonal matrix called a rotation matrix is usually used to describe the orientation of the target. However, a rotation matrix is singular or discontinuous at some points. Quaternion is used in this article instead of a rotation matrix to describe the orientation of the target. A quaternion
is a 4D vector which contains two parts: a three-dimensional vector and a constant, denoted as
The translation between the rotation matrix and the quaternion is
where
represents the rotation matrix and
is the cross-product matrix
The multiplication of two quaternions
and
is defined as
A useful property of quaternion is
which indicates that the product of two rotation matrices can be represented by the product of quaternions. Therefore, the continuous rotation of a target can be represented by the product of quaternions. Ref. [
40] introduces an approach to estimate the orientation of the target recursively using quaternions based on the property of Equation (44). This method can be augmented in the process of estimating the target’s kinematic states. The orientation of the target at time
can be expressed as
where
is the reference orientation and
is the deviation from the reference orientation. To estimate the orientation recursively, the estimated orientation at time
is used as the reference orientation.
is the deviation vector, and
is defined using Rodrigue parameterization as
The core idea is to estimate the deviation vector
, and the orientation at any moment can be calculated together with the reference orientation. Ref. [
41] gives the dynamic model of the deviation vector
We denote the rotation vector as
, and the dynamic model of the rotational part of the extended target can be derived using Equation (47) as
where
is the acceleration of the rotation vector, and is assumed to be the Gaussian white noise in the article. Equation (48) indicates a nonlinear system, and therefore the first-moment Taylor expansion is used to derive the approximated linear model. The first-order Taylor expansion of the estimated orientation
at time
is given as
The deviation vector is set to zero after each measurement update, and so the Taylor approximation in Equation (49) is
The discrete system is
where
Substituting Equation (42) into Equation (52) gives
where
is the sampling time. The process noise covariance matrix is
Because the deviation vector is set to zero,
is
The following are the details of the matrices
We assume that the shape of the target does not change, and that therefore the dynamic model for the extent state of the extended target is
which indicates that
.
Two coordinate systems are used in the article to derive the measurement model. The first one is the global coordinate denoted as the upper letter
, and the second one is the local coordinate of the target denoted as the upper letter
.
Figure 1 shows the two coordinates.
We assume
measurements
at time
are received, and so the
ith measurement
can be expressed as
where
is the direction vector. Measurements in the local coordinate can be derived using the global coordinate measurements and the reference quaternion as
The azimuth and elevation of the measurements are
Substituting Equation (21) into Equation (61) gives
which can be further simplified to obtain the pseudo-measurement function
where
5. Simulation Results and Discussion
In this section, a simulation experiment is conducted to compare the performance of the GP-PHD filter proposed in the article with the traditional GGIW-PHD filter [
33]. A three-dimensional area with size
is considered. We assume that two extended targets move in the x–y–z plane. Target 1 (T1) is a cylinder with
and target 2 (T2) is a cube with
and
. We denote the target states as
. The initial states of T1 is
and the initial states of T2 is
. T1 moves at constant speed
along the
y axis. T2 moves at constant speed
in the first 10 s, and then makes a constant turn at speed
for 10 s. After the constant turn, T2 moves along the
y axis at a constant speed. Both targets’ moving duration is
and the sampling time is
. The experiment is a simulation of a typical traffic scenario, where T1 corresponds to the pedestrian and T2 corresponds to a practical medium size vehicle. Both linear and non-linear motion models of extended targets are considered. Although we use a linear motion model, the filter is expected to be robust enough to handle such model mismatches, which are common in most tracking applications.
We assume that all the measurements come from the surface of the target and that the number of the measurements is Poisson-distributed. The Poisson rate of the measurements is
and the covariance matrix of the measurement noise is
. The Poisson rate of the number of clutter is
, and clutter is uniformly distributed in the surveillance area.
Figure 2 shows clutter and measurements collected in the surveillance area. All experiments in the following section are carried out with different realizations of the measurement noise, the measurement source, and the clutter. The result numbers of the experiments are the averages of Monte Carlo (MC) runs.
The CV model is considered for both the GGIW-PHD filter and the GP-PHD filter. The process noise of the dynamic model is
. For the GP-PHD filter,
basis points equally located in
are used and the hyperparameters of the GP model are set to
. The parameters for the GGIW-PHD filter are given in
Table 1.
The number of the gamma Gaussian mixture component is no more than
and the number of Gaussian components per target is no more than
. The tracking results of the two filters are shown in
Figure 3.
Figure 3 shows that both filters are able to deal with the non-linear movements of the targets, while the GP-PHD filter is more capable of handling the motion model mismatches and the estimated trajectory is closer to the truth.
We use the OSPA distance to evaluate the performance of the filter, which takes into account both errors made in estimating target states and the errors in estimating target cardinality. The order and the penalty factor are set as
. The average result is obtained through 100 MC runs, which is shown in
Figure 4.
The estimation of the shape is illustrated in
Figure 5. The GP-PHD filter shows superior performance compared to the GGIW-PHD filter in terms of shape estimation. The GP-PHD filter uses a radial function to describe the shape of the target, which is suitable for arbitrary shape extended targets. However, the GGIW-PHD filter is limited to ellipsoidal shape targets. Therefore, both filters show satisfactory performance in estimating the shape of T1 in
Figure 5a. However, the GGIW-PHD filter shows a poor performance in estimating the shape of T2, especially when the target is making a turn. The GP-PHD filter takes into account the orientation of the target.
Figure 5b shows that the GP-PHD filter is able to estimate the shape of the target, even when the target is making a turn.
The performance of the estimation of the shape is evaluated using the IoU, which is defined as
where
is the true target shape and
is the estimated target shape. The IoU value not only accounts for the quality of the estimation of the kinematics states but also the extent. The average results of 100 MC runs are shown in
Figure 6. Overall, the GP-PHD filter estimates the shape of the extended targets better than the GGIW-PHD filter. The performance of the GGIW-PHD filter is not stable, particularly when the target is making a turn.
We further examine the measurement rates’ impact on the proposed filter and the GGIW-PHD filter. The measurement rates are set as
. The experiments are repeated for 50 MC runs, respectively, to obtain the average results shown in
Figure 7 and
Figure 8. It is reported that the GP-PHD filter shows good performance at both low and high measurement rates, while the measurement rate has a great impact on the performance of the GGIW-PHD filter. The GP-PHD filter is more robust compared to the GGIW-PHD filter. The main reason for the impact of the measurement rate on the GGIW-PHD filter is that there is less information about the shape that can be exploited by the GGIW-PHD filter when the measurement rate is low. However, the covariance function of the GP model assumes that the radial function has a period of
. This assumption suggests that the target is symmetric, which is the usual case in tracking. Therefore, under low-measurement-rate conditions, the GP-PHD filter can obtain more information about the target and a more accurate estimation of the shape can be expected.
All the simulations are conducted in MATLAB 2022b on a computer with an AMD Ryzen 5 3600X 6-Core processor and 16 GB RAM. The average computation time for the GP-PHD filter and the GGIW-PHD filter are 459 s and 5.11 for 300 updates. Both the GGIW-PHD filter and the GP-PHD filter are recursively updated by the standard EKF, and therefore the computational load is mainly determined by the dimension of the state vector. The state dimension of the GP-PHD filter is , which is much greater than that of the GGIW-PHD filter. Further improvements, such as projecting the contour of the 3D target into 2D space to reduce the computational burden, can be considered.