1. Introduction
Target tracking is the process of estimating a continuous and accurate target trajectory by processing discrete, noise-contaminated sensor measurements. The basic idea is to design a motion model for the target based on prior knowledge and iteratively optimize this model according to the spatio-temporal variation patterns of the measurements, with the model output serving as the target state estimation. After decades of development, target tracking has been widely applied in radar target detection, precision weapon guidance, aerospace measurement and control, UAV navigation, and autonomous driving, significantly advancing the related fields. It is noteworthy that in many practical scenarios, the target’s motion pattern is not constant and may experience state transitions such as acceleration, deceleration, or turning. These uncertain maneuvering behaviors cause a mismatch between the tracking model and the actual target motion characteristics, resulting in a significant reduction in tracking accuracy or even tracking divergence. Thus, accurately tracking maneuvering targets, especially those with strong maneuvers, has long been a challenging problem in radar data processing.
The commonly used maneuvering target tracking methods fall into two categories: model-based state prediction methods and neural network-based state prediction methods. Model-based methods are centered around the Kalman Filter [1,2] and its nonlinear extensions, such as the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF), and their tracking effectiveness depends on the accuracy of the target motion model. Common motion models [3,4] include the Constant Velocity (CV) model, Constant Acceleration (CA) model, Coordinated Turn (CT) model, and adaptive tracking models such as the Singer model [5] and the current statistical model [6]. These single-model methods can adjust model parameters according to measurement changes but still struggle to cover the variety of complex target maneuvers. To address this, scholars proposed multi-model approaches and later the Interacting Multiple Model (IMM) algorithm [7], which run several models in parallel and combine their state estimates through a weighted sum to obtain a better state estimate. Compared with traditional single-model filtering methods, they significantly improve the tracking performance for maneuvering targets. However, because the model weights are calculated and updated with some latency, the model probabilities are inaccurate at the moment the target maneuvers, leading to lagged tracking results and reduced tracking precision.
In recent years, with the rapid development and widespread application of deep learning, scholars have proposed neural network-based methods for maneuvering target tracking. Earlier methods used online learning [8,9] to predict the innovations, Kalman gain, and other parameters, thereby adaptively adjusting the motion models to obtain better state estimates. However, online learning cannot use models and algorithms with high computational complexity, and the limited time available for data preprocessing leads to poor preprocessing results. Therefore, this approach has gradually been replaced by deep neural networks trained on offline data. Recurrent Neural Networks, among the earliest neural network models used for time-series processing, have been widely applied in fields such as weather forecasting, stock market prediction, autonomous driving, and natural language processing. Because trajectory data are a type of time series, integrating deep learning methods with traditional processing techniques has become a prominent trend. However, for a single radar, the number of trajectory measurements is often too small to build a complete offline database [10,11]. To address this, this paper uses a Digital Combat Simulator (DCS) to generate a large amount of simulated flight data, which, after preprocessing, are converted into trajectory data in Cartesian coordinates suitable for network input. Compared with existing databases built on simulation data, this offline simulated trajectory database is closer to real conditions, so the resulting models are expected to have stronger generalization capabilities and tracking performance.
State prediction is crucial in maneuvering target tracking. Many neural network models have been designed for state prediction, such as trajectory prediction algorithms based on BP neural networks [12], algorithms based on Bi-LSTM [13], state estimation algorithms based on NARX (nonlinear autoregressive with exogenous input) [14], the Transformer-based network TBN [15], and the joint direct tracking and classification algorithm DeppDTC based on CNN-Transformer [16]. These algorithms primarily focus on whether the network can retain long-term memory of feature information through attention mechanisms, while neglecting performance analysis of the convolutional neural network (CNN) used for feature extraction. Therefore, this paper proposes a new state prediction model combining a Temporal Convolutional Network (TCN) and LSTM. The TCN improves on the CNN layers by replacing standard convolutions with dilated and causal convolutions. With the LSTM, long-term dependencies in the measurements can be captured with fewer layers, enhancing state prediction accuracy and efficiency.
Without measurement-based correction, relying solely on predictions inevitably introduces cumulative errors. Neural network-based prediction systems have difficulty incorporating measurements because the specific form of the nonlinear state transition matrix is unknown. To address this issue, some scholars have used neural networks to estimate model parameters, including state transition matrices [17,18], transition probability matrices (TPMs) in IMM [19], polynomial coefficients [20], turning rate [21,22], and model probabilities in IMM [23], and then applied traditional filtering once the specific model is obtained. Another approach incorporates the influence of measurements during neural network training [24]. Although this places significant demands on the neural network’s feature learning capabilities, it avoids the complexity of adding measurement feedback modules later, such as in the TrMTT algorithm [25]. Other studies have designed new recursive algorithms on top of existing models, such as NTTrRT [26]. These methods abandon trajectory-to-trajectory prediction, which improves prediction accuracy, but they do not address how to obtain the statistics of the target state under a nonlinear prediction system, which introduces unnecessary errors.
Therefore, this paper proposes a fusion algorithm, TCN-LSTM-UKF (TLU), based on neural network prediction and the Unscented Kalman Filter. It can calculate the state statistics at each time step without explicitly defining the state transition matrix, thus incorporating the influence of measurements into the system. First, we analyze the propagation characteristics of Sigma points in nonlinear systems and find that the states of a single Sigma point at different time steps are temporally correlated. We therefore propose extracting Sigma points with the unscented transformation (UT) by ‘unfolding’ multi-sequence vectors and using the TCN-LSTM network for time-series prediction of each point. The predicted state vector and covariance matrix are computed as weighted sums over the Sigma points at the current moment, and the measurements are then introduced following the basic principles of Kalman Filters to correct the state vector and covariance matrix. Additionally, we explore design techniques for the sliding time window length of recurrent neural networks. In this way, the proposed algorithm handles the state prediction problem of nonlinear systems and introduces measurement feedback to help correct errors in the predictions.
This paper is organized as follows: Section 2 introduces the basic principles of TCNs and LSTM; Section 3 proposes the intelligent algorithm for maneuvering target tracking, TCN-LSTM-UKF; Section 4 presents the results of a simulation analysis and real data processing; and Section 5 summarizes the work of this paper.
3. The Proposed TCN-LSTM-UKF Algorithm
3.1. Neural Network Model Based on the Serial Structure of TCN-LSTM
Temporal Convolutional Networks (TCNs) offer a distinct advantage over traditional CNNs in handling time-series data, especially for tasks like maneuvering target tracking. Typically, before recurrent neural networks are used for maneuvering target trajectory prediction, CNN layers are used to extract motion features. However, traditional CNN layers are primarily designed for image processing, where the receptive field of each layer is limited by the size and number of convolution kernels. Although increasing the number of layers can indirectly expand the receptive field, this significantly raises computational complexity, memory consumption, and training time. More importantly, CNNs struggle to capture long-term dependencies, which are critical in trajectory prediction tasks. Standard convolutions are also not causal: a kernel centered on the current time step covers later samples as well, which introduces the risk of “future information leakage,” where the CNN unintentionally uses future data in predictions, compromising temporal continuity and reliability. In contrast, TCNs are specifically designed to address these issues by capturing long-distance temporal dependencies even in shallow networks, improving computational efficiency. TCNs use dilated convolutions, which expand the receptive field without increasing the computational burden, allowing the model to access information from a broader temporal range. Moreover, TCNs employ causal convolutions, where each output at a given time step depends only on the current and previous inputs, preserving temporal order and ensuring predictions are based solely on known historical data. This causality-preserving property avoids the problem of future information leakage.
Additionally, TCNs demonstrate significant stability and efficiency advantages over CNNs. In deep neural networks, handling long sequences often leads to vanishing or exploding gradients. While CNNs can partially mitigate this problem by stacking multiple convolutional layers, they do not fully resolve it. TCNs, however, use residual connections and normalization layers to maintain stable gradient flow in deep networks, significantly reducing the risk of vanishing or exploding gradients. The TCN architecture also enhances parallel computation, as dilated convolutions can cover multiple time steps simultaneously, greatly improving computational efficiency for long sequences and better utilizing the parallel computing power of modern hardware. Furthermore, TCNs, with their relatively simple convolution operations and fewer parameters, provide better generalization and stability when processing large and complex time-series datasets, with a lower risk of overfitting compared to CNNs. In summary, the TCN offers a more efficient, powerful, and adaptable architecture than the CNN and is especially suited to time-series analysis tasks that require capturing complex temporal dynamics. Therefore, to enable the network to better learn the maneuvering characteristics of target motion, we replace the CNN layers of the CNN-LSTM neural network architecture with TCN layers.
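The causal and dilated behavior described above can be illustrated with a minimal single-channel sketch in plain Python. The function name `causal_dilated_conv1d` and the kernel weights are hypothetical choices made for illustration only; they are not the layers or learned parameters of the proposed network.

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """1-D causal dilated convolution with implicit left zero-padding.

    y[t] = sum_j kernel[j] * x[t - j * dilation], with x[i] = 0 for i < 0,
    so y[t] never reads a sample later than t.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:  # indices before the sequence start act as zeros
                acc += w * x[idx]
        y.append(acc)
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
kernel = [0.5, 0.3, 0.2]  # illustrative weights, not learned values

y = causal_dilated_conv1d(x, kernel, dilation=2)

# Changing only the last two samples must leave the first four outputs untouched.
x_changed = x[:4] + [100.0, -100.0]
y_changed = causal_dilated_conv1d(x_changed, kernel, dilation=2)
print(y[:4] == y_changed[:4])
```

With a dilation of 2, the three-tap kernel spans five time steps while still using only three multiplications per output, which is how dilation widens the receptive field at no extra cost per layer.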
TCN and LSTM are both effective techniques for processing time-series data, each with its own advantages. When LSTM is used alone, handling long time series can become challenging because of gradient explosion or gradient vanishing, which degrade the accuracy of the model’s predictions. By introducing the TCN, which has a larger receptive field, the network can better capture long-term dependencies in the sequence. Additionally, the convolution-based parallel structure of the TCN improves computational efficiency compared with the sequential processing of LSTM alone. More importantly, the TCN can filter noise during the feature extraction phase, providing cleaner and more meaningful features for the LSTM to process, thereby enhancing the overall robustness and noise resistance of the model. Therefore, this paper proposes a serial network structure based on TCN and LSTM, in which the TCN serves as the feature extractor and is followed by LSTM layers for sequence modeling. This structure integrates the strengths of both, improves the overall prediction accuracy of the model, and can be applied in the single-step prediction phase of maneuvering target tracking tasks.
In summary, TCN-LSTM can handle the diversity and complexity of maneuvering target tracking tasks, including large-scale, high-dimensional, and rapidly changing tracking scenarios. This helps improve the performance of the tracking algorithm, enhancing its reliability and efficiency in practical applications.
Figure 4 shows the specific network framework, consisting of two TCN layers and one LSTM layer, with preprocessed historical trajectory data in the Cartesian coordinate system as input. The input tensor has shape b × n × c, where b is the batch size (i.e., the number of samples processed simultaneously in one forward and backward pass), n is the sequence length (in maneuvering target tracking tasks, the number of trajectory time steps in a window), and c is the number of features (in this paper, equal to the target state dimension; the target state vector contains nine components: the x, y, and z positions, velocities, and accelerations). The TCN part accepts nine-dimensional inputs and consists of two layers with 64 and 128 output channels, respectively. The kernel size is nine, and dilated convolutions with dilation rates of one and two are used to expand the receptive field, allowing the model to capture long-term dependencies. To maintain the time dimension, the padding is set to 8 and 16 in the first and second layers, respectively. Each convolutional layer is followed by a ReLU activation function and dropout with a rate of 0.2 to prevent overfitting. In addition, residual connections are used for each convolutional output, and if the input and output channels differ, a 1 × 1 convolution is used to match the number of input channels to the output. Next is the LSTM part, which processes the 128-dimensional features output by the TCN. The LSTM network contains nine layers in total, with 256 hidden units per layer, and is unidirectional, so it extracts features only from the forward sequence. The 256-dimensional LSTM output is then mapped to a nine-dimensional output through a fully connected layer to generate the final prediction. Dropout with a rate of 0.2 is also applied between LSTM layers.
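The padding values quoted above agree with the usual causal-convolution rule, padding = (kernel size − 1) × dilation. The short sketch below checks that arithmetic and estimates the resulting receptive field. It assumes exactly one convolution per TCN layer, which the text does not state explicitly; the helper names are hypothetical.

```python
def causal_padding(kernel_size, dilation):
    """Left padding that keeps the output as long as the input."""
    return (kernel_size - 1) * dilation


def receptive_field(kernel_size, dilations):
    """Number of input time steps seen by one output of the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


kernel_size = 9
dilations = [1, 2]  # dilation rates of the two TCN layers

paddings = [causal_padding(kernel_size, d) for d in dilations]
rf = receptive_field(kernel_size, dilations)
print(paddings, rf)  # [8, 16] 25
```

Under this assumption, each TCN output feature summarizes up to 25 past time steps, which is one consideration when choosing the window length n.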
3.2. Intelligent Tracking Algorithm Based on Unscented Filters
Existing data-driven tracking methods rarely incorporate Kalman Filters to provide a measurement feedback mechanism, and neural network predictions alone cannot dynamically adjust and correct prediction errors. First, the prediction of a neural network has some error at each time step, and when these predictions are used as inputs for subsequent predictions, the errors gradually accumulate, leading to a decline in prediction accuracy. Additionally, the internal working mechanisms of neural networks are complex and lack interpretability, making it difficult to understand the model’s decision-making process and to assess the reliability of the results in practical applications. In contrast, Kalman Filtering methods have clear mathematical models, making the results easier to verify. Therefore, this paper proposes a TCN-LSTM-UKF intelligent tracking algorithm based on unscented filtering. This algorithm integrates modern neural networks and traditional Kalman Filters, leveraging the predictive capabilities of neural networks and the real-time correction capabilities of Kalman Filters to improve the tracking accuracy and robustness of the system when dealing with highly maneuvering targets.
To optimize filter performance and improve the system model, this paper studies the time dependency of Sigma points. In the Unscented Kalman Filter, although Sigma points at each time step are regenerated according to the current state estimate and covariance matrix, their generation depends on the prediction and state update from the previous time step. In this sense, there is an indirect temporal dependency between Sigma points at different times, as each set of Sigma points is adjusted based on information from the previous time step. By tracking the changes in each Sigma point over time, we can observe how they evolve with the system. In this context, the changes in Sigma points can be viewed as a time series reflecting the dynamic behavior of the system. Therefore, the proposed TCN-LSTM network model can be trained to understand the temporal patterns of Sigma point changes and predict future states. At this point, each Sigma point’s time series can be considered a feature sequence, and the neural network needs to capture the patterns in these sequences for prediction.
The generation and propagation rules of Sigma points are illustrated in Figure 5. In the UKF, Sigma points are generated to approximate the Gaussian distribution with a small number of discrete points and to capture the state distribution in a nonlinear system. Specifically, 2n + 1 Sigma points are generated from the initial state mean and covariance matrix, where n is the dimension of the state vector. The Cholesky decomposition of the covariance matrix yields vectors that offset the state mean, generating symmetric Sigma points on both sides. These Sigma points represent possible values of the state distribution; they are centered on the state mean and distributed within the range defined by the covariance, capturing the spread of the state. When the Sigma points pass through a nonlinear system (in this paper, a pre-trained neural network prediction model), each point undergoes an independent nonlinear transformation, so the system produces a separate prediction for each Sigma point. The predicted points generally no longer maintain a linear relationship with the inputs because of the nonlinear mapping. Nevertheless, the propagated Sigma points still carry the information needed to approximate the statistics of the transformed state distribution: the new state mean is calculated as a weighted average of the predicted points, and the state covariance is re-estimated from their weighted deviations about that mean. This process allows the UKF to capture changes and uncertainties in the state when dealing with nonlinear systems.
In summary, generating Sigma points and propagating them through a nonlinear system ensures that, even in the presence of complex nonlinear mappings, the algorithm can reflect changes in the state and obtain estimates of the mean and covariance through weighted summation. This generally makes the UKF more stable and accurate on nonlinear systems than filters that rely on linearization, such as the EKF.
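The generation and propagation steps can be sketched with NumPy as follows. This is a minimal illustration of the basic unscented transformation with a single scaling parameter `kappa`; the weighting scheme, the value `kappa=1.0`, and the function `toy_predictor` (a stand-in for the pre-trained network) are assumptions made here, not details taken from the paper.

```python
import numpy as np


def sigma_points(mean, cov, kappa=0.0):
    """Return (2n + 1, n) Sigma points and their weights for an n-dim state."""
    n = mean.size
    scale = n + kappa
    # Lower-triangular factor L with L @ L.T == scale * cov.
    sqrt_cov = np.linalg.cholesky(scale * cov)
    points = np.empty((2 * n + 1, n))
    points[0] = mean
    for i in range(n):
        points[1 + i] = mean + sqrt_cov[:, i]      # offset along column i
        points[1 + n + i] = mean - sqrt_cov[:, i]  # symmetric counterpart
    weights = np.full(2 * n + 1, 1.0 / (2.0 * scale))
    weights[0] = kappa / scale
    return points, weights


def unscented_transform(points, weights, f):
    """Propagate each point through f and re-estimate mean and covariance."""
    propagated = np.array([f(p) for p in points])
    mean = weights @ propagated
    centered = propagated - mean
    cov = (weights[:, None] * centered).T @ centered
    return mean, cov


mean = np.array([1.0, 2.0])
cov = np.array([[0.5, 0.1], [0.1, 0.3]])
points, weights = sigma_points(mean, cov, kappa=1.0)


def toy_predictor(p):
    # Hypothetical nonlinear map standing in for the trained network.
    return np.array([p[0] + 0.1 * p[1] ** 2, np.sin(p[1])])


pred_mean, pred_cov = unscented_transform(points, weights, toy_predictor)
print(points.shape, pred_mean, pred_cov)
```

A useful sanity check is that passing the identity function instead of `toy_predictor` returns the original mean and covariance, since the weighted Sigma points reproduce the first two moments of the input distribution.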
Based on the time dependency of Sigma points, this paper proposes an intelligent tracking algorithm, TLU, for maneuvering targets based on the unscented filter, as shown in Figure 6.
Initially, appropriate data preprocessing is conducted, converting raw radar measurements recorded as range–azimuth–elevation into Cartesian coordinates, and preliminary tracking is performed using the IMM algorithm. Data from the initial n time steps are selected as the initial window, and the Sigma point set of the target state vector at each time step in the initial window is calculated, resulting in a three-dimensional tensor of shape s × n × c, where s is the number of Sigma points and c is the dimension of the target state vector. According to the general practice of the Unscented Transformation, s = 2c + 1.
As shown in Figure 7, in TLU, since the network-based one-step prediction requires all information of the target state for the previous n time sampling points, each set of Sigma points constructed should reflect the propagation characteristics over this period and the motion characteristics of the target. Therefore, we first flatten the state vectors of the n sampling points into a single (n × c)-dimensional vector and then construct the Sigma point set based on this one-dimensional vector. The number of points obtained with this new construction method is s = 2nc + 1.
For this construction, a covariance matrix corresponding to the (n × c)-dimensional state vector is needed for the Cholesky decomposition. The covariance matrices for the n time steps, each of shape c × c, are placed along the diagonal to form a new block-diagonal covariance matrix of shape (n × c) × (n × c): P = diag(P_1, P_2, …, P_n), where P_k is the covariance matrix at the k-th time step of the window.
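A minimal NumPy sketch of this window-level construction is given below. It flattens an n × c window of states, assembles the block-diagonal covariance, and returns the Sigma points reshaped to s × n × c with s = 2nc + 1. The function name, the scaling parameter `kappa`, and the example values (n = 4, identical covariances) are illustrative assumptions rather than settings from the paper.

```python
import numpy as np


def window_sigma_points(states, covs, kappa=0.0):
    """states: (n, c) window of state vectors; covs: (n, c, c) covariances.

    Returns Sigma points of shape (2 * n * c + 1, n, c).
    """
    n, c = states.shape
    dim = n * c
    flat_mean = states.reshape(dim)

    # Place the per-step covariances along the diagonal; off-diagonal blocks
    # stay zero, so cross-time correlation is not modelled here.
    big_cov = np.zeros((dim, dim))
    for k in range(n):
        big_cov[k * c:(k + 1) * c, k * c:(k + 1) * c] = covs[k]

    sqrt_cov = np.linalg.cholesky((dim + kappa) * big_cov)
    offsets = sqrt_cov.T  # row i holds column i of the Cholesky factor
    flat_points = np.vstack([flat_mean,
                             flat_mean + offsets,
                             flat_mean - offsets])
    return flat_points.reshape(2 * dim + 1, n, c)


n, c = 4, 9  # window length (illustrative) and state dimension
states = np.arange(n * c, dtype=float).reshape(n, c)
covs = np.tile(0.1 * np.eye(c), (n, 1, 1))

pts = window_sigma_points(states, covs)
print(pts.shape)  # (73, 4, 9)
```

Each of the s slices along the first axis is a complete n × c trajectory window, so it can be fed to a sequence model in the same way as an ordinary input sample.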
Before the s Sigma point sequences are input into the TCN-LSTM network for prediction, a normalization module [35] is added to ensure faster and more efficient convergence of the algorithm. After normalization, the model treats all features with consistent importance, avoiding poor training results caused by the large variation range of the position features. In the proposed algorithm, the min–max normalization method is used: x′ = (x − x_min) / (x_max − x_min), where x_min and x_max are the minimum and maximum values of the corresponding feature.
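As a small illustration of per-feature min–max scaling, the sketch below fits the minimum and maximum of each feature column and maps values into [0, 1], with an inverse mapping to return network outputs to physical units. The helper names and the sample values (a position in metres and a velocity in metres per second) are hypothetical, and fitting the statistics on a fixed reference set is an assumption made here.

```python
def fit_min_max(rows):
    """Return per-feature minima and maxima over a list of feature rows."""
    columns = list(zip(*rows))
    return [min(col) for col in columns], [max(col) for col in columns]


def normalize(row, lo, hi):
    """Scale each feature to [0, 1]; constant features map to 0.0."""
    return [(v - a) / (b - a) if b > a else 0.0
            for v, a, b in zip(row, lo, hi)]


def denormalize(row, lo, hi):
    """Invert normalize() to recover values in physical units."""
    return [v * (b - a) + a for v, a, b in zip(row, lo, hi)]


# Each row: [position in m, velocity in m/s] (illustrative values).
rows = [[12000.0, 250.0], [12500.0, 240.0], [13000.0, 260.0]]
lo, hi = fit_min_max(rows)

scaled = normalize(rows[1], lo, hi)
restored = denormalize(scaled, lo, hi)
print(scaled, restored)  # [0.5, 0.0] [12500.0, 240.0]
```

Without this step, the position column (thousands of metres) would dominate the velocity column (hundreds of metres per second) in the loss.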
After the data are processed by the TCN-LSTM module, the future state predictions of the s Sigma points are obtained. Since the subsequent process follows the recursive model of the UKF, only the prediction for one time step is output at this point. The UKF then uses this prediction and the new measurement to update the state estimate, correcting potential prediction errors.
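The measurement correction can be sketched as a standard UKF update applied to the predicted Sigma points. The example below assumes a linear measurement matrix `H` that observes position only (plausible once measurements are converted to Cartesian coordinates), omits process noise, and uses a hand-built two-dimensional point set in place of actual network outputs. All of these are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np


def ukf_update(pred_points, weights, z, H, R):
    """Correct the predicted Sigma-point statistics with a measurement z."""
    # Predicted state mean and covariance from the weighted Sigma points.
    x_pred = weights @ pred_points
    dx = pred_points - x_pred
    P_pred = (weights[:, None] * dx).T @ dx

    # Map each Sigma point into measurement space.
    z_points = pred_points @ H.T
    z_pred = weights @ z_points
    dz = z_points - z_pred
    S = (weights[:, None] * dz).T @ dz + R     # innovation covariance
    P_xz = (weights[:, None] * dx).T @ dz      # state-measurement cross-covariance

    K = P_xz @ np.linalg.inv(S)                # Kalman gain
    x_new = x_pred + K @ (z - z_pred)
    P_new = P_pred - K @ S @ K.T
    return x_new, P_new


# Toy state [position, velocity]; points stand in for network predictions.
r = np.sqrt(3.0)
pred_points = np.array([[10.0, 1.0],
                        [10.0 + 2.0 * r, 1.0],
                        [10.0, 1.0 + r],
                        [10.0 - 2.0 * r, 1.0],
                        [10.0, 1.0 - r]])
weights = np.array([1.0 / 3.0] + [1.0 / 6.0] * 4)

H = np.array([[1.0, 0.0]])   # observe position only (assumption)
R = np.array([[1.0]])        # measurement noise variance (illustrative)
z = np.array([12.0])

x_new, P_new = ukf_update(pred_points, weights, z, H, R)
print(x_new, P_new)
```

In this toy case the predicted position variance is 4 and the measurement variance is 1, so the gain on position is 0.8 and the corrected position moves most of the way from 10 toward the measured 12.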
The TLU algorithm combines dynamic model selection (IMM), deep learning (TCN-LSTM), and advanced filtering technology (UKF) with a data-driven core. Compared with traditional model-based tracking algorithms, it offers higher prediction accuracy and performs well on highly maneuvering targets in complex environments. More importantly, regarding the issue of unknown model parameters in nonlinear systems when neural networks are used for prediction, the TLU algorithm does not require the specific form of the state transition matrix to be determined. It calculates the mean and covariance matrix as weighted sums over the Sigma point set predicted at the current time step, without the need to design a new measurement feedback system. This avoids the errors introduced by linearization, and, compared with existing tracking algorithms, the proposed method is the first to address this problem.