Kalman Filtering and Bipartite Matching Based Super-Chained Tracker Model for Online Multi Object Tracking in Video Sequences
Abstract
1. Introduction
- This work presents the Super-Chained Tracker (SCT), an online MOT model based on Kalman filtering and bipartite matching. SCT optimizes feature extraction, object detection, and data association, and it reduces data association to pairwise object detection.
- A joint attention module improves the informative regions used by SCT's paired-box regression, which enhances performance.
- The proposed SCT achieves strong performance on the MOT16 and MOT17 datasets.
2. Literature Review
3. Materials and Methods
3.1. Problems
3.2. Problem Formulation
3.3. Chained-Tracker Pipeline
3.3.1. Framework
3.3.2. Node Chaining
- Adding a Kalman filter [49], which predicts each object's location in the next frame, yields smoother and more plausible tracklets and therefore fewer ID switches. Only detections whose predicted location is close to the detection in the previous frame are kept for tracking. The Kalman filter equations provide a computationally efficient means of estimating the state of a process: they support estimates of past, present, and future states, even when the precise nature of the modeled system is unknown.
- Bipartite matching [50] imposes a one-to-one mapping, so each person is assigned to at most one person in the next frame, which reduces ID switches. The bipartite matching framework [51] allows us to factorize node similarity in the search for a one-to-one correspondence between the nodes of two graphs.
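The two additions above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the constant-velocity state `[cx, cy, vx, vy]`, the noise settings, the `max_dist` gate, and the names `ConstantVelocityKalman` and `match_one_to_one` are all assumptions introduced here. The assignment is found by exhaustive search, which is only practical for a handful of boxes; a real tracker would typically use the Hungarian algorithm instead.

```python
import itertools

import numpy as np


class ConstantVelocityKalman:
    """Kalman filter over a box center; state is [cx, cy, vx, vy] (assumed model)."""

    def __init__(self, cx, cy, dt=1.0, process_var=1.0, meas_var=1.0):
        self.x = np.array([cx, cy, 0.0, 0.0])   # initial state, zero velocity
        self.P = np.eye(4) * 10.0               # initial state covariance (assumed)
        self.F = np.eye(4)                      # state transition matrix
        self.F[0, 2] = self.F[1, 3] = dt        # position += velocity * dt
        self.H = np.eye(2, 4)                   # only the center is observed
        self.Q = np.eye(4) * process_var        # process noise (assumed)
        self.R = np.eye(2) * meas_var           # measurement noise (assumed)

    def predict(self):
        """Propagate the state one frame ahead and return the predicted center."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, cx, cy):
        """Correct the state with a matched detection center."""
        innovation = np.array([cx, cy]) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P


def match_one_to_one(predicted, detections, max_dist=50.0):
    """Minimum-cost one-to-one matching by exhaustive search (small inputs only).

    Returns (track_index, detection_index) pairs; pairs farther apart than
    max_dist are dropped, so tracks and detections may stay unmatched.
    """
    n, m = len(predicted), len(detections)
    if n == 0 or m == 0:
        return []
    p = np.asarray(predicted, dtype=float)
    d = np.asarray(detections, dtype=float)
    cost = np.linalg.norm(p[:, None, :] - d[None, :, :], axis=2)  # n x m distances
    k = min(n, m)
    best, best_cost = [], float("inf")
    for rows in itertools.combinations(range(n), k):
        for cols in itertools.permutations(range(m), k):
            total = sum(cost[r, c] for r, c in zip(rows, cols))
            if total < best_cost:
                best, best_cost = list(zip(rows, cols)), total
    return [(r, c) for r, c in best if cost[r, c] <= max_dist]


# Two existing tracks and three detections in the next frame (made-up numbers).
tracks = [ConstantVelocityKalman(0.0, 0.0), ConstantVelocityKalman(100.0, 100.0)]
predicted = [t.predict() for t in tracks]
detections = [(102.0, 99.0), (1.0, 2.0), (400.0, 400.0)]
pairs = match_one_to_one(predicted, detections)
for track_idx, det_idx in pairs:
    tracks[track_idx].update(*detections[det_idx])
```

In this toy frame each track is paired with its nearby detection, and the far-away third detection stays unmatched, so it could start a new identity rather than steal an existing one.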
3.4. SCT Framework
3.4.1. Paired Boxes Regression
3.4.2. Joint Attention Module
3.4.3. Feature Reuse
3.4.4. Loss Function and Assigning Labels
3.5. Datasets and Evaluation Metrics
3.5.1. Identification-F1 (IDF1)
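As a reading aid, the standard definition of IDF1 is the harmonic mean of identification precision and identification recall, IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The sketch below only evaluates that formula; the function name `idf1` is hypothetical, and the identity-level counts are assumed to come from a global matching between ground-truth and predicted trajectories that is not shown here.

```python
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 from identity-level true positives, false positives, and false negatives."""
    denom = 2 * idtp + idfp + idfn
    # With no ground truth and no predictions the score is undefined; return 0.0 here.
    return 2 * idtp / denom if denom else 0.0


# Made-up counts for illustration only.
example_idf1 = idf1(idtp=80, idfp=10, idfn=30)
```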
3.5.2. Number of Identity Switch (IDSw)
3.5.3. Multiple Object Tracking Accuracy (MOTA)
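In the CLEAR MOT definition (Bernardin and Stiefelhagen, listed in the references), MOTA combines false negatives, false positives, and identity switches relative to the number of ground-truth boxes: MOTA = 1 − (FN + FP + IDSw) / GT. The sketch below evaluates that formula. The function name `mota` is hypothetical, and the ground-truth count of 182,326 boxes is an assumption (the commonly reported size of the MOT16 test set), not a figure taken from this article.

```python
def mota(fp: int, fn: int, idsw: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy as a fraction (can be negative)."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


# FP, FN, and IDSw from the LMPR row of the results table;
# the ground-truth count is an assumed value, see the note above.
ASSUMED_NUM_GT = 182_326
lmpr_mota_percent = round(100 * mota(fp=6654, fn=86_245, idsw=481, num_gt=ASSUMED_NUM_GT), 1)
```

Under that assumed ground-truth count, the LMPR error counts give 48.8, which agrees with the MOTA listed for that method.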
4. Results and Discussion
4.1. Qualitative Results
4.2. Quantitative Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Abbreviation | Definition |
---|---|
CNN | Convolutional Neural Network |
CNNMTT | Multi-target tracking based on CNN |
DeepSORT | Deep Simple Online and Realtime Tracking |
DPM | Deformable part model |
DMAN | Dual Matching Attention Networks |
EER | Equal error rate |
EAMTT | Early Association Multi-Target Tracker |
FPN | Feature Pyramid Networks |
ID-F1 | Identification-F1 |
IDSw | Identity Switch |
IoU | Intersection over Union |
JAM | Joint Attention Module |
LBP | Local Binary Patterns |
LMPR | Lifted Multicut and Person Re-identification |
MAT | Motion-Aware Tracker |
MOT | Multiobject Tracking |
MOTA | Multiobject Tracking Accuracy |
MOTP | Multiobject Tracking Precision |
MHI | Motion History Image |
MSM | Memory Sharing Mechanism |
MT | Mostly Tracked Trajectory |
MLT | Mostly Lost Trajectories |
NOMT | Near-Online Multi-target Tracking |
POI | Person of Interest |
Quad-CNN | Quadruplet Convolutional Neural Network |
RNN | Recurrent Neural Network |
R-CNN | Region-based Convolutional Neural Network |
RPN | Region Proposal Network |
ROI | Region of Interest |
SDP | Scale-Dependent Pooling |
SCT | Super Chained Tracker |
SSD | Single Shot-multibox Detector |
STAM | Spatial-Temporal Attention Mechanism |
SORT | Simple Online and Real-time Tracking |
TBD | Tracking-by-detection |
UAV | Unmanned Aerial Vehicles |
YOLO | You Only Look Once |
References
1. Ciaparrone, G.; Sánchez, F.L.; Tabik, S.; Troiano, L.; Tagliaferri, R.; Herrera, F. Deep learning in video multi-object tracking: A survey. Neurocomputing 2020, 381, 61–88.
2. Xu, B.; Liang, D.; Li, L.; Quan, R.; Zhang, M. An Effectively Finite-Tailed Updating for Multiple Object Tracking in Crowd Scenes. Appl. Sci. 2022, 12, 1061.
3. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
4. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
5. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
6. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Processing Syst. 2015, 28, 91–99.
7. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016.
8. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
9. Tangirala, K.V.; Namuduri, K.R. Object tracking in video using particle filtering. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’05), Philadelphia, PA, USA, 23 March 2005.
10. Tariq, S.; Farooq, H.; Jaleel, A.; Wasif, S.M. Anomaly detection with particle filtering for online video surveillance. IEEE Access 2021, 9, 19457–19468.
11. Takala, V.; Pietikainen, M. Multi-object tracking using color, texture and motion. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007.
12. Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987.
13. Shu, G. Human Detection, Tracking and Segmentation in Surveillance Video. Ph.D. Thesis, University of Central Florida, Orlando, FL, USA, 2014.
14. Wang, L.; Liu, T.; Wang, G.; Chan, K.L.; Yang, Q. Video tracking using learned hierarchical features. IEEE Trans. Image Processing 2015, 24, 1424–1435.
15. Gao, T.; Wang, N.; Cai, J.; Lin, W.; Yu, X.; Qiu, J.; Gao, H. Explicitly exploiting hierarchical features in visual object tracking. Neurocomputing 2020, 397, 203–211.
16. Zhong, Z.; Gao, Y.; Zheng, Y.; Zheng, B. Efficient spatio-temporal recurrent neural network for video deblurring. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020.
17. Kang, K.; Ouyang, W.; Li, H.; Wang, X. Object detection from video tubelets with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
18. Ning, G.; Zhang, Z.; Huang, C.; Ren, X.; Wang, H.; Cai, C.; He, Z. Spatially supervised recurrent convolutional neural networks for visual object tracking. In Proceedings of the 2017 IEEE International Symposium on Circuits and Systems (ISCAS), Baltimore, MD, USA, 28–31 May 2017.
19. Held, D.; Thrun, S.; Savarese, S. Learning to track at 100 fps with deep regression networks. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016.
20. Ma, C.; Huang, J.-B.; Yang, X.; Yang, M.-H. Hierarchical convolutional features for visual tracking. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015.
21. Wang, L.; Ouyang, W.; Wang, X.; Lu, H. Visual tracking with fully convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015.
22. Shen, H.; Li, S.; Zhu, C.; Chang, H.; Zhang, J. Moving object detection in aerial video based on spatiotemporal saliency. Chin. J. Aeronaut. 2013, 26, 1211–1217.
23. Zhang, J.; Liang, X.; Wang, M.; Yang, L.; Zhuo, L. Coarse-to-fine object detection in unmanned aerial vehicle imagery using lightweight convolutional neural network and deep motion saliency. Neurocomputing 2020, 398, 555–565.
24. Hui, T.-W.; Tang, X.; Loy, C.C. Liteflownet: A lightweight convolutional neural network for optical flow estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
25. Wang, R.J.; Li, X.; Ling, C.X. Pelee: A real-time object detection system on mobile devices. arXiv 2018, arXiv:1804.06882.
26. Guo, L.; Liao, Y.; Luo, D.; Liao, H. Generic object detection using improved gentleboost classifier. Phys. Procedia 2012, 25, 1528–1535.
27. Ayed, A.B.; Halima, M.B.; Alimi, A.M. MapReduce based text detection in big data natural scene videos. Procedia Comput. Sci. 2015, 53, 216–223.
28. Viswanath, A.; Behera, R.K.; Senthamilarasu, V.; Kutty, K. Background modelling from a moving camera. Procedia Comput. Sci. 2015, 58, 289–296.
29. Soundrapandiyan, R.; Mouli, P.C. Adaptive pedestrian detection in infrared images using background subtraction and local thresholding. Procedia Comput. Sci. 2015, 58, 706–713.
30. Park, Y.; Dang, L.M.; Lee, S.; Han, D.; Moon, H. Multiple object tracking in deep learning approaches: A survey. Electronics 2021, 10, 2406.
31. Bergmann, P.; Meinhardt, T.; Leal-Taixe, L. Tracking without bells and whistles. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 2 November 2019.
32. Zhu, J.; Yang, H.; Liu, N.; Kim, M.; Zhang, W.; Yang, M.-H. Online multi-object tracking with dual matching attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
33. Chu, Q.; Ouyang, W.; Li, H.; Wang, X.; Liu, B.; Yu, N. Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
34. Choi, W. Near-online multi-target tracking with aggregated local flow descriptor. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015.
35. Yu, F.; Li, W.; Li, Q.; Liu, Y.; Shi, X.; Yan, J. Poi: Multiple object tracking with high performance detection and appearance feature. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016.
36. Sanchez-Matilla, R.; Poiesi, F.; Cavallaro, A. Online multi-target tracking with strong and weak detections. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016.
37. Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 7 February 2017.
38. Mahmoudi, N.; Ahadi, S.M.; Rahmati, M. Multi-target tracking using CNN-based features: CNNMTT. Multimed. Tools Appl. 2019, 78, 7077–7096.
39. Zaech, J.-N.; Liniger, A.; Dai, D.; Danelljan, M.; Van Gool, L. Learnable online graph representations for 3d multi-object tracking. In Proceedings of the IEEE Robotics and Automation Letters, Philadelphia, PA, USA, 23–27 May 2022.
40. Han, S.; Huang, P.; Wang, H.; Yu, E.; Liu, D.; Pan, X. Mat: Motion-aware multi-object tracking. Neurocomputing 2022, 476, 75–86.
41. Sun, Z.; Chen, J.; Mukherjee, M.; Liang, C.; Ruan, W.; Pan, Z. Online multiple object tracking based on fusing global and partial features. Neurocomputing 2022, 470, 190–203.
42. Yin, T.; Zhou, X.; Krahenbuhl, P. Center-based 3d object detection and tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021.
43. Tsai, C.-Y.; Su, Y.-K. MobileNet-JDE: A lightweight multi-object tracking model for embedded systems. Multimed. Tools Appl. 2022, 81, 9915–9937.
44. Breitenstein, M.D.; Reichlin, F.; Leibe, B.; Koller-Meier, E.; Van Gool, L. Robust tracking-by-detection using a detector confidence particle filter. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009.
45. Bochinski, E.; Eiselein, V.; Sikora, T. High-speed tracking-by-detection without using image information. In Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017.
46. Peng, J.; Wang, C.; Wan, F.; Wu, Y.; Wang, Y.; Tai, Y.; Wang, C.; Li, J.; Huang, F.; Fu, Y. Chained-tracker: Chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020.
47. Qin, C.; Zhang, Y.; Liu, Y.; Lv, G. Semantic loop closure detection based on graph matching in multi-objects scenes. J. Vis. Commun. Image Represent. 2021, 76, 103072.
48. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
49. Patel, H.A.; Thakore, D.G. Moving object tracking using kalman filter. Int. J. Comput. Sci. Mob. Comput. 2013, 2, 326–332.
50. Meinhardt, T.; Kirillov, A.; Leal-Taixe, L.; Feichtenhofer, C. Trackformer: Multi-object tracking with transformers. arXiv 2021, arXiv:2101.02702.
51. Shokoufandeh, A.; Dickinson, S. Applications of bipartite matching to problems in object recognition. In Proceedings of the ICCV Workshop on Graph Algorithms and Computer Vision, Corfu, Greece, 21 September 1999.
52. Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS-improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
53. Ravindran, V.; Osgood, M.; Sazawal, V.; Solorzano, R.; Turnacioglu, S. Virtual reality support for joint attention using the Floreo Joint Attention Module: Usability and feasibility pilot study. Jmir Pediatrics Parent. 2019, 2, e14429.
54. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
55. Milan, A.; Leal-Taixé, L.; Reid, I.; Roth, S.; Schindler, K. MOT16: A benchmark for multi-object tracking. arXiv 2016, arXiv:1603.00831.
56. Yu, E.; Li, Z.; Han, S.; Wang, H. Relationtrack: Relation-aware multiple object tracking with decoupled representation. IEEE Trans. Multimed. 2022.
57. Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 1627–1645.
58. Chen, Y.; Li, W.; Sakaridis, C.; Dai, D.; Van Gool, L. Domain adaptive faster r-cnn for object detection in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
59. Dai, J.; He, K.; Sun, J. Instance-aware semantic segmentation via multi-task network cascades. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
60. Yang, F.; Choi, W.; Lin, Y. Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
61. Bernardin, K.; Stiefelhagen, R. Evaluating multiple object tracking performance: The clear mot metrics. EURASIP J. Image Video Processing 2008, 2008, 246309.
62. Liu, Y.; Bai, T.; Tian, Y.; Wang, Y.; Wang, J.; Wang, X.; Wang, F.-Y.I. SegDQ: Segmentation Assisted Multi-Object Tracking with Dynamic Query-based Transformers. Neurocomputing 2022, 481, 91–101.
63. Li, Y.; Huang, C.; Nevatia, R. Learning to associate: Hybridboosted multi-target tracker for crowded scene. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009.
64. Lee, J.; Kim, S.; Ko, B.C. Online multiple object tracking using rule distillated siamese random forest. IEEE Access 2020, 8, 182828–182841.
65. Tang, S.; Andriluka, M.; Andres, B.; Schiele, B. Multiple people tracking by lifted multicut and person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
66. Son, J.; Baek, M.; Cho, M.; Han, B. Multi-object tracking with quadruplet convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
67. Kim, C.; Li, F.; Rehg, J.M. Multi-object tracking with neural gating using bilinear lstm. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
68. Chen, J.; Sheng, H.; Zhang, Y.; Xiong, Z. Enhancing detection model for multiple hypothesis tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017.
69. Bae, S.-H.; Yoon, K.-J. Confidence-based data association and discriminative deep appearance learning for robust online multi-object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 595–610.
70. Lee, B.; Erdenee, E.; Jin, S.; Nam, M.Y.; Jung, Y.G.; Rhee, P.K. Multi-class multi-object tracking using changing point detection. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016.
Method | MOTA↑ | IDF1↑ | FP↓ | FN↓ | IDSw↓ |
---|---|---|---|---|---|
LMPR [65] | 48.8 | 51.3 | 6654 | 86,245 | 481 |
Quad-CNN [66] | 44.1 | 38.3 | 6388 | 94,775 | 745 |
MHT-bLSTM [67] | 42.1 | 47.8 | 11,637 | 93,172 | 753 |
EDMT [68] | 45.3 | 47.9 | 11,122 | 87,890 | 639 |

Method | MOTA↑ | IDF1↑ | FP↓ | FN↓ | IDSw↓ |
---|---|---|---|---|---|
Tracktor [31] | 54.4 | 52.5 | 3280 | 79,149 | 682 |
DMAN [32] | 46.1 | 54.8 | 7909 | 89,874 | 532 |
STAM [33] | 46.0 | 50.0 | 6895 | 91,117 | 473 |
CDA-DDAL [69] | 43.9 | 45.1 | 6450 | 95,175 | 676 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Qureshi, S.A.; Hussain, L.; Chaudhary, Q.-u.-a.; Abbas, S.R.; Khan, R.J.; Ali, A.; Al-Fuqaha, A. Kalman Filtering and Bipartite Matching Based Super-Chained Tracker Model for Online Multi Object Tracking in Video Sequences. Appl. Sci. 2022, 12, 9538. https://doi.org/10.3390/app12199538