Real-Time Object Tracking with Template Tracking and Foreground Detection Network
Abstract
1. Introduction
- We combine a feature-representation backbone network, TmpNet, and FgNet into an end-to-end framework for real-time object tracking.
- For object representation, we adopt a modified pretrained VGG as the backbone network. The parameters of its last block are fine-tuned with a multi-task loss on the first frame of a test sequence, which makes the feature representation more discriminative.
- For object localization, TmpNet (channel-wise target template tracking) and FgNet (fast foreground detection) are combined to find the optimal tracking result, which improves tracking performance.
- We performed comprehensive experiments on four benchmark datasets: OTB2013 [21], OTB2015 [22], TC128 [23], and UAV123 [24]. Our tracker runs at 38 fps on a single GPU while obtaining an AUC of 67.7% on OTB2013, 64.1% on OTB2015, 55.4% on TC128, and 47.1% on UAV123.
2. Related Work
2.1. Correlation-Filter Based Real-Time Tracker
2.2. Real-Time Template Tracking Methods
2.3. Foreground Detection Methods
3. Our Method
3.1. Formulation and Motivation
3.2. Details of the Overall Architecture
3.3. Training Details
- (1) Scale changes: We randomly choose a scale factor and rescale the object by it.
- (2) Object rotation: We rotate the object by a randomly chosen rotation angle.
- (3) Object flip: Since some objects may appear flipped, we simulate this by flipping the object up-down or left-right.
- (4) Illumination change: We change the illumination of the image by randomly choosing the saturation S and value V in HSV space.
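The four augmentations above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the image representation (nested lists of RGB tuples in [0, 1]), the function names, the nearest-neighbour resampling, the multiplicative S/V gains, and every sampling range in `augment` are assumptions chosen for the example, since the original ranges are not given in this text. Transforming the bounding-box labels alongside the image is also left out.

```python
import colorsys
import math
import random

# Assumed representation: an image is a list of rows,
# each pixel an (r, g, b) tuple of floats in [0, 1].

def flip(img, horizontal=True):
    """Flip left-right (horizontal=True) or up-down (horizontal=False)."""
    if horizontal:
        return [row[::-1] for row in img]
    return [list(row) for row in img[::-1]]

def resample(img, scale=1.0, angle_deg=0.0, fill=(0.0, 0.0, 0.0)):
    """Scale and rotate about the image centre with nearest-neighbour sampling.

    The output keeps the input size; pixels mapped from outside the source get `fill`.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Inverse mapping: for each output pixel, find its source pixel.
            dx, dy = (x - cx) / scale, (y - cy) / scale
            sx = round(cx + cos_a * dx + sin_a * dy)
            sy = round(cy - sin_a * dx + cos_a * dy)
            row.append(img[sy][sx] if 0 <= sx < w and 0 <= sy < h else fill)
        out.append(row)
    return out

def change_illumination(img, s_gain=1.0, v_gain=1.0):
    """Multiply saturation S and value V in HSV space, clamped to [0, 1]."""
    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            s, v = min(1.0, s * s_gain), min(1.0, v * v_gain)
            new_row.append(colorsys.hsv_to_rgb(h, s, v))
        out.append(new_row)
    return out

def augment(img, rng):
    """Apply one random draw of each augmentation (all ranges are illustrative)."""
    img = resample(img, scale=rng.uniform(0.8, 1.2), angle_deg=rng.uniform(-30, 30))
    if rng.random() < 0.5:
        img = flip(img, horizontal=rng.random() < 0.5)
    return change_illumination(img, rng.uniform(0.7, 1.3), rng.uniform(0.7, 1.3))
```

Calling `augment(img, random.Random(0))` repeatedly on the first-frame patch would yield a small set of varied training samples; a real pipeline would typically use an image library with interpolated resampling instead of the per-pixel loops shown here.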
3.4. Online Tracking Processing
Algorithm 1: Tracking with TFnet.
1: Input: frames, initial target bounding box
2: Augment the training samples according to the augmentation generation strategy.
3: Learn the variables in Equation (2) with the augmented training samples: the backbone network parameters, the target template t, and the foreground network parameters.
4: Copy the target template t to an adaptive template and a static template.
5: For each frame i, starting from i = 2:
6:   Extract the search images according to the result in the last frame.
7:   Forward propagate the search images and predict the object location with Equation (4).
8:   Generate a new training sample and its corresponding label based on the predicted results.
9:   Update the adaptive target template with Equation (7).
10: End
11: Output: tracking results
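The control flow of Algorithm 1 can be sketched as below. Equations (2), (4), and (7) are not reproduced in this text, so the learned components are passed in as hypothetical callables (`initialize`, `localize`, `extract_template`), and the template update is assumed to be a simple linear interpolation with an illustrative rate; the actual update rule may differ.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x, y, width, height)
Template = List[float]                   # assumed: flattened template features

def update_adaptive_template(adaptive: Template, new: Template,
                             rate: float = 0.01) -> Template:
    """Assumed stand-in for Equation (7): blend the old and new templates."""
    return [(1.0 - rate) * a + rate * n for a, n in zip(adaptive, new)]

def track(frames: Sequence,
          init_box: Box,
          initialize: Callable,        # hypothetical: steps 2-3 (augment + learn)
          localize: Callable,          # hypothetical: steps 6-7 (search + predict)
          extract_template: Callable,  # hypothetical: step 8 (new sample)
          rate: float = 0.01):
    """Run the loop of Algorithm 1 and return the boxes and both templates."""
    # Steps 2-3: first-frame augmentation and learning happen inside `initialize`.
    template = initialize(frames[0], init_box)
    # Step 4: keep one fixed copy and one copy that is updated online.
    static_t, adaptive_t = list(template), list(template)
    boxes = [init_box]
    for frame in frames[1:]:  # step 5: frames 2, 3, ...
        # Steps 6-7: search around the previous result and predict the new box.
        box = localize(frame, boxes[-1], static_t, adaptive_t)
        # Step 8: build a new sample from the prediction.
        new_t = extract_template(frame, box)
        # Step 9: only the adaptive template changes; the static one is kept.
        adaptive_t = update_adaptive_template(adaptive_t, new_t, rate)
        boxes.append(box)
    return boxes, static_t, adaptive_t  # step 11
```

Keeping the static template untouched while only the adaptive copy drifts is what the algorithm's step 4 implies; how the two are weighted during localization is left to the `localize` callable here.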
4. Experiment
4.1. Experiment Setting
4.2. Evaluation Benchmarks
4.3. Evaluation Methodology
4.4. Comparisons to State-of-the-Art Trackers
4.4.1. Results on OTB2013
4.4.2. Results on OTB2015
4.4.3. Results on TC128
4.4.4. Results on UAV123
4.5. Ablation Study
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H.S. Fully-Convolutional Siamese Networks for Object Tracking. In ECCV Workshops; Springer: Cham, Switzerland, 2016; pp. 850–865.
- Yang, T.; Chan, A.B. Learning Dynamic Memory Networks for Object Tracking. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 153–169.
- Guo, Q.; Feng, W.; Zhou, C.; Huang, R.; Wan, L.; Wang, S. Learning Dynamic Siamese Network for Visual Object Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1781–1789.
- Li, B.; Yan, J.; Wu, W.; Zhu, Z.; Hu, X. High Performance Visual Tracking with Siamese Region Proposal Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 8971–8980.
- Song, Y.; Ma, C.; Gong, L.; Zhang, J.; Lau, R.W.H.; Yang, M.H. CREST: Convolutional Residual Learning for Visual Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2574–2583.
- Choi, J.; Chang, H.J.; Fischer, T.; Yun, S.; Lee, K.; Jeong, J.; Demiris, Y.; Choi, J.Y. Context-Aware Deep Feature Compression for High-Speed Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 479–488.
- Nam, H.; Han, B. Learning Multi-domain Convolutional Neural Networks for Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 4293–4302.
- Wang, L.; Ouyang, W.; Wang, X.; Lu, H. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 3119–3127.
- Ma, C.; Huang, J.B.; Yang, X.; Yang, M.H. Hierarchical Convolutional Features for Visual Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 3074–3082.
- Danelljan, M.; Robinson, A.; Khan, F.S.; Felsberg, M. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. In Proceedings of the ECCV, Amsterdam, The Netherlands, 11–14 October 2016; pp. 472–488.
- Danelljan, M.; Bhat, G.; Khan, F.S.; Felsberg, M. ECO: Efficient Convolution Operators for Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6931–6939.
- Qi, Y.; Zhang, S.; Qin, L.; Yao, H.; Huang, Q.; Lim, J.; Yang, M.H. Hedged Deep Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 4303–4311.
- Bolme, D.S.; Beveridge, J.R.; Draper, B.A.; Lui, Y.M. Visual object tracking using adaptive correlation filters. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; pp. 2544–2550.
- Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. Exploiting the Circulant Structure of Tracking-by-Detection with Kernels. In Proceedings of the ECCV, Florence, Italy, 7–13 October 2012; pp. 702–715.
- Galoogahi, H.K.; Fagg, A.; Lucey, S. Learning Background-Aware Correlation Filters for Visual Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1144–1152.
- Wang, Q.; Gao, J.; Xing, J.; Zhang, M.; Hu, W. DCFNet: Discriminant Correlation Filters Network for Visual Tracking. arXiv 2017, arXiv:1704.04057.
- Valmadre, J.; Bertinetto, L.; Henriques, J.F.; Vedaldi, A.; Torr, P.H.S. End-to-End Representation Learning for Correlation Filter Based Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5000–5008.
- He, A.; Luo, C.; Tian, X.; Zeng, W. A Twofold Siamese Network for Real-Time Object Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4834–4843.
- Zhang, Y.; Wang, L.; Qi, J.; Wang, D.; Feng, M.; Lu, H. Structured Siamese Network for Real-Time Visual Tracking. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 355–370.
- Dai, K.; Wang, Y.; Yan, X.; Huo, Y. Fusion of Template Matching and Foreground Detection for Robust Visual Tracking. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 2720–2724.
- Wu, Y.; Lim, J.; Yang, M.H. Online Object Tracking: A Benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 2411–2418.
- Wu, Y.; Lim, J.; Yang, M.H. Object Tracking Benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848.
- Liang, P.; Blasch, E.; Ling, H. Encoding color information for visual tracking: Algorithms and benchmark. IEEE Trans. Image Process. 2015, 24, 5630–5644.
- Mueller, M.; Smith, N.; Ghanem, B. A Benchmark and Simulator for UAV Tracking. In Proceedings of the ECCV, Amsterdam, The Netherlands, 11–14 October 2016; pp. 445–461.
- Danelljan, M.; Häger, G.; Khan, F.S.; Felsberg, M. Discriminative Scale Space Tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1561–1575.
- Ma, C.; Yang, X.; Zhang, C.; Yang, M.H. Long-term correlation tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5388–5396.
- Li, F.; Tian, C.; Zuo, W.; Zhang, L.; Yang, M.H. Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4904–4913.
- Choi, J.; Chang, H.J.; Yun, S.; Fischer, T.; Demiris, Y.; Choi, J.Y. Attentional Correlation Filter Network for Adaptive Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4828–4837.
- Choi, J.; Chang, H.J.; Jeong, J.; Demiris, Y.; Choi, J.Y. Visual Tracking Using Attention-Modulated Disintegration and Integration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 4321–4330.
- Liu, Y.; Sui, X.; Kuang, X.; Liu, C.; Gu, G.; Chen, Q. Object Tracking Based on Vector Convolutional Network and Discriminant Correlation Filters. Sensors 2019, 19, 1818.
- Bertinetto, L.; Valmadre, J.; Golodetz, S.; Miksik, O.; Torr, P.H.S. Staple: Complementary Learners for Real-Time Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1401–1409.
- Held, D.; Thrun, S.; Savarese, S. Learning to Track at 100 FPS with Deep Regression Networks. In Proceedings of the ECCV, Amsterdam, The Netherlands, 11–14 October 2016; pp. 749–765.
- Huang, C.; Lucey, S.; Ramanan, D. Learning Policies for Adaptive Tracking with Deep Feature Cascades. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 105–114.
- Chang, S.; Li, W.; Zhang, Y.; Feng, Z. Online Siamese Network for Visual Object Tracking. Sensors 2019, 19, 1858.
- Zhou, L.; Zhang, J. Combined Kalman Filter and Multifeature Fusion Siamese Network for Real-Time Visual Tracking. Sensors 2019, 19, 2201.
- Wang, Q.; Teng, Z.; Xing, J.; Gao, J.; Hu, W.; Maybank, S.J. Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4854–4863.
- Zhu, Z.; Wang, Q.; Li, B.Q.; Wu, W.; Yan, J.; Hu, W. Distractor-Aware Siamese Networks for Visual Object Tracking. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 101–117.
- Zhu, Z.; Huang, G.; Zou, W.; Du, D.; Huang, C. UCT: Learning Unified Convolutional Networks for Real-Time Visual Tracking. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22–29 October 2017; pp. 1973–1982.
- Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Wang, Y.; Luo, Z.; Jodoin, P.M. Interactive deep learning method for segmenting moving objects. Pattern Recognit. Lett. 2017, 96, 66–75.
- Lim, L.A.; Keles, H.Y. Foreground segmentation using convolutional neural networks for multiscale feature encoding. Pattern Recognit. Lett. 2018, 112, 256–262.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
- Fan, H.; Ling, H. SANet: Structure-Aware Network for Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 2217–2224.
- Song, Y.; Ma, C.; Wu, X.; Gong, L.; Bao, L.; Zuo, W.; Shen, C.; Lau, R.W.H.; Yang, M.H. VITAL: VIsual Tracking via Adversarial Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 8990–8999.
- Sun, C.; Lu, H.; Yang, M.H. Learning Spatial-Aware Regressions for Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 8962–8970.
- Fan, H.; Ling, H. Parallel Tracking and Verifying. IEEE Trans. Image Process. 2018, 28, 4130–4144.
- Hare, S.; Saffari, A.; Torr, P.H.S. Struck: Structured Output Tracking with Kernels. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 38, 263–270.
- Kalal, Z.; Mikolajczyk, K.; Matas, J. Tracking-Learning-Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1409–1422.
- Danelljan, M.; Häger, G.; Khan, F.S.; Felsberg, M. Convolutional Features for Correlation Filter Based Visual Tracking. In Proceedings of the IEEE International Conference on Computer Vision Workshop (ICCVW), Santiago, Chile, 7–13 December 2015; pp. 621–629.
- Danelljan, M.; Häger, G.; Khan, F.S.; Felsberg, M. Learning Spatially Regularized Correlation Filters for Visual Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4310–4318.
Overlap success rate (OS) and distance precision (DP) of each tracker on each attribute.

Attribute | TFnet OS | TRACA OS | Memtrack OS | ECO-HC OS | SiamRPN OS | DCFnet OS | TFnet DP | TRACA DP | Memtrack DP | ECO-HC DP | SiamRPN DP | DCFnet DP
---|---|---|---|---|---|---|---|---|---|---|---|---
IV | 0.659 | 0.623 | 0.588 | 0.612 | 0.631 | 0.596 | 0.881 | 0.863 | 0.789 | 0.793 | 0.848 | 0.751
OPR | 0.668 | 0.640 | 0.629 | 0.632 | 0.662 | 0.612 | 0.909 | 0.897 | 0.848 | 0.862 | 0.890 | 0.785
SV | 0.672 | 0.613 | 0.654 | 0.627 | 0.653 | 0.619 | 0.911 | 0.865 | 0.882 | 0.838 | 0.878 | 0.777
OC | 0.670 | 0.644 | 0.612 | 0.670 | 0.623 | 0.645 | 0.905 | 0.884 | 0.810 | 0.913 | 0.838 | 0.833
DF | 0.675 | 0.688 | 0.594 | 0.645 | 0.677 | 0.606 | 0.914 | 0.941 | 0.797 | 0.863 | 0.900 | 0.787
MB | 0.674 | 0.575 | 0.548 | 0.610 | 0.583 | 0.515 | 0.895 | 0.771 | 0.725 | 0.777 | 0.786 | 0.615
FM | 0.671 | 0.578 | 0.585 | 0.607 | 0.601 | 0.534 | 0.890 | 0.782 | 0.773 | 0.797 | 0.791 | 0.646
IPR | 0.648 | 0.610 | 0.603 | 0.589 | 0.646 | 0.572 | 0.887 | 0.859 | 0.803 | 0.801 | 0.867 | 0.723
OOV | 0.686 | 0.630 | 0.560 | 0.694 | 0.615 | 0.690 | 0.842 | 0.743 | 0.676 | 0.883 | 0.757 | 0.820
BC | 0.662 | 0.618 | 0.591 | 0.606 | 0.647 | 0.579 | 0.874 | 0.844 | 0.800 | 0.816 | 0.868 | 0.760
LR | 0.566 | 0.470 | 0.671 | 0.403 | 0.665 | 0.429 | 0.983 | 0.903 | 0.996 | 0.750 | 0.976 | 0.590
Overall | 0.659 | 0.608 | 0.603 | 0.609 | 0.636 | 0.581 | 0.899 | 0.850 | 0.809 | 0.826 | 0.854 | 0.735
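For readers unfamiliar with the two metrics in the table, the sketch below shows how overlap success rate and distance precision are commonly computed in OTB-style evaluation. The box layout `(x, y, w, h)`, the function names, the 0.5 overlap threshold, the 20-pixel distance threshold, and the 21-point AUC grid are assumptions following common benchmark practice; this text does not state which thresholds the table uses.

```python
import math

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def center_error(a, b):
    """Euclidean distance between the centres of two (x, y, w, h) boxes."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))

def success_rate(pred, gt, threshold=0.5):
    """Fraction of frames whose overlap with the ground truth exceeds `threshold`."""
    return sum(iou(p, g) > threshold for p, g in zip(pred, gt)) / len(gt)

def distance_precision(pred, gt, threshold=20.0):
    """Fraction of frames whose centre error is within `threshold` pixels."""
    return sum(center_error(p, g) <= threshold for p, g in zip(pred, gt)) / len(gt)

def success_auc(pred, gt, steps=21):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    thresholds = [i / (steps - 1) for i in range(steps)]
    return sum(success_rate(pred, gt, t) for t in thresholds) / steps
```

Each function takes two equal-length lists of boxes, one predicted and one ground-truth box per frame, and returns a value in [0, 1].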
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Dai, K.; Wang, Y.; Song, Q. Real-Time Object Tracking with Template Tracking and Foreground Detection Network. Sensors 2019, 19, 3945. https://doi.org/10.3390/s19183945