An Anchor-Free Siamese Network with Multi-Template Update for Object Tracking
Abstract
1. Introduction
- We propose an anchor-free Siamese network (AFSN) for object tracking that is trained end-to-end offline and tracks online. It modifies the strides and receptive field of the original network, which yields strong tracking performance.
- A dual-fusion method is designed to combine both feature maps and prediction results. High-level features are added to the low- and middle-level features to make full use of both spatial and semantic information, and a weighted sum of the multiple prediction results improves accuracy and robustness.
- A multi-template update mechanism is designed to decide whether the template should be updated. The peak-to-correlation energy (PCE) score is used to measure the degree of occlusion of the object, so that only reliable frames are used to update the template.
- We propose replacing the RPN module with an anchor-free prediction network, which reduces the number of hyper-parameters, simplifies the tracker, speeds up tracking, and improves performance.
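The dual-fusion idea in the second contribution can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the low-, middle-, and high-level maps have already been brought to a common spatial size (how AFSN aligns them is not described in this text), treats each map as a single-channel nested list, and uses hypothetical fusion weights. The helper names `add_maps`, `scale_map`, `fuse_features`, and `weighted_sum` are illustrative only.

```python
from typing import List, Tuple

Map = List[List[float]]  # one single-channel 2-D feature or score map


def add_maps(a: Map, b: Map) -> Map:
    """Element-wise sum of two maps that must have identical shapes."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale_map(m: Map, w: float) -> Map:
    """Multiply every element of a map by the scalar weight w."""
    return [[w * v for v in row] for row in m]


def fuse_features(low: Map, mid: Map, high: Map) -> Tuple[Map, Map]:
    """Feature-level fusion: add the high-level map to the low- and mid-level maps.

    Assumes the three maps were already resized to a common shape.
    """
    return add_maps(low, high), add_maps(mid, high)


def weighted_sum(predictions: List[Map], weights: List[float]) -> Map:
    """Prediction-level fusion: sum over k of weights[k] * predictions[k]."""
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need one weight per prediction map")
    fused = scale_map(predictions[0], weights[0])
    for pred, w in zip(predictions[1:], weights[1:]):
        fused = add_maps(fused, scale_map(pred, w))
    return fused


if __name__ == "__main__":
    low_f, mid_f = fuse_features(
        [[1.0, 2.0], [3.0, 4.0]],  # low-level map
        [[0.0, 0.0], [1.0, 1.0]],  # middle-level map
        [[0.5, 0.5], [0.5, 0.5]],  # high-level map
    )
    print(low_f, mid_f)
    # three prediction maps fused with hypothetical weights that sum to 1
    print(weighted_sum([[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]]], [0.5, 0.25, 0.25]))  # [[0.75, 0.5]]
```

With weights that sum to 1, the fused prediction stays on the same scale as its inputs; whether AFSN's weights are fixed or learned is not stated in this section.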
2. Related Works
2.1. Siamese Network-Based Trackers
2.2. Anchor-Free Method of Detection
3. Methodology
3.1. Feature Extraction with a Siamese Network
3.2. Anchor-Free Prediction Network
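The fourth contribution in the Introduction replaces the RPN module with an anchor-free prediction network. As a rough illustration of how such a head avoids anchor hyper-parameters, the sketch below decodes a box from per-location distance regression in the style of FCOS [20]: every score-map cell is mapped back to an image point, and the regression branch predicts that point's distances (l, t, r, b) to the four sides of the target. Whether AFSN uses exactly this parameterization is an assumption, as are the `stride` and `offset` values; `decode_best_box` is a hypothetical helper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in search-image pixels
LTRB = Tuple[float, float, float, float]  # distances to left, top, right, bottom sides


def decode_best_box(
    scores: List[List[float]],
    ltrb: List[List[LTRB]],
    stride: int = 8,      # hypothetical total stride from score map to image
    offset: float = 0.0,  # hypothetical image position of score-map cell (0, 0)
) -> Tuple[Box, float]:
    """Decode the box regressed at the highest-scoring location (FCOS-style).

    Cell (i, j) is mapped to the image point (x, y); the regression branch
    predicts that point's distances (l, t, r, b) to the four box sides, so no
    anchor scales or aspect ratios are needed.
    """
    if not scores or not scores[0]:
        raise ValueError("empty score map")
    best_score, best_i, best_j = float("-inf"), 0, 0
    for i, row in enumerate(scores):
        for j, s in enumerate(row):
            if s > best_score:
                best_score, best_i, best_j = s, i, j
    x = offset + best_j * stride  # column index -> horizontal image coordinate
    y = offset + best_i * stride  # row index -> vertical image coordinate
    l, t, r, b = ltrb[best_i][best_j]
    return (x - l, y - t, x + r, y + b), best_score


if __name__ == "__main__":
    scores = [[0.1, 0.2], [0.3, 0.9]]
    ltrb = [[(4.0, 4.0, 4.0, 4.0)] * 2, [(4.0, 4.0, 4.0, 4.0), (5.0, 4.0, 6.0, 2.0)]]
    print(decode_best_box(scores, ltrb))  # ((3.0, 4.0, 14.0, 10.0), 0.9)
```

No anchor scales or aspect ratios appear anywhere, which is where the reduced hyper-parameter count of an anchor-free design comes from. The sketch omits the window penalties that Siamese trackers commonly apply to the score map before taking the maximum.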
3.3. Multi-Template Update Mechanism
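The multi-template update mechanism described in the Introduction gates template updates on the peak-to-correlation energy (PCE) score. A minimal sketch of such a gate follows. The exact PCE normalization, the acceptance threshold, and the number of stored templates used by AFSN are not given in this text, so all of them are assumptions here: PCE is computed as the squared peak divided by the mean squared response, a frame is accepted when its PCE is at least `ratio` times the mean PCE of previously accepted frames, the first-frame template is always kept, and at most `max_dynamic` recent templates are retained. `pce` and `TemplatePool` are hypothetical names.

```python
from collections import deque
from typing import Any, Deque, List


def pce(response: List[List[float]]) -> float:
    """Peak-to-correlation energy, assumed here as peak^2 / mean(r^2).

    A single sharp peak gives a large value; a flat or multi-peaked map,
    as is typical under occlusion, gives a value close to 1.
    """
    values = [v for row in response for v in row]
    if not values:
        raise ValueError("empty response map")
    mean_energy = sum(v * v for v in values) / len(values)
    return 0.0 if mean_energy == 0.0 else max(values) ** 2 / mean_energy


class TemplatePool:
    """Hypothetical PCE-gated multi-template store (not the authors' code)."""

    def __init__(self, first_template: Any, max_dynamic: int = 2, ratio: float = 0.6):
        # The first-frame template is never replaced.
        self.first = first_template
        # Most recent accepted templates; the deque drops the oldest beyond maxlen.
        self.dynamic: Deque[Any] = deque(maxlen=max_dynamic)
        # Hypothetical acceptance ratio relative to the mean accepted PCE.
        self.ratio = ratio
        self.accepted_pce: List[float] = []

    def maybe_update(self, template: Any, response: List[List[float]]) -> bool:
        """Store `template` only if its response map looks reliable."""
        score = pce(response)
        history = self.accepted_pce
        reference = sum(history) / len(history) if history else score
        if score >= self.ratio * reference:
            self.dynamic.append(template)
            self.accepted_pce.append(score)
            return True
        return False  # probably occluded or drifting: keep the existing templates

    def templates(self) -> List[Any]:
        return [self.first, *self.dynamic]


if __name__ == "__main__":
    sharp = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    flat = [[1.0, 1.0], [1.0, 1.0]]
    pool = TemplatePool("z0")
    pool.maybe_update("z1", sharp)  # accepted
    pool.maybe_update("z2", flat)   # rejected
    print(pool.templates())  # ['z0', 'z1']
```

A sharp single-peak response passes the gate, while a flat response such as one produced under heavy occlusion is rejected and leaves the stored templates unchanged.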
3.4. Loss Function
4. Results and Discussion
4.1. Results on GOT-10k
4.2. Results on LaSOT
4.3. Results on UAV123
4.4. Results on OTB100
4.5. Ablation Study
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
1. Qin, X.F.; Zhang, Y.P.; Chang, H.; Lu, H.; Zhang, X.D. ACSiamRPN: Adaptive Context Sampling for Visual Object Tracking. Electronics 2020, 9, 1528.
2. Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H. Fully-Convolutional Siamese Networks for Object Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 850–865.
3. Li, B.; Yan, J.; Wu, W.; Zhu, Z.; Hu, X. High Performance Visual Tracking with Siamese Region Proposal Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8971–8980.
4. Zhu, Z.; Wang, Q.; Li, B.; Wu, W.; Yan, J.; Hu, W. Distractor-Aware Siamese Networks for Visual Object Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 103–119.
5. Fan, H.; Ling, H. Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7952–7961.
6. Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4282–4291.
7. Chen, Z.; Zhong, B.; Li, G.; Zhang, S.; Ji, R. Siamese Box Adaptive Network for Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 6668–6677.
8. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
9. Huang, L.; Zhao, X.; Huang, K. GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 1562–1577.
10. Fan, H.; Lin, L.T.; Yang, F.; Chu, P.; Deng, G.; Yu, S.J.; Bai, H.X.; Xu, Y.; Liao, C.Y.; Ling, H.B. LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5374–5383.
11. Mueller, M.; Smith, N.; Ghanem, B. A Benchmark and Simulator for UAV Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 445–461.
12. Wu, Y.; Lim, J.; Yang, M.-H. Object Tracking Benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848.
13. Danelljan, M.; Bhat, G.; Khan, F.S.; Felsberg, M. ATOM: Accurate Tracking by Overlap Maximization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
14. Chen, X.; Yan, X.; Zheng, F.; Jiang, Y.; Xia, S.-T.; Zhao, Y.; Ji, R. One-Shot Adversarial Attacks on Visual Tracking with Dual Attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 10176–10185.
15. Yu, Y.; Xiong, Y.; Huang, W.; Scott, M.R. Deformable Siamese Attention Networks for Visual Object Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 6728–6737.
16. Voigtlaender, P.; Luiten, J.; Torr, P.H.S.; Leibe, B. Siam R-CNN: Visual Tracking by Re-Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 6578–6588.
17. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
18. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
19. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
20. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 9627–9636.
21. Zhou, X.Y.; Wang, D.Q.; Krähenbühl, P. Objects as Points. arXiv 2019, arXiv:1904.07850.
22. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
23. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
24. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
25. Real, E.; Shlens, J.; Mazzocchi, S.; Pan, X.; Vanhoucke, V. YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Dataset for Object Detection in Video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5296–5305.
26. Danelljan, M.; Robinson, A.; Khan, F.S.; Felsberg, M. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 472–488.
27. Wang, G.T.; Luo, C.; Xiong, Z.W.; Zeng, W.J. SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking. arXiv 2019, arXiv:1904.04452.
28. Bertinetto, L.; Valmadre, J.; Golodetz, S.; Miksik, O.; Torr, P.H. Staple: Complementary Learners for Real-Time Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1401–1409.
29. Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-Speed Tracking with Kernelized Correlation Filters. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 583–596.
30. Danelljan, M.; Häger, G.; Shahbaz Khan, F.; Felsberg, M. Learning Spatially Regularized Correlation Filters for Visual Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4310–4318.
31. Li, Y.; Zhu, J. A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 254–265.
32. Danelljan, M.; Häger, G.; Khan, F.S.; Felsberg, M. Discriminative Scale Space Tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1561–1575.
33. Zhang, J.; Ma, S.; Sclaroff, S. MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 188–203.
34. Danelljan, M.; Bhat, G.; Shahbaz Khan, F.; Felsberg, M. ECO: Efficient Convolution Operators for Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6638–6646.
35. Valmadre, J.; Bertinetto, L.; Henriques, J.; Vedaldi, A.; Torr, P.H. End-to-End Representation Learning for Correlation Filter Based Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2805–2813.
36. Nam, H.; Han, B. Learning Multi-Domain Convolutional Neural Networks for Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4293–4302.
37. Guo, Q.; Feng, W.; Zhou, C.; Huang, R.; Wan, L.; Wang, S. Learning Dynamic Siamese Network for Visual Object Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1763–1771.
38. Gao, J.; Hu, W.; Lu, Y. Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 7386–7395.
39. Guo, D.; Wang, J.; Cui, Y.; Wang, Z.; Chen, S. SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 6269–6277.
Trackers | AO | SR0.5 | SR0.75 | FPS |
---|---|---|---|---|
KCF [29] | 0.203 | 0.177 | 0.065 | 94.66 |
SRDCF [30] | 0.236 | 0.227 | 0.094 | 5.58 |
Staple [28] | 0.246 | 0.239 | 0.089 | 28.87 |
SAMF [31] | 0.246 | 0.241 | 0.084 | 7.43 |
DSST [32] | 0.247 | 0.223 | 0.081 | 18.25 |
MEEM [33] | 0.253 | 0.235 | 0.068 | 20.59 |
ECO-HC [34] | 0.286 | 0.276 | 0.096 | 44.55 |
CFnet [35] | 0.293 | 0.265 | 0.087 | 35.62 |
MDnet [36] | 0.299 | 0.303 | 0.099 | 1.52 |
ECO [34] | 0.316 | 0.309 | 0.111 | 2.62 |
CCOT [26] | 0.325 | 0.328 | 0.107 | 0.68 |
SiamFC [2] | 0.374 | 0.404 | 0.144 | 25.81 |
SiamRPN_R18 [3] | 0.483 | 0.581 | 0.270 | 97.55 |
SPM [27] | 0.513 | 0.593 | 0.359 | 72.30 |
SiamRPN++ [6] | 0.508 | 0.601 | 0.313 | 38.71 |
AFSN | 0.558 | 0.659 | 0.413 | 28.09 |
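The AO and SR columns above are overlap-based: AO is the average IoU between predicted and ground-truth boxes, and SR at a threshold is the fraction of frames whose IoU exceeds that threshold. The sketch below computes both for a single sequence with boxes given as (x, y, w, h). How the official GOT-10k toolkit aggregates results across sequences is not reproduced here, and the function names `iou` and `ao_and_sr` are illustrative.

```python
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def ao_and_sr(
    pred: Sequence[Box], gt: Sequence[Box], thresholds: Sequence[float] = (0.5, 0.75)
) -> Tuple[float, Dict[float, float]]:
    """Average overlap and success rates for one sequence."""
    if not gt or len(pred) != len(gt):
        raise ValueError("need one prediction per ground-truth box")
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    ao = sum(overlaps) / len(overlaps)
    # SR_t: fraction of frames whose overlap exceeds threshold t
    sr = {t: sum(o > t for o in overlaps) / len(overlaps) for t in thresholds}
    return ao, sr


if __name__ == "__main__":
    gt = [(0.0, 0.0, 10.0, 10.0)] * 3
    pred = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 6.0), (20.0, 20.0, 5.0, 5.0)]
    ao, sr = ao_and_sr(pred, gt)  # per-frame IoUs are 1.0, 0.6 and 0.0
    print(round(ao, 4), sr)
```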
UAV123 | KCF | MEEM | SRDCF | ECO | SiamRPN | DaSiamRPN | ATOM | RLS-RTMDNet | AFSN |
---|---|---|---|---|---|---|---|---|---|
OP0.50 | 0.368 | 0.440 | 0.551 | 0.631 | 0.711 | 0.725 | 0.751 | 0.633 | 0.758 |
OP0.75 | 0.144 | 0.150 | 0.263 | 0.324 | 0.398 | 0.405 | 0.476 | 0.320 | 0.554 |
AUC | 0.331 | 0.392 | 0.464 | 0.525 | 0.557 | 0.569 | 0.617 | 0.516 | 0.613 |
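In the UAV123 table, OP at a threshold is the fraction of frames whose IoU exceeds it, and AUC is the area under the success plot (success rate versus IoU threshold). The sketch below takes per-frame IoU values as input and approximates the AUC as the mean success rate over evenly spaced thresholds in [0, 1]. The 21-threshold grid follows the common OTB-style protocol and is an assumption here, as are the function names `overlap_precision` and `success_auc`.

```python
from typing import Sequence


def overlap_precision(overlaps: Sequence[float], threshold: float) -> float:
    """OP at `threshold`: fraction of frames whose IoU exceeds the threshold."""
    if not overlaps:
        raise ValueError("no frames")
    return sum(o > threshold for o in overlaps) / len(overlaps)


def success_auc(overlaps: Sequence[float], num_thresholds: int = 21) -> float:
    """Area under the success plot, approximated by the mean success rate
    over `num_thresholds` evenly spaced IoU thresholds in [0, 1]."""
    if num_thresholds < 2:
        raise ValueError("need at least two thresholds")
    thresholds = [k / (num_thresholds - 1) for k in range(num_thresholds)]
    return sum(overlap_precision(overlaps, t) for t in thresholds) / num_thresholds


if __name__ == "__main__":
    ious = [1.0, 0.6, 0.0]  # per-frame IoU values for a toy sequence
    print(overlap_precision(ious, 0.5), overlap_precision(ious, 0.75))
    print(round(success_auc(ious), 4))  # 0.5079
```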
Challenging Aspects | SiamFC | Staple | SiamRPN | AFSN |
---|---|---|---|---|
Scale variation | 0.556 | 0.522 | 0.628 | 0.641 |
Deformation | 0.510 | 0.552 | 0.628 | 0.599 |
Illumination variation | 0.574 | 0.596 | 0.663 | 0.655 |
Background clutter | 0.523 | 0.574 | 0.601 | 0.628 |
Motion blur | 0.550 | 0.546 | 0.627 | 0.636 |
Fast motion | 0.568 | 0.537 | 0.606 | 0.653 |
In-plane rotation | 0.557 | 0.552 | 0.636 | 0.611 |
Out-of-plane rotation | 0.558 | 0.534 | 0.631 | 0.617 |
Out of view | 0.506 | 0.481 | 0.550 | 0.617 |
Occlusion | 0.547 | 0.545 | 0.597 | 0.589 |
Low resolution | 0.592 | 0.418 | 0.597 | 0.608 |
ResNet-50 | Multi-Template Update | OTB100 | |||
---|---|---|---|---|---|
R4 | R4 | R4 | AUC | Precision | |
√ | 0.603 | 0.827 | |||
√ | 0.625 | 0.845 | |||
√ | 0.642 | 0.852 | |||
√ | √ | 0.644 | 0.857 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yuan, T.; Yang, W.; Li, Q.; Wang, Y. An Anchor-Free Siamese Network with Multi-Template Update for Object Tracking. Electronics 2021, 10, 1067. https://doi.org/10.3390/electronics10091067
Yuan T, Yang W, Li Q, Wang Y. An Anchor-Free Siamese Network with Multi-Template Update for Object Tracking. Electronics. 2021; 10(9):1067. https://doi.org/10.3390/electronics10091067
Chicago/Turabian Style: Yuan, Tongtong, Wenzhu Yang, Qian Li, and Yuxia Wang. 2021. "An Anchor-Free Siamese Network with Multi-Template Update for Object Tracking" Electronics 10, no. 9: 1067. https://doi.org/10.3390/electronics10091067
APA Style: Yuan, T., Yang, W., Li, Q., & Wang, Y. (2021). An Anchor-Free Siamese Network with Multi-Template Update for Object Tracking. Electronics, 10(9), 1067. https://doi.org/10.3390/electronics10091067