SiamMFC: Visual Object Tracking Based on Mainfold Full Convolution Siamese Network
Abstract
1. Introduction
- (1) Based on the assumption that image data lie on a manifold, an end-to-end Siamese classification and regression framework for visual target tracking is proposed. The framework integrates manifold sample branches for the object and retains the advantages of both the semantic information and the geometric attributes of the object.
- (2) The proposed tracker is both anchor-free and proposal-free, which greatly reduces the number of hyper-parameters that must be designed and the influence of human factors, and makes computation simpler and faster, especially in training.
- (3) Compared with state-of-the-art trackers, the proposed SiamMFC algorithm achieves competitive results on three public benchmark datasets (OTB, UAV123, and GOT-10K).
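To make the anchor-free idea in contribution (2) concrete, the following minimal Python sketch shows one common way to regress a box without anchors. It assumes an FCOS/UnitBox-style formulation (both works appear in the reference list), in which every location inside the target predicts its distances to the four box sides. This is an illustrative assumption, not the authors' exact implementation, and all function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]    # (x1, y1, x2, y2), hypothetical layout
Point = Tuple[float, float]                # (x, y) location on the search image


def ltrb_targets(point: Point, box: Box) -> Tuple[float, float, float, float]:
    """Distances from a location to the left, top, right and bottom box sides."""
    x, y = point
    x1, y1, x2, y2 = box
    return (x - x1, y - y1, x2 - x, y2 - y)


def is_positive(point: Point, box: Box) -> bool:
    """A location is a positive sample only if it lies strictly inside the box."""
    return min(ltrb_targets(point, box)) > 0


def decode_box(point: Point, ltrb: Tuple[float, float, float, float]) -> Box:
    """Invert ltrb_targets: recover the box from a location and its four distances."""
    x, y = point
    l, t, r, b = ltrb
    return (x - l, y - t, x + r, y + b)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, as used by IoU-based regression losses."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    gt = (40, 30, 120, 110)          # ground-truth box (made-up numbers)
    loc = (70, 50)                   # one location inside the box
    target = ltrb_targets(loc, gt)
    print(target, decode_box(loc, target), iou(gt, decode_box(loc, target)))
```

Because the regression target depends only on the location and the ground-truth box, no anchor scales or aspect ratios need to be chosen, which is the source of the reduced hyper-parameter count described above.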
2. Manifold Fully Convolutional Siamese Networks
2.1. Siamese Network Based on Anchor-Free
2.2. Manifold Learning Background
2.2.1. Grassmann Manifold
2.2.2. Geodesics
2.3. Manifold Siamese Network Based on Anchor-Free
2.3.1. Overall Network Structure
2.3.2. Manifold Template Branch
2.3.3. Siamese Sub-Network and Classification Regression Sub-Network Branch
3. Experiments
3.1. Implementation Details
3.2. Results on OTB
3.3. Results on UAV123
3.4. Results on GOT-10K
3.5. Ablation Studies
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Li, H.; Xiezhang, T.; Yang, C.; Deng, L.; Yi, P. Secure video surveillance framework in smart city. Sensors 2021, 21, 4419.
- Li, R.; Ouyang, Q.; Cui, Y.; Jin, Y. Preview control with dynamic constraints for autonomous vehicles. Sensors 2021, 21, 5155.
- Gao, M.; Jin, L.; Jiang, Y.; Bie, J. Multiple object tracking using a dual-attention network for autonomous driving. IET Intell. Transp. Syst. 2020, 14, 842–848.
- Chen, J.; Ai, Y.; Qian, Y.; Zhang, W. A novel Siamese Attention Network for visual object tracking of autonomous vehicles. Proc. Inst. Mech. Eng. D J. Automob. Eng. 2021, 235, 2764–2775.
- Tao, R.; Gavves, E.; Smeulders, A.W.M. Siamese Instance Search for Tracking. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1420–1429.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
- Li, B.; Yan, J.; Wu, W.; Zhu, Z.; Hu, X. High Performance Visual Tracking with Siamese Region Proposal Network. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8971–8980.
- Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4277–4286.
- He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Guo, Y.; Chen, Y.; Tang, F.; Li, A.; Luo, W.; Liu, M. Object tracking using learned feature manifolds. Comput. Vis. Image Underst. 2014, 118, 128–139.
- Edelman, A.; Arias, T.A.; Smith, S.T. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 1999, 20, 303–353.
- Declercq, A.; Piater, J.H. Online learning of Gaussian mixture models: A two-level approach. In Proceedings of the 3rd International Conference on Computer Vision Theory Applications, Funchal, Portugal, 22–25 January 2008; pp. 605–611.
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. UnitBox: An Advanced Object Detection Network. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 516–520.
- Tian, Z.; Shen, C.H.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 9626–9635.
- Lin, T.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; Springer International Publishing AG: Cham, Switzerland, 2014; pp. 740–755.
- Real, E.; Shlens, J.; Mazzocchi, S.; Pan, X.; Vanhoucke, V. YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7464–7473.
- Wu, Y.; Lim, J.; Yang, M.-H. Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848.
- Danelljan, M.; Bhat, G.; Shahbaz Khan, F.; Felsberg, M. ECO: Efficient Convolution Operators for Tracking. In Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6931–6939.
- Hong, Z.; Chen, Z.; Wang, C.; Mei, X.; Prokhorov, D.; Tao, D. MUlti-Store Tracker (MUSTer): A Cognitive Psychology Inspired Approach to Object Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 749–758.
- Bertinetto, L.; Valmadre, J.; Golodetz, S.; Miksik, O.; Torr, P.H. Staple: Complementary Learners for Real-Time Tracking. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1401–1409.
- Hare, S.; Golodetz, S.; Saffari, A.; Vineet, V.; Cheng, M.M.; Hicks, S.L.; Torr, P.H. Struck: Structured Output Tracking with Kernels. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 2096–2109.
- Danelljan, M.; Hager, G.; Khan, F.S.; Felsberg, M. Convolutional features for correlation filter based visual tracking. In Proceedings of the IEEE International Conference on Computer Vision Workshop (ICCV), Santiago, Chile, 7–13 December 2015; pp. 621–629.
- Zha, Y.; Wu, M.; Qiu, Z.; Dong, S.; Yang, F.; Zhang, P. Distractor-aware visual tracking by online Siamese network. IEEE Access 2019, 7, 89777–89788.
- Huang, L.; Zhao, X.; Huang, K. Got-10k: A large high-diversity benchmark for generic object tracking in the wild. arXiv 2018, arXiv:1810.11981.
- Danelljan, M.; Robinson, A.; Khan, F.S.; Felsberg, M. Beyond correlation filters: Learning continuous convolution operators for visual tracking. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; Springer International Publishing AG: Cham, Switzerland, 2016; pp. 472–488.
- Nam, H.; Han, B. Learning multi-domain convolutional neural networks for visual tracking. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4293–4302.
- Valmadre, J.; Bertinetto, L.; Henriques, J.; Vedaldi, A.; Torr, P.H.S. End-to-end representation learning for correlation filter based tracking. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5000–5008.
- Wang, G.; Luo, C.; Xiong, Z.; Zeng, W. SPM-tracker: Series-parallel matching for real-time visual object tracking. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–21 June 2019; pp. 3638–3647.
- Galoogahi, H.K.; Fagg, A.; Lucey, S. Learning background-aware correlation filters for visual tracking. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1144–1152.
- Zhang, J.M.; Ma, S.G.; Sclaroff, S. MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization. In Proceedings of the 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014.

Metric | ECO-HC | SiamRPN++ | SiamRPN | ECO | DaSiamRPN | SiamMFC
---|---|---|---|---|---|---
AUC | 0.506 | 0.613 | 0.527 | 0.525 | 0.586 | 0.620
P | 0.725 | 0.807 | 0.748 | 0.741 | 0.796 | 0.811

Tracker | AO | SR0.5 | SR0.75 | Hardware | Language | FPS
---|---|---|---|---|---|---
SiamFC | 0.374 | 0.404 | 0.144 | Titan X | Matlab | 25.81
CCOT | 0.325 | 0.328 | 0.107 | CPU | Matlab | 0.68
CFNet [27] | 0.293 | 0.265 | 0.087 | Titan X | Matlab | 35.62
SPM [28] | 0.513 | 0.593 | 0.359 | Titan XP | Python | 72.3
BACF [29] | 0.260 | 0.262 | 0.101 | CPU | Matlab | 14.44
MEEM [30] | 0.253 | 0.235 | 0.068 | CPU | Matlab | 20.59
ECO | 0.316 | 0.309 | 0.111 | CPU | Matlab | 2.62
MDNet | 0.299 | 0.303 | 0.099 | Titan X | Python | 1.52
SiamRPN++ | 0.517 | 0.616 | 0.325 | RTX 1080ti | Python | 49.83
SiamMFC | 0.554 | 0.668 | 0.413 | RTX 1080ti | Python | 51.13

Component | AUC on UAV-123 | P on UAV-123
---|---|---
SiamRPN++ | 0.613 | 0.807
SiamRPN++ (anchor-free) | 0.617 | 0.809
SiamRPN++ and manifold | 0.615 | 0.808
Ours | 0.620 | 0.811

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chen, J.; Wang, F.; Zhang, Y.; Ai, Y.; Zhang, W. SiamMFC: Visual Object Tracking Based on Mainfold Full Convolution Siamese Network. Sensors 2021, 21, 6388. https://doi.org/10.3390/s21196388