DLUT: Decoupled Learning-Based Unsupervised Tracker
Abstract
1. Introduction
- We propose a decoupled learning-based unsupervised tracker (DLUT). Decoupling the training pipelines of the different branches allows DLUT to make fuller use of its unsupervised learning capability.
- Considering the characteristics of each branch, and in line with the proposed decoupled framework, we devise three independent decoupled-correlation modules. Each module adapts to the goal of its branch, so the branches are trained independently and adaptively without disrupting one another's training pipelines.
- We devise an unsupervised training strategy based on suppression ranking, which suppresses background noise and ranks foreground and background samples so that the foreground object is highlighted.
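
To make the ranking idea in the last contribution concrete, the following is a minimal sketch of a generic pairwise margin ranking loss, assuming each sample carries a classification score and a (possibly noisy) foreground/background pseudo-label. It is not the SBL or CRL formulation of DLUT, which is not given in this excerpt; the function name `pairwise_ranking_loss` and the `margin` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np

def pairwise_ranking_loss(scores, is_foreground, margin=0.5):
    """Mean hinge penalty over all (foreground, background) score pairs.

    Illustrative only: a generic margin ranking loss, not the loss used by DLUT.
    """
    scores = np.asarray(scores, dtype=float)
    mask = np.asarray(is_foreground, dtype=bool)
    fg, bg = scores[mask], scores[~mask]
    if fg.size == 0 or bg.size == 0:
        return 0.0  # nothing to rank against
    # gap[i, j] = fg[i] - bg[j]; a pair is satisfied once the gap reaches the margin.
    gap = fg[:, None] - bg[None, :]
    return float(np.maximum(0.0, margin - gap).mean())

# Foreground scored above background by at least the margin: no penalty.
well_ranked = pairwise_ranking_loss([0.9, 0.8, 0.1, 0.2], [True, True, False, False])
# Background scored above foreground: every pair is penalized.
badly_ranked = pairwise_ranking_loss([0.2, 0.3, 0.9, 0.8], [True, True, False, False])
```

Minimizing such a loss pushes background scores below foreground scores, which is one common way to suppress background responses when the labels are noisy.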
2. Related Work
2.1. Supervised Tracking
2.2. Unsupervised Tracking
2.3. Decoupled Structures
3. Proposed Method
3.1. Preliminary
3.2. Unsupervised Tracking Network Based on Decoupled Framework
3.2.1. Network Architecture
3.2.2. Decoupled Structure
3.3. Cross-Correlation Module
3.4. Training the Network with Noisy Labels
3.4.1. Suppression-Ranking Strategy
3.4.2. Regression Loss
3.4.3. Centerness Loss
3.5. Multi-Stage Training
4. Experiments
4.1. Experimental Setup and Details
4.2. Comparison with State-of-the-Art Trackers
4.3. Ablation Experiments
4.4. Qualitative Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
DLUT | Decoupled Learning-based Unsupervised Tracker |
PFD | Pixel-FiLM-Depth-wise |
PPD | Pixel-Pair-Depth-wise |
PLD | Pixel-Psa-Depth-wise |
SRS | Suppression-Ranking Strategy |
SBL | Sample Balance Loss |
CRL | Classification-Ranking Loss |
DIoU | Distance-Intersection over Union |
GPU | Graphics Processing Unit |
SGD | Stochastic Gradient Descent |
AUC | Area Under the Curve |
Suc | Success |
Pre | Precision |
NPre | Normalized Precision |
GT | Ground Truth |
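
For reference, the DIoU abbreviation above refers to the loss introduced in the Distance-IoU paper cited in the reference list. Its definition in that work is reproduced below; how DLUT weights or combines it within its regression loss is not shown in this excerpt.

```latex
\mathcal{L}_{\mathrm{DIoU}} = 1 - \mathrm{IoU} + \frac{\rho^{2}\left(\mathbf{b}, \mathbf{b}^{gt}\right)}{c^{2}}
```

Here $\mathbf{b}$ and $\mathbf{b}^{gt}$ are the center points of the predicted and target boxes, $\rho(\cdot,\cdot)$ is the Euclidean distance, and $c$ is the diagonal length of the smallest box enclosing both boxes.
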
References
- Chen, F.; Wang, X.; Zhao, Y.; Lv, S.; Niu, X. Visual object tracking: A survey. Comput. Vis. Image Underst. 2022, 222, 103508. [Google Scholar] [CrossRef]
- Wang, N.; Song, Y.; Ma, C.; Zhou, W.; Liu, W.; Li, H. Unsupervised Deep Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 1308–1317. [Google Scholar]
- Zheng, J.; Ma, C.; Peng, H.; Yang, X. Learning To Track Objects From Unlabeled Videos. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 13546–13555. [Google Scholar]
- Shen, Q.; Qiao, L.; Guo, J.; Li, P.; Li, X.; Li, B.; Feng, W.; Gan, W.; Wu, W.; Ouyang, W. Unsupervised Learning of Accurate Siamese Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 8101–8110. [Google Scholar]
- Jiang, B.; Luo, R.; Mao, J.; Xiao, T.; Jiang, Y. Acquisition of Localization Confidence for Accurate Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 784–799. [Google Scholar]
- Wu, Y.; Chen, Y.; Yuan, L.; Liu, Z.; Wang, L.; Li, H.; Fu, Y. Rethinking classification and localization for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10186–10195. [Google Scholar]
- Song, G.; Liu, Y.; Wang, X. Revisiting the sibling head in object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 11563–11572. [Google Scholar]
- Zhang, Z.; Peng, H.; Fu, J.; Li, B.; Hu, W. Ocean: Object-aware anchor-free tracking. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 771–787. [Google Scholar]
- Zhang, Z.; Liu, Y.; Wang, X.; Li, B.; Hu, W. Learn to match: Automatic matching network design for visual tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 13339–13348. [Google Scholar]
- Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H. Fully-convolutional siamese networks for object tracking. In Proceedings of the Computer Vision–ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–10, 15–16 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 850–865. [Google Scholar]
- Li, B.; Yan, J.; Wu, W.; Zhu, Z.; Hu, X. High performance visual tracking with siamese region proposal network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 8971–8980. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Santiago, Chile, 13–16 December 2015; pp. 1440–1448. [Google Scholar]
- Guo, Q.; Feng, W.; Zhou, C.; Huang, R.; Wan, L.; Wang, S. Learning dynamic siamese network for visual object tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1763–1771. [Google Scholar]
- Dong, X.; Shen, J.; Shao, L.; Porikli, F. CLNet: A compact latent network for fast adjusting Siamese trackers. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 378–395. [Google Scholar]
- Yang, T.; Chan, A.B. Learning Dynamic Memory Networks for Object Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 152–167. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
- Xie, F.; Wang, C.; Wang, G.; Yang, W.; Zeng, W. Learning Tracking Representations via Dual-Branch Fully Transformer Networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Montreal, QC, Canada, 11–17 October 2021; pp. 2688–2697. [Google Scholar]
- Lin, L.; Fan, H.; Zhang, Z.; Xu, Y.; Ling, H. Swintrack: A simple and strong baseline for transformer tracking. Adv. Neural Inf. Process. Syst. 2022, 35, 16743–16754. [Google Scholar]
- Fu, Z.; Fu, Z.; Liu, Q.; Cai, W.; Wang, Y. SparseTT: Visual Tracking with Sparse Transformers. arXiv 2022, arXiv:2205.03776. [Google Scholar]
- Cui, Y.; Jiang, C.; Wang, L.; Wu, G. MixFormer: End-to-End Tracking With Iterative Mixed Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 13608–13618. [Google Scholar]
- Chen, B.; Li, P.; Bai, L.; Qiao, L.; Shen, Q.; Li, B.; Gan, W.; Wu, W.; Ouyang, W. Backbone is all your need: A simplified architecture for visual object tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 375–392. [Google Scholar]
- Ye, B.; Chang, H.; Ma, B.; Shan, S.; Chen, X. Joint feature learning and relation modeling for tracking: A one-stream framework. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 341–357. [Google Scholar]
- Lan, J.P.; Cheng, Z.Q.; He, J.Y.; Li, C.; Luo, B.; Bao, X.; Xiang, W.; Geng, Y.; Xie, X. Procontext: Exploring Progressive Context Transformer for Tracking. In Proceedings of the ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–9 June 2023; pp. 1–5. [Google Scholar]
- Gao, S.; Zhou, C.; Zhang, J. Generalized Relation Modeling for Transformer Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 18686–18695. [Google Scholar]
- Xie, F.; Chu, L.; Li, J.; Lu, Y.; Ma, C. VideoTrack: Learning To Track Objects via Video Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 22826–22835. [Google Scholar]
- Wu, Q.; Yang, T.; Liu, Z.; Wu, B.; Shan, Y.; Chan, A.B. DropMAE: Masked Autoencoders With Spatial-Attention Dropout for Tracking Tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 14561–14571. [Google Scholar]
- Zhao, H.; Wang, D.; Lu, H. Representation Learning for Visual Object Tracking by Masked Appearance Transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 18696–18705. [Google Scholar]
- Wei, X.; Bai, Y.; Zheng, Y.; Shi, D.; Gong, Y. Autoregressive Visual Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 9697–9706. [Google Scholar]
- Chen, X.; Peng, H.; Wang, D.; Lu, H.; Hu, H. SeqTrack: Sequence to Sequence Learning for Visual Object Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 14572–14581. [Google Scholar]
- Wang, Q.; Gao, J.; Xing, J.; Zhang, M.; Hu, W. DCFNet: Discriminant Correlation Filters Network for Visual Tracking. arXiv 2017, arXiv:1704.04057. [Google Scholar]
- Yuan, W.; Wang, M.Y.; Chen, Q. Self-supervised Object Tracking with Cycle-consistent Siamese Networks. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 10351–10358. [Google Scholar]
- Li, X.; Liu, S.; De Mello, S.; Wang, X.; Kautz, J.; Yang, M.H. Joint-task Self-supervised Learning for Temporal Correspondence. In Advances in Neural Information Processing Systems; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32. [Google Scholar]
- Sio, C.H.; Ma, Y.J.; Shuai, H.H.; Chen, J.C.; Cheng, W.H. S2SiamFC: Self-Supervised Fully Convolutional Siamese Network for Visual Tracking. In Proceedings of the 28th ACM International Conference on Multimedia (MM ’20), Seattle, WA, USA, 12–16 October 2020; pp. 1948–1957. [Google Scholar]
- Wu, Q.; Wan, J.; Chan, A.B. Progressive Unsupervised Learning for Visual Object Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2993–3002. [Google Scholar]
- Zhou, Z.; You, S.; Kuo, C.C.J. Unsupervised Green Object Tracker (GOT) without Offline Pre-training. arXiv 2023, arXiv:2309.09078. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-Based Convolutional Networks for Accurate Object Detection and Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 142–158. [Google Scholar] [CrossRef] [PubMed]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems; Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28. [Google Scholar]
- Lin, T.Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4282–4291. [Google Scholar]
- Yan, B.; Zhang, X.; Wang, D.; Lu, H.; Yang, X. Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 5289–5298. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Perez, E.; Strub, F.; De Vries, H.; Dumoulin, V.; Courville, A. FiLM: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Tang, F.; Ling, Q. Ranking-Based Siamese Visual Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 8741–8750. [Google Scholar]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000. [Google Scholar]
- Guo, D.; Wang, J.; Cui, Y.; Wang, Z.; Chen, S. SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 6269–6277. [Google Scholar]
- Wu, Y.; Lim, J.; Yang, M.-H. Object Tracking Benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848. [Google Scholar] [CrossRef]
- Muller, M.; Bibi, A.; Giancola, S.; Alsubaihi, S.; Ghanem, B. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 300–317. [Google Scholar]
- Fan, H.; Lin, L.; Yang, F.; Chu, P.; Deng, G.; Yu, S.; Bai, H.; Xu, Y.; Liao, C.; Ling, H. LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5374–5383. [Google Scholar]
- Wang, X.; Shu, X.; Zhang, Z.; Jiang, B.; Wang, Y.; Tian, Y.; Wu, F. Towards More Flexible and Accurate Object Tracking With Natural Language: Algorithms and Benchmark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13763–13773. [Google Scholar]
- Kiani Galoogahi, H.; Fagg, A.; Huang, C.; Ramanan, D.; Lucey, S. Need for Speed: A Benchmark for Higher Frame Rate Object Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1125–1134. [Google Scholar]
- LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Comput. 1989, 1, 541–551. [Google Scholar] [CrossRef]
- Zhang, Z.; Peng, H. Deeper and Wider Siamese Networks for Real-Time Visual Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4591–4600. [Google Scholar]
- Jung, I.; Son, J.; Baek, M.; Han, B. Real-Time MDNet. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 83–98. [Google Scholar]
- Danelljan, M.; Hager, G.; Shahbaz Khan, F.; Felsberg, M. Learning Spatially Regularized Correlation Filters for Visual Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4310–4318. [Google Scholar]
- Valmadre, J.; Bertinetto, L.; Henriques, J.; Vedaldi, A.; Torr, P.H.S. End-To-End Representation Learning for Correlation Filter Based Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2805–2813. [Google Scholar]
- Bertinetto, L.; Valmadre, J.; Golodetz, S.; Miksik, O.; Torr, P.H.S. Staple: Complementary Learners for Real-Time Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1401–1409. [Google Scholar]
- Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-Speed Tracking with Kernelized Correlation Filters. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 583–596. [Google Scholar] [CrossRef]
- Danelljan, M.; Häger, G.; Khan, F.; Felsberg, M. Accurate scale estimation for robust visual tracking. In Proceedings of the British Machine Vision Conference, Nottingham, UK, 1–5 September 2014; BMVA Press: London, UK, 2014. [Google Scholar]
- Wang, N.; Zhou, W.; Song, Y.; Ma, C.; Liu, W.; Li, H. Unsupervised deep representation learning for real-time tracking. Int. J. Comput. Vis. 2021, 129, 400–418. [Google Scholar] [CrossRef]
- Ma, C.; Yang, X.; Zhang, C.; Yang, M.H. Long-Term Correlation Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5388–5396. [Google Scholar]
- Ma, C.; Huang, J.B.; Yang, X.; Yang, M.H. Hierarchical Convolutional Features for Visual Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 3074–3082. [Google Scholar]
- Li, Y.; Zhu, J. A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration. In Proceedings of the Computer Vision–ECCV 2014 Workshops, Zurich, Switzerland, 6–7 and 12 September 2014; pp. 254–265. [Google Scholar]
- Kiani Galoogahi, H.; Fagg, A.; Lucey, S. Learning Background-Aware Correlation Filters for Visual Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1135–1143. [Google Scholar]
- Wang, L.; Ouyang, W.; Wang, X.; Lu, H. Visual Tracking With Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 3119–3127. [Google Scholar]
- Qi, Y.; Zhang, S.; Qin, L.; Yao, H.; Huang, Q.; Lim, J.; Yang, M.H. Hedged Deep Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4303–4311. [Google Scholar]
- Wang, Q.; Zhang, L.; Bertinetto, L.; Hu, W.; Torr, P.H. Fast Online Object Tracking and Segmentation: A Unifying Approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 1328–1338. [Google Scholar]
Tracker | Unsupervised | OTB2015 Suc. | OTB2015 Pre. | TrackingNet Suc. | TrackingNet Pre. | TrackingNet NPre. |
---|---|---|---|---|---|---|
SiamRPN++ [39] | No | 0.695 | 0.906 | 0.694 | 0.800 | 0.733 |
SiamDW [54] | No | 0.670 | 0.892 | - | - | - |
SiamRPN [11] | No | 0.637 | 0.851 | - | - | - |
MDNet [55] | No | 0.660 | 0.885 | 0.565 | 0.705 | 0.606 |
DCFNet [30] | No | 0.580 | 0.769 | 0.533 | 0.654 | 0.578 |
SiamFC [10] | No | 0.586 | 0.772 | 0.533 | 0.663 | 0.571 |
SRDCF [56] | No | 0.598 | 0.789 | 0.455 | 0.573 | 0.521 |
CFNet [57] | No | 0.568 | 0.778 | 0.533 | 0.654 | 0.578 |
Staple [58] | No | 0.578 | 0.783 | 0.470 | 0.603 | 0.528 |
KCF [59] | Yes | 0.485 | 0.696 | 0.419 | 0.546 | 0.447 |
DSST [60] | Yes | 0.518 | 0.689 | 0.460 | 0.588 | 0.464 |
LUDT [61] | Yes | 0.602 | 0.769 | 0.469 | 0.593 | 0.543 |
LUDT+ [61] | Yes | 0.639 | 0.843 | 0.495 | 0.633 | 0.563 |
PUL [34] | Yes | 0.584 | - | 0.485 | 0.630 | 0.546 |
USOT [3] | Yes | 0.589 | 0.806 | 0.551 | 0.682 | 0.599 |
USOT * [3] | Yes | 0.574 | 0.775 | 0.566 | 0.691 | 0.615 |
ULAST [4] | Yes | 0.610 | 0.811 | - | - | - |
GOT [35] | Yes | 0.654 | 0.876 | 0.526 | - | 0.563 |
ours | Yes | 0.632 | 0.872 | 0.567 | 0.700 | 0.618 |
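
The Suc. and Pre. columns above follow the usual one-pass evaluation style of the OTB benchmark. As a rough illustration of how such scores are typically computed from per-frame results, the sketch below assumes the common conventions of 21 evenly spaced overlap thresholds for the success AUC and a 20-pixel center-error threshold for precision; the exact settings used for the reported numbers are not stated in this excerpt, and normalized precision is omitted. The function names `success_auc` and `precision_at` are hypothetical.

```python
import numpy as np

def success_auc(ious, num_thresholds=21):
    """Area under the success plot: mean success rate over overlap thresholds in [0, 1]."""
    ious = np.asarray(ious, dtype=float)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    # Success rate at a threshold = fraction of frames whose IoU exceeds it.
    rates = [(ious > t).mean() for t in thresholds]
    return float(np.mean(rates))

def precision_at(center_errors, pixel_threshold=20.0):
    """Fraction of frames whose center location error is within the pixel threshold."""
    errors = np.asarray(center_errors, dtype=float)
    return float((errors <= pixel_threshold).mean())

# Toy per-frame results for a three-frame sequence (made-up values, not from the paper).
suc = success_auc([1.0, 0.5, 0.0])
pre = precision_at([5.0, 20.0, 35.0])
```

Benchmark scores are normally averaged over all frames of all sequences, so a real evaluation would apply these functions to the pooled or per-sequence results produced by the benchmark toolkit.
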
Tracker | KCF [59] | LCT [62] | HCF [63] | DSST [60] | SAMF [64] | Staple [58] | BACF [65] | SRDCF [56] | FCNT [66] | HDT [67] | SiamFC [10] | MDNet [55] | Ours |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30 FPS no MB | 0.209 | 0.232 | 0.275 | 0.277 | 0.281 | 0.321 | 0.332 | 0.339 | 0.378 | 0.386 | 0.396 | 0.418 | 0.426 |
30 FPS MB | 0.203 | 0.222 | 0.270 | 0.261 | 0.270 | 0.310 | 0.320 | 0.331 | 0.365 | 0.368 | 0.367 | 0.395 | 0.426 |
240 FPS | 0.308 | 0.318 | 0.357 | 0.416 | 0.411 | 0.421 | 0.467 | 0.448 | 0.447 | 0.453 | 0.454 | 0.450 | 0.486 |
Experiment ID | PFD | PPD | PLD | Suc./Pre. |
---|---|---|---|---|
1 | ✓ | | | 62.7/86.4 |
2 | | ✓ | | 60.3/85.1 |
3 | | | ✓ | 58.6/80.6 |
4 | ✓ | ✓ | 63.0/86.6 | |
5 | ✓ | ✓ | 62.9/86.0 | |
6 | ✓ | ✓ | 60.9/83.1 | |
7 | ✓ | ✓ | ✓ | 63.2/87.2 |
8 | | | | 58.8/79.7 |
Configuration | Overall | IV | SV | OCC | DEF | MB | FM | IPR | OPR | OV | BC | LR |
---|---|---|---|---|---|---|---|---|---|---|---|---|
PFD | 62.7/86.4 | 63.6/87.9 | 61.0/84.4 | 57.8/78.9 | 61.4/87.0 | 60.2/81.6 | 63.2/86.7 | 63.2/88.7 | 61.2/85.9 | 51.5/73.2 | 57.8/80.9 | 67.0/97.6 |
PPD | 60.3/85.1 | 60.5/87.1 | 59.0/82.8 | 55.5/77.5 | 59.0/85.7 | 57.9/81.3 | 59.8/84.1 | 61.8/88.6 | 58.2/84.0 | 43.7/64.8 | 56.4/83.6 | 63.1/91.8 |
PLD | 58.6/80.6 | 60.4/82.9 | 58.2/82.0 | 54.6/73.6 | 54.6/77.2 | 55.8/76.3 | 56.8/78.2 | 60.8/85.2 | 58.7/82.2 | 52.9/74.2 | 52.7/72.7 | 63.4/94.5 |
ALL | 63.2/87.2 | 63.5/88.1 | 62.0/85.5 | 58.0/79.0 | 61.6/86.9 | 61.4/83.5 | 63.5/87.4 | 63.3/89.3 | 62.3/87.5 | 50.3/71.8 | 58.5/82.6 | 66.5/96.9 |
Base | 58.8/79.7 | 58.7/78.7 | 59.5/80.3 | 54.0/72.9 | 54.3/75.0 | 58.3/77.2 | 62.7/83.2 | 60.6/85.2 | 59.2/81.4 | 50.7/70.0 | 53.6/73.3 | 55.5/81.7 |
Experiment ID | SBL | CRL | PSO | Suc. | Pre. |
---|---|---|---|---|---|
1 | ✓ | 61.3 | 83.6 | ||
2 | ✓ | 62.3 | 85.2 | ||
3 | ✓ | ✓ | 60.7 | 82.3 | |
4 | ✓ | ✓ | 63.2 | 87.2 | |
5 | ✓ | ✓ | 61.7 | 84.8 | |
6 | ✓ | ✓ | ✓ | 61.5 | 83.9 |
7 | | | | 60.1 | 83.2 |
Decoupled-Correlation Modules | SRS | Suc. | Pre. |
---|---|---|---|
✓ | | 60.1 | 83.2 |
| | ✓ | 58.8 | 79.7 |
✓ | ✓ | 63.2 | 87.2 |
| | | 57.4 | 77.5 |
Offline (St.1) | Offline+Online (St.2) | Suc. | Pre. |
---|---|---|---|
✓ | 62.1 | 86.3 | |
✓ | 63.2 | 87.2 | |
✓ | ✓ | 62.5 | 86.4 |
| | | 62.4 | 86.6 |
Tracker | SiamFC | MDNet | HDT | SRDCF | FCNT | GradNet | MemTracking | SiamRPN | SiamMask | SiamRPN++ | SiamDW | UDT | USOT | Ours |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Unsupervised | No | No | No | No | No | No | No | No | No | No | No | Yes | Yes | Yes |
Skating2-1 | 0.335 | - | - | - | - | - | - | - | - | 0.454 | 0.365 | 0.327 | 0.560 | 0.595 |
horse_running | 0.495 | 0.508 | 0.492 | 0.031 | 0.492 | - | - | - | - | - | - | - | - | 0.679 |
person_scooter | 0.279 | 0.266 | 0.276 | 0.400 | 0.277 | - | - | - | - | - | - | - | - | 0.697 |
INF_blackwidow_1-Done | 0.545 | - | - | - | - | 0.515 | 0.551 | 0.717 | - | - | - | 0.662 | - | 0.732 |
spider-14 | 0.063 | - | - | - | - | - | - | - | 0.401 | 0.605 | 0.463 | - | 0.641 | 0.631 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xu, Z.; Huang, D.; Huang, X.; Song, J.; Liu, H. DLUT: Decoupled Learning-Based Unsupervised Tracker. Sensors 2024, 24, 83. https://doi.org/10.3390/s24010083