Parameter-Efficient Tuning for Object Tracking by Migrating Pre-Trained Decoders
Abstract
1. Introduction
- We leverage adapter modules to make transfer learning in tracking parameter-efficient. Attached to a frozen pre-trained model, the adapters train only a small number of additional parameters yet reach tracking performance comparable to full fine-tuning, dramatically increasing parameter efficiency (a minimal sketch follows this list).
- We migrate transformer decoders pre-trained by MAE into the tracking head, improving the robustness and generalization of the tracker.
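As a concrete illustration of the adapter idea, the following PyTorch snippet sketches a bottleneck adapter attached to a frozen layer. It is a minimal sketch under assumptions, not the paper's implementation: the bottleneck width, scaling factor, and placement are illustrative choices.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual add.
    Zero-initialising the up-projection makes the adapted model start out
    identical to the frozen pre-trained one."""
    def __init__(self, dim: int, bottleneck: int = 64, scale: float = 0.1):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        self.scale = scale
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.scale * self.up(self.act(self.down(x)))

# Toy usage: freeze a stand-in for a pre-trained ViT sub-layer, train only the adapter.
frozen = nn.Linear(768, 768)
for p in frozen.parameters():
    p.requires_grad_(False)
adapted = nn.Sequential(frozen, Adapter(768))

x = torch.randn(2, 196, 768)  # (batch, tokens, embed dim)
print(adapted(x).shape)       # torch.Size([2, 196, 768])
trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
print(trainable)              # 99,136 adapter params vs. 590,592 frozen ones
```

Only the adapter receives gradients, so both the optimizer state and the per-task checkpoint are a small fraction of the full model.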
2. Related Work
3. Proposed Approach
3.1. Preliminary
3.2. Adapter Tuning
3.3. Migrating Pre-Trained Decoders
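The MAE reference implementation builds its decoder from timm's ViT `Block`, so a tracking head assembled from the same block class can inherit those weights directly. The sketch below is a hedged illustration of this migration step, not the authors' code: the checkpoint filename and head depth are assumptions, and it requires an MAE checkpoint saved *with* its decoder (the commonly distributed encoder-only checkpoints strip it).

```python
import torch
import torch.nn as nn
from timm.models.vision_transformer import Block  # same block class as the MAE decoder

DIM, HEADS, DEPTH = 512, 16, 2  # MAE decoder width/heads; migrated depth is a design choice

class MigratedDecoderHead(nn.Module):
    """Tracking-head trunk initialised from MAE pre-trained decoder blocks
    rather than from scratch."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList(Block(DIM, HEADS, qkv_bias=True) for _ in range(DEPTH))
        self.norm = nn.LayerNorm(DIM)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        for blk in self.blocks:
            tokens = blk(tokens)
        return self.norm(tokens)  # box/score prediction layers would follow

head = MigratedDecoderHead()
# Hypothetical path; must point to an MAE checkpoint that retains the decoder.
ckpt = torch.load("mae_vit_base_with_decoder.pth", map_location="cpu")
decoder_sd = {"blocks." + k[len("decoder_blocks."):]: v
              for k, v in ckpt["model"].items()
              if k.startswith("decoder_blocks.")}
# strict=False: only the first DEPTH blocks find matches; the rest are reported.
missing, unexpected = head.load_state_dict(decoder_sd, strict=False)
print(f"{len(decoder_sd) - len(unexpected)} tensors migrated, "
      f"{len(missing)} head params left at random init")
```

Section 4.5 ablates exactly this choice: the same head initialised from pre-trained decoder weights versus random initialisation.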
4. Experiments
4.1. Implementation Details
4.2. Mainstream Benchmarks
Tracker | LaSOT AUC | LaSOT P_norm | LaSOT P | LaSOT_ext AUC | LaSOT_ext P_norm | LaSOT_ext P | TrackingNet AUC | TrackingNet P_norm | TrackingNet P
---|---|---|---|---|---|---|---|---|---
AdaDe-L256 | 71.2 (+3.1) | 80.9 | 78.6 | 50.3 (+1.3) | 60.6 | 57.3 | 84.3 | 88.8 | 83.8
Baseline-L256 | 68.1 | 76.5 | 73.8 | 49.0 | 58.6 | 56.1 | 83.8 | 88.2 | 83.3
AdaDe-B256 | 68.7 (−0.2) | 78.5 | 74.6 | 49.5 (+0.8) | 59.9 | 56.0 | 82.0 | 86.6 | 80.1
Baseline-B256 | 68.9 | 78.1 | 74.5 | 48.7 | 58.7 | 55.3 | 82.8 | 87.2 | 81.2
LoRAT-B-224 [47] | 72.4 | 81.6 | 77.9 | 48.5 | 61.7 | 55.3 | 83.7 | 88.2 | 82.2
GRM-L320 [34] | 71.4 | 81.2 | 77.9 | - | - | - | 84.4 | 88.9 | 84.0
GRM-B256 [34] | 69.9 | 79.3 | 75.8 | - | - | - | 84.0 | 88.7 | 83.3
OSTrack-B384 [10] | 71.1 | 81.1 | 77.6 | 50.5 | 61.3 | 57.6 | 83.9 | 88.5 | 83.2
OSTrack-B256 [10] | 69.1 | 78.7 | 75.2 | 47.4 | 57.3 | 53.3 | 83.1 | 87.8 | 82.0
SimTrack-L [9] | 70.5 | 79.7 | - | - | - | - | 83.4 | 87.4 | -
SimTrack-B [9] | 69.3 | 78.5 | - | - | - | - | 82.3 | 86.5 | -
SwinTrack-B [22] | 69.6 | 78.6 | 74.1 | 47.6 | 58.2 | 54.1 | 82.5 | 87.0 | 80.4
MixFormer-L [48] | 70.1 | 79.9 | 76.3 | - | - | - | 83.9 | 88.9 | -
MixFormer-22k [48] | 69.2 | 78.7 | 74.7 | - | - | - | 83.1 | 88.1 | 81.6
AiATrack [49] | 69.0 | 79.4 | 73.8 | 47.7 | 55.6 | 55.4 | 82.7 | 87.8 | 80.4
ToMP-101 [50] | 68.5 | - | - | 45.9 | - | - | 81.5 | 86.4 | 78.9
GTELT [51] | 67.7 | - | 73.2 | 45.0 | 54.2 | 52.4 | 82.5 | 86.7 | 81.6
KeepTrack [37] | 67.1 | 77.2 | 70.2 | 48.2 | 58.1 | 56.4 | - | - | -
STARK-101 [19] | 67.1 | 77.0 | - | - | - | - | 82.0 | 86.9 | -
TransT [20] | 64.9 | 73.8 | 69.0 | - | - | - | 81.4 | 86.7 | 80.3
SiamR-CNN [52] | 64.8 | 72.2 | - | - | - | - | 81.2 | 85.4 | 80.0
TrDiMP [21] | 63.9 | - | 61.4 | - | - | - | 78.4 | 83.3 | 73.1
LTMU [53] | 57.2 | - | 57.2 | 41.4 | 49.9 | 47.3 | - | - | -
DiMP [54] | 56.9 | 65.0 | 56.7 | 39.2 | 47.6 | 45.1 | 74.0 | 80.1 | 68.7
SiamRPN++ [55] | 49.6 | 56.9 | 49.1 | 34.0 | 41.6 | 39.6 | 73.3 | 80.0 | 69.4
SiamFC [56] | 33.6 | 42.0 | 33.9 | 23.0 | 31.1 | 26.9 | 57.1 | 66.3 | 53.3
Tracker | UAV123 AUC | NFS AUC | TNL2K AUC
---|---|---|---
AdaDe-L256 | 69.9 | 67.5 | 59.1
AdaDe-B256 | 68.5 | 67.3 | 56.2
GRM-L320 [34] | 72.2 | 66.0 | -
GRM-B256 [34] | 70.2 | 65.6 | -
OSTrack-B384 [10] | 70.7 | 66.5 | 55.9
OSTrack-B256 [10] | 68.3 | 64.7 | 54.3
MixFormer-22k [48] | 70.4 | - | -
KeepTrack [37] | 69.7 | 66.4 | -
STARK-101 [19] | 68.2 | 66.2 | -
TransT [20] | 68.1 | 65.3 | 50.7
TrDiMP [21] | 66.4 | 66.2 | -
SiamR-CNN [52] | 64.9 | 63.9 | -
SiamRPN++ [55] | 59.3 | 57.1 | -
4.3. Other Benchmarks
Tracker | AVisT AUC | AVisT OP50 | AVisT OP75
---|---|---|---
AdaDe-L256 | 59.9 | 69.6 | 52.1
AdaDe-B256 | 54.6 | 63.3 | 44.5
OSTrack-B384 [10] | 58.1 | 67.9 | 48.6
OSTrack-B256 [10] | 56.2 | 65.3 | 46.5
MixFormer-L-22k [48] | 56.0 | 65.9 | 46.3
MixFormer-22k [48] | 53.7 | 63.0 | 43.0
GRM-L320 [34] | 55.1 | 63.8 | 46.9
GRM-B256 [34] | 54.5 | 63.1 | 45.2
STARK-101 [19] | 50.5 | 58.2 | 39.0
KeepTrack [37] | 49.4 | 56.3 | 37.2
TransT [20] | 49.0 | 56.4 | 37.2
TrDiMP [21] | 48.1 | 55.3 | 33.8
SiamRPN++ [55] | 39.0 | 43.5 | 21.2
DiMP [54] | 38.6 | 41.5 | 22.2
4.4. Visualizations
4.5. Effect of Decoder
4.6. Limitations
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML 2021), Virtual, 18–24 July 2021; pp. 8748–8763.
2. He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 15979–15988.
3. Oquab, M.; Darcet, T.; Moutakanni, T.; Vo, H.; Szafraniec, M.; Khalidov, V.; Fernandez, P.; Haziza, D.; Massa, F.; El-Nouby, A.; et al. DINOv2: Learning Robust Visual Features without Supervision. arXiv 2023, arXiv:2304.07193.
4. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
5. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1106–1114.
6. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
7. Yan, W.; Sun, Y.; Yue, G.; Zhou, W.; Liu, H. FVIFormer: Flow-Guided Global-Local Aggregation Transformer Network for Video Inpainting. IEEE J. Emerg. Sel. Top. Circuits Syst. 2024, 14, 235–244.
8. Marin, D.; Chang, J.R.; Ranjan, A.; Prabhu, A.; Rastegari, M.; Tuzel, O. Token Pooling in Vision Transformers for Image Classification. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 12–21.
9. Chen, B.; Li, P.; Bai, L.; Qiao, L.; Shen, Q.; Li, B.; Gan, W.; Wu, W.; Ouyang, W. Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 375–392.
10. Ye, B.; Chang, H.; Ma, B.; Shan, S.; Chen, X. Joint feature learning and relation modeling for tracking: A one-stream framework. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 341–357.
11. Wu, Q.; Yang, T.; Liu, Z.; Wu, B.; Shan, Y.; Chan, A.B. DropMAE: Masked autoencoders with spatial-attention dropout for tracking tasks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14561–14571.
12. Zhao, H.; Wang, D.; Lu, H. Representation Learning for Visual Object Tracking by Masked Appearance Transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18696–18705.
13. Liu, F.; Zhang, X.; Peng, Z.; Guo, Z.; Wan, F.; Ji, X.; Ye, Q. Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 6802–6811.
14. Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.T.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the IEEE International Conference on Image Processing, Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468.
15. Ma, L.V.; Nguyen, T.T.D.; Shim, C.; Kim, D.Y.; Ha, N.; Jeon, M. Visual multi-object tracking with re-identification and occlusion handling using labeled random finite sets. Pattern Recognit. 2024, 156, 110785.
16. Zhu, T.; Hiller, M.; Ehsanpour, M.; Ma, R.; Drummond, T.; Reid, I.D.; Rezatofighi, H. Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 12783–12797.
17. Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. arXiv 2021, arXiv:2110.06864.
18. Luo, W.; Xing, J.; Milan, A.; Zhang, X.; Liu, W.; Kim, T. Multiple object tracking: A literature review. Artif. Intell. 2021, 293, 103448.
19. Yan, B.; Peng, H.; Fu, J.; Wang, D.; Lu, H. Learning spatio-temporal transformer for visual tracking. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10448–10457.
20. Chen, X.; Yan, B.; Zhu, J.; Wang, D.; Yang, X.; Lu, H. Transformer tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8126–8135.
21. Wang, N.; Zhou, W.; Wang, J.; Li, H. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1571–1580.
22. Lin, L.; Fan, H.; Xu, Y.; Ling, H. SwinTrack: A simple and strong baseline for transformer tracking. arXiv 2021, arXiv:2112.00995.
23. Lester, B.; Al-Rfou, R.; Constant, N. The Power of Scale for Parameter-Efficient Prompt Tuning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Punta Cana, Dominican Republic, 7–11 November 2021; pp. 3045–3059.
24. Jia, M.; Tang, L.; Chen, B.; Cardie, C.; Belongie, S.J.; Hariharan, B.; Lim, S. Visual Prompt Tuning. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Volume 13693, pp. 709–727.
25. Chen, S.; Ge, C.; Tong, Z.; Wang, J.; Song, Y.; Wang, J.; Luo, P. AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022.
26. Pfeiffer, J.; Kamath, A.; Rücklé, A.; Cho, K.; Gurevych, I. AdapterFusion: Non-Destructive Task Composition for Transfer Learning. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL), Online, 19–23 April 2021; pp. 487–503.
27. Xin, Y.; Du, J.; Wang, Q.; Lin, Z.; Yan, K. VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding. arXiv 2023, arXiv:2312.08733.
28. Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. In Proceedings of the International Conference on Learning Representations, Online, 25–29 April 2022.
29. Yang, J.; Li, Z.; Zheng, F.; Leonardis, A.; Song, J. Prompting for multi-modal tracking. In Proceedings of the 30th ACM International Conference on Multimedia, Lisbon, Portugal, 10–14 October 2022; pp. 3492–3500.
30. Zhu, J.; Lai, S.; Chen, X.; Wang, D.; Lu, H. Visual Prompt Multi-Modal Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 9516–9526.
31. Wu, Z.; Zheng, J.; Ren, X.; Vasluianu, F.; Ma, C.; Paudel, D.P.; Gool, L.V.; Timofte, R. Single-Model and Any-Modality for Video Object Tracking. arXiv 2023, arXiv:2311.15851.
32. Cao, B.; Guo, J.; Zhu, P.; Hu, Q. Bi-directional Adapter for Multi-modal Tracking. arXiv 2023, arXiv:2312.10611.
33. Hou, X.; Xing, J.; Qian, Y.; Guo, Y.; Xin, S.; Chen, J.; Tang, K.; Wang, M.; Jiang, Z.; Liu, L.; et al. SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking. arXiv 2024, arXiv:2403.16002.
34. Gao, S.; Zhou, C.; Zhang, J. Generalized relation modeling for transformer tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18686–18695.
35. Chen, X.; Peng, H.; Wang, D.; Lu, H.; Hu, H. SeqTrack: Sequence to Sequence Learning for Visual Object Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14572–14581.
36. Song, Z.; Luo, R.; Yu, J.; Chen, Y.P.P.; Yang, W. Compact transformer tracker with correlative masked modeling. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 2321–2329.
37. Mayer, C.; Danelljan, M.; Paudel, D.P.; Van Gool, L. Learning target candidate association to keep track of what not to track. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 13444–13454.
38. Han, Q.; Cai, Y.; Zhang, X. RevColV2: Exploring Disentangled Representations in Masked Image Modeling. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023.
39. Fan, H.; Lin, L.; Yang, F.; Chu, P.; Deng, G.; Yu, S.; Bai, H.; Xu, Y.; Liao, C.; Ling, H. LaSOT: A high-quality benchmark for large-scale single object tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5374–5383.
40. Fan, H.; Bai, H.; Lin, L.; Yang, F.; Ling, H. LaSOT: A High-quality Large-scale Single Object Tracking Benchmark. Int. J. Comput. Vis. 2021, 129, 439–461.
41. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101.
42. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
43. Muller, M.; Bibi, A.; Giancola, S.; Alsubaihi, S.; Ghanem, B. TrackingNet: A large-scale dataset and benchmark for object tracking in the wild. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 300–317.
44. Mueller, M.; Smith, N.; Ghanem, B. A benchmark and simulator for UAV tracking. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 445–461.
45. Galoogahi, H.K.; Fagg, A.; Huang, C.; Ramanan, D.; Lucey, S. Need for Speed: A Benchmark for Higher Frame Rate Object Tracking. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1134–1143.
46. Wang, X.; Shu, X.; Zhang, Z.; Jiang, B.; Wang, Y.; Tian, Y.; Wu, F. Towards More Flexible and Accurate Object Tracking With Natural Language: Algorithms and Benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13763–13773.
47. Lin, L.; Fan, H.; Zhang, Z.; Wang, Y.; Xu, Y.; Ling, H. Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Volume 15059, pp. 300–318.
48. Cui, Y.; Jiang, C.; Wang, L.; Wu, G. MixFormer: End-to-End Tracking with Iterative Mixed Attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13608–13618.
49. Gao, S.; Zhou, C.; Ma, C.; Wang, X.; Yuan, J. AiATrack: Attention in attention for transformer visual tracking. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 146–164.
50. Mayer, C.; Danelljan, M.; Bhat, G.; Paul, M.; Paudel, D.P.; Yu, F.; Van Gool, L. Transforming model prediction for tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8731–8740.
51. Zhou, Z.; Chen, J.; Pei, W.; Mao, K.; Wang, H.; He, Z. Global tracking via ensemble of local trackers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8761–8770.
52. Voigtlaender, P.; Luiten, J.; Torr, P.H.; Leibe, B. Siam R-CNN: Visual Tracking by Re-Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6578–6588.
53. Dai, K.; Zhang, Y.; Wang, D.; Li, J.; Lu, H.; Yang, X. High-Performance Long-Term Tracking With Meta-Updater. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6298–6307.
54. Bhat, G.; Danelljan, M.; Gool, L.V.; Timofte, R. Learning discriminative model prediction for tracking. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6182–6191.
55. Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. SiamRPN++: Evolution of siamese visual tracking with very deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4282–4291.
56. Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H.S. Fully-Convolutional Siamese Networks for Object Tracking. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 850–865.
57. Noman, M.; Ghallabi, W.A.; Najiha, D.; Mayer, C.; Dudhane, A.; Danelljan, M.; Cholakkal, H.; Khan, S.; Gool, L.V.; Khan, F.S. AVisT: A Benchmark for Visual Object Tracking in Adverse Visibility. In Proceedings of the British Machine Vision Conference, London, UK, 21–24 November 2022.
58. Li, B.; Fu, C.; Ding, F.; Ye, J.; Lin, F. All-Day Object Tracking for Unmanned Aerial Vehicle. IEEE Trans. Mob. Comput. 2023, 22, 4515–4529.
59. Ye, J.; Fu, C.; Cao, Z.; An, S.; Zheng, G.; Li, B. Tracker Meets Night: A Transformer Enhancer for UAV Tracking. IEEE Robot. Autom. Lett. 2022, 7, 3866–3873.
60. Fan, H.; Miththanthaya, H.A.; Harshit; Rajan, S.R.; Liu, X.; Zou, Z.; Lin, Y.; Ling, H. Transparent Object Tracking Benchmark. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10714–10723.
61. Zhu, J.; Tang, H.; Cheng, Z.; He, J.; Luo, B.; Qiu, S.; Li, S.; Lu, H. DCPT: Darkness Clue-Prompted Tracking in Nighttime UAVs. arXiv 2023, arXiv:2309.10491.
62. Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2633–2642.
63. Sun, T.; Segù, M.; Postels, J.; Wang, Y.; Gool, L.V.; Schiele, B.; Tombari, F.; Yu, F. SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 21339–21350.
64. Wei, C.; Wang, W.; Yang, W.; Liu, J. Deep Retinex Decomposition for Low-Light Enhancement. In Proceedings of the British Machine Vision Conference, Newcastle, UK, 3–6 September 2018; p. 155.
65. Danelljan, M.; Gool, L.V.; Timofte, R. Probabilistic Regression for Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7183–7192.
Effect of decoder initialization for the B256 variant (cf. Section 4.5):

Method | LaSOT AUC | Improvement | LaSOT_ext AUC | Improvement
---|---|---|---|---
Baseline | 68.9 | - | 48.7 | -
Pre-trained Decoder | 69.3 | +0.4% | 48.5 | −0.2%
Random Decoder | 68.6 | −0.3% | 47.5 | −1.2%
Component-wise ablation of the adapter module and the migrated decoder:

Variant | Method | LaSOT | LaSOT_ext | AVisT | UAVDark135 | DarkTrack2021
---|---|---|---|---|---|---
L256 | + Adapter Module | 71.2 | 50.6 | 58.4 | 58.1 | 53.8
L256 | + Adapter Module + Decoder | 71.2 (0.0 ↑) | 50.3 (0.3 ↓) | 59.9 (1.5 ↑) | 59.7 (1.6 ↑) | 53.3 (0.5 ↓)
B256 | + Adapter Module | 68.3 | 48.7 | 54.9 | 53.2 | 47.3
B256 | + Adapter Module + Decoder | 68.7 (0.4 ↑) | 49.5 (0.8 ↑) | 54.6 (0.3 ↓) | 54.9 (1.7 ↑) | 49.2 (1.9 ↑)