Lightweight Target-Aware Attention Learning Network-Based Target Tracking Method
Abstract
1. Introduction
- (1) A lightweight target-aware attention learning network is designed to learn the most effective channel features of the target online. The network mines how expressive each channel is for the target from the first-frame template (a minimal illustrative sketch is given after this list).
- (2) A new attention learning loss function is developed to train the proposed network with the Adam optimizer. By introducing gradient information during training, the loss function improves the modeling capability and tracking accuracy of the network.
- (3) The lightweight target-aware attention learning network is integrated into the Siamese tracking framework to achieve effective target tracking. Moreover, the proposed method performs favorably against other trackers.
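To make contributions (1)–(3) concrete, the following is a minimal PyTorch-style sketch of the general idea: a small channel-attention head is fitted online on the first-frame template with Adam, and its weights modulate both branches of a SiamFC-style cross-correlation. All names (`ChannelAttention`, `fit_attention_online`, the reduction ratio, and the plain logistic response loss used in place of the paper's attention learning loss) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """Lightweight head that scores how expressive each backbone channel is
    for the current target; it is learned from the first-frame template only."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, template_feat: torch.Tensor) -> torch.Tensor:
        # template_feat: (B, C, H, W) -> per-channel weights of shape (B, C, 1, 1)
        pooled = F.adaptive_avg_pool2d(template_feat, 1).flatten(1)
        return self.fc(pooled).unsqueeze(-1).unsqueeze(-1)


def xcorr(z: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """SiamFC-style cross-correlation of template features z over search features x."""
    b, c, hx, wx = x.shape
    response = F.conv2d(x.reshape(1, b * c, hx, wx), z, groups=b)
    return response.reshape(b, 1, response.shape[-2], response.shape[-1])


def fit_attention_online(backbone, attn, template_img, search_img, label_map,
                         steps=50, lr=1e-3):
    """First-frame online learning: only the attention head is updated with Adam;
    the backbone stays frozen. label_map is a (B, 1, Hr, Wr) soft target map
    matching the response size."""
    with torch.no_grad():
        z = backbone(template_img)        # (B, C, Hz, Wz) template features
        x = backbone(search_img)          # (B, C, Hx, Wx) first-frame search features
    optimizer = torch.optim.Adam(attn.parameters(), lr=lr)
    for _ in range(steps):
        w = attn(z)                       # target-aware channel weights
        response = xcorr(z * w, x * w)    # channel-weighted Siamese response map
        loss = F.binary_cross_entropy_with_logits(response, label_map)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return attn


def track(backbone, attn, z, search_img):
    """During tracking, the learned weights re-weight both branches before correlation."""
    with torch.no_grad():
        x = backbone(search_img)
        w = attn(z)
        return xcorr(z * w, x * w)        # the response peak gives the new target position
```

In this sketch the Adam updates touch only the handful of attention-head parameters, which is what keeps the online step lightweight; the paper's attention learning loss additionally exploits gradient information during training, which is not reproduced here.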
2. Related Work
2.1. Lightweight Network-Based Tracker
2.2. Siamese Network-Based Tracker
3. Proposed Method
3.1. Basic Siamese Network for Visual Tracking
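As background for this subsection, the standard fully convolutional Siamese formulation (SiamFC, Bertinetto et al., cited in the references) scores a search region x against the target template z with a single cross-correlation; the notation below is the usual one from that work, not necessarily the authors' own:

```latex
% SiamFC similarity map: both crops pass through the same embedding \varphi;
% the embedded template acts as a correlation kernel over the search features,
% and b \cdot \mathbb{1} is a bias added at every spatial position.
f(z, x) = \varphi(z) \star \varphi(x) + b \cdot \mathbb{1}
```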
3.2. Attentional Learning Loss Function
3.3. Lightweight Target-Aware Attention Learning Network
- (1) Parameter learning process.
- (2) Notable characteristics of the lightweight target-aware attention learning network.
4. Experiment and Analysis
4.1. Ablation Studies
4.2. OTB Dataset Experiments
- (1)
- Challenge analysis of the OTB dataset
- (2)
- Qualitative experimental analysis of the OTB dataset
4.3. TC-128 Dataset Experiments
4.4. UAV123 Dataset Experiment
4.5. VOT2016 Dataset Experiment
4.6. LaSOT Dataset Experiment
4.7. Discussions
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Farabet, C.; Couprie, C.; Najman, L.; LeCun, Y. Learning Hierarchical Features for Scene Labeling. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1915–1929.
- Bousetouane, F.; Dib, L.; Snoussi, H. Improved mean shift integrating texture and color features for robust real time object tracking. Vis. Comput. 2012, 29, 155–170.
- Zhang, H.; Chen, J.; Nie, G.; Hu, S. Uncertain motion tracking based on convolutional net with semantics estimation and region proposals. Pattern Recognit. 2020, 102, 107232.
- Guo, W.; Gao, J.; Tian, Y.; Yu, F.; Feng, Z. SAFS: Object Tracking Algorithm Based on Self-Adaptive Feature Selection. Sensors 2021, 21, 4030.
- Cao, Z.; Fu, C.; Ye, J.; Li, B.; Li, Y. HiFT: Hierarchical Feature Transformer for Aerial Tracking. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021.
- Zhao, H.; Yang, G.; Wang, D.; Lu, H. Lightweight Deep Neural Network for Real-Time Visual Tracking with Mutual Learning. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019.
- Cheng, X.; Zhao, L.; Hu, Q. Real-Time Semantic Segmentation Based on Dilated Convolution Smoothing and Lightweight Up-Sampling. Laser Optoelectron. Prog. 2020, 57, 021017.
- Zhang, H.; Chen, J.; Nie, G.; Lin, Y.; Yang, G.; Zhang, W. Light regression memory and multi-perspective object special proposals for abrupt motion tracking. Knowl.-Based Syst. 2021, 226, 107127.
- Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H.S. Fully-convolutional Siamese networks for object tracking. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016.
- Li, B.; Yan, J.; Wu, W.; Zhu, Z.; Hu, X. High performance visual tracking with Siamese region proposal network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Gao, P.; Yuan, R.; Wang, F.; Xiao, L.; Fujita, H.; Zhang, Y. Siamese Attentional Keypoint Network for High Performance Visual Tracking. Knowl.-Based Syst. 2019, 193.
- Chen, K.; Tao, W. Learning linear regression via single-convolutional layer for visual object tracking. IEEE Trans. Multimed. 2018, 21, 86–97.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Wu, Y.; Lim, J.; Yang, M.-H. Online object tracking: A benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2411–2418.
- Wu, Y.; Lim, J.; Yang, M.-H. Object Tracking Benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848.
- Liang, P.; Blasch, E.; Ling, H. Encoding color information for visual tracking: Tracker and benchmark. IEEE Trans. Image Process. 2015, 24, 5630–5644.
- Mueller, M.; Smith, N.; Ghanem, B. A benchmark and simulator for UAV tracking. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 445–461.
- Hadfield, S.; Bowden, R.; Lebeda, K. The visual object tracking VOT2016 challenge results. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 777–823.
- Fan, H.; Bai, H.; Lin, L.; Yang, F.; Chu, P.; Deng, G.; Yu, S.; Huang, M.; Liu, J.; Xu, Y.; et al. LaSOT: A high-quality large-scale single object tracking benchmark. Int. J. Comput. Vis. 2021, 129, 439–461.
- Yang, T.; Chan, A.B. Learning dynamic memory networks for object tracking. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018.
- Yang, T.; Chan, A.B. Visual tracking via dynamic memory networks. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 360–374.
- Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 583–596.
- Bertinetto, L.; Valmadre, J.; Golodetz, S.; Miksik, O.; Torr, P.H.S. Staple: Complementary learners for real-time tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Danelljan, M.; Häger, G.; Khan, F.S.; Felsberg, M. Discriminative scale space tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1561–1575.
- Li, F.; Tian, C.; Zuo, W.; Zhang, L.; Yang, M.-H. Learning spatial-temporal regularized correlation filters for visual tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4904–4913.
- Ma, C.; Huang, J.-B.; Yang, X.; Yang, M.-H. Hierarchical convolutional features for visual tracking. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3074–3082.
- Song, Y.; Ma, C.; Gong, L.; Zhang, L.; Lau, R.W.H.; Yang, M.-H. CREST: Convolutional residual learning for visual tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Lukezic, A.; Vojir, T.; Zajc, L.C.; Matas, J.; Kristan, M. Discriminative correlation filter with channel and spatial reliability. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4847–4856.
- Tan, K.; Xu, T.-B.; Wei, Z. Online visual tracking via background-aware Siamese networks. Int. J. Mach. Learn. Cybern. 2022, 1–18.
- Danelljan, M.; Bhat, G.; Khan, F.S.; Felsberg, M. ATOM: Accurate tracking by overlap maximization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4660–4669.
- Yuan, D.; Kang, W.; He, Z. Robust visual tracking with correlation filters and metric learning. Knowl.-Based Syst. 2020, 195, 105697.
- Danelljan, M.; Bhat, G.; Khan, F.S.; Felsberg, M. ECO: Efficient convolution operators for tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6638–6646.
- Ma, C.; Huang, J.-B.; Yang, X.; Yang, M.-H. Robust Visual Tracking via Hierarchical Convolutional Features. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 2709–2723.
- Mueller, M.; Smith, N.; Ghanem, B. Context-Aware Correlation Filter Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Oron, S.; Bar-Hillel, A.; Levi, D.; Avidan, S. Locally orderless tracking. Int. J. Comput. Vis. 2015, 111, 213–228.
- Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. Exploiting the Circulant Structure of Tracking-by-Detection with Kernels. In Proceedings of the 12th European Conference on Computer Vision—Volume Part IV, Florence, Italy, 7–13 October 2012; Springer: Berlin/Heidelberg, Germany, 2012.
- Hare, S.; Golodetz, S.; Saffari, A.; Vineet, V.; Cheng, M.; Hicks, S.L.; Torr, P.H.S. Struck: Structured output tracking with kernels. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 2096–2109.
- Li, X.; Ma, C.; Wu, B.; He, Z.; Yang, M.-H. Target-aware deep tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1369–1378.
- Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
- Danelljan, M.; Häger, G.; Khan, F.S.; Felsberg, M. Convolutional Features for Correlation Filter Based Visual Tracking. In Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), Santiago, Chile, 7–13 December 2015.
- Nam, H.; Han, B. Learning Multi-Domain Convolutional Neural Networks for Visual Tracking. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
- Pu, S.; Song, Y.; Ma, C.; Zhang, H.; Yang, M.-H. Deep attentive tracking via reciprocative learning. Adv. Neural Inf. Process. Syst. 2018, 31.
| Tracker | Precision Score (%) | Success Rate (%) | Speed (FPS) |
|---|---|---|---|
| Ours | 83.3 | 64.3 | 59 |
| BaSiamIoU | 83.9 | 70.8 | 50 |
| ATOM | 87.9 | 66.7 | 30 |
| CFML | 85.3 | 64.9 | 32 |
| SiamFC | 77.2 | 58.3 | 102.3 |
| CREST | 83.4 | 62.0 | 1.8 |
| CSR-DCF | 79.9 | 57.9 | 8.5 |
| SRDCF | 79.2 | 60.0 | 4.2 |
| Tracker | EAO | Overlap | Failures |
|---|---|---|---|
| Ours | 0.306 | 0.546 | 20.180 |
| SiamRPN++ | 0.479 | 0.6356 | 11.586 |
| SiamRPN | 0.341 | 0.580 | 20.138 |
| TADT | 0.300 | 0.546 | 19.973 |
| DeepSRDCF | 0.275 | 0.522 | 20.346 |
| MDNet | 0.257 | 0.538 | 21.081 |
| SRDCF | 0.245 | 0.525 | 28.316 |
| HCF | 0.219 | 0.436 | 23.856 |
| DAT | 0.216 | 0.458 | 28.353 |
| KCF | 0.153 | 0.469 | 52.031 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhao, Y.; Zhang, J.; Duan, R.; Li, F.; Zhang, H. Lightweight Target-Aware Attention Learning Network-Based Target Tracking Method. Mathematics 2022, 10, 2299. https://doi.org/10.3390/math10132299