Scale-Adaptive KCF Mixed with Deep Feature for Pedestrian Tracking
Abstract
1. Introduction
2. KCF with Deep Feature and Adaptive Scale
2.1. KCF Tracking Algorithm
2.2. KCF Scale Adaptation
2.3. Pedestrian Feature Extraction Based on Convolutional Neural Network
2.4. Pedestrian Tracking Based on Fusion Metrics
3. Improved YOLOv3 Algorithm for Pedestrian Recognition
3.1. Use Soft-NMS Algorithm to Filter Incorrectly Predicted Detection Boxes
3.2. Use the Retrieval Algorithm to Recover the Detected Bounding Box Missed by Soft-NMS
4. Experiments
4.1. Train the Deep Feature Extraction Network
4.2. Soft-NMS and Retrieval Algorithm to Improve YOLOv3
4.3. Tracking Effect Analysis
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
Layer Name | Output Size (Channels × Height × Width) | Kernel Size | Stride |
---|---|---|---|
Convolution layer | 32 × 128 × 64 | 3 × 3 | 1 |
Convolution layer | 64 × 128 × 64 | 3 × 3 | 1 |
Max pooling layer | 64 × 64 × 32 | 2 × 2 | 2 |
Convolution layer | 64 × 64 × 32 | 3 × 3 | 1 |
Convolution layer | 128 × 64 × 32 | 3 × 3 | 1 |
Max pooling layer | 128 × 32 × 16 | 2 × 2 | 2 |
Convolution layer | 128 × 32 × 16 | 3 × 3 | 1 |
Max pooling layer | 128 × 8 × 4 | 4 × 4 | 4 |
FC (fully connected) layer | 128 | - | - |
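The output sizes in the table above can be checked with simple shape arithmetic. The sketch below is only an illustration: it assumes a 3 × 128 × 64 input crop (channels × height × width) and a padding of 1 for every 3 × 3 convolution, neither of which is stated in the table, and the helper names `conv_out`, `pool_out`, and `trace_shapes` are hypothetical.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def conv_out(shape: Shape, out_channels: int,
             kernel: int = 3, stride: int = 1, padding: int = 1) -> Shape:
    """Output shape of a 2D convolution (padding=1 is an assumption)."""
    _, h, w = shape
    h_out = (h + 2 * padding - kernel) // stride + 1
    w_out = (w + 2 * padding - kernel) // stride + 1
    return (out_channels, h_out, w_out)


def pool_out(shape: Shape, kernel: int, stride: int) -> Shape:
    """Output shape of an unpadded max pooling layer."""
    c, h, w = shape
    return (c, (h - kernel) // stride + 1, (w - kernel) // stride + 1)


def trace_shapes(input_shape: Shape = (3, 128, 64)) -> List[Shape]:
    """Walk the layer stack from the table and collect each output shape.

    The default input shape is an assumption, not taken from the table.
    """
    shapes: List[Shape] = []
    x = input_shape
    x = conv_out(x, 32)
    shapes.append(x)
    x = conv_out(x, 64)
    shapes.append(x)
    x = pool_out(x, kernel=2, stride=2)
    shapes.append(x)
    x = conv_out(x, 64)
    shapes.append(x)
    x = conv_out(x, 128)
    shapes.append(x)
    x = pool_out(x, kernel=2, stride=2)
    shapes.append(x)
    x = conv_out(x, 128)
    shapes.append(x)
    x = pool_out(x, kernel=4, stride=4)
    shapes.append(x)
    return shapes


if __name__ == "__main__":
    for shape in trace_shapes():
        print(" x ".join(str(d) for d in shape))
```

Under these assumptions the traced shapes agree with the table row by row, and the final 128 × 8 × 4 feature map flattens to 4096 values before the last layer reduces it to a 128-dimensional feature vector.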
Method | Accuracy (%) |
---|---|
YOLOv3 | 43.0 |
YOLOv3 using Soft-NMS and the retrieval algorithm | 46.1 |
Tracking Algorithm | “Human 6” Video Success Rate | “Woman” Video Success Rate | “Girl 2” Video Success Rate | FPS |
---|---|---|---|---|
KCF | 0.20 | 0.69 | 0.05 | 245 |
ASLA | 0.38 | 0.14 | 0.55 | 12 |
TLD | 0.30 | 0.13 | 0.07 | 37 |
DASiamRPN | 0.91 | 0.93 | 0.87 | 160 |
SiamRPN | 0.75 | 0.94 | 0.94 | 200 |
Ours | 0.97 | 0.94 | 0.94 | 80 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Zhou, Y.; Yang, W.; Shen, Y. Scale-Adaptive KCF Mixed with Deep Feature for Pedestrian Tracking. Electronics 2021, 10, 536. https://doi.org/10.3390/electronics10050536