Re-Identification in Urban Scenarios: A Review of Tools and Methods
Abstract
1. Introduction
2. Deep Learning for Object Re-Identification
3. Evaluation Metrics
4. Methodology
- What ReID or multi-object ReID problem was addressed?
- What was the general approach and type of DNN-based models employed?
- What were the datasets and models proposed by the authors? Were there any variations observed by the authors?
- Was any pre-processing of data or data augmentation technique used?
- What was the overall performance, according to the adopted metric (typically CMC or mAP; see the sketch after this list)?
- Did the authors test their model performances on different datasets?
- Did the authors compare their approaches with other techniques? If yes, what was the difference in performance?
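The performance figures compiled in the following sections are reported mainly as Cumulative Matching Characteristic (CMC) rank scores and mean Average Precision (mAP). As a point of reference, the sketch below (illustrative only, with assumed variable names) computes Rank-1 CMC and mAP from a query-by-gallery distance matrix; standard ReID protocols additionally exclude same-camera gallery samples of the query identity, which is omitted here for brevity.

```python
# Minimal sketch: Rank-1 CMC and mAP from a (num_query, num_gallery) distance matrix.
import numpy as np

def rank1_and_map(dist, q_ids, g_ids):
    """dist: query-by-gallery distances; q_ids/g_ids: identity labels."""
    num_q = dist.shape[0]
    rank1_hits, aps = 0, []
    for i in range(num_q):
        order = np.argsort(dist[i])             # gallery sorted by increasing distance
        matches = (g_ids[order] == q_ids[i])    # boolean relevance in ranked order
        if not matches.any():
            continue                            # query identity absent from gallery
        if matches[0]:
            rank1_hits += 1
        # Average precision: precision at each ranked position where a match occurs
        hit_positions = np.where(matches)[0]
        precisions = (np.arange(len(hit_positions)) + 1) / (hit_positions + 1)
        aps.append(precisions.mean())
    return rank1_hits / num_q, float(np.mean(aps))

# Toy usage: 2 queries, 4 gallery images
dist = np.array([[0.1, 0.9, 0.4, 0.8],
                 [0.7, 0.2, 0.9, 0.3]])
q_ids = np.array([1, 2])
g_ids = np.array([1, 2, 1, 2])
print(rank1_and_map(dist, q_ids, g_ids))   # (1.0, 1.0)
```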
5. Person Re-Identification
5.1. Person Re-Identification Databases
5.2. Person Re-Identification Methods
5.2.1. Feature Learning
5.2.2. Deep Learning Metrics
5.2.3. Sequence Learning for ReID
5.2.4. Generative Learning for ReID
5.2.5. Summary of Person ReID
6. ReID and Spatial–Temporal Multi Object ReID Methods
6.1. Multi Object ReID Datasets with Trajectories
6.2. Spatial–Temporal Constrained and Multi Object ReID Methods
6.2.1. Deep Learning Metrics for Vehicle ReID
6.2.2. Sequence Models for Vehicle ReID
6.2.3. Feature Learning for Vehicle ReID
6.2.4. Tracking for Vehicle ReID
6.2.5. Summary of Vehicle ReID Methods
7. Methods for Image Enhancement
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Wu, L.; Wang, Y.; Gao, J.; Li, X. Deep adaptive feature embedding with local sample distributions for person re-identification. Pattern Recognit. 2018, 73, 275–288. [Google Scholar] [CrossRef] [Green Version]
- Zhang, W.; Ma, B.; Liu, K.; Huang, R. Video-based pedestrian re-identification by adaptive spatio-temporal appearance model. IEEE Trans. Image Process. 2017, 26, 2042–2054. [Google Scholar] [CrossRef]
- Varior, R.R.; Haloi, M.; Wang, G. Gated Siamese Convolutional Neural Network Architecture for Human Re-Identification. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 791–808. [Google Scholar]
- Xiao, T.; Li, H.; Ouyang, W.; Wang, X. Learning deep feature representations with domain guided dropout for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1249–1258. [Google Scholar]
- McLaughlin, N.; Martinez del Rincon, J.; Miller, P. Recurrent convolutional network for video-based person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1325–1334. [Google Scholar]
- Yan, Y.; Ni, B.; Song, Z.; Ma, C.; Yan, Y.; Yang, X. Person re-identification via recurrent feature aggregation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 701–716. [Google Scholar]
- Yi, D.; Lei, Z.; Liao, S.; Li, S.Z. Deep metric learning for person re-identification. In Proceedings of the 2014 22nd International Conference on Pattern Recognition, Washington, DC, USA, 24–28 August 2014; pp. 34–39. [Google Scholar]
- Zheng, Z.; Zheng, L.; Yang, Y. A discriminatively learned cnn embedding for person reidentification. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2017, 14, 1–20. [Google Scholar] [CrossRef]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
- LeCun, Y.; Boser, B.; Denker, J.; Henderson, D.; Howard, R.; Hubbard, W.; Jackel, L. Handwritten digit recognition with a back-propagation network. Adv. Neural Inf. Process. Syst. 1989, 2, 396–404. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv 2016, arXiv:1602.07261. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Canziani, A.; Paszke, A.; Culurciello, E. An analysis of deep neural network models for practical applications. arXiv 2016, arXiv:1605.07678. [Google Scholar]
- Gong, S.; Cristani, M.; Yan, S.; Loy, C.C. Person Re-Identification; Springer Publishing Company, Incorporated, 2014; ISBN 1447162951, 9781447162957. [Google Scholar]
- Li, D.; Zhang, Z.; Chen, X.; Ling, H.; Huang, K. A richly annotated dataset for pedestrian attribute recognition. arXiv 2016, arXiv:1603.07054. [Google Scholar]
- Gray, D.; Tao, H. Viewpoint Invariant Pedestrian Recognition with an Ensemble of Localized Features. In Proceedings of the European Conference on Computer Vision, Marseille, France, 12–18 October 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 262–275. [Google Scholar]
- Nguyen, T.B.; Le, T.L.; Nguyen, D.D.; Pham, D.T. A Reliable Image-to-Video Person Re-Identification Based on Feature Fusion. In Proceedings of the Asian Conference on Intelligent Information and Database Systems, Dong Hoi City, Vietnam, 19–21 March 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 433–442. [Google Scholar]
- Pham, T.T.T.; Le, T.L.; Vu, H.; Dao, T.K. Fully-automated person re-identification in multi-camera surveillance system with a robust kernel descriptor and effective shadow removal method. Image Vis. Comput. 2017, 59, 44–62. [Google Scholar] [CrossRef]
- Cheng, D.S.; Cristani, M.; Stoppa, M.; Bazzani, L.; Murino, V. Custom pictorial structures for re-identification. BMVC 2011, 1, 6. [Google Scholar]
- Das, A.; Chakraborty, A.; Roy-Chowdhury, A.K. Consistent Re-Identification in a Camera Network. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 330–345. [Google Scholar]
- Moon, H.; Phillips, P.J. Computational and performance aspects of PCA-based face-recognition algorithms. Perception 2001, 30, 303–321. [Google Scholar] [CrossRef] [PubMed]
- Nguyen, T.B.; Le, T.L.; Ngoc, N.P. Fusion schemes for image-to-video person re-identification. J. Inf. Telecommun. 2019, 3, 74–94. [Google Scholar] [CrossRef] [Green Version]
- Matsukawa, T.; Okabe, T.; Suzuki, E.; Sato, Y. Hierarchical gaussian descriptor for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1363–1372. [Google Scholar]
- Li, W.; Zhu, X.; Gong, S. Person re-identification by deep joint learning of multi-loss classification. arXiv 2017, arXiv:1705.04724. [Google Scholar]
- Argyriou, A.; Evgeniou, T.; Pontil, M. Multi-task feature learning. Adv. Neural Inf. Process. Syst. 2007, 19, 41–48. [Google Scholar]
- Kong, D.; Fujimaki, R.; Liu, J.; Nie, F.; Ding, C. Exclusive Feature Learning on Arbitrary Structures via l1,2-norm. Adv. Neural Inf. Process. Syst. 2014, 1, 1655–1663. [Google Scholar]
- Wang, H.; Nie, F.; Huang, H. Multi-view clustering and feature learning via structured sparsity. Int. Conf. Mach. Learn. 2013, 28, 352–360. [Google Scholar]
- Gray, D.; Brennan, S.; Tao, H. Evaluating appearance models for recognition, reacquisition, and tracking. In Proceedings of the IEEE International Workshop on Performance Evaluation for Tracking and Surveillance (PETS), Rio de Janeiro, Brazil, 14 October 2007; Volume 3, pp. 1–7. [Google Scholar]
- Zhou, K.; Yang, Y.; Cavallaro, A.; Xiang, T. Omni-scale feature learning for person re-identification. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–3 November 2019; pp. 3702–3712. [Google Scholar]
- Reddi, S.J.; Kale, S.; Kumar, S. On the convergence of adam and beyond. arXiv 2019, arXiv:1904.09237. [Google Scholar]
- Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random erasing data augmentation. arXiv 2017, arXiv:1708.04896. [Google Scholar] [CrossRef]
- Ning, X.; Gong, K.; Li, W.; Zhang, L.; Bai, X.; Tian, S. Feature refinement and filter network for person Re-identification. IEEE Trans. Circ. Syst. Video Technol. 2020, 31, 3391–3402. [Google Scholar] [CrossRef]
- Quan, R.; Dong, X.; Wu, Y.; Zhu, L.; Yang, Y. Auto-ReID: Searching for a part-aware ConvNet for person re-identification. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–3 November 2019; pp. 3750–3759. [Google Scholar]
- Liu, H.; Simonyan, K.; Yang, Y. Darts: Differentiable architecture search. arXiv 2018, arXiv:1806.09055. [Google Scholar]
- Yaghoubi, E.; Borza, D.; Alirezazadeh, P.; Kumar, A.; Proença, H. An Implicit Attention Mechanism for Deep Learning Pedestrian Re-identification Frameworks. arXiv 2020, arXiv:2001.11267. [Google Scholar]
- Luo, H.; Gu, Y.; Liao, X.; Lai, S.; Jiang, W. Bag of tricks and a strong baseline for deep person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–20 June 2019; pp. 4321–4329. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Liu, X.; Zhao, H.; Tian, M.; Sheng, L.; Shao, J.; Yi, S.; Yan, J.; Wang, X. Hydraplus-net: Attentive deep features for pedestrian analysis. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 350–359. [Google Scholar]
- Hou, R.; Chang, H.; Ma, B.; Huang, R.; Shan, S. BiCnet-TKS: Learning Efficient Spatial–Temporal Representation for Video Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 21–24 June 2021; pp. 2014–2023. [Google Scholar]
- Ning, X.; Gong, K.; Li, W.; Zhang, L. JWSAA: Joint weak saliency and attention aware for person re-identification. Neurocomputing 2021, 453, 801–811. [Google Scholar] [CrossRef]
- Shen, C.; Jin, Z.; Zhao, Y.; Fu, Z.; Jiang, R.; Chen, Y.; Hua, X.S. Deep siamese network with multi-level similarity perception for person re-identification. In Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 1942–1950. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–11 June 2015; pp. 815–823. [Google Scholar]
- Li, W.; Zhao, R.; Xiao, T.; Wang, X. Deepreid: Deep filter pairing neural network for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 152–159. [Google Scholar]
- Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; Tian, Q. Scalable person re-identification: A benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–11 June 2015; pp. 1116–1124. [Google Scholar]
- Li, W.; Wang, X. Locally aligned feature transforms across views. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 3594–3601. [Google Scholar]
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Lv, J.; Chen, W.; Li, Q.; Yang, C. Unsupervised cross-dataset person re-identification by transfer learning of spatial-temporal patterns. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7948–7956. [Google Scholar]
- Loy, C.C.; Xiang, T.; Gong, S. Multi-camera activity correlation analysis. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 1988–1995. [Google Scholar]
- Ahmed, E.; Jones, M.; Marks, T.K. An improved deep learning architecture for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–11 June 2015; pp. 3908–3916. [Google Scholar]
- Hermans, A.; Beyer, L.; Leibe, B. In defense of the triplet loss for person re-identification. arXiv 2017, arXiv:1703.07737. [Google Scholar]
- Weinberger, K.Q.; Saul, L.K. Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 2009, 10, 207–244. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167. [Google Scholar]
- Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323. [Google Scholar]
- Baldassarre, F.; Morín, D.G.; Rodés-Guirao, L. Deep koalarization: Image colorization using cnns and inception-resnet-v2. arXiv 2017, arXiv:1712.03400. [Google Scholar]
- Xu, B.; Wang, N.; Chen, T.; Li, M. Empirical evaluation of rectified activations in convolutional network. arXiv 2015, arXiv:1505.00853. [Google Scholar]
- Zheng, L.; Bie, Z.; Sun, Y.; Wang, J.; Su, C.; Wang, S.; Tian, Q. Mars: A Video Benchmark for Large-Scale Person Re-Identification. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 868–884. [Google Scholar]
- Cheng, D.; Gong, Y.; Zhou, S.; Wang, J.; Zheng, N. Person re-identification by multi-channel parts-based cnn with improved triplet loss function. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1335–1344. [Google Scholar]
- Yuan, X.; He, P.; Zhu, Q.; Li, X. Adversarial examples: Attacks and defenses for deep learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2805–2824. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Hu, Y.; Yi, D.; Liao, S.; Lei, Z.; Li, S.Z. Cross Dataset Person Re-Identification. In Proceedings of the Asian Conference on Computer Vision, Singapore, 1–5 November 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 650–664. [Google Scholar]
- Hirzer, M.; Beleznai, C.; Roth, P.M.; Bischof, H. Person Re-Identification by Descriptive and Discriminative Classification. In Proceedings of the 17th Scandinavian Conference on Image Analysis, Ystad, Sweden, 23–25 May 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 91–102. [Google Scholar]
- Liao, X.; He, L.; Yang, Z.; Zhang, C. Video-Based Person Re-Identification Via 3D Convolutional Networks and Non-Local Attention. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 620–634. [Google Scholar]
- Kay, W.; Carreira, J.; Simonyan, K.; Zhang, B.; Hillier, C.; Vijayanarasimhan, S.; Viola, F.; Green, T.; Back, T.; Natsev, P.; et al. The kinetics human action video dataset. arXiv 2017, arXiv:1705.06950. [Google Scholar]
- Li, J.; Zhang, S.; Huang, T. Multi-scale 3d convolution network for video based person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 8618–8625. [Google Scholar]
- Zhou, Z.; Huang, Y.; Wang, W.; Wang, L.; Tan, T. See the forest for the trees: Joint spatial and temporal recurrent neural networks for video-based person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 June 2017; pp. 4747–4756. [Google Scholar]
- Ge, Y.; Li, Z.; Zhao, H.; Yin, G.; Yi, S.; Wang, X. Fd-gan: Pose-guided feature distilling gan for robust person re-identification. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal, QC, Canada, 3–8 December 2018; pp. 1222–1233. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Zhong, Z.; Zheng, L.; Zheng, Z.; Li, S.; Yang, Y. Camera style adaptation for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5157–5166. [Google Scholar]
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Zou, Y.; Yang, X.; Yu, Z.; Kumar, B.V.; Kautz, J. Joint disentangling and adaptation for cross-domain person re-identification. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part II 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 87–104. [Google Scholar]
- Fan, X.; Jiang, W.; Luo, H.; Fei, M. Spherereid: Deep hypersphere manifold embedding for person re-identification. J. Vis. Commun. Image Represent. 2019, 60, 51–58. [Google Scholar] [CrossRef] [Green Version]
- Sun, Y.; Zheng, L.; Yang, Y.; Tian, Q.; Wang, S. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 480–496. [Google Scholar]
- Wen, Y.; Zhang, K.; Li, Z.; Qiao, Y. A Discriminative Feature Learning Approach for Deep Face Recognition. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 499–515. [Google Scholar]
- Zhong, Z.; Zheng, L.; Cao, D.; Li, S. Re-ranking person re-identification with k-reciprocal encoding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 June 2017; pp. 1318–1327. [Google Scholar]
- Dietlmeier, J.; Antony, J.; McGuinness, K.; O’Connor, N.E. How important are faces for person re-identification? In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 6912–6919. [Google Scholar]
- Lu, X.Y.; Skabardonis, A. Freeway traffic shockwave analysis: Exploring the NGSIM trajectory data. In Proceedings of the 86th Annual Meeting of the Transportation Research Board, Washington, DC, USA, 21–25 January 2007. [Google Scholar]
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef] [Green Version]
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef] [Green Version]
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Bai, Y.; Lou, Y.; Gao, F.; Wang, S.; Wu, Y.; Duan, L.Y. Group-sensitive triplet embedding for vehicle reidentification. IEEE Trans. Multimed. 2018, 20, 2385–2399. [Google Scholar] [CrossRef]
- Em, Y.; Gao, F.; Lou, Y.; Wang, S.; Huang, T.; Duan, L.Y. Incorporating intra-class variance to fine-grained visual recognition. In Proceedings of the 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, China, 10–14 July 2017; pp. 1452–1457. [Google Scholar]
- Zhang, Y.; Liu, D.; Zha, Z.J. Improving triplet-wise training of convolutional neural network for vehicle re-identification. In Proceedings of the 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, China, 10–14 July 2017; pp. 1386–1391. [Google Scholar]
- Liu, X.; Liu, W.; Mei, T.; Ma, H. Provid: Progressive and multimodal vehicle reidentification for large-scale urban surveillance. IEEE Trans. Multimed. 2017, 20, 645–658. [Google Scholar] [CrossRef]
- Feng, W.; Hu, Z.; Wu, W.; Yan, J.; Ouyang, W. Multi-object tracking with multiple cues and switcher-aware classification. arXiv 2019, arXiv:1901.06129. [Google Scholar]
- Milan, A.; Leal-Taixé, L.; Reid, I.; Roth, S.; Schindler, K. MOT16: A benchmark for multi-object tracking. arXiv 2016, arXiv:1603.00831. [Google Scholar]
- Zhou, Y.; Liu, L.; Shao, L. Vehicle re-identification by deep hidden multi-view inference. IEEE Trans. Image Process. 2018, 27, 3275–3287. [Google Scholar] [CrossRef]
- Zhang, S.; Wu, G.; Costeira, J.P.; Moura, J.M. Fcn-rlstm: Deep spatio-temporal neural networks for vehicle counting in city cameras. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3667–3676. [Google Scholar]
- He, Z.; Lei, Y.; Bai, S.; Wu, W. Multi-Camera vehicle tracking with powerful visual features and spatial-temporal cue. In Proceedings of the CVPR Workshops, Long Beach, CA, USA, 16–20 June 2019; pp. 203–212. [Google Scholar]
- Naphade, M.; Anastasiu, D.C.; Sharma, A.; Jagrlamudi, V.; Jeon, H.; Liu, K.; Chang, M.C.; Lyu, S.; Gao, Z. The nvidia ai city challenge. In Proceedings of the 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), San Francisco, CA, USA, 4–8 August 2017; pp. 1–6. [Google Scholar]
- Zhu, P.; Wen, L.; Bian, X.; Ling, H.; Hu, Q. Vision meets drones: A challenge. arXiv 2018, arXiv:1804.07437. [Google Scholar]
- Voigtlaender, P.; Krause, M.; Osep, A.; Luiten, J.; Sekar, B.B.G.; Geiger, A.; Leibe, B. MOTS: Multi-object tracking and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7942–7951. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the 29th Conference on Neural Information Processing Systems (NIPS 2015), Montreal, QC, Canada, 7–10 December 2015; pp. 91–99. [Google Scholar]
- Zapletal, D.; Herout, A. Vehicle re-identification for automatic video traffic surveillance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 25–31. [Google Scholar]
- Liu, X.; Liu, W.; Ma, H.; Fu, H. Large-scale vehicle re-identification in urban surveillance videos. In Proceedings of the 2016 IEEE International Conference on Multimedia and Expo (ICME), Seattle, WA, USA, 11–15 July 2016; pp. 1–6. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Shen, X.; Lin, Z.; Brandt, J.; Avidan, S.; Wu, Y. Object retrieval and localization with spatially-constrained similarity measure and k-nn re-ranking. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 16–21 June 2012; pp. 3013–3020. [Google Scholar]
- Muja, M.; Lowe, D.G. Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP 2009, 2, 2. [Google Scholar]
- Alahi, A.; Ramanathan, V.; Fei-Fei, L. Socially-aware large-scale crowd forecasting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2203–2210. [Google Scholar]
- Sochor, J.; Špaňhel, J.; Herout, A. BoxCars: Improving Fine-Grained Recognition of Vehicles Using 3-D Bounding Boxes in Traffic Surveillance. IEEE Trans. Intell. Transp. Syst. 2018, 20, 97–108. [Google Scholar] [CrossRef] [Green Version]
- Luiten, J.; Fischer, T.; Leibe, B. Track to reconstruct and reconstruct to track. IEEE Robot. Autom. Lett. 2020, 5, 1803–1810. [Google Scholar] [CrossRef] [Green Version]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The kitti vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar]
- Bernardin, K.; Stiefelhagen, R. Evaluating multiple object tracking performance: The CLEAR MOT metrics. EURASIP J. Image Video Process. 2008, 2008, 1–10. [Google Scholar] [CrossRef] [Green Version]
- Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar]
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Zhang, H.; Sindagi, V.; Patel, V.M. Image de-raining using a conditional generative adversarial network. IEEE Trans. Circ. Syst. Video Technol. 2019, 30, 3943–3956. [Google Scholar] [CrossRef] [Green Version]
- Schaefer, G.; Stich, M. UCID: An uncompressed color image database. In Storage and Retrieval Methods and Applications for Multimedia 2004; International Society for Optics and Photonics: Washington, DC, USA, 2003; Volume 5307, pp. 472–480. [Google Scholar]
- Arbelaez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 898–916. [Google Scholar] [CrossRef] [Green Version]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Wang, Z.; Bovik, A.C. A universal image quality index. IEEE Signal Process. Lett. 2002, 9, 81–84. [Google Scholar] [CrossRef]
- Sheikh, H.R.; Bovik, A.C. Image information and visual quality. IEEE Trans. Image Process. 2006, 15, 430–444. [Google Scholar] [CrossRef]
- Kang, L.W.; Lin, C.W.; Fu, Y.H. Automatic single-image-based rain streaks removal via image decomposition. IEEE Trans. Image Process. 2011, 21, 1742–1755. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Tan, R.T.; Guo, X.; Lu, J.; Brown, M.S. Rain streak removal using layer priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2736–2744. [Google Scholar]
- Fu, X.; Huang, J.; Ding, X.; Liao, Y.; Paisley, J. Clearing the skies: A deep network architecture for single-image rain removal. IEEE Trans. Image Process. 2017, 26, 2944–2956. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Dataset | # Identities | # Cameras | # Images | Label Method | Size | Tracking Sequences
---|---|---|---|---|---|---
VIPeR | 632 | 2 | 1264 | Hand | - | NO
ETH1,2,3 | 85, 35, 28 | 1 | 8580 | Hand | Vary | YES
QMUL iLIDS | 119 | 2 | 476 | Hand | Vary | NO
GRID | 1025 | 8 | 1275 | Hand | Vary | NO
CAVIAR4reid | 72 | 2 | 1220 | Hand | Vary | NO
3DPeS | 192 | 8 | 1011 | Hand | Vary | NO
PRID2011 | 934 | 2 | 24,541 | Hand | - | YES
WARD | 70 | 3 | 4786 | Hand | - | YES
SAIVT-Softbio | 152 | 8 | 64,472 | Hand | Vary | YES
CUHK01 | 971 | 2 | 3884 | Hand | - | NO
CUHK02 | 1816 | 10 (5 pairs) | 7264 | Hand | - | NO
CUHK03 | 1467 | 10 (5 pairs) | 13,164 | Hand/DPM | Vary | NO
RAiD | 43 | 4 | 6920 | Hand | - | NO
iLIDS-VID | 300 | 2 | 42,495 | Hand | Vary | YES
MPR Drone | 84 | 1 | - | ACF | Vary | NO
HDA Person Dataset | 53 | 13 | 2976 | Hand/ACF | Vary | YES
Shinpuhkan Dataset | 24 | 16 | - | Hand/ACF | - | YES
CASIA Gait Database B | 124 | 11 | - | Background subtraction | Vary | YES
Market-1501 | 1501 | 6 | 32,217 | Hand/DPM | - | NO
PKU-reid | 114 | 2 | 1824 | Hand | - | NO
PRW | 932 | 6 | 34,304 | Hand | Vary | NO
Large scale person search | 11,934 | - | 34,574 | Hand | Vary | NO
MARS | 1261 | 6 | 1,191,003 | DPM+GMMCP | - | YES
DukeMTMC-reid | 1812 | 8 | 36,441 | Hand | Vary | NO
DukeMTMC4reid | 1852 | 8 | 346,261 | Doppia | Vary | NO
Airport | 9651 | 6 | 39,902 | ACF | - | NO
MSMT17 | 4101 | 15 | 126,441 | Faster R-CNN | Vary | NO
RPIfield | 112 | 12 | 1,601,581 | ACF | Vary | NO
Category | Ref. | Main Technique(s) | Dataset and Performance | Pros/Cons
---|---|---|---|---
Fusion | [19] | KDE and CNN features, late fusion, SVM model | CAVIAR4reid, CMC 0.933 | Robust, simple
 | [24] | GOG and ResNet features, data augmentation | CAVIAR4reid, CMC 0.919 | Simple, lacks training data
 | [26] | Joint learning with multi-loss, two-branch CNN | CUHK03, CMC 0.832 | Simple, efficient
 | [31] | Residual blocks, multi-scale features | Market-1501, CMC 0.948 | Hard to train, can overfit
 | [34] | Weakened feature convolution, ResNet | Market-1501, mAP 0.942 | Robust, reusable
Strip | [35] | Neural architecture search (NAS) | Market-1501, CMC 0.954 | Hard to train, complex, not reusable
Drop | [4] | Modified dropout layer | CUHK03, CMC 0.666 | Easy to train, problematic across data domains
Attention | [37] | ResNet-50 feature extractor, attention mechanism | RAP, mAP 0.862 | Simple to replicate, reusable
 | [40] | ResNet-50 feature extractor, multi-directional and multi-level attention maps | CUHK03, CMC 0.918 | Complex, not reusable, state-of-the-art
 | [41] | Two-branch, multi-scale and attention maps | MARS, mAP 0.860 | Complex, reusable, state-of-the-art
 | [42] | Attention, saliency maps, ResNet | Market-1501, mAP 0.892 | Complex, generalizes well, state-of-the-art
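Several of the attention and feature-learning entries above start from an ImageNet-pretrained ResNet-50 backbone used as a global feature extractor. The snippet below is a minimal, hypothetical sketch of that common starting point with torchvision; it is not the exact pipeline of any cited work, and the input resolution shown is only a typical person-crop size.

```python
# Minimal sketch: ResNet-50 as a 2048-d global feature extractor for ReID crops.
import torch
import torchvision

# weights=None keeps the sketch self-contained; in practice ImageNet-pretrained
# weights would be loaded here before fine-tuning on a ReID dataset.
backbone = torchvision.models.resnet50(weights=None)
backbone.fc = torch.nn.Identity()           # drop the classifier, keep pooled features
backbone.eval()

with torch.no_grad():
    images = torch.randn(8, 3, 256, 128)    # batch of person crops (assumed size)
    features = torch.nn.functional.normalize(backbone(images), dim=1)
print(features.shape)                        # torch.Size([8, 2048])
```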
Category | Ref. | Main Technique(s) | Dataset and Performance | Pros/Cons
---|---|---|---|---
Contrastive loss | [43] | Unsupervised, ResNet-50 features, Siamese networks, Bayesian fusion | CUHK03, CMC 0.857 | Good dataset generalization, cross-domain, complex
 | [46] | Filter pairing neural network | CUHK03, CMC 0.206 | Poor performance, complex, not robust
 | [50] | Unsupervised, ResNet-50 features, Siamese networks, Bayesian fusion, spatial–temporal model | CUHK03, CMC 0.857 | Good dataset generalization, cross-domain, complex
 | [52] | Siamese networks, tied convolution | CUHK03, CMC 0.547 | Simple, reusable
Triplet loss | [53] | Triplet loss, pre-trained ResNet | CUHK03, CMC 0.876 | Simple to replicate, architecture imposes constraints
 | [60] | Three CNNs with shared weights, modified triplet loss | i-LIDS, CMC 0.604 | Simple to train, scalable, efficient
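The triplet-loss entries above share the same core objective: pull same-identity embeddings together while pushing different identities apart by a margin. The sketch below is an illustrative batch-hard variant in the spirit of these methods; tensor names and the margin value are assumptions, not the cited implementations.

```python
# Minimal sketch of a batch-hard triplet loss over a batch of identity-labelled embeddings.
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """embeddings: (N, D) L2-normalised features; labels: (N,) identity ids."""
    dist = torch.cdist(embeddings, embeddings, p=2)           # pairwise distances (N, N)
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)       # same-identity mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    # Hardest positive: farthest sample with the same identity (excluding self)
    pos_dist = dist.masked_fill(~same_id | eye, float('-inf')).max(dim=1).values
    # Hardest negative: closest sample with a different identity
    neg_dist = dist.masked_fill(same_id, float('inf')).min(dim=1).values

    return torch.relu(pos_dist - neg_dist + margin).mean()

# Toy usage: 4 embeddings, 2 identities
emb = torch.nn.functional.normalize(torch.randn(4, 128), dim=1)
ids = torch.tensor([0, 0, 1, 1])
print(batch_hard_triplet_loss(emb, ids))
```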
Category | Ref. | Main Technique(s) | Dataset and Performance | Pros/Cons
---|---|---|---|---
3D CNN | [64] | 3D CNN, attention, triplet loss | MARS, mAP 0.834 | Simple to replicate, reusable, state-of-the-art
 | [66] | 3D two-stream CNN, residual attention | MARS, mAP 0.740 | Replicable, state-of-the-art
RNN | [6] | LSTM, LBP features | iLIDS-VID, Rank-1 0.493 | Not robust, dated
 | [67] | LSTM, CNN features | MARS, Rank-1 0.706 | Simple, replicable
Category | Ref. | Main Technique(s) | Dataset and Performance | Pros/Cons
---|---|---|---|---
GAN | [68] | GAN, pre-trained ResNet-50, pose generator | Market-1501, CMC 0.905 | Uses GAN, hard to train, state-of-the-art
 | [70] | GAN, style transfer, smooth regularization | Market-1501, mAP 0.715 | Uses GAN, simple, replicable
 | [72] | GAN, joint learning, domain adaptation | Market-1501, Rank-1 0.831 | Uses GAN, complex, replicable
Others | [38] | Evaluation of techniques, pre-trained models, modified triplet loss | Market-1501, CMC 0.945 | Simple to reuse, explanatory
Dataset | # Identities | # Cameras | # Images | Label Method | Size | Tracking Sequences
---|---|---|---|---|---|---
NGSIM | - | - | - | - | - | -
KITTI | - | - | - | R-CNN | - | Yes
UA-DETRAC | 825 | 24 | 1.21 M | Manual | - | Yes
VehicleID | 26,267 | - | 221 k | Manual | Vary | Yes
VeRi-776 | 776 | 18 | 50 k | Manual | Vary | Yes
CompCar | 1687 | - | 18 k | Manual | Vary | Yes
PKU-Vehicle | - | - | 18 M | Manual | Cropped | Yes
MOT20-03 | 735 | - | 356 k | R-CNN | - | Yes
MOT16 | - | - | 476 k | R-CNN | - | Yes
TRANCOS | 46,796 | - | 58 M | HOG | Vary | Yes
WebCamT | - | 212 | 60 k | - | - | No
Category | Ref. | Main Technique(s) | Dataset and Performance | Pros/Cons
---|---|---|---|---
Triplet | [82] | CNN features, group-sensitive triplet embedding | VehicleID, mAP 0.743 | Reproducible, ranking problems
 | [84] | VGG features, triplet sampling method | VeRi-776, mAP 0.574 | Reproducible, robust
Contrastive loss | [85] | Siamese neural network, spatial–temporal similarity | VeRi-776, mAP 0.2777 | Simple, not robust
 | [86] | Siamese-RPN, switcher-aware classification (SAC) | MOT16, CLEAR 0.712 | Complex, handles trajectory ID switches
Category | Ref. | Main Technique(s) | Dataset and Performance | Pros/Cons
---|---|---|---|---
LSTM | [88] | Spatially concatenated CNN, CNN-LSTM bi-directional loop | VeRi-776, mAP 0.1813 | Simple, applicable
 | [89] | CNN features, FCN-rLSTM network with atrous convolution | TRANCOS, MAE 4.21 | Reproducible, ranking problems
 | [90] | ResNet-50, LSTM, DBSCAN clustering | AI City, MAE 0.730 | Complex, trajectory problems
3D | [93] | Mask R-CNN, 3D convolutional layers | KITTI, MOTS 0.669 | Robust, handles short-term ID switches
Category | Ref. | Main Technique(s) | Dataset and Performance | Pros/Cons
---|---|---|---|---
Fusion | [95] | SVM, HOG | Own dataset, - | Not replicable
 | [96] | GoogLeNet, feature fusion | VeRi-776, mAP 0.1992 | Simple, baseline
 | [98] | SIFT + BoW, re-ranking | INRIA, mAP 0.762 | Old-fashioned, not state-of-the-art
 | [100] | Social Affinity Maps (SAM), Markov-chain model | Own dataset, - | Complex, indoors only
 | [101] | 3D box prediction, ResNet | BoxCars116k, Acc 0.808 | Simple, of limited usefulness
Category | Ref. | Main Technique(s) | Dataset and Performance | Pros/Cons
---|---|---|---|---
Tracking | [102] | 3D reconstruction, 2D optical flow | KITTI, MOTA 0.848 | Robust, handles short-term ID switches
 | [105] | CNN features, Kalman filter and data association | - | Robust, handles short-term ID switches
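The tracking entries above follow the tracking-by-detection pattern, in which motion prediction (e.g., a Kalman filter) is combined with an appearance-based association step that matches existing tracks to new detections. The sketch below illustrates only that association step with cosine distance and the Hungarian algorithm; the function name, the gating threshold, and the assumption of L2-normalised features are illustrative, not the exact implementation of the cited works.

```python
# Minimal sketch of appearance-based track-to-detection association.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_features, det_features, max_cosine_dist=0.3):
    """track_features: (T, D); det_features: (M, D); rows are L2-normalised."""
    cost = 1.0 - track_features @ det_features.T        # cosine distance matrix (T, M)
    rows, cols = linear_sum_assignment(cost)             # optimal one-to-one matching
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cosine_dist]
    unmatched_tracks = set(range(len(track_features))) - {r for r, _ in matches}
    unmatched_dets = set(range(len(det_features))) - {c for _, c in matches}
    return matches, unmatched_tracks, unmatched_dets

# Toy usage: 3 tracks, 2 detections (unit feature vectors)
tracks = np.eye(3, 4)
dets = np.eye(2, 4)
print(associate(tracks, dets))   # ([(0, 0), (1, 1)], {2}, set())
```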
Reference | Main Techniques | Dataset and Performance | Pros/Cons
---|---|---|---
[107] | GANs, conditional GAN | UCID, PSNR 24.34 | Robust, state-of-the-art
[113] | Bilateral filter, image decomposition | Synthetic, VIF 0.60 | Simple, parameter dependent
[114] | GMM, image decomposition | Synthetic, SSIM 0.880 | Simple, dependent on pre-training
[115] | CNN, high-frequency (HF) component layer | Synthetic, SSIM 0.900 | Simple, robust
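The image-enhancement results above are evaluated with full-reference quality metrics (PSNR, SSIM, VIF). As a point of reference, the sketch below computes PSNR only, since it follows directly from the mean squared error; SSIM and VIF require their respective reference implementations. Function and variable names are illustrative.

```python
# Minimal sketch: peak signal-to-noise ratio between a reference and a restored image.
import numpy as np

def psnr(reference, restored, max_val=255.0):
    """PSNR in dB for two images of equal shape, assuming 8-bit intensity range."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')                  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy usage
img = np.random.randint(0, 256, (64, 64, 3))
print(psnr(img, img))                        # inf for identical inputs
```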
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).