Focus on the Visible Regions: Semantic-Guided Alignment Model for Occluded Person Re-Identification
Abstract
1. Introduction
- We propose an automatic cropping method for the occluded Re-ID problem. It automatically crops the occluded regions of pedestrian images and retains the visible regions, avoiding the inefficiency and human bias of manual cropping. This method can be embedded in any occluded Re-ID model;
- We propose a semantic-guided alignment model (SGAM) that uses semantic information to guide the model to extract local features from pedestrians' visible regions and to focus only on the visible areas shared between images during the matching stage, thus significantly suppressing the interference noise caused by spatial misalignment and unshared regions;
- We conducted extensive experiments on a series of public Re-ID datasets [9,12,13,14,15,16] to verify the effectiveness of SGAM. Experimental results demonstrate that our model outperforms previous occluded Re-ID methods [8,9,10,11,12], and it still achieves competitive performance on the holistic Re-ID problem. Ablation results further confirm that SGAM has strong matching capability and that the proposed strategies can be easily embedded into other person re-identification methods.
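The matching idea behind the second contribution — measuring distance only over the body parts visible in both images — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `public_distance`, the per-part cosine distance, and the averaging scheme are assumptions for exposition.

```python
import numpy as np

def public_distance(parts_a, vis_a, parts_b, vis_b):
    """Distance between two pedestrian images computed only over the
    body parts visible in BOTH images (the shared, 'public' regions).

    parts_*: (P, D) array of per-part local features.
    vis_*:   (P,) boolean array, True where the part is visible.
    """
    shared = vis_a & vis_b                 # parts visible in both images
    if not shared.any():                   # no shared region: incomparable
        return float("inf")
    d = 0.0
    for p in np.flatnonzero(shared):
        fa = parts_a[p] / (np.linalg.norm(parts_a[p]) + 1e-12)
        fb = parts_b[p] / (np.linalg.norm(parts_b[p]) + 1e-12)
        d += 1.0 - float(fa @ fb)          # cosine distance for this part
    return d / int(shared.sum())           # average over shared parts only
```

Restricting the sum to the shared parts is what keeps an occluded region in either image from injecting misalignment noise into the match score.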
2. Related Work
2.1. Deep Person Re-Identification
2.2. Partial Person Re-Identification
2.3. Semantic-Guided Person Re-Identification
3. Materials and Methods
3.1. Automatic Cropping Strategy
3.2. Semantic-Guided Feature Extraction Network
3.3. Public Distance Measurement Strategy
3.4. Training SGAM
4. Results
4.1. Datasets and Evaluation Measures
4.2. Implementation Details
4.3. Results Comparison
4.4. Ablation Study
4.5. Visualization
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Hermans, A.; Beyer, L.; Leibe, B. In defense of the triplet loss for person re-identification. arXiv 2017, arXiv:1703.07737. Available online: https://arxiv.org/abs/1703.07737 (accessed on 21 November 2017).
- Sun, Y.; Zheng, L.; Yang, Y.; Tian, Q.; Wang, S. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 480–496.
- Varior, R.R.; Shuai, B.; Lu, J.; Xu, D.; Wang, G. A Siamese Long Short-Term Memory Architecture for Human Re-identification. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 135–153.
- Zhao, L.; Li, X.; Zhuang, Y.; Wang, J. Deeply-Learned Part-Aligned Representations for Person Re-identification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3239–3248.
- Li, W.; Zhu, X.; Gong, S. Harmonious Attention Network for Person Re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 2285–2294.
- Li, W.; Zhao, R.; Xiao, T.; Wang, X. DeepReID: Deep Filter Pairing Neural Network for Person Re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 152–159.
- Liu, X.; Zhao, H.; Tian, M.; Sheng, L.; Shao, J.; Yi, S.; Yan, J.; Wang, X. HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 350–359.
- He, L.; Liang, J.; Li, H.; Sun, Z. Deep Spatial Feature Reconstruction for Partial Person Re-identification: Alignment-free Approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7073–7082.
- Zheng, W.; Li, X.; Xiang, T.; Liao, S.; Lai, J.; Gong, S. Partial Person Re-Identification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4678–4686.
- He, L.; Sun, Z.; Zhu, Y.; Wang, Y. Recognizing partial biometric patterns. arXiv 2018, arXiv:1810.07399. Available online: https://arxiv.org/abs/1810.07399v1 (accessed on 17 October 2018).
- Sun, Y.; Xu, Q.; Li, Y.; Zhang, C.; Li, Y.; Wang, S.; Sun, J. Perceive Where to Focus: Learning Visibility-aware Part-level Features for Partial Person Re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 393–402.
- Miao, J.; Wu, Y.; Liu, P.; Ding, Y.; Yang, Y. Pose-Guided Feature Alignment for Occluded Person Re-Identification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 542–551.
- Zheng, W.; Gong, S.; Xiang, T. Person re-identification by probabilistic relative distance comparison. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 20–25 June 2011; pp. 649–656.
- Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; Tian, Q. Scalable Person Re-identification: A Benchmark. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1116–1124.
- Zheng, Z.; Zheng, L.; Yang, Y. Unlabeled Samples Generated by GAN Improve the Person Re-Identification Baseline in Vitro. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3774–3782.
- Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C. Performance Measures and a Data Set for Multi-target, Multi-camera Tracking. arXiv 2016, arXiv:1609.01775. Available online: https://arxiv.org/abs/1609.01775 (accessed on 19 September 2016).
- Wei, L.; Zhang, S.; Yao, H.; Gao, W.; Tian, Q. GLAD: Global-Local-Alignment Descriptor for Scalable Person Re-Identification. IEEE Trans. Multimed. 2018, 21, 986–999.
- Wang, G.; Yuan, Y.; Li, J.; Ge, S.; Zhou, X. Receptive Multi-Granularity Representation for Person Re-Identification. IEEE Trans. Image Process. 2020, 29, 6096–6109.
- Kalayeh, M.M.; Basaran, E.; Gökmen, M.; Kamasak, M.E.; Shah, M. Human Semantic Parsing for Person Re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 1062–1071.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Zhang, Z.; Lan, C.; Zeng, W.; Chen, Z. Densely Semantically Aligned Person Re-Identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 667–676.
- Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. Available online: https://arxiv.org/abs/1706.05587 (accessed on 5 December 2017).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random erasing data augmentation. arXiv 2017, arXiv:1708.04896. Available online: https://arxiv.org/abs/1708.04896 (accessed on 16 November 2017).
- Gong, K.; Liang, X.; Zhang, D.; Shen, X.; Lin, L. Look into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6757–6765.
- Liao, S.; Hu, Y.; Zhu, X.; Li, S.Z. Person Re-identification by Local Maximal Occurrence Representation and Metric Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2197–2206.
- Yu, Q.; Chang, X.; Song, Y.; Xiang, T.; Hospedales, T.M. The devil is in the middle: Exploiting mid-level representations for cross-domain instance matching. arXiv 2017, arXiv:1711.08106. Available online: https://arxiv.org/abs/1711.08106 (accessed on 4 April 2018).
- Huang, H.; Li, D.; Zhang, Z.; Chen, X.; Huang, K. Adversarially Occluded Samples for Person Re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 5098–5107.
- Suh, Y.; Wang, J.; Tang, S.; Mei, T.; Lee, K.M. Part-Aligned Bilinear Representations for Person Re-identification. arXiv 2018, arXiv:1804.07094. Available online: https://arxiv.org/abs/1804.07094 (accessed on 19 April 2018).
- Ge, Y.; Li, Z.; Zhao, H.; Yin, G.; Yi, S.; Wang, X. Fd-gan: Pose-guided feature distilling gan for robust person re-identification. arXiv 2018, arXiv:1810.02936. Available online: https://arxiv.org/abs/1810.02936 (accessed on 12 December 2018).
- Liao, S.; Jain, A.K.; Li, S.Z. Partial Face Recognition: Alignment-Free Approach. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1193–1205.
- Sun, Y.; Zheng, L.; Deng, W.; Wang, S. SVDNet for Pedestrian Retrieval. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3820–3828.
- Zheng, Z.; Zheng, L.; Yang, Y. Pedestrian Alignment Network for Large-scale Person Re-Identification. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 3037–3045.
- Li, W.; Zhu, X.; Gong, S. Person re-identification by deep joint learning of multi-loss classification. arXiv 2018, arXiv:1705.04724. Available online: https://arxiv.org/abs/1705.04724 (accessed on 23 May 2017).
- Lin, Y.; Zheng, L.; Zheng, Z.; Wu, Y.; Hu, Z.; Yan, C.; Yang, Y. Improving person re-identification by attribute and identity learning. Pattern Recognit. 2019, 95, 151–161.
- Chen, Y.; Zhu, X.; Gong, S. Person Re-identification by Deep Learning Multi-Scale Representations. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22–29 October 2017; pp. 2590–2600.
- Chang, X.; Hospedales, T.M.; Xiang, T. Multi-level Factorisation Net for Person Re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 2109–2118.
Methods | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | mAP (%) |
---|---|---|---|---|
DIM [27] | 21.5 | 36.1 | 42.8 | 14.4 |
LOMO+XQDA [28] | 8.1 | 17.0 | 22.0 | 5.0 |
Part Aligned [4] | 28.8 | 44.6 | 51.0 | 20.2 |
Random Erasing [25] | 40.5 | 59.6 | 66.8 | 30.0 |
HACNN [5] | 34.4 | 51.9 | 59.4 | 26.0 |
Adver Occluded [29] | 44.5 | - | - | 32.2 |
Part Bilinear [30] | 36.9 | - | - | - |
FD-GAN [31] | 40.8 | - | - | - |
PCB [2] | 42.6 | 57.1 | 62.9 | 33.7 |
DSR [8] | 40.8 | 58.2 | 65.2 | 30.4 |
SFR [10] | 42.3 | 60.3 | 67.3 | 32.0 |
PGFA [12] | 51.4 | 68.6 | 74.9 | 37.3 |
SGAM | 55.1 | 68.7 | 74.0 | 35.3 |
Methods | Partial-REID Rank-1 (%) | Partial-REID Rank-3 (%) | Partial-iLIDS Rank-1 (%) | Partial-iLIDS Rank-3 (%) |
---|---|---|---|---|
MTRC [32] | 23.7 | 27.3 | 17.7 | 26.1 |
AMC+SWM [9] | 37.3 | 46.0 | 21.0 | 32.8 |
DSR [8] | 50.7 | 70.0 | 58.8 | 67.2 |
SFR [10] | 56.9 | 78.5 | 63.9 | 74.8 |
VPM [11] | 67.7 | 81.9 | 65.5 | 74.8 |
PGFA [12] | 68.0 | 80.0 | 69.1 | 80.9 |
SGAM | 74.3 | 82.3 | 70.6 | 82.4 |
Methods | Market1501 Rank-1 (%) | Market1501 mAP (%) | DukeMTMC-reID Rank-1 (%) | DukeMTMC-reID mAP (%) |
---|---|---|---|---|
BoW+kissme [16] | 44.4 | 20.8 | 25.1 | 12.2 |
SVDNet [33] | 82.3 | 62.1 | 76.7 | 56.8 |
PAN [17] | 82.8 | 63.4 | 71.7 | 51.5 |
PAR [4] | 81.0 | 63.4 | - | - |
Pedestrian [34] | 82.0 | 63.0 | - | - |
DSR [8] | 83.5 | 64.2 | - | - |
MultiLoss [35] | 83.9 | 64.4 | - | - |
TripletLoss [1] | 84.9 | 69.1 | - | - |
Adver Occluded [29] | 86.5 | 78.3 | 79.1 | 62.1 |
APR [36] | 87.0 | 66.9 | 73.9 | 55.6 |
MultiScale [37] | 88.9 | 73.1 | 79.2 | 60.6 |
MLFN [38] | 90.0 | 74.3 | 81.0 | 62.8 |
PCB [2] | 92.4 | 77.3 | 81.9 | 65.3 |
PGFA [12] | 91.2 | 76.8 | 82.6 | 65.5 |
VPM [11] | 93.0 | 80.8 | 83.6 | 72.6 |
SGAM | 91.4 | 77.6 | 83.5 | 67.3 |
Methods | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | mAP (%) |
---|---|---|---|---|
SGAM | 55.1 | 68.7 | 74.0 | 35.3 |
SGAM-1 | 47.6 | 58.0 | 63.1 | 29.2 |
SGAM-2 | 51.8 | 62.8 | 68.1 | 32.2 |
Method | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | mAP (%) |
---|---|---|---|---|
ResNet50 | 39.5 | 57.2 | 63.7 | 27.2 |
ResNet50+crop | 48.2 | 65.9 | 73.1 | 32.5 |
SGAM (no crop) | 44.8 | 61.3 | 68.6 | 35.0 |
SGAM | 55.1 | 68.7 | 74.0 | 35.3 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Yang, Q.; Wang, P.; Fang, Z.; Lu, Q. Focus on the Visible Regions: Semantic-Guided Alignment Model for Occluded Person Re-Identification. Sensors 2020, 20, 4431. https://doi.org/10.3390/s20164431