FCL: Pedestrian Re-Identification Algorithm Based on Feature Fusion Contrastive Learning
Abstract
1. Introduction
- We propose a pedestrian re-identification algorithm, namely, Feature Fusion Contrastive Learning (FCL). With a proposed feature fusion pooling method, FCL learns a more discriminative distribution of feature representations across pedestrian images (an illustrative sketch follows this list).
- To address the imbalance between positive and negative samples, we introduce FocalLoss, which computes the contrastive loss between sample features and their corresponding cluster centers.
- To validate the effectiveness of our network algorithm, we conducted experiments on three benchmark datasets: Market1501, DukeMTMC-reID, and MSMT17. Through comprehensive comparisons with a series of algorithms, we demonstrate the superiority of FCL.
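As a rough illustration of the first contribution, the following minimal PyTorch sketch fuses global average pooling (global context) with global max pooling (locally salient body parts) over a backbone feature map. The module name `FusionPool`, the learnable weight `alpha`, and this particular fusion rule are illustrative assumptions; the exact fusion variants FCL compares ("channel concat", "opp", "typ", "cir" in Section 4.5) may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionPool(nn.Module):
    """Illustrative feature fusion pooling (an assumption, not the paper's exact design)."""

    def __init__(self, channels: int):
        super().__init__()
        # Learnable balance between average- and max-pooled features.
        self.alpha = nn.Parameter(torch.tensor(0.5))
        self.bn = nn.BatchNorm1d(channels)

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        # feat_map: (B, C, H, W), e.g. the last feature map of a ResNet-50 backbone.
        avg = F.adaptive_avg_pool2d(feat_map, 1).flatten(1)  # (B, C) global context
        mx = F.adaptive_max_pool2d(feat_map, 1).flatten(1)   # (B, C) salient parts
        fused = self.alpha * avg + (1.0 - self.alpha) * mx
        # L2-normalize so embeddings are comparable by cosine similarity,
        # as is standard for contrastive re-ID features.
        return F.normalize(self.bn(fused), dim=1)
```

Average pooling preserves overall context while max pooling highlights salient local responses, which is the usual motivation for fusing the two in re-ID backbones.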
2. Related Work
2.1. Pedestrian Re-Identification
2.2. Contrastive Learning
3. Feature Fusion Contrastive Learning
3.1. Overall Framework
3.2. Pseudo-Label Generation
3.3. Feature Fusion
3.4. Loss Function
3.4.1. Instance-Level Contrastive Loss
3.4.2. Cluster-Level Contrastive Loss
Inter-view loss refers to the loss between features and cluster centers under the same data augmentation. Positive samples belonging to a given pedestrian are notably scarce compared with the profusion of irrelevant samples and interfering factors, so negative samples overwhelmingly outnumber positive ones, making it difficult for the model to learn the characteristics of the positive samples. We therefore use FocalLoss to adaptively balance the proportion of positive and negative samples and to adjust each sample's contribution to the loss. The FocalLoss formula is as follows:

$$\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \log(p_t)$$

This loss introduces $(1 - p_t)^{\gamma}$ as a modulation coefficient to reduce the weight of easily distinguishable negative samples, making the model focus on hard-to-classify samples during training. Specifically, when $p_t$ tends to 1, the sample is easily distinguishable and contributes little to the loss; $(1 - p_t)^{\gamma}$ then tends to 0, reducing the weight of easily distinguishable samples. When $p_t$ is very small, the sample has been misclassified into another category, and the modulation factor tends to 1, leaving the loss calculation unaffected. Compared with the cross-entropy loss, FocalLoss does not change the loss for hard negative samples but decreases the loss for easily distinguishable samples. Through $\alpha_t$ and $\gamma$, FocalLoss can adjust the weight of positive and negative samples, as well as control the weight of hard-to-classify samples. Since FocalLoss reduces the impact of the substantial imbalance between positive and negative samples, we use it to compute the inter-view loss between each sample feature and its corresponding cluster center.
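To make the modulation term concrete, here is a minimal PyTorch sketch of a focal-weighted contrastive loss between instance features and their cluster centers. The temperature `tau`, the defaults `gamma = 2.0` and `alpha = 0.25`, the function name `focal_cluster_loss`, and the exact way FCL combines FocalLoss with the cluster-center softmax are illustrative assumptions, not the paper's verbatim formulation:

```python
import torch
import torch.nn.functional as F

def focal_cluster_loss(features: torch.Tensor,   # (B, D) L2-normalized instance features
                       centers: torch.Tensor,    # (K, D) L2-normalized cluster centers
                       labels: torch.Tensor,     # (B,) long pseudo-labels from clustering
                       tau: float = 0.05,
                       gamma: float = 2.0,
                       alpha: float = 0.25) -> torch.Tensor:
    """Focal-weighted cluster contrastive loss (a sketch under stated assumptions)."""
    # Cosine-similarity logits against every cluster center, sharpened by temperature.
    logits = features @ centers.t() / tau                              # (B, K)
    # p_t: softmax probability assigned to each sample's own cluster.
    p_t = F.softmax(logits, dim=1).gather(1, labels.view(-1, 1)).squeeze(1)
    # (1 - p_t)^gamma tends to 0 for easy samples and to 1 for misclassified ones.
    focal_weight = alpha * (1.0 - p_t) ** gamma
    return -(focal_weight * torch.log(p_t.clamp_min(1e-8))).mean()
```

With `gamma = 0` and `alpha = 1`, this reduces to the ordinary cross-entropy (InfoNCE-style) cluster loss, which is exactly the comparison with cross-entropy drawn above.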
Intra-view loss pulls together an image's representation and the representation of its augmented view. Considering the calculation cost, we compute this loss between each image representation and its corresponding cluster center rather than over all image pairs.
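As a point of reference only, a plausible InfoNCE-style form of this intra-view term, assuming L2-normalized features $f_i$, cluster centers $c_k$, pseudo-labels $y_i$, batch size $B$, $K$ clusters, and a temperature $\tau$ (these symbols are our assumptions; the paper's exact equation may differ):

$$\mathcal{L}_{\mathrm{intra}} = -\frac{1}{B}\sum_{i=1}^{B} \log \frac{\exp\left(f_i^{\top} c_{y_i}/\tau\right)}{\sum_{k=1}^{K} \exp\left(f_i^{\top} c_k/\tau\right)}$$

Comparing each feature against the $K$ cluster centers instead of all $B(B-1)$ instance pairs is what keeps the calculation cost low.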
4. Experiments
4.1. Dataset
4.2. Baselines
4.3. Experimental Settings
4.4. Overall Performance
4.5. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Li, Y.; He, J.; Zhang, T.; Liu, X.; Zhang, Y.; Wu, F. Diverse Part Discovery: Occluded Person Re-identification with Part-Aware Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021.
- Chen, X.; Fu, C.; Zhao, Y.; Zheng, F.; Song, J.; Ji, R.; Yang, Y. Salience-Guided Cascaded Suppression Network for Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
- Ming, Z.; Zhu, M.; Wang, X.; Zhu, J.; Cheng, J.; Gao, C.; Yang, Y.; Wei, X. Deep learning-based person re-identification methods: A survey and outlook of recent works. Image Vis. Comput. 2022, 119, 104394.
- Chen, Y.; Zhu, X.; Gong, S. Instance-Guided Context Rendering for Cross-Domain Person Re-Identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
- Wei, L.; Zhang, S.; Gao, W.; Tian, Q. Person Transfer GAN to Bridge Domain Gap for Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
- S, S.R.; Prasad, M.V.; Balakrishnan, R. Spatio-Temporal association rule based deep annotation-free clustering (STAR-DAC) for unsupervised person re-identification. Pattern Recognit. 2022, 122, 108287.
- Xie, J.; Zhan, X.; Liu, Z.; Ong, Y.S.; Loy, C.C. Delving into Inter-Image Invariance for Unsupervised Visual Representations. Int. J. Comput. Vis. 2022, 130, 2994–3013.
- Chen, X.; He, K. Exploring Simple Siamese Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021.
- Xuan, S.; Zhang, S. Intra-inter camera similarity for unsupervised person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 11926–11935.
- Li, M.; Li, C.G.; Guo, J. Cluster-guided asymmetric contrastive learning for unsupervised person re-identification. IEEE Trans. Image Process. 2022, 31, 3606–3617.
- Zhang, H.; Zhang, G.; Chen, Y.; Zheng, Y. Global relation-aware contrast learning for unsupervised person re-identification. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 8599–8610.
- Yu, H.X.; Wu, A.; Zheng, W.S. Unsupervised Person Re-Identification by Deep Asymmetric Metric Embedding. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 956–973.
- Xiao, T.; Wang, X.; Efros, A.A.; Darrell, T. What should not be contrastive in contrastive learning. arXiv 2020, arXiv:2008.05659.
- Jawaharlalnehru, A.; Sambandham, T.; Sekar, V.; Ravikumar, D.; Loganathan, V.; Kannadasan, R.; Khan, A.A.; Wechtaisong, C.; Haq, M.A.; Alhussen, A.; et al. Target Object Detection from Unmanned Aerial Vehicle (UAV) Images Based on Improved YOLO Algorithm. Electronics 2022, 11, 2343.
- Khobdeh, S.B.; Yamaghani, M.R.; Sareshkeh, S.K. Basketball action recognition based on the combination of YOLO and a deep fuzzy LSTM network. J. Supercomput. 2023, 80, 3528–3553.
- Sharma, N.; Haq, M.A.; Dahiya, P.K.; Marwah, B.R.; Lalit, R.; Mittal, N.; Keshta, I. Deep Learning and SVM-Based Approach for Indian Licence Plate Character Recognition. Comput. Mater. Contin. 2023, 74, 881–895.
- Hermans, A.; Beyer, L.; Leibe, B. In Defense of the Triplet Loss for Person Re-Identification. arXiv 2017, arXiv:1703.07737.
- Si, T.; He, F.; Wu, H.; Duan, Y. Spatial-driven features based on image dependencies for person re-identification. Pattern Recognit. 2022, 124, 108462.
- Wang, G.; Lai, J.; Huang, P.; Xie, X. Spatial-Temporal Person Re-identification. Proc. AAAI Conf. Artif. Intell. 2019, 33, 8933–8940.
- Ge, Y.; Zhu, F.; Chen, D.; Zhao, R.; Li, H. Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. Adv. Neural Inf. Process. Syst. 2020, 33, 11309–11321.
- Ganin, Y.; Lempitsky, V. Unsupervised Domain Adaptation by Backpropagation. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 1180–1189.
- Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD'96), Portland, OR, USA, 2–4 August 1996; pp. 226–231.
- Coates, A.; Ng, A.Y. Learning feature representations with k-means. In Neural Networks: Tricks of the Trade, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 561–580.
- Ji, H.; Wang, L.; Zhou, S.; Tang, W.; Zheng, N.; Hua, G. Meta Pairwise Relationship Distillation for Unsupervised Person Re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021.
- Lin, Y.; Xie, L.; Wu, Y.; Yan, C.; Tian, Q. Unsupervised Person Re-Identification via Softened Similarity Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
- Morabbi, S.; Soltanizadeh, H.; Mozaffari, S.; Fadaeieslam, M.J. Improving generalization in deep neural network using knowledge transformation based on fisher criterion. J. Supercomput. 2023, 79, 20899–20922.
- Ye, M.; Zhang, X.; Yuen, P.C.; Chang, S.F. Unsupervised embedding learning via invariant and spreading instance feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6210–6219.
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738.
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 1597–1607.
- Caron, M.; Misra, I.; Mairal, J.; Goyal, P.; Bojanowski, P.; Joulin, A. Unsupervised learning of visual features by contrasting cluster assignments. Adv. Neural Inf. Process. Syst. 2020, 33, 9912–9924.
- Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.; et al. Bootstrap your own latent: A new approach to self-supervised learning. Adv. Neural Inf. Process. Syst. 2020, 33, 21271–21284.
- Robinson, J.; Chuang, C.Y.; Sra, S.; Jegelka, S. Contrastive learning with hard negative samples. arXiv 2020, arXiv:2010.04592.
- Li, L.; Zhou, Z.; Wang, B.; Miao, L.; Zong, H. A novel CNN-based method for accurate ship detection in HR optical remote sensing images via rotated bounding box. IEEE Trans. Geosci. Remote Sens. 2020, 59, 686–699.
- Zhao, F.; Liao, S.; Xie, G.S.; Zhao, J.; Zhang, K.; Shao, L. Unsupervised domain adaptation with noise resistible mutual-training for person re-identification. In Proceedings of the Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; Part XI; pp. 526–544.
- Ge, Y.; Chen, D.; Li, H. Mutual mean-teaching: Pseudo label refinery for unsupervised domain adaptation on person re-identification. arXiv 2020, arXiv:2001.01526.
- Yang, F.; Zhong, Z.; Luo, Z.; Cai, Y.; Lin, Y.; Li, S.; Sebe, N. Joint noise-tolerant learning and meta camera shift adaptation for unsupervised person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4855–4864.
- Wang, M.; Lai, B.; Huang, J.; Gong, X.; Hua, X.S. Camera-aware proxies for unsupervised person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 2764–2772.
- Cho, Y.; Kim, W.J.; Hong, S.; Yoon, S.E. Part-based pseudo label refinement for unsupervised person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7308–7318.
- Zhang, X.; Ge, Y.; Qiao, Y.; Li, H. Refining pseudo labels with clustering consensus over generations for unsupervised object re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3436–3445.
- Chen, H.; Lagadec, B.; Bremond, F. ICE: Inter-instance contrastive encoding for unsupervised person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 14960–14969.
- Pang, Z.; Zhao, L.; Liu, Q.; Wang, C. Camera invariant feature learning for unsupervised person re-identification. IEEE Trans. Multimed. 2022, 25, 6171–6182.
- Li, P.; Wu, K.; Zhou, S.; Huang, Q.; Wang, J. Pseudo Labels Refinement with Intra-Camera Similarity for Unsupervised Person Re-Identification. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Kuala Lumpur, Malaysia, 8–11 October 2023; pp. 366–370.
- Cheng, S.; Chen, Y. Camera sensing unsupervised pedestrian re-recognition method guided by pseudo-label refinement. JEMI 2023, 50, 230239.
Dataset | Training IDs | Training Images | Test IDs | Test Images | Cameras
---|---|---|---|---|---
Market-1501 | 751 | 12,936 | 750 | 19,732 | 6
DukeMTMC-reID | 702 | 16,522 | 702 | 19,889 | 8
MSMT17 | 4101 | 32,621 | 3060 | 93,820 | 15
Method | Market-1501 mAP | Market-1501 Rank-1 | Market-1501 Rank-5 | DukeMTMC-reID mAP | DukeMTMC-reID Rank-1 | DukeMTMC-reID Rank-5
---|---|---|---|---|---|---
SpCL [20] (NIPS20) | 73.1% | 88.1% | 96.3% | 65.3% | 81.2% | 90.3%
NRMT [34] (ECCV20) | 71.7% | 87.8% | 94.6% | 62.2% | 77.8% | 86.9%
MMT [35] (ICLR20) | 73.8% | 89.5% | 96.0% | 62.3% | 76.3% | 87.7%
ITCS [9] (CVPR21) | 72.9% | 89.5% | 95.3% | 64.4% | 80.0% | 89.0%
CAP [37] (AAAI21) | 79.2% | 91.4% | 96.3% | 67.3% | 81.1% | 89.3%
RLCC [39] (CVPR21) | 77.7% | 90.8% | 96.3% | 69.2% | 83.2% | 91.6%
MetaCam [36] (CVPR21) | 61.7% | 83.9% | 92.3% | 53.8% | 73.8% | 84.2%
ICE [40] (ICCV21) | 82.3% | 93.8% | 97.6% | 69.6% | 83.3% | 91.5%
CIFL [41] (TMM22) | 82.3% | 93.8% | 97.6% | 69.6% | 83.3% | 91.5%
CACL [10] (TIP22) | 80.9% | 92.7% | 97.4% | 69.6% | 82.6% | 91.2%
GRACL [11] (TCSVT22) | 83.7% | 93.2% | - | - | - | -
PPLR [38] (CVPR22) | 81.5% | 92.8% | 97.1% | - | - | -
PLRIS [42] (ICIP23) | 83.2% | 93.1% | - | - | - | -
FCL | 83.7% | 93.8% | 97.9% | 70.3% | 83.3% | 92.1%
Method | MSMT17 mAP | MSMT17 Rank-1
---|---|---
SpCL [20] (NIPS20) | 26.8% | 53.7%
MMT [35] (ICLR20) | 24.0% | 50.1%
RLCC [39] (CVPR21) | 27.9% | 56.5%
LRMGFS [43] (JEMI23) | 27.4% | 28.4%
FCL | 30.8% | 58.1%
Method | Market mAP | Market Rank-1 | Market Rank-5 | Duke mAP | Duke Rank-1 | Duke Rank-5 | MSMT17 mAP | MSMT17 Rank-1 | MSMT17 Rank-5
---|---|---|---|---|---|---|---|---|---
FCL | 83.7% | 94.1% | 97.9% | 70.3% | 83.3% | 92.1% | 30.8% | 58.1% | 72.1%
FCL w/o FocalLoss | 82.4% | 93.0% | 97.5% | 69.8% | 82.9% | 91.6% | 26.5% | 54.9% | 67.9%
Method | Market mAP | Market Rank-1 | Market Rank-5 | Duke mAP | Duke Rank-1 | Duke Rank-5 | MSMT17 mAP | MSMT17 Rank-1 | MSMT17 Rank-5
---|---|---|---|---|---|---|---|---|---
FCL | 83.7% | 94.1% | 97.9% | 70.3% | 83.3% | 92.1% | 30.8% | 58.1% | 72.1%
FCL w/o Feature Fusion | 81.6% | 92.8% | 97.6% | 69.7% | 82.8% | 91.7% | 29.4% | 57.5% | 69.3%
Method | Market-1501 mAP | Rank-1 | Rank-5
---|---|---|---
FCL and channel concat | 83.4% | 93.8% | 97.6%
FCL and opp | 82.5% | 92.6% | 97.3%
Single and typ | 79.7% | 90.9% | 96.5%
Multi and typ | 82.3% | 92.9% | 97.5%
Single and cir | 82.1% | 92.3% | 97.1%
Multi and cir (FCL) | 83.7% | 94.1% | 97.9%