Dynamic Weighting Network for Person Re-Identification
Abstract
1. Introduction
- To enable a ReID model to retain the strong local-feature extraction ability of CNNs while also capturing long-range dependencies within a practical resource budget, we conducted extensive experiments investigating the feasibility and challenges of a neural network with a parallel CNN–Transformer structure for the ReID task;
- Motivated by the problems identified in these experiments, we propose the Feature Fusion Gate (FFG), which iteratively fuses CNN-based local features with Transformer-based global representations (a simplified sketch of the gating idea follows this list). We experimentally verified the general applicability of the FFG;
- We propose DWNet, a high-performance ReID framework built on the FFG that fuses local features and global representations adaptively, conditioned on the input. DWNet outperforms its original baseline on the ReID task with comparable parameter complexity and computational cost, demonstrating its potential as the backbone of a ReID model.
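As a rough illustration of the gating idea (the precise FFG design is defined in Section 3.1), the following minimal PyTorch sketch blends a CNN feature map with a Transformer feature map through a learned per-channel gate. All names and design details here are illustrative assumptions, not the paper's formulation:

```python
import torch
import torch.nn as nn

class FeatureFusionGateSketch(nn.Module):
    """Minimal sketch of a feature fusion gate: learns per-channel
    weights to blend a CNN feature map with a Transformer feature map.
    Illustrative only; the paper's FFG may differ in its details."""

    def __init__(self, channels: int):
        super().__init__()
        # Gate conditioned on both branches (a hypothetical design choice).
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        # local_feat, global_feat: (B, C, H, W), spatially aligned.
        g = self.gate(torch.cat([local_feat, global_feat], dim=1))  # (B, C, 1, 1)
        # Dynamically weighted sum of the two branches.
        return g * local_feat + (1.0 - g) * global_feat

# Usage: fuse a ResNet stage output with a (reshaped) Transformer output.
fuse = FeatureFusionGateSketch(channels=256)
x_cnn = torch.randn(2, 256, 16, 8)
x_trans = torch.randn(2, 256, 16, 8)
fused = fuse(x_cnn, x_trans)  # (2, 256, 16, 8)
```

Because the gate lies in (0, 1) per channel, the module can lean toward local CNN evidence or global Transformer evidence depending on the input, which matches the dynamic weighting the framework's name suggests.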
2. Related Work
2.1. Object ReID
2.2. Transformer in Vision
3. Methods
3.1. Feature Fusion Gate
3.2. DWNet with ResNet as the CNN Backbone (DWNet-R)
3.3. DWNet with OSNet as the CNN Backbone (DWNet-O)
4. Experiments
4.1. Datasets and Evaluation Protocol
4.2. Ablation Experiments to Verify the Effectiveness of FFG
4.3. Ablation Experiments for Structure Selection of DWNet Framework
4.4. Comparison with Baseline and Other Methods
5. Discussion
5.1. Experimental Results and Analysis
5.2. Ethical Considerations and Future Improvements for DWNet
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Khorramshahi, P.; Peri, N.; Chen, J.C.; Chellappa, R. The devil is in the details: Self-supervised attention for vehicle re-identification. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 369–386.
- Sun, Y.; Zheng, L.; Yang, Y.; Tian, Q.; Wang, S. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 480–496.
- Sun, Y.; Cheng, C.; Zhang, Y.; Zhang, C.; Zheng, L.; Wang, Z.; Wei, Y. Circle loss: A unified perspective of pair similarity optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6398–6407.
- Wang, Y.; Chen, Z.; Wu, F.; Wang, G. Person re-identification with cascaded pairwise convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1470–1478.
- Wang, G.; Lai, J.H.; Liang, W.; Wang, G. Smoothing adversarial domain attack and p-memory reconsolidation for cross-domain person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10568–10577.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Wu, B.; Xu, C.; Dai, X.; Wan, A.; Zhang, P.; Yan, Z.; Tomizuka, M.; Gonzalez, J.; Keutzer, K.; Vajda, P. Visual transformers: Token-based image representation and processing for computer vision. arXiv 2020, arXiv:2006.03677.
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning, Virtual Event, 18–24 July 2021; pp. 10347–10357.
- Yuan, L.; Chen, Y.; Wang, T.; Yu, W.; Shi, Y.; Jiang, Z.H.; Tay, F.E.; Feng, J.; Yan, S. Tokens-to-token ViT: Training vision transformers from scratch on ImageNet. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 558–567.
- Guo, J.; Han, K.; Wu, H.; Tang, Y.; Chen, X.; Wang, Y.; Xu, C. CMT: Convolutional neural networks meet vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12175–12185.
- Peng, Z.; Huang, W.; Gu, S.; Xie, L.; Wang, Y.; Jiao, J.; Ye, Q. Conformer: Local features coupling global representations for visual recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 367–376.
- Bello, I.; Zoph, B.; Vaswani, A.; Shlens, J.; Le, Q.V. Attention augmented convolutional networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3286–3295.
- Srinivas, A.; Lin, T.Y.; Parmar, N.; Shlens, J.; Abbeel, P.; Vaswani, A. Bottleneck transformers for visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16519–16529.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
- Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Zhou, K.; Yang, Y.; Cavallaro, A.; Xiang, T. Omni-scale feature learning for person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3702–3712.
- Liu, H.; Feng, J.; Qi, M.; Jiang, J.; Yan, S. End-to-end comparative attention networks for person re-identification. IEEE Trans. Image Process. 2017, 26, 3492–3506.
- Zheng, Z.; Zheng, L.; Yang, Y. A discriminatively learned CNN embedding for person reidentification. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2017, 14, 1–20.
- Zhao, L.; Li, X.; Zhuang, Y.; Wang, J. Deeply-learned part-aligned representations for person re-identification. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3219–3228.
- Suh, Y.; Wang, J.; Tang, S.; Mei, T.; Lee, K.M. Part-aligned bilinear representations for person re-identification. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 402–419.
- Wang, G.; Yuan, Y.; Chen, X.; Li, J.; Zhou, X. Learning discriminative features with multiple granularities for person re-identification. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; pp. 274–282.
- Zhu, K.; Guo, H.; Zhang, S.; Wang, Y.; Huang, G.; Qiao, H.; Liu, J.; Wang, J.; Tang, M. AAformer: Auto-aligned transformer for person re-identification. arXiv 2021, arXiv:2104.00921.
- He, S.; Luo, H.; Wang, P.; Wang, F.; Li, H.; Jiang, W. TransReID: Transformer-based object re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 15013–15022.
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814.
- Islam, M.A.; Kowal, M.; Jia, S.; Derpanis, K.G.; Bruce, N.D. Position, padding and predictions: A deeper look at position information in CNNs. arXiv 2021, arXiv:2101.12322.
- Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; Tian, Q. Scalable person re-identification: A benchmark. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1116–1124.
- Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C. Performance measures and a data set for multi-target, multi-camera tracking. In Proceedings of the Computer Vision–ECCV 2016 Workshops, Amsterdam, The Netherlands, 11–14 October 2016; pp. 17–35.
- Wei, L.; Zhang, S.; Gao, W.; Tian, Q. Person transfer GAN to bridge domain gap for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 79–88.
- Li, W.; Zhao, R.; Xiao, T.; Wang, X. DeepReID: Deep filter pairing neural network for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 152–159.
- He, L.; Liao, X.; Liu, W.; Liu, X.; Cheng, P.; Mei, T. FastReID: A PyTorch toolbox for general instance re-identification. arXiv 2020, arXiv:2006.02631.
- Chen, G.; Lin, C.; Ren, L.; Lu, J.; Zhou, J. Self-critical attention learning for person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9637–9646.
- Tay, C.P.; Roy, S.; Yap, K.H. AANet: Attribute attention network for person re-identifications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7134–7143.
- Zheng, Z.; Yang, X.; Yu, Z.; Zheng, L.; Yang, Y.; Kautz, J. Joint discriminative and generative learning for person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2138–2147.
- Dai, Z.; Chen, M.; Gu, X.; Zhu, S.; Tan, P. Batch DropBlock network for person re-identification and beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3691–3701.
- Hou, R.; Ma, B.; Chang, H.; Gu, X.; Shan, S.; Chen, X. Interaction-and-aggregation network for person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9317–9326.
| Model | Market1501 (mAP / rank-1) | DukeMTMC-reID (mAP / rank-1) | MSMT17 (mAP / rank-1) | CUHK03-L (mAP / rank-1) | CUHK03-D (mAP / rank-1) |
|---|---|---|---|---|---|
| BoTNet-50 | 80.98 / 92.04 | 71.28 / 83.38 | 47.05 / 71.48 | 65.43 / 69.38 | 62.28 / 65.82 |
| ResNet-50 | 85.23 / 94.01 | 76.33 / 84.85 | 48.80 / 74.26 | 68.28 / 69.88 | 67.05 / 67.96 |
| CNN and Transformer in parallel | 84.96 / 93.98 | 76.27 / 84.45 | 48.18 / 70.71 | 69.23 / 70.45 | 65.12 / 66.13 |
| FFG | 87.53 / 94.98 | 79.18 / 88.48 | 50.03 / 75.36 | 70.38 / 72.98 | 68.43 / 70.92 |
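For context, the mAP and rank-1 figures in these tables follow the standard ReID evaluation protocol: rank-1 asks whether the closest gallery match shares the query identity, and mAP averages the precision achieved at each correct match in the ranked gallery. A minimal sketch for a single query (illustrative code, not the authors' evaluation script; the full protocol additionally filters out same-camera and junk gallery entries):

```python
import numpy as np

def reid_metrics_single_query(dist: np.ndarray, gallery_ids: np.ndarray, query_id: int):
    """Rank-1 accuracy and average precision for one query, given
    distances to every gallery image and the gallery identity labels."""
    order = np.argsort(dist)                    # closest gallery items first
    matches = gallery_ids[order] == query_id    # True where identity matches
    rank1 = float(matches[0])                   # 1 if the top hit is correct
    hits = np.flatnonzero(matches)              # ranked positions of correct hits
    if hits.size == 0:
        return rank1, 0.0
    # Precision at each correct hit, averaged over all correct hits.
    precisions = np.arange(1, hits.size + 1) / (hits + 1)
    return rank1, float(precisions.mean())

# mAP is the mean of the per-query AP values; rank-1 is the mean of rank1.
```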
| Model | Market1501 (mAP / rank-1) | DukeMTMC-reID (mAP / rank-1) | MSMT17 (mAP / rank-1) | CUHK03-L (mAP / rank-1) | CUHK03-D (mAP / rank-1) |
|---|---|---|---|---|---|
| Replaces the third layer | 85.68 / 94.30 | 77.01 / 85.86 | 48.60 / 72.00 | 68.18 / 70.01 | 66.82 / 69.01 |
| Replaces the fourth layer | 87.53 / 94.98 | 79.18 / 88.48 | 50.03 / 75.36 | 70.38 / 72.98 | 68.43 / 70.92 |
| ResNet-50 | 85.23 / 94.01 | 76.33 / 84.85 | 48.80 / 74.26 | 67.28 / 69.01 | 65.23 / 68.32 |
| Model | Market1501 (mAP / rank-1) | DukeMTMC-reID (mAP / rank-1) | MSMT17 (mAP / rank-1) | CUHK03-L (mAP / rank-1) | CUHK03-D (mAP / rank-1) |
|---|---|---|---|---|---|
| (0,0,0) | 84.9 / 94.8 | 73.5 / 88.6 | 52.9 / 78.7 | - / - | 67.8 / 72.3 |
| (1,0,0) | 86.83 / 95.9 | 78.68 / 89.10 | 55.66 / 78.96 | 71.96 / 74.00 | 68.81 / 71.25 |
| (0,1,0) | 86.16 / 93.94 | 77.36 / 88.55 | 54.47 / 78.58 | 70.78 / 73.25 | 68.52 / 71.05 |
| (0,0,1) | 86.27 / 94.39 | 77.48 / 89.18 | 54.42 / 78.67 | 71.02 / 73.78 | 68.63 / 71.11 |
| (1,1,0) | 86.31 / 94.45 | 76.92 / 88.02 | 53.51 / 77.74 | 71.13 / 73.84 | 68.72 / 71.24 |
| (1,0,1) | 86.15 / 94.63 | 77.02 / 87.93 | 53.43 / 77.61 | 70.38 / 72.98 | 68.43 / 70.92 |
| (0,1,1) | 86.38 / 94.54 | 77.19 / 87.75 | 54.00 / 78.28 | 71.22 / 73.86 | 68.77 / 71.29 |
| (1,1,1) | 86.28 / 94.21 | 76.96 / 88.38 | 53.01 / 77.67 | 71.16 / 73.88 | 68.66 / 71.01 |
| Model | Market1501 (mAP / rank-1) | DukeMTMC-reID (mAP / rank-1) | MSMT17 (mAP / rank-1) | CUHK03-L (mAP / rank-1) | CUHK03-D (mAP / rank-1) |
|---|---|---|---|---|---|
| DWNet-R | 87.53 / 94.98 | 79.18 / 88.48 | 50.03 / 75.36 | 70.38 / 72.98 | 68.43 / 70.92 |
| ResNet | 85.23 / 94.01 | 76.33 / 84.85 | 48.80 / 74.26 | 67.28 / 69.01 | 65.23 / 68.32 |
| DWNet-O | 86.83 / 95.9 | 78.68 / 89.10 | 55.66 / 78.96 | 71.96 / 74.00 | 68.81 / 71.25 |
| OSNet | 84.9 / 94.8 | 73.5 / 88.6 | 52.9 / 78.7 | - / - | 67.8 / 72.3 |
| Model | Market1501 (mAP / rank-1) | DukeMTMC-reID (mAP / rank-1) | MSMT17 (mAP / rank-1) | CUHK03-L (mAP / rank-1) | CUHK03-D (mAP / rank-1) |
|---|---|---|---|---|---|
| PCB [10] | 81.6 / 93.8 | 69.2 / 83.3 | 40.4 / 68.2 | - / - | 57.5 / 63.7 |
| AANet [34] | 83.4 / 93.9 | 74.3 / 87.7 | - / - | - / - | - / - |
| DGNet [35] | 86.0 / 94.8 | 73.5 / 88.6 | 52.3 / 77.2 | - / - | - / - |
| BDB [36] | 86.7 / 95.3 | 76.0 / 89.0 | - / - | 71.7 / 73.6 | 69.3 / 72.8 |
| OSNet [18] | 84.9 / 94.8 | 73.5 / 88.6 | 52.9 / 78.7 | - / - | 67.8 / 72.3 |
| MGN [23] | 86.9 / 95.7 | 78.4 / 88.7 | 52.1 / 76.9 | 68.0 / 67.4 | 66.8 / 66.0 |
| IANet [37] | 83.1 / 94.4 | 73.4 / 87.1 | 46.8 / 75.5 | - / - | - / - |
| SCAL [33] | 89.3 / 95.8 | 79.6 / 89.0 | - / - | 72.3 / 74.8 | 68.6 / 71.1 |
| DWNet-R | 87.53 / 94.98 | 79.18 / 88.48 | 50.03 / 75.36 | 70.38 / 72.98 | 68.43 / 70.92 |
| DWNet-O | 86.83 / 95.9 | 78.68 / 89.10 | 55.66 / 78.96 | 71.96 / 74.00 | 68.81 / 71.25 |
| Model | Total Params | Total Memory | Total MAdd | Total FLOPs | Total MemR+W |
|---|---|---|---|---|---|
| ResNet | 23,508,032 | 97.75 MB | 8.14 GMAdd | 4.08 GFLOPs | 252.05 MB |
| DWNet-R | 55,782,720 | 112.77 MB | 16.2 GMAdd | 8.11 GFLOPs | 399.69 MB |
| OSNet | 1,905,828 | 104.69 MB | 1.99 GMAdd | 1.0 GFLOPs | 214.96 MB |
| DWNet-O | 1,976,372 | 115.76 MB | 2.19 GMAdd | 1.11 GFLOPs | 234.33 MB |
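The column names in this table match the output format of the torchstat profiler. Assuming a tool of that kind was used, figures of this form can be reproduced roughly as follows; the ResNet-50 backbone, the 751-class head, and the 256×128 input resolution are assumptions based on common ReID settings, not details confirmed by the table:

```python
import torchvision.models as models
from torchstat import stat  # pip install torchstat

# Profile a plain ResNet-50 at a typical ReID input resolution.
# torchstat reports total params, memory, MAdd, FLOPs, and MemR+W,
# i.e., the same columns as the table above.
model = models.resnet50(num_classes=751)  # 751 = Market1501 train identities
stat(model, (3, 256, 128))  # input size as (channels, height, width)
```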