Two-Path Spatial-Temporal Feature Fusion and View Embedding for Gait Recognition
Abstract
1. Introduction
2. Related Work
3. Proposed Method
3.1. System Overview
3.2. Two-Path Spatial-Temporal Feature Fusion Module
3.2.1. Multi-Scale Feature Extraction (MSFE)
3.2.2. Frame-Level Spatial Feature Extraction (FLSFE)
3.2.3. Multi-Scale Temporal Feature Extraction (MSTFE)
3.3. View Embedding Module
3.3.1. View Prediction
3.3.2. HPP with View Embedding
3.4. Joint Losses
4. Experiments
4.1. Datasets
4.2. Implementation Details
4.3. Comparison with State-of-the-Art Methods
4.4. Ablation Study
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Liu, J.; Zheng, N. Gait History Image: A Novel Temporal Template for Gait Recognition. In Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, Beijing, China, 2–5 July 2007. [Google Scholar]
- Singh, S.; Biswas, K.K. Biometric Gait Recognition with Carrying and Clothing Variants. In Proceedings of the Third International Conference on Pattern Recognition and Machine Intelligence, New Delhi, India, 16–20 December 2009. [Google Scholar]
- Huang, S.; Elgammal, A.; Lu, J.; Yang, D. Cross-speed Gait Recognition Using Speed-Invariant Gait Templates and Globality–Locality Preserving Projections. IEEE Trans. Inf. Forensics Secur. 2015, 10, 2071–2083. [Google Scholar] [CrossRef]
- Chao, H.; He, Y.; Zhang, J.; Feng, J. Gaitset: Regarding Gait as A Set for Cross-View Gait Recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019. [Google Scholar]
- Hou, S.; Cao, C.; Liu, X.; Huang, Y. Gait Lateral Network: Learning Discriminative and Compact Representations for Gait Recognition. In Proceedings of the European Conference on Computer Vision, Edinburgh, UK, 23–28 August 2020. [Google Scholar]
- Chen, Y.; Zhao, Y.; Li, X. Spatio-Temporal Gait Feature with Adaptive Distance Alignment. arXiv 2022, arXiv:2203.03376v3. [Google Scholar]
- Fan, C.; Peng, Y.; Cao, C.; Liu, X. Gaitpart: Temporal Part-Based Model for Gait Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–20 June 2020. [Google Scholar]
- Huang, Z.; Xue, D.; Shen, X.; Tian, X. 3D Local Convolutional Neural Networks for Gait Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021. [Google Scholar]
- Liao, R.; Cao, C.; Garcia, E.B.; Yu, S. Pose-based Temporal-Spatial Network (PTSN) for Gait Recognition with Carrying and Clothing Variations. In Proceedings of the Chinese Conference on Biometric Recognition, Shenzhen, China, 28–29 October 2017. [Google Scholar]
- Liao, R.; Yu, S.; An, W.; Huang, Y. A Model-based Gait Recognition Method with Body Pose and Human Prior Knowledge. Pattern Recognit. 2020, 98, 107069. [Google Scholar] [CrossRef]
- Shiraga, K.; Makihara, Y.; Muramatsu, D.; Echigo, T.; Yagi, Y. Geinet: View-Invariant Gait Recognition Using a Convolutional Neural Network. In Proceedings of the 2016 International Conference on Biometrics, Halmstad, Sweden, 13–16 June 2016. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar]
- Wang, Y.; Sun, J.; Li, J.; Zhao, D. Gait Recognition Based on 3D Skeleton Joints Captured by Kinect. In Proceedings of the 2016 IEEE International Conference on Image Processing, Phoenix, AZ, USA, 25–28 September 2016. [Google Scholar]
- Cao, Z.; Simon, T.; Wei, S.E.; Sheikh, Y. Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Guler, R.A.; Neverova, N.; Kokkinos, I. DensePose: Dense Human Pose Estimation in the Wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Ben, X.; Zhang, P.; Lai, Z.; Yan, R.; Zhai, X. A General Tensor Representation Framework for Cross-View Gait Recognition. Pattern Recognit. 2019, 90, 87–98. [Google Scholar] [CrossRef]
- Fan, C.; Liang, J.; Shen, C.; Hou, S.; Huang, Y. OpenGait: Revisiting Gait Recognition Towards Better Practicality. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023. [Google Scholar]
- Zhang, C.; Liu, W.; Ma, H.; Fu, H. Siamese Neural Network Based Gait Recognition for Human Identification. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, 20–25 March 2016. [Google Scholar]
- Wu, Z.; Huang, Y.; Wang, L.; Wang, X.; Tan, T. A Comprehensive Study on Cross-view Gait Based Human Identification with Deep CNNs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 209–226. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Y.; Huang, Y.; Yu, S.; Wang, L. Cross-view Gait Recognition by Discriminative Feature Learning. IEEE Trans. Image Process. 2019, 29, 1001–1015. [Google Scholar] [CrossRef] [PubMed]
- Qin, H.; Chen, Z.; Guo, Q.; Wu, Q.J.; Lu, M. RPNet: Gait Recognition with Relationships Between Each Body-Parts. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 2990–3000. [Google Scholar] [CrossRef]
- Lin, B.; Zhang, S.; Yu, X. Gait Recognition via Effective Global-Local Feature Representation and Local Temporal Aggregation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021. [Google Scholar]
- Wang, M.; Lin, B.; Guo, X.; Li, L.; Zhu, Z. GaitStrip: Gait Recognition via Effective Strip-Based Feature Representations and Multi-Level Framework. In Proceedings of the Asian Conference on Computer Vision, Macau, China, 4–8 December 2022. [Google Scholar]
- Huang, X.; Zhu, D.; Wang, H.; Wang, X.; Yang, B. Context-Sensitive Temporal Feature Learning for Gait Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021. [Google Scholar]
- Wu, H.; Tian, J.; Fu, Y.; Li, B.; Li, X. Condition-Aware Comparison Scheme for Gait Recognition. IEEE Trans. Image Process. 2020, 30, 2734–2744. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Z.; Tran, L.; Yin, X.; Atoum, Y.; Liu, X.; Wan, J.; Wang, N. Gait Recognition via Disentangled Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Ariyanto, G.; Nixon, M.S. Marionette Mass-spring Model for 3D Gait Biometrics. In Proceedings of the 2012 5th IAPR International Conference on Biometrics, New Delhi, India, 29 March–1 April 2012. [Google Scholar]
- Ariyanto, G.; Nixon, M.S. Model-Based 3D Gait Biometrics. In Proceedings of the 2011 International Joint Conference on Biometrics, Washington, DC, USA, 11–13 October 2011. [Google Scholar]
- Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning Spatiotemporal Features with 3D Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015. [Google Scholar]
- Wolf, T.; Babaee, M.; Rigoll, G. Multi-View Gait Recognition Using 3D Convolutional Neural Networks. In Proceedings of the 2016 IEEE International Conference on Image Processing, Phoenix, AZ, USA, 25–28 September 2016. [Google Scholar]
- Lin, B.; Zhang, S.; Bao, F. Gait Recognition with Multiple-Temporal-Scale 3D Convolutional Neural Network. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020. [Google Scholar]
- He, Y.; Zhang, J.; Shan, H.; Wang, L. Multi-Task GANs for View-Specific Feature Learning in Gait Recognition. IEEE Trans. Inf. Forensics Secur. 2018, 14, 102–113. [Google Scholar] [CrossRef]
- Chai, T.; Mei, X.; Li, A.; Wang, Y. Silhouette-Based View-Embeddings for Gait Recognition under Multiple Views. In Proceedings of the IEEE International Conference on Image Processing, Anchorage, AK, USA, 19–22 September 2021. [Google Scholar]
- Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B. Recent Advances in Convolutional Neural Networks. Pattern Recognit. 2018, 77, 354–377. [Google Scholar] [CrossRef]
- Huo, Y.; Gang, S.; Guan, C. FCIHMRT: Feature Cross-Layer Interaction Hybrid Method Based on Res2Net and Transformer for Remote Sensing Scene Classification. Electronics 2023, 12, 4362. [Google Scholar] [CrossRef]
- Zhang, Z.; Sabuncu, M. Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3 December 2018. [Google Scholar]
- Hermans, A.; Beyer, L.; Leibe, B. In Defense of The Triplet Loss for Person Re-Identification. arXiv 2017, arXiv:1703.07737. [Google Scholar]
- Yu, S.; Tan, D.; Tan, T. A Framework for Evaluating the Effect of View Angle, Clothing and Carrying Condition on Gait Recognition. In Proceedings of the 18th International Conference on Pattern Recognition, Hong Kong, China, 20–24 August 2006. [Google Scholar]
- Takemura, N.; Makihara, Y.; Muramatsu, D.; Echigo, T.; Yagi, Y. Multi-View Large Population Gait Dataset and Its Performance Evaluation for Cross-View Gait Recognition. IPSJ Trans. Comput. Vis. Appl. 2018, 10, 4. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E. Automatic Differentiation in Pytorch. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Zhu, Z.; Guo, X.; Yang, T.; Huang, J.; Deng, J. Gait Recognition in the Wild: A Benchmark. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021. [Google Scholar]
- Zheng, J.; Liu, X.; Liu, W.; He, L.; Yan, C. Gait Recognition in the Wild with Dense 3D Representations and a Benchmark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
- Yao, L.; Kusakunniran, W.; Wu, Q.; Xu, J.; Zhang, J. Collaborative Feature Learning for Gait Recognition under Cloth Changes. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 3615–3629. [Google Scholar] [CrossRef]
- Zhao, L.; Guo, L.; Zhang, R.; Xie, X.; Ye, X. MmGaitSet: Multimodal Based Gait Recognition for Countering Carrying and Clothing Changes. Appl. Intell. 2022, 52, 2023–2036. [Google Scholar] [CrossRef]
Gallery NM#1–4, views 0–180°:

| Probe | Method | 0° | 18° | 36° | 54° | 72° | 90° | 108° | 126° | 144° | 162° | 180° | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NM#5–6 | GaitNet [12] | 93.1 | 92.6 | 90.8 | 92.4 | 87.6 | 95.1 | 94.2 | 95.8 | 92.6 | 90.4 | 90.2 | 92.3 |
| NM#5–6 | GaitSet [4] | 90.8 | 97.9 | 99.4 | 96.9 | 93.6 | 91.7 | 95.0 | 97.8 | 98.9 | 96.8 | 85.8 | 95.0 |
| NM#5–6 | GaitPart [7] | 94.1 | 98.6 | 99.3 | 98.5 | 94.0 | 92.3 | 95.9 | 98.4 | 99.2 | 97.8 | 90.4 | 96.2 |
| NM#5–6 | MT3D [29] | 95.7 | 98.2 | 99.0 | 97.5 | 95.1 | 93.9 | 96.1 | 98.6 | 99.2 | 98.2 | 92.0 | 96.7 |
| NM#5–6 | RPNet [21] | 95.1 | 99.0 | 99.1 | 98.3 | 95.7 | 93.6 | 95.9 | 98.3 | 98.6 | 97.7 | 90.8 | 96.6 |
| NM#5–6 | Ours | 96.1 | 99.6 | 99.8 | 98.7 | 96.5 | 95.9 | 96.7 | 99.3 | 99.4 | 98.8 | 94.2 | 97.7 |
| BG#1–2 | GaitNet [12] | 88.8 | 88.7 | 88.7 | 94.3 | 85.4 | 92.7 | 91.1 | 92.6 | 84.9 | 84.4 | 86.7 | 88.9 |
| BG#1–2 | GaitSet [4] | 83.8 | 91.2 | 91.8 | 88.8 | 83.3 | 81.0 | 84.1 | 90.0 | 92.2 | 94.4 | 79.0 | 87.2 |
| BG#1–2 | GaitPart [7] | 89.1 | 94.8 | 96.7 | 95.1 | 88.3 | 84.9 | 89.0 | 93.5 | 96.1 | 93.8 | 85.8 | 91.5 |
| BG#1–2 | MT3D [29] | 91.0 | 95.4 | 97.5 | 94.2 | 92.3 | 86.9 | 91.2 | 95.6 | 97.3 | 96.4 | 86.6 | 93.0 |
| BG#1–2 | RPNet [21] | 92.6 | 92.3 | 96.6 | 94.5 | 91.9 | 87.6 | 90.7 | 94.7 | 96.0 | 93.9 | 86.1 | 92.8 |
| BG#1–2 | Ours | 93.1 | 96.1 | 97.2 | 95.1 | 91.6 | 87.8 | 91.1 | 96.1 | 97.5 | 95.8 | 89.5 | 93.7 |
| CL#1–2 | GaitNet [12] | 50.1 | 60.7 | 72.4 | 72.7 | 74.6 | 78.4 | 70.3 | 68.2 | 53.5 | 44.1 | 40.8 | 62.3 |
| CL#1–2 | GaitSet [4] | 61.4 | 75.4 | 80.7 | 77.3 | 72.1 | 70.1 | 71.5 | 73.5 | 73.5 | 68.4 | 50.0 | 70.4 |
| CL#1–2 | GaitPart [7] | 70.7 | 85.5 | 86.9 | 83.3 | 77.1 | 72.5 | 76.9 | 82.2 | 83.8 | 80.2 | 66.5 | 78.7 |
| CL#1–2 | MT3D [29] | 76.0 | 87.6 | 89.8 | 85.0 | 81.2 | 75.7 | 81.0 | 84.5 | 85.4 | 82.2 | 68.1 | 81.5 |
| CL#1–2 | RPNet [21] | 75.6 | 87.1 | 88.3 | 87.1 | 83.1 | 78.0 | 79.9 | 82.7 | 83.9 | 78.9 | 66.6 | 80.3 |
| CL#1–2 | Ours | 77.4 | 89.0 | 90.0 | 87.3 | 83.8 | 79.1 | 80.7 | 87.2 | 87.5 | 85.0 | 75.2 | 83.8 |
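As a sanity check on the table above, the Mean column is the simple arithmetic average of the eleven per-view rank-1 accuracies. For example, averaging our method's NM#5–6 row reproduces the reported 97.7%:

```python
# Per-view rank-1 accuracies (%) of our method on CASIA-B, probe NM#5-6,
# copied from the table above (views 0-180 degrees in 18-degree steps).
views = [96.1, 99.6, 99.8, 98.7, 96.5, 95.9, 96.7, 99.3, 99.4, 98.8, 94.2]

# The Mean column is the plain average over the 11 probe views.
mean = round(sum(views) / len(views), 1)
print(mean)  # 97.7
```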
Probe | RPNet [21] | GaitSet [4] | GaitPart [7] | Ours |
---|---|---|---|---|
0° | 73.5 | 79.5 | 82.6 | 84.2 |
15° | 84.4 | 87.9 | 88.9 | 89.9 |
30° | 89.6 | 89.9 | 90.8 | 91.3 |
45° | 89.8 | 90.2 | 91.0 | 90.8 |
60° | 86.3 | 88.1 | 89.7 | 90.2 |
75° | 87.4 | 88.7 | 89.9 | 89.6 |
90° | 86.0 | 87.8 | 89.5 | 88.9 |
180° | 76.3 | 81.7 | 85.2 | 86.7 |
195° | 83.2 | 86.7 | 88.1 | 89.7 |
210° | 88.6 | 89.0 | 90.0 | 90.5 |
225° | 88.9 | 89.3 | 90.1 | 90.2 |
240° | 85.7 | 87.2 | 89.0 | 89.7 |
255° | 86.4 | 87.8 | 89.1 | 89.5 |
270° | 84.4 | 86.2 | 88.2 | 89.8 |
Mean | 85.0 | 87.1 | 88.7 | 89.4 |
Model | NM | BG | CL | Mean |
---|---|---|---|---|
Baseline | 89.8 | 82.0 | 59.5 | 77.1 |
Baseline + MSFE | 91.6 | 83.5 | 60.7 | 78.6 |
Baseline + MSFE + FLSFE | 95.6 | 88.6 | 75.2 | 86.5 |
Baseline + MSFE + FLSFE + MSTFE | 97.0 | 92.9 | 81.6 | 90.5 |
Baseline + MSFE + FLSFE + MSTFE + view embedding | 97.7 | 93.7 | 83.8 | 91.7 |
Kernel Size | NM | BG | CL |
---|---|---|---|
1 | 89.8 | 82.0 | 59.5 |
1, 3 | 91.3 | 82.8 | 60.2 |
1, 5 | 90.9 | 82.6 | 59.9 |
1, 3, 5 | 91.6 | 83.5 | 60.7 |
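The rows above correspond to which parallel convolution branches (kernel sizes 1, 3, 5) are enabled in MSFE. As a minimal, illustrative sketch of that parallel-branch fusion — shown here in 1D with fixed example kernels rather than learned 2D filters, and with function names of our own choosing, not the paper's:

```python
import numpy as np

def conv1d_same(x, k):
    """'Same'-padded 1D cross-correlation of signal x with an odd-length kernel k."""
    pad = len(k) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(k)], k) for i in range(len(x))])

def msfe(x, kernels):
    """Multi-scale feature extraction sketch: run parallel branches with
    different kernel sizes, then fuse them by element-wise summation."""
    return sum(conv1d_same(x, k) for k in kernels)

# Two branches (kernel sizes 1 and 3); both example kernels act as identity,
# so the fused output is simply twice the input.
x = np.array([1.0, 2.0, 3.0])
fused = msfe(x, [np.array([1.0]), np.array([0.0, 1.0, 0.0])])
print(fused)  # [2. 4. 6.]
```

With learned 2D filters instead of these toy kernels, each branch captures silhouette structure at a different receptive field before the summation fuses the scales.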
R-conv1 | R-conv2 | R-conv3 | NM | BG | CL |
---|---|---|---|---|---|
1 | 1 | 1 | 95.3 | 86.1 | 71.9 |
2 | 2 | 2 | 95.3 | 86.9 | 72.8 |
2 | 4 | 4 | 95.4 | 87.8 | 73.9 |
2 | 8 | 8 | 95.6 | 88.6 | 75.2 |
4 | 8 | 8 | 95.9 | 87.9 | 74.7 |
Short-Term | Long-Term | NM | BG | CL |
---|---|---|---|---|
✓ | 96.8 | 91.9 | 80.6 | |
✓ | 95.5 | 90.3 | 74.3 | |
✓ | ✓ | 97.0 | 92.9 | 81.6 |
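The first two rows above disable one of the two temporal branches of MSTFE. The following is a hedged sketch of the underlying idea — a short-term branch that pools over a small sliding window of frames and a long-term branch that pools over the whole sequence — using plain max pooling as a stand-in for the learned temporal convolutions; all names here are illustrative:

```python
import numpy as np

def short_term(frames, window=3):
    """Short-term branch: per-feature max over a small sliding temporal window."""
    half = window // 2
    return np.stack([frames[max(0, i - half): i + half + 1].max(axis=0)
                     for i in range(len(frames))])

def long_term(frames):
    """Long-term branch: max pooling over the entire frame sequence."""
    return frames.max(axis=0)

def mstfe(frames):
    """Fuse the branches: aggregate the short-term maps over time,
    then concatenate with the long-term representation."""
    return np.concatenate([short_term(frames).max(axis=0), long_term(frames)])

# Three frames with two features each.
frames = np.array([[1.0, 0.0],
                   [0.0, 2.0],
                   [3.0, 1.0]])
fused = mstfe(frames)
print(fused)  # [3. 2. 3. 2.]
```

The ablation pattern matches this structure: either branch alone still produces a usable representation, which is why disabling one degrades rather than destroys accuracy.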
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Guan, D.; Hua, C.; Zhao, X. Two-Path Spatial-Temporal Feature Fusion and View Embedding for Gait Recognition. Appl. Sci. 2023, 13, 12808. https://doi.org/10.3390/app132312808