Social-STGMLP: A Social Spatio-Temporal Graph Multi-Layer Perceptron for Pedestrian Trajectory Prediction
Abstract
1. Introduction
- We demonstrate that pedestrian trajectory prediction can be modeled more simply and introduce Social-STGMLP, the first pedestrian trajectory prediction approach based on multi-layer perceptrons.
- We design an efficient architecture consisting solely of fully connected layers and layer normalization, which keeps both the parameter count and the inference time of Social-STGMLP low (an illustrative code sketch of such a block follows this list).
- Extensive experiments and analysis show that Social-STGMLP is more accurate than competing approaches, confirming its effectiveness.
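The contributions above describe the building block only at a high level: fully connected layers plus layer normalization applied to a spatio-temporal pedestrian graph. The PyTorch sketch below shows one plausible form such a block could take on a (pedestrians × time steps × features) tensor. The class name STGMLPBlock, the hidden width, the GELU nonlinearity, and the residual connections are our illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class STGMLPBlock(nn.Module):
    """Illustrative spatio-temporal MLP block (hypothetical, not the paper's code):
    fully connected layers plus layer normalization, mixing information first
    along the time axis and then along the pedestrian (spatial) axis."""

    def __init__(self, num_peds: int, obs_len: int, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.norm_t = nn.LayerNorm(feat_dim)
        self.mix_time = nn.Sequential(            # shares information across time steps
            nn.Linear(obs_len, hidden), nn.GELU(), nn.Linear(hidden, obs_len))
        self.norm_s = nn.LayerNorm(feat_dim)
        self.mix_space = nn.Sequential(           # shares information across pedestrians
            nn.Linear(num_peds, hidden), nn.GELU(), nn.Linear(hidden, num_peds))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_peds, obs_len, feat_dim), e.g. observed (x, y) offsets
        h = self.norm_t(x).transpose(2, 3)             # (B, N, F, T): time axis last
        x = x + self.mix_time(h).transpose(2, 3)       # temporal mixing + residual
        h = self.norm_s(x).permute(0, 2, 3, 1)         # (B, T, F, N): pedestrian axis last
        x = x + self.mix_space(h).permute(0, 3, 1, 2)  # spatial mixing + residual
        return x

# Toy usage: 5 pedestrians observed for 8 frames, 2-D (x, y) features.
block = STGMLPBlock(num_peds=5, obs_len=8, feat_dim=2)
print(block(torch.randn(1, 5, 8, 2)).shape)  # torch.Size([1, 5, 8, 2])
```

Stacking several such blocks and decoding the result with a final linear layer into future (x, y) offsets would yield an all-MLP trajectory predictor in the spirit described above.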
2. Related Works
3. Our Method: Social-STGMLP
3.1. Problem Definition
3.2. Main Architecture
3.3. Feature Extraction
3.4. Feature Fusion and Trajectory Prediction
4. Experiment
4.1. Datasets and Metrics
4.2. Experimental Settings
4.3. Brief Introduction to Comparison Methods
4.4. Quantitative Analysis
4.5. Ablation Study
4.6. Qualitative Analysis
4.7. Comparison of Experimental Processes, Model Parameters and Inference Time
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Large, F.; Vasquez, D.; Fraichard, T.; Laugier, C. Avoiding cars and pedestrians using velocity obstacles and motion prediction. In Proceedings of the IEEE Intelligent Vehicles Symposium, Parma, Italy, 14–17 June 2004; pp. 375–379.
2. Luo, Y.; Cai, P.; Bera, A.; Hsu, D.; Lee, W.S.; Manocha, D. Porca: Modeling and planning for autonomous driving among many pedestrians. IEEE Robot. Autom. Lett. 2018, 3, 3418–3425.
3. Wu, P.; Chen, S.; Metaxas, D.N. Motionnet: Joint perception and motion prediction for autonomous driving based on bird’s eye view maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11385–11395.
4. Rudenko, A.; Palmieri, L.; Herman, M.; Kitani, K.M.; Gavrila, D.M.; Arras, K.O. Human motion trajectory prediction: A survey. Int. J. Robot. Res. 2020, 39, 895–935.
5. DeSouza, G.N.; Kak, A.C. Vision for mobile robot navigation: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 237–267.
6. Xiao, G.; Juan, Z.; Gao, J. Travel mode detection based on neural networks and particle swarm optimization. Information 2015, 6, 522–535.
7. Alghodhaifi, H.; Lakshmanan, S. Holistic Spatio-Temporal Graph Attention for Trajectory Prediction in Vehicle–Pedestrian Interactions. Sensors 2023, 23, 7361.
8. Korbmacher, R.; Tordeux, A. Review of pedestrian trajectory prediction methods: Comparing deep learning and knowledge-based approaches. IEEE Trans. Intell. Transp. Syst. 2022, 23, 24126–24144.
9. Lian, J.; Ren, W.; Li, L.; Zhou, Y.; Zhou, B. Ptp-stgcn: Pedestrian trajectory prediction based on a spatio-temporal graph convolutional neural network. Appl. Intell. 2023, 53, 2862–2878.
10. Sharma, N.; Dhiman, C.; Indu, S. Pedestrian intention prediction for autonomous vehicles: A comprehensive survey. Neurocomputing 2022, 508, 120–152.
11. Huang, Y.; Du, J.; Yang, Z.; Zhou, Z.; Zhang, L.; Chen, H. A survey on trajectory-prediction methods for autonomous driving. IEEE Trans. Intell. Veh. 2022, 7, 652–674.
12. Zhao, D.; Chen, Y.; Lv, L. Deep reinforcement learning with visual attention for vehicle classification. IEEE Trans. Cogn. Dev. Syst. 2016, 9, 356–367.
13. Jozefowicz, R.; Zaremba, W.; Sutskever, I. An empirical exploration of recurrent network architectures. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015; pp. 2342–2350.
14. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
15. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Neural Information Processing Systems Foundation, Inc. (NeurIPS): La Jolla, CA, USA, 2017; Volume 30.
16. Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; Savarese, S. Social lstm: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 961–971.
17. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
18. Shi, L.; Wang, L.; Long, C.; Zhou, S.; Zhou, M.; Niu, Z.; Hua, G. SGCN: Sparse graph convolution network for pedestrian trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8994–9003.
19. Yu, C.; Ma, X.; Ren, J.; Zhao, H.; Yi, S. Spatio-temporal graph transformer networks for pedestrian trajectory prediction. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Part XII 16. pp. 507–523.
20. Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450.
21. Pellegrini, S.; Ess, A.; Schindler, K.; Van Gool, L. You’ll never walk alone: Modeling social behavior for multi-target tracking. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 27 September–4 October 2009; pp. 261–268.
22. Lerner, A.; Chrysanthou, Y.; Lischinski, D. Crowds by example. Comput. Graph. Forum 2007, 26, 655–664.
23. Robicquet, A.; Sadeghian, A.; Alahi, A.; Savarese, S. Learning social etiquette: Human trajectory understanding in crowded scenes. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Part VIII 14. pp. 549–565.
24. Liu, Z.; Zhang, Z.; Lei, Z.; Omura, M.; Wang, R.L.; Gao, S. Dendritic Deep Learning for Medical Segmentation. IEEE/CAA J. Autom. Sin. 2024, 11, 803–805.
25. Zhang, P.; Ouyang, W.; Zhang, P.; Xue, J.; Zheng, N. Sr-lstm: State refinement for lstm towards pedestrian trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12085–12094.
26. Mohamed, A.; Qian, K.; Elhoseiny, M.; Claudel, C. Social-stgcnn: A social spatio-temporal graph convolutional neural network for human trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 14424–14432.
27. Sang, H.; Chen, W.; Wang, J.; Zhao, Z. RDGCN: Reasonably dense graph convolution network for pedestrian trajectory prediction. Measurement 2023, 213, 112675.
28. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
29. Gulati, A.; Qin, J.; Chiu, C.C.; Parmar, N.; Zhang, Y.; Yu, J.; Han, W.; Wang, S.; Zhang, Z.; Wu, Y.; et al. Conformer: Convolution-augmented transformer for speech recognition. arXiv 2020, arXiv:2005.08100.
30. Liu, Y.; Yao, L.; Li, B.; Wang, X.; Sammut, C. Social graph transformer networks for pedestrian trajectory prediction in complex social scenarios. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 1339–1349.
31. Tolstikhin, I.O.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Steiner, A.; Keysers, D.; Uszkoreit, J.; et al. Mlp-mixer: An all-mlp architecture for vision. Adv. Neural Inf. Process. Syst. 2021, 34, 24261–24272.
32. Bouazizi, A.; Holzbock, A.; Kressel, U.; Dietmayer, K.; Belagiannis, V. Motionmixer: Mlp-based 3d human body pose forecasting. arXiv 2022, arXiv:2207.00499.
33. Guo, W.; Du, Y.; Shen, X.; Lepetit, V.; Alameda-Pineda, X.; Moreno-Noguer, F. Back to mlp: A simple baseline for human motion prediction. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 4809–4819.
34. Sun, J.; Jiang, Q.; Lu, C. Recursive social behavior graph for trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 660–669.
35. Raksincharoensak, P.; Hasegawa, T.; Nagai, M. Motion planning and control of autonomous driving intelligence system based on risk potential optimization framework. Int. J. Automot. Eng. 2016, 7, 53–60.
36. Gupta, A.; Johnson, J.; Fei-Fei, L.; Savarese, S.; Alahi, A. Social gan: Socially acceptable trajectories with generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2255–2264.
37. Sadeghian, A.; Kosaraju, V.; Sadeghian, A.; Hirose, N.; Rezatofighi, H.; Savarese, S. Sophie: An attentive gan for predicting paths compliant to social and physical constraints. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1349–1358.
38. Kosaraju, V.; Sadeghian, A.; Martín-Martín, R.; Reid, I.; Rezatofighi, H.; Savarese, S. Social-bigat: Multimodal trajectory forecasting using bicycle-gan and graph attention networks. In Advances in Neural Information Processing Systems; Neural Information Processing Systems Foundation, Inc. (NeurIPS): La Jolla, CA, USA, 2019; Volume 32.
39. Liang, J.; Jiang, L.; Niebles, J.C.; Hauptmann, A.G.; Fei-Fei, L. Peeking into the future: Predicting future person activities and locations in videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5725–5734.
40. Zhou, L.; Zhao, Y.; Yang, D.; Liu, J. Gchgat: Pedestrian trajectory prediction using group constrained hierarchical graph attention networks. Appl. Intell. 2022, 52, 11434–11447.
41. Zhang, X.; Angeloudis, P.; Demiris, Y. Dual-branch spatio-temporal graph neural networks for pedestrian trajectory prediction. Pattern Recognit. 2023, 142, 109633.
42. Yang, X.; Fan, J.; Xing, S. IST-PTEPN: An improved pedestrian trajectory and endpoint prediction network based on spatio-temporal information. Int. J. Mach. Learn. Cybern. 2023, 14, 4193–4206.
43. Zhu, W.; Liu, Y.; Wang, P.; Zhang, M.; Wang, T.; Yi, Y. Tri-HGNN: Learning triple policies fused hierarchical graph neural networks for pedestrian trajectory prediction. Pattern Recognit. 2023, 143, 109772.
44. Lv, K.; Yuan, L. SKGACN: Social knowledge-guided graph attention convolutional network for human trajectory prediction. IEEE Trans. Instrum. Meas. 2023, 72, 2517111.
45. Huang, Y.; Bi, H.; Li, Z.; Mao, T.; Wang, Z. Stgat: Modeling spatial-temporal interactions for human trajectory prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6272–6281.
46. Amirian, J.; Hayet, J.B.; Pettré, J. Social ways: Learning multi-modal distributions of pedestrian trajectories with gans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019.
47. Monti, A.; Bertugli, A.; Calderara, S.; Cucchiara, R. Dag-net: Double attentive graph neural network for trajectory forecasting. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 2551–2558.
48. Mohamed, A.; Zhu, D.; Vu, W.; Elhoseiny, M.; Claudel, C. Social-implicit: Rethinking trajectory prediction evaluation and the effectiveness of implicit maximum likelihood estimation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 463–479.
ADE/FDE results (in meters; lower is better) on the ETH and UCY benchmarks; AVG is the average over the five scenes.

Model | Year | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | AVG |
---|---|---|---|---|---|---|---|
Social-LSTM [16] | 2016 | 1.09/2.35 | 0.79/1.76 | 0.67/1.40 | 0.47/1.00 | 0.56/1.17 | 0.72/1.54 |
Social-GAN [36] | 2018 | 0.81/1.52 | 0.72/1.61 | 0.60/1.26 | 0.34/0.69 | 0.42/0.84 | 0.58/1.18 |
Sophie [37] | 2019 | 0.70/1.43 | 0.76/1.67 | 0.54/1.24 | 0.30/0.63 | 0.38/0.78 | 0.54/1.15 |
Social-BiGAT [38] | 2019 | 0.69/1.29 | 0.49/1.01 | 0.55/1.32 | 0.30/0.62 | 0.36/0.75 | 0.48/1.00 |
PIF [39] | 2019 | 0.73/1.65 | 0.30/0.59 | 0.60/1.27 | 0.38/0.81 | 0.31/0.68 | 0.46/1.00 |
SR-LSTM [25] | 2019 | 0.64/1.28 | 0.39/0.78 | 0.52/1.13 | 0.42/0.92 | 0.34/0.74 | 0.46/0.97 |
RSBG [34] | 2020 | 0.80/1.53 | 0.33/0.64 | 0.59/1.25 | 0.40/0.86 | 0.30/0.65 | 0.48/0.99 |
Social-STGCNN [26] | 2020 | 0.64/1.11 | 0.49/0.85 | 0.44/0.79 | 0.34/0.53 | 0.30/0.48 | 0.44/0.75 |
SGCN [18] | 2021 | 0.63/1.03 | 0.32/0.55 | 0.37/0.70 | 0.29/0.53 | 0.25/0.45 | 0.37/0.65 |
GCHGAT [40] | 2022 | 0.63/1.10 | 0.38/0.73 | 0.55/1.16 | 0.33/0.66 | 0.30/0.64 | 0.44/0.86 |
PTP-STGCN [9] | 2022 | 0.63/1.04 | 0.34/0.45 | 0.48/0.87 | 0.37/0.61 | 0.30/0.46 | 0.42/0.68 |
Social TAG [41] | 2023 | 0.61/1.00 | 0.37/0.56 | 0.51/0.87 | 0.33/0.50 | 0.30/0.49 | 0.42/0.68 |
IST-PTEPN [42] | 2023 | 0.46/0.70 | 0.44/0.47 | 0.54/0.92 | 0.35/0.62 | 0.31/0.59 | 0.42/0.66 |
Tri-HGNN [43] | 2023 | 0.62/0.86 | 0.38/0.65 | 0.49/0.88 | 0.27/0.44 | 0.25/0.40 | 0.40/0.65 |
SKGACN [44] | 2023 | 0.55/0.83 | 0.30/0.50 | 0.39/0.75 | 0.30/0.51 | 0.26/0.45 | 0.36/0.61 |
RDGCN [27] | 2023 | 0.58/0.94 | 0.30/0.45 | 0.35/0.65 | 0.28/0.48 | 0.25/0.44 | 0.35/0.59 |
Social-STGMLP | / | 0.60/0.94 | 0.29/0.38 | 0.36/0.59 | 0.27/0.44 | 0.24/0.38 | 0.35/0.54 |
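Each cell in the table above (and in the result tables that follow) reports two numbers, ADE/FDE: the average displacement error, i.e., the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and the final displacement error, i.e., the same distance at the last predicted step. The NumPy sketch below shows the single-trajectory form of the two metrics, assuming trajectories are arrays of (x, y) positions; the function and array names are illustrative. On these benchmarks the reported values typically follow a best-of-N protocol, taking the minimum error over a set of sampled trajectories (commonly 20).

```python
import numpy as np

def ade_fde(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """pred, gt: (num_peds, pred_len, 2) arrays of (x, y) positions.
    Returns (ADE, FDE): mean L2 error over all predicted steps, and
    mean L2 error at the final predicted step."""
    dists = np.linalg.norm(pred - gt, axis=-1)  # (num_peds, pred_len) per-step errors
    ade = dists.mean()                          # average over pedestrians and time steps
    fde = dists[:, -1].mean()                   # average over pedestrians, last step only
    return float(ade), float(fde)

# Toy check: a constant 0.1 m offset in x gives ADE = FDE = 0.1.
gt = np.zeros((3, 12, 2))
pred = gt + np.array([0.1, 0.0])
print(ade_fde(pred, gt))  # (0.1, 0.1)
```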
Comparison with additional baselines; values are ADE and FDE (lower is better).

Metric | STGAT [45] | Social Ways [46] | DAG-Net [47] | Social-Implicit [48] | Social-STGMLP |
---|---|---|---|---|---|
ADE | 0.58 | 0.62 | 0.53 | 0.47 | 0.47 |
FDE | 1.11 | 1.16 | 1.04 | 0.89 | 0.75 |
Ablation of the spatial (Spa) and temporal (Tem) components; values are ADE/FDE (lower is better).

Model | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | AVG |
---|---|---|---|---|---|---|
Social-STGMLP w/o Spa | 0.66/1.06 | 0.31/0.45 | 0.35/0.62 | 0.28/0.48 | 0.23/0.41 | 0.37/0.60 |
Social-STGMLP w/o Tem | 0.63/0.91 | 0.35/0.50 | 0.39/0.64 | 0.30/0.44 | 0.23/0.40 | 0.38/0.58 |
Social-STGMLP (Ours) | 0.60/0.94 | 0.29/0.38 | 0.36/0.59 | 0.27/0.44 | 0.24/0.38 | 0.35/0.54 |
Ablation over Social-STGMLP-N variants; values are ADE/FDE, and N = 16 is the configuration used in the main results.

Model | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | AVG |
---|---|---|---|---|---|---|
Social-STGMLP-2 | 0.65/1.08 | 0.30/0.42 | 0.35/0.60 | 0.29/0.44 | 0.25/0.42 | 0.37/0.59 |
Social-STGMLP-4 | 0.64/0.99 | 0.29/0.41 | 0.36/0.58 | 0.28/0.44 | 0.23/0.41 | 0.36/0.57 |
Social-STGMLP-8 | 0.63/0.95 | 0.30/0.39 | 0.37/0.64 | 0.28/0.45 | 0.23/0.41 | 0.36/0.57 |
Social-STGMLP-16 (Ours) | 0.60/0.94 | 0.29/0.38 | 0.36/0.59 | 0.27/0.44 | 0.24/0.38 | 0.35/0.54 |
Social-STGMLP-24 | 0.66/1.01 | 0.31/0.44 | 0.67/0.82 | 0.28/0.46 | 0.23/0.39 | 0.43/0.62 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Meng, D.; Zhao, G.; Yan, F. Social-STGMLP: A Social Spatio-Temporal Graph Multi-Layer Perceptron for Pedestrian Trajectory Prediction. Information 2024, 15, 341. https://doi.org/10.3390/info15060341