Spatiotemporal Transformer Neural Network for Time-Series Forecasting
Abstract
1. Introduction
- An STNN is developed to implement the spatiotemporal information (STI) transformation equation, which converts the spatial information of high-dimensional variables into the temporal evolution information of one target variable, thereby equivalently expanding the sample size and alleviating the short-term data problem.
- A continuous attention mechanism is developed to improve the numerical prediction accuracy of the STNN.
- A continuous spatial self-attention structure in the STNN captures the effective spatial information of the high-dimensional variables, a temporal self-attention structure captures the temporal evolution information of the target variable, and a transformation attention structure combines the spatial information with future temporal information.
- We show that the STNN model can reconstruct the phase space of the underlying dynamical system, which is exploited for time-series prediction.
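The sample-expansion idea behind the STI transformation can be illustrated with a minimal sketch (this is a simplified illustration, not the paper's exact formulation; the function name and shapes are assumptions): each D-dimensional spatial snapshot is paired with the next L values of a single target variable, so one length-T multivariate series yields roughly T − L supervised samples.

```python
import numpy as np

def sti_training_pairs(X, target_idx, L):
    """Build STI-style training pairs: map the spatial snapshot X[t]
    (all D variables at time t) to the temporal evolution of one
    target variable over the next L steps.

    X          : (T, D) array of the observed high-dimensional series
    target_idx : column of X chosen as the target variable
    L          : length of the predicted temporal window
    """
    T, D = X.shape
    inputs, outputs = [], []
    for t in range(T - L):
        inputs.append(X[t])                               # spatial information at time t
        outputs.append(X[t + 1 : t + 1 + L, target_idx])  # temporal evolution of target
    return np.asarray(inputs), np.asarray(outputs)

# Example: 100 steps of a 20-variable system, predicting 5 future steps of variable 0
X = np.random.default_rng(0).standard_normal((100, 20))
inp, out = sti_training_pairs(X, target_idx=0, L=5)
print(inp.shape, out.shape)  # (95, 20) (95, 5)
```

In this toy form, a single short recording already produces 95 input/output pairs, which is the "equivalently expanding the sample size" effect the first bullet describes.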
2. Related Works
2.1. Delay Embedding for Spatiotemporal Transformation Equation
2.2. Transformer Neural Network for Time-Series Prediction
3. Problem Setup and Methodology
3.1. Problem Definition
3.2. STNN Model
3.2.1. Encoder
3.2.2. Decoder
3.2.3. Objective Function for STNN Model
4. Experiments
4.1. Datasets
4.1.1. Benchmarks
4.1.2. Public Datasets
4.2. Experimental Details
4.3. Results and Analysis
4.3.1. Time-Series Forecasting
4.3.2. Characteristic Experiment
4.3.3. The Performance of STNN
4.3.4. Ablation Experiment
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Conflicts of Interest
References
| Dataset | Metric | Stat | STNN | STNN* | ARIMA | SVR | RBF | RNN | KAE |
|---|---|---|---|---|---|---|---|---|---|
| Pendulum | PCC | Mean | 0.994 | 0.884 | 0.371 | 0.991 | 0.993 | 0.947 | 0.990 |
| Pendulum | PCC | Var | 1.419 × 10^−5 | 0.018 | 0.248 | 3.250 × 10^−5 | 1.250 × 10^−6 | 8.065 × 10^−4 | 8.475 × 10^−5 |
| Pendulum | NRMSE | Mean | 0.146 | 0.590 | 0.679 | 0.178 | 0.190 | 0.258 | 0.129 |
| Pendulum | NRMSE | Var | 0.005 | 0.028 | 0.051 | 0.010 | 0.014 | 3.482 × 10^−4 | 1.725 × 10^−5 |
| Lorenz | PCC | Mean | 0.995 | −0.554 | 0.906 | −0.306 | −0.446 | 0.308 | −0.525 |
| Lorenz | PCC | Var | 3.569 × 10^−5 | 0.194 | 0.013 | 0.640 | 0.601 | 0.254 | 0.245 |
| Lorenz | NRMSE | Mean | 0.097 | 2.451 | 0.620 | 1.580 | 1.600 | 1.816 | 2.629 |
| Lorenz | NRMSE | Var | 0.002 | 0.781 | 0.833 | 0.294 | 0.133 | 0.184 | 0.786 |
| Gene | PCC | Mean | 0.395 | 0.381 | 0.243 | 0.404 | 0.446 | 0.171 | −0.065 |
| Gene | PCC | Var | 0.007 | 0.115 | 0.160 | 0.087 | 0.014 | 0.383 | 0.162 |
| Gene | NRMSE | Mean | 0.658 | 1.058 | 0.948 | 0.762 | 1.017 | 1.110 | 1.948 |
| Gene | NRMSE | Var | 0.005 | 0.125 | 0.0416 | 0.037 | 0.038 | 0.213 | 0.270 |
| TS | PCC | Mean | 0.866 | 0.668 | 0.258 | 0.514 | 0.545 | 0.198 | −0.223 |
| TS | PCC | Var | 0.005 | 0.102 | 0.149 | 0.022 | 0.009 | 0.089 | 0.164 |
| TS | NRMSE | Mean | 0.504 | 0.755 | 1.082 | 1.226 | 1.303 | 1.232 | 1.275 |
| TS | NRMSE | Var | 0.011 | 0.090 | 0.074 | 0.022 | 0.049 | 0.108 | 0.151 |
| Solar | PCC | Mean | 0.948 | 0.951 | 0.188 | 0.643 | 0.831 | 0.155 | 0.010 |
| Solar | PCC | Var | 0.001 | 0.001 | 0.112 | 0.065 | 3.747 × 10^−4 | 0.005 | 0.046 |
| Solar | NRMSE | Mean | 0.372 | 0.345 | 1.129 | 0.809 | 0.934 | 1.580 | 1.602 |
| Solar | NRMSE | Var | 0.024 | 0.019 | 0.005 | 0.058 | 0.007 | 0.091 | 0.096 |
| TF | PCC | Mean | 0.989 | 0.846 | 0.821 | 0.987 | 0.990 | 0.507 | 0.658 |
| TF | PCC | Var | 2.497 × 10^−4 | 0.003 | 0.092 | 4.262 × 10^−4 | 3.168 | 0.712 | 0.313 |
| TF | NRMSE | Mean | 0.121 | 0.787 | 0.334 | 0.380 | 1.362 | 0.802 | 6.793 |
| TF | NRMSE | Var | 0.002 | 0.058 | 0.100 | 0.025 | 0.185 | 0.136 | 1.825 |
| Winning counts | | | 9 | 2 | 0 | 0 | 1 | 0 | 0 |
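The tables report PCC (Pearson correlation coefficient) and NRMSE between prediction and ground truth. As a hedged sketch, the standard definitions can be computed as below; note that the normalizer used for NRMSE (here, the standard deviation of the ground truth) is one common convention and an assumption, as papers also normalize by the range or mean of the true values.

```python
import numpy as np

def pcc(y_true, y_pred):
    """Pearson correlation coefficient between prediction and ground truth."""
    return np.corrcoef(y_true, y_pred)[0, 1]

def nrmse(y_true, y_pred):
    """Root-mean-square error normalized by the standard deviation of the
    ground truth (one common convention; the exact normalizer may differ)."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.std(y_true)

y_true = np.sin(np.linspace(0, 6, 50))
y_pred = y_true + 0.05  # constant offset: PCC stays 1, but NRMSE is nonzero
print(round(pcc(y_true, y_pred), 3))  # 1.0
print(nrmse(y_true, y_true))          # 0.0
```

The offset example shows why the tables report both metrics: PCC measures shape agreement and ignores bias or scale, while NRMSE penalizes any numerical deviation.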
| Observed Time-Series Steps (M) | Area I | Area II | Area III |
|---|---|---|---|
| 50 | 1.5367 | 0.5644 | 0.9169 |
| 60 | 1.3138 | 0.3799 | 0.9085 |
| 70 | 2.8385 | 0.5307 | 1.3988 |
| 80 | 1.1171 | 0.2452 | 0.6116 |
| 90 | 1.4030 | 0.2434 | 0.7835 |
| 100 | 0.7451 | 0.1138 | 1.2030 |
| Mean | 1.4924 | 0.3462 | 0.9704 |
| Variance | 0.4255 | 0.0263 | 0.0680 |
| Variance analysis | p-value = 1.89 × 10^−3 | | |
| Model | Metric | Pendulum | Lorenz |
|---|---|---|---|
| STNN | PCC | 0.9983 | 0.9979 |
| STNN | NRMSE | 0.0778 | 0.0967 |
| STNN# | PCC | 0.9955 | 0.7362 |
| STNN# | NRMSE | 0.0780 | 0.7118 |
| STNN## | PCC | 0.9944 | 0.5703 |
| STNN## | NRMSE | 0.0802 | 0.9747 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
You, Y.; Zhang, L.; Tao, P.; Liu, S.; Chen, L. Spatiotemporal Transformer Neural Network for Time-Series Forecasting. Entropy 2022, 24, 1651. https://doi.org/10.3390/e24111651