A Temporal Window Attention-Based Window-Dependent Long Short-Term Memory Network for Multivariate Time Series Prediction
Abstract
1. Introduction
- (1) Temporal window attention mechanism. We develop a new attention concept, the temporal window attention mechanism, to learn spatio-temporal patterns. It learns a weight-distribution strategy for selecting the relevant variables within a temporal window, and can therefore capture two-dimensional spatio-temporal correlations from a global perspective (a sketch follows this list).
- (2) Window-dependent long short-term memory network (WDLSTM). We design a novel recurrent neural network, used as both the encoder and the decoder, that strengthens long-term temporal dependencies among time series by reusing the hidden state matrix of the previous temporal window (see the second sketch after this list).
- (3) Real evaluation. We conduct extensive experiments on four real-world datasets to demonstrate the effectiveness of our model. The results show that our model outperforms state-of-the-art comparison models.
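To make contribution (1) concrete, the following is a minimal PyTorch sketch of a temporal window attention layer under our own assumptions; the tensor layout and the two-layer scoring network are illustrative, not the paper's exact formulation. The point it demonstrates is the single softmax taken over every (time step, variable) cell of the window, which yields a two-dimensional spatio-temporal weight map rather than a per-step one.

```python
import torch
import torch.nn as nn

class TemporalWindowAttention(nn.Module):
    """Sketch: joint attention over all (time step, variable) cells of a
    temporal window, normalized globally rather than one step at a time."""

    def __init__(self, n_vars: int, window: int, hidden: int):
        super().__init__()
        self.n_vars, self.window = n_vars, window
        # Illustrative scoring network: each cell value is scored jointly
        # with the encoder's previous hidden state (an assumption).
        self.score = nn.Sequential(
            nn.Linear(1 + hidden, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, x: torch.Tensor, h_prev: torch.Tensor):
        # x:      (batch, window, n_vars) -- raw window of driving series
        # h_prev: (batch, hidden)         -- summary state from the encoder
        b, T, n = x.shape
        cells = x.reshape(b, T * n, 1)                # flatten the window
        h = h_prev.unsqueeze(1).expand(b, T * n, -1)  # broadcast the state
        e = self.score(torch.cat([cells, h], dim=-1)).squeeze(-1)  # (b, T*n)
        # One softmax over ALL cells -> a 2-D spatio-temporal weight map.
        alpha = torch.softmax(e, dim=-1).reshape(b, T, n)
        return alpha * x, alpha                       # reweighted window
```

Because every cell competes with every other cell in the window for weight, this global normalization is what distinguishes the mechanism from per-step (one-dimensional) attention.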
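Likewise for contribution (2), here is a hedged sketch of a window-dependent recurrence: a standard LSTM cell whose input is augmented with a summary of the previous temporal window's hidden-state matrix. How that matrix is summarized (here, a mean over its rows followed by a linear projection) is our assumption; the paper's WDLSTM may combine it differently.

```python
import torch
import torch.nn as nn

class WDLSTMCell(nn.Module):
    """Sketch of a window-dependent LSTM cell: the gates see the current
    input, the previous step's state, and a summary of the hidden-state
    matrix of the previous temporal window, extending the memory horizon."""

    def __init__(self, input_size: int, hidden: int):
        super().__init__()
        self.cell = nn.LSTMCell(input_size + hidden, hidden)
        self.win_proj = nn.Linear(hidden, hidden)  # summarizes prev window

    def forward(self, x_t, state, prev_window_H):
        # x_t:           (batch, input_size)     -- current input
        # state:         (h, c) from the previous time step
        # prev_window_H: (batch, window, hidden) -- hidden states of the
        #                previous temporal window (the "window dependence")
        w = torch.tanh(self.win_proj(prev_window_H.mean(dim=1)))
        return self.cell(torch.cat([x_t, w], dim=-1), state)
```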
2. Related Works
2.1. Recurrent Neural Network
2.2. Convolutional Neural Network
3. Model
3.1. Notation and Problem Statement
3.2. Detailed Description of the Model
3.2.1. Temporal Window Attention Mechanism
3.2.2. Encoder–Decoder
4. Experimental Studies
4.1. Datasets
- Photovoltaic (PV) power dataset: This dataset is derived from the National Energy day-ahead PV power forecasting competition data, collected every 15 min. We take PV power as the target series and choose six relevant features as exogenous series. The first 24,000 time steps form the training set and the remaining 6000 time steps the test set (a sketch of this chronological split follows the list).
- SML 2010 dataset: This dataset contains 17 features collected from a house monitoring system. We use 16 exogenous series to predict room temperature. The first 2211 time steps form the training set and the remaining 553 time steps the test set.
- Beijing PM2.5 dataset: This dataset contains hourly PM2.5 concentrations and meteorological readings from air-quality monitoring sites in Beijing, China, covering 1 January 2013 to 31 December 2014. The first 14,016 time steps are used to train the models and the remaining 3504 time steps to test them.
- NASDAQ 100 stock dataset: This dataset contains the per-minute stock prices of 81 major companies, with the NASDAQ 100 index as the target series. The first 32,448 time steps form the training set and the remaining 8112 time steps the test set.
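All four splits above are chronological: models are trained on the earliest contiguous block and tested on the block that follows, with no shuffling. A minimal sketch of such a split, using a hypothetical `load_pv_power()` loader (not part of the paper) and the PV power proportions:

```python
import numpy as np

def chronological_split(series: np.ndarray, n_train: int, n_test: int):
    """Split a (time_steps, features) array into contiguous train/test
    blocks, preserving temporal order (no shuffling)."""
    train = series[:n_train]
    test = series[n_train:n_train + n_test]
    return train, test

# PV power dataset: first 24,000 steps for training, next 6,000 for testing.
# data = load_pv_power()   # hypothetical loader; shape (30000, 7)
# train, test = chronological_split(data, n_train=24000, n_test=6000)
```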
4.2. Methods for Comparison
4.3. Parameter and Evaluation Metrics
4.4. Experimental Results and Analyses
4.5. Interpretability of Temporal Window Attention Mechanism
4.6. Evaluation on WDLSTM
5. Conclusions
- (1) In many real-world cases, capturing the spatio-temporal correlations in multivariate time series is challenging. Most existing studies capture only one-dimensional spatio-temporal correlations from a local perspective and may therefore miss important information. The newly introduced temporal window attention mechanism picks the important variables within a temporal window and thus captures two-dimensional spatio-temporal correlations from a global perspective.
- (2) RNNs cannot memorize very long-term information because they only summarize the information within a temporal window. To this end, we design WDLSTM as the encoder and decoder to enhance the learning of long-term temporal dependencies.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Fu, E.; Zhang, Y.; Yang, F.; Wang, S. Temporal self-attention-based Conv-LSTM network for multivariate time series prediction. Neurocomputing 2022, 501, 162–173.
- Kamarthi, H.; Rodríguez, A.; Prakash, B.A. Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future. In Proceedings of the Tenth International Conference on Learning Representations, Virtual, 25–29 April 2022; pp. 1–14.
- Huang, Y.; Ying, J.J.C.; Tseng, V.S. Spatio-attention embedded recurrent neural network for air quality prediction. Knowl. Based Syst. 2021, 233, 107416.
- Chen, Y.; Ding, F.; Zhai, L. Multi-scale temporal features extraction based graph convolutional network with attention for multivariate time series prediction. Expert Syst. Appl. 2022, 200, 117011.
- Shih, S.Y.; Sun, F.K.; Lee, H.Y. Temporal pattern attention for multivariate time series forecasting. Mach. Learn. 2019, 108, 1421–1441.
- Mahmoudi, M.R.; Baroumand, S. Modeling the stochastic mechanism of sensor using a hybrid method based on seasonal autoregressive integrated moving average time series and generalized estimating equations. ISA Trans. 2022, 125, 300–305.
- Dimitrios, D.; Anastasios, P.; Thanassis, K.; Dimitris, K. Do confidence indicators lead Greek economic activity? Bull. Appl. Econ. 2021, 8, 1–15.
- Guefano, S.; Tamba, J.G.; Azong, T.E.W.; Monkam, L. Forecast of electricity consumption in the Cameroonian residential sector by Grey and vector autoregressive models. Energy 2021, 214, 118791.
- Li, J. A Hidden Markov Model-based fuzzy modeling of multivariate time series. Soft Comput. 2022, 6, 1–18.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
- Ozdemir, A.C.; Buluş, K.; Zor, K. Medium- to long-term nickel price forecasting using LSTM and GRU networks. Resour. Policy 2022, 78, 102906.
- Han, S.; Dong, H.; Teng, X.; Li, X.; Wang, X. Correlational graph attention-based Long Short-Term Memory network for multivariate time series prediction. Appl. Soft Comput. 2021, 106, 107377.
- Qin, Y.; Song, D.; Cheng, H.; Cheng, W.; Jiang, G.; Cottrell, G.W. A dual-stage attention-based recurrent neural network for time series prediction. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 2627–2633.
- Hu, J.; Zheng, W. Multistage attention network for multivariate time series prediction. Neurocomputing 2020, 383, 122–137.
- Feng, X.; Chen, J.; Zhang, Z.; Miao, S.; Zhu, Q. State-of-charge estimation of lithium-ion battery based on clockwork recurrent neural network. Energy 2021, 236, 121360.
- Zhang, Y.; Peng, N.; Dai, M.; Zhang, J.; Wang, H. Memory-Gated Recurrent Networks. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 10956–10963.
- Ma, Q.; Lin, Z.; Chen, E.; Cottrell, G.W. Temporal pyramid recurrent neural network. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 5061–5068.
- Harutyunyan, H.; Khachatrian, H.; Kale, D.C.; Steeg, G.V.; Galstyan, A. Multitask learning and benchmarking with clinical time series data. Sci. Data 2019, 6, 96.
- Zhang, C.; Fiore, M.; Murray, I.; Patras, P. CloudLSTM: A Recurrent Neural Model for Spatiotemporal Point-cloud Stream Forecasting. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 12B, pp. 10851–10858.
- Wang, L.; Sha, L.; Lakin, J.R.; Bynum, J.; Bates, D.W.; Hong, P.; Zhou, L. Development and Validation of a Deep Learning Algorithm for Mortality Prediction in Selecting Patients with Dementia for Earlier Palliative Care Interventions. JAMA Netw. Open 2019, 2, e196972.
- Liu, Y.; Gong, C.; Yang, L.; Chen, Y. DSTP-RNN: A dual-stage two-phase attention-based recurrent neural network for long-term and multivariate time series prediction. Expert Syst. Appl. 2020, 143, 113082.
- Liang, Y.; Ke, S.; Zhang, J.; Yi, X.; Zheng, Y. GeoMAN: Multi-level attention networks for geo-sensory time series prediction. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 3428–3434.
- Deng, A.; Hooi, B. Graph Neural Network-Based Anomaly Detection in Multivariate Time Series. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 5A, pp. 4027–4035.
- Preeti; Bala, R.; Singh, R.P. A dual-stage advanced deep learning algorithm for long-term and long-sequence prediction for multivariate financial time series. Appl. Soft Comput. 2022, 126, 109317.
- Wang, K.; Li, K.; Zhou, L.; Hu, Y.; Cheng, Z.; Liu, J.; Chen, C. Multiple convolutional neural networks for multivariate time series prediction. Neurocomputing 2019, 360, 107–119.
- Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; Zhang, C. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 753–763.
- Lai, G.; Chang, W.C.; Yang, Y.; Liu, H. Modeling long- and short-term temporal patterns with deep neural networks. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 95–104.
- Cao, S.; Wu, L.; Wu, J.; Wu, D.; Li, Q. A spatio-temporal sequence-to-sequence network for traffic flow prediction. Inf. Sci. 2022, 610, 185–203.
- De Brébisson, A.; Vincent, P. An exploration of softmax alternatives belonging to the spherical loss family. arXiv 2015, arXiv:1511.05042.
- Siami-Namini, S.; Tavakoli, N.; Namin, A.S. A Comparison of ARIMA and LSTM in Forecasting Time Series. In Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 1394–1401.
- Gers, F. Long Short-Term Memory in Recurrent Neural Networks. Ph.D. Thesis, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, 2001.
- Chang, Y.-Y.; Sun, F.-Y.; Wu, Y.-H.; Lin, S.-D. A Memory-Network Based Solution for Multivariate Time-Series Forecasting. arXiv 2018, arXiv:1809.02105.
- Xiao, Y.; Yin, H.; Zhang, Y.; Qi, H.; Zhang, Y.; Liu, Z. A dual-stage attention-based Conv-LSTM network for spatio-temporal correlation and multivariate time series prediction. Int. J. Intell. Syst. 2021, 36, 2036–2057.
| Hyperparameter | Value |
| --- | --- |
| Training epochs | 30 |
| Batch size | 128 |
| Hidden state size | {16, 32, 64, 128} |
| Temporal window size | {6, 12, 24, 48}, {5, 10, 15, 25} |
| Initial learning rate | 0.001 |
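The result tables below report MAE, RMSE, and the coefficient of determination R². For clarity, these are the standard definitions (shown as a small sketch, not code from the paper):

```python
import numpy as np

def mae(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r2(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Lower MAE and RMSE and higher R² indicate better predictions.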
PV Power:

| Model | MAE | RMSE | R² |
| --- | --- | --- | --- |
| ARIMA | 0.4630 | 0.7951 | 0.9308 |
| LSTM | 0.4119 | 0.7744 | 0.9365 |
| DA-RNN | 0.3508 | 0.6356 | 0.9442 |
| DSTP | 0.3280 | 0.6417 | 0.9523 |
| MTNet | 0.3013 | 0.6251 | 0.9575 |
| DA-Conv-LSTM | 0.3020 | 0.6218 | 0.9576 |
| CGA-LSTM | 0.2796 | 0.6133 | 0.9582 |
| TWA-WDLSTM | 0.2552 | 0.6084 | 0.9594 |

SML2010:

| Model | MAE | RMSE | R² |
| --- | --- | --- | --- |
| ARIMA | 0.3033 | 0.3816 | 0.9754 |
| LSTM | 0.2437 | 0.3080 | 0.9839 |
| DA-RNN | 0.0932 | 0.1202 | 0.9962 |
| DSTP | 0.0764 | 0.1106 | 0.9974 |
| MTNet | 0.1779 | 0.2313 | 0.9903 |
| DA-Conv-LSTM | 0.0803 | 0.1084 | 0.9980 |
| CGA-LSTM | 0.0726 | 0.0959 | 0.9982 |
| TWA-WDLSTM | 0.0358 | 0.0546 | 0.9994 |
Beijing PM2.5:

| Model | MAE | RMSE | R² |
| --- | --- | --- | --- |
| ARIMA | 22.2014 | 34.2323 | 0.8771 |
| LSTM | 19.7313 | 31.9733 | 0.8928 |
| DA-RNN | 18.5451 | 31.2088 | 0.8982 |
| DSTP | 19.1360 | 31.4934 | 0.8900 |
| MTNet | 18.6874 | 30.1577 | 0.8974 |
| DA-Conv-LSTM | 18.4088 | 31.0415 | 0.8990 |
| CGA-LSTM | 18.2674 | 31.2421 | 0.8993 |
| TWA-WDLSTM | 17.6751 | 30.2252 | 0.9042 |

NASDAQ100 Stock:

| Model | MAE | RMSE | R² |
| --- | --- | --- | --- |
| ARIMA | 7.4350 | 9.3034 | 0.9811 |
| LSTM | 6.7567 | 8.7375 | 0.9833 |
| DA-RNN | 3.8339 | 4.9586 | 0.9923 |
| DSTP | 1.5583 | 2.2816 | 0.9988 |
| MTNet | 4.7974 | 5.6801 | 0.9904 |
| DA-Conv-LSTM | 2.2974 | 3.1040 | 0.9978 |
| CGA-LSTM | 1.4952 | 2.2392 | 0.9990 |
| TWA-WDLSTM | 1.4189 | 2.1681 | 0.9990 |
| Dataset | Metric | LSTM | WDLSTM | TWA-WDLSTM |
| --- | --- | --- | --- | --- |
| PV Power | MAE | 0.4119 | 0.3409 | 0.2552 |
| PV Power | RMSE | 0.7744 | 0.6859 | 0.6084 |
| PV Power | R² | 0.9365 | 0.9485 | 0.9594 |
| SML2010 | MAE | 0.2437 | 0.1732 | 0.0358 |
| SML2010 | RMSE | 0.3080 | 0.2214 | 0.0546 |
| SML2010 | R² | 0.9839 | 0.9917 | 0.9994 |
| Beijing PM2.5 | MAE | 19.7313 | 18.038 | 17.6751 |
| Beijing PM2.5 | RMSE | 31.9733 | 30.5179 | 30.2252 |
| Beijing PM2.5 | R² | 0.8928 | 0.9024 | 0.9042 |
| NASDAQ100 Stock | MAE | 6.7567 | 5.7436 | 1.4189 |
| NASDAQ100 Stock | RMSE | 8.7375 | 7.7110 | 2.1681 |
| NASDAQ100 Stock | R² | 0.9833 | 0.9870 | 0.9990 |