Prediction Method of Multiple Related Time Series Based on Generative Adversarial Networks
Abstract
1. Introduction
- (1) We propose a novel GAN-based deep learning model, MTSGAN, an end-to-end solution to the prediction problem of multiple related time series, which arise widely in the real world. Compared with existing time series prediction models, MTSGAN can simultaneously capture the complex interactional dependencies between time series and the temporal dependencies within each time series, which gives it a unique advantage in the multiple related time series prediction task.
- (2) In the multiple related time series prediction problem, the complex interactional dependencies between time series are hidden in the data, and conventional methods cannot extract them directly. Our proposed MTSGAN model uses a generator to produce these interactional dependencies and a discriminator to optimize them. This way of extracting interactional dependencies directly from the data does not rely on any other prior knowledge. In addition, we implement the interaction matrix generator with a transposed convolutional network, which improves the scalability of MTSGAN; a minimal illustrative sketch follows this list.
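To make the transposed-convolution idea concrete, here is a minimal PyTorch sketch of such an interaction matrix generator. It is an illustration under our own assumptions: the layer widths, depth, noise dimension k, and the sigmoid output range are not taken from the paper. It simply upsamples a k-dimensional Gaussian noise vector into an N × N matrix.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractionMatrixGenerator(nn.Module):
    """Maps a k-dim noise vector to an N x N interaction matrix.

    Sketch only: layer widths, depth, and the sigmoid output range
    are illustrative assumptions, not the paper's configuration.
    """
    def __init__(self, k=100, n_series=500):
        super().__init__()
        self.n_series = n_series
        self.net = nn.Sequential(
            # Treat the noise vector as a k-channel 1x1 "image" and
            # upsample it step by step with transposed convolutions.
            nn.ConvTranspose2d(k, 128, kernel_size=4, stride=1),   # -> 128 x 4 x 4
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=4),  # -> 64 x 16 x 16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 1, kernel_size=4, stride=4),    # -> 1 x 64 x 64
            nn.Sigmoid(),  # keep entries in (0, 1) so they can act as edge weights
        )

    def forward(self, z):
        # z: (batch, k) -> (batch, k, 1, 1) -> (batch, 1, 64, 64)
        m = self.net(z.view(z.size(0), -1, 1, 1))
        # Resize to N x N; supporting a different number of series only changes
        # this interpolation, which is the scalability benefit noted above.
        m = F.interpolate(m, size=(self.n_series, self.n_series))
        return m.squeeze(1)  # (batch, N, N)
```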
2. Related Work
2.1. Time Series Prediction
2.2. GAN and GCN
3. Proposed Model
3.1. Problem Definition
3.2. MTSGAN Overview
- Interaction Matrix Generator: a transposed convolutional network that implements a mapping from a k-dimensional random noise vector, sampled from a Gaussian distribution, to an interaction matrix. We use this interaction matrix as the adjacency matrix of the time series interaction graph.
- Prediction Generator: consists of a graph convolutional network (GCN) and a long short-term memory network (LSTM). Its input is a time series interaction graph, described by an interaction matrix and a time series feature matrix. First, the GCN performs graph convolution on the interaction graph to obtain an intermediate feature representation that incorporates the interactional dependencies between time series; the LSTM then processes this intermediate representation and captures the temporal dependencies. In this way, the prediction generator produces the predicted value of each time series.
- Time Series Discriminator D: judges the quality of the data produced by the prediction generator. It takes real and fake time series samples as input and outputs the probability that the input sample is real. Once the discriminator is well trained, it is fixed as an estimator for the two generators, and its gradient information is fed back to them to optimize their parameters (see the sketch after this list).
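The other two components can be sketched in the same spirit. The graph convolution below follows the standard propagation rule of Kipf and Welling [3]; the layer counts and hidden sizes are again our assumptions, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

class PredictionGenerator(nn.Module):
    """GCN over the interaction graph, then an LSTM over time (sketch)."""
    def __init__(self, gcn_dim=16, lstm_hidden=64):
        super().__init__()
        self.gcn = nn.Linear(1, gcn_dim)               # GCN weight matrix W
        self.lstm = nn.LSTM(gcn_dim, lstm_hidden, batch_first=True)
        self.head = nn.Linear(lstm_hidden, 1)

    def forward(self, adj, x):
        # adj: (N, N) symmetric, nonnegative interaction matrix
        # x:   (N, T) time series feature matrix
        n = adj.size(0)
        a = adj + torch.eye(n, device=adj.device)      # add self-loops
        d = a.sum(dim=1).clamp(min=1e-6).rsqrt()
        a_norm = d.unsqueeze(1) * a * d.unsqueeze(0)   # D^-1/2 (A+I) D^-1/2
        # One graph convolution mixes every time step over the graph,
        # producing the intermediate representation described above.
        h = torch.relu(self.gcn((a_norm @ x).unsqueeze(-1)))  # (N, T, gcn_dim)
        out, _ = self.lstm(h)                          # temporal dependencies
        return self.head(out[:, -1, :]).squeeze(-1)    # (N,) next-step predictions

class TimeSeriesDiscriminator(nn.Module):
    """Scores the probability that a (history + last value) sample is real."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, series):
        # series: (N, T+1) -- history concatenated with a real or generated value
        out, _ = self.lstm(series.unsqueeze(-1))
        return self.head(out[:, -1, :])                # (N, 1) realness probability
```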
Algorithm 1 MTSGAN Framework

Input: time series feature vectors, real targets
Output: prediction values, the two generators, and discriminator D
1: Initialize the interaction matrix generator, the prediction generator, and the discriminator D
2: for number of training iterations do
3:   generate a random noise vector
4:   generate the interaction matrix from the noise
5:   make the interaction matrix symmetrical
6:   construct the time series feature matrix
7:   generate the prediction values with the prediction generator
8:   construct real time series samples
9:   construct fake time series samples
10:   for k steps do
11:     train D to distinguish real samples from fake samples
12:     update the discriminator by ascending its gradient
13:   end for
14:   generate a new random noise vector
15:   update the interaction matrix generator by descending its gradient
16:   update the prediction generator by descending its gradient
17: end for
18: generate a random noise vector
19: get the final prediction values
20: return the prediction values, the two generators, and D
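Algorithm 1 can be mirrored in a short training loop built on the module sketches above. The vanilla GAN loss, the Adam optimizers and learning rates, and the joint update of the two generators (Algorithm 1 lists separate gradient steps for them) are our assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def train_mtsgan(x_hist, y_real, g_m, g_p, disc, iters=1000, d_steps=1, k=100):
    """Adversarial training loop mirroring Algorithm 1 (sketch only).

    x_hist: (N, T) feature matrix of historical values.
    y_real: (N,)   real next values.
    """
    opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
    opt_g = torch.optim.Adam(list(g_m.parameters()) + list(g_p.parameters()), lr=2e-4)
    real = torch.cat([x_hist, y_real.unsqueeze(1)], dim=1)       # (N, T+1) real samples

    def fake_samples():
        z = torch.randn(1, k)                                    # random noise
        m = g_m(z)[0]
        m = 0.5 * (m + m.t())                                    # symmetrize the matrix
        y_fake = g_p(m, x_hist)                                  # (N,) predictions
        return torch.cat([x_hist, y_fake.unsqueeze(1)], dim=1), y_fake

    for _ in range(iters):
        for _ in range(d_steps):                                 # Algorithm 1, lines 10-13
            fake, _ = fake_samples()
            d_real, d_fake = disc(real), disc(fake.detach())     # detach: freeze generators
            d_loss = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
                   + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        fake, y_fake = fake_samples()                            # Algorithm 1, lines 14-16
        d_out = disc(fake)
        g_loss = F.binary_cross_entropy(d_out, torch.ones_like(d_out))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return y_fake
```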
3.3. Interaction Matrix Generator
3.4. Prediction Generator
3.4.1. GCN for Extracting Interactional Dependencies
3.4.2. LSTM for Extracting Temporal Dependencies
3.5. Time Series Discriminator
4. Experiments
4.1. Data Sets
- Store Item Demand Dataset (https://www.kaggle.com/c/demand-forecasting-kernels-only/data). This dataset provides daily sales records of 50 different products in 10 different stores. Each product's sales records run from 1 January 2013 to 31 December 2017, so the dataset contains 500 time series, each 1826 days long.
- Web Traffic Dataset (https://www.kaggle.com/c/web-traffic-time-series-forecasting/data). This dataset records Wikipedia website traffic. The full dataset contains about 145,000 time series, each representing the daily traffic of one Wikipedia page from 1 July 2015 to 10 September 2017, giving 804 days per series. Because the dataset contains missing values, the data used in the experiment are 500 time series without missing values.
- NOAA China Dataset (https://www.ncei.noaa.gov/data/global-summary-of-the-day/). This dataset consists of meteorological data recorded by weather stations at different locations in China, provided by the United States National Oceanic and Atmospheric Administration. We extracted daily temperature data from 400 different weather stations from 2015 to 2018 as our experimental data.
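In each case the raw records reduce to N aligned daily series, which must be cut into history windows and next-step targets before training. A hypothetical preprocessing sketch follows; the window length and array layout are our assumptions, not the paper's protocol.

```python
import numpy as np

def make_samples(series_matrix, window_len=30):
    """Cut N aligned daily series into (history, next value) training pairs.

    series_matrix: (N, L) array, one row per time series.
    Returns X: (num_windows, N, window_len) histories and
            y: (num_windows, N) next-day targets.
    """
    n, length = series_matrix.shape
    xs, ys = [], []
    for t in range(length - window_len):
        xs.append(series_matrix[:, t:t + window_len])   # history window
        ys.append(series_matrix[:, t + window_len])     # next-day target
    return np.stack(xs), np.stack(ys)
```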
4.2. Experimental Settings
- (1) Autoregressive Integrated Moving Average [1] (ARIMA): this method first makes the time series stationary through differencing, and then combines an AR model and an MA model to predict future values of the series. It is a very widely used time series prediction method.
- (2) Vector Auto-Regressive [11] (VAR): this model is often used for multivariate time series prediction and can account for the correlations between variables of different dimensions.
- (3) Support Vector Regression machines [5] (SVR): a well-known machine learning model with a solid mathematical foundation.
- (4) LightGBM [25] (LGB): an improved gradient boosting tree model that handles both classification and regression, and has demonstrated strong prediction performance in various data mining competitions.
- (5) Long Short-Term Memory [4] (LSTM): a recurrent neural network model that can capture long-distance dependencies in sequence data.
- (6) Gated Recurrent Unit [26] (GRU): also a recurrent neural network model; it simplifies the gate structure of the LSTM, making training more efficient.
4.3. Prediction Performance of MTSGAN
4.4. Influence of Interaction Matrix Generator’s Structure
4.5. Influence of GCN Depth
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Box, G.E.P.; Jenkins, G. Time Series Analysis, Forecasting and Control; Holden-Day, Inc.: Oakland, CA, USA, 1990.
2. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
3. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
4. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
5. Drucker, H.; Burges, C.J.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. Adv. Neural Inf. Process. Syst. 1996, 9, 155–161.
6. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
7. Kim, K.J. Financial time series forecasting using support vector machines. Neurocomputing 2003, 55, 307–319.
8. Tay, F.E.; Cao, L. Application of support vector machines in financial time series forecasting. Omega 2001, 29, 309–317.
9. Hao, W.; Yu, S. Support vector regression for financial time series forecasting. In International Conference on Programming Languages for Manufacturing; Springer: Boston, MA, USA, 2006; pp. 825–830.
10. Mellit, A.; Pavan, A.M.; Benghanem, M. Least squares support vector machine for short-term prediction of meteorological time series. Theor. Appl. Climatol. 2013, 111, 297–307.
11. Zivot, E.; Wang, J. Vector autoregressive models for multivariate time series. In Modeling Financial Time Series with S-PLUS®; Springer: New York, NY, USA, 2006; pp. 385–429.
12. Chai, D.; Wang, L.; Yang, Q. Bike flow prediction with multi-graph convolutional networks. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 6–9 November 2018; pp. 397–400.
13. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858.
14. Luo, Y.; Cai, X.; Zhang, Y.; Xu, J. Multivariate time series imputation with generative adversarial networks. Adv. Neural Inf. Process. Syst. 2018, 31, 1596–1607. Available online: https://papers.nips.cc/paper/7432-multivariate-time-series-imputation-with-generative-adversarial-networks (accessed on 25 January 2021).
15. Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral networks and locally connected networks on graphs. arXiv 2013, arXiv:1312.6203.
16. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. Adv. Neural Inf. Process. Syst. 2016, 29, 3844–3852.
17. Yao, L.; Mao, C.; Luo, Y. Graph convolutional networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 7370–7377.
18. Parisot, S.; Ktena, S.I.; Ferrante, E.; Lee, M.; Guerrero, R.; Glocker, B.; Rueckert, D. Disease prediction using graph convolutional networks: Application to Autism Spectrum Disorder and Alzheimer's disease. Med. Image Anal. 2018, 48, 117–130.
19. Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434.
20. Dumoulin, V.; Visin, F. A guide to convolution arithmetic for deep learning. arXiv 2016, arXiv:1603.07285.
21. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681.
22. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
23. Hinton, G.E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R.R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv 2012, arXiv:1207.0580.
24. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
25. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3146–3154.
26. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
Prediction errors (MAE and RMSE) of each model on the three datasets; lower is better:

| Model | Store Item MAE | Store Item RMSE | Web Traffic MAE | Web Traffic RMSE | NOAA China MAE | NOAA China RMSE |
| --- | --- | --- | --- | --- | --- | --- |
| ARIMA | 12.637 | 15.479 | 20.196 | 35.712 | 8.954 | 11.231 |
| VAR | 7.714 | 9.729 | 16.108 | 31.197 | 7.806 | 9.879 |
| SVR | 8.419 | 11.527 | 13.482 | 22.754 | 5.692 | 7.539 |
| LGB | 6.897 | 9.029 | 13.078 | 21.531 | 4.419 | 5.853 |
| LSTM | 9.113 | 12.217 | 12.314 | 18.538 | 4.755 | 6.436 |
| GRU | 9.237 | 13.351 | 13.197 | 18.017 | 4.892 | 6.785 |
| MTSGAN | 5.843 | 7.675 | 10.610 | 16.968 | 3.467 | 4.726 |