TS2ARCformer: A Multi-Dimensional Time Series Forecasting Framework for Short-Term Load Prediction
Abstract
1. Introduction
- Efficient Multi-Dimensional Encoding. We incorporate the TS2Vec layer into the multi-dimensional time series forecasting task to improve the encoding efficiency of diverse features;
- Enhanced Interdependency Learning. By introducing a Cross-Dimensional-Self-Attention mechanism to the Transformer model, we enable better exploration of interdependencies among multi-dimensional features, enhancing the model’s learning capabilities;
- AR Integration for Scale Sensitivity. To address the insensitivity of traditional deep learning models to input scale, we integrate an autoregressive (AR) component into our framework, enhancing the model’s ability to extract linear features and adapt to varying input scales;
- Comprehensive Analysis and Validation of TS2ARCformer. The proposed TS2ARCformer model’s predictive performance is thoroughly analyzed and validated, providing insights into its effectiveness for power load forecasting (a structural sketch of how these components compose is given after this list).
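To make the composition of these components concrete, the following is a minimal, illustrative PyTorch sketch of how a TS2Vec-style encoding layer, a Transformer encoder, and an AR component can be fused into one forecaster. It is not the authors' implementation: the class and argument names are placeholders, the TS2Vec layer is stood in for by a linear projection, and the additive fusion of the AR and Transformer outputs is an assumption.

```python
import torch
import torch.nn as nn

class TS2ARCformerSketch(nn.Module):
    """Illustrative composition of the three components described above.
    Module and argument names are placeholders, not the paper's code."""

    def __init__(self, n_features: int, repr_dim: int = 320,
                 d_model: int = 256, horizon: int = 48):
        super().__init__()
        # Stand-in for the TS2Vec layer (feature-wise representation).
        self.ts2vec_proj = nn.Linear(n_features, repr_dim)
        self.input_proj = nn.Linear(repr_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)
        # AR component: a linear map from the last `horizon` raw load values
        # to the forecast, restoring sensitivity to the input scale.
        self.ar = nn.Linear(horizon, horizon)
        self.horizon = horizon

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_features); channel 0 is assumed to be load.
        z = self.input_proj(self.ts2vec_proj(x))            # representations
        h = self.encoder(z)                                  # attention encoding
        nonlinear = self.head(h[:, -self.horizon:, :]).squeeze(-1)
        linear = self.ar(x[:, -self.horizon:, 0])            # AR on raw load
        return nonlinear + linear                            # fused forecast

# Toy usage: 96 past steps, 6 features, 48-step forecast.
model = TS2ARCformerSketch(n_features=6)
print(model(torch.randn(4, 96, 6)).shape)  # torch.Size([4, 48])
```

The AR branch operates directly on the raw load channel, which is what restores the sensitivity to input scale that purely nonlinear encoders tend to lose.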
2. Related Work
3. Materials and Methods
3.1. Preliminary
3.2. Overview
3.3. Framework
3.3.1. TS2Vec Layer
3.3.2. AR (AutoRegressive Component)
3.3.3. Transformer Model
3.3.4. Cross-Dimensional-Self-Attention Module
- Temporal Self-Attention (Vertical Weight Allocation) + Multivariate Self-Attention (Horizontal Weight Allocation) ≈ Global Attention Allocation.
- Global Self-Attention (Weight Allocation Across the Entire Sequence). Disadvantage: the model becomes complex, and for long forecasting tasks its computational cost becomes intolerable.
- Temporal Self-Attention (Weight Allocation Along the Temporal Axis Only). Disadvantage: it has no attention span across the variable dimension, which causes information loss and lower accuracy. A sketch of the combined two-axis attention follows this list.
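The two-axis idea above can be sketched as two standard self-attention passes, one over the temporal axis and one over the variable (feature) axis, whose outputs are fused. This is an illustrative sketch only; the module name, head counts, and the simple additive fusion are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class CrossDimensionalSelfAttention(nn.Module):
    """Sketch of two-axis attention: weights are allocated along the
    temporal axis and along the variable axis, then combined,
    approximating global attention at lower cost."""

    def __init__(self, seq_len: int, n_features: int, n_heads: int = 1):
        super().__init__()
        # Temporal pass: tokens are time steps, embedding dim = variables.
        self.temporal_attn = nn.MultiheadAttention(n_features, n_heads,
                                                   batch_first=True)
        # Multivariate pass: tokens are variables, embedding dim = time steps.
        self.variable_attn = nn.MultiheadAttention(seq_len, n_heads,
                                                   batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_features)
        t_out, _ = self.temporal_attn(x, x, x)       # attention over time steps
        xt = x.transpose(1, 2)                       # (batch, n_features, seq_len)
        v_out, _ = self.variable_attn(xt, xt, xt)    # attention over variables
        return t_out + v_out.transpose(1, 2)         # fused weight allocation

attn = CrossDimensionalSelfAttention(seq_len=96, n_features=6)
print(attn(torch.randn(4, 96, 6)).shape)  # torch.Size([4, 96, 6])
```

Because each pass attends over a much shorter token sequence than full global attention over every (time step, variable) pair, the combined cost stays manageable for long forecasting windows.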
3.4. TS2ARCformer
4. Experimental Results and Analysis
4.1. Evaluation Metrics
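The results below are reported in terms of MAPE, MSE, MAE, and the coefficient of determination R² (see the Abbreviations section). Assuming the standard definitions, with $y_i$ the actual load, $\hat{y}_i$ the forecast, and $n$ the number of test points:

$$
\mathrm{MAPE}=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|,\qquad
\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2,
$$
$$
\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|,\qquad
R^2=1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}.
$$

RMSE, also listed among the abbreviations, is simply $\sqrt{\mathrm{MSE}}$. Lower MAPE, MSE, and MAE and a higher R² indicate better forecasts.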
4.2. Data Preparation
4.3. Experimental Setup
- LSTM (Long Short-Term Memory): LSTM is a widely used recurrent neural network (RNN) for time series modeling. It captures long-term dependencies in sequences through gated mechanisms.
- GRU (Gated Recurrent Unit): GRU is another type of gated recurrent neural network that simplifies the gating mechanism while capturing sequence dependencies effectively.
- Transformer: The Transformer is a model built on the self-attention mechanism, initially used in natural language processing. It captures dependencies within a sequence and processes long sequences efficiently.
- TS2Vec: TS2Vec is a representation learning method that encodes multi-dimensional data into fixed-dimensional vectors. It extracts features for prediction tasks. In this case, TS2Vec is used for encoding, followed by a fully connected layer for prediction.
- TS2Vec-LSTM: TS2Vec-LSTM combines TS2Vec with LSTM for sequence modeling and prediction, capturing multi-dimensional features and time dependencies.
- TS2Vec-GRU: TS2Vec-GRU is analogous to TS2Vec-LSTM but uses a GRU for sequence modeling, with fewer parameters and higher training efficiency (a minimal sketch of this pipeline follows the list).
- Seq2Seq: The Seq2Seq model, widely employed for sequence-to-sequence tasks, consists of an encoder and a decoder. Both the encoder and the decoder are built using LSTM networks.
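As an example of how the TS2Vec-based baselines are assembled, the following is a minimal sketch of the TS2Vec-GRU pipeline: pre-computed TS2Vec representations are passed to a GRU, and a linear head emits the forecast. The dimensions mirror the hyper-parameter tables in Section 4.3 but are otherwise placeholders, and the TS2Vec encoder itself is assumed to be trained separately.

```python
import torch
import torch.nn as nn

class TS2VecGRUBaseline(nn.Module):
    """Sketch of the TS2Vec-GRU baseline: pre-computed TS2Vec representations
    feed a GRU, and a linear head produces the multi-step forecast."""

    def __init__(self, repr_dim: int = 320, hidden: int = 256, horizon: int = 48):
        super().__init__()
        self.gru = nn.GRU(repr_dim, hidden, num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, reprs: torch.Tensor) -> torch.Tensor:
        # reprs: (batch, seq_len, repr_dim) TS2Vec representations
        out, _ = self.gru(reprs)
        return self.head(out[:, -1, :])   # forecast from the last hidden state

baseline = TS2VecGRUBaseline()
print(baseline(torch.randn(4, 96, 320)).shape)  # torch.Size([4, 48])
```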
4.4. Comparative Experiments
- In terms of computational cost, TCN (206.21 K FLOPs) is the most efficient model, while Transformer (305.82 M FLOPs) and our model “Ours” (406.84 M FLOPs) require the most computation.
- Regarding training time, TS2Vec (185 s) and TCN (236 s) are the quickest to train, while “Ours” (2250 s) and TCN-Transformer (1805 s) take the longest.
- In terms of parameter count, TCN (2.063 K params) and Seq2Seq (565.344 K params) have the fewest parameters, while Transformer (5.523 M params) and “Ours” (6.621 M params) have the most (a sketch of how such counts can be checked follows this list).
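The parameter counts above can be checked directly in PyTorch, as sketched below; FLOPs depend on the input shape and are usually obtained with a profiling library. This snippet is a hedged illustration of the bookkeeping, not the measurement script used for the table.

```python
import torch
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Number of trainable parameters, comparable to the 'Params' column."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example: a single-layer LSTM roughly in the size range reported for LSTM.
lstm = nn.LSTM(input_size=6, hidden_size=256, num_layers=1, batch_first=True)
print(f"LSTM parameters: {count_parameters(lstm):,}")

# FLOPs are model- and input-shape-dependent; profiling libraries such as
# thop or fvcore estimate them from a dummy input, e.g. torch.randn(1, 96, 6).
```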
4.5. Ablation Experiment
- Dataset Comparison: We first evaluate the baseline Transformer model on the two datasets. It obtains a MAPE of 7.43% on dataset area1 and 5.74% on dataset area2, confirming that the two datasets differ and establishing the reference point for the subsequent ablation experiments.
- Impact of TS2Vec Layer: We perform representation learning on the power load data with the TS2Vec layer and feed the learned representations into the Transformer model for prediction. With the TS2Vec layer, the MAPE on dataset area1 falls to 6.01%, a relative reduction of 19.11% compared to the baseline; on dataset area2 it falls to 4.84%, a reduction of 15.68%. The three other metrics improve as well. These results show that the TS2Vec layer benefits the prediction model on both datasets, with the more pronounced improvement on area1, and that representing the temporal data with TS2Vec enhances performance in the downstream forecasting task relative to a Transformer trained without it.
- Impact of AR Component and Cross-Dimensional-Self-Attention Module: We next assess the effect of adding the autoregressive (AR) component and the Cross-Dimensional-Self-Attention module to the Transformer architecture. With the AR component alone, the MAPE decreases from 7.43% to 6.97% on dataset area1 (a relative reduction of 6.19%) and from 5.74% to 4.77% on dataset area2 (a reduction of 16.89%), with the slightly larger effect on area2. The Cross-Dimensional-Self-Attention module alone lowers the MAPE to 5.55% on area1 (a reduction of 25.30%) and to 4.74% on area2 (a reduction of 17.42%), indicating that it successfully captures relationships across time steps and feature dimensions. When the two modules are combined, the MAPE falls further to 5.22% on area1 and 4.11% on area2, relative reductions of 29.74% and 28.40%, respectively. In summary, the AR component and the Cross-Dimensional-Self-Attention module, individually and together, yield significant decreases in MAPE and in the three other metrics on both datasets, confirming their effectiveness in capturing temporal dependencies and enhancing the model’s predictive ability (the relative-reduction calculation used throughout is illustrated after this list).
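All relative reductions quoted above follow the same calculation. For example, for the TS2Vec layer on dataset area1:

$$
\Delta_{\mathrm{MAPE}}=\frac{\mathrm{MAPE}_{\mathrm{baseline}}-\mathrm{MAPE}_{\mathrm{variant}}}{\mathrm{MAPE}_{\mathrm{baseline}}}\times 100\%=\frac{7.43-6.01}{7.43}\times 100\%\approx 19.11\%.
$$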
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
STLF | Short-term Load Forecasting
AR | AutoRegressive
TCN | Temporal Convolutional Network
CNN | Convolutional Neural Network
RNN | Recurrent Neural Network
LSTM | Long Short-term Memory
GRU | Gated Recurrent Unit
Seq2Seq | Sequence-to-Sequence
MAPE | Mean Absolute Percentage Error
RMSE | Root Mean Square Error
MAE | Mean Absolute Error
R² | Coefficient of Determination
References
Hyper-Parameter | Value |
---|---|
represent_dimension | 320 |
lr | 0.001 |
batch_size | 256 |
epochs | 200 |
Hyper-Parameter | Value |
---|---|
batch_size | 16 |
hidden_size | 256 |
num_layers | 1 |
gamma | 0.9 |
weight_decay | |
step_size | 8 |
loss_function | MSE |
epochs | 200 |
Hyper-Parameter | Value |
---|---|
batch_size | 16 |
d_model | 256 |
optimization | Adam |
n_head | 8 |
num_layers | 10 |
step_size | 8 |
lr | 0.001 |
dropout | 0.1 |
epochs | 200 |
gamma | 0.9 |
weight_decay | |
loss_function | MSE |
Hyper-Parameter | Value |
---|---|
batch_size | 32 |
step_size | 8 |
lr | 0.001 |
epochs | 200 |
node in hidden layer | 256 |
optimization | Adam |
loss_function | MSE |
Hyper-Parameter | Value |
---|---|
batch_size | 32 |
step_size | 8 |
lr | 0.001 |
num_channels | [6] |
kernel_size | 7 |
epochs | 200 |
optimization | Adam |
loss_function | MSE |
Hyper-Parameter | Value |
---|---|
batch_size | 16 |
step_size | 8 |
lr | 0.001 |
gamma | 0.9 |
weight_decay | |
epochs | 200 |
optimization | Adam |
loss_function | MSE |
d_model_trvs | 256 |
ar_window | 48 |
ar_output | 48 |
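Read together, the step_size, gamma, lr, and loss_function entries above describe a standard Adam plus step-decay training loop. The sketch below shows one plausible PyTorch realization; the model object, the StepLR reading of step_size/gamma, and the zero weight decay (the table leaves that entry blank) are assumptions.

```python
import torch
import torch.nn as nn

model = nn.Linear(320, 48)  # placeholder for the actual forecasting model

# Adam with lr = 0.001; the weight_decay entry is blank in the table,
# so 0.0 is used here purely as a placeholder.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)

# StepLR: every step_size = 8 epochs the learning rate is multiplied by gamma = 0.9.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=8, gamma=0.9)

criterion = nn.MSELoss()      # loss_function = MSE

for epoch in range(200):      # epochs = 200
    # ... forward pass over batches of 16, loss = criterion(pred, target),
    #     loss.backward() would go here ...
    optimizer.step()          # placeholder optimizer update
    optimizer.zero_grad()
    scheduler.step()          # decay the learning rate on schedule
```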
Models | MAPE% | MSE/MW | MAE/MW | R² |
---|---|---|---|---|
LSTM | 6.42 | 0.00419 | 0.03813 | 0.8733 |
GRU | 7.21 | 0.00495 | 0.04632 | 0.8413 |
Transformer | 7.43 | 0.00512 | 0.04666 | 0.8312 |
TS2Vec | 6.99 | 0.00448 | 0.04363 | 0.8652 |
TS2Vec-LSTM | 5.03 | 0.00254 | 0.03121 | 0.9222 |
TS2Vec-GRU | 4.76 | 0.00252 | 0.03037 | 0.9211 |
Seq2Seq | 7.10 | 0.00473 | 0.04566 | 0.8491 |
TCN | 7.01 | 0.00489 | 0.04401 | 0.8405 |
TCN-Transformer | 6.75 | 0.00467 | 0.04277 | 0.8586 |
Ours | 4.22 | 0.00201 | 0.02623 | 0.9412 |
Models | MAPE% | MSE/MW | MAE/MW | R² |
---|---|---|---|---|
LSTM | 5.79 | 0.00319 | 0.04017 | 0.8929 |
GRU | 5.55 | 0.00268 | 0.03771 | 0.9182 |
Transformer | 5.74 | 0.00314 | 0.03813 | 0.9031 |
TS2Vec | 5.30 | 0.00232 | 0.03494 | 0.9307 |
TS2Vec-LSTM | 4.41 | 0.00186 | 0.03012 | 0.9395 |
TS2Vec-GRU | 4.08 | 0.00179 | 0.02859 | 0.9401 |
Seq2Seq | 5.63 | 0.00283 | 0.03891 | 0.9119 |
TCN | 5.26 | 0.00275 | 0.03533 | 0.9143 |
TCN-Transformer | 5.56 | 0.00279 | 0.03724 | 0.9169 |
Ours | 3.57 | 0.00132 | 0.02376 | 0.9622 |
Models | FLOPs | Training Time | Params |
---|---|---|---|
LSTM | 28.51 M | 470 s | 295.01 K |
GRU | 21.99 M | 390 s | 227.424 K |
Transformer | 305.82 M | 1449 s | 5.523 M |
TS2Vec | 61.02 M | 185 s | 637.95 K |
TS2Vec-LSTM | 89.53 M | 635 s | 932.96 K |
TS2Vec-GRU | 83.01 M | 565 s | 865.37 K |
Seq2Seq | 54.65 M | 1629 s | 565.344 K |
TCN | 206.21 K | 236 s | 2.063 K |
TCN-Transformer | 308.69 M | 1805 s | 5.885 M |
Ours | 406.84 M | 2250 s | 6.621 M |
Models | MAPE% | MSE/MW | MAE/MW | R² |
---|---|---|---|---|
Transformer | 7.43 | 0.00512 | 0.04666 | 0.8312 |
TS2Vec | 6.99 | 0.00448 | 0.04363 | 0.8652 |
AR + Transformer | 6.97 | 0.00424 | 0.04382 | 0.8700 |
Enhanced Transformer | 5.55 | 0.00320 | 0.03490 | 0.9024 |
AR + Enhanced Transformer | 5.22 | 0.00262 | 0.03267 | 0.9157 |
TS2Vec-Transformer | 6.01 | 0.00325 | 0.03606 | 0.9029 |
Models | MAPE% | MSE/MW | MAE/MW | R² |
---|---|---|---|---|
Transformer | 5.74 | 0.00314 | 0.03556 | 0.9031 |
TS2Vec | 5.28 | 0.00230 | 0.0348 | 0.9313 |
AR + Transformer | 4.77 | 0.00221 | 0.0326 | 0.9358 |
Enhanced Transformer | 4.74 | 0.00218 | 0.0328 | 0.9311 |
AR + Enhanced Transformer | 4.11 | 0.00161 | 0.0277 | 0.9524 |
TS2Vec-Transformer | 4.84 | 0.00198 | 0.0311 | 0.9439 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).