Interpretable Mixture of Experts for Decomposition Network on Server Performance Metrics Forecasting
Abstract
1. Introduction
2. Related Works
2.1. Long-Term Time Series Forecasting
2.2. Linear Model for Time Series Forecasting
LTSF-Linear Structure and Principles
2.3. Kolmogorov–Arnold Network
3. Methods
3.1. Mixture of Experts for Performance Metrics Decomposition
3.2. KAN-Based Temporal Layer
4. Experiment
4.1. Experiment Setup
4.1.1. Dataset Description
4.1.2. Experimental Environment
4.2. Training Details
4.3. Evaluation Metrics
4.4. Experimental Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| LTSF | Long-Term Time Series Forecasting |
| MOE | Mixture of Experts |
| KAN | Kolmogorov–Arnold Network |
| MAE | Mean Absolute Error |
| MSE | Mean Squared Error |
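For reference, MAE and MSE are the standard point-forecast error measures used in Section 4.3. For true values $y_t$ and forecasts $\hat{y}_t$ over a horizon of $n$ steps, they are defined as:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2,
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|
```

MSE penalizes large deviations quadratically, while MAE weights all errors linearly; lower is better for both.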
| Groups | Machines | Variates | Timesteps | Granularity |
|---|---|---|---|---|
| 3 | 29 | 38 | 50,400 | 1 min |
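At a 1 min granularity, 50,400 timesteps span 35 days of telemetry (50,400 / 1,440 minutes per day = 35).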
| Resource | Specification |
|---|---|
| CPU | AMD EPYC 9654, 96 cores, 2.4 GHz |
| RAM | 128 GB |
| GPU | NVIDIA RTX A6000 |
| OS | Ubuntu 22.04.3 |
| Horizon | MOE-DKAN MSE | MOE-DKAN MAE | Linear MSE | Linear MAE | DLinear MSE | DLinear MAE | FEDformer MSE | FEDformer MAE |
|---|---|---|---|---|---|---|---|---|
| 96 | 0.053 | 0.177 | 0.178 | 0.359 | 0.062 | 0.195 | 0.084 | 0.251 |
| 192 | 0.0785 | 0.199 | 0.097 | 0.245 | 0.093 | 0.244 | 0.119 | 0.245 |
| 336 | 0.093 | 0.235 | 0.104 | 0.268 | 0.119 | 0.290 | 0.126 | 0.292 |
| 720 | 0.108 | 0.241 | 0.176 | 0.350 | 0.193 | 0.401 | 0.146 | 0.310 |
| Horizon | MOE-DKAN MSE | MOE-DKAN MAE | Linear MSE | Linear MAE | DLinear MSE | DLinear MAE | FEDformer MSE | FEDformer MAE |
|---|---|---|---|---|---|---|---|---|
| 96 | 0.152 | 0.278 | 0.181 | 0.293 | 0.153 | 0.281 | 0.194 | 0.312 |
| 192 | 0.169 | 0.283 | 0.172 | 0.280 | 0.169 | 0.277 | 0.219 | 0.352 |
| 336 | 0.173 | 0.288 | 0.186 | 0.285 | 0.173 | 0.285 | 0.215 | 0.403 |
| 720 | 0.212 | 0.325 | 0.223 | 0.318 | 0.212 | 0.325 | 0.273 | 0.386 |
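The baselines above differ mainly in how they treat the series before the linear projection: DLinear splits each input window into a moving-average trend and a seasonal remainder before projecting each part, and, as its section headings indicate, MOE-DKAN routes decomposed components through a mixture of experts. As a rough illustration of that "decompose, then mix experts" pattern, the sketch below pairs a DLinear-style moving-average split with a softmax-gated pair of linear experts. It is a minimal toy under assumed names (`MovingAvgDecomp`, `TinyMoEForecaster`, `seq_len`, `pred_len`), not the authors' MOE-DKAN implementation, which additionally uses KAN-based temporal layers.

```python
# Illustrative only: a DLinear-style decomposition with a softmax gate over two
# linear experts. Class and parameter names are assumptions, not the paper's code.
import torch
import torch.nn as nn


class MovingAvgDecomp(nn.Module):
    """Split a series into a moving-average trend and a seasonal remainder."""

    def __init__(self, kernel_size: int = 25):
        super().__init__()
        self.kernel_size = kernel_size
        self.avg = nn.AvgPool1d(kernel_size, stride=1, padding=0)

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len). Pad both ends by repetition so the trend keeps seq_len.
        front = x[:, :1].repeat(1, (self.kernel_size - 1) // 2)
        back = x[:, -1:].repeat(1, self.kernel_size // 2)
        trend = self.avg(torch.cat([front, x, back], dim=1).unsqueeze(1)).squeeze(1)
        return x - trend, trend  # (seasonal, trend)


class TinyMoEForecaster(nn.Module):
    """One linear expert per component; a softmax gate mixes their forecasts."""

    def __init__(self, seq_len: int, pred_len: int):
        super().__init__()
        self.decomp = MovingAvgDecomp()
        self.seasonal_expert = nn.Linear(seq_len, pred_len)
        self.trend_expert = nn.Linear(seq_len, pred_len)
        self.gate = nn.Linear(seq_len, 2)  # per-window mixing weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seasonal, trend = self.decomp(x)
        expert_out = torch.stack(
            [self.seasonal_expert(seasonal), self.trend_expert(trend)], dim=-1
        )  # (batch, pred_len, 2)
        weights = torch.softmax(self.gate(x), dim=-1)  # (batch, 2)
        return (expert_out * weights.unsqueeze(1)).sum(dim=-1)


model = TinyMoEForecaster(seq_len=336, pred_len=96)
forecast = model(torch.randn(8, 336))  # 8 windows of length 336 -> (8, 96)
print(forecast.shape)
```

The horizons in the tables (96, 192, 336, 720) correspond to `pred_len`. The gate's weights are directly inspectable per input window, which is the sense in which a gated mixture over decomposed components lends itself to interpretation.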
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Peng, F.; Ji, X.; Zhang, L.; Wang, J.; Zhang, K.; Wu, W. Interpretable Mixture of Experts for Decomposition Network on Server Performance Metrics Forecasting. Electronics 2024, 13, 4116. https://doi.org/10.3390/electronics13204116