Machine Learning Models Informed by Connected Mixture Components for Short- and Medium-Term Time Series Forecasting
Abstract
1. Introduction
- A new method of probability informing for ML models is introduced. It constructs additional features (connected mixture components) based on probability and stochastic models. This approach is suitable for various machine learning algorithms and deep neural networks.
- For the first time, we show that probability informing can improve the forecasting accuracy not only of neural networks, as in previous research, but also of classical ML models. When these models are applied to geophysical time series, the Root Mean Square Error (RMSE) is reduced by – and the Mean Absolute Percentage Error (MAPE) by – compared with the results obtained with vanilla decision trees [17], random forests [18], and gradient boosting [19].
- For real-world time series, a significant increase in ML and NN forecasting accuracy is demonstrated across the various methods of probability informing by connected mixture components and across forecast horizons. For geophysical data, the RMSE is decreased by – and the MAPE by – compared with models without informing. For the Electricity Transformer Dataset (ETDataset, https://github.com/zhouhaoyi/ETDataset, accessed on 1 July 2024) [20], the Mean Squared Error (MSE) improvement is –.
- For all test datasets and algorithms, probability-informed models outperform their vanilla counterparts; however, the best accuracy is achieved by different algorithms. An informed ensemble of an LSTM (Long Short-Term Memory) architecture [21] and a vanilla transformer [22] provides the best results for geophysical data in short- and medium-term forecasting, whereas for medium-term forecasts on ETDataset an informed random forest should be used.
- The introduced probability-informed approach outperforms both transformer NN architectures and classical statistical and machine learning methods.
2. Related Works
3. Methodology of Probability Informing Based on Connected Mixture Components
3.1. Overall Framework
3.2. Feature Construction Based on Finite Normal Mixtures
- $p_i$ are the weights of the corresponding components;
- $a_i$ are their expectations;
- $\sigma_i$ are their standard deviations.
- Let $K_t$ be the set of indices (numbers) of the components for step number $t$, that is, $K_t = \{1, \ldots, k_t\}$, and $K_{t+1}$ is the analogous set for step $t+1$.
- Let $\hat{K}_t$ and $\hat{K}_{t+1}$ be the sets of indices from the first and second sets, respectively, for which the nearest component has already been found. Initially, we assume $\hat{K}_t = \varnothing$, $\hat{K}_{t+1} = \varnothing$.
- For each $j \in K_t \setminus \hat{K}_t$, one should find the closest number $i$ in the sense of solving the optimization problem $i = \arg\min_{l \in K_{t+1} \setminus \hat{K}_{t+1}} \rho\big((p_j, a_j, \sigma_j), (p_l, a_l, \sigma_l)\big)$, where $\rho$ is a metric on the space of component parameters.
- To correctly identify connected components, the following condition must be met:
- Steps 1–4 are repeated for each admissible position of the sliding window, forming a parameter adjacency matrix (a minimal code sketch of this matching procedure is given after Algorithm 1 below).
Algorithm 1. Forming connected mixture components.
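Algorithm 1 itself is not reproduced here, so the following is only a minimal Python sketch of the general idea under stated assumptions: a finite normal mixture is fitted to each sliding-window sample with the EM algorithm (here via scikit-learn's GaussianMixture), and the components of adjacent windows are greedily linked by nearest parameter triples. The function names, the Euclidean distance on (weight, mean, std) triples, and the window/step sizes are illustrative choices rather than the authors' exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def window_mixture_params(window, k, seed=0):
    """Fit a k-component normal mixture to one sliding-window sample
    and return its (weight, mean, std) triples sorted by mean."""
    gm = GaussianMixture(n_components=k, covariance_type="diag",
                         random_state=seed).fit(window.reshape(-1, 1))
    params = np.column_stack([gm.weights_,
                              gm.means_.ravel(),
                              np.sqrt(gm.covariances_.ravel())])
    return params[np.argsort(params[:, 1])]           # shape (k, 3)

def connect_components(prev, curr):
    """Greedily match every component of the previous window to the
    nearest still-unmatched component of the current window."""
    pairs, used = [], set()
    for j, pj in enumerate(prev):
        dists = [np.inf if i in used else float(np.linalg.norm(pj - ci))
                 for i, ci in enumerate(curr)]
        i = int(np.argmin(dists))
        used.add(i)
        pairs.append((j, i))                           # an edge of the adjacency matrix
    return pairs

# Chain the matches over all admissible window positions; the resulting
# component trajectories serve as the additional (informing) features.
rng = np.random.default_rng(1)
series = rng.normal(size=5000)                         # toy stand-in for a real series
win, step, k = 1000, 100, 3
params = [window_mixture_params(series[s:s + win], k)
          for s in range(0, len(series) - win + 1, step)]
links = [connect_components(params[t], params[t + 1])
         for t in range(len(params) - 1)]
```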
3.3. Probability-Informed Machine Learning Models
- The additional features are transformed using an affine transformation into a form that corresponds to the internal state of the recurrent neural network (RNN):
- Then, the hidden state of the RNN is initialized with the resulting vector (see the sketch below).
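As one concrete reading of this scheme, the PyTorch sketch below maps the connected-mixture features through a learned affine layer and uses the result to initialize an LSTM's state. The class name, the tensor shapes, and the decision to initialize both the hidden and the cell state are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class InformedLSTM(nn.Module):
    """Minimal sketch: mixture-based features are mapped by an affine
    transformation to the initial hidden/cell state of an LSTM."""
    def __init__(self, n_inputs, n_features, hidden_size, horizon):
        super().__init__()
        # affine maps from the informing features into the RNN state space
        self.to_h0 = nn.Linear(n_features, hidden_size)
        self.to_c0 = nn.Linear(n_features, hidden_size)
        self.lstm = nn.LSTM(n_inputs, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, horizon)

    def forward(self, x, mix_features):
        # x: (batch, seq_len, n_inputs); mix_features: (batch, n_features)
        h0 = self.to_h0(mix_features).unsqueeze(0)     # (1, batch, hidden)
        c0 = self.to_c0(mix_features).unsqueeze(0)
        out, _ = self.lstm(x, (h0, c0))
        return self.head(out[:, -1])                   # forecast over the horizon

# Usage with random tensors, assuming 24 input lags and 3*k mixture features
model = InformedLSTM(n_inputs=1, n_features=9, hidden_size=64, horizon=24)
y = model(torch.randn(8, 24, 1), torch.randn(8, 9))   # shape (8, 24)
```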
4. Experimental Section
4.1. Test Data and Connected Mixture Components
4.2. Accuracy Metrics, Hyperparameters, and Typical Training Times
- Seven minutes for architecture (IV) on geophysical data;
- Fifteen minutes for architecture (V) on geophysical data;
- Forty minutes for architecture (V) on ETDataset (due to the larger input space);
- Three hours for an ensemble architecture (VI) on geophysical data;
- Five hours for architecture (VI) on ETDataset.
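For reference, the accuracy metrics used in this paper (MSE, RMSE, and MAPE) can be computed as in the minimal sketch below; the standard textbook definitions are assumed, and any dataset-specific conventions (e.g., averaging over series or horizons) are not reflected here.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def rmse(y_true, y_pred):
    return float(np.sqrt(mse(y_true, y_pred)))

def mape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Example: rmse([1, 2, 3], [1.1, 1.9, 3.2]) ≈ 0.141, mape(...) ≈ 7.2%
```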
4.3. Geophysical Data Forecasting
- From to for model (I);
- From to for model (II);
- From to for model (III);
- From to for architecture (IV);
- From to for architecture (V);
- From to for architecture (VI).
- From to for model (I);
- From to for model (II);
- From to for model (III);
- From to for architecture (IV);
- From to for architecture (V);
- From to for architecture (VI).
- From to for model (I);
- From to for model (II);
- From to for model (III);
- From to for architecture (IV);
- From to for architecture (V);
- From to for architecture (VI).
- From to for model (I);
- From to for model (II);
- From to for model (III);
- From to for architecture (IV);
- From to for architecture (V);
- From to for architecture (VI).
4.4. ETDataset Forecasting
- A total of for model (I);
- A total of for model (II);
- A total of for model (III);
- A total of for architecture (V);
- A total of for architecture (VI).
5. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Bzdok, D.; Altman, N.; Krzywinski, M. Statistics versus Machine Learning. Nat. Methods 2018, 15, 233–234. [Google Scholar] [CrossRef] [PubMed]
- Korb, K.; Nicholson, A. Bayesian Artificial Intelligence; Chapman and Hall/CRC: London, UK, 2011. [Google Scholar]
- Murphy, K. Probabilistic Machine Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2022. [Google Scholar]
- James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R; Springer: Berlin/Heidelberg, Germany, 2023. [Google Scholar]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
- Dong, S.; Wang, P.; Abbas, K. A Survey on Deep Learning and its Applications. Comput. Sci. Rev. 2021, 40, 100379. [Google Scholar] [CrossRef]
- Lim, B.; Zohren, S. Time-series Forecasting with Deep Learning: A Survey. Philos. Trans. R. Soc. Math. Phys. Eng. Sci. 2021, 379, 20170179. [Google Scholar] [CrossRef] [PubMed]
- Torres, J.F.; Hadjout, D.; Sebaa, A.; Martínez-Álvarez, F.; Troncoso, A. Deep Learning for Time Series Forecasting: A Survey. Big Data 2021, 9, 3–21. [Google Scholar] [CrossRef]
- Benidis, K.; Rangapuram, S.S.; Flunkert, V.; Wang, Y.; Maddix, D.; Turkmen, C.; Gasthaus, J.; Bohlke-Schneider, M.; Salinas, D.; Stella, L.; et al. Deep Learning for Time Series Forecasting: Tutorial and Literature Survey. ACM Comput. Surv. 2022, 55, 1–36. [Google Scholar] [CrossRef]
- Chen, Z.; Ma, M.; Li, T.; Wang, H.; Li, C. Long sequence time-series forecasting with deep learning: A survey. Inf. Fusion 2023, 97, 101819. [Google Scholar] [CrossRef]
- Safonova, A.; Ghazaryan, G.; Stiller, S.; Main-Knorn, M.; Nendel, C.; Ryo, M. Ten Deep Learning Techniques to Address Small Data Problems with Remote Sensing. Int. J. Appl. Earth Obs. Geoinf. 2023, 125, 103569. [Google Scholar] [CrossRef]
- Xu, P.; Ji, X.; Li, M.; Lu, W. Small Data Machine Learning in Materials Science. NPJ Comput. Mater. 2023, 9, 42. [Google Scholar] [CrossRef]
- Lim, B.; Arık, S.Ö.; Loeff, N.; Pfister, T. Temporal Fusion Transformers for Interpretable Multi-Horizon Time Series Forecasting. Int. J. Forecast. 2021, 37, 1748–1764. [Google Scholar] [CrossRef]
- Alkilane, K.; He, Y.; Lee, D.H. MixMamba: Time series modeling with adaptive expertise. Inf. Fusion 2024, 112, 102589. [Google Scholar] [CrossRef]
- Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-Informed Machine Learning. Nat. Rev. Phys. 2021, 3, 422–440. [Google Scholar] [CrossRef]
- Härdle, W.; Werwatz, A.; Müller, M.; Sperlich, S. Nonparametric and Semiparametric Models; Springer Series in Statistics; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar] [CrossRef]
- Safavian, S.; Landgrebe, D. A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern. 1991, 21, 660–674. [Google Scholar] [CrossRef]
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Schapire, R.E. The Boosting Approach to Machine Learning: An Overview. In Lecture Notes in Statistics; Springer: New York, NY, USA, 2003; pp. 149–171. [Google Scholar] [CrossRef]
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; Volume 35, pp. 11106–11115. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; NIPS’17; pp. 6000–6010. [Google Scholar]
- Scott, S.; Matwin, S. Feature engineering for text classification. In Proceedings of the ICML, Bled, Slovenia, 27–30 June 1999; Volume 99, pp. 379–388. [Google Scholar]
- Mutlag, W.K.; Ali, S.K.; Aydam, Z.M.; Taher, B.H. Feature Extraction Methods: A Review. J. Phys. Conf. Ser. 2020, 1591, 012028. [Google Scholar] [CrossRef]
- Fernandes, S.V.; Ullah, M.S. A Comprehensive Review on Features Extraction and Features Matching Techniques for Deception Detection. IEEE Access 2022, 10, 28233–28246. [Google Scholar] [CrossRef]
- Zhou, H.; Li, J.; Zhang, S.; Zhang, S.; Yan, M.; Xiong, H. Expanding the Prediction Capacity in Long Sequence Time-Series Forecasting. Artif. Intell. 2023, 318, 103886. [Google Scholar] [CrossRef]
- Jia, B.; Wu, H.; Guo, K. Chaos Theory Meets Deep Learning: A New Approach to Time Series Forecasting. Expert Syst. Appl. 2024, 255, 124533. [Google Scholar] [CrossRef]
- Cruz, L.F.S.A.; Silva, D.F. Financial Time Series Forecasting Enriched with Textual Information. In Proceedings of the 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), Virtual, 13–15 December 2021; pp. 385–390. [Google Scholar] [CrossRef]
- Plutenko, I.; Papkov, M.; Palo, K.; Parts, L.; Fishman, D. Metadata Improves Segmentation Through Multitasking Elicitation. In Proceedings of the Domain Adaptation and Representation Transfer, Vancouver, BC, Canada, 12 October 2024; pp. 147–155. [Google Scholar]
- Raissi, M.; Perdikaris, P.; Karniadakis, G. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [Google Scholar] [CrossRef]
- Mao, Z.; Jagtap, A.D.; Karniadakis, G.E. Physics-informed neural networks for high-speed flows. Comput. Methods Appl. Mech. Eng. 2020, 360, 112789. [Google Scholar] [CrossRef]
- Cai, S.; Mao, Z.; Wang, Z.; Yin, M.; Karniadakis, G.E. Physics-informed neural networks (PINNs) for fluid mechanics: A review. Acta Mech. Sin. 2021, 37, 1727–1738. [Google Scholar] [CrossRef]
- Jin, X.; Cai, S.; Li, H.; Karniadakis, G.E. NSFnets (Navier-Stokes flow nets): Physics-informed neural networks for the incompressible Navier-Stokes equations. J. Comput. Phys. 2021, 426, 109951. [Google Scholar] [CrossRef]
- Li, Y.; Xiao, L.; Wei, H.; Kou, Y.; Yang, L.; Li, D. A Time-Frequency Physics-Informed Model for Real-Time Motion Prediction of Semi-Submersibles. Ocean. Eng. 2024, 299, 117379. [Google Scholar] [CrossRef]
- Saito, N.; Coifman, R.R.; Geshwind, F.B.; Warner, F. Discriminant feature extraction using empirical probability density estimation and a local basis library. Pattern Recognit. 2002, 35, 2841–2852. [Google Scholar] [CrossRef]
- Gorodetsky, V.; Samoylov, V. Feature Extraction for Machine Learning: Logic-Probabilistic Approach. In Proceedings of the Fourth International Workshop on Feature Selection in Data Mining, Hyderabad, India, 21 June 2010; Volume 10, pp. 55–65. [Google Scholar]
- Le, T.; Schuff, N. A Probability-Based Approach for Multi-scale Image Feature Extraction. In Proceedings of the 2014 11th International Conference on Information Technology: New Generations, Las Vegas, NV, USA, 7–9 April 2014; pp. 143–148. [Google Scholar] [CrossRef]
- Ma, Y.; Huang, B. Bayesian Learning for Dynamic Feature Extraction With Application in Soft Sensing. IEEE Trans. Ind. Electron. 2017, 64, 7171–7180. [Google Scholar] [CrossRef]
- Yan, H.; He, L.; Song, X.; Yao, W.; Li, C.; Zhou, Q. Bidirectional Statistical Feature Extraction Based on Time Window for Tor Flow Classification. Symmetry 2022, 14, 2002. [Google Scholar] [CrossRef]
- Subramanian, A.; Mahadevan, S. Probabilistic Physics-Informed Machine Learning for Dynamic Systems. Reliab. Eng. Syst. Saf. 2023, 230, 108899. [Google Scholar] [CrossRef]
- Fuhg, J.N.; Bouklas, N. On Physics-Informed Data-Driven Isotropic and Anisotropic Constitutive Models Through Probabilistic Machine Learning and Space-Filling Sampling. Comput. Methods Appl. Mech. Eng. 2022, 394, 114915. [Google Scholar] [CrossRef]
- Zhou, T.; Jiang, S.; Han, T.; Zhu, S.P.; Cai, Y. A Physically Consistent Framework for Fatigue Life Prediction Using Probabilistic Physics-Informed Neural Network. Int. J. Fatigue 2023, 166, 107234. [Google Scholar] [CrossRef]
- Gorshenin, A.; Kuzmin, V. Method for improving accuracy of neural network forecasts based on probability mixture models and its implementation as a digital service. Inform. Primen. 2021, 15, 63–74. [Google Scholar] [CrossRef]
- Gorshenin, A.K.; Vilyaev, A.L. Finite Normal Mixture Models for the Ensemble Learning of Recurrent Neural Networks with Applications to Currency Pairs. Pattern Recognit. Image Anal. 2022, 32, 780–792. [Google Scholar] [CrossRef]
- Itô, K. On Stochastic Differential Equations; Number 4; American Mathematical Society: Washington, DC, USA, 1951. [Google Scholar]
- Gikhman, I.; Skorokhod, A.V. The Theory of Stochastic Processes II; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
- Wu, X.; Kumar, V.; Quinlan, J.; Ghosh, J.; Yang, Q.; Motoda, H.; McLachlan, G.J.; Ng, A.; Yu, P.S.; Zhou, Z.-H.; et al. Top 10 Algorithms in Data Mining. Knowl. Inf. Syst. 2008, 14, 1–37. [Google Scholar] [CrossRef]
- Gorshenin, A.K. On Implementation of EM-type Algorithms in the Stochastic Models for a Matrix Computing on GPU. AIP Conf. Proc. 2015, 1648, 250008. [Google Scholar] [CrossRef]
- Belyaev, K.P.; Gorshenin, A.K.; Korolev, V.Y.; Osipova, A.A. Comparison of Statistical Approaches for Reconstructing Random Coefficients in the Problem of Stochastic Modeling of Air–Sea Heat Flux Increments. Mathematics 2024, 12, 288. [Google Scholar] [CrossRef]
- Gorshenin, A.; Korolev, V.; Shcherbinina, A. Statistical estimation of distributions of random coefficients in the Langevin stochastic differential equation. Inform. Primen. 2020, 14, 3–12. [Google Scholar] [CrossRef]
- Liu, C.; Li, H.; Fu, K.; Zhang, F.; Datcu, M.; Emery, W. A Robust EM Clustering Algorithm for Gaussian Mixture Models. Pattern Recognit. 2012, 45, 3950–3961. [Google Scholar] [CrossRef]
- Wu, D.; Ma, J. An Effective EM Algorithm for Mixtures of Gaussian Processes via the MCMC Sampling and Approximation. Neurocomputing 2019, 331, 366–374. [Google Scholar] [CrossRef]
- Zeller, C.B.; Cabral, C.R.B.; Lachos, V.H.; Benites, L. Finite mixture of regression models for censored data based on scale mixtures of normal distributions. Adv. Data Anal. Classif. 2018, 13, 89–116. [Google Scholar] [CrossRef]
- Abid, S.; Quaez, U.; Contreras-Reyes, J. An Information-Theoretic Approach for Multivariate Skew-t Distributions and Applications. Mathematics 2021, 9, 146. [Google Scholar] [CrossRef]
- Audhkhasi, K.; Osoba, O.; Kosko, B. Noise-Enhanced Convolutional Neural Networks. Neural Netw. 2016, 78, 15–23. [Google Scholar] [CrossRef] [PubMed]
- Greff, K.; van Steenkiste, S.; Schmidhuber, J. Neural Expectation Maximization. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6694–6704. [Google Scholar]
- Kolmogorov, A.; Fomin, S. Elements of the Theory of Functions and Functional Analysis; FIZMATLIT: Moscow, Russia, 2004. [Google Scholar]
- Gorshenin, A.K.; Kuzmin, V.Y. Statistical Feature Construction for Forecasting Accuracy Increase and its Applications in Neural Network Based Analysis. Mathematics 2022, 10, 589. [Google Scholar] [CrossRef]
- Karpathy, A.; Li, F.-F. Deep visual-semantic alignments for generating image descriptions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3128–3137. [Google Scholar]
- Wang, B.; Jiang, T.; Zhou, X.; Ma, B.; Zhao, F.; Wang, Y. Time-Series Classification Based on Fusion Features of Sequence and Visualization. Appl. Sci. 2020, 10, 4124. [Google Scholar] [CrossRef]
- Chang, J.; Jin, L. Gating Mechanism Based Feature Fusion Networks for Time Series Classification. In Proceedings of the 2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), Wuhan, China, 22–24 April 2022; pp. 147–150. [Google Scholar] [CrossRef]
- Wang, T.; Liu, Z.; Zhang, T.; Hussain, S.F.; Waqas, M.; Li, Y. Adaptive feature fusion for time series classification. Knowl.-Based Syst. 2022, 243, 108459. [Google Scholar] [CrossRef]
- Park, S.H.; Syazwany, N.S.; Lee, S.C. Meta-Feature Fusion for Few-Shot Time Series Classification. IEEE Access 2023, 11, 41400–41414. [Google Scholar] [CrossRef]
- Perry, A.; Walker, J. The Ocean-Atmosphere System; Longman: London, UK, 1977. [Google Scholar]
- Gorshenin, A.K.; Osipova, A.A.; Belyaev, K.P. Stochastic analysis of air–sea heat fluxes variability in the North Atlantic in 1979–2022 based on reanalysis data. Comput. Geosci. 2023, 181, 105461. [Google Scholar] [CrossRef]
- Hersbach, H.; Bell, B.; Berrisford, P.; Hirahara, S.; Horányi, A.; Muñoz-Sabater, J.; Nicolas, J.; Peubey, C.; Radu, R.; Schepers, D.; et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 2020, 146, 1999–2049. [Google Scholar] [CrossRef]
- Gavrikov, A.; Gulev, S.K.; Markina, M.; Tilinina, N.; Verezemskaya, P.; Barnier, B.; Dufour, A.; Zolina, O.; Zyulyaeva, Y.; Krinitskiy, M.; et al. RAS-NAAD: 40-yr high-resolution north atlantic atmospheric hindcast for multipurpose applications (new dataset for the regional mesoscale studies in the atmosphere and the ocean). J. Appl. Meteorol. Climatol. 2020, 59, 793–817. [Google Scholar] [CrossRef]
- Grainger, J.J.; Stevenson, W.D. Power System Analysis; McGraw Hill: New York, NY, USA, 1994. [Google Scholar]
- Weedy, B.; Cory, B.; Jenkins, N.; Ekanayake, J.; Strbac, G. Electric Power Systems; Wiley: Hoboken, NJ, USA, 2012. [Google Scholar]
- Banchuin, R.; Chaisricharoen, R. An SDE based Stochastic Analysis of Transformer. In Proceedings of the 2019 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (ECTI DAMT-NCON), Barcelona, Spain, 30 January 2019; pp. 310–313. [Google Scholar] [CrossRef]
- Schein, O.; Denk, G. Numerical solution of stochastic differential-algebraic equations with applications to transient noise simulation of microelectronic circuits. J. Comput. Appl. Math. 1998, 100, 77–92. [Google Scholar] [CrossRef]
- Römisch, W.; Winkler, R. Stochastic DAEs in Circuit Simulation. In Proceedings of the Modeling, Simulation, and Optimization of Integrated Circuits, Basel, Switzerland, 23 October 2003; pp. 303–318. [Google Scholar] [CrossRef]
- Kolarova, E. Modelling RL Electrical Circuits by Stochastic Differential Equations. In Proceedings of the EUROCON 2005—The International Conference on “Computer as a Tool”, Belgrade, Serbia, 21–24 November 2005; Volume 2, pp. 1236–1238. [Google Scholar] [CrossRef]
- Patil, N.S.; Sharma, S.N. On a non-linear stochastic dynamic circuit using Stratonovich differential. J. Frankl. Inst. 2015, 352, 2999–3013. [Google Scholar] [CrossRef]
- Huy, P.C.; Minh, N.Q.; Tien, N.D.; Anh, T.T.Q. Short-Term Electricity Load Forecasting Based on Temporal Fusion Transformer Model. IEEE Access 2022, 10, 106296–106304. [Google Scholar] [CrossRef]
- Torres, J.; Martínez-Álvarez, F.; Troncoso, A. A Deep LSTM Network for the Spanish Electricity Consumption Forecasting. Neural Comput. Appl. 2022, 34, 10533–10545. [Google Scholar] [CrossRef] [PubMed]
- Wang, C.; Wang, Y.; Ding, Z.; Zheng, T.; Hu, J.; Zhang, K. A Transformer-Based Method of Multienergy Load Forecasting in Integrated Energy System. IEEE Trans. Smart Grid 2022, 13, 2703–2714. [Google Scholar] [CrossRef]
- Cui, Y.; Li, Z.; Wang, Y.; Dong, D.; Gu, C.; Lou, X.; Zhang, P. Informer Model with Season-Aware Block for Efficient Long-Term Power Time Series Forecasting. Comput. Electr. Eng. 2024, 119, 109492. [Google Scholar] [CrossRef]
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA, 4–9 August 2019; KDD ’19. pp. 2623–2631. [Google Scholar] [CrossRef]
- Kitaev, N.; Kaiser, L.; Levskaya, A. Reformer: The Efficient Transformer. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26 April–1 May 2020. [Google Scholar]
- Taylor, S.J.; Letham, B. Forecasting at Scale. Am. Stat. 2018, 72, 37–45. [Google Scholar] [CrossRef]
- Kochetkova, I.; Kushchazli, A.; Burtseva, S.; Gorshenin, A. Short-Term Mobile Network Traffic Forecasting Using Seasonal ARIMA and Holt-Winters Models. Future Internet 2023, 15, 290. [Google Scholar] [CrossRef]
- Gorshenin, A.; Kozlovskaya, A.; Gorbunov, S.; Kochetkova, I. Mobile network traffic analysis based on probability-informed machine learning approach. Comput. Netw. 2024, 247, 110433. [Google Scholar] [CrossRef]
- Viroli, C.; McLachlan, G. Deep Gaussian Mixture Models. Stat. Comput. 2019, 29, 43–51. [Google Scholar] [CrossRef]
Characteristic | Gulfstream-1 | Gulfstream-2 | Labrador-1 | Labrador-2 | Tropical-1 | Tropical-2 |
---|---|---|---|---|---|---|
Number of observations | 14,612 | 14,612 | 14,612 | 14,612 | 14,612 | 14,612 |
Minimum value | 3 | 9 | ||||
Maximum value | 995 | 677 | 330 | 645 | 403 | 69 |
Mean value | 227 | 52 | 60 | 65 | 136 | 10 |
Characteristic | HUFL | HULL | MUFL | MULL | LUFL | LULL |
---|---|---|---|---|---|---|
Number of observations | ||||||
Minimum value | ||||||
Maximum value | 23.6 | 10.1 | 17.3 | 7.8 | 8.5 | 3.0 |
Mean value | 7.4 | 2.2 | 4.3 | 0.9 | 3.1 | 0.9 |
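The ETDataset statistics above can be reproduced along the following lines with pandas; the file name ETTh1.csv is a hypothetical choice (the paper does not state which file from the repository linked in the introduction was used), while the column names follow the table above.

```python
import pandas as pd

# Load one of the hourly ETDataset files; the path and the choice of
# ETTh1 are assumptions for illustration.
df = pd.read_csv("ETTh1.csv", parse_dates=["date"])
cols = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL"]
summary = df[cols].agg(["count", "min", "max", "mean"]).round(1)
print(summary)
```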
Hyperparameter | Value Range | Description |
---|---|---|
max_depth | 5–50 | Maximum tree depth |
min_samples_split | 2–50 | Minimum number of samples required to split an internal node |
min_samples_leaf | 1–20 | Minimum number of samples in a leaf |
max_features | Sqrt, | Function for the maximum number of features considered when splitting |
Hyperparameter | Value Range | Description |
---|---|---|
n_estimators | 30–300 | Number of trees |
max_depth | 5–40 | Maximum tree depth |
min_samples_split | 2–20 | Minimum number of samples required to split an internal node |
min_samples_leaf | 1–10 | Minimum number of samples in a leaf |
max_features | Sqrt, | Function for the maximum number of features considered when splitting |
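The reference list includes Optuna (Akiba et al.), so one plausible way to search the random-forest ranges listed above is sketched below. The objective, the chronological validation split, the toy data, and the "log2" option for max_features (the table's second value is not shown) are all illustrative assumptions rather than the authors' exact setup.

```python
import numpy as np
import optuna
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Toy stand-in for the real windowed time-series features and targets
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 24))
y = X[:, -1] + 0.1 * rng.normal(size=2000)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, shuffle=False)

def objective(trial):
    model = RandomForestRegressor(
        n_estimators=trial.suggest_int("n_estimators", 30, 300),
        max_depth=trial.suggest_int("max_depth", 5, 40),
        min_samples_split=trial.suggest_int("min_samples_split", 2, 20),
        min_samples_leaf=trial.suggest_int("min_samples_leaf", 1, 10),
        # "log2" is an assumed second option; the table's value is elided
        max_features=trial.suggest_categorical("max_features", ["sqrt", "log2"]),
        random_state=0,
    ).fit(X_tr, y_tr)
    return mean_squared_error(y_val, model.predict(X_val))  # minimize validation MSE

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```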
Hyperparameter | Value Range | Description |
---|---|---|
n_estimators | 30–300 | Number of boosting steps |
learning_rate | – | Model learning rate |
max_depth | 3–15 | Maximum tree depth |
subsample | – | Fraction of the training data used at each iteration |
Hyperparameter | Value Range | Description |
---|---|---|
Units_LSTM1 | 24–256 | Number of neurons in the LSTM layer |
Units_FC1 | 24–256 | Number of neurons in the fully connected layer |
Learning_rate | – | Model learning rate |
Dropout_rate | 0– | Dropout layer parameter |
Regularization | – | regularization parameter |
Regularization | – | regularization parameter |
Epochs | 50–900 | All models are trained for 50 epochs; the best ones are then trained for a further 850 epochs |
Hyperparameter | Value Range | Description |
---|---|---|
Units_LSTM1 | 24–256 | Number of neurons in the LSTM layer |
Units_FC1 | 24–256 | Number of neurons in the fully connected layer |
Num_heads | 2–32 | Number of transformer attention heads |
Num_layers | 1–10 | Number of transformer layers |
Hidden_size | 128–4096 | Transformer hidden layer size |
Units_final | 4–128 | Number of neurons in the feature fusion layer |
Learning_rate | – | Model learning rate |
Dropout_rate | 0– | Dropout layer parameter |
Regularization | – | regularization parameter |
Regularization | – | regularization parameter |
Epochs | 50–900 | All models are trained for 50 epochs; the best ones are then trained for a further 850 epochs |
Model | Gulfstream-1 | Gulfstream-2 | Labrador-1 | Labrador-2 | Tropical-1 | Tropical-2 |
---|---|---|---|---|---|---|
Decision tree | 0.178 | 0.103 | 0.141 | 0.111 | 0.151 | 0.083 |
Informed decision tree (I) | 0.174 | 0.100 | 0.139 | 0.109 | 0.147 | 0.080 |
Random forest | 0.125 | 0.074 | 0.095 | 0.076 | 0.108 | 0.059 |
Informed random forest (II) | 0.122 | 0.072 | 0.094 | 0.074 | 0.107 | 0.058 |
Gradient boosting | 0.134 | 0.080 | 0.103 | 0.081 | 0.115 | 0.062 |
Informed gradient boosting (III) | 0.131 | 0.079 | 0.102 | 0.079 | 0.112 | 0.061 |
LSTM | 0.078 | 0.076 | 0.080 | 0.073 | 0.083 | 0.070 |
Informed LSTM (IV) | 0.069 | 0.071 | 0.069 | 0.072 | 0.075 | 0.061 |
Informed LSTM (V) | 0.065 | 0.061 | 0.067 | 0.066 | 0.065 | 0.058 |
LSTM + transformer | 0.074 | 0.076 | 0.078 | 0.065 | 0.069 | 0.063 |
Informed LSTM + transformer (VI) | 0.060 | 0.060 | 0.066 | 0.062 | 0.061 | 0.058 |
Model | Gulfstream-1 | Gulfstream-2 | Labrador-1 | Labrador-2 | Tropical-1 | Tropical-2 |
---|---|---|---|---|---|---|
Decision tree | 34.8% | 21.1% | 25.5% | 19.9% | 25.3% | 13.7% |
Informed decision tree (I) | 33.7% | 20.1% | 24.7% | 19.1% | 24.8% | 13.5% |
Random forest | 25.4% | 18.3% | 18.6% | 13.5% | 20.0% | 10.8% |
Informed random forest (II) | 25.2% | 17.7% | 17.9% | 13.3% | 19.7% | 10.4% |
Gradient boosting | 29.3% | 18.5% | 22.3% | 18.4% | 21.2% | 14.5% |
Informed gradient boosting (III) | 27.9% | 18.2% | 22.0% | 17.7% | 20.9% | 14.3% |
LSTM | 16.8% | 17.2% | 21.8% | 15.3% | 13.4% | 16.9% |
Informed LSTM (IV) | 14.4% | 16.0% | 18.5% | 13.6% | 11.2% | 13.8% |
Informed LSTM (V) | 13.3% | 13.1% | 15.7% | 12.7% | 10.3% | 11.6% |
LSTM + transformer | 14.3% | 15.8% | 17.4% | 13.0% | 10.7% | 12.4% |
Informed LSTM + transformer (VI) | 12.8% | 12.9% | 16.5% | 10.3% | 8.8% | 10.7% |
Model | Gulfstream-1 | Gulfstream-2 | Labrador-1 | Labrador-2 | Tropical-1 | Tropical-2 |
---|---|---|---|---|---|---|
Decision tree | 0.201 | 0.126 | 0.158 | 0.130 | 0.192 | 0.103 |
Informed decision tree (I) | 0.196 | 0.124 | 0.154 | 0.125 | 0.187 | 0.101 |
Random forest | 0.145 | 0.093 | 0.115 | 0.100 | 0.138 | 0.075 |
Informed random forest (II) | 0.139 | 0.092 | 0.109 | 0.095 | 0.133 | 0.073 |
Gradient boosting | 0.155 | 0.099 | 0.124 | 0.106 | 0.149 | 0.080 |
Informed gradient boosting (III) | 0.152 | 0.093 | 0.122 | 0.103 | 0.145 | 0.078 |
LSTM | 0.092 | 0.090 | 0.096 | 0.089 | 0.101 | 0.085 |
Informed LSTM (IV) | 0.083 | 0.086 | 0.088 | 0.084 | 0.092 | 0.075 |
Informed LSTM (V) | 0.082 | 0.084 | 0.081 | 0.081 | 0.083 | 0.071 |
LSTM + transformer | 0.085 | 0.089 | 0.092 | 0.079 | 0.084 | 0.076 |
Informed LSTM + transformer (VI) | 0.074 | 0.076 | 0.077 | 0.075 | 0.071 | 0.070 |
Model | Gulfstream-1 | Gulfstream-2 | Labrador-1 | Labrador-2 | Tropical-1 | Tropical-2 |
---|---|---|---|---|---|---|
Decision tree | 38.3% | 25.9% | 26.7% | 24.4% | 32.2% | 17.0% |
Informed decision tree (I) | 38.0% | 24.9% | 26.4% | 21.9% | 31.5% | 17.0% |
Random forest | 29.5% | 23.0% | 22.5% | 17.8% | 25.6% | 13.7% |
Informed random forest (II) | 28.8% | 22.5% | 20.6% | 17.2% | 24.4% | 13.1% |
Gradient boosting | 33.2% | 22.6% | 26.4% | 24.2% | 27.4% | 18.8% |
Informed gradient boosting (III) | 32.4% | 21.9% | 26.2% | 23.1% | 26.5% | 18.4% |
LSTM | 18.7% | 19.7% | 23.5% | 16.5% | 16.0% | 19.7% |
Informed LSTM (IV) | 17.3% | 18.1% | 22.0% | 15.4% | 14.7% | 17.3% |
Informed LSTM (V) | 16.8% | 18.0% | 20.1% | 15.0% | 13.2% | 15.2% |
LSTM + transformer | 16.6% | 18.3% | 20.0% | 15.4% | 13.0% | 14.8% |
Informed LSTM + transformer (VI) | 15.7% | 16.4% | 19.2% | 12.6% | 11.3% | 12.8% |
Model | MSE | RMSE |
---|---|---|
Decision tree | 0.255 | 0.505 |
Informed decision tree (I) | 0.248 | 0.498 |
Random forest | 0.178 | 0.422 |
Informed random forest (II) | 0.166 | 0.407 |
Gradient boosting | 0.175 | 0.418 |
Informed gradient boosting (III) | 0.173 | 0.416 |
LSTM | 0.232 | 0.482 |
Informed LSTM (V) | 0.211 | 0.459 |
LSTM + transformer | 0.224 | 0.473 |
Informed LSTM + transformer (VI) | 0.207 | 0.455 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).