Data Reconciliation-Based Hierarchical Fusion of Machine Learning Models
Abstract
1. Introduction
- An approach to managing hierarchical constraints was developed to enhance the performance of ML models by accounting for the prediction errors in each model (see Section 2).
- In this study, a connection was established between the summation matrix utilized in hierarchical time series (HTS) forecasting and the incidence matrix employed in traditional DR methods (see Section 2).
- The developed methods exhibit strong performance in a range of case studies with different complexities. Our tests included a three-level scenario with 9 elements in rock composition estimation from spectral signals, a three-level scenario with 14 elements in a distribution model (retail sales M5 competition), and a four-level waste deposition scenario involving more than 3000 elements (Hungarian counties, districts, and cities) (see Section 3).
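The link between the summation matrix and the incidence matrix can be illustrated on the nine-element mineral hierarchy (Total; Silicate and Carbonate; six minerals). The following is a minimal sketch under stated assumptions, not the authors' implementation: the node ordering, the helper names (`matvec`, `matmul`, `solve`, `reconcile`), the diagonal covariance, and the incoherent input vector are all hypothetical. It builds a summation matrix S and a balance-type constraint matrix A, checks that A·S = 0, and applies the standard weighted least-squares DR correction ŷ − V Aᵀ (A V Aᵀ)⁻¹ A ŷ.

```python
# Node order (an assumption): Total, Silicate, Carbonate, then six bottom-level
# minerals (four silicates followed by two carbonates).
N_BOTTOM = 6

# Summation matrix S (9 x 6): every node written as a sum of bottom-level nodes.
S = [
    [1, 1, 1, 1, 1, 1],  # Total
    [1, 1, 1, 1, 0, 0],  # Silicate
    [0, 0, 0, 0, 1, 1],  # Carbonate
] + [[1 if i == j else 0 for j in range(N_BOTTOM)] for i in range(N_BOTTOM)]

# Constraint matrix A (3 x 9): each row is one balance equation, A y = 0.
A = [
    [1, -1, -1, 0, 0, 0, 0, 0, 0],    # Total - Silicate - Carbonate
    [0, 1, 0, -1, -1, -1, -1, 0, 0],  # Silicate - its four minerals
    [0, 0, 1, 0, 0, 0, 0, -1, -1],    # Carbonate - its two minerals
]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [[float(x) for x in row] + [float(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def reconcile(y_hat, variances):
    """Weighted least-squares DR with a diagonal covariance matrix V."""
    residual = matvec(A, y_hat)                          # A y_hat
    VAt = [[variances[i] * A[k][i] for k in range(len(A))]
           for i in range(len(y_hat))]                   # V A^T (9 x 3)
    lam = solve(matmul(A, VAt), residual)                # (A V A^T)^-1 A y_hat
    correction = matvec(VAt, lam)                        # V A^T lam
    return [y - c for y, c in zip(y_hat, correction)]


# Every column of S satisfies the balances, so A S is the zero matrix.
AS = matmul(A, S)

# Hypothetical incoherent predictions and unit variances (illustration only).
y_hat = [103.0, 66.0, 37.0, 38.0, 13.0, 2.0, 12.0, 19.0, 14.0]
y_tilde = reconcile(y_hat, [1.0] * 9)
print("balance residuals after DR:", matvec(A, y_tilde))
```

With a larger variance assigned to a node, that node absorbs more of the correction, which is one way the prediction error of each model could enter the reconciliation.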
2. Integrated Correction of Machine Learning Predictions Using Data Reconciliation Techniques
2.1. Formulating the Integration of Machine Learning and Data Reconciliation
2.2. Methods for Integrating Machine Learning and Data Reconciliation Techniques
3. Modeling Results for Cases of Varying Complexities
3.1. Mineral Composition of the Rock Samples
3.2. Retail Sales Forecasting
3.3. Waste Management Hierarchical Time Series Prediction with Data Reconciliation
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
a | element of the incidence matrix |
A | incidence matrix |
ARIMA | autoregressive integrated moving average |
stands for the constant values | |
CA | California state |
DR | data reconciliation |
error of the predicted value | |
FT-IR | Fourier-transform infrared spectroscopy |
HTS | hierarchical time series |
HWES | Holt–Winters exponential smoothing |
I | identity matrix |
k | index of the hierarchical structure level |
K | lowest level of the hierarchical structure |
ML | machine learning |
projection matrix | |
PLS | partial least squares |
R² | coefficient of determination |
RMSE | root-mean-squared error |
S | summation matrix |
t | timestamp |
parameter of the model | |
TX | Texas state |
V−1 | inverse of the covariance matrix |
WI | Wisconsin state |
X | independent variable |
y | dependent variable |
modeled independent variable | |
reconciled independent variable | |
Mineral | Actual | Submitted * | SD ** |
---|---|---|---|
Quartz | 29.9 | 30.67 | 0.77 |
K-feldspar and Plagioclase | 8.6 | 8.0 | 0.6 |
Calcite | 4.6 | 4.47 | 0.13 |
Kaolinite | 15.0 | 15.7 | 0.7 |
Total clay without Kaolinite | 20.2 | 19.17 | 1.03 |
Dolomite, Magnesite, Hematite, Aragonite, Fluorite, Apatite … | 21.7 | 21.73 | 0.03 |
level 0 | Total | |||||
---|---|---|---|---|---|---|
y_real | 100 | |||||
_1st | 103.24 | |||||
_2nd | 102.88 | |||||
_3rd | 100 | |||||
level 1 | Silicate | Carbonate | ||||
y_real | 79 | 21 | ||||
_1st | 66.55 | 36.69 | ||||
_2nd | 66.20 | 36.68 | ||||
_3rd | 65.22 | 34.78 | ||||
level 2 | Quartz | K-feldp. and Plagio. | Kaolin. | Clay without Kaolin. | Calcite | Dolom. Magne. Hemat. … |
y_real | 51 | 14 | 2 | 12 | 11 | 10 |
_1st | 37.57 | 12.81 | 1.67 | 12.19 | 19.32 | 14.33 |
_2nd | 37.56 | 12.73 | 1.60 | 12.72 | 19.36 | 14.02 |
_3rd | 37.70 | 13.04 | 2.16 | 12.32 | 19.78 | 15.00 |
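As a quick arithmetic check of the table above, the sketch below computes how far each row is from satisfying the hierarchy (Total = Silicate + Carbonate, Silicate = sum of its four minerals, Carbonate = sum of its two minerals). The grouping of the six level-2 columns is inferred from the `y_real` row (51 + 14 + 2 + 12 = 79 and 11 + 10 = 21); the function name `balance_residuals` is a hypothetical helper.

```python
# Table rows, ordered as: Total, Silicate, Carbonate, Quartz,
# K-feldspar and Plagioclase, Kaolinite, Clay without Kaolinite,
# Calcite, Dolomite/Magnesite/Hematite/...
rows = {
    "y_real": [100, 79, 21, 51, 14, 2, 12, 11, 10],
    "_1st": [103.24, 66.55, 36.69, 37.57, 12.81, 1.67, 12.19, 19.32, 14.33],
    "_2nd": [102.88, 66.20, 36.68, 37.56, 12.73, 1.60, 12.72, 19.36, 14.02],
    "_3rd": [100, 65.22, 34.78, 37.70, 13.04, 2.16, 12.32, 19.78, 15.00],
}


def balance_residuals(y):
    """Return (total, silicate, carbonate) balance residuals, rounded to 2 decimals."""
    total, silicate, carbonate = y[0], y[1], y[2]
    return (
        round(total - (silicate + carbonate), 2),
        round(silicate - sum(y[3:7]), 2),   # four silicate minerals
        round(carbonate - sum(y[7:9]), 2),  # two carbonate minerals
    )


for name, y in rows.items():
    print(name, balance_residuals(y))
```

The `_1st` and `_2nd` rows close the top-level balance but leave gaps at the mineral level (about 2.31 and 3.04 for `_1st`, about 1.59 and 3.30 for `_2nd`), whereas the `_3rd` row satisfies all three balances and sums to exactly 100.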
level 0 | Country | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
y_real | 22.0 | |||||||||
_1st | 14.102 | |||||||||
_3rd | 13.938 | |||||||||
level 1 | CA | TX | WI | |||||||
y_real | 11.0 | 1.0 | 10.0 | |||||||
_1st | 6.034 | 1.569 | 4.416 | |||||||
_3rd | 6.278 | 2.124 | 5.536 | |||||||
level 2 | CA_1 | CA_2 | CA_3 | CA_4 | TX_1 | TX_2 | TX_3 | WI_1 | WI_2 | WI_3 |
y_real | 2.0 | 4.0 | 1.0 | 4.0 | 0.0 | 1.0 | 0.0 | 1.0 | 3.0 | 6.0 |
_1st | 1.656 | 1.314 | 1.685 | 0.432 | 0.244 | 0.052 | 0.656 | 2.627 | 1.872 | 1.885 |
_3rd | 1.729 | 1.376 | 1.879 | 1.294 | 0.458 | 0.544 | 1.122 | 2.442 | 1.501 | 1.593 |
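To show how the RMSE listed in the abbreviations would be evaluated on such a table, the sketch below compares the `_1st` and `_3rd` rows against `y_real` at the state level (CA, TX, WI). It covers only the single snapshot shown above, so it is an illustration of the metric rather than a reproduction of the study's reported results; the function name `rmse` is a hypothetical helper.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Level 1 values from the table above, in the order CA, TX, WI.
y_real = [11.0, 1.0, 10.0]
first = [6.034, 1.569, 4.416]
third = [6.278, 2.124, 5.536]

rmse_first = rmse(y_real, first)
rmse_third = rmse(y_real, third)
print(f"RMSE _1st: {rmse_first:.3f}")
print(f"RMSE _3rd: {rmse_third:.3f}")
```

For this snapshot the `_3rd` row is closer to the observed state-level sales (roughly 3.81 versus 4.33).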
 | 2020 | 2021 | 2022 |
---|---|---|---|
level 0 | Total | ||
y_real | 3,301,482.3 | 3,350,245.0 | 3,215,457.4 |
_1st | 3,254,760.8 | 3,256,817.8 | 3,258,874.8 |
_3rd | 3,254,503.5 | 3,256,560.6 | 3,258,617.7 |
level 1 | County | ||
y_real | 119,844.3 | 120,933.9 | 116,180.1 |
_1st | 116,495.1 | 117,577.4 | 118,659.6 |
_3rd | 121,780.4 | 122,861.6 | 123,942.9 |
level 2 | District | ||
y_real | 15,558.7 | 16,903.2 | 15,751.4 |
_1st | 14,640.5 | 15,013.5 | 15,386.6 |
_3rd | 14,767.7 | 15,140.1 | 15,512.5 |
level 3 | Settlement | ||
y_real | 1341.7 | 1427.4 | 1355.0 |
_1st | 1037.7 | 1031.7 | 1025.7 |
_3rd | 1038.3 | 1032.3 | 1026.3 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hanzelik, P.P.; Kummer, A.; Abonyi, J. Data Reconciliation-Based Hierarchical Fusion of Machine Learning Models. Mach. Learn. Knowl. Extr. 2024, 6, 2601-2617. https://doi.org/10.3390/make6040125