Article

Data Reconciliation-Based Hierarchical Fusion of Machine Learning Models

1
Data, AI and Innovation, MOL ITD GBS HU Ltd., Dombovari Way 28, H-1117 Budapest, Hungary
2
HUN-REN Complex Systems Monitoring Research Group, University of Pannonia, Egyetem Street 10, H-8200 Veszprem, Hungary
*
Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2024, 6(4), 2601-2617; https://doi.org/10.3390/make6040125
Submission received: 23 July 2024 / Revised: 11 October 2024 / Accepted: 1 November 2024 / Published: 11 November 2024
(This article belongs to the Section Data)

Abstract

In hierarchical system modeling, it is essential that the constraints between different hierarchy levels, for instance the aggregation constraints, are satisfied. However, modeling and forecasting each element of the hierarchy independently introduces errors that violate these constraints. To mitigate this balance error, it is recommended to employ an optimal data reconciliation technique that accounts for measurement and modeling errors. In this study, three different methods for machine learning model development were investigated. The first method involves no data reconciliation, relying solely on machine learning models built independently at each hierarchical level. The second approach incorporates measurement errors by adjusting the measured data to satisfy each constraint, and the machine learning model is developed based on this reconciled dataset. The third method directly fine-tunes the machine learning predictions using the prediction errors of each model. The three methods were compared using three case studies with different complexities, namely mineral composition estimation with 9 elements, forecasting of retail sales with 14 elements, and waste deposition forecasting with more than 3000 elements. From the results of this study, the conclusion can be drawn that the third method performs the best and that reliable machine learning models can be developed with it.

1. Introduction

Digitalization offers significant opportunities through various machine learning (ML) algorithms that can replace conventional quality assurance methods. When developing ML models for processes, only certain parameters can typically be estimated with adequate precision. These ML model estimations carry varying degrees of error. Our modeling approach integrates engineering insights and additional data to ensure that the estimates meet specific constraints or conditions. The balance equations between hierarchical levels must be satisfied in the context of modeling hierarchical systems, and these constraints should be considered when training models.
Frequently, the systems to be modeled are naturally organized in hierarchical structures, mainly in forecasting problems; for instance, the demand for a product can be recorded on different hierarchy levels, like on a store, regional, or country level [1]. The measurements and observed values at each level will add up to the higher levels, which is called “coherence” [2]. In practical solutions, model development and prediction occur independently at each hierarchy level, resulting in incoherence in the results, so the predictions do not aggregate well.
This incoherence or balance error between the hierarchy levels has traditionally been handled myopically with “bottom-up”, “middle-out”, and “top-down” methods. With these methods, models are not developed for every hierarchy level; instead, models are developed at a single level only, and the predictions are then aggregated and/or disaggregated to the other hierarchy levels [3]. For instance, the “bottom-up” technique develops models at the most granular level of the hierarchy and then sums the lower-level predictions [4].
The fact that these myopic methods do not consider some useful information on other hierarchy levels is partly solved by combination approaches, which use statistically weighted information from all hierarchical levels [5]. However, none of the mentioned approaches result in optimal reconciliation among the predictions [6].
Optimal reconciliation is based on generating independent models for all elements of the hierarchy, whose prediction results are initially incoherent. The next step is to perform optimal reconciliation to adjust the independent predictions, which makes the predictions consistent with the hierarchical structure [7]. In this case, reconciliation is formulated as a regression model which projects the elementary forecasts and predictions onto a subspace where the predictions adhere to the aggregation constraints [8]. Weighted squared-error reconciliation was proposed to obtain optimally reconciled forecasts, where the base predictions are adjusted minimally in the least-squares sense [9,10]. Generalized least squares and weighted least squares can also be used to obtain the optimal reconciled forecasts and predictions, but it is difficult to estimate the covariance matrices required by these solutions. The easiest way to reconcile the base predictions is to use an ordinary least-squares estimate [11]. These hierarchical forecasting methods have worked well in different cases, such as forecasting the electrical loads of a building [12], supply chain forecasting [13], and tourism forecasting [14].
The presented optimal reconciliation approaches only deal with the constraint due to the hierarchical system, but other system- and modeler-defined specifications and constraints can be imagined, too; for instance, the predictions must add up to a user-specified constraint, like the percentages must add up to 100 percent [15]. Data reconciliation (DR) is a general method used to implement and define both hierarchical constraints and any other necessary constraint [16]. DR enables the correction of measurements, reducing the associated uncertainty while satisfying different constraints [17].
The advantages of DR have also been presented in various case studies, for instance, in the analysis of measurement outcomes within the field of analytical chemistry, where the application of DR enhances predictions through real-time mass and element balancing. This strategy has been shown to decrease the standard error in predictions, eliminating the need for additional offline analyses [18]. The DR method is thus clearly beneficial in processing analytical chemistry data. In practical problems, maintaining variable consistency is crucial, and increased process automation requires rigorous monitoring. Errors are unavoidable in the measurement, processing, and transmission of signals. These errors can degrade the performance of the monitoring and control system and sometimes lead to process failures. Thus, minimizing these error effects is critical [19]. DR improves process measurement estimates by mitigating the impact of random errors [20]. When models are developed independently, their outcomes are inconsistent, meaning that the model at a higher hierarchical level produces results that differ from the aggregation of the lower-level models’ results. In this study, the ML models are structured hierarchically and subject to specific constraints.
In this study, a hierarchical modeling approach with optimal DR is presented that improves the performance of ML models to facilitate optimization with digital tools. With this approach, ML models can be developed that satisfy hierarchical and other user-specified constraints. Furthermore, with the expansion of software sensors in Industry 4.0 [21], the proposed method can be implemented in edge computing devices. Various case studies with different complexities were used to evaluate the proposed methodology. The main novelties of our work are summarized as follows:
  • An approach to managing hierarchical constraints was developed to enhance the performance of ML models by accounting for the prediction errors in each model (see Section 2).
  • In this study, a connection was established between the summation matrix utilized in hierarchical time series (HTS) forecasting and the incidence matrix employed in traditional DR methods (see Section 2).
  • The developed methods exhibit strong performance in a range of case studies with different complexities. Our tests included a three-level scenario with 9 elements in rock composition estimation from spectral signals, a three-level scenario with 14 elements in a distribution model (retail sales M5 competition), and a four-level waste deposition scenario involving more than 3000 elements (Hungarian counties, districts, and cities) (see Section 3).

2. Integrated Correction of Machine Learning Predictions Using Data Reconciliation Techniques

This section outlines the methods used in our approach to combine ML and DR techniques to improve the accuracy of ML predictions. By merely predicting each series separately, the hierarchical or grouping structure is ignored, resulting in forecasts that are not ‘coherent’, so they do not aggregate correctly and do not satisfy the hierarchical constraints.

2.1. Formulating the Integration of Machine Learning and Data Reconciliation

The goal of this study is to correct ML predictions using optimal DR, considering the modeling errors in hierarchical problems. Due to the hierarchical structure, our method deals with multivariate modeling by considering multiple target variables ($\mathbf{y}$). The basic assumption is that all modeled variables are subject to error, so the model estimates for all variables should be corrected. For hierarchical data, the target variables ($\mathbf{y}_t$) in the $t$-th sample instance can be written, in general, as follows:
$$\mathbf{y}_t = f(\mathbf{x}_t, \boldsymbol{\theta}) + \boldsymbol{\epsilon}_t \tag{1}$$
where $\mathbf{y}_t$ is the vector of the target variables, $\mathbf{x}_t$ represents the matrix of independent variables, $f$ denotes the set of ML models, $\boldsymbol{\theta}$ represents the matrix of model parameters, and $\boldsymbol{\epsilon}_t$ is the prediction error vector. The model prediction ($\hat{\mathbf{y}}_t$) of the target variables can be written, in general, as follows:
$$\hat{\mathbf{y}}_t = f(\mathbf{x}_t, \boldsymbol{\theta}) \tag{2}$$
Given that the measurements must adhere to the specified constraint, it is imperative that the model predictions comply as well. To achieve this, DR is used in this study. This process adjusts the predicted variables minimally to ensure compliance with a set of model constraints. It aims to reduce the discrepancy between predicted and reconciled values while considering the variance of these variables and guarantees that the reconciled parameters meet certain equality and inequality constraints [22]. Typically, the objective function for minimization is denoted as Equation (3). The general non-linear DR problem is outlined as follows:
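$$\min_{\tilde{\mathbf{y}}_t} \left(\hat{\mathbf{y}}_t - \tilde{\mathbf{y}}_t\right)^T \mathbf{V}^{-1} \left(\hat{\mathbf{y}}_t - \tilde{\mathbf{y}}_t\right) \tag{3}$$
subject to
$$h(\tilde{\mathbf{y}}_t) = 0 \tag{4}$$
$$g(\tilde{\mathbf{y}}_t) \le 0 \tag{5}$$
where $\mathbf{V}^{-1}$ is the inverse of the covariance matrix of errors, $t$ is the sample instance, $\hat{\mathbf{y}}_t$ is a vector of model predictions, $\tilde{\mathbf{y}}_t$ is a vector of reconciled values for each target variable, $h$ is a vector that describes the functional form of model equality constraints, and $g$ is a vector that describes the functional form of model inequality constraints [16,22].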
Let us refer to the general HTS problem in Figure 1, where the relationship and subordination at the different hierarchy levels are illustrated [12]. For simplicity, the measured and unmeasured variables are not labeled in the hierarchy. Every element (node) of the hierarchy (tree) is labeled as $y_{t,j}^{k,p}$, where $t = 1, \dots, n$ denotes the sample instance, $k = 1, \dots, K$ denotes the level of the hierarchy, $p$ denotes the node’s parent on the upper ($k-1$) hierarchy level, and $j = 1, \dots, q_p^k$ indexes the child nodes of the $p$-th parent at the $k$-th hierarchy level, with $q_p^k$ being the number of these child nodes.
This hierarchical structure means that the element at the root level ($y_t^0$) is the sum of the elements in the first level of the hierarchy, formally
$$y_t^0 = \sum_{j=1}^{q_1^1} y_{t,j}^{1,1} \tag{6}$$
where $q_1^1$ is the number of child nodes at the first hierarchy level.
A general element on the $k$-th level ($y_{t,j}^{k,p}$) is the sum of its child nodes, formally written as
$$y_{t,j}^{k,p} = \sum_{i=1}^{q_j^{k+1}} y_{t,i}^{k+1,j} \tag{7}$$
The number of elements in the $k$-th hierarchical level ($q^k$) can be formalized as the sum of the numbers of child nodes over all parents in the upper hierarchy level, as follows:
$$q^k = \sum_{p=1}^{q^{k-1}} q_p^k \tag{8}$$
The number of nodes in the hierarchy is obtained if the number of nodes in each hierarchy level is added together:
$$q = \sum_{k=0}^{K} q^k \tag{9}$$
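By stacking all the tree elements in a vector ($\mathbf{y}_t$) based on Figure 1, the following is obtained:
$$\mathbf{y}_t = \Big[\, y_t^0,\ \underbrace{y_{t,1}^{1,1}, \dots, y_{t,q_1^1}^{1,1}}_{\mathbf{y}_{t,1}^T},\ \underbrace{y_{t,1}^{2,1}, \dots, y_{t,q_1^2}^{2,1}, \dots, y_{t,1}^{2,q_1^1}, \dots, y_{t,q_{q_1^1}^2}^{2,q_1^1}}_{\mathbf{y}_{t,2}^T},\ \dots,\ \mathbf{y}_{t,K}^T \Big]^T \tag{10}$$
where $\mathbf{y}_{t,K}$ represents the elements on the bottom hierarchy level ($K$-th level).
Every element at each hierarchy level can be calculated using a summation matrix ($\mathbf{S}$) and the bottom-level elements ($\mathbf{y}_{t,K}$) as follows:
$$\mathbf{y}_t = \begin{bmatrix} y_t^0 \\ \mathbf{y}_{t,1} \\ \vdots \\ \mathbf{y}_{t,K} \end{bmatrix} = \mathbf{S}\,\mathbf{y}_{t,K} \tag{11}$$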
The coherence requirements within the hierarchy can be defined with the help of the summation matrix ($\mathbf{S}$), which dictates how the bottom-level series aggregate at higher hierarchy levels. The element $s_{i,j}$ of the summation matrix refers to tree node $i$ and tree leaf $j$: it is 1 if node $i$ is leaf $j$ itself or an ancestor of leaf $j$, and 0 otherwise. The dimensions of $\mathbf{S}$ are $q \times q^K$, where $q$ represents the total number of nodes in the hierarchy and $q^K$ represents the number of nodes at the bottom level.
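As an illustration of this construction, the following minimal sketch assembles a summation matrix from a parent list and uses it to aggregate bottom-level values. The helper `summation_matrix`, the `parent` encoding, and the toy tree are hypothetical and serve only as an example; nodes are assumed to be numbered in the stacking order of the hierarchy vector (root first, bottom level last).

```python
import numpy as np

def summation_matrix(parent):
    """Build the q x q_K summation matrix S from a parent list.

    parent[i] is the index of node i's parent, or -1 for the root.
    Nodes are assumed to be numbered in stacking order (root first,
    bottom level last), so the leaves occupy the last q_K positions.
    """
    q = len(parent)
    inner = {p for p in parent if p >= 0}            # nodes that have children
    leaves = [i for i in range(q) if i not in inner]
    S = np.zeros((q, len(leaves)), dtype=int)
    for col, leaf in enumerate(leaves):
        node = leaf
        while node != -1:                            # walk from the leaf up to the root
            S[node, col] = 1                         # the leaf itself and each ancestor
            node = parent[node]
    return S

# Toy three-level tree: root 0 -> {1, 2}; node 1 -> {3, 4}; node 2 -> {5, 6, 7}
parent = [-1, 0, 0, 1, 1, 2, 2, 2]
S = summation_matrix(parent)

y_bottom = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # bottom-level values
y_all = S @ y_bottom                                 # values of every node
print(S)
print(y_all)
```

The lower block of the resulting matrix is an identity matrix, because every bottom-level element aggregates only itself.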
For the optimized reconciliation of the samples, a connection must be established between the summation matrix ($\mathbf{S}$) and the equality constraint of the DR technique (Equation (4)). Due to the hierarchical structure, this equality constraint can be formalized as $\mathbf{A}\tilde{\mathbf{y}}_t = \mathbf{0}$, where $\mathbf{A}$ is the incidence matrix. To create the $q \times q$ matrix $\mathbf{A}$, the $q \times (q - q^K)$ null matrix and the $q \times q^K$ summation matrix ($\mathbf{S}$) are concatenated into the block matrix $[\,\mathbf{0}\ \ \mathbf{S}\,]$, which is then subtracted from the $q \times q$ identity matrix ($\mathbf{I}$):
$$\mathbf{A} = \mathbf{I} - [\,\mathbf{0}\ \ \mathbf{S}\,] \tag{12}$$
Based on this, the following relation can be obtained to formalize the linear constraint for DR using the summation matrix:
$$\mathbf{A}\tilde{\mathbf{y}}_t = \left(\mathbf{I} - [\,\mathbf{0}\ \ \mathbf{S}\,]\right)\tilde{\mathbf{y}}_t = \mathbf{0} \tag{13}$$
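The relation between the two matrices can be checked numerically. The sketch below is an illustration on a hypothetical seven-node hierarchy, with `incidence_from_summation` being an assumed helper name: it builds the incidence matrix from the summation matrix and verifies that a coherent vector yields a zero residual. The rows belonging to bottom-level nodes are identically zero; dropping them, as done here, is a choice of this sketch so that the remaining constraint matrix has full row rank.

```python
import numpy as np

def incidence_from_summation(S):
    """Return A = I - [0 S] for a q x q_K summation matrix S."""
    q, q_K = S.shape
    zero_block = np.zeros((q, q - q_K))
    return np.eye(q) - np.hstack([zero_block, S])

# Toy hierarchy: root -> 2 middle nodes -> 2 leaves each (q = 7, q_K = 4)
S = np.vstack([
    np.array([[1, 1, 1, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float),
    np.eye(4),
])
A = incidence_from_summation(S)

# Rows of bottom-level nodes are all zero and carry no constraint.
A_rows = A[np.any(A != 0, axis=1)]

y_coherent = S @ np.array([1.0, 2.0, 3.0, 4.0])   # aggregates by construction
y_incoherent = y_coherent.copy()
y_incoherent[0] += 1.5                            # break the root-level balance

print(A @ y_coherent)      # zero residual for every node
print(A @ y_incoherent)    # non-zero residual at the root
```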
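We also consider additional linear equality constraints which need to be satisfied in addition to the hierarchical constraints, as shown in Equation (14):
$$\mathbf{A}^{*}\tilde{\mathbf{y}}_t = \mathbf{b}^{*} \tag{14}$$
Each linear constraint can be grouped, as shown in Equation (15):
$$\begin{bmatrix} \mathbf{A} \\ \mathbf{A}^{*} \end{bmatrix} \tilde{\mathbf{y}}_t = \begin{bmatrix} \mathbf{0} \\ \mathbf{b}^{*} \end{bmatrix} \tag{15}$$
and, in a more compact form, we can rewrite this as follows:
$$\tilde{\mathbf{A}}\tilde{\mathbf{y}}_t = \tilde{\mathbf{b}} \tag{16}$$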
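The analytical solution of the DR problem (Equation (3)) with linear constraints (Equation (16)) is the following:
$$\tilde{\mathbf{y}}_t = \left(\mathbf{I} - \mathbf{V}\tilde{\mathbf{A}}^T\left(\tilde{\mathbf{A}}\mathbf{V}\tilde{\mathbf{A}}^T\right)^{-1}\tilde{\mathbf{A}}\right)\hat{\mathbf{y}}_t + \mathbf{V}\tilde{\mathbf{A}}^T\left(\tilde{\mathbf{A}}\mathbf{V}\tilde{\mathbf{A}}^T\right)^{-1}\tilde{\mathbf{b}} \tag{17}$$
where $\mathbf{I}$ is the identity matrix, $\tilde{\mathbf{A}}$ is the incidence matrix, $\tilde{\mathbf{b}}$ contains the constraint values, and $\mathbf{V}$ is the covariance matrix of the errors. Because the objective in Equation (3) weights the deviations by $\mathbf{V}^{-1}$, setting the gradient of the Lagrangian to zero brings $\mathbf{V}$ itself into the solution. Equation (17) can be written in a more compact form as follows:
$$\tilde{\mathbf{y}}_t = \mathbf{P}\hat{\mathbf{y}}_t + \mathbf{b} \tag{18}$$
where $\mathbf{P}$ is the projection matrix and $\mathbf{b}$ is the correction term needed to satisfy the linear constraints:
$$\mathbf{P} = \mathbf{I} - \mathbf{V}\tilde{\mathbf{A}}^T\left(\tilde{\mathbf{A}}\mathbf{V}\tilde{\mathbf{A}}^T\right)^{-1}\tilde{\mathbf{A}} \tag{19}$$
$$\mathbf{b} = \mathbf{V}\tilde{\mathbf{A}}^T\left(\tilde{\mathbf{A}}\mathbf{V}\tilde{\mathbf{A}}^T\right)^{-1}\tilde{\mathbf{b}} \tag{20}$$
In the case of linear regression models, Equation (18) becomes the following:
$$\tilde{\mathbf{y}}_t = \mathbf{P}\boldsymbol{\theta}\mathbf{x}_{e,t} + \mathbf{b} \tag{21}$$
where $\boldsymbol{\theta}$ represents the model parameters and $\mathbf{x}_{e,t}$ represents the extended input features at the $t$-th sample instance [16].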
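A minimal NumPy sketch of the closed-form reconciliation may help to make the projection concrete. It assumes that `V` is the error covariance matrix (so that the objective weights deviations by its inverse) and that the constraint matrix has full row rank; the function name `reconcile` and the three-variable example with its numbers are hypothetical.

```python
import numpy as np

def reconcile(y_hat, A, b, V):
    """Closed-form linear DR.

    Minimises (y_hat - y)^T V^{-1} (y_hat - y) subject to A y = b,
    assuming V is the error covariance and A has full row rank.
    Returns the reconciled vector, the projection matrix P and the offset.
    """
    VA_T = V @ A.T
    gain = VA_T @ np.linalg.inv(A @ VA_T)     # V A^T (A V A^T)^{-1}
    P = np.eye(len(y_hat)) - gain @ A
    offset = gain @ b
    return P @ y_hat + offset, P, offset

# Toy example: total = part_1 + part_2, with a less certain total
A = np.array([[1.0, -1.0, -1.0]])
b = np.array([0.0])
V = np.diag([4.0, 1.0, 1.0])                  # variances of the three predictions
y_hat = np.array([10.0, 4.0, 3.0])            # incoherent: 10 != 4 + 3

y_tilde, P, offset = reconcile(y_hat, A, b, V)
print(y_tilde)                                # [8.  4.5 3.5]
```

In this example the imbalance of 3 is distributed in proportion to the variances, so the prediction with the largest variance receives the largest correction.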

2.2. Methods for Integrating Machine Learning and Data Reconciliation Techniques

Based on the available information, ML models can be developed in three ways, as presented in Figure 2. The first route is to develop an ML model without any DR. The second route applies when information about the measurement uncertainty is available: the inverse covariance matrix is then defined from the standard deviations of the measurement errors, and DR is performed on the measured dataset. Since the measurements are assumed to be independent of each other, the covariance matrix for DR contains non-zero values only in the diagonal elements. After DR, ML models are developed using the reconciled dataset.
The third route applies when there is no information about the measurement uncertainty, but the prediction errors can be used to fine-tune the model predictions. In this case, the ML models are first trained on the raw measured dataset, and the model predictions are then reconciled, where the standard deviations of the prediction errors are used to define the inverse covariance matrix for DR. Here, we assumed that the modeling errors are independent of each other because an independent ML model was developed for each node. Therefore, the covariance matrix for DR contains non-zero values only on the diagonal. However, there can be cases where the modeling errors are not independent, and the covariance matrix for DR should then be developed with care.
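The third route can be sketched on synthetic data as follows. Everything in this example is an assumption made for illustration: the data are randomly generated, ordinary least-squares models stand in for the ML models, each node deliberately uses a different feature subset so that the independent predictions are incoherent, and the diagonal covariance is estimated from the training residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 3))
part_a = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)
part_b = X[:, 1] - X[:, 2] + rng.normal(scale=0.1, size=n)
Y = np.column_stack([part_a + part_b, part_a, part_b])   # total, a, b

# Each node gets its own model with its own (hypothetical) feature subset.
feature_sets = [[0, 1, 2], [0], [1, 2]]
train, test = slice(0, 200), slice(200, n)

def fit_predict(cols, y):
    """Fit an independent least-squares model and predict all samples."""
    X_e = np.column_stack([np.ones(n), X[:, cols]])      # inputs with intercept
    theta, *_ = np.linalg.lstsq(X_e[train], y[train], rcond=None)
    return X_e @ theta

# Step 1: independent models, one per node
Y_hat = np.column_stack(
    [fit_predict(cols, Y[:, j]) for j, cols in enumerate(feature_sets)]
)

# Step 2: diagonal covariance from the training prediction errors
V = np.diag(np.var(Y[train] - Y_hat[train], axis=0))

# Step 3: reconcile the test predictions (total - a - b = 0, so b~ = 0)
A = np.array([[1.0, -1.0, -1.0]])
gain = V @ A.T @ np.linalg.inv(A @ V @ A.T)
P = np.eye(3) - gain @ A
Y_tilde = Y_hat[test] @ P.T

balance_before = np.abs(Y_hat[test] @ A.T).max()
balance_after = np.abs(Y_tilde @ A.T).max()
print(balance_before, balance_after)
```

Whether the reconciled predictions are also more accurate depends on the data and on how well the residual variances reflect the true model errors; the sketch only guarantees that the balance constraint holds after reconciliation.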

3. Modeling Results for Cases of Varying Complexities

Hierarchical modeling with optimal DR was performed in three different case studies with different complexity levels. The first case study is about predicting the mineral composition of different rock samples from spectral data to support oil exploration. This problem includes three hierarchy levels with nine elements.
In the second case, a Walmart commercial dataset included in a Kaggle competition was investigated, where reconciliation was performed between time series predictions on three hierarchy levels with 14 elements. The third case also includes an HTS analysis dataset used to predict regional waste deposition in Hungary on four hierarchy levels with more than 3000 elements.

3.1. Mineral Composition of the Rock Samples

The first case study uses the results of data generated during oil exploration and production as input data. The traditional approach to determining the mineral composition of rock samples has been to take field samples/drill cores collected during field investigations to the laboratory. Analytical data are prepared using traditional laboratory measurement methods, which are costly and time consuming, may include the use of hazardous substances, and are challenging due to the exploration of large areas [23]. From the point of view of technology development, the presented methodology is suitable for obtaining an accurate picture of the analysis during oil field drilling. This is a complex task for geologists in oil exploration and production. Geological analyses classify the different storage rocks into separate categories, which is particularly important from the point of view of geological interpretations [24]. Furthermore, knowing the detailed composition of the elements is also essential before executing the various efficiency-enhancing procedures of the wells that are already in operation.
Fourier-transform infrared spectroscopy (FT-IR) measurements can also determine the composition and properties of rocks. In these cases, chemometric (ML) models are necessary to analyze spectra [24]. These models offer the possibility of installing software sensors in drill heads, which allows for a detailed analysis of the composition of rocks in the field [25].
The mineral composition of the rocks used in the modeling is represented by mass percentage values, which must total 100%. Our task involves a hierarchical regression process to predict the specific mineral composition of the rocks while ensuring that the predicted values sum to 100%. The aim of this methodology is to develop an algorithm that outperforms traditional models while meeting the initial conditions. In this study, ML models were constructed for determining the mineral composition of rock samples. The target (dependent) variables y of the model are derived from X-ray diffraction (XRD) measurements, while the independent variables X are derived from FT-IR spectra.
In the first method, partial least-squares (PLS) regression was used, creating a distinct model for each element’s composition. In the second method, DR was applied considering the uncertainty associated with traditional XRD measurement errors. For the third method, DR was performed on the model predictions using the uncertainty of these predictions. The training dataset consisted of 618 samples, and 10-fold cross-validation was applied to fine-tune the PLS parameters. The refined models were then evaluated on 305 unseen samples.
The dependent variables are the mineral compositions obtained from different rocks. Although these traditional laboratory measurements were performed with an XRD device, the mineral compositions do not add up to the expected total of 100. The DR method is therefore also applicable to conventional laboratory measurements. To calculate the standard deviation of the XRD measurements, the measurement uncertainty of traditional measurements reported in the literature was taken into account. Since 2000, the Reynolds Cup has been held every two years, allowing various institutions and companies to participate in laboratory sample measurements [26]. From these measured data, an aggregated result is compiled in Table 1, which shows the standard deviation of the XRD measurements for the six mineral compositions [27].
The hierarchical levels of the rock case study and the relationships between parent and child nodes are presented in Figure 3. At the highest level (level 0), the total of the compositions must always be equal to 100. At the subsequent level (level 1), $y_{t,1}^{1,1}$ represents the silicate and $y_{t,2}^{1,1}$ the carbonate rock types. The lowest level (level 2) includes the variables of the six mineral compositions. In this case study, the silicate group is represented by quartz, K-feldspar and plagioclase, calcite, and kaolinite. The total clay without kaolinite, along with dolomite, magnesite, and hematite, constitutes the carbonate group. Note that the $t$ index in this case study denotes the sample number, not a timestamp.
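The hierarchical relationship in the mineral composition is shown in detail in the summation matrix ($\mathbf{S}$).
$$\mathbf{S} = \begin{bmatrix}
1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}$$
The incidence matrix ($\tilde{\mathbf{A}}$) is
$$\tilde{\mathbf{A}} = \begin{bmatrix}
1 & 0 & -1 & -1 & -1 & -1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & -1 & -1 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$
In this case study, the constraint that the sum of the silicate and carbonate content should be 100 is included in the vector $\tilde{\mathbf{b}}$.
$$\tilde{\mathbf{b}} = \begin{bmatrix} 0 \\ 0 \\ 100 \end{bmatrix}$$
$\mathbf{V}$ is an 8 × 8 matrix that includes the standard deviation of all variables in the diagonal of the matrix. So, the standard deviation of the measurements in the second method is taken from Table 1, and in the third method, it is the standard deviation of the ML prediction errors.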
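To illustrate how these matrices act on a single sample, the following sketch reconciles one hypothetical prediction vector. The predicted values and standard deviations are invented for illustration and are not taken from the dataset or from Table 1; the variable order (silicate, carbonate, four silicate-group minerals, two carbonate-group minerals) and the use of squared standard deviations on the diagonal of `V` are assumptions of this example.

```python
import numpy as np

# Variable order assumed here: [silicate, carbonate, four silicate-group
# minerals, two carbonate-group minerals], i.e. 8 variables in total.
A = np.array([
    [1, 0, -1, -1, -1, -1,  0,  0],   # silicate = sum of its four minerals
    [0, 1,  0,  0,  0,  0, -1, -1],   # carbonate = sum of its two minerals
    [1, 1,  0,  0,  0,  0,  0,  0],   # silicate + carbonate = 100
], dtype=float)
b = np.array([0.0, 0.0, 100.0])

# Hypothetical predictions (mass %) and standard deviations, invented
# for illustration only.
y_hat = np.array([62.0, 35.0, 30.0, 12.0, 15.0, 6.0, 25.0, 9.0])
sigma = np.array([4.5, 4.2, 3.0, 2.0, 2.5, 1.6, 3.0, 2.0])
V = np.diag(sigma ** 2)               # assumed: variances on the diagonal

gain = V @ A.T @ np.linalg.inv(A @ V @ A.T)
y_tilde = y_hat - gain @ (A @ y_hat - b)

print(np.round(y_tilde, 3))
print(A @ y_tilde)                    # approximately [0, 0, 100]
```

After reconciliation, the six mineral values also sum to 100, because both group totals equal the sums of their minerals.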
In the following, the results of the ML model development for each method are presented through the average prediction errors and the hierarchical balance error. The results at the different hierarchical levels and for the different methods were compared using the root-mean-squared error of prediction (RMSE), which indicates the accuracy of each estimate. At the first level, the third method performed the best for silicate (RMSE = 4.577%), and the first method was the best for carbonate (RMSE = 4.154%), although the first method did not satisfy the balance constraints. At the second level, the most significant difference is seen in the kaolinite RMSE values, where the first method performed the best (RMSE = 1.584%), but again without satisfying the balance constraints.
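The two quantities used in this comparison can be computed as in the sketch below. The RMSE follows its standard definition; the balance error is implemented here as the absolute residual of each linear constraint, which is one plausible reading and an assumption of this example. The function names and the two-sample data are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error per variable (column)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))

def balance_error(y_pred, A, b):
    """Absolute residual of each linear constraint A y = b, per sample."""
    return np.abs(y_pred @ A.T - b)

# Tiny two-sample example with one constraint: total = part_1 + part_2
A = np.array([[1.0, -1.0, -1.0]])
b = np.array([0.0])
y_true = np.array([[10.0, 6.0, 4.0], [20.0, 12.0, 8.0]])
y_pred = np.array([[11.0, 6.0, 4.0], [19.0, 12.0, 10.0]])

print(rmse(y_true, y_pred))             # per-variable accuracy
print(balance_error(y_pred, A, b))      # per-sample constraint violation
```

A method can therefore have a low RMSE for one variable and still violate the balance, which is why both quantities are reported.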
The results for the rock samples are summarized in the bar plots in Figure 4.
Table 2 presents a chosen subset from the dataset of 305 rock samples, illustrating the actual composition of the samples and the outcomes achieved by each method. At level 0, the DR error is 3.24 for method 1 and 2.88 for method 2, while the third method exhibits no DR error. At level 1, method 1 has an error of 3.24, method 2 has an error of 2.88, and method 3 shows no error. At the lowest level (level 2), method 1 has a DR error of 2.11, method 2 has an error of 2.01, and method 3 has no error. These results clearly demonstrate the advantage of the third method, as the initial conditions are also satisfied, considering the errors.
The developed method helps to better predict the rock composition of oil fields. It can also be used to balance the material flows of products produced during chemical processes. It is also suitable for monitoring and controlling chemical and separation technology processes. Incorporating the correlations between predicted values facilitates the embedding of ML models into an industrial environment. In addition, the DR technique increases the usability of the models.

3.2. Retail Sales Forecasting

The second case study discussed in the article involves an HTS analysis conducted on data from the Walmart shopping chain, part of a data analysis competition announced in 2020 as M5. The competition aimed to improve the accuracy of forecasting and empirically compare various forecasting methods [28]. The input data consist of unit sales for 3049 products in the US. These products are sold in 10 shops across three states (CA, TX, WI). Locations such as states and shops were selected carefully, representing different waiting habits, purchasing dynamics, durable consumer goods, and fast- and slow-moving products. The M5 dataset is an excellent choice due to its meaningful hierarchies and cross-sectional levels. The daily data span from 29.01.2011 to 19.06.2016 (1969 days), with the training set spanning from 29.01.2011 to 24.04.2016 (1913 days), a 28-day validation period spanning from 25.04.2016 to 22.05.2016, and a 28-day test period spanning from 23.05.2016 to 19.06.2016. Explanatory variables include calendar information, sales prices, and promotional activities [29]. Figure 5 illustrates a hierarchical diagram of the M5 case study, and the related summation and incidence matrices are given below.
In this case, the hierarchy consists of levels 0, 1, and 2: $\mathbf{S}$ has ten elements at the lowest level (level 2: shop), three elements at the middle level (level 1: state), which group the shops in a 4-3-3 split, and one element at the top (level 0: country).
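The hierarchical relationship in the sales dataset is shown in detail in the summation matrix ($\mathbf{S}$).
$$\mathbf{S} = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}$$
The incidence matrix ($\tilde{\mathbf{A}}$) is
$$\tilde{\mathbf{A}} = \left[\begin{array}{rrrrrrrrrrrrrr}
0 & 1 & 0 & 0 & -1 & -1 & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & -1 & -1 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & -1 & -1
\end{array}\right]$$
In this case study, each upper-level value must equal the sum of the corresponding lower-level values, so all constraint values are zero. The vector $\tilde{\mathbf{b}}$ can be written in the following form:
$$\tilde{\mathbf{b}} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$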
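For comparison with the classical approaches, the sketch below builds the 14 × 10 summation matrix for the 4-3-3 split, derives the constraints from it via the identity-minus-summation construction of Section 2 (which yields one row per parent node, i.e. four rows including the country level), and applies bottom-up, top-down, and DR-based reconciliation to one vector of base forecasts. The forecasts, the equal top-down proportions, and the standard deviations are hypothetical values chosen only for illustration.

```python
import numpy as np

stores_per_state = [4, 3, 3]                       # CA, TX, WI
n_states, n_stores = len(stores_per_state), sum(stores_per_state)

# State rows of S: one block of ones per state
state_rows = np.zeros((n_states, n_stores))
start = 0
for s, k in enumerate(stores_per_state):
    state_rows[s, start:start + k] = 1.0
    start += k

S = np.vstack([np.ones((1, n_stores)), state_rows, np.eye(n_stores)])  # 14 x 10
q = S.shape[0]

# A = I - [0 S]; keep only the non-zero rows (one per parent node)
A_full = np.eye(q) - np.hstack([np.zeros((q, q - n_stores)), S])
A = A_full[np.any(A_full != 0, axis=1)]
b = np.zeros(A.shape[0])

# Hypothetical incoherent base forecasts: country, 3 states, 10 shops
rng = np.random.default_rng(1)
shops = rng.uniform(50, 150, size=n_stores)
y_hat = S @ shops + rng.normal(scale=5.0, size=q)

# Bottom-up: keep the shop forecasts and aggregate them upwards
y_bottom_up = S @ y_hat[-n_stores:]

# Top-down: split the country forecast with (hypothetical) fixed proportions
proportions = np.full(n_stores, 1.0 / n_stores)
y_top_down = S @ (proportions * y_hat[0])

# DR: adjust all levels at once, weighted by (hypothetical) error variances
sigma = np.concatenate([[20.0], np.full(n_states, 10.0), np.full(n_stores, 5.0)])
V = np.diag(sigma ** 2)
gain = V @ A.T @ np.linalg.inv(A @ V @ A.T)
y_dr = y_hat - gain @ (A @ y_hat - b)

for name, y in [("base", y_hat), ("bottom-up", y_bottom_up),
                ("top-down", y_top_down), ("DR", y_dr)]:
    print(name, np.abs(A @ y - b).max())           # largest balance error
```

All three reconciled vectors satisfy the aggregation constraints; they differ in which forecasts are trusted. Bottom-up keeps the shop forecasts unchanged, top-down keeps the country forecast unchanged, and DR adjusts every level according to its assumed uncertainty.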
Figure 6 illustrates the performance of several forecasting models along with their hierarchical results. The metric used is the RMSE, displayed on the y axis. Different DR techniques are shown on the x axis, with colours indicating the hierarchical levels. In general, the middle-out method has the highest RMSE (2.421), while the bottom-up method has the lowest (2.241). Despite the small overall differences in the RMSE, the bottom-up method is the best performer. At the country level, the top-down method has the highest RMSE (5.408), whereas the bottom-up method, with a significantly lower RMSE (4.822), is more efficient and suitable for country-level forecasting. At the state level, the top-down method has the highest RMSE (2.911), while the bottom-up method has the lowest (2.487). In this category, the bottom-up method is the most effective. At the shop level, there are slight differences among all four methods. The top-down method has the highest RMSE (1.656), while the bottom-up method has the lowest (1.569).
With the analytical DR solution developed in this article, the overall RMSE for the sales data is 2.305, which is in the mid-range compared to the other methods (2.241 for bottom-up and 2.421 for middle-out). At the country level, the analytical DR solution’s value is 4.848, ranking second after the bottom-up method (4.822). At the state level, the value is 2.839, which lies between the bottom-up (2.487) and top-down (2.911) values. At the shop level, the value is 1.633, which is lower than that of the top-down method (1.656) but higher than that of the bottom-up method (1.569). In this case study, the initial dataset satisfies the hierarchical balance and does not require reconciliation. Consequently, the second method is not relevant, and only the first and third methods are applicable. Taking the forecast for 18.05.2016 as an example, the results of the first and third methods are summarized in Table 3. The third method gives results similar to those of the first method, but its predictions are additionally reconciled. At the national level, the third method provides the best result, whereas the bottom-up method is the second worst. At the state level, the bottom-up method gives the best result, with the third method in the middle. There are slight differences at the shop level, where the bottom-up method gives the lowest error and the third method the second lowest.
When comparing the results tested on the data of 18 May 2016, the analytical solution of the HTS analysis using the DR technique (third method) showed little difference from the results of the first methodology. Although the third methodology was among the top performers at the country level, it ranked in the lower half for the state and shop cases.

3.3. Waste Management Hierarchical Time Series Prediction with Data Reconciliation

Waste management is a critical issue in Hungary due to the increasing amount of solid waste generated yearly. Effective waste management strategies require the accurate forecasting of waste generation, which can be achieved through time series analysis. Much of the literature focuses on predicting waste quantities as accurately as possible because precise solid waste forecasts are crucial for the circular economy, aiming to maximize recycling and enhance energy efficiency [30]. An integral part of these forecasts involves predicting waste amounts at collection points, which is vital from a workload perspective [31]. Eryganov et al. employed reconciliation in HTS forecasting, which is known to improve the quality of initial forecasts [32]. Moreover, HTS and DR technologies were applied in analyzing hazardous waste, utilizing a tree structure that mirrors the organizational layout of a region (including regions, micro-regions, and their sections), with the autoregressive integrated moving average (ARIMA) algorithm being used for predictions [33]. However, waste data are often incomplete and inconsistent, which results in unreliable predictions. HTS analysis has proven to be a powerful method for forecasting when dealing with multiple time series at various levels of aggregation. Additionally, DR techniques can enhance accuracy by resolving data inconsistencies. In this case study, HTS analysis and DR techniques were investigated to forecast solid waste generation within Hungary’s hierarchical waste management system.
The solid waste data for Hungary were sourced from the Hungarian Central Statistical Office database. This comprehensive dataset includes the volume of solid waste at the national level for all 19 counties, including Budapest (the capital city), 175 districts, and 3155 settlements. Consequently, the HTS analysis incorporates levels 0, 1, 2, and 3.
In this research, the modeling process used the Holt–Winters exponential smoothing (HWES) algorithm with annual data aggregation [34]. Data on solid waste quantities were collected from 2010 to 2022. The HWES models were trained on the data from 2010 to 2019 and produced forecasts up to 2022; since measured data are available for 2020, 2021, and 2022, the reliability of the methods was examined for these three years. Table 4 shows the forecasts obtained by the first method (ML) and the third method (ML + DR) for a selected settlement, together with its district and county. The settlement is a randomly selected small town in Veszprém County, Hungary. The second method is not considered because the measured data already satisfy the balance constraints, so DR of the input data of the ML models is not relevant in this case study.
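With annual data there is no within-year seasonal cycle to model, so a Holt–Winters model without a seasonal term reduces to Holt's linear-trend smoothing. The following is a minimal sketch under that assumption; the function name `holt_linear_forecast`, the smoothing parameters, and the input series are hypothetical and are not taken from the study.

```python
def holt_linear_forecast(series, alpha, beta, horizon):
    """Holt's linear-trend exponential smoothing (no seasonal component).

    alpha smooths the level, beta smooths the trend; both lie in [0, 1].
    Returns `horizon` forecasts that follow the last observation.
    """
    if len(series) < 2:
        raise ValueError("at least two observations are required")
    # Simple initialisation: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Beyond the training window the forecast is a straight line.
    return [level + step * trend for step in range(1, horizon + 1)]


# Hypothetical annual tonnages for 2010-2019 (illustrative values only).
history = [980.0, 1005.0, 1010.0, 1032.0, 1028.0,
           1041.0, 1055.0, 1049.0, 1060.0, 1072.0]
forecast = holt_linear_forecast(history, alpha=0.8, beta=0.2, horizon=3)
```

A forecast of this kind changes by the same amount every year. The level 0 forecasts of the first method in Table 4 also increase by a constant 2057.0 tons per year, which is consistent with a linear-trend model, although the exact HWES configuration used in the study is not stated here.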
Figure 7 shows the RMSE of the model errors at the national level and across all counties, all districts, and all settlements in 2020, 2021, and 2022. The largest errors occur at the county level, and these errors are lower when DR is applied after ML. Across the years, the error is highest in 2021 and lowest in 2022.
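The RMSE comparison can be illustrated with the settlement-level values reported in Table 4. Note that this sketch computes the RMSE over the three forecast years of a single settlement, whereas Figure 7 aggregates over all elements of a level within each year; the helper name `rmse` is an illustrative choice.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Settlement-level values for 2020, 2021, 2022 from Table 4 (tons).
settlement_real = [1341.7, 1427.4, 1355.0]
settlement_ml = [1037.7, 1031.7, 1025.7]      # first method (ML only)
settlement_ml_dr = [1038.3, 1032.3, 1026.3]   # third method (ML + DR)

rmse_ml = rmse(settlement_real, settlement_ml)
rmse_ml_dr = rmse(settlement_real, settlement_ml_dr)
```

For this settlement the reconciled forecast has a slightly lower RMSE than the unreconciled one; the difference is well below one ton.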

4. Conclusions

This study introduces a novel approach that combines data reconciliation techniques with machine learning. In addition to the already widespread and established methods, this research formalizes a unique analytical approach. In hierarchical systems, the balance equations between hierarchical levels must be satisfied, but machine learning models built independently at each level generally do not satisfy these constraints. In this study, three different machine learning model development methods were investigated and compared: (1) model building without data reconciliation, (2) model building with reconciled measurement data based on the measurement uncertainty, and (3) model building with direct reconciliation of the machine learning model predictions based on the modeling errors of each model.
In this research, each method was examined through three case studies of different complexities. The results show that the fine-tuning of model predictions with optimal data reconciliation helps to improve the accuracy and reliability of model predictions, and all necessary constraints can be satisfied with this technique.
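The idea behind reconciling predictions can be sketched for the simplest case of a single balance equation. For one linear constraint a·y = 0 and a diagonal error covariance, the classical weighted least-squares DR solution adjusts each prediction in proportion to its variance: ỹ_i = ŷ_i − v_i a_i (a·ŷ) / Σ_j v_j a_j². The code below is an illustration under loud assumptions: it uses only the country and state forecasts of Table 3, and it assumes equal error variances, which is not the weighting used in the study. The function name `reconcile_single_balance` is hypothetical.

```python
def reconcile_single_balance(predictions, coefficients, variances):
    """Weighted least-squares reconciliation for one balance a . y = 0.

    Each prediction is shifted in proportion to its error variance so
    that the reconciled values satisfy the balance exactly.
    """
    residual = sum(a * y for a, y in zip(coefficients, predictions))
    scale = sum(v * a * a for a, v in zip(coefficients, variances))
    if scale == 0:
        raise ValueError("at least one variable must have nonzero variance")
    return [y - v * a * residual / scale
            for y, a, v in zip(predictions, coefficients, variances)]


# First-method forecasts from Table 3: country, CA, TX, WI.
predictions = [14.102, 6.034, 1.569, 4.416]
# Balance: country - CA - TX - WI = 0.
coefficients = [1.0, -1.0, -1.0, -1.0]
# Assumption for illustration only: all models are equally uncertain.
variances = [1.0, 1.0, 1.0, 1.0]

reconciled = reconcile_single_balance(predictions, coefficients, variances)
balance_after = sum(a * y for a, y in zip(coefficients, reconciled))
```

The reconciled values here differ from the third-method values in Table 3, because the study reconciles all hierarchy levels together and weights the adjustments by the prediction errors of each model. Setting a variance to zero keeps that variable fixed and pushes the whole correction onto the others.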
The presented method enables data integration from different measurement techniques, incorporating errors from different methods into the machine learning prediction process. In subsequent developments, this approach will allow us to create specific algorithms that take estimation errors into account and manage them efficiently. By remedying the imbalance, the goal was to develop more accurate, flexible, scalable, and usable models.
The limitations of the method presented in this study are that the hierarchical structure of the system, the quality of the data, and the reliability of the sources must be known precisely, and all the essential constraints of the system must be determined. If the hierarchical structure or the constraints of the system change, the model structure must also be updated.

Author Contributions

Conceptualization, P.P.H., J.A. and A.K.; methodology, P.P.H., J.A. and A.K.; use cases, P.P.H. and A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Research, Development and Innovation Office through project nr. 2019-1.3.1-KK-2019-00015.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to industrial restrictions.

Acknowledgments

This publication/research was supported by the National Research, Development and Innovation Office through project nr. 2019-1.3.1-KK-2019-00015, titled “Establishment of a circular economy-based sustainability competence center at the University of Pannonia”. Supported by the 2024-2.1.1-EKÖP University Research Scholarship Programme of the Ministry for Culture and Innovation from the Source of the National Research, Development and Innovation Fund.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
a: element of the incidence matrix
A: incidence matrix
ARIMA: autoregressive integrated moving average
b̃: stands for the constant values
CA: California state
DR: data reconciliation
ϵ: error of the predicted value
FT-IR: Fourier-transform infrared spectroscopy
HTS: hierarchical time series
HWES: Holt–Winters exponential smoothing
I: identity matrix
k: index of the hierarchical structure level
K: lowest level of the hierarchical structure
ML: machine learning
P̂: projection matrix
PLS: partial least squares
R²: correlation coefficient
RMSE: root-mean-squared error
S: summation matrix
t: timestamp
θ: parameter of the model
TX: Texas state
V⁻¹: covariance matrix
WI: Washington state
X: independent variable
y: dependent variable
ŷ: modeled independent variable
ỹ: reconciled independent variable

References

1. Spiliotis, E.; Abolghasemi, M.; Hyndman, R.J.; Petropoulos, F.; Assimakopoulos, V. Hierarchical forecast reconciliation with machine learning. Appl. Soft Comput. 2021, 112, 107756.
2. Athanasopoulos, G.; Gamakumara, P.; Panagiotelis, A.; Hyndman, R.J.; Affan, M. Hierarchical forecasting. In Macroeconomic Forecasting in the Era of Big Data: Theory and Practice; Springer: Cham, Switzerland, 2020; pp. 689–719.
3. Hyndman, R.J.; Ahmed, R.A.; Athanasopoulos, G.; Shang, H.L. Optimal combination forecasts for hierarchical time series. Comput. Stat. Data Anal. 2011, 55, 2579–2589.
4. Neubauer, L.; Filzmoser, P. Rediscovering Bottom-Up: Effective Forecasting in Temporal Hierarchies. arXiv 2024, arXiv:2407.02367.
5. Jeon, J.; Panagiotelis, A.; Petropoulos, F. Probabilistic forecast reconciliation with applications to wind power and electric load. Eur. J. Oper. Res. 2019, 279, 364–379.
6. Athanasopoulos, G.; Hyndman, R.J.; Kourentzes, N.; Panagiotelis, A. Forecast reconciliation: A review. Int. J. Forecast. 2023, 40, 430–456.
7. Hyndman, R.J.; Athanasopoulos, G. Optimally Reconciling Forecasts in a Hierarchy. Foresight Int. J. Appl. Forecast. 2014, 35, 42–48.
8. Panagiotelis, A.; Gamakumara, P.; Athanasopoulos, G.; Hyndman, R. Probabilistic forecast reconciliation: Properties, evaluation and score optimisation. Eur. J. Oper. Res. 2023, 306, 693–706.
9. Van Erven, T.; Cugliari, J. Game-theoretically optimal reconciliation of contemporaneous hierarchical time series forecasts. In Modeling and Stochastic Learning for Forecasting in High Dimensions; Springer: Cham, Switzerland, 2015; pp. 297–317.
10. Nystrup, P.; Lindström, E.; Pinson, P.; Madsen, H. Temporal hierarchies with autocorrelation for load forecasting. Eur. J. Oper. Res. 2020, 280, 876–888.
11. Hyndman, R.J.; Lee, A.J.; Wang, E. Fast computation of reconciled forecasts for hierarchical and grouped time series. Comput. Stat. Data Anal. 2016, 97, 16–32.
12. Leprince, J.; Madsen, H.; Møller, J.K.; Zeiler, W. Hierarchical learning, forecasting coherent spatio-temporal individual and aggregated building loads. arXiv 2023, arXiv:2301.12967.
13. Taghiyeh, S.; Lengacher, D.C.; Sadeghi, A.H.; Sahebi-Fakhrabad, A.; Handfield, R.B. A novel multi-phase hierarchical forecasting approach with machine learning in supply chain management. Supply Chain. Anal. 2023, 3, 100032.
14. Ashouri, M.; Hyndman, R.J.; Shmueli, G. Fast forecast reconciliation using linear models. J. Comput. Graph. Stat. 2022, 31, 263–282.
15. Hanzelik, P.P.; Kummer, A.; Ipkovich, Á.; Abonyi, J. Fusion and integrated correction of chemometrics and machine learning models based on data reconciliation. In Computer Aided Chemical Engineering; Elsevier: Amsterdam, The Netherlands, 2023; Volume 52, pp. 1379–1384.
16. Narasimhan, S.; Jordache, C. Data Reconciliation and Gross Error Detection: An Intelligent Use of Process Data; Elsevier: Amsterdam, The Netherlands, 1999.
17. Godiño, J.A.V.; Aguilar, F.J.J.E. Joint data reconciliation and artificial neural network based modelling: Application to a cogeneration power plant. Appl. Therm. Eng. 2024, 236, 121720.
18. Dabros, M.; Amrhein, M.; Bonvin, D.; Marison, I.W.; von Stockar, U. Data reconciliation of concentration estimates from mid-infrared and dielectric spectral measurements for improved on-line monitoring of bioprocesses. Biotechnol. Prog. 2009, 25, 578–588.
19. Bennouna, O.; Heraud, N.; Rodriguez, M.; Camblong, H. Data reconciliation and gross error detection applied to wind power. Proc. Inst. Mech. Eng. Part I J. Syst. Control. Eng. 2007, 221, 497–506.
20. Narasimhan, S.; Bhatt, N. Deconstructing principal component analysis using a data reconciliation perspective. Comput. Chem. Eng. 2015, 77, 74–84.
21. Hanzelik, P.P.; Kummer, A.; Abonyi, J. Edge-Computing and Machine-Learning-Based Framework for Software Sensor Development. Sensors 2022, 22, 4268.
22. Sundaramoorthy, A.S. Probabilistic Graphical Models for Data Reconciliation and Causal Inference in Process Data Analytics. Master’s Thesis, University of Alberta Libraries, Edmonton, AB, Canada, 2021.
23. Balaram, V.; Sawant, S.S. Indicator Minerals, Pathfinder Elements, and Portable Analytical Instruments in Mineral Exploration Studies. Minerals 2022, 12, 394.
24. Hanzelik, P.P.; Gergely, S.; Gáspár, C.; Győry, L. Machine learning methods to predict solubilities of rock samples. J. Chemom. 2020, 34, e3198.
25. Xu, Z.; Cornilsen, B.C.; Popko, D.C.; Pennington, W.D.; Wood, J.R.; Hwang, J.Y. Quantitative mineral analysis by FTIR spectroscopy. Internet J. Vib. Spectrosc 2001, 5. Available online: https://www.irdg.org/ijvs/ijvs-volume-5-edition-1/quantitative-mineral-analysis-by-ftir-spectroscopy (accessed on 31 October 2024).
26. Raven, M.D.; Self, P. Outcomes of 12 Years of the Reynolds Cup Quantitative Mineral Analysis Round Robin. Clays Clay Miner. 2017, 65, 122–134.
27. Motoso, O.; McCarty, D.; Hillier, S.; Kleeberg, R. Some successful approaches to quantitative mineral analysis as revealed by the Reynolds Cup contest. Clays Clay Miner. 2006, 54, 748–760.
28. Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. The M5 competition: Background, organization, and implementation. Int. J. Forecast. 2022, 38, 1325–1336.
29. Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. M5 accuracy competition: Results, findings, and conclusions. Int. J. Forecast. 2022, 38, 1346–1364.
30. Zavíralová, L.; Šomplák, R.; Pavlas, M.; Kropác, J.; Popela, P.; Putna, O.; Gregor, J. Computational system for simulation and forecasting in waste management incomplete data problems. Chem. Eng. Trans. 2015, 45, 763–768.
31. De-la Mata-Moratilla, S.; Gutierrez-Martinez, J.M.; Castillo-Martinez, A.; Caro-Alvaro, S. Prediction of the Behaviour from Discharge Points for Solid Waste Management. Mach. Learn. Knowl. Extr. 2024, 6, 1389–1412.
32. Eryganov, I.; Rosecký, M.; Šomplák, R.; Smejkalová, V. Forecasting the waste production hierarchical time series with correlation structure. Optim. Eng. 2024, 1–23.
33. Pavlas, M.; Šomplák, R.; Smejkalová, V.; Nevrlý, V.; Zavíralová, L.; Kůdela, J.; Popela, P. Spatially distributed production data for supply chain models-Forecasting with hazardous waste. J. Clean. Prod. 2017, 161, 1317–1328.
34. Kalekar, P.S. Time series forecasting using holt-winters exponential smoothing. Kanwal Rekhi Sch. Inf. Technol. 2004, 4329008, 1–13.
Figure 1. Schematic diagram of the three-level hierarchical system of the variable y. Level 3 contains all the target variable models; it is the level with the most detailed predictions, so the relationships of the third-level predictions are built into level 2, and the constraints of levels 2 and 3 are built into level 1. Even at level 0, the total prediction constraints determine the y value. Among the subscripts of y, the first, t, is the time, and the second, $j = 1, \dots, q_p^k$, is the index of the child node at the given level. Among the superscripts, the first, $k = 1, \dots, K$, is the index of the hierarchical level, and the second, p, is the index of the parent node.
Figure 2. The three different applications of this methodology; the cyan box shows the input data, the blue boxes show the ML model predictions, the gray boxes show the uncertainty representation for DR, the orange boxes show the step of performing DR, and the green boxes contain the outputs of the methods.
Figure 3. Schematic hierarchical diagram of the rock case study. In this case study, the index t denotes the sample number, not a timestamp.
Figure 4. Modeling results of the three methods in predicting the mineral composition of rock samples evaluated on the test dataset for each element.
Figure 5. Schematic diagram of the two-level hierarchical structure of the target variable (y) in the M5 forecasting accuracy competition. Level 0 is the country level, level 1 is the state level, and level 2 is the shop level.
Figure 6. Results of the M5 competition and our own DR analytical solution compared based on RMSE values at different hierarchical levels.
Figure 7. Results of ML prediction and our own DR analytical solution using data on the amount of solid waste in Hungary, compared based on RMSE values at different hierarchical levels.
Table 1. Results of the XRD measurements for the 2006 Reynolds Cup. * The results contain the average result of the first three places of the 2006 Reynolds Cup (RC) 3-2 sample on the kidney [27]. ** SD: standard deviation.

| Mineral | Actual | Submitted * | SD ** |
|---|---|---|---|
| Quartz | 29.9 | 30.67 | 0.77 |
| K-feldspar and Plagioclase | 8.6 | 8.0 | 0.6 |
| Calcite | 4.6 | 4.47 | 0.13 |
| Kaolinite | 15.0 | 15.7 | 0.7 |
| Total clay without Kaolinite | 20.2 | 19.17 | 1.03 |
| Dolomite, Magnesite, Hematite, Aragonite, Fluorite, Apatite … | 21.7 | 21.73 | 0.03 |
Table 2. Prediction results of the first, second, and third methods for the rock case study.

| Level 0 | Total |
|---|---|
| y_real | 100 |
| ŷ_1st | 103.24 |
| ỹ_2nd | 102.88 |
| ỹ_3rd | 100 |

| Level 1 | Silicate | Carbonate |
|---|---|---|
| y_real | 79 | 21 |
| ŷ_1st | 66.55 | 36.69 |
| ỹ_2nd | 66.20 | 36.68 |
| ỹ_3rd | 65.22 | 34.78 |

| Level 2 | Quartz | K-feldp. and Plagio. | Kaolin. | Clay without Kaolin. | Calcite | Dolom. Magne. Hemat. … |
|---|---|---|---|---|---|---|
| y_real | 51 | 14 | 2 | 12 | 11 | 10 |
| ŷ_1st | 37.57 | 12.81 | 1.67 | 12.19 | 19.32 | 14.33 |
| ỹ_2nd | 37.56 | 12.73 | 1.60 | 12.72 | 19.36 | 14.02 |
| ỹ_3rd | 37.70 | 13.04 | 2.16 | 12.32 | 19.78 | 15.00 |
Table 3. Prediction results of the first and third methods in the sales case study.

| Level 0 | Country |
|---|---|
| y_real | 22.0 |
| ŷ_1st | 14.102 |
| ỹ_3rd | 13.938 |

| Level 1 | CA | TX | WI |
|---|---|---|---|
| y_real | 11.0 | 1.0 | 10.0 |
| ŷ_1st | 6.034 | 1.569 | 4.416 |
| ỹ_3rd | 6.278 | 2.124 | 5.536 |

| Level 2 | CA_1 | CA_2 | CA_3 | CA_4 | TX_1 | TX_2 | TX_3 | WI_1 | WI_2 | WI_3 |
|---|---|---|---|---|---|---|---|---|---|---|
| y_real | 2.0 | 4.0 | 1.0 | 4.0 | 0.0 | 1.0 | 0.0 | 1.0 | 3.0 | 6.0 |
| ŷ_1st | 1.656 | 1.314 | 1.685 | 0.432 | 0.244 | 0.052 | 0.656 | 2.627 | 1.872 | 1.885 |
| ỹ_3rd | 1.729 | 1.376 | 1.879 | 1.294 | 0.458 | 0.544 | 1.122 | 2.442 | 1.501 | 1.593 |
Table 4. Summary of the forecast results of the first and third methods for a selected settlement, as well as its district and county levels, in the case study concerning the amount of solid waste in Hungary in 2020, 2021, and 2022 (tons).

| Level | Series | 2020 | 2021 | 2022 |
|---|---|---|---|---|
| Level 0 (Total) | y_real | 3,301,482.3 | 3,350,245.0 | 3,215,457.4 |
| Level 0 (Total) | ŷ_1st | 3,254,760.8 | 3,256,817.8 | 3,258,874.8 |
| Level 0 (Total) | ỹ_3rd | 3,254,503.5 | 3,256,560.6 | 3,258,617.7 |
| Level 1 (County) | y_real | 119,844.3 | 120,933.9 | 116,180.1 |
| Level 1 (County) | ŷ_1st | 116,495.1 | 117,577.4 | 118,659.6 |
| Level 1 (County) | ỹ_3rd | 121,780.4 | 122,861.6 | 123,942.9 |
| Level 2 (District) | y_real | 15,558.7 | 16,903.2 | 15,751.4 |
| Level 2 (District) | ŷ_1st | 14,640.5 | 15,013.5 | 15,386.6 |
| Level 2 (District) | ỹ_3rd | 14,767.7 | 15,140.1 | 15,512.5 |
| Level 3 (Settlement) | y_real | 1341.7 | 1427.4 | 1355.0 |
| Level 3 (Settlement) | ŷ_1st | 1037.7 | 1031.7 | 1025.7 |
| Level 3 (Settlement) | ỹ_3rd | 1038.3 | 1032.3 | 1026.3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hanzelik, P.P.; Kummer, A.; Abonyi, J. Data Reconciliation-Based Hierarchical Fusion of Machine Learning Models. Mach. Learn. Knowl. Extr. 2024, 6, 2601-2617. https://doi.org/10.3390/make6040125

