1. Introduction
Energy conservation and emission reduction have recently received a lot of attention in the context of the increase in energy consumption [1]. The prediction of heating and cooling loads could lead to a rational use of renewable energy to replace thermal energy systems based on fossil fuels. Moreover, reducing heating and cooling loads can help to decarbonize the building sector [2]. Therefore, the application of machine learning techniques has become important in the context of heating and cooling load prediction. Even though many residential buildings are equipped with electricity meters which provide almost complete statistics, these data are often not enough to develop complex machine learning models for the prediction of heating and cooling loads. Information about characteristics which are particular to each building, such as the wall area, the surface area, and the roof area, can complement the metered data and support the development of algorithms which are more accurate and which better address the particularities of buildings.
Buildings are responsible for approximately 40% of total global energy consumption [3]. The improvement of energy efficiency and the conservation of energy have become essential in recent years [4,5] due to the adverse effects of high energy consumption on the environment. The estimation of heating and cooling loads depends on the characteristics of the structure. To construct energy-efficient buildings, it is helpful to develop conceptual systems that anticipate the cooling load in the residential building sector [6].
Since energy resources are limited and play an important role in the economic development of countries, the reduction in energy consumption represents a necessity [7,8]. The modeling of heating and cooling loads represents the cornerstone of energy-efficient building design. Furthermore, as stated in [9], energy efficiency leads to both environmental and financial benefits [10]. Moreover, energy efficiency directly impacts economic competitiveness and sustainable development [11]. These facts underscore the significance of the work presented in the current manuscript.
The accurate prediction of energy consumption and the determination of the factors that influence heating energy consumption are important [12] in the context of the substantial increase in the energy consumption of residential buildings [13]. Therefore, the use of advanced machine learning algorithms for energy consumption prediction is of great interest to researchers. As can be seen in [14], methods based on machine learning for the prediction of energy consumption have advanced significantly in recent years. The development and examination of machine learning algorithms which can learn from patterns in the data and make predictions have attracted the interest of many scholars and scientists [15].
Several statistics, such as the ones presented by the authors of [16], estimate that by 2040 there will be an increase of nearly 25% in global energy demand. Also, to create a sustainable and healthy economy, it is important to measure the economic and environmental effects of energy production [17].
The optimization of building energy prediction represents an important research area because of its potential to improve the efficiency of energy management systems [18]. Many studies have shown that the air conditioning system consumes up to 38% of the total energy in the building sector [19,20].
As can be seen, the energy efficiency research domain will present great interest to the research community in the years to come, and the application of novel machine learning and artificial intelligence techniques can lead to a significant improvement in the existing techniques used for energy consumption prediction.
The solution presented in the manuscript is specific to the heating and cooling load prediction optimization problem, as it aims to improve the prediction results for two objectives, namely the heating load and the cooling load. Two particularities of the proposed solution were (1) the consideration of three machine learning algorithms which share common hyperparameters and return good predictions for energy data characterized by a small number of features, and (2) the development of an objective function which applies a 10-fold cross-validation and averages the heating and cooling prediction results.
The main contributions of the work presented in this paper are as follows:
- (1) A critical review of the application of machine learning methods for the prediction of heating and cooling loads;
- (2) The introduction of a novel algorithm called the Multi-Objective Plum Tree Algorithm (MOPTA), which adapts the original Plum Tree Algorithm [21] to multi-objective optimization problems;
- (3) The ranking of the solutions using the MOORA method [22];
- (4) The adaptation of the MOPTA to the hyperparameter optimization and the optimal regressor selection for a machine learning methodology used to predict heating and cooling loads, using the Energy Efficiency Dataset of the UCI Machine Learning Repository as experimental support [23,24];
- (5) The development of an objective function that considers the averages of the heating and cooling RMSE results;
- (6) The comparison and validation of the obtained results with the ones obtained by the Multi-Objective Grey Wolf Optimizer (MOGWO) [25], Multi-Objective Particle Swarm Optimization (MOPSO) [26], and NSGA-II [27].
The manuscript is structured as follows: Section 2 presents the research background, Section 3 presents the MOPTA-based machine learning methodology for the optimization of heating and cooling load prediction, Section 4 presents the results, Section 5 compares the obtained results with the ones from previous studies, and Section 6 shows the conclusions.
2. Research Background
This section reviews representative recent studies that considered the application of machine learning for the prediction of heating and cooling loads.
In [28], the authors approached the prediction of the energy efficiency of buildings using machine learning techniques with a small data approach. The method proposed by them considered the Support Vector Regression and K-means algorithms. The prediction of the heating and cooling parameters was treated as two separate tasks. The dataset was split into 75% training data and 25% testing data, and the metrics used to evaluate the performance were the mean square error and the mean absolute error. Their proposed method was better in terms of mean square error and mean absolute error than other methods, such as the classical Support Vector Regression with the RBF kernel.
The review from [29] presented a selection of representative studies that used data-driven techniques, such as machine learning and artificial intelligence, for the prediction of the cooling and heating loads of residential buildings. The review considered various techniques, such as ensemble learning, Artificial Neural Networks, Support Vector Machines, probabilistic models, and statistical models. The review also considered recent studies that used the same experimental dataset as the one used in our manuscript. For example, the approach presented in [30] used an ensemble machine learning model based on three Random Forest models that achieved an R² of 0.999 for the heating load prediction and 0.997 for the cooling load prediction using a 10-fold cross-validation approach. On the other hand, the authors of [31] used an approach based on the Tri-Layered Neural Network and Maximum Relevance Minimum Redundancy that led to a mean absolute error of 0.289 for the heating load and 0.535 for the cooling load.
The approach presented in [32] considered a novel method for energy consumption estimation using the Support Vector Machine and the Random Forest. The Owl Search Algorithm [33] was used to improve the performance of these two algorithms. The root mean square error values returned by the approaches based on the Support Vector Machine and the Random Forest were 0.85 and 1.29 for heating and 1.02 and 1.65 for cooling, respectively.
The approach presented in [34] compared four algorithms, namely, the Linear Regression, the Decision Tree, the Random Forest, and the XGBoost. Like our approach, the data were split randomly into 80% training data and 20% testing data. The hyperparameters were optimized using Bayesian Optimization [35]. The best results in terms of root mean square error for the testing data were obtained by the XGBoost algorithm, as follows: 0.3797 for the heating load and 0.7578 for the cooling load.
The mean square error results obtained by the authors of [36] were 0.201 for the heating load prediction and 2.56 for the cooling load forecast. Another approach, namely the one presented in [37], which used the Multilayer Perceptron and Support Vector Regression algorithms, returned root mean square error values of 0.4832 and 0.8853 for the heating load prediction and of 2.626 and 1.7389 for the cooling load prediction. On the other hand, the approach presented in [38] based on the Gated Recurrent Unit returned root mean square error values of 0.0166 and 0.0247 for the heating and cooling load predictions, respectively, when hold-out was used, and root mean square error values of 0.01 for both the heating and cooling load predictions when 10-fold validation was used.
The authors of [39] considered an approach based on a multi-objective optimization method for the tuning of the hyperparameters of a Random Forest model used for the prediction of the heating and cooling loads. The two objectives that were optimized were the averages of the heating and cooling load prediction errors. Compared to their approach, our method also selects which regressor to use as part of the multi-objective optimization process.
3. Multi-Objective Plum Tree Algorithm (MOPTA) Machine Learning Methodology for Heating and Cooling Load Prediction
The original version of the Plum Tree Algorithm was introduced in [21] with the following sources of inspiration:
The plum trees flowering at the beginning of spring;
The transformation of the pollinated flowers into plums;
The dropping of a percentage of the plums before maturity due to various reasons;
The continuation of the lives of the plums for a couple of weeks after the harvest.
The PTA presents similarities with other bio-inspired algorithms, such as Chicken Swarm Optimization [40], Particle Swarm Optimization, the Grey Wolf Optimizer [41], and the Crow Search Algorithm [42], which influenced how particular mathematical parts of the algorithm were modeled.
Table 1 summarizes the PTA’s configurable parameters.
The PTA starts with the initialization of the $N$ flowers in a search space with $D$ dimensions, such that the values are selected randomly from the range $[X_{min}, X_{max}]$:
Then, the $N$ plums are initialized with the values of the flowers:
The objective function $OF$ is used to calculate the fitness values of the flowers and of the plums. The best plum is set to the position of the plum that has the best fitness value.
Then, the PTA runs the following instructions for $I$ iterations.
At the beginning of each iteration, the positions of the following two plums are computed:
For each plum $i$, where $t$ is the iteration number and $i \in \{1, \ldots, N\}$, a random number from the range $[0, 1]$ is selected. Three cases, one for each phase, are considered further:
In this case, the positions of the flowers are updated using the following formula:
where $r_1$ is a random number from $[0, 1]$.
The following formula is used to update the positions of the flowers in this case:
where $r_2$ and $r_3$ are random numbers from $[0, 1]$.
The positions of the flowers are updated as follows:
where $G(\mu, \sigma)$ is a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$ defined as follows:
Then, the positions of the flowers are updated to be in $[X_{min}, X_{max}]$. For each dimension $j$ of flower $i$:
If $x_{i,j} < X_{min}$, then $x_{i,j} = X_{min}$;
If $x_{i,j} > X_{max}$, then $x_{i,j} = X_{max}$.
After the positions of the flowers are updated, for each plum $i$, where $t$ is the iteration number and $i \in \{1, \ldots, N\}$, the following formula is used:
At the end of each iteration, the position of the best plum is updated to the position of the plum with the best fitness.
Finally, when all iterations are completed, the PTA returns the value of the best plum.
3.1. Heating and Cooling Load Prediction
The Energy Efficiency Dataset was split randomly using a 5-fold cross-validation. For each of the five splits, the testing data were represented by one different fold, while the training data were represented by the remaining folds. The training data were standardized using the Z-score for the values of each column, while the testing data were standardized using the mean and the standard deviation values computed for the training data.
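The split-and-standardize step can be sketched as follows. This is an illustrative sketch, not the paper's code: the function name `standardized_folds`, the toy data, and the seed are our own choices. The key point it demonstrates is that the scaler is fitted on the training folds only, and its statistics are reused for the testing fold.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

def standardized_folds(X, Y, n_splits=5, seed=42):
    """Yield (X_train, X_test, Y_train, Y_test) per split, with Z-score
    statistics computed on the training folds only and reused for the
    testing fold."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(X):
        x_scaler = StandardScaler().fit(X[train_idx])   # training-fold mean/std
        y_scaler = StandardScaler().fit(Y[train_idx])   # targets are standardized too
        yield (x_scaler.transform(X[train_idx]), x_scaler.transform(X[test_idx]),
               y_scaler.transform(Y[train_idx]), y_scaler.transform(Y[test_idx]))

# toy stand-in for the 768-sample, 8-attribute, 2-response dataset
rng = np.random.default_rng(0)
X_toy, Y_toy = rng.random((40, 8)), rng.random((40, 2))
splits = list(standardized_folds(X_toy, Y_toy))
```

Fitting the scaler on the training folds only avoids leaking testing-fold statistics into the model.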
The algorithms that were used were the Extra Trees Regressor (ETR), the Gradient Boosting Regressor (GBR), and the Random Forest Regressor (RFR). The regressors were configured with the hyperparameter values described by the plums. The eight metrics which were used to evaluate the results were the averages of the RMSE, MAE, MAPE, and $R^2$ across the 5 folds for the heating and cooling load predictions.
3.2. MOPTA Multi-Objective Fitness Function
Each position of a plum corresponds to an algorithm and its hyperparameter configuration. To apply the multi-objective fitness function to a plum, it is necessary to convert the values that describe the position of the plum to integers first. This is done using the floor function, which takes a real value as input and returns the greatest integer value that is less than or equal to it.
Figure 1 illustrates the high-level view of the multi-objective fitness function.
Compared to the approach presented in [39], we did not consider the bootstrap parameter, as we aimed to use a set of hyperparameters that can be configured for all algorithms. However, compared to that approach, we added a new dimension that describes the algorithm, namely GBR, RFR, or ETR, such that 0 corresponds to GBR, 1 to RFR, and 2 to ETR, respectively.
The inputs of the fitness function are the converted position of the plum and the train data. The first dimension describes the algorithm, while the other five dimensions describe the values of the hyperparameters.
We performed a 10-fold cross-validation on the train data, and we computed the average RMSE values. In each partition, the test data were represented by one fold, and the train data were represented by the other nine folds. The selected algorithm, configured according to the plum, was applied twice in each partition, depending on the prediction type. The first time, it was used to predict the heating load, while the second time, it was used to predict the cooling load.
The output of the fitness function is represented by the following two values:
The average Heating RMSE, denoted as $RMSE_H$;
The average Cooling RMSE, denoted as $RMSE_C$.
The MOPTA aims to obtain minimal values for both objectives.
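A minimal sketch of such a two-objective fitness function is given below. The regressor encoding (0 = GBR, 1 = RFR, 2 = ETR), the floor conversion, and the two 10-fold averaged RMSE objectives follow the description above, while the hyperparameter subset (`n_estimators`, `max_depth`), the function name, and the toy data are our own illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

REGRESSORS = [GradientBoostingRegressor, RandomForestRegressor, ExtraTreesRegressor]

def fitness(plum, X, y_heat, y_cool):
    """Two objectives: average heating RMSE and average cooling RMSE over
    a 10-fold cross-validation.  plum[0] selects the regressor
    (0 = GBR, 1 = RFR, 2 = ETR); the remaining dimensions set the
    hyperparameters (an illustrative subset is used here)."""
    p = np.floor(plum).astype(int)            # continuous position -> integers
    model_cls = REGRESSORS[p[0]]
    params = {"n_estimators": p[1], "max_depth": p[2], "random_state": 42}
    rmse_h, rmse_c = [], []
    for tr, te in KFold(n_splits=10, shuffle=True, random_state=42).split(X):
        for target, out in ((y_heat, rmse_h), (y_cool, rmse_c)):
            model = model_cls(**params).fit(X[tr], target[tr])
            out.append(mean_squared_error(target[te], model.predict(X[te])) ** 0.5)
    return np.mean(rmse_h), np.mean(rmse_c)

# toy demonstration; floor([0.6, 12.7, 3.4]) selects the GBR with 12 trees of depth 3
rng = np.random.default_rng(0)
X_toy = rng.random((30, 8))
h_rmse, c_rmse = fitness(np.array([0.6, 12.7, 3.4]), X_toy, rng.random(30), rng.random(30))
```

Each plum is thus evaluated by training the selected regressor twenty times, twice per cross-validation partition.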
3.3. Multi-Objective Adaptations of PTA
The multi-objective adaptations of the PTA considered in this manuscript are similar to the ones we used in [25], and they are based on the method presented in [43]. The major adaptations introduced by the MOPTA are the application of an external archive for the saving and the retrieval of the Pareto-optimal solutions and the use of this archive for obtaining the values of the ripe and the unripe plums.
The dominance relations between two solutions $s_1$ and $s_2$ are defined as follows:
- (1) If $RMSE_H(s_1) \le RMSE_H(s_2)$ and $RMSE_C(s_1) \le RMSE_C(s_2)$ and at least one of the relations $RMSE_H(s_1) < RMSE_H(s_2)$ and $RMSE_C(s_1) < RMSE_C(s_2)$ is true, then $s_1$ dominates $s_2$;
- (2) If $RMSE_H(s_2) \le RMSE_H(s_1)$ and $RMSE_C(s_2) \le RMSE_C(s_1)$ and at least one of the relations $RMSE_H(s_2) < RMSE_H(s_1)$ and $RMSE_C(s_2) < RMSE_C(s_1)$ is true, then $s_2$ dominates $s_1$;
- (3) If neither (1) nor (2) is true, then $s_1$ and $s_2$ are non-dominated.
A set that contains two solutions is non-dominated if neither solution dominates the other one.
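The dominance relations above translate directly into code. The following is a straightforward sketch for two minimization objectives; the function names are ours.

```python
def dominates(a, b):
    """True if a Pareto-dominates b for two minimization objectives
    (heating RMSE, cooling RMSE): a is no worse in both and strictly
    better in at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def non_dominated(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

front = non_dominated([(1, 3), (2, 2), (3, 1), (2, 4)])  # (2, 4) is dominated by (1, 3)
```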
3.3.1. Plum Matrix Grid Computation
Figure 2 presents the methodology for the computation of the plum matrix grid.
The input is represented by a set of plums $\{p_1, \ldots, p_n\}$, where $n$ is the total number of plums.
Step 1: The costs of the plums are computed using the multi-objective fitness function. The result is the following matrix:
Step 2: The minimum and the maximum cost values of the $RMSE_H$ objective are computed using the cost matrix and the grid's inflation parameter ($\alpha$):
Step 3: Similarly, the minimum and the maximum cost values of the $RMSE_C$ objective are computed as follows:
Step 4: The plum matrix grid was defined using the number of grids $N_{grid}$ such that the $x$-axis presents the endpoints of the $RMSE_H$ minimization objective, and the $y$-axis presents the endpoints of the $RMSE_C$ minimization objective.
Step 5: The formulas used for the computation of the index of a plum with a given cost are as follows:
The output is represented by the set of plum indices.
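A sketch of the grid computation in the spirit of archive-based multi-objective optimizers is given below. Since the exact formulas of the steps above are not reproduced here, the inflation of the cost bounds by a rate `alpha` and the flattening of the two per-objective cell indices into a single index are illustrative assumptions, and the function name is ours.

```python
import numpy as np

def grid_index(costs, n_grid=7, alpha=0.1):
    """Assign each (RMSE_H, RMSE_C) cost vector to a cell of an inflated
    grid: the cost bounds are widened by the inflation rate alpha, each
    objective axis is cut into n_grid segments, and the two per-objective
    cell indices are flattened into a single index."""
    costs = np.asarray(costs, dtype=float)
    lo, hi = costs.min(axis=0), costs.max(axis=0)
    pad = alpha * (hi - lo)                       # inflate the grid bounds
    lo, hi = lo - pad, hi + pad
    cells = np.floor((costs - lo) / (hi - lo) * n_grid).astype(int)
    cells = np.clip(cells, 0, n_grid - 1)         # keep boundary points inside
    return cells[:, 0] * n_grid + cells[:, 1]     # flattened cell index

indices = grid_index([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
```

Inflating the bounds keeps the extreme solutions strictly inside the grid rather than on its edges.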
3.3.2. Plum Selection Methodology
The ripe and the unripe plums were selected using the archive of plums $A$. The selection of these two plums was performed at each iteration of the MOPTA for each plum.
Figure 3 presents a high-level view of the methodology for the plum selection.
The set of occupied indices of $A$ is calculated as follows:
such that the function converts the numbers received as input into a list of unique numbers sorted in increasing order. Suppose that there are $K$ cells that are occupied, and each one of them is defined by its cell index. Then:
The following vector, which stores the cell count of each occupied cell, is defined:
such that for each $k \in \{1, \ldots, K\}$, the value $N_k$ represents how many plums are present at the location of the $k$-th occupied cell.
Then, a random cell is selected from the occupied cells using a roulette wheel selection mechanism defined by the following:
The set of plums from the selected cell, defined by the following formula, where $N_{ar}$ is the archive size, was used to select the ripe or the unripe plum randomly, considering a uniform probability:
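The roulette wheel step can be sketched as follows. The exponential weighting by the pressure parameter follows the convention popularized by archive-based multi-objective swarm optimizers and is an assumption here, not the paper's exact formula; the function name is ours.

```python
import numpy as np

def select_cell(cell_counts, beta=2.0, rng=None):
    """Roulette wheel choice of an occupied grid cell: cells holding
    fewer plums receive a larger weight (pressure parameter beta), so
    the search is pulled toward less crowded parts of the front."""
    rng = rng or np.random.default_rng(0)
    counts = np.asarray(cell_counts, dtype=float)
    weights = np.exp(-beta * counts)          # fewer occupants -> larger weight
    return rng.choice(len(counts), p=weights / weights.sum())
```

A plum is then drawn uniformly from the plums located in the selected cell.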
3.3.3. Plum Removal Methodology
The size of the archive $A$ was adjusted during each iteration of the algorithm if it was greater than the maximum archive size ($N_{ar}$).
Figure 4 presents a high-level view of the methodology for the removal of the plums.
Therefore, a number of plums were removed from the archive using steps similar to the ones presented in the plum selection methodology, with the following two adaptations:
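A sketch of such a removal step, under an exponential-weighting assumption that is not the paper's exact formula, inverts the selection pressure so that crowded cells are emptied first; the function name is ours.

```python
import numpy as np

def removal_cell(cell_counts, gamma=2.0, rng=None):
    """Roulette wheel choice of the cell to delete a plum from: the
    pressure is inverted relative to selection, so cells holding MORE
    plums receive a larger weight (parameter gamma) and crowded regions
    of the archive are thinned first."""
    rng = rng or np.random.default_rng(0)
    counts = np.asarray(cell_counts, dtype=float)
    weights = np.exp(gamma * counts)          # more occupants -> larger weight
    return rng.choice(len(counts), p=weights / weights.sum())
```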
3.3.4. MOPTA for Heating and Cooling Prediction
Algorithm 1 presents the MOPTA for heating and cooling prediction.
Algorithm 1 MOPTA for Heating and Cooling Prediction
1: input: the PTA parameters ($I$, $D$, $N$, $X_{min}$, $X_{max}$, $OF$) and the archive parameters ($N_{ar}$, $N_{grid}$, $\alpha$, $\beta$, $\gamma$);
2: output: the archive $A$ of non-dominated plums;
3: initialize the $N$ flowers with random values from $[X_{min}, X_{max}]$;
4: initialize the plums with the positions of the flowers;
5: convert the positions of the plums and of the flowers to arrays of integers;
6: apply $OF$ to each plum;
7: determine the dominance relations between the plums;
8: create the archive $A$ of the non-dominated plums;
9: compute the grid matrix and the indices of the plums (Section 3.3.1);
10: for each iteration do
11:   for each plum do
12:     update the current archive size;
13:     compute the ripe and the unripe plums (Section 3.3.2);
14:     select a random number from $[0, 1]$;
15:     update the position of the flower using Equations (3)–(6);
16:     update the position of the flower to be in $[X_{min}, X_{max}]$;
17:   end for
18:   for each plum do
19:     convert the positions of the flower and of the plum to arrays of integers;
20:     compute the fitness values of the flower and of the plum using $OF$;
21:     if the flower dominates the plum then
22:       update the position of the plum to the position of the flower;
23:       update the fitness value of the plum to the fitness value of the flower;
24:     end if
25:   end for
26:   determine the dominance relations between the plums;
27:   compute the non-dominated plums;
28:   remove the dominated plums from $A$;
29:   append the non-dominated plums to $A$;
30:   update the grid indices of the plums from $A$;
31:   compute the grid matrix;
32:   compute the current archive size;
33:   if the archive size is greater than $N_{ar}$ then remove the extra plums (Section 3.3.3);
34: end for
35: return $A$;
The input parameters of the MOPTA consist of the standard input parameters of the PTA, which are presented in Table 1, and the following additional parameters:
$N_{ar}$: the maximum archive size of the repository that contains the non-dominated solutions;
$N_{grid}$: the number of grids per objective;
$\alpha$: the grid's inflation parameter;
$\beta$: the pressure parameter used during the plum selection;
$\gamma$: the plum selection pressure parameter used during the plum removal.
The $OF$ is a multi-objective function that returns the average values of the $RMSE_H$ and of the $RMSE_C$ of the ML algorithm trained and validated according to the position of the plum. The range $[X_{min}, X_{max}]$ was adapted such that the first dimension describes the selected regressor, while the other dimensions describe the limits of the hyperparameters considered in the training of the ML algorithm.
The output of the MOPTA is the archive $A$, which consists of the non-dominated plums after $I$ iterations.
The $N$ flowers were initialized with random values from $[X_{min}, X_{max}]$ in the $D$-dimensional search space (line 3), while the $N$ plums were initialized to the positions of the $N$ flowers in line 4. Then, both the positions of the plums and of the flowers were adapted to arrays of integers in line 5 using the floor function. The multi-objective fitness function $OF$ presented in Section 3.2 was applied to each plum (line 6).
Using the conditions presented at the beginning of Section 3.3, the dominance relation was determined in line 7 of the algorithm. The archive $A$ of non-dominated plums was created in line 8. Then, using the steps presented in Section 3.3.1, the grid matrix and the indices of the plums were computed.
The instructions from lines 11–33 were repeated for $I$ iterations. For each plum, the instructions from lines 12–16 were performed. The current size of the archive was updated in line 12 to the total number of plums from $A$. The ripe and the unripe plums were computed in line 13 using the grid indices, the pressure parameter $\beta$, and the plum selection methodology presented in Section 3.3.2.
Initially, the ripe and the unripe plums were selected randomly. If the archive $A$ contained non-dominated plums, then they were selected from $A$ following the steps from Section 3.3.2.
The positions of the flowers were updated in line 15 using Equations (3)–(6) for the three phases: the fruitiness phase, the ripeness phase, and the storeness phase. The equations for the storeness phase were adapted for multi-objective optimization using a procedure adapted after the one from [44]. Equation (6) was adapted to the equation:
such that the function used in the adapted equation was defined as follows:
Then, the positions of the flowers were updated to be in $[X_{min}, X_{max}]$ (line 16).
The instructions from lines 19–24 were performed for each plum. First, the positions of the flower and of the corresponding plum were updated to arrays of integers using the floor function (line 19). Then, the $OF$ was used to compute the fitness values of the flower and of the plum (line 20). If the flower dominated the plum, then the position and the fitness value of the plum were updated to the ones of the flower (lines 21–24).
The plum dominance relation was determined again in line 26. The non-dominated plums were computed in line 27. Then, the non-dominated plums were appended to the archive $A$ (lines 28–29), and the grid indices were updated in line 30.
The grid matrix was computed in line 31, while the current archive size was computed in line 32. If this value was greater than $N_{ar}$, then the extra plums were removed from the archive, according to the methodology presented in Section 3.3.3.
Finally, the MOPTA returned the archive $A$ as output in line 35.
3.3.5. Solution Ranking Using MOORA
The solutions which were returned by the MOPTA were ranked using an adaptation of MOORA [22,45].
The decision matrix was defined as follows:
where $n$ is the size of the plums archive and the two elements of the $i$-th row represent the $RMSE_H$ and the $RMSE_C$ values predicted by the model trained according to the position of the $i$-th plum, where $i \in \{1, \ldots, n\}$.
The values of the decision matrix were normalized as follows:
The MOORA scores of the plums from the archive of size $n$ were finally computed as follows:
The most dominant plum was the one with the lowest MOORA score.
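A compact sketch of this ranking is shown below; the column-wise vector normalization and the summation of normalized values follow the standard MOORA formulation for cost criteria, and the function name is ours.

```python
import numpy as np

def moora_rank(archive_costs):
    """MOORA-style ranking of archive members.  Each row holds the
    (heating RMSE, cooling RMSE) of one plum; each column is divided by
    its Euclidean norm, and since both objectives are costs, the plum
    with the LOWEST sum of normalized values ranks first."""
    X = np.asarray(archive_costs, dtype=float)
    normalized = X / np.sqrt((X ** 2).sum(axis=0))   # vector normalization
    scores = normalized.sum(axis=1)                  # both objectives are costs
    return np.argsort(scores), scores

order, scores = moora_rank([[0.4, 0.6], [0.2, 0.3], [0.5, 0.5]])
```

In the example, the second plum dominates the other two, so it receives the lowest score and the first rank.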
3.3.6. MOPTA Methodology for Heating and Cooling Prediction
Figure 5 presents the high-level view of the MOPTA methodology which was used for the prediction of the heating and cooling loads.
The input of the methodology was represented by the Energy Efficiency Dataset. The data were split into Training Data and Testing Data considering a 5-Fold Cross-Validation approach, such that five splits were performed. Each time, one fold was used for testing and the remaining ones for training. Then, the Standardized Training Data and the Standardized Testing Data were obtained. The MOPTA was run using the Standardized Training Data as input, with a 10-fold cross-validation to evaluate the plums. The archive returned by the algorithm was evaluated using MOORA, and the plum with the best MOORA score was further considered to evaluate the predictions. The predictions were evaluated using the RMSE, MAE, MAPE, and $R^2$ metrics.
4. Results
The experiments were performed in Python version 3.12.3 using the sklearn library on a machine with the following properties:
All the computations were CPU-based.
4.1. Energy Efficiency Dataset
The Energy Efficiency Dataset used in the experiments was characterized by 768 samples, eight attributes, and two responses. The dataset was obtained considering 12 building shapes, simulated in Ecotect.
Table 2 presents the summary of the features.
The dataset was split randomly into five folds of an approximately equal size, such that the Testing Data were represented by one of the folds while the Training Data were represented by the other four folds.
4.2. Hyperparameters Configuration
Table 3 presents the ranges of the hyperparameters used in the experiments. The values were inspired by the ones used by the authors of [
39].
4.3. MOPTA Configuration Parameters
Table 4 presents the MOPTA configuration parameters used in our experiments.
As a remark, in the case of the upper limits, the table also adds the value 1 to the upper limit since the search space is represented by continuous values. However, if the upper limit is obtained, then the value 1 is subtracted from that value.
Figure 6 presents this adjusting transformation more clearly for the first dimension.
As can be seen in the figure, the values from $[0, 1)$ are adjusted to 0, the values from $[1, 2)$ are adjusted to 1, and the values from $[2, 3]$ are adjusted to 2, respectively.
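The adjusting transformation can be sketched as follows (the helper name `decode_dimension` is ours): a continuous value is floored, and a value that lands exactly on the extended upper limit is pulled back by 1.

```python
import numpy as np

def decode_dimension(value, upper):
    """Map a continuous position value from [0, upper + 1] to an integer
    in {0, ..., upper}: apply the floor function, and pull a value that
    lands exactly on the extended upper limit back by 1."""
    v = int(np.floor(value))
    return v - 1 if v > upper else v

# first dimension: 0 -> GBR, 1 -> RFR, 2 -> ETR
codes = [decode_dimension(v, 2) for v in (0.3, 1.999, 2.5, 3.0)]
```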
4.4. MOPTA Prediction Results
Table 5 presents the results obtained by the MOPTA for each of the five folds and the mean results.
As can be seen in the table, in the case of the MAPE metric, the result was negative for Fold 4. The negative value is justified by the fact that after the standardization operation, the labels had both positive and negative values. In all five cases, the selected algorithm was the GBR.
4.5. Comparison to the Prediction Results Obtained Using the Default Parameters
Table 6 compares the results obtained by the MOPTA approach to the ones obtained by each of the algorithms GBR, RFR, and ETR when the default values were used. We considered these three algorithms in the comparison because they are used by the MOPTA as part of the optimization process. Moreover, the best solution returned by the MOPTA describes which of the three algorithms is applied and the optimal values of the hyperparameters. In this table, each MOPTA result therefore corresponds to one of the three algorithms, depending on the value of the first dimension of the best plum, tuned according to the values of the remaining dimensions of the best plum. To obtain reproducible results, each of the algorithms was initialized with a random_state equal to 42.
The obtained results show that the MOPTA RMSE results were better than the ones returned by the GBR, the RFR, and the ETR, both for the heating predictions and the cooling predictions in all cases.
4.6. Comparison to Other Multi-Objective Optimization Approaches
The results obtained using the MOPTA were compared to the ones obtained by the MOGWO, the MOPSO, and the NSGA-II. We considered these algorithms in the comparison because a part of the mathematical equations of the Plum Tree Algorithm were inspired by the Grey Wolf Optimizer and the Particle Swarm Optimization, while the Genetic Algorithms, which are at the base of the NSGA-II, are one of the most popular evolutionary algorithms. Moreover, the Particle Swarm Optimization and the Grey Wolf Optimizer are some of the most popular swarm intelligence algorithms. Therefore, we considered the multi-objective implementations of these three benchmark algorithms to validate our results. In the case of the MOGWO algorithm, we used the implementation from our previous work [25], as it was used for this type of problem. The MOPSO and NSGA-II were also used in [25] to validate our results. For NSGA-II, we considered the implementation from the DEAP (Distributed Evolutionary Algorithms in Python) framework [46].
Some of the configuration parameters have the same values as in the case of the MOPTA, while other parameters were specific to each algorithm. The common configuration parameters for all four algorithms were the number of iterations $I$, the number of dimensions $D$, the population size $N$, the minimum and the maximum position values $X_{min}$ and $X_{max}$, and the objective function $OF$.
The configuration parameters that were common to the MOPTA, the MOGWO, and the MOPSO were the maximum archive size $N_{ar}$, the number of grids $N_{grid}$, the grid's inflation parameter $\alpha$, the selection pressure parameter $\beta$, and the removal pressure parameter $\gamma$.
Table 7 presents the specific configuration parameter values for each algorithm.
Table 8 compares the MOPTA results to the ones obtained by the other three multi-objective optimization algorithms.
The MOGWO returned the best RMSE for cooling for Fold 1 and Fold 2 and the best RMSE for heating for Fold 2 and Fold 3. The MOPSO returned the best RMSE for heating for Fold 1 and the best RMSE for cooling for Fold 4. The NSGA-II returned the best RMSE for cooling for Fold 4. The MOPTA returned the best RMSE for heating for all folds except for Fold 1, and the best RMSE for cooling for Fold 1, Fold 2, and Fold 5. Also, the MOPTA obtained the best mean RMSE values both for the heating predictions and for the cooling predictions. Another remark is that all of the multi-objective algorithms selected the GBR, even though, as can be seen in Table 6, the GBR does not always return the best RMSE results when compared to the RFR and the ETR when the default parameter values are used.
Table 9 describes how many times each algorithm was the best with respect to the five folds.
We can see that the MOPTA was the best in seven cases. The second-best algorithm, the MOGWO, was the best in only four cases.
4.7. Computational Load Analysis
This section presents a computational load analysis of the algorithms used in our experiments from the perspective of the running time.
Table 10 summarizes the total running time expressed in milliseconds for each algorithm across all five folds.
As can be seen in the table, the GBR, the RFR, and the ETR had the best running times. The running time of the MOPTA, which was approximately 19.5 h, was almost double that of the MOGWO and MOPSO algorithms. The running time of the NSGA-II was almost 12 times shorter than that of the MOPTA.
However, we also want to point out that a grid search which searches through all combinations of hyperparameters, namely 78,820,560 combinations, and which would need an average of around 200 ms per experiment, a value which is slightly less than that of the GBR algorithm when it is tuned with the default parameters (e.g., 297 ms), would need around 15,764,112,000 ms to complete, or around 182 days. With respect to these remarks, we can conclude that the MOPTA has a much better running time compared to the running time of a standard grid search.
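The arithmetic behind this estimate can be checked directly; the number of hyperparameter combinations below is derived from the total time quoted in the text and the assumed 200 ms per experiment.

```python
# Estimated cost of an exhaustive grid search, using the figures quoted above.
ms_per_experiment = 200                            # assumed time per experiment
total_ms = 15_764_112_000                          # total quoted in the text
combinations = total_ms // ms_per_experiment       # implied number of combinations
days = total_ms / (1000 * 60 * 60 * 24)            # convert milliseconds to days
```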
4.8. Robustness and Convergence Analysis
In this section, we discuss the robustness and the convergence of the MOPTA, and we compare the results to the ones corresponding to the MOGWO, the MOPSO, and the NSGA-II.
To compute the robustness values, we consider the heating and cooling RMSE results obtained for each fold and calculate the standard deviation.
Table 11 summarizes the comparison of the standard deviation (std) values obtained for each multi-objective optimization algorithm.
If we consider that a lower variability means a better robustness, then the MOPSO returned the best result for the heating standard deviation, while the MOPTA returned the best result for the cooling standard deviation.
For the analysis of the convergence of the MOPTA, we identified the first iteration which returned the best value for each fold. We performed similar calculations for the other multi-objective algorithms.
Table 12 summarizes the convergence analysis results.
The MOPTA converged relatively fast compared to the other algorithms, except for Fold 5, where it obtained the best result in Iteration 20. In the case of the other algorithms, the best results were obtained after more iterations, except for the MOPSO on Fold 5, where the best result was obtained in Iteration 2.
5. Discussions
This section compares our results to the ones obtained by recent studies in the literature.
Table 13 presents a summary of the result comparison, building upon the comparison results presented in [21]. The articles presented in the table were selected so that a 5-fold cross-validation was used, and the models performed two predictions, one for the heating load and the other one for the cooling load.
The results presented in [47,48] are better than the ones presented in [50], but they are not directly comparable because no standardization was performed. For articles [31,49,50], we presented only the best values. In the case of article [34], which used a 5-fold cross-validation like our approach, we presented only the best results. Even though the results from [34] are not directly comparable, as we also used standardization, they were better than the ones from [31]. However, the root mean square error results of the current approach based on the Multi-Objective Plum Tree Algorithm were significantly better than the results obtained by the Plum Tree Algorithm-based ensemble.
Even though they are not directly comparable to our results because of different pre-processing configurations or cross-validation settings, recent studies based on the latest deep learning methods returned promising results. For example, the approach presented in [51] based on deep neural networks returned a root mean square error equal to 0.0137. However, compared to our approach, the problem was converted into an image processing problem by transforming the data into image datasets. By rounding the heating and cooling load values to the closest integer, the problem was converted into a multi-class classification problem.