Article

Data-Driven Based Prediction of the Energy Consumption of Residential Buildings in Oshawa

1
School of Environment and Architecture, University of Shanghai for Science and Technology, Shanghai 200093, China
2
Faculty of Engineering and Applied Science, Ontario Tech University, Oshawa, ON L1G 0C5, Canada
3
Faculty of Architecture, Building and Planning, The University of Melbourne, Melbourne 3010, Australia
4
School of Engineering, RMIT University, Melbourne 3000, Australia
*
Authors to whom correspondence should be addressed.
Buildings 2022, 12(11), 2039; https://doi.org/10.3390/buildings12112039
Submission received: 13 October 2022 / Revised: 4 November 2022 / Accepted: 18 November 2022 / Published: 21 November 2022
(This article belongs to the Special Issue Building Energy-Saving Technology)

Abstract

Buildings consume about 40% of the global energy. Building energy consumption is affected by multiple factors, including building physical properties, performance of the mechanical system, and occupants’ activities. The prediction of building energy consumption is very complicated in actual practice. Accurate and fast prediction of the building energy consumption is very important in building design optimization and sustainable energy development. This paper evaluates 24 energy consumption models for 83 houses in Oshawa, Canada. The energy consumption, social and demographic information of the occupants, and the physical properties of the houses were collected through smart metering, a phone survey, and an energy audit. A total of 63 variables were determined, and based on the variable importance, three groups with different numbers of variables were selected, i.e., 26, 12, and 6 for electricity consumption; and 26, 13, and 6 for gas consumption. A total of eight data-driven algorithms, namely Multiple Linear Regression (MLR), Stepwise Regression (SR), Support Vector Machine (SVM), Backpropagation Neural Network (BPNN), Radial Basis Function Neural Network (RBFN), Classification and Regression Tree (CART), Chi-Square Automatic Interaction Detector (CHAID), and Exhaustive CHAID (ECHAID), were used to develop energy prediction models. The results show that the BPNN model has the best accuracies in predicting both the annual electricity consumption and gas consumption, with mean absolute percentage errors (MAPEs) of 0.94% and 0.94% for training and validation data for electricity consumption, and 2.63% and 0.16% for gas consumption, respectively.

1. Introduction

Globally, buildings consume about 30% of end energy usage and over 55% of electricity [1]. Building energy consumption is increasing with the growth of the global population. It is affected by a large number of physical and sociological factors. Accurate energy prediction can help quantify and compare the energy-saving potentials of different conservation measures, as well as assist design optimization [2,3].
There are two approaches to predict building energy consumption. One is based on a physical model, and the other is data driven. The physical modeling approach is also called the forward modeling approach. The forward modeling approach is usually conducted with commercial software, e.g., DOE-2, DesignBuilder, etc., with given inputs to estimate the building energy consumption through simulation. The differences of the outcomes among different software are typically small with the same/identical input values of the variables [4]. Fumo et al. [5] used EnergyPlus Benchmark Models to generate the determining factors based on the monthly electrical and fuel utility bills to estimate the hourly electricity consumption and fuel energy consumption for a hypothetical building in Atlanta, GA, and in Meridian, MS, with estimated errors within 10%. Amiri et al. [6] developed a Stepwise Regression (SR) model, based on the simulation results from DOE-2, to predict the building energy consumption at the early design phase. The physical modeling approach requires detailed information about the building, mechanical systems, and occupants’ activities to develop a mathematical model to estimate the building energy consumption, which might not be readily available. Meanwhile, the physical model could not take into account the sociological factors that potentially affect the energy-usage patterns of the occupants.
The data-driven approach uses data analysis through known data sets to overcome the limitations of physical models to predict the energy consumption. Typically, an energy-usage database is created through the simulation of building samples or data collection. Examples of data-driven approaches include Multiple Linear Regression (MLR), Classification and Regression Tree (CART), artificial neural network (ANN), etc.
MLR models have been developed as substitutes for building simulation software. Chen et al. [7] developed a physical-based MLR model to predict the building cooling load based on a data set created through building energy simulation using EnergyPlus. It was demonstrated to have a stronger generalization ability than the BP-ANN and MLR models. By using this method, the space cooling load can be predicted based on the total cooling load. Ciulla et al. [8] used TRNSYS to run 1560 simulations of a non-residential building with different configurations across Italy to create an energy database and developed MLR models to estimate the building energy consumption, with determination coefficients (R²) higher than 0.9 and mean absolute errors (MAEs) lower than 10 kWh/(m²·year).
Stepwise Regression (SR) can help overcome the multicollinearity problem that can arise in multiple regression and reduce the number of input variables. Tso and Yau [9] applied SR analysis to the household electricity consumption in winter and summer in Hong Kong. Zhao and Lin [10] proposed SR models to predict the energy consumption and visual discomfort of a passive house and compared the predictions with the simulated outcomes from DesignBuilder. R-squared values of 0.9808 and 0.8487 were found, respectively, which demonstrates the potential of SR in predicting building energy consumption.
The Support Vector Machine (SVM) helps to solve high-dimensional difficulty and local minima problems. Ma et al. [11] applied support vector regression (SVR) models to estimate the provincial building energy consumption in four provinces in Southern China. Seven parameters, namely yearly mean outdoor dry-bulb air temperature, relative humidity, total solar radiation, urbanization ratio, gross domestic product, household consumption level, and total construction area, were used as inputs. Good agreement was found between the predicted and actual energy consumption, with mean square errors (MSEs) less than 0.001 and correlation coefficients greater than 0.99. Li et al. [12] developed an SVM model to estimate the hourly cooling load of an office, with outdoor air temperature, relative humidity, and solar radiation intensity as the input variables. The SVM model outperformed the Backpropagation Neural Network (BPNN) model in terms of accuracy and generalization. Paudel et al. [13] developed an SVM model for a low-energy residential building in France, using a small data set of representative days. The outdoor air temperature, horizontal solar radiation, solar gain transmitted through windows, solar energy absorbed by walls, occupancy profile, and time moving average of outdoor air temperature were used as input variables for the model. The model achieved a higher prediction accuracy (R² = 0.98; RMSE = 3.4) than the one developed with all the data sets (R² = 0.93; RMSE = 7.1).
BPNN is the most widely used neural network. Ahmad et al. [14] developed feed-forward BPNN and random forest (RF) models to estimate the energy demand of the HVAC system in a commercial building in Madrid, Spain. The input variables include outdoor air temperature, dew point temperature, relative humidity, wind speed, duration time, number of guests on the day, and number of rooms booked. The results show that the RMSEs of the prediction results of the BPNN and RF models were 4.97 and 6.10, respectively. The BPNN model achieves a slightly better performance than the RF model in terms of accuracy.
Radial Basis Function Neural Networks (RBFNs) have been used to predict the energy consumption of university buildings. Han et al. [15] proposed an RBFN model to evaluate the energy performance of buildings, using the University of California Irvine data sets. The predicted values agree well with the simulation outcome from Ecotech. Zhao et al. [16] developed an RBFN model to predict the energy consumption of college buildings in Fujian Province in China, with a maximum error of 13.3%.
Classification and Regression Tree (CART) is also one of the machine learning approaches favored by the researchers. Zekić-Sušac et al. [17] developed a CART model to predict the energy cost of public buildings in the Republic of Croatia. Capozzoli et al. [18] developed a CART model to predict the heating energy consumption in schools with an R-square of 0.86.
The Chi-Square Automatic Interaction Detector (CHAID) can be used to generate a multi-branched decision tree and determine the branch variables’ values based on statistical significance. Yang and Wu [19] applied CHAID to find the energy-saving strategies for central air-conditioning system operation in Shenzhen, China. Kusiak et al. [20] developed a CHAID model to predict the building steam load with a mean absolute error (MAE) of 405 for training and 578 for testing.
Exhaustive CHAID (ECHAID) is another decision tree algorithm that ensures the same degree of freedom for all the inputs. Kusiak et al. [20] compared the outcomes from ECHAID model with the CHAID model in predicting the building steam load. The ECHAID achieved a mean absolute error (MAE) of 398 for training and 570 for testing. Yan et al. [21] developed an ECHAID model to predict the system coefficient of performance (COP) of a ground-source heat pump with an MAE of 0.098 for training and 0.105 for testing.
Researchers have also investigated other data-driven approaches. For example, Li et al. [22] developed a hybrid teaching–learning artificial neural network model (TL-ANN) to predict the hourly electrical energy consumption of two educational buildings located in the USA and China, using weather conditions, calendar date, occupancy pattern, and historical energy usage data. Moayedi [23] compared the performances of three cooling load prediction models for a residential building, in which elephant herding optimization (EHO), ant colony optimization (ACO), and Harris Hawks optimization (HHO) were each combined with a multilayer perceptron neural network (MLP) model. The relative compactness of the building, surface area, wall area, roof area, overall height, orientation, glazing area, and glazing area distribution were used as inputs for the model. The results show that the EHO–MLP has the highest prediction accuracy, followed by HHO–MLP and ACO–MLP. Aruta et al. [24] developed an artificial neural network (ANN) model, using NARX (nonlinear autoregressive model with exogenous inputs) networks trained on the simulated heating load of a building in Rome from EnergyPlus. The outdoor air temperature and solar radiation were used as inputs, and the model demonstrated satisfactory prediction performance. Ndiaye and Gabriel [25] used the latent root regression technique to reduce the number of input variables from 59 to 9, while achieving an R-squared value of 0.79 in predicting the housing unit electricity consumption in Oshawa. Still, they examined only a few data-driven algorithms.
The literature survey shows that very few studies have predicted the yearly energy consumption of residential buildings based on actual energy consumption data. Many studies focus on monthly [26], daily [27,28,29], or hourly [13,27,28,29,30] energy consumption, based on the simulation outcomes from commercial software [26,31,32,33,34,35]. Short-term energy predictions are easily affected by seasonal variation, and the outcomes from simulation often deviate from actual energy consumption. In addition, most of the input parameters focus on weather data [26,27,28,29,31] or design parameters of the building envelope [26,31,33,34,35], while the effects of occupants' behaviors and the social and demographic information of the households are often neglected, which causes deviations in energy consumption predictions for different households. Moreover, many of the studies used a fixed number of input variables and a fixed training/validation ratio, without seeking the least number of inputs needed or the models with the best performance. Therefore, it is important to develop a residential building energy prediction model based on collected data on actual annual energy consumption, taking into account the social and demographic information, and to evaluate the impact of the number of input variables and of the training/validation ratio on the performance of the prediction model.
This paper attempts to develop yearly energy consumption prediction models for residential buildings in Oshawa. Data related to electricity consumption, gas consumption, physical information of the buildings, and social and demographic information of the residents were collected through smart metering, a phone survey, and energy auditing of a total of 83 households. A total of eight data-driven algorithms, namely Multiple Linear Regression (MLR), Stepwise Regression (SR), Support Vector Machine (SVM), Backpropagation Neural Network (BPNN), Radial Basis Function Neural Network (RBFN), Classification and Regression Tree (CART), Chi-Square Automatic Interaction Detector (CHAID), and Exhaustive CHAID (ECHAID), were used to develop energy prediction models to select the most suitable models for electricity consumption and gas consumption predictions. Different numbers of input variables and training/validation ratios were employed to find the models with the best prediction performance with the least number of inputs. The outcomes from this paper can provide references for residential-building energy prediction.

2. Method

The actual electricity and gas consumption data, physical properties, mechanical system information, and consumer information of 227 houses in Oshawa were collected and analyzed. Oshawa has a humid continental climate with large seasonal temperature variations, warm summers, and cold winters. The energy consumption data cover a full year. First, smart meters were installed in the 227 houses to obtain electricity readings, and a phone survey on the social and demographic information of the occupants, as well as on the electrical appliances, was conducted for the houses with installed smart meters. Energy audits were conducted subject to the willingness of the house owner/renter. A total of 65 input and output parameters were identified after an analysis of the gathered information. During data preprocessing, it was found that, because some house owners/renters were reluctant to disclose certain information or were unsure about it, 144 samples had missing data for annual electricity consumption and 154 samples had missing data for gas consumption. Therefore, the predictions of electricity consumption and gas consumption are based on 83 and 73 residential buildings, respectively. Three groups of input parameters were then selected based on variable importance (VI) through statistical analysis. Finally, eight data-driven modeling approaches were used to develop electricity and gas consumption prediction models based on the different groups of input parameters. The performances of the models were evaluated, and the best prediction models for electricity and gas consumption were identified. IBM SPSS Statistics 26.0 and Clementine 12.0 were used to implement the algorithms [36]. A flowchart of the research strategy is presented in Figure 1.

2.1. Independent and Dependent Variables

Table 1 lists the variable names and their value ranges, where the independent variables 1–29 and 30–63 and dependent variables 64–65 were collected through a phone survey, energy audit, and smart metering. The range of values is formed based on the outcomes from the collected data.

2.2. Prediction Model Development

The MLR, SR, SVM, BPNN, RBFN, CART, CHAID, and ECHAID were employed to develop electricity consumption and gas consumption prediction models.

2.2.1. Multiple Linear Regression

MLR has been widely used in building energy consumption prediction and can be used in the early design stage to improve the building performance [37] and hourly cooling load prediction [7]. In this paper, MLR is used to develop the relationship between the independent variables (variables 1–63), and dependent variables (variables 64 and 65). The MLR model can be presented as follows:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$
where $\beta_0$ denotes the regression constant; $\beta_1$, $\beta_2$, ..., $\beta_p$ denote the regression coefficients; $x_i$ refers to the input variables; $\varepsilon$ is the random error; and $p$ denotes the number of independent variables involved in the regression.
The regression coefficients are determined based on the least square method, which minimizes the residual sum of squares (RSS). The RSS is calculated by the following equation:
$$\mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_1 - \beta_2 x_2 - \cdots - \beta_p x_p \right)^2$$
where n is the number of samples.
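As an illustration of the least-squares fit described above, the following minimal sketch solves the normal equations for a small, made-up data set in plain Python. The function names `fit_mlr` and `rss` and the toy data are assumptions for illustration only; the models in this paper were built with SPSS and Clementine, not with this code.

```python
def fit_mlr(X, y):
    """Return [b0, b1, ..., bp] that minimize the residual sum of squares."""
    # Prepend a column of ones so that b0 acts as the regression constant.
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(A, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def rss(X, y, beta):
    """Residual sum of squares of the fitted linear model."""
    return sum(
        (yi - beta[0] - sum(b * x for b, x in zip(beta[1:], row))) ** 2
        for row, yi in zip(X, y)
    )


# Hypothetical toy data generated from y = 2 + 3*x1 - x2 (no noise).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]]
y = [2 + 3 * a - b for a, b in X]
beta = fit_mlr(X, y)
print(beta, rss(X, y, beta))
```

Because the toy data contain no noise, the recovered coefficients should match the generating values up to rounding error and the RSS should be close to zero; with real survey data the RSS is strictly positive.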

2.2.2. Stepwise Regression

The SR uses a step-by-step iterative approach to develop a regression model by selecting only the important independent variables. It is also widely used in building simulation [38]. In this paper, the 63 independent variables were introduced into the regression model one by one and sorted according to their importance. Each independent variable goes through an F-test and a t-test and remains in the model only if it is statistically significant.

2.2.3. Support Vector Machine

The SVM introduces the principle of structural risk minimization, which effectively addresses the high-dimensional difficulty and local minima problems. Gao [39] developed an SVM model to predict building energy consumption based on historical data, with good prediction performance. By learning the relationship between the input and output variables, the SVM predicts the output variable values of new samples that have the same distribution as the training sample set. A loss function is introduced to correct the distance to the decision boundary, so as to determine the regression function [40].

2.2.4. Backpropagation Neural Network

The BPNN is the most widely used neural network. As a multilayer feed-forward neural network, it is trained according to an error backpropagation algorithm [41]. BPNN can classify arbitrarily complex patterns and demonstrates excellent multidimensional function mapping ability. It includes an input layer, a hidden layer, and an output layer. The squared error of the network is minimized by using the gradient descent method.

2.2.5. Radial Basis Function Neural Network

RBFN utilizes radial basis functions (RBFs) as activation functions. The RBF network has a single hidden layer, which computes its output differently from a conventional feed-forward layer. The input layer receives the input data and feeds them into this hidden layer. The computations in the hidden layer are based on comparisons with prototype vectors from the training set: each neuron computes the similarity between the input vector and its prototype vector. RBFN has been shown to have a good prediction performance for the building cooling load [13].
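The similarity computation in the hidden layer can be sketched as follows. The Gaussian kernel, the shared `width` parameter, and the names `rbf_layer` and `rbfn_predict` are assumptions chosen for illustration; the text does not state which radial basis function or width the software used.

```python
import math


def rbf_layer(x, prototypes, width):
    """Gaussian similarity between input x and each prototype vector."""
    activations = []
    for p in prototypes:
        # Squared Euclidean distance between the input and the prototype.
        d2 = sum((a - b) ** 2 for a, b in zip(x, p))
        activations.append(math.exp(-d2 / (2.0 * width ** 2)))
    return activations


def rbfn_predict(x, prototypes, width, weights, bias):
    """Output neuron: weighted sum of the hidden-layer similarities."""
    h = rbf_layer(x, prototypes, width)
    return bias + sum(w * hi for w, hi in zip(weights, h))


# Hypothetical prototypes and weights, for illustration only.
prototypes = [[0.0, 0.0], [3.0, 4.0]]
hidden = rbf_layer([0.0, 0.0], prototypes, width=5.0)
output = rbfn_predict([0.0, 0.0], prototypes, 5.0, weights=[2.0, 1.0], bias=0.5)
print(hidden, output)
```

An input that coincides with a prototype yields an activation of 1, and the activation decays toward 0 as the input moves away from the prototype.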

2.2.6. CART

CART builds a binary decision tree by recursive binary splitting. For classification trees, the splits are chosen based on the Gini impurity index [42]; for regression trees, the squared-error minimization criterion is used for feature selection. CART has been shown to achieve good performance in heating energy prediction [18].
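A single step of the squared-error criterion for a regression tree can be sketched as below: for one input variable, every candidate threshold is tried and the one with the smallest total squared error of the two child nodes is kept. The helper names `sse` and `best_split` and the toy values are hypothetical, and a full CART implementation would repeat this search over all variables and recurse on each child.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, cost) of the binary split that minimizes SSE."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # identical x values cannot be separated
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best[1]:
            best = (threshold, cost)
    return best


# Made-up data with an obvious jump between x = 3 and x = 10.
x = [1, 2, 3, 10, 11, 12]
y = [5, 6, 5, 20, 21, 19]
threshold, cost = best_split(x, y)
print(threshold, cost)
```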

2.2.7. CHAID

CHAID, proposed by Kass et al. [43], is based on adjusted significance testing and can generate multi-branch decision trees. First, the F-test is carried out, and variables statistically similar to the target variable are combined; then the p-values for the remaining variables are calculated, and the best predictor (the variable with the lowest p-value) is selected as the first variable in the decision tree branches. The process repeats until the tree is fully grown. CHAID has been successfully used to predict the steam load [20].

2.2.8. Exhaustive CHAID

As an improved algorithm based on CHAID, ECHAID differs from CHAID in the merging step [44]. CHAID stops merging when all remaining categories are found to be statistically different, whereas ECHAID continues grouping until only two super categories are left. In this way, all input variables are ensured to have the same degree of freedom. ECHAID has been successfully employed to predict the performance of heat pumps [21].

2.3. Choice of Input Variables

In order to eliminate the variables that are unimportant to the prediction of building energy consumption, the variable importance (VI) is employed to assist in the selection of the input variables used to develop the prediction models; details of the calculation can be found in Ref. [45]. The ratios of samples for training and validation are set as 7:3, 8:2, and 9:1. The data are split randomly.
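A random split at the three ratios can be sketched as follows for the 83 electricity samples. The function name `split_samples`, the fixed seed, and the rounding of the training-set size are assumptions; the paper does not state how fractional sample counts were handled by the software.

```python
import random


def split_samples(samples, train_fraction, seed=0):
    """Shuffle a copy of the samples and cut it into training/validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    # Assumption: the training-set size is rounded to the nearest integer.
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


houses = list(range(83))  # placeholder IDs for the 83 houses
sizes = {}
for fraction in (0.7, 0.8, 0.9):
    train, valid = split_samples(houses, fraction)
    sizes[fraction] = (len(train), len(valid))
print(sizes)
```

With 83 samples, the 9:1 ratio leaves only a handful of houses for validation, which is worth keeping in mind when comparing validation errors across ratios.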

2.4. Prediction Model Evaluation

The prediction model performance is evaluated through maximum errors (MAXEs), mean absolute error (MAE), standard deviation (SD), correlation coefficient (R), and MAPE. The MAE, SD, and R can be calculated as shown below:
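$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$
$$\mathrm{SD} = \sqrt{\frac{\sum_{i=1}^{n} \left( \left| \hat{y}_i - y_i \right| - \mathrm{MAE} \right)^2}{n}}$$
$$R = \sqrt{1 - \frac{\sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}}$$
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \times 100\%$$
where $\hat{y}_i$ denotes the prediction value, $y_i$ denotes the targeted value, $\bar{y}$ denotes the average value of the targeted values, and $n$ is the number of samples.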
The prediction models were evaluated on the MAXE, MAE, SD, R, and MAPE under the different training-to-validation ratios (7:3, 8:2, and 9:1) to identify the best performance with the least amount of data for training.
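The evaluation metrics can be computed as in the sketch below. It assumes the conventional definitions: MAE as the mean absolute error, SD as the spread of the absolute errors around the MAE, MAPE as the mean absolute relative error in percent, and R as the square root of one minus the ratio of residual to total sum of squares. The function name `evaluate` and the three sample values are hypothetical and are not taken from the Oshawa data.

```python
import math


def evaluate(pred, actual):
    """Return MAXE, MAE, SD, R, and MAPE (in %) for paired predictions."""
    n = len(actual)
    abs_err = [abs(p - a) for p, a in zip(pred, actual)]
    mae = sum(abs_err) / n
    # Spread of the absolute errors around their mean (assumed definition).
    sd = math.sqrt(sum((e - mae) ** 2 for e in abs_err) / n)
    mape = 100.0 * sum(e / abs(a) for e, a in zip(abs_err, actual)) / n
    mean_actual = sum(actual) / n
    ss_res = sum((p - a) ** 2 for p, a in zip(pred, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # Clamp at zero so a very poor model does not produce a math error.
    r = math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))
    return {"MAXE": max(abs_err), "MAE": mae, "SD": sd, "R": r, "MAPE": mape}


# Made-up predictions and targets, for illustration only.
metrics = evaluate(pred=[110.0, 190.0, 300.0], actual=[100.0, 200.0, 300.0])
print(metrics)
```

Note that MAPE is undefined when a targeted value is zero, which is not a concern for annual household consumption but would matter for short-interval data.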

3. Results and Discussion

3.1. Results of Variable Selection

Depending on the variable importance (VI) of each variable, totals of 26, 12, and 6 variables were selected to develop the prediction models for electricity consumption (Table 2), and totals of 26, 13, and 6 variables were selected to develop the prediction models for natural gas consumption (Table 3).

3.2. Performance of Electricity Consumption Prediction Model

Analyses of the results of the prediction models for electricity consumption are presented in Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7 and Table A8 in Appendix A. The regressions between predicted and simulated electricity consumption for the best models of each data-driven approach are presented in Figure 2a–h.
The outcomes of the MLR models on the prediction of electricity consumption are listed in Appendix A Table A1. It can be found that when the number of variables is 6 and the ratio of training sample vs. validation samples is 9:1, the MLR model has the best performance, with MAPEs of 15.05% for training and 11.71% for validation, respectively. Figure 2a presents the regression between predicted and simulated electricity consumption for the best MLR model. The model predicts pretty well when the electricity consumption is less than 35,000 kWh (93% of all the samples), and it underpredicts the electricity consumption when it exceeds 35,000 kWh.
The outcomes of the SR models on the prediction of electricity consumption are listed in Appendix A Table A2; they are similar to those of the MLR models. When the number of variables is 6 and the ratio of training sample vs. validation samples is 9:1, the SR model has the best performance with MAPEs of 14.79% for training and 14.18% for validation, respectively. Figure 2b presents the regression between predicted and simulated electricity consumption for the best SR model. The model also predicts pretty well when the electricity consumption is less than 35,000 kWh, and it underpredicts the electricity consumption when it exceeds 35,000 kWh.
The outcomes of the SVM models on the prediction of electricity consumption are listed in Appendix A Table A3. It can be found that when the number of variables is 6 and ratio of training sample vs. validation samples is 7:3, the SVM model has the best performance, with MAPEs of 21.89% for training and 11.50% for validation, respectively. Figure 2c presents the regression between predicted and simulated electricity consumption for the best SVM model. The model predicts pretty well when the electricity consumption is around 10,000 kWh, and it underpredicts the electricity consumption when it exceeds 15,000 kWh.
The outcomes of the BPNN models on the prediction of electricity consumption are listed in Appendix A Table A4. It can be found that when the number of variables is 26 and the ratio of training sample vs. validation samples is 9:1, the BPNN model has the best performance, with MAPEs of 0.94% for training and 0.94% for validation, respectively. The number of inputs can be reduced to 12, with a correlation coefficient almost equal to 1.0 and MAPE less than 1.18%. Figure 2d presents the regression between predicted and simulated electricity consumption for the best BPNN model. Compared with the results from Ndiaye and Gabriel (2011), the R-square value is significantly improved from 0.79 to 0.9997. The model predicts pretty well for all the samples.
The outcomes of the RBFN models on the prediction of electricity consumption are listed in Appendix A Table A5. It can be found that when the number of variables is 6 and the ratio of training sample vs. validation samples is 8:2, the RBFN model has the best performance, with MAPEs of 8.82% for training and 5.62% for validation, respectively. Figure 2e presents the regression between predicted and simulated electricity consumption for the best RBFN model. The model predicts pretty well when the electricity consumption is less than 35,000 kWh, and it tends to underpredict the electricity consumption when it is in the range of 35,000–40,000 kWh.
The outcomes of the CART models on the prediction of electricity consumption are listed in Appendix A Table A6. It can be found that when the number of variables is 6 and the ratio of training sample vs. validation samples is 7:3, the CART model has the best performance, with MAPEs of 1.41% for training and 5.50% for validation, respectively. Figure 2f presents the regression between predicted and simulated electricity consumption for the best CART model. The model predicts pretty well for almost all the samples, with the exception that it underpredicts one sample with actual consumption at around 50,000 kWh.
The outcomes of the CHAID models on the prediction of electricity consumption are listed in Appendix A Table A7. It can be found that when the number of variables is 26 and the ratio of training sample vs. validation samples is 7:3, the CHAID model has the best performance, with MAPEs of 0.87% for training and 5.03% for validation, respectively. Figure 2g presents the regression between predicted and simulated electricity consumption for the best CHAID model. Similar to the CART model, it predicts pretty well for almost all the samples, with the exception that it underpredicts one sample with actual consumption at around 50,000 kWh.
The outcomes of the ECHAID models on the prediction of electricity consumption are listed in Appendix A Table A8. It can be found that when the number of variables is 26 and the ratio of training sample vs. validation samples is 7:3, the ECHAID model has the best performance, with MAPEs of 0.92% for training and 9.89% for validation, respectively. Figure 2h presents the regression between predicted and simulated electricity consumption for the best ECHAID model. It predicts well for almost all the samples, except that it overpredicts two samples with actual consumption at around 26,000 kWh and underpredicts one sample with actual consumption at around 50,000 kWh.
Table 4 presents the range of relative errors for the eight best prediction models for each data-driven approach. It can be found that the BPNN model has the best prediction performance, followed by the CHAID model, ECHAID model, CART model, and RBFN model. The performances of the SVM, SR, and MLR models are not as good as those of the other models.

3.3. Performance of Natural Gas Consumption Prediction Model

The outcomes of the natural gas consumption prediction models are listed in Table A9, Table A10, Table A11, Table A12, Table A13, Table A14, Table A15 and Table A16 in Appendix A. The regressions between predicted and simulated natural gas consumption for the best models of each data-driven approach are presented in Figure 3a–h.
The outcomes of the MLR models on the prediction of natural gas consumption are listed in Appendix A Table A9. It can be found that when the number of variables is 13 and the ratio of training sample vs. validation samples is 7:3, the MLR model has the best performance, with MAPEs of 13.98% for training and 24.67% for validation, respectively. Figure 3a presents the regression between predicted and simulated natural gas consumption for the best MLR model. Good agreements are found between the predicted and actual energy consumption.
The outcomes of the SR models on the prediction of natural gas consumption are listed in Appendix A Table A10. Similar to the MLR model, when the number of variables is 13 and the ratio of training sample vs. validation samples is 7:3, the SR model has the best performance, with MAPEs of 14.03% for training and 24.89% for validation, respectively. Figure 3b presents the regression between predicted and simulated natural gas consumption for the best SR model. Good agreements are found between the predicted and actual energy consumption.
The outcomes of the SVM models on the prediction of natural gas consumption are listed in Appendix A Table A11. It can be found that when the number of variables is 26 and the ratio of training sample vs. validation samples is 7:3, the SVM model has the best performance, with MAPEs of 59.47% for training and 53.23% for validation, respectively. Figure 3c presents the regression between predicted and simulated natural gas consumption for the best SVM model. Large deviations between the predicted value and actual energy consumption are found.
The outcomes of the BPNN models on the prediction of natural gas consumption are listed in Appendix A Table A12. It can be found that when the number of variables is 26 and the ratio of training sample vs. validation samples is 9:1, the BPNN model has the best performance, with MAPEs of 2.63% for training and 0.16% for validation, respectively. The number of inputs can be reduced to 13, with a correlation coefficient higher than 0.979 and MAPEs less than 7.03%. When the number of inputs is reduced to 6, the correlation coefficient is still higher than 0.927, with MAPEs less than 11.63%. Figure 3d presents the regression between predicted and simulated natural gas consumption for the best BPNN model. The model predicts pretty well for almost all the samples.
The outcomes of the RBFN models for the prediction of natural gas consumption are listed in Appendix A, Table A13. The RBFN model performs best when the number of variables is 6 and the ratio of training to validation samples is 8:2, with MAPEs of 12.85% for training and 7.57% for validation. Figure 3e presents the regression between predicted and simulated natural gas consumption for the best RBFN model. The model predicts well for all the samples except one, with a natural gas consumption of 5049 m³, which it underpredicts.
The outcomes of the CART models for the prediction of natural gas consumption are listed in Appendix A, Table A14. The CART model performs best when the number of variables is 13 and the ratio of training to validation samples is 7:3, with MAPEs of 5.08% for training and 31.56% for validation. Figure 3f presents the regression between predicted and simulated natural gas consumption for the best CART model. The model predicts generally well for most of the samples, with large deviations for only a few samples.
The outcomes of the CHAID models for the prediction of natural gas consumption are listed in Appendix A, Table A15. The CHAID model performs best when the number of variables is 6 and the ratio of training to validation samples is 7:3, with MAPEs of 18.74% for training and 24.72% for validation. Figure 3g presents the regression between predicted and simulated natural gas consumption for the best CHAID model. The model predicts generally well for some of the samples; for others, however, the natural gas consumption is predicted to be about 3600 m³ regardless of the actual consumption.
The outcomes of the ECHAID models for the prediction of natural gas consumption are listed in Appendix A, Table A16. As with the CHAID model, the ECHAID model performs best when the number of variables is 6 and the ratio of training to validation samples is 7:3, with MAPEs of 18.74% for training and 24.72% for validation. Figure 3h presents the regression between predicted and simulated natural gas consumption for the best ECHAID model, which is similar to that of the CHAID model.
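As a minimal sketch of how the statistics reported for each model (maximum error, MAE, SD, R, and MAPE) could be computed for one training/validation split, the Python snippet below uses NumPy. The function names `prediction_metrics` and `split_indices`, the toy consumption values, the rounding used for the split, and the reading of SD as the sample standard deviation of the prediction errors are assumptions made for illustration only; they are not taken from the study.

```python
import numpy as np


def prediction_metrics(actual, predicted):
    """Compute max error, MAE, SD, R and MAPE for one data set."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - actual
    return {
        "max_error": float(np.max(np.abs(errors))),
        "mae": float(np.mean(np.abs(errors))),
        # Assumption: SD is the sample standard deviation of the errors.
        "sd": float(np.std(errors, ddof=1)),
        # Pearson correlation between actual and predicted values.
        "r": float(np.corrcoef(actual, predicted)[0, 1]),
        # Mean absolute percentage error, in percent.
        "mape": float(np.mean(np.abs(errors) / np.abs(actual)) * 100.0),
    }


def split_indices(n_samples, train_fraction, seed=0):
    """Randomly split sample indices into training and validation parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(n_samples * train_fraction))
    return order[:n_train], order[n_train:]


# Toy values (hypothetical, not data from the study).
actual = np.array([2000.0, 2500.0, 4000.0, 5000.0])
predicted = np.array([2200.0, 2250.0, 4000.0, 4500.0])
metrics = prediction_metrics(actual, predicted)

# A 7:3 split of 83 houses under the rounding assumed above.
train_idx, valid_idx = split_indices(83, 0.7)
print(metrics)
print(len(train_idx), len(valid_idx))
```

Under this rounding, a 7:3 split of 83 houses leaves 25 houses for validation, so each validation statistic rests on a small sample.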
Table 5 presents the ranges of relative errors for the eight best prediction models. It can be found that the BPNN model has the best prediction performance, followed by the CART model and RBFN model. The performance of other models is much poorer, with the SVM model being the worst case.
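The best-case MAPEs quoted above for the eight natural gas models can be tabulated to compare the validation error with the gap between training and validation error. The sketch below only rearranges numbers stated in the preceding paragraphs; the variable names are illustrative, and sorting by validation MAPE alone is an assumption that need not reproduce the relative-error criterion behind Table 5.

```python
# (training MAPE %, validation MAPE %) of the best model of each type,
# as quoted in the text for natural gas consumption.
best_gas_models = {
    "MLR": (13.98, 24.67),
    "SR": (14.03, 24.89),
    "SVM": (59.47, 53.23),
    "BPNN": (2.63, 0.16),
    "RBFN": (12.85, 7.57),
    "CART": (5.08, 31.56),
    "CHAID": (18.74, 24.72),
    "ECHAID": (18.74, 24.72),
}

# Sort by validation MAPE (lower is better).
ranked = sorted(best_gas_models.items(), key=lambda item: item[1][1])

# Validation minus training MAPE: a large positive gap hints at overfitting.
gaps = {
    name: round(valid - train, 2)
    for name, (train, valid) in best_gas_models.items()
}

for name, (train, valid) in ranked:
    print(f"{name:7s} train={train:6.2f}%  valid={valid:6.2f}%  gap={gaps[name]:+6.2f}")
```

On these figures the BPNN and SVM models sit at the two ends of the ordering, in line with the text, while the CART model shows the largest gap between training and validation error.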

4. Conclusions and Limitations

In this paper, eight data-driven methods were employed to develop energy prediction models for residential buildings in Oshawa with different numbers of input variables and training to validation ratios. The following conclusions can be made:
(1) The performance of the prediction model can be improved through careful selection of the variables, based on variable importance (VI), and of the training-to-validation ratio. Because only a small number of input variables is then needed, this also helps reduce the data collection effort.
(2) With 26 input variables, the BPNN models have the best performance in predicting both the electricity consumption and gas consumption because their maximum error, mean absolute error, standard deviation, and MAPE are smaller than those of other models, and their correlation coefficient is larger than that of other models.
(3) The MLR model has the worst performance in predicting the electricity consumption, and the SVM model has the worst performance in natural gas consumption prediction.
(4) The number of inputs can be reduced to 12 in the BPNN model to predict the electricity consumption, with a correlation coefficient almost equal to 1.0 and MAPE ≤ 1.18%. By using the CART model, the number of inputs can be further reduced to 6, with a correlation coefficient ≥0.95 and MAPE ≤ 5.50%.
(5) The number of inputs can be reduced to 13 in the BPNN model for natural gas consumption prediction with a correlation coefficient ≥0.979 and MAPE ≤ 7.03%. When it is further reduced to 6, the correlation coefficient of the BPNN model is still ≥0.927, with the MAPE ≤ 11.63%.
(6) Based on the performance of the prediction models, when human factors, e.g., SpenLess (awareness of the importance of spending less on energy bills), FromHome (number of people working or staying at home), and HomState (housing situation), are introduced, the performance of the prediction model can be improved. Such variables are often very difficult to incorporate into the physical models used in traditional methods.
The limitations of the prediction models are as follows:
(1) They can only be applied to residential buildings (houses) in Oshawa and cannot be applied to commercial buildings.
(2) More data collection is needed, including weather data, to develop prediction models that are applicable throughout Canada.

Author Contributions

Y.L. and K.G. contributed to the conception of the study and the development of the methodology; Y.L., J.L., K.G., W.Y. and C.-Q.L. wrote the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Natural Science Foundation of Hubei Province, grant number 2017CFB602.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to their containing information that could compromise the privacy of the research participants.

Acknowledgments

The authors acknowledge the support from Natural Resources Canada, Oshawa Public Utility Corporation, Ontario Center for Excellence, and the Faculty of Engineering and Applied Science of University of Ontario Institute of Technology.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

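Table A1. Analysis of the results of the MLR model for electricity consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 9217 | 2759 | 3657 | 0.94 | 20.8% |
| 26 | 7:3 | Validation | 18,347 | 4300 | 5789 | 0.79 | 36.8% |
| 26 | 8:2 | Training | 10,174 | 2751 | 3691 | 0.93 | 20.5% |
| 26 | 8:2 | Validation | 17,416 | 3841 | 5310 | 0.85 | 32.1% |
| 26 | 9:1 | Training | 10,686 | 2568 | 3597 | 0.93 | 19.1% |
| 26 | 9:1 | Validation | 15,901 | 3571 | 5221 | 0.91 | 26.8% |
| 12 | 7:3 | Training | 13,489 | 3040 | 4542 | 0.90 | 20.0% |
| 12 | 7:3 | Validation | 13,655 | 2242 | 3501 | 0.95 | 18.5% |
| 12 | 8:2 | Training | 13,496 | 2905 | 4428 | 0.90 | 19.2% |
| 12 | 8:2 | Validation | 13,830 | 2444 | 3733 | 0.95 | 19.8% |
| 12 | 9:1 | Training | 14,043 | 2712 | 4205 | 0.90 | 18.4% |
| 12 | 9:1 | Validation | 13,415 | 2748 | 4244 | 0.96 | 20.5% |
| 6 | 7:3 | Training | 14,332 | 2864 | 4892 | 0.88 | 16.1% |
| 6 | 7:3 | Validation | 18,652 | 2207 | 4268 | 0.92 | 16.6% |
| 6 | 8:2 | Training | 14,339 | 2780 | 4795 | 0.88 | 15.8% |
| 6 | 8:2 | Validation | 19,260 | 2215 | 4560 | 0.93 | 15.5% |
| 6 | 9:1 | Training | 14,231 | 2584 | 4563 | 0.89 | 15.0% |
| 6 | 9:1 | Validation | 18,420 | 2179 | 4971 | 0.95 | 11.7% |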
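Table A2. Analysis of the results of the SR model for electricity consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 12,178 | 3189 | 4520 | 0.90 | 21.9% |
| 26 | 7:3 | Validation | 17,646 | 2683 | 4364 | 0.91 | 21.9% |
| 26 | 8:2 | Training | 12,116 | 3080 | 4428 | 0.90 | 21.3% |
| 26 | 8:2 | Validation | 17,879 | 2728 | 4593 | 0.91 | 21.4% |
| 26 | 9:1 | Training | 12,450 | 2840 | 4209 | 0.90 | 19.8% |
| 26 | 9:1 | Validation | 17,765 | 3196 | 5387 | 0.92 | 22.9% |
| 12 | 7:3 | Training | 13,208 | 3228 | 4722 | 0.89 | 21.2% |
| 12 | 7:3 | Validation | 17,633 | 2751 | 4371 | 0.91 | 22.5% |
| 12 | 8:2 | Training | 13,126 | 3109 | 4621 | 0.89 | 20.6% |
| 12 | 8:2 | Validation | 17,894 | 2811 | 4616 | 0.91 | 22.1% |
| 12 | 9:1 | Training | 13,636 | 2880 | 4402 | 0.89 | 19.2% |
| 12 | 9:1 | Validation | 17,612 | 3151 | 5314 | 0.94 | 22.3% |
| 6 | 7:3 | Training | 15,664 | 2898 | 5033 | 0.87 | 16.4% |
| 6 | 7:3 | Validation | 21,563 | 2565 | 4918 | 0.90 | 18.2% |
| 6 | 8:2 | Training | 15,638 | 2814 | 4916 | 0.88 | 16.2% |
| 6 | 8:2 | Validation | 21,503 | 2681 | 5174 | 0.91 | 18.2% |
| 6 | 9:1 | Training | 15,443 | 2583 | 4694 | 0.88 | 14.8% |
| 6 | 9:1 | Validation | 21,016 | 2688 | 5740 | 0.95 | 14.2% |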
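Table A3. Analysis of the results of the SVM model for electricity consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 37,611 | 6290 | 10,341 | 0.81 | 21.9% |
| 26 | 7:3 | Validation | 41,168 | 3595 | 9408 | 0.85 | 11.5% |
| 26 | 8:2 | Training | 37,611 | 5980 | 10,166 | 0.81 | 20.9% |
| 26 | 8:2 | Validation | 41,171 | 4051 | 9934 | 0.83 | 12.9% |
| 26 | 9:1 | Training | 37,612 | 5521 | 9791 | 0.82 | 19.5% |
| 26 | 9:1 | Validation | 41,171 | 5096 | 11,658 | 0.86 | 15.0% |
| 12 | 7:3 | Training | 37,567 | 6278 | 10,325 | 0.84 | 21.9% |
| 12 | 7:3 | Validation | 41,129 | 3588 | 9396 | 0.86 | 11.5% |
| 12 | 8:2 | Training | 37,564 | 5969 | 10,150 | 0.84 | 20.9% |
| 12 | 8:2 | Validation | 41,127 | 4043 | 9920 | 0.86 | 12.9% |
| 12 | 9:1 | Training | 37,567 | 5511 | 9775 | 0.85 | 19.4% |
| 12 | 9:1 | Validation | 41,130 | 5086 | 11,643 | 0.89 | 14.9% |
| 6 | 7:3 | Training | 37,514 | 6268 | 10,311 | 0.86 | 21.9% |
| 6 | 7:3 | Validation | 41,063 | 3582 | 9380 | 0.92 | 11.5% |
| 6 | 8:2 | Training | 37,519 | 5960 | 10,137 | 0.86 | 20.8% |
| 6 | 8:2 | Validation | 41,068 | 4036 | 9904 | 0.92 | 12.8% |
| 6 | 9:1 | Training | 37,515 | 5502 | 9761 | 0.87 | 19.4% |
| 6 | 9:1 | Validation | 41,064 | 5078 | 11,624 | 0.93 | 14.9% |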
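Table A4. Analysis of the results of the BPNN model for electricity consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 16,131 | 2806 | 4381 | 0.91 | 16.5% |
| 26 | 7:3 | Validation | 13,618 | 2024 | 3386 | 0.95 | 14.4% |
| 26 | 8:2 | Training | 2554 | 422 | 833 | 1.00 | 1.9% |
| 26 | 8:2 | Validation | 1562 | 374 | 11 | 1.00 | 1.5% |
| 26 | 9:1 | Training | 345 | 87 | 171 | 1.00 | 0.9% |
| 26 | 9:1 | Validation | 435 | 110 | 155 | 1.00 | 0.9% |
| 12 | 7:3 | Training | 7112 | 376 | 1002 | 1.00 | 1.8% |
| 12 | 7:3 | Validation | 2735 | 300 | 549 | 1.00 | 1.9% |
| 12 | 8:2 | Training | 4586 | 743 | 1329 | 0.99 | 3.5% |
| 12 | 8:2 | Validation | 1803 | 427 | 566 | 1.00 | 2.7% |
| 12 | 9:1 | Training | 564 | 81 | 133 | 1.00 | 0.8% |
| 12 | 9:1 | Validation | 236 | 136 | 188 | 1.00 | 1.1% |
| 6 | 7:3 | Training | 11,857 | 872 | 2110 | 0.98 | 3.9% |
| 6 | 7:3 | Validation | 2443 | 364 | 800 | 1.00 | 2.3% |
| 6 | 8:2 | Training | 13,089 | 1697 | 3586 | 0.94 | 7.7% |
| 6 | 8:2 | Validation | 3652 | 345 | 865 | 1.00 | 1.7% |
| 6 | 9:1 | Training | 17,032 | 2187 | 4537 | 0.89 | 10.3% |
| 6 | 9:1 | Validation | 20,134 | 1723 | 5297 | 0.94 | 6.5% |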
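Table A5. Analysis of the results of the RBFN model for electricity consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 19,346 | 4214 | 5336 | 0.86 | 28.2% |
| 26 | 7:3 | Validation | 6519 | 2216 | 2641 | 0.96 | 20.1% |
| 26 | 8:2 | Training | 14,505 | 2846 | 4444 | 0.90 | 16.8% |
| 26 | 8:2 | Validation | 15,093 | 2274 | 4082 | 0.91 | 13.7% |
| 26 | 9:1 | Training | 13,076 | 2774 | 4252 | 0.90 | 19.1% |
| 26 | 9:1 | Validation | 8920 | 1942 | 2715 | 0.99 | 12.9% |
| 12 | 7:3 | Training | 15,797 | 2482 | 4227 | 0.91 | 14.3% |
| 12 | 7:3 | Validation | 3274 | 1135 | 1440 | 0.99 | 9.5% |
| 12 | 8:2 | Training | 17,058 | 3167 | 4966 | 0.87 | 19.5% |
| 12 | 8:2 | Validation | 7338 | 1788 | 2498 | 0.98 | 15.1% |
| 12 | 9:1 | Training | 15,795 | 2094 | 3855 | 0.92 | 12.2% |
| 12 | 9:1 | Validation | 2710 | 1154 | 1459 | 0.99 | 8.8% |
| 6 | 7:3 | Training | 15,105 | 2100 | 3925 | 0.93 | 10.5% |
| 6 | 7:3 | Validation | 2989 | 902 | 1268 | 0.99 | 7.8% |
| 6 | 8:2 | Training | 14,315 | 1878 | 3708 | 0.93 | 8.8% |
| 6 | 8:2 | Validation | 3392 | 764 | 1095 | 1.00 | 5.6% |
| 6 | 9:1 | Training | 13,931 | 1428 | 2855 | 0.96 | 8.6% |
| 6 | 9:1 | Validation | 895 | 628 | 1142 | 1.00 | 6.0% |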
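Table A6. Analysis of the results of the CART model for electricity consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 5224 | 460 | 1207 | 0.99 | 2.2% |
| 26 | 7:3 | Validation | 10,680 | 1420 | 3846 | 0.92 | 5.5% |
| 26 | 8:2 | Training | 5224 | 444 | 1176 | 0.99 | 2.1% |
| 26 | 8:2 | Validation | 10,680 | 1586 | 4097 | 0.92 | 5.9% |
| 26 | 9:1 | Training | 5224 | 618 | 2086 | 0.98 | 2.9% |
| 26 | 9:1 | Validation | 10,680 | 1408 | 3319 | 0.97 | 4.8% |
| 12 | 7:3 | Training | 3850 | 275 | 717 | 1.00 | 1.2% |
| 12 | 7:3 | Validation | 18,575 | 1965 | 5466 | 0.83 | 7.0% |
| 12 | 8:2 | Training | 3850 | 268 | 700 | 1.00 | 1.2% |
| 12 | 8:2 | Validation | 18,575 | 2203 | 5825 | 0.83 | 7.7% |
| 12 | 9:1 | Training | 3850 | 462 | 1888 | 0.98 | 2.1% |
| 12 | 9:1 | Validation | 18,575 | 2354 | 6174 | 0.85 | 7.4% |
| 6 | 7:3 | Training | 3745 | 338 | 881 | 1.00 | 1.4% |
| 6 | 7:3 | Validation | 29,551 | 1790 | 5937 | 0.87 | 5.5% |
| 6 | 8:2 | Training | 3745 | 327 | 859 | 1.00 | 1.4% |
| 6 | 8:2 | Validation | 29,551 | 2006 | 6298 | 0.87 | 6.1% |
| 6 | 9:1 | Training | 5915 | 387 | 1079 | 0.99 | 1.7% |
| 6 | 9:1 | Validation | 29,551 | 2629 | 7685 | 0.85 | 7.1% |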
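Table A7. Analysis of the results of the CHAID model for electricity consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 3175 | 167 | 547 | 1.00 | 0.9% |
| 26 | 7:3 | Validation | 29,983 | 1684 | 6132 | 0.76 | 5.0% |
| 26 | 8:2 | Training | 3175 | 169 | 534 | 1.00 | 0.9% |
| 26 | 8:2 | Validation | 29,346 | 1846 | 6396 | 0.77 | 5.3% |
| 26 | 9:1 | Training | 10,496 | 833 | 2403 | 0.97 | 0.9% |
| 26 | 9:1 | Validation | 29,346 | 2831 | 8204 | 0.71 | 5.3% |
| 12 | 7:3 | Training | 18,988 | 2538 | 5279 | 0.86 | 10.2% |
| 12 | 7:3 | Validation | 22,535 | 1191 | 4515 | 0.89 | 3.7% |
| 12 | 8:2 | Training | 18,988 | 2415 | 5143 | 0.86 | 9.8% |
| 12 | 8:2 | Validation | 22,535 | 1329 | 4807 | 0.89 | 3.9% |
| 12 | 9:1 | Training | 19,124 | 2166 | 4837 | 0.87 | 8.7% |
| 12 | 9:1 | Validation | 22,671 | 1798 | 5932 | 0.89 | 4.7% |
| 6 | 7:3 | Training | 18,988 | 2547 | 5279 | 0.86 | 10.3% |
| 6 | 7:3 | Validation | 22,535 | 1193 | 4515 | 0.89 | 3.7% |
| 6 | 8:2 | Training | 18,988 | 2420 | 5143 | 0.86 | 9.8% |
| 6 | 8:2 | Validation | 22,535 | 1332 | 4808 | 0.89 | 4.0% |
| 6 | 9:1 | Training | 19,124 | 2168 | 4837 | 0.87 | 8.8% |
| 6 | 9:1 | Validation | 22,671 | 1801 | 5932 | 0.89 | 4.76% |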
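Table A8. Analysis of the results of the ECHAID model for electricity consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 3175 | 171 | 547 | 1.00 | 0.9% |
| 26 | 7:3 | Validation | 29,983 | 2928 | 8492 | 0.65 | 9.9% |
| 26 | 8:2 | Training | 3175 | 144 | 530 | 1.00 | 0.7% |
| 26 | 8:2 | Validation | 29,346 | 3272 | 8953 | 0.65 | 11.0% |
| 26 | 9:1 | Training | 18,441 | 1987 | 4555 | 0.89 | 7.5% |
| 26 | 9:1 | Validation | 21,988 | 1858 | 5803 | 0.89 | 5.2% |
| 12 | 7:3 | Training | 18,259 | 2338 | 4962 | 0.88 | 9.0% |
| 12 | 7:3 | Validation | 21,806 | 1246 | 4432 | 0.89 | 3.9% |
| 12 | 8:2 | Training | 18,259 | 2216 | 4834 | 0.88 | 8.5% |
| 12 | 8:2 | Validation | 21,806 | 1382 | 4720 | 0.89 | 4.1% |
| 12 | 9:1 | Training | 18,441 | 2006 | 4555 | 0.89 | 7.7% |
| 12 | 9:1 | Validation | 21,988 | 1841 | 5808 | 0.89 | 5.0% |
| 6 | 7:3 | Training | 18,259 | 2343 | 4962 | 0.88 | 9.1% |
| 6 | 7:3 | Validation | 21,806 | 1249 | 4432 | 0.89 | 3.9% |
| 6 | 8:2 | Training | 18,259 | 2221 | 4834 | 0.88 | 8.6% |
| 6 | 8:2 | Validation | 21,806 | 1377 | 4721 | 0.89 | 4.1% |
| 6 | 9:1 | Training | 18,441 | 2010 | 4555 | 0.89 | 7.8% |
| 6 | 9:1 | Validation | 21,988 | 1846 | 5808 | 0.89 | 5.0% |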
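Table A9. Analysis of the results of the MLR model for natural gas consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 768 | 271 | 340 | 0.96 | 11.9% |
| 26 | 7:3 | Validation | 1452 | 603 | 835 | 0.77 | 32.6% |
| 26 | 8:2 | Training | 771 | 261 | 334 | 0.96 | 11.4% |
| 26 | 8:2 | Validation | 1460 | 662 | 876 | 0.76 | 35.8% |
| 26 | 9:1 | Training | 763 | 277 | 343 | 0.96 | 12.0% |
| 26 | 9:1 | Validation | 2172 | 734 | 1052 | 0.69 | 43.0% |
| 13 | 7:3 | Training | 964 | 326 | 409 | 0.94 | 14.0% |
| 13 | 7:3 | Validation | 1381 | 526 | 649 | 0.86 | 24.7% |
| 13 | 8:2 | Training | 969 | 316 | 402 | 0.94 | 13.5% |
| 13 | 8:2 | Validation | 1394 | 577 | 684 | 0.86 | 27.1% |
| 13 | 9:1 | Training | 902 | 315 | 402 | 0.94 | 13.2% |
| 13 | 9:1 | Validation | 1831 | 666 | 855 | 0.82 | 33.5% |
| 6 | 7:3 | Training | 2892 | 512 | 729 | 0.79 | 22.3% |
| 6 | 7:3 | Validation | 2494 | 469 | 757 | 0.81 | 21.0% |
| 6 | 8:2 | Training | 2882 | 506 | 720 | 0.78 | 21.9% |
| 6 | 8:2 | Validation | 2515 | 480 | 785 | 0.82 | 21.5% |
| 6 | 9:1 | Training | 2878 | 472 | 686 | 0.81 | 20.1% |
| 6 | 9:1 | Validation | 1523 | 458 | 714 | 0.88 | 26.2% |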
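Table A10. Analysis of the results of the SR model for natural gas consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 1152 | 317 | 403 | 0.94 | 13.2% |
| 26 | 7:3 | Validation | 1723 | 625 | 806 | 0.78 | 30.7% |
| 26 | 8:2 | Training | 1153 | 305 | 395 | 0.94 | 12.7% |
| 26 | 8:2 | Validation | 1723 | 693 | 850 | 0.78 | 34.1% |
| 26 | 9:1 | Training | 989 | 327 | 415 | 0.93 | 13.5% |
| 26 | 9:1 | Validation | 1850 | 700 | 870 | 0.81 | 33.6% |
| 13 | 7:3 | Training | 1091 | 331 | 426 | 0.93 | 14.0% |
| 13 | 7:3 | Validation | 1790 | 554 | 696 | 0.84 | 24.9% |
| 13 | 8:2 | Training | 1085 | 321 | 417 | 0.93 | 13.6% |
| 13 | 8:2 | Validation | 1797 | 608 | 733 | 0.83 | 27.4% |
| 13 | 9:1 | Training | 989 | 327 | 415 | 0.93 | 13.5% |
| 13 | 9:1 | Validation | 1850 | 700 | 870 | 0.81 | 33.6% |
| 6 | 7:3 | Training | 2568 | 564 | 755 | 0.77 | 28.4% |
| 6 | 7:3 | Validation | 2482 | 585 | 866 | 0.73 | 28.0% |
| 6 | 8:2 | Training | 2559 | 559 | 744 | 0.77 | 27.9% |
| 6 | 8:2 | Validation | 2503 | 612 | 893 | 0.74 | 29.4% |
| 6 | 9:1 | Training | 2982 | 485 | 694 | 0.80 | 19.5% |
| 6 | 9:1 | Validation | 2493 | 533 | 908 | 0.79 | 27.2% |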
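Table A11. Analysis of the results of the SVM model for natural gas consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 2313 | 940 | 1164 | 0.75 | 59.5% |
| 26 | 7:3 | Validation | 2991 | 958 | 1262 | 0.74 | 53.2% |
| 26 | 8:2 | Training | 2312 | 928 | 1148 | 0.75 | 58.3% |
| 26 | 8:2 | Validation | 2990 | 993 | 1311 | 0.78 | 56.0% |
| 26 | 9:1 | Training | 2192 | 926 | 1142 | 0.77 | 57.1% |
| 26 | 9:1 | Validation | 2872 | 1019 | 1427 | 0.74 | 73.4% |
| 13 | 7:3 | Training | 2319 | 945 | 1168 | 0.80 | 59.9% |
| 13 | 7:3 | Validation | 2989 | 958 | 1265 | 0.78 | 53.4% |
| 13 | 8:2 | Training | 2317 | 933 | 1152 | 0.78 | 58.6% |
| 13 | 8:2 | Validation | 2986 | 993 | 1313 | 0.78 | 56.2% |
| 13 | 9:1 | Training | 2206 | 930 | 1146 | 0.77 | 57.3% |
| 13 | 9:1 | Validation | 2875 | 1019 | 1428 | 0.81 | 73.5% |
| 6 | 7:3 | Training | 2325 | 947 | 1170 | 0.69 | 59.8% |
| 6 | 7:3 | Validation | 3000 | 959 | 1266 | 0.71 | 53.3% |
| 6 | 8:2 | Training | 2325 | 935 | 1154 | 0.69 | 58.6% |
| 6 | 8:2 | Validation | 3000 | 994 | 1314 | 0.74 | 56.1% |
| 6 | 9:1 | Training | 2215 | 933 | 1148 | 0.70 | 57.3% |
| 6 | 9:1 | Validation | 2887 | 1017 | 1430 | 0.83 | 73.3% |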
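Table A12. Analysis of the results of the BPNN model for natural gas consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 1334 | 263 | 309 | 0.97 | 11.0% |
| 26 | 7:3 | Validation | 551 | 272 | 322 | 0.97 | 13.2% |
| 26 | 8:2 | Training | 1467 | 145 | 276 | 0.97 | 6.3% |
| 26 | 8:2 | Validation | 272 | 102 | 125 | 1.00 | 5.2% |
| 26 | 9:1 | Training | 663 | 55 | 226 | 0.98 | 2.6% |
| 26 | 9:1 | Validation | 13 | 2 | 5 | 1.00 | 0.2% |
| 13 | 7:3 | Training | 262 | 118 | 148 | 0.99 | 5.1% |
| 13 | 7:3 | Validation | 487 | 173 | 233 | 0.98 | 6.0% |
| 13 | 8:2 | Training | 809 | 191 | 259 | 0.98 | 8.2% |
| 13 | 8:2 | Validation | 186 | 138 | 168 | 0.99 | 6.3% |
| 13 | 9:1 | Training | 848 | 192 | 243 | 0.98 | 7.0% |
| 13 | 9:1 | Validation | 533 | 239 | 286 | 0.98 | 9.2% |
| 6 | 7:3 | Training | 1463 | 374 | 460 | 0.92 | 11.9% |
| 6 | 7:3 | Validation | 1033 | 373 | 545 | 0.91 | 11.3% |
| 6 | 8:2 | Training | 2650 | 427 | 617 | 0.85 | 14.3% |
| 6 | 8:2 | Validation | 975 | 342 | 435 | 0.96 | 14.2% |
| 6 | 9:1 | Training | 913 | 338 | 435 | 0.93 | 11.6% |
| 6 | 9:1 | Validation | 512 | 282 | 376 | 0.97 | 10.3% |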
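Table A13. Analysis of the results of the RBFN model for natural gas consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 1320 | 470 | 607 | 0.86 | 18.7% |
| 26 | 7:3 | Validation | 973 | 458 | 587 | 0.89 | 23.2% |
| 26 | 8:2 | Training | 2848 | 525 | 717 | 0.79 | 21.2% |
| 26 | 8:2 | Validation | 1031 | 470 | 596 | 0.89 | 20.0% |
| 26 | 9:1 | Training | 2896 | 476 | 691 | 0.80 | 16.6% |
| 26 | 9:1 | Validation | 804 | 477 | 618 | 0.90 | 21.4% |
| 13 | 7:3 | Training | 1171 | 381 | 469 | 0.92 | 16.5% |
| 13 | 7:3 | Validation | 789 | 319 | 407 | 0.95 | 15.6% |
| 13 | 8:2 | Training | 1424 | 441 | 539 | 0.89 | 18.2% |
| 13 | 8:2 | Validation | 706 | 346 | 420 | 0.95 | 17.0% |
| 13 | 9:1 | Training | 1816 | 447 | 562 | 0.87 | 17.4% |
| 13 | 9:1 | Validation | 666 | 419 | 496 | 0.94 | 20.4% |
| 6 | 7:3 | Training | 2928 | 633 | 740 | 0.81 | 26.3% |
| 6 | 7:3 | Validation | 1008 | 500 | 712 | 0.84 | 27.7% |
| 6 | 8:2 | Training | 4432 | 394 | 744 | 0.79 | 12.9% |
| 6 | 8:2 | Validation | 695 | 230 | 324 | 0.97 | 7.6% |
| 6 | 9:1 | Training | 4555 | 395 | 731 | 0.79 | 13.2% |
| 6 | 9:1 | Validation | 461 | 222 | 246 | 0.99 | 9.3% |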
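Table A14. Analysis of the results of the CART model for natural gas consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 634 | 133 | 209 | 0.98 | 5.0% |
| 26 | 7:3 | Validation | 1840 | 689 | 994 | 0.64 | 34.3% |
| 26 | 8:2 | Training | 660 | 164 | 247 | 0.98 | 5.7% |
| 26 | 8:2 | Validation | 2569 | 817 | 1155 | 0.55 | 39.2% |
| 26 | 9:1 | Training | 834 | 154 | 252 | 0.98 | 5.4% |
| 26 | 9:1 | Validation | 2440 | 723 | 1135 | 0.61 | 43.5% |
| 13 | 7:3 | Training | 634 | 139 | 212 | 0.98 | 5.1% |
| 13 | 7:3 | Validation | 1840 | 605 | 924 | 0.69 | 31.6% |
| 13 | 8:2 | Training | 660 | 173 | 261 | 0.97 | 5.9% |
| 13 | 8:2 | Validation | 2569 | 705 | 1076 | 0.60 | 35.2% |
| 13 | 9:1 | Training | 834 | 162 | 264 | 0.97 | 5.6% |
| 13 | 9:1 | Validation | 2440 | 680 | 1124 | 0.63 | 41.5% |
| 6 | 7:3 | Training | 494 | 117 | 210 | 0.98 | 3.8% |
| 6 | 7:3 | Validation | 2569 | 806 | 1335 | 0.40 | 47.6% |
| 6 | 8:2 | Training | 660 | 143 | 222 | 0.98 | 4.5% |
| 6 | 8:2 | Validation | 2569 | 891 | 1406 | 0.34 | 51.9% |
| 6 | 9:1 | Training | 979 | 172 | 299 | 0.97 | 5.5% |
| 6 | 9:1 | Validation | 2569 | 998 | 1681 | 0.28 | 68.5% |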
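Table A15. Analysis of the results of the CHAID model for natural gas consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 1366 | 271 | 438 | 0.93 | 7.7% |
| 26 | 7:3 | Validation | 2038 | 665 | 1012 | 0.64 | 37.4% |
| 26 | 8:2 | Training | 1411 | 280 | 442 | 0.93 | 8.1% |
| 26 | 8:2 | Validation | 2083 | 638 | 1015 | 0.65 | 37.0% |
| 26 | 9:1 | Training | 1589 | 242 | 441 | 0.92 | 6.6% |
| 26 | 9:1 | Validation | 2261 | 1007 | 1386 | 0.43 | 60.5% |
| 13 | 7:3 | Training | 1246 | 254 | 421 | 0.93 | 7.9% |
| 13 | 7:3 | Validation | 672 | 708 | 988 | 0.66 | 41.4% |
| 13 | 8:2 | Training | 1915 | 430 | 647 | 0.83 | 14.5% |
| 13 | 8:2 | Validation | 2587 | 794 | 1184 | 0.46 | 43.5% |
| 13 | 9:1 | Training | 1390 | 230 | 407 | 0.94 | 6.1% |
| 13 | 9:1 | Validation | 2062 | 994 | 1392 | 0.40 | 57.4% |
| 6 | 7:3 | Training | 3714 | 385 | 722 | 0.79 | 18.7% |
| 6 | 7:3 | Validation | 2339 | 612 | 799 | 0.78 | 24.7% |
| 6 | 8:2 | Training | 3714 | 377 | 709 | 0.79 | 18.2% |
| 6 | 8:2 | Validation | 2351 | 656 | 843 | 0.78 | 26.6% |
| 6 | 9:1 | Training | 1640 | 305 | 478 | 0.91 | 9.7% |
| 6 | 9:1 | Validation | 2312 | 861 | 1515 | 0.34 | 64.0% |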
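Table A16. Analysis of the results of the ECHAID model for natural gas consumption.

| Number of Variables | Training:Validation | Data Set | MAX Error | MAE | SD | R | MAPE |
|---|---|---|---|---|---|---|---|
| 26 | 7:3 | Training | 1246 | 288 | 482 | 0.91 | 8.1% |
| 26 | 7:3 | Validation | 4164 | 920 | 1439 | 0.28 | 45.9% |
| 26 | 8:2 | Training | 1246 | 168 | 366 | 0.95 | 4.4% |
| 26 | 8:2 | Validation | 4164 | 1065 | 1551 | 0.19 | 51.4% |
| 26 | 9:1 | Training | 1589 | 242 | 441 | 0.92 | 6.6% |
| 26 | 9:1 | Validation | 2261 | 1007 | 1386 | 0.43 | 60.5% |
| 13 | 7:3 | Training | 1913 | 397 | 643 | 0.84 | 13.1% |
| 13 | 7:3 | Validation | 2585 | 754 | 1136 | 0.47 | 40.6% |
| 13 | 8:2 | Training | 1915 | 382 | 631 | 0.84 | 12.6% |
| 13 | 8:2 | Validation | 2587 | 830 | 1201 | 0.45 | 44.8% |
| 13 | 9:1 | Training | 1150 | 243 | 427 | 0.93 | 6.9% |
| 13 | 9:1 | Validation | 1692 | 873 | 1294 | 0.52 | 58.2% |
| 6 | 7:3 | Training | 3714 | 385 | 722 | 0.79 | 18.7% |
| 6 | 7:3 | Validation | 2339 | 612 | 799 | 0.78 | 24.7% |
| 6 | 8:2 | Training | 3714 | 377 | 709 | 0.79 | 18.2% |
| 6 | 8:2 | Validation | 2351 | 656 | 843 | 0.78 | 26.6% |
| 6 | 9:1 | Training | 1640 | 306 | 478 | 0.91 | 9.7% |
| 6 | 9:1 | Validation | 2312 | 861 | 1515 | 0.34 | 64.0% |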

References

  1. International Energy Agency (IEA). Available online: https://www.iea.org/reports/the-critical-role-of-buildings (accessed on 19 September 2022).
  2. Lin, Y.; Zhou, S.; Yang, W.; Li, C.-Q. Design optimization considering variable thermal mass, insulation, absorptance of solar radiation and glazing ratio using prediction model and Genetic Algorithm. Sustainability 2018, 10, 336. [Google Scholar] [CrossRef] [Green Version]
  3. Lin, Y.; Yang, W. Application of Multi-objective Genetic Algorithm based Simulation for Cost-effective Building Energy Efficiency Design and Thermal Comfort Improvement. Front. Energy Res. 2018, 6, 25. [Google Scholar] [CrossRef] [Green Version]
  4. Zhu, D.; Yan, D.; Wang, C.; Hong, T. Comparison of building energy consumption simulation software: DeST, EnergyPlus and DOE-2. Build. Sci. 2012, 28, 213–222. [Google Scholar]
  5. Fumo, N.; Mago, P.; Luck, R. Methodology to estimate building energy consumption using EnergyPlus Benchmark Models. Energy Build. 2010, 42, 2331–2337. [Google Scholar] [CrossRef]
  6. Amiri, S.S.; Mottahedi, M.; Asadi, S. Using multiple regression analysis to develop energy consumption indicators for commercial buildings in the U.S. Energy Build. 2015, 109, 209–216. [Google Scholar] [CrossRef]
  7. Chen, S.; Zhou, X.; Zhou, G.; Fan, C.; Ding, P.; Chen, Q. An online physical-based multiple linear regression model for building’s hourly cooling load prediction. Energy Build. 2022, 254, 11574. [Google Scholar] [CrossRef]
  8. Ciulla, G.; D’Amico, A. Building energy performance forecasting: A multiple linear regression approach. Appl. Energy 2019, 253, 113500. [Google Scholar] [CrossRef]
  9. Tso, G.K.F.; Yau, K.K.W. Predicting electricity energy consumption-A comparison of regression analysis, decision tree and networks. Energy 2007, 32, 1761–1768. [Google Scholar] [CrossRef]
  10. Zhao, L.; Lin, Y.; Huang, X. Prediction Model for Energy Consumption and Visual Discomfort of Passive House Based on Stepwise Regression Analysis. Build. Energy Effic. 2021, 49, 50–55, 69. [Google Scholar]
  11. Ma, Z.; Ye, C.; Ma, W. Support vector regression for predicting building energy consumption in southern China. Energy Procedia 2019, 158, 3433–3438. [Google Scholar] [CrossRef]
  12. Li, Q.; Meng, Q.; Mochida, A. Applying support vector machine to predict hourly cooling load in the building. Appl. Energy 2009, 86, 2249–2256. [Google Scholar] [CrossRef]
  13. Paudel, S.; Elmitri, M.; Le Corre, O. A relevant data selection method for energy consumption prediction of low energy building based on support vector machine. Energy Build. 2017, 138, 240–256. [Google Scholar] [CrossRef]
  14. Ahmad, M.W.; Mourshed, M.; Rezgui, Y. Trees vs Neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption. Energy Build. 2017, 147, 77–89. [Google Scholar] [CrossRef]
  15. Han, Y.; Fan, C.; Yu, B. Energy efficient building envelope using novel RBF neural network integrated affinity propagation. Energy 2020, 209, 118414. [Google Scholar] [CrossRef]
  16. Zhao, C.; Lin, S.; Xu, Q. Prediction of building energy consumption in collegue buildings based on GM-RBF neural network. J. Nanjing Univ. Sci. Technol. 2014, 38, 48–53. [Google Scholar]
  17. Zekić-Sušac, M.; Has, A.; Knežević, M. Predicting energy cost of public buildings by artificial neural networks, CART, and random forest. Neurocomputing 2021, 439, 223–233. [Google Scholar] [CrossRef]
  18. Capozzoli, A.; Grassi, D.; Causone, F. Estimation models of heating energy consumption in schools for local authorities planning. Energy Build. 2015, 105, 302–313. [Google Scholar] [CrossRef] [Green Version]
  19. Yang, J.; Wu, J. Research on energy-saving optimization of commercial central air-conditioning based on data mining algorithm. Energy Build. 2022, 272, 112326. [Google Scholar] [CrossRef]
  20. Kusiak, A.; Li, M.; Zhang, Z. A data-driven approach for steam load prediction in buildings. Appl. Energy 2010, 87, 925–933. [Google Scholar] [CrossRef]
  21. Yan, L.; Hu, P.; Li, C.; Yao, Y.; Xing, L.; Lei, F.; Zhu, N. The performance prediction of ground source heat pump system based on monitoring data and data mining technology. Energy Build. 2016, 127, 1085–1095. [Google Scholar] [CrossRef]
  22. Li, K.; Xie, X.; Yang, X. A hybrid teaching-learning artificial neural network for building electrical energy consumption prediction. Energy Build. 2018, 174, 323–334. [Google Scholar] [CrossRef]
  23. Moayedi, H.; Mu’azu, M.A.; Foong, K.K. Novel swarm-based approach for predicting the cooling load of residential buildings based on social behavior of elephant herds. Energy Build. 2020, 206, 109579. [Google Scholar] [CrossRef]
  24. Aruta, G.; Ascione, F.; Boettcher, O.; De Masi, R.F.; Mauro, G.M.; Vanoli, G.P. Machine learning to predict building energy performance in different climates. IOP Conf. Ser. Earth Environ. Sci. 2022, 1078, 012137. [Google Scholar] [CrossRef]
  25. Ndiaye, D.; Gabriel, K. Principal component analysis of the electricity consumption in residential dwellings. Energy Build. 2011, 43, 446–453. [Google Scholar] [CrossRef]
  26. Shen, M.; Sun, H.; Lu, Y. Household Electricity Consumption Prediction Under Multiple Behavioural Intervention Strategies Using Support Vector Regression. Energy Procedia 2017, 142, 2734–2739. [Google Scholar] [CrossRef]
  27. Jain, R.K.; Smith, K.M.; Taylor, J.E. Forecasting energy consumption of multi-family residential buildings using support vector regression: Investigating the impact of temporal and spatial monitoring granularity on performance accuracy. Appl. Energy 2014, 123, 168–178. [Google Scholar] [CrossRef]
  28. Rahman, A.; Srikumar, V.; Smith, A.D. Predicting electricity consumption for commercial and residential buildings using deep recurrent neural networks. Appl. Energy 2017, 212, 372–385. [Google Scholar] [CrossRef]
  29. Wang, Z.; Liu, X.; Li, H. Energy performance prediction of vapor-injection air source heat pumps in residential buildings using a neural network model. Energy Build. 2020, 228, 110499. [Google Scholar] [CrossRef]
  30. Kim, T.-Y.; Cho, S.-B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81. [Google Scholar] [CrossRef]
  31. Bui, D.-K.; Nguyen, T.N.; Nguyen-Xuan, H. An artificial neural network (ANN) expert system enhanced with the electromagnetism-based firefly algorithm (EFA) for predicting the energy consumption in buildings. Energy 2020, 190, 116370. [Google Scholar] [CrossRef]
  32. Farzana, S.; Liu, M.; Hossain, M.U. Multi-model prediction and simulation of residential building energy in urban areas of Chongqing, South West China. Energy Build. 2014, 81, 161–169. [Google Scholar] [CrossRef]
  33. Guo, Z.; Moayedi, H.; Bahiraei, M. Optimal modification of heating, ventilation, and air conditioning system performances in residential buildings using the integration of metaheuristic optimization and neural computing. Energy Build. 2020, 214, 109866. [Google Scholar] [CrossRef]
  34. Kerdan, I.G.; Gálvez, D.M. Artificial neural network structure optimisation for accurately prediction of exergy, comfort and life cycle cost performance of a low energy building. Appl. Energy 2020, 280, 115862. [Google Scholar] [CrossRef]
  35. Chegari, B.; Tabaa, M.; Medromi, H. Multi-objective optimization of building energy performance and indoor thermal comfort by combining artificial neural networks and metaheuristic algorithms. Energy Build. 2021, 239, 110839. [Google Scholar] [CrossRef]
  36. IBM. Available online: https://www.ibm.com/spss (accessed on 1 November 2022).
  37. Asadi, S.; Amiri, S.S.; Mottahedi, M. On the development of multi-linear regression analysis to assess energy consumption in the early stages of building design. Energy Build. 2014, 85, 246–255. [Google Scholar] [CrossRef]
  38. Wang, M.; Wright, J.; Brownlee, A.; Buswell, R. A comparison of approaches to stepwise regression on variables sensitivities in building simulation and analysis. Energy Build. 2016, 127, 313–326. [Google Scholar] [CrossRef] [Green Version]
  39. Gao, Y. Research on Building Energy Consumption Prediction Method Based on machine Learning. PhD. Thesis, Beijing University of Civil Engineering and Architecture, Beijing, China, 2020. [Google Scholar]
  40. Xue, W. SPSS Modeler Data Mining Methods and Applications, 3rd. ed.; Electronics Industry Press: Beijing, China, 2020. [Google Scholar]
  41. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  42. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Wadsworth Inc.: New York, NY, USA, 1984. [Google Scholar]
  43. Kass, G.V. An exploratory technique for investigating large quantities of categorical data. J. R. Stat. Soc. 1980, 29, 119–127. [Google Scholar] [CrossRef] [Green Version]
  44. Biggs, D.; de Ville, B.; Suen, E. A method of choosing multiway partitions for classification and decision trees. J. Appl. Stat. 1991, 18, 49–62. [Google Scholar] [CrossRef]
  45. Zhang, X.; Wada, T.; Fujiwara, K.; Kano, M. Regression and independence based variable importance measure. Comput. Chem. Eng. 2020, 135, 106757. [Google Scholar] [CrossRef]
Figure 1. Flowchart of the research strategy.
Figure 2. Regression between predicted and simulated electricity consumption: (a) MLR model; (b) SR model; (c) SVM model; (d) BPNN model; (e) RBFN model; (f) CART model; (g) CHAID model; (h) ECHAID model.
Figure 3. Regression between predicted and simulated natural gas consumption: (a) MLR model; (b) SR model; (c) SVM model; (d) BPNN model; (e) RBFN model; (f) CART model; (g) CHAID model; (h) ECHAID model.
Table 1. Variable names and value ranges.

| No. | Information of the Variable | Variable Name | Collecting Method | Value Range |
|---|---|---|---|---|
| 1 | Number of halogen bulbs used outdoors | Halogen | Phone survey | 0–5 |
| 2 | Number of compact fluorescent lamp (CFL) bulbs used outdoors | CFL | Phone survey | 0–4 |
| 3 | Number of fluorescent bulbs used outdoors | Fluor | Phone survey | 0–4 |
| 4 | Number of incandescent lamps used outdoors | Incand | Phone survey | 0–5 |
| 5 | Awareness of the importance of reducing energy consumption | RedEnerg | Phone survey | 1–5 |
| 6 | Awareness of the importance of spending less on energy bill | SpenLess | Phone survey | 1–5 |
| 7 | Perceptions of government involvement in energy conservation | GvInvolv | Phone survey | 1–5 |
| 8 | Interested in learning more about ways to save energy indoors | LearnMor | Phone survey | 1–5 |
| 9 | Interest in using computer software to control indoor energy consumption | CompSoft | Phone survey | 1–5 |
| 10 | Number of occupants | NbOccup | Phone survey | 1–6 |
| 11 | Number of residents working full-time | FullTime | Phone survey | 0–5 |
| 12 | Number of residents working part-time | ParTime | Phone survey | 0–1 |
| 13 | Number of residents working in shifts | SiftWork | Phone survey | 0–1 |
| 14 | Number of people working or staying at home | FromHome | Phone survey | 0–3 |
| 15 | Housing situation | HomState | Phone survey | Owned (1), Rent (2) |
| 16 | Lights turned on when empty for a short period of time | LOnEmpty | Phone survey | 1–3 (occurs more and more frequently) |
| 17 | The moment when the outdoor lights in front of the house are turned on | TOnOutLt | Phone survey | 1–3 (occurs more and more frequently) |
| 18 | Feeling safe between neighbors | Safety | Phone survey | 1–5 (increased sense of security) |
| 19 | Worry about crime | Crime | Phone survey | 1–5 (increased sense of security) |
| 20 | Age of the homeowner | AgeRange | Phone survey | 18–24 (1), 25–35 (2), 36–45 (3), 46–55 (4), 56–65 (5), over 65 (6) |
| 21 | Number of energy-saving electrical appliances purchased in the past 5 years | NbNewApp | Phone survey | 0–7 |
| 22 | Fuel type of the oven | OvenFuel | Phone survey | Natural gas (1), Electricity (2) |
| 23 | Fuel type of the dryer | DryerFl | Phone survey | Natural gas (1), Electricity (2) |
| 24 | Fuel type of the pool heaters | PHeatrFl | Phone survey | Unused (0), Solar (1), Natural gas (2), Electricity (3) |
| 25 | Upgrade or renovation of the house in the past five to ten years | RecUpgd | Phone survey | Renovated (1), Not renovated (2) |
| 26 | Amount willing to spend on energy-efficient equipment (CAD) | WlgSpend | Phone survey | <$100 (1), $100–250 (2), $250–500 (3), >$1000 (4) |
| 27 | Highest level of education | LevelEdu | Phone survey | High School (1), College (2), University (3) |
| 28 | Gross household income before taxes (CAD/year) | HsIncome | Phone survey | <$20,000 (1), $20,000–$39,999 (2), $40,000–$59,999 (3), $60,000–$79,999 (4), $80,000–$99,999 (5), >$100,000 (6) |
| 29 | Born in Canada | BornCan | Phone survey | Yes (1), No (2) |
| 30 | Fuel type for heating system | HeatType | Energy audit | Electricity (1), Natural gas (2), Oil (3) |
| 31 | House type | HsType | Energy audit | Single detached (1), Row end (2) |
| 32 | Number of floors | NbStoris | Energy audit | 1–2 |
| 33 | Heating system type | HSysType | Energy audit | Baseboard (1), Medium-efficiency furnace (2), Heat pump (3), High-efficiency boiler (4) |
| 34 | Fuel type for domestic water heaters | DHWFuel | Energy audit | Natural gas (1), Electricity (2) |
| 35 | Types of domestic hot water heater | DHWType | Energy audit | Condensing unit (1), Induced draft fan boiler (2), Conventional tank heater (3) |
| 36 | Existing air-conditioning system | ACSyst | Energy audit | No (0), Yes (1) |
| 37 | Air-conditioning system type | ACType | Energy audit | Central system (1), Heat pump (2), Not applicable (3) |
| 38 | Year built | ConstYr | Energy audit | Pre 76 (1), 1976–1987 (2), 1988–2002 (3) |
| 39 | Heating system efficiency | HSysEffi | Energy audit | 76–100% |
| 40 | Service length of the heating system (years) | HSysAge | Energy audit | 0–35 |
| 41 | Service length of the air-conditioning system (years) | ACAge | Energy audit | 0–33 |
| 42 | Thermal resistance of the window (m²·K/W) | TherReWind | Energy audit | 0.99–2.64 |
| 43 | Thermal resistance of the external wall (m²·K/W) | TherReWal | Energy audit | 0.64–3.12 |
| 44 | Thermal resistance of the ceiling (m²·K/W) | TherReCei | Energy audit | 0.53–7.05 |
| 45 | Area of the ceiling (m²) | CeilArea | Energy audit | 45.2–227.4 |
| 46 | Area of the external wall (m²) | TWlArea | Energy audit | 52.8–317.6 |
| 47 | Area of the window (m²) | TWdArea | Energy audit | 6.7–49.2 |
| 48 | U-value of foundation wall (W/(m²·K)) | FwUvalue | Energy audit | 0.23–3.17 |
| 49 | U-value of the basement ceiling (W/(m²·K)) | BhUvalue | Energy audit | 0.48–3.87 |
| 50 | Air change rate per hour at 50 Pa | NbACH | Energy audit | 1.49–14.88 |
| 51 | Residential floor area (m²) | ReFlArea | Energy audit | 49–166 |
| 52 | Building orientation | OriBuild | Energy audit | East (1), West (2), South (3), North (4), Northeast (5), Southeast (6), Northwest (7), Southwest (8) |
| 53 | Building width (m) | WidBuild | Energy audit | 5.18–16.46 |
| 54 | Building depth (m) | DepBuild | Energy audit | 7.01–16.46 |
| 55 | Building perimeter (m) | PerBuild | Energy audit | 28.65–52.43 |
| 56 | Window type | TypWind | Energy audit | Single-layer (1), Double-layer (2), Double-layer Low-E (3) |
| 57 | Window frame type | TypWindFra | Energy audit | Wood (1), Vinyl (2), Metal (3) |
| 58 | Door type | TypDoor | Energy audit | Wood (1), Steel (2) |
| 59 | Door area (m²) | AreDoor | Energy audit | 0.94–6.8 |
60Cooling system COPCOPRefSysEnergy audit2–10
61Ventilation system exhaust volume (m³/h)ExVolVentiEnergy audit1–15
62Floor area (m2)AreFloorEnergy audit97.8–374.6
63Total basement wall area (m2)AreBaseWalEnergy audit43.4–117.5
64Annual electricity consumption (kWh)AnnPowConsuEnergy audit+smart metering8944–50,415
65Annual natural gas consumption (m³)AnnNaGEnConsuEnergy audit0–5937
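The coded ranges in the variable table above imply that labelled survey and audit answers are mapped to integers before modelling. The following is a minimal sketch of such a mapping in Python, assuming a plain dictionary codebook. The category codes are copied from the table; the function name `encode_record`, the `CODEBOOK` structure, and the example house are hypothetical and not taken from the study.

```python
# Category codes copied from the variable table above.
AGE_RANGE_CODES = {
    "18–24": 1, "25–35": 2, "36–45": 3,
    "46–55": 4, "56–65": 5, "over 65": 6,
}
HEAT_TYPE_CODES = {"Electricity": 1, "Natural gas": 2, "Oil": 3}
HOM_STATE_CODES = {"Owned": 1, "Rent": 2}

# Hypothetical codebook layout: variable name -> {label: integer code}.
CODEBOOK = {
    "AgeRange": AGE_RANGE_CODES,
    "HeatType": HEAT_TYPE_CODES,
    "HomState": HOM_STATE_CODES,
}


def encode_record(raw: dict) -> dict:
    """Replace labelled answers with integer codes; pass other values through."""
    encoded = {}
    for name, value in raw.items():
        codes = CODEBOOK.get(name)
        if codes is None:
            # No codebook entry: assume the value is already numeric (e.g. NbOccup).
            encoded[name] = value
        elif value in codes:
            encoded[name] = codes[value]
        else:
            raise ValueError(f"unknown answer {value!r} for variable {name}")
    return encoded


# Hypothetical house, for illustration only (not a record from the study).
example = {
    "AgeRange": "36–45",
    "HeatType": "Natural gas",
    "HomState": "Owned",
    "NbOccup": 4,
}
print(encode_record(example))
```

Rejecting unknown labels, rather than silently assigning a default code, keeps data-entry errors from leaking into the model inputs.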
Table 2. Variables selected for predicting electricity consumption.

| Number of Variables | Variable Set |
|---|---|
| 26 (importance of variable (IV) ≥ 0.01) | HeatType, DHWFuel, AreFloor, HSysEffi, HSysAge, HSysType, Halogen, NbOccup, TherReCei, FromHome, ACSyst, OriBuild, LOnEmpty, TherReWal, SpenLess, Incand, NbACH, PHeatrFl, AgeRange, LearnMor, ExVolVenti, FullTime, TWdArea, ConstYr, COPRefSys, CFL |
| 12 (IV ≥ 0.016) | HeatType, DHWFuel, AreFloor, HSysEffi, HSysAge, HSysType, Halogen, NbOccup, TherReCei, FromHome, ACSyst, OriBuild |
| 6 (IV ≥ 0.05) | HeatType, DHWFuel, AreFloor, HSysEffi, HSysAge, HSysType |
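The three variable sets in Table 2 are nested: raising the importance cut-off keeps only the leading part of the same ranked list. A minimal sketch of that thresholding step follows. The cut-offs (0.01, 0.016, 0.05) come from Table 2, but the importance scores below are hypothetical placeholders, not the values computed in the study, and `select_by_importance` is an illustrative helper name.

```python
def select_by_importance(importance: dict, threshold: float) -> list:
    """Return variable names with importance >= threshold, most important first."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]


# Hypothetical importance scores, NOT the values from the study.
importance = {
    "HeatType": 0.20,
    "DHWFuel": 0.12,
    "AreFloor": 0.08,
    "NbOccup": 0.03,
    "ACSyst": 0.017,
    "CFL": 0.011,
    "Crime": 0.004,
}

# Cut-offs listed in Table 2 for the electricity models.
for threshold in (0.01, 0.016, 0.05):
    selected = select_by_importance(importance, threshold)
    print(f"IV >= {threshold}: {len(selected)} variables -> {selected}")
```

Because every set is derived from a single ranking, each smaller set is guaranteed to be a subset of the larger ones, which matches the structure of the three rows in the table.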
Table 3. Variables selected for predicting natural gas consumption.

| Number of Variables | Variable Set |
|---|---|
| 26 (IV ≥ 0.015) | HeatType, NbACH, HSysEffi, TWlArea, Fluor, DHWFuel, Halogen, TherReWind, TherReWal, PerBuild, RedEnerg, NbOccup, PHeatrFl, SpenLess, TypWindFra, CeilArea, OvenFuel, BhUvalue, DHWType, ReFlArea, TherReCei, WidBuild, HomState, FwUvalue, AreBaseWal, AreFloor |
| 13 (IV ≥ 0.022) | HeatType, NbACH, HSysEffi, TWlArea, Fluor, DHWFuel, Halogen, TherReWind, TherReWal, PerBuild, RedEnerg, NbOccup, PHeatrFl |
| 6 (IV ≥ 0.032) | HeatType, NbACH, HSysEffi, TWlArea, Fluor, DHWFuel |
Table 4. Range of relative errors for the eight electricity consumption prediction models.

| Method | ≤5% | ≤15% | ≤25% | ≤50% |
|---|---|---|---|---|
| MLR | 38% | 64% | 79% | 98% |
| SR | 43% | 68% | 81% | 94% |
| SVM | 73% | 73% | 73% | 75% |
| BPNN | 99% | 100% | 100% | 100% |
| RBFN | 57% | 85% | 92% | 100% |
| CART | 89% | 97% | 98% | 99% |
| CHAID | 93% | 98% | 98% | 99% |
| ECHAID | 93% | 97% | 97% | 97% |
Table 5. Range of relative errors for the eight natural gas consumption prediction models.

| Method | ≤5% | ≤15% | ≤25% | ≤50% |
|---|---|---|---|---|
| MLR | 22% | 62% | 83% | 98% |
| SR | 25% | 60% | 82% | 98% |
| SVM | 6% | 32% | 48% | 78% |
| BPNN | 93% | 96% | 99% | 99% |
| RBFN | 30% | 75% | 93% | 99% |
| CART | 49% | 83% | 93% | 99% |
| CHAID | 38% | 60% | 76% | 87% |
| ECHAID | 38% | 60% | 76% | 87% |
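Tables 4 and 5 report, for each model, the share of houses whose relative prediction error falls within 5%, 15%, 25%, and 50%. The sketch below shows one way such shares, and the MAPE quoted in the abstract, could be computed. It assumes the relative error is defined as |predicted − actual| / |actual| and that houses with zero actual consumption are skipped (the gas consumption range starts at 0); both are assumptions, since the exact treatment is not stated here. The consumption values are hypothetical and the function names are illustrative.

```python
def relative_errors(actual, predicted):
    """Absolute relative error per sample; zero actual values are skipped (assumption)."""
    return [abs(p - a) / abs(a) for a, p in zip(actual, predicted) if a != 0]


def share_within(errors, limits=(0.05, 0.15, 0.25, 0.50)):
    """Fraction of samples whose relative error is at or below each limit."""
    return {limit: sum(e <= limit for e in errors) / len(errors) for limit in limits}


def mape(errors):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(errors) / len(errors)


# Hypothetical annual electricity consumption (kWh), not data from the study.
actual = [10000.0, 20000.0, 15000.0, 30000.0]
predicted = [10400.0, 22000.0, 15000.0, 18000.0]

errors = relative_errors(actual, predicted)
for limit, share in share_within(errors).items():
    print(f"relative error <= {limit:.0%}: {share:.0%} of houses")
print(f"MAPE: {mape(errors):.2f}%")
```

The shares are cumulative, so each column is at least as large as the one to its left. A model can still have a large MAPE while scoring well in the tighter bands if a few houses are predicted very poorly.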
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Lin, Y.; Liu, J.; Gabriel, K.; Yang, W.; Li, C.-Q. Data-Driven Based Prediction of the Energy Consumption of Residential Buildings in Oshawa. Buildings 2022, 12, 2039. https://doi.org/10.3390/buildings12112039
