Predicting Site Energy Usage Intensity Using Machine Learning Models
Abstract
1. Introduction
2. Proposed Procedure
2.1. Dataset Collection
2.2. Preprocessing
2.3. Overview of the Used Machine Learning Algorithms
- Random forest (RF): is a versatile ML algorithm that can be used for both regression and classification problems [30]. It is an ensemble ML algorithm consisting of multiple decision trees, with more randomness added as the forest grows. Compared to other ML methods, RF offers additional advantages, such as providing an estimate of the input variables’ importance, lower sensitivity to noise, handling of missing values, and resistance to overfitting, allowing it to achieve higher performance. Because our research was conducted on a dataset with continuous output labels, we used regression [31] rather than classification. The RF operates by constructing a collection of DTs from various combinations of samples and averaging the results obtained by those trees.
- Gradient boost decision tree (GBDT) regressor: is an ensemble learning technique for regression problems that combines weak DT learners to produce the final output. GBDT considerably minimizes the loss function and refines the predictions by sequentially adding trees, each one fitted to the gradient of the loss of the current ensemble. GBDT also helps prevent overfitting and keeps the learning time low [32].
- Decision tree (DT) regressor: is a decision support mechanism with a tree-like structure that represents the input features as nodes, with test outcomes represented by branches. Using the dataset attributes and following the entropy concept, the DT is built in a top-down fashion via the recursive partitioning methodology known as CART [33]. The root node represents the most critical predictor. The DT node homogeneity, the branches’ construction, and the leaf node values are obtained from Equations (2) and (3).
- Support vector regressor (SVR): is a kernel-based regression algorithm that uses a kernel function to map the data samples into a high-dimensional space, where a non-linear decision surface can be transformed into a linear one described by Equation (4). SVR’s objective is to find the optimal hyperplane that keeps the absolute error |y − f(x)| within the maximum allowed threshold error range ε (named epsilon), as shown in Equation (5), where ‖w‖ is the Euclidean norm of the weight vector w.
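As a concrete illustration of the four regressors described above, the sketch below trains each one on a synthetic regression task. scikit-learn is assumed here for illustration; the paper's own implementation is not shown in this section.

```python
# Minimal sketch: fit the four regressors discussed above (RF, GBDT,
# DT, SVR) on a synthetic continuous-target dataset and report R^2.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the continuous-output dataset used in the paper.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "RF": RandomForestRegressor(random_state=0),
    "GBDT": GradientBoostingRegressor(random_state=0),
    "DT": DecisionTreeRegressor(random_state=0),
    "SVR": SVR(kernel="rbf"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))
```

On real data, each model's hyperparameters would of course be tuned rather than left at their defaults, as Section 3.3 describes.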
2.4. Methods’ Accuracy
- RMSE: expresses the root mean squared difference between the observed actual values and the model’s predicted values. It is commonly used to represent the absolute error and penalizes large deviations more heavily.
- MAE: is a simple regression evaluation metric defined as the average absolute error between the observations and the predictions. It is used to evaluate the average of the dataset residuals.
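The two metrics above can be sketched as follows, computed by hand and cross-checked against scikit-learn's implementations (library choice assumed for illustration):

```python
# MAE and RMSE on a tiny hand-checkable example.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 33.0])

mae = np.mean(np.abs(y_true - y_pred))           # (2 + 2 + 3) / 3
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # sqrt((4 + 4 + 9) / 3)

# Cross-check the manual formulas against the library implementations.
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred)))
print(round(mae, 3), round(rmse, 3))  # → 2.333 2.38
```

Because RMSE squares the residuals before averaging, the single 3-unit error pulls RMSE above MAE; the two metrics coincide only when all residuals have the same magnitude.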
3. Experimentation
3.1. Development Environment
3.2. Dataset
3.3. Hyperparameter Tuning
3.4. Results
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Solomon, S.; Plattner, G.-K.; Friedlingstein, P. Irreversible climate change due to carbon dioxide emissions. Proc. Natl. Acad. Sci. USA 2009, 106, 1704–1709. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Rodrigo-Comino, J.; Salvia, R.; Quaranta, G.; Cudlín, P.; Salvati, L.; Gimenez-Morera, A. Climate Aridity and the Geographical Shift of Olive Trees in a Mediterranean Northern Region. Climate 2021, 9, 64. [Google Scholar] [CrossRef]
- Wheeler, T.; von Braun, J. Climate Change Impacts on Global Food Security. Science 2013, 341, 508–513. [Google Scholar] [CrossRef] [PubMed]
- Grossi, G.; Goglio, P.; Vitali, A.; Williams, A.G. Livestock and Climate Change: Impact of Livestock on Climate and Mitigation Strategies. Anim. Front. 2019, 9, 69–76. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- NASA Global Climate Change. Available online: https://climate.nasa.gov/global-warming-vs-climate-change/ (accessed on 4 July 2022).
- NASA Climate Kids. Available online: https://climatekids.nasa.gov/climate-model/ (accessed on 5 July 2022).
- Why Buildings Are the Foundation of an Energy-Efficient Future. Available online: https://www.weforum.org/agenda/2021/02/why-the-buildings-of-the-future-are-key-to-an-efficient-energy-ecosystem/ (accessed on 6 July 2022).
- Strategies to Save Energy in Commercial Buildings. Available online: https://www.bartingalemechanical.com/strategies-to-save-energy-in-commercial-buildings/ (accessed on 6 July 2022).
- Md. Motaharul, I.; Ngnamsie Njimbouom, S.; Arham, A.S. Efficient Payload Compression in IP-based Wireless Sensor Network: Algorithmic Analysis and Implementation. J. Sens. 2019, 2019, 9808321. [Google Scholar] [CrossRef]
- Md. Motaharul, I.; Ngnamsie Njimbouom, S.; Faizullah, S. Structural Health Monitoring by Payload Compression in Wireless Sensors Network: An Algorithmic Analysis. Int. J. Eng. Manag. Res. 2018, 8, 184–190. [Google Scholar] [CrossRef]
- Frei, M.; Deb, C.; Stadler, R.; Nagy, Z.; Schlueter, A. Wireless sensor network for estimating building performance. Autom. Constr. 2020, 111, 103043. [Google Scholar] [CrossRef]
- Maiti, P.; Sahoo, B.; Turuk, A.K.; Satpathy, S. Sensors data collection architecture in the Internet of Mobile Things as a service (IoMTaaS) platform. In Proceedings of the International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 10–11 February 2017; pp. 578–582. [Google Scholar] [CrossRef]
- Kang, I.-A.; Ngnamsie Njimbouom, S.; Lee, K.-O.; Kim, J.-D. DCP: Prediction of Dental Caries Using Machine Learning in Personalized Medicine. Appl. Sci. 2022, 12, 3043. [Google Scholar] [CrossRef]
- Ngnamsie Njimbouom, S.; Lee, K.; Kim, J.-D. MMDCP: Multi-Modal Dental Caries Prediction for Decision Support System Using Deep Learning. Int. J. Environ. Res. Public Health 2022, 19, 10928. [Google Scholar] [CrossRef]
- Elbasani, E.; Kim, J.-D. LLAD: Life-log anomaly detection based on recurrent neural network LSTM. J. Healthc. Eng. 2021, 2021, 8829403. [Google Scholar] [CrossRef]
- Elbasani, E.; Ngnamsie Njimbouom, S.; Oh, T.J.; Kim, E.H.; Lee, H.; Kim, J.-D. GCRNN: Graph convolutional recurrent neural network for compound–protein interaction prediction. BMC Bioinform. 2021, 22, 616. [Google Scholar] [CrossRef] [PubMed]
- Elbasani, E.; Kim, J.-D. AM R-CNN: Abstract Meaning Representation with Convolution Neural Network for Toxic Content Detection. J. Web Eng. 2022, 21, 677–692. [Google Scholar]
- Zhong, H.; Wang, J.; Jia, H.; Mu, Y.; Lv, S. Vector field-based support vector regression for building energy consumption prediction. Appl. Energy 2019, 242, 403–414. [Google Scholar] [CrossRef]
- Joe, J.; Karava, P. A model predictive control strategy to optimize the performance of radiant floor heating and cooling systems in office buildings. Appl. Energy 2019, 245, 65–77. [Google Scholar] [CrossRef]
- Ajay, J.; Song, C.; Rathore, A.S.; Zhou, C.; Xu, W. 3DGates: An Instruction-Level Energy Analysis and Optimization of 3D Printers. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, Xi’an, China, 8–12 April 2017; pp. 419–433. [Google Scholar]
- Massana, J.; Pous, C.; Burgas, L.; Melendez, J.; Colomer, J. Short-term load forecasting in a non-residential building contrasting artificial occupancy attributes. Energy Build. 2016, 130, 519–531. [Google Scholar] [CrossRef] [Green Version]
- Farzana, S.; Liu, M.; Baldwin, A.; Hossain, M.U. Multi-model prediction and simulation of residential building energy in urban areas of Chongqing, Southwest China. Energy Build. 2014, 81, 161–169. [Google Scholar] [CrossRef]
- Fan, C.; Wang, J.; Gang, W.; Li, S. Assessment of deep recurrent neural network-based strategies for short-term building energy predictions. Appl. Energy 2019, 236, 700–710. [Google Scholar] [CrossRef]
- Ahmad, M.W.M.M.; Rezgui, Y. Trees vs Neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption. Energy Build. 2017, 147, 77–89. [Google Scholar] [CrossRef]
- Zhang, Y.; O’Neill, Z.; Dong, B.; Augenbroe, G. Comparisons of inverse modeling approaches for predicting building energy performance. Build. Environ. 2015, 86, 177–190. [Google Scholar] [CrossRef]
- WIDS Datathon 2022. Available online: https://www.kaggle.com/competitions/widsdatathon2022/data (accessed on 4 July 2022).
- Haq, I.U.; Gondal, I.; Vamplew, P.; Brown, S. Categorical Features Transformation with Compact One-Hot Encoder for Fraud Detection in Distributed Environment. In Communications in Computer and Information Science; Springer: Singapore, 2019; pp. 69–80. [Google Scholar] [CrossRef]
- Seger, C. An Investigation of Categorical Variable Encoding Techniques in Machine Learning: Binary Versus One-Hot and Feature Hashing, Technical Report 2018:596; KTH, School of Electrical Engineering and Computer Science (EECS): Stockholm, Sweden, 2018; URN: urn:nbn:se:kth:diva-237426.
- Patro, S.G.K.; Sahu, K.K. Normalization: A preprocessing stage. arXiv 2015, arXiv:1503.06462. [Google Scholar] [CrossRef]
- Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random Forest: A classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947–1958. [Google Scholar] [CrossRef] [PubMed]
- Segal, M.R. Machine Learning Benchmarks and Random Forest Regression; eScholarship Repository; UCSF: Center for Bioinformatics and Molecular Biostatistics: 2004. Available online: https://escholarship.org/uc/item/35x3v9t4 (accessed on 17 November 2022).
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
- Loh, W.Y. Classification and regression trees. WIREs Data Min. Knowl. Discov. 2011, 1, 14–23. [Google Scholar] [CrossRef]
- Ali, P.J.M.; Faraj, R.H. Data Normalization and Standardization: A Technical Report. Mach. Learn. Tech. Rep. 2014, 1, 1–6. [Google Scholar]
- Chen, R.C.; Dewi, C.; Huang, S.W.; Caraka, R.E. Selecting critical features for data classification based on machine learning methods. J. Big Data 2020, 7, 52. [Google Scholar] [CrossRef]
Components | Description |
---|---|
GPU | NVIDIA RTX 3080 Ti x 4 |
CPU | Intel Core i9-9900K (3.60 GHz) |
RAM | 64 GB (16 GB × 4) |
OS | Ubuntu 18.04 64 bit |
CUDA | 11.2.0 |
TensorFlow | 2.5.0 |
Python | 3.8.0 |
Models | Hyperparameters | Description | Value
---|---|---|---
RF | n_estimators | Number of DTs that constitute the forest | 635
 | max_features | Number of features considered in each tree | auto
 | max_depth | Maximum depth of each DT | 150
 | min_samples_leaf | Number of samples required at a leaf node | 1
DT | max_depth | Identical to the corresponding RF parameters | 5
 | max_features | | auto
 | min_samples_leaf | | 2
 | max_leaf_nodes | Maximum number of leaf nodes | 40
 | min_weight_fraction_leaf | Fraction of the samples’ total weight required at a leaf node | 0.1
 | splitter | Split strategy used at each node | random
GBDT | max_depth | Identical to the corresponding RF and DT parameters | 40
 | n_estimators | | 142
 | max_features | | auto
 | min_samples_leaf | | 63
 | subsample | Fraction of the samples used in fitting each tree learner | 0.65
 | learning_rate | Rate at which each tree contributes to the ensemble | 0.05
SVM | kernel | Type of kernel used in the algorithm | rbf
 | C | Regularization parameter weighting the training errors | 1.0
 | gamma | Factor controlling the distance of influence of a single training point | 0.4
 | epsilon | Margin of error that can be tolerated | 0.2
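For illustration, the tuned values in the table above map onto scikit-learn estimator arguments roughly as follows (library choice assumed). The `max_features = auto` entries are left at the library default here, because the `"auto"` string has been removed from recent scikit-learn releases.

```python
# Instantiating the four models with the tuned hyperparameters listed
# in the table above (scikit-learn parameter names assumed).
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

rf = RandomForestRegressor(n_estimators=635, max_depth=150, min_samples_leaf=1)
dt = DecisionTreeRegressor(max_depth=5, min_samples_leaf=2, max_leaf_nodes=40,
                           min_weight_fraction_leaf=0.1, splitter="random")
gbdt = GradientBoostingRegressor(n_estimators=142, max_depth=40,
                                 min_samples_leaf=63, subsample=0.65,
                                 learning_rate=0.05)
svr = SVR(kernel="rbf", C=1.0, gamma=0.4, epsilon=0.2)
```

Note how the tuned RF is far deeper (max_depth = 150) than the tuned single DT (max_depth = 5): the ensemble's averaging tolerates deep, high-variance trees, while a lone tree must stay shallow to avoid overfitting.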
Models | MAE | RMSE |
---|---|---|
DT | 20.61 | 26.87 |
GBDT | 14.94 | 20.88 |
RF | 14.63 | 20.54 |
SVM | 17.25 | 23.52 |
Models | MAE (Pearson Correlation) | RMSE (Pearson Correlation) | MAE (ANOVA) | RMSE (ANOVA)
---|---|---|---|---
DT | 20.61 | 26.87 | 21.25 | 27.21 |
GBDT | 14.94 | 20.88 | 20.29 | 26.27 |
RF | 14.63 | 20.54 | 21.75 | 28.27 |
SVM | 17.25 | 23.52 | 23.23 | 29.02 |
Models | MAE | RMSE |
---|---|---|
DT | 20.52 | 26.80 |
GBDT | 15.12 | 21.10 |
RF | 14.91 | 20.84 |
SVM | 17.30 | 23.51 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ngnamsie Njimbouom, S.; Lee, K.; Lee, H.; Kim, J. Predicting Site Energy Usage Intensity Using Machine Learning Models. Sensors 2023, 23, 82. https://doi.org/10.3390/s23010082