4.1. Layer 1: Data Collection
University of Florida (UF) has an 800 hectare campus and more than 900 buildings. According to the campus utility data obtained from UF’s Physical Plant Division (PPD), there are a total of 217 buildings with a sensor configuration that captures an individual building’s utility consumption every 15 minutes. UF also provided access to the documentation of the building’s energy performance for various energy performance rating systems, such as the US Green Building Council’s (USGBC) Leadership in Energy and Environmental Design (LEED) rating system. Reviewing energy rating documentation varying from preliminary design plans to as-built plans and LEED V4 (release date: Nov. 2013) forms, we could derive some of the thermophysical properties that influence building energy performance. Twelve buildings had the required information available. The set includes various primary building functions, such as educational, residential, research laboratory, and sport facilities. Four types of data were collected for this research: (A) space functionality characteristics; (B) building thermophysical properties, including lighting and equipment energy intensities; (C) building energy use; and (D) historic and future weather data.
(A) Space functionality characteristics were determined from the percentages of different functional spaces, calculated for each building used in this study. Offices, classrooms, teaching labs, research areas, auditoriums, gymnasiums, and residential areas are some of the functional spaces used for this classification. For instance, Rinker Hall (Bldg. ID 0272), which houses UF’s School of Construction Management, consists of 14% classrooms and 25% office areas, while Hough Graduate School of Business (Bldg. ID 0064) has 21% office areas and 22% classrooms. The space classification percentages were based on the Gross Square Feet (GSF) area of the buildings [29]. Table 1 lists the space functionality percentages used in this study. Note that space functionalities such as campus supply, residential, or sport, which were mostly zero for the selected buildings, were removed from the study in later sections.
(B) The building thermophysical properties that we could derive from the energy performance documents, listed in Table 2, were total Gross Square Meters (GSM), number of floors, exterior wall U-value (W/m² °C), window U-value (W/m² °C), window-to-wall ratio, Solar Heat Gain Coefficient (SHGC), floor U-value (W/m² °C), roof U-value (W/m² °C), Lighting Power Density (LPD) (W/m²), and Equipment Power Density (EPD) (W/m²).
(C) Monthly and hourly utility consumption data for the 12 buildings were collected for 3 years (36 months), from January 2015 to December 2017. The buildings are of various ages, and the 3-year period was chosen to ensure that data were available for all the buildings over the whole analysis period. The consumption values are in kilowatt-hours (kWh).
(D) Monthly and hourly average outdoor temperatures (°C), relative humidity (%), and solar radiation (W/m²) for the UF campus were collected from the Florida Automated Weather Network (FAWN) and can be seen in Figure 2. In addition, in order to assess the effects of climate change on campus energy consumption, we collected hourly average outdoor temperatures (°C), relative humidity (%), and solar radiation (W/m²) for the city of Gainesville, FL for three future weather scenarios based on their impact, namely the median (year 2063), hottest (2057), and coldest (2041), representing climate conditions for the 2038 to 2066 period. The climate scenarios were created by a proprietary algorithm developed by SeventhWave, a not-for-profit company in Madison, Wisconsin. The algorithm uses climate change variables from the North American Regional Climate Change Assessment Program (NARCCAP), which is an international program that serves the high-resolution climate scenario needs of the United States, Canada, and northern Mexico, and uses a regional climate model, coupled global climate model, and time-slice experiments.
4.2. Layer 2: AI Development
This section describes how the AI models were developed. In order to forecast the energy performance of buildings based on their historic consumption, climate data, and thermophysical properties, we implemented k-means clustering, Principal Component Analysis (PCA), Autoregressive Integrated Moving Average (ARIMA), polynomial regression analysis, and Long Short-Term Memory (LSTM) methods.
4.2.1. K-Means Clustering
For clustering, we used eight input variables, which are linear combinations of the initially introduced thermophysical and space functionality variables, as follows:
Building Thermophysical Properties: 1.1. Total U-value × Area (W/°C), 1.2. Windows SHGC × Area (m²), and 1.3. Total Power (W).
Space Functionality Percentages: 2.1. Classroom, 2.2. Office, 2.3. Residential, 2.4. Teaching Laboratories, and 2.5. Research Laboratories.
Variable 1.1 is the sum–product of the exterior wall and window U-values and their surface areas in W/°C, variable 1.2 is the sum–product of the window SHGC and the window surface area in m², and variable 1.3 is the building’s total lighting and equipment power in watts.
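The three thermophysical clustering inputs can be sketched as follows. This is a minimal illustration, not the study's code: the `Envelope` container and its field names are hypothetical, and computing total power as (LPD + EPD) × gross floor area is an assumption, since the text only states that variable 1.3 is the total lighting and equipment power in watts.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    """Hypothetical container for one building's envelope and load data."""
    wall_u: float       # exterior wall U-value, W/(m^2 °C)
    wall_area: float    # exterior wall area, m^2
    window_u: float     # window U-value, W/(m^2 °C)
    window_area: float  # window area, m^2
    shgc: float         # window solar heat gain coefficient, unitless
    lpd: float          # lighting power density, W/m^2
    epd: float          # equipment power density, W/m^2
    floor_area: float   # gross floor area, m^2


def thermophysical_features(b):
    """Return variables 1.1, 1.2, and 1.3 for one building."""
    # 1.1: sum-product of wall and window U-values and areas, W/°C.
    ua = b.wall_u * b.wall_area + b.window_u * b.window_area
    # 1.2: window SHGC times window area, m^2.
    shgc_area = b.shgc * b.window_area
    # 1.3: total lighting and equipment power, W (assumed: density x floor area).
    total_power = (b.lpd + b.epd) * b.floor_area
    return ua, shgc_area, total_power
```

The five space functionality percentages would then be appended to these three values to form the eight-element input vector for each building.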
The buildings were clustered into similar building types using the k-means approach to reduce the complexity of forecasting, so that there was no need to model every campus building individually in order to predict its energy consumption. Building clusters were also used to extrapolate representative building energy use to campus energy use. K-means is widely used for cluster analysis in data mining. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the closest mean, that is, the cluster prototype. The method alternates between two steps: an assignment step, in which each observation is assigned to the cluster with the nearest mean, and an update step, in which each cluster mean is recomputed from the observations assigned to it.
Figure 3 shows the results of clustering buildings based on their thermophysical and space functionality properties using k-means clustering.
Here, each axis is a unitless linear combination of the eight independent thermophysical and space functionality variables that we defined in our study. We used these functions simply because the buildings cannot be plotted in an 8-dimensional space, so we needed to map them into a 2-dimensional space. As a result, we could partition the buildings into four clusters of educational (yellow), residential (green), research (purple), and sport (blue) buildings. We could observe that buildings with similar space functionalities and thermophysical properties are located closer to each other in the 2-dimensional k-means clustering map. We also see that building 0860 is located near the residential buildings, but because we knew the sport functionality of this building, we categorized it in a different cluster.
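The alternating assignment and update steps of k-means can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function name `kmeans` is hypothetical, the initial centroids are supplied by the caller (the initialization used in the study is not stated), and Euclidean distance is assumed.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

With eight-element feature vectors as `points` and four starting centroids, the returned labels would give one cluster index per building; the resulting centroids play the role of the cluster prototypes.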
Figure 4 shows the 3-year (2015–2017) monthly energy consumption in MWh for buildings within each of the clusters. Clusters 0, 1, 2, and 3 correspond to research, sport, residential, and educational buildings, respectively. The consumption patterns over time of the buildings within a cluster are visibly similar.
4.2.2. PCA and ARIMA
PCA is a multivariate statistical approach for assessing the correlations existing among a set of intercorrelated variables. Being able to categorize a complex and highly intercorrelated set of variables, PCA gives a better understanding of cause and effect relationships. Tardioli et al. [31] used PCA, k-means clustering, and RF to identify representative buildings and building groups in a set of commercial urban buildings by using building typology, construction period, district location, building final use and geometric information. In this study, we conducted PCA in order to prioritize the eight independent variables based on their effects on campus building energy consumption.
ARIMA models are among the most widely used time series prediction methods. Lu et al. [32] used ARIMA, ANN, and SVR to predict the hourly electricity, heating, and cooling energy consumption for a set of community sports buildings, in which they considered building heterogeneity to improve forecast accuracy. Initially, ARIMA forecasting was conducted for each cluster prototype, which represents the average energy consumption of the buildings within each cluster. Here, the training size was 75% of the available data and the testing size was 25%. After training, the overall energy consumption of buildings could be forecasted for 2018 using their cluster prototype. The dependent variable was the total monthly energy consumption, normalized by the average outdoor temperature in each month, in order to include the effects of this climate variable in the prediction model.
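The forecasting workflow above (temperature normalization, a chronological 75/25 split, and an ARIMA fit) can be sketched as follows. The ARIMA order used in the study is not stated, so this sketch assumes the simplest non-trivial case, ARIMA(1,1,0) without a constant, fitted by conditional least squares. All function names and the monthly values are hypothetical.

```python
def fit_arima_110(series):
    """Estimate phi, the AR(1) coefficient of the first-differenced series."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den if den else 0.0


def forecast_arima_110(series, phi, steps):
    """Iterate the AR(1) on the differences and integrate back to levels."""
    level = series[-1]
    diff = series[-1] - series[-2]
    out = []
    for _ in range(steps):
        diff = phi * diff
        level += diff
        out.append(level)
    return out


def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical monthly energy use (kWh) and mean outdoor temperature (°C).
energy = [410.0, 395.0, 430.0, 470.0, 520.0, 560.0,
          590.0, 600.0, 570.0, 500.0, 440.0, 415.0]
temps = [14.0, 16.0, 19.0, 22.0, 25.0, 27.0,
         28.0, 28.0, 27.0, 23.0, 18.0, 15.0]

# Dependent variable: monthly energy normalized by mean outdoor temperature.
normalized = [e / t for e, t in zip(energy, temps)]

# Chronological split: first 75% for training, last 25% for testing.
split = int(len(normalized) * 0.75)
train, test = normalized[:split], normalized[split:]

phi = fit_arima_110(train)
predicted = forecast_arima_110(train, phi, len(test))
test_error = mse(test, predicted)
```

A library implementation with automatic order selection and a seasonal component would likely be preferable in practice; the sketch only shows where the normalization and the split enter the pipeline.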
As a measure of accuracy, we used Mean Squared Error (MSE). We calculated the error percentages by dividing MSE by the cluster’s representative building energy consumption, which was the average amount of energy consumption of buildings within each cluster.
Table 3 shows the comparison of output errors of the forecasting results.
Based on this comparison, we concluded that for the majority of buildings, conducting PCA with ARIMA forecasting would result in better accuracy. Also, it should be noted that PCA reduces the dimension of input variables significantly and hence results in better accuracy. Consequently, in order to increase the model accuracy levels, instead of forecasting only based on cluster representative buildings, we conducted PCA and ARIMA for all the 12 buildings used in this study.
4.3. Layer 3: Model Validation
To make CEUP more reliable, we needed to validate it with the buildings’ actual energy use. The validation process was essential in order to produce realistic energy use predictions. Our validation method followed these steps:
Compare CEUP results with the buildings’ actual energy use;
Calculate validation measures (e.g., RMSE);
Compare the validation measures to the allowable range according to building energy codes.
To validate CEUP monthly energy forecasting, and given the availability of actual energy consumption data, we used the buildings’ monthly energy consumption data from year 2017. As an example, Figure 5 shows actual versus CEUP simulated energy use for the educational cluster representative, UF Rinker School of Construction Management. Other than in the month of February, the actual and CEUP simulated energy consumption patterns were quite similar. The inconsistency may be due to an energy sensor malfunction in that specific month.
Referring to the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) guidelines, the acceptable range of CVRMSE for monthly validation is ±15%.
Table 4 shows the CVRMSE calculations for actual energy use versus CEUP simulations for Rinker Hall. The calculated CVRMSE of 14.1% is within the acceptable range of the ASHRAE guideline.
Similarly, the CVRMSE values for actual energy use versus CEUP simulations for the research, sport, and residential cluster representative buildings were calculated as 9.22%, 8.3%, and 9.13%, respectively, which meet the ASHRAE requirements and indicate that CEUP reaches an acceptable level of accuracy.
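For reference, the coefficient of variation of the RMSE used in this validation can be computed as in the sketch below. The exact formula applied in the study is not given in the text; the degrees-of-freedom adjustment `p` (default 1) follows the common statement of the metric and is an assumption here, as is the function name.

```python
import math


def cvrmse(actual, simulated, p=1):
    """Return CVRMSE in percent: RMSE (with n - p in the denominator)
    divided by the mean of the actual values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    sse = sum((a - s) ** 2 for a, s in zip(actual, simulated))
    return 100.0 * math.sqrt(sse / (n - p)) / mean_actual
```

Applied to twelve monthly actual and simulated values for one building, a result at or below 15% would satisfy the monthly criterion cited above.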
4.4. Layer 4: Model Implementation
4.4.1. Hourly Energy Use Prediction Based on Climate and Temporal Properties
In order to predict the energy use of campus buildings at hourly intervals, and subject to data availability, we collected hourly utility consumption data (the dependent variable) for eight of the sample buildings used in this study for the two years from January 2016 to December 2017. In addition, average outdoor temperature (°C), relative humidity (%), and solar radiation (W/m²) values were collected for the two years of study, as well as for the three future climate scenario years 2041, 2057, and 2063. Furthermore, to account for the effects of seasonality, we introduced hour of day as a temporal variable; together with the three climatic variables, it forms the set of independent variables. We also considered the absolute deviation of the average outdoor temperature from the cooling degree days baseline (Tcdd, 18.3 °C) and its squared value as two further independent variables. In order to predict a campus building’s hourly energy use, we assessed the performance of two methods, described in the following sections.
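The independent variables for one hourly record can be assembled as in the following sketch. The function name and the ordering of the features are hypothetical; only the 18.3 °C baseline and the list of variables come from the text.

```python
T_CDD = 18.3  # cooling degree days baseline from the text, °C


def hourly_features(temp_c, rel_humidity, solar_wm2, hour):
    """Build the predictor vector for one hourly observation."""
    # Absolute deviation of outdoor temperature from the CDD baseline.
    deviation = abs(temp_c - T_CDD)
    return [temp_c, rel_humidity, solar_wm2, hour, deviation, deviation ** 2]
```

Stacking one such vector per hour gives the input matrix for the regression and LSTM models discussed next.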
Polynomial Regression Analysis
Initially, we calculated the correlation coefficients between the buildings’ energy use and the four climatic and temporal predictive variables for hourly values, which can be found in Table 5. The correlations between the buildings’ hourly energy consumption and average outdoor temperature were considerably higher than those for the other two climatic variables and the temporal variable. This shows that, among the climatic and temporal variables, average outdoor temperature is the best predictor of the hourly energy consumption of campus buildings given the available consumption data. Then, we conducted regression analysis to assess the performance of the four variables in predicting the buildings’ hourly energy consumption. Six polynomial degrees were tested for the prediction variables. R-squared values, which are statistical measures of how close the data are to the fitted regression line, were calculated for each degree of regression analysis.
As can be seen in Table 6, as we increased the polynomial degree, the R-squared values tended to increase, indicating a closer fit of the regression curves to the data. However, raising the polynomial degree also raises the chance of overfitting the regression, which we wanted to avoid. Therefore, we concluded that polynomial regression analysis is not an appropriate method for predicting hourly energy use of campus buildings over an entire year, given the amount of available data. It was expected that other models that can capture more feedback effects would perform better than regression models.
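The relationship between polynomial degree and R-squared can be reproduced with a small single-predictor sketch. The study used four predictors and degrees up to six; the version below is deliberately simplified to one predictor, solves the least-squares normal equations directly, and uses hypothetical function names and data.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    m = degree + 1
    # Normal equations A c = b: A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * m
    for i in reversed(range(m)):
        s = sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - s) / a[i][i]
    return coef


def r_squared(xs, ys, coef):
    """Coefficient of determination of the fitted polynomial on (xs, ys)."""
    pred = [sum(c * x ** k for k, c in enumerate(coef)) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, pred))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: the in-sample R-squared cannot decrease with the degree.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]
scores = {d: r_squared(xs, ys, polyfit(xs, ys, d)) for d in (1, 2)}
```

Because in-sample R-squared never decreases as the degree grows, it cannot by itself reveal overfitting; scoring each degree on held-out data would be one way to check for it.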
Long Short-Term Memory (LSTM)
LSTM blocks are building units for layers of Recurrent Neural Networks (RNNs). RNNs composed of LSTM units are usually called LSTM networks. LSTM units come in various architectures. A common LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The cell is responsible for remembering values over arbitrary time intervals, which gives the LSTM its memory. Each of the three gates can be thought of as an artificial neuron, as in a feedforward (multilayer) neural network.
Figure 6 shows a typical LSTM cell used to forecast time series data, where $x_t$ is the input vector for the LSTM unit, $f_t$, $i_t$, and $o_t$ are the activation vectors for the forget gate, the input gate, and the output gate, respectively, $h_t$ is the output vector for the LSTM unit, and $c_t$ is the cell state vector. Average outdoor temperature, relative humidity, solar radiation, and hour of the day were used as the predictor variables in this method.
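To make the roles of the gates concrete, one forward step of a standard LSTM unit can be written out for the scalar case. This is a textbook formulation given as an illustration, not the network configuration used in the study; the function names and the weight layout in `w` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, w):
    """One forward step of a scalar LSTM unit.

    `w` maps each of 'f', 'i', 'o', 'c' to a tuple
    (input weight, recurrent weight, bias).
    """
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x_t + wh * h_prev + b

    f_t = sigmoid(pre("f"))            # forget gate activation
    i_t = sigmoid(pre("i"))            # input gate activation
    o_t = sigmoid(pre("o"))            # output gate activation
    c_tilde = math.tanh(pre("c"))      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde  # new cell state
    h_t = o_t * math.tanh(c_t)         # unit output
    return h_t, c_t
```

In a full network, `x_t` would be the vector of the four predictors at hour t and the weights would be matrices learned during training; the scalar version only shows how the forget, input, and output gates combine the previous state with the new input.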
We allocated 50% of the consumption data to model training and the other 50% to model testing. When training neural networks, one epoch refers to one pass through the entire training set. In this analysis, we ran 50 epochs for the training and testing steps. Mean squared error was used as the loss function to minimize and as the measure of LSTM accuracy. As an example, Figure 7 shows the (unitless) loss values for training and testing on the hourly energy consumption of building 1377, Emerging Pathogens Institute, over 50 epochs. As can be seen, after 50 epochs, the error percentage of the test data was almost 8%, which indicates a high level of LSTM accuracy for predicting the buildings’ hourly energy use based on the four climatic and temporal variables and complies with the ±15% acceptable error of ASHRAE Guideline 14.
Accordingly, we predicted the hourly energy consumption of the eight buildings for the three climate scenarios. LSTM is a prediction method for time series data, but the buildings’ consumption values are known only up to the present, so no consumption data exist between now and the future years of interest. As a result, the error in predicting three years far in the future was relatively high. Consequently, the predicted scenario results were normalized by the mean consumption over a year in order to mitigate the effects of the induced error. As an example, Figure 8 shows the LSTM-predicted hourly energy consumption (in MWh) of building 0064, UF Hough Graduate School of Business, for the median (year 2063), hottest (2057), and coldest (2041) future scenarios.
It should be noted that, owing to the nature of the LSTM prediction method, a few negative values were observed while normalizing and forecasting the hourly energy consumption values; these were set to 0.
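The post-processing described above can be sketched as below. The text does not specify which annual mean is used or whether clipping precedes normalization, so both the `annual_mean` argument and the order of operations are assumptions, and the function name is hypothetical.

```python
def normalize_and_clip(predicted, annual_mean):
    """Scale hourly predictions by an annual mean and set negatives to 0."""
    return [max(0.0, value / annual_mean) for value in predicted]
```

For instance, with an annual mean of 2.0, the predictions 2.0, -1.0, and 5.0 would become 1.0, 0.0, and 2.5.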
4.4.2. Monthly Energy Use Prediction Based on Climate Properties
Table 7 shows the CEUP simulated monthly energy consumption values in MWh for the twelve buildings used in this study by using the ARIMA forecasting technique. The total energy consumption, normalized by average outdoor temperature values, was simulated with the CEUP tool for year 2018 and was calculated to be 26,676 MWh for the twelve buildings. According to the utility consumption data from UF PPD, we could extrapolate the consumption of this set of buildings to the entire UF campus, based on campus buildings space functionality percentages, and predict 2018 campus energy consumption to be 812,560 MWh.
4.4.3. Extrapolation to Campus Energy Consumption
We calculated UF campus buildings’ energy consumption by extrapolating the consumption of the set of representative buildings to the entire UF campus, based on campus buildings’ space functionality percentages. To estimate campus operational energy consumption under long-term climate change, we used three future climate scenarios based on NARCCAP, namely the median, hottest, and coldest annual average temperature. Considering year 2018 as the simulation baseline, and using the hourly and monthly CEUP approaches described above, we predicted annual campus energy consumption values for the three future climate scenarios. The results in MWh can be found in Table 8.
It can be seen that, based on NARCCAP future weather scenarios, campus energy use over the upcoming 40 years can change by between +3.64% and +19.81%, and should be managed accordingly. CEUP is a credible tool for predicting campus energy use, and given various possible climate scenarios, CEUP can help campus energy managers plan their energy strategies. Also, by using additional building data, we can increase the forecasting accuracy and develop the CEUP model to be representative of campus energy performance.
4.4.4. Scenario Analysis
After calculating the campus energy use for the three future scenarios, we wanted to assess how changes in building thermophysical properties would change the campus energy consumption in the three future scenarios of the median (year 2063), hottest (2057), and coldest (2041). In order to conduct this analysis, we referred to the technology roadmap that is used to assess and present the development of technology products in the building sector. The technology roadmap for the building sector can be categorized into five groups: building envelope, lighting, electronics, HVAC, and energy management, among which our focus is on the building envelope group. According to the Technology Roadmap: Energy Efficient Building Envelopes (IEA, 2013), we introduced five envelope scenarios for each of the future climate scenarios, as can be seen in Table 9.
By updating CEUP results according to the properties of the five envelope scenarios, the configurations of building clusters change, resulting in different campus energy use predictions for the three future climate scenarios. As an example, Figure 9 shows the updated building k-means clustering result for envelope scenario 5.
According to the updated building clustering, CEUP simulation results for the future climate and envelope scenarios can be seen in Figure 10 and Table 10. In all envelope scenarios, campus energy consumption rises when compared to the baseline year of 2018. It should be noted that the highest energy use level happens in year 2057 (hottest climate scenario).
According to the scenario analysis results, scenario 5 had the most influence on campus energy use when compared to the other scenarios. Under this scenario, campus energy use in the three future climate scenarios can be between 1.39% and 16.45% higher than the energy used by the campus in the baseline year 2018.