2.1. Classification
Classification algorithms are used to categorize a set of input values. This is achieved by comparing the similarities between new input data and input values from the baseline dataset. Datasets used for classification are constructed from a set of input variables, called features, and an output class, which is a categorical value dependent on the dataset’s input variables. Types of faults in building systems are defined by their failure mode, represented by a categorical value in fault analytics. Classification algorithms generate a mapping between the feature space and each output class. The feature space of a fault-free cooling coil is displayed in Figure 2a, and the feature space for a cooling coil with fouling is displayed in Figure 2b.
The example cooling coil shown has three input variables: a cooling coil valve command, a cooling coil leaving temperature, and a cooling coil inlet (mixed air) temperature. The two data categories, or output classes, in Figure 2 are fault-free coil data and coil-fouling data. For this example, the fault-free class is defined as the baseline. Each output class is paired with a set of input data, which consists of system data from the same year-long baseline period. The fault-free data points are measured directly from the system, without modifications or additional simulation. The coil-fouling points are generated using a first-principles equation with reduced chilled water flowrate values to produce a new cooling coil leaving temperature. In the fouled-coil dataset, this new temperature value replaces the measured cooling coil leaving temperature value from the baseline dataset. A second fault manifestation was simulated by increasing the cooling coil valve command, which may be measured from data if the building compensates for the fouled coil to meet the coil’s temperature setpoint. One output class of data points is shown in each figure for clarity, though the two classes belong to the same dataset and will be analyzed together.
Ongoing measurements from the BMS will be mapped to the feature space and evaluated with a probability of belonging to each class. The objective of the classification algorithm is to determine whether ongoing measurements belong to the baseline or coil fouling point category so that a determination can be made about the state of the system when those points were measured. These determinations are called predictions.
ML analysis predicts point categories for each time-dependent data point. In isolation, a single point’s predicted category communicates information for one instant of time. Additional processing can be used to aggregate point predictions within a time period to generate time-based metrics such as the rate of degradation. For this paper, a metric was developed which aggregates thousands of point predictions over a desired time length, producing an overall metric representing the system state. This metric, Percent of Time, calculates the percent of timestamps which were predicted faulty and can be used to summarize system performance throughout the period. Percent of Time can be used to track the system degradation over time.
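The Percent of Time aggregation described above can be sketched in a few lines of Python; the function name and the hourly example data are illustrative, not part of the paper’s implementation:

```python
# Sketch of the Percent of Time metric: aggregate per-timestamp fault
# predictions (1 = faulty, 0 = baseline) into the share of a period
# predicted faulty. Names and example values are illustrative.

def percent_of_time_faulty(predictions):
    """Return the percent of timestamps whose predicted class is faulty."""
    if not predictions:
        return 0.0
    return 100.0 * sum(predictions) / len(predictions)

# Example: one week of hourly predictions, 42 of 168 flagged faulty.
weekly_predictions = [1] * 42 + [0] * 126
print(percent_of_time_faulty(weekly_predictions))  # 25.0
```

Computed over successive periods, this single number gives an operator a trend line for tracking degradation.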
LightGBM Classification
LightGBM, short for Light Gradient Boosting Machine, is an open-source implementation of decision tree logic [13]. LightGBM performs well on structured datasets, which can be translated directly into a matrix form of rows and columns. Structured datasets, such as BMS trend data, are commonly stored in system databases. Decision trees divide the feature space into regions created as binary partitions [14]. Each region represents a unique prediction for the categorical output Y.
LightGBM is an evolution of decision tree logic due to its inclusion of gradient boosting. Gradient boosting combines a group of learning models into a single model that is more powerful than any individual model in the group. Standard implementations of the Gradient Boosted Decision Tree (GBDT) concept, such as XGBoost [15], struggle with extremely large datasets because their complexity is proportional to the number of features and data points. LightGBM automatically reduces the number of features and data points to improve computational efficiency.
The algorithm achieves this by combining two techniques, gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB), both introduced in the LightGBM paper [13]. GOSS reduces the number of data points in the analysis, while EFB reduces the total number of features in the dataset.
Figure 3 shows two decision boundaries calculated via the LightGBM algorithm using the data in Figure 2. The boundary locations are determined via an iterative process of moving a candidate boundary across each axis and minimizing the sum of squared errors after assigning the points in each region an output class.
Measured points A, B, and C in Figure 3 were selected to illustrate the changes in output class probabilities between different regions of the feature space. The class probabilities of each measured point are tabulated in Table 2. Measured points A and C lie within clearly defined high-density regions of their respective classes, which produces a high prediction probability for each of those points. Measured point B, located between the two high-density regions, is assigned a lower probability for each output class. These class probabilities are used to predict each point’s categorization.
In Figure 4, the prediction made by the decision tree is the class label representing baseline (zero) and coil fouling (one), the two categories of the dataset. Two boundaries defining class regions are shown for clarity in the visualization, though the full model may contain hundreds of boundaries. Figure 4 shows Figure 3a with both prediction boundaries overlaid and the corresponding predicted fault, represented by Y, for each region.
2.2. Regression
While classification algorithms focus on output classes, regression algorithms generate a model to estimate an output using its relationship to the inputs and provide a measure of the data precision. Several researchers have used regression analytics to predict system consumption in different system states [16,17,18,19,20]. Regression model performance improves as features are added to the dataset. Features can be ranked according to their contribution to the model’s output. Relative to higher-ranked features, the lower-ranked features can be removed with less added error to model predictions. To demonstrate feature weights, a regression model was trained using data generated via a 13-variable equation.
Figure 5 shows the R² performance curve of a regression model as variable measurements are added to the model’s list of features, sorted by their contribution to the R² value. The R² loss from removing feature 1 is 0.14, while the R² loss from removing feature 13 is 0.04. Feature 1 therefore contributes over three times as much to minimizing the model’s prediction error as feature 13.
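This ranking procedure can be reproduced in a short sketch: fit a regression with and without a feature and compare the R² loss. The synthetic three-variable equation below (with one dominant and one weak feature) is an illustrative stand-in for the paper’s 13-variable equation:

```python
# Sketch of ranking features by R^2 contribution: fit a least-squares
# model with and without a feature and compare the drop in R^2.
# The synthetic equation and its coefficients are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Feature 0 dominates the output; feature 2 contributes weakly.
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.2 * X[:, 2] \
    + rng.normal(scale=0.5, size=500)

def r_squared(X, y):
    X1 = np.column_stack([X, np.ones(len(y))])   # add intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

r2_full = r_squared(X, y)
loss_f0 = r2_full - r_squared(X[:, 1:], y)   # drop the dominant feature
loss_f2 = r2_full - r_squared(X[:, :2], y)   # drop the weak feature
print(loss_f0 > loss_f2)  # True: the dominant feature costs far more R^2
```

Sorting features by this per-feature R² loss produces exactly the kind of curve shown in Figure 5.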
In a real building, measurements are interdependent. For example, the zone temperature of a room depends on the terminal box leaving air temperature and the volume of air being provided. Yan et al. used principal component analysis (PCA) in a chiller model to rank the relative importance of each sensor reading for predicting the model’s output [21]. The results were used to rank sensors as important to install or maintain so as to minimize model error.
A regression analysis can be used to create a model for noisy datasets. The models produced from this analysis are commonly used for predicting a system’s output when given a set of system inputs. The output variable can be any measurement dependent on the inputs. The output of datasets used in this project’s regression analysis is system energy consumption, which is dependent on variables including outside air temperature, internal loads, and system setpoints.
First-principles equations are used to calculate energy consumption from system measurements. At their simplest, the energy transfer across any component is governed by Equation (1):

Q = ṁ c_p ΔT(1)

where ṁ is the mass flowrate of a substance (air or water), c_p is that substance’s specific heat, and ΔT is the change in temperature across the component. Evaluating Equation (1) requires a set of sensor measurements, including a flowrate sensor to measure ṁ and a pair of temperature sensors to measure the entering and leaving temperatures that define ΔT.
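As a worked instance of Equation (1), the sketch below evaluates the energy transfer for a chilled-water coil; the function name and the flowrate/temperature values are illustrative:

```python
# Sketch of Equation (1): energy transfer Q = m_dot * c_p * dT across a
# component, using illustrative chilled-water values.

def energy_rate(m_dot, c_p, delta_t):
    """Q [kW] from mass flowrate [kg/s], specific heat [kJ/(kg*K)], dT [K]."""
    return m_dot * c_p * delta_t

# Water: c_p ~ 4.19 kJ/(kg*K); 2 kg/s of flow warmed 6 K across the coil.
print(energy_rate(2.0, 4.19, 6.0))  # 50.28 kW
```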
Regression model forecasting can also be used to predict values across two date periods. Building system FDD defines these two periods as a baseline, normally one year, and the year after the building’s HVAC system has either been improved with commissioning or remained unchanged. Traditionally, only temperature measurements are used as the regressor variables for energy use; adding a humidity measurement to the input dataset can improve the prediction.
While many faults in building systems will increase energy consumption, some faults will lower system consumption while adversely affecting occupant comfort. For example, a high zone temperature requires less cold air for conditioning, so this fault reduces energy use while degrading occupant comfort. Whether a fault reduces or increases energy consumption in the building, it represents a change in the system and should be investigated for unintentional effects on occupant comfort.
2.2.1. Advantages of Machine Learning
Developing physical models to represent buildings requires time and expert knowledge [22]. ML model training has an advantage over manual calibration in its ability to automatically infer dependencies between measured and unmeasured variables. As a building ages, components and construction will degrade. Due to this degradation, unmeasured properties in the building will change, with duct leakage potentially increasing and coils potentially beginning to foul. System sensors to directly measure these properties may be missing, though their values can be estimated using related sensors and a physical model of the system.
Temperature-based economizer controls can be defined as a function of the outdoor air temperature, return air temperature, and outdoor air damper command, which can be monitored to detect faults in the economizer. The relationships between the system measurements will change in a malfunctioning economizer. For example, a faulty economizer may mix a higher fraction of outside air into the system in hot external conditions when the damper is expected to minimize outside air in the system according to ASHRAE Standard 62.1.
Manual calibration for physical models requires a series of parameter-tuning and re-evaluation steps to minimize model errors. Typically, a 10% error between the model predictions and measured values is considered acceptable for determining savings in commercial buildings. Model calibration in ML is an automatic grid-searching process, where dozens of tuning parameter configurations are defined and evaluated to minimize model errors. These parameters are properties of the algorithm: learning rate represents the step size between calibration iterations to minimize model loss, tolerance represents the amount of acceptable error in model predictions, and the model’s depth represents the number of layers in a decision tree. The expert knowledge for ML training is an understanding of each tuning parameter and the consequences of changing their values, while expert knowledge for physical model calibration requires an understanding of the building systems and the first-principles equations that govern their operation.
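The grid-searching calibration described above can be sketched as an exhaustive loop over parameter combinations; the grid values and the stand-in scoring function below are illustrative (a real loop would train and validate a model at each configuration):

```python
# Sketch of grid-search calibration: evaluate every combination of tuning
# parameters and keep the configuration with the lowest validation error.
from itertools import product

grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 5, 7],
}

def validation_error(learning_rate, max_depth):
    # Stand-in for train-and-evaluate; a real loop would fit a model here
    # and score its predictions on held-out data.
    return (learning_rate - 0.05) ** 2 + (max_depth - 5) ** 2 * 1e-4

best_params, best_err = None, float("inf")
for lr, depth in product(grid["learning_rate"], grid["max_depth"]):
    err = validation_error(lr, depth)
    if err < best_err:
        best_params, best_err = {"learning_rate": lr, "max_depth": depth}, err

print(best_params)  # {'learning_rate': 0.05, 'max_depth': 5}
```

Because each configuration is evaluated independently, this search requires no judgment between iterations, unlike the tune-and-re-evaluate cycle of manual physical-model calibration.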
2.2.2. Determining Unmeasured Relationships
Both ML and physical models rely on interdependencies to correlate the relationships between measured variables and unmeasured variables. A simple example has been developed to illustrate an ML algorithm’s process of estimating an unmeasured variable.
Learning the interdependencies between measured and unmeasured variables allows ML to adapt its predictions at the component level, unlike generalized rule-based FDD, which analyzes all components under a single rule. ML does this by defining a feature space which can consist of 10 or more features independent of other components in the system. Each feature must be simulated according to its behavior after a fault is introduced to the system. For example, the simulation could modify the data to hold a damper in a constant position, emulating a stuck damper. The defined feature space automatically associates dependencies between system variables to analyze the system.
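The stuck-damper simulation mentioned above amounts to overwriting one measured series with a constant; the function name and trend values in this sketch are illustrative:

```python
# Sketch of a fault simulation: emulate a stuck damper by replacing the
# measured damper-command series with a constant position.

def simulate_stuck_damper(damper_commands, stuck_position):
    """Replace every damper command with the stuck position (%)."""
    return [stuck_position for _ in damper_commands]

measured = [15, 40, 65, 80, 55]           # baseline damper commands (%)
faulted = simulate_stuck_damper(measured, 100)
print(faulted)  # [100, 100, 100, 100, 100]
```

The modified series then replaces the measured feature when generating the faulted class of the training dataset, mirroring the coil-fouling substitution in Section 2.1.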
Consider a terminal box model which is used to predict the total energy output from the terminal box output to the space. The terminal box is damper-controlled with airflow provided by the AHU’s supply air fan. Relevant measured data from the system includes the supply air temperature, space temperature, terminal box damper command, supply duct static pressure, and supply air VFD command. The terminal box’s energy impact is dependent on the temperature and flowrate of the air output to the zone as well as the space temperature in the zone, which is used to calculate ΔT in Equation (1). The supply air temperature is measured by the system and is included as a model input. The supply air flowrate is unmeasured in the system and must be inferred using dependent system measurements.
Figure 6 illustrates the measurements in the system and the relationship between measured and unmeasured variables.
Because the flowrate is directly dependent on the terminal box damper command, supply duct static pressure, and supply air VFD command, the value of the supply air flowrate can be inferred using those three measurements. The supply air flowrate is also indirectly dependent on space temperatures. Space temperatures failing to meet the setpoint may require a higher flowrate of supply air to condition zone loads. The supply air temperature is excluded from the supply air flowrate correlations in this simple model to reduce computational time because air densities had less than a 2% impact on the output variable. If higher precision models are required, the supply air temperature could be included.
Table 3 shows results from two models trained to predict the terminal box energy output. The Used For column contains variable names which have been color-matched to relevant features in the Features column. Model 1 used all three green dependent variables to learn and estimate the supply air flowrate. Model 2 used one green dependent variable to learn and estimate the supply air flowrate. The energy transfer for the supplied air is calculated using the first-principles energy calculation and is compared to each of the two models’ energy outputs. The mean squared error (MSE) averages the squared errors of the model predictions. In this example, Model 2’s MSE is over 13 times Model 1’s MSE.
The lower performance of Model 2 is expected because of the fewer features used to learn and estimate the supply air flowrate. The supply duct static pressure and VFD command measurements are crucial in predicting the terminal box’s energy output because of their correlation to the supply air flowrate. Using features which differ from those used in a rule-based FDD analysis to train an ML model can increase the resolution of the analysis to allow for earlier fault detection. For example, a rule-based analysis for space temperature faults typically analyzes only space temperature values. Including outdoor air temperature or time of day features in an ML model can provide additional context for fault diagnostics.
The example above demonstrates that, provided the relationship between an unmeasured variable and the measured variables in the system can be modeled, ML can automatically derive the relationships between unmeasured and measured variables. Physical models can also predict a building’s measured behavior but require additional manual calibration and expert knowledge of building systems to achieve similar results.
Figure 7 illustrates the training process of a baseline model of a building system. This process was used to train the models discussed in Table 3 with the addition of weather conditions. In the flowchart, the external weather and internal conditions represent measured variables from the system. Relationships between the variables and the model output are inferred in the training process.
A regression model calibrated to the baseline period estimates the baseline system’s consumption; any faults or component degradation occurring after the baseline period will exist in the evaluation period for comparison. Comparisons between the weather-normalized baseline energy consumption, which is predicted by the ML model, and the measured energy consumption can show the improvement from energy efficiency measures or the energy impact of a fault in the system. Weather-normalized baseline evaluations are a standard measurement and verification (M&V) procedure, though physical models are typically used instead of ML models. The procedure is displayed in Figure 8.
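The M&V comparison reduces to subtracting measured consumption from the model’s weather-normalized baseline prediction; the daily kWh values in this sketch are illustrative:

```python
# Sketch of a weather-normalized M&V comparison: the energy impact is the
# measured consumption minus the baseline consumption predicted by the
# model for the same conditions.

def energy_impact(predicted_baseline, measured):
    """Total and percent difference between measured and baseline energy."""
    diff = sum(m - p for p, m in zip(predicted_baseline, measured))
    return diff, 100.0 * diff / sum(predicted_baseline)

baseline_kwh = [900.0, 950.0, 1000.0, 1050.0]    # ML-predicted baseline
measured_kwh = [1000.0, 1050.0, 1150.0, 1285.0]  # measured with a fault
total, pct = energy_impact(baseline_kwh, measured_kwh)
print(total, round(pct, 1))  # 585.0 15.0
```

A positive result indicates a fault’s added consumption; a negative result indicates savings from an efficiency measure.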
To demonstrate the magnitude of the increase in energy consumption when various faults are introduced to a building, a simulated dataset has been created. The evaluation uses data measured from the Zachry building with a simulated cooling coil fouling fault following the procedures outlined in Section 2.1: the chilled water flowrate through the coil was reduced to produce a new fouled-coil dataset, which is compared to the baseline. The flowrate of the system has been normalized in this example dataset. The predicted difference in total system energy consumption between the two system states was 15%.
Figure 9 plots the two consumption calculations and illustrates the magnitude of their difference. The blue series represents baseline consumption, and the orange series represents the system consumption with a fouled cooling coil. The difference between the two series is the change in energy consumption in the system due to the fault. Knowing the change in energy consumption due to each fault in the system enables a building operator to prioritize faults based on a calculated impact on the system, which is normally uncalculated in rule-based analytics.