1. Introduction
The International Energy Agency (IEA) estimated that buildings have become the third largest energy consumer in the world [
1]. Generally, energy usage in buildings is expended on lighting, electrical equipment, and heating, ventilation, and air conditioning (HVAC) systems. HVAC systems, which play an important role in ensuring occupant comfort, are among the largest energy consumers in buildings, accounting for up to 60% of total household energy consumption [1]. Performance enhancements to traditional HVAC systems therefore offer a significant opportunity to reduce energy consumption.
Several studies show that almost 50% [2] of the energy used in buildings in the USA goes to indoor climate conditioning. Worldwide figures for HVAC equipment are similarly high: in Europe, HVAC accounts for around 40% of building energy consumption [3]; in China, it represented about 20% of total energy usage in 2004, with a steady annual increase [4]; in the Middle East, more than 65% of energy consumption is attributed to cooling systems [5]; and in Mexico, cooling accounts for up to 44% of total energy consumption [6].
The growth in building energy consumption is strongly influenced by building design, changing occupant comfort standards, building operation, maintenance, and HVAC system design. Particularly significant has been the rise in energy consumption by HVAC systems, which have become almost essential as the demand for thermal comfort, considered a luxury not long ago, has spread. All of these aspects should be addressed with both energy consumption and occupant comfort in mind.
Thermal comfort refers to human satisfaction with the thermal environment. The design and calculation of air conditioning systems that control the thermal environment, while also achieving an acceptable standard of indoor air quality, should comply with the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) Standard 55-2017 [7]. This standard acknowledges the two main research areas in thermal comfort: thermal physiology and human behavior.
The first area includes two common indicators, the predicted mean vote (PMV) and the predicted percentage of dissatisfied (PPD), together known as Fanger's model [2,3]. The PMV predicts the mean value of the votes on the seven-point thermal sensation scale:
+3 Hot;
+2 Warm;
+1 Slightly warm;
0 Neutral;
−1 Slightly cool;
−2 Cool;
−3 Cold.
The PPD, in turn, predicts the percentage of occupants who feel uncomfortable, with values ranging from 0% to 100%. Thus, an acceptable thermal comfort range for the residential sector spans from slightly warm to slightly cool with at most 20% dissatisfied [2]. ASHRAE-55 calls this the PMV-PPD model, described by Fanger's heat balance equation, considered a milestone in the development of thermal models [8]:
S = (M − W) − (C + R + E) − (Cres + Eres)    (1)
where:
M: the metabolic rate;
W: the mechanical work done;
C: convective heat loss from the clothed body;
R: radiative heat loss from the clothed body;
E: evaporative heat loss from the clothed body;
Cres: convective heat loss from respiration;
Eres: evaporative heat loss from respiration;
S: the rate at which heat is stored in the body tissues.
The ASHRAE-55 standard provides empirical tables with common activities and their respective metabolic rates in met units, as well as clothing insulation values for different garments in clo units. However, fixed-value tables have limitations for real-time analysis and are suited only to statistical analysis or rough estimation.
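To make the PMV-PPD relation concrete, the standard expression for PPD as a function of PMV can be evaluated directly; the following is a minimal sketch using the ISO 7730/ASHRAE form of the curve, not code from the paper.

```python
import math

def ppd_from_pmv(pmv: float) -> float:
    """Predicted Percentage of Dissatisfied (%) as a function of PMV (ISO 7730 / ASHRAE 55 form)."""
    return 100.0 - 95.0 * math.exp(-(0.03353 * pmv**4 + 0.2179 * pmv**2))

print(round(ppd_from_pmv(0.0), 1))   # 5.0  -> even a neutral environment leaves about 5% dissatisfied
print(round(ppd_from_pmv(0.85), 1))  # ~20  -> |PMV| near 0.85 corresponds to the 20% limit cited above
```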
The second area relates to the hypothesis that the perception of thermal comfort depends on outdoor weather conditions and on the adaptive opportunities occupants have to control their own comfort [1,4,9]. This thermal adaptation has been defined in three categories [10]: behavioral adjustment, which includes personal modifications such as removing garments or doing physical activity and external modifications such as opening a window or adjusting the air conditioner; physiological adaptation, referring to acclimatization or even genetic adaptation; and psychological adaptation, referring to expectations shaped by past experiences [10].
Both the PMV and the adaptive models are aggregate models: they are designed to predict the average thermal comfort of large populations and therefore present limitations in real-world scenarios [11]. As many studies have shown [12,13,14,15], the measurement of thermal comfort in office buildings is limited by its subjectivity and by its strong dependence on the six mandatory parameters used for heating and air conditioning setpoint control. Four variables are related to the environment:
Air temperature;
Air speed;
Humidity;
Radiant temperature.
Two variables are related to the occupant:
Metabolic rate;
Clothing insulation.
Furthermore, the work in [16] highlights the difficulty and cost of obtaining some of these variables. For example, mean radiant temperature and air speed are two environmental variables that are not typically monitored, as they require expensive measurement instruments [16]. Moreover, the two personal variables, metabolic rate and clothing insulation, are considered impossible to collect in real time [17]; hence, the process is simplified using assumed values or fixed datasets collected from laboratory or field measurements, which ultimately leads to erroneous estimations [14].
Hence, the contribution of this paper is a methodology for the on-line estimation of the metabolic rate of a single occupant to improve simulations of energy consumption in smart homes with an HVAC system, since there is still no practical way to measure the two occupant-related thermal comfort variables: metabolic rate and clothing insulation [17]. The metabolic rate estimation is based on Human Activity Recognition with RGB-D data, using a skeleton-based model over a 3D representation and a recurrent neural network as the classification method. The RGB-D data reduce privacy issues compared with RGB data, and the 3D skeleton model reduces the amount of data fed to the classifier compared with raw pixel data. The recognized activity is paired with a metabolic rate value that is used as an input variable for the human-centered adaptive thermal comfort approach in a simulation comparing energy savings between fixed setpoints and adaptive setpoints.
This paper is organized as follows:
Section 2 shows the literature overview for human activity recognition.
Section 3 describes the materials and methods used for the proposal.
Section 4 shows the results of the proposal. Finally,
Section 5 discusses and presents the improvements from implementing depth sensors for activity recognition and its impact on energy consumption analysis in HVAC systems.
2. Literature Overview
Thermostats that control HVAC systems are employed in about 85% of households; thus, they represent an opportunity for saving energy at home. Initial approaches for saving energy through connected thermostats are presented with gamification techniques [
18,
19], data analysis [
20], behavior analysis [
21], and usability of interfaces [
22,
23,
24]. A first approach used the adaptive model to analyze the effect of increasing or decreasing the thermostat setpoint depending on the season; energy reductions were achieved while maintaining thermal comfort. That research suggested gathering more information about the householder, for instance on how clothing insulation and metabolic rate affect thermal comfort. Hence, in [
25,
26], the authors proposed measuring thermal comfort in smart homes through dynamic clothing insulation estimated with cameras; the activities were inferred from the householder's position, and the clothing insulation of twelve homes was estimated. They found that energy reductions are achievable but that tailored strategies are required, since not all homes achieved thermal comfort, and in some homes comfort was not met while energy consumption increased. Therefore, in [
27,
28], they proposed using cameras to measure the clo value dynamically through a Convolutional Neural Network (CNN) classification model and to obtain the thermal comfort range of a household in Concord, California. These approaches considered only the
clo value and assumed a metabolic rate of 1.0. Furthermore, in [
25,
26], the authors pointed out the need to measure the activities as well to obtain dynamic thermal comfort models instead of conventional models.
Human Activity Recognition (HAR) deals with physical activities composed of series of physical actions, where a physical action is defined as any bodily movement produced by skeletal muscles and an activity is a sequence of such movements [29]. HAR research focuses on the types of activities humans perform within a time interval and relies on sensor and/or video data analysis. The two types of sensors found in HAR are wearable and ambient sensors [30].
Wearable sensors are attached to the person's body to measure a given attribute such as electrocardiogram (ECG), location, temperature, motion, electroencephalogram (EEG), etc. [
31,
32,
33,
34]. The data from these sensors are typically sent to another device for processing, usually a computer, an embedded system, or a smartphone. Moreover, thanks to their technological progress, smartphones themselves can be used as wearable sensors [
35,
36,
37]. The main disadvantages of wearables are that they require batteries, so charging them can be a nuisance for the user; some data must be synchronized manually when the devices cannot communicate with each other; and users may find them intrusive and choose not to wear them at all [
38].
On the other hand, ambient sensors are not intrusive and can be connected directly to a power source. Video cameras are now low-cost devices for obtaining the necessary data over time; a sequence of images is used directly for human activity recognition and processed by a computer. The disadvantage of cameras is the potential for privacy issues, so strong user acceptance of this technology is required. Other ambient sensors (also known as binary sensors), such as motion detectors, pressure sensors, and contact switches, can be an effective way to track human activity [
39,
40].
The field of HAR has become an important research area due to its increasing number of applications; accordingly, recent surveys offer a precise description of state-of-the-art methods, publicly available databases, and current research challenges [
30,
41,
42,
43,
44]. Vision-based HAR research is divided by data type; the most common are Red, Green, and Blue (RGB) data from a standard camera (CCTV, webcam, etc.) and Red, Green, Blue and Depth (RGB-D) data that incorporate depth information. RGB data achieve lower accuracy than RGB-D data, but the configuration complexity, higher cost, and need for large datasets associated with RGB-D are the reasons why RGB data remain the most widely used [
45].
A key component for vision-based HAR is human body modeling. The three most common types are: skeleton-based model, contour-based model, and volume-based model [
46]. The skeleton-based model represents a set of joint locations following the human body's skeletal structure and is visually identified as a stick figure. The contour-based model contains the contour information of the human body and is often represented by rectangles bounding the person's silhouette. Lastly, the volume-based model is represented in 3D by geometric shapes (cylinders, cones, cubes) or meshes that resemble a human body [
47].
Depending on the data type and the human body model used for HAR, a 2D or 3D data representation can be selected. Regarding the data type, RGB only offers 2D information while RGB-D can work with both 2D and 3D representations. Regarding human body modeling, the skeleton-based model can be used with 2D and 3D data, the contour-based model with 2D data, and the volume-based model with 3D data [
47].
Finally, the most widely used classification methods for human activities can be divided into two groups: neural networks and other machine learning methods [41]. Machine learning methods such as K-nearest neighbors (KNN), Support Vector Machines (SVM), decision trees, and Hidden Markov Models (HMM) are mainly used with wearable sensors and some ambient sensors, while neural network methods such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN) are used with vision-based sensors [
41,
43].
Figure 1 depicts a diagram of the previously described classification of vision-based HAR-related works based on data representation, data type, body modeling, and classification methods. Moreover, the figure highlights the characteristics this paper focuses on.
3. Materials and Methods
Figure 2 presents the general methodology proposed for obtaining MET values depending on the activity detected on-line. A MET is the ratio of the working metabolic rate to the resting metabolic rate, where one MET is the energy spent sitting at rest. First, the data preprocessing obtains the 3D joint coordinates of a human skeleton detected by a depth vision sensor. These 3D data are then transformed to a new 3D reference frame to normalize orientation and size. Next, the transformed 3D data are fed to a Recurrent Neural Network with a classification layer that detects the activity performed by the human; this network is trained with a custom-created database. Finally, a MET value is assigned according to the detected activity. In parallel, a week-long simulation of energy usage in a house with an HVAC adaptive setpoint based on the MET value is performed to estimate the energy savings achievable with the proposed methodology.
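As an illustration of how these stages fit together, the following sketch outlines the on-line loop; the function names and the activity-to-MET values are hypothetical placeholders (the MET values actually used are those listed in Table 1).

```python
import numpy as np

# Illustrative activity-to-MET mapping; placeholder values only, see Table 1 for the values used in the paper.
ACTIVITY_TO_MET = {"sitting": 1.0, "walking": 2.0, "raising arms": 1.5}
ACTIVITIES = list(ACTIVITY_TO_MET)

def estimate_met(model, window: np.ndarray) -> float:
    """window: preprocessed joint data of shape (timesteps, features), e.g. (15, 96)."""
    probabilities = model.predict(window[np.newaxis, ...])[0]  # softmax output over the activities
    activity = ACTIVITIES[int(np.argmax(probabilities))]
    return ACTIVITY_TO_MET[activity]
```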
3.1. Data Preprocessing
The data preprocessing consists of five steps that transform a combination of image and depth information into signals that can be classified to recognize different activities.
Figure 3 depicts the five steps of the proposed methodology. Each step is described next.
The first step is to extract the joints of a skeleton model in a 3D coordinate system using the Azure Kinect Body Tracking SDK (Software Development Kit). The SDK uses the information from the depth sensor built into the Azure Kinect and an ANN to track multiple human bodies at the same time. The remaining preprocessing then operates on the 3D coordinates of each joint, with the objective of making the data invariant to different orientations towards the camera.
The skeleton model consists of 32 joints (
Figure 4a) over a 3D frame of reference that depends on the actual position of the camera as shown in
Figure 4. The 3D coordinates are expressed as metric [X, Y, Z] triplets in millimeters. The origin [0, 0, 0] is located at the focal point of the camera, oriented such that the positive X-axis points right, the positive Y-axis points forward, and the positive Z-axis points up.
As the joint positions and orientations are estimated relative to the depth sensor's global frame of reference, the second preprocessing step is to translate the skeleton joints to a new reference centered on the origin of the coordinate system, eliminating the variability introduced by the subject's distance.
Figure 5 depicts the original joint data in a side view (YZ plane) and a top view (XY plane) together with the translated data, which are now centered so that the pelvis joint lies at the origin (0, 0, 0). In the same
Figure 5, it can be noticed that the 3D model is inclined with respect to the XY plane, even though the real body is perpendicular to the floor (standing). This is due to the actual position of the camera/depth sensor relative to the floor. Hence, the next step is to correct the pitch and roll rotations caused by the camera/depth sensor position, making the activity recognition independent of where the camera/depth sensor is located.
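A minimal numpy sketch of this centering step, assuming the frame is stored as a 32 × 3 array and that the pelvis is joint index 0 (the usual Azure Kinect ordering), could be:

```python
import numpy as np

PELVIS = 0  # assumed index of the pelvis joint in the 32-joint skeleton

def center_on_pelvis(joints_mm: np.ndarray) -> np.ndarray:
    """Translate a (32, 3) array of joint coordinates so the pelvis lies at the origin (0, 0, 0)."""
    return joints_mm - joints_mm[PELVIS]
```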
Figure 6 depicts the roll and pitch angles referenced to the camera and how they can affect the perspective view of the person.
The correction for the camera/depth sensor position uses the device's three-axis Inertial Measurement Unit (IMU) accelerometer to determine the angle of each axis individually. The reference position of the device is taken with the X- and Y-axes in the horizontal plane (0 g field) and the Z-axis orthogonal to the horizon (1 g field at rest).
To perform a rotation in Euclidean space, a rotation matrix is used to transform a vector such as the earth’s gravitational field vector
g. In a 3D coordinate system, rotations about the X-, Y- and Z-axes in a counterclockwise direction, when looking towards the origin, are represented by the matrices in Equations (2)–(4) [
48]:
Any rotation can be expressed as a composition of rotations about three axes (Euler's rotation theorem); thus, the matrix
R, composed in the order roll, pitch, and yaw and applied to the earth's gravitational field of 1
g initially aligned downwards along the Z-axis, is shown in Equation (5) and solved in Equations (6) and (7):
Rewriting Equation (7) relating the normalized accelerometer reading
A to the rotation angles gives Equation (8):
Thus, the roll and pitch angle equations can be obtained by solving Equation (8) with the normalized accelerometer readings, as shown in Equations (9) and (10) [
49]:
where:
Ax—normalized accelerometer reading in X-axis;
Ay—normalized accelerometer reading in Y-axis;
Az—normalized accelerometer reading in Z-axis.
Figure 7 shows the result of step 3 that corrects the human skeleton position by applying pitch and roll rotations. The rotation is applied to the skeleton with Euler angle transformations [
50] using the angles previously calculated from the accelerometer. This process is performed only once, when the activity recognition starts, as the depth sensor is assumed to remain in a fixed position.
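The tilt correction of step 3 can be sketched as follows; the sign conventions for roll and pitch are a common choice consistent with Equations (9) and (10), but the exact convention depends on how the device is mounted.

```python
import numpy as np

def roll_pitch_from_accelerometer(accel: np.ndarray) -> tuple:
    """Roll and pitch (radians) from an accelerometer reading [Ax, Ay, Az], normalized to 1 g."""
    ax, ay, az = accel / np.linalg.norm(accel)
    roll = np.arctan2(ay, az)                        # rotation about the X-axis
    pitch = np.arctan2(-ax, np.sqrt(ay**2 + az**2))  # rotation about the Y-axis
    return roll, pitch

def rotation_x(angle: float) -> np.ndarray:
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rotation_y(angle: float) -> np.ndarray:
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def level_skeleton(joints: np.ndarray, accel: np.ndarray) -> np.ndarray:
    """Undo the camera's roll and pitch so the skeleton is upright in the corrected frame."""
    roll, pitch = roll_pitch_from_accelerometer(accel)
    correction = rotation_y(-pitch) @ rotation_x(-roll)  # inverse of the sensed tilt
    return joints @ correction.T  # apply the same rotation to every (x, y, z) joint row
```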
Step 4 is a yaw rotation of the skeleton (about the Z-axis) so that it always faces the depth sensor (anatomically anterior position), as in
Figure 4a.
Figure 8 depicts the implemented rotation of the skeleton from a side view (YZ plane) and a top view (XY plane). This step uses the relative positions of the left and right clavicle joints, which form a vector that should be parallel to the X-axis, and the nose-head joint vector, which should point along the negative Y-axis. In this way, positional invariance of the skeleton is achieved, so it is always expressed in the same reference. The last transformation step is to eliminate the variance due to the subjects' heights.
As the joint values are expressed in millimeters, the subjects' body dimensions add variability to the classification process that should be removed.
Figure 9 depicts the result of step 5, the normalization of all values to the range −1 to 1, applying Equation (11) to each joint's data.
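Steps 4 and 5 can be sketched in the same style; the clavicle joint indices are assumptions (see Figure 4a for the actual layout), and the min-max scaling is one form consistent with the normalization to [−1, 1] described for Equation (11).

```python
import numpy as np

CLAVICLE_LEFT, CLAVICLE_RIGHT = 4, 11  # assumed joint indices, for illustration only

def align_yaw(joints: np.ndarray) -> np.ndarray:
    """Rotate about the Z-axis so the clavicle-to-clavicle vector becomes parallel to the X-axis."""
    shoulder_vec = joints[CLAVICLE_RIGHT] - joints[CLAVICLE_LEFT]
    yaw = np.arctan2(shoulder_vec[1], shoulder_vec[0])  # angle of the shoulder line in the XY plane
    c, s = np.cos(-yaw), np.sin(-yaw)
    rz = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    return joints @ rz.T

def normalize_to_unit_range(joints: np.ndarray) -> np.ndarray:
    """Min-max scale all coordinates to [-1, 1], removing differences in subject size."""
    lo, hi = joints.min(), joints.max()
    return 2.0 * (joints - lo) / (hi - lo) - 1.0
```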
Finally, this process is repeated for each captured frame, as a set of frames is required to detect the activity. The time series generated from the data of each of the three axes of each joint is used as one feature for the Recurrent Neural Network.
Figure 10 depicts the data that make up the RNN input: the sequential data of each axis for every joint.
3.2. RNN + Activity Classification
Recurrent Neural Networks (RNN) are typically used to solve time series analysis problems, hence the use of this type of network in the Human Activity Recognition problem.
Figure 11 depicts a representation of an RNN where
Xt is some input in the form of a vector representing a time series,
ht is the output hidden state vector, and the blue line is the loop indicating that the output is fed back as an input to the network. Unrolling the basic representation of the RNN, it is clear that the loop allows information to be passed from one step of the network to the next, where
t indexes the observations in time. Therefore, an RNN consists of a function
F that depends on the past state vector and the current input and outputs the current hidden state vector
ht, as stated in Equation (12):
ht = F(ht−1, Xt)    (12)
However, the RNN is highly susceptible to the vanishing gradient problem because the hidden state of one observation is used to train the hidden state of the next, meaning that the cost function of the network is calculated for each observation [51]. Therefore, the cost function calculated at a deep layer is used to update the weights of neurons at the shallow layers; because of the multiplicative nature of the backpropagation algorithm, the gradients calculated at those deep layers have either too small or too large an impact on the weights of neurons in the shallow layers [51]. This effect is depicted in Equation (13), where the gradient of the current state vector
hc with respect to the past state vector
hp is the product of the gradients of all intermediate state vectors:
∂hc/∂hp = ∏ (∂hi/∂hi−1), for i = p+1, …, c    (13)
There are many techniques that attempt to solve the vanishing gradient problem [52,53], but the most important is a specific type of network called the Long Short-Term Memory (LSTM) network. The LSTM solves the problem by setting the weight initialization to 1 and by adding new components to the traditional RNN architecture: a forget gate, an input gate, a cell state, and an output gate.
Figure 12 depicts the difference between a standard RNN (Figure 12a) and an LSTM architecture (Figure 12b).
With the forget gate, operated by a sigmoid function, the magnitude of the LSTM gradient does not decay, thereby avoiding the vanishing gradient problem [54]. The output of the forget gate is between 0 and 1 for each value in the cell state, where 0 means completely forgetting the value and 1 means fully keeping it, as shown in Equation (14):
The next part controls what information is stored in the cell state through the input gate layer, which combines a sigmoid function with a tanh function that creates the vector of new candidate values [54]. The updated cell state is then described by Equation (15):
Finally, the output is a filtered version of the cell state: the output gate, a sigmoid function, decides which parts of the cell state are used [54]. Hence, the output
ht is described by Equation (16):
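For reference, a standard LSTM formulation matching the description of Equations (14)-(16) is given below; the notation is the common textbook one and may differ slightly from the original equations.

```latex
f_t = \sigma\!\left(W_f [h_{t-1}, x_t] + b_f\right)                    % forget gate, Eq. (14)
i_t = \sigma\!\left(W_i [h_{t-1}, x_t] + b_i\right),\quad
\tilde{C}_t = \tanh\!\left(W_C [h_{t-1}, x_t] + b_C\right)             % input gate and candidate values
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t                        % updated cell state, Eq. (15)
o_t = \sigma\!\left(W_o [h_{t-1}, x_t] + b_o\right),\quad
h_t = o_t \odot \tanh(C_t)                                             % output, Eq. (16)
```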
As for the specific LSTM architecture used in this paper, the model is defined as a sequential Keras model with a single hidden LSTM layer and a 10% dropout layer to reduce overfitting to the training data. A dense, fully connected layer interprets the features extracted by the LSTM, and the final output layer implements a softmax function to classify the three activities: raising arms, sitting, and walking. The network inputs consist of 96 values representing the three axis coordinates of each of the 32 joints.
Figure 13 depicts the architecture described and the Python code for implementation.
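A minimal Keras sketch consistent with this description is given below; the number of LSTM and dense units, the optimizer, and the batch size are illustrative assumptions rather than the exact values used by the authors.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

N_TIMESTEPS = 15  # frames per window, captured at 2.5 fps
N_FEATURES = 96   # 3 axes x 32 joints (81 once the 5 head joints are discarded)
N_CLASSES = 3     # raising arms, sitting, walking

model = Sequential([
    LSTM(64, input_shape=(N_TIMESTEPS, N_FEATURES)),  # single hidden LSTM layer (units assumed)
    Dropout(0.10),                                     # 10% dropout to reduce overfitting
    Dense(32, activation="relu"),                      # dense layer interpreting the LSTM features
    Dense(N_CLASSES, activation="softmax"),            # softmax over the three activities
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=10, batch_size=16)  # 10 epochs gave the best results (Table 7)
```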
3.3. Case Study: Dynamic Metabolic Rate Analysis Applied to Thermostats
A dataset of daily activities in 15-min periods over one week was obtained from the RNN classification, giving 672 observations of 3 different activities. Then, two energy simulations were performed for the extremely hot week for a household located in Concord, California. The first simulation was the baseline, which considered the building, electric loads, and occupation schedules presented in [
55] with a fixed metabolic rate value. This home has two conditioned zones, the bedroom two zone and the living room zone; this paper analyzed the living room. The cooling setpoint was 24.4 °C and the heating setpoint 21.7 °C, the same initial values considered in [
27]. As the extremely hot week falls within the summer period, the clo value considered was 0.5 [
7]. The second simulation considered the three different activities in the dining and living zone. The energy model was simulated using Ladybug Tools V1.5.0, a software plugin for Grasshopper by Ladybug Tools LLC, USA [
56,
57].
Then, an energy saving strategy that considers thermal comfort was proposed and compared against the first two simulations. This strategy consists of increasing or decreasing the cooling and heating setpoints by 1 °C [
58,
59] or even turning off the thermostat, depending on the following considerations:
The difference between the outdoor temperature and the operative temperature. Since the operative temperature tends to drift toward the outdoor temperature, we refer to a heating tendency when the outdoor temperature is higher than the operative temperature and a cooling tendency otherwise.
The thermal sensation scale evaluation with the PMV equation [
60]. If the thermal sensation at a particular moment is negative, the occupant tends to feel cool, while a positive value means the occupant tends to feel warm.
Four rules are obtained by combining the two previous considerations, as sketched below. If the natural tendency is heating and the occupant's sensation is negative, the AC is turned off; if the occupant's sensation is positive, the setpoints are decreased by 1 °C. Moreover, if the natural tendency is cooling and the occupant's sensation is negative, the setpoints are increased by 1 °C; on the contrary, if the occupant's sensation is positive, the AC is turned off.
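A compact sketch of these four rules, with hypothetical variable names and applying the 1 °C step to both setpoints, is:

```python
def adjust_setpoints(outdoor_temp, operative_temp, pmv, cooling_sp, heating_sp, step=1.0):
    """Return (hvac_on, cooling_sp, heating_sp) according to the four-rule strategy."""
    heating_tendency = outdoor_temp > operative_temp   # operative temperature drifts upward
    if heating_tendency:
        if pmv < 0:                                    # occupant feels cool: let the space warm naturally
            return False, cooling_sp, heating_sp
        return True, cooling_sp - step, heating_sp - step  # occupant feels warm: lower the setpoints
    if pmv < 0:                                        # cooling tendency, occupant feels cool: raise setpoints
        return True, cooling_sp + step, heating_sp + step
    return False, cooling_sp, heating_sp               # cooling tendency, occupant feels warm: AC off
```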
This strategy is evaluated with two more simulations, the first using the previously described baseline and the second considering the same three recognized activities as in the earlier simulations. Finally, both results are compared in terms of energy consumption and total comfort.
Moreover, the activities were converted into W per person because the energy simulation requires that measure.
Table 1 lists these activities, their metabolic rates, and the corresponding W/person. The W/person was calculated by multiplying the metabolic rate by 58.1 W/m² (equal to 1 met) and by 1.8 m², the skin surface area of an average individual 1.70 m tall and weighing 68 kg [61].
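This conversion is a simple product, for example:

```python
def met_to_watts_per_person(met: float, skin_area_m2: float = 1.8) -> float:
    """Convert a met value to W/person: 1 met = 58.1 W/m2 over an average 1.8 m2 skin surface."""
    return met * 58.1 * skin_area_m2

print(met_to_watts_per_person(1.0))  # ~104.6 W/person for a resting occupant (1.0 met)
```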
Finally, a comparison between the base model and the dynamic activities model was performed. This comparison included the differences between the total hours of thermal comfort and the total kWh HVAC consumption.
4. Results
This section presents the results of two processes: first, the activity recognition depicted in
Figure 2, and then the energy saving simulation with the dynamic setpoint for HVAC systems. First, the activity recognition results obtained with the RNN are shown, along with how a small dataset of activities was created to train the neural network. Then, the evaluation of a simulated house model with an HVAC system is presented, comparing energy savings between a fixed setpoint and an adaptive setpoint.
4.1. Activity Recognition
To show the capability of an RNN to classify activities of daily living (ADL) with the proposed methodology, a small dataset of three activities (sitting, walking, raising arms) was created, since most of the available datasets are vision-based (images) or sensor-based, as reviewed in [
62,
63].
The total data gathered for training included 201,600 values as shown in
Table 2. This corresponds to 40, 50 and 50 training repetitions of the three activities sitting, walking, and raising arms, respectively. Each repetition consists of 15 timesteps captured at 2.5 frames per second, and each observation has 96 values corresponding to the x-, y- and z-axis coordinates of the 32 joints of the 3D human skeleton model. The activities were performed by four different subjects, whose characteristics are shown in
Table 3.
Of the total data gathered, the values of five joints were discarded (nose, left eye, right eye, left ear, and right ear), as they do not provide relevant information for detecting the activity.
Figure 14 depicts the office plan where the training and test data were gathered and the four different positions where the device was located. For the training data, the camera/depth sensor was placed at position 3, while for the testing data the device was placed at each of the four marked positions to evaluate whether the proposed methodology can handle different view perspectives when classifying the activity.
Table 4 shows the position characteristics for each location referenced to the camera/depth sensor.
Figure 15 shows the different view perspectives used for the testing data from position 2 (a) and position 3 (b).
Moreover, different levels of ambient lighting were used for the testing data. The light level was measured with a precision light sensor (model 1127). Three lighting levels were tested for each camera/depth sensor position: fully illuminated (513 lux), partially illuminated (235 lux), and dark (4 lux).
The data recorded for evaluating the model, consisting of 15, 17 and 16 repetitions of sitting, walking, and raising arms, respectively, are shown in
Table 5. The data were recorded in different camera/depth sensor positions (
Figure 14), different lighting, with partial occlusions and with three different subjects (
Table 6) as depicted in
Figure 16.
Because of the stochastic nature of neural network training, different models result even when training with the same data and configuration. Therefore, the training and evaluation of the RNN model were repeated multiple times for each specific number of epochs, and the results were compared.
Table 7 summarizes the mean and standard deviation of the performance of the model for 5, 10, 15 and 20 epochs. The mean gives the average accuracy of the model on the dataset, whereas the standard deviation gives the average variance of the accuracy from the mean.
The best values correspond to the model trained for 10 epochs. In addition,
Figure 17 depicts a confusion matrix showing the performance of the model with the test data.
The results obtained with the trained model show very high accuracy and validate the proposed activity recognition methodology, in which, instead of classifying images, only 81 signals over 15 timesteps are used, representing the three-axis movement of the skeleton joints of a human model.
4.2. Energy Savings Simulation
An experiment consisting of four simulations was carried out to evaluate the power consumption; the simulations were run using Ladybug Tools V1.5.0 by Ladybug Tools LLC, USA. The experiment first consisted of two simulations comparing the estimated energy consumption of a living room with an HVAC system, as described in
Section 3.3, for a constant met value set to 1.1 and with variable met values, as described in
Table 1, emulating the process of activity recognition, as described in
Section 3.1. The simulation evaluates the parameters every 15 min over 24 h for 7 days (a complete week), but only the time between 7:00 and 21:00 was considered in the results, as it is the busiest period for that specific room. The outputs of the simulations for each interval are the following (see the aggregation sketch after this list):
Condition: Value between −3 and +3 representing the PMV index within the thermal sensation scale.
Comfort: Binary value that evaluates whether the occupant is comfortable (1) or not (0) with the current environmental and occupant-related variables according to the adaptive thermal comfort model.
Energy: Energy consumed in kWh.
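For clarity, the weekly figures reported below can be obtained by aggregating these per-interval outputs, for example as follows (variable names are hypothetical):

```python
def summarize(intervals):
    """intervals: list of (condition, comfort, energy_kwh) tuples, one per 15-min step (7:00-21:00)."""
    avg_condition = sum(c for c, _, _ in intervals) / len(intervals)  # ideally close to 0 (neutral)
    total_comfort = sum(comfort for _, comfort, _ in intervals)       # number of comfortable 15-min periods
    total_energy = sum(energy for _, _, energy in intervals)          # kWh over the week
    return avg_condition, total_comfort, total_energy
```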
The results for the first simulation with constant met values and the second simulation with variable met values are listed in
Table 8.
The ideal average condition would be 0, representing a "neutral" thermal sensation for every 15-min period. More positive values indicate a hotter thermal sensation and more negative values a colder one. For the constant met value, the general sensation would be slightly cold; for the variable met simulation, the sensation is almost neutral with a slight tendency toward warm.
The sum of the comfort values represents the number of fifteen-minute periods in which the occupant felt comfortable according to the adaptive thermal comfort model; the higher the value, the more comfortable the occupant. It can be observed that the simulation with constant met values yields a higher sum.
The sum of energy consumed in kWh is the third observable result. For both simulations the consumption is almost the same with a difference of 0.0567 kWh.
The second part of the experiment consists of two more simulations. This time the proposed strategy for saving energy described in
Section 3.3 was applied to the setpoint limits of the thermostat, while the other parameters remained the same as in the first two simulations. The results of these new simulations are shown in
Table 9.
The condition result shows that, with a constant met value, the average sensation for the whole week is colder than with a variable met. Compared with the previous simulations, the condition for the constant met case improved, moving closer to zero.
The sum of comfort for the constant met case is roughly double that of the variable met case. Compared with the first simulations, comfort increased for the constant met but decreased for the variable met.
The sum of energy consumption is almost 1.5 kWh lower for the constant met simulation than for the variable met simulation. However, compared with the first two simulations, both decreased by at least 15% with the proposed energy saving strategy.
5. Discussion
This paper focuses on three main aspects to propose a strategy for reducing the energy consumed by an HVAC system in a building without compromising the thermal comfort of the occupant. The first is the use of a dynamic met value in the comfort calculations that can change according to the activity carried out by the occupant. Moreover, the activity must be detected on-line so that the thermal comfort models are updated as the occupant-related variables change. A vision-based system is therefore the natural choice, as deep learning techniques have progressed significantly [
46] and offer less intrusive sensing.
The second aspect is the use of a depth sensor-based system to recognize human activities of daily living, avoiding the main challenges an RGB-based classification system faces. The presented methodology, which uses a skeleton model with 3D data of 32 joints and a simple LSTM network for classification, shows that activities can be recognized with high accuracy and with less training data than similar publicly available datasets [
46]. Moreover, handling the 3D information made the recognition robust to the position in which the camera was placed, to the orientation of the occupant with respect to the camera while carrying out the activity, and even to the physiological differences among occupants, as demonstrated by the high accuracy obtained in the tests carried out.
The last aspect is the strategy to save energy by increasing or decreasing the heating and cooling setpoints of a connected thermostat by 1 °C. In the simulations, the proposed strategy showed that the comfort level for a constant met value is higher than that for a variable met value, indicating that current models do not give a realistic picture of the occupant's comfort: they estimate higher comfort values when, in real-life scenarios, the comfort values should be lower depending on the occupant's activity. The results also showed that energy consumption decreased by 33% compared with the constant met value simulations and by 14.2% compared with the variable met value simulations. Since the variable met simulation offers more realistic information, it is important to note that the 14.2% energy saving comes with a decrease of 11 points in comfort, meaning that in eleven 15-min time slots over the whole week the occupant did not feel comfortable; that is, 165 fewer comfortable minutes than without the energy saving strategy. A 14.2% energy saving for a 1.63% decrease in comfort can be considered an acceptable strategy; moreover, the decrease in comfort could be mitigated in future work by also accounting for the occupant's ability to change clothing.
Implementing on-line metabolic rate estimation in a connected thermostat opens the possibility of energy saving strategies that are currently limited to the information obtained from the environmental sensors built into the thermostat. The simulation presented in this paper shows a strategy that reaches 14% energy savings compared to a strategy that does not include on-line metabolic rate information, underlining the importance of adding the information of all thermal comfort parameters. Furthermore, incorporating a vision-based sensing system makes it possible to add not only the metabolic rate but also the person's clothing insulation to the thermal comfort analysis, further improving the thermal comfort estimates.