Building Heat Demand Prediction Based on Reinforcement Learning for Thermal Comfort Management
Abstract
1. Introduction
1.1. Background
1.2. Reinforcement Learning
1.3. Research Objectives
- Based on the DRL algorithm, this study addressed accurate prediction of indoor heat demand to achieve on-demand heating operation.
- The performance of the DRL model was analyzed, including model training, indoor temperature control and the supply–demand error.
- An appropriate reward function was proposed for applying the DRL model to building heat demand prediction tasks.
- A novel evaluation criterion was applied to the prediction model: indoor thermal comfort performance (indoor temperature) was used instead of prediction accuracy to evaluate model performance.
2. Methods
2.1. Reinforcement Learning
2.2. Deep Deterministic Policy Gradient
2.3. Twin Delayed Deep Deterministic Policy Gradient
2.4. Proposed Building Heat Demand Prediction Model
2.4.1. Environment
Physical Parameters
- (1) Building shape
- (2) Parameters of building envelopes
Mathematical Model
- (1) Dynamic heat balance of external walls and roofs
- (2) Dynamic thermal balance of floors
- (3) Building heat demand
- (4) Dynamic equation for indoor temperature (illustrated in the sketch after this list)
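The full dynamic heat-balance equations for the envelope and the indoor air are given in the paper itself. Purely as an illustration of item (4), the sketch below advances the indoor air temperature with a simplified lumped-capacitance balance; the structure of the equation and every numerical value in it (air heat capacity, envelope and ventilation loss coefficients) are assumptions made for this sketch, not the paper's model.

```python
# Illustrative only: a simplified lumped-capacitance update of the indoor air
# temperature. The paper's environment uses the full dynamic heat-balance
# equations for walls, roofs, floors and heat demand; the form and the
# parameter values below are assumptions.

def step_indoor_temperature(t_in, t_out, q_supply, dt=3600.0,
                            c_air=5.6e5,       # J/K, assumed thermal capacity of the room air
                            ua_envelope=45.0,  # W/K, assumed overall envelope loss coefficient
                            ua_vent=10.0):     # W/K, assumed ventilation/infiltration loss coefficient
    """Advance the indoor temperature by one time step (explicit Euler)."""
    q_loss = (ua_envelope + ua_vent) * (t_in - t_out)  # W, heat loss to outdoors
    dt_in = (q_supply - q_loss) * dt / c_air           # K, temperature change over dt
    return t_in + dt_in

# Example: one hour with 1.5 kW of supplied heat at -5 degC outdoors
print(step_indoor_temperature(t_in=21.0, t_out=-5.0, q_supply=1500.0))
```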
2.4.2. Reward Function
- (1) Reward function 1 (R1): only the target interval has a slightly positive reward, as shown in Figure 5.
- (2) Reward function 2 (R2): the reward of the target interval has the same absolute value as the reward of the disqualified intervals, and the suitable interval has a slightly positive reward, as shown in Figure 6.
- (3) Reward function 3 (R3): on the basis of R2, the rewards of the target interval and of the high-temperature sides of the qualified and disqualified intervals are halved, because a low indoor temperature is even less acceptable to residents than a high one in terms of thermal comfort, as shown in Figure 7. (An illustrative sketch of the three reward functions follows this list.)
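The actual reward values are defined by Figures 5–7; the sketch below only illustrates the piecewise structure described above. The interval boundaries are taken from the target (20.5–21.5 °C) and acceptable (19–23 °C) intervals reported in Section 3, while the reward magnitudes are assumptions for illustration.

```python
# Illustrative piecewise reward functions over indoor temperature (degC).
# Interval boundaries follow the target (20.5-21.5 degC) and acceptable
# (19-23 degC) intervals reported in the paper; the reward magnitudes are
# assumed values, not those of Figures 5-7.

TARGET = (20.5, 21.5)     # target interval
QUALIFIED = (19.0, 23.0)  # qualified (acceptable) interval

def r1(t):
    """R1: only the target interval earns a slightly positive reward."""
    return 0.1 if TARGET[0] <= t <= TARGET[1] else 0.0

def r2(t):
    """R2: the target reward mirrors the disqualified penalty in magnitude;
    the rest of the suitable interval earns a slightly positive reward."""
    if TARGET[0] <= t <= TARGET[1]:
        return 1.0            # same absolute value as the disqualified penalty
    if QUALIFIED[0] <= t <= QUALIFIED[1]:
        return 0.1            # slightly positive in the suitable interval
    return -1.0               # disqualified: too cold or too hot

def r3(t):
    """R3: as R2, but the target-interval reward and everything on the warm
    side are halved, so low temperatures weigh more heavily than high ones."""
    base = r2(t)
    return base / 2.0 if t >= TARGET[0] else base
```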
2.4.3. Agent
2.5. Experiment Schemes
2.5.1. Model Settings
- (1) Meteorological data
- (2) Parameter settings for the neural networks
- (3) Parameter settings for the reinforcement learning
2.5.2. Evaluation Indices
3. Results and Discussions
3.1. Model Training
3.2. The Performance of Indoor Temperature Control
3.3. The Performance of On-Demand Heating Operation
4. Conclusions
- (1) The experimental results showed that the design of the reward function significantly influenced the performance of the trained model. The agent trained with R1 failed to achieve even the primary target (1/3 of the total reward), whereas the agents trained with R2 and R3 achieved the advanced and intermediate targets, respectively.
- (2) The model proposed in this study can dynamically control the indoor temperature within the acceptable interval through building heat demand prediction. The experimental results showed that even when the model only reached the primary target during training, it maintained the indoor temperature within the acceptable interval (19–23 °C) for more than 90% of the experiment. Moreover, the proportion of time during which the indoor temperature was controlled within the target interval (20.5–21.5 °C) exceeded 35%, 55% and 70% when the model reached the primary, intermediate and advanced targets, respectively. This study verified the prediction accuracy of building heat demand from the perspective of indoor temperature control performance, and validated the effectiveness and advancement of the proposed model.
- (3) In addition to maintaining the indoor temperature, the proposed model also achieved on-demand heating operation. Although the final heat supply was directly influenced by the indoor temperature control, the experimental results showed that the supply–demand error could be kept below 10% even by a model that only achieved the primary target. The model that achieved the advanced target, which had the best indoor temperature control performance, had a supply–demand error of only 4.56%, providing a basis for the realization of on-demand heating operation. (An illustrative computation of these two indices is sketched after this list.)
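The evaluation indices themselves are defined in Section 2.5.2, which is not reproduced here. As a hedged illustration of the two quantities quoted in the conclusions, the sketch below computes the proportion of time the indoor temperature stays within an interval, together with one plausible definition of the supply–demand error (relative deviation of total supplied heat from total heat demand); the latter definition is an assumption, not necessarily the paper's.

```python
import numpy as np

def time_in_interval(temps, low, high):
    """Fraction of time steps with indoor temperature inside [low, high]."""
    temps = np.asarray(temps, dtype=float)
    return float(np.mean((temps >= low) & (temps <= high)))

def supply_demand_error(q_supply, q_demand):
    """Assumed definition: relative deviation of total supplied heat from
    total heat demand over the experiment."""
    q_supply = np.asarray(q_supply, dtype=float)
    q_demand = np.asarray(q_demand, dtype=float)
    return abs(q_supply.sum() - q_demand.sum()) / q_demand.sum()

# Example with made-up hourly data
temps = [20.8, 21.2, 21.6, 19.4, 22.1]
print(time_in_interval(temps, 20.5, 21.5))   # share of time in the target interval
print(time_in_interval(temps, 19.0, 23.0))   # share of time in the acceptable interval
```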
Author Contributions
Funding
Conflicts of Interest
References
- The State Council Information Office of the People’s Republic of China. Responding to Climate Change: China’s Policies and Actions. 2021. Available online: http://www.scio.gov.cn/ztk/dtzt/44689/47315/index.htm (accessed on 8 September 2022). (In Chinese)
- Yuan, J.; Huang, K.; Han, Z.; Wang, C.; Lu, S.; Zhou, Z. Evaluation of the operation data for improving the prediction accuracy of heating parameters in heating substation. Energy 2022, 238, 121632.
- Tsinghua University Building Energy Research Center. Annual Report on China Building Energy Efficiency; China Construction Industry Publishing House: Guangzhou, China, 2021.
- Zhang, Y.; Wang, J.; Chen, H.; Zhang, J.; Meng, Q. Thermal comfort in naturally ventilated buildings in hot-humid area of China. Build. Environ. 2010, 45, 2562–2570.
- Zheng, X.; Wei, C.; Qin, P.; Guo, J.; Yu, Y.; Song, F.; Chen, Z. Characteristics of residential energy consumption in China: Findings from a household survey. Energy Policy 2014, 75, 126–135.
- Dotzauer, E. Simple model for prediction of loads in district-heating systems. Appl. Energy 2002, 73, 277–284.
- Kuster, C.; Rezgui, Y.; Mourshed, M. Electrical load forecasting models: A critical systematic review. Sustain. Cities Soc. 2017, 35, 257–270.
- Yuan, J.; Wang, C.; Zhou, Z. Study on refined control and prediction model of district heating station based on support vector machine. Energy 2019, 189, 116193.
- Wang, R.; Lu, S.; Feng, W. A novel improved model for building energy consumption prediction based on model integration. Appl. Energy 2020, 262, 114561.
- Wang, R.; Lu, S.; Li, Q. Multi-criteria comprehensive study on predictive algorithm of hourly heating energy consumption for residential buildings. Sustain. Cities Soc. 2019, 49, 101623.
- Chae, Y.T.; Horesh, R.; Hwang, Y.; Lee, Y.M. Artificial neural network model for forecasting sub-hourly electricity usage in commercial buildings. Energy Build. 2016, 111, 184–194.
- Fan, C.; Sun, Y.; Zhao, Y.; Song, M.; Wang, J. Deep learning-based feature engineering methods for improved building energy prediction. Appl. Energy 2019, 240, 35–45.
- Xue, G.; Pan, Y.; Lin, T.; Song, J.; Qi, C.; Wang, Z. District heating load prediction algorithm based on feature fusion LSTM model. Energies 2019, 12, 2122.
- Song, J.; Zhang, L.; Xue, G.; Ma, Y.; Gao, S.; Jiang, Q. Predicting hourly heating load in a district heating system based on a hybrid CNN-LSTM model. Energy Build. 2021, 243, 110998.
- Kim, T.; Cho, S. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81.
- Tai, L.; Paolo, G.; Liu, M. Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Vancouver, BC, Canada, 24–28 September 2017; pp. 31–36.
- Behzadan, V.; Munir, A. Vulnerability of deep reinforcement learning to policy induction attacks. In International Conference on Machine Learning and Data Mining in Pattern Recognition; Springer: Cham, Switzerland, 2017; pp. 262–275.
- Zhao, D.; Chen, Y.; Lv, L. Deep reinforcement learning with visual attention for vehicle classification. IEEE Trans. Cogn. Dev. Syst. 2016, 9, 356–367.
- Liu, G.; Schulte, O.; Zhu, W.; Li, Q. Toward interpretable deep reinforcement learning with linear model U-trees. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Cham, Switzerland, 2018; pp. 414–429.
- Yu, C.; Yang, T.; Zhu, W.; Li, G. Learning shaping strategies in human-in-the-loop interactive reinforcement learning. arXiv 2018, arXiv:1811.04272.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Tso, S.; Liu, K. Hidden Markov model for intelligent extraction of robot trajectory command from demonstrated trajectories. In Proceedings of the IEEE International Conference on Industrial Technology, Shanghai, China, 2–6 December 1996; pp. 294–298.
- Kruger, V.; Herzog, D.L.; Baby, S.; Ude, A.; Kragic, D. Learning actions from observations. IEEE Robot. Autom. Mag. 2010, 17, 30–43.
- Liu, T.; Tan, Z.; Xu, C.; Chen, H.; Li, Z. Study on deep reinforcement learning techniques for building energy consumption forecasting. Energy Build. 2020, 208, 109675.
- Mocanu, E.; Mocanu, D.C.; Nguyen, P.H.; Liotta, A.; Webber, M.E.; Gibescu, M.; Slootweg, J.G. On-line building energy optimization using deep reinforcement learning. IEEE Trans. Smart Grid 2018, 10, 3698–3708.
- Mason, K.; Grijalva, S. A review of reinforcement learning for autonomous building energy management. Comput. Electr. Eng. 2019, 78, 300–312.
- Baghaee, S.; Ulusoy, I. User Comfort and Energy Efficiency in HVAC Systems by Q-learning. In Proceedings of the Signal Processing and Communications Applications Conference (SIU), Izmir, Turkey, 2–5 May 2018; pp. 1–4.
- Faddel, S.; Tian, G.; Zhou, Q.; Aburub, H. Data Driven Q-Learning for Commercial HVAC Control. In Proceedings of the SoutheastCon, Raleigh, NC, USA, 28–29 March 2020.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Tang, H.; Houthooft, R.; Foote, D.; Stooke, A.; Xi Chen, O.; Duan, Y.; Schulman, J.; De Turck, F.; Abbeel, P. #Exploration: A study of count-based exploration for deep reinforcement learning. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 1–18.
- Plappert, M.; Houthooft, R.; Dhariwal, P.; Sidor, S.; Chen, R.Y.; Chen, X.; Asfour, T.; Abbeel, P.; Andrychowicz, M. Parameter space noise for exploration. arXiv 2017, arXiv:1706.01905.
- Fujimoto, S.; Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596.
- Ng, A.Y.; Harada, D.; Russell, S. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the Sixteenth International Conference on Machine Learning (ICML), Bled, Slovenia, 27–30 June 1999; pp. 278–287.
- Devidze, R.; Radanovic, G.; Kamalaruban, P.; Singla, A. Explicable reward design for reinforcement learning agents. Adv. Neural Inf. Process. Syst. 2021, 34, 20118–20131.
- Balakrishnan, S.; Nguyen, Q.P.; Low, B.K.H.; Soh, H. Efficient exploration of reward functions in inverse reinforcement learning via Bayesian optimization. Adv. Neural Inf. Process. Syst. 2020, 33, 4187–4198.
- Amasyali, K.; El-Gohary, N.M. A review of data-driven building energy consumption prediction studies. Renew. Sustain. Energy Rev. 2018, 81, 1192–1205.
Parameters | Values |
---|---|
Length | 4 m |
Width | 3 m |
Height | 2.8 m |
Area of windows | 1.2 m² |
External Walls (from Outside to Inside)

Materials | Thickness (mm) | Thermal Conductivity (W/(m·K)) | Density (kg/m³) | Heat Capacity (kJ/(kg·K))
---|---|---|---|---
Cement mortar | 20 | 0.93 | 1800 | 1.05
Reinforced concrete | 200 | 1.74 | 2500 | 0.92
EPS insulation board | 80 | 0.039 | 30 | 1.38
Cement mortar | 20 | 0.93 | 1800 | 1.05

Roofs (from Top to Bottom)

Materials | Thickness (mm) | Thermal Conductivity (W/(m·K)) | Density (kg/m³) | Heat Capacity (kJ/(kg·K))
---|---|---|---|---
Cement mortar | 20 | 0.93 | 1800 | 1.05
XPS insulation board | 75 | 0.03 | 30 | 1.38
Reinforced concrete | 200 | 1.74 | 2500 | 0.92
Cement mortar | 20 | 0.93 | 1800 | 1.05

Floors (from Top to Bottom)

Materials | Thickness (mm) | Thermal Conductivity (W/(m·K)) | Density (kg/m³) | Heat Capacity (kJ/(kg·K))
---|---|---|---|---
Cement mortar | 20 | 0.93 | 1800 | 1.05
EPS insulation board | 25 | 0.039 | 30 | 1.38
Reinforced concrete | 150 | 1.74 | 2500 | 0.92

External Windows (5 mm/9 mm Argon/5 mm/9 mm Argon/5 mm)

Parameters | Heat Transfer Coefficient (W/(m²·K)) | Solar Heat Gain Coefficient (−) | Effective Area Factor (−)
---|---|---|---
Value | 1.0 | 0.7 | 0.75
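As a quick consistency check on the layer data above, the sketch below computes the steady-state conductive resistance and U-value of the external wall. The internal and external surface resistances (0.11 and 0.04 (m²·K)/W) are assumed standard values and are not taken from the paper.

```python
# Steady-state U-value of the external wall, from the layer table above.
# Each layer is (thickness in m, thermal conductivity in W/(m.K)),
# listed from outside to inside.

wall_layers = [
    (0.020, 0.93),   # cement mortar
    (0.200, 1.74),   # reinforced concrete
    (0.080, 0.039),  # EPS insulation board
    (0.020, 0.93),   # cement mortar
]

R_SI, R_SE = 0.11, 0.04  # assumed internal/external surface resistances, (m^2.K)/W

r_layers = sum(d / k for d, k in wall_layers)  # conductive resistance of the layers
u_value = 1.0 / (R_SI + r_layers + R_SE)       # overall heat transfer coefficient

print(f"Wall conductive resistance: {r_layers:.2f} (m2.K)/W")
print(f"Wall U-value: {u_value:.2f} W/(m2.K)")  # roughly 0.42 W/(m2.K)
```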
Types of Reward Function | Characteristics |
---|---|
Discrete reward function [29] | Contains less information |
Continuous reward function [35] | Contains more information and is suitable for continuous action spaces
Inverse reinforcement learning [36] | Requires a large amount of labeled data
Parameters | Values |
---|---|
Neuron number of the LSTM | 256 |
Activation function of the LSTM | ReLU |
Neuron number of the FC | 128 |
Activation function of the FC | ReLU |
Optimizer | Adam |
Learning rate | 0.001 |
Dropout | 0.5 |
Parameters | Branch of State | Branch of Action |
---|---|---|
Neuron number of the LSTM | 256 | 128 |
Activation function of the LSTM | ReLU | ReLU |
Neuron number of the FC | 128 | - |
Activation function of the FC | ReLU | - |
Optimizer | Adam | Adam |
Learning rate | 0.001 | 0.001 |
Dropout | 0.5 | 0.5 |
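The two tables above list layer sizes and training settings rather than complete architectures. The sketch below is one possible PyTorch reading of them: an LSTM(256) followed by a fully connected layer for the actor, and a state branch (LSTM 256 + FC 128) merged with an action branch (LSTM 128) for the critic. The input and output dimensions, the merge by concatenation, and the output heads are assumptions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """One possible reading of the actor table: LSTM(256) -> FC(128) -> action."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, 256, batch_first=True)
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(256, 128), nn.ReLU(),
                                  nn.Dropout(0.5), nn.Linear(128, action_dim))

    def forward(self, state_seq):                    # (batch, time, state_dim)
        out, _ = self.lstm(state_seq)
        return self.head(out[:, -1])                 # act on the last time step

class Critic(nn.Module):
    """State branch LSTM(256)+FC(128), action branch LSTM(128), merged to a Q-value."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.state_lstm = nn.LSTM(state_dim, 256, batch_first=True)
        self.state_fc = nn.Sequential(nn.ReLU(), nn.Linear(256, 128),
                                      nn.ReLU(), nn.Dropout(0.5))
        self.action_lstm = nn.LSTM(action_dim, 128, batch_first=True)
        self.q_head = nn.Linear(128 + 128, 1)        # assumed merge: concatenation

    def forward(self, state_seq, action_seq):
        s, _ = self.state_lstm(state_seq)
        a, _ = self.action_lstm(action_seq)
        merged = torch.cat([self.state_fc(s[:, -1]), torch.relu(a[:, -1])], dim=-1)
        return self.q_head(merged)

# Optimizers per the tables: Adam with a learning rate of 0.001
actor = Actor(state_dim=8, action_dim=1)             # dimensions assumed
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
```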
Parameters | | Values
---|---|---
Sample time | | 1
Discount factor | | 0.995
Maximum episode | | 1000
Average window length | | 10
Exploration Method | Type of noise | Gaussian action noise
 | Mean | 50
 | Standard deviation | 200
 | Decay of SD | 0.00001
Batch size | | 64
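The exploration settings above (Gaussian action noise with mean 50, standard deviation 200, and a decay of 0.00001) suggest noise added directly to the predicted heat demand during training. The sketch below shows one way such a decaying Gaussian exploration schedule could be applied; the multiplicative decay rule and the non-negativity clipping are assumptions.

```python
import numpy as np

class GaussianActionNoise:
    """Decaying Gaussian action noise following the table: mean 50, initial
    standard deviation 200, decayed by 1e-5 per step (assumed multiplicative)."""
    def __init__(self, mean=50.0, std=200.0, decay=1e-5, min_std=0.0):
        self.mean, self.std, self.decay, self.min_std = mean, std, decay, min_std

    def __call__(self, action):
        noisy = action + np.random.normal(self.mean, self.std)
        self.std = max(self.min_std, self.std * (1.0 - self.decay))
        return max(noisy, 0.0)   # assumed: predicted heat demand cannot be negative

noise = GaussianActionNoise()
explored_action = noise(1500.0)  # e.g. a predicted heat demand of 1500 W
```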
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).