Managing Considerable Distributed Resources for Demand Response: A Resource Selection Strategy Based on Contextual Bandit
Abstract
1. Introduction
1.1. Background and Motivation
1.2. Literature Review and Research Gap
1.3. Contribution and Organization
- (1) A new resource management model that decouples the resource selection problem from the power dispatch problem to avoid redundancy and unnecessary participation.
- (2) A solution strategy based on a contextual bandit with a DQN structure, which treats the performance of the power dispatch problem as the reward and learns the resource selection policy in a closed-loop manner.
- (3) A distributionally robust variant that copes with demand uncertainty in the power dispatch task, whose advantages are demonstrated empirically.
2. Problem Formulation
2.1. The Architecture of the Regional Energy System
2.2. The Power Dispatch Problem
2.2.1. Diesel Generator
2.2.2. Gas Turbine
2.2.3. Curtailable Load
3. Solution Strategy
3.1. The Contextual Bandit Formulation
3.2. The Contextual Bandit with a DQN Structure
Algorithm 1: Contextual bandit approach with a DQN structure
1. Input: batch size, learning rate, and learning parameters.
2. Initialize: the parameters of the DQN-based agent are initialized randomly.
3. for … do
4. for … do
5. Based on the input state, the agent selects a random action with a given exploration probability; otherwise it selects the best action.
6. Execute the action in the environment: solve the power dispatch problem (26) and obtain the reward according to (13).
7. Store the tuple in the agent's buffer.
8. Randomly sample a batch from the buffer and update the agent's network parameters according to (29).
9. if …
10. …
11. else
12. …
13. end if
14. end for
15. end for
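The loop in Algorithm 1 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: a linear scorer stands in for the DQN, `toy_reward` is a hypothetical replacement for solving the power dispatch problem, the action set is three discrete arms, and steps 9 to 13 (unreadable in this copy) are assumed to be an exploration-rate decay. What the sketch does preserve is the bandit structure: the regression target is the immediate reward, with no bootstrapped next-state term.

```python
import random

def q_values(weights, state):
    """Linear stand-in for the Q-network: one score per action."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]

def choose_action(weights, state, epsilon, rng):
    """Step 5: random action with probability epsilon, otherwise greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(weights))
    q = q_values(weights, state)
    return max(range(len(q)), key=q.__getitem__)

def toy_reward(state, action):
    """Hypothetical environment; the paper solves a dispatch problem here."""
    return state[action]  # action i pays off the i-th context feature

def train(n_actions=3, epochs=200, steps=20, batch_size=16, lr=0.05,
          epsilon=1.0, eps_min=0.05, eps_decay=0.98, seed=0):
    rng = random.Random(seed)
    weights = [[0.0] * n_actions for _ in range(n_actions)]
    buffer = []
    for _ in range(epochs):
        for _ in range(steps):
            state = [rng.random() for _ in range(n_actions)]
            action = choose_action(weights, state, epsilon, rng)
            reward = toy_reward(state, action)        # step 6
            buffer.append((state, action, reward))    # step 7
            batch = rng.sample(buffer, min(batch_size, len(buffer)))  # step 8
            for s, a, r in batch:
                # Bandit target: the observed reward only.
                error = q_values(weights, s)[a] - r
                weights[a] = [w - lr * error * x for w, x in zip(weights[a], s)]
        # Assumed content of steps 9-13: decay exploration down to a floor.
        epsilon = max(eps_min, epsilon * eps_decay)
    return weights

if __name__ == "__main__":
    learned = train()
    print(choose_action(learned, [0.1, 0.9, 0.2], 0.0, random.Random(1)))
```

Calling `choose_action` with `epsilon=0.0` after training gives the purely greedy policy, which is how a learned selection rule would be evaluated.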
4. Case Study
4.1. Implementation Detail
4.2. Approaches Comparison
4.3. Results under the Different Number of Selected Resources
4.4. Comparison with Other Approaches for Dealing with the Uncertainty
4.5. Comparison with Other Contextual Bandit-Based Approaches
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Parameters of Resources in the Regional Energy Network
Item | Value |
---|---|
Resource quantity | 100 |
Marginal costs | [0.2, 0.4] |
Ramp up parameters | [30, 50] |
Ramp down parameters | [20, 40] |
Maximum output | [80, 120] |
Item | Value |
---|---|
Resource quantity | 100 |
Generation efficiencies | [0.4, 0.6] |
Ramp up parameters | [50, 60] |
Ramp down parameters | [30, 50] |
Maximum output | [80, 100] |
Item | Value |
---|---|
Resource quantity | 50 |
Maximum curtailable power | [50, 80] |
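As a rough illustration of how the ranges in the three tables above could be turned into a test fleet, the sketch below draws one parameter set per resource. Two assumptions are made here and are not stated in this copy of the paper: the tables are taken to describe diesel generators, gas turbines, and curtailable loads in the order of Sections 2.2.1 to 2.2.3 (the captions are missing), and each parameter is drawn uniformly from its range. The names `FLEET_SPEC` and `sample_fleet` are hypothetical.

```python
import random

# Ranges copied from the three Appendix A tables. The resource-type labels
# are an assumption based on the order of Sections 2.2.1-2.2.3.
FLEET_SPEC = {
    "diesel_generator": (100, {"marginal_cost": (0.2, 0.4),
                               "ramp_up": (30, 50),
                               "ramp_down": (20, 40),
                               "max_output": (80, 120)}),
    "gas_turbine": (100, {"efficiency": (0.4, 0.6),
                          "ramp_up": (50, 60),
                          "ramp_down": (30, 50),
                          "max_output": (80, 100)}),
    "curtailable_load": (50, {"max_curtailable_power": (50, 80)}),
}

def sample_fleet(spec, seed=0):
    """Draw one parameter set per resource, uniformly within each range."""
    rng = random.Random(seed)
    fleet = []
    for kind, (count, ranges) in spec.items():
        for _ in range(count):
            params = {name: rng.uniform(lo, hi)
                      for name, (lo, hi) in ranges.items()}
            fleet.append({"kind": kind, **params})
    return fleet

if __name__ == "__main__":
    fleet = sample_fleet(FLEET_SPEC)
    print(len(fleet), "resources sampled")
```

The fleet size of 100 + 100 + 50 = 250 matches the 250 output neurons listed in the network parameter tables.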
Appendix B. The Detailed Model of the Policy Gradient (PG)-Based Bandit Approach and the Corresponding Parameters
Item | Value |
---|---|
Iteration epochs | 1000 |
No. of neurons in each layer | 64 |
No. of hidden layers | 2 |
No. of neurons in the input layer | 48 |
No. of neurons in the output layer | 250 |
Optimizer | SGD |
Learning rate | 1 × 10⁻³ |
References
- [1] Available online: http://www.nea.gov.cn/ (accessed on 2 June 2022).
- [2] Available online: https://www.gov.cn/ (accessed on 2 April 2021).
- [3] Zhang, Y.; Ai, Q.; Wang, H.; Li, Z.; Huang, K. Bi-level distributed day-ahead schedule for islanded multi-microgrids in a carbon trading market. Electr. Power Syst. Res. 2020, 186, 106412.
- [4] Haider, R.; Annaswamy, A.M. A hybrid architecture for volt-var control in active distribution grids. Appl. Energy 2022, 312, 118735.
- [5] Yang, X.; Wang, Z.; Zhang, H.; Ma, N.; Yang, N.; Liu, H.; Zhang, H.; Yang, L. A Review: Machine Learning for Combinatorial Optimization Problems in Energy Areas. Algorithms 2022, 15, 205.
- [6] Bertsimas, D.; Litvinov, E.; Sun, X.A.; Zhao, J.; Zheng, T. Adaptive Robust Optimization for the Security Constrained Unit Commitment Problem. IEEE Trans. Power Syst. 2013, 28, 52–63.
- [7] Kong, F.; Liu, Y.; Tong, L.; Guo, W.; Qiu, Y.; Wang, Y. Optimization of co-production air separation unit based on MILP under multi-product deterministic demand. Appl. Energy 2022, 325, 119850.
- [8] Khodayar, M.; Liu, G.; Wang, J.; Khodayar, M.E. Deep learning in power systems research: A review. CSEE J. Power Energy Syst. 2020, 7, 209–220.
- [9] Matsuo, Y.; Le Cun, Y.; Sahani, M.; Precup, D.; Silver, D.; Sugiyama, M.; Uchibe, E.; Morimoto, J. Deep learning, reinforcement learning, and world models. Neural Netw. 2022, 152, 267–275.
- [10] Jiang, W.; Liu, Y.; Fang, G.; Ding, Z. Research on short-term optimal scheduling of hydro-wind-solar multi-energy power system based on deep reinforcement learning. J. Clean. Prod. 2023, 385, 135704.
- [11] Li, Z.; Sun, Z.; Meng, Q.; Wang, Y.; Li, Y. Reinforcement learning of room temperature set-point of thermal storage air-conditioning system with demand response. Energy Build. 2022, 259, 111903.
- [12] Qiu, D.; Ye, Y.; Papadaskalopoulos, D.; Strbac, G. A deep reinforcement learning method for pricing electric vehicles with discrete charging levels. IEEE Trans. Ind. Appl. 2020, 56, 5901–5912.
- [13] Ye, Y.; Qiu, D.; Sun, M.; Papadaskalopoulos, D.; Strbac, G. Deep reinforcement learning for strategic bidding in electricity markets. IEEE Trans. Smart Grid 2020, 11, 1343–1355.
- [14] Du, Y.; Zandi, H.; Kotevska, O.; Kurte, K.; Munk, J.; Amasyali, K.; Mckee, E.; Li, F. Intelligent multi-zone residential HVAC control strategy based on deep reinforcement learning. Appl. Energy 2021, 281, 116117.
- [15] Bouneffouf, D.; Rish, I.; Cecchi, G.A.; Feraud, R. Context attentive bandits: Contextual bandit with restricted context. arXiv 2017, arXiv:1705.03821.
- [16] Zhang, Y.; Wu, Q.; Ai, Q.; Catalão, J.P.S. Closed loop Aggregated Baseline Load Estimation using Contextual Bandit with Policy Gradient. IEEE Trans. Smart Grid 2022, 13, 243–254.
- [17] Audibert, J.Y.; Munos, R.; Szepesvári, C. Exploration–exploitation tradeoff using variance estimates in multi-armed bandits. Theor. Comput. Sci. 2009, 410, 1876–1902.
- [18] Silva, N.; Werneck, H.; Silva, T.; Pereira, A.C.M.; Rocha, L. Multi-armed bandits in recommendation systems: A survey of the state-of-the-art and future directions. Expert Syst. Appl. 2022, 197, 116669.
- [19] Lu, S.; Zhou, Y.H.; Shi, J.C.; Zhu, W.; Yu, Q.; Chen, Q.G.; Da, Q.; Zhang, L. Non-stationary Continuum-armed Bandits for Online Hyperparameter Optimization. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event, AZ, USA, 21–25 February 2022; pp. 618–627.
- [20] Kim, G.S.; Hong, Y.S.; Lee, T.H.; Paik, M.C.; Kim, H. Bandit-supported care planning for older people with complex health and care needs. arXiv 2023, arXiv:2303.07053.
- [21] Yang, Y.; Teng, X.; Chen, K.; Cheng, W.; Si, Y.; Xu, J. Multi-armed Bandit Based Load Aggregation for Power System Frequency Regulation. In Proceedings of the 16th Annual Conference of China Electrotechnical Society, Beijing, China, 25–26 September 2021; Springer Nature Singapore: Singapore, 2022; Volume 2, pp. 686–693.
- [22] Chen, X.; Hu, Q.; Shi, Q.; Quan, X.; Wu, Z.; Li, F. Residential HVAC aggregation based on risk-averse multi-armed bandit learning for secondary frequency regulation. J. Mod. Power Syst. Clean Energy 2020, 8, 1160–1167.
- [23] Lei, Z.; Xu, X.; Li, J.; Fan, L.; Chen, X.; Ding, H. Optimal scheduling of Renewable Energy Sources for Grid Frequency Stability Using Multi-armed Bandit Method. In Proceedings of the 2021 IEEE Sustainable Power and Energy Conference (iSPEC), Nanjing, China, 23–25 December 2021; pp. 665–670.
- [24] Sun, J.; Zhao, Y.; Zhang, N.; Chen, X.; Hu, Q.; Song, J. A dynamic distributed energy storage control strategy for providing primary frequency regulation using multi-armed bandits method. IET Gener. Transm. Distrib. 2022, 16, 669–679.
- [25] Hu, Q.; Zhang, N.; Quan, X.; Bai, L.; Wang, Q.; Chen, X. A user selection algorithm for aggregating electric vehicle demands based on a multi-armed bandit approach. IET Energy Syst. Integr. 2021, 3, 295–305.
- [26] Cheng, S.; Han, R.; Zhao, Y.; Hu, Q.; Jiang, W. Aggregating residential demands with a multi-armed bandit approach. In Proceedings of the 2019 IEEE Sustainable Power and Energy Conference (iSPEC), Beijing, China, 21–23 November 2019; pp. 2144–2149.
- [27] Chen, X.; Nie, Y.; Li, N. Online residential demand response via contextual multi-armed bandits. IEEE Control. Syst. Lett. 2020, 5, 433–438.
- [28] Zhao, J.; Zhang, M.; Yu, H.; Ji, H.; Song, G.; Li, P.; Wang, C.; Wu, J. An islanding partition method of active distribution networks based on chance-constrained programming. Appl. Energy 2019, 242, 78–91.
- [29] Doluweera, G.; Hahn, F.; Bergerson, J.; Pruckner, M. A scenario-based study on the impacts of electric vehicles on energy consumption and sustainability in Alberta. Appl. Energy 2020, 268, 114961.
- [30] Ehsan, A.; Yang, Q. Scenario-based investment planning of isolated multi-energy microgrids considering electricity, heating and cooling demand. Appl. Energy 2019, 235, 1277–1288.
- [31] Li, Y.; Ming, B.; Huang, Q.; Wang, Y.; Liu, P.; Guo, P. Identifying effective operating rules for large hydro–solar–wind hybrid systems based on an implicit stochastic optimization framework. Energy 2022, 245, 123260.
- [32] Ben-Tal, A.; El Ghaoui, L.; Nemirovski, A. Robust Optimization; Princeton University Press: Princeton, NJ, USA, 2009.
- [33] Xie, W.; Ahmed, S. Distributionally robust chance constrained optimal power flow with renewables: A conic reformulation. IEEE Trans. Power Syst. 2017, 33, 1860–1867.
- [34] Jin, X.; Liu, B.; Liao, S.; Cheng, C.; Yan, Z. A Wasserstein metric-based distributionally robust optimization approach for reliable-economic equilibrium operation of hydro-wind-solar energy systems. Renew. Energy 2022, 196, 204–219.
- [35] Erdoğan, E.; Iyengar, G. Ambiguous chance constrained problems and robust optimization. Math. Program. 2006, 107, 37–61.
- [36] Amari, S.I.; Karakida, R.; Oizumi, M. Information geometry connecting Wasserstein distance and Kullback–Leibler divergence via the entropy-relaxed transportation problem. Inf. Geom. 2018, 1, 13–37.
- [37] Xie, W. On distributionally robust chance constrained programs with Wasserstein distance. Math. Program. 2021, 186, 115–155.
- [38] Bayraksan, G.; Love, D.K. Data-driven stochastic programming using phi-divergences. Oper. Res. Revolut. Inf. 2015, 10, 1–19.
- [39] Hanasusanto, G.A.; Roitch, V.; Kuhn, D.; Wiesemann, W. Ambiguous joint chance constraints under mean and dispersion information. Oper. Res. 2017, 63, 751–767.
- [40] Zhang, Y.; Shen, S.; Mathieu, J. Distributionally robust chance constrained optimal power flow with uncertain renewables and uncertain reserves provided by loads. IEEE Trans. Power Syst. 2017, 32, 1378–1388.
- [41] Schofield, J.; Tindemans, S.; Carmichael, R.; Tindemans, S.H.; Bilton, M.; Woolf, M.; Strbac, G. Low Carbon London project: Data from the dynamic time-of-use electricity pricing trial, 2013. Tech. Rep. 2016, 7857, 7857.
- [42] Al Mamun, A.; Sohel, M.; Mohammad, N.; Sunny, M.S.H.; Dipta, D.R.; Hossain, E. A comprehensive review of the load forecasting techniques using single and hybrid predictive models. IEEE Access 2020, 8, 134911–134939.
- [43] Feng, Y.; Rios, I.; Ryan, S.M.; Spürkel, K.; Watson, J.P.; Wets, R.J.B.; Woodruff, D.L. Toward scalable stochastic unit commitment. Part 1: Load scenario generation. Energy Syst. 2015, 6, 309–329.
- [44] Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. arXiv 2019, arXiv:1912.01703.
- [45] Hwang, C.L.; Yoon, K. Methods for multiple attribute decision making. In Multiple Attribute Decision Making: Methods and Applications: A State-of-the-Art Survey; Springer: New York, NY, USA, 1981; Volume 186, pp. 58–191.
- [46] Wu, H.; Shahidehpour, M.; Li, Z.; Tian, W. Chance-constrained day-ahead scheduling in stochastic power system operation. IEEE Trans. Power Syst. 2014, 29, 1583–1591.
Ref. | Resource Type | Service | Approach |
---|---|---|---|
[21] | Residential air conditioning | Primary and secondary frequency regulation | Risk-averse MAB |
[22] | Heating, ventilation, and air conditioning (HVAC) | Secondary frequency regulation | Risk-averse MAB |
[23] | Renewable Energy Sources | Secondary frequency regulation | MAB |
[24] | Energy storage | Primary frequency regulation | MAB |
[25] | Electric vehicle | Ancillary services | MAB |
[26] | Residential demand | Demand response | MAB |
[27] | Residential demand | Demand response | Contextual MAB |
This paper | Diesel generator, Gas turbine, Curtailable load | Load balance, Demand response | Contextual MAB |
Item | Value |
---|---|
Batch size | 128 |
No. of hidden layers | 2 |
No. of neurons in the input layer | 48 |
No. of neurons in the output layer | 250 |
No. of neurons in the first hidden layer | 256 |
No. of neurons in the second hidden layer | 512 |
Optimizer | Adam |
Learning rate | 1 × 10⁻⁴ |
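To give a sense of scale for the layer sizes listed in the table above, the following sketch counts trainable parameters, assuming plain fully connected layers with bias terms (the layer type is not stated in this copy, so this is an assumption). For comparison it also counts the PG-based network from Appendix B, which uses two hidden layers of 64 neurons. The helper name `mlp_parameter_count` is hypothetical.

```python
def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network (assumed layer type)."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Layer widths taken from the parameter tables: input, hidden layers, output.
dqn_params = mlp_parameter_count([48, 256, 512, 250])
pg_params = mlp_parameter_count([48, 64, 64, 250])

if __name__ == "__main__":
    print("DQN-based agent:", dqn_params)  # 272378
    print("PG-based agent:", pg_params)    # 23546
```

Under this assumption the DQN-based agent has roughly eleven to twelve times as many parameters as the PG-based one, which is worth keeping in mind when comparing the two approaches.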
Item | K = 50 | K = 75 | K = 100 | K = 125 | K = 150 |
---|---|---|---|---|---|
The cost of the proposed approach | −11,859.38 | −20,146.53 | −23,437.17 | −24,486.76 | −25,228.14 |
The cost of the comparison benchmark | \ | \ | \ | \ | \ |
The number of non-participating resources | 0 | 14 | 29 | 57 | 72 |
Item | K = 175 | K = 200 | K = 225 | K = 250 |
---|---|---|---|---|
The cost of the proposed approach | −25,855.20 | −25,849.38 | −26,056.01 | −26,064.28 |
The cost of the comparison benchmark | \ | \ | \ | −26,064.28 |
The number of non-participating resources | 94 | 117 | 139 | 180 |
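The two tables above can be combined into a quick view of how many selected resources end up being used. The sketch assumes that "non-participating" counts resources that are among the K selected ones but are not dispatched; that reading is an interpretation and is not confirmed by the text available here.

```python
# Values copied from the two tables above.
selected = [50, 75, 100, 125, 150, 175, 200, 225, 250]
non_participating = [0, 14, 29, 57, 72, 94, 117, 139, 180]

def participation(selected_k, idle_k):
    """Return (active resources, share of the selection that is active)."""
    active = selected_k - idle_k
    return active, active / selected_k

summary = {k: participation(k, idle)
           for k, idle in zip(selected, non_participating)}

if __name__ == "__main__":
    for k, (active, share) in summary.items():
        print(f"K = {k}: {active} active ({share:.0%} of selected)")
```

Under this reading the number of active resources stays between 50 and 86 for every K, even as the selection grows to 250.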
Item | K = 75 | K = 100 | K = 125 | K = 150 | K = 175 | K = 200 | K = 225 |
---|---|---|---|---|---|---|---|
Improvement | 34.38% | 15.56% | 10.92% | 5.99% | 7.99% | 7.11% | 4.73% |
Approach | Non-Violation Probability | Profit |
---|---|---|
DRO-10% | 99.94% | 23,437.17 |
CCP-10% | 97.90% | 23,383.58 |
DRO-20% | 99.66% | 23,860.45 |
CCP-20% | 95.38% | 23,151.89 |
DRO-30% | 97.11% | 23,260.54 |
CCP-30% | 93.46% | 23,246.75 |
DRO-40% | 95.31% | 23,742.61 |
CCP-40% | 90.78% | 23,558.23 |
DRO-50% | 96.84% | 24,331.25 |
CCP-50% | 89.26% | 22,911.03 |
RO | 100% | 6274.31 |
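The non-violation probability column can be read as the share of demand scenarios in which the dispatch constraint still holds. The sketch below shows one common way to estimate such a figure empirically; the scalar capacity check and the scenario values are hypothetical, and the paper's own evaluation procedure is not visible in this copy.

```python
def non_violation_probability(capacity, demand_scenarios):
    """Fraction of scenarios whose demand does not exceed the capacity."""
    satisfied = sum(1 for demand in demand_scenarios if demand <= capacity)
    return satisfied / len(demand_scenarios)

# Hypothetical out-of-sample demand scenarios for a single time step.
scenarios = [90.0, 95.0, 100.0, 105.0, 110.0]

if __name__ == "__main__":
    print(non_violation_probability(104.0, scenarios))  # 0.6
    print(non_violation_probability(110.0, scenarios))  # 1.0
```

A fully robust schedule corresponds to the second call, where every scenario is covered, which mirrors the 100% entry for RO and its much lower profit in the table.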
Approach | The DQN-Based Bandit Approach | The PG-Based Bandit Approach | Profit Improvement |
---|---|---|---|
K = 50 | 11,859.38 | 8097.61 | 46.46% |
K = 75 | 20,146.53 | 17,977.75 | 12.06% |
K = 100 | 23,437.17 | 20,578.44 | 13.89% |
K = 125 | 24,486.76 | 24,110.70 | 1.56% |
K = 150 | 25,228.14 | 25,140.37 | 0.35% |
K = 175 | 25,855.20 | 25,491.63 | 1.43% |
K = 200 | 25,849.38 | 25,587.45 | 1.02% |
K = 225 | 26,056.01 | 25,936.18 | 0.46% |
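The profit improvement column is consistent with the relative difference (DQN profit minus PG profit) divided by PG profit. The sketch below recomputes it from the profits in the table; the formula is inferred from the numbers rather than quoted from the paper.

```python
# (DQN-based profit, PG-based profit, reported improvement in %) per K.
rows = {
    50: (11859.38, 8097.61, 46.46),
    75: (20146.53, 17977.75, 12.06),
    100: (23437.17, 20578.44, 13.89),
    125: (24486.76, 24110.70, 1.56),
    150: (25228.14, 25140.37, 0.35),
    175: (25855.20, 25491.63, 1.43),
    200: (25849.38, 25587.45, 1.02),
    225: (26056.01, 25936.18, 0.46),
}

def improvement_pct(dqn_profit, pg_profit):
    """Relative profit gain of the DQN-based approach, in percent."""
    return (dqn_profit - pg_profit) / pg_profit * 100.0

if __name__ == "__main__":
    for k, (dqn, pg, reported) in rows.items():
        print(f"K = {k}: computed {improvement_pct(dqn, pg):.2f}%, "
              f"reported {reported:.2f}%")
```

The gain is largest for small K and shrinks below 2% once K reaches 125, where both approaches have enough resources to choose from.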
Approach | The DQN-Based Bandit Approach | The PG-Based Bandit Approach |
---|---|---|
K = 50 | 0 | 0 |
K = 75 | 14 | 6 |
K = 100 | 29 | 20 |
K = 125 | 57 | 55 |
K = 150 | 72 | 74 |
K = 175 | 94 | 98 |
K = 200 | 117 | 115 |
K = 225 | 139 | 139 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cite as: Li, Zhaoyu, and Qian Ai. 2023. "Managing Considerable Distributed Resources for Demand Response: A Resource Selection Strategy Based on Contextual Bandit" Electronics 12, no. 13: 2783. https://doi.org/10.3390/electronics12132783