A Q-Cube Framework of Reinforcement Learning Algorithm for Continuous Double Auction among Microgrids
Abstract
1. Introduction
- (1) How the QL algorithm can be combined with energy trading mechanisms to better capture the characteristics of the future energy market. Bidding in the future energy market, enhanced by ICTs, is close to real time, so the iteration of Q-values should be round-based rather than hour-based or day-based; the time scale of updating Q-values in [29] cannot reflect the latest market status. In addition, in a multi-microgrid system, the QL algorithm should be carried out separately by each microgrid. The authors in [34] suggested applying a fitted-Q iteration algorithm in the electric power system, but more appropriate methods still need to be proposed.
- (2) How the coupling relationship between bidding price and bidding quantity should be modeled and reflected in the Q-values of the QL algorithm. Little research in the above literature has addressed the impact of bidding quantity on bidding results. Wang's work touched on the issue of bidding quantity [25], but only the bidding strategies of sellers were discussed, and the energy trading game presented there adopted a discontinuous pricing rule. The impact of BESS on adjusting bidding quantity was mentioned in [35] without considering the ramping restriction, which is impractical in realistic scenes. The authors in [36] applied the extended fitted-Q iteration algorithm to control the operation modes of battery storage devices in a microgrid; however, only three actions were taken into consideration and the (dis)charge rate constraints were ignored.
- (3) How the QL parameters should be set by each microgrid, considering the real-time energy market status, microgrid preferences, historical trading records, and other time-varying factors. In QL algorithms, the risk characteristic of a microgrid is reflected by the values of its QL parameters. However, the existing QL-based models try to identify the bidding status only from the experience gained in a series of trials during the current bidding rounds, ignoring the importance of historical trading records. The authors in [30,31] noticed this issue, but the relationship between QL parameters and bidding performance was not analyzed in detail. In addition, the progress of QL research in other areas [37] has not yet been introduced into the energy trading market.
- (1) The energy trading problem among microgrids in the distribution network is framed as a sequential decision problem. The non-cooperative operation of the energy market and the bidding behaviors are modeled with a continuous double auction mechanism, which reduces the need for centralized control and suits the weakly centralized nature of the distribution network.
- (2) The high-dimensional continuous problem is tackled with the Q-learning algorithm. Besides the bidding price, the bidding quantity of each microgrid is treated as a second dimension of the bidding action space and can be adjusted during the bidding process with the assistance of BESS, which handles the coupling between energy trading price and quantity. The related parameter-setting and sharing mechanisms are designed accordingly.
- (3) A non-tabular representation of Q-values covering the two dimensions of the action space is designed as a Q-cube. The Q-value distribution in the proposed Q-cube is in accordance with the behavior preferences of the microgrids.
- (4) The real-time supply and demand relationship is highlighted as the state in the proposed Q-learning algorithm: a normal probability density distribution is divided into eight equal-probability parts, which serve as eight states for all the microgrids. In addition, the idea of 'local search' from heuristic algorithms is applied when generating the action space. This approach not only takes the characteristics of power grids into consideration, but also strikes a compromise between exploitation and exploration in the action space (see the sketch after this list).
- (5) The proposed continuous double auction mechanism and Q-learning algorithm are validated on a realistic case from Hongfeng Lake, Guizhou Province, China. Profit comparisons with the traditional and P2P energy trading mechanisms demonstrate the feasibility and efficiency of the proposed method: the Q-learning based continuous double auction mechanism increases the overall profit of the distribution network by 65.7% and 10.9%, respectively, compared with those two mechanisms.
2. Mechanism Design for Continuous Double Auction Energy Trading Market
2.1. Non-Cooperative Energy Trading Market Overview
2.2. Q-Learning Based Continuous Double Auction Mechanism
2.2.1. Initialization Setups
2.2.2. Energy Sharing Mechanism
- (1) Form a bidding layer according to the above-mentioned method and proceed to (2).
- (2) Allocate the energy in this layer. If energy demand exceeds supply in this layer, the sharing process ends after allocation. If energy supply exceeds demand in this layer, proceed to (3).
- (3) Remove the buyer microgrids in this layer from the optional sharing list, as their energy demands are satisfied. Remove the seller microgrids whose selling prices are higher than the current highest buying price, as there are no potential buyers for them. Return to (1) to form a new bidding layer. (A code sketch of this loop follows.)
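As a reading aid, the Python sketch below traces steps (1)-(3). Because the layer-formation rule is specified earlier in this section and not restated here, the sketch assumes a simplified rule: a layer pairs the highest-bidding buyers with every seller asking no more than that bid, and energy inside a layer is allocated pro rata. The data structures and names are hypothetical.

```python
def share_energy(buyers, sellers):
    """Trace of sharing steps (1)-(3) for one round.

    buyers / sellers: {mg_id: [price_CNY_per_kWh, quantity_kWh]}
    (hypothetical shapes). Returns a list of (seller, buyer, kWh) deals.
    """
    deals = []
    while buyers and sellers:
        # (1) form a bidding layer: assumed here to pair the highest-bidding
        #     buyers with every seller asking no more than that bid
        top_bid = max(v[0] for v in buyers.values())
        layer_buyers = {k: v for k, v in buyers.items() if v[0] == top_bid}
        layer_sellers = {k: v for k, v in sellers.items() if v[0] <= top_bid}
        if not layer_sellers:
            break  # no potential seller left for any remaining buyer
        demand = sum(v[1] for v in layer_buyers.values())
        supply = sum(v[1] for v in layer_sellers.values())
        # (2) allocate the energy in this layer pro rata
        fill = min(demand, supply)
        for b, bv in layer_buyers.items():
            for s, sv in layer_sellers.items():
                deals.append((s, b, fill * (bv[1] / demand) * (sv[1] / supply)))
        if demand >= supply:
            break  # demand exceeds supply: sharing ends after allocation
        # (3) supply exceeds demand: satisfied buyers leave the sharing list,
        #     sellers keep their unsold energy, over-priced sellers drop out
        for s, sv in layer_sellers.items():
            sv[1] *= 1.0 - fill / supply
        for b in layer_buyers:
            del buyers[b]
        new_top = max((v[0] for v in buyers.values()), default=0.0)
        sellers = {k: v for k, v in sellers.items() if v[0] <= new_top and v[1] > 0}
    return deals
```

For instance, `share_energy({"MG1": [1.0, 50]}, {"MG6": [0.8, 80]})` returns a single 50 kWh deal from MG 6 to MG 1 and terminates once the buyer list is empty.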
3. A Q-Cube Framework for Q-Learning in a Continuous Double Auction Energy Trading Market
3.1. Basic Definitions for Q-Learning
3.1.1. State Space
3.1.2. Action
3.1.3. Q-Values and Rewards
3.2. Update Mechanism of the Proposed Q-Cube Framework
3.2.1. Q-Value Update
3.2.2. Action Update
4. Case Studies and Simulation Results
- Conservative Strategy: a high learning rate α (in [0.6, 1]) indicates that the MGO is greedy about new bidding information, while a high greedy degree ε (in [0.6, 1]) indicates that he is conservative about the basic QLCDA bidding actions. In addition, he is satisfied with a low discount factor γ (in [0, 0.4]), as future revenue is not important to him.
- In-Between Strategy: ordinary choices of all three parameters, in [0.4, 0.6].
- Risk-Taking Strategy: the MGO is not greedy about new bidding information (low learning rate α, in [0, 0.4]) but, as a risk-taker, likes to explore more potential actions (low greedy degree ε, in [0, 0.4]). Meanwhile, he is eager for more future profit (high discount factor γ, in [0.6, 1]). (A parameter-sampling sketch follows this list.)
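A small helper, sketched below, turns these qualitative strategies into concrete (α, ε, γ) triples. The paper fixes only the ranges, so uniform sampling and all names here are assumptions for illustration.

```python
import numpy as np

# Hypothetical mapping from the three strategies above to the quoted ranges
# for (learning rate alpha, greedy degree epsilon, discount factor gamma).
STRATEGY_RANGES = {
    "conservative": {"alpha": (0.6, 1.0), "epsilon": (0.6, 1.0), "gamma": (0.0, 0.4)},
    "in_between":   {"alpha": (0.4, 0.6), "epsilon": (0.4, 0.6), "gamma": (0.4, 0.6)},
    "risk_taking":  {"alpha": (0.0, 0.4), "epsilon": (0.0, 0.4), "gamma": (0.6, 1.0)},
}

def draw_parameters(strategy: str, rng: np.random.Generator) -> dict:
    """Draw one (alpha, epsilon, gamma) triple uniformly from the ranges."""
    return {name: rng.uniform(lo, hi)
            for name, (lo, hi) in STRATEGY_RANGES[strategy].items()}

# Example: QL parameters for a risk-taking MGO
params = draw_parameters("risk_taking", np.random.default_rng(42))
```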
4.1. Case Study 1: Bidding Performance of the Proposed Continuous Double Auction Mechanism
4.1.1. Bidding Performance of the Overall Energy Trading Market
4.1.2. Bidding Results of Specific Microgrids with Different Roles
4.2. Case Study 2: Effectiveness Verification of the Proposed Q-Cube Framework
4.3. Case Study 3: Profit Analysis on Different Energy Trading Mechanisms
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
EI | Energy Internet |
DER | Distributed Energy Resource |
DG | Distributed Generation |
BESS | Battery Energy Storage System |
EV | Electric Vehicle |
DL | Dispatchable Load |
RES | Renewable Energy Source |
ICT | Information and Communication Technology |
MGO | Microgrid Operator |
DNO | Distribution Network Operator |
CDA | Continuous Double Auction |
P2P | Peer-to-Peer |
RL | Reinforcement Learning |
QL | Q-learning |
MDP | Markov Decision Process |
ICD | Internal Coordination Dispatch |
SOC | State of Charge |
QLCDA | Q-learning based Continuous Double Auction |
SDR | Supply and Demand Relationship |
Appendix A. Supplementary Case Data from the Guizhou Grid, China
BESS Parameter | MG 1 | MG 2 | MG 3 | MG 4 | MG 5 | MG 6 | MG 7 |
---|---|---|---|---|---|---|---|
Capacity (kWh) | 40 | 20 | 80 | 100 | 100 | 300 | 80 |
Initial SOC (%) | 33.74 | 30.60 | 68.78 | 68.05 | 49.65 | 54.93 | 60.59 |
Charge & Discharge Restriction (%) | 24.05 | 16.18 | 16.39 | 15.55 | 15.98 | 16.15 | 18.23 |
Charge & Discharge Efficiency | 0.8543 | 0.9207 | 0.9465 | 0.8756 | 0.9156 | 0.9400 | 0.9309 |

BESS Parameter | MG 8 | MG 9 | MG 10 | MG 11 | MG 12 | MG 13 | MG 14 |
---|---|---|---|---|---|---|---|
Capacity (kWh) | 150 | 70 | 50 | 60 | 100 | 200 | 150 |
Initial SOC (%) | 60.59 | 44.99 | 32.84 | 59.05 | 48.00 | 69.58 | 62.65 |
Charge & Discharge Restriction (%) | 18.23 | 20.91 | 19.93 | 20.91 | 23.39 | 18.96 | 18.87 |
Charge & Discharge Efficiency | 0.9309 | 0.8899 | 0.8933 | 0.8791 | 0.8891 | 0.9004 | 0.8622 |
Time Interval | Interval Type | Price (CNY/kWh) |
---|---|---|
8 a.m.–11 a.m., 6 p.m.–9 p.m. | Peak | 1.197 |
6 a.m.–8 a.m., 11 a.m.–6 p.m., 9 p.m.–10 p.m. | Flat | 0.744 |
10 p.m.–6 a.m. | Valley | 0.356 |
QL Parameter | MG 1 | MG 2 | MG 3 | MG 4 | MG 5 | MG 6 | MG 7 |
---|---|---|---|---|---|---|---|
Learning Rate | 0.5107 | 0.3205 | 0.7124 | 0.7969 | 0.7169 | 0.2617 | 0.6210 |
Discount Factor | 0.5240 | 0.6423 | 0.7569 | 0.3373 | 0.7781 | 0.6721 | 0.4231 |
Greedy Degree | 0.6564 | 0.3564 | 0.8156 | 0.4961 | 0.3485 | 0.1680 | 0.4894 |

QL Parameter | MG 8 | MG 9 | MG 10 | MG 11 | MG 12 | MG 13 | MG 14 |
---|---|---|---|---|---|---|---|
Learning Rate | 0.3546 | 0.5083 | 0.3291 | 0.5566 | 0.2371 | 0.6812 | 0.2996 |
Discount Factor | 0.4670 | 0.7065 | 0.2570 | 0.4146 | 0.6397 | 0.3330 | 0.4062 |
Greedy Degree | 0.3648 | 0.2658 | 0.6173 | 0.3173 | 0.4983 | 0.8462 | 0.7943 |
Parameter | |||||||||
---|---|---|---|---|---|---|---|---|---|
Value | 0.50 | 0 | 0.30 | 1.20 | 0.40 | 0.98 | 0.30 | 0.10 | 0.70 |
References
1. Ampatzis, M.; Nguyen, P.H.; Kling, W. Local electricity market design for the coordination of distributed energy resources at district level. In IEEE PES Innovative Smart Grid Technologies, Europe; IEEE: Piscataway, NJ, USA, 2014; pp. 1–6.
2. Al-Awami, A.T.; Amleh, N.A.; Muqbel, A.M. Optimal Demand Response Bidding and Pricing Mechanism With Fuzzy Optimization: Application for a Virtual Power Plant. IEEE Trans. Ind. Appl. 2017, 53, 5051–5061.
3. Dehghanpour, K.; Nehrir, M.H.; Sheppard, J.W.; Kelly, N.C. Agent-Based Modeling of Retail Electrical Energy Markets With Demand Response. IEEE Trans. Smart Grid 2018, 9, 3465–3475.
4. Jeong, G.; Park, S.; Lee, J.; Hwang, G. Energy Trading System in Microgrids With Future Forecasting and Forecasting Errors. IEEE Access 2018, 6, 44094–44106.
5. Wei, W.; Liu, F.; Mei, S. Energy Pricing and Dispatch for Smart Grid Retailers Under Demand Response and Market Price Uncertainty. IEEE Trans. Smart Grid 2015, 6, 1364–1374.
6. Olson, M.; Rassenti, S.; Rigdon, M.; Smith, V. Market Design and Human Trading Behavior in Electricity Markets. IIE Trans. 2003, 35, 833–849.
7. Nunna, H.S.V.S.K.; Doolla, S. Energy Management in Microgrids Using Demand Response and Distributed Storage—A Multiagent Approach. IEEE Trans. Power Deliv. 2013, 28, 939–947.
8. Nunna, H.S.V.S.K.; Doolla, S. Demand Response in Smart Distribution System With Multiple Microgrids. IEEE Trans. Smart Grid 2012, 3, 1641–1649.
9. Sueyoshi, T.; Tadiparthi, G. Wholesale Power Price Dynamics Under Transmission Line Limits: A Use of an Agent-Based Intelligent Simulator. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2008, 38, 229–241.
10. Kasbekar, G.S.; Sarkar, S. Pricing games among interconnected microgrids. In Proceedings of the 2012 IEEE Power and Energy Society General Meeting, San Diego, CA, USA, 22–26 July 2012; pp. 1–8.
11. Zhang, C.; Wu, J.; Zhou, Y.; Cheng, M.; Long, C. Peer-to-Peer energy trading in a Microgrid. Appl. Energy 2018, 220, 1–12.
12. Devine, M.T.; Cuffe, P. Blockchain Electricity Trading Under Demurrage. IEEE Trans. Smart Grid 2019, 10, 2323–2325.
13. Bunn, D.W.; Oliveira, F.S. Agent-based simulation: An application to the new electricity trading arrangements of England and Wales. IEEE Trans. Evol. Comput. 2001, 5, 493–503.
14. Vytelingum, P.; Cliff, D.; Jennings, N.R. Strategic bidding in continuous double auctions. Artif. Intell. 2008, 172, 1700–1729.
15. Nicolaisen, J.; Petrov, V.; Tesfatsion, L. Market power and efficiency in a computational electricity market with discriminatory double-auction pricing. IEEE Trans. Evol. Comput. 2001, 5, 504–523.
16. Wang, J.; Wang, Q.; Zhou, N.; Chi, Y. A Novel Electricity Transaction Mode of Microgrids Based on Blockchain and Continuous Double Auction. Energies 2017, 10, 1971.
17. Tan, Z.; Gurd, J.R. Market-based grid resource allocation using a stable continuous double auction. In Proceedings of the 2007 8th IEEE/ACM International Conference on Grid Computing, Austin, TX, USA, 19–21 September 2007; pp. 283–290.
18. Long, C.; Wu, J.; Zhou, Y.; Jenkins, N. Peer-to-peer energy sharing through a two-stage aggregated battery control in a community Microgrid. Appl. Energy 2018, 226, 261–276.
19. Wang, N.; Xu, W.; Xu, Z.; Shao, W. Peer-to-Peer Energy Trading among Microgrids with Multidimensional Willingness. Energies 2018, 11, 3312.
20. Tushar, W.; Saha, T.K.; Yuen, C.; Liddell, P.; Bean, R.; Poor, H.V. Peer-to-Peer Energy Trading With Sustainable User Participation: A Game Theoretic Approach. IEEE Access 2018, 6, 62932–62943.
21. Rocchetta, R.; Bellani, L.; Compare, M.; Zio, E.; Patelli, E. A reinforcement learning framework for optimal operation and maintenance of power grids. Appl. Energy 2019, 241, 291–301.
22. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
23. Glavic, M.; Fonteneau, R.; Ernst, D. Reinforcement Learning for Electric Power System Decision and Control: Past Considerations and Perspectives. IFAC-PapersOnLine 2017, 50, 6918–6927.
24. Foruzan, E.; Soh, L.; Asgarpoor, S. Reinforcement Learning Approach for Optimal Distributed Energy Management in a Microgrid. IEEE Trans. Power Syst. 2018, 33, 5749–5758.
25. Wang, H.; Huang, T.; Liao, X.; Abu-Rub, H.; Chen, G. Reinforcement Learning for Constrained Energy Trading Games With Incomplete Information. IEEE Trans. Cybern. 2017, 47, 3404–3416.
26. Cai, K.; Niu, J.; Parsons, S. Using Evolutionary Game-Theory to Analyse the Performance of Trading Strategies in a Continuous Double Auction Market. In Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning; Tuyls, K., Nowe, A., Guessoum, Z., Kudenko, D., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2008; pp. 44–59.
27. Schvartzman, L.J.; Wellman, M.P. Stronger CDA Strategies Through Empirical Game-theoretic Analysis and Reinforcement Learning. In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS '09), Budapest, Hungary, 10–15 May 2009; International Foundation for Autonomous Agents and Multiagent Systems: Richland, SC, USA, 2009; Volume 1, pp. 249–256.
28. O'Neill, D.; Levorato, M.; Goldsmith, A.; Mitra, U. Residential Demand Response Using Reinforcement Learning. In Proceedings of the 2010 First IEEE International Conference on Smart Grid Communications, Gaithersburg, MD, USA, 4–6 October 2010; pp. 409–414.
29. Naghibi-Sistani, M.B.; Akbarzadeh-Tootoonchi, M.R.; Javidi-Dashte Bayaz, M.H.; Rajabi-Mashhadi, H. Application of Q-learning with temperature variation for bidding strategies in market based power systems. Energy Convers. Manag. 2006, 47, 1529–1538.
30. Rahimiyan, M.; Rajabi Mashhadi, H. Supplier's optimal bidding strategy in electricity pay-as-bid auction: Comparison of the Q-learning and a model-based approach. Electr. Power Syst. Res. 2008, 78, 165–175.
31. Rahimiyan, M.; Mashhadi, H.R. An Adaptive Q-Learning Algorithm Developed for Agent-Based Computational Modeling of Electricity Market. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2010, 40, 547–556.
32. Salehizadeh, M.R.; Soltaniyan, S. Application of fuzzy Q-learning for electricity market modeling by considering renewable power penetration. Renew. Sustain. Energy Rev. 2016, 56, 1172–1181.
33. Rodriguez-Fernandez, J.; Pinto, T.; Silva, F.; Praça, I.; Vale, Z.; Corchado, J.M. Context aware Q-Learning-based model for decision support in the negotiation of energy contracts. Int. J. Electr. Power Energy Syst. 2019, 104, 489–501.
34. Ernst, D.; Glavic, M.; Geurts, P.; Wehenkel, L. Approximate Value Iteration in the Reinforcement Learning Context. Application to Electrical Power System Control. Int. J. Emerg. Electr. Power Syst. 2005, 3.
35. Bui, V.H.; Hussain, A.; Im, Y.H.; Kim, H.M. An internal trading strategy for optimal energy management of combined cooling, heat and power in building microgrids. Appl. Energy 2019, 239, 536–548.
36. Mbuwir, B.V.; Ruelens, F.; Spiessens, F.; Deconinck, G. Battery Energy Management in a Microgrid Using Batch Reinforcement Learning. Energies 2017, 10, 1846.
37. Manchin, A.; Abbasnejad, E.; Hengel, A.V. Reinforcement Learning with Attention that Works: A Self-Supervised Approach. arXiv 2019, arXiv:1904.03367.
Trading Quantity | MG 1 | MG 2 | MG 3 | MG 4 | MG 5 | MG 6 | MG 7 |
---|---|---|---|---|---|---|---|
With Grid (Before) (kWh) | 530.0 | 351.3 | 1540.7 | 1747.2 | 2840.3 | 6787.7 | 2167.5 |
With Grid (After) (kWh) | 125.0 (23.6%) | 85.5 (24.4%) | 166.5 (10.8%) | 1145.2 (65.5%) | 741.1 (26.1%) | 3876.4 (57.1%) | 595.7 (27.5%) |
BESS Storage Change (kWh) | 21.7 | 13.3 | 55.5 | 31.2 | 32.0 | 151.1 | 36.1 |
BESS (Dis)Charge Loss (kWh) | 7.5 | 8.1 | 14.6 | 10.9 | 26.6 | 51.9 | 10.4 |

Trading Quantity | MG 8 | MG 9 | MG 10 | MG 11 | MG 12 | MG 13 | MG 14 |
---|---|---|---|---|---|---|---|
With Grid (Before) (kWh) | 3754.1 | 1640.0 | 1275.4 | 1209.2 | 1616.9 | 4427.7 | 2230.6 |
With Grid (After) (kWh) | 1131.1 (30.1%) | 386.2 (23.5%) | 287.9 (22.6%) | 243.4 (20.1%) | 622.7 (38.5%) | 705.5 (15.9%) | 720.9 (32.3%) |
BESS Storage Change (kWh) | −5.9 | 15.2 | 14.2 | 3.9 | 26.4 | −26.1 | −9.7 |
BESS (Dis)Charge Loss (kWh) | 16.5 | 12.7 | 7.7 | 12.5 | 20.1 | 33.9 | 34.8 |
Profit Comparison | MG 1 | MG 2 | MG 3 | MG 4 | MG 5 | MG 6 | MG 7 |
---|---|---|---|---|---|---|---|
Profit in TM (CNY) | −428.3 | −281.5 | −1224.7 | −1429.4 | −1989.0 | −4542.4 | −1516.6 |
Profit in P2PM (CNY) | −340.9 | −230.0 | −993.5 | −1090.6 | −1483.0 | −3539.1 | −1199.6 |
Profit in QLCDAM (CNY) | −313.7 | −217.4 | −892.1 | −1154.5 | −1494.1 | −3989.2 | −1143.2 |
Growth Rate (over P2PM) | 8.0% | 5.5% | 10.2% | −5.9% | −0.7% | −12.7% | 4.7% |

Profit Comparison | MG 8 | MG 9 | MG 10 | MG 11 | MG 12 | MG 13 | MG 14 |
---|---|---|---|---|---|---|---|
Profit in TM (CNY) | 1126.2 | 491.9 | 295.4 | −110.1 | 421.2 | 1328.3 | 515.6 |
Profit in P2PM (CNY) | 1687.4 | 808.4 | 459.3 | −70.5 | 581.0 | 1874.2 | 708.4 |
Profit in QLCDAM (CNY) | 1875.4 | 897.0 | 471.3 | 29.3 | 564.2 | 2105.8 | 743.1 |
Growth Rate (over P2PM) | 11.1% | 11.0% | 2.6% | 141.6% | −2.9% | 12.4% | 4.9% |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).