Review and Evaluation of Reinforcement Learning Frameworks on Smart Grid Applications
Abstract
1. Introduction
- Renewable Energy Sources and Storage Solutions for Smart Grid Services and Provisioning: In RES frameworks, optimal control is empowered by novel methodologies that optimally exploit the provision of renewable energy: storing excess energy when it is available and discharging it when demand is high or prices are favorable, minimizing costs, forecasting RES generation and optimizing the operation of integrated energy-storage systems such as batteries. Such procedures may utilize historical energy production and consumption data, weather forecasts and energy market prices to make optimal decisions on energy storage and RES integration into the grid.
- Building Energy-Management Systems—BEMSs: In buildings, optimal control and multiple energy-friendly technologies are being used to reduce energy consumption and carbon emissions. For example, building energy-management systems can use data from sensors and smart meters to optimize heating, ventilation and air-conditioning (HVAC) systems, lighting and other building systems. Solar PV systems and other RES technologies along with storage may be additionally integrated into the optimization mix to provide a reliable and sustainable source of electricity.
- Electric Vehicle Charging Stations—EVCSs: Similarly, in charging stations, optimal control and multiple energy-friendly technologies are being used to manage the charging of electric vehicles. Smart charging systems can prioritize the charging of vehicles based on their battery levels and enhance the efficiency of renewable energy sources while minimizing the load on the grid during peak hours.
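As a rough sketch of the storage logic described above, the price-aware charge/discharge decision can be expressed as a simple rule. All thresholds and signal names here are hypothetical, chosen only for illustration; a learned controller would replace this hand-tuned rule:

```python
def dispatch(battery_soc, pv_surplus_kw, price, price_high=0.30,
             soc_min=0.1, soc_max=0.9):
    """Toy rule-based storage dispatch (illustrative thresholds):
    store surplus renewable energy, discharge when prices are high,
    otherwise stay idle."""
    if pv_surplus_kw > 0 and battery_soc < soc_max:
        return "charge"      # excess renewable generation is available
    if price >= price_high and battery_soc > soc_min:
        return "discharge"   # demand/prices are high: use stored energy
    return "idle"
```

RL-based controllers discussed in this review effectively learn such dispatch policies from data instead of fixing the thresholds by hand.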
1.1. Control Strategies for Energy Systems
1.2. Literature Analysis Approach
- Criteria of Included Studies: First, the criteria of included studies are defined according to the following research topics: renewable energy sources plant control based on Reinforcement Learning methodologies; building energy-management systems control based on Reinforcement Learning methodologies; electric vehicle charging station system control based on Reinforcement Learning methodologies.
- Selection of Related Keywords: Subsequently, we identified and examined the pertinent keywords associated with the subject matter. Each search term was a composite of the following primary terms: photovoltaics Reinforcement Learning Control; wind power plants Reinforcement Learning Control; building HVAC Reinforcement Learning Control; water heating Reinforcement Learning Control; lighting Reinforcement Learning Control; electric vehicle charging station Reinforcement Learning Control. The selection of keywords was based on the distinctive attributes of RL and the challenges encountered in the RES, BEMS and EVCS domains.
- Study Selection: Subsequently, the research articles obtained through the search conducted in online databases, including Google Scholar and Scopus, were subjected to a rigorous assessment by screening the abstracts. To this end, promising and valuable outcomes were selected for further comprehensive examination and analysis.
- Data Extraction: In the survey process, the fourth stage involved the extraction of data, which entailed categorizing the acquired information from each study based on the type of Reinforcement Learning, the utilized algorithmic methodology and the relevant application testbed. Subsequently, an analysis was conducted on the selected studies, addressing various aspects such as the merits, drawbacks, contributions, potential enhancements, practicality and other relevant factors pertaining to the optimal control problem in energy systems. This comprehensive evaluation provided insights and a critical examination of the surveyed literature.
- Quality Assessment: Following the previous steps, the quality of the work is evaluated based on several criteria. These criteria include the number of citations received by the paper, the scientific contribution of the authors and the methodologies employed in the research. These assessments provide insights into the impact, significance and rigor of the work conducted.
- Data Analysis: In the concluding phase, the studies and their outcomes are systematically organized into distinct subcategories and classifications, facilitating a comprehensive comparison of the research findings.
1.3. Previous Literature Work
1.4. Novelty and Contribution
1.5. Paper Structure
2. Background and Overview of Reinforcement Learning
2.1. Markov Decision Processes
- S denotes the state space, which contains all the possible states an agent can find itself in;
- A denotes the action space, which contains all the possible actions an agent can take;
- P denotes the transition probability of the agent, P(s_{t+1} | s_t, a_t), where t + 1 is the next time step and t the current one;
- R(s_t, a_t) denotes the reward function, which defines the reward the agent will receive;
- ρ0 denotes the start-state distribution;
- γ denotes the discount factor, which determines the influence an action has on future rewards, with values closer to 1 giving future rewards greater weight;
- π is the policy, which maps states to the actions taken.
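A toy, self-contained sketch of how these components come together in tabular Q-learning follows; the two-state dynamics, reward and all hyperparameters are invented purely for illustration:

```python
import random

random.seed(0)

# Toy MDP illustrating the components above: state space S, action space A,
# (here deterministic) transitions P, reward function R, discount factor gamma.
S, A = [0, 1], [0, 1]
gamma, alpha, eps = 0.9, 0.1, 0.2

def step(s, a):
    """Deterministic toy dynamics: only taking action 1 in state 1 pays off."""
    s_next = a                      # the chosen action sets the next state
    r = 1.0 if (s == 1 and a == 1) else 0.0
    return s_next, r

Q = {(s, a): 0.0 for s in S for a in A}
s = 0                               # start-state distribution: always state 0
for _ in range(2000):
    # epsilon-greedy action selection
    a = random.choice(A) if random.random() < eps else max(A, key=lambda b: Q[(s, b)])
    s_next, r = step(s, a)
    # Q-learning temporal-difference update
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in A) - Q[(s, a)])
    s = s_next

# Greedy policy pi maps each state to its highest-valued action
policy = {s: max(A, key=lambda b: Q[(s, b)]) for s in S}
```

After training, the greedy policy selects action 1 in both states, since only the transition (s = 1, a = 1) yields reward and action 1 always leads to state 1.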
2.2. Value Functions and Bellman Equations
2.3. Reinforcement Learning Methods and Algorithms
2.3.1. Policy-Optimization and Policy-Based Approaches
2.3.2. Q-Learning and Value-Based Approaches
3. Review of RL-Based Methods for Optimal Control of Energy Systems and Smart Grids
3.1. Reinforcement Learning towards Optimal Control in Renewable Energy Source (RES) Plants
3.1.1. Value-Based RL in RES
3.1.2. Policy-Based, Actor–Critic and Hybrid RL in RES
| Reference | Year | Field | Methodology | Type |
| --- | --- | --- | --- | --- |
| [67] | 2017 | Solar Power Systems | RLMPPT Q-Learning | Value-based |
| [68] | 2019 | Solar Power Systems | Q-Learning | Value-based |
| [73] | 2019 | Solar Power Systems | Hybrid P&O with Q-Learning | Value-based |
| [77] | 2020 | Solar Power Systems | MADDPG | Policy- and Value-based, Actor–Critic |
| [76] | 2021 | Solar Power Systems | PPO | Policy-based |
| [65] | 2015 | Wind Power Systems | MPPT Q-Learning | Value-based |
| [66] | 2016 | Wind Power Systems | ANN MPPT Q-Learning | Value-based |
| [74] | 2019 | Wind Power Systems | ANNs & Q-Learning | Value-based |
| [75] | 2020 | Wind Power Systems | ANNs & Q-Learning | Value-based |
| [78] | 2020 | Wind Power Systems | KA-DDPG | Policy- and Value-based, Actor–Critic |
| [64] | 2013 | Multi | Q-Learning | Value-based |
| [69] | 2019 | Multi | DQN | Value-based |
| [79] | 2022 | Multi | PPO | Policy-based |
3.2. Reinforcement Learning towards Optimal Control in Building Energy-Management Systems (BEMSs)
3.2.1. Value-Based RL in BEMS
3.2.2. Policy-Based, Actor–Critic and Hybrid RL Approaches in BEMS
| Reference | Year | Field | Methodology | Type |
| --- | --- | --- | --- | --- |
| [81] | 2015 | HVAC | ANNs & Q-Learning | Value-based |
| [85] | 2017 | HVAC | ANNs & Q-Learning | Value-based |
| [86] | 2017 | HVAC | Q-Learning | Value-based |
| [87] | 2019 | HVAC | REINFORCE | Policy-based |
| [98] | 2019 | HVAC | DDPG | Actor–Critic |
| [88] | 2019 | HVAC | Double Q-Learning | Value-based |
| [99] | 2020 | HVAC | PPO-Clip | Policy-based |
| [100] | 2021 | HVAC | DDPG | Actor–Critic |
| [94] | 2021 | HVAC/Multi | DQN | Value-based |
| [95] | 2021 | HVAC | DQN | Value-based |
| [82] | 2016 | Water Heating | Fitted Q-iteration | Value-based |
| [83] | 2016 | Water Heating | Q-Learning | Value-based |
| [96] | 2017 | Water Heating | Batch-RL & Fitted Q-iteration | Policy- and Value-based |
| [97] | 2018 | Water Heating | A3C | Policy-based |
| [89] | 2019 | Water Heating/Multi | Monte-Carlo with Exploring Starts (MCES) | Value-based |
| [92] | 2020 | Water Heating | Double Q-Learning with Memory Replay | Value-based |
| [93] | 2021 | Water Heating/Multi | ANNs & Q-Learning | Value-based |
| [101] | 2021 | Water Heating/Multi | SAC | Actor–Critic |
| [102] | 2021 | Water Heating/Multi | SAC | Actor–Critic |
| [84] | 2016 | Lighting | Q-Learning | Value-based |
| [90] | 2019 | Lighting | Value Iteration | Value-based |
| [91] | 2019 | Lighting | Branching Dueling Q-Network | Value-based |
3.3. Reinforcement Learning for Optimal Control in Electric Vehicle Charging Stations (EVCSs)
3.3.1. Value-Based RL in EVCSs
3.3.2. Policy-Based, Actor–Critic and Hybrid RL in EVCSs
| Reference | Year | Field | Methodology | Type |
| --- | --- | --- | --- | --- |
| [103] | 2015 | EVCS/Multi | Batch-RL Fitted Q-iteration | Value-based |
| [104] | 2016 | EVCS/Multi | Batch-RL Fitted Q-iteration | Value-based |
| [105] | 2017 | EVCS/Multi | Batch-RL Fitted Q-iteration | Value-based |
| [113] | 2019 | EVCS | Constrained Policy Optimization | Policy-based |
| [106] | 2019 | EVCS | MASCO | Value-based |
| [107] | 2019 | EVCS | Deep RL & DQN | Value-based |
| [108] | 2019 | EVCS | Batch-RL Fitted Q-iteration | Value-based |
| [109] | 2019 | EVCS | Q-Learning & DQN | Value-based |
| [110] | 2020 | EVCS/Multi | Q-Learning | Value-based |
| [111] | 2020 | EVCS | DQN | Value-based |
| [114] | 2020 | EVCS/Multi | CDDPG | Actor–Critic |
| [115] | 2021 | EVCS | Double DQN, DDPG & Parameterized DQN | Actor–Critic, Policy- and Value-based |
| [112] | 2021 | EVCS/Multi | Q-Learning & DQN | Value-based |
| [116] | 2022 | EVCS/Multi | DDPG | Actor–Critic |
4. Evaluation
4.1. Evaluation per Type
4.2. Evaluation per Algorithmic Methodology
4.3. Evaluation per Application Field
4.3.1. RL Optimal Control in RES
- RL-based optimal control may target reducing operational costs, increasing energy production and improving system reliability, as in [68,71,72,73,122]. In such cases, RL learns the optimal control policies through a reward-based approach that maximizes a given objective function. This objective function can be defined in different ways depending on the specific RES application; in a wind or solar farm, for example, the objective may be to maximize energy output while minimizing operational costs.
- Additionally, optimal control empowered by RL methodologies can help overcome the inherent uncertainty and variability of renewable energy sources, energy prices and load demand, as in [70,71,72,75,79]. RL algorithms are capable of adapting to changing weather conditions, energy demand and other variables in real time, ensuring that the RES operates optimally under different scenarios.
- Finally, RL may help reduce the environmental impact of RES by optimizing energy generation to meet demand while minimizing the use of non-renewable energy sources, as in [68,71,72,73]. Such optimization can lower greenhouse gas emissions and improve the sustainability of the resulting energy system. Figure 5 illustrates the Reinforcement Learning optimal control schemes in RES, indicating the control targets as denoted in the literature.
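In practice, objectives such as "maximize energy output while minimizing operational costs" are encoded as a scalar reward that the RL agent maximizes. A minimal sketch is given below; the function name, inputs and weights are hypothetical and would be tuned to the specific plant:

```python
def res_reward(energy_kwh, operating_cost, emissions_kg,
               w_energy=1.0, w_cost=0.5, w_emissions=0.2):
    """Hypothetical scalar reward combining the RES control targets
    discussed above: reward production, penalize cost and emissions.
    The weights express the operator's trade-off priorities."""
    return (w_energy * energy_kwh
            - w_cost * operating_cost
            - w_emissions * emissions_kg)
```

With such a weighted reward, the same learning algorithm can be steered toward different control targets (cost, production or environmental impact) simply by changing the weights.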
4.3.2. RL Optimal Control in BEMS
4.3.3. RL Optimal Control in EVCS
5. Research Tendencies and Future Directions
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
| --- | --- |
| A2C | Advantage Actor–Critic |
| A3C | Asynchronous Advantage Actor–Critic |
| ADHDP | Action-Dependent Heuristic Dynamic Programming |
| ANN | Artificial Neural Network |
| BEM | Building Energy Model |
| BEMS | Building Energy-Management Systems |
| DBN | Deep Belief Network |
| DC | Direct Current |
| DDPG | Deep Deterministic Policy Gradient |
| DEWH | Domestic Electric Water Heating Systems |
| DG | Distributed Generation |
| DHW | Domestic Hot Water |
| DNN | Deep Neural Network |
| DQN | Deep Q-Network |
| DRL | Deep Reinforcement Learning |
| DSCRM | Deep-State Clustering and Representation Modeling |
| EG | Energy Grid |
| ESN | Echo State Network |
| EVCS | Electric Vehicle Charging Stations |
| FC | Fuel Cell |
| FMDP | Finite Markov Decision Process |
| FNN | Feedforward Neural Network |
| HRES | Hybrid Renewable Energy Systems |
| HRL | Hierarchical Reinforcement Learning |
| HVAC | Heating, Ventilation and Air Conditioning |
| IES | Integrated Energy System |
| KA | Knowledge-Assisted |
| LCP | Load Commitment Problem |
| LSTM | Long Short-Term Memory |
| MDP | Markov Decision Process |
| MLP-BP | Multilayer Perceptron and Backpropagation |
| MOC | Model-based Optimal Control |
| MPC | Model Predictive Control |
| MPPT | Maximum Power Point Tracking |
| PEV | Plug-in Electric Vehicle |
| PHM | Prognosis and Health Management |
| PMSG | Permanent Magnet Synchronous Generators |
| PMV | Predicted Mean Vote Index |
| PPO | Proximal Policy Optimization |
| PV | Photovoltaic |
| RBC | Rule-based Control |
| RES | Renewable Energy Sources |
| RL | Reinforcement Learning |
| RLMPPT | Reinforcement Learning Maximum Power Point Tracking |
| SAC | Soft Actor–Critic |
| SDRL | Symbolic Deep Reinforcement Learning |
| TD3 | Twin-Delayed Deep Deterministic Policy Gradient |
| TRPO | Trust Region Policy Optimization |
| VPG | Vanilla Policy Gradient |
| WECS | Wind Energy-Conversion System |
References
- Mikayilov, J.I.; Mukhtarov, S.; Dinçer, H.; Yüksel, S.; Aydın, R. Elasticity analysis of fossil energy sources for sustainable economies: A case of gasoline consumption in Turkey. Energies 2020, 13, 731.
- Martins, F.; Felgueiras, C.; Smitkova, M.; Caetano, N. Analysis of fossil fuel energy consumption and environmental impacts in European countries. Energies 2019, 12, 964.
- Zahraoui, Y.; Basir Khan, M.R.; AlHamrouni, I.; Mekhilef, S.; Ahmed, M. Current status, scenario and prospective of renewable energy in Algeria: A review. Energies 2021, 14, 2354.
- Abas, N.; Kalair, A.; Khan, N. Review of fossil fuels and future energy technologies. Futures 2015, 69, 31–49.
- Holechek, J.L.; Geli, H.M.; Sawalhah, M.N.; Valdez, R. A global assessment: Can renewable energy replace fossil fuels by 2050? Sustainability 2022, 14, 4792.
- Shafiee, S.; Topal, E. When will fossil fuel reserves be diminished? Energy Policy 2009, 37, 181–189.
- Halkos, G.E.; Gkampoura, E.C. Reviewing usage, potentials and limitations of renewable energy sources. Energies 2020, 13, 2906.
- Yan, J.; Chou, S.K.; Desideri, U.; Lee, D.J. Transition of clean energy systems and technologies towards a sustainable future. Appl. Energy 2015, 160, 619–1006.
- Dominković, D.F.; Bačeković, I.; Pedersen, A.S.; Krajačić, G. The future of transportation in sustainable energy systems: Opportunities and barriers in a clean energy transition. Renew. Sustain. Energy Rev. 2018, 82, 1823–1838.
- Michailidis, P.; Pelitaris, P.; Korkas, C.; Michailidis, I.; Baldi, S.; Kosmatopoulos, E. Enabling optimal energy management with minimal IoT requirements: A legacy A/C case study. Energies 2021, 14, 7910.
- Michailidis, I.T.; Sangi, R.; Michailidis, P.; Schild, T.; Fuetterer, J.; Mueller, D.; Kosmatopoulos, E.B. Balancing energy efficiency with indoor comfort using smart control agents: A simulative case study. Energies 2020, 13, 6228.
- Michailidis, I.T.; Schild, T.; Sangi, R.; Michailidis, P.; Korkas, C.; Fütterer, J.; Müller, D.; Kosmatopoulos, E.B. Energy-efficient HVAC management using cooperative, self-trained, control agents: A real-life German building case study. Appl. Energy 2018, 211, 113–125.
- Tamani, N.; Ahvar, S.; Santos, G.; Istasse, B.; Praca, I.; Brun, P.E.; Ghamri, Y.; Crespi, N.; Becue, A. Rule-based model for smart building supervision and management. In Proceedings of the 2018 IEEE International Conference on Services Computing, San Francisco, CA, USA, 2–7 July 2018; pp. 9–16.
- De Hoog, J.; Abdulla, K.; Kolluri, R.R.; Karki, P. Scheduling fast local rule-based controllers for optimal operation of energy storage. In Proceedings of the Ninth International Conference on Future Energy Systems, Karlsruhe, Germany, 12–15 June 2018; pp. 168–172.
- Kermadi, M.; Salam, Z.; Berkouk, E.M. A rule-based power management controller using stateflow for grid-connected PV-battery energy system supplying household load. In Proceedings of the 2018 9th IEEE International Symposium on Power Electronics for Distributed Generation Systems (PEDG), Charlotte, NC, USA, 25–28 June 2018; pp. 1–6.
- Schreiber, T.; Netsch, C.; Baranski, M.; Mueller, D. Monitoring data-driven Reinforcement Learning Controller training: A comparative study of different training strategies for a real-world energy system. Energy Build. 2021, 239, 110856.
- Fu, Y.; Xu, S.; Zhu, Q.; O’Neill, Z.; Adetola, V. How good are learning-based control vs model-based control for load shifting? Investigations on a single zone building energy system. Energy 2023, 273, 127073.
- Jahedi, G.; Ardehali, M. Genetic algorithm-based fuzzy-PID control methodologies for enhancement of energy efficiency of a dynamic energy system. Energy Convers. Manag. 2011, 52, 725–732.
- Ooka, R.; Komamura, K. Optimal design method for building energy systems using genetic algorithms. Build. Environ. 2009, 44, 1538–1544.
- Parisio, A.; Wiezorek, C.; Kyntäjä, T.; Elo, J.; Strunz, K.; Johansson, K.H. Cooperative MPC-based energy management for networked microgrids. IEEE Trans. Smart Grid 2017, 8, 3066–3074.
- Mariano-Hernández, D.; Hernández-Callejo, L.; Zorita-Lamadrid, A.; Duque-Pérez, O.; García, F.S. A review of strategies for building energy management system: Model predictive control, demand side management, optimization and fault detect & diagnosis. J. Build. Eng. 2021, 33, 101692.
- Michailidis, I.T.; Kapoutsis, A.C.; Korkas, C.D.; Michailidis, P.T.; Alexandridou, K.A.; Ravanis, C.; Kosmatopoulos, E.B. Embedding autonomy in large-scale IoT ecosystems using CAO and L4G-CAO. Discov. Internet Things 2021, 1, 1–22.
- Jin, X.; Wu, Q.; Jia, H.; Hatziargyriou, N.D. Optimal integration of building heating loads in integrated heating/electricity community energy systems: A bi-level MPC approach. IEEE Trans. Sustain. Energy 2021, 12, 1741–1754.
- Artiges, N.; Nassiopoulos, A.; Vial, F.; Delinchant, B. Calibrating models for MPC of energy systems in buildings using an adjoint-based sensitivity method. Energy Build. 2020, 208, 109647.
- Forgione, M.; Piga, D.; Bemporad, A. Efficient calibration of embedded MPC. IFAC-PapersOnLine 2020, 53, 5189–5194.
- Storek, T.; Esmailzadeh, A.; Mehrfeld, P.; Schumacher, M.; Baranski, M.; Müller, D. Applying Machine Learning to Automate Calibration for Model Predictive Control of Building Energy Systems. In Proceedings of the Building Simulation 2019, Rome, Italy, 2–4 September 2019; Volume 16, pp. 900–907.
- Saad, A.A.; Youssef, T.A.; Elsayed, A.T.; Amin, A.M.A.; Abdalla, O.H.; Mohammed, O.A. Data-Centric Hierarchical Distributed Model Predictive Control for Smart Grid Energy Management. IEEE Trans. Ind. Inform. 2019, 15, 4086–4098.
- Nian, R.; Liu, J.; Huang, B. A review on Reinforcement Learning: Introduction and applications in industrial process control. Comput. Chem. Eng. 2020, 139, 106886.
- Coronato, A.; Naeem, M.; de Pietro, G.; Paragliola, G. Reinforcement Learning for intelligent healthcare applications: A survey. Artif. Intell. Med. 2020, 109, 101964.
- Polydoros, A.S.; Nalpantidis, L. Survey of model-based Reinforcement Learning: Applications on robotics. J. Intell. Robot. Syst. 2017, 86, 153–173.
- Khan, M.A.M.; Khan, M.R.J.; Tooshil, A.; Sikder, N.; Mahmud, M.P.; Kouzani, A.Z.; Nahid, A.A. A systematic review on Reinforcement Learning-based robotics within the last decade. IEEE Access 2020, 8, 176598–176623.
- Michailidis, I.T.; Michailidis, P.; Alexandridou, K.; Brewick, P.T.; Masri, S.F.; Kosmatopoulos, E.B.; Chassiakos, A. Seismic Active Control under Uncertain Ground Excitation: An Efficient Cognitive Adaptive Optimization Approach. In Proceedings of the 2018 5th International Conference on Control, Decision and Information Technologies (CoDIT), Thessaloniki, Greece, 10–13 April 2018; pp. 847–852.
- Karatzinis, G.D.; Michailidis, P.; Michailidis, I.T.; Kapoutsis, A.C.; Kosmatopoulos, E.B.; Boutalis, Y.S. Coordinating heterogeneous mobile sensing platforms for effectively monitoring a dispersed gas plume. Integr. Comput.-Aided Eng. 2022, 29, 411–429.
- Salavasidis, G.; Kapoutsis, A.C.; Chatzichristofis, S.A.; Michailidis, P.; Kosmatopoulos, E.B. Autonomous trajectory design system for mapping of unknown sea-floors using a team of AUVs. In Proceedings of the 2018 European Control Conference (ECC), Limassol, Cyprus, 12–15 June 2018; pp. 1080–1087.
- Keroglou, C.; Kansizoglou, I.; Michailidis, P.; Oikonomou, K.M.; Papapetros, I.T.; Dragkola, P.; Michailidis, I.T.; Gasteratos, A.; Kosmatopoulos, E.B.; Sirakoulis, G.C. A Survey on Technical Challenges of Assistive Robotics for Elder People in Domestic Environments: The ASPiDA Concept. IEEE Trans. Med. Robot. Bionics 2023, 5, 196–205.
- Michailidis, I.T.; Manolis, D.; Michailidis, P.; Diakaki, C.; Kosmatopoulos, E.B. Autonomous self-regulating intersections in large-scale urban traffic networks: A Chania city case study. In Proceedings of the 2018 5th International Conference on Control, Decision and Information Technologies (CoDIT), Thessaloniki, Greece, 10–13 April 2018; pp. 853–858.
- Moerland, T.M.; Broekens, J.; Plaat, A.; Jonker, C.M. Model-based Reinforcement Learning: A survey. Found. Trends Mach. Learn. 2023, 16, 1–118.
- Pong, V.; Gu, S.; Dalal, M.; Levine, S. Temporal difference models: Model-free Deep RL for model-based control. arXiv 2018, arXiv:1802.09081.
- Sun, W.; Jiang, N.; Krishnamurthy, A.; Agarwal, A.; Langford, J. Model-based RL in contextual decision processes: PAC bounds and exponential improvements over model-free approaches. In Proceedings of the Conference on Learning Theory, Phoenix, AZ, USA, 25–28 June 2019; pp. 2898–2933.
- Lu, R.; Hong, S.H.; Zhang, X. A dynamic pricing demand response algorithm for smart grid: Reinforcement Learning approach. Appl. Energy 2018, 220, 220–230.
- Aktas, A.; Erhan, K.; Özdemir, S.; Özdemir, E. Dynamic energy management for photovoltaic power system including hybrid energy storage in smart grid applications. Energy 2018, 162, 72–82.
- Korkas, C.D.; Baldi, S.; Michailidis, P.; Kosmatopoulos, E.B. A cognitive stochastic approximation approach to optimal charging schedule in electric vehicle stations. In Proceedings of the 2017 25th Mediterranean Conference on Control and Automation (MED), Valletta, Malta, 3–6 July 2017; pp. 484–489.
- Mosavi, A.; Salimi, M.; Faizollahzadeh Ardabili, S.; Rabczuk, T.; Shamshirband, S.; Varkonyi-Koczy, A.R. State of the art of Machine Learning models in energy systems, a systematic review. Energies 2019, 12, 1301.
- Mason, K.; Grijalva, S. A review of Reinforcement Learning for autonomous building energy management. Comput. Electr. Eng. 2019, 78, 300–312.
- Wang, Z.; Hong, T. Reinforcement Learning for building controls: The opportunities and challenges. Appl. Energy 2020, 269, 115036.
- Shaqour, A.; Hagishima, A. Systematic Review on Deep Reinforcement Learning-Based Energy Management for Different Building Types. Energies 2022, 15, 8663.
- Abdullah, H.M.; Gastli, A.; Ben-Brahim, L. Reinforcement Learning based EV charging management systems: A review. IEEE Access 2021, 9, 41506–41531.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
- Wiering, M.; Otterlo, M.v. Reinforcement Learning: State-of-the-Art; Springer: Berlin/Heidelberg, Germany, 2012.
- Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep Reinforcement Learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347.
- Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust Region Policy Optimization. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; Volume 37, pp. 1889–1897.
- Sutton, R.S.; McAllester, D.; Singh, S.; Mansour, Y. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In Advances in Neural Information Processing Systems; Solla, S., Leen, T., Müller, K., Eds.; MIT Press: Cambridge, MA, USA, 1999; Volume 12.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.M.O.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep Reinforcement Learning. arXiv 2015, arXiv:1509.02971.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor–Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 1861–1870.
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous Methods for Deep Reinforcement Learning. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; Volume 48, pp. 1928–1937.
- Watkins, C. Learning from Delayed Rewards. Ph.D. Thesis, King’s College, London, UK, 1989.
- Watkins, C.; Dayan, P. Q-Learning. Mach. Learn. 1992, 8, 279–292.
- Hasselt, H. Double Q-Learning. In Advances in Neural Information Processing Systems; Lafferty, J., Williams, C., Shawe-Taylor, J., Zemel, R., Culotta, A., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2010; Volume 23.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.A.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep Reinforcement Learning. Nature 2015, 518, 529–533.
- Wang, Z.; Schaul, T.; Hessel, M.; Hasselt, H.; Lanctot, M.; Freitas, N. Dueling Network Architectures for Deep Reinforcement Learning. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; Volume 48, pp. 1995–2003.
- Fujimoto, S.; van Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor–Critic Methods. arXiv 2018, arXiv:1802.09477.
- Bellemare, M.G.; Dabney, W.; Munos, R. A Distributional Perspective on Reinforcement Learning. arXiv 2017, arXiv:1707.06887.
- Kuznetsova, E.; Li, Y.F.; Ruiz, C.; Zio, E.; Ault, G.; Bell, K. Reinforcement Learning for microgrid energy management. Energy 2013, 59, 133–146.
- Wei, C.; Zhang, Z.; Qiao, W.; Qu, L. Reinforcement-learning-based intelligent maximum power point tracking control for wind energy-conversion systems. IEEE Trans. Ind. Electron. 2015, 62, 6360–6370.
- Wei, C.; Zhang, Z.; Qiao, W.; Qu, L. An adaptive network-based Reinforcement Learning method for MPPT control of PMSG wind energy-conversion systems. IEEE Trans. Power Electron. 2016, 31, 7837–7848.
- Kofinas, P.; Doltsinis, S.; Dounis, A.; Vouros, G. A Reinforcement Learning approach for MPPT control method of photovoltaic sources. Renew. Energy 2017, 108, 461–473.
- Remani, T.; Jasmin, E.; Ahamed, T.I. Residential Load Scheduling With Renewable Generation in the Smart Grid: A Reinforcement Learning Approach. IEEE Syst. J. 2019, 13, 3283–3294.
- Diao, R.; Wang, Z.; Shi, D.; Chang, Q.; Duan, J.; Zhang, X. Autonomous Voltage Control for Grid Operation Using Deep Reinforcement Learning. In Proceedings of the 2019 IEEE Power & Energy Society General Meeting (PESGM), Atlanta, GA, USA, 4–8 August 2019; pp. 1–5.
- Rocchetta, R.; Bellani, L.; Compare, M.; Zio, E.; Patelli, E. A Reinforcement Learning framework for optimal operation and maintenance of power grids. Appl. Energy 2019, 241, 291–301.
- Zhang, B.; Hu, W.; Cao, D.; Huang, Q.; Chen, Z.; Blaabjerg, F. Deep Reinforcement Learning-based approach for optimizing energy conversion in integrated electrical and heating system with renewable energy. Energy Convers. Manag. 2019, 202, 112199.
- Ji, Y.; Wang, J.; Xu, J.; Fang, X.; Zhang, H. Real-time energy management of a microgrid using deep reinforcement learning. Energies 2019, 12, 2291.
- Phan, B.C.; Lai, Y.C. Control strategy of a hybrid renewable energy system based on Reinforcement Learning approach for an isolated microgrid. Appl. Sci. 2019, 9, 4001.
- Saenz-Aguirre, A.; Zulueta, E.; Fernandez-Gamiz, U.; Lozano, J.; Lopez-Guede, J.M. Artificial neural network based Reinforcement Learning for wind turbine yaw control. Energies 2019, 12, 436.
- Liu, H.; Yu, C.; Wu, H.; Duan, Z.; Yan, G. A new hybrid ensemble deep Reinforcement Learning model for wind speed short term forecasting. Energy 2020, 202, 117794.
- Jeong, J.; Kim, H. DeepComp: Deep Reinforcement Learning based renewable energy error compensable forecasting. Appl. Energy 2021, 294, 116970.
- Cao, D.; Hu, W.; Zhao, J.; Huang, Q.; Chen, Z.; Blaabjerg, F. A multi-agent deep Reinforcement Learning based voltage regulation using coordinated PV inverters. IEEE Trans. Power Syst. 2020, 35, 4120–4123.
- Zhao, H.; Zhao, J.; Qiu, J.; Liang, G.; Dong, Z.Y. Cooperative wind farm control with deep Reinforcement Learning and knowledge-assisted learning. IEEE Trans. Ind. Inform. 2020, 16, 6912–6921.
- Guo, C.; Wang, X.; Zheng, Y.; Zhang, F. Real-time optimal energy management of microgrid with uncertainties based on deep Reinforcement Learning. Energy 2022, 238, 121873.
- Sierla, S.; Ihasalo, H.; Vyatkin, V. A Review of Reinforcement Learning Applications to Control of Heating, Ventilation and Air Conditioning Systems. Energies 2022, 15, 3526.
- Barrett, E.; Linder, S. Autonomous HVAC control, A Reinforcement Learning approach. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2015, Porto, Portugal, 7–11 September 2015; pp. 3–19.
- Ruelens, F.; Claessens, B.J.; Quaiyum, S.; De Schutter, B.; Babuška, R.; Belmans, R. Reinforcement Learning applied to an electric water heater: From theory to practice. IEEE Trans. Smart Grid 2016, 9, 3792–3800.
- Al-Jabery, K.; Xu, Z.; Yu, W.; Wunsch, D.C.; Xiong, J.; Shi, Y. Demand-side management of domestic electric water heaters using approximate dynamic programming. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2016, 36, 775–788.
- Cheng, Z.; Zhao, Q.; Wang, F.; Jiang, Y.; Xia, L.; Ding, J. Satisfaction based Q-Learning for integrated lighting and blind control. Energy Build. 2016, 127, 43–55.
- Wei, T.; Wang, Y.; Zhu, Q. Deep Reinforcement Learning for building HVAC control. In Proceedings of the 54th Annual Design Automation Conference 2017, Austin, TX, USA, 18–22 June 2017; pp. 1–6.
- Chen, Y.; Norford, L.K.; Samuelson, H.W.; Malkawi, A. Optimal control of HVAC and window systems for natural ventilation through Reinforcement Learning. Energy Build. 2018, 169, 195–205.
- Jia, R.; Jin, M.; Sun, K.; Hong, T.; Spanos, C. Advanced building control via deep Reinforcement Learning. Energy Procedia 2019, 158, 6158–6163.
- Valladares, W.; Galindo, M.; Gutiérrez, J.; Wu, W.C.; Liao, K.K.; Liao, J.C.; Lu, K.C.; Wang, C.C. Energy optimization associated with thermal comfort and indoor air control via a deep Reinforcement Learning algorithm. Build. Environ. 2019, 155, 105–117.
- Kazmi, H.; Suykens, J.; Balint, A.; Driesen, J. Multi-agent Reinforcement Learning for modeling and control of thermostatically controlled loads. Appl. Energy 2019, 238, 1022–1035.
- Park, J.Y.; Dougherty, T.; Fritz, H.; Nagy, Z. LightLearn: An adaptive and occupant centered controller for lighting based on Reinforcement Learning. Build. Environ. 2019, 147, 397–414.
- Ding, X.; Du, W.; Cerpa, A. Octopus: Deep Reinforcement Learning for holistic smart building control. In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities and Transportation, New York, NY, USA, 13–14 November 2019; pp. 326–335.
- Brandi, S.; Piscitelli, M.S.; Martellacci, M.; Capozzoli, A. Deep Reinforcement Learning to optimise indoor temperature control and heating energy consumption in buildings. Energy Build. 2020, 224, 110225.
- Lissa, P.; Deane, C.; Schukat, M.; Seri, F.; Keane, M.; Barrett, E. Deep Reinforcement Learning for home energy management system control. Energy AI 2021, 3, 100043.
- Jiang, Z.; Risbeck, M.J.; Ramamurti, V.; Murugesan, S.; Amores, J.; Zhang, C.; Lee, Y.M.; Drees, K.H. Building HVAC control with Reinforcement Learning for reduction of energy cost and demand charge. Energy Build. 2021, 239, 110833. [Google Scholar] [CrossRef]
- Gupta, A.; Badr, Y.; Negahban, A.; Qiu, R.G. Energy-efficient heating control for smart buildings with deep Reinforcement Learning. J. Build. Eng. 2021, 34, 101739. [Google Scholar] [CrossRef]
- De Somer, O.; Soares, A.; Vanthournout, K.; Spiessens, F.; Kuijpers, T.; Vossen, K. Using Reinforcement Learning for demand response of domestic hot water buffers: A real-life demonstration. In Proceedings of the 2017 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), Turin, Italy, 26–29 September 2017; pp. 1–7. [Google Scholar]
- Zhang, Z.; Chong, A.; Pan, Y.; Zhang, C.; Lu, S.; Lam, K.P. A deep Reinforcement Learning approach to using whole building energy model for hvac optimal control. In Proceedings of the 2018 Building Performance Analysis Conference and SimBuild, Chicago, IL, USA, 26–28 September 2018; Volume 3, pp. 22–23. [Google Scholar]
- Gao, G.; Li, J.; Wen, Y. Energy-efficient thermal comfort control in smart buildings via deep Reinforcement Learning. arXiv 2019, arXiv:1901.04693. [Google Scholar]
- Azuatalam, D.; Lee, W.L.; de Nijs, F.; Liebman, A. Reinforcement Learning for whole-building HVAC control and demand response. Energy AI 2020, 2, 100020. [Google Scholar] [CrossRef]
- Du, Y.; Zandi, H.; Kotevska, O.; Kurte, K.; Munk, J.; Amasyali, K.; Mckee, E.; Li, F. Intelligent multi-zone residential HVAC control strategy based on deep Reinforcement Learning. Appl. Energy 2021, 281, 116117. [Google Scholar] [CrossRef]
- Pinto, G.; Deltetto, D.; Capozzoli, A. Data-driven district energy management with surrogate models and deep Reinforcement Learning. Appl. Energy 2021, 304, 117642. [Google Scholar] [CrossRef]
- Pinto, G.; Piscitelli, M.S.; Vázquez-Canteli, J.R.; Nagy, Z.; Capozzoli, A. Coordinated energy management for a cluster of buildings through deep Reinforcement Learning. Energy 2021, 229, 120725. [Google Scholar] [CrossRef]
- Vandael, S.; Claessens, B.; Ernst, D.; Holvoet, T.; Deconinck, G. Reinforcement Learning of heuristic EV fleet charging in a day-ahead electricity market. IEEE Trans. Smart Grid 2015, 6, 1795–1805. [Google Scholar] [CrossRef] [Green Version]
- Chiş, A.; Lundén, J.; Koivunen, V. Reinforcement Learning-based plug-in electric vehicle charging with forecasted price. IEEE Trans. Veh. Technol. 2016, 66, 3674–3684. [Google Scholar]
- Mbuwir, B.V.; Ruelens, F.; Spiessens, F.; Deconinck, G. Battery energy management in a microgrid using batch reinforcement learning. Energies 2017, 10, 1846. [Google Scholar] [CrossRef] [Green Version]
- Da Silva, F.L.; Nishida, C.E.; Roijers, D.M.; Costa, A.H.R. Coordination of electric vehicle charging through multiagent Reinforcement Learning. IEEE Trans. Smart Grid 2019, 11, 2347–2356. [Google Scholar] [CrossRef]
- Qian, T.; Shao, C.; Wang, X.; Shahidehpour, M. Deep Reinforcement Learning for EV charging navigation by coordinating smart grid and intelligent transportation system. IEEE Trans. Smart Grid 2019, 11, 1714–1723. [Google Scholar] [CrossRef]
- Sadeghianpourhamami, N.; Deleu, J.; Develder, C. Definition and evaluation of model-free coordination of electrical vehicle charging with Reinforcement Learning. IEEE Trans. Smart Grid 2019, 11, 203–214. [Google Scholar] [CrossRef] [Green Version]
- Wang, S.; Bi, S.; Zhang, Y.A. Reinforcement Learning for real-time pricing and scheduling control in EV charging stations. IEEE Trans. Ind. Inform. 2019, 17, 849–859. [Google Scholar] [CrossRef]
- Chang, F.; Chen, T.; Su, W.; Alsafasfeh, Q. Control of battery charging based on Reinforcement Learning and long short-term memory networks. Comput. Electr. Eng. 2020, 85, 106670. [Google Scholar] [CrossRef]
- Lee, J.; Lee, E.; Kim, J. Electric vehicle charging and discharging algorithm based on Reinforcement Learning with data-driven approach in dynamic pricing scheme. Energies 2020, 13, 1950. [Google Scholar] [CrossRef] [Green Version]
- Tuchnitz, F.; Ebell, N.; Schlund, J.; Pruckner, M. Development and evaluation of a smart charging strategy for an electric vehicle fleet based on Reinforcement Learning. Appl. Energy 2021, 285, 116382. [Google Scholar] [CrossRef]
- Li, H.; Wan, Z.; He, H. Constrained EV charging scheduling based on safe deep reinforcement learning. IEEE Trans. Smart Grid 2019, 11, 2427–2439. [Google Scholar] [CrossRef]
- Zhang, F.; Yang, Q.; An, D. CDDPG: A deep-reinforcement-learning-based approach for electric vehicle charging control. IEEE Internet Things J. 2020, 8, 3075–3087. [Google Scholar] [CrossRef]
- Dorokhova, M.; Martinson, Y.; Ballif, C.; Wyrsch, N. Deep Reinforcement Learning Control of electric vehicle charging in the presence of photovoltaic generation. Appl. Energy 2021, 301, 117504. [Google Scholar] [CrossRef]
- Park, S.; Pozzi, A.; Whitmeyer, M.; Perez, H.; Kandel, A.; Kim, G.; Choi, Y.; Joe, W.T.; Raimondo, D.M.; Moura, S. A deep Reinforcement Learning framework for fast charging of li-ion batteries. IEEE Trans. Transp. Electrif. 2022, 8, 2770–2784. [Google Scholar] [CrossRef]
- Belousov, B.; Abdulsamad, H.; Klink, P.; Parisi, S.; Peters, J. Reinforcement Learning Algorithms: Analysis and Applications; Springer: Berlin/Heidelberg, Germany, 2021. [Google Scholar]
- Kabanda, G.; Kannan, H. A Systematic Literature Review of Reinforcement Algorithms in Machine Learning. In Handbook of Research on AI and Knowledge Engineering for Real-Time Business Intelligence; IGI Global: Hershey, PA, USA, 2023; pp. 17–33. [Google Scholar]
- Mosavi, A.; Faghan, Y.; Ghamisi, P.; Duan, P.; Ardabili, S.F.; Salwana, E.; Band, S.S. Comprehensive review of deep Reinforcement Learning methods and applications in economics. Mathematics 2020, 8, 1640. [Google Scholar] [CrossRef]
- Glorennec, P.Y. Reinforcement Learning: An overview. In Proceedings of the European Symposium on Intelligent Techniques (ESIT-00), Aachen, Germany, 14–15 September 2000; pp. 14–15. [Google Scholar]
- Cao, D.; Hu, W.; Zhao, J.; Zhang, G.; Zhang, B.; Liu, Z.; Chen, Z.; Blaabjerg, F. Reinforcement Learning and its applications in modern power and energy systems: A review. J. Mod. Power Syst. Clean Energy 2020, 8, 1029–1042. [Google Scholar] [CrossRef]
- Muriithi, G.; Chowdhury, S. Optimal energy management of a grid-tied solar PV-battery microgrid: A Reinforcement Learning approach. Energies 2021, 14, 2700. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Vamvakas, D.; Michailidis, P.; Korkas, C.; Kosmatopoulos, E. Review and Evaluation of Reinforcement Learning Frameworks on Smart Grid Applications. Energies 2023, 16, 5326. https://doi.org/10.3390/en16145326