1. Introduction
The power system is facing major changes and challenges as a result of the growing integration of renewable energy sources and the rising popularity of electric vehicles (EVs). Ensuring system stability becomes increasingly important as the penetration of renewable energy increases. Furthermore, the rapid expansion of EVs requires a dependable and effective charging infrastructure, which in turn demands efficient frequency control. Thus, the goal of this research is to determine the best scenario for a dynamic load in frequency regulation for electric vehicle fast charging stations by utilizing a reinforcement learning (RL) algorithm and renewable energy penetration [1,2,3]. There are several environmental and financial advantages to using renewable energy sources, such as wind and solar energy [2,4,5].
Figure 1 shows the percentage of CO2 emissions attributable to different sectors worldwide: industry, transportation, residential, electricity and heat, and other sectors [6].
The intermittency of renewable sources makes it difficult to keep the system stable, because variations in generation can cause frequency deviations. Therefore, it is crucial to investigate how the penetration of renewable energy affects the system's stability and to create plans to address any potential problems [5,7]. Furthermore, the widespread use of EVs has increased the demand for charging infrastructure. Fast charging stations are essential to meet the changing requirements of electric vehicles (EVs), provide user convenience, and encourage the wider use of electric mobility. When there is a significant demand for EV charging, efficient frequency control is essential for managing the charging load and preserving grid stability [8,9,10,11,12,13].
The increasing integration of renewable energy sources and the massive growth of EVs are transforming the global energy landscape, as shown in Figure 2. Renewable energy sources, such as wind and solar energy, provide sustainable and environmentally friendly substitutes for conventional fossil fuel-based power. At the same time, the broad deployment of EVs aims to lessen the transportation sector's reliance on fossil fuels and its greenhouse gas emissions. Nonetheless, these advancements present new difficulties for the power system and call for innovative solutions to guarantee grid stability and an effective charging infrastructure [14,15,16].
Although renewable generation is highly desirable, it is intermittent because it is weather-dependent. The power grid may experience frequency fluctuations as a result of the fluctuating output of renewable energy, which can jeopardize system stability. For power systems to function dependably, a steady frequency must be maintained, because fluctuations can affect the performance of connected equipment and upset the balance between supply and demand. Thus, it is crucial to understand how the penetration of renewable energy affects system stability in order to guarantee the dependable integration of clean energy sources. Furthermore, a reliable and effective charging infrastructure is now required due to the rising popularity of EVs. EV fast charging stations enable long-distance driving and reduced charging times for EV users and are essential in satisfying the growing demand for quick charging. Nevertheless, the high power requirements of these fast charging stations can put pressure on the grid, causing variations in frequency and voltage. During times of heavy demand for EV charging, efficient frequency control is crucial to managing the charging load and maintaining grid stability [17,18,19,20].
2. Related Works
A secondary controller was created and used in [21] to control the system frequency in a networked microgrid system. Energy-storing units (ESUs), synchronous generators, and renewable energy resources (RESs) made up the suggested power system. Photovoltaic (PV) and wind turbine generator (WTG) units are two examples of renewable energy sources, and the ESU consisted of a flywheel and a battery. Despite various load demand variations and fluctuating renewable energy sources, the deployed Mayfly Algorithm (MA)-PID controller proved its ability to regulate the system frequency effectively, as verified by the study's validation comparisons [21].
A novel form of power system has emerged as a means of fulfilling China's carbon emission reduction objectives for the years 2030 and 2060. One of the most significant challenges facing this field of technical innovation is coordinating the energy storage systems (ESSs), load, network infrastructure, and generation. The associated static and dynamic model parameters must be estimated accurately and efficiently in order to manage and operate these grid resources in a secure, reliable, and cost-effective manner. Deep reinforcement learning (DRL), a popular artificial intelligence technique in recent years, has been applied to power systems, especially in control-related domains. DRL is a potent optimization tool that produces promising results when used to solve parameter estimation problems. As a result, the DRL techniques used to estimate the parameters of power system models, including load and generator models, were thoroughly addressed in that study [22].
When compared to power grids with synchronous generation, inverter-based networks are not physically able to produce a substantial amount of inertia. This calls for more in-depth research on stability issues related to these kinds of systems. Consequently, in power systems with significant penetration rates of renewable energy, where many inverters and distributed energy sources are involved, it is imperative to utilize suitable analytical techniques for voltage stability analysis. This study reviewed the literature in depth on voltage stability analyses in power systems that include a significant amount of renewable energy. Several upgrade techniques and assessment plans for voltage stability in power networks with various distributed energy sources were discussed. The study also conducted a thorough examination of the simulation verification models and voltage stability analysis techniques currently in use for microgrids. The many distributed generator types, microgrid designs, and modes of operation were taken into consideration when reviewing the conventional and enhanced voltage stability analysis methods. Furthermore, a detailed evaluation of the voltage stability indices, which are crucial in voltage stability assessments, was conducted, taking into account the applicable conditions [23].
The goal of that project was to model a fleet of EVs in order to offer Frequency Containment Reserve for Disturbance (FCR-D) upwards and Firm Frequency Response (FFR) services while adhering to current laws and regulations. The Nordic 45 test model (N45) is an enhanced version of the Nordic 44 test model (N44) that was created to perform more accurate simulations of the Nordic power system. To evaluate various aspects of electric vehicle (EV) performance as frequency reserve providers in Kundur's two-area power system model (K2A) and the new N45 test model, ten simulation cases were developed. After presenting and analyzing the simulation findings, a conclusion about the viability of EVs as frequency reserve suppliers in the current Nordic power system was reached. Because EV charging equipment and measurement devices have an inherent time delay, it was found that deploying EVs for FCR-D and FFR services was not currently feasible. However, it was found that the usefulness of EVs as frequency reserve providers could be improved if the time delay of these components was greatly reduced, enabling nearly instantaneous activation of the EVs' frequency response following a fault occurrence [24].
Reference [25] presented a two-stage transfer and deep learning-based method to predict dynamic frequency in power systems after disturbances and to find the best event-based load-shedding plan to keep the system frequency stable. Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks were coupled in the suggested method to efficiently use the spatial and temporal measurements in the input data. According to simulation findings on the IEEE 118-bus system, the CNN-LSTM technique greatly improved the timeliness of online frequency control while maintaining good accuracy and efficacy. Furthermore, test cases conducted on the South Carolina 500-bus system and the New England 39-bus system showed that the transfer learning procedure produced accurate results even with a shortage of training data [25].
Another study proposed a system to enable a thorough examination of grid transient stability, based on several machine-learning algorithms. Furthermore, the framework made it easier to thoroughly identify new patterns that appeared in the dynamic response of individual synchronous generators in systems with reduced inertia. Given that crucial operating points fluctuate in systems with high renewable penetration, the framework's capacity to evaluate all hourly operating points over the course of a study year was a significant advantage. An IEEE-39 system was used as a case study, integrating inverter-based resources at different penetration levels and locations, to illustrate the advantages of the suggested platform [26].
The literature assessment indicates that there is still room to investigate the optimal scenario for frequency control using a reinforcement learning algorithm when electric vehicle fast charging stations are treated as a dynamic load under renewable energy integration. To overcome these obstacles and achieve optimal frequency regulation and effective operation of fast charging stations, this research suggests using a reinforcement learning (RL) algorithm in conjunction with renewable energy penetration. Because RL algorithms can learn optimal control policies through interactions with the environment, they are especially well suited for dynamic and unpredictable systems, such as electric grids and EV charging networks. This study aims to achieve two goals. First, in order to evaluate the effect of renewable energy penetration on system stability, it examines the frequency response and active power flow under various photovoltaic (PV) penetration scenarios. Second, it assesses how well the suggested RL algorithm performs in terms of attaining optimal frequency control and guaranteeing the efficient operation of EV fast charging stations.
After the introduction and the related work in Section 1 and Section 2, respectively, the rest of this article is organized as follows: Section 3 describes the proposed methodology; it covers system identification and simulation tools as well as the mathematical model of EV charging stations. In addition, Section 3 describes the development of merging traditional PID control with RL using the adaptive PID controller and Q-learning. In Section 4, the experimental setup and the details of Kundur's model are explained. The analysis of the case studies and the results are highlighted in Section 5. Finally, Section 6 and Section 7 cover the discussion and conclusion of this research, including the main findings and future work.
3. Methodology
This section introduces the methodology of this research. It involves a Proportional-Integral-Derivative (PID) controller, reinforcement learning (RL), the design of an adaptive PID controller using Q-learning, and the experimental setup.
The increasing integration of renewable energy sources in today's modern power systems creates an urgent need to address the decreasing system inertia [1]. A potential solution to this problem is to combine RL techniques with conventional Proportional-Integral-Derivative (PID) control, as shown in Figure 3. Power system operators face a distinct set of issues when integrating renewable energy sources. Variations in power generation are caused by the intrinsic variability and uncertainty of renewable sources, which distinguishes them from conventional power plants. As a result, the overall inertia of the system, which is critical for maintaining stability, tends to diminish. Reinforcement learning approaches are therefore used to improve the performance of existing control mechanisms, such as PID control, in order to counteract this degradation.
Therefore, in order to preserve power system stability during the integration of renewable energy sources, creative control measures are required. Through the integration of PID control and reinforcement learning, the control system can be made more capable of responding to changing circumstances and improving overall system performance. The evaluation results show this approach’s efficacy and highlight its potential for a range of power system applications. Subsequent investigations may concentrate on enhancing the control approach and investigating supplementary prospects for its integration into practical power systems.
The Power System Simulator for Engineering (PSS/E) is used to model the power system with great care from the beginning of the design process. Dynamic equations relevant to the control actuator (a governor or a power electronic converter) are generated within this framework. The identified transfer function is then used to train the controller with the model-free RL algorithm known as Q-learning. The first step in this process is to define the actions, the state space, and a reward function, all carefully customized for the power system.
The final phase is to validate the developed controller in multiple PSS/E scenarios. After being fitted with PID parameters that are taken out of the Q-learning procedure, the controller is tested to determine how reliable and effective it is. The primary goal is to make sure that the controller improves the power system’s general stability and reliability while simultaneously mitigating the difficulties brought on by significant renewable integration.
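As a rough illustration of the Q-learning ingredients mentioned above (state space, actions, and reward), the short Python sketch below shows one plausible way to discretize the state from the frequency deviation and its rate of change, enumerate PID-gain adjustment actions, and define an ITAE-style reward. The bin edges, gain steps, and weighting are illustrative assumptions, not values taken from this study; the same definitions are reused by the training-loop sketch at the end of Section 3.5.

import numpy as np

# Illustrative discretization of the state: frequency deviation (Hz) and its rate of change (Hz/s)
FREQ_DEV_BINS = np.linspace(-0.5, 0.5, 11)    # assumed bin edges
FREQ_RATE_BINS = np.linspace(-1.0, 1.0, 11)   # assumed bin edges

# Actions: raise, hold, or lower each PID gain by one fixed step (assumed step sizes)
GAIN_STEPS = {"Kp": 0.05, "Ki": 0.01, "Kd": 0.005}
ACTIONS = [(dp, di, dd) for dp in (-1, 0, 1) for di in (-1, 0, 1) for dd in (-1, 0, 1)]

def discretize_state(freq_dev, freq_rate):
    """Map continuous measurements onto a discrete (i, j) state index pair."""
    i = int(np.digitize(freq_dev, FREQ_DEV_BINS))
    j = int(np.digitize(freq_rate, FREQ_RATE_BINS))
    return i, j

def reward(freq_dev, dt, elapsed_time):
    """ITAE-style reward: the time-weighted absolute frequency error, negated."""
    return -elapsed_time * abs(freq_dev) * dt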
3.1. System Identification and Transfer Function Construction in PSS/E
A popular software program for power system analysis is called PSS/E. Techniques for system identification are used to collect information and examine how the system reacts to various inputs or disruptions. The transfer function of the system, which depicts the connection between its input and output, is then estimated using this data.
In order to construct a mathematical model, system identification involves creating a dynamic system representation using measurement data. The model development process, which is based on the measurements, includes three crucial stages [2]:
Stage 1: input–output selection: this refers to choosing the best actuation and observation signals.
Stage 2: system identification: building the functional model is the task of this step.
Stage 3: controller design: this entails creating a controller using the system that has been identified.
Initially, a transfer function model is generated from a set of differential or difference equations known as the subspace state-space model [3]. Then, using system identification techniques, a model is created based on real measurements [4]. The measurement-guided model structure makes use of the Output Error (OE) polynomial model. Within the category of polynomial models, this model is a particular configuration that is used, in this case, for model identification using probing measurements. OE models describe conventional transfer functions, in which the system is defined by the combined effects of the measured outputs and the inputs (represented by injected noise) [5].
Figure 4 shows the simulated response comparison between the validation data (y_1) and the identified OE model, with a fit of 94.02%; the amplitude over time can be read from the same figure.
Assuming that the injected noise signal contains enough information to identify a system model of the chosen order n, the system structure can be described as in Equation (1) [7]. The OE mathematical model is then represented by Equation (2), where t denotes the sampled data index, e represents the injected noise process, u and y are the input and output of the model, F(q) is the model coefficient denominator, and B(q) is the model coefficient numerator.
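For readability, the OE relation of Equation (2) can be sketched in its standard form as follows; the symbols B(q) and F(q) and the generic polynomial orders shown here are assumptions consistent with the definitions above rather than the exact notation of the original equations:

y(t) = [B(q)/F(q)] ∗ u(t) + e(t), with B(q) = b_1 + b_2 q^(−1) + … + b_n q^(−n+1) and F(q) = 1 + f_1 q^(−1) + … + f_n q^(−n).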
For time-domain validation, the constructed model response is compared with the actual system response. To quantify the fit between the constructed system and the actual model response, the original data used to build the model must be available; this is known as the actual system response. The accuracy of the constructed system is expressed by Equation (3) [8], where ŷ and y denote the constructed model and actual system responses, respectively, and ȳ is the mean of the actual system response over several periods. The fit index should approach 1, indicating that the constructed model response aptly represents the actual system.
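A common form of such a fit index, consistent with the description above and given here only as an assumed reconstruction of Equation (3), is:

Fit = 1 − ( ‖y − ŷ‖ / ‖y − ȳ‖ ),

where ‖·‖ denotes the Euclidean norm over the compared time window; multiplying the index by 100 gives the percentage fit quoted for the identified OE model.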
3.2. Mathematical Computations and Analysis of Aggregated Electric Vehicles for Frequency Regulation
To use EVs in frequency regulation, multiple EVs must be connected to the power grid. An EV aggregator is an EV control center that manages the battery charging and discharging behavior of all EVs in the aggregator. The dynamic model of the ith EV in the EV aggregator is presented by the first-order transfer function in Equation (4), where K_EV and T_EV are the gain and time constant of the ith EV battery system, respectively.
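A first-order transfer function of the kind described above is commonly written as follows; this is an assumed standard form for Equation (4), with the symbol G_EV,i(s) introduced here for illustration:

G_EV,i(s) = K_EV / (1 + s ∗ T_EV),

which relates the aggregator's control signal to the power response of the ith EV battery system.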
The transfer function e^(−s·T_d) is used to describe the communication delay from the EV aggregator to the EV and the scheduling delay in the EV aggregator, where T_d is the delay time taken to receive control signals from the EV aggregator. The delays and time constants of all EVs (i = 1, 2, …) are considered to be equal on average, denoted by T_d and T_EV, respectively. This assumption yields an aggregated model of numerous EVs consisting of a single delay function and one EV dynamic. The usage of an aggregated model of electric vehicles appears to be suitable, since a cluster of several EVs and traditional generators are regulated jointly to modify their power injection to follow load disturbances.
To calculate the stability region and the delay margin, the characteristic equation of the single-area LFC-EV system was determined as in Equation (5), where Δ(s, τ) is the characteristic equation and P(s) and Q(s) represent two polynomials with real coefficients based on the system characteristics. The polynomials P(s) and Q(s) are given by Equation (6).
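For delay-margin studies of this kind, the characteristic equation referenced as Equation (5) typically takes the quasi-polynomial form shown below; this is an assumed standard form, and the specific coefficients of P(s) and Q(s) in Equation (6) depend on the governor, turbine, EV, and load-damping parameters of the modeled system:

Δ(s, τ) = P(s) + Q(s) ∗ e^(−s·τ) = 0,

where τ is the total communication and scheduling delay of the EV aggregator.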
3.3. Linear Proportional-Integral-Derivative (PID) Controller
The proportional, integral, and derivative components make up the three basic portions of the linear Proportional-Integral-Derivative (PID) controller, a commonly used control mechanism. Equation (7) [9] determines the PID controller's output in its discrete-time version.
In this equation, the output of the controller at step k is represented by u(k), the error at step k is indicated by e(k), and the proportional, integral, and derivative gains are denoted by Kp, Ki, and Kd, respectively.
The proportional term Kp ∗ e(k) adds to the controller's output in proportion to the current error. It assists in lowering steady-state errors and offers prompt corrective action in response to the error. The cumulative sum of previous errors is taken into consideration by the integral term, Ki ∗ Σe(n). By gradually changing the controller's output, it guarantees that any persistent error will eventually be eliminated. The system's ability to respond to persistent disturbances is enhanced and steady-state errors are reduced thanks to the integral term.
The rate of change of the error is taken into account in the derivative term, Kd ∗ (e(k) − e(k − 1)). It exerts a dampening effect on the system’s response and predicts future changes in the error. The derivative term contributes to the reduction of overshoot and the improvement of controller stability. To attain the intended control performance, the PID controller can be tuned by adjusting the values of Kp, Ki, and Kd. Effective regulation and control of the system are made possible by the proportional, integral, and derivative gains, which establish the relative contributions of each component to the output of the controller.
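Putting the three terms together, the discrete-time PID law of Equation (7) can be written as follows (an assumed standard form consistent with the term-by-term description above):

u(k) = Kp ∗ e(k) + Ki ∗ Σ e(n) + Kd ∗ (e(k) − e(k − 1)),

where the summation Σ e(n) runs over n = 0, 1, …, k.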
3.4. Reinforcement Learning (RL)
Reinforcement learning is a machine learning technique in which an agent is trained to make choices and act in a way that maximizes a cumulative reward. It is inspired by the way humans and animals learn through trial and error. Through interaction with the environment and feedback in the form of rewards or penalties for its actions, the agent learns to make the best decisions by experimenting with different tactics and exploiting the most rewarding ones. The agent iteratively refines its decision-making and creates a policy that maximizes its long-term payoff by mapping states to actions. Reinforcement learning has been effectively applied in a number of fields, such as autonomous systems, gaming, and robotics.
A kind of machine learning called reinforcement learning (RL) places a strong emphasis on solving problems repeatedly in order to achieve predetermined goals. It is essentially based on how creatures learn by adjusting their tactics or control laws in response to their surroundings, all without a prior understanding of the underlying system model [10].
Consider an agent interacting with its environment, as shown in Figure 5. Throughout this interaction [11]:
The agent observes the current state, denoted as s_t, from a set of possible states S, and chooses an action a_t from a set of possible actions A.
Upon receiving the chosen action, the environment transitions to a new state s_(t+1), also from S, and provides a scalar reward r_(t+1) from a subset of real numbers R.
In response, the agent receives this reward, finds itself in the new state s_(t+1), and determines its next action from A.
This cycle continues until the agent reaches a terminal state in S. The agent's primary objective is to formulate an optimal control policy that maximizes its cumulative discounted rewards over time, referred to as the expected discounted return G_t, as shown in Equation (8) [10], where γ is the discounting factor, 0 < γ < 1.
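In the standard notation of [10], reproduced here as an assumed form of Equation (8), the discounted return from time step t is:

G_t = r_(t+1) + γ ∗ r_(t+2) + γ² ∗ r_(t+3) + … = Σ γ^k ∗ r_(t+k+1), with the summation over k = 0, 1, 2, …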
The reinforcement learning (RL) problem is closely related to the fundamentals of Markov Decision Processes (MDPs) [11]. A crucial feature of an MDP is its ability to guarantee that, regardless of an agent's past experiences in different states, the current state contains all the necessary information to make decisions. Generally speaking, RL algorithms fall into two main categories: policy-based and value-based [11]. Within the value-based paradigm of reinforcement learning, a value function is utilized to determine the value or importance of every state. Under policy π, the state-value function v_π(s), evaluated from a certain state s, is expressed in Equation (9), where E_π denotes the expectation under policy π. Similarly, under policy π, the action-value function, denoted as q_π(s, a), which represents the value of taking action a in state s, is described in Equation (10) [10].
Of all the action-value functions, the one that stands paramount is the optimal action-value function, articulated in Equation (11).
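In the usual MDP notation, assumed here to match Equations (9)-(11), these value functions can be written as:

v_π(s) = E_π[ G_t | S_t = s ],
q_π(s, a) = E_π[ G_t | S_t = s, A_t = a ],
q*(s, a) = max over π of q_π(s, a).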
The Q-learning algorithm is an off-policy, value-based learning methodology within reinforcement learning (RL). Within this framework, the action-value function, denoted as Q, aims to emulate the optimal action-value function q* by bypassing adherence to the present policy. Its update mechanism is articulated in Equation (12) [12,27,28,29,30], where α denotes the learning rate.
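The familiar form of this update rule, given here as an assumed reconstruction of Equation (12), is:

Q(s_t, a_t) ← Q(s_t, a_t) + α ∗ [ r_(t+1) + γ ∗ max_a Q(s_(t+1), a) − Q(s_t, a_t) ].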
3.5. Design Adaptive PID Controller Using Q-Learning
To create an adaptive PID controller that learns using Q-learning, the fundamental ideas of the traditional PID controller must be seamlessly integrated with the reinforcement learning paradigms that are inherent in Q-learning. The following is a structured framework for conceiving such an integrated system (see Figure 6):
State: This indicates the current position of the system. In control challenges, the state usually contains the instantaneous error, denoted as e(t), and its temporal derivative, denoted as de(t)/dt.
Action: This refers to changes made to the PID parameters. In real-world scenarios, actions could include increasing, decreasing, or maintaining the current PID gains.
Reward: Known as scalar feedback, the reward offers an assessment of the agent's effectiveness. Here, possible metrics could range from the negative of the absolute error to more complex assessment instruments that capture the system's operational performance.
Q-table initialization: Start with a tabular framework in which states characterize rows and the actions that correspond define the columns. The internal values of this matrix, known as Q-values, are then gradually improved based on the iterative feedback from the overall system.
The steps of an algorithm can be illustrated as follows:
Action selection: using an exploration approach like ε-greedy, choose an action (change PID parameters) at each time step based on the current state.
Take action: apply the PID controller to the system and adjust the PID settings.
Observe reward: Calculate the reward and assess how the system responded. A PID controller’s objective could be to arrive at a predetermined point with the least amount of oscillation and settling time. Either the negative absolute error or a more complex function, such as the integral of time-weighted absolute error (ITAE), can be used to calculate the reward.
Update Q-values: adjust the state-action pair’s Q-value by applying the Q-learning update rule.
Loop: repeat this process for a specified number of episodes or until the Q-values converge.
The Q-values will direct the PID controller's actions once the Q-table has been sufficiently trained. The course of action with the highest Q-value is the best one in a particular state. Since Q-learning has historically operated on discrete sets of states and actions, both the states and the actions must be discretized. For more complicated situations, neural networks (deep Q-learning) can be used as a continuous approximator. To make sure the system is stable and adaptive, it should be tested in a variety of circumstances after it has been trained. This overview explained a framework for combining Q-learning with PID controllers, which is illustrated by the sketch that follows.
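A minimal Python sketch of this loop is given below. It reuses the state, action, and reward definitions from the sketch near the start of Section 3, and it replaces the PSS/E dynamic model with a toy surrogate plant (simulate_step) so that the example is self-contained; the learning rate, discount factor, exploration rate, episode counts, and initial gains are all illustrative assumptions rather than the settings used in this research.

import random
import numpy as np

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1       # assumed learning rate, discount factor, exploration rate
N_STATES, N_ACTIONS = 12 * 12, len(ACTIONS)  # 12 x 12 discretized states, 27 gain-adjustment actions
q_table = np.zeros((N_STATES, N_ACTIONS))

def state_index(i, j):
    """Flatten the 2-D discretized state into a single Q-table row index."""
    return i * 12 + j

def pid_output(gains, e, e_sum, e_prev):
    """Discrete PID law in the form of Equation (7)."""
    kp, ki, kd = gains
    return kp * e + ki * e_sum + kd * (e - e_prev)

def simulate_step(freq_dev, freq_rate, u, dt=0.01):
    """Toy second-order surrogate plant; in this study the response would come from the PSS/E model."""
    accel = -2.0 * freq_dev - 0.5 * freq_rate - 0.8 * u   # assumed surrogate dynamics
    freq_rate = freq_rate + dt * accel
    freq_dev = freq_dev + dt * freq_rate
    return freq_dev, freq_rate

for episode in range(500):                                # assumed number of training episodes
    gains = [1.0, 0.1, 0.01]                              # assumed initial Kp, Ki, Kd
    e_sum, e_prev, t = 0.0, 0.0, 0.0
    freq_dev, freq_rate = 0.2, 0.0                        # assumed initial frequency disturbance
    s = state_index(*discretize_state(freq_dev, freq_rate))
    for step in range(200):
        # epsilon-greedy selection of a gain-adjustment action
        a = random.randrange(N_ACTIONS) if random.random() < EPSILON else int(np.argmax(q_table[s]))
        dp, di, dd = ACTIONS[a]
        gains = [max(0.0, gains[0] + dp * GAIN_STEPS["Kp"]),
                 max(0.0, gains[1] + di * GAIN_STEPS["Ki"]),
                 max(0.0, gains[2] + dd * GAIN_STEPS["Kd"])]
        e_sum += freq_dev
        u = pid_output(gains, freq_dev, e_sum, e_prev)
        e_prev = freq_dev
        freq_dev, freq_rate = simulate_step(freq_dev, freq_rate, u)
        t += 0.01
        r = reward(freq_dev, 0.01, t)
        s_next = state_index(*discretize_state(freq_dev, freq_rate))
        # Q-learning update of the state-action value (Equation (12))
        q_table[s, a] += ALPHA * (r + GAMMA * np.max(q_table[s_next]) - q_table[s, a])
        s = s_next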
4. Experimental Setup
The system model used in this study is depicted in Figure 7. It consists of two interconnected areas: Area 1 (generation) and Area 2 (loading). In Area 1, the generation units include Generator G1 with a capacity of 700 MW and Generator G2, also with a capacity of 700 MW. In Area 2, the generation units include Generator G3 with a capacity of 719 MW and Generator G4 with a capacity of 700 MW. The total generation capacity of the system is 2819 MW.
The network is structured with several buses and transmission lines connecting the generators and loads. In Area 1, Bus 1 is connected to Generator G1 and Bus 2 to Generator G2. In Area 2, Buses 3 and 4 are connected to Generators G3 and G4, respectively. The two areas are interconnected through a series of transmission lines with specific distances between buses: 25 km between Bus 1 and Bus 5, 10 km between Bus 5 and Bus 6, 110 km between Bus 6 and Bus 7, another 110 km between Bus 7 and Bus 8, and a further 110 km between Bus 8 and Bus 9. From Bus 9 to Bus 10, the distance is 10 km, followed by 25 km from Bus 10 to Bus 11.
In addition, the network includes reactive power elements at specific buses to manage voltage levels and stability. Capacitor C7 with a capacity of 200 MVAr and Load L7 with a capacity of 967 MW are present at Bus 7, while Capacitor C9 with a capacity of 350 MVAr and Load L9 with a capacity of 1767 MW are at Bus 9. The total active power of the investigated system is 2734 MW. There is a power flow of 400 MW from Area 1 to Area 2, indicating the amount of power transferred through the interconnection.
This model represents a typical high-voltage transmission network, which is used to analyze the power flow, stability, and reliability of the interconnected power system. The configuration ensures a robust system that can manage significant loads and maintain stable operations across the interconnected areas.
Kundur's model is shown in Figure 7. In Kundur's paradigm, power normally flows from Area 1 to Area 2, with two machines (a synchronous generator and a PV unit) in each area, where G2 and G3 are replaced by PV units. The PSSE program was used to run simulations in order to assess the efficacy of the methods presented in this study. The PSSE model used in the simulations was obtained from the University of Illinois at Urbana-Champaign's Illinois Center for a Smarter Electric Grid (ICSEG) [PSSE Models]. In order to assess the effectiveness of adopting PID control and examine the simulation findings, a comparison was carried out with a focus on four particular buses. These buses (Bus 1, Bus 3, Bus 7, and Bus 9) were selected based on an observability analysis: in each area, one bus is needed to observe the generation side of the dynamic response and one to observe the load side. Accordingly, Bus 1 and Bus 3 observe the generation buses, while Bus 7 and Bus 9 observe the load buses.
6. Discussion
As discussed in the literature [17,18,19,20,21,22,23,24], using EVs to enhance system stability requires a fast response without any delay. As shown in Figure 11, Figure 12 and Figure 13, the system restored its stability within 4 s after the disturbance of a 50 MW generation reduction occurred when PID-RL was used. The allowable time to restore system stability is within 60 s after a disturbance has occurred. Moreover, the highest power flow between the interconnected areas occurred when PID-RL was used, whereas the active power oscillation was the lowest when PID-RL was used.
Comparing our proposed PID-RL controller technique with the PID-PSO proposed in [19], Figure 11 shows that our proposed technique has a faster frequency response compared to PID-PSO. Therefore, when an EV fast charging station is modeled as a dynamic load for frequency control, the PID-RL is the best controller option in terms of a fast frequency response.
According to [32], a comparison was conducted between the Genetic Algorithm (GA)-PID, Grey Wolf Optimizer (GWO)-PID, PSO-PID, and JAYA-PID controllers in terms of the peak undershoot of frequency (∆Hz) and settling time (s) for both Area 1 and Area 2, as well as the peak undershoot of the tie-line power deviation (MW) and its settling time (s). To validate the fast response of our proposed PID-RL, a comparison similar to that in [32] is presented in Table 1.
The comparison shows how fast the PID-RL controller is compared to the others, thanks to the training stage of the controller using Q-learning. The optimized PID parameters reported in the literature take fixed optimum values for a given stability problem; the strength of reinforcement learning is that the PID parameters are dynamic and change their values based on the best rewards of Q-learning, as explained in Section 3. This fast response makes the proposed PID-RL suitable for dynamic EV fast charging load control with high performance, and it overcomes the delay problem discussed in the literature [17,18,19,20,21,22,23,24].