
Modeling of Collusion Behavior in the Electrical Market Based on Deep Deterministic Policy Gradient

1 Hubei Electric Power Co., Ltd., Power Exchange Center, Wuhan 430073, China
2 Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
* Authors to whom correspondence should be addressed.
Energies 2024, 17(22), 5807; https://doi.org/10.3390/en17225807
Submission received: 14 October 2024 / Revised: 1 November 2024 / Accepted: 5 November 2024 / Published: 20 November 2024
(This article belongs to the Special Issue Electricity Market Modeling Trends in Power Systems)

Abstract

The evolution of the electricity market has brought the issues of market equilibrium and collusion to the forefront of attention. This paper applies the Deep Deterministic Policy Gradient (DDPG) algorithm to an IEEE three-bus electricity market model. Specifically, it simulates the behavior of market participants through reinforcement learning (DDPG), and the Nash equilibrium and the collusive equilibrium of the power market are obtained by setting different reward functions. The results show that, compared with the Nash equilibrium, the collusive equilibrium increases nodal marginal electricity prices and reduces total social welfare.

1. Introduction

The analysis of power market equilibrium is pivotal in various domains, including market operation simulation, market mechanism design, trading decision-making by market entities, market power analysis, electricity price forecasting, and source-grid-load investment planning [1]. The bilevel optimization model is the predominant approach for modeling power market equilibrium. Solutions are primarily derived through two methods: the model transformation method, which leverages the Karush-Kuhn-Tucker (KKT) conditions or the strong duality theorem to convert the bilevel model into a single-level mathematical program with equilibrium constraints (MPEC) or an equilibrium problem with equilibrium constraints (EPEC) [2,3]; and the iterative solution method, which employs optimization algorithms based on agent-based models, such as reinforcement learning and deep reinforcement learning. The model transformation method presupposes that the lower-level model is a convex optimization problem, which restricts its applicability to unit commitment and clearing models in the power market and is more suited for single-period market equilibrium problems [4,5,6]. In contrast, the iterative solution method is adept at addressing multi-period market equilibrium problems. Notably, deep reinforcement learning circumvents the “curse of dimensionality” that can arise from the discretization of the action space in reinforcement learning, making it suitable for large-scale system models.
Currently, power market modeling predominantly employs Nash game equilibrium, with insufficient attention given to collusion equilibrium scenarios [7,8]. This oversight hinders a comprehensive understanding and effective regulation of the power market’s actual operations. Collusion among power producers is a real possibility in the electricity market, yet most existing research concentrates on Nash game equilibrium, neglecting the collusive equilibrium scenario. Due to the lack of research on collusion scenarios, market regulators cannot quickly identify collusion behaviors in the electricity market. Collusion can lead to market price distortions, inefficient resource allocation, consumer harm, and reduced market efficiency. Furthermore, the inadequate consideration of collusive equilibrium results in deficiencies in market mechanism design and policy formulation [9]. Existing market rules and regulatory policies may fail to identify and prevent collusive practices, thus jeopardizing fair competition and the stable operation of the electricity market. There is also a dearth of in-depth research on monitoring and punitive mechanisms for collusion, which hampers the creation of a strong deterrent against potential collusive activities. Additionally, overlooking the collusive equilibrium scenario restricts the assessment and management of power market risks. Collusion can induce market volatility and increased uncertainty, affecting the safe and stable operation of the power system. The absence of quantitative analysis and countermeasures for market risks under collusive equilibrium leaves the power market exposed to collusive risks.
Considering the problems mentioned above, this paper introduces an innovative application of the Deep Deterministic Policy Gradient (DDPG) algorithm to the collusive equilibrium of the power market, validated through testing on the IEEE three-bus model. When the DDPG model is used to simulate the power market, the participants converge to the market equilibrium point under different scenarios. The results show that, compared with the Nash equilibrium, the collusive equilibrium significantly raises the bids submitted by market participants. Concurrently, such collusive practices significantly elevate the nodal marginal price of electricity. The increased nodal marginal price not only indicates the distortion that collusion inflicts on the electricity market’s price formation mechanism but also underscores the detrimental effects of collusion on the market’s fairness and efficiency.

2. Problem Modeling

2.1. Electrical Market

Bidding curve and cost: Considering actual market operations, this paper employs slope bidding, assuming linearity between the unit’s power generation cost and its output. Given that unit costs are typically represented in quadratic form in existing literature, we adopt the secant method for approximation in this study [10].
Assume that the true cost of generator i is a quadratic function of its output power. This quadratic model captures the relationship between power output and the associated cost. The cost function is given by
$$ \mathrm{cost}_i^{\mathrm{quar}} = \frac{1}{2}\, b_i\, p_i^2 + a_i^{\mathrm{quar}}\, p_i, \qquad p_i^{\min} \le p_i \le p_i^{\max} \tag{1} $$
In this equation, $\mathrm{cost}_i^{\mathrm{quar}}$ represents the quadratic cost of generator i, $b_i$ is the coefficient that determines the curvature of the cost function, $p_i$ is the output power of generator i in a given time interval, and $a_i^{\mathrm{quar}}$ is the linear cost coefficient. The constraint $p_i^{\min} \le p_i \le p_i^{\max}$ ensures that the output power $p_i$ stays within the operational limits of the generator, where $p_i^{\min}$ is the minimum power output and $p_i^{\max}$ is the maximum power output.
In this paper, we employ a secant-line approximation of the quadratic cost function. The quadratic true cost is approximated as follows:
$$ \mathrm{cost}_i = a_i\, p_i, \qquad a_i = \frac{\mathrm{cost}_i^{\mathrm{quar}}(p_i^{\max}) - \mathrm{cost}_i^{\mathrm{quar}}(p_i^{\min})}{p_i^{\max} - p_i^{\min}} \tag{2} $$
Here, $a_i$ represents the marginal cost, i.e., the slope of the secant line between the maximum and minimum power outputs. This approximation allows the cost to be estimated at any output level within the range $p_i^{\min} \le p_i \le p_i^{\max}$. Furthermore, the bidding function is defined as
$$ \mathrm{bidding\ cost} = \alpha_i, \qquad \alpha_i \in [0,\, 3 a_i] \tag{3} $$
In this context, $\alpha_i$ denotes the bid marginal cost, which is restricted to the interval between zero and three times the true marginal cost $a_i$. This bounded range reflects the strategic bidding decision faced by each generator in the market.
Figure 1 illustrates the cost function for a unit, depicting both the quadratic curve and the secant line approximation, effectively capturing the relationship between power output and cost.
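As a quick illustration of the secant approximation in Equation (2), the following sketch computes the linearized marginal cost from an assumed quadratic cost curve; the coefficients and operating range are illustrative placeholders, not values from the paper's case study.

```python
# Illustrative sketch of the secant (linear) approximation of the quadratic cost
# in Equations (1)-(2); all numeric coefficients below are assumed.
def quadratic_cost(p, a_quar, b):
    """True generation cost: 0.5*b*p^2 + a_quar*p (Equation (1))."""
    return 0.5 * b * p**2 + a_quar * p

def secant_marginal_cost(a_quar, b, p_min, p_max):
    """Slope of the secant line between p_min and p_max (Equation (2))."""
    return (quadratic_cost(p_max, a_quar, b) - quadratic_cost(p_min, a_quar, b)) / (p_max - p_min)

# Example: a generator with assumed coefficients and a 10-100 MW operating range.
a_i = secant_marginal_cost(a_quar=15.0, b=0.05, p_min=10.0, p_max=100.0)
print(f"approximate marginal cost a_i = {a_i:.2f} USD/MWh")  # linearized cost = a_i * p
```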
Market model: At each time interval, the consumers’ electricity demand is modeled as a linear function, represented by the equation
$$ \mathrm{consumer\ price} = f \cdot (D - q_d) \tag{4} $$
where f is the slope of the demand function, D is the total demand, and $q_d$ is the quantity demanded. This linear model captures the relationship between the consumer price and the quantity demanded.
Subsequently, the consumer utility curve, which is a measure of the satisfaction derived from consuming a certain quantity of electricity, is expressed as follows:
$$ \mathrm{consumer\ utility} = f \cdot D \cdot q_d - \frac{1}{2}\, f \cdot q_d^2 \tag{5} $$
This quadratic utility function captures diminishing marginal utility: the second term $\frac{1}{2} f \cdot q_d^2$ indicates that, as the quantity demanded increases, the additional satisfaction gained from consuming more electricity decreases.
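As a brief consistency check (not spelled out in the original text), differentiating the utility function with respect to $q_d$ recovers the inverse demand curve of Equation (4), confirming that the linear price model and the quadratic utility model describe the same consumer:

```latex
\frac{\partial}{\partial q_d}\left( f \cdot D \cdot q_d - \frac{1}{2}\, f \cdot q_d^{2} \right)
  \;=\; f \cdot (D - q_d)
  \;=\; \mathrm{consumer\ price}
```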
In the pursuit of maximizing total social welfare, the Independent System Operator (ISO) clears the market under the constraints of nodal power balance, branch flow limitations, and generator output boundaries. This market clearing process at discrete time intervals can be effectively modeled using a DC power flow approach. The optimization problem is formulated as follows:
$$
\begin{aligned}
\max_{(p_i,\, q_d)} \quad & \sum_{d \in D} \left( f \cdot D \cdot q_d - \frac{1}{2}\, f \cdot q_d^2 \right) - \sum_{i \in G} \mathrm{cost}_i \\
\text{subject to:} \quad & \sum_{d \in D} q_d = \sum_{i \in G} p_i \\
& -\mathbf{F} \le \mathrm{PTDF} \cdot (\mathbf{q}_t - \mathbf{p}_t) \le \mathbf{F} \\
& p_i^{\min} \le p_i \le p_i^{\max}, \quad i \in G
\end{aligned} \tag{6}
$$
In this formulation, the vector $\mathbf{F}$ contains the maximum allowable flow on each line of the network, and the PTDF (Power Transfer Distribution Factor) matrix maps nodal net injections to branch power flows. The vectors $\mathbf{p}_t$ and $\mathbf{q}_t$ denote the power generation and demand at all buses, respectively, and are linear combinations of the individual $p_i$ and $q_d$.
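For concreteness, a minimal sketch of the market-clearing problem in Equation (6) is given below using cvxpy. It is not the authors' implementation; the bid prices, demand parameters, PTDF row, and line limit are placeholder assumptions for a three-bus system with generators at buses 1 and 2 and the load at bus 3.

```python
# Hedged sketch of the ISO market clearing (Equation (6)) with cvxpy.
# All numeric data below are illustrative placeholders, not the paper's case study.
import cvxpy as cp
import numpy as np

alpha = np.array([22.7, 24.8])            # generators' bid marginal costs (USD/MWh), assumed
p_max = np.array([100.0, 100.0])          # generator capacity limits (MW), assumed
f_slope, d_total = 0.5, 150.0             # inverse-demand slope f and intercept D, assumed
ptdf = np.array([[0.33, -0.33, 0.0]])     # single line, PTDF row over buses 1-3 (placeholder)
f_max = np.array([25.0])                  # line flow limit (MW)

p = cp.Variable(2, nonneg=True)           # generator outputs p_i
q = cp.Variable(nonneg=True)              # cleared demand q_d

welfare = f_slope * d_total * q - 0.5 * f_slope * cp.square(q) - alpha @ p
balance = cp.sum(p) == q                  # power balance constraint
inj = cp.hstack([p[0], p[1], -q])         # nodal net injections (gens at buses 1-2, load at bus 3)
problem = cp.Problem(cp.Maximize(welfare),
                     [balance, p <= p_max, cp.abs(ptdf @ inj) <= f_max])
problem.solve()

print("dispatch p:", p.value, "cleared demand q:", q.value)
# The dual of the balance constraint gives a system-wide price component; nodal prices
# additionally pick up congestion terms via the PTDF (sign conventions vary by solver).
print("balance dual:", balance.dual_value)
```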
At each time interval, the profit for each generator i is determined by maximizing the revenue less the cost. This can be mathematically represented as
$$ \max \; r_i = \lambda_i \cdot p_i - \mathrm{cost}_i, \qquad i \in G \tag{7} $$
This formulation captures the economic profit of power generation, taking into account both the market price and the generation cost. The term $\mathrm{cost}_i$ represents the cost function of generator i, with $i \in G$ indicating that the generator is located at bus i, and $\lambda_i$ denotes the nodal price at bus i in Equation (7).

2.2. DDPG

In contrast to game-theoretic methods that require full information, the agent-based simulation technique models the market as a partially observable Markov decision process and employs a reinforcement learning approach. Within this framework, each generation agent g perceives an observable state $s_{gt}$ at time t, selects an action $a_{gt}$, and subsequently receives a reward $r_{gt}$. The objective of each agent is to maximize its cumulative reward over the entire time horizon T.
The variables are interpreted in conjunction with the market mechanism as follows. The state $s_{gt}$, which defines the decision-making context of each generation agent g at time interval t, is composed of the nodal prices from the previous time interval and the total load demand of the current time interval:
$$ s_{gt} = (\lambda_{1,t-1},\, \lambda_{2,t-1},\, \ldots,\, \lambda_{I,t-1},\, D_t) $$
where $\lambda_{i,t-1}$ denotes the nodal price at bus i during the previous time interval, and $D_t$ represents the total load demand at time t.
In this context, the strategic variable $\alpha_{gt}$ corresponds to the action $a_{gt}$ generated by GenCo agent g; however, their ranges differ and require subsequent scaling. The payoff $r_{gt}$, usually termed the reward in the RL domain, is assumed to reflect the GenCos’ rational consideration of their own payoffs, so payoffs are equated with rewards in this study.
The Deep Deterministic Policy Gradient (DDPG) algorithm, an actor-critic and model-free approach, is based on the Deterministic Policy Gradient and operates in continuous state and action spaces. The DDPG algorithm is built around two main networks: the actor network, described by the policy function $\mu(s|\theta^\mu)$ with parameters $\theta^\mu$, and the critic network, described by the action-value function $Q(s,a|\theta^Q)$ with parameters $\theta^Q$. To facilitate training, target networks are created: the actor target network $\mu'$ with parameters $\theta^{\mu'}$ and the critic target network $Q'$ with parameters $\theta^{Q'}$. The network architectures used in this study are depicted in Figure 2.
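A minimal PyTorch sketch of these two networks is shown below; the layer counts and hidden sizes are illustrative assumptions, not the exact architecture of Figure 2.

```python
# Hedged sketch of the DDPG actor and critic networks; hidden sizes are assumed.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Policy network mu(s | theta_mu); tanh keeps the raw action in [-1, 1]."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Action-value network Q(s, a | theta_Q)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))
```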
The Deep Deterministic Policy Gradient (DDPG) agent-based learning method leverages the Bellman equation to establish a recursive relationship, which is fundamental to the learning process. This relationship is expressed as $Q^{\mu}(s_t, a_t) = \mathbb{E}\big[ r_t + \gamma\, Q^{\mu}(s_{t+1}, \mu(s_{t+1})) \big]$, where $Q^{\mu}(s_t, a_t)$ is the action-value function, $r_t$ is the immediate reward, and $\gamma$ is the discount factor.
The loss function of the Q network, which is crucial for training, is $L(\theta^Q) = \mathbb{E}_{\mu}\big[ (Q(s_t, a_t|\theta^Q) - y_t)^2 \big]$. Here, $y_t$ is the training target, computed with the target networks as $y_t = r_t + \gamma\, Q'\big(s_{t+1}, \mu'(s_{t+1}|\theta^{\mu'})\,\big|\,\theta^{Q'}\big)$. This formulation lets the agent learn by minimizing the difference between the predicted and target values of the Q function.
The actor network $\mu$ is updated by applying the chain rule to the expected return J with respect to the actor parameters:
$$ \nabla_{\theta^\mu} J \approx \mathbb{E}_\mu\Big[ \nabla_{\theta^\mu} Q(s,a|\theta^Q)\big|_{s=s_t,\, a=\mu(s_t)} \Big] = \mathbb{E}_\mu\Big[ \nabla_{a} Q(s,a|\theta^Q)\big|_{s=s_t,\, a=\mu(s_t)}\, \nabla_{\theta^\mu}\mu(s|\theta^\mu)\big|_{s=s_t} \Big] $$
In the DDPG algorithm, the sequence of operations is as follows. The policy network $\mu$ produces an action $\mu(s_t)$ given the current state $s_t$, and noise $n_t$ is added to this action, yielding $a_t = \mu(s_t) + n_t$. Upon taking action $a_t$, the environment transitions the agent to the next state $s_{t+1}$ and returns a reward $r_t$. This experience, encapsulated as $(s_t, a_t, r_t, s_{t+1})$, is saved in the replay buffer. For training, a mini-batch of N randomly selected experiences $\{(s_j, a_j, r_j, s_{j+1})\}$ is drawn from the buffer.
Utilizing the mini-batch $\{(s_j, a_j, r_j, s_{j+1})\}$, the policy network $\mu$ computes $\mu(s_j)$, which is then fed into the action-value network Q. Within this framework, the gradients are calculated via automatic differentiation. Specifically, the gradient with respect to the action a is
$$ \nabla_a Q(s, a|\theta^Q)\big|_{s=s_j,\, a=\mu(s_j)} $$
and the gradient with respect to the parameters $\theta^\mu$ is
$$ \nabla_{\theta^\mu}\, \mu(s|\theta^\mu)\big|_{s=s_j} $$
These gradients facilitate the approximation of the policy gradient:
$$ \nabla_{\theta^\mu} J \approx \frac{1}{N} \sum_j \nabla_a Q(s, a|\theta^Q)\big|_{s=s_j,\, a=\mu(s_j)}\, \nabla_{\theta^\mu}\mu(s|\theta^\mu)\big|_{s=s_j} $$
Employing a small learning rate $lr_a$, the parameters of the network $\mu$ are updated via gradient ascent to refine the policy’s performance:
$$ \theta^\mu \leftarrow \theta^\mu + lr_a\, \nabla_{\theta^\mu} J $$
Lastly, the target networks are softly updated with a small update rate τ :
$$ \theta^{Q'} \leftarrow \tau\, \theta^{Q} + (1-\tau)\, \theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1-\tau)\, \theta^{\mu'} $$
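Putting the above equations together, one training iteration can be sketched as follows. This is a generic DDPG update in PyTorch continuing the illustrative Actor/Critic classes above, not the authors' code; the function name and argument layout are assumptions.

```python
# Hedged sketch of one DDPG training iteration: critic loss, deterministic policy
# gradient ascent, and soft target updates, mirroring the equations above.
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_tgt, critic_tgt, actor_opt, critic_opt,
                batch, gamma, tau):
    s, a, r, s_next = batch                                  # mini-batch from the replay buffer

    # Target y_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1})) using the target networks
    with torch.no_grad():
        y = r + gamma * critic_tgt(s_next, actor_tgt(s_next))

    # Critic update: minimize L(theta_Q) = mean[(Q(s_t, a_t) - y_t)^2]
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor update: gradient ascent on J, implemented as descent on -Q(s, mu(s))
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft updates: theta' <- tau * theta + (1 - tau) * theta'
    with torch.no_grad():
        for net, tgt in ((critic, critic_tgt), (actor, actor_tgt)):
            for param, param_tgt in zip(net.parameters(), tgt.parameters()):
                param_tgt.mul_(1.0 - tau).add_(tau * param)
```

In practice, actor_opt and critic_opt would be torch.optim.Adam optimizers constructed with the learning rates $lr_a$ and $lr_c$ from Section 3.1, and the target networks would be initialized as copies of the online networks.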

2.3. Infinitely Repeated Games

Electricity market auctions are conceptualized as an infinite series of identical static games, each incorporating a discount factor $\gamma$ that reflects the present value of future payoffs. This factor is crucial for a generation company (GenCo) g, whose present value of payoffs is the discounted sum $\sum_{t=1}^{\infty} \gamma^{t-1} r_{gt}$. The closer $\gamma$ is to 1, the greater the weight of future payoffs, signifying greater patience on the part of the generator. When these static games are repeated indefinitely, they constitute an infinitely repeated game $\Gamma(\infty, \gamma)$. Gibbons’ [11] analysis of the infinitely repeated Prisoner’s Dilemma shows that the grim-trigger strategy, characterized by initial cooperation and permanent defection once a betrayal occurs, constitutes a Nash equilibrium when $\gamma \ge \underline{\gamma}$, with $\underline{\gamma}$ being the critical discount factor determined by the payoffs from cooperation, betrayal, and mutual defection. The Folk Theorem [12] states that, in such settings, sufficiently patient players can secure higher payoffs than in single-stage games, with cooperative strategies emerging as Nash equilibria as $\gamma$ approaches 1.
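For reference, the critical discount factor of the grim-trigger strategy follows from comparing the discounted payoff of sustained cooperation with a one-shot deviation followed by permanent punishment. This is a standard textbook derivation, not a quantity computed in this paper; $\pi_C$, $\pi_D$, and $\pi_N$ below are generic per-period payoffs for cooperation, deviation, and the non-cooperative punishment.

```latex
\frac{\pi_C}{1-\gamma} \;\ge\; \pi_D + \frac{\gamma\,\pi_N}{1-\gamma}
\qquad\Longleftrightarrow\qquad
\gamma \;\ge\; \underline{\gamma} \;=\; \frac{\pi_D - \pi_C}{\pi_D - \pi_N}
```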

2.4. Nash Equilibrium and Collusion Equilibrium

We categorize observed episodes and agent strategies along a spectrum from competitive to collusive by defining an episodic collusion measure. In the context of a Markov game, a collection of agent policies is identified as competitive, or a Nash equilibrium, if no agent i can unilaterally select a different policy to improve its expected total episode profit, given the fixed strategies of its opponents, i.e., each agent maximizes its individual expected reward:
$$ \max \; \mathbb{E}[\, r_i \,] $$
Conversely, the same collection of policies is deemed collusive, or a monopolistic equilibrium, if it aims to maximize the expected total profit across all agents, which is the maximization of the sum of expected rewards for all agents:
$$ \max \; \mathbb{E}\left[ \sum_{i=1}^{n} r_i \right] $$
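A minimal sketch of how these two reward definitions can be supplied to the DDPG agents is given below; the function name, signature, and example numbers are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of the competitive (Nash) vs. collusive reward definitions of Section 2.4.
import numpy as np

def agent_rewards(nodal_prices, outputs, marginal_costs, collusive=False):
    """Per-agent rewards for one cleared market interval.

    Nash setting: each agent is rewarded with its own profit r_i = lambda_i * p_i - cost_i.
    Collusive setting: every agent receives the total profit of all agents.
    All arguments are illustrative arrays, one entry per generator.
    """
    profits = nodal_prices * outputs - marginal_costs * outputs
    if collusive:
        return np.full_like(profits, profits.sum())  # shared objective: sum of profits
    return profits                                   # individual objective

# Example with two generators (prices, dispatch, and costs are assumed numbers):
print(agent_rewards(np.array([30.0, 28.0]), np.array([40.0, 60.0]),
                    np.array([17.5, 20.0]), collusive=True))
```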

3. Results

3.1. Setting

We employ multiple Deep Deterministic Policy Gradient (DDPG) agents to simulate the electricity market, where each agent independently learns its own policy and treats the other agents as part of the environment. Centralized training, opponent modeling, and communication can improve convergence, but they require knowledge of other agents’ actions or policies, which conflicts with market realities. We focus on the market strategy of each generator without assuming prior knowledge of the other generators’ behaviors, and therefore adopt an independent-learner (IQL-like) architecture. Moreover, considerable empirical evidence indicates that such architectures are often effective in practice, a finding our experimental results corroborate.
The electricity market simulation with multiple DDPG agents follows the algorithm described in Section 2.2, with distinct agents indexed by the subscript g in their respective parameters. In this study, the strategic variable $\alpha_{gt}$ is allowed to vary freely within the interval from 0 to $3\alpha_g^{m}$, i.e., $A_g = [0,\, 3\alpha_g^{m}]$. To respect these bounds, the output $a$ of the actor network, which ranges from −1 to 1 through the hyperbolic tangent function $\tanh(x)$, must be scaled, as shown in Figure 2:
$$ \alpha_{gt} = \frac{(a_{gt} + 1) \cdot 3}{2}\, \alpha_g^{m} $$
The capacity of the replay buffer is set to $T_{\mathrm{cap}} = 1000$, and training of the agents begins once the replay buffer is full. The default hyperparameters of the DDPG algorithm are $\gamma = 0$, $N = 64$, $\tau = 10^{-2}$, $lr_a = 10^{-3}$, $lr_c = 10^{-3}$, and $T = 10{,}000$. For exploration, each agent adds Gaussian noise characterized by
$$ n_{gt} \sim \mathcal{N}\big(0,\, \sigma_t^2\big) $$
The standard deviation $\sigma_t$ is initialized at 1 and gradually decays to 0.02, with this floor preventing the noise from becoming negligibly small once training begins. The decay of $\sigma_t$ is governed by the following piecewise function:
$$ \sigma_t = \begin{cases} 1, & 0 < t \le T_{\mathrm{cap}} \\[4pt] \max\!\big( 0.999^{\,t - T_{\mathrm{cap}}},\ 0.02 \big), & T_{\mathrm{cap}} < t \le T_{\mathrm{tra}} \end{cases} $$
Here, the termination point for training, $T_{\mathrm{tra}}$, is set to $T_{\mathrm{tra}} = 9000$. Once t exceeds $T_{\mathrm{tra}}$, training and noise injection cease so that the network’s unaided output can be observed, marking the transition to the testing phase. The parameters of the three-node power market model are detailed in [13], and the IEEE three-bus system is depicted in Figure 3. The marginal costs of the two generators are 17.5 USD/MWh and 20 USD/MWh, respectively, and the maximum power flow on the line from Bus 1 to Bus 3 is 25 MW.
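The action scaling and the exploration-noise schedule described above can be sketched as follows; the helper names are assumptions for illustration, and whether noise is added before or after scaling is an implementation choice not fixed by the text.

```python
# Hedged sketch of the bid scaling (alpha_gt equation) and the noise schedule sigma_t.
import numpy as np

T_CAP, T_TRA = 1000, 9000

def scale_action(a_gt, alpha_m):
    """Map the actor's tanh output a_gt in [-1, 1] to a bid in [0, 3 * alpha_m]."""
    return (a_gt + 1.0) * 1.5 * alpha_m

def sigma(t):
    """Exploration noise std: 1 while the buffer fills, then decays to a 0.02 floor."""
    if t <= T_CAP:
        return 1.0
    return max(0.999 ** (t - T_CAP), 0.02)

def exploratory_bid(a_gt, alpha_m, t, rng=np.random.default_rng()):
    """Add Gaussian noise n_gt ~ N(0, sigma_t^2) to the raw action, then scale to a bid."""
    noisy = np.clip(a_gt + rng.normal(0.0, sigma(t)), -1.0, 1.0)
    return scale_action(noisy, alpha_m)
```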

3.2. Equilibrium Result

Within the electricity market model, we deployed both the Particle Swarm Optimization (PSO) and the Deep Deterministic Policy Gradient (DDPG) algorithms to simulate two distinct equilibrium scenarios: Nash equilibrium and collusive equilibrium. Under the Nash equilibrium, the PSO algorithm achieved an equilibrium point of (22.70, 24.65) as depicted in Figure 4, whereas the DDPG algorithm arrived at (22.59, 24.83) as shown in Figure 5. In the context of collusive equilibrium, the PSO algorithm determined the point (28.93, 29.58) presented in Figure 4, whereas the DDPG algorithm’s convergence was to (28.81, 29.14) illustrated in Figure 5. These results demonstrate the algorithms’ capability to converge to equilibrium points within their respective scenarios, with the DDPG algorithm exhibiting tighter convergence in the collusive equilibrium scenario. This performance is attributed to DDPG’s adaptability and robustness in managing continuous action spaces and high-dimensional state spaces. Consequently, the DDPG algorithm emerges as a potent tool for pinpointing equilibrium points within the electricity market model, thereby providing precise decision support for market participants. Additionally, Figure 6a,b indicate the presence of optimal collusion when blocking occurs.

3.3. Collusion Effect

Figure 7 demonstrates the substantial influence of collusive behavior on the profits of two power-generating units within the electricity market. Under collusive equilibrium conditions, these units enhance their profits by decreasing their generation levels and increasing the nodal marginal price. This conduct contravenes the tenets of market competition, potentially diminishing market efficiency and curtailing consumer surplus. Given that collusive behavior could lead to market power abuse and unfair competition, it is imperative for regulatory bodies to intensify their oversight to safeguard market fairness and transparency.

4. Conclusions

To address the limited attention given to the collusive equilibrium scenario in current electricity market research, this study applies the Deep Deterministic Policy Gradient (DDPG) algorithm to solve for the collusive equilibrium of the electricity market, with verification on the IEEE three-bus model. The results of the DDPG and PSO algorithms show that both can converge to the market Nash equilibrium and the collusive equilibrium. Further observation shows that when the units collude, the nodal marginal price rises significantly, total social welfare falls, and the profits of the colluding units increase significantly. This study improves the understanding of collusive behavior in the electricity market and lays a foundation for regulatory intervention.

Author Contributions

Data curation, writing—original draft, resources, project administration, Y.L., J.C., M.C. and Z.H.; writing—review and editing, Y.G. and C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the State Grid Hubei Electric Power Company through the technical service project titled “The design and operation control of power market operation index system adapted to the whole market structure” (Project Number: SGDLIY00SCIS2400031).

Data Availability Statement

Data are unavailable due to privacy restrictions.

Conflicts of Interest

Authors Yifeng Liu, Jingpin Chen, Meng Chen and Zhongshi He were employed by the company Hubei Electric Power Co., Ltd. Power Exchange Center. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Boyd, S.; Vandenberghe, L.; Faybusovich, L. Convex optimization. IEEE Trans. Autom. Control 2006, 51, 1859.
  2. Hassan, A.; Dvorkin, Y. Energy storage siting and sizing in coordinated distribution and transmission systems. IEEE Trans. Sustain. Energy 2018, 9, 1692–1701.
  3. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
  4. Ruiz, C.; Conejo, A.J. Pool strategy of a producer with endogenous formation of locational marginal prices. IEEE Trans. Power Syst. 2009, 24, 1855–1866.
  5. Zhang, X.P. Restructured Electric Power Systems: Analysis of Electricity Markets with Equilibrium Models; John Wiley & Sons: Hoboken, NJ, USA, 2010.
  6. Ye, Y.; Qiu, D.; Sun, M.; Papadaskalopoulos, D.; Strbac, G. Deep reinforcement learning for strategic bidding in electricity markets. IEEE Trans. Smart Grid 2019, 11, 1343–1355.
  7. Chow, Y.; Ghavamzadeh, M. Algorithms for CVaR optimization in MDPs. Adv. Neural Inf. Process. Syst. 2014, 27, 1–9.
  8. Wu, C.; Gu, W.; Yi, Z.; Lin, C.; Long, H. Non-cooperative differential game and feedback Nash equilibrium analysis for real-time electricity markets. Int. J. Electr. Power Energy Syst. 2023, 144, 108561.
  9. Moitre, D. Nash equilibria in competitive electric energy markets. Electr. Power Syst. Res. 2002, 60, 153–160.
  10. Astero, P.; Choi, B.J. Electrical market management considering power system constraints in smart distribution grids. Energies 2016, 9, 405.
  11. Gibbons, R. A Primer in Game Theory; Prentice Hall: Saddle River, NJ, USA, 1992.
  12. Friedman, J.W. A non-cooperative equilibrium for supergames. Rev. Econ. Stud. 1971, 38, 1–12.
  13. Waheed, M.; Jasim Sultan, A.; al Bakry, A.A.A.; Saeed, F.N. Harmonic reduction in IEEE-3-bus using hybrid power filters. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1105, 012012.
Figure 1. The cost function of a unit.
Figure 2. Network structure of the DDPG algorithm.
Figure 3. IEEE 3-bus system.
Figure 4. PSO for electrical market equilibrium. (a) Nash equilibrium. (b) Collusion equilibrium.
Figure 5. DDPG for electrical market equilibrium. (a) Nash equilibrium. (b) Collusion equilibrium.
Figure 6. Collusive benefit distribution. (a) Collusive benefit distribution without line blocking. (b) Collusive benefit distribution with line blocking.
Figure 7. Profit for Nash equilibrium and collusion equilibrium.