1. Introduction
To manage price risk effectively, it is common to establish complementary markets for financial derivatives and to introduce power financial derivatives. These instruments use hedging mechanisms to mitigate price risk. The development of power financial derivatives is a crucial element of the modern power financial market, driven by the high demand for risk hedging solutions [1,2]. The trading of power financial derivatives has become more standardized and transparent, and countries worldwide have progressively established trading platforms for this market. For example, in Europe, Nordic power futures are primarily listed on the European Energy Exchange (EEX) and ICE ENDEX, while options contracts are transitioning from Nasdaq to EEX.
While participating in the spot power market, power producers can hedge risks, manage positions, and engage in arbitrage by using financial derivatives such as futures, options, and forward contracts within the power financial derivatives market [3]. The introduction of financial derivatives represents an inevitable trend in the evolution of the power market. Currently, a variety of electricity financial derivatives, including futures, options, and swaps, are available globally, each designed to address different risk management needs. Power producers in the power financial market must therefore carefully select investment strategies to optimize risk hedging [4].
Current research on investment strategies for electricity financial derivatives primarily addresses uncertainty and risk factors. For example, Yucekaya employs Monte Carlo simulations to analyze scenarios involving spot, strike, and contract prices in the Turkish electricity market, with the aim of optimizing capacity allocation to maximize the expected profit of a coal-fired unit [5]. Sheybani et al. present an equilibrium model that incorporates market interactions, demand uncertainty, and consumer elasticity, and examine the impact of the strike and premium prices of put options on both the put option market and the day-ahead electricity market [6]. Jaeck et al. examine the Samuelson effect in four major electricity futures markets (German, Nordic, Australian, and US) from 2008 to 2014, adding a new dimension to the analysis of this effect [7]. Although these studies offer thorough analyses of price risk and investment strategies in the electricity financial market, they frequently lack specific scenarios for varying market environments. Power financial transactions are inherently complex and variable, which makes it challenging to derive general scenarios from diverse trading conditions.
The emergence of artificial intelligence algorithms, particularly machine learning, has led to a significant increase in related methods for the power and financial markets. For instance, Zhipeng Liang et al. examine three advanced continuous reinforcement learning algorithms and introduce an adversarial training method that markedly enhances training efficiency, average daily return, and Sharpe ratio in backtests [8]. Shangkun Deng et al. propose a combined approach integrating XGBoost, SMOTE, and NSGA-II to forecast the directional movement of futures prices, demonstrating improved prediction accuracy, profitability, and return–risk ratio in simulated trading [9]. Germán G. Creamer et al. propose a multivariate distance nonlinear causality test (MDNC) that uses partial distance correlation within a time series framework. The test effectively detects nonlinear lagged relationships and enhances forecasting power when integrated with machine learning methods, substantially improving the forecasting of key energy financial time series compared with the classical Granger causality test [10]. Additionally, Foued Saâdaoui et al. propose a multiscale machine learning forecasting model that employs Variational Mode Decomposition (VMD) to analyze the cross-correlation between two European electricity markets. The model reveals significant long- and medium-term dependencies but short-term independence, which points to diverse investment opportunities [11].
Furthermore, numerous researchers have employed deep reinforcement learning (DRL) algorithms to devise timing strategies, stock-picking strategies, statistical arbitrage strategies, and portfolio strategies [12,13]. DRL can enhance investment strategies and real-time decision-making by interacting with the market and learning from feedback to refine portfolio strategies. For example, Uyen Pham et al. investigate the application of DRL to develop an automated hedging strategy for Vietnam’s stock market, aiming to improve trading performance and manage risk under volatile conditions [14]. Zhipeng Liang et al. examine three advanced continuous reinforcement learning algorithms and introduce an adversarial training method that significantly enhances training efficiency, average daily return, and Sharpe ratio in backtests [15]. Additionally, Xiao-Yang Liu et al. introduce the FinRL library, which is designed to help beginners develop and test DRL trading strategies [16]. This library features a modular structure that incorporates state-of-the-art DRL algorithms and standard evaluation baselines, which facilitates strategy development, debugging, and comparison.
However, the DRL methods proposed for conventional financial derivatives trading strategies are not specifically tailored for the electricity trading process. Electricity, as an intangible commodity, operates under distinct market mechanics compared to traditional derivatives. Consequently, contracts in the electricity financial market differ significantly from those used in conventional financial derivatives, both in terms of structure and settlement mechanisms. Therefore, these studies may not fully address the unique characteristics and requirements of the electricity financial market. Specifically, the DRL methods discussed above do not yet provide adequate support for power producers engaging in the electricity finance market.
Based on this analysis, this paper proposes a multi-agent deep reinforcement learning (MADRL) algorithm that integrates various machine learning techniques to identify the optimal portfolio investment strategy for power producers in the electricity financial market. The approach recognizes that financial derivatives with different time granularities differ in their effectiveness for risk hedging. Decisions on trading annual and quarterly contracts vary among power producers, depending on their long-term investment objectives and specific circumstances. Given the complexity and diversity of strategies associated with these longer-term contracts, the paper focuses on short-term power financial derivatives, specifically day futures, week futures, and weekend futures. A cooperative environment is established by combining different machine learning techniques to develop tailored investment strategies for these derivatives. The traditional multi-agent deep deterministic policy gradient (MADDPG) algorithm within MADRL is enhanced by using a dual-Q network in place of the single target network in the actor-critic structure, which addresses the overestimation of the Q values of the agents’ actions. The dual-Q MADDPG algorithm is then applied to historical data from the Nordic power market to identify the optimal portfolio strategy.
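This section does not give the update rule. As a point of reference only, a common way to curb Q-value overestimation with two critics is the clipped double-Q target used in TD3. In that scheme, the bootstrapped target takes the smaller of the two target critics' estimates. The sketch below shows only this target calculation for scalar values. Whether the paper's dual-Q MADDPG uses exactly this form is an assumption, and the discount factor of 0.95 is an arbitrary illustrative choice.

```python
def clipped_double_q_target(reward, q1_next, q2_next, gamma=0.95, done=False):
    """TD target y = r + gamma * min(Q1', Q2'), the clipped double-Q form used in TD3.

    q1_next, q2_next: estimates of two target critics for the next joint
    state-action pair. Taking the minimum biases the target downward, which
    counteracts the upward bias of a single maximizing critic.
    """
    if done:
        # No bootstrapping beyond the end of an episode.
        return reward
    return reward + gamma * min(q1_next, q2_next)


def single_q_target(reward, q_next, gamma=0.95, done=False):
    """Standard DDPG/MADDPG-style target with one target critic, for comparison."""
    return reward if done else reward + gamma * q_next


# One critic is optimistic (12.0), the other more conservative (10.0).
y_single = single_q_target(1.0, q_next=12.0)
y_double = clipped_double_q_target(1.0, q1_next=12.0, q2_next=10.0)
print(y_single, y_double)
```

In the example, the single-critic target inherits the optimistic estimate of 12.0, whereas the clipped target uses 10.0.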
The numerical experiment results indicate that the dual-Q MADDPG algorithm employed in this paper effectively mitigates spot market price risk. This is achieved through the cooperation of multiple agents and a combination of investments in various short-term electricity futures contracts. Additionally, the method generates considerable speculative income for power producers in the futures market.
The contributions of this paper can be summarized as follows:
The MADRL method provides a portfolio investment strategy for power producers. It delivers an optimal buying and selling plan for futures by learning from historical trading data.
The algorithm accounts for the speculative and hedging behavior of power producers. Alongside hedging price risks with futures contracts, producers can also engage in futures transactions for additional income.
The traditional MADDPG algorithm is enhanced by using a double-target critic neural network in the actor-critic structure. This improvement helps power producers achieve greater profit in the futures market.
2. Futures Trading Mechanisms for Power Producers
This paper uses the example of the EEX electricity futures market in Europe, which is a leading platform in the European power market. It offers trading in euro-denominated, cash-settled futures contracts across 20 European power markets. Among EEX’s largest markets, the German Power Future stands out as the most liquid power product and serves as the benchmark contract in European wholesale power trading [17].
2.1. Types of Power Futures Contracts
European power futures are classified into day, week, weekend, quarter, and annual futures, along with other time-based variations. Regarding delivery mode, power futures are categorized into peak power futures, which cover high-demand periods, and base power futures, which cover all hours of the delivery period.
We use German peak futures as an example. According to the contract description provided by Intercontinental Exchange (ICE), a peak day load futures contract is financially settled based on the hourly prices from 08:00 to 20:00 CET of the German day-ahead auction in the control area operated by Amprion GmbH. This settlement applies to each day of the contract period, excluding weekends and irrespective of public holidays. For base day futures, the contract size is 1 MW multiplied by the number of hours in the day. This is normally 24 h, but it is 23 or 25 h on the days when clocks change between winter and summer time [18]. Similarly, base week futures cover a delivery period of 7 days, or 168 h, while base weekend futures cover a delivery period of 48 h on the weekend. The delivery times for the six types of short-term futures contracts are listed in Table 1.
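Each contract's volume in MWh is its capacity (1 MW) multiplied by its delivery hours, so tabulating the hours is useful for the calculations that follow. The sketch below makes two assumptions. First, peak hours are 08:00 to 20:00 on the delivered days, as in the ICE description above. Second, peak weekend contracts cover 08:00 to 20:00 on Saturday and Sunday. Table 1 of the paper is the authoritative source. Clock-change days (23 h or 25 h) are handled only for base day contracts.

```python
# Delivery hours of one 1 MW contract on standard days (no clock change).
PEAK_START, PEAK_END = 8, 20                 # peak load hours 08:00-20:00 CET
PEAK_HOURS_PER_DAY = PEAK_END - PEAK_START   # 12 h

DELIVERY_HOURS = {
    "base_day": 24,
    "base_week": 7 * 24,                     # 168 h
    "base_weekend": 2 * 24,                  # 48 h (Saturday and Sunday)
    "peak_day": PEAK_HOURS_PER_DAY,          # 12 h, Monday-Friday only
    "peak_week": 5 * PEAK_HOURS_PER_DAY,     # 60 h, Monday-Friday
    "peak_weekend": 2 * PEAK_HOURS_PER_DAY,  # 24 h, assumed Sat-Sun 08:00-20:00
}


def base_day_hours(clock_change=0):
    """24 h normally; 23 h (clock_change=-1) or 25 h (clock_change=+1) on DST switch days."""
    if clock_change not in (-1, 0, 1):
        raise ValueError("clock_change must be -1, 0 or +1")
    return 24 + clock_change


def contract_volume_mwh(contract, n_contracts, capacity_mw=1.0):
    """Energy covered by n_contracts contracts of one type, in MWh."""
    return capacity_mw * DELIVERY_HOURS[contract] * n_contracts


print(DELIVERY_HOURS["peak_week"], base_day_hours(-1), contract_volume_mwh("base_week", 50))
```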
2.2. The Whole Trading Cycle
Futures trading cycles are typically divided into two phases: the contract period and the delivery period. During the contract period, traders establish positions in futures contracts within the trading hours set by EEX. In the delivery period, the holder of a long position in a futures contract is obligated to execute the contract. The detailed process is illustrated in Figure 1.
Power producers can enter into futures contracts during the contract period to lock in the price of electricity delivery, paying a specified margin. If the power producer does not sell a contract before the delivery date, it is obligated to fulfill the contract terms. If the agreed delivery price in the contract exceeds the spot market price, the power producer makes a profit; conversely, if it is lower, the producer incurs a loss. Alternatively, power producers can sell their contracts before the delivery date, potentially realizing a profit. This choice constitutes a decision-making problem for the power producers.
2.3. Decision-Making of Power Producers
Power producers engage in the power futures market for three primary reasons:
Next, an illustrative example is provided. A power producer anticipates a significant drop in the market price of electricity during the third week of the month due to expected changes in energy supply and demand, among other factors. Consequently, the producer purchases a base week futures contract at a price of EUR 100/MWh. The producer’s cost of generation is EUR 80/MWh, while the average spot market price during the delivery period is EUR 60/MWh. Despite the spot market price being lower than the generation cost, the producer still realizes a profit due to the favorable futures contract price. The net profit earned by the power producer during the 168 h delivery period of the base week futures contract is
Meanwhile, the power producer avoided the following losses:
Overall, power producers achieve a profit of EUR 336,000 by participating in the futures market.
Similarly, if the power producer opts to purchase 50 peak week futures contracts at a price of EUR 120/MWh, and the average electricity price during the peak delivery period is EUR 70/MWh, the profit earned by the power producer is
In addition to hedging their risks, power producers can also engage in speculation within the futures market. Because futures contracts can be sold at any time before delivery, power producers have the opportunity to realize speculative profits. For example, a power producer can achieve a profit by purchasing 100 base day futures contracts at EUR 80/MWh and subsequently selling them at EUR 90/MWh before the delivery period.
However, if the power producer sells the futures contract, this action may impact the effectiveness of subsequent risk hedging.
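The equations for the worked examples above did not survive extraction, and the number of base week contracts in the first example is not stated. The sketch below reproduces the arithmetic with the quantity as a parameter, assuming 1 MW per contract and ignoring fees, margin cash flows, and clock-change days.

For the base week case, an assumed 50 contracts makes the two components sum to the stated total of EUR 336,000. Those components are the net profit (futures price minus generation cost) and the avoided loss (generation cost minus spot price). For the peak week case, the sketch assumes 60 peak hours (5 days × 12 h) and measures the gain against the spot price only. Both choices are assumptions.

```python
def hedge_breakdown(futures_price, gen_cost, spot_price, hours, n_contracts, mw=1.0):
    """Split the futures benefit into net profit and avoided loss (EUR)."""
    energy = hours * n_contracts * mw                 # MWh covered by the contracts
    net_profit = (futures_price - gen_cost) * energy  # margin over generation cost
    avoided_loss = (gen_cost - spot_price) * energy   # loss if sold at spot instead
    return net_profit, avoided_loss, net_profit + avoided_loss


def futures_gain(futures_price, spot_price, hours, n_contracts, mw=1.0):
    """Gain of the contract price over the average spot price (EUR)."""
    return (futures_price - spot_price) * hours * n_contracts * mw


def speculative_profit(buy_price, sell_price, hours, n_contracts, mw=1.0):
    """Round-trip profit from buying and later selling before delivery (EUR)."""
    return (sell_price - buy_price) * hours * n_contracts * mw


# Base week example: 168 h, assumed 50 contracts (quantity not stated in the text).
net, avoided, total = hedge_breakdown(100, 80, 60, hours=168, n_contracts=50)
# Peak week example: assumed 60 peak hours, 50 contracts.
peak = futures_gain(120, 70, hours=60, n_contracts=50)
# Speculation example: 100 base day contracts, 24 h, bought at 80, sold at 90.
spec = speculative_profit(80, 90, hours=24, n_contracts=100)
print(net, avoided, total, peak, spec)  # prints: 168000.0 168000.0 336000.0 150000.0 24000.0
```

The base week total equals the plain gain of the futures price over the spot price, `futures_gain(100, 60, 168, 50)`. This shows that the decomposition into net profit and avoided loss is one accounting of the same EUR 40/MWh difference.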
3. Materials and Methods
Based on the analysis in the previous section, power producers typically face two primary challenges when participating in the electricity futures market:
Determine whether to purchase new contracts or sell existing ones, and establish the quantity to be bought or sold.
Formulate a strategic investment approach for various futures contracts to maximize profits.
To address these two challenges, a MADRL approach can be employed. This method leverages multiple agents to develop distinct investment strategies for futures contracts, which are subsequently integrated into a cohesive investment strategy.
3.1. Overview of MADRL
General planning theory typically addresses discrete subproblems, while reinforcement learning (RL) focuses on the overarching problem of an agent interacting with an uncertain environment [21]. In RL, an agent perceives the environment, selects actions that influence it, and receives rewards according to specific objectives. Deep learning (DL) has significantly advanced RL by improving the approximation of state–action mappings, thereby enhancing the agent’s ability to interpret the environment and make effective decisions. The integration of DL with RL has led to DRL, which combines the strengths of both approaches [22].
In real-world applications, multiple agents often learn and manage various tasks simultaneously, resulting in the emergence of the MADRL problem. However, increasing the number of agents presents challenges in managing their interactions. In MADRL, even without direct communication, the actions of one agent can still impact others. For example, the decisions made by one trader can influence stock prices, potentially benefiting or adversely affecting other automated traders.
In MADRL, there are four primary types of relationships between agents: Fully Cooperative, Fully Competitive, Mixed Cooperative and Competitive, and Self-Interested. When multiple agents engage in a cooperative relationship, the problem can often be modeled as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP), which can be formulated as a tuple $\langle \mathcal{N}, \mathcal{S}, \mathcal{A}, \mathcal{O}, r, \gamma \rangle$, where:
$\mathcal{N}$ denotes the set of agents;
$\mathcal{S}$ denotes the set of environment states;
$\mathcal{A} = A_1 \times \cdots \times A_{|\mathcal{N}|}$ represents the joint action set;
$\mathcal{O} = O_1 \times \cdots \times O_{|\mathcal{N}|}$ represents the joint observation set;
$r$ is the reward function shared by all agents;
$\gamma \in [0, 1)$ is the discount factor.
This paper models the investment strategies of power producers through six distinct agents: Agent 1 focuses on base day futures; Agent 2 concentrates on base week futures; Agent 3 specializes in base weekend futures; Agent 4 addresses peak day futures; Agent 5 focuses on peak week futures; and Agent 6 specializes in peak weekend futures.
The MADRL framework employs Centralized Training with Decentralized Execution (CTDE). As illustrated in Figure 2, each agent corresponds to a different futures contract specification. During offline centralized training, the agents share observational information, while during execution, each agent independently determines the trading volume of its futures contracts.
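Under CTDE, each actor maps only its own observation to an action, while the critic used during training sees all agents' observations and actions. The sketch below shows this input wiring for the six agents, using linear functions as stand-ins for neural networks. The agent names, dimensions, and linear "networks" are hypothetical placeholders, not the paper's architecture. For simplicity, a single critic is shared because the reward is common; MADDPG usually gives each agent its own centralized critic.

```python
import random

AGENTS = ["base_day", "base_week", "base_weekend",
          "peak_day", "peak_week", "peak_weekend"]


class LinearActor:
    """Decentralized actor: uses only its own observation (execution time)."""
    def __init__(self, obs_dim, act_dim, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)] for _ in range(act_dim)]

    def act(self, obs):
        return [sum(wi * oi for wi, oi in zip(row, obs)) for row in self.w]


class CentralCritic:
    """Centralized critic: scores the joint observation and joint action (training time)."""
    def __init__(self, joint_dim, rng):
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(joint_dim)]

    def q(self, joint_obs, joint_act):
        x = [v for o in joint_obs for v in o] + [v for a in joint_act for v in a]
        if len(x) != len(self.w):
            raise ValueError("joint input has the wrong size")
        return sum(wi * xi for wi, xi in zip(self.w, x))


def build(obs_dims, act_dims, seed=0):
    rng = random.Random(seed)
    actors = {k: LinearActor(obs_dims[k], act_dims[k], rng) for k in AGENTS}
    critic = CentralCritic(sum(obs_dims.values()) + sum(act_dims.values()), rng)
    return actors, critic


obs_dims = {k: 3 for k in AGENTS}   # illustrative observation size per agent
act_dims = {k: 2 for k in AGENTS}   # illustrative number of tradable delivery dates
actors, critic = build(obs_dims, act_dims)
obs = {k: [1.0, 0.5, -0.2] for k in AGENTS}
acts = {k: actors[k].act(obs[k]) for k in AGENTS}                  # decentralized execution
q = critic.q([obs[k] for k in AGENTS], [acts[k] for k in AGENTS])  # centralized training signal
```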
3.2. Action Sets and Action Spaces
Futures contracts are typically listed several days or weeks before the delivery date, and different types of contracts have different delivery periods. Therefore, we make the following assumptions:
l: day futures contracts become available l days before the delivery date;
m: week futures contracts become available m weeks before the delivery date;
n: weekend futures contracts become available n weekends before the delivery date.
Thus, on each trading day, the agents can trade the contracts for delivery up to l days, m weeks, and n weekends ahead. The action sets available to the six agents are as follows:
where
is the volume of base day futures traded for delivery after
i days.
is the volume of base week futures traded for delivery after
i weeks.
represents the volume of base weekend futures traded for delivery after
i weekends, where
denotes the weekend of the current week. Specifically, futures contracts delivered during this weekend are tradable from Monday through Friday of the same week.
,
, and
denote the trading volumes of peak day, week, and weekend futures contracts, respectively.
At time t, Agent 1 must determine the purchase or sale volumes of l types of base day futures. Similarly, Agent 2 must decide the purchase or sale volumes of m types of base week futures, and Agent 3 must determine the purchase or sale volumes of n types of base weekend futures. Agents 4, 5, and 6 perform analogous actions for peak futures. The meanings of the six action symbols in the action set are illustrated in Figure 3.
Next, we discuss the action space of the agents. Since the signing of futures contracts requires the payment of a corresponding margin, there is a limit on the number of contracts that can be signed and traded daily. The upper limit of an agent’s actions depends on the power producer’s capital position. Therefore, we further define the total cost for the power producer to trade futures contracts as
where
and
are the margins for base and peak day futures, respectively.
and
are the margins for base and peak week futures, respectively.
and
are the margins for base and peak weekend futures, respectively.
Power producers must allocate less than their maximum available funds when purchasing futures contracts.
where
represents the maximum amount of money a power producer can invest in futures at time t.
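The cost equation and the budget inequality are missing from this extraction. The minimal sketch below rests on three assumptions: the committed margin is the per-contract margin multiplied by the number of contracts bought; sales release margin rather than consume it; and the bound is non-strict. The margin values are hypothetical.

```python
def margin_cost(volumes, margins):
    """Margin required for new purchases (EUR).

    volumes: contract type -> list of traded volumes, one per delivery date.
    margins: contract type -> margin per contract (hypothetical values).
    Only positive volumes (purchases) consume funds; this is an assumption.
    """
    return sum(margins[k] * max(v, 0) for k, vs in volumes.items() for v in vs)


def within_budget(volumes, margins, max_funds):
    """Capital constraint: the committed margin must not exceed the available funds."""
    return margin_cost(volumes, margins) <= max_funds


margins = {"base_day": 100.0, "peak_week": 500.0}
volumes = {"base_day": [2, -1], "peak_week": [1]}   # buy 2, sell 1, buy 1
print(margin_cost(volumes, margins), within_budget(volumes, margins, 650.0))
```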
As mentioned in Section 2.3, power producers can sell futures contracts for speculative profit before the delivery date. A negative trading volume indicates that the power producer is selling part of its futures holdings. Therefore, the lower bound of an agent’s action at time t can be defined as
where we assume that once a futures contract has been executed, its trading volume is zero.
Since power producers often hold multiple futures contracts at varying prices, they must determine which contracts to sell and the quantities to sell when engaging in speculative trading for profit. This requires making assumptions about the rules governing the sale of futures and options contracts by power producers. It is assumed that when a power producer holds contracts at varying prices and decides to close a futures or options position, the contract with the largest spread will be sold preferentially to maximize short-term profits.
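One reading of this closing rule is sketched below. "Largest spread" is taken to mean the current futures price minus the purchase price, so the cheapest lots are sold first. Lots are hypothetical (purchase_price, volume) pairs for a single contract with one delivery period. Profit is computed per delivery hour, as for base day futures (24 h).

```python
def close_largest_spread_first(lots, sell_volume, current_price, hours):
    """Sell `sell_volume` contracts from `lots`, largest spread first.

    lots: list of (purchase_price, volume) pairs for one contract.
    Returns (realized_profit, remaining_lots). Selling more than is held
    raises an error, mirroring the lower bound on the agents' actions.
    """
    held = sum(v for _, v in lots)
    if sell_volume > held:
        raise ValueError("cannot sell more contracts than are held")
    # Largest spread first: highest (current_price - purchase_price).
    ordered = sorted(lots, key=lambda lot: current_price - lot[0], reverse=True)
    remaining, profit, to_sell = [], 0.0, sell_volume
    for price, vol in ordered:
        sold = min(vol, to_sell)
        profit += (current_price - price) * hours * sold
        to_sell -= sold
        if vol - sold > 0:
            remaining.append((price, vol - sold))
    return profit, remaining


lots = [(85.0, 10), (80.0, 5), (90.0, 5)]
profit, remaining = close_largest_spread_first(lots, 8, current_price=95.0, hours=24)
print(profit, remaining)
```

Here the 5 lots bought at 80 are sold first, followed by 3 of the lots bought at 85.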
In addition, power producers must trade futures on the trading days specified by EEX. Trades typically occur on business days, which exclude weekends and holidays. For the purposes of this paper, we assume that trading takes place from Monday to Friday each week. Therefore, on Saturdays and Sundays, the trading volume of futures is zero.
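The action bounds and the weekday rule can be combined into a simple projection of a raw action, sketched below under these assumptions: t counts days, with t = 0 a Monday; a negative action sells existing holdings; and the upper bound is already expressed as a number of contracts affordable from the budget. This clipping is an illustrative alternative. The paper itself handles bound violations through the penalty terms of Section 3.4.6.

```python
def is_trading_day(t):
    """Trading takes place Monday to Friday; day index t = 0 is assumed to be a Monday."""
    return t % 7 < 5


def project_action(raw_volume, holdings, max_buy, t):
    """Map a raw trading volume into the feasible range for day t.

    raw_volume < 0 sells existing holdings, so the lower bound is -holdings;
    max_buy is the largest purchase the remaining funds allow (in contracts).
    On Saturdays and Sundays no trading is allowed, so the volume is zero.
    """
    if not is_trading_day(t):
        return 0.0
    return float(min(max(raw_volume, -holdings), max_buy))


print(project_action(-10, holdings=4, max_buy=6, t=0),   # Monday: cannot sell more than 4
      project_action(8, holdings=4, max_buy=6, t=1),     # Tuesday: capped by the budget
      project_action(3, holdings=4, max_buy=6, t=5))     # Saturday: no trading
```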
3.3. Observation Sets and Observation Spaces
The observation sets available to the six agents can be described as follows:
where
is the average electricity price in the base hours at time t.
is the average electricity price in the peak hours at time t.
and
are the contract prices for base and peak day futures, respectively.
and
are the contract prices for base and peak week futures, respectively.
and
are the contract prices for base and peak weekend futures, respectively. These observations provide basic information about the futures contracts and support the agents’ decision-making.
3.4. Rewards Functions
The MADRL algorithm discussed in this paper operates within a cooperative multi-agent setting, where all agents share a common reward signal that reflects their joint contributions to the system’s performance. The reward for the agents is derived from the overall profits of the power producer, as these profits effectively capture the collective performance and operational efficiency of the agents. The profit gained by power producers consists of five components: the profit from speculating on futures contracts, the profit gained from the delivery of futures, the recovered contract margin, the margin paid for signing the futures contract, and the return of the margin gained at the last time step.
3.4.1. Profit from Speculating
A power producer may sell its holdings of futures contracts prior to the delivery period. At this point, the power producer is considered an ordinary participant in the futures market. Once the contract is sold, the power producer has no obligation to execute it. If the market price of the futures contract at that time is higher than the purchase price, the power producer can realize a profit. Additionally, by selling the futures contract, the previously paid margin can be recovered.
We take base day futures as an example. A negative action means that the contract is sold at the current time. The power producer’s gain from this sale can be expressed as follows:
where
denotes the power producer’s speculative profit on base day futures.
denotes the price of a futures contract purchased on day
for delivery after day
j.
denotes the volume. When
and
, it means that the contract is for delivery on the same day. The number 24 in Equation (25) refers to the 24 h delivery period of the base day futures contract. Equation (26) states that the total volume sold across all held futures contracts equals the current action of the agent.
Similarly, the speculative returns on the peak day futures contracts can be expressed as follows:
where
denotes the power producer’s speculative profit on peak day futures.
For week contracts, the floor (round-down) function
is used to determine the week to which time t belongs. When
, both week futures contracts are delivered in the same week. Therefore, the speculative returns on the remaining four futures contracts can be expressed as follows:
where
,
,
, and
denote the power producer’s speculative profit on base week, peak week, base weekend, and peak weekend futures, respectively.
The total speculative profits from the six futures contracts represent the profit made by the power producers through speculation on futures:
3.4.2. Profit from the Delivery
When a contractual delivery is made, the power producer is obliged to sell electricity at the contract price. According to the classification in Table 1, day futures can be divided into base futures and peak futures, so the rewards include the returns from both types of futures. At time t, the profits of a power producer engaged in day futures trading can be expressed as follows:
where
and
denote the power producer’s delivery profit on base day and peak day futures, respectively. When the contract price exceeds the corresponding average spot price, the power producer receives a positive profit, which means that it has successfully hedged its price risk through the futures contract.
Similarly, the profit on delivery of week and weekend futures can be expressed as follows:
where mod is the modulo (remainder) operation, and
is used to calculate the day of the week at time t.
,
, and
,
denote the power producer’s delivery profit on base week, peak week, base weekend, and peak weekend futures, respectively.
The total return received by the power producer from making contractual deliveries is the sum of the profits from the delivery of the six futures:
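The indexing and the per-contract delivery profit can be sketched as follows, assuming day index t with t = 0 a Monday. Under that convention, t // 7 (the floor of t/7) gives the week and t % 7 gives the weekday. The profit for one contract is (contract price minus the average spot price over its delivery hours) multiplied by the delivery hours and the volume. The function names and the hourly price list are hypothetical.

```python
def week_of(t):
    """Week index of day t via the floor function (t = 0 is assumed to be a Monday)."""
    return t // 7


def weekday_of(t):
    """Day of week via t mod 7: 0 = Monday, ..., 5 = Saturday, 6 = Sunday (assumption)."""
    return t % 7


def delivery_profit(contract_price, spot_prices, volume_mw):
    """Delivery profit (EUR): (contract price - average spot) * delivery hours * volume.

    spot_prices: hourly spot prices over the contract's delivery hours.
    """
    if not spot_prices:
        raise ValueError("need at least one delivery hour")
    hours = len(spot_prices)
    avg_spot = sum(spot_prices) / hours
    return (contract_price - avg_spot) * hours * volume_mw


# Day 13 is the Sunday of week 1, the last delivery day of that week's base contract.
print(week_of(13), weekday_of(13), delivery_profit(100.0, [60.0] * 168, 50))
```

A negative result means that the spot price ended above the contract price, so the contract cost the producer money on delivery.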
3.4.3. Profit from Recovered Margin
The margin previously paid can be recovered once the power producer has completed the delivery of the futures contract. Therefore, this portion of the proceeds can be expressed as follows:
where
,
,
,
,
, and
denote the power producer’s profit from recovered margin on base day, peak day, base week, peak week, base weekend, and peak weekend futures, respectively.
The total return received by the power producer is the sum of the profits from the recovered margin of the six futures:
3.4.4. Profit from Paid Margin
Power producers must pay a margin when entering into a futures contract; therefore, the profit from this portion is negative. It can be expressed as follows:
where
,
, and
denote the margin paid on day, week, and weekend futures, respectively. Add the three to obtain the margin paid by the power producer for entering into a futures contract:
3.4.5. Profit from the Last Time Step
Because each MADRL episode contains a finite number of time steps, gains that would occur after the end of an episode cannot be captured. Specifically, to recover the margin on futures that have been purchased but not yet delivered, we add that portion of the returned margin to the reward at time T. This part of the reward is virtual: returning the margin early does not affect the final total reward. Therefore, this portion of the profit can be expressed as follows:
where
,
, and
denote the returns of day, week, and weekend futures at the last time step, respectively. Summing the three components under the last-time-step condition yields the total return at the last time step as follows:
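The margin cash flows of Sections 3.4.3 to 3.4.5 can be tracked with a small ledger. Margin is paid when a contract is entered and recovered on delivery or sale. Any margin still outstanding at the last step T is credited back. The sketch below assumes a fixed margin per contract, and the class and method names are hypothetical.

```python
class MarginLedger:
    """Tracks posted margin per contract (hypothetical helper)."""

    def __init__(self):
        self.outstanding = {}  # contract id -> margin currently posted (EUR)

    def pay(self, contract_id, margin_per_contract, n):
        """Post margin for n new contracts; returns the (negative) profit term."""
        amount = margin_per_contract * n
        self.outstanding[contract_id] = self.outstanding.get(contract_id, 0.0) + amount
        return -amount

    def recover(self, contract_id, fraction=1.0):
        """Return a fraction of the posted margin (1.0 on full delivery or sale)."""
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must lie in [0, 1]")
        posted = self.outstanding.get(contract_id, 0.0)
        amount = posted * fraction
        self.outstanding[contract_id] = posted - amount
        return amount

    def settle_terminal(self):
        """At t = T, credit back all margin still posted (the virtual return)."""
        total = sum(self.outstanding.values())
        self.outstanding = {k: 0.0 for k in self.outstanding}
        return total


ledger = MarginLedger()
flows = [ledger.pay("base_day_3", 100.0, 5),   # -500.0
         ledger.recover("base_day_3", 0.4),    # +200.0 after selling 2 of 5
         ledger.pay("peak_week_2", 250.0, 2),  # -500.0
         ledger.settle_terminal()]             # +800.0 still posted at T
print(flows, sum(flows))
```

The margin flows net to zero over the episode, which is consistent with the statement that returning the margin early does not change the final total reward.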
3.4.6. Constraints Penalty
To ensure that the agents’ actions adhere to the constraints, a penalty term is incorporated into the reward function. Specifically,
addresses lower bound constraints, while
addresses upper bound constraints. These penalty terms can be defined as follows:
where
and
are the penalty term coefficients, respectively.
Therefore, the reward for each agent at time t can be expressed as follows:
By accumulating the rewards received by the agent over the cycle T, one obtains its return R as follows:
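The reward assembly can be sketched as the sum of the five profit components minus the constraint penalties, with the return obtained as the sum of rewards over the episode. The penalty equations are missing here, so the sketch makes two assumptions: the penalty is linear in the size of each bound violation, and the coefficients default to 1.0 arbitrarily. The return is undiscounted because the text describes a plain accumulation over T.

```python
from dataclasses import dataclass


@dataclass
class StepProfit:
    speculative: float       # Section 3.4.1
    delivery: float          # Section 3.4.2
    recovered_margin: float  # Section 3.4.3
    paid_margin: float       # Section 3.4.4, a negative profit (e.g. -30.0)
    terminal: float = 0.0    # Section 3.4.5, nonzero only at the last step


def penalty(lower_violation, upper_violation, k_low=1.0, k_up=1.0):
    """Linear penalty on the amount by which the lower/upper bounds are violated (assumed form)."""
    return k_low * max(lower_violation, 0.0) + k_up * max(upper_violation, 0.0)


def reward(p, lower_violation=0.0, upper_violation=0.0, k_low=1.0, k_up=1.0):
    """Shared reward at one step: profit components minus constraint penalties."""
    profit = p.speculative + p.delivery + p.recovered_margin + p.paid_margin + p.terminal
    return profit - penalty(lower_violation, upper_violation, k_low, k_up)


def episode_return(rewards):
    """Undiscounted return R over the episode of length T."""
    return sum(rewards)


step = StepProfit(speculative=100.0, delivery=50.0, recovered_margin=20.0, paid_margin=-30.0)
print(reward(step), reward(step, lower_violation=2.0, k_low=5.0))
```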
5. Conclusions
This study addresses the previously unexplored area of providing guidance to power generation companies in navigating the complexities of electricity futures markets. The conventional MADDPG algorithm has been enhanced to account for the high level of unpredictability inherent in these markets. Numerical experiments demonstrate that the model performs effectively in the Nordic power futures market, enabling power producers to achieve better returns from their futures trading activities. This research contributes to lowering barriers to participation in the electricity financial market, thereby promoting the broader use of electricity financial instruments and enhancing market liquidity and stability.
However, there are opportunities for further development. First, when designing futures trading strategies, risk management should be prioritized alongside the pursuit of optimal returns, as market participants typically aim for higher returns within a specified risk tolerance. Second, modern portfolio theory suggests that unsystematic (asset-specific) risk can be reduced by diversifying asset allocations, whereas systematic market risk cannot be diversified away. Therefore, when constructing a multi-asset portfolio, careful attention must be paid to the proportionate allocation of assets.
Additionally, this paper focuses on power producers. Future studies could also explore the perspectives of other market participants, such as financial institutions and large-scale electricity consumers in the electricity financial market.