1. Introduction
In many countries, Battery Energy Storage Systems (BESS) are becoming popular due to their advantages in managing power dispatch, interconnection, and demand. Their growing acceptance stems from their ability to smooth out the intermittent and unreliable nature of Renewable Energy Sources (RES). Although RES penetration has shown an increasing trend, their unreliable energy supply makes it difficult to incorporate RES into a modern electricity grid. However, in some niche applications, a variety of BESS is already installed, providing operational efficiency and reducing costs by exploiting synergies between storage and renewables (Barton and Infield 2004; Black and Strbac 2007; Kim and Powell 2011; Teleke et al. 2010). Nonetheless, the implementation of BESS is still not straightforward from economic and technical perspectives because of BESS costs and their sensitivity to deep discharge. In this domain, there is a large body of literature with diverse contributions to the economic and technological aspects of BESS management. In particular, we emphasize Yang et al. (2014) and Kempener and Borden (2015), who investigate the role of batteries with respect to RES, and Lu et al. (2014), on the optimal use of BESS for so-called peak load shaving.
Let us describe an abstract but typical framework for BESS application. Traditional electricity market players satisfy consumers' energy demand by purchasing electricity in advance, usually taking positions in the long-term market. The long-term market may stand for any energy delivery agreement purchased prior to the delivery period (a year, a semester, a month), depending on the situation. Typically, however, this market is represented by the so-called day-ahead market for hourly delivery on the next day. In contrast, imbalances during the delivery period must be compensated, as they occur, in a short-term market. Such short-term energy balancing can be achieved through complex over-the-counter trading or, more realistically, by participating in a real-time energy auction, or by transferring supply from or to the electricity grid at the so-called real-time grid prices.
Figure 1 provides a simplified illustration of this optimization problem for a producer endowed with renewable generation capacity. In the presence of storage facilities, however, the problem changes. Within this framework, the agent is required to simultaneously take long-term positions and set energy storage levels, as shown in Figure 2. The decision optimization problem becomes significantly more complex due to the uncertainty stemming from future battery levels, electricity prices, and the output of renewable energy.
While there exist numerous types of energy storage systems, Beaudin et al. (2010) found that no single storage system consistently outperforms all others for all types of renewable energy sources and applications. Hence, for the sake of simplicity, this paper assumes that the energy retailer pictured in Figure 2 uses a battery device for storing energy. However, our methods and results can easily be extended to other types of storage technologies, or even to the use of multiple types of storage devices. From the viewpoint of real options analysis, the incorporation of energy storage into the energy grid also poses interesting investment questions. The works of Bakke et al. (2016), Bradbury et al. (2014), and Locatelli et al. (2016) examine the profitability of investing in energy storage, while Schachter and Mancarella (2016) question the suitability of the real options approach, stating that the risk-neutrality assumption may not be appropriate.
The integration of electric energy storage systems yields challenging problems of the optimal stochastic control type (see Dokuchaev and Zolotarevich (2020); Kim and Powell (2011); Oudalov et al. (2007); Teleke et al. (2010), among others). While traditional power generation involves a sequence of independent decisions, the opportunity to store electricity intertwines actions made at different times: it is obvious that storage facilities can be charged during base-load low-price hours and discharged during peak-load high-price hours. However, determining exactly how to do this requires mathematical techniques, e.g., dynamic programming (AzRISE 2010; Löhndorf and Minner 2010) or the calculus of variations (Flatley et al. 2016). Apart from rare cases where dynamic programming can be addressed analytically (Bäuerle and Rieder 2011; Pham 2009), these problems usually rely on numerical techniques and are addressed by approximate dynamic programming methods (Powell 2007). Although a huge variety of computational tools has been developed in this area, real-world problems are often too complex for existing solution techniques, since the number of decision factors is high. The present contribution extends the state of the art in several aspects.
The first aspect concerns modeling the complex financial and economic environment that electricity retailers are routinely confronted with. We consider energy trading on two time scales: a long-term one (realized as day-ahead trading, futures, or forward contracts) and a short-term one (designed to adjust for unexpected changes in the electricity demand shortly before delivery, which can be realized by so-called intra-day trading or by diverse energy balancing procedures). Such a structure is typical for all energy markets, with differences in price dynamics, liquidity, and spreads. The distinction between short-term and long-term energy trading is of primary importance for correctly determining the economic value of a BESS. The present work extends that of Hinz and Yee (2018a), who address BESS management within an oversimplified setting, merely considering a battery installation as a passive buffer that settles energy imbalances against the electricity grid. In contrast, our work makes the realistic assumption that battery levels and energy trading are controlled simultaneously, placing the costs of deep discharge at the core of the investigation.
The second aspect is methodological: we attempt to solve a class of battery management problems, rather than a specific case. More precisely, we present a computational methodology whose routines implement stochastic switching algorithms (written in the scientific programming language Julia). Our approach realizes a highly customizable solution. Namely, the entire source code of our computations (available on GitHub) comprises several blocks that serve as placeholders and can be tailored to specific situations. For instance, the state space model for electricity prices (including seasonal and mean-reverting components) considered in this work does not attempt to describe any typical electricity price pattern; it is a proxy that can be replaced, modified, and adjusted. Such flexibility is ensured by assumptions of linear state dynamics, which encompass any ARMA model combined with appropriate seasonal and trend components. The same considerations apply to the modeling of deep discharge costs. We suggest a simple proxy function penalizing deep discharge in a fairly general way, allowing the user to tailor such penalization to a particular battery technology.
The third aspect regards a novel computational technique. In comparison to existing schemes (for instance, Löhndorf and Minner (2010)), we optimize energy storage using a combination of primal and dual schemes. More specifically, we apply a sub-gradient method introduced in Hinz (2014); Hinz and Yap (2015); Hinz and Yee (2018a) to obtain, in a first step, an approximate solution of our stochastic control problem, whose numerical quality is examined, in a second step, using a dual method (Hinz et al. 2020; Hinz and Yap 2015). That is, we provide simultaneous optimization of battery control and long-term trading in the presence of the uncertainty that results from RES generation, demand, and electricity prices with guaranteed precision, which, to the best of the authors' knowledge, distinguishes this work from all existing contributions.
The paper is organized as follows: Section 2 details the model settings. Section 3 briefly reviews the adopted solution technique. Section 4 applies the solution technique to the present decision problem. Section 5 includes an illustrative case study, while Section 6 concludes.
2. Model Settings
In the following, we present an abstract but generic model of an electricity market, in which an energy retailer has the obligation to meet the demand of its consumers using a combination of energy from renewable generation, a contractual position in the long-term market (for instance, day-ahead trading), the short-term market (real-time trading, balancing market), and battery storage.
Because, in reality, decisions are made and revised periodically, we propose a dynamic optimization with discrete-time decision-making. The present framework encompasses all important features of real-world energy trading in order to illustrate our methodology in the most general setting. Following this approach, our algorithmic solution can be adjusted to a specific market architecture.
We assume a given finite time horizon $T \in \mathbb{N}$ and agree that one unit of time corresponds to the energy delivery interval, which can measure hours, days, or weeks, depending on the particular application. At any time $t = 0, \dots, T-1$, the energy retailer has the obligation to satisfy, within the delivery period $[t, t+1[$, the unknown electricity demand of its customers, while the retailer's renewable energy sources produce a random amount of electricity. The retailer must trade electricity optimally in advance (at time $t$) and decide how to control the battery in order to manage the resulting energy imbalance.
The revenue optimization of such a retailer is a sequential decision problem under uncertainty. That is, at any time $t$, an action must be chosen (encompassing both energy trading and BESS control). Such an action also influences the transition to the subsequent state (the next battery level), changing all future revenues, costs, and decisions. The minimization of control costs in such a setting is naturally addressed in terms of the so-called Markov Decision Theory. In the subsequent paragraphs, we formulate our optimization problem within this framework, which we customize so that our approximate solution methodology can be applied.
Let $D_t$ denote the residual electricity demand, after all renewable generation has been sold to the customers. Here, $D_t > 0$ stands for a shortfall and $D_t < 0$ for an excess in the delivery period $[t, t+1[$ beginning at time $t$. We assume that the random variable $D_t$ is observed at $t+1$, when the delivery period ends. Assume that, at each time $t = 0, \dots, T-1$, the producer can take a position $\ell_t \in \mathbb{R}$ in the long-term market to ensure an energy supply from the market to the retailer (if $\ell_t > 0$) or an outflow from the retailer to the market (if $\ell_t < 0$) within $[t, t+1[$. Further, consider the variable $b_t$, which stands for the decision to use the battery energy, where $b_t > 0$ and $b_t < 0$ represent the charging and discharging actions, respectively. Here, we agree that the BESS control actions must be decided at time $t$, before the start of the delivery period.

With these assumptions, the energy to be balanced within $[t, t+1[$ through the short-term market is given by
$$I_t = D_t + b_t - \ell_t. \qquad (1)$$
Let us introduce the random variables $S_t$, $S_t^{+}$, and $S_t^{-}$, which stand for the prices of electricity delivered within the time interval $[t, t+1[$. Thereby, we assume that $S_t$ is the long-term market price (observed and paid at $t$ for energy delivered as a constant flow within $[t, t+1[$), whereas $S_t^{+}$ and $S_t^{-}$ represent the short-term market prices. Here, $S_t^{+}$ applies to the purchase and $S_t^{-}$ to the sale of energy delivered as a constant flow within $[t, t+1[$. Note that both short-term market prices $S_t^{+}$, $S_t^{-}$ are not observable at time $t$ and become known at $t+1$, at the end of the period $[t, t+1[$.
Remark 1. The relations between the prices of electricity delivered at different time scales and traded at different market places have been a lively topic of discussion since the beginning of energy market deregulation. Naturally, the connection between the long-term (day-ahead) and the short-term (balancing auction, real-time) prices heavily depends on the risk aversion, variable production costs, production capacities, and delivery commitments of all market participants. For an overview of equilibrium analysis, we refer the interested reader to Hinz (2003) and the literature cited therein.

Finally, let us denote by $C_t$ the operational costs associated with battery management, which must be modeled by a random variable depending on the recent battery level and on the energy delivered/absorbed within $[t, t+1[$. This quantity will reflect the impact of deep discharge on battery life, expressed in monetary units. At the moment, we postpone the specification of these costs and finalize the one-period revenue from BESS management as
$$R_t = -S_t\, \ell_t - S_t^{+} \max(I_t, 0) + S_t^{-} \max(-I_t, 0) - C_t, \qquad (2)$$
for $t = 0, \dots, T-1$. In order to maximize the expectation of the total revenue
$$\sum_{t=0}^{T-1} R_t, \qquad (3)$$
a stochastic control problem must be solved. This problem is dynamic in the sense that, at any time $t$, the decision to charge/discharge the battery changes the situation at the next decision time $t+1$, which has a profound effect on the next-period planning. Typically, such a sequential decision problem admits no closed-form solution, and it can be computationally challenging.
It is well-known that demand fluctuations follow a complex seasonal pattern and are difficult to model, particularly by the Markovian processes required for the state dynamics. At this point, we suggest a significant simplification: it turns out that, under generic assumptions, the demand modeling can be split off from the strategy optimization. We show that merely a one-step prediction (the conditional expectation given the most recent information) of the energy demand is relevant. That is, the problem addressed in this work separates into the following steps:
- (i) Establishing a time-series model for the dynamics of the energy demand, which provides, at any time $t$, the conditional expectation $\hat D_t$ of the demand $D_t$ occurring within $[t, t+1[$, given the information available at time $t$.
- (ii) Solving the stochastic dynamic control problem, where the demand prediction is not a decision variable at time $t$, because the optimal long-term position is calculated as a deviation from the demand prediction.
- (iii) Running an optimal policy. For this, the prediction of the demand must be available.
Note that the second step is disentangled in the sense that, for (ii), the dynamics of the demand prediction are irrelevant. On this account, we only consider (ii) in the remainder of this work. Notice, however, that for the strategy implementation in (iii), the demand prediction must be available at every decision time. An advantage is that the user can alter or replace the entire demand prediction model without re-calculating the sequential decision strategy. A minimal sketch of step (i) is given below.
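As an illustration of step (i), a simple seasonal predictor with an AR(1) residual could serve as such a replaceable demand model. The following Julia sketch is our own hedged example; the model choice, the sinusoidal seasonal proxy, and all parameter values are assumptions, not the paper's specification:

```julia
# Hedged example of a one-step demand predictor: seasonal mean plus AR(1) residual.
# D̂_t = s(t) + phi * (D_{t-1} - s(t-1)), with a sinusoidal seasonal proxy s(t).

season(t; base=100.0, amp=20.0, period=24) = base + amp * sin(2pi * t / period)

function predict_demand(D_prev::Float64, t::Int; phi=0.8)
    # one-step conditional expectation of the residual demand
    return season(t) + phi * (D_prev - season(t - 1))
end

D_hat = predict_demand(115.0, 10)   # prediction for delivery period [10, 11[
```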
Let us establish such an approach and make some assumptions to obtain a model that can be solved by our numerical methodology. We suppose that there is a finite set $A$ of all possible actions. At any time, the controller chooses an action via an (optimal) decision rule that we determine later. The actions can be defined in an abstract way, or they can be identified with integers (or indexed by integer vectors); their meaning for the control is established merely by certain functions defined on the action set $A$. In the literature, such functions are known as look-up tables. In our case, to model the diverse choices of the trade volume in the long-term market, a function $f: A \to \mathbb{R}$ is used, with the following interpretation: given the prediction $\hat D_t$ of the demand $D_t$ occurring within $[t, t+1[$, for the action $a_t \in A$ chosen at time $t$ by the controller, the energy volume traded in advance (in the long-term market) is
$$\ell_t = \hat D_t + f(a_t). \qquad (4)$$
The quantity $f(a_t)$ is the energy (bought if $f(a_t) > 0$, sold if $f(a_t) < 0$) on top of the predicted demand, and it will be referred to as the safety margin. The function $f$ must be chosen in advance by the decision maker, which typically consists of fixing both the granularity and the range of the safety margins.
Similarly, the battery management variable $b_t$ is also determined by the action $a_t$ chosen at time $t$ (immediately before the start of the delivery period $[t, t+1[$). Here, we again use an appropriate function on $A$, but this time the modeling is more complex, as the energy absorbed/delivered by the battery must take the current battery level and the physical constraints into account. To detail this, we suggest discretizing the battery levels by a finite set $P$. Having chosen the action $a_t$ at time $t$, we suppose that the current battery level $p_t \in P$ transforms to the next level $p_{t+1} = l(p_t, a_t)$ in terms of a pre-specified level change function $l: P \times A \to P$ representing the technical restrictions of the battery (total capacity, electrical power). For instance, $l(p, a)$ can take values above and below $p$ within a range representing the one-period charge/discharge power restrictions, if $p$ is one of the intermediate battery levels. However, if $p$ is the highest (the lowest) level, then $l(p, a)$ can only take values below (above) $p$. Specifying the function $l$ requires some details of the battery technology used, in particular for determining the maximal charge/discharge intensity along with the highest and lowest (admissible) battery levels. A hedged sketch of such look-up tables is given below.
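For concreteness, the following Julia sketch (our own illustration; the grids, names, and ranges are assumptions) encodes the action set as a product of a safety-margin index and a charge/discharge index, together with look-up tables for $f$ and $l$:

```julia
# Hedged sketch: actions as pairs (margin index, charge index) with look-up tables.
margins = [-5.0, 0.0, 5.0]            # feasible safety margins f(a) in MWh
charges = [-2.0, 0.0, 2.0]            # attempted one-period level changes in MWh
levels  = collect(0.0:1.0:10.0)       # discretized battery levels, the set P

A = [(i, j) for i in eachindex(margins), j in eachindex(charges)]

f(a) = margins[a[1]]                  # safety margin look-up table

# level change function l: clip the attempted change to the admissible range
function l(p::Float64, a)
    return clamp(p + charges[a[2]], first(levels), last(levels))
end

p_next = l(10.0, (1, 3))              # at the top level, charging is infeasible: stays 10.0
```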
Notice that, with this convention, the energy amount transferred from/to the storage is given by $b_t = m(p_t, a_t)$, where, for instance,
$$m(p, a) = \frac{1}{\eta}\,\big(l(p, a) - p\big)^{+} - \eta\,\big(p - l(p, a)\big)^{+}, \qquad (5)$$
with the constant $\eta \in\; ]0, 1]$ standing for the battery efficiency (charging an amount $l(p, a) - p$ requires purchasing more energy than is stored, while discharging delivers only the fraction $\eta$ of the level reduction). With these assumptions, we express the energy imbalance (1) in terms of the action $a_t$ and the battery level $p_t$, while using the prediction error $\epsilon_t = D_t - \hat D_t$, as
$$I_t = \epsilon_t + m(p_t, a_t) - f(a_t). \qquad (6)$$
That is, an action $a_t$ not only triggers the transition in the battery level from $p_t$ to $p_{t+1} = l(p_t, a_t)$, but it also determines the energy amount that needs to be balanced at the short-term market. Having defined the excess and the shortage of the imbalance (6) by
$$I_t^{+} = \max(I_t, 0), \qquad I_t^{-} = \max(-I_t, 0), \qquad (7)$$
the profit/loss from balancing at time $t+1$ is modeled by
$$-S_t^{+}\, I_t^{+} + S_t^{-}\, I_t^{-}. \qquad (8)$$
Now, using (4), the financial position for action $a_t$ is
$$-S_t\big(\hat D_t + f(a_t)\big) - S_t^{+}\, I_t^{+} + S_t^{-}\, I_t^{-},$$
where $S_t(\hat D_t + f(a_t))$ is interpreted as the cost of guaranteeing energy from the long-term market.
Finally, let us model the storage costs by
$$C_t = C(p_t, a_t), \qquad (9)$$
reflecting the dependence on the action $a_t$ and the battery level $p_t$. The details of the costs must be described by an appropriate function $C: P \times A \to \mathbb{R}_{+}$, specified in accordance with the battery technology.
With the assumptions (7), (8), and (9), the profit/loss associated with action $a_t$ depends on the prices $S_t$, $S_t^{+}$, $S_t^{-}$, the demand prediction $\hat D_t$, and the battery level $p_t$ as
$$R_t = -S_t\big(\hat D_t + f(a_t)\big) - S_t^{+}\, I_t^{+} + S_t^{-}\, I_t^{-} - C(p_t, a_t). \qquad (10)$$
Observe that the term $-S_t \hat D_t$ depends neither on the action $a_t$ nor on the battery level $p_t$. Because this quantity cannot be changed by the decision optimization, the action-dependent part of the reward is modeled in terms of
$$R_t + S_t \hat D_t = -S_t\, f(a_t) - S_t^{+}\, I_t^{+} + S_t^{-}\, I_t^{-} - C(p_t, a_t).$$
In what follows, we show how to obtain a strategy that simultaneously takes positions in the long-term market and controls the battery level in order to maximize the expectation of the demand-adjusted total revenue
$$\sum_{t=0}^{T-1} \big(R_t + S_t \hat D_t\big). \qquad (11)$$
Remark 2. Note that (11) differs from (3) by the random variable $\sum_{t=0}^{T-1} S_t \hat D_t$, which does not depend on the control policy. On this account, maximizing the expectation of (11) constitutes a solution to our problem. As mentioned above, such an approach avoids the tedious modeling of the energy demand dynamics. However, notice that the results cannot be used directly: for the strategy implementation, the one-step demand prediction must be available at any time; thus, a demand model is required to run the policy.

3. Research Methodology and Solution Techniques
This paper utilizes a novel numerical technique for sequential decision optimization, which requires a number of specific assumptions. To make this technique applicable, we model the battery storage optimization accordingly. In what follows, we present our methodology in detail and elaborate on the required assumptions. However, to give the reader an orientation, let us highlight some of the most important aspects beforehand. The technique represents a combination of a dual and a primal approach. Thereby, the primal methodology delivers an approximate numerical solution whose quality is examined using duality methods. The primal solution is based on a specific function approximation method requiring convexity. Linear state dynamics are assumed in order to retain convexity through the backward induction. The dual part is based on a solution diagnostics that can be viewed as Monte Carlo-based backtesting with variance reduction. This technique has been used extensively for optimal stopping problems, and it is extended here to our context. Because of the combination of the primal and the dual methods, the methodology delivers high performance with guaranteed accuracy.
Sequential decision making is usually encompassed by discrete-time stochastic control and is addressed by Markov Decision Processes/Dynamic Programming. This theory provides a variety of methods. However, obtaining analytical solutions may be cumbersome (Bäuerle and Rieder 2011; Pham 2009; Powell 2007), and numerical approximations may often be far more practical. This work utilizes an implementation of fast and accurate algorithms (see Hinz and Yee 2018b) to address specific control problems, assuming a finite time horizon, a finite set of actions, convex reward functions, and a state process following linear dynamics. Although these assumptions are restrictive, they encompass a large class of practically important control problems and yield approximate solutions with excellent precision and numerical performance. Let us briefly describe this approach.
Suppose that the state space $P \times \mathbb{R}^{d}$ is the Cartesian product of a finite set $P$ and the Euclidean space $\mathbb{R}^{d}$. Furthermore, assume that a finite set $A$ represents all possible actions. Given a finite time horizon $T \in \mathbb{N}$, consider a fully observable controlled Markovian process $(X_t)_{t=0}^{T} = (p_t, Z_t)_{t=0}^{T}$ that consists of two parts.
Stochastic switching: refers to the evolution of the discrete component $(p_t)_{t=0}^{T}$, which is described by a finite-state controlled Markov chain taking values in the finite set $P$. This means that, at any time $t = 0, \dots, T-1$, the controller chooses an action $a$ from $A$ in order to trigger the one-step transition from the mode $p \in P$ to the mode $p' \in P$ with probability $\alpha^{a}_{p, p'}$, where $(\alpha^{a}_{p, p'})_{p, p' \in P}$ are pre-specified transition probability matrices for all $a \in A$.
Linear dynamics: refers to the evolution of the continuous component $(Z_t)_{t=0}^{T}$, which is assumed to follow an uncontrolled evolution in the Euclidean space $\mathbb{R}^{d}$. The evolution is modeled by the recursion
$$Z_{t+1} = W_{t+1} Z_t, \qquad t = 0, \dots, T-1, \qquad (12)$$
where $(W_t)_{t=1}^{T}$ are independent disturbance matrices.
State evolution: the transition kernels $K_t^{a}$, governing the evolution of our controlled Markovian process $(X_t)_{t=0}^{T}$ from time $t$ to time $t+1$, are given, for each $a \in A$, by
$$(K_t^{a} v)(p, z) = \sum_{p' \in P} \alpha^{a}_{p, p'}\, \mathbb{E}\big[v(p', W_{t+1} z)\big], \qquad (p, z) \in P \times \mathbb{R}^{d}, \qquad (13)$$
acting on each function $v: P \times \mathbb{R}^{d} \to \mathbb{R}$ for which the above expectations are well-defined.
Costs of control: if the system is in the state $(p, z)$, the rewards of applying the action $a \in A$ at time $t = 0, \dots, T-1$ are given by $r_t(p, z, a)$. Having arrived at time $T$ in the state $(p, z)$, a final scrap value $r_T(p, z)$ is collected. Thereby, the reward functions $(r_t)_{t=0}^{T-1}$, as well as the scrap function $r_T$, are exogenously given. At each time $t = 0, \dots, T-1$, the decision rule is given by a mapping $\pi_t: P \times \mathbb{R}^{d} \to A$, prescribing at $t$ an action $\pi_t(p, z) \in A$ in the state $(p, z)$. Note that, at each time, the decision rule refers to the recent state of the system, representing a so-called feedback control. A sequence $\pi = (\pi_t)_{t=0}^{T-1}$ of decision rules is called a policy. For each policy $\pi$, the policy value $v_0^{\pi}(p_0, z_0)$ is defined as the total expected reward
$$v_0^{\pi}(p_0, z_0) = \mathbb{E}^{(p_0, z_0), \pi}\Big[\sum_{t=0}^{T-1} r_t\big(X_t, \pi_t(X_t)\big) + r_T(X_T)\Big].$$
In this formula, $\mathbb{E}^{(p_0, z_0), \pi}$ stands for the expectation with respect to the probability distribution of $(X_t)_{t=0}^{T}$ defined by the Markov transitions from $X_t$ to $X_{t+1}$, which are induced by the kernels $K_t^{\pi_t(X_t)}$ for $t = 0, \dots, T-1$, started at the initial point $X_0 = (p_0, z_0)$.
Optimization goal: a policy $\pi^{*} = (\pi_t^{*})_{t=0}^{T-1}$ is called optimal if it maximizes the total expected reward $\pi \mapsto v_0^{\pi}(p, z)$ over all policies. To obtain such a policy, one introduces, for $t = 0, \dots, T-1$, the so-called Bellman operator
$$(\mathcal{T}_t v)(p, z) = \max_{a \in A}\big[r_t(p, z, a) + (K_t^{a} v)(p, z)\big], \qquad (p, z) \in P \times \mathbb{R}^{d},$$
acting on all functions $v$ for which the stochastic kernel is well-defined. Consider the Bellman recursion, which is also referred to as backward induction:
$$v_T = r_T, \qquad v_t = \mathcal{T}_t v_{t+1}, \qquad t = T-1, \dots, 0. \qquad (14)$$
Assuming that the reward functions are convex and globally Lipschitz (in the second variable) and the disturbance matrices $(W_t)_{t=1}^{T}$ are integrable, there exists a solution $(v_t)_{t=0}^{T}$ to the Bellman recursion. Such functions are called value functions; they determine an optimal policy $\pi^{*} = (\pi_t^{*})_{t=0}^{T-1}$ via
$$\pi_t^{*}(p, z) \in \arg\max_{a \in A}\big[r_t(p, z, a) + (K_t^{a} v_{t+1})(p, z)\big], \qquad (15)$$
for $t = 0, \dots, T-1$ and $(p, z) \in P \times \mathbb{R}^{d}$.
Remark 3. In applications, sequential decision problems frequently appear in a slightly different formulation than the one given above. Usually, the costs of control depend on both the recent and the next state. That is, instead of the previously introduced reward $r_t(p, z, a)$ for taking the action $a$ in the situation $(p, z)$, a modeling may naturally suggest a reward of the form
$$\tilde r_t(p, z, a, p', z'), \qquad (16)$$
where the action $a$ is taken at time $t$ in the situation $(p, z)$, but the reward is observed and returned at $t+1$ with a random outcome depending on the next-time situation $(p', z')$. Fortunately, this context is seamlessly covered by the formal setting introduced above. It turns out that, since the expectation of the reward is being maximized, a pre-conditioning on the information available at time $t$ can be applied. That is, having determined the control rewards as being next-state dependent, as in (16), for each $(p, z) \in P \times \mathbb{R}^{d}$ and $a \in A$, one averages them,
$$r_t(p, z, a) = \sum_{p' \in P} \alpha^{a}_{p, p'}\, \mathbb{E}\big[\tilde r_t(p, z, a, p', W_{t+1} z)\big], \qquad (17)$$
to obtain the usual reward functions, as introduced in the standard setting above.

Approximate solution: in order to obtain a numerical solution to the above Markov Decision problem, one needs to approximate the true value functions $(v_t)_{t=0}^{T}$ and the corresponding optimal policy $\pi^{*}$. Because all reward and scrap functions are convex in the second variable, the value functions are also convex, and they can be approximated by piecewise linear and convex functions.
Primal solution method: our approach is based on the observation that, for convex switching systems, the value functions in the backward induction are obtained by applying the following three operations to convex functions:
$$f_1, f_2 \mapsto f_1 + f_2, \qquad f_1, f_2 \mapsto f_1 \vee f_2, \qquad f \mapsto f \circ W, \qquad (18)$$
namely, summation, maximization, and composition with a linear mapping $W$.
To obtain an efficient (approximate) numerical treatment of these operations, the concept of so-called sub-gradient envelopes was suggested in Hinz (2014). A sub-gradient $\nabla_g f$ of a convex function $f$ at a point $g$ is an affine-linear functional supporting this point from below: $\nabla_g f \le f$ and $(\nabla_g f)(g) = f(g)$. Given a finite grid $G = \{g^{1}, \dots, g^{m}\} \subset \mathbb{R}^{d}$, the sub-gradient envelope $\mathcal{S}_G f$ of $f$ on $G$ is defined as the maximum of its sub-gradients,
$$\mathcal{S}_G f = \vee_{g \in G}\, \nabla_g f, \qquad (19)$$
which provides a convex approximation of the function $f$ from below, $\mathcal{S}_G f \le f$, and enjoys many useful properties. For our purposes, the following observation is crucial:
If the function $f$ results from the operations in (18) applied to a (large) number of convex and piecewise linear argument functions, then $\mathcal{S}_G f$ can be obtained efficiently, unlike the function $f$ itself. The reason is that the sub-gradients of $f$ are determined by the sub-gradients of the argument functions on the grid points only. Thus, all of the operations can be carried out sub-gradient-wise. Namely, observe that the summation can be done on the level of sub-gradients,
$$\mathcal{S}_G(f_1 + f_2) = \vee_{g \in G}\big(\nabla_g f_1 + \nabla_g f_2\big). \qquad (20)$$
Furthermore, maximization requires merely the sub-gradients of the maximizing function at each grid point,
$$\mathcal{S}_G(f_1 \vee f_2) = \vee_{g \in G}\, \nabla_g f_{i(g)}, \qquad i(g) \in \arg\max_{i \in \{1, 2\}} f_i(g). \qquad (21)$$
Finally, the sub-gradient envelope of the composition of an argument function $f$ with a linear mapping $W$ can be obtained from the composition of the sub-gradients participating in $\mathcal{S}_{WG} f$ with $W$, as
$$\mathcal{S}_G(f \circ W) = \vee_{g \in G}\,\big(\nabla_{W g} f\big) \circ W. \qquad (22)$$
The crucial point of our algorithm is the treatment of piecewise linear convex functions in terms of matrices. To address this aspect, let us agree on the following notation: given a function $f$ and a matrix $F$, we write $f \sim F$ whenever
$$f(z) = \max(F z) \qquad \text{(componentwise maximum)}$$
holds for all $z \in \mathbb{R}^{d}$, and call $F$ a matrix representative of $f$. It turns out that the sub-gradient envelope operation $\mathcal{S}_G$, acting on convex piecewise linear functions, corresponds to a certain row-rearrangement operator $\Upsilon_G$, acting on the matrix representatives of these functions, in the sense that
$$f \sim F \quad \Rightarrow \quad \mathcal{S}_G f \sim \Upsilon_G F.$$
Such a row-rearrangement operator $\Upsilon_G$, associated with the grid $G = \{g^{1}, \dots, g^{m}\} \subset \mathbb{R}^{d}$, acts on each matrix $F$ with $d$ columns as follows:
$$(\Upsilon_G F)_{i, \cdot} = F_{j(i), \cdot}, \qquad j(i) \in \arg\max_{j}\,(F g^{i})_{j}, \qquad i = 1, \dots, m. \qquad (23)$$
Let us explain in what sense the properties (20)–(22) are mirrored on the side of the matrix representatives. Assume that the piecewise linear and convex functions $f_1$, $f_2$ are given in terms of their matrix representatives $F_1$, $F_2$, such that $f_1 \sim F_1$ and $f_2 \sim F_2$. As a direct consequence of (20)–(22) and the definition (23), it holds that
$$\mathcal{S}_G(f_1 + f_2) \sim \Upsilon_G F_1 + \Upsilon_G F_2, \qquad (24)$$
$$\mathcal{S}_G(f_1 \vee f_2) \sim \Upsilon_G\big(F_1 \sqcup F_2\big), \qquad (25)$$
$$\mathcal{S}_G(f_1 \circ W) \sim \Upsilon_G\big(F_1 W\big), \qquad (26)$$
where the operator ⊔ denotes binding matrices by rows. Using the sub-gradient envelope operator, define the double-modified Bellman operator as
$$(\mathcal{T}^{m}_t v)(p, z) = \max_{a \in A}\Big[\big(\mathcal{S}_G\, r_t(p, \cdot, a)\big)(z) + \sum_{p' \in P} \alpha^{a}_{p, p'} \sum_{k=1}^{N} \nu_k\, \big(\mathcal{S}_G\, v(p', \cdot)\big)\big(W_{t+1}^{(k)} z\big)\Big], \qquad (27)$$
where the probability weights $(\nu_k)_{k=1}^{N}$ correspond to the distribution sampling $(W_{t+1}^{(k)})_{k=1}^{N}$ of each disturbance matrix $W_{t+1}$. The corresponding backward induction
$$v_T^{m}(p, \cdot) = \mathcal{S}_G\, r_T(p, \cdot), \qquad v_t^{m} = \mathcal{T}^{m}_t v_{t+1}^{m}, \qquad t = T-1, \dots, 0, \qquad (28)$$
yields the so-called double-modified value functions $(v_t^{m})_{t=0}^{T}$. Under appropriate assumptions on the increasing grid density and the disturbance sampling, the double-modified value functions converge uniformly on compact sets to the true value functions in (14) (see Hinz 2014). Let us present the algorithm from Hinz (2014) for calculating the modified value functions in terms of their matrix representatives:
Pre-calculations: given a grid $G = \{g^{1}, \dots, g^{m}\}$, implement the row-rearrangement operator $\Upsilon_G$ and the row maximization operator. Determine a distribution sampling $(W_t^{(k)})_{k=1}^{N}$ of each disturbance $W_t$ with corresponding weights $(\nu_k)_{k=1}^{N}$ for $t = 1, \dots, T$. Given the reward functions $(r_t)_{t=0}^{T-1}$ and the scrap value $r_T$, assume that the matrix representatives of their sub-gradient envelopes are given by $R_t(p, a)$ and $R_T(p)$, for $t = 0, \dots, T-1$, $p \in P$, and $a \in A$. The matrix representatives $V_t(p)$ of each double-modified value function are obtained via the following matrix form of the approximate backward induction in (27) and (28):

Initialization: start with the matrices
$$V_T(p) = R_T(p), \qquad p \in P. \qquad (29)$$

Recursion: for $t = T-1, \dots, 0$ and for $p \in P$, calculate
$$V_{t+1}^{E}(p) = \sum_{k=1}^{N} \nu_k\, \Upsilon_G\big(V_{t+1}(p)\, W_{t+1}^{(k)}\big), \qquad (30)$$
$$V_t(p) = \Upsilon_G\Big[\bigsqcup_{a \in A}\Big(R_t(p, a) + \sum_{p' \in P} \alpha^{a}_{p, p'}\, V_{t+1}^{E}(p')\Big)\Big]. \qquad (31)$$
This algorithm is depicted in Algorithm 1.

Algorithm 1: Value Function Approximation
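A compact Julia sketch of this backward induction could look as follows. It is our own hedged illustration, not the paper's repository code; `row_rearrange` is the operator sketched above, and all data structures (matrix representatives per level, sampled disturbances, weights) are assumptions:

```julia
# Hedged sketch of the matrix-form backward induction (29)-(31).
# R_scrap[p] : matrix representative of the scrap sub-gradient envelope at level p
# R[t][p,a]  : matrix representative of the reward sub-gradient envelope (m × d)
# W[k], nu[k]: sampled disturbance matrices and their weights
# alpha[a]   : |P| × |P| transition matrix for action a; G: (m × d) grid matrix

function backward_induction(R_scrap, R, alpha, W, nu, G; T=length(R))
    nP, nA = size(R[1])
    V = [R_scrap[p] for p in 1:nP]                     # initialization (29)
    for t in T:-1:1
        # expected value function representatives (30)
        VE = [sum(nu[k] * row_rearrange(V[p] * W[k], G) for k in eachindex(W))
              for p in 1:nP]
        Vnew = similar(V)
        for p in 1:nP
            # stack candidates over actions and rearrange rows (31)
            cand = reduce(vcat,
                [R[t][p, a] + sum(alpha[a][p, q] * VE[q] for q in 1:nP) for a in 1:nA])
            Vnew[p] = row_rearrange(cand, G)
        end
        V = Vnew
    end
    return V
end
```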
Having calculated the matrix representatives, approximations to the expected value functions are obtained as
$$v_{t+1}^{E}(p, z) = \max\big(V_{t+1}^{E}(p)\, z\big), \qquad (32)$$
for all $t = 0, \dots, T-1$, $p \in P$, and $z \in \mathbb{R}^{d}$. Furthermore, an approximately optimal strategy $(\pi_t)_{t=0}^{T-1}$ is obtained, for $t = 0, \dots, T-1$, by
$$\pi_t(p, z) \in \arg\max_{a \in A}\Big[r_t(p, z, a) + \sum_{p' \in P} \alpha^{a}_{p, p'}\, v_{t+1}^{E}(p', z)\Big]. \qquad (33)$$
Dual diagnostics method: let us now turn to the diagnostics method following Hinz and Yee (2016), whose proof is found in Hinz and Yap (2015). Suppose that a candidate $\pi = (\pi_t)_{t=0}^{T-1}$ for an approximately optimal policy is given. To estimate its distance to optimality, we address the performance gap $v_0(p_0, z_0) - v_0^{\pi}(p_0, z_0)$ at a given starting point $(p_0, z_0)$. For this, we construct random variables $\underline{v}_0(p_0, z_0)$, $\bar v_0(p_0, z_0)$ satisfying
$$\mathbb{E}\big[\underline{v}_0(p_0, z_0)\big] \le v_0^{\pi}(p_0, z_0) \le v_0(p_0, z_0) \le \mathbb{E}\big[\bar v_0(p_0, z_0)\big]. \qquad (34)$$
The calculation of the expectations $\mathbb{E}[\underline{v}_0(p_0, z_0)]$ and $\mathbb{E}[\bar v_0(p_0, z_0)]$ is realized through a recursive Monte Carlo scheme with variance reduction, which yields approximations to these bounds, along with appropriate confidence intervals.
For a practical application of the bound estimation, we assume that an approximate solution yields a candidate $\pi = (\pi_t)_{t=0}^{T-1}$ for an optimal strategy, as in (33), based on the approximations (32) of the value functions.
Bound estimation:
- (1) Choose a path number $K$ and a nesting number $I$ to obtain, for each $t = 1, \dots, T$ and $k = 1, \dots, K$, independent realizations $(W_t^{(k, i)})_{i=1}^{I}$ of the disturbance matrices.
- (2) Define, for $k = 1, \dots, K$, the state trajectories $(z_t^{(k)})_{t=0}^{T}$ recursively by
$$z_0^{(k)} = z_0, \qquad z_{t+1}^{(k)} = W_{t+1}^{(k, 1)}\, z_t^{(k)}, \qquad t = 0, \dots, T-1, \qquad (35)$$
and determine all of the realizations of the rewards along these trajectories.
- (3) For each $k = 1, \dots, K$, initialize the recursion at $t = T$ with the scrap value and continue backwards for $t = T-1, \dots, 0$, computing the pathwise bound variables $\bar v_t^{(k)}$ and $\underline{v}_t^{(k)}$; here, the nested samples $(W_{t+1}^{(k, i)})_{i=1}^{I}$ serve to approximate the required conditional expectations and to form the variance-reducing martingale increments.
- (4) Calculate the sample means
$$\frac{1}{K} \sum_{k=1}^{K} \bar v_0^{(k)}(p_0, z_0), \qquad \frac{1}{K} \sum_{k=1}^{K} \underline{v}_0^{(k)}(p_0, z_0), \qquad (36)$$
to estimate the performance gap from above and below, possibly using in-sample confidence bounds.
This technique, depicted in Algorithm 2, is usually referred to as pathwise stochastic control, and it has gained increasing popularity over recent decades. We refer the interested reader to Hinz and Yap (2015) and the literature cited therein for the technical details. Such a diagnostic exhibits a helpful self-tuning property: the closer the value function approximations resemble their true unknown counterparts, the tighter the bounds in (36) and the lower the standard errors of the bound estimates. In what follows, we provide an application of this technique to the above battery control problem.
Algorithm 2: Solution Diagnostics
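As a hedged, simplified illustration of the primal half of this diagnostic (the lower bound only; the dual upper bound additionally requires the martingale correction described above), one could backtest a candidate policy by plain Monte Carlo. All function arguments below are assumptions standing in for the model ingredients:

```julia
# Hedged sketch: Monte Carlo lower bound on the policy value, a building block of (36).
# Simulates the candidate policy along K independent state trajectories.
using Statistics

function lower_bound(policy, reward, scrap, step_z, next_level, p0, z0; K=1000, T=100)
    totals = zeros(K)
    for k in 1:K
        p, z = p0, z0
        for t in 1:T
            a = policy(t, p, z)               # candidate decision rule
            totals[k] += reward(t, p, z, a)   # one-period reward
            p = next_level(p, a)              # battery level transition
            z = step_z(t, z)                  # sample Z_{t+1} = W_{t+1} * z
        end
        totals[k] += scrap(p, z)
    end
    return mean(totals), std(totals) / sqrt(K)  # estimate and its standard error
end
```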
4. BESS as Stochastic Switching with Linear State Dynamics
Let us construct a model that fulfills all of the assumptions of Section 2, such that the methodology presented in Section 3 becomes applicable. For this, we introduce the four-dimensional uncontrolled state evolution $(Z_t)_{t=0}^{T}$, carrying a constant entry in its first component. This is a minor increase of the state dimension, which allows us to encompass a broad class of dynamics while fulfilling the linearity restriction (12). Let us agree that the processes $(S_t)$, $(\hat S_t^{+})$, $(\hat S_t^{-})$, and $(\hat D_t)$ are functions of the components of $(Z_t)$, as follows:
$$S_t = h_t^{1}(Z_t), \qquad \hat S_t^{+} = h_t^{2}(Z_t), \qquad \hat S_t^{-} = h_t^{3}(Z_t), \qquad \hat D_t = h_t^{4}(Z_t), \qquad (37)$$
where the deterministic affine-linear transformations $h_t^{i}$, with $i = 1, \dots, 4$, appropriately describe trends and seasonal patterns.
For a numerical case study, let us address the above framework more specifically. First, we suggest modeling the long-term price component as a function of an auto-regressive process. Therefore, consider a sequence $(\varepsilon_t)_{t=1}^{T}$ of independent standard normal random variables and introduce the auto-regressive state process $(\tilde Z_t)_{t=0}^{T}$, such that
$$\tilde Z_{t+1} = \mu + \phi\, \tilde Z_t + \sigma\, \varepsilon_{t+1}, \qquad (38)$$
with parameters $\mu \in \mathbb{R}$, $\phi \in\; ]0, 1[$, and $\sigma > 0$. To embed the evolution (38) into the state process $(Z_t)_{t=0}^{T}$, recall that the first component is equal to one, $Z_t^{(1)} = 1$ for $t = 0, \dots, T$, which allows for the desired linear dynamics
$$\begin{pmatrix} 1 \\ \tilde Z_{t+1} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \mu + \sigma\, \varepsilon_{t+1} & \phi \end{pmatrix} \begin{pmatrix} 1 \\ \tilde Z_t \end{pmatrix}. \qquad (39)$$
Other components can be modeled similarly as time-dependent affine-linear functions of auto-regressions. For simplicity, we suggest independent identically distributed random variables, which yield a linear state dynamics (12) with the following disturbance matrices:
$$W_{t+1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ \mu_2 + \sigma_2\, \varepsilon_{t+1}^{(2)} & \phi_2 & 0 & 0 \\ \mu_3 + \sigma_3\, \varepsilon_{t+1}^{(3)} & 0 & \phi_3 & 0 \\ \mu_4 + \sigma_4\, \varepsilon_{t+1}^{(4)} & 0 & 0 & \phi_4 \end{pmatrix}. \qquad (40)$$
Here, $(\varepsilon_t^{(2)}, \varepsilon_t^{(3)}, \varepsilon_t^{(4)})_{t=1}^{T}$ is a sequence of independent multivariate standard normally distributed random variables. For the dynamics (37), the state variables must be scaled and shifted appropriately. The seasonality is reflected by functions of the form
$$h_t^{i}(z) = \gamma_t^{i} + \delta_t^{i}\, z^{(j_i)}, \qquad i = 1, \dots, 4, \qquad (41)$$
with deterministic shift $\gamma_t^{i}$ and scale $\delta_t^{i}$ coefficients, where $z^{(j_i)}$ denotes the state component driving the respective process.
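In Julia, sampling such disturbance matrices and propagating the state takes only a few lines; the following is a hedged sketch mirroring (39) and (40), with purely illustrative parameter values:

```julia
# Hedged sketch: sampling disturbance matrices (40) and propagating Z_{t+1} = W_{t+1} Z_t.
using Random

function sample_W(mu, phi, sigma, rng)
    d = length(mu) + 1
    W = zeros(d, d); W[1, 1] = 1.0                    # constant first component
    for i in 2:d
        W[i, 1] = mu[i-1] + sigma[i-1] * randn(rng)   # affine part with noise
        W[i, i] = phi[i-1]                            # auto-regressive part
    end
    return W
end

rng = Random.default_rng()
mu, phi, sigma = [0.5, 0.3, 0.2], [0.9, 0.8, 0.7], [0.4, 0.3, 0.2]
Z = [1.0, 0.0, 0.0, 0.0]                  # initial state, first entry constant
Z = sample_W(mu, phi, sigma, rng) * Z     # one step of the linear dynamics (12)
```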
To describe the evolution of the controlled part $(p_t)_{t=0}^{T}$ of the state dynamics, we assume that the finite set $P$ includes battery levels that are equidistantly spaced with a step size $\Delta$ between the levels $p^{\min}$ and $p^{\max}$. Given the level change function $l$, the transitions are not random:
$$\alpha^{a}_{p, p'} = \begin{cases} 1 & \text{if } p' = l(p, a), \\ 0 & \text{otherwise}. \end{cases} \qquad (42)$$
Having defined the state evolution and the processes (37) via the functions (41) on the states, observe that the rewards (10) depend on both the current and the next state, as in (16):
$$\tilde r_t(p, z, a, p', z') = -S_t\, f(a) - S_t^{+}\, I_t^{+} + S_t^{-}\, I_t^{-} - C(p, a). \qquad (43)$$
Indeed, the prediction error $\epsilon_t$, and with it the realized imbalance $I_t$ together with the short-term prices $S_t^{+}$ and $S_t^{-}$, become known only at time $t+1$, while, due to (37), $S_t$, $\hat S_t^{+}$, $\hat S_t^{-}$, and $\hat D_t$ are functions of $Z_t$. Using (17), we transform (43) to the standard form of the reward:
$$r_t(p, z, a) = -S_t\, f(a) - \hat S_t^{+}\, \mathbb{E}\big[I_t^{+}\big] + \hat S_t^{-}\, \mathbb{E}\big[I_t^{-}\big] - C(p, a). \qquad (44)$$
In this equation, the expected surplus and shortage of the imbalance are obtained as integrals. Assuming that the prediction error is centered and normally distributed, $\epsilon_t \sim \mathcal{N}(0, \sigma_\epsilon^{2})$, we obtain
$$\mathbb{E}\big[I_t^{+}\big] = \int \max\big(x + m(p, a) - f(a),\, 0\big)\, \mathcal{N}(0, \sigma_\epsilon^{2})(\mathrm{d}x) \qquad (45)$$
and
$$\mathbb{E}\big[I_t^{-}\big] = \int \max\big(f(a) - m(p, a) - x,\, 0\big)\, \mathcal{N}(0, \sigma_\epsilon^{2})(\mathrm{d}x), \qquad (46)$$
respectively, for all $p \in P$ and $a \in A$, where $\mathcal{N}(\mu, \sigma^{2})$ denotes the normal distribution with mean $\mu$ and variance $\sigma^{2}$. Furthermore, $\hat S_t^{+}$ and $\hat S_t^{-}$ are the expectations of the short-term prices $S_t^{+}$ and $S_t^{-}$ at time $t$, as given by
$$\hat S_t^{+} = \mathbb{E}\big[S_t^{+} \mid Z_t\big], \qquad \hat S_t^{-} = \mathbb{E}\big[S_t^{-} \mid Z_t\big]. \qquad (47)$$
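Under the normality assumption, the integrals (45) and (46) admit the familiar closed form $\mathbb{E}[\max(\epsilon + c, 0)] = \sigma\,\varphi(c/\sigma) + c\,\Phi(c/\sigma)$, with $\varphi$ and $\Phi$ the standard normal density and distribution function. A hedged Julia sketch (using the Distributions.jl package; the shorthand $c = m(p, a) - f(a)$ is ours):

```julia
# Hedged sketch: closed-form evaluation of the expected shortage (45) and surplus (46)
# for normal prediction errors, using E[max(ε + c, 0)] = σ φ(c/σ) + c Φ(c/σ).
using Distributions

function expected_imbalance(c::Float64, sigma::Float64)
    d = Normal(0.0, 1.0)
    pos = sigma * pdf(d, c / sigma) + c * cdf(d, c / sigma)   # E[(ε + c)^+]
    neg = pos - c                                # E[(ε + c)^-] = E[(ε + c)^+] - c
    return pos, neg
end

# c = m(p, a) - f(a): deterministic part of the imbalance for level p and action a
Ipos, Ineg = expected_imbalance(2.0 - 5.0, 10.0)
```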
With almost all of the ingredients now in place, we define the reward functions, in accordance with (10), by
$$r_t(p, z, a) = -h_t^{1}(z)\, f(a) - h_t^{2}(z)\, \mathbb{E}\big[I_t^{+}\big] + h_t^{3}(z)\, \mathbb{E}\big[I_t^{-}\big] - C(p, a), \qquad (48)$$
for all $t = 0, \dots, T-1$, $p \in P$, $z \in \mathbb{R}^{4}$, and $a \in A$. Finally, let us introduce the last component, the scrap function. Here, we assume that the entire electricity from the BESS can be sold in the long-term market at time $T$:
$$r_T(p, z) = \eta\, p\, h_T^{1}(z), \qquad (49)$$
for all $(p, z) \in P \times \mathbb{R}^{4}$. With these definitions, we have formalized the optimal management of battery energy storage systems as a stochastic control problem, and we can address its numerical solution in the next section.
Remark 4. Note that the reward functions (48) and the scrap function (49) only depend on the second component of the state variable $Z_t$. That is, the modeling of the state evolution using linear dynamics can be reduced to the first two components, as in (39).

5. A Numerical Illustration
Consider a BESS with a total capacity of $p^{\max}$ MWh. We assume that the positions $P = \{p^{\min} = p_1 < p_2 < \dots < p_n = p^{\max}\}$ represent a grid of all feasible battery levels, ranging from the minimum level $p^{\min}$ to the maximum level $p^{\max}$. Such a discretization of the battery levels (which are continuous by their physical nature) is a tribute that we have to pay in order to make our optimal switching approach applicable. However, our numerical procedures are efficient, and the discretization can be realized at a sufficiently fine granularity. Further, assume that the space of actions is the Cartesian product of two finite sub-spaces,
$$A = A_1 \times A_2,$$
with the interpretation that, by taking an action $a = (a_1, a_2) \in A$, the retailer chooses a certain safety margin through the first action component $a_1$ and determines, at the same time, a potential battery charge/discharge via the second action component $a_2$. In fact, we assume that both sets, that of the safety margins and that of the charge/discharge decisions, are represented by discrete grids, which range from their respective minimum values to their maximum values. In this setting, the safety margin function is given by
$$f(a) = f(a_1), \qquad a = (a_1, a_2) \in A,$$
and the BESS management variable is defined by applying the level change function through the second component,
$$l(p, a) = l(p, a_2), \qquad a = (a_1, a_2) \in A,$$
for all $p \in P$. With this variable, the loss due to battery inefficiency is described as in (5).
The state process $(Z_t)_{t=0}^{T}$ is modeled as in Section 4, with the parameters $\mu$, $\phi$, and $\sigma$. To reflect a trend and a seasonality in the price evolution, we assume affine-linear functions (41) with deterministic coefficients $\gamma_t$ and $\delta_t$, where a parameter $\tau$ represents the period length. Figure 3 depicts the state process $(Z_t^{(2)})_{t=0}^{T}$ and the long-term electricity prices in Euros $(S_t)_{t=0}^{T}$ for the parameters listed in Table 1. Further, we obtain the expected short-term prices $\hat S_t^{+}$ and $\hat S_t^{-}$ from the long-term price via (47), with the corresponding parameters set as in Table 1, for all $t = 0, \dots, T-1$.
Finally, we suggest modeling the deep discharge costs by a penalty function $C$ with parameters as in Table 1. This function depends on the ratio $p / p^{\max}$, which measures the depth of discharge, and it increasingly penalizes the total reward as the battery level approaches zero. To examine the effect of this penalization, we compare the optimal battery levels in Figure 4, which depicts the level evolutions under the assumption that the battery starts at the lowest level at time $t = 0$. The bottom plot shows that, in the absence of deep discharge costs ($C \equiv 0$), low battery levels are reached routinely. On the contrary, the upper plot shows that the battery levels rarely fall below a substantial fraction of the total capacity. One possible penalty of this shape is sketched below.
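A possible stand-in for such a penalty (our own hedged example; the paper's exact functional form and parameter values are given in its Table 1) is an exponentially growing cost in the depth of discharge:

```julia
# Hedged example of a deep discharge penalty: grows steeply as p/p_max approaches zero.
# gamma scales the cost, lambda controls how sharply low levels are penalized.
deep_discharge_cost(p, p_max; gamma=1.0, lambda=5.0) =
    gamma * exp(-lambda * p / p_max)

deep_discharge_cost(1.0, 10.0)   # near-empty battery: high penalty
deep_discharge_cost(9.0, 10.0)   # nearly full battery: negligible penalty
```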
Furthermore, let us illustrate the safety margins in Figure 5. Because there is no significant difference between the two graphs, the deep discharge costs seem to have a moderate impact on the safety margins. In both graphs, we merely see a tendency to buy energy through higher safety margins when electricity prices are low.
Finally, we provide a brief discussion of the value function, which is illustrated in Table 2 and in Figure 6. Each row of Table 2 corresponds to a discretized battery level. The columns "Lower interval" and "Upper interval" contain the empirical confidence intervals for the lower and upper estimates of the value function, respectively. This calculation was based on the assumption that the initial electricity price $S_0$ was equal to 10. The confidence bounds, obtained by the diagnostic methods of Hinz and Yap (2015) based on a pathwise dynamic approach, are tight, which certifies the high precision of our solution obtained in terms of the sub-gradient method described in Section 3.
Figure 6 depicts the approximate value functions delivered by the sub-gradient method for a range of values of the initial state variable $Z_0^{(2)}$ (represented on the horizontal axis), with different curves standing for different initial battery levels $p_0$. Here, we notice that the value function is increasing in the initial battery level $p_0$ (the energy owned at the beginning yields a certain return), and it interacts with the initial state variable $Z_0^{(2)}$, which, by construction, determines the initial electricity price $S_0$. Recall that a higher price at the beginning causes the subsequent prices to be higher on average (by the persistence of the auto-regressive state process). Therefore, if the battery is well charged at time $t = 0$, then the retailer can sell electricity (within the time horizon) and obtain a substantial profit; if, on the contrary, the initial level of the battery is low, the retailer must pay more for the initial charge.
Notice that the numerical results for the value function do not allow a direct interpretation in terms of total revenues. The reason is that the rewards of our model in (10) do not take the retailer's income from fixed delivery contracts into account (refer to the remark after (10)).
Finally, we elaborate, by way of example, on a typical economic application addressing a stylized investment and capacity allocation problem. In this context, one of the most important questions is to determine the optimal installed capacity and the type of the battery. Having assumed zero costs of deep discharge, Figure 7 depicts the value function, starting with an empty battery, in dependence on the storage capacity. Let us refer to this value as the initial storage value. In line with intuition, a higher storage capacity yields a higher value, represented by a monotonically increasing concave curve. Because the initial investment in a BESS is usually linear in the capacity put in place, this curve can be used to determine the optimal capacity by equating the marginal value of the storage to the marginal investment cost.
Our numerical experiments suggest that dealing with twenty to fifty equidistant levels yields sufficiently precise results (i.e., the numerical outcomes do not change significantly if the granularity becomes finer). However, the discretization of the actions (which yields only a finite number of safety margins and battery controls) is a delicate issue. Here, it may be advisable to compare the numerical outcomes of several models. Still, our experiments suggest that the optimal strategies are of bang-bang type, meaning that they apply just a few extremal (usually the largest and smallest) safety margins and battery charging/discharging actions. For this reason, we believe that good results are achievable with small action spaces.