3.2. Objective Function
An MDP assumes that each action is completed within one decision period; the dynamic program is then solved by optimizing over successive state transitions. Power planning based on an MDP can therefore be described as the power system periodically extracting information from its environment. It first interacts with the environment to obtain a more complete state observation and select a decision. It then performs an action in the current state. Finally, it moves to the next state and begins a new decision cycle. In general, such an MDP can be represented by a tuple:
where the elements are a finite set of states (the current state of the system); the decision set corresponding to each state; the return function of a strategy in a given state; the transition matrix from a given state to the next state under a strategy; and the strategy adopted in state x at moment i [30].
For the dynamic generation–transmission integrated planning model in this paper, the planning stage is denoted by i, where i = 1, 2, …, I. First, the planning horizon covers 2018 to 2050. The years 2018–2020 form the first stage, and a new planning stage begins every five years thereafter; annual power investment is assumed to be completed at the beginning of each year. Second, the state set of the power system at stage i is S, and any possible state is an element of S. The plan for each stage is represented by a strategy. A strategy is the basic element of the decision set and defines a mapping from the state set to the action set. If a policy set is adopted at stage i, each of its elements corresponds to a strategy drawn from the decision set. The value function of the Markov programming model for integrated generation and grid planning can then be expressed as:
Here, E denotes the expected total social cost of a state under a given strategy as the system evolves through the subsequent stage states, starting from the initial state. The evolution of the system from the state at one stage to the state at the next can thus be described by a state transition function. The state-change process is therefore defined through the value function above, and the recursive equation for the value function is as follows:
For a power generation and grid integrated planning model:
where T is the planning period and R is the discount rate [31]. The remaining symbols denote:

- the set of regions included in the plan;
- the type of power source;
- the capacity of power source s in region r at planning stage t;
- the retired capacity of power source s in region r at stage t;
- the installation cost per unit capacity of source s;
- the annual utilization hours of source s;
- the levelized cost of electricity;
- the carbon dioxide, sulfur dioxide, and nitrogen oxide emission coefficients of power source s at stage t;
- the unit carbon tax price;
- the effluent charges for sulfur dioxide and nitrogen oxide;
- the set of transmission channels;
- the transmission capacity of channel g at stage t;
- the construction cost of channel g.

This equation gives the stage-wise objective function of the dynamic model. The bracketed expression contains six elements. The first is the cost of power source expansion. The second is the O&M cost of power production under the stage's power structure. The next three are the carbon dioxide, sulfur dioxide, and nitrogen oxide emission costs. The last is the cost of grid expansion.
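The following minimal Python sketch makes the six-element structure concrete: it computes one stage's cost and discounts it at rate R. It rests on our own assumptions, not on the paper's equation:

- The capacity added in a stage (`new_cap`) is taken as given, with retirements already netted out.
- The O&M term is taken as LCOE times annual energy, where annual energy is capacity times utilization hours.
- Emission coefficients are per MWh.
- Each stage is discounted by a caller-supplied number of years from the base year.

All function and parameter names are hypothetical.

```python
# Hypothetical sketch of the six-element stage cost and its discounting.
# Names, units, and the per-stage discount offsets are assumptions.

def stage_cost(new_cap, cap, install_cost, hours, lcoe,
               co2, so2, nox, carbon_tax, so2_fee, nox_fee,
               line_cap, line_cost):
    """Six-element cost of one planning stage.

    new_cap, cap: {(region, source): MW} added and operating capacity.
    install_cost: {source: cost per MW}; hours: {source: h per year}.
    lcoe: {source: cost per MWh}; co2, so2, nox: {source: t per MWh}.
    line_cap: {channel: MW}; line_cost: {channel: cost per MW}.
    """
    # 1) power source expansion cost
    expansion = sum(install_cost[s] * q for (_, s), q in new_cap.items())
    # annual energy of each (region, source) pair, MWh
    energy = {(r, s): q * hours[s] for (r, s), q in cap.items()}
    # 2) production (O&M) cost, assumed here to be LCOE x energy
    om = sum(lcoe[s] * e for (_, s), e in energy.items())
    # 3-5) carbon tax and SO2 / NOx effluent charges
    carbon = sum(carbon_tax * co2[s] * e for (_, s), e in energy.items())
    sulfur = sum(so2_fee * so2[s] * e for (_, s), e in energy.items())
    nitrogen = sum(nox_fee * nox[s] * e for (_, s), e in energy.items())
    # 6) grid (transmission channel) expansion cost
    grid = sum(line_cost[g] * q for g, q in line_cap.items())
    return expansion + om + carbon + sulfur + nitrogen + grid


def total_cost(stage_costs, years_from_base, rate):
    """Discount each stage cost back to the base year at rate R."""
    return sum(c / (1.0 + rate) ** k
               for c, k in zip(stage_costs, years_from_base))


if __name__ == "__main__":
    c = stage_cost(new_cap={("A", "coal"): 100}, cap={("A", "coal"): 100},
                   install_cost={"coal": 10}, hours={"coal": 5},
                   lcoe={"coal": 2}, co2={"coal": 1}, so2={"coal": 0},
                   nox={"coal": 0}, carbon_tax=3, so2_fee=0, nox_fee=0,
                   line_cap={"L1": 50}, line_cost={"L1": 4})
    print(c)  # 3700
```

Separating `stage_cost` from `total_cost` keeps the stage-wise structure visible. A dynamic model then only has to change the stage inputs (for example, the LCOE of each source) from one stage to the next.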
The above function describes the planning mechanism of the integrated generation and grid planning model. Its dynamics are reflected by the state transition matrix, which captures changes in the time value of capital, technology diffusion, the policy mix, and renewable power penetration in each planning stage. The levelized cost of electricity in the objective is therefore equivalent to the state transition function of the MDP. It represents how the feed-in price of each generation technology varies from stage to stage as a result of technology diffusion and the policy scenario.
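The stage division and the stage-by-stage recursion on the value function can be illustrated with a standard finite-horizon Bellman recursion. The Python sketch below is a toy built on our own assumptions, not the paper's model:

- States and actions are small discrete sets.
- Transitions are probability dictionaries.
- Costs are undiscounted; a discount factor could be folded into the cost function.

The names `planning_stages`, `backward_induction`, `toy_cost`, and `toy_transition`, and the "low"/"high" renewable-share example, are all hypothetical.

```python
# Hypothetical sketch: stage division and finite-horizon backward induction.
# The state/action sets and the toy cost model are illustrative assumptions.

def planning_stages(first=(2018, 2020), last_year=2050, step=5):
    """2018-2020 as the first stage, then a new stage every `step` years."""
    stages = [first]
    start = first[1] + 1
    while start <= last_year:
        stages.append((start, min(start + step - 1, last_year)))
        start += step
    return stages


def backward_induction(n_stages, states, actions, cost, transition):
    """Minimize expected total cost: V_i(x) = min_a [c_i(x, a) + E V_{i+1}].

    cost(i, x, a) -> stage cost; transition(i, x, a) -> {next_state: prob}.
    """
    V = [dict() for _ in range(n_stages + 1)]
    policy = [dict() for _ in range(n_stages)]
    V[n_stages] = {x: 0.0 for x in states}  # nothing is paid after the horizon
    for i in reversed(range(n_stages)):
        for x in states:
            best_a, best_v = None, float("inf")
            for a in actions(x):
                expected_next = sum(p * V[i + 1][y]
                                    for y, p in transition(i, x, a).items())
                v = cost(i, x, a) + expected_next
                if v < best_v:
                    best_a, best_v = a, v
            V[i][x], policy[i][x] = best_v, best_a
    return V, policy


# Toy example: the state is a renewable share ("low"/"high"). Investing costs
# 5 and moves the system to "high"; waiting in "low" costs 3 per stage
# (e.g. emission charges); waiting in "high" costs nothing.
def toy_cost(i, x, a):
    return 5.0 if a == "invest" else (3.0 if x == "low" else 0.0)


def toy_transition(i, x, a):
    return {"high": 1.0} if a == "invest" else {x: 1.0}


if __name__ == "__main__":
    stages = planning_stages()
    V, policy = backward_induction(len(stages), ["low", "high"],
                                   lambda x: ["invest", "wait"],
                                   toy_cost, toy_transition)
    print(stages)
    print(V[0]["low"], policy[0]["low"])
```

`planning_stages` produces seven stages (2018–2020 through 2046–2050). Over those stages the toy model prefers to invest in the first stage, because paying 5 once is cheaper than paying 3 in every later stage.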
3.3. State Transfer Function Considering Technology Learning and Incentive Policy
In this paper, the state transition matrix is reflected by the variation of the levelized cost of electricity (LCOE) of each power generation technology, which is influenced by technology diffusion and the policy scenario in each stage. The LCOE of an electric power project is the generation cost at which the project just achieves its minimum expected rate of return, that is, at which the net present value of the project is zero. Therefore, the LCOE of a power technology can be expressed as:
where the remaining terms are the construction cost of the power project and the annual electric power output, N is the operating period of the project, and r is the discount rate. The equation thus states that the present value of the LCOE multiplied by the annual electric power output equals the present value of the project expenditure.
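The zero-NPV definition of the LCOE can be checked numerically. This Python sketch assumes that the construction cost is spent at year 0 and that annual costs and output occur at the end of years 1 to N. `lcoe` and `npv_at_price` are hypothetical helper names.

```python
# Minimal sketch of the zero-NPV definition of LCOE.
# Timing convention (capex at year 0, flows at the end of years 1..N) is assumed.

def lcoe(capex, annual_costs, annual_output, rate):
    """Price per unit energy at which the project's NPV is exactly zero."""
    years = range(1, len(annual_output) + 1)
    disc = [(1.0 + rate) ** -n for n in years]
    pv_cost = capex + sum(c * d for c, d in zip(annual_costs, disc))
    pv_output = sum(e * d for e, d in zip(annual_output, disc))
    return pv_cost / pv_output


def npv_at_price(price, capex, annual_costs, annual_output, rate):
    """Project NPV when every unit of output is sold at `price`."""
    flows = zip(annual_costs, annual_output)
    return -capex + sum((price * e - c) / (1.0 + rate) ** n
                        for n, (c, e) in enumerate(flows, start=1))


if __name__ == "__main__":
    args = (1000.0, [20.0] * 20, [150.0] * 20, 0.08)
    p = lcoe(*args)
    print(round(p, 4), round(npv_at_price(p, *args), 9))
```

Evaluating `npv_at_price` at the returned price gives zero up to floating-point error, which is exactly the condition stated above.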
Accordingly, the cost of a power generation project can be divided into three parts: construction cost, operation cost, and external cost. Construction cost includes land, equipment, and installation costs, related taxes during the construction period, and so on. Operation and maintenance (O&M) costs include production, maintenance, and fuel costs, as well as various taxes and fees during the operation stage. In this paper, external cost mainly refers to the economic loss to the external environment caused by pollutants such as carbon dioxide, sulfur dioxide, and nitrogen oxides emitted during power generation. On this basis, the LCOE considering the externalities of power production and low-carbon policy is as follows:
where n is the planning year, n = 1, 2, …, N, and N is the life span of the power project; x is the type of power generation technology. The remaining terms denote:

- the annual value of the unit investment cost of power technology x during the construction period, at the unit installation cost price of year n;
- the taxes during the construction period of power technology x, at the same year-n price;
- the O&M cost of power technology x in production, at the year-n price level;
- the taxes related to O&M in production of power technology x under the year-n policy scenario;
- the external cost of power technology x in production under the year-n policy scenario;
- the newly installed capacity of power technology x in year n;
- the average utilization hours of power technology x in year n.
Among construction costs, the installation investment per unit capacity of a power technology accounts for the largest share. To account for the time value of capital when calculating generation cost, the equivalent annuity method is therefore used to spread the initial investment evenly over each year. The annual value of the construction cost of a power technology, including taxes, can thus be calculated as follows:
where the inputs are the unit investment cost of power technology x at the unit installation cost price of year n and the present value coefficient of annuity, and r is the discount rate.
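The equivalent annuity step can be sketched as follows. The sketch assumes the standard definition of the present value coefficient of annuity, (P/A, r, N) = [1 - (1 + r)^(-N)] / r, and leaves construction-period taxes out for simplicity. The function names are hypothetical.

```python
# Sketch of the equivalent-annuity split of an up-front investment.
# Construction-period taxes are left out here (simplifying assumption).

def annuity_pv_factor(rate, years):
    """Present value coefficient of annuity (P/A, r, N) = (1 - (1+r)^-N) / r."""
    if rate == 0:
        return float(years)
    return (1.0 - (1.0 + rate) ** -years) / rate


def annualized_investment(unit_cost, rate, years):
    """Equal annual amount whose present value over `years` equals unit_cost."""
    return unit_cost / annuity_pv_factor(rate, years)


if __name__ == "__main__":
    # e.g. a hypothetical 5000-per-kW plant, 8 % discount rate, 25-year life
    print(round(annualized_investment(5000.0, 0.08, 25), 2))
```

Dividing by the annuity coefficient is equivalent to multiplying by the capital recovery factor r(1 + r)^N / [(1 + r)^N - 1].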
The annual O&M cost of power technology x is calculated as:
where the terms are the fuel coefficient of power technology x, the fuel price, the coefficient of other fixed operating costs, and the taxes related to electricity generation and O&M.
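A possible per-MWh reading of the O&M terms is sketched below. The additive form, the units, and the treatment of the "other fixed operation cost" as an annual cost per MW spread over the utilization hours are our assumptions. The function name and the example numbers are hypothetical.

```python
# Hypothetical sketch of a per-MWh O&M cost built from the terms listed above.
# The additive form and the units are assumptions, not the paper's equation.

def om_cost_per_mwh(fuel_coeff, fuel_price, fixed_om_per_mw_year,
                    hours, om_tax_per_mwh):
    """fuel_coeff: fuel per MWh; fuel_price: cost per unit of fuel;
    fixed_om_per_mw_year: other fixed cost per MW-year; hours: h per year."""
    fuel = fuel_coeff * fuel_price                # variable fuel cost
    fixed = fixed_om_per_mw_year / hours          # fixed cost spread over output
    return fuel + fixed + om_tax_per_mwh


if __name__ == "__main__":
    # fuel-burning plant vs. a fuel-free plant (illustrative numbers only)
    print(om_cost_per_mwh(0.25, 120.0, 50000.0, 5000.0, 2.0))  # 42.0
    print(om_cost_per_mwh(0.0, 0.0, 30000.0, 2000.0, 0.0))     # 15.0
```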
For the external cost, the cost of environmental pollution takes the form of taxes on pollutants from electricity production, such as a carbon tax and pollutant emission taxes. It is calculated as follows:
where the terms are the tax on fossil energy resources determined by the policy scenario, the pollutant emission taxes, and the pollutant discharge coefficient of power technology x.
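The external cost can be sketched as a fossil-resource tax plus, for each pollutant, the emission tax times the technology's discharge coefficient. The additive form and the per-MWh units are assumptions, as are the function name and the example values. A pollutant with no tax in a policy scenario simply contributes zero.

```python
# Hypothetical sketch of the policy-driven external cost per MWh:
# a fossil-resource tax plus emission taxes times discharge coefficients.
# The additive form and units are assumptions.

def external_cost_per_mwh(discharge_coeffs, emission_taxes, fossil_tax=0.0):
    """discharge_coeffs: {pollutant: t per MWh};
    emission_taxes: {pollutant: cost per t} for the chosen policy scenario.
    Pollutants without a tax in the scenario contribute nothing."""
    emissions = sum(coeff * emission_taxes.get(p, 0.0)
                    for p, coeff in discharge_coeffs.items())
    return fossil_tax + emissions


if __name__ == "__main__":
    plant = {"CO2": 0.5, "SO2": 0.25, "NOx": 0.125}
    scenario = {"CO2": 20.0, "SO2": 8.0}  # hypothetical: no NOx charge
    print(external_cost_per_mwh(plant, scenario, fossil_tax=3.0))  # 15.0
```

Because the tax dictionary is the only scenario-dependent input, comparing policy scenarios amounts to swapping `emission_taxes` and `fossil_tax`.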
Finally, the trend of power technology diffusion is usually described by a learning curve. To describe the development of the power system more accurately, a single-factor learning curve is therefore incorporated into the equation.
If the diffusion trend of power technology x in stage n is represented by a learning curve, the annual value of the initial investment cost of power technology x with the learning curve is calculated as:
The learning curve term can be decomposed into:
where the parameters are the initial cost of power technology x in stage n, the minimum cost of power technology x in stage n, the initial capacity of power technology x in the planning stage, and the learning rate from ‘learning by doing’.
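The four parameters listed above (initial cost, minimal cost, initial capacity, and learning rate) fit a common single-factor learning curve with a cost floor: C(Q) = C_min + (C_0 - C_min)(Q / Q_0)^(-b), where b = -log2(1 - learning rate). That functional form is assumed here and may differ from the paper's decomposition. The function names and example values are hypothetical. The resulting unit cost could then be fed into an annuity calculation to obtain the annual value of the investment.

```python
import math

# Sketch of a single-factor learning curve with a cost floor. The form
# C(Q) = C_min + (C_0 - C_min) * (Q / Q_0) ** (-b) is a common choice
# assumed here; the paper's own equation may differ.

def learning_exponent(learning_rate):
    """b such that each doubling of capacity cuts the cost above the floor
    by `learning_rate` (e.g. 0.2 = 20 %)."""
    return -math.log2(1.0 - learning_rate)


def learning_curve_cost(cum_capacity, initial_capacity, initial_cost,
                        min_cost, learning_rate):
    """Unit investment cost once cumulative capacity reaches cum_capacity."""
    b = learning_exponent(learning_rate)
    ratio = cum_capacity / initial_capacity
    return min_cost + (initial_cost - min_cost) * ratio ** (-b)


if __name__ == "__main__":
    # hypothetical: 1000 per kW today, floor of 200, 20 % learning rate
    for q in (100.0, 200.0, 400.0, 800.0):
        print(q, round(learning_curve_cost(q, 100.0, 1000.0, 200.0, 0.2), 1))
```

The floor keeps costs from falling indefinitely as capacity grows, which matters over a horizon as long as 2018–2050.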
Technology diffusion has a large impact on construction investment and generation cost. By setting up a state transfer function that reflects power technology diffusion and incentive policy scenarios, the dynamic integrated power system planning model can be established.