1. Introduction
Electric energy systems have started being controlled by governments; since then, a constant search for improving such systems towards deregulation, seeking a competitive structure and attaining growth by satisfying the needs of society, has been pursued.
Electricity market prices have achieved a relevant importance as a consequence of the deregulation of the electricity sector. The importance of prices for the generation sector has been increasing in the last few years due to the high penetration of renewable energy sources, whose revenues come from selling the energy generated at market prices. Hence, price forecasting is still an active field of research, especially due to the incorporation of new technologies, such as wind and photovoltaic energy.
The high amount of renewable energies in the markets has decreased electricity market prices due to the fact that some renewable energies are offered in the market at zero price owing to their close-to-zero marginal costs. However, other effects in market prices can be observed since some generators could be working at a higher marginal cost to be used as reserves.
Literature Review and Contributions
Deregulation of electric energy systems started with the growth of the industry and the new demands for electricity [
1,
2]. After that, several research lines were created to reduce generation uncertainties, as presented in [
3,
4,
5,
6].
Thus, electricity market price forecasting has a high importance for generators, and electricity price forecasting is performed through different approaches [
7], such as multi-agent, fundamental, reduced-form, statistical and computational intelligence methods.
Neural networks are presented in [
8]. Another approach is based on a combinatorial neural network [
9]. In addition, different models used for the Pennsylvania-New Jersey-Maryland (PJM) and Spanish electricity markets are compared in [
10]. An artificial neural network with the preparation of input data through cluster algorithms is developed in [
11]. The work in [
12] combines an artificial neural network with a clustering algorithm. The works in [
13,
14,
15] present time series analysis, forecasting and control models. Hence, [
16] uses neural networks to forecast day-ahead market prices, while [
17] forecasts through an autoregressive integrated moving average (ARIMA) model. Moreover, some models are based on forecasting the volatility [
18] as a result of generalized autoregressive conditional heteroskedasticity (GARCH) models. In this way, forecasting trends of time series can be useful [
19,
20], as well as the use of filters [
21].
Some forecasting methods are based on the combination or a portfolio of several models, as proposed by [
22,
23,
24,
25].
In this regard, an interesting procedure is presented in [
26], proposing an enhanced hybrid approach composed of an innovative combination of wavelet transform, differential evolutionary particle swarm optimization and an adaptive neuro-fuzzy inference system to forecast electricity market price signals in the short-term through historical data.
Another work to estimate uncertainty uses a statistical approach for interval forecasting of the electricity price [
27] based on a support vector machine (SVM) where some model parameters are estimated by means of maximum likelihood estimation (MLE). A possible accuracy gain from using factor models, quantile regression and forecast averaging to compute interval forecasts of electricity spot prices is evaluated in [
28]. A general survey of support vector machines is shown in [
29]. An ensemble method for weather conditions is described in [
30].
This paper sets out a new stochastic programming model [
31] combining many models whose forecasts are made by way of ARIMA models. Note that any forecast has an error because of future uncertainty.
In this paper, we propose a new stochastic programming model with many input data, which may help to reduce the error, where the combination of models comes from several forecasts. In contrast, perfect input data considering our forecast methodology could achieve a perfect forecast. However, this paper is only focused on the stochastic programming model and its features and not on the best input data for the model.
The main contributions of this paper are as follows:
A description of the effects of input data on the created optimal forecasting portfolio is drawn. An application of forecasting in a real market with real-time series is also presented. A recent period is analyzed because the integration of renewable energy sources in the Spanish electricity market has produced a downward effect on the price from 2010 onwards, where the participation of renewable energy sources on some days (25 February 2015) was higher than 70%, as shown in [
32].
The remainder of the paper is structured as follows:
Section 2 describes the mathematical model, the case study and the input data, and the ARIMA models used are shown in
Section 3; in
Section 4, the results and a discussion are presented; and the conclusions are portrayed in
Section 5.
2. Mathematical Model
The aim of the paper is to create a new model whose final error can be reduced by way of stochastic programming, after combining several forecasting models and their errors. The main decision made is the weight of each model per period seeking the lower error of the combination of forecasts.
A stochastic mixed integer linear programming model to combine the forecasting models and their portfolio (SP) is created. The variables of the stochastic programming model are
.
subject to:
where the objective is to minimize the variables related to the errors (
1), positive
or negative
, in each period
p and scenario
s. The error variables of (
1) are both positive, as shown in (
5) and (
6);
can be positive or negative, as shown in (
3) and (
4), which are not zero through binary variable
; depending on (
5) or (
6); if
= 1,
, whereas, if
= 0,
(negative). Constant
m is a big enough value.
There are three input data, each one being a parameter, namely the forecasted errors , the forecasted trends and the forecasted prices .
The forecasted error (parameter) is an error that corrects the forecasted trends and the forecasted prices, i.e.,
. This parameter is the possible distance between the real price and the forecasted value. When the real price is lower than the forecasted price,
is negative, and variable
is positive (
11), i.e.,
binary variable is zero, so
. The opposite case is
= 1 and
, where
is a positive variable, as shown in (
10).
The forecasting portfolio is evaluated in (
12), where
is the parameter whose values are the forecasts made; one scenario for this parameter is a forecast per period, where variable
is the final price of the portfolio created by the forecasts,
, and the weight that has to be decided,
, whose sum,
, has to be equal to one, as shown in (
13).
could be different from one, even being an interval, and the model should decide what is the best value. In this paper,
is used as in (
13).
Equation (
14) decides the value of
variable; this variable comes from the multiplication of
and the weight
, being
the final price from (
12). Equation (
14) reduces the error whose value is the difference between the variable
and the parameters that are the input data of the model, such as
,
and
. The parameter
represents the trend of the price; thus, more forecasts provide more information for the possible behavior of the real unknown price. On the other hand, the differences between
and
can be corrected through parameter
, but also,
tries to reduce the imbalance between
and the real unknown price. Therefore, the forecasting portfolio could improve the ordinary forecasts, but always following parameter
.
To sum up, the stochastic programming model is composed of three kinds of input data. These input data are: (i) forecasted prices; (ii) errors of the forecasts that show the differences between the real price (unknown), forecasted price and the trend; and (iii) the trend of the prices (it could be obtained through more forecasts) in order to describe the possible evolution of the price. On the other hand, the variables of the model, , depend on the input data and can be calculated through different techniques.
Index p represents the hour, from Hour 1 spanning the time horizon of forecasting, and index s is the scenario of the stochastic programming model; nevertheless, each s scenario of each input datum can be achieved using any technique to forecast the prices, the error and the trend.
3. Case Study
The forecasts are obtained for the Spanish electricity market prices. The prices are quite different for each year;
Table 1 shows a summary of the statistics, and
Table 2 presents the correlation matrix of the Spanish electricity prices from 2010–2015 [
33]. The high standard deviation of prices of years 2013 and 2014 is remarkable, whose values are €20.73/MWh and €21.14/MWh, respectively. These values are two-times the standard deviation of the prices of 2011. This analysis is made using ECOTOOL (2016) [
34], a forecasting toolbox in MATLAB
® (R2011b) [
35].
Figure 1 portrays the scatter plots and the histograms of all of the hourly prices of each one of the six years. Red lines in the histograms indicate the shape of the normal distribution, whilst the green lines describe the real shape of the distribution that follows those data.
Figure 2 depicts the sample of 2016, from 1 January–10 June 2016; the mean price is equal to €29.17/MWh, and the standard deviation of prices is €12.35/MWh. The time series of the day-ahead electricity market prices is transformed using a logarithmic transformation to make the dispersion constant.
3.1. Input Data for the SP Model
As presented in
Section 2, there are three input data: forecasted prices, forecasted errors and forecasted trends. This section shows how these input data are calculated to test the SP model, achieving a portfolio of the forecasted prices, which are input data.
Figure 3 shows the three input data: forecasted prices, forecasted errors and forecasted trends; all of them represent the information used in the SP model. Moreover,
Figure 3 shows the main output data, i.e., the final forecasted prices.
Figure 3 presents how the different inputs are calculated. There are two forecasting processes; first, some forecasts are made for the three days (
d,
, and
) previous to the final day (
) in order to have more information; and second, some forecasts are made only for the final day (
).
The first forecasting process is used to determine the behavior of the errors and trends of the three days previous to day , where the models are similar to the models used in the second forecasting process. The three previous days are utilized because the real prices of these days are known; thus, the behavior of the errors and trends can be calculated from these forecasts. Thirty scenarios of trends and errors are attained for each hour of the day, 3 days × 10 models × 24 h. The 30 scenarios per hour are reduced to five scenarios per hour for the errors and trends. The scenario reduction from 30 scenarios per hour to five scenarios per hour is done by the k-means method using the Euclidean distance, where the centroid is the mean of the points of the cluster of the five scenarios.
The second forecasting process is applied to day
, that is the real forecasted day, and five forecasted prices are obtained from the first five ARIMA models of
Table 3. Five ARIMA models are used because the stochastic programming model presented in
Section 2 is tested utilizing five scenarios.
Note that the forecasting processes previous to the SP model can be done by means of neural networks, support vector machines, ensemble methods or by a mixture of all of them, with the possibility of including other methods.
The information of the input data comes from the use of ARIMA models. The behavior of previous days for each forecast is used in order to obtain the forecasted error and the forecasted trend.
3.2. ARIMA Models
The proposed general ARIMA formulation [
34] is as follows:
where
is the observed time series,
is the residual term,
,
is a set of seasonal periods,
,
,
are the
differencing operators necessary to reduce the time series and to achieve mean stationary,
and
,
are the AR and MA polynomials of the back shift operator
B:
of
, and
c is a constant.
Previously, the time series is transformed through logarithmic transformation to stabilize the variance; after that, ARIMA models can be applied.
Following the formulation of (
16), the indexes of each term for every ARIMA model are shown in
Table 3. The process to select each term of each ARIMA model is based on the evaluation of each autocorrelation function (ACF) and partial autocorrelation function (PACF) of the residual component,
, of each ARIMA model, as shown in
Figure 4 and
Figure 5.
Figure 4 and
Figure 5 portray the ACF and PACF of the residual terms for the ARIMA 2 model for 5 and 6 June 2016. The ARIMA model formulation of the ARIMA 2 model is presented in (
17), where it is easy to identify every term.
3.3. Forecasted Prices
Forecasted prices are obtained through ARIMA models; this case study uses five scenarios. These five scenarios for the input data of forecasted prices are ARIMA 1, ARIMA 2, ARIMA 3, ARIMA 4 and ARIMA 5, as presented in
Table 3, where each ARIMA model represents one scenario. An econometric toolbox of MATLAB
® [
35], ECOTOOL [
34], is used to obtain the forecasts. The sample used to forecast every day with each ARIMA model is 15 days, i.e., 360 h. A sample spanning 360 h has been selected because the sample changes every two or three weeks as a consequence of the price volatility. The computing time increases for a sample spanning more days.
The five scenarios of forecasted prices are depicted in
Figure 6, where one week of forecasts is shown. The forecasted days span from the 4–10 June 2016. After this, the models are verified for one week of each season of 2014, 2015 and 2016, the forecasting horizon being 24 h using a 24-h rolling horizon window for the next day until every day of each week is evaluated.
3.4. Forecasted Errors
Errors are the differences between the real price and the forecasted price, being positive when the real price is higher than the forecasted price and negative otherwise. It is remarkable that, if for the forecasting day the real price are unknown, then the error is also unknown. However, the error can be calculated for previous days since the real price is known. Thus, the ten ARIMA models of
Table 3 are used to make forecasts of the three previous days. As a consequence of using 10 ARIMA models in these three days, the number of scenarios of forecasted errors would be 30 (10 ARIMA models multiplied by three days), but they can be reduced to five scenarios. Scenario reduction is performed through the squared Euclidean distance, and each centroid is the mean of the points in the cluster for five scenarios, reducing them from 30 down to five. The input data forecasted errors are these five scenarios.
The five scenarios obtained from the 30 scenarios are shown in
Figure 7.
3.5. Forecasted Trends
Forecasted trends are made through the 10 ARIMA models of
Table 3. The forecasted trend is made for the three previous days, trying to recover some behaviors of previous days. Therefore, as happened for the forecasted error input data, the forecasted trend has 30 scenarios, three days multiplied by 10 forecasts of the 10 ARIMA models. The scenarios are reduced through the squared Euclidean distance as done for the forecasted error. The five scenarios of forecasted trends are depicted in
Figure 8.