1. Introduction
Electricity price forecasting has been a very active research field in the last 15 years because the hourly price of the electric energy that will be settled in the pool is very valuable information if it can be known in advance: any agent involved in the electricity market would use this information to prepare bids strategically in order to obtain the maximum profit. An accurate price forecast for an electricity market has a definite impact on the bidding strategies and even on the price negotiation of bilateral contracts [1]. Furthermore, an accurate price forecast has direct consequences on the producers’ electric energy management, and it could also influence the consumers’ demand response [2].
An excellent state-of-the-art review of electricity price forecasting can be found in Ref. [3]. The development of short-term electricity price forecasting (STEPF) models has received the most attention because of their immediate application to bidding strategies in the daily market. For the STEPF problem, two main modelling approaches can be identified in the literature: classical models [1,4,5,6,7,8,9,10,11,12,13,14] and computational intelligence models [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34]. Some published works present two or more models from the two approaches for comparative purposes [7,8,12], and others present hybrid models that combine both approaches [35,36,37,38].
Classical STEPF models described in the literature apply techniques such as conventional multiple regression [4], time-varying regression [4,5,6], ARIMA (Auto-Regressive Integrated Moving Average) [1,5,6,7,8,9,10,11,12] and GARCH (Generalized Auto-Regressive Conditional Heteroskedasticity) [13]. These models usually use diverse physical data, especially if explanatory variables are needed. In this sense, models based on ARX (Auto-Regressive with eXogenous variables) and ARMAX (Auto-Regressive Moving Average with eXogenous variables) structures [4,7,10,11,12] use variables such as power demand and wind power production to represent relationships that affect the trading of electricity; sometimes the forecasts for these variables are obtained from other forecasting models [6,8,23]. Some models use wavelet decomposition to improve the forecasting results [7,13], or a decomposition into deterministic and stochastic components to provide spot and interval forecasts using a recursive dynamic factor analysis [14].
Computational intelligence STEPF models are based principally on artificial neural networks [7,8,12,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29] and fuzzy systems [15,16,30,31,32,33,34]. As with classical models, besides past values of the hourly electricity price, some of the proposed models use forecasts of the power demand [23,24,31] and of the wind power production [8,27] as inputs. In addition, wavelet decomposition is used to pre-process input data and obtain better forecasts [20,28,32,34]. More advanced computational techniques have also been proposed: optimization methods (genetic algorithms and particle swarm intelligence) to improve the learning process of a neural network based model [21] and to find the optimal structure (membership functions) of a neuro-fuzzy model [33,34]; new learning algorithms such as the Extreme Learning Machine method for single hidden layer feed-forward neural networks [24,28]; and enhanced capabilities to obtain not only point forecasts but also prediction intervals [17,24].
The combination of classical and computational intelligence approaches has allowed the development of some STEPF models with hybrid approaches, such as a support vector regression model combined with an ARIMA model [35]; a wavelet decomposition pre-process, an ARIMA model and a radial basis function neural network whose structure is chosen by means of a particle swarm optimization [36]; an auto-regressive fractionally integrated moving average model in parallel with a multilayer perceptron neural network (MLP) [37]; or a wavelet decomposition pre-process with ARIMA models combined with MLPs [38]. The most recent works describe probabilistic models that aim to overcome the limitations of point forecasts [39,40].
In general, most of the published papers focus on the description of the forecasting technique: there are advanced, versatile techniques with small differences in accuracy when they are applied to a given STEPF case (same variables and same time period). Some authors compare the results obtained with their models with those reported in other works, using the same data and the same period. In [15], the forecasting results obtained with different models using different input variables were compared. Indeed, the selection of the input variables has been pointed out as one of the directions that the STEPF problem will or should take over the next decade [3].
This paper does not concentrate on forecasting techniques; it focuses on the “forecasting modelling”, that is, on the analysis of extensive sets of explanatory variables and their influence on price forecasts. This forecasting modelling also allows the identification of input data and processing structures suitable for application to other specific electricity markets, different from the Iberian Electricity Market (MIBEL).
This paper presents mainly two models related to the MIBEL:
An Explanatory Model for Price Forecast (EMPF model) for day-ahead hourly price forecasts, at a regional level, which integrates a wide set of explanatory variables: regional aggregations of power generation by type (multi-generation) and of power demand, recent prices, broad hourly time series records of weather forecasts, and other chronological information. As shown later (Section 2 and Section 4), this EMPF model is the best explanatory model for forecasting purposes from the point of view of including the most complete and suitable set of explanatory input variables.
An innovative Reference Explanatory Model for Price Estimation (REMPE model) for hourly price estimations based on the actual power multi-generations and actual power demands of the target day, that is, based on real data of the explanatory variables. It allows the calculation of the best hourly price estimations and the corresponding error, which represents the lowest limit of forecast error values reachable with the explanatory variables used. It should be noted that the REMPE model is not a forecasting model, because actual multi-generation and power demand data are not available for day-ahead hourly price forecasting. As shown later (Section 3), this REMPE model determines the lowest limit of the forecasting error that the EMPF model could achieve, which allows the quality of the forecasting performance of the EMPF model to be evaluated.
The MIBEL, which covers the mainland of Portugal and Spain, has been used for testing the explanatory models in this paper. Building the extensive data set used for this real-life case required overcoming difficulties in gathering the data and in joining synchronized information from both countries. Different combinations of explanatory input variables of the EMPF model have been developed to determine which are the most important among the total set of such variables for the hourly price forecasts of the studied electricity market.
Lastly, other explanatory models (SEMPF models), which are simpler than the EMPF model, have been built by considering only some of the types of input information. They have also been applied to the MIBEL, which allows the value of the different types of input data for price forecasting to be analysed.
The evaluation of forecasting performances of the explanatory models presented in this paper and the analysis of influences of their explanatory variables in price forecasts constitute valuable information for MIBEL market agents and for the electric energy industry.
The structure of this paper is as follows:
Section 2 contains a description of the time framework for the day-ahead electricity price forecasting and the data characterization corresponding to the MIBEL for hourly price forecast purposes.
Section 3 describes the Reference Explanatory Model for Price Estimation (REMPE model) for hourly price estimation, utilizing actual power generations and power demands of such day.
Section 4 presents an Explanatory Model for Price Forecast (EMPF model) for day-ahead hourly price forecasts, at a regional level, using hourly time series records of weather forecasts, previous prices and regional aggregation of power generations and power demands.
Section 5 contains a comparison among other Simpler Explanatory Models for Price Forecast (SEMPF models) showing advantages and disadvantages of such models. Lastly, the conclusions of this paper are presented in Section 6.
2. Time Framework and Data Characterization for Explanatory Models
The time framework used in this paper for day-ahead MIBEL price forecasting is described in sub-Section 2.1. Afterwards, sub-Section 2.2 presents the data characterization corresponding to the MIBEL for the hourly price forecasting purposes of the explanatory models of this paper.
2.1. Time Framework
In this context, short-term forecasting models provide the hourly prices of the day-ahead, allowing the preparation of bidding offers to the electricity market and the implementation of other power system operation functions.
The general time framework of the EMPF model for day-ahead price forecasts is illustrated in Figure 1. The price forecast is obtained, at hour t of day D, for each hour h of the 24 h in day D + 1. In most of the European markets, the hourly market price for day D + 1 is calculated at 12:00 of day D, and electricity market bids have to be created at least one hour before. The price forecast must be delivered some hours before the bidding deadline. In our case, we assume that the price forecast is delivered in the first hour, t = 0, of day D; in practical applications, the price forecasts are delivered in the first hours of day D.
At the moment when the forecasting process is carried out, at hour t = 0 of day D, the price for each hour h of the 24 h in day D, pD,h, is known and can be used as an input to obtain the price forecast for hour h of day D + 1. The day of the week and the hour h of day D + 1 are also possible inputs for short-term forecasting models. The weather forecasts available at time t = 0 of day D, for the geographical region corresponding to the electricity market and for hour h of day D + 1, are also potential inputs. In fact, regional weighted forecasted hourly wind speeds, regional weighted forecasted hourly temperatures, and regional weighted forecasted hourly irradiations are possible explanatory variables with valuable forecasting information.
Figure 1.
Time framework for price forecasts of the EMPF model.
In the work described in this paper, for the Iberian Peninsula (mainland of Spain and Portugal), more than 750 mesoscale geographical points located in areas with installed wind farm capacity were used to obtain the forecasts of regional weighted hourly wind speed. The resulting forecasts were calculated as a weighted average of the regional wind speed forecasts; the weighting factors were proportional to the installed regional wind power capacity. A similar approach was used to obtain weighted forecasted hourly irradiation values, with weighting factors proportional to the installed regional aggregated capacity of solar plants. Analogously, the final weighted forecasted hourly temperature was computed from regional hourly temperature forecasts obtained at 250 geographical points corresponding to power demand centres (towns and densely populated areas); the weighting factors were proportional to the aggregated power demands associated with such regions.
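The capacity-weighted averaging described above can be illustrated with a minimal sketch. The function name and the sample numbers are hypothetical and are not taken from the paper's data; the only element drawn from the text is that each regional forecast is weighted in proportion to the installed capacity (or, for temperature, the aggregated demand) of its region.

```python
def weighted_regional_forecast(regional_values, regional_weights):
    """Weighted average of regional forecasts for one hour.

    regional_values  -- forecast per region (e.g. wind speed in m/s)
    regional_weights -- installed capacity (or demand) per region;
                        only the proportions matter, not the units
    """
    if len(regional_values) != len(regional_weights):
        raise ValueError("one weight is required per regional value")
    total_weight = sum(regional_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(v * w for v, w in zip(regional_values, regional_weights))
    return weighted_sum / total_weight


# Hypothetical example: two regions with 100 MW and 300 MW of wind capacity.
example = weighted_regional_forecast([6.0, 10.0], [100.0, 300.0])
```

In this example the region with three times the capacity pulls the aggregated wind speed towards its own forecast.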
From the recorded information of explanatory variables (hourly values of the actual Iberian aggregated power demand and of the generation of the different types of power plants), and by applying a time series autocorrelation analysis of these variables [41], some explanatory information was identified for lags of 48 and 168 h. Thus, in accordance with this analysis, several variables for hour h of day D – 1 and for hour h of day D – 6 were included in Figure 1: power demand (LD), hydropower generation (HG), solar power generation and power cogeneration (SG), coal power generation (CG), combined cycle power generation (CCG), and nuclear power generation (NG). It should be noted that information for a lag of 24 h was not included, since those generation variables were not available for day D. Similarly, from the time series analysis, the price pD,h at hour h of day D and the price pD – 6,h at hour h of day D – 6 were included; in this case, the prices for all the hours of day D were known because they were fixed in day D – 1. Lastly, from the results of the autocorrelation analysis, wind power generations (WG) from day D – 6 to day D – 1 were not considered in the time framework of the EMPF model.
The time framework for price estimation of the REMPE model is given in Figure 2, which shows the major differences between this model and the EMPF model. Some input variables of the REMPE model (mainly the actual power generations and power demands of day D + 1) and its output variable (the price estimation) differ from the input variables of the EMPF model (principally the actual power generations and power demands of days D – 1 and D – 6) and from its output variable (the price forecast). Thus, the REMPE model is not a forecasting model but a model for hourly price estimation. Lastly, note that the wind power generation (WG) of day D + 1 is an input of the REMPE model.
Figure 2.
Time framework for price estimation of the REMPE model.
2.2. Data Characterization
The day-ahead hourly price forecasting can be influenced by different kinds of explanatory variables:
- (a)
Actual recorded hourly electricity prices, that is, real information known up to day D. This information is generally available for free from the market operator for the day-ahead and intraday markets. In this paper, we use the day-ahead hourly market price available up to hour 23 of day D from the website of the market operator OMIE (Market operator of the Iberian Electricity Market) [42].
- (b)
Chronological variables: hour, week day, holiday, week number and month number. This information is naturally known for past and future periods. However, we only included the “hour” and “week day” variables, with a value in the “week day” variable to identify holidays. The “week number” and “month number” variables were not used because they would lead to a large set of cases for the training and testing of the explanatory models.
- (c)
Actual recorded hourly power system data, mainly regional aggregated hourly power demands and regional hourly power generations aggregated by generation type. This is a very large set of information and is frequently difficult to obtain. We used information extracted day by day from the websites of REN, the Portuguese Transmission System Operator (TSO) [43], and REE, the Spanish TSO [44]. It should be noted that collecting this set of real hourly information for research purposes was not easy. On both websites the information was aggregated by source of generation, although the generation categories are not exactly the same for both TSOs. With these data, it was possible to aggregate the Iberian hourly power data into the following data series: load (power demand), wind power generation, hydropower generation, cogeneration and solar power generation (only available in aggregated form from REE at the time of this research, including all cogeneration plants, photovoltaic plants and concentrated solar power plants), nuclear power generation, coal power generation, combined cycle power generation, and power exchanged with France.
- (d)
Hourly weather forecasts, including wind speed, solar irradiance and temperature. We obtained hourly weather forecasts from the NWP (Numerical Weather Prediction) mesoscale model WRF NMM [45], initialized with the forecasts provided by the global NWP model GFS [46]. The NWP mesoscale model was run by the company Smartwatt for weather forecasts at geographic points in Portugal, and by the University of La Rioja for forecasts at geographic points in Spain. The different hourly weather forecasting variables were calculated by the weighting process described in sub-Section 2.1.
- (e)
Power system hourly variable forecasts: power demand forecasts, wind power forecasts, solar power forecasts, hydropower forecasts, independent cogeneration forecasts, thermal power forecasts, etc. This kind of information is difficult to obtain from a TSO for research purposes; furthermore, it is also complex to aggregate the forecast information from several TSOs. This kind of information, of high value but restricted to some actors, was not used in this paper. Instead, the hourly weather forecast information described in paragraph (d) was used, since it is partially related to some of the power system hourly variable forecasts.
- (f)
Power market restriction variables: unavailable capacity for power generation, reserves of power generation and interconnection, volume of electric energy allocated in other electricity markets, and electricity futures market and bilateral contracts. This is inside information whose usefulness and usability have not been studied yet, and it was therefore beyond the scope of this paper.
From the set of the abovementioned variables, our explanatory models in Section 3, Section 4 and Section 5 used the data types (a), (b), (c) and (d), with hourly values recorded in the years 2012 and 2013. The recorded data were divided into an in-sample data set used for training and an out-sample data set used for testing the explanatory models. The out-sample data set is composed of complete weeks extracted along the two years of data in order to have a good representation of the different price behaviours along the year. The in-sample and out-sample data sets are defined as follows:
In-sample data set: all the hours of the days in 2012 and 2013, except those included in the out-sample data set, totalling 14184 cases (hours).
Out-sample data set: all the hours of weeks number 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 in 2012, and weeks number 2, 7, 12, 17, 22, 27, 32, 37, 42, 47 in 2013, totalling 3360 cases (hours).
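The sizes of the two data sets can be checked with a short sketch. It rests on two assumptions that the text does not state: weeks are numbered according to ISO 8601, and every day contributes exactly 24 hourly cases (daylight-saving changes are ignored). The names `OUT_SAMPLE_WEEKS` and `count_split_hours` are illustrative only.

```python
from datetime import date, timedelta

# Out-sample weeks per year, as listed in the text (every fifth week).
OUT_SAMPLE_WEEKS = {
    2012: set(range(5, 51, 5)),   # 5, 10, ..., 50
    2013: set(range(2, 48, 5)),   # 2, 7, ..., 47
}


def count_split_hours(first_day=date(2012, 1, 1), last_day=date(2013, 12, 31)):
    """Return (in_sample_hours, out_sample_hours) for the given period.

    Assumes ISO 8601 week numbering and 24 hourly cases per day.
    """
    in_sample = 0
    out_sample = 0
    day = first_day
    while day <= last_day:
        iso_year, iso_week, _ = day.isocalendar()
        if iso_week in OUT_SAMPLE_WEEKS.get(iso_year, set()):
            out_sample += 24
        else:
            in_sample += 24
        day += timedelta(days=1)
    return in_sample, out_sample


in_hours, out_hours = count_split_hours()
```

Under these assumptions, the 20 out-sample weeks give 20 × 168 = 3360 hours, and the remaining hours of the 731 days of 2012 and 2013 give 17544 − 3360 = 14184 hours, in agreement with the totals stated above.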
Prices for the MIBEL are mostly influenced by explanatory variables aggregated at the Iberian regional level, integrating the Portuguese and Spanish power systems. In addition to the weather forecast variables presented previously, the power generation variables are potentially important explanatory variables. Annual-average hourly price behaviours can change from year to year for diverse reasons; for example, they can change due to variations in electricity production caused by the competition between coal and natural gas power plants and by the renewable power generation available in each time period.
The daily average power demand was quite similar in 2012 and 2013, but there were slight changes in the generation mix. The year 2013 had more renewable power generation, and this higher renewable share had direct implications for the level of prices, which decreased by around 4.4 €/MWh on average in 2013.
Figure 3 shows the evolution of the average daily prices of electricity in the day-ahead market of the MIBEL along part of the years 2012 and 2013, as well as the average daily values of some explanatory variables: average daily thermal power generation (combined cycle, coal and nuclear power generations), renewable power generation (hydropower and wind power generations) and average daily power demand. We can observe that high prices occur in periods with high loads (power demands). On the other hand, minimum values of the prices sometimes occur on days with low values of thermal power generation, caused by high values of hydropower or wind power production (e.g., in April and May 2012, and April 2013). The variability of prices is very high compared with the variability of the explanatory variables represented in Figure 3.
Figure 3 indicates very complex and nonlinear relationships between the prices and the explanatory variables represented in the figure. Furthermore, the non-linearity of the relationships between hourly explanatory variables and hourly prices is related to price spikes corresponding to abnormally high and low values. In fact, in many cases the price spikes cannot be explained by physical explanatory variables, since extreme abnormal values of prices could result from other causes; for example, they could be a consequence of strategic decisions of market agents related to inside information of the electricity market. One of the reasons for building the REMPE model was to identify this partial inability of the physical variables to explain the prices.
Figure 3.
Evolution of the daily average prices and some explanatory variables along part of the years 2012 and 2013.
3. Reference Explanatory Model for Price Estimation (REMPE Model)
The REMPE model is an hourly price estimation model that basically uses recorded values of explanatory variables corresponding to hour h of day D + 1 in order to estimate the price of hour h of day D + 1. This model also includes values of hourly prices on previous days D and D – 6.
Thus, in order to build the REMPE model, four groups of explanatory variables were used, as shown in Table 1. The first group is composed of the chronological variables “hour” and “week day” (variables V1 and V2). The second group of hourly variables is composed of the prices on previous days at the same hour h (variables V3 and V4). The third group includes hourly explanatory variables of the power system, aggregated at a regional level, that is, the actual hourly power generations (variables V6R to V11R) and the actual hourly power demand (variable V5R), corresponding to hour h of day D + 1. The fourth group includes hourly weather forecasts of wind speed, temperature and irradiance (variables V12R to V14R) for hour h of day D + 1, obtained for the studied region as indicated in sub-Section 2.1.
It should be noted that power system information is generally available only up to day D – 1. The prices for hour h of day D can also be used: this price information is available for each of the 24 h of day D because it was established on the previous day. Therefore, although the REMPE model is a price estimation model, we decided to use the available price of hour h of day D as an input variable of the model.
The REMPE model was implemented with an MLP [47], using one hidden layer with 2n + 1 neurons, where n is the number of input variables (explanatory variables). This model was trained and tested with the in-sample and out-sample data sets previously described in sub-Section 2.2, which were used in all the computational experiments presented in this paper. Since we used random weight initialization in these neural networks, different trainings of the same MLP resulted in slightly different outputs. To avoid this inconvenience, we used, as the final result, the ensemble averaging [47,48] of the outputs of five training processes of the same MLP. This final result is a linear combination of the five output values, thus achieving a more stable response and a lower error.
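The hidden-layer sizing rule and the ensemble averaging described above can be sketched as follows. This is only an illustration: training of the MLPs is omitted, the function names are hypothetical, and the ensemble is assumed to be an equal-weight average of the runs (the text states only that the final result is a linear combination of the five outputs).

```python
def hidden_layer_size(n_inputs):
    """Number of hidden neurons for n input variables (2n + 1 rule)."""
    return 2 * n_inputs + 1


def ensemble_average(run_outputs):
    """Average the outputs of several training runs, hour by hour.

    run_outputs -- list of runs; each run is a list with one price
                   value per hour, all runs having the same length.
    Assumes equal weights for all runs.
    """
    if not run_outputs:
        raise ValueError("at least one run is required")
    n_hours = len(run_outputs[0])
    if any(len(run) != n_hours for run in run_outputs):
        raise ValueError("all runs must cover the same hours")
    n_runs = len(run_outputs)
    return [sum(run[h] for run in run_outputs) / n_runs for h in range(n_hours)]


# Hypothetical outputs (€/MWh) of five trainings of the same MLP for two hours.
runs = [[40.0, 50.0], [42.0, 48.0], [41.0, 49.0], [39.0, 51.0], [43.0, 52.0]]
averaged = ensemble_average(runs)
```

With the 14 input variables of the REMPE model, the rule gives a hidden layer of 29 neurons.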
Table 1.
Explanatory variables of the reference explanatory model for price estimations (REMPE) model.
Variable | Description |
---|---|
V1 | hour |
V2 | week day |
V3 | hourly price D |
V4 | hourly price D – 6 |
V5R | hourly power demand D + 1 |
V6R | hourly wind power generation D + 1 |
V7R | hourly hydropower generation D + 1 |
V8R | hourly cogeneration and solar power generation D + 1 |
V9R | hourly coal power generation D + 1 |
V10R | hourly nuclear power generation D + 1 |
V11R | hourly combined cycle power generation D + 1 |
V12R | hourly forecasted temperature D + 1 |
V13R | hourly forecasted wind speed D + 1 |
V14R | hourly forecasted irradiance D + 1 |
An error analysis of the REMPE model was carried out on the price estimations corresponding to the out-sample data set, using the mean absolute percentage error (MAPE) defined by Equation (1), where Preal_T is the real hourly price value, Pestimation_T is the estimation of the hourly price value obtained from the explanatory model, and N is the number of elements of the out-sample data set. The MAPE value corresponding to the final ensemble result achieved by the REMPE model was 10.23%.
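As an illustration of the error measure, the following sketch computes the MAPE in its conventional form, in which each absolute error is divided by the real price of the same hour. This normalization is an assumption: Equation (1) is not reproduced here, and other variants exist (for example, dividing by an average price), which matter when real prices are at or near zero. The function name and sample values are hypothetical.

```python
def mape(real_prices, estimated_prices):
    """Mean absolute percentage error, in percent.

    Conventional form: 100/N * sum(|real - estimate| / |real|).
    Raises ZeroDivisionError if any real price is exactly zero.
    """
    if len(real_prices) != len(estimated_prices) or not real_prices:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(real_prices)
    total = sum(abs(r - e) / abs(r) for r, e in zip(real_prices, estimated_prices))
    return 100.0 * total / n


# Hypothetical example: two hours, each estimated with a 10% error.
example_mape = mape([50.0, 40.0], [45.0, 44.0])
```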
Several Alternative Reference Explanatory Models for Price Estimation, AREMPE-1 to AREMPE-14, were also built using MLPs. They are alternative models derived from the REMPE model: they have a similar structure (MLP, one hidden layer, 2n + 1 neurons), and each of them uses the same input variables except one, as shown in Table 2. These models were created to detect whether some of the explanatory variables were relatively useful or not for the hourly price estimation.
All the alternative models, from AREMPE-1 to AREMPE-14, resulted in higher MAPE values than the REMPE model, which contains the complete set of variables. These results indicate that all the explanatory variables contain valuable information for the REMPE model. The higher the MAPE value of an alternative model, the more important the variable that it excludes. From the MAPE values obtained with the out-sample data set, shown in Table 2, we can conclude that the price variables on previous days (variables V3 and V4) are important; the actual wind power generation variable (V6R) is also important to explain the prices of day D + 1; and the cogeneration and thermal generation variables carry significant information. On the other hand, the forecasted wind speed (V13R), hour (V1), and week day (V2) variables contain relatively less valuable information for price estimation.
Table 2.
Alternative Explanatory Models for Price Estimation and their mean absolute percentage errors (MAPE).
Model | Excluded Variable | MAPE (%) |
---|---|---|
REMPE model | – | 10.23 |
AREMPE-1 | V1: hour | 10.39 |
AREMPE-2 | V2: week day | 10.34 |
AREMPE-3 | V3: hourly price D | 10.93 |
AREMPE-4 | V4: hourly price D – 6 | 10.78 |
AREMPE-5 | V5R: hourly power demand D + 1 | 10.50 |
AREMPE-6 | V6R: hourly wind power generation D + 1 | 10.71 |
AREMPE-7 | V7R: hourly hydropower generation D + 1 | 10.46 |
AREMPE-8 | V8R: hourly cogeneration and solar power generation D + 1 | 10.56 |
AREMPE-9 | V9R: hourly coal power generation D + 1 | 10.54 |
AREMPE-10 | V10R: hourly nuclear power generation D + 1 | 10.49 |
AREMPE-11 | V11R: hourly combined cycle power generation D + 1 | 10.45 |
AREMPE-12 | V12R: hourly forecasted temperature D + 1 | 10.52 |
AREMPE-13 | V13R: hourly forecasted wind speed D + 1 | 10.32 |
AREMPE-14 | V14R: hourly forecasted irradiance D + 1 | 10.56 |
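The leave-one-variable-out comparison in Table 2 can be summarized by ranking the excluded variables according to the increase in MAPE with respect to the REMPE model. The sketch below uses only the values reported in Table 2; the function and variable names are illustrative.

```python
REMPE_MAPE = 10.23  # MAPE (%) of the model with all variables (Table 2)

# MAPE (%) of each alternative model, keyed by the variable it excludes.
AREMPE_MAPE = {
    "V1": 10.39, "V2": 10.34, "V3": 10.93, "V4": 10.78, "V5R": 10.50,
    "V6R": 10.71, "V7R": 10.46, "V8R": 10.56, "V9R": 10.54, "V10R": 10.49,
    "V11R": 10.45, "V12R": 10.52, "V13R": 10.32, "V14R": 10.56,
}


def rank_by_mape_increase(reference, alternatives):
    """Return (variable, MAPE increase) pairs, largest increase first."""
    increases = {
        name: round(value - reference, 2) for name, value in alternatives.items()
    }
    return sorted(increases.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_mape_increase(REMPE_MAPE, AREMPE_MAPE)
```

The ranking starts with V3, V4 and V6R (increases of 0.70, 0.55 and 0.48 percentage points) and ends with V13R (0.09), matching the discussion of Table 2.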
In Table 2, the explanatory variables with apparently low importance (corresponding to AREMPE models with smaller MAPE values, relatively close to that of the REMPE model) are not variables without explanatory value; rather, there is possibly another variable that better explains the same information. For instance, the forecasted wind speed for day D + 1 (variable V13R) is a relevant variable in practical forecasting applications, but in the REMPE model another, more accurate variable, the actual wind power generation for day D + 1 (variable V6R), extracts practically all the information that could exist in the forecasted wind speed variable (V13R) for price estimation purposes. However, as we will explain in Section 4, the wind power generation for day D + 1 is not included in our Explanatory Model for Price Forecasts (EMPF model) and, therefore, the forecasted wind speed for day D + 1 becomes considerably relevant in the EMPF model, as shown later.
The MAPE value obtained with the REMPE model is, as previously indicated, the lowest error (“minimum error”) using the considered explanatory variables. This value (10.23%) corresponds to the lowest limit of the possible performance of any explanatory model for price estimation (or for price forecast) belonging to the same kind of explanatory models, that is, with variables of a similar kind to those used in the REMPE model. This was another reason for building the REMPE model.
The REMPE model uses the information that could best “explain” the prices of day D + 1, in the context described in this paper for explanatory models. Since the explanatory variables of the REMPE model cannot “explain” such prices with more accuracy, approximately 10% of MAPE may be caused by other factors: it could be caused by the market agents’ behaviours (strategic, speculative and/or other types of decisions of market actors) that are not related to physical explanatory variables. These behaviours depend on inside information that, obviously, we did not model as explanatory variables. In this respect, hourly price spikes are cases where the REMPE model frequently fails. For example, Figure 4 presents the actual hourly prices and the REMPE hourly price estimation values for week 37 of 2013: in the off-peak period of the third day (hours 48 to 55), an abnormal value of the actual price is shown, which cannot be sufficiently well explained by the physical explanatory variables used by the REMPE model.
Figure 4.
Actual prices and REMPE price estimation values for week 37 of 2013.
Figure 5 gives the hourly “error of the REMPE price estimation” values versus the actual hourly price values for the out-sample data set. This error is the difference between the actual hourly price value and the corresponding hourly price estimation value of the REMPE model. Figure 5 shows that this error is relatively well centred on the horizontal axis for real prices in the range from 15 to 65 €/MWh. Figure 5 also shows that the errors are asymmetrically distributed along the vertical axis, with some relatively higher values for positive errors but a relatively higher “frequency” (density of “points” in Figure 5) for small negative errors. For this range, 15 to 65 €/MWh, the mean absolute error is approximately 4.5 €/MWh. However, for price spikes higher than 65 €/MWh, the REMPE model leads to worse estimations and explains high actual prices unsatisfactorily. On the other hand, for price spikes lower than 15 €/MWh, the REMPE model has difficulty explaining low actual prices.
Figure 5.
Error of the REMPE price estimation values versus actual price values.
4. Explanatory Model for Price Forecasts (EMPF Model)
The EMPF model is a short-term hourly price forecasting model that utilizes recorded explanatory variables mainly corresponding to hour h of days D – 1 and D – 6, as well as weather forecasts for hour h of day D + 1, in order to forecast the electricity price for hour h of day D + 1. This EMPF model also includes hourly prices on previous days D and D – 6.
Thus, four types of explanatory variables were considered in the EMPF model:
- (a)
Chronological variables such as “hour” and “week day”, including a differentiation for holidays in the week day.
- (b)
Hourly prices of days D and D – 6.
- (c)
Hourly explanatory variables of the power system, that is, hourly power demand and hourly power generations of days D – 1 and D – 6.
- (d)
Hourly weather forecasts of wind speed, temperature and irradiance, for day D + 1.
As briefly outlined in sub-Section 2.1, the time series of hourly electricity price and other hourly power system variables, such as power demand or generation variables, share a seasonal character: an autocorrelation analysis shows that they contain significant information at the 24 h, 48 h or 168 h lags. This reflects the temporal behaviour of the hourly power demand, which has daily and weekly patterns. There is one exception: the hourly wind power generation variable, which does not contain significant pattern information in day D – 1, since the Auto Correlation Function (ACF) and the Partial Auto Correlation Function (PACF) applied to the wind power generation variable do not show significant values for lags beyond a few hours. Therefore, the wind power generation of day D – 1 is of no use for the price forecasting of day D + 1, since the generation of day D – 1 corresponds to a lag of 48 h. The solar power generation and cogeneration variable is partially related to the meteorological variables, but most of this generation comes from cogeneration, which has well-defined seasonal patterns (daily and weekly patterns).
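The kind of lag analysis mentioned above can be illustrated with a sample autocorrelation computed at the 24 h, 48 h and 168 h lags. This is a minimal sketch on a synthetic series with a pure daily cycle; the function `autocorrelation` and the synthetic `demand` series are assumptions made for illustration, and the sketch covers the ACF only, not the PACF.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of an hourly series at the given lag (in hours)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


# Synthetic four-week hourly series with a daily pattern (illustration only)
demand = [100.0 + 20.0 * math.sin(2 * math.pi * h / 24) for h in range(24 * 28)]

acf = {lag: autocorrelation(demand, lag) for lag in (12, 24, 48, 168)}
```

A series with daily seasonality keeps a high autocorrelation at lags that are multiples of 24 h, whereas a variable such as wind generation would, according to the text, show values that fade after a few hours.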
Thus, for the explanatory variables of types (b) and (c) previously mentioned, the EMPF model considers three sets of seasonal variables, as shown in Table 3. The first set includes only one variable: the hourly price, V3, with a 24 h lag, that is, the hourly price in day D. The second set comprises the variables V5, V7, V9, V11, V13 and V15 with a 48 h lag, that is, they correspond to the values in day D – 1. The third set comprises the variables V4, V6, V8, V10, V12, V14 and V16 with a 168 h lag, that is, they correspond to the values in day D – 6. The explanatory variables of type (d), variables V17, V18 and V19, are the weather forecasts for day D + 1. These meteorological variables of the EMPF model, V17, V18 and V19, correspond to the variables V12R, V13R and V14R used in the REMPE model (Table 1).
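The three sets of lagged variables can be assembled by indexing hourly series relative to the target hour of day D + 1. The sketch below assumes plain Python lists indexed by hour; the function `build_inputs`, the dictionary keys and the toy series are hypothetical, and the chronological variables V1 and V2 are left out for brevity.

```python
LAG_DAY_D = 24           # day D, relative to the target hour of day D + 1
LAG_DAY_D_MINUS_1 = 48   # day D - 1
LAG_DAY_D_MINUS_6 = 168  # day D - 6


def build_inputs(t, price, system_series, weather_forecast):
    """Lagged inputs for the target hour index t (an hour of day D + 1).

    price: hourly prices; system_series: name -> hourly values (demand,
    generations); weather_forecast: name -> hourly forecasts for the target hours.
    """
    if t < LAG_DAY_D_MINUS_6:
        raise ValueError("at least 168 h of history are needed")
    row = {
        "price_D": price[t - LAG_DAY_D],
        "price_D-6": price[t - LAG_DAY_D_MINUS_6],
    }
    for name, values in system_series.items():
        row[f"{name}_D-1"] = values[t - LAG_DAY_D_MINUS_1]
        row[f"{name}_D-6"] = values[t - LAG_DAY_D_MINUS_6]
    for name, values in weather_forecast.items():
        row[f"{name}_D+1"] = values[t]
    return row


# Toy hourly series, invented for illustration only
price = [float(i) for i in range(200)]
system_series = {"demand": [1000.0 + i for i in range(200)]}
weather_forecast = {"wind_speed": [5.0 + 0.1 * i for i in range(200)]}
row = build_inputs(170, price, system_series, weather_forecast)
```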
The EMPF model was implemented with an MLP with the same structure as that used for the REMPE model, that is, one hidden layer with 2n + 1 neurons, where n is the number of input explanatory variables. For training and testing of the MLP, the in-sample and out-sample data sets previously described in sub-Section 2.2 were used again, as well as the ensemble technique for the corresponding computer results.
Table 3.
Explanatory variables of the explanatory model for price forecast (EMPF) model.
Variable | Description |
---|---|
V1 | hour |
V2 | week day |
V3 | hourly price D |
V4 | hourly price D – 6 |
V5 | hourly power demand D – 1 |
V6 | hourly power demand D – 6 |
V7 | hourly hydropower generation D – 1 |
V8 | hourly hydropower generation D – 6 |
V9 | hourly cogeneration and solar power generation D – 1 |
V10 | hourly cogeneration and solar power generation D – 6 |
V11 | hourly coal power generation D – 1 |
V12 | hourly coal power generation D – 6 |
V13 | hourly nuclear power generation D – 1 |
V14 | hourly nuclear power generation D – 6 |
V15 | hourly combined cycle power generation D – 1 |
V16 | hourly combined cycle power generation D – 6 |
V17 | hourly forecasted temperature D + 1 |
V18 | hourly forecasted wind speed D + 1 |
V19 | hourly forecasted irradiance D + 1 |
In a similar way as that followed for the REMPE model, the MAPE was calculated for the price forecasts corresponding to the out-sample data set for the EMPF model. In this case, the MAPE is defined by Equation (2), where Preal_T is the real hourly price value, Pforecast_T is the forecasted hourly price value of the forecasting model, and N is the number of elements in the out-sample data set.
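A MAPE computation with these three quantities can be sketched as follows. Since Equation (2) is not reproduced in this text, the sketch assumes the usual definition, in which each absolute error is divided by the corresponding actual price; the function name `mape` and the toy values are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error (%), relative to the actual values.

    Assumption: usual MAPE definition; the paper's Equation (2) may differ.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumes non-zero actual prices; hours with a zero price would need
    # separate handling.
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Toy values, invented for illustration only
actual = [40.0, 50.0, 20.0, 80.0]
forecast = [44.0, 45.0, 25.0, 80.0]
value = mape(actual, forecast)
```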
The MAPE error value (final ensemble result) achieved by the EMPF model was 13.36%. As expected, this MAPE error is higher than the one obtained with the REMPE model (10.23%) but relatively close to this reference error value.
The error values of the EMPF model vary from week to week in the out-sample data set. Figure 6 and Figure 7 show two examples of the hourly evolution of the actual price values, the forecast values of the EMPF model and the REMPE estimation values in two different weeks belonging to the out-sample data set. Figure 6 shows relatively good performance of both the REMPE and EMPF models, whereas Figure 7 shows a week in which both models perform comparatively worse.
Week 7 of year 2013 (Figure 6) corresponds to a period with a medium hydropower generation level and a very high wind power generation in the first two days of the week, which decreases strongly to zero in the last days of the week.
Figure 6.
Actual price values, forecast values of the EMPF model, and REMPE estimation values for the week 7 of year 2013.
Figure 7.
Actual price values, forecast values of the EMPF model and REMPE estimation values for the week 12 of year 2013.
Week 12 of year 2013 (Figure 7) was a week with relatively high values of hydro and wind power generation, with a strong variability in wind speed. On the second day of this week, the wind speed forecasts presented high error values, which could partially explain the relatively higher error in the forecast values of the EMPF model; the REMPE model, however, estimates the price more accurately, probably because it uses the actual values of the wind generation as a variable.
The histograms of the error values of the EMPF model and of the REMPE model are given in Figure 8. These values represent the difference between the actual price values and the forecasted ones, expressed as a percentage of the actual price values. The distribution of error values of the REMPE model is centred on the horizontal axis, with a bias of 0.13%, whereas the bias is 2.74% for the EMPF model, whose errors are somewhat more frequent and larger in magnitude.
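The signed percentage errors, their bias and a histogram of the kind described above can be computed as in the following minimal sketch. The sign convention (actual minus forecast, relative to the actual price) follows the text; the function names, the 10% bin width and the toy values are assumptions made for illustration.

```python
import math


def percentage_errors(actual, forecast):
    """Signed errors (%) as actual minus forecast, relative to the actual price."""
    return [100.0 * (a - f) / a for a, f in zip(actual, forecast)]


def bias(errors):
    """Mean of the signed percentage errors."""
    return sum(errors) / len(errors)


def histogram(errors, bin_width=10.0):
    """Count of errors per bin, keyed by the left edge of each bin."""
    counts = {}
    for e in errors:
        left = math.floor(e / bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


# Toy values, invented for illustration only
actual = [50.0, 40.0, 20.0, 100.0]
forecast = [45.0, 44.0, 21.0, 80.0]
errors = percentage_errors(actual, forecast)
model_bias = bias(errors)
counts = histogram(errors)
```

With this sign convention, a positive bias means that the model underestimates prices on average.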
Figure 8.
Histogram of the error values of the EMPF model and the REMPE model for the out-sample data set.
Several Alternative Explanatory Models for Price Forecasts (AEMPF-1 to AEMPF-19), shown in Table 4, were also built by utilizing MLPs. These alternative models have a structure similar to that of the EMPF model, but each of them excludes a different input variable from those used by the EMPF model. Thus, we could identify the explanatory variables that were relatively important for price forecasting, as indicated later.
None of the Alternative Explanatory Models for Price Forecasts, AEMPF-1 to AEMPF-19, achieved better performance (MAPE error) than the EMPF model (base model), meaning that all the considered variables contain useful price forecasting information. A variable whose exclusion leads to a higher MAPE error in the corresponding AEMPF model is relatively more important for forecasting purposes. The forecasted wind speed of day D + 1 (variable V18) is clearly the most important variable for the price forecasting, followed in relevance by the price of the previous day (variable V3). From Table 4, analyzing the MAPE error values of the AEMPF-3 to AEMPF-19 models, we can conclude that the power generation dispatch variables of day D – 1 (variables V7, V9, V11, V13 and V15) and the consumption (power demand) of day D – 1 (variable V5) seem to have some relevance, but they are clearly less important than the price of day D (variable V3) and the forecasted wind speed of day D + 1 (variable V18). The combined cycle power generation of day D – 1 (variable V15) is the least important of the generation variables of day D – 1. In general, the information in most of the variables of day D – 1 seems to be more valuable than that in the same variable of day D – 6.
In Table 4, the “hour” and “week day” chronological variables, V1 and V2, seem to carry slightly less useful information, because several variables of day D – 1 already contain seasonal information.
Table 4.
Alternative Explanatory models for Price Forecasts and their MAPE errors.
Model | Excluded Variable | MAPE (%) |
---|---|---|
EMPF model | – | 13.36 |
AEMPF-1 | V1: hour | 13.60 |
AEMPF-2 | V2: week day | 13.64 |
AEMPF-3 | V3: hourly price D | 14.98 |
AEMPF-4 | V4: hourly price D – 6 | 13.85 |
AEMPF-5 | V5: hourly power demand D – 1 | 14.06 |
AEMPF-6 | V6: hourly power demand D – 6 | 13.68 |
AEMPF-7 | V7: hourly hydropower generation D – 1 | 13.87 |
AEMPF-8 | V8: hourly hydropower generation D – 6 | 13.58 |
AEMPF-9 | V9: hourly cogeneration and solar power generation D – 1 | 13.77 |
AEMPF-10 | V10: hourly cogeneration and solar power generation D – 6 | 13.54 |
AEMPF-11 | V11: hourly coal power generation D – 1 | 13.78 |
AEMPF-12 | V12: hourly coal power generation D – 6 | 13.55 |
AEMPF-13 | V13: hourly nuclear power generation D – 1 | 13.74 |
AEMPF-14 | V14: hourly nuclear power generation D – 6 | 14.17 |
AEMPF-15 | V15: hourly combined cycle power generation D – 1 | 13.68 |
AEMPF-16 | V16: hourly combined cycle power generation D – 6 | 13.65 |
AEMPF-17 | V17: hourly forecasted temperature D + 1 | 13.74 |
AEMPF-18 | V18: hourly forecasted wind speed D + 1 | 17.71 |
AEMPF-19 | V19: hourly forecasted irradiance D + 1 | 13.97 |
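The relative importance read from Table 4 amounts to ranking the variables by the MAPE increase observed when each one is excluded. The following sketch reproduces that ranking from the tabulated values; the names `ablation_mape` and `rank_by_mape_increase` are hypothetical, and the numbers are copied from Table 4.

```python
BASE_MAPE = 13.36  # EMPF model (all 19 variables), Table 4

# MAPE (%) of each AEMPF model, keyed by the variable it excludes (Table 4)
ablation_mape = {
    "V1": 13.60, "V2": 13.64, "V3": 14.98, "V4": 13.85, "V5": 14.06,
    "V6": 13.68, "V7": 13.87, "V8": 13.58, "V9": 13.77, "V10": 13.54,
    "V11": 13.78, "V12": 13.55, "V13": 13.74, "V14": 14.17, "V15": 13.68,
    "V16": 13.65, "V17": 13.74, "V18": 17.71, "V19": 13.97,
}


def rank_by_mape_increase(base, results):
    """Variables sorted by the MAPE increase (percentage points) when excluded."""
    increases = {var: round(m - base, 2) for var, m in results.items()}
    return sorted(increases.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_mape_increase(BASE_MAPE, ablation_mape)
top_two = ranking[:2]
```

The two largest increases correspond to V18 (forecasted wind speed) and V3 (price of day D), in agreement with the discussion of Table 4.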
Lastly, note that for electricity markets other than the MIBEL, a corresponding EMPF model and the corresponding Alternative Explanatory Models for Price Forecasts could be built in the same way, which would allow one to determine the importance of the explanatory input variables in those markets.
5. Simpler Explanatory Models for Price Forecasts (SEMPF Models)
As previously discussed, there are different types of input information that can be used in explanatory models for short-term price forecasts: chronological information, hourly price information, recorded hourly power demand, recorded hourly power generations and hourly weather forecasts. If the complete set of information is available, then, from a technical point of view, the best option is to use the combination of all explanatory variables which provide the best hourly price forecasting performance. However, data gathering is generally a complex process in forecast services; furthermore, in some cases, a part of the data set is not available, or it is not complete or reliable and it cannot be used; or the data gathering leads to some kind of cost. There is also the computational effort issue: hourly price forecasting models with more explanatory variables are computationally more intensive, especially when they need data pre-processing.
In order to analyze the value of the type of information for short-term price forecasts, we utilized the same process used in the EMPF model applied to some Simpler Explanatory Models for Price Forecast (SEMPF models) that consider different types of explanatory information.
Table 5 shows the explanatory variables for the SEMPF-20 to SEMPF-26 models that were tested for comparative purposes, and the MAPE errors obtained with the out-sample data set for each model.
The SEMPF-20 model is a “baseline model” that uses the “hour” and “week day” chronological variables (variables V1 and V2). In practice, it is a model that computes an average price for each hour of each week day. The output of the SEMPF-20 model is the same for all weeks of the year, and it leads to a MAPE error of 20.94%, as shown in Table 5. This SEMPF-20 model is minimalist in its usage of information, although the price series has been used as a target in the training process. However, in situations where there is no way of receiving information about the prices of previous days, this naïve SEMPF-20 model is the most suitable.
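The behaviour attributed to the baseline, an average price for each hour of each week day, can be mimicked with a simple lookup table. This is only an illustrative stand-in, not the MLP used for the SEMPF-20 model; the function names and the toy history are hypothetical.

```python
from collections import defaultdict


def fit_baseline(samples):
    """Mean price per (week_day, hour) from (week_day, hour, price) samples."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for week_day, hour, price in samples:
        sums[(week_day, hour)] += price
        counts[(week_day, hour)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def predict_baseline(table, week_day, hour):
    """The forecast depends only on the week day and the hour."""
    return table[(week_day, hour)]


# Toy history, invented for illustration only: (week_day, hour, price)
history = [(0, 8, 40.0), (0, 8, 50.0), (0, 9, 60.0), (5, 8, 30.0)]
table = fit_baseline(history)
monday_8h = predict_baseline(table, 0, 8)
```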
Table 5.
EMPF and simpler explanatory models for price forecast (SEMPF) models input variables and their MAPE errors.
Model | Used Variables | MAPE (%) | Description |
---|---|---|---|
SEMPF-20 | V1, V2 | 20.94 | baseline model |
SEMPF-21 | V3, V4 | 18.04 | hourly price time series |
SEMPF-22 | V1, V2, V3, V4 | 16.80 | hourly price time series and chronological information |
SEMPF-23 | V3, V4, V17, V18, V19 | 15.53 | hourly price time series and weather forecasts |
SEMPF-24 | V1, V2, V3, V4, V17, V18, V19 | 14.34 | hourly price time series, weather forecasts and chronological information |
SEMPF-25 | V1, V2, V3, V4, V5, V6 | 16.59 | hourly price time series, power demand and chronological information |
SEMPF-26 | V1, V2, V3, V4, V5, V6, V17, V18, V19 | 13.95 | hourly price time series, weather forecasts, power demand and chronological information |
EMPF model | All variables V1 to V19 | 13.36 | all variables |
The SEMPF-21 model only uses information about prices in previous days, that is, the price of day D and the price of day D – 6 (variables V3 and V4). It should be noticed that, with only this seasonal autoregressive information, the MAPE error decreases to 18.04%, as shown in Table 5.
The SEMPF-22 model uses the chronological information and the price time series (variables V1, V2, V3 and V4). This model is a natural alternative to the SEMPF-21 model because the chronological information is always known and requires no additional data gathering effort. The MAPE error achieved by this model, 16.80%, is 1.24 percentage points lower than that of the SEMPF-21 model. This appreciable decrease indicates the relative importance of the chronological information when there is a small number of seasonal variables.
The SEMPF-23 and SEMPF-24 models add weather forecasts (variables V17, V18 and V19) to the price information (variables V3 and V4) and, in the case of the SEMPF-24 model, also to the chronological information (variables V1 and V2). The SEMPF-24 model achieves a MAPE error of 14.34%, which is 2.46 percentage points lower than the MAPE error of the SEMPF-22 model; likewise, the SEMPF-23 model achieves a MAPE error of 15.53%, which is 2.51 percentage points lower than the MAPE error of the SEMPF-21 model. These error decreases show that weather forecasts are relevant information for the hourly price forecasting. Models SEMPF-23 and SEMPF-24 require weather forecast services that could incur costs for a regional level application (in the MIBEL, or other similar market) because hundreds of hourly weather forecast geographic points are necessary to conveniently cover the service area of the electricity market studied.
The SEMPF-25 model adds the power demands (variables V5 and V6) to the list of explanatory variables of the SEMPF-22 model, with a small MAPE error improvement of 0.21 percentage points. Similarly, a small MAPE error improvement (0.39 percentage points, from 14.34% to 13.95%) is observed when the power demands (variables V5 and V6) are added, in the SEMPF-26 model, to the list of variables of the SEMPF-24 model.
Finally, we can compare the SEMPF models with the EMPF model, which achieves a MAPE error of 13.36%, as shown in Table 5. It should be observed that the EMPF model uses 19 explanatory variables and that the procurement of some of them requires some effort, especially in the case of the different kinds of power generation time series. The EMPF model is the best for price forecasts under the MAPE error criterion with the variables analyzed; on the other hand, the SEMPF models need comparatively less effort for data gathering, but they lead to higher MAPE error values.
Thus, Table 5 and the corresponding comparisons among the explanatory models described above provide some guidelines on the value of the different types of input information used by the SEMPF models for the day-ahead price forecasting of the MIBEL.
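The pairwise comparisons discussed in this section can be reproduced directly from the MAPE values of Table 5. The following is a minimal sketch; the dictionary `mape_table5`, the helper `gain` and the comparison labels are hypothetical names, while the numbers are copied from Table 5.

```python
# MAPE (%) of each model on the out-sample data set (Table 5)
mape_table5 = {
    "SEMPF-20": 20.94, "SEMPF-21": 18.04, "SEMPF-22": 16.80,
    "SEMPF-23": 15.53, "SEMPF-24": 14.34, "SEMPF-25": 16.59,
    "SEMPF-26": 13.95, "EMPF": 13.36,
}


def gain(reference, extended, table=mape_table5):
    """MAPE reduction (percentage points) from the reference to the extended model."""
    return round(table[reference] - table[extended], 2)


comparisons = {
    "chronological info added to prices (21 -> 22)": gain("SEMPF-21", "SEMPF-22"),
    "weather forecasts added to prices (21 -> 23)": gain("SEMPF-21", "SEMPF-23"),
    "weather forecasts added to 22 (22 -> 24)": gain("SEMPF-22", "SEMPF-24"),
    "power demand added to 22 (22 -> 25)": gain("SEMPF-22", "SEMPF-25"),
    "power demand added to 24 (24 -> 26)": gain("SEMPF-24", "SEMPF-26"),
    "generation variables added to 26 (26 -> EMPF)": gain("SEMPF-26", "EMPF"),
}
```

Expressing the gains in percentage points keeps them separate from relative changes in the MAPE itself.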
6. Conclusions
This paper presents the analysis of the importance of a set of explanatory variables for the day-ahead price forecast in the Iberian Electricity Market (MIBEL). The set of explanatory variables includes recorded time series of prices, of regional-aggregated hourly power generations and hourly numerical weather forecasts associated with the studied region. Two main models, related to the analysis of importance, are presented in the paper: the EMPF model for the day-ahead hourly price forecasting and the REMPE model for the estimation of the hourly prices. A set of alternative models derived from the EMPF model (AEMPF models), and a set of models simpler than the EMPF model (SEMPF models), have been tested in order to analyze the relative importance of each explanatory variable and each type of explanatory variable for the day-ahead price forecasting in the MIBEL.
The EMPF model provides day-ahead hourly price forecasts, at a regional level, mainly using recorded time series of prices, of regional-aggregated hourly power generations and hourly numerical weather forecasts associated with the studied region. Thus, as input variables (price explanatory variables), the EMPF model considers, in addition to chronological variables, hourly prices in previous days, regional-aggregated hourly power demands and hourly power generations of most of the types of electricity production in previous days, as well as weather forecasts (hourly wind speed, temperature and irradiance) for the day ahead in hundreds of geographical points of the region.
The REMPE model provides the estimation of the hourly prices. The main difference between this model and the EMPF model is that the REMPE model uses the actual power demand values and the actual power generation values corresponding to the day ahead instead of the values of these variables in previous days. Thus, the REMPE model is not a model for price forecasts but for price estimations.
Both EMPF and REMPE models have been successfully and satisfactorily applied to the real-life case study of the MIBEL that covers the mainland of Portugal and Spain.
The MAPE error of the REMPE model, for the MIBEL, is the lowest one obtained with the considered explanatory variables. This value is a lower bound for the MAPE error of any explanatory model for price estimation or for price forecast that uses the same input variables as the REMPE model. It seems that this MAPE error of the REMPE model could be caused by diverse factors other than the physical explanatory variables. Among these factors, for example, we could consider strategic behaviours of market agents that are not essentially related to physical variables, but possibly related to inside information for market bidding. In this respect, price spikes are cases where the REMPE model performs worse, possibly indicating some special actions of market agents. Thus, this model has also been useful in analyzing the difficulties of price estimation during price spikes.
The EMPF model achieves a MAPE error, in the MIBEL, higher than the “minimum error” obtained by the REMPE model, but relatively close to it, showing a satisfactory performance with respect to the REMPE model.
The EMPF model has been useful in identifying the relative importance of each explanatory variable for price forecasts among the remaining variables that are inputs of such model. Thus, in the day-ahead price forecasts of the MIBEL, the EMPF model computer results indicate that the forecasted wind speed and the price in the previous day are the most relevant variables, although all the variables of the model give explanatory information for price forecasting.
Other models, the SEMPF models, have also been applied to the case of the MIBEL, using only some of the types of input information (chronological information, price information, power demands, power generations and/or weather forecasts). The reduction in the types of input information has led to higher MAPE errors for the SEMPF models than for the EMPF model, which is the best for the day-ahead price forecasts with the considered explanatory variables. However, such SEMPF models need a lower data gathering effort.
The explanatory models of this paper for price forecasts, their performance in the MIBEL, mainly in terms of MAPE errors, and the analysis of the importance of their input variables, can be useful for electricity market agents and other actors of the electric energy industry.