1. Introduction
Important recent changes in electricity markets make forecasting electricity demand and production a pressing challenge for the industry. Market liberalization, the increasing use of electronic appliances and the penetration of renewable electricity sources are just a few of the numerous causes behind these challenges [1]. On the other hand, new sources of data are becoming available, notably with the deployment of smart meters. However, access to these individual consumers' data is not always possible, and so aggregated data is used to anticipate the load of the system.
Load curves at some aggregate level (say, regional or nation-wide) usually present a series of salient features that are the basis of any forecasting method. Common patterns are long-term trends, various cycles (with yearly, weekly and daily patterns) as well as a high dependence on external factors such as meteorological variables. We may separate these common patterns into two sets of effects. On one side are the effects linked to the social and economic organisation of human activity. For instance, the calendar structure imposes its cycles on the electrical demand: load needs during weekdays are higher than during weekends, and load during daylight is higher than during the night; holidays also have a large impact on the demand structure. Notice that these effects are mostly deterministic in nature, that is, they can be predicted without error. On the other side are the effects connected to the environment of the demand. The weather is the main one, since demand usually depends on variables such as air temperature, humidity, wind speed and direction, dew point, etc.; exceptional events such as strikes or damage to the electrical grid may also affect the demand. While weather is easier to anticipate than exceptional events, both effects share a stochastic nature that makes them more difficult to anticipate.
While only recorded at some time points (e.g., each hour, half-hour or quarter-hour), the electricity load of the system is a continuum. One may therefore consider the load curve mathematically as a function of time with some regularity properties. In fact, electrical engineers and forecasters usually represent the load curve as a function instead of a sequence of discrete measures. Then, one may study the electrical load as a sequence of functions. Recently, attention has been paid to this kind of setting, which is naturally called a functional time series (FTS). A nice theoretical framework to cope with FTS is that of autoregressive Hilbertian processes, defined through families of random variables taking values in a Hilbert space [2,3]. These processes are strictly stationary and linear, which are two restrictive assumptions for modeling the electrical load. An alternative to linearity was proposed in [4], where the prediction of a function is obtained as a linear combination of past observed segments, using weights induced by a notion of similarity between curves. Although the stationarity assumption on the full time series is still too strong for electrical load data [5], corrections can be made in order to render the hypothesis more reasonable. First, one may consider that the mean level of the curves presents some kind of evolution. Second, the calendar structure creates at least two different regimes in the data: working and non-working days. Of course, the specific form of the corrections needed should depend on the nature of the model used to obtain the predictions.
State-space models (SSM) and the connected notion of the Kalman filter are an interesting alternative to cope with the nonlinear and nonstationary patterns of electrical data. Let us mention some references where SSM have been used to forecast load demand. Classical vector autoregressive processes are used in [6] in SSM form to compare their predictive performance with respect to seasonal autoregressive processes. The main point of [7] is to combine several machine learning techniques with wavelet transforms of the electrical signal; SSM are then used to add adaptability to the proposed prediction technique. Since some machine learning tools may produce results that are difficult to interpret, the authors in [8] look for a forecasting procedure based on SSM that is easier to analyse, making explicit the dependence on some exogenous variables. They use a Monte Carlo based version of the Kalman filter to increase flexibility and add analytical information.
A more detailed model is given in [9], where the authors propose to describe the hourly load curve as a set of 24 individual regression models that share trends, seasons at different levels, short-term dynamics and weather effects, including nonlinear functions for heating effects. The equations represent 24 univariate stochastically time-varying processes that should be estimated simultaneously within a multivariate linear Gaussian state space framework using the celebrated Kalman filter [10]. However, the computational burden is a cumbersome drawback. A second work circumvents the problem by using a dimension reduction approach, which resizes the problem to a handy number of dimensions and renders the use of the Kalman filter practicable [11].
Some successful uses of SSM to cope with functional data (not necessarily time series) are reported in the literature: for instance, using common dynamical factors as in [12] to model electricity price and load, or as in [13] to predict yield curves of some financial assets, in addition to [14], where railway supervision is performed thanks to a new online clustering approach over functional time series using SSM.
Inspired by these ideas, we push forward the model in [11] to describe a completely functional autoregressive process whose parameters may vary over time. At each time point (days in our practical case), the whole functional structure (the daily load curve) is described through its projection coefficients on a spline basis. Then, using a functional version of principal components analysis, the dimension of the representation is reduced. The vector of spline coefficients is then used in a multivariate autoregressive process, as in [15]. Thus, our approach is completely endogenous, but with the ability to incorporate exogenous information (available at the time of the forecast) as covariates.
This paper is structured as follows. In Section 2, we describe the model we propose for forecasting electricity demand: the functional data and their representation in a spline basis, the state space model we propose and the estimation methods. Section 3 illustrates model inference (Kalman filtering and smoothing, functional principal components analysis) on a simulated dataset. Section 4 describes the experiments we perform on real data, with a simple application of our procedure at the aggregation level of a single supplier. We then explore, in Section 5, some corrections and extensions to the simple approach in order to take into account some of the nonstationary patterns present in the data. Additional experiments are the object of Section 6, where we predict at the larger national aggregation level. The article concludes in Section 7, where some future lines of work are discussed.
2. Materials and Methods
We introduce in this section the notation and construct the prediction method. For convenience, Table 1 summarizes the nomenclature used, including all variables, acronyms, indexes and constants defined in the manuscript, in order to make the text clearer and more readable.
The starting point of our modeling is a univariate continuous-time stochastic process $Z = (Z(t),\, t \in \mathbb{R})$. To study this process, a useful device [2] is to consider a second stochastic process $X = (X_k,\, k \in \mathbb{N})$, which is a discrete-time process that at each time step takes values in some functional space. The process $X$ is derived from $Z$ as follows. For a trajectory of $Z$ observed over the interval $[0, n\delta]$, we consider the $n$ subintervals of the form $I_k = [(k-1)\delta, k\delta]$, $k = 1, \ldots, n$, such that $[0, n\delta] = \bigcup_{k=1}^{n} I_k$. Then, we can write
$$X_k(t) = Z((k-1)\delta + t), \qquad t \in [0, \delta].$$
With this, anticipating the behavior of $Z$ on, say, $[n\delta, (n+1)\delta]$ is equivalent to predicting the next function of $X$. The construction is usually called a functional time series (FTS). The setting is particularly fruitful when $Z$ presents a seasonal component of size $\delta$. In our practical application, $Z$ will represent the underlying electrical demand, $\delta$ will be the length of one day and so $X$ is the sequence of daily electrical loads. Notice that $X$ represents a continuum that is not necessarily completely observed. As mentioned in the Introduction, the records of load demand are only sampled on some discrete grid. We discuss this issue below.
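To make the construction concrete, here is a minimal numpy sketch that cuts a long, regularly sampled series into daily functional segments; the placeholder signal and sizes are illustrative choices, with $\delta$ corresponding to the 48 half-hourly measurements per day used in the application below.

```python
# Minimal sketch of the FTS construction: a long univariate record z,
# sampled on a regular grid, is cut into daily segments. Here delta = 48
# corresponds to half-hourly measurements over one day; z is a
# placeholder signal, not real load data.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=48 * 365)      # placeholder for the sampled trajectory of Z
delta = 48                         # sampling points per functional segment
X = z.reshape(-1, delta)           # row k holds the discretized function X_k
print(X.shape)                     # (365, 48): n = 365 daily curves
```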
2.1. Prediction of Functional Time Series
The prediction task involves making assertions on the future value of the series $X_{n+1}$ having observed the first $n$ elements $X_1, \ldots, X_n$. From the statistical point of view, one may be interested in the predictor
$$\tilde{X}_{n+1} = \mathbb{E}[X_{n+1} \mid X_n, \ldots, X_1], \qquad (1)$$
which minimizes the $L^2$ prediction error given the available observations at moment $n$. A useful model is the (order 1) Autoregressive Hilbertian (ARH(1)) process defined by
$$X_n = \mu + \rho(X_{n-1} - \mu) + \epsilon_n, \qquad (2)$$
where $\mu$ is the mean function of $X$, $\rho$ is a linear operator and $(\epsilon_n)$ is a strong white noise sequence of random functions. Under mild conditions, Equation (2) defines a (strictly) stationary random process (see [2]). The predictor (1) for the ARH(1) process is
$$\tilde{X}_{n+1} = \mu + \rho(X_n - \mu),$$
which depends on two generally unknown quantities: the function $\mu$ and the operator $\rho$. The former can be estimated by the empirical mean $\bar{X}_n = n^{-1} \sum_{k=1}^{n} X_k$. For the latter, the alternative is either to estimate $\rho$ by, say, $\hat{\rho}_n$ and obtain the prediction $\hat{\rho}_n(X_n - \bar{X}_n)$, or to estimate directly the application of $\rho$ to the last observed centered function. Both variants need an efficient way to approximate the possibly infinite-dimensional objects, either the operator $\rho$ or the function $\rho(X_n - \mu)$, which are then estimated (see the discussion below on this point).
The inherent linearity of Equation (2) makes this model not flexible enough for electricity load forecasting. Indeed, having only one (infinite-dimensional) parameter to describe the transition between any two consecutive days is not reasonable. Variants have been studied. We may mention [16], which incorporates weights in Equation (2) making the impact of recent functions more important; the doubly stochastic ARH model, which considers the linear operator $\rho$ to be random [17]; or the conditional ARH, where an exogenous covariate drives the behavior of $\rho$ [18]. For the sake of more flexibility, we aim to make predictions in a time-varying setting where the mean function $\mu$ and the operator $\rho$ are allowed to evolve.
2.2. Spline Smoothing for Functional Data
In practice, one only has at one's disposal a finite sampling $(x(t_1), \ldots, x(t_m))$, possibly observed with noise, from a trajectory $x$ of the random function $X$. Then, one wishes to approximate $x$ from the discrete measurements. A popular choice is to develop $x$ over the elements of a basis $(B_1, \ldots, B_K)$, which is to write
$$x(t) \approx \sum_{k=1}^{K} \theta_k B_k(t), \qquad (3)$$
where the coefficients $\theta_k$ are the coordinates resulting from projecting the function $x$ on each of the elements of the basis. Among the several bases usually employed, we choose to work with a B-spline basis because it is adapted to the nature of the data we want to model and has nice computational properties.
B-splines are a basis system adapted to represent splines. In our case, we use cubic splines, that is, piecewise third-degree polynomial functions. The pieces are connected at points called knots in a way that warrants smoothness, namely the continuity of the second-order derivative. An appealing property of B-splines is the compact support of their elements, which gives good localization properties as well as efficient computation.
Figure 1 illustrates this fact with the reconstruction of a daily load curve: each B-spline element has a support defined over a compact subinterval of the horizontal axis. Another important property is that, at each point of the domain, the spline functions sum to 1. Since the shape of the spline functions at the border knots is clearly different, a boundary effect is observed at the extreme points of the horizontal axis, where only one spline has a non-null value; together with the regularity constraints and the additional knots at the extremes of the support, these points are subject to a boundary condition. Figure 1 also illustrates this behavior at the boundaries. To mitigate this undesirable effect, we use a relatively large number of spline functions in the basis, which empirically reduces the impact of the boundary condition.
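As an illustration, the following sketch fits the B-spline representation (3) of one sampled curve by least squares with SciPy; the toy curve, the number of basis functions and the knot layout are our own choices for this example, not prescriptions of the method.

```python
# Sketch: least-squares representation of one sampled daily curve on a
# cubic B-spline basis. The half-hourly grid mirrors the application
# below; the knot layout is one simple choice among many.
import numpy as np
from scipy.interpolate import make_lsq_spline

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 48)                       # half-hourly grid on [0, 1]
x = np.sin(2 * np.pi * t) + 0.05 * rng.normal(size=48)  # toy noisy curve

k = 3                                               # cubic splines
n_basis = 12                                        # K basis functions
# Repeated boundary knots plus interior knots give K = n_interior + k + 1.
interior = np.linspace(0.0, 1.0, n_basis - k + 1)[1:-1]
knots = np.r_[[t[0]] * (k + 1), interior, [t[-1]] * (k + 1)]

spl = make_lsq_spline(t, x, knots, k=k)             # least-squares fit
theta = spl.c                                       # the K coefficients of (3)
x_hat = spl(t)                                      # reconstructed curve
```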
2.3. Functional Principal Components Analysis
As in multivariate data analysis, Functional Principal Components Analysis (FPCA) provides a mechanism to reduce the dimension of the data at the cost of a controlled loss of information. Since data in FDA are infinite-dimensional, some care must be given to the meaning of dimension reduction. Indeed, what we look for is a representation of the functions like the one in (3) with a relatively low number of basis functions that are now dependent on the data. Moreover, if we also demand that the basis functions form an orthonormal system, then the solution is given by the elements of the eigendecomposition of the associated covariance operator (i.e., the functional equivalent of the covariance matrix) [19].
However, these elements are themselves functions and so of infinite dimension. The solution is to represent them in a functional basis system (for instance, the one presented in the preceding section). Thus, the initial curve $x_i$ can be approximated in the basis of eigenfunctions $(\phi_1, \ldots, \phi_p)$:
$$x_i(t) \approx \sum_{j=1}^{p} c_{ij} \phi_j(t), \qquad (4)$$
where the number $p$ of eigenfunctions, expected to be relatively small, is chosen according to the approximation error of the curves. Since the representation system may be non-orthogonal, it can be shown that the inner product needed in FPCA is connected to the properties of the representation basis system. The notion of dimension reduction can then be understood by comparing the low number of eigenfunctions with the number of basis functions needed to represent an observation. FPCA reduces the dimension of the representation, which yields a dramatic drop in the computational time of the model we describe next.
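The following sketch is one possible implementation of this reduction (the function name fpca_scores is ours, and SciPy ≥ 1.8 is assumed for BSpline.design_matrix): it computes FPCA scores from the spline coefficients of the curves while accounting for the non-orthogonality of the B-spline basis through its Gram matrix.

```python
# Sketch of FPCA on basis coefficients when the basis is not orthonormal.
# Theta (n x K) holds the spline coefficients of the n observed curves;
# the Gram matrix W of the basis is approximated by quadrature on a grid.
import numpy as np
from scipy.interpolate import BSpline

def fpca_scores(Theta, knots, k=3, p=4, grid_size=1000):
    tt = np.linspace(knots[0], knots[-1], grid_size)
    B = BSpline.design_matrix(tt, knots, k).toarray()     # SciPy >= 1.8
    W = B.T @ B * (tt[1] - tt[0])                         # Gram matrix approx.
    Theta_c = Theta - Theta.mean(axis=0)                  # centre the curves
    L = np.linalg.cholesky(W)                             # W = L L^T
    M = Theta_c @ L
    # Eigenproblem of the covariance operator reduces to the SVD of M / sqrt(n).
    _, s, Vt = np.linalg.svd(M / np.sqrt(len(Theta)), full_matrices=False)
    coefs = np.linalg.solve(L.T, Vt[:p].T)                # eigenfunction coefs (K x p)
    scores = Theta_c @ W @ coefs                          # n x p FPCA scores
    return scores, coefs, s[:p] ** 2                      # scores, basis, variances
```

With a handy number $p$ of components, the $n \times p$ score matrix then replaces the $n \times K$ coefficient matrix in all downstream computations.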
2.4. State Space Model
State Space Models (SSM) are a powerful tool to describe the dynamic behavior of time-evolving processes. The shape of the load curve may present long-term changes that induce nonstationary patterns in the signal. Taking these changes into account is one of the challenges of electricity demand forecasting.
The linear SSM [10] includes two terms: an inertial term in the form of an intrinsic state of the whole system being modeled, and an observed output that is a function of the state, some covariates and a noise structure. The state evolution over time is modeled as a linear equation involving the previous state, with the observed variables summarized in a vector $x_i$. The general formulation is given by:
$$y_i = Z_i \alpha_i + \epsilon_i, \qquad \alpha_{i+1} = T_i \alpha_i + \eta_i, \qquad (5)$$
where $y_i$ is the target variable observed at time $i$, the state at time $i$ is represented as $\alpha_i$, $Z_i$ and $T_i$ are known matrices (the observation matrix $Z_i$ typically being built from the vector of predictors $x_i$), and $\epsilon_i$ and $\eta_i$ are the noise and disturbance processes, usually assumed to be independent Gaussian with zero mean and respective covariance matrices $H_i$ and $Q_i$, which usually contain unknown parameters.
The evolution of the states is useful to understand the system. Using the celebrated Kalman filter and smoother, one is able to extract information about the underlying state vector [10]. The one-step-ahead state prediction and the prediction error are, respectively,
$$a_{i+1} = \mathbb{E}[\alpha_{i+1} \mid y_i, \ldots, y_1] \quad \text{and} \quad v_i = y_i - Z_i a_i.$$
In addition, their associated covariance matrices are of interest, so let us define $P_{i+1} = \operatorname{Var}(\alpha_{i+1} \mid y_i, \ldots, y_1)$ and $F_i = \operatorname{Var}(v_i)$. Since these quantities are defined by recursion, an important step is the initialization. When the observations are unidimensional, an exact method can be used starting from an uninformative diffuse prior. However, the method may fail with multivariate observations because the diffuse phase can yield a non-invertible matrix $F_i$. Moreover, even when $F_i$ is invertible, computations become slow due to its dimensionality. It is, however, possible to obtain a univariate alternative representation of (5) that theoretically reduces the computational cost of the Kalman filter and allows one to use the diffuse initialization.
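For reference, here is a compact sketch of the corresponding recursion (the standard Kalman filter in prediction form, written in the notation of Equation (5); the diffuse initialization discussed above is left aside).

```python
# One step of the Kalman recursion for the linear Gaussian SSM (5):
# given the state prediction a_i (mean) and P_i (covariance), compute the
# innovation v_i, its covariance F_i, and the next prediction a_{i+1}, P_{i+1}.
import numpy as np

def kalman_step(a, P, y, Z, T, H, Q):
    v = y - Z @ a                          # innovation v_i
    F = Z @ P @ Z.T + H                    # innovation covariance F_i
    K = T @ P @ Z.T @ np.linalg.inv(F)     # Kalman gain
    a_next = T @ a + K @ v                 # one-step-ahead state prediction
    P_next = T @ P @ (T - K @ Z).T + Q     # predicted state covariance
    return a_next, P_next, v, F
```

The inversion of $F_i$ is precisely the step that becomes costly (or fails) with multivariate observations, which motivates the univariate treatment mentioned above.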
Inference on SSM can be performed by maximum likelihood (see [20]) and, under the diffuse initialization, the stability of the model is described in [21].
2.5. A Functional State Space Model
Continuous-time approaches to SSM also exist. For instance, Ref. [10] presents the simple mean level model, where the random walk inherent to the state equation is replaced by a Brownian motion that drives the time-varying mean level. Early connections between FDA and SSM yielded derivations of SSM with the help of FDA. For example, Ref. [22] uses spline interpolation to approximate the behavior of a time-dependent system that is described by a state space model. Our choice is to keep time discrete while allowing the observations to be functions or curves. A similar idea is behind the model in [14], where functions are used to represent the observations of an SSM, but only dependence between states is considered.
Let us consider the vector $y_i \in \mathbb{R}^p$ of FPCA scores resulting from the projection of $X_i$, the load curve for day $i$, onto the basis of eigenfunctions. Then, we may represent an autoregressive system by replacing the covariate by the past load curve, or more precisely by its vector of spline coefficients $\theta_{i-1}$. We propose the following Functional State Space Model (FSSM):
$$y_i = Z_i \alpha_i + \epsilon_i, \qquad \alpha_{i+1} = \alpha_i + \eta_i, \qquad (6)$$
where the observation matrix $Z_i$ is constructed from the spline coefficients $\theta_{i-1}$ of the previous day's curve. As before, the disturbance terms $\epsilon_i$ and $\eta_i$ follow independent Gaussian distributions with zero mean vector and generally unknown covariance matrices $H$ and $Q$. The sizes of these matrices are a function of $p$, the number of FPCA scores retained in the approximation step discussed above.
In order to keep the computation time under control while keeping some flexibility in the modeling, we focus on three structural forms for the matrices $H$ and $Q$: full, diagonal and null, which yields six possible models. Table 2 summarizes the variants as well as the number of parameters to be estimated in the covariance matrices. The complexity of the variants grows from 1 to 6. When $Q$ is null, the state equation establishes that the states are simply constant in time. A diagonal structure on $H$ or $Q$ assumes that all the correlations are null, so only variance terms are to be estimated. Conversely, full structures allow for maximum flexibility, letting all the covariances be free. However, the important drawback of dimensionality becomes crucial, since the number of terms to be estimated is then of order $p^2$.
The FSSM we propose is an SSM on the FPCA scores. Another choice could have been to apply the SSM directly to the spline basis coefficients $\theta_i$, but such a choice would be computationally too expensive. It is illustrative to link these dimensions to the problem of electricity demand forecasting. Recall that the number of daily records in our load curves is 48 (half-hourly sampling), which is prohibitive within our framework. Even if this number can easily be halved using spline approximation, the number of coefficients would still be too high. Moreover, since the spline coefficients cannot be considered independent, one would need to use full structures on the covariance matrix $H$ and possibly on $Q$. The choice we make of reducing the dimension through FPCA is then justified since, with a handy number of eigenfunctions, say less than 10, most of the variants discussed above can be easily computed.
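As a quick check of these orders of magnitude, the following lines count the free covariance parameters of each variant; the pairing of $H \in \{\text{diagonal}, \text{full}\}$ with $Q \in \{\text{null}, \text{diagonal}, \text{full}\}$ is our reading of the six variants of Table 2, consistent with variant 1 as used later.

```python
# Free covariance parameters per structural form: a full symmetric p x p
# matrix has p(p+1)/2 free terms, a diagonal one has p, a null one has 0.
p = 10  # number of retained FPCA scores, as in the application below
count = {"full": p * (p + 1) // 2, "diagonal": p, "null": 0}
for h in ("diagonal", "full"):
    for q in ("null", "diagonal", "full"):
        print(f"H {h:8s}  Q {q:8s} -> {count[h] + count[q]:3d} parameters")
```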
The whole prediction procedure is schematically represented as a flowchart in
Figure 2.
4. Experiments on Real Electrical Demand Data
We centre the experiments of this section around the French supplier Enercoop (enercoop.fr), one of the new agents that appeared with the recent liberalization of the French electricity market. The electricity supplied by Enercoop comes from green renewable plants owned by local producers all over France. The utility has several kinds of customers, such as households, industries as well as small and medium-sized enterprises (SMEs), with different profiles of electricity consumption. Households, for example, use electricity mainly for lighting, heating and, sometimes, cooling and cooking. The main challenge for Enercoop is to balance electricity demand and production, and so anticipating load needs is crucial for the company. We work with the aggregated electricity data, that is, the simple sum of all the individual demands.
We first introduce the data, paying special attention to those salient features that are important for the prediction task. Then, we introduce a simple benchmark prediction technique based on persistence, which we compare to the naive application of our prediction procedure. Next, in Section 5, we study some simple variants constructed to cope with the existence of nonstationary patterns.
4.1. Description of the Dataset
Let us use some graphical summaries to comment on the features of these data. Naturally, we adopt the perspective of time series analysis to describe the demand series. Figure 4 represents the dataset, which consists of half-hourly records over six years, from 1 January 2012 to 31 December 2017. Vertical bars delimit the years, which are shown at the top of the plot. Each record represents the load demand measured in kWh. First, we observe a clear growing long-term trend, together with a higher variability of the signal at the end of the period. Second, an annual cycle is present, with lower demand levels during the summer and higher ones during the winter. Both the industrial production calendar and the weather impact the demand, which explains this cyclical pattern.
Figure 5 shows the profile of daily electricity consumption for a period of three weeks (from 31 October 2016 to 20 November 2016). It unveils two further cycles present in the data: a weekly one and a daily one. The former is the consequence of how human activity is structured around the calendar: demand during working days is larger than during weekend days. The latter is also mainly guided by human activity, with lower demand levels during the nighttime, the usual plateau during the afternoon and peaks during the evening. A certain similarity can be detected among days. Indeed, even if the profile of Fridays is slightly different, the working days share a similar load shape; similarly, the weekend days form a homogeneous group. Bank holidays and extra days off may also impact the demand shape. For instance, in Figure 5, the second segment, which corresponds to 1 November, shows the electrical demand on a bank holiday. Even though it falls on a Tuesday, the load shapes of this day and the preceding one (usually taken as an extra day off) are closer to those of weekend days.
We may also inspect the shape of the load curve across the annual cycle. A simple way to do this is to inspect the mean profile of different months. Figure 6 describes four monthly load profiles, where each month belongs to a different season of the year. Some differences are easy to notice; for instance, the peak demand occurs during the afternoon in autumn and winter but at midday in spring and summer. Others are more subtle, like the effect of the daylight saving time clock change, which horizontally shifts the whole demand. Transitions between the cold and warm seasons are particularly sensitive for the forecasting task, especially when the change is abrupt.
4.2. Spline and FPCA Representation
As before, we report on the reconstruction error resulting from the spline and FPCA representations. For comparison purposes, we compute the error criteria for five choices of the number of splines (12, 24, 40, 45 and 47 splines) on the reconstruction of the observed curves. Table 6 shows the MAPE and RMSE between the reconstruction and the real load data. As expected, the lower the number of splines, the higher the reconstruction error. This shows that spline interpolation alone is not pertinent for our approach, because a relatively large number of spline coefficients is needed. The extreme case of 12 splines, which would make the computing times reasonable, produces a MAPE of 1.310%, large enough to hamper the performance of any forecasting method based on this reconstruction. At the other extreme, using 47 cubic splines to represent the 48-point discrete signal of the daily demand produces boundary effects that dominate the variability of the curves.
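For completeness, the two criteria can be computed as follows; these are the usual definitions, which we assume here since the paper does not spell out the formulas.

```python
# MAPE (in percent) and RMSE between observed and reconstructed (or
# predicted) half-hourly loads, in their usual definitions.
import numpy as np

def mape(y, y_hat):
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))
```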
Since the spline approximation is connected to the FPCA in our approach, we check the reconstruction quality for all the combinations of the selected numbers of splines and a number of principal components between 2 and 10. Table 7 shows the MAPE and RMSE of the reconstructions obtained by each of the possible combinations. We target a maximum accepted reconstruction MAPE of 1%. Then, there are just a few options, most of them with very close MAPE values. In what follows, we choose to work with 45 cubic splines and 10 principal components.
4.3. Forecasting Results
Forecasting is done on a rolling basis over one year of data. Load data from 1 January 2012 to 30 October 2016 are used as the training dataset and, as the testing dataset, we choose the period from 31 October 2016 to 30 October 2017. Predictions are obtained at a one-segment-ahead horizon. This means that we are actually making predictions for the next 48 time steps (1 day) if we adopt a traditional time series point of view. Once a prediction is obtained, we compare it with the actual data and incorporate the observation into the training dataset. Thanks to the recursion in the SSM, only an update step is necessary here.
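Schematically, the rolling evaluation looks as follows; a trivial previous-day rule stands in for the forecaster, and in the real pipeline the feedback step is a Kalman update rather than a plain append.

```python
# Schematic rolling-origin evaluation: predict one segment ahead, score
# it, then feed the observed day back into the history. Toy curves are
# used here; the stand-in forecast is the previous day's curve.
import numpy as np

rng = np.random.default_rng(1)
curves = rng.normal(size=(400, 48)) + 10.0   # toy daily curves, not real data
train_end = 365                              # days used for initial training

history = list(curves[:train_end])
errors = []
for i in range(train_end, len(curves)):
    y_hat = history[-1]                      # stand-in one-day-ahead forecast
    y_obs = curves[i]
    errors.append(100 * np.mean(np.abs((y_obs - y_hat) / y_obs)))  # MAPE
    history.append(y_obs)                    # roll the origin forward
print(f"mean MAPE over the test period: {np.mean(errors):.2f}%")
```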
To give a comparison point to our methodology, we propose using a simple but powerful benchmark based on persistence forecasting.
4.3.1. Persistence-Based Forecasting
A persistence-based forecasting method for a functional time series amounts to anticipating $X_{n+1}$ with the simple predictor $\tilde{X}_{n+1} = X_n$. This predictor can be connected to the ARH model (Equation (2)) with the linear operator set to the identity, $\rho = \mathrm{Id}$. However, this approach is not convenient, since there are two groups of load profiles in the electricity demand, given by working days and the other days (e.g., weekends or holidays). We therefore use a smarter version of the persistence model, which takes into account the existence of these two groups. The predictor is then written
$$\tilde{X}_{n+1} = \begin{cases} X_n & \text{if day } n+1 \text{ is a Tuesday, Wednesday, Thursday or Friday,} \\ X_{n-6} & \text{if day } n+1 \text{ is a Saturday, Sunday or Monday.} \end{cases}$$
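A sketch of this day-type-aware predictor follows; the grouping of days reproduces the description in the text, while the special handling of bank holidays (discussed next) is left aside.

```python
# Day-type-aware persistence: Tuesdays to Fridays are predicted by the
# previous day; Saturdays, Sundays and Mondays by the same weekday one
# week earlier. Bank holidays are not treated specially here.
def persistence_forecast(curves, weekdays, i):
    """Forecast curve i + 1 from curves[: i + 1].
    weekdays[j] is the day of week of curve j (0 = Monday .. 6 = Sunday)."""
    if weekdays[i + 1] in (5, 6, 0):   # Saturday, Sunday or Monday
        return curves[i - 6]           # same weekday, one week earlier
    return curves[i]                   # previous day
```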
Table 8 summarizes the MAPE on prediction by type of day for the persistence-based forecasting method. The global MAPE errors are several times the reconstruction error observed above. There is a clear distinction between the days predicted by the previous day and the other ones (i.e., Saturdays, Sundays and Mondays): the lack of recent information for the latter group is a severe drawback and impacts its prediction performance. The increased difficulty of predicting bank holidays translates into the highest error levels.
4.3.2. FSSM Forecasting
We now present the results for the proposed FSSM. Only variant 1 in Table 2 is used, namely we consider the covariance matrix of the observations $H$ as diagonal and the covariance matrix of the states $Q$ as null. The reason is twofold. First, we showed on simulations that basic models give results as satisfactory as the more involved ones. Second, computing time must be kept within reasonable standards for the practical application.
Table 9 and Table 10 show the MAPE on prediction by day and by month, respectively, for the FSSM approach. In comparison with the persistence-based forecasting, the global error is sensibly reduced, with improvements on almost all day types. In addition, improvements are observed in all the months of the year except August (results for persistence-based forecasting are not shown). If we look at the distribution of the MAPE, we see that its range is compressed, with a lower maximum error but also a higher minimum error. This last effect is the price to pay for having an approximate representation of the function: the MAPE on approximation can be seen as a lower bound for the MAPE on forecasting, and the higher this bound, the more limited the approach. Despite this negative effect, the gain on the largest errors observed before more than compensates for the increment in the minimum MAPE and yields a globally better alternative. Among working days, Mondays present the highest MAPE. The FSSM being an autoregressive approach, it builds on the previous days, which present a different demand structure; moreover, the mean load levels of these two consecutive days are sharply different. Undoubtedly, incorporating the calendar information would help the model to better anticipate this kind of transition. Both mean load level corrections and the calendar structure are modifications or extensions that can be naturally incorporated into our FSSM. We discuss some clues for doing this in the next section.
7. Discussion and Conclusions
In a competitive environment, electrical companies need to anticipate load demand from data that present nonstationary patterns induced by the arrival and departure of customers. Forecasting in this context is a challenge, since one wishes to use as much past data as possible but needs to restrict the usable data to the records that describe the current situation. In this trade-off, adaptive methods have a role to play. Figure 7 witnesses the ability of the FSSM to adapt, to a certain extent, to the changing environment. The impact of an external event on the electrical demand translates into a larger variability of the error, and so into an inflation of the trace of the error variance matrix (cf. the end of 2016).
The forecasting process for Enercoop's load demand in this paper is mainly endogenous; only some calendar information is used as an exogenous variable. However, in electricity forecasting, it is well known that weather has a great influence on the load curve. For instance, temperature impacts the load through the use of cooling systems in the hot season and electrical heating during cold seasons. In France, this dependence is known to be asymmetrical, with a higher influence of temperature on cold days. The nature of this covariate is different from the ones we used: while calendars can be predicted deterministically, this is not the case for the temperature. Using forecasted weather in an electrical demand forecast transfers the potential bias and the uncertainty of the weather forecasting system to the electrical demand prediction. The integration of weather information into our model to forecast Enercoop's load demand data, possibly changing the structure of the observation matrix, and the construction of prediction intervals are perspectives for future work.
For France’s grid load demand, we have integrated fixed and weather variables that enable better forecasting than in the case of Enercoop. It is possible to improve the accuracy of our model on this data if there is more time to have more information on all variables that can be introduced in the forecasting process.
In addition, only point predictions are obtained. In a probabilistic framework, one would like to have not only an anticipation of the mean level of the load, but also some elements of the predictive distribution of the forecast. Whether it is a whole distribution, some quantile levels or a predictive interval, this information is not trivially obtained from our approach. While SSM do provide intervals through the Gaussian assumptions coupled with the estimation of the variance matrices, the FSSM carries this information on the coefficients of the functional representation; transporting it to the whole curve still needs to be studied.
Besides these technical considerations, a natural question is how this model can be used in other forecasting contexts. While this work focuses on short-term prediction horizons, there is no mathematical constraint against using the model in other time frames. For instance, for a long-term horizon, one would naturally coarsen the sampling resolution to, say, a monthly basis, in which case the functional segments could represent the annual cycle. However, as is well known in practice, predicting long-term patterns without relevant exogenous information is not the best option. Another case to consider is the presence of periodic and seasonal patterns. The electricity demand carries very clear cycles (e.g., yearly, weekly, daily) that are exploited by our model in two ways: the smallest one is embedded within the functional scheme, so all the nonstationary patterns inside it are naturally taken into account by the model, while the largest ones are explicitly modelled as exogenous variables. A last case worth discussing is the one where the time series is sampled on an irregular basis. Although in our examples we only treated regular sampling grids, the functional data analysis framework copes with irregular sampling in a natural way. Indeed, the data of each segment (in our case, each day) are used to fit the corresponding curve (e.g., the daily load curve) and thus to estimate the spline coefficients; the rest of the forecasting procedure remains unchanged. Of course, the proposed model could be used to forecast other kinds of phenomena; the reliability and predictability would depend on the nature of those signals.
To sum up, the presented model has enough flexibility to be used in the anticipation of energy demand in the electricity industry, providing credible predictions of energy supply and improving the efficient use of energy resources. This may help to handle theoretical and practical electricity industry applications for the further development of energy sources, strategic policy-making, energy mix plans and adoption patterns [25].
References