This section presents the modelling and forecasting results obtained using the OGPR and OSGPR models included in the comparative study, along with the evaluation metrics used. The methods of initialising and estimating the hyperparameters are also explained.
5.1. Hyperparameter Initialisation and Estimation
To model GHI using Gaussian process regression, we first have to identify the most suitable kernel. In this comparative study, kernel identification is inspired by the procedure described in [
33]. The main idea is to construct a wide variety of kernel structures by adding or multiplying simple kernels from the kernel families discussed in
Section 2.2. While all the possible combinations of simple kernels have been evaluated, only the following kernels and combinations have been included in the comparative study.
Simple kernels: , , , , , and .
Quasiperiodic kernels, formulated as
- products of simple kernels—i.e., , , , , and ;
- sums of simple kernels—i.e., , , , , and .
Results obtained with other combinations of non-periodic kernels are not presented here because they exhibit a behaviour similar to that of the simple kernels.
The maximisation of the log marginal likelihood (
21) allows the estimation of the kernel hyperparameters from the GHI training data. Convergence to the global maximum cannot be guaranteed, since the log marginal likelihood is non-convex with respect to the hyperparameters. Several techniques can be used to remedy this issue, the most classical of which is to use multiple starting points randomly selected from a specific prior distribution; for instance, . In [
48], the authors incorporate various prior types for the hyperparameters’ initial values, then examine the impact these priors have on the GPR models’ predictive ability. Results show that the hyperparameter estimates, when using , are insensitive to the prior distributions, as opposed to , whose hyperparameter estimates vary with the priors. This leads to the conclusion that prior distributions have a strong influence on the hyperparameter estimates in the case of a periodic kernel.
As mentioned in
Section 4, the two GHI datasets (i.e., the summer and winter datasets) are split into a training subset and a testing subset that cover periods of 30 days and 15 days, respectively. As for the initial values of the hyperparameters
, we have made the following choices.
The correlation length ℓ has been chosen to be equal to the training data’s standard deviation.
When a periodic kernel is involved in the regression process, an initial value of one day has been chosen for the period P in order to circumvent the issues that arise when estimating it.
The initial values of remaining hyperparameters, if any, are randomly drawn from a uniform distribution.
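The initialisation and estimation procedure described above can be sketched as follows. This is an illustrative implementation, not the paper's own code: the quasiperiodic kernel form (periodic times squared exponential), the noise level, and the optimiser settings are assumptions; the initial values follow the choices stated in the text (correlation length equal to the standard deviation of the training data, period equal to one day, remaining hyperparameters drawn from a uniform distribution).

```python
import numpy as np
from scipy.optimize import minimize

def quasiperiodic_kernel(t1, t2, ell, period, ell_p, sigma2):
    # Hypothetical quasiperiodic kernel: periodic kernel multiplied by a
    # squared-exponential kernel (one of the combinations studied here).
    d = t1[:, None] - t2[None, :]
    per = np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ell_p ** 2)
    se = np.exp(-0.5 * d ** 2 / ell ** 2)
    return sigma2 * per * se

def neg_log_marginal_likelihood(theta, t, y, noise=1e-2):
    # Hyperparameters are optimised in log space to enforce positivity.
    ell, period, ell_p, sigma2 = np.exp(theta)
    K = quasiperiodic_kernel(t, t, ell, period, ell_p, sigma2) + noise * np.eye(len(t))
    try:
        L = np.linalg.cholesky(K)
    except np.linalg.LinAlgError:
        return 1e10  # penalise numerically unstable hyperparameter values
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(t) * np.log(2 * np.pi)

def fit(t, y, n_restarts=5, rng=np.random.default_rng(0)):
    # Multiple random restarts, since the log marginal likelihood is non-convex.
    best = None
    for _ in range(n_restarts):
        # Initial values as in the text: ell = std of the training data,
        # P = 1 day; remaining hyperparameters drawn from a uniform distribution.
        theta0 = np.log([np.std(y), 1.0, rng.uniform(0.1, 2.0), rng.uniform(0.1, 2.0)])
        res = minimize(neg_log_marginal_likelihood, theta0, args=(t, y),
                       method="L-BFGS-B")
        if best is None or res.fun < best.fun:
            best = res
    return np.exp(best.x)
```

Time is assumed to be expressed in days, so that the initial period of 1.0 corresponds to the daily GHI cycle.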
5.3. Forecasting Results Using OGPR
Forecasts of GHI at different time horizons are compared in order to assess the models’ overall performance. The persistence model is used as a reference. For all the OGPR models catalogued in
Section 5.1, the nRMSE vs. the forecast horizon is displayed in
Figure 1. Further numerical results for the summer dataset (
Table A2 and
Table A3) and the winter dataset (
Table A4 and
Table A5) are presented in
Appendix B. Broadly speaking, the models fall into three performance classes: the persistence model performs worst; OGPR models based on simple kernels perform better; and OGPR models based on quasiperiodic kernels perform considerably better still, particularly at higher forecast horizons.
For both datasets, even at the lowest forecast horizon (30 min), OGPR models based on simple kernels give forecasts comparable to those of the persistence model ( in summer and in winter), while OGPR models based on quasiperiodic kernels already give better forecasts ( in summer and in winter). As the forecast horizon increases, the persistence model’s performance degrades more rapidly than that of the OGPR models. At the highest forecast horizon (5 h), the persistence model gives in summer and in winter; OGPR models based on simple kernels give in summer and in winter; and OGPR models based on quasiperiodic kernels give in summer and in winter.
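For reference, the persistence baseline and the nRMSE metric used in the comparison can be sketched as follows. The normalisation by the mean of the observations is an assumption made here for illustration; other normalisations (e.g., by the range of the data) are also common.

```python
import numpy as np

def persistence_forecast(y, horizon_steps):
    # Persistence model: the forecast at time t + h is simply the last
    # available observation at time t (a "no-change" reference model).
    return y[:-horizon_steps]

def nrmse(y_true, y_pred):
    # RMSE normalised by the mean of the observations (assumed normalisation).
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.mean(y_true)

# Example: a synthetic bell-shaped daily GHI profile; the persistence error
# grows as the forecast horizon increases.
ghi = 400.0 * np.clip(np.sin(np.linspace(0.0, 4.0 * np.pi, 200)), 0.0, None) + 50.0
errors = {h: nrmse(ghi[h:], persistence_forecast(ghi, h)) for h in (1, 10, 50)}
```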
Regarding OGPR models based on simple kernels, no single best-performing model has been found. Depending on the forecast horizon,
,
,
and
all alternatively give the best forecasts, while
does not manage to perform competitively. An interesting observation is that
—this kernel is often used to forecast GHI; see [
21,
23]—does not ensure the best results among the simple kernels.
Because OGPR models based on a periodic kernel produce a periodic signal, such models can be regarded as clear-sky GHI models whose parameters have been fitted to the data. As a consequence, these models recurrently reproduce the same bell-shaped curve and yield practically the same nRMSE as the forecast horizon increases. Although these OGPR models can produce good forecasts on clear-sky days, they are unable to predict the variability in GHI data induced by atmospheric disturbances.
However, OGPR models based on quasiperiodic kernels—these kernels combine a periodic kernel with a non-periodic kernel—possess the advantage of the periodic kernel while still managing to predict rapid changes in GHI during the day. Among quasiperiodic kernels,
surpasses other kernels for the summer dataset, with
,
and
all coming in a close second (see
Table A2); for the winter dataset, however, there is no clear best-performing kernel, as those four kernels alternatively take the first place (see
Table A4).
An in-depth analysis of the temporal evolution of GHI during the models’ training and testing phases sheds more light on their performance. Once more, the persistence model serves as a reference. Three OGPR models are selected: the classic
-based OGPR model and two of the best-performing models based on quasiperiodic kernels; i.e., the
-based OGPR model and the
-based OGPR model. Here, a dataset of nine days (selected from the summer dataset) is used, split into a seven-day training subset and a two-day testing subset. In
Figure 2,
Figure 3 and
Figure 4, 30 min, 4 h, and 48 h forecasts are shown.
Recall that, while every data sample is used during training, during testing a new observation is added to the observation set only at whole multiples of the forecast horizon. This means that, first, for all three figures, the training results are identical; second, the coefficients
in Equation (
16) are updated every 30 min in
Figure 2 and every 4 h in
Figure 3, while for the 48-h forecast horizon, no update occurs.
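The online update scheme described above can be sketched as follows. This is a simplified illustration rather than the paper's implementation: it assumes a zero prior mean, fixed hyperparameters, a hypothetical squared-exponential kernel, and the appending of only the latest test observation at each update instant; the coefficient vector of Equation (16) is recomputed at every update.

```python
import numpy as np

def se_kernel(t1, t2, ell=1.0):
    # Hypothetical squared-exponential kernel, for illustration only.
    return np.exp(-0.5 * (t1[:, None] - t2[None, :]) ** 2 / ell ** 2)

def online_gpr_forecast(t_train, y_train, t_test, y_test, horizon_steps,
                        kernel=se_kernel, noise=1e-2):
    # During testing, the observation set is refreshed only at whole
    # multiples of the forecast horizon; the GPR coefficients (Eq. (16))
    # are recomputed at each refresh.
    t_obs = list(t_train)
    y_obs = list(y_train)
    alpha = None
    preds = []
    for i, ti in enumerate(t_test):
        if i % horizon_steps == 0:
            if i > 0:
                # Append the most recent test observation before updating.
                t_obs.append(t_test[i - 1])
                y_obs.append(y_test[i - 1])
            K = kernel(np.asarray(t_obs), np.asarray(t_obs)) + noise * np.eye(len(t_obs))
            alpha = np.linalg.solve(K, np.asarray(y_obs))
        k_star = kernel(np.asarray([ti]), np.asarray(t_obs))[0]
        preds.append(k_star @ alpha)
    return np.array(preds)
```

With `horizon_steps` equal to the length of the test set, no update ever occurs, which corresponds to the 48-h case discussed above.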
An inspection of the seven-day training phase reveals that the data are well-fitted by every OGPR model. Signals generated by both and are quite smooth and show few differences, as opposed to , whose capability for generating more irregular signals allows it to follow the temporal evolution of GHI more closely in the case of atmospheric disturbances.
A study of the two-day testing phase reveals the following: all models perform very well when a new observation is added to the observation set every 30 min (
Figure 2), although OGPR models based on quasiperiodic kernels, especially
, perform slightly better. The performance gap becomes more apparent as the forecast horizon increases. Thus, when a new observation is added to the observation set every 4 h, the
-based OGPR model struggles to predict changes in GHI accurately, as conveyed by the substantial confidence interval between observations (
Figure 3a). As soon as a new observation is made, the model fits to the data, but in the absence of an update, it converges to the constant mean value learned during training (around 280 W/m²). In
Figure 4a, this behaviour is more obvious: no update is made throughout the entire two-day testing period, and without additional information, the OGPR model simply makes an “educated guess”, consisting of this constant mean value associated with a large confidence interval. Quasiperiodic kernels, however, do possess additional information on GHI, showing that it has a daily pattern. As with OGPR models based on simple kernels, when a new observation is added to the observation set, they fit to the data (see
Figure 3b,d).
In the opposite case, OGPR models based on quasiperiodic kernels reproduce the periodic pattern learned during training (see
Figure 4b,d), giving a result that is distinctly more faithful to the desired behaviour than a constant mean value. This explains why the performance of OGPR models based on quasiperiodic kernels degrades more slowly when the forecast horizon increases, as seen in
Figure 1. Based on the results shown in
Figure 3b,d,
is the best choice among quasiperiodic kernels as it permits sharper and more drastic changes in the modelled signal.
The nRMSE gain relative to the persistence model is presented in
Table 1, for OGPR models based on the classical
and the three best-performing quasiperiodic kernels (winter dataset).
5.4. Forecasting Results Using OSGPR
This section assesses the impact of training data sparsity on forecasting accuracy (i.e., generalisation). The subset-of-data technique is used; it simply consists of ignoring part of the data available for training. Lowering the quantity of data used to train the models reduces the computation time during both training and testing, since the number of parameters is reduced (see Equation (
16)). In addition, this allows us to evaluate the models’ ability to handle missing data in the training sequence (i.e., their generalisation ability in the case of missing data in that sequence).
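The subset-of-data technique admits a very short sketch, assuming a uniform random selection of the retained training points (the selection scheme is not specified further in the text):

```python
import numpy as np

def subset_of_data(t, y, sparsity, rng=np.random.default_rng(0)):
    # Randomly discard a fraction `sparsity` of the training points;
    # e.g., sparsity=0.8 keeps 20% of the available data.
    n_keep = int(round((1.0 - sparsity) * len(t)))
    idx = np.sort(rng.choice(len(t), size=n_keep, replace=False))
    return t[idx], y[idx]
```

Since GPR training cost grows quickly with the number of observations (cubically, for an exact implementation), discarding points reduces both the training and the prediction cost, at the price of possible information loss.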
Thirty-minute, 4 h,
and 48 h
forecasts given by both the
-based OSGPR model and the
-based OSGPR model are displayed in
Figure 5. The dataset comprises the same nine days of GHI measurements that were considered with the OGPR models (see
Section 5.3). Here, however, only 20% of the available data—randomly selected from the summer dataset—have been used during training.
With this low number of training points, the OSGPR model based on does not provide acceptable results. Compare, for example, the third and fourth days of training: when given sufficient data, a good fit is obtained, but if the only training points are at the beginning and the end of the day (as during the fourth day), the OSGPR model based on simply makes the best inference it can, which, in this case, is not sufficient. In contrast, the OSGPR model based on still manages to give an acceptable fit. The key point is that even a low number of training points is enough for models based on quasiperiodic kernels to learn the daily periodic behaviour of GHI.
As expected, the forecast results are good when considering the 30 min
horizon, with an advantage shown for the
-based OSGPR model. However, as with OGPR models, the superiority of the quasiperiodic kernel becomes apparent again as the forecast horizon increases. It should be noted that, with a small amount of training data, the periodic behaviour is not learned as well as before: compare
Figure 5f, where the two testing days are not bell-shaped but rather resemble the last four training days, to
Figure 4d. Nonetheless, this example demonstrates the usefulness of Gaussian process regression models, even in the case of (severely) missing data.
Figure 6 and
Figure 7 show nRMSE vs. training data sparsity for the
-based OSGPR model and the
-based OSGPR model, respectively.
Figure 8 shows the computation time vs. training data sparsity for the latter model. To avoid drawing conclusions from a particular realisation (keep in mind that the training data are randomly selected), 100 Monte Carlo runs have been conducted; hence the use of box plots (Tukey box plots displaying the median, the 0.25- and 0.75-quartiles, and whiskers corresponding to ±1.5 times the interquartile range) in
Figure 6,
Figure 7 and
Figure 8. As can be seen, while the training nRMSE decreases steadily with data sparsity, the testing nRMSE quickly settles at a constant level, and its dispersion also quickly vanishes; in this study, this holds up to a sparsity level of around 70%. As a consequence, using only 30% of the available training data appears to be sufficient to achieve the same nRMSE results as those obtained using the whole training dataset. The gain in computation time is significant: the median computation time falls from 175 s at a sparsity level of 0% to 17 s at a sparsity level of 70%, i.e., around a 90% reduction.
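For completeness, the quantities displayed by the Tukey box plots can be computed as follows; note that, in practice, the whiskers are usually drawn at the most extreme data points lying within the ±1.5 IQR limits rather than at the limits themselves.

```python
import numpy as np

def tukey_box_stats(samples):
    # Statistics underlying a Tukey box plot: median, 0.25- and 0.75-quartiles,
    # and whisker limits at 1.5 times the interquartile range beyond the quartiles.
    q1, med, q3 = np.percentile(samples, [25, 50, 75])
    iqr = q3 - q1
    return {"median": med, "q1": q1, "q3": q3,
            "whisker_low": q1 - 1.5 * iqr, "whisker_high": q3 + 1.5 * iqr}
```

Applied to the 100 nRMSE values of the Monte Carlo runs at each sparsity level, these statistics reproduce the box plots of Figures 6 to 8.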