1. Introduction
Accurate solar forecasting plays a pivotal role in effectively managing solar power resources and, consequently, in expanding the contribution of solar energy to the overall electricity supply. These forecasts offer valuable insights for making well-informed decisions regarding energy distribution, reducing the costs associated with solar variability, supply–demand balancing, and backup generation.
The choice of forecasting type and tools hinges on the unique needs and objectives of users, whether they are energy operators, traders, or policymakers in the renewable energy sector. There are three categories of solar irradiance forecasting that cater to varying timeframes and specific applications, from real-time energy management to long-term planning and resource optimization.
Day-ahead solar irradiance forecasting predicts solar irradiance levels 24 h in advance with hourly resolution [1,2]. This is typically achieved by leveraging numerical weather prediction (NWP) models, historical weather data, and machine learning algorithms. Recent cutting-edge post-processing techniques using model output statistics (MOS) have shown their effectiveness in producing exceptional results for this horizon [3,4]. This type of forecasting plays a crucial role in energy planning, scheduling, and market trading strategies. It enables utilities, grid operators, and solar power plant operators to anticipate solar energy availability, facilitating preparations for energy storage, grid balancing, and scheduling operations in alignment with these forecasts.
Intra-day solar irradiance forecasting is focused on short-term predictions within the current day, generally spanning from 1 to 6 h ahead. It makes use of various data sources, including satellite imagery, ground-based sensors, NWP models, and artificial intelligence (AI) algorithms. Intra-day forecasting is essential for real-time decision-making in solar power generation and for the energy trading market, allowing operators to optimize grid integration and make dynamic adjustments to PV system output in response to rapidly changing irradiance conditions [5,6].
Intra-hour solar irradiance forecasting is designed for ultra-short-term predictions within the next hour, with a high level of granularity, often down to 5–10 min intervals. It relies on tools like all-sky imagers (ASIs) and AI models, including neural networks and deep learning. Typically, this approach involves the utilization of either cloud motion vectors (CMVs) or cloud coverage data obtained through image processing of sky images. Subsequently, a deep learning model is employed to forecast cloud cover for the upcoming 5–10 min [7].
Intra-hour forecasting is of paramount importance for grid operators and energy market stakeholders, as it assists in managing immediate fluctuations in solar power generation. It ensures a balanced supply–demand equation, sustaining grid stability, and is particularly vital for addressing rapid ramp events in real-time energy management.
For intra-day forecasting, it is necessary to obtain extensive cloud field observations. Satellite data, with their wide coverage, serve as suitable data sources for these forecasting horizons [8]. Satellite-based solar forecasting relies on visible-channel geostationary satellite imagery for two primary reasons. The first is the distinct visibility of cloud cover in these images: clouds reflect more solar irradiance back into space than the ground, making them appear brighter in the visible channel. However, in regions characterized by high-albedo terrain, such as snowy areas or salt flats, specific measures are required, often entailing the use of infrared images [9].
The second reason pertains to the high temporal frequency of these images, with intervals typically ranging from 10 to 30 min, and their moderate spatial resolution, typically up to 1 km. This combination of temporal and spatial attributes allows for the tracking of cloud movement over time by estimating the cloud motion field (CMF) or CMVs. CMVs are subsequently used to project the motion of clouds into the near future, enabling the prediction of forthcoming images and, in essence, the future position of clouds. To derive predictions for ground-level solar irradiance, a satellite-based assessment model is applied to each predicted image within each specified time horizon [10,11]. Several methods for estimating CMVs have been reported in the literature; the following two approaches are the most extensively utilized for forecasting cloud motion.
- (i) Block-matching algorithm: This method entails the identification of cloud formations in sequential images to compute CMVs, under the assumption that cloud structures with specific spatial scales and pixel intensities exhibit stability within the relatively brief time intervals being examined. In practical implementation, matching areas in two consecutive images are located [12].
- (ii) Optical flow estimation (the Lucas–Kanade algorithm): Optical flow is a broad concept in image processing that pertains to the detection of object motion in consecutive images. Optical flow algorithms specifically detect motion by utilizing image gradients and can generate dense motion vector fields, providing one vector per pixel. The fundamental principle behind these algorithms is the assumption that changes in pixel brightness are solely attributable to the translational motion of elements within the observed scene [13]. A minimal sketch of this idea is given after this list.
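For illustration, the following hedged sketch uses OpenCV's Farnebäck dense optical flow (a gradient-based method in the same family as Lucas–Kanade) to derive a per-pixel CMV field from two consecutive satellite frames and advect the latest frame forward; the frozen-cloud advection step is an assumption for illustration, not the specific algorithm of any cited work.

```python
# A minimal sketch (not the exact method of refs. [12,13]): estimate a dense
# CMV field between two consecutive satellite frames with OpenCV's Farneback
# optical flow, then advect the latest frame to "predict" the next one.
import cv2
import numpy as np

def predict_next_frame(prev_img: np.ndarray, curr_img: np.ndarray) -> np.ndarray:
    """prev_img, curr_img: 2D uint8 arrays (visible-channel brightness)."""
    # Dense flow: one (dx, dy) vector per pixel between the two frames.
    # Positional args: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags.
    flow = cv2.calcOpticalFlowFarneback(prev_img, curr_img, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = curr_img.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Frozen-cloud assumption: the value arriving at (x, y) one step ahead is
    # sampled from (x - dx, y - dy) in the current frame.
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)
    return cv2.remap(curr_img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```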
In recent years, numerous forecasting methodologies have been introduced, showcasing precise prediction outcomes. However, comparing the proposed models or methodologies proves challenging due to the diverse locations and conditions associated with these distinct forecasts. Hybrid approaches are popular for intra-day forecasting: optical flow methods calculate CMVs from two subsequent satellite images and predict cloud displacement up to 3 h ahead, combined with artificial neural networks (ANNs). The initial phase of such a methodology comprises calculating CMVs for the infrared channel pixels. Subsequently, these CMVs are extrapolated to the visible channel pixels to derive the cloud index. ANN models are trained exclusively on the cloud index, the Sun-to-Earth position, and ground measurements of the global horizontal irradiance (GHI) at the present moment, with the aim of forecasting the GHI at future instants [14]. Some recent research explores alternative methodologies and chooses not to utilize the CMV-based approach; instead, the authors directly incorporate historical satellite-derived observations into their convolutional neural network (CNN) architecture, facilitating the forecasting of the GHI up to six hours in advance [15,16]. A published study showcases an evaluation and comparison of different techniques for satellite-derived cloud motion vectors [17].
The prediction of cloud behavior from satellite data is a challenging endeavor because clouds exhibit dynamic characteristics, constantly changing their position and shape, and even dissipating. These changes are influenced by the intricate dynamics of the atmosphere, which are not explicitly modeled in the forecasting method. Additionally, CMVs are represented as two-dimensional vector planes which ignore the vertical dimension. This implicit assumption implies that, within a local space, clouds are all at the same altitude, which limits the baseline techniques in capturing any phenomena related to vertical cloud motion, such as complex convection processes. To address these limitations, additional assumptions are introduced by the CMV estimation technique to facilitate motion estimation in the sequence of images.
Recent studies have been actively exploring a distinct domain, employing AI methodologies such as CNNs and deep learning to predict future cloud positions by analyzing their present states. Numerous scholarly articles have emerged in this field in recent times [18,19,20,21,22].
In this research, the derivation of the GHI from satellite-derived images is presented. Utilizing the pixel data from these satellite images, GHI forecasts up to 6 h ahead are made through hybrid AI models. These predictions are subsequently combined with ground measurement-based hybrid forecast models and GHI forecasts generated by the Global Forecast System (GFS). This paper provides two principal contributions: (1) the utilization of satellite imagery for accurate estimation of the GHI at any given location, and (2) empirical evidence demonstrating the enhanced forecasting accuracy achieved through hybrid ensemble techniques that integrate tree-based regression machine learning with neural network-based deep learning models.
The paper is divided into the following parts: Section 2 provides a comprehensive discussion of ground measurement data, satellite-derived data, and GFS data, along with relevant details concerning the three sites subject to evaluation. Section 3 focuses on the derivation of the GHI from satellite image pixels and presents a comparison of the applied clear sky models. Error metrics employed in the study are outlined in Section 4. Section 5 delves into the experimental setup, encompassing the forecasting model architecture, feature engineering, and an overview of various models. Section 6 is dedicated to the presentation of observations and the analysis of results. Finally, Section 7 encapsulates the paper with its key insights, conclusions, and scope of future work.
4. Performance Metrics
The three predominant metrics employed to assess the accuracy of point forecasts are the mean absolute error (MAE), the root mean square error (RMSE), and the mean bias error (MBE). In this paper, the aim is to minimize the RMSE, as a lower RMSE is better suited to applications such as grid operation and stability where high errors are critical [35], whereas the MAE is more suited to applications such as day-ahead markets, which have linear costs relative to the error [36].

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$

$$\mathrm{MBE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)$$

where $N$ represents the number of forecasts, $y_i$ represents the actual ground measurement value, and $\hat{y}_i$ denotes the forecast value.
The normalized MAE (nMAE) and RMSE (nRMSE) are computed by dividing the respective metrics by the mean value of the measured GHI ($\overline{\mathrm{GHI}}$) for the evaluated test period.
The coefficient of determination, denoted as the R² score, is a statistical measure commonly used to assess the goodness of fit between observed and predicted values. A higher R² score indicates a more favorable model fit.
The skill score serves as a critical parameter for assessing model performance relative to a reference baseline model, typically the smart persistence model in the context of solar forecasting. The skill score (SS) of a particular model for the forecast is defined by the following equation:

$$\mathrm{SS} = 1 - \frac{\mathrm{RMSE}_{\text{model}}}{\mathrm{RMSE}_{\text{reference}}}$$
Additionally, the Kolmogorov–Smirnov integral (KSI) score is used to measure the similarity between the distribution of the forecast data and the distribution of the ground measurement data. It is applied by comparing the cumulative distribution functions (CDFs) of the two datasets: the KSI score integrates the absolute deviation between the two CDFs over the range of the data.
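For concreteness, a minimal numpy sketch of these metrics follows; the function names are illustrative (not the paper's code), and the KSI is approximated on a fixed evaluation grid.

```python
# Illustrative numpy implementations of the point-forecast metrics;
# y = observations, yhat = forecasts, both 1D arrays.
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(yhat - y))

def rmse(y, yhat):
    return np.sqrt(np.mean((yhat - y) ** 2))

def mbe(y, yhat):
    return np.mean(yhat - y)

def skill_score(y, yhat, yhat_ref):
    """RMSE-based skill relative to a reference (e.g., smart persistence)."""
    return 1.0 - rmse(y, yhat) / rmse(y, yhat_ref)

def ksi(y, yhat, n_grid=200):
    """Integrated absolute difference between the two empirical CDFs."""
    grid = np.linspace(min(y.min(), yhat.min()),
                       max(y.max(), yhat.max()), n_grid)
    cdf_y = np.searchsorted(np.sort(y), grid, side="right") / y.size
    cdf_f = np.searchsorted(np.sort(yhat), grid, side="right") / yhat.size
    return np.trapz(np.abs(cdf_f - cdf_y), grid)
```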
5. Experimental Setup
In this study, ML and DL models were analyzed and, subsequently, a hybrid model was developed to allow for higher degrees of model complexity. The methodology employed in this study and the fundamental stages of the modeling and evaluation algorithm for assessing GHI forecasts are depicted in Figure 3.
The initial section on the left side of the flow chart represents the hybrid model (A), which solely relies on ground measurement data without incorporating any exogenous features. The middle segment beneath EUMETSAT demonstrates the computation of the GHI from satellite image pixels at the current time (t0), whereas the rightmost part illustrates forecasting of the GHI from satellite pixels (blue dashed lines). The hybrid ensemble model (B) is constructed using the output of Hybrid + SAT + GFS by applying a weighted average ensemble methodology.
The initial forecast for hourly solar irradiance within the day usually takes place early in the morning, typically between 5:00 and 7:00 local time. This forecast provides predictions for solar irradiance levels throughout the day, broken down into hourly intervals. The exact timing of the first forecast can vary depending on the weather forecasting service or organization providing the information. In view of this, the models are analyzed by issuing the first forecast at 6:00 hr, only during daylight hours, i.e., when the solar zenith angle is < 80°, for the following horizons:

t0 | τ | ∆
Every hour starting at 6:00 hr | 1, 2, …, 6 hr | 15 min

where t0 is the forecast issuing time, τ is the forecast horizon, and ∆ represents the data aggregation interval.
This study utilizes a walk-forward approach, as visualized in Figure 4, to improve and evaluate the performance of the models. The models are initially trained with the first three months of the available data; thereafter, they are trained on all data up to the previous month and tested on the following month. The outcomes show that this approach can be effectively employed even in situations where a substantial volume of historical data is lacking. A sketch of this scheme is given below.
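The following compact sketch illustrates this expanding-window scheme, assuming a pandas DataFrame indexed by timestamp and user-supplied fit/predict callables (hypothetical names, not the paper's code).

```python
# A sketch of the expanding-window walk-forward scheme: start with three
# months of training data, then retrain on all history up to the previous
# month and test on the following month.
import pandas as pd

def walk_forward(df: pd.DataFrame, fit_fn, predict_fn, first_train_months=3):
    months = df.index.to_period("M").unique().sort_values()
    results = []
    for i in range(first_train_months, len(months)):
        train = df[df.index.to_period("M") < months[i]]   # all history so far
        test = df[df.index.to_period("M") == months[i]]   # the next month
        model = fit_fn(train)                             # retrain each step
        results.append(predict_fn(model, test))
    return pd.concat(results)
```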
5.1. Model Description
The different models used in this study are listed in Table 2. A comprehensive overview of the theory and a description of the ML models used for modeling in this study are presented by the authors of [37].
5.2. Deep Learning (DL) Models
5.2.1. LSTM
LSTM, an abbreviation for long short-term memory, represents a variation of the recurrent neural network (RNN) concept. The primary objective of this model is to effectively capture and retain long-term temporal relationships, mitigating the challenge of vanishing gradients often encountered in conventional RNNs [38]. In addition, LSTM exhibits a noteworthy capability for learning and managing nonlinearity.
Within the LSTM architecture, memory blocks are constructed, forming an interconnected network designed to sustain and manage an information flow over time using nonlinear gating mechanisms. The structure of a standard LSTM architecture involves multiple individual units known as neurons, interconnected by weights. These neurons receive input from external parameters or other neurons, and through the training process, the associated weights are adjusted to achieve an optimal distribution of weights.
Figure 5 illustrates the fundamental structure of an LSTM memory cell, encompassing the gating elements, input signals, activation functions, and the resulting output [39]. Notably, the output of this block is recurrently linked back to the input of the same block, involving all the gate components in this cyclical process.
The step-by-step description of the gates and the mathematical equations associated with them is given below:
Forget gate: This step concentrates on updating the forget component by combining the current input $x_t$ with the output of the LSTM unit from the previous iteration, $h_{t-1}$, as expressed by the following equation [39]:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

where $W_f$ is the weight matrix associated with $h_{t-1}$ and $x_t$, and $b_f$ represents a standard bias weight vector.
Input gate: In this step, a sigmoid layer called the input gate decides which values are to be updated by combining the current input $x_t$ and the output of the LSTM unit $h_{t-1}$ [39]:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$

Cell: A tanh layer creates a vector of new candidate values, $\tilde{C}_t$, that could be added to the state [39]:

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$
The tanh function squashes input values to the range of −1 to 1 and is symmetric around the origin, with an “S” shape.
In this study, the rectified linear unit (ReLU) activation function is implemented for all deep learning models as it has demonstrated superior results during the fine-tuning of hyperparameters. The ReLU function is defined as follows:

$$f(x) = \max(0, x)$$
In other words, for any input x, if x is positive, ReLU returns x, and if x is negative, ReLU returns 0. This results in a piecewise-linear, non-decreasing activation function. Further, ReLU has the distinct advantages of simplicity and computational efficiency, and it mitigates the vanishing gradient problem in deep networks. This also promotes sparse activation, which can lead to better feature learning.
The fraction of the old state to be discarded is obtained by multiplying with $f_t$, and the result is added to the product $i_t \odot \tilde{C}_t$ to update the new state [39]:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Output gate: The output layer consists of a single neuron employing an identity activation function. This step calculates the output gate and the current hidden state, and can be depicted as below [39]:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t)$$
The following variants of the LSTM model, based on the topologies of the layers [40], are applied in this research:
Vanilla LSTM is the most basic model that has a single hidden layer of LSTM units, and an output layer to make a prediction.
Stacked LSTM: As the name implies, this variation comprises several hidden layers arranged one above the other. An LSTM layer by default requires input data in 3D format and outputs 2D data, so intermediate layers must return their full sequences.
Bidirectional LSTM: The key feature of this architecture is that it processes input sequences in both forward and backward directions, allowing it to capture information from past and future contexts simultaneously; both directions are connected to the same output layer to combine the interpretations. Bidirectional LSTM achieves a full error gradient calculation by eliminating the one-step truncation that was initially present in LSTM [40]. Bidirectional processing empowers the model to capture dependencies in both directions, proving especially beneficial for tasks where context from future data is significant, and enhances the model's comprehension of the input sequence.
Dense LSTM: The model begins with an input layer designed to receive sequential data. One or more LSTM layers follow the input layer, tailored for the processing of sequential data, as they retain hidden states and capture temporal dependencies. Densely connected layers, often referred to as fully connected layers, are established subsequent to the LSTM layers. These densely connected layers are used to perform various non-linear transformations and reduce the dimensionality of the data.
Figure 6 illustrates a typical schematic of the sequential flow of an LSTM neural network with all layers, input, and output nodes. When dealing with sequential data, the hidden layer nodes are the LSTM cells. Illustrative declarations of the four variants are sketched below.
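As an illustration only, the four variants could be declared in Keras as follows; the input window and feature count are assumptions, while the 64 hidden nodes reflect the tuning result reported later in this section.

```python
# Illustrative Keras declarations of the four LSTM variants (a sketch, not
# the paper's exact architectures).
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, Bidirectional

n_steps, n_features = 24, 9  # assumed shape of the 3D input (steps, features)

vanilla = Sequential([LSTM(64, input_shape=(n_steps, n_features)), Dense(1)])

stacked = Sequential([
    LSTM(64, return_sequences=True, input_shape=(n_steps, n_features)),
    LSTM(64),                       # final LSTM layer emits a 2D tensor
    Dense(1)])

bidirectional = Sequential([
    Bidirectional(LSTM(64), input_shape=(n_steps, n_features)),
    Dense(1)])

dense_lstm = Sequential([
    LSTM(64, input_shape=(n_steps, n_features)),
    Dense(32, activation="relu"),   # fully connected transformation
    Dense(1)])
```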
5.2.2. GRU
The gated recurrent unit (GRU) is a type of recurrent neural network (RNN) architecture specifically designed to address some of the limitations of traditional RNNs like vanishing gradients. GRU is a simpler and computationally more economical option compared to the long short-term memory (LSTM) network, yet it retains the capacity to capture extended dependencies in sequential data.
A GRU consists of a solitary hidden layer and two principal gates, labeled as the reset gate $r_t$ and the update gate $z_t$. These gates control the flow of information through the hidden state, allowing the network to determine what information to retain and what to discard. A schematic of a typical GRU block is illustrated in Figure 7, with the equation and function of each gate given below [41].
The update gate $z_t$ for time step $t$ is derived using the equation

$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t] + b_z\right)$$

where $\sigma$ denotes the sigmoid activation function, $W_z$ denotes the weight matrix for the update gate, $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input, and $b_z$ represents the bias term for the update gate.
The reset gate $r_t$ is akin to the update gate, but it determines what information from the previous hidden state should be erased:

$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t] + b_r\right)$$
Candidate hidden state $\tilde{h}_t$: a novel memory component that utilizes the reset gate to retain pertinent information from previous time steps. It is calculated as follows:

$$\tilde{h}_t = \tanh\left(W_h \cdot [r_t \odot h_{t-1}, x_t] + b_h\right)$$

where $\odot$ represents element-wise multiplication.
Final hidden state $h_t$: In the final stage, the network computes $h_t$, a vector that encapsulates information pertaining to the current unit and transfers it further within the network. To achieve this, the update gate discerns what information to gather from the present memory content $\tilde{h}_t$ and what to retain from prior time steps, represented by $h_{t-1}$:

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

A plain numpy sketch of one GRU step is given below.
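To make the gate algebra concrete, the following numpy sketch implements a single GRU step as reconstructed above; the weight matrices and bias vectors are placeholders purely for illustration.

```python
# A numpy sketch of one GRU step implementing the equations above.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    xh = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    z = sigmoid(W_z @ xh + b_z)                   # update gate z_t
    r = sigmoid(W_r @ xh + b_r)                   # reset gate r_t
    xh_reset = np.concatenate([r * h_prev, x_t])  # [r_t ⊙ h_{t-1}, x_t]
    h_cand = np.tanh(W_h @ xh_reset + b_h)        # candidate state
    return (1.0 - z) * h_prev + z * h_cand        # final hidden state h_t
```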
The advantages of GRU compared to LSTM are threefold: (1) Compared to LSTM, GRU features a more streamlined architecture with a reduced number of gates and less intricate computations. This characteristic simplifies the training and implementation processes. (2) Due to its simplicity, GRU is computationally more efficient than LSTM, which can lead to faster training and inference times. (3) GRU is a good choice for tasks where capturing short-term dependencies in sequences is sufficient and a more complex model like LSTM may not be necessary. However, GRU has limited capacity in capturing very long-term dependencies in sequences, as it does not have a dedicated cell state like LSTM. LSTM is better suited for tasks that require modeling long-range dependencies.
The input data format for deep learning (DL) models differs from that of machine learning (ML) model algorithms. The data are structured as a 3-dimensional array, where the dimensions represent the batch size, the number of time-steps, and the number of input features. Through experimentation with hidden layers containing 24, 64, and 128 nodes, it was observed that 64 nodes produced the most favorable results.
All deep learning models are trained and compiled using TensorFlow with shuffle buffer sizes equal to the length of the training data, 50 epochs with early stopping, a mean squared error loss, mean absolute error metrics, and Adam optimization. The hyperparameters of the ML models are optimized by applying scikit-learn's grid search. A sketch of this configuration follows.
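The stated setup could be expressed in Keras roughly as below; `model`, `train_ds`, `val_ds`, and `x_train` are assumed placeholders, and the early stopping patience is an illustrative choice, not a value reported in the paper.

```python
# A hedged sketch of the training configuration (TensorFlow/Keras).
from tensorflow.keras.callbacks import EarlyStopping

model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(
    train_ds.shuffle(buffer_size=len(x_train)),  # buffer = training length
    epochs=50,
    validation_data=val_ds,
    callbacks=[EarlyStopping(patience=5, restore_best_weights=True)])
```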
Appendix A illustrates the various hyperparameters of both the ML and DL models that were tuned and optimized. The optimized values shown are for a particular case of optimizing the N1 pixel (SAT) for a given month for Wien. These parameters are trained and optimized for every month, as shown in Figure 4, and the trained model is tested on the subsequent month for each station. The optimized parameters remain the same for different stations but may vary slightly depending on whether the SAT or Hyb model is optimized and also on the month.
5.2.3. DL Voting Ensemble Model
A voting ensemble model is built from the optimized stacked LSTM, bidirectional LSTM, dense LSTM, and GRU models. This is achieved by loading the pretrained models, known as base models, and making predictions using them individually. Input layers are defined for the ensemble model and are used to feed the predictions of the individual models into the ensemble.
The output tensors of the individual models are obtained by passing the input layers as inputs to each model. This produces output1, output2, output3, and output4, which represent the outputs of each base model. An ensemble layer is then created using the average() function, which averages these four outputs. Averaging is a common method for combining predictions in a voting ensemble for LSTM-based models.
The ensemble model is then defined using the Model class from Keras, with the input layers and the ensemble layer specified as inputs and outputs, respectively. This creates a new model that takes predictions from the four base models and produces an ensemble prediction. The ensemble model is compiled with the Adam optimizer and the mean squared error (MSE) loss function; additionally, accuracy is tracked as a metric. A minimal sketch of this construction is shown below.
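The sketch below follows the description with the Keras functional API; the model file names are hypothetical, the input shape is assumed, and a single shared input is used for brevity.

```python
# A minimal functional-API sketch of the described DL voting ensemble.
from tensorflow.keras.layers import Input, average
from tensorflow.keras.models import Model, load_model

n_steps, n_features = 24, 9                   # assumed input shape
base_models = [load_model(p) for p in
               ("stacked_lstm.h5", "bidir_lstm.h5", "dense_lstm.h5", "gru.h5")]
inp = Input(shape=(n_steps, n_features))
outputs = [m(inp) for m in base_models]       # output1 ... output4
ensemble_out = average(outputs)               # element-wise mean of predictions
ensemble_model = Model(inputs=inp, outputs=ensemble_out)
ensemble_model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
```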
Leveraging a voting ensemble with diverse LSTM models is a potent strategy for enhancing the precision and resilience of predictions. It utilizes the diversity of multiple LSTM models to provide more reliable results, making it a valuable tool in various machine learning and sequence modeling tasks.
Recent literature demonstrates an increasing research interest in the utilization of deep learning models for predicting solar irradiance for intra-day and intra-hour forecast horizons [42,43,44,45,46].
5.3. Weighted Average Ensemble Model
In an ensemble model, particularly an average ensemble, the combined forecast obtained by averaging the predictions of the individual models is evaluated for each hour of the day, as expressed by Equation (21):

$$\hat{y}_{\mathrm{ens}} = \frac{1}{M}\sum_{i=1}^{M} \hat{y}_i \quad (21)$$

where $M$ is the number of models and $\hat{y}_i$ is the forecast of model $i$.
With the primary aim of minimizing the RMSE in short-term intra-day forecasting, the weighted average ensemble approach computes the performance weight of each model for different hours of the day by employing the reciprocal of the RMSE. For instance, the weight assigned to the forecast of model $i$ is determined as follows:

$$w_i = \frac{\mathrm{RMSE}_i^{-1}}{\sum_{j=1}^{M} \mathrm{RMSE}_j^{-1}} \quad (22)$$

where $\mathrm{RMSE}_i^{-1}$ is the inverse of the RMSE of the respective model.
The forecast generated by the weighted average ensemble is subsequently assessed on an hourly basis using Equation (23):

$$\hat{y}_{\mathrm{wens}} = \sum_{i=1}^{M} w_i\, \hat{y}_i \quad (23)$$

A numpy sketch of Equations (22) and (23) is given below.
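The following small numpy sketch illustrates the inverse-RMSE weighting and the weighted average; the array shapes and function names are assumptions for illustration.

```python
# A numpy sketch of Equations (22) and (23).
import numpy as np

def inverse_rmse_weights(rmse_per_model: np.ndarray) -> np.ndarray:
    """rmse_per_model: shape (n_models,) for a given hour of the day."""
    w = 1.0 / rmse_per_model
    return w / w.sum()                     # normalized weights w_i

def weighted_ensemble(forecasts: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """forecasts: shape (n_models, n_samples); returns shape (n_samples,)."""
    return weights @ forecasts
```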
The hybrid model is intricately designed to begin by independently computing the hourly weights for each individual ML and DL model. Following this calculation, rankings are assigned based on the cumulative weight of each model, and the six top-performing models are automatically selected according to their ranking in a weighted average ensemble model. This methodology empowers the selection of the most effective models, while simultaneously optimizing the trade-off between bias and variance. The selected model is labeled as Hybrid (Hyb) model in the subsequent analysis and discussion.
Additionally, we assess cases where a hybrid model is combined with satellite data referred to as (Hyb + SAT), as well as scenarios where it is further enhanced by integrating GFS forecast data, labeled as (Hyb + SAT + NWP).
5.4. Baseline Naive Model
In the academic literature, a typical approach involves comparing the outcomes of the proposed models with those of simpler or reference models to evaluate the enhancements. A commonly employed reference model is the smart persistence model [47]. The smart persistence model relies exclusively on historical ground radiation data from the specific location and utilizes the clear sky index series. It forecasts the clear sky index for each time horizon τ as the average of the τ preceding clear sky index values [48]. In statistical terms this is a rolling mean, defined by the equation:

$$\hat{k}_c(t + \tau) = \frac{1}{\tau}\sum_{i=0}^{\tau - 1} k_c(t - i) \quad (24)$$

A similar methodology is applied to the N1 pixel to calculate the persistence value of N1. A pandas sketch of this baseline is given below.
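The sketch below implements this rolling-mean persistence of the clear sky index and rescales it with the clear sky GHI; the series names and the regular time grid are assumptions.

```python
# A pandas sketch of the smart persistence baseline (Equation (24)).
import pandas as pd

def smart_persistence(ghi: pd.Series, ghi_clear: pd.Series, tau: int) -> pd.Series:
    kc = (ghi / ghi_clear).clip(lower=0)   # clear sky index k_c
    kc_fc = kc.rolling(window=tau).mean()  # mean of the tau preceding values
    # Value at index t is the GHI forecast for time t + tau.
    return kc_fc * ghi_clear.shift(-tau)
```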
5.5. Joint and Marginal Distribution
The distribution-based forecast verification framework offers a robust method for evaluating a model's performance by analyzing the extent to which the forecasted values diverge from the observed value ranges [49].
The interconnection among the joint, marginal, and conditional distributions of two random variables can be statistically expressed through the application of Bayes' theorem. In the specific context of meteorology, where these variables pertain to the forecast $f$ and the observation $o$, this association is recognized as the Murphy–Winkler factorization, expressed as [50]:

$$p(f, o) = p(o \mid f)\, p(f) = p(f \mid o)\, p(o)$$

where $p(f, o)$ is the probability of both events, $f$ and $o$, occurring simultaneously; $p(o \mid f)$ denotes the conditional probability of event $o$ occurring, given that event $f$ has already occurred; $p(f)$ denotes the probability of event $f$ occurring; $p(f \mid o)$ denotes the conditional probability of event $f$ occurring, given that event $o$ has already occurred; and $p(o)$ represents the probability of event $o$ occurring. The joint probability of both events happening, $p(f, o)$, is thus computed as the product of the conditional probability of $o$ given $f$ and the probability of $f$.
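A simple histogram-based numpy sketch shows how these quantities can be estimated from forecast–observation pairs; the binning is an illustrative choice.

```python
# Estimating the joint, marginal, and conditional distributions of
# forecast f and observation o from samples.
import numpy as np

def murphy_winkler(f, o, bins=20):
    counts, f_edges, o_edges = np.histogram2d(f, o, bins=bins)
    p_fo = counts / counts.sum()          # joint probability p(f, o)
    p_f = p_fo.sum(axis=1)                # marginal p(f)
    p_o = p_fo.sum(axis=0)                # marginal p(o)
    # Calibration distribution p(o | f) = p(f, o) / p(f), guarding empty bins.
    denom = np.where(p_f > 0, p_f, 1.0)[:, None]
    p_o_given_f = p_fo / denom
    return p_fo, p_f, p_o, p_o_given_f
```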
5.6. Feature Engineering
Feature engineering serves as a pivotal step in both machine and deep learning pipelines, enabling the transformation of raw data into a more informative and suitable format for training and enhancing model performance. Its significance extends to the bias–variance trade-off, which aims to find a balance between underfitting and overfitting. Feature engineering is a key tool for achieving this balance because it allows one to control the model's complexity using techniques like dimensionality and noise reduction, regularization, and feature expansion.
In the context of satellite pixel data, which have a sampling frequency of 15 min, we employ the Fourier transform on both the hourly and quarter-hourly (15 min, denoted as 'minute') values. This transformation allows us to identify and extract the predominant frequency components that encapsulate the most significant recurring patterns within the dataset. Subsequently, these frequency components are harnessed as crucial input features for the ML and DL models. The harmonic order $k$ is in the range of 1–5 for the sine and cosine components when extracting the Fourier transform of the hour and quarter hour, as sketched below.
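A sketch of how such features can be generated follows; the column names mirror those listed below (e.g., Hour_cos1, min_sin2), while the function itself is an illustrative construction.

```python
# Fourier time features for hour of day and 15 min quarter of hour, k = 1..5.
import numpy as np
import pandas as pd

def fourier_features(idx: pd.DatetimeIndex, k_max: int = 5) -> pd.DataFrame:
    hour_frac = (idx.hour + idx.minute / 60.0) / 24.0  # position within the day
    quarter = (idx.minute // 15) / 4.0                 # 15 min position within the hour
    feats = {}
    for k in range(1, k_max + 1):
        feats[f"hour_sin{k}"] = np.sin(2 * np.pi * k * hour_frac)
        feats[f"hour_cos{k}"] = np.cos(2 * np.pi * k * hour_frac)
        feats[f"min_sin{k}"] = np.sin(2 * np.pi * k * quarter)
        feats[f"min_cos{k}"] = np.cos(2 * np.pi * k * quarter)
    return pd.DataFrame(feats, index=idx)
```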
Important features are extracted by applying feature importance assessment using random forest and XGB, randomly selecting 10 days of data for each month. These features are further examined using a correlation method: features exhibiting a correlation value greater than |0.3| with N1 are deemed pertinent, as illustrated in the heatmap in Figure 8 for the Wien station, and are subsequently used for the modeling, where:
N1: satellite pixel value (target);
Cosze: cosine of the solar zenith angle, $\cos(\theta_z)$;
ghi_mc: McClear GHI value (scaled using min-max scaler);
Pers_N1: persistent N1 value as defined by Equation (24);
Pers_csi: persistent clear sky index, as defined by Equation (24);
Hour_cos1: Fourier transform cosine of hour with k = 1;
min_sin2: Fourier transform sine of quarter hour with k = 2;
min_cos4: Fourier transform cosine of quarter hour with k = 4;
min_cos5: Fourier transform cosine of quarter hour with k = 5.
The feature importance for other stations also exhibits similar patterns.
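A sketch of this two-stage screening (random forest importances followed by the |0.3| correlation filter against N1) is given below; the hyperparameters and function name are illustrative, not the paper's code.

```python
# Feature screening: importances from a random forest, then a correlation filter.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def select_features(df: pd.DataFrame, target: str = "N1", corr_thresh: float = 0.3):
    X, y = df.drop(columns=[target]), df[target]
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    importance = pd.Series(rf.feature_importances_, index=X.columns)
    corr = df.corr()[target].drop(target)             # correlation with N1
    keep = corr[corr.abs() > corr_thresh].index.tolist()
    return importance.sort_values(ascending=False), keep
```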
6. Results and Discussion
6.1. Overview and Analysis of Clear Sky Models
The following observations emphasize the importance of understanding and accounting for the deviations of satellite-derived GHI from measurements when applying clear sky models for GHI estimation.
The GHI is calculated from the satellite-derived clear sky index using Equation (4) by applying two clear sky models. The violin and distribution plots in Figure 9 clearly show that the Perez clear sky model deviates significantly from ground measurements under clear sky conditions with a GHI above 800 W/m². Both models show statistically significant deviations from ground measurement data around 150–180 W/m², but the Perez model shows a sharp deviation from ground measurements between 200 and 400 W/m².
In this study, the joint and marginal distributions of the nowcast GHI obtained from the McClear and Perez models, compared to ground observations collected at three monitoring stations, are analyzed.
Figure 10 depicts the visual representation of these distributions where the diagonal black line corresponds to the ground measurements.
For the joint distribution using the McClear model, roughly equal probabilities are observed on both sides of the diagonal. For the GHI calculated with the Perez model, however, the probability density is higher below the diagonal line, confirming that the Perez model tends to underpredict the GHI, i.e., its predictions are less than the observed values.
A significant pattern is observed upon closer inspection of the 2D kernel density contours. For low-irradiance conditions, the satellite-derived GHI using the McClear model exhibits a slight upward drift above the diagonal line. Conversely, the Perez model shows the opposite behavior, with the GHI prediction drifting below the diagonal.
We also note a distinct characteristic regarding the satellite observations at the Salzburg location. These observations exhibit greater dispersion compared to the other two monitoring sites. This disparity might be ascribed to the frequency of data collection in Salzburg, which occurs at 10 min intervals. This interval could potentially introduce an aggregation effect in the ground measurement errors when these data points are averaged to an hourly resolution. In contrast, the other two sites provide ground measurement data at a 1 min resolution, potentially resulting in fewer aggregation-induced discrepancies.
These observations offer valuable insights into how the McClear and Perez models perform compared with ground-based GHI measurements, particularly in central Europe. They affirm that the McClear model surpasses the Perez model in terms of performance. This is further substantiated by the error metrics in Table 3 for the GHI derived using satellite data with the two models. In light of these observations, the McClear model is applied for the further work and assessment of the forecast models.
6.2. Satellite-Derived GHI
The quality of the GHI data obtained from Meteosat satellite pixels is demonstrated through a time series plot dated 24 June 2021 for Wels, presented in Figure 11. This figure highlights three specific scenarios: (a) clear sky conditions, (b) cloudy conditions, and (c) partially cloudy conditions. These specific events are graphically compared using images (d, e, f) acquired by an ASI with their corresponding satellite cloud images (m, n, o). Green arrows in the satellite cloud images represent cloud motion vectors, indicating cloud movement and direction. The cloud mask and pixel images from EUMETSAT are depicted in (g, h, i) and (j, k, l), respectively.
The station’s position is marked at the center of the red square box in the satellite cloud and cloud mask images. Within the cloud mask images (g, h, i), the presence of green indicates clear sky regions, whereas the white areas correspond to cloud coverage. In the RGB images (j, k, l), the green areas signify clear skies, the light yellow regions represent scattered clouds, and the dark blue and black regions denote the presence of heavily overcast dark clouds. These images, corresponding to the three identified events in the time series plot, clearly demonstrate a correlation and the effectiveness of our algorithm in accurately capturing clear sky, cloudy, and partially cloudy conditions.
6.3. Hybrid Model Results Analysis
Table 4 presents a summary of the RMSE (W/m²) values for the various individual models utilized in this study, as well as the ensemble models and the ultimate hybrid model constructed from the six top-performing models. The hybrid model is denoted in bold font, the voting ensemble model derived from deep learning (DL) models is highlighted in italics, and the voting ensemble model based on machine learning (ML) models is emphasized in bold italics. The table also showcases the percentage improvement (%) of the hybrid model over the least-performing individual model, as well as the corresponding improvement of each voting ensemble model compared to its lowest-performing constituent model for ML and DL, respectively.
The hybrid model exhibits a remarkable improvement, reducing the RMSE by 4.1% to 10.6% across various forecast horizons. These results also reveal that voting is a superior choice compared to stacking when it comes to ML ensemble models. Moreover, the performance of the ML voting ensemble model improves as the forecast horizon extends, whereas the opposite trend is observed for the DL voting ensemble model.
Comparable findings are observed at the other two locations, clearly underscoring the benefits of employing a hybrid model as opposed to relying on a single model for forecasting purposes.
6.4. Forecast Results
In this segment, we provide an assessment of the intra-day GHI forecasts up to 6 h ahead generated by the various models, conducting a comparative analysis to determine the performance of the models. Following this, we perform an in-depth examination of forecast performance across different clusters of clear sky indices.
Table 5, Table 6 and Table 7 provide an analysis of forecast performance for the three distinct stations, measured in terms of the RMSE in units of W/m². Each row in these tables represents different forecasting models and combinations of models used in this research. The results emphasize the consistent and superior performance of all the developed models in comparison to the smart persistence model. Notably, during the first hour of the forecast, the GFS exhibits subpar performance, aligning with the existing literature that highlights challenges in initializing atmospheric boundary conditions for NWP models.
The hybrid ensemble model comprising the hybrid model in conjunction with the satellite and NWP models, Hyb + SAT + NWP, surpasses all other models across all forecast horizons. Nonetheless, it is important to acknowledge that the RMSE tends to rise as the forecast horizon is extended, as anticipated. The percentage increase in the RMSE for the smart persistence model over the 1 h to 6 h interval falls within the ranges of {28.84%–47.54%}, {36.76%–70.92%}, and {28.78%–71.93%} for Wien, Salzburg, and Wels, respectively. In contrast, the corresponding rate of increase observed for Hyb + SAT + NWP across the same time frame is in the ranges of {16.95%–28.79%}, {20.67%–44.14%}, and {17.68%–51.18%} for the respective three stations. The relatively modest rise in the RMSE of our models over extended time horizons underscores the efficiency of the approach.
Figure 12 illustrates the nRMSE (%) and the RMSE skill score for all models across various time horizons at each station. Notably, the SAT results consistently outperform the hybrid model up to a 3 h horizon, beyond which the hybrid model's performance exhibits a marginal improvement over SAT. This observation aligns with the fact that the satellite data correspond to time lags from … to … hr.
The combination of Hyb + SAT presents a distinct advantage over individual models, with the effect becoming more pronounced up to a 3 h lead time. Beyond this point, the addition of NWP offers a clear advantage for all three locations as the time horizon increases. In terms of performance, Hyb + SAT + NWP remains nearly unchanged when compared to the combination of Hyb + SAT during the first hour, but afterward, it delivers a 5% to 20% improvement in skill score from 2 to 6 h ahead.
In the case of a 6 h ahead forecast, the best-performing Hyb + SAT + NWP model only demonstrates a marginal advantage over the NWP model for Salzburg and Wels, while the situation is reversed for Wien.
The assessment of the models based on the MAE and R² scores confirms that all models surpass the smart persistence model. Figure 13 illustrates that Hyb + SAT + NWP emerges as the top-performing model across all horizons.
The best-performing model, Hyb + SAT + NWP, is further assessed in terms of skill score by categorizing day types according to the daily average clear sky index values, as outlined in Table 8; the results are shown in Figure 14 for all three stations [37].
This analysis vividly reveals that, for overcast days (DT1), the model's performance lags behind that of the smart persistence model, particularly in the 1 h ahead forecast. This finding corroborates our earlier observations, as depicted in the 2D kernel density contours in Figure 10, wherein SAT forecasts exhibit a higher bias for lower values of the GHI under overcast conditions. Nevertheless, it is important to highlight that this distinction does not wield a statistically significant impact on the overall results. This is because the total number of overcast days (DT1) amounts to 28, 22, and 20 for Wien, Salzburg, and Wels, respectively, with an average hourly GHI value of approximately 30 W/m² for such days. Should the forecast values for the 1 h ahead prediction be adjusted to match the smart persistence values, the overall RMSE improvement for the entire test period would only be around 2–3 W/m² for the 1 h ahead forecast.
A closer examination of the results achieved using this method highlights the model’s competency in capturing extremely high cloud variability during cloudy and heavily clouded sky conditions, resulting in a skill score improvement ranging from 10% to 30%.
Table 9 presents a summary of additional metrics, including the Pearson correlation coefficient, the mean bias error (MBE), and the Kolmogorov–Smirnov integral (KSI), across forecast horizons for the three stations. Hyb + SAT + NWP stands out as the top performer, particularly excelling in the Pearson correlation coefficient and mostly outperforming the other models in terms of the MBE. Station Wien tends to exhibit a mostly negative bias, while the other two stations show a predominantly positive bias. For the forecast horizons of 2 h–4 h, Hyb + SAT performs best for the Wien station in terms of the MBE. Notably, smart persistence demonstrates the lowest KSI values, aligning with expectations since its data points originate from ground measurements, making it more likely to have a distribution similar to the ground measurements than the other models. SAT outperforms the other models in terms of the KSI score for Wien and Wels, whereas Hyb + SAT has a better KSI score for Salzburg.
6.5. Forecast Quality
The reliability of the ensemble model is evaluated by using a reliability diagram, a graphical tool for assessing the calibration of the forecast, as illustrated in Figure 15.
The horizontal axis represents the 10 forecast quantile values of the GHI. The vertical axis represents the observed relative frequency or mean forecast value associated with each forecast quantile, indicating how often the forecasted probabilities align with the observed outcomes. The reference line is the diagonal from the bottom-left to the top-right of the diagram; points along this line indicate perfect reliability, where the forecasted probabilities match the observed relative frequencies. In an ideal case, the points on the reliability diagram should closely follow the diagonal reference line, indicating that the forecast probabilities are well calibrated. Overconfident forecast points lie above the diagonal line, suggesting that the forecasted probabilities are too high, whereas underconfident forecast points have probabilities that are too low and lie below the reference line. A sketch of how the diagram's points can be computed is given below.
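The following numpy sketch bins forecasts into deciles and compares the mean forecast with the mean observation in each bin; it is an assumption-based illustration, not the paper's plotting code.

```python
# Computing reliability-diagram points from forecast deciles.
import numpy as np

def reliability_points(y_obs, y_fc, n_bins=10):
    edges = np.quantile(y_fc, np.linspace(0, 1, n_bins + 1))
    bin_idx = np.clip(np.digitize(y_fc, edges[1:-1]), 0, n_bins - 1)
    mean_fc = np.array([y_fc[bin_idx == b].mean() for b in range(n_bins)])
    mean_obs = np.array([y_obs[bin_idx == b].mean() for b in range(n_bins)])
    return mean_fc, mean_obs  # plot mean_obs against mean_fc vs. the diagonal
```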
Hyb + SAT + NWP is exclusively depicted alongside the smart persistence model to enhance graphical clarity given the close adherence of all other models to the hybrid model. The plots evidently illustrate that the hybrid model is well calibrated and closely follows the reference line in contrast to the smart persistence model for all three stations across the different forecast horizons.
7. Conclusions and Future Work
The primary goal of this research was to apply satellite data to enhance the accuracy of intra-day GHI predictions, spanning from 1 to 6 h into the future, with an hourly time resolution. This study demonstrated the benefits of employing a hybrid model that harnesses the optimal performance of individual models across various temporal horizons, combining both machine learning (ML) and deep learning (DL) models. This study investigated the benefits of combining the hybrid model with models based on satellite imagery and numerical weather prediction (NWP) for improving intra-day solar radiation forecasts and demonstrated that Hyb + SAT + NWP is the optimal choice for short-term hourly intra-day solar radiation forecasting. The nRMSE and RMSE skill score values demonstrate remarkable performance. Though the weighted average hybrid ensemble model is designed to minimize the RMSE, it demonstrated excellent performance for the MAE and R² score as well.
Table 10 illustrates the range of error metrics of the Hyb + SAT + NWP model (in bold) compared with the smart persistence model (in parentheses) for forecast horizons ranging from 1 h to 6 h ahead.
A high R² value for Hyb + SAT + NWP indicates that a large proportion of the variability in the observed values is explained by the forecast model. A high Pearson correlation coefficient implies a strong linear relationship between the forecasted and observed values. A low RMSE indicates that, on average, the differences between the forecasted and observed values are small, suggesting that the model's predictions are accurate in terms of minimizing overall error.
The reliability diagrams further substantiate the forecast quality and consistent performance of the Hyb + SAT + NWP model, illustrating its close adherence to a perfectly calibrated forecast across the 1 h to 6 h forecast horizon for all three stations.
This achievement is particularly noteworthy considering the limited dataset, which covers less than a year of measurements and originates from experimental sites in central Europe, where unpredictable cloudy weather conditions are prevalent.
This research yielded several additional noteworthy findings. Firstly, it introduced a consistent and highly accurate method of deriving the GHI based on satellite imagery. Secondly, this study emphasized the significance of using an appropriate clear sky model when deriving the GHI from satellite-based sources. A comprehensive comparative analysis of the Perez and McClear models clearly highlighted the advantages of the McClear model. This study also highlighted the importance of classifying different day types based on clear sky indices and underscored the benefits of the blending model, particularly on cloudy and highly cloudy days.
In light of the findings of this study, there are several valuable avenues for future research. First, in order to enhance forecasting accuracy, we recommend exploring the utilization of the European Centre for Medium-Range Weather Forecasts (ECMWF) as the primary source for NWP input data. Additionally, the incorporation of additional meteorological variables such as temperature, relative humidity, and wind speed as exogenous features needs to be investigated further.
Moreover, a promising area for future research involves conducting a more detailed analysis of historical large-scale satellite data. This analysis would focus on uncovering the precise temporal and seasonal correlations of … and …, thereby contributing to the refinement and increased accuracy of GHI derivation from satellite-based pixels.
The joint and marginal distribution analysis warrants a condition-based post-processing treatment of irradiance to correct satellite-based GHI calculations which can possibly enhance the forecasting accuracy to the next level.