This paper proposes a Dual-stage Solar Power Prediction Model (DSPPM) method to improve prediction accuracy by considering simple and categorized characteristics of weather forecasts.
In the second stage, a Hybrid Solar Model is introduced to improve the accuracy of weather forecasts and achieve precise power generation predictions by refining the weather forecasts based on the Solar Base Model. The Hybrid Solar Model identifies relevant and meaningful input data patterns based on the energy domain knowledge and defines this function as Semi-Auto Correlation. Following a correlation analysis of the chosen data, it provides a method to minimize the Solar Change Point (SCP)-based solar power generation forecast error.
2.1.2. Stage 2: Detection for Solar Change Point (SCP)-based Hybrid Solar Model
In the second stage, we identify the period in which a Solar Change Point (SCP) occurs, based on the prediction errors evaluated in the first stage. An SCP is detected by applying a threshold over the learning period. This threshold was chosen using energy-domain knowledge and was set to a normalized mean absolute error (NMAE) of 8%, the value suggested by KPX.
If the prediction error rate exceeds the SCP threshold, we apply Bayes' theorem to calculate the probability that the weather forecast used as an input for the Solar Base Model matches the actual weather. The resulting updated weather forecast is then used as an input for the Solar Base Model.
Figure 5 is the configuration diagram for Stage 2, and Algorithm 2 provides a detailed description of the method used to enhance the prediction accuracy in Stage 2.
The input variables for predicting solar power generation from weather forecasts include temperature, humidity, precipitation, sky conditions, precipitation type, precipitation probability, and snow cover.
Algorithm 2. Solar Change Point detection
1. Detect Solar Change Point and anomaly score thresholds.
2. Load the reference model.
3. Select window size (window lower, window upper, start index, end index).
4. Slide the window and analyze the correlation between measured weather data and predicted weather data: Equation (5).
5. Calculate the posterior probability with Bayes' theorem: Equations (6)–(10).
6. Select the probability.
7. Update the solar forecast output.
Data analysts with expertise in power and renewable energy domains are well aware of the strong correlation between weather forecast variables, such as sky conditions and precipitation patterns, and power generation. However, when conducting a numerical analysis of the correlation between each weather forecast and power generation, it becomes evident that sky conditions and precipitation patterns exhibit a low correlation with power generation due to the categorized data characteristics. In essence, while these data are indeed significant, the importance of these data is not adequately reflected in the correlations.
As a result, we found that the most important variables for input selection were sky conditions and precipitation type data. Based on this understanding, we developed a Solar Reference Model specifically designed to analyze patterns in solar power generation forecasts. The model primarily uses sky conditions and precipitation types as input variables while also considering daylight hours for solar power generation. Sky conditions and precipitation types are each categorized into four groups, so the two variables jointly yield 4 × 4 = 16 distinct patterns to be extracted within each time interval.
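The pattern extraction described above can be pictured as a lookup table keyed by hour, sky condition, and precipitation type. The following is a minimal sketch under stated assumptions: the category labels, the function name `build_reference_model`, and the sample records are hypothetical and are not taken from the paper, which does not list the four groups in this section.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical category labels: the text states four groups per variable
# but does not name them here.
SKY = ("clear", "partly_cloudy", "mostly_cloudy", "overcast")
PRECIP = ("none", "rain", "sleet", "snow")


def build_reference_model(records):
    """Average generation for every observed (hour, sky, precip) pattern.

    records: iterable of (hour, sky, precip, generation) tuples.
    Returns a dict mapping (hour, sky, precip) -> mean generation.
    """
    buckets = defaultdict(list)
    for hour, sky, precip, generation in records:
        buckets[(hour, sky, precip)].append(generation)
    return {key: mean(values) for key, values in buckets.items()}


# Illustrative records only (not data from the paper).
history = [
    (12, "clear", "none", 80.0),
    (12, "clear", "none", 90.0),
    (12, "overcast", "rain", 20.0),
    (13, "clear", "none", 95.0),
]
reference = build_reference_model(history)

# 4 sky groups x 4 precipitation groups = 16 possible patterns per hour.
patterns_per_hour = len(SKY) * len(PRECIP)
```

Patterns that never occur in the history are simply absent from the table, so a real implementation would need a fallback for unseen combinations.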
Figure 6 shows the structure of the reference model used to extract the solar power generation patterns for each hour of sunlight based on the sky conditions and precipitation types, utilizing categorization characteristics.
Figure 7 shows the hourly distribution of the Solar Reference Model, which has been pivoted to align solar generation with sky conditions and precipitation type.
In Stage 2, we assess the performance index of the base model’s predictions and use a sliding window to identify periods in which prediction errors increase. The anomaly score is computed from the Stage 1 evaluation; the moment at which the prediction error shifts from its typical pattern to an abnormal one is treated as a Solar Change Point. We set an anomaly score threshold to identify periods in which the prediction errors are large enough to require corrected prediction values.
Figure 8 shows the anomaly score for the predicted solar power generation values over time, together with the intervals in which Solar Change Points occur. Equation (5) defines a generalized anomaly score used for detecting Solar Change Points. In this equation, X represents the sliding window size, while Y represents the rate of change with respect to X.
The anomaly score threshold is set at 8% of the supply demand error of solar power within the renewable energy sector of the power system. For the detection of prediction error anomalies, the default time period ranges from 11 to 15 h, a period characterized by rapid changes in solar output.
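The thresholding step can be sketched as follows. Equation (5) is not reproduced in this section, so the score used here is a stand-in: a trailing-window mean of the absolute error expressed as a percentage of installed capacity. The function names, the `capacity` parameter, and the sample values are assumptions made for illustration only.

```python
def anomaly_scores(actual, predicted, capacity, window):
    """Trailing-window mean absolute error, as a percentage of capacity.

    This is an assumed stand-in for the paper's anomaly score.
    """
    errors = [abs(a - p) / capacity * 100.0 for a, p in zip(actual, predicted)]
    scores = []
    for end in range(len(errors)):
        start = max(0, end - window + 1)
        chunk = errors[start:end + 1]
        scores.append(sum(chunk) / len(chunk))
    return scores


def detect_change_points(hours, scores, threshold=8.0, first_hour=11, last_hour=15):
    """Indices whose score exceeds the threshold inside the monitored hours."""
    return [
        i
        for i, (hour, score) in enumerate(zip(hours, scores))
        if first_hour <= hour <= last_hour and score > threshold
    ]


# Illustrative hourly values only (not data from the paper).
hours = [9, 10, 11, 12, 13, 14, 15, 16]
actual = [20.0, 40.0, 60.0, 30.0, 25.0, 70.0, 60.0, 30.0]
predicted = [22.0, 42.0, 62.0, 70.0, 75.0, 72.0, 62.0, 31.0]

scores = anomaly_scores(actual, predicted, capacity=100.0, window=2)
change_points = detect_change_points(hours, scores)
```

In this toy series, the large misses at hours 12 and 13 push the windowed score above 8% for the positions covering hours 12 to 14, which are the ones flagged.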
Algorithm 3 provides a detailed description of the method used to calculate the anomaly score in Stage 2. After identifying the Solar Change Point, we compare the weather forecast value at that time point, which serves as an input variable for the base model, with the actual weather value at the same time point, which is not an input variable, and analyze the correlation between them using the Pearson correlation coefficient. To update the weather forecasts associated with the generation forecast error, we use a Naive Bayes classifier. The weather prediction value with the highest probability from the classifier is passed to the reference model, which computes a new power generation prediction that replaces the existing one.
Algorithm 3. Correcting Prediction Errors
1. Set the anomaly score threshold.
2. Compare the test score with the solar change index.
3. Anomalies = test score[test score.anomaly == True]
4. Detect the Solar Change Point and draw a scatter plot.
The correlation coefficient is calculated as follows:

$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\,s_x s_y}$$

where the variables $x$ and $y$ stand for each variable being considered, $n$ represents the total number of data points, and $i$ signifies the sequence or index of each data point. The term $\bar{x}$ refers to the average (mean) of the variable $x$, and $\bar{y}$ refers to the average (mean) of the variable $y$. The expressions $s_x$ and $s_y$ represent the standard deviations of $x$ and $y$, respectively. These standard deviations capture how each variable deviates from its mean value.
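As a concrete illustration, the sample Pearson correlation coefficient between a forecast series and a measured series can be computed as below. The function name `pearson` and the temperature values are hypothetical; the sketch uses the sample standard deviation with an n − 1 denominator.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample standard deviations (n - 1 in the denominator).
    s_x = sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
    s_y = sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)


# Illustrative values only (not data from the paper).
forecast_temp = [10.0, 12.0, 15.0, 18.0, 20.0]
measured_temp = [11.0, 13.0, 14.0, 19.0, 21.0]
r = pearson(forecast_temp, measured_temp)
```

A value of `r` close to 1 indicates that the forecast tracks the measurement closely; values near 0 indicate little linear relationship.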
The likelihood of a weather forecast change is calculated using Bayes’ theorem. Consider predicting tomorrow’s weather when the sky is cloudy: we want the probability of rain given the evidence of cloudy skies. This quantity is written as P(Rainy|Cloudy) and is a posterior probability. Using historical weather forecast data, we can calculate several key probabilities:
P(Cloudy|Rainy)—the likelihood that the sky will be cloudy when it rains.
P(Cloudy)—the probability that the sky will be gray or cloudy.
P(Rainy)—the probability of rain occurring.
The posterior probability is then obtained by combining these probabilities with the evidence of cloudy skies: P(Rainy|Cloudy) = P(Cloudy|Rainy) × P(Rainy) / P(Cloudy). This helps us make informed predictions about the likelihood of rain when the sky is cloudy.
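The rain example can be worked through numerically by estimating the three probabilities from counts. This is a minimal sketch: the function name and the ten observations are invented for illustration and are not data from the paper.

```python
def posterior_rain_given_cloudy(days):
    """Estimate P(Rainy | Cloudy) from (sky, weather) observations via Bayes' theorem."""
    total = len(days)
    rainy_days = [d for d in days if d[1] == "rainy"]
    cloudy_count = sum(1 for d in days if d[0] == "cloudy")
    if not rainy_days or cloudy_count == 0:
        raise ValueError("need at least one rainy day and one cloudy day")

    p_rainy = len(rainy_days) / total                      # P(Rainy)
    p_cloudy = cloudy_count / total                        # P(Cloudy)
    p_cloudy_given_rainy = (
        sum(1 for d in rainy_days if d[0] == "cloudy") / len(rainy_days)
    )                                                      # P(Cloudy | Rainy)
    return p_cloudy_given_rainy * p_rainy / p_cloudy


# Illustrative observations only: 3 cloudy+rainy, 1 clear+rainy,
# 2 cloudy+dry, 4 clear+dry.
observations = (
    [("cloudy", "rainy")] * 3
    + [("clear", "rainy")]
    + [("cloudy", "dry")] * 2
    + [("clear", "dry")] * 4
)
posterior = posterior_rain_given_cloudy(observations)
```

With these counts, P(Cloudy|Rainy) = 0.75, P(Rainy) = 0.4, and P(Cloudy) = 0.5, so the posterior is 0.6, which matches the direct count of 3 rainy days among 5 cloudy days.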
Figure 9 shows an example of the configuration and probability of Bayesian theory related to weather forecasting. The generalization of Bayesian theory is as follows:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

where
$P(c \mid x)$ is the posterior probability of the target given the prediction.
$P(c)$ is the prior probability of the target.
$P(x \mid c)$ is the likelihood, which is the probability of the prediction given the target.
$P(x)$ is the prior probability of the prediction.
Equation (8) is the Bayesian expression for the variables denoted as $x_1, x_2, \ldots, x_n$:

$$P(c \mid x_1, \ldots, x_n) = \frac{P(x_1 \mid c)\,P(x_2 \mid c)\cdots P(x_n \mid c)\,P(c)}{P(x_1)\,P(x_2)\cdots P(x_n)} \tag{8}$$

Naive Bayes assumes that each variable is independent of the others, and when variables are independent, their joint probability is calculated as a product. Equation (9) simplifies the calculation by removing the common part, which is the product of individual probabilities, $P(x_1)\,P(x_2)\cdots P(x_n)$:

$$P(c \mid x_1, \ldots, x_n) \propto P(c)\prod_{i=1}^{n} P(x_i \mid c) \tag{9}$$

This simplification makes it more manageable to compute probabilities when dealing with multiple independent variables.
Gaussian Naive Bayes applies Bayes’ theorem under the assumption that each input variable follows a normal distribution within each class, with the mean and variance estimated from the samples. In this paper, it is assumed that the weather forecast follows a normal distribution. Under the assumption of mutual independence among the weather forecast variables, we calculate the likelihood using the estimated parameters of the independent variables and the probability density function of the normal distribution. The Gaussian Naive Bayes classifier determines the most likely weather category. It uses the probability density function to calculate the likelihood of a weather forecast change as follows:

$$P(x_i \mid y = c) = \frac{1}{\sqrt{2\pi\sigma_c^2}}\exp\left(-\frac{(x_i - \mu_c)^2}{2\sigma_c^2}\right)$$

where $x$ represents the input variable, and $i$ stands for the variable type. $y$ represents the target variable to be classified, and $c$ is the class (category). When we have $x_i$ values with a mean of $\mu_c$ and a variance of $\sigma_c^2$, we express the probability distribution of these values for a specific class as a normal distribution.
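A Gaussian Naive Bayes classifier along these lines can be sketched in a few lines of standard-library Python. The feature choice (temperature, humidity), the class labels, and the function names `fit` and `predict` are assumptions for illustration; the paper does not specify its implementation. The sketch works in log space to avoid numerical underflow and drops the class-independent denominator.

```python
from math import log, pi
from statistics import mean, variance


def gaussian_log_pdf(x, mu, var):
    """Log of the normal probability density with mean mu and variance var."""
    return -0.5 * log(2.0 * pi * var) - (x - mu) ** 2 / (2.0 * var)


def fit(samples, labels):
    """Estimate the prior and per-feature mean and sample variance per class."""
    model = {}
    for c in set(labels):
        rows = [s for s, label in zip(samples, labels) if label == c]
        columns = list(zip(*rows))
        model[c] = {
            "prior": len(rows) / len(samples),
            "mu": [mean(col) for col in columns],
            "var": [variance(col) for col in columns],
        }
    return model


def predict(model, x):
    """Return the class with the highest log-posterior (up to a constant)."""
    def score(c):
        params = model[c]
        return log(params["prior"]) + sum(
            gaussian_log_pdf(v, m, s2)
            for v, m, s2 in zip(x, params["mu"], params["var"])
        )
    return max(model, key=score)


# Illustrative (temperature, humidity) samples only (not data from the paper).
samples = [
    (25.0, 40.0), (27.0, 35.0), (26.0, 45.0),
    (18.0, 85.0), (17.0, 90.0), (19.0, 80.0),
]
labels = ["clear"] * 3 + ["overcast"] * 3
model = fit(samples, labels)
predicted_class = predict(model, (24.0, 50.0))
```

Each class needs at least two samples with non-identical feature values, since the sample variance is otherwise undefined or zero.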
The input variables, which are the independent variables, consist of the weather forecasts and the weather measurement data. The target variable, which is the dependent variable, is the one most likely to change within the weather forecast data. Using the information from these input attributes, we calculate the probability associated with each attribute and then select the target value with the highest probability.