1. Introduction
Reliability planning for the transmission system of the electric power system is essential in keeping the grid operational in times of high uncertainty/variability. Traditionally, this exercise involved managing only one variable: load. The scenarios used to evaluate the system operation and resilience were worst-case loading scenarios derived from historical data with some growth projections. However, globally coordinated initiatives for carbon emission reduction have led to increased emphasis on planning integrated energy systems that feature rapid growth in renewable generation (RG), energy efficiency, and high electrification rates [1,2,3]. Particularly, due to the increasing proliferation of large RG sources, power system resource planning studies must include additional variables, viz., solar and wind generation. The variability associated with these new variables, which depends on weather conditions, such as solar irradiation and/or wind speed, makes reliability evaluation of renewable-rich power systems a more complex and challenging problem [4,5,6,7]. To address this problem, power system planners create synthetic scenarios that are aimed at capturing actual system conditions [8,9,10]. Two strategies that have been extensively used for creating synthetic scenarios are: (i) classical techniques, which try to fit a model onto the distribution and then attempt to generate scenarios from the fitted model, and (ii) machine learning approaches, which learn the distribution from large amounts of historical data and are then able to produce similar scenarios. A brief overview of these two strategies is provided below.
The classical techniques typically rely on probabilistic modeling to generate new scenarios. These include methods that employ Latin hypercube sampling (LHS) [11], generalized dynamic factor model (GDFM) [12], generalized auto-regressive score (GAS) models [13], vine copula methods [14], principal component analysis (PCA) [15], and generalized Gaussian mixture models (GGMM) [16], among others. However, despite their complexity, these models cannot fully capture all the correlations between the variables. Consequently, many classical studies still focus on one variable at a time [17]. As power systems become more complex, it will become increasingly hard to extract models that capture all of the system’s characteristics using probabilistic methods.
Machine learning models offer flexibility and versatility when generating new scenarios. Particularly, neural-network-based approaches eliminate the need for the extraction of relevant features from the available data. In [18,19], a simple single-layered neural network and radial basis function network (RBFN) were used to forecast wind power ramp-up events and distributions, respectively. However, complex tasks, such as the multivariate scenario generation considered in this paper, would require more complex (deeper) architectures.
In recent times, generative adversarial networks (GANs) [20] have emerged as a popular deep-learning algorithm for scenario generation. This implicit generative model is capable of transforming raw noise into meaningful information. Therefore, it can work on a variety of datasets, such as two-dimensional images and one-dimensional time-series data. Furthermore, it can generate samples that replicate the ones available in the data as well as other, more varied samples not present in the original dataset. The performance of a stand-alone GAN has been improved for scenario generation applications by using a hybrid model strategy, tweaking the error function, and/or adding appropriate conditions. For example, recurrent neural networks (RNN) with long short-term memory (LSTM) and reinforcement learning algorithms were added to the GAN model to produce wind power generation scenarios in [21]. This hybrid model strategy was tested on two case studies, and it created varied and believable scenarios in both of them. Similarly, a Wasserstein distance-based error function was embedded into convolutional GANs to improve performance in [22]. This approach was extended to condition-based solar and wind power scenario generation in [23,24]. These models accurately predicted wind ramp events and peak values but treated the variables (namely, solar and wind) independently.
Since the power system is a complex network of interconnected generation sites and electricity consumers, both residential and large-scale, the relationship between the different modes of power production and the nuances of load demand (i.e., their correlations) must be systematically considered. In line with this realization, a convolutional GAN with an LSTM-based sequence encoder was proposed in [25] to perform day-ahead forecasting of correlated photovoltaic (PV) and wind production sequences from meteorological data. In [26], correlated scenarios were generated to determine the most cost-effective generation procedure for optimizing a large-scale hydro–wind–solar hybrid system. Correlated GANs were also used in the cost-optimal scheduling of a battery energy storage system (BESS) to increase the BESS-PV system’s incentive revenue [27]. Although GAN-based architectures have been applied to generate correlated scenarios for power systems, the cross-correlation between RG and load has not been well-explored. In addition, the scenario-generation techniques developed in [25,26,27] only focused on generating short-term forecasting scenarios.
To better facilitate long-term reliability planning, there is a genuine need to capture the cross-correlation present between the variables (RG and loads) while creating representative scenarios. The interdependence between these variables occurs naturally in the historical data. However, if the variables are treated independently during the scenario generation process, there is a risk of losing this interdependence and generating less meaningful scenarios. Particularly, under abnormal conditions, ignoring these cross-correlations and using the independent scenario generation approach can result in grossly misleading outcomes (see Section 4). At the same time, note that incorporating cross-correlation during multivariate scenario generation is a more challenging task. To accomplish this task, a sophisticated implicit generative model is proposed, as explained below.
1.1. Major Contributions
In this paper, a conditional recurrent GAN is proposed to generate cross-correlated scenarios on a seasonal basis. The labeling of the historical data for GAN training is an important aspect of the methodology. In the presence of multiple variables, the determination of normal/abnormal days is not straightforward. Instead of relying on normal and abnormal labels assigned based on visual inspection, a data-driven technique is developed to create seasonal labels. Then, based on the labels, normal and abnormal day assignments are made for each season. Along with cross-correlation, this approach also captures the temporal correlations present in the time-series data of each variable.
For the generation of statistically similar but distinct correlated scenarios, the conditional recurrent GAN is modeled with the use of RNN-LSTM. An RNN-LSTM incorporated GAN is able to better process and reproduce the long-term modalities and temporal aspects in time-series data compared to a conventional GAN which does not consider these properties. Exploiting the temporal modeling capabilities of RNN-LSTMs along with the latent conditional feature modeling power of label-incorporated GANs helps enhance the relevance of the generated scenarios for different end-applications.
An extensive validation of the proposed approach is also provided in which correlated scenarios are compared against uncorrelated scenarios for an actual power system application, namely, optimal power flow (OPF). For normal conditions, uncorrelated synthesis of scenarios has a performance similar to correlated scenarios. However, during abnormal conditions, the results obtained using correlated scenario generation are more realistic than those obtained using uncorrelated scenario generation.
In summary, the novel contributions of this paper are as follows:
Creation of a fine-tuned cross-correlated conditional recurrent GAN for multivariate scenario generation. This implicit generative model is scalable and yields relevant abnormal scenarios to augment limited historical data.
Formulation of a data-driven labeling process for historical data to eliminate the subjectivity associated with manual labeling.
Demonstration of the validity of correlated scenario generation for the power system OPF application in terms of cost and voltage angle distribution.
1.2. Paper Organization and Key Terms
Some of the salient terms used in this paper are explained here to provide the appropriate context.
Normal day refers to a day that follows the typical seasonal pattern.
Abnormal day refers to a day where one of the variables (RG and/or load) deviates significantly from the typical seasonal pattern. This is different from an abnormal operating condition/event that typically refers to line faults and sudden or unexpected load-shed/generator shut-down.
Scenario generation refers to creation of representative scenarios for long-term resource planning. This is different from scenario forecasting, which is typically used for short-term day-ahead planning.
Cross-correlated scenarios, one of the main contributions of this paper, refer to those representative scenarios that capture the inherent correlations between the variables. Implicit generative models are employed to extract these correlations.
The rest of the paper is structured as follows. Section 2 presents relevant insights drawn from the data-driven label assignment of the historical data used for the analysis conducted here. Section 3 provides a detailed look into the GAN architecture, selection, design, training, and implementation. Section 4 delves into an extensive analysis of the results obtained using the proposed method and their comparison with the uncorrelated scenario generation results. The conclusion is provided in Section 5.
3. Proposed Implicit Generative Model and Its Implementation
GANs are composed of two neural networks battling against one another (see Figure 3). The first neural network is called the generator, which aims to generate the synthesized samples. The second neural network is called the discriminator (or the critic). The discriminator’s job is to differentiate between the real and the generated samples. The main objective of a GAN is to learn the distribution of a real dataset and map it to a separate latent space, from which more samples, similar to the original dataset, can be synthesized.
Consider a dataset, $X$, with samples $x_t$ for time $t \in T$ and with dimensions $i$, whose distribution, $P_X$, is to be learned by the generative model. Noise vector inputs, $z$, are sampled from a latent space, $P_Z$, and the multi-layer perceptrons within the generator, $G$, are trained to map $P_Z$ to $P_X$ without explicitly training on $P_X$. This is accomplished by the generator producing samples (denoted by $G(z)$) whose distribution is as close to that of the real data as possible. In contrast, the discriminator, $D$, tries to distinguish the real samples from the generated ones and forces the generator to perform better. As the training progresses, the generator becomes better at producing realistic-looking samples, while the discriminator gets better at distinguishing generated samples from the real ones. The generator and the discriminator each minimize their own loss, and their training can be summarised as a two-player mini-max game with a value function $V(G, D)$.
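For reference, the original GAN formulation (Goodfellow et al.) writes the losses and the mini-max game as shown below. This is a sketch in assumed notation: $P_X$ is the data distribution, $P_Z$ the noise distribution, and $G$ and $D$ the generator and discriminator. The exact losses used for the proposed model may differ; Wasserstein-type variants, for example, drop the logarithms.

```latex
\begin{align}
L_D &= -\mathbb{E}_{x \sim P_X}\big[\log D(x)\big]
       - \mathbb{E}_{z \sim P_Z}\big[\log\big(1 - D(G(z))\big)\big], \\
L_G &= \mathbb{E}_{z \sim P_Z}\big[\log\big(1 - D(G(z))\big)\big], \\
\min_{G}\max_{D} V(G, D) &= \mathbb{E}_{x \sim P_X}\big[\log D(x)\big]
       + \mathbb{E}_{z \sim P_Z}\big[\log\big(1 - D(G(z))\big)\big].
\end{align}
```

In this form, $L_D = -V(G, D)$, so maximizing $V$ over $D$ is the same as minimizing the discriminator loss, while the generator minimizes the only term of $V$ that depends on it.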
3.1. Proposed Conditional Recurrent GAN
GANs can be trained conditionally by incorporating labels in the training dataset, giving the generator the ability to generate samples based on a certain event or condition. The label, $y$, can be any auxiliary information that can be appended to the real samples, $x$. The generator will then learn to associate a certain class of data with its label. After training has finished, the generator can be forced to produce only a certain class of samples by appending the corresponding label, $y$, to all the noise vectors. The value function of the conditional GAN has the same mini-max form as that of the unconditional GAN, with both networks conditioned on the label $y$.
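As a point of reference, the conditional GAN of Mirza and Osindero writes the conditioned value function in the following way. The notation ($P_X$, $P_Z$, $G$, $D$) is assumed here, and the expression is a sketch of the standard log-loss form rather than the exact objective used for the proposed model.

```latex
\begin{equation}
\min_{G}\max_{D} V(G, D) = \mathbb{E}_{x \sim P_X}\big[\log D(x \mid y)\big]
  + \mathbb{E}_{z \sim P_Z}\big[\log\big(1 - D(G(z \mid y) \mid y)\big)\big]
\end{equation}
```

The only change from the unconditional case is that the label $y$ is supplied to both networks, so the discriminator judges a sample together with its label.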
Since the available historical data was a multivariate time series, it was necessary to include recurrent layers in both the generator and the discriminator. Recurrent layers in the generator model retain the long-term modulations of the time series and help generate sequences that capture the fluctuations of the real data. In the discriminator, the recurrent layers help identify the sequential data better. The recurrent model of choice was RNN-LSTM, making the proposed machine learning model a conditional recurrent GAN. Note that the LSTM layer ensures that the recurrent GAN is properly trained to capture both short-term (daily) and long-term (seasonal) patterns in the time-series data. Furthermore, it leads to the generation of more homogeneous and valid training data for the GANs, which eventually leads to more consistent generated scenarios as the output of the GAN. The generator and the discriminator models each consist of three stacked LSTM layers, along with a linear output layer. The hyperparameters of the models were tuned by comparing the observed outputs to the expected results. To optimize the discriminator output, the discriminator was updated three times for every generator update to maintain the best estimation ratio between the data density and the model density [31]. The model details are given in Table 2.
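The conditioning and the update ratio described above can be illustrated with a small NumPy sketch. It is not the model itself: the function names, the noise dimension, and the one-hot encoding of six classes are assumptions made for illustration, and "thrice as much" is read here as three discriminator updates per generator update.

```python
import numpy as np

def conditioned_noise(batch_size, seq_len, noise_dim, label, n_classes, rng):
    """Return noise sequences with a one-hot label appended at every time step."""
    z = rng.standard_normal((batch_size, seq_len, noise_dim))
    one_hot = np.zeros(n_classes)
    one_hot[label] = 1.0
    # Repeat the same label for every sample and every hour of the sequence.
    y = np.broadcast_to(one_hot, (batch_size, seq_len, n_classes))
    return np.concatenate([z, y], axis=-1)

def update_schedule(n_generator_steps, discriminator_steps=3):
    """Yield which network to update: several discriminator steps, then one generator step."""
    for _ in range(n_generator_steps):
        for _ in range(discriminator_steps):
            yield "discriminator"
        yield "generator"

rng = np.random.default_rng(0)
# Hypothetical sizes: 4 sequences of 24 hourly steps, 8 noise features, 6 classes.
batch = conditioned_noise(batch_size=4, seq_len=24, noise_dim=8,
                          label=2, n_classes=6, rng=rng)
schedule = list(update_schedule(n_generator_steps=2))
```

Each row of `batch` would be the input to the first recurrent layer of a generator; the last six features carry the class label so that the same trained network can be steered toward any of the six datasets.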
3.2. Overall Implementation
The proposed approach of systematic model-free data segmentation and scenario generation using implicit generative models is summarized in Figure 4. First, the historical data is preprocessed by normalizing the different variables to their peak values and creating daily profiles. Next, the data is segmented in preparation for GAN training: the available data is classified by season, a representative day is selected for each season, and the normal/abnormal classification is then performed for each season, leading to six datasets (three seasons, with normal and abnormal days for each). The next phase involves training the generator and discriminator using the labeled correlated datasets. The hyperparameters are tuned, and the loss functions are monitored to achieve an equilibrium that indicates a fully trained GAN model. In the following phase, the GAN model is fed labeled noise to generate similar but distinct scenarios for each of the six datasets. Finally, statistical validation of the generated scenarios is performed before moving on to OPF-based validation against the historical data and against the existing methodology of uncorrelated scenario generation.
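The preprocessing step (peak normalization and daily profiles) can be sketched as follows. The function names and the synthetic two-day input are hypothetical, and the sketch assumes hourly data with both variables covering the same days; the seasonal and normal/abnormal labeling is not reproduced here.

```python
import numpy as np

def to_daily_profiles(hourly, hours_per_day=24):
    """Normalize an hourly series to its peak and reshape it into one row per day."""
    hourly = np.asarray(hourly, dtype=float)
    n_days = hourly.size // hours_per_day
    trimmed = hourly[: n_days * hours_per_day]  # drop an incomplete trailing day
    normalized = trimmed / trimmed.max()
    return normalized.reshape(n_days, hours_per_day)

def stack_variables(load_hourly, solar_hourly):
    """Keep load and solar of the same day together so their cross-correlation is preserved."""
    load = to_daily_profiles(load_hourly)
    solar = to_daily_profiles(solar_hourly)
    return np.stack([load, solar], axis=-1)  # shape: (days, 24, 2)

# Synthetic two-day example (illustrative values only, not utility data).
hours = np.arange(48)
load_hourly = 50.0 + 10.0 * np.sin(2 * np.pi * (hours - 9) / 24)
solar_hourly = 20.0 * np.clip(np.sin(2 * np.pi * (hours - 6) / 24), 0.0, None)
profiles = stack_variables(load_hourly, solar_hourly)
```

Stacking the two variables along the last axis, rather than storing them separately, is what lets a single multivariate model see load and solar for the same day at once.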
4. Results and Analyses
The proposed approach was tested on a dataset provided by a power utility located in the US Southwest. The dataset comprised two years of hourly solar generation and load demand profiles at the transmission level. The nature of the data allowed for capture of temporal and cross-correlations within the variables. However, as no spatial information was provided with the dataset, spatial correlations could not be captured. After preprocessing and normalizing the dataset, it was segregated into summer, shoulder, and winter seasons, followed by classification into normal and abnormal days. The proposed conditional recurrent GAN was trained with these datasets. The GAN-generated scenarios were then evaluated for their similarity to the historical data in the same category. Comparison of individual generated profiles for each variable to the historical profiles showed a good match, as shown in Figure 5 for summer normal real and generated load, and Figure 6 for summer normal real and generated solar. As is evident from the figures, the seasonal segmentation results in scenarios that closely track the temporal variations of the real dataset.
4.1. Statistical Validation of Proposed Implicit Generative Model
Going beyond visual confirmation, we performed rigorous statistical analysis to investigate the performance of the proposed scenario generation methodology. The statistical measure employed was the auto-correlation function (ACF), which defines how data points in a time series are related, on average, to the preceding data points.
Under normal conditions, the ACF shapes of the real and generated datasets for both load and solar were found to be very similar (see Figure 7a,b). The highest positive correlation at one hour for both variables confirms that the nearest temporal value has the highest correlation to any data point. However, since the normal solar peak and zero production times in summer are roughly 10 h each, the highest negative correlation occurs at a 10-h lag for the solar profile. The normal summer load pattern shown in Figure 5 is quasi-sinusoidal with peak and valley 12 h apart, which is consistent with the negative ACF peak for the load at a 12-h lag. A similar pattern is observed for normal days in other seasons with slight variations in negative ACF peak location.
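The link between a quasi-sinusoidal daily pattern and a negative ACF peak at a 12-h lag can be checked with a short sketch. The load series below is an idealized sinusoid with a 24-h period, not the utility data, and `acf` is a plain sample autocorrelation written for this illustration.

```python
import numpy as np

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: x.size - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# Idealized normalized load: 30 days of hourly values, peak and valley 12 h apart.
hours = np.arange(24 * 30)
load = 0.7 + 0.3 * np.sin(2 * np.pi * hours / 24)

rho = acf(load, max_lag=23)
most_negative_lag = int(np.argmin(rho))           # lag with the strongest negative correlation
most_positive_lag = int(np.argmax(rho[1:])) + 1   # strongest positive correlation, excluding lag 0
```

For this idealized profile, the strongest negative correlation falls at the half-period and the strongest positive correlation at the nearest hour, which mirrors the behavior reported for the load ACF.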
Under abnormal conditions, the load correlation shapes show a similar pattern to their counterparts under normal conditions, but a slight difference is observed between real and generated shapes for solar (see Figure 7c,d). This happens because the cross-correlated nature of the proposed GAN can bias one or both of its outputs (solar ACF in Figure 7d), as it is trained on both the variables. Therefore, its accuracy in producing matching scenarios for any one variable might be lower. However, we demonstrate in Section 4.2.2 that for actual power system applications, creating scenarios where the cross-correlations are considered results in more realistic outcomes.
4.2. Comparison with Uncorrelated Scenario Generation for Power System Application
To highlight the value of the correlated scenario generation process, two additional GANs were trained using the same historical data: one for the independent generation of load sequences and one for the independent generation of solar sequences. These univariate uncorrelated GANs (termed load GAN and solar GAN) generate seasonal (normal/abnormal) scenarios for load and solar generation, respectively. Note that many GAN-based scenario generation techniques proposed recently are univariate and hence, uncorrelated (e.g., [22,23]). Therefore, the subsequent analysis is a comparison of the proposed methodology with the state of the art.
The selection of baseline days for uncorrelated scenarios is an important but challenging consideration. As load and solar are processed independently, there is no guaranteed or consistent overlap between the labeled training data for each set. This disjunction is more clearly pronounced for abnormal days. For example, an abnormal summer day for the load (very high load) can be vastly different from an abnormal summer day for solar generation (cloudy or rainy day). Thus, it is impossible to determine baseline days satisfying the same load and solar generation conditions.
One strategy could be to assume that the baseline days were identical for correlated and uncorrelated data. However, doing so will yield consistently favorable results for the correlated scenario generation approach since the baselines are drawn from its training dataset. Consequently, to avoid this possible (implicit) bias in favor of the proposed approach, the following strategy was devised in this paper: the baseline days for the uncorrelated scenarios were synthesized independently from the two training datasets (load and solar generation). Separate comparisons were then made between each approach’s generated and baseline values.
4.2.1. Validation Using Optimal Power Flow (OPF) Analysis
To evaluate the performance of the generated scenarios for power systems applications, the generated solar and load profiles were applied to a modified IEEE 30-bus system [32]. A futuristic generation scenario was evaluated, where all the load buses also have solar generation. OPF was run under different ratios of solar generation peak to load peak, ranging from 0.3 to 1.2. Since the scenarios are derived from the historical dataset, the OPF converged for all the scenarios. To lend statistical validity to the exercise, 900 (=30 × 30) scenarios were generated for both correlated and uncorrelated methodologies for each of the 6 classes (3 seasons × normal/abnormal). This enabled application of 30 distinct and randomly assigned profiles to all the buses of the system for one OPF computation. The OPF itself was run 30 times (each time with a completely different set of profiles) to ensure consistency of the results.
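One way to split the 900 generated scenarios into 30 non-overlapping sets of 30 profiles is sketched below. This is an illustration under the assumption that "completely different set" means no scenario is reused across OPF runs; the function name and the fixed seed are hypothetical.

```python
import numpy as np

def assign_profiles(n_scenarios=900, n_runs=30, rng=None):
    """Randomly split scenario indices into disjoint groups, one group per OPF run."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(n_scenarios)               # every index appears exactly once
    return order.reshape(n_runs, n_scenarios // n_runs)

# Row r lists the 30 scenario indices applied to the 30 buses in OPF run r.
assignment = assign_profiles(rng=np.random.default_rng(0))
```

Because the rows come from a single permutation, no scenario index appears in more than one run.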
The distributions for each iteration were compared with the baseline to compute the distance between the two; the Wasserstein distance was used as a measure for this comparison. Additionally, the OPF results provided costs by the hour for each iteration. Finally, the voltage-angle data based on the hour/iteration/bus were collected for further analysis. The results were evaluated from multiple perspectives. Each methodology (correlated and uncorrelated scenario generation) was compared against its baseline to identify which would generate more realistic scenarios. Furthermore, comparisons were made over iterations to evaluate the consistency and on an hourly basis to identify if the gap between generated and baseline scenarios has any time-of-day dependence. The voltage-angle distribution plots were plotted for three different hours: 07, 12, and 17.
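For one-dimensional samples of equal size, the Wasserstein distance reduces to the mean absolute difference between the sorted samples, which the following sketch uses. The cost values are synthetic and the helper name is hypothetical; for unequal sample sizes, a general-purpose routine such as `scipy.stats.wasserstein_distance` would be the usual choice.

```python
import numpy as np

def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size one-dimensional samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(a - b)))

rng = np.random.default_rng(0)
baseline = rng.normal(loc=100.0, scale=5.0, size=1000)  # synthetic baseline costs
generated = baseline + 2.0                              # same shape, shifted by 2 units

d_shift = wasserstein_1d(baseline, generated)  # equals the size of the shift
d_same = wasserstein_1d(baseline, baseline)    # identical samples give zero
```

A pure shift of the distribution shows up as a distance equal to the shift, while identical samples give zero, which is why a low but non-zero value is the target for similar yet distinct scenarios.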
4.2.2. Results and Discussion for Abnormal Conditions
Figure 8 shows the shoulder season hourly OPF costs, averaged over 30 OPF iterations, for the solar-to-load ratio of 0.6. The correlated generated scenarios track their baseline much more closely than the uncorrelated scenarios. In addition, the baselines for the two cases show significant differences. The abnormal conditions typically signify lower solar generation (due to cloudy or rainy conditions), which is often accompanied by a lighter load (due to lower cooling requirements). However, the baseline of the uncorrelated case shows significantly higher OPF costs that result from the unrealistic combination of independently derived abnormal conditions (high load and no to low solar generation). The generated scenarios overestimate the costs (i.e., a combination of higher load and lower solar generation), resulting in grossly unrealistic scenarios.
A similar case is presented in Figure 9 for the summer abnormal situation, where the correlated scenario’s costs track the baseline costs well (similar to the shoulder abnormal case). The baselines for correlated and uncorrelated cases are more closely aligned compared to the shoulder abnormal case (except for a few morning hours), but the uncorrelated generated scenarios underestimate the cost by a large margin. Although not shown in the figures to ensure clarity, the behavior for the other solar-to-load ratios was consistent with these results.
In the case of the winter abnormal shown in Figure 10, the baseline costs between correlated and uncorrelated scenarios differ significantly, similar to the shoulder abnormal case. The correlated scenarios are much closer to their baseline than the uncorrelated ones. However, it is observed that the correlated scenarios are overestimating the costs between the hours of 8 AM and 6 PM, indicating that the generated solar scenarios are lower than the baseline. Under the winter abnormal conditions, the solar profiles are predominantly low with a few exceptions, so the GAN is getting trained to generate lower solar profiles. However, since the baseline does contain some higher solar generation profiles, there is some gap between the baseline and the correlated scenarios.
In contrast to Figure 8, which provides averaged hourly OPF cost profiles over 30 iterations, Figure 11 depicts the average cost per hour for different OPF iterations for the solar-to-load ratio of 0.6. The correlated variations are narrower in range and closer to the baseline. This chart also underscores the baseline difference discussed above.
Another way to view the differences between the correlated and uncorrelated approaches is to look at the voltage angle distributions for the 30-bus system. Figure 12 shows the probability density functions (PDFs) of the voltage angles for 5 PM for correlated and uncorrelated scenarios for all three seasons. The better overlap with the baseline distribution is clearly visible for the correlated scenarios. These plots are for one of the 30 iterations, but a similar pattern was observed for other hours and for all iterations, albeit with some variability. The larger difference between the correlated and uncorrelated scenarios in the shoulder season may be partially attributable to the data segmentation technique used in Section 2.2. However, the distinction between the two scenario generation methods is still evident in the other seasons.
Table 3 shows the numerical results for abnormal seasonal daily OPFs. It covers solar-to-load ratios from 0.3 to 1.2 for all three seasons and correlated and uncorrelated conditions and reinforces the results and conclusions from earlier plots. The Wasserstein distances for correlated cases are lower than uncorrelated cases under most conditions, often by large margins. The Wasserstein distance should be a low number, but not 0, as we are aiming to obtain similar, but distinct scenarios. Correlated scenarios achieve this objective much better than uncorrelated scenarios, with a minor exception of high solar-to-load ratios in winter, for which the results are comparable. Moreover, the costs for the uncorrelated scenarios for the summer and shoulder seasons point to totally misleading results. For instance, even under abnormal conditions, summer costs should be highest due to high load, and shoulder costs should be lowest due to a combination of low load and good solar generation. However, the uncorrelated scenarios are showing the exact opposite behavior.
4.2.3. Results and Discussion for Normal Conditions
The difference between correlated and uncorrelated scenarios is not as significant under normal conditions. In fact, the uncorrelated scenarios showed a closer match to the baseline data in the summer season than the correlated scenarios, as shown in Table 4. This is understandable as the solar and load profiles for each season do not have many deviations under normal conditions, and the single-variable nature of uncorrelated scenarios allows the corresponding GAN to be trained better for normal, independent signals. However, it was observed that for winter (see Figure 13), the uncorrelated scenarios are farther from their baselines (depicting lower costs) due to the overestimation of the solar generation. The Wasserstein distances for shoulder normal shown in Figure 14 indicate that the uncorrelated distances are higher than correlated ones for most hours of the day. The voltage-angle plots for three different hours for winter normal correlated scenarios (see Figure 15) demonstrate that the distributions match the baseline very well.
Table 4 shows the normal seasonal summary results for solar-to-load ratios from 0.3 to 1.2 for all three seasons and correlated and uncorrelated conditions. It can be observed from the tables that the Wasserstein distances for normal conditions are similar (both are low) for correlated and uncorrelated scenarios. Similarly, the cost distinctions are minor under most conditions. However, the uncorrelated scenarios are consistently underestimating the costs for the shoulder season, which is in direct contrast to their behavior under abnormal conditions. As a result, the uncorrelated scenario-based OPF may demonstrate unreasonably high variations in OPF costs between normal and abnormal scenarios, leading to non-optimal outcomes from a long-term reliability planning perspective.
4.3. Practical Significance
Since many resource planning activities aim to distinguish abnormal conditions from normal conditions, it is helpful to compare how the generated abnormal scenarios differ from the generated normal scenarios. For correlated cases, the costs for the abnormal scenarios are consistently and reasonably higher than the costs of the normal scenarios due to the lower solar production on abnormal days. Winter days show the largest and most consistent gap through the day (see Figure 16), indicating the need for longer-duration traditional generation or battery backup. For the shoulder (see Figure 17) and summer seasons, the gap between normal and abnormal is smaller and restricted to fewer hours of the day, indicating that the backup requirements may be less. For uncorrelated scenarios, consistency is absent: for the shoulder season, the abnormal scenarios grossly overestimate the net load (as shown in Figure 17); for summer, they show lower costs than the normal scenario, and no reasonable conclusions can be drawn from them. In summary, through the OPF application, we have demonstrated the ability of correlated scenario generation to create valid representative power system scenarios that are a prerequisite for long-term resource planning. In the future, we will apply the scenarios generated using the proposed approach to solve the optimal BESS sizing and siting problem [33].
5. Conclusions
As the exploration of ways to understand and analyze the impacts of RG on grid reliability continues, synthetically generated representative scenarios will play an increasingly vital role. Due to legacy practices and/or ease of application, uncorrelated/univariate scenario generation is often used for such exploration. However, this may lead to outcomes that are not realistic. This paper demonstrates the utility of correlated multivariate scenario generation in understanding and analyzing normal and abnormal system conditions.
The proposed systematic end-to-end methodology for correlated scenario generation has the following components:
Structured and model-free data segmentation.
An informed selection/design of a cross-correlated conditional recurrent generative adversarial network.
Generation of correlated representative scenarios that augment the original dataset.
Extensive and application-oriented validation that proves the value of the proposed methodology.
Overall, correlated scenario generation was seen to create more realistic profiles due to the integration of both solar generation and load demand in the training of the proposed GAN. From the OPF application evaluation, the following key conclusions are drawn:
The correlated scenario generation resulted in lower and more accurate average hourly costs across the seasons (as shown in Figure 11 and Table 3).
From the voltage angle distributions, it was observed that the correlated scenarios are more similar to the real case compared to uncorrelated scenarios (as shown in Figure 12).
Seasonal performance analyses highlighted why inferences drawn from uncorrelated scenarios might be misleading (results from Table 3 and Table 4).
It was also shown that the results from uncorrelated scenarios are adequate for normal days, but this can lead to misplaced conviction about their applicability to abnormal scenarios (results from Table 3 and Table 4).
The proposed methodology is voltage-level agnostic, scalable, and portable to different datasets, geographies, and end-application requirements. It can be used to analyze the reliability and resilience issues with various renewable energy penetration levels and come to definitive conclusions about deploying these resources. The proposed approach currently captures cross-correlations and temporal correlations that exist between RG and loads. With the right dataset and minor modifications, it can also be extended to capture spatial correlations between the different variables.