4.2. Methods for Training the Model
A power system produces large volumes of heterogeneous data that can be monitored to provide risk forewarning for system operation, including voltage, current, power, frequency, and phase angle data. The resulting power data samples are high-dimensional and large-scale. However, many features of the obtained power data are strongly correlated in terms of their physical meaning and are therefore redundant, which degrades the training effectiveness of deep learning models [40]. Feature selection and data processing are therefore required. In this paper, feature selection is conducted from the following three aspects.
- 1.
Variance-based Feature Selection;
This method calculates the variance of each feature and treats features whose variance falls below a predefined threshold as redundant; such features can be removed directly.
- 2.
Correlation-based Feature Selection;
If features are highly correlated, their trends across the samples are generally similar, which can reduce the generalization ability of deep learning models. Such features are known as collinear features, and usually only one of them needs to be retained. In this paper, the Pearson correlation coefficient is used to measure the correlation between two features:

$$\rho_{XY} = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}}$$

where $n$ denotes the number of samples; $x_i$ and $y_i$ denote the sample points of the two features; and $\bar{x}$ and $\bar{y}$ denote the sample means of the two features. The closer $\left|\rho_{XY}\right|$ is to 1, the higher the correlation between the two features.
- 3.
Principal Component Analysis (PCA).
PCA is a technique that linearly maps the original features to a new set of features in such a way that the new features are uncorrelated with each other. This helps in reducing redundancy among the features.
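As a concrete illustration of these three steps, the sketch below chains a variance filter, a Pearson-correlation filter, and PCA using scikit-learn. The toy matrix X, the thresholds (0.05 and 0.95), and the retained-variance ratio are illustrative assumptions, not values from this paper.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))                    # toy stand-in for monitored power data
X[:, 3] = 0.01 * rng.normal(size=200)             # near-constant feature (low variance)
X[:, 5] = X[:, 0] + 0.01 * rng.normal(size=200)   # collinear with feature 0

# 1. Variance filter: drop features whose variance is below a threshold.
X_var = VarianceThreshold(threshold=0.05).fit_transform(X)

# 2. Pearson-correlation filter: for any pair with |rho| above a threshold,
#    keep only the first feature of the pair.
corr = np.corrcoef(X_var, rowvar=False)
upper = np.triu(np.abs(corr), k=1)
drop = {j for i, j in zip(*np.where(upper > 0.95))}
X_corr = X_var[:, [j for j in range(X_var.shape[1]) if j not in drop]]

# 3. PCA: map the remaining features to mutually uncorrelated components.
X_pca = PCA(n_components=0.99).fit_transform(X_corr)  # keep 99% of the variance
print(X.shape, '->', X_pca.shape)
```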
In addition, deep learning models can exhibit excellent classification performance on datasets where the number of samples in each class is not significantly different. However, in reality, the majority of data samples are imbalanced, meaning there is a substantial difference in the number of samples between different classes. For a binary classification problem, we refer to the class with fewer samples as the minority class and the class with more samples as the majority class.
Sample imbalance can significantly affect the training effectiveness of deep learning models [41]. Because minority-class samples are scarce, a model may fail to learn their underlying patterns, or the patterns it does learn may be gradually overshadowed by those learned from the majority class. For example, given a dataset of 10,000 samples with 9990 positive and 10 negative samples, a model may classify all 9990 positive samples correctly while misclassifying all 10 negative samples; its accuracy is 99.9%, yet it entirely fails to distinguish negative samples. In the field of electrical power systems, serious accidents are rare due to stringent safety measures, so the resulting datasets exhibit typical sample imbalance. Addressing the sample imbalance issue is therefore necessary.
In the field of deep learning, common methods for addressing the problem of imbalanced sample training include oversampling and undersampling.
- 1.
Oversampling;
Oversampling balances the dataset by increasing the number of minority-class samples. A common approach is data replication, in which minority-class samples are duplicated to increase their representation in the training set. This method is simple and straightforward, but the redundant information introduced by duplicate samples may cause the model to overfit.
- 2.
Undersampling.
Undersampling balances the dataset by reducing the number of majority-class samples. Cluster-based undersampling is a common technique: the majority-class samples are divided into several clusters, and representative samples are selected from each cluster. Undersampling reduces training time and memory consumption, but it may discard important information from the majority-class samples. A sketch of the cluster-based variant is given below.
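The following is a minimal sketch of cluster-based undersampling; the function name, signature, and the choice of k-means are illustrative assumptions, not this paper's method (which uses SMOTE instead).

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_undersample(X_major, n_keep, seed=0):
    """Reduce the majority class to n_keep representatives: cluster the
    majority-class samples and keep the one nearest to each cluster centroid."""
    km = KMeans(n_clusters=n_keep, n_init=10, random_state=seed).fit(X_major)
    nearest = [np.argmin(np.linalg.norm(X_major - c, axis=1))
               for c in km.cluster_centers_]
    return X_major[nearest]
```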
To address the issue of imbalanced training samples, this study employs the Synthetic Minority Over-Sampling Technique (SMOTE), a widely used oversampling method. Rather than simply replicating existing samples, SMOTE creates synthetic minority samples by interpolating between existing ones. The technique was inspired by an algorithm proposed in a handwritten character recognition project. The main procedure is as follows: first, for each minority sample, its K nearest neighbors among the other minority samples are selected; then, linear interpolation is performed between the sample and these neighbors, governed by a random interpolation factor and the oversampling rate, generating synthetic minority samples. The SMOTE interpolation formula is shown as Equation (26):
$$x_{\text{new}} = x + \delta \times \left(x_k - x\right) \quad (26)$$

where $x_{\text{new}}$ denotes the generated synthetic sample, $x$ denotes the original minority sample, $x_k$ is one of the $K$ nearest neighbor samples to $x$ ($K$ is a defined hyperparameter), and $\delta$ is a random number between 0 and 1.
SMOTE interpolates between similar minority-class samples to generate representative synthetic samples. It therefore mitigates overfitting to a certain extent and better expands the decision space of the minority class. The pseudocode of the SMOTE algorithm is given in Algorithm 1 below.
Algorithm 1: The pseudocode of the SMOTE algorithm.
- Input: the number of samples in the minority class, denoted as T; the oversampling rate N%; and the number of nearest neighbors K.
- Output: synthesized minority samples, denoted as T*.
For i = 1 to T
    Obtain the K nearest minority-class neighbors of minority sample i and store them in Karray
    n = N
    While n != 0
        SMOTE(n, i, Karray)   // linear interpolation within Karray per Equation (26)
        n = n - 1
    Endwhile
Endfor
Return T*
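For reference, a minimal NumPy sketch of Algorithm 1 follows. The function name smote, the matrix X_min holding the minority-class samples, and n_per (the number of synthetic samples generated per original sample, i.e., N%/100) are illustrative assumptions.

```python
import numpy as np

def smote(X_min, n_per, k=5, seed=0):
    """Generate n_per synthetic samples per minority sample via Equation (26):
    x_new = x + delta * (x_k - x), with x_k one of x's k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for i, x in enumerate(X_min):
        d = np.linalg.norm(X_min - x, axis=1)  # distances to all minority samples
        d[i] = np.inf                          # exclude the sample itself
        knn = np.argsort(d)[:k]                # indices of the k nearest neighbors
        for _ in range(n_per):
            x_k = X_min[rng.choice(knn)]       # pick a random neighbor
            delta = rng.random()               # random factor in (0, 1)
            synthetic.append(x + delta * (x_k - x))
    return np.array(synthetic)
```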
Combining the above feature selection methods with the imbalanced-sample handling method, the training process of the intelligent risk forewarning model for power system operation based on SW-ELM is as follows (a sketch of the overall pipeline is given after this list):
- 1.
Division of the power system into regions;
Divide the entire power system into several regions based on geographical location, where each region consists of multiple nodes and lines.
- 2.
Modeling the correlation of strong wind among multiple regions;
Utilize the vine-copula tool mentioned in
Section 2.2 to establish a correlation model for strong wind among different regions in the power system, based on historical wind speed data.
- 3.
Sampling of strong wind scenarios;
Using the established correlation model for strong wind among the multiple regions of the power system, combined with the Monte Carlo sampling method, sample the strong wind scenarios in each region. The total number of sampled scenarios is denoted as N1.
- 4.
Generation of normal operating scenarios for the power system;
Randomly generate a certain number of realistic normal operating scenarios for the power system, including the load conditions at each node and the output of the generators, based on historical data and statistical analysis methods.
- 5.
Combination of strong wind scenarios and normal operating scenarios;
Randomly combine the strong wind scenarios obtained in Step 3 with the normal operating scenarios generated in Step 4, producing different scenarios of normal system operation under strong wind conditions. Note that each strong wind scenario need not be combined with all the normal operating scenarios, only with a subset of them; the size of this subset is denoted as N2.
- 6.
Calculation of the probability of line outage and tower collapse faults;
Based on the different random combinations obtained in Step 5, use the method described in
Section 3.1 to calculate the probabilities of line outage and tower collapse faults occurring in each region.
- 7.
Generation of fault operation scenarios in the power system;
Based on the probabilities of line outage and tower collapse faults calculated in Step 6, use the Monte Carlo sampling method to randomly sample faults for each random combination obtained in Step 5. A total of N3 different faults are sampled for each random combination.
- 8.
Calculation of power system risk level division results;
According to the power system operation risk indicators and risk level division principles described in
Section 3.2 and
Section 3.3, calculate the division results of the power system risk levels. These results will serve as the output data for SW-ELM and will be used for subsequent model training.
- 9.
Collection of power system information;
Collect the active and reactive load, voltage, and phase angle information of the busbars in the power system; the active and reactive power output and operational status of the generators; the active and reactive power at both ends of the lines; and the locations of faults.
- 10.
Feature selection;
Utilize feature selection methods to select features from the collected information in Step 9, determining the input data for SW-ELM.
- 11.
SW-ELM training.
Use the power system risk level division results obtained in Step 8 as the output data and the feature-selected data from Step 10 as the input data for SW-ELM. Train the SW-ELM model in combination with the imbalanced-sample handling method. After training is completed, SW-ELM can be used for intelligent risk forewarning in power system operation. Additionally, note that since the random combinations obtained in Step 5 represent normal operating states of the power system and Step 7 extracts different fault conditions for each of them, the total number of training samples for SW-ELM is N1 × N2 × N3.
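To make the eleven steps concrete, the sketch below outlines the training pipeline in Python. All helper names (the forewarning module, sample_wind_scenarios, generate_operating_scenarios, outage_probability, sample_faults, collect_features, risk_level, select_features, apply_smote, and SWELM) are hypothetical placeholders for the methods of Sections 2.2 and 3.1-3.3 and for the SW-ELM model itself, and the counts N1, N2, N3 are illustrative.

```python
import numpy as np

# Hypothetical placeholders for the paper's methods: vine-copula wind sampling
# (Section 2.2), fault probabilities (Section 3.1), risk-level division
# (Sections 3.2 and 3.3), and the SW-ELM model.
from forewarning import (sample_wind_scenarios, generate_operating_scenarios,
                         outage_probability, sample_faults, collect_features,
                         risk_level, select_features, apply_smote, SWELM)

N1, N2, N3 = 1000, 5, 3   # illustrative wind-scenario / combination / fault counts

X, y = [], []
for wind in sample_wind_scenarios(N1):                   # Step 3: Monte Carlo wind scenarios
    for op in generate_operating_scenarios(N2):          # Steps 4-5: random combinations
        p_fault = outage_probability(wind, op)           # Step 6: outage / tower collapse
        for fault in sample_faults(p_fault, N3):         # Step 7: Monte Carlo fault sampling
            X.append(collect_features(wind, op, fault))  # Step 9: system information
            y.append(risk_level(wind, op, fault))        # Step 8: risk-level label

X, y = np.array(X), np.array(y)
X = select_features(X)         # Step 10: variance / correlation / PCA selection
X, y = apply_smote(X, y)       # imbalanced-sample handling (Algorithm 1)
model = SWELM().fit(X, y)      # Step 11: train SW-ELM on the N1*N2*N3 samples
```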