1. Introduction
The prediction of icing is critical to the normal operation of wind farms in cold climates. Ice formation on the blades of wind turbines can cause load imbalance and structural damage, posing safety risks to the surrounding area [1]. Prediction of icing would allow operators to put remedies in place before power losses occur due to icing events [2]. Developing a prediction framework for an entire wind farm enables anticipating the amount of power loss to the grid and taking appropriate actions.
A review of machine learning approaches for the prediction of icing on wind turbines based on Supervisory Control and Data Acquisition (SCADA) data was conducted by our research team and is reported in [3]. This review discusses existing machine learning methods for the prediction of icing on wind turbines in detail. A brief summary of the articles reviewed is as follows. In [4], a data-driven neural network approach was used to predict icing on wind turbines based on SCADA data and historical weather data, reporting an accuracy of 83% for the dataset examined. In [5], Federated Learning (FL) was used to predict icing on wind turbine blades. Each turbine was trained as a local model and then a global model was aggregated from all the local models, reporting a prediction accuracy of 70% for the dataset examined. In [6], Random Forest (RF) was used to predict icing events on wind turbine blades, reporting an accuracy of 74% for the dataset examined. In [7], a Recurrent Neural Network (RNN) was used to predict icing, reporting an accuracy of 72% for the dataset examined. In [8], a data-driven Graph Neural Network (GNN) was used to predict icing on wind turbine blades, reporting an accuracy of 75% for the dataset examined.
In our previous work [9], we developed a framework to predict icing on a wind turbine. A Temporal Convolutional Network (TCN) prediction model was used, which generated an average prediction accuracy of 77.6% for future times up to 48 h (2 days) ahead. Only SCADA data and meteorological data were used as input to the prediction model; it did not rely on the installation of any additional sensors on the turbine.
In this follow-up work, two questions are addressed:
Can the predictors trained on one turbine of a wind farm be used to conduct icing predictions for the other turbines in the wind farm?
How can the turbine-level prediction framework be extended to an entire wind farm?
The first question is addressed by carrying out cross-validation, i.e., by examining the generalization ability of TCN predictors trained on a single turbine. In other words, predictors trained on one turbine are tested on the other turbines in the same wind farm. The common performance measures of accuracy and F1-score are used to evaluate the generalization ability. Accuracy represents the fraction of predictions that are performed correctly across all the predictions performed. The F1-score provides a combined measure that is inversely related to the number of false positive and false negative predictions across all the predictions performed. A higher F1-score indicates fewer incorrect predictions; see [9] for the formulas for accuracy and F1-score. The second question is addressed by carrying out two fusion approaches: decision fusion and feature fusion. Fusion combines results from multiple turbines and then gives final predictions for the wind farm. In decision fusion, prediction is performed for each turbine independently. Then, all individual prediction decisions are combined by majority voting to obtain a farm-level icing prediction. In feature fusion, features of all individual turbines are combined via averaging. Then, farm-level icing prediction is achieved by one predictor per prediction horizon. Fusion approaches have been previously used in other engineering applications, e.g., [10,11,12]. However, it is worth mentioning that this is the first time fusion approaches are used to achieve farm-level icing prediction. More specifically, the contributions of this work are twofold: (i) examination of the generalization ability of predictors trained on a single turbine for the other turbines in a wind farm, and (ii) the development of a farm-level icing prediction framework based on two fusion approaches.
The remainder of this paper is organized as follows.
Section 2 describes the cross-validation study conducted to answer the first question.
Section 3 describes the fusion approaches to conduct icing prediction for an entire wind farm answering the second question. The icing prediction results for an entire farm are reported and discussed in
Section 4. Finally, the paper is concluded in
Section 5.
2. Cross Validation: Generalization Ability of a Single Turbine Predictor
In [9], we developed a prediction framework to forecast icing on wind turbines up to 2 days ahead using only SCADA data and, if available, meteorological data. This approach is based on TCN predictors for different times in the future (prediction horizons). The prediction framework includes the modules of data preprocessing, prediction model training and testing, and prediction model evaluation. Based on the SCADA data from a single turbine, our TCN predictors produced an average prediction accuracy of 77.6% across prediction horizons from 10 min ahead to 2 days ahead.
The SCADA dataset used in this paper is from a wind farm located in the northern part of the US. This dataset includes 11 features or variables of all the turbines in the wind farm, measured every 10 min from January 2023 through July 2023. These features are listed in Table 1. In addition to the SCADA dataset, the weather features listed in Table 2 for the same location and time period were acquired from the VisualCrossing weather database [13].
For the utilization of these predictors at the farm level, it is necessary to examine their performance on SCADA data from other turbines. The wind farm layout considered is shown in
Figure 1. The rated power of each wind turbine in the farm is 2 MW with a cut-in wind speed of 4 m/s, a rated wind speed of 12 m/s, and a cut-out wind speed of 25 m/s. The prevailing wind direction is shown in the figure.
Cross-validation is often used to evaluate the performance of a model on unseen data [14]. An illustration of the cross-validation conducted here is shown in Figure 2. For each turbine, its predictors are trained using its own SCADA data and are then tested on the SCADA data of all the other turbines in the wind farm. An assessment metric M_{i,j}, consisting of accuracy and F1-score, is used to evaluate the generalization ability, where i denotes the turbine index a predictor is trained on and j denotes the turbine index the predictor is tested on. The metric M_{i,j} is defined as follows:

M_{i,j} = (A_{i,j}, F1_{i,j}),

where the accuracy A_{i,j} is a reflection of the number of correct predictions whereas the F1-score F1_{i,j} is a reflection of incorrect predictions. The equations for accuracy and F1-score appear in [9]. Note that each A_{i,j} value and F1_{i,j} value are averaged over the prediction horizons. The average assessment metric M̄_k is obtained by averaging M_{k,j} along j, i.e., over all the testing turbines. The average assessment metric M̄_k can thus be used to assess the prediction performance of the predictors that are trained on turbine k and tested on all the other turbines.
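The bookkeeping of this cross-validation can be sketched as follows. The training and evaluation routines are placeholders standing in for the TCN training and horizon-averaged evaluation of [9], not their implementation.

```python
def cross_validate(turbine_data, train_fn, eval_fn):
    # turbine_data: mapping from turbine id to that turbine's SCADA data.
    # train_fn(data) returns a predictor; eval_fn(predictor, data) returns
    # an (accuracy, F1) pair already averaged over the prediction horizons.
    turbines = sorted(turbine_data)
    M = {}  # M[(i, j)]: predictor trained on turbine i, tested on turbine j
    for i in turbines:
        predictor = train_fn(turbine_data[i])
        for j in turbines:
            if j != i:
                M[(i, j)] = eval_fn(predictor, turbine_data[j])
    # Average metric per training turbine k over all testing turbines j.
    M_bar = {}
    for k in turbines:
        rows = [M[(k, j)] for j in turbines if j != k]
        M_bar[k] = tuple(sum(vals) / len(rows) for vals in zip(*rows))
    return M, M_bar
```

With stub functions in place of the real predictor, the routine produces the per-pair metrics and the per-turbine averages plotted in Figure 3.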
The outcome of the cross-validation is provided in
Figure 3. For the predictors trained on each turbine, the average accuracy and F1-score (average assessment metric) over the testing data from the other turbines are plotted. This figure shows the generalization ability of the predictors trained on a single turbine across all the other turbines. The turbines T1, T13, T15, and T24 were not fully operational and thus were not used in our analysis, which explains the missing accuracy and F1-score values in Figure 3 for these turbines.
The predictor trained on turbine T55 exhibited the highest accuracy and F1-score (dashed vertical line in Figure 3). The accuracy and F1-score of the predictors trained on T55 and tested on all the other turbines are shown in Figure 4. The average metric (defined in Figure 2) consists of an average accuracy of 86.10% and an average F1-score of 0.50, indicating that the predictors trained on turbine T55 have the best performance when predicting icing on the other turbines in the same wind farm.
For each tested turbine, accuracy and F1-score are drawn as a box plot. The box plot indicates the accuracy and F1-score range across all the prediction horizons, from 10 min ahead to 2 days ahead. Each box plot contains statistical information including the minimum, maximum, median, first quartile (Q1), and third quartile (Q3) values. It is seen from the box plots that the accuracy and F1-score values vary between tested turbines, indicating that the predictors trained on T55 can perform well on some turbines but not on others.
The predictor trained on turbine T56 exhibited the lowest accuracy and F1-score. The accuracy and F1-score of the predictors trained on T56 and tested on all the other turbines are shown in Figure 5. The average metric (defined in Figure 2) consists of an average accuracy of 63.88% and an average F1-score of 0.39, indicating that the predictors trained on turbine T56 have the worst performance when predicting icing on the other turbines in the same wind farm. By comparing Figure 4 and Figure 5, it can be observed that the box plots in Figure 5 have lower mean values and higher variances than those in Figure 4, indicating that the predictors trained on turbine T55 outperform the predictors trained on turbine T56.
The above analysis suggests that individual turbine predictors, trained on the data associated with a specific turbine and tested on the data associated with another turbine, can perform well when the distributions of the SCADA features of the testing and training data are close, and may not perform well when the distributions of the features are not close. As an example, Table 3 shows the Fisher distance [9,15], a measure of the closeness of two distributions, for the features of three turbines, where the predictors are trained on T55 and tested on T54 and T56, respectively. By inspecting the features in Table 3, the Fisher distance discrepancies between testing turbine T56 and training turbine T55 are more significant than those between testing turbine T54 and T55. Therefore, the testing accuracy on T56 is lower than the testing accuracy on T54, as illustrated in Figure 6. This figure shows the distribution across all 288 prediction horizons (10 min to 2 days ahead) of the prediction accuracy when T54 and T56 use the predictors from T55. While the histograms for T55 and T54 almost overlap, the histogram for T56 is skewed to the left, clearly showing reduced accuracy.
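The Fisher distance between two feature samples can be sketched as below; this uses one common univariate form of the distance (squared mean difference normalized by the sum of variances), which should be checked against the exact definition in [9,15].

```python
from statistics import mean, pvariance

def fisher_distance(x, y):
    # Distance between the empirical distributions of one feature measured
    # on two turbines: large when the means differ relative to the spread.
    mx, my = mean(x), mean(y)
    vx, vy = pvariance(x), pvariance(y)
    return (mx - my) ** 2 / (vx + vy)
```

A feature whose per-turbine samples yield a large value by this measure (as T56 does relative to T55 in Table 3) signals a distribution mismatch that degrades the transferred predictor.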
Hence, the answer to the question posed earlier “Can the predictors trained on one turbine in a wind farm be used to conduct icing predictions for all the other turbines in the same wind farm?” is that the predictors trained on one turbine can be used to conduct predictions on the other turbines only if the distributions of the SCADA data features used for training are close to the distributions of the features for the other turbines. However, since there are variations of SCADA data among the turbines in a wind farm, the distributions of the features may not be close between different turbines, and thus one cannot generalize the predictors of one turbine to other turbines in a wind farm.
3. Farm-Level Prediction by Fusion
In this section, first, an overview of the framework reported in [9] using single-turbine SCADA data is provided to set the stage for conducting prediction at the farm level. Next, the second question stated earlier is addressed, namely: “How to extend the icing prediction framework of a single turbine to an entire wind farm?”.
The prediction model of TCN was used in our single-turbine prediction framework. The architecture of TCN is shown in
Figure 7. This deep learning model consists of convolution layers, ReLU (Rectified Linear Unit) layers, and dropout layers [16]. The convolution layer takes in SCADA feature data as an input tensor of size W by F, where W denotes the input window size and F denotes the number of features; see Figure 7. For each turbine, the best input window size and number of features are determined by carrying out grid search experiments. The output of the network is a binary value indicating the prediction outcome (1 if the prediction corresponds to the “ice” state and 0 if it corresponds to the “normal” operation state). The parameters of the TCN model appear in Table 4. Interested readers are referred to [9] for the experimentation conducted to reach these parameters.
Figure 8 indicates a specific time in the future for which the prediction is performed. For example, if at time t the user desires to predict icing one hour into the future (or 6 samples ahead, noting that samples are taken every 10 min), the features in the red (past) and green (present) boxes are used to predict the icing condition at the blue stem. This process is repeated every 10 min.
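The windowing just described can be sketched as follows; the function and variable names are illustrative, not from the implementation of [9].

```python
def make_windows(features, labels, W, h):
    # features: list of T time samples (every 10 min), each a row of F values.
    # labels: ice labels aligned with the samples.
    # Returns (window, target) pairs: the past W samples ending at time t are
    # used to predict the label h samples (h * 10 min) ahead.
    pairs = []
    for t in range(W - 1, len(features) - h):
        window = features[t - W + 1 : t + 1]  # input tensor of size W by F
        pairs.append((window, labels[t + h]))
    return pairs
```

For a one-hour-ahead prediction, h = 6; each pair then couples a W-by-F input window with the ice condition six samples later, and the predictor for that horizon is trained on such pairs.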
3.1. Qualified Turbines
There are 75 turbines in the wind farm. The wind rose of each turbine is first checked in order to exclude, from the farm-level icing prediction, turbines whose wind rose is narrowly defined with respect to the other turbines. For the wind farm examined, the turbines T1, T13, T15, and T24 were excluded.
3.2. Rules Used for Labeling Ice Condition
For each turbine, the ice condition is labeled using the three rules in Table 5, since SCADA datasets normally do not provide ice condition labels. The three rules reflect temperature, relative humidity, and actual power, as described in [17,18]. If all three rules are met for a data sample, that sample is labeled as the “ice” state (“1”). Otherwise, it is labeled as the “normal” state (“0”).
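The per-turbine labeling can be sketched as below. The threshold values here are illustrative placeholders only; the actual rules and thresholds are those of Table 5 and [17,18].

```python
def label_ice(temp_c, rel_humidity, actual_power, expected_power,
              temp_max=0.0, rh_min=85.0, power_ratio=0.9):
    # A sample is labeled "ice" (1) only if all three rules hold: low
    # temperature, high relative humidity, and actual power noticeably
    # below the expected power. Thresholds are illustrative assumptions,
    # not the values of Table 5.
    rule_temp = temp_c <= temp_max
    rule_rh = rel_humidity >= rh_min
    rule_power = actual_power < power_ratio * expected_power
    return int(rule_temp and rule_rh and rule_power)
```

A cold, humid sample with degraded power output is labeled 1; if any single rule fails, the sample is labeled 0.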
For an entire wind farm, ice labels need to be generated. This is necessary in order to test the farm-level predictors for accuracy and F1-score. Ice labels were generated from all the turbines using a majority voting scheme, as illustrated in Figure 9. At each time step, each turbine generates an “ice” or “normal” label independently. Then, a farm-level ice condition is generated from all the turbine labels via majority voting.
3.3. Fusion Approaches
Two fusion approaches are proposed in this work: decision fusion and feature fusion. In decision fusion, individual predictors for each turbine make independent decisions, and then their decisions are combined to generate the farm-level decision. Majority voting is often used for this purpose, where each decision is given the same importance or weight and the overall decision is the one with the highest vote [19,20]. The decision fusion approach is illustrated in Figure 10.
In feature fusion, each of the thirteen features listed in Table 1 and Table 2 is combined by averaging its data samples across all the wind turbines before carrying out predictions. The averaged features are then used to train one single predictor per prediction horizon. In this work, the predictor architecture used is from [9]. The feature fusion approach is illustrated in Figure 11.
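The feature-averaging step of feature fusion amounts to an element-wise mean across turbines at each time step; a minimal sketch (illustrative names) is:

```python
def fuse_features(turbine_samples):
    # turbine_samples: one feature vector per turbine at a given time step,
    # each of length F (the thirteen SCADA and weather features).
    # The fused vector is the element-wise average across turbines, which
    # then feeds the single farm-level predictor for each horizon.
    n = len(turbine_samples)
    return [sum(vals) / n for vals in zip(*turbine_samples)]
```

Applying this at every time step yields one averaged time series per feature, so only one predictor per prediction horizon needs to be trained.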
For farm-level ice labeling of data samples as well as for decision fusion of predictions, the majority voting scheme is used which involves counting outcomes. A simple illustration of the majority voting scheme is shown in
Figure 12. In example 1, the count of ones is greater than the count of zeros leading to an output or outcome of “1”. In example 2, the count of zeros is greater than the count of ones leading to an output or outcome of “0”.
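The counting scheme of Figure 12 can be sketched as below; how ties are broken is not specified in the text, so resolving a tie to “0” here is an assumption.

```python
def majority_vote(decisions):
    # decisions: binary outcomes (1 = "ice", 0 = "normal"), one per turbine.
    # The farm-level outcome is 1 if ones outnumber zeros, else 0
    # (a tie resolves to 0 here; the tie-breaking rule is an assumption).
    ones = sum(decisions)
    return 1 if ones > len(decisions) / 2 else 0
```

With decisions [1, 1, 0] the count of ones exceeds the count of zeros, giving 1 as in Example 1; with [0, 0, 1] the zeros dominate, giving 0 as in Example 2.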
4. Farm-Level Prediction Results
In this section, the results of the prediction of icing using decision fusion and feature fusion are presented. Comparisons are made between fusion and single-turbine approaches.
The icing prediction accuracy across the 288 prediction horizons, covering 10 min ahead to 2 days ahead, is shown in
Figure 13. For each prediction horizon, our predictors made icing predictions for the time-series testing samples covering the winter season from January to April. The green curve represents the outcome of the decision fusion, the blue curve the outcome of the feature fusion, and the red curve a single turbine. Compared with the prediction accuracy using a single turbine, decision fusion demonstrates higher prediction accuracy and fewer fluctuations across the prediction horizons. In decision fusion, prediction is performed independently for each turbine, which lowers the chance of overlap among the prediction errors of different turbines. The predictions from all the turbines are then combined using majority voting. In other words, even if some of the turbines provide incorrect predictions, the final decision is determined by the majority of predictions. This makes the decision fusion approach more robust to prediction errors than the feature fusion approach. Feature fusion also improves the prediction accuracy with fewer fluctuations over a single turbine, but exhibits more fluctuations than decision fusion. In feature fusion, each data feature is averaged across all turbines in the farm. Since there is only one predictor per prediction horizon, the chance of making a prediction error is higher than in the decision fusion approach, where there are many predictors per prediction horizon.
The distributions of accuracy across different prediction horizons are shown in
Figure 14. The average accuracy and standard deviation are shown in
Table 6. As can be seen from this figure and table, both decision fusion and feature fusion increase the prediction accuracy and decrease the standard deviation with respect to the single-turbine case over all 288 prediction horizons. Decision fusion has the advantage of the smallest standard deviation, or fewest fluctuations, due to the smoothing that results from combining many decisions. Feature fusion has the advantage of needing the training of only one predictor per prediction horizon, translating into less training time compared with decision fusion. Note that the latter requires training a predictor for every turbine in the wind farm for each prediction horizon, which in this case results in approximately a 75× increase in the number of predictors.
An example prediction time series (simulating the way prediction is actually conducted in real time) for the decision fusion approach is shown in
Figure 15 for the prediction horizon of ten minutes ahead. The predicted icing time series is plotted in green for the decision fusion, while the actual farm-level icing time series is plotted in blue. Recall that “icing” is labeled as “one” and “no icing” is labeled as “zero”. As illustrated in these time-series plots, most of the icing events are correctly predicted by the decision fusion approach.