Investigation of the Features Influencing the Accuracy of Wind Turbine Power Calculation at Short-Term Intervals

Matrenin, Pavel V.; Harlashkin, Dmitry A.; Mazunina, Marina V.; Khalyasmaa, Alexandra I.

doi:10.3390/asi7060105


Ural Power Engineering Institute, Ural Federal University Named After the First President of Russia B.N. Yeltsin, Ekaterinburg 620062, Russia


Appl. Syst. Innov. 2024, 7(6), 105; https://doi.org/10.3390/asi7060105

Submission received: 24 September 2024 / Revised: 21 October 2024 / Accepted: 24 October 2024 / Published: 29 October 2024

(This article belongs to the Special Issue Wind Energy and Wind Turbine System)


Abstract

The accurate prediction of wind power generation, as well as the development of a digital twin of a wind turbine, requires estimation of the power curve. Actual measurements of generated power, especially over short-term intervals, show that in many cases the power generated differs from the calculated power, which considers only the wind speed and the technical parameters of the wind turbine. Some of these measurements are erroneous, while others are influenced by additional factors affecting generation beyond wind speed alone. This study presents an investigation of the features influencing the accuracy of calculations of wind turbine power at short-term intervals. An open dataset of SCADA-system measurements from a real wind turbine is used. It is found that using ensemble machine learning models and additional features, including the actual power from the previous time step, enhances the accuracy of the wind power calculation. The root-mean-square error achieved is 113 kW, with the nominal capacity of the wind turbine under consideration being 3.6 MW. Consequently, the ratio of the root-mean-square error to the nominal capacity is about 3%.

Keywords:

wind turbine; wind power forecasting; power curve; ensemble models; machine learning; feature importance

1. Introduction

The operation of electric power systems with a high share of generation from renewable energy sources (RESs) differs from the operating principles of traditional electric power systems (EPSs). One of the major differences is the inability to control the generation volumes at RES facilities. Wind energy is an important part of RESs and one of the most promising forms of electricity generation, actively developing around the world. However, unlike traditional energy sources, wind turbines are subject to significant fluctuations in productivity due to variability in wind speed and other meteorological conditions [1,2,3]. Accurate forecasting of wind power generation allows the optimization of the operation of power systems, improving their stability and reliability, especially in the context of an increasing share of RESs in the overall energy balance.

Since the key factor determining the generation of a wind turbine is wind speed, many studies are devoted to forecasting wind speed [4]. According to [5], the main error in short-term wind speed forecasting stems from numerical weather prediction (NWP), which has motivated the assessment of different models and approaches. In addition, NWP models are data- and computationally intensive [6].

Statistical models such as autoregressive (AR) models, autoregressive integrated moving average (ARIMA) models, and exponential smoothing use historical data on wind speed and other meteorological parameters to create forecasts [7,8,9].

Modern machine learning and deep-learning algorithms show high efficiency in solving forecasting problems. The application of these methods to wind power forecasting can significantly improve the quality of forecasts compared with traditional statistical approaches [4,10,11,12,13,14,15,16]. Machine learning (ML) algorithms can use historical data on wind speed, wind direction, air temperature, and other factors to build a power generation forecasting model. For example, the paper [17] shows the use of convolutional neural networks to predict wind power output based on meteorological information.

For all wind turbines, manufacturers provide the functional dependencies of the power generation on the wind speed. Wind turbines operate on the principle of converting wind energy into mechanical energy. The blades of the wind turbine rotate under the influence of the wind. This rotation is transmitted to the generator, which ultimately produces electricity. Therefore, the physics of the process of electricity generation by a wind turbine requires a more careful study of the possibility of calculating generation from wind speed, considering such factors as inertia, air density, and thermal processes [18].

Thus, wind speed forecasting and wind turbine generation forecasting are different tasks. Although wind speed can be used to calculate generation, accurately determining the wind turbine generation is a separate task [19]. As theoretical wind power curves cannot always characterize the actual state and performance of the wind turbine, it is necessary to use data from the real operating wind turbines and farms to obtain more accurate power curves.

There is a wide variety of different approaches used for wind power prediction [20]: warping function [21], linearized segmented model, polynomial power curve, ideal power curve [22], probabilistic model [23], dynamical power curve [24], and logistic models [25]. A data synthesis-informed training U-net-based method of solving the wind turbine power curve modeling problem from the image processing perspective is presented in [26].

In recent years, machine learning algorithms have become one of the most popular approaches, replacing mathematical models. The authors of [19] propose a method that detects outliers and estimates power curves using clustering algorithms, namely vector quantization and density-based spatial clustering. The method was tested on Korean wind farms. According to that study, it improves efficiency by eliminating noise from the data.

Calculating short-term and ultra-short-term wind turbine power generation is important not only for wind power forecasting, but also for creating digital twins of wind turbines that can be used to assess their condition and for other purposes [27,28,29].

Paper [30] presents a deep-learning approach to predicting the power generation of multiple wind turbines using a two-stage modeling strategy in which the neural network captures spatiotemporal correlations. The approach was tested on a Chinese offshore wind farm and provided a more accurate forecast than the alternatives considered.

Processing outliers in wind power datasets is essential for power curve estimation [19,31,32]. Outliers can stem from various sources, including operational factors such as output curtailment, turbine failures, and maintenance activities, as well as measurement errors [31]. The detectability of these outliers is influenced by the time resolution of the data; higher resolutions (ideally, under 10 min [19]) make outlier detection more straightforward. However, accessing a sufficient quantity of high-resolution data can be challenging; as a rule, existing studies consider 1 h intervals.

The study [19] identifies three areas related to outliers (shown schematically in Figure 1). The first group involves power output values close to zero despite significant wind speeds (“1” in Figure 1). The second group comprises output values substantially exceeding calculated estimates, likely due to measurement errors [31], as power generation is unlikely to surpass predicted levels at given wind speeds (“2” in Figure 1). The third group pertains to situations where output is significantly lower than calculated, excluding cases from the first group (“3” in Figure 1).

Excluding zero-power output values due to measurement and data recording system failures, as well as the second group of outliers, the remaining cases that appear as outliers could indeed be true values, especially at 10 min intervals. Therefore, constructing a wind power curve that disregards such values may reduce the actual accuracy of wind farm power output calculations. In power generation forecasting tasks for grid management, this may be insignificant. However, in a small, isolated power system based on wind energy, maximum forecasting accuracy is necessary for managing electrical power flows within the system.

Thus, research on constructing models to estimate wind turbine power based on various factors over short-term intervals remains relevant.

The main contributions of this research to the field of forecasting wind power plant generation and modeling a power curve of a wind turbine are as follows:

Analysis of the impact of different features beyond wind speed on the accuracy of wind turbine output power estimation using real SCADA system open data with 10 min measurement intervals;
Application of a hybrid approach based on the theoretical power curve and machine learning models to more accurately assess the actual wind turbine output power;
Investigation of the hypothesis of using power values from the previous time step to enhance accuracy by considering the inertial properties of the wind turbine.

2. Materials and Methods

2.1. Initial Dataset and Exploratory Analysis

The materials for this research are power generation data from a real wind turbine (nominal capacity of 3.6 MW) together with wind speed data. These data were recorded by the SCADA system of a wind turbine in Turkey [33] and are available as an open dataset (Kaggle). The sampling interval is 10 min, and the data contain 50,530 samples from 1 January 2018 00:00 to 31 December 2018 23:50 (3.86% of values are missing). The dataset contains the following parameters:

Date and time;
Generated active power, kW;
Wind speed at turbine height, m/s;
Theoretical electrical power values that the turbine generates with that wind speed as given by the turbine manufacturer, kW;
Wind direction at turbine height (while the turbine is turned so that the wind wheel is perpendicular to the wind flow).
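The stated share of missing values can be checked with simple arithmetic. The sketch below assumes that "missing" refers to absent timestamps on a complete 10 min grid for 2018; this interpretation is an assumption, since the text does not define it.

```python
from datetime import datetime, timedelta

# Assumption: "missing" means timestamps absent from a full 10 min grid.
start = datetime(2018, 1, 1, 0, 0)
end = datetime(2018, 12, 31, 23, 50)
step = timedelta(minutes=10)

expected = int((end - start) / step) + 1   # timestamps on the complete grid
recorded = 50_530                          # samples reported for the dataset
missing_share = (expected - recorded) / expected

print(expected, f"{missing_share:.2%}")
```

Under this assumption, the result agrees with the 3.86% reported above.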

The initial data structure is presented in Table 1.

Visual analysis of the data readily reveals outliers in the first group (Figure 2). It is evident that these can be attributed only to errors in the measurement and data collection system. Therefore, such values (where power is zero and wind speed exceeds 3 m/s) are excluded. As a result, the number of samples (rows) was reduced from 50,530 to 46,976 (by about 7%).
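The exclusion rule for the first group of outliers can be sketched as a simple filter. The field names `power_kw` and `wind_speed` and the three toy rows are hypothetical and serve only to illustrate the rule; they are not taken from the dataset.

```python
def drop_group1_outliers(rows, speed_threshold=3.0):
    """Drop rows with zero power although wind speed exceeds the threshold (m/s)."""
    return [
        r for r in rows
        if not (r["power_kw"] == 0 and r["wind_speed"] > speed_threshold)
    ]

# Toy rows (illustrative values only).
rows = [
    {"power_kw": 380.0, "wind_speed": 5.3},  # normal operation, kept
    {"power_kw": 0.0, "wind_speed": 7.1},    # zero power at high wind, removed
    {"power_kw": 0.0, "wind_speed": 2.0},    # zero power in calm conditions, kept
]
kept = drop_group1_outliers(rows)

# Share of samples removed, from the counts reported in the text.
removed_share = (50_530 - 46_976) / 50_530
print(len(kept), f"{removed_share:.1%}")
```

Zero-power rows at low wind speed are kept because they are physically plausible below the wind speed at which the turbine starts to generate.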

For all other outliers, it is impossible to confidently assert that they are solely due to incorrect power readings, so they are kept. Figure 3 shows that the dataset does not contain outliers in the second group. It is impossible to definitively determine whether data points in group 3 are incorrect values or not; therefore, they were not excluded from the dataset. For instance, these could be values where the actual power is lower than theoretical at high wind speeds due to the turbines’ inertial properties or sudden changes in wind direction, as the rotation of the wind turbine nacelle does not occur instantaneously. Figure 4 shows the theoretical power curve and observations from the SCADA system.

The theoretical (calculated) dependence of power generation on wind speed has a significant error. Consequently, estimating the actual generation from the calculated model alone yields a larger deviation than a model that accounts for wind speed together with additional parameters.

Statistical evaluation of the initial data and preprocessing were performed before any ML models were applied. To characterize the data, common statistical metrics were used: mean, standard deviation, percentiles, and minimum and maximum values. The results of the statistical evaluation are presented in Table 2. Figure 5 presents the distribution of the features; Figure 6 shows the scatter plot of wind direction. When the wind speed exceeds a certain threshold, the output power stops increasing. This explains the second extremum on the density plot of wind turbine power.

2.2. Pipeline of the Applied Method

The general approach used is shown in Figure 7. In addition to the outlier removal described above, feature extraction must be performed. Instead of the wind direction angle, its sine and cosine values are used, since this avoids the artificial discontinuity between 0 degrees and 360 degrees. Feature standardization is useful for some of the machine learning algorithms used later. Since the influence of the features is analyzed, different combinations of them are selected. For each feature set, the dataset is divided into training and testing sets, and then machine learning models are applied.

2.3. Feature Extraction

Sines and cosines of the wind direction are used instead of the wind direction in degrees:

Wind direction sin = sin(Wind direction)
Wind direction cos = cos(Wind direction)

(1)
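A minimal sketch of Equation (1) is given below. It assumes the direction is stored in degrees, as in Table 1, so it is converted to radians first; the helper names `encode_direction` and `encoded_distance` are illustrative.

```python
import math

def encode_direction(deg):
    """Map a wind direction in degrees to its (sin, cos) pair."""
    rad = math.radians(deg)
    return math.sin(rad), math.cos(rad)

def encoded_distance(deg_a, deg_b):
    """Euclidean distance between two directions in the (sin, cos) plane."""
    sin_a, cos_a = encode_direction(deg_a)
    sin_b, cos_b = encode_direction(deg_b)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)

# 359 and 1 degrees differ by 358 as raw numbers but are close after encoding.
print(encoded_distance(359, 1))
# Opposite directions are maximally far apart.
print(encoded_distance(0, 180))
```

With the raw angle, a model would see 359 degrees and 1 degree as almost opposite values; the encoded pair keeps them adjacent.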

The next step is extracting the values of month and hour as the separate features instead of date and time. In addition, based on the hypothesis about how the power–wind speed curve changes according to seasons, a new categorical feature, “season”, was added to the original data (winter, spring, summer, and autumn). One-hot encoding was applied after that, so the season feature is transformed into four binary features.

To test the hypothesis that power values from the previous time step enhance accuracy by accounting for the turbine’s inertial properties, the real power at the previous time step was added as a feature.

As a result, the full set of features is as follows:

Wind speed, m/s;
Wind direction sin;
Wind direction cos;
Month;
Hour;
Winter (0/1);
Spring (0/1);
Summer (0/1);
Autumn (0/1);
Theoretical power (kW);
Previous value of real power (kW);
Real power (kW) as a target.
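The date-based, seasonal, and lagged features listed above can be sketched as follows. The month-to-season mapping (meteorological seasons) is an assumption, since the text does not specify it, and the sketch does not handle gaps in the time series; the function and key names are illustrative.

```python
from datetime import datetime

SEASONS = ("winter", "spring", "summer", "autumn")

def season_of(month):
    # Assumption: meteorological seasons (Dec-Feb winter, Mar-May spring, ...).
    return SEASONS[(month % 12) // 3]

def build_rows(records):
    """records: time-ordered list of (timestamp, real_power_kw) pairs."""
    rows = []
    # Pair each sample with its predecessor; the first sample has no lag
    # value and is therefore dropped.
    for (_, prev_power), (ts, power) in zip(records, records[1:]):
        row = {"month": ts.month, "hour": ts.hour}
        for name in SEASONS:                     # one-hot season encoding
            row[name] = int(season_of(ts.month) == name)
        row["prev_power_kw"] = prev_power        # power at the previous step
        row["target_kw"] = power
        rows.append(row)
    return rows

# First three samples of Table 1.
records = [
    (datetime(2018, 1, 1, 0, 0), 380.047791),
    (datetime(2018, 1, 1, 0, 10), 453.769196),
    (datetime(2018, 1, 1, 0, 20), 306.376587),
]
rows = build_rows(records)
print(rows[0])
```

In a dataset with missing timestamps, the previous row is not always exactly 10 min earlier, so a real pipeline would need to decide how to treat such pairs.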

Figure 8 shows the Pearson correlation coefficients for the numerical features.

Standardization of the data was performed according to the following equation:

F′ = (F − mean(F))/std(F)

(2)

where F denotes the values of a feature. It was applied to all features except the binary features and the target. Standardization brings the features to zero mean and unit variance (it rescales the values without changing the shape of their distribution), which has a positive effect on the training of scale-sensitive models.
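Equation (2) can be sketched in a few lines. The use of the population standard deviation is an assumption (the text does not state which estimator was used), and the input values are the five wind speeds from Table 1.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Apply F' = (F - mean(F)) / std(F) to a list of feature values."""
    mu = fmean(values)
    sigma = pstdev(values)   # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]

# Wind speeds (m/s) from the five rows of Table 1.
wind_speed = [5.311336, 5.672167, 5.216037, 5.659674, 5.577941]
z = standardize(wind_speed)
print(fmean(z), pstdev(z))
```

In practice, the mean and standard deviation are usually estimated on the training set only and then reused for the test set, so that no test information enters the preprocessing.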

2.4. Regression Models

The regression problem can be formulated as follows:

y_{i}^{*} = f (X_{i}, y_{i - 1})

(3)

where y_{i} is the actual i-th value of power, y_{i}^{*} is the predicted value, X_{i} is the feature vector of the i-th sample, and f is a regression model.

The following metrics were used to evaluate the forecasting results and compare different forecasting models:

R² (determination coefficient);
Adjusted R²;
RMSE, kW.

The metrics were calculated using the following equations:

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - y_{i}^{*})}^{2}}

(4)

R^{2} = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - y_{i}^{*})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}

(5)

Adj. R^{2} = 1 - \frac{(1 - R^{2}) (N - 1)}{N - p - 1}

(6)

where \bar{y} is the mean actual power value, N is the number of samples, and p is the number of features.
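The three metrics in Equations (4)-(6) translate directly into code. The sketch below uses only the standard library; the function names are illustrative, and the example compares the real and theoretical power values of the five rows in Table 1, which is far too few samples for a meaningful evaluation.

```python
import math

def rmse(y, y_hat):
    """Root-mean-square error, Equation (4)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def r2(y, y_hat):
    """Coefficient of determination, Equation (5)."""
    y_mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def adjusted_r2(r2_value, n, p):
    """Adjusted R^2 for n samples and p features, Equation (6)."""
    return 1 - (1 - r2_value) * (n - 1) / (n - p - 1)

# Real vs. theoretical power (kW) for the five rows of Table 1.
real = [380.047791, 453.769196, 306.376587, 419.645905, 380.650696]
theoretical = [416.328908, 519.917511, 390.900016, 516.127569, 491.702972]
print(rmse(real, theoretical))
```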

The following models were used:

Linear regression (LR);
Support vector regression (SVR);
k-nearest neighbors (kNN);
Decision tree (DT);
Random forest (RF);
Extremely randomized trees (ET);
Adaptive boosting (AB);
Gradient boosting (GB);
Extreme gradient boosting (XGB);
Categorical boosting, or CatBoost (CB).

Various models were used, since the primary aim of the research was not to select the best model, but to analyze the possibility of increasing the accuracy of determining the output power of a wind turbine based on wind speed data and additional features.

The large number of models based on ensembles of decision trees is due to their high efficiency in the analysis of time series in electric power engineering problems, which was confirmed by previous studies by the authors of this paper [34,35] and by other studies, such as [36,37]. The implementations of the models were taken from the Scikit-learn library [38], except for extreme gradient boosting [39] and categorical boosting [40].

3. Results and Discussion

All the above models were applied to a dataset after the preprocessing with different feature compositions. To train and test the models listed above, the preprocessed dataset was divided into two parts: a training set and a testing set. The shares of these parts were selected using a commonly used ratio, 80/20.

To select hyperparameters, 30% of the training set was taken for validation. Since the goal was not to achieve maximum possible accuracy, but to study the influence of features, the simplest method for adjusting hyperparameters, random search, was used.
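The resulting partition sizes can be sketched as follows. Whether the split was shuffled or chronological is not stated in the text, so the shuffling here is an assumption, as are the function name and the seed; the random search itself is not shown.

```python
import random

def split_indices(n, test_share=0.2, val_share_of_train=0.3, seed=0):
    """Split sample indices into training, validation, and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # assumption: shuffled split
    n_test = round(n * test_share)
    test, train_full = idx[:n_test], idx[n_test:]
    n_val = round(len(train_full) * val_share_of_train)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test

# Number of samples after outlier removal, as reported in Section 2.1.
train, val, test = split_indices(46_976)
print(len(train), len(val), len(test))
```

With these shares, roughly 56% of the samples are used for fitting, 24% for hyperparameter validation, and 20% for testing.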

Table 3 and Table 4 show the RMSE (kW) and adjusted R² obtained on the test set. Values of R² are very close to the values of adjusted R² because the number of features is very small relative to the test dataset size (see Equation (6)). For comparison, the RMSE and adjusted R² of the theoretical curve are 262.22 kW and 0.95962, respectively.
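The closeness of R² and adjusted R² follows from the correction factor in Equation (6). The sketch below assumes a test set of about 20% of the 46,976 cleaned samples and the full set of 11 features; the R² value is illustrative and is not taken from the tables.

```python
n_test = round(0.2 * 46_976)   # assumption: about 20% of the cleaned samples
p = 11                         # features in the full set (Section 2.3)
r2 = 0.99                      # illustrative value only

# Equation (6) multiplies the unexplained share (1 - R^2) by this factor.
factor = (n_test - 1) / (n_test - p - 1)
adj_r2 = 1 - (1 - r2) * factor
gap = r2 - adj_r2

print(factor, gap)
```

Because the factor exceeds 1 by only about 0.1%, the two metrics differ in the fifth decimal place for R² values of this magnitude.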

The best results were obtained using CatBoost; therefore, for this model, the influence of features on the RMSE and the significance (importance) of features are shown in Figure 9 and Figure 10.

Experiments have demonstrated the advantages of ensemble models over others, highlighting the complexity of the dependencies between wind turbine power and various features. It should be noted that among the ensemble models, those using gradient boosting (GB, XGB, and CB) performed better than those using adaptive boosting (AB) or bagging (RF and ET).

Removing the theoretical power as a feature has little impact on the best models. However, if this feature is present, the model will utilize it, as evident in Figure 10. As expected, wind speed is the most significant feature, but using wind speed alone significantly reduces accuracy. The hypothesis that incorporating binary features to account for seasonality would have an effect was not confirmed, as the significance of these features is very low. Conversely, the hypothesis about the benefits of using the actual power value from the previous time step received experimental confirmation. It should also be noted that wind direction and month (as a feature of seasonality) have a slight influence, as shown in Figure 10.

Figure 11 displays a scatter plot of real and predicted wind turbine power, while Figure 12 presents a comparison of real, theoretical, and predicted power.

4. Conclusions

In this study, an analysis was conducted to examine the influence of various characteristics beyond wind speed on the accuracy of estimating wind turbine output power. The analysis utilized one year of real turbine data with a 10 min measurement interval. The necessity of incorporating additional features beyond wind speed and the estimated theoretical power derived from the power curve is substantiated. It is discovered that incorporating power values from the previous time step can enhance accuracy by accounting for the inertial properties of the wind turbine.

Using categorical boosting, the root-mean-square error (RMSE) achieved was 113 kW, with the nominal capacity of the wind turbine under consideration being 3.6 MW. Consequently, the ratio of the RMSE to the nominal capacity is 3%. In comparison, the RMSE of the theoretical power curve was 262 kW. It is important to note that the data used in this study may contain measurement errors, which could potentially result in an underestimation of the power curve error in real-world scenarios. However, during the data preprocessing stage, all obvious measurement errors were identified and removed.
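The ratios quoted above can be reproduced with a few lines of arithmetic using the rounded values from the text; the variable names are illustrative.

```python
nominal_kw = 3600.0            # nominal capacity, 3.6 MW
rmse_model_kw = 113.0          # CatBoost with all features
rmse_theoretical_kw = 262.0    # theoretical power curve

ratio_model = rmse_model_kw / nominal_kw
ratio_theoretical = rmse_theoretical_kw / nominal_kw
relative_reduction = 1 - rmse_model_kw / rmse_theoretical_kw

print(f"{ratio_model:.1%} {ratio_theoretical:.1%} {relative_reduction:.0%}")
```

The model therefore reduces the RMSE by more than half relative to the theoretical power curve.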

It should be noted that the accuracy of wind power forecasting over short periods of time will be higher when using a more accurate model of the power curve compared with the accuracy achieved using the widely employed theoretical power curve. This is because wind turbine power is influenced by various factors that are not accounted for in the traditional dependence of power on wind speed alone.

The limitations of this study include the use of data from only one year and one wind turbine. However, publicly available real data with a measurement interval of less than an hour could not be identified. We plan to acquire additional data in the future. Another direction for improving the research is the development of machine learning models for forecasting wind power based on forecasted wind speed and other meteorological parameters, rather than actual wind speed. These models will consider different forecasting horizons, including 10 min, 1 h, and 24 h. In addition, we will conduct a more in-depth analysis of the significance of the features in further studies.

Author Contributions

Conceptualization P.V.M. and A.I.K.; methodology P.V.M. and D.A.H.; software, P.V.M. and M.V.M.; validation, M.V.M.; formal analysis, A.I.K.; investigation, P.V.M.; resources, A.I.K. and P.V.M.; data curation, M.V.M.; writing—original draft preparation, D.A.H. and M.V.M.; writing—review and editing, P.V.M.; visualization, A.I.K. and P.V.M.; supervision, A.I.K.; project administration, A.I.K.; funding acquisition A.I.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was carried out within the state assignment with the financial support of the Ministry of Science and Higher Education of the Russian Federation (subject No. FEUZ-2022-0030, development of an intelligent multi-agent system for modeling deeply integrated technological systems in the power industry).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available in Kaggle at https://www.kaggle.com/datasets/berkerisen/wind-turbine-scada-dataset, accessed on 22 August 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Erdinc, O.; Uzunoglu, M. Optimum design of hybrid renewable energy systems: Overview of different approaches. Renew. Sustain. Energy Rev. 2012, 16, 1412–1425.
2. Zhou, Q.; Wang, C.; Zhang, G. Hybrid forecasting system based on an optimal model selection strategy for different wind speed forecasting problems. Appl. Energy 2019, 250, 1559–1580.
3. Moreno-Munoz, A. Description of wind power forecasting systems. In Large Scale Grid Integration of Renewable Energy Sources; Institution of Engineering and Technology (The IET): London, UK, 2017.
4. Manusov, V.; Matrenin, P.; Nazarov, M.; Beryozkina, S.; Safaraliev, M.; Zicmane, I.; Ghulomzoda, A. Short-Term Prediction of the Wind Speed Based on a Learning Process Control Algorithm in Isolated Power Systems. Sustainability 2023, 15, 1730.
5. Goretti, G.; Duffy, A.; Lie, T.T. The impact of power curve estimation on commercial wind power forecasts – An empirical analysis. In Proceedings of the 2017 14th International Conference on the European Energy Market (EEM), Dresden, Germany, 6–9 June 2017.
6. Fang, S.; Chiang, H. A High-Accuracy Wind Power Forecasting Model. IEEE Trans. Power Syst. 2017, 32, 1589–1590.
7. Ali, B.M. Wind Energy Prediction: Artificial Intelligence Perspective. In Proceedings of the 6th International Conference on Engineering Technology and Its Applications (IICETA), Al-Najaf, Iraq, 15–16 July 2023; pp. 885–891.
8. Erdem, E.; Shi, J. ARMA based approaches for forecasting the tuple of wind speed and direction. Appl. Energy 2011, 88, 1405–1414.
9. Cadenas, E.; Rivera, W.; Campos-Amezcua, R.; Heard, C. Wind Speed Prediction Using a Univariate ARIMA Model and a Multivariate NARX Model. Energies 2016, 9, 109.
10. Prema, V.; Bhaskar, M.S.; Almakhles, D.; Gowtham, N.; Rao, K.U. Critical Review of Data, Models and Performance Metrics for Wind and Solar Power Forecast. IEEE Access 2022, 10, 667–688.
11. Hu, Y.-L.; Chen, L. A nonlinear hybrid wind speed forecasting model using LSTM network, hysteretic ELM and differential evolution algorithm. Energy Convers. Manag. 2018, 173, 123–142.
12. Jiang, P.; Yang, H.; Heng, J. A hybrid forecasting system based on fuzzy time series and multi-objective optimization for wind speed forecasting. Appl. Energy 2019, 235, 786–801.
13. Han, Y.; Mi, L.; Shen, L.; Cai, C.S.; Liu, Y.; Li, K.; Xu, G. A short-term wind speed prediction method utilizing novel hybrid deep learning algorithms to correct numerical weather forecasting. Appl. Energy 2022, 312, 118777.
14. Mogos, A.S.; Salauddin, M.; Liang, X.; Chung, C.Y. An Effective Very Short-Term Wind Speed Prediction Approach Using Multiple Regression Models. IEEE Can. J. Electr. Comput. Eng. 2022, 45, 242–253.
15. Matrenin, P.V.; Khalyasmaa, A.I.; Rusina, A.G.; Eroshenko, S.A.; Popkova, N.A.; Sekatskii, D.A. Operational Forecasting of Wind Speed for an Self-Contained Power Assembly of a Traction Substation. Proc. CIS High. Educ. Inst. Power Eng. Assoc. Energetika 2023, 66, 18–29.
16. Ates, K.T. Estimation of Short-Term Power of Wind Turbines Using Artificial Neural Network (ANN) and Swarm Intelligence. Sustainability 2023, 15, 13572.
17. Sun, H.; Qiu, C.; Lu, L.; Gao, X.; Chen, J.; Yang, H. Wind turbine power modelling and optimization using artificial neural network with wind field experimental data. Appl. Energy 2020, 280, 115880.
18. Zhang, J.; Cheng, C.; Yu, S. Recognizing the mapping relationship between wind power output and meteorological information at a province level by coupling GIS and CNN technologies. Appl. Energy 2024, 360, 122791.
19. Paik, C.; Chung, Y.; Kim, Y.J. Power Curve Modeling of Wind Turbines through Clustering-Based Outlier Elimination. Appl. Syst. Innov. 2023, 6, 41.
20. Ouyang, T.; Kusiak, A.; He, Y. Modeling wind-turbine power curve: A data partitioning and mining approach. Renew. Energy 2017, 102, 1–8.
21. Kou, P.; Gao, F.; Guan, X. Sparse online warped Gaussian process for wind power probabilistic forecasting. Appl. Energy 2013, 108, 410–428.
22. Trivellato, F.; Battisti, L.; Miori, G. The ideal power curve of small wind turbines from field data. J. Wind Eng. Ind. Aerodyn. 2012, 107, 263–273.
23. Villanueva, D.; Feijoo, A. Normal-based model for true power curves of wind turbines. IEEE Trans. Sustain. Energy 2016, 7, 1005–1011.
24. Gottschall, J.; Peinke, J. How to improve the estimation of power curves for wind turbines. Environ. Res. Lett. 2008, 3, 015005.
25. Feijoo, A.; Villanueva, D. Four parameter models for wind farm power curves and power probability density functions. IEEE Trans. Sustain. Energy 2017, 8, 1783–1784.
26. Wang, Y.; Duan, X.; Zou, R.; Zhang, F.; Li, Y.; Hu, Q. A novel data-driven deep learning approach for wind turbine power curve modeling. Energy 2023, 270, 126908.
27. Ibrahim, M.; Rassõlkin, A.; Vaimann, T.; Kallaste, A.; Zakis, J.; Hyunh, V.K.; Pomarnacki, R. Digital Twin as a Virtual Sensor for Wind Turbine Applications. Energies 2023, 16, 6246.
28. Massel, L.; Massel, A.; Shchukin, N.; Tsybikov, A. Designing a Digital Twin of a Wind Farm. Eng. Proc. 2023, 33, 30.
29. Pacheco-Blazquez, R.; Garcia-Espinosa, J.; Di Capua, D.; Pastor Sanchez, A. A Digital Twin for Assessing the Remaining Useful Life of Offshore Wind Turbine Structures. J. Mar. Sci. Eng. 2024, 12, 573.
30. Chen, X.; Zhang, X.; Dong, M.; Huang, L.; Guo, Y.; He, S. Deep learning-based prediction of wind power for multi-turbines in a wind farm. Front. Energy Res. 2021, 9, 723775.
31. Zou, M.; Djokic, S.Z. A review of approaches for detection and treatment of outliers in processing wind turbine and wind farm measurements. Energies 2020, 13, 4228.
32. Wang, Y.; Hu, Q.; Li, L.; Foley, A.M.; Srinivasan, D. Approaches to wind power curve modeling: A review and discussion. Renew. Sustain. Energy Rev. 2019, 116, 109422.
33. Wind Turbine Scada Dataset. Available online: https://www.kaggle.com/datasets/berkerisen/wind-turbine-scada-dataset (accessed on 22 August 2024).
34. Bramm, A.M.; Eroshenko, S.A.; Khalyasmaa, A.I.; Matrenin, P.V. Grey Wolf Optimizer for RES Capacity Factor Maximization at the Placement Planning Stage. Mathematics 2023, 11, 2545.
35. Matrenin, P.V.; Gamaley, V.V.; Khalyasmaa, A.I.; Stepanova, A.I. Solar Irradiance Forecasting with Natural Language Processing of Cloud Observations and Interpretation of Results with Modified Shapley Additive Explanations. Algorithms 2024, 17, 150.
36. Joseph, P.; Deo, R.C.; Casillas-Pèrez, D.; Prasad, R.; Raj, N.; Salcedo-Sanz, S. Multi-Step-Ahead Wind Speed Forecast System: Hybrid Multivariate Decomposition and Feature Selection-Based Gated Additive Tree Ensemble Model. IEEE Access 2024, 12, 58750–58777.
37. Lee, J.; Wang, W.; Harrou, F.; Sun, Y. Wind Power Prediction Using Ensemble Learning-Based Models. IEEE Access 2020, 8, 61517–61527.
38. Scikit-Learn. Machine Learning in Python. Available online: https://scikit-learn.org/stable/ (accessed on 12 August 2024).
39. XGBoost Documentation. Available online: https://xgboost.readthedocs.io/en/stable/ (accessed on 18 August 2024).
40. CatBoost. Available online: https://catboost.ai/en/docs/ (accessed on 24 August 2024).

Figure 1. Theoretical (traditional) power curve of a wind turbine (red) and areas of the outliers (blue); “1”: power output values close to zero despite significant wind speeds; “2”: real values substantially exceed calculated values; “3”: real values are significantly lower than calculated.

Figure 2. Visual highlight of outliers of the first group; the red lines highlight areas containing outliers.

Figure 3. Actual power as a function of wind speed (left) and the theoretical power (right).

Figure 4. Theoretical power curve (red) and observations from the SCADA system (blue).

Figure 5. Distribution and density plots of the real power (a), wind speed (b), theoretical power (c), wind direction (d).

Figure 6. Wind direction scatter plot.

Figure 7. Flow chart of the approach.

Figure 8. Correlation coefficients.

Figure 9. RMSE of CatBoost using different sets of features.

Figure 10. CatBoost feature importance for using all features.

Figure 11. Real vs. predicted power.

Figure 12. Comparison of real, theoretical, and predicted power on different dataset fragments: (a) 12–14 January 2018, (b) 9–11 February 2018, (c) 10–18 July 2018.

Table 1. Initial data fragment.

Date and Time	Real Power, kW	Wind Speed, m/s	Theoretical Power, kW	Wind Direction °
1 January 2018 00:00	380.047791	5.311336	416.328908	259.994904
1 January 2018 00:10	453.769196	5.672167	519.917511	268.641113
1 January 2018 00:20	306.376587	5.216037	390.900016	272.5964789
1 January 2018 00:30	419.645905	5.659674	516.127569	271.258087
1 January 2018 00:40	380.650696	5.577941	491.702972	265.674286

Table 2. Results of the statistical evaluation of the initial data.

Metric/Parameter	Real Power, kW	Wind Speed, m/s	Theoretical Power, kW	Wind Direction °
Samples number	46,976	46,976	46,976	46,976
Average value	1406	7.709	1546	123.7
Standard deviation	1309	4.269	1371	92.72
Minimum value	0.000	0.000	0.000	0.000
25th percentile value	168.5	4.467	215.0	50.00
50th percentile value	991.1	7.294	1154	73.50
75th percentile value	2613	10.48	3057	201.4
Maximum value	3618.7	25.21	3600	360.0
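
The metrics in Table 2 are standard descriptive statistics. As a minimal sketch of how such a summary can be computed, the snippet below applies them to the five wind speed values of Table 1 only, so its numbers will not match Table 2, which covers all 46,976 samples. The function name summarize is hypothetical, and the use of the sample standard deviation and of linearly interpolated percentiles is an assumption, since the source does not state which conventions were used.

```python
import statistics


def summarize(values):
    """Return count, mean, standard deviation, min, quartiles, and max.

    Assumptions (not stated in the source): sample standard deviation
    (n - 1 in the denominator) and linearly interpolated percentiles.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


# Wind speed (m/s) from the five rows shown in Table 1.
wind_speed = [5.311336, 5.672167, 5.216037, 5.659674, 5.577941]

summary = summarize(wind_speed)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The same function can be applied to each of the other columns (real power, theoretical power, wind direction) in turn.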

Table 3. RMSE (kW) obtained on the test set.

Model	All Features	All Except Theoretical Power	All Except Previous Real Power	All Except Features Based on Date and Time	All Except Seasonal Binary Features	Wind Speed Only
LR	183.70	229.58	252.52	184.31	184.16	459.71
SVR	277.08	298.01	334.36	192.77	282.59	254.69
kNN	133.31	164.85	130.63	136.88	132.38	260.12
DT	135.31	137.86	194.46	149.64	144.40	256.70
RF	121.32	121.59	169.68	132.16	125.52	243.65
ET	125.04	137.96	182.74	134.58	128.11	240.89
AB	125.04	123.24	135.58	138.95	125.10	611.49
GB	112.59	113.87	144.43	134.91	116.80	252.59
XGB	114.34	113.08	131.13	138.76	114.04	240.06
CB	113.25	114.98	123.81	132.88	116.67	239.63
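
To make the RMSE values in Table 3 easier to interpret, the following sketch shows how the metric is computed and how it relates to the nominal capacity of 3.6 MW given in the abstract. The helper names rmse and rmse_to_capacity_percent are hypothetical. The five data points are the rows of Table 1, used purely as an illustration of the calculation; they are not the test set, so the resulting baseline value is not comparable with the table.

```python
import math

NOMINAL_CAPACITY_KW = 3600.0  # 3.6 MW nominal capacity, as stated in the abstract


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_to_capacity_percent(rmse_kw, capacity_kw=NOMINAL_CAPACITY_KW):
    """Express an RMSE in kW as a percentage of the nominal capacity."""
    return 100.0 * rmse_kw / capacity_kw


# Real and theoretical power (kW) from the five rows of Table 1 (illustration only).
real_power = [380.047791, 453.769196, 306.376587, 419.645905, 380.650696]
theoretical_power = [416.328908, 519.917511, 390.900016, 516.127569, 491.702972]

# Error of the theoretical power curve on these five rows.
baseline_rmse = rmse(real_power, theoretical_power)

# CatBoost RMSE with all features, taken from Table 3.
cb_ratio = rmse_to_capacity_percent(113.25)

print(f"Theoretical-curve RMSE on five rows: {baseline_rmse:.2f} kW")
print(f"CatBoost RMSE / nominal capacity: {cb_ratio:.2f}%")
```

For the CatBoost result with all features, the ratio comes to roughly 3.1%, consistent with the 3% quoted in the abstract.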

Table 4. Adjusted R² obtained on the test set.

Model	All Features	All Except Theoretical Power	All Except Previous Real Power	All Except Features Based on Date and Time	All Except Seasonal Binary Features	Wind Speed Only
LR	0.98016	0.96901	0.96251	0.98004	0.98007	0.87589
SVR	0.95486	0.94779	0.93428	0.97817	0.95307	0.96190
kNN	0.98955	0.98402	0.98997	0.98899	0.98970	0.96026
DT	0.98924	0.98883	0.97777	0.98684	0.98775	0.96130
RF	0.99135	0.99131	0.98307	0.98974	0.99074	0.96514
ET	0.99081	0.98881	0.98037	0.98936	0.99035	0.96592
AB	0.99081	0.99107	0.98919	0.98866	0.99080	0.78041
GB	0.99255	0.99238	0.98774	0.98931	0.99198	0.96253
XGB	0.99231	0.99248	0.98989	0.98869	0.99236	0.96616
CB	0.99246	0.99223	0.99099	0.98963	0.99200	0.96628
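
Table 4 reports the adjusted R², which penalizes the ordinary R² for the number of features in the model. A minimal sketch of the usual definition, 1 − (1 − R²)(n − 1)/(n − p − 1), is given below. The function names and the toy data are hypothetical, and the test-set size n and feature count p behind Table 4 are not given in this section, so none of the table values are reproduced here.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 for n_samples observations and n_features predictors."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical toy values, chosen only to demonstrate the calculation.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 5.1]

r2 = r2_score(actual, predicted)
adj = adjusted_r2(r2, n_samples=len(actual), n_features=1)

print(f"R2 = {r2:.4f}, adjusted R2 = {adj:.4f}")
```

With tens of thousands of samples and only a handful of features, the correction factor is close to 1, so the adjusted and ordinary R² differ only slightly.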

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
