1. Introduction
Wind power is currently considered the most promising source of renewable electricity in the world. Nevertheless, because wind turbines operate under non-stationary conditions, performance monitoring and early fault diagnosis are non-trivial tasks. For this reason, although wind turbines are by now a substantially mature technology, operation and maintenance (O&M) costs can still reach the order of 25% of the overall life-cycle costs [
In the wind energy practitioners' community, the standard for evaluating wind turbine performance is the analysis of the power curve, i.e., the curve relating the wind flow intensity to the power output: the IEC recommends the binning method [3], which consists of averaging the power measurements over wind speed intervals of 0.5 m/s or 1 m/s. In general, the averaging or discretisation of wind turbine data [4] provides meaningful indications. Power curve analysis has the great advantage of simplicity, but its drawback is that it does not account for the multivariate dependence of wind turbine power on environmental conditions and working parameters [5]. Furthermore, the undisturbed wind flow is not measured directly: it is estimated through a nacelle transfer function based on downwind measurements collected behind the rotor span [
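As an illustration of the binning method mentioned above, the following minimal sketch averages the power within wind speed intervals of fixed width. The function name `binned_power_curve` and the choice of bin edges at multiples of the bin width are assumptions made for this example and are not taken from the study or from the IEC text.

```python
from collections import defaultdict


def binned_power_curve(wind_speed, power, bin_width=0.5):
    """Average the power within wind speed bins of width `bin_width` (m/s).

    Returns a dict mapping each bin centre to the mean power in that bin.
    Bin edges at multiples of `bin_width` are an assumption of this sketch.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, p in zip(wind_speed, power):
        idx = int(v // bin_width)  # bin covering [idx * w, (idx + 1) * w)
        sums[idx] += p
        counts[idx] += 1
    return {(idx + 0.5) * bin_width: sums[idx] / counts[idx] for idx in sorted(sums)}


# Hypothetical 10 min averages: wind speed in m/s, power in kW.
curve = binned_power_curve([5.1, 5.3, 5.6, 5.9], [100.0, 120.0, 150.0, 170.0])
```

Each key of `curve` is a bin centre, so the result can be plotted directly as a discretised power curve.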
The widespread diffusion of Supervisory Control And Data Acquisition (SCADA) systems has been a turning point and has projected wind energy into the era of data. SCADA systems record a vast set of environmental, operational, mechanical, electrical, and thermal data with a sampling frequency of the order of hertz and store them after averaging over a time basis of a few minutes (typically ten). Wind turbine performance monitoring and fault diagnosis have therefore been gradually evolving into data analysis problems: the general concept is that anomalous performance or incipient damage is identified by analyzing the residuals between the measurements and a data-driven normal behavior model [17]. The critical points of this kind of approach are repeatability, generalization, supervision, and the absence of theoretical standards, and the literature is focused on these aspects. Given these considerations, the study of wind turbine power curves has also become substantially a problem of data analysis and interpretation [
A recent line of research on wind turbine power curves concerns multivariate approaches [31]: the general idea is that the power of a wind turbine is the output of a data-driven model that has further input variables in addition to the wind speed. It has been shown that the wind speed can account for up to 99% of the variance of the power [32], so that further input variables can explain no more than the residual 1%; nevertheless, this residual can be decisive for obtaining data-driven models whose average error metrics are low enough to guarantee robust monitoring of wind turbine performance. For a recent review of multivariate wind turbine power curves, refer to [
From the discussion in [33], it emerges that the literature on multivariate wind turbine power curves is at an early stage, but some evidence has been gradually accumulating:
In regard to the latter point, the blade pitch and the rotational speed have been identified as the most relevant operation variables to be employed in a multivariate wind turbine power curve model. This is reasonable, given that the theoretical expectation for the power of a wind turbine is given by Equation (1):
$$P = \frac{1}{2} \rho \pi R^2 C_p(\beta, \lambda) v^3 \qquad (1)$$
In Equation (1), $P$ is the produced power and depends on the rotor radius $R$, the air density $\rho$, the wind speed $v$, and the power factor $C_p$, which depends on the blade pitch angle $\beta$ and the tip-speed ratio $\lambda$ (or, in other words, the rotational speed $\omega$). The role of the rotational speed and blade pitch in data-driven models for the power was explored in detail in [5]: in that study, the rotor speed and blade pitch were added one at a time to a Gaussian process regression for the power of a wind turbine. The main result was that the inclusion of both variables reduced the error metrics and that the rotational speed was slightly more influential than the blade pitch. The use of these two variables was also discussed in [
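To make the theoretical dependence described above concrete, the following sketch evaluates the standard aerodynamic power expression, P = 0.5 · ρ · π · R² · C_p · v³, together with the usual definition of the tip-speed ratio, λ = ωR/v. The power factor of 0.40 and the standard sea-level air density of 1.225 kg/m³ are illustrative assumptions, not values reported in the study; the 26 m radius corresponds to the 52 m rotor diameter of the test-case turbine.

```python
import math


def tip_speed_ratio(omega, rotor_radius, v):
    """Tip-speed ratio: blade tip speed (omega in rad/s times R in m) over wind speed (m/s)."""
    return omega * rotor_radius / v


def aerodynamic_power(v, rotor_radius, cp, air_density=1.225):
    """Power in watts from P = 0.5 * rho * pi * R^2 * Cp * v^3.

    `cp` is passed as a number here; in reality it depends on the blade
    pitch and the tip-speed ratio, which is what motivates using these
    variables as model inputs.
    """
    return 0.5 * air_density * math.pi * rotor_radius ** 2 * cp * v ** 3


# Assumed operating point: v = 8 m/s, R = 26 m, Cp = 0.40 (hypothetical value).
p_kw = aerodynamic_power(8.0, 26.0, 0.40) / 1000.0
```

Because the power scales with the cube of the wind speed, a small error in the wind speed measurement translates into a roughly three times larger relative error in the expected power.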
A further relevant technical development for multivariate wind turbine power curve models is data clustering [36]. Indeed, employing several sub-models is likely to be more convenient than employing a single model over the whole span of the power curve. This is reasonable because, between the cut-in and the rated wind speed, it is possible to distinguish three operation regions with different features:
Near the cut-in (approximately between 3 m/s and 5 m/s of wind intensity), the rotational speed of a wind turbine is practically fixed and the blade pitch varies;
In the full aerodynamic load region (approximately between 5 m/s and 9 m/s of wind speed), the wind turbine attains the maximum possible aerodynamic efficiency by varying the rotational speed and holding the blade pitch practically fixed;
In the partial aerodynamic load region (approximately above 9 m/s of wind intensity), the rotational speed is held fixed at the rated speed and the load is varied by regulating the blade pitch.
Based on these considerations, the objective of the present study is to formulate a multivariate method for data-driven wind turbine power curve analysis that can be easily implemented in industrial applications. Therefore, each building block of a good multivariate wind turbine power curve model was simplified as much as possible. This was done as follows:
The selection of the input variables was drastically simplified. Based on considerations similar to [5], the selected covariates of the model are wind speed, blade pitch, and rotor speed;
The structure of the model was selected as a compromise. It is evident that a linear model would be too simple: this was also observed in the recent study [31]. Nevertheless, it is worth exploring whether non-linearity can be accounted for in a simplified way, namely through a polynomial in the above-indicated input variables. Therefore, a polynomial LASSO regression was selected. The advantage of this model structure is that the covariates are substantially selected when the coefficients of the polynomial are set: a covariate excluded from the model has a vanishing coefficient in the polynomial;
The data clustering was performed using the k-means algorithm on a reference data set: it was selected because it is a very well-established method that can easily be implemented in industrial applications. The number of clusters was set automatically by computing the average silhouette score for each candidate number of clusters.
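The selection of the number of clusters described in the last item can be sketched as follows. This is a minimal standard-library illustration and not the implementation used in the study: the function names are hypothetical, the deterministic farthest-point seeding is a simplification swapped in for the random or k-means++ initialisation that library implementations normally use, and the inputs are assumed to be already scaled to comparable ranges.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding (assumption of this sketch): start from the first
    point, then repeatedly add the point farthest from the chosen centres."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centre.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels


def mean_silhouette(points, labels):
    """Average silhouette score, computed by brute force (quadratic cost)."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # singleton cluster: score 0 by convention
            continue
        a = sum(own) / len(own)                           # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in dists.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def select_number_of_clusters(points, candidates=(2, 3, 4, 5)):
    """Return the candidate k with the highest average silhouette score."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Three synthetic, well-separated groups of 2-D points (hypothetical data).
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]
best_k, scores = select_number_of_clusters(points)
```

The brute-force silhouette is quadratic in the number of samples, so for data sets of tens of thousands of records an optimised library (for example scikit-learn's `KMeans` and `silhouette_score`) would be the practical choice.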
The peculiarity and the novelty of the present study can therefore be identified in the treatment of a complex application (the data-driven multivariate wind turbine power curve), characterized by several critical points that must be mastered in depth in order to provide, for each building block of the problem, a solution that is as simple as possible. This approach is novel in the scientific literature because the applications of data clustering and non-linear multivariate power curves are at an early stage, and it is interesting for the industry because the use of data-driven models for custom wind turbine performance monitoring has gradually been becoming a necessity. The general structure of the method is summarized in the workflow of Figure 1.
The study is organized as a test case discussion, based on the analysis of the SCADA data of a Vestas V52 wind turbine (850 kW rated power) sited in Italy: the data were provided by the Lucky Wind company. In practice, the quality of the proposed approach is discussed through the analysis of the most common performance metrics (MAE, MAPE, RMSE) for the validation of the data-driven model. Furthermore, a method for analyzing the accumulated performance change is proposed: it is based on the analysis of how the MAPE changes between two subsets of the target data set. This procedure is useful in applications for identifying small performance changes accumulated over a relatively long period: these can take the form of a performance improvement due to technology optimizations [37] or, vice versa, of a performance worsening [39], which can occur due to the aging of the machine.
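For reference, the three validation metrics named above can be computed as in the following sketch. The textbook definitions are used, with the MAPE normalised by the measured value; the exact normalisation adopted in the study is not stated in this section, so this is an assumption.

```python
import math


def mae(measured, simulated):
    """Mean absolute error, in the units of the power."""
    return sum(abs(y - f) for y, f in zip(measured, simulated)) / len(measured)


def rmse(measured, simulated):
    """Root mean square error, in the units of the power."""
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(measured, simulated)) / len(measured))


def mape(measured, simulated):
    """Mean absolute percentage error, relative to the measured power (assumed)."""
    return 100.0 * sum(abs(y - f) / abs(y) for y, f in zip(measured, simulated)) / len(measured)


# Hypothetical measured and simulated power values in kW.
measured = [100.0, 200.0, 400.0]
simulated = [110.0, 190.0, 400.0]
metrics = (mae(measured, simulated), rmse(measured, simulated), mape(measured, simulated))
```

Since the MAPE divides by the measured power, samples near the cut-in weigh more heavily on it than on the MAE, which is one reason to report the metrics together.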
The work is organized as follows: in Section 2, the test case and the data sets are described. In Section 3, the method is described. The results are collected in Section 4, and the conclusions are drawn in Section 5.
2. The Test Case and the Data Sets
The wind turbine of interest is sited in Italy and the model type is Vestas V52. The rated power is 850 kW; the cut-in wind speed is 4 m/s; the rated wind speed is 14 m/s. The rotor diameter is 52 m.
Two data sets were employed:
The available validated measurements have a sampling time of 10 min, and those at disposal for the present study are:
The data pre-processing was based on the following steps, which are easily replicable in industrial applications:
The data were filtered on the wind turbine’s normal operation using the appropriate runtime counter provided by the SCADA system;
Industrial wind farms rarely operate under curtailment dictated by grid requirements, and this is also the case for the present wind turbine. Curtailment has nothing to do with wind turbine performance, and therefore, operation under curtailment should be filtered out for the purposes of the present work. This can be done by noticing that a wind turbine operating in derated conditions pitches anomalously with respect to normal operation. Therefore, the average wind speed/blade pitch curve [40] can be used for identifying outliers associated with derating. The filtering can be performed in practice by eliminating the time steps where the blade pitch deviates by more than a threshold (
in this study) with respect to the average blade pitch for the given wind speed;
For each wind turbine, data were filtered between cut-in and the rated speed because the power monitoring becomes trivial above the rated speed.
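The last two filtering steps in the list above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `filter_samples`, the tuple layout, and the 0.5 m/s bin width used to build the average wind speed/blade pitch curve are assumptions, and the pitch threshold is left as a required argument because its value is not recoverable from this text. The default speed limits are the cut-in (4 m/s) and rated (14 m/s) wind speeds quoted for the test-case turbine.

```python
from collections import defaultdict


def filter_samples(samples, pitch_threshold, bin_width=0.5, v_min=4.0, v_max=14.0):
    """Keep samples between cut-in and rated speed whose blade pitch stays
    within `pitch_threshold` (degrees) of the average pitch of their bin.

    `samples` is a list of (wind_speed, blade_pitch, power) tuples.
    """
    in_range = [s for s in samples if v_min <= s[0] <= v_max]

    # Average blade pitch per wind speed bin (the wind speed/pitch curve).
    pitch_by_bin = defaultdict(list)
    for v, pitch, _ in in_range:
        pitch_by_bin[int(v // bin_width)].append(pitch)
    mean_pitch = {b: sum(p) / len(p) for b, p in pitch_by_bin.items()}

    # Discard the time steps that pitch anomalously for their wind speed.
    return [s for s in in_range
            if abs(s[1] - mean_pitch[int(s[0] // bin_width)]) <= pitch_threshold]


# Hypothetical records: the fifth one pitches anomalously (possible derating).
samples = [(6.1, 0.0, 300.0), (6.2, 0.0, 310.0), (6.3, 0.0, 320.0), (6.4, 0.0, 330.0),
           (6.2, 10.0, 150.0), (3.0, 0.0, 0.0), (15.0, 20.0, 850.0)]
kept = filter_samples(samples, pitch_threshold=3.0)
```

Because the bin average is itself affected by the outliers, a median or a second filtering pass may be preferable when derating is frequent; this is a design choice of the sketch and not something the study specifies.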
Upon filtering,
constitutes 18,000 samples and
19,930 samples. A summary of the features of the data sets is reported in Table 1.
The dependence of wind turbine power on environmental conditions is a widely debated issue, in particular as regards the external temperature [41]. Given the applied point of view of the present study, the simplest method was selected: it consists of the renormalization of the wind speed $v$ by considering the effect of air density, as indicated in Equations (2) and (3):
$$v_c = v \left( \frac{\rho}{\rho_0} \right)^{1/3} \qquad (2)$$
$$\rho = \rho_0 \frac{T_0}{T} \qquad (3)$$
where $v_c$ is the corrected wind speed, $\rho$ is the air density measured on site, $\rho_0$ is the air density in standard conditions, $T_0$ is the absolute temperature in standard conditions (288.15 K), and $T$ is the absolute ambient temperature measured on site.
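A minimal sketch of this renormalization is given below. It assumes the common density correction in which the wind speed is scaled by the cube root of the ratio between the on-site and standard air density, with that ratio approximated by T_0/T at constant pressure; the function name and this specific form are assumptions of the example.

```python
T_STANDARD_K = 288.15  # absolute temperature in standard conditions (K)


def corrected_wind_speed(v, ambient_temperature_k):
    """Density-corrected wind speed v_c = v * (rho / rho_0) ** (1 / 3).

    The density ratio rho / rho_0 is approximated by T_0 / T, which
    assumes constant pressure (assumption of this sketch).
    """
    density_ratio = T_STANDARD_K / ambient_temperature_k
    return v * density_ratio ** (1.0 / 3.0)


# Colder (denser) air: a 10 m/s wind at 278.15 K counts as a slightly higher speed.
v_cold = corrected_wind_speed(10.0, 278.15)
```

The cube root arises because the power is proportional to the air density times the cube of the wind speed, so absorbing the density into the wind speed leaves the power expression unchanged.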
An example of the raw data set and of the pre-processed data set (resulting in
) is reported in
Figure 2.
5. Conclusions and Further Directions
In the present study, a method for multivariate wind turbine power monitoring was proposed. The objective was to combine simplicity, which can be exploited in industrial applications, with an awareness of the critical points regarding multivariate wind turbine power curves. Specifically, a meaningful simplified solution was proposed for each of the following points:
Set of covariates: The method employs wind speed, blade pitch, and rotor speed and accounts for the dependence on the external temperature by renormalizing the wind speed;
Model structure: The non-linearity was taken into account through the simplification of a polynomial up to cubic terms in the above-listed input variables. A LASSO regression was performed, which allows formally maintaining the structure of a linear regression;
Input variables’ selection: Through the K-fold cross-validation of the LASSO regression, the irrelevant input variables were discarded by setting to zero the corresponding coefficient in the polynomial;
Data clustering: The well-established k-means algorithm was employed to divide the multi-dimensional data appropriately, and a separate sub-model was set up for each cluster.
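The model structure and the input variables' selection summarized in the list above can be illustrated with the following sketch: the covariates are expanded into all monomials up to cubic order, and a LASSO fit by cyclic coordinate descent sets the coefficients of irrelevant terms exactly to zero. This is a minimal standard-library illustration and not the implementation used in the study: the function names are hypothetical, and the regularisation strength `alpha` is passed by hand, whereas the study selects the model through K-fold cross-validation (library routines such as scikit-learn's `PolynomialFeatures` and `LassoCV` cover both steps).

```python
import math
from itertools import combinations_with_replacement


def polynomial_features(row, degree=3):
    """All monomials of the inputs from degree 1 up to `degree` (no constant term)."""
    feats = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(row)), d):
            term = 1.0
            for i in combo:
                term *= row[i]
            feats.append(term)
    return feats


def soft_threshold(x, t):
    """Shrink x towards zero by t; values within [-t, t] become exactly zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_fit(X, y, alpha, n_iter=200):
    """Minimise (1 / 2n) * ||y - Zw - mean(y)||^2 + alpha * ||w||_1 by cyclic
    coordinate descent, where Z holds the standardised columns of X.
    Returns the coefficients on the standardised features."""
    n = len(X)
    cols = []
    for col in zip(*X):
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n) or 1.0
        cols.append([(v - mu) / sd for v in col])
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]
    w = [0.0] * len(cols)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # constant column: coefficient stays at zero
            # Correlation of feature j with the residual that excludes its own term.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, resid)) / n
            new_w = soft_threshold(rho, alpha) / z
            resid = [r - (new_w - w[j]) * c for c, r in zip(col, resid)]
            w[j] = new_w
    return w


# Toy check: the target depends on the first input only (hypothetical data).
X = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
y = [-3.0, -3.0, 3.0, 3.0]
coefficients = lasso_fit(X, y, alpha=0.5)
```

In an application along the lines of the study, each row passed to `lasso_fit` would be `polynomial_features((wind_speed, blade_pitch, rotor_speed))`, which yields 19 candidate terms, and one such model would be fitted per cluster.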
The method was tested on real-world data from a Vestas V52 wind turbine owned by the Lucky Wind company and sited in Southern Italy. The reference data set was employed for training the k-means algorithm and selecting the optimal data clustering (based on the silhouette coefficient): the selected clustering qualitatively coincides with the main control regions of a modern wind turbine. Subsequently, a polynomial LASSO regression for the power of the wind turbine was performed for each obtained cluster: the reference data were used for selecting the input variables and setting the regression coefficients through K-fold cross-validation. The performance of the model was quantified by analyzing the discrepancy between measurement and simulation in the target data set.
The mean absolute error of the model in the validation data set was 12 kW (1.4% of the rated power), corresponding to a mean absolute percentage error of 7.2%. The absolute errors did not increase in the near-rated region with respect to the moderate wind speed region (Cluster 3 against Cluster 2), and therefore, the percentage errors decreased, reaching 2.8% on average. This result is interesting because, typically, the near-rated region is instead the most critical for power monitoring. Notably, despite the simplifications of the proposed method, the obtained average error metrics were competitive with the state of the art in the literature, as can be seen from the discussion in Section 4.3 and by comparing against Table 1 in [
Furthermore, an analysis was devoted to monitoring the accumulated performance. The rationale for this analysis was that the order of magnitude of some performance changes possibly occurring in a wind turbine's lifetime is particularly small, but affects all the observations from a given moment onward. Therefore, monitoring small performance changes over long periods requires shifting the focus to the difference accumulated in a period against a reference one. To this aim, the target data set considered in this study was randomly split in two, and it was observed that the proposed method had a remarkably high sensitivity: the average accumulated percentage difference that can be detected was of the order of 0.001%.
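The idea of comparing the model error between two subsets of the target data set can be sketched as follows. The precise definition of the accumulated difference used in the study is not given in this section, so the quantity computed here, the difference between the MAPE of two random halves, is only an assumed illustration, and the function names are hypothetical.

```python
import random


def mape(measured, simulated):
    """Mean absolute percentage error relative to the measured power."""
    return 100.0 * sum(abs(y - f) / abs(y) for y, f in zip(measured, simulated)) / len(measured)


def mape_difference_between_halves(measured, simulated, seed=0):
    """Randomly split the records in two halves and return MAPE(second) - MAPE(first)."""
    indices = list(range(len(measured)))
    random.Random(seed).shuffle(indices)
    half = len(indices) // 2
    first, second = indices[:half], indices[half:]
    mape_first = mape([measured[i] for i in first], [simulated[i] for i in first])
    mape_second = mape([measured[i] for i in second], [simulated[i] for i in second])
    return mape_second - mape_first


# Hypothetical data with the same 5% relative error on every record:
# the two halves are statistically identical, so the difference vanishes.
measured = [100.0 + 10.0 * i for i in range(40)]
simulated = [1.05 * y for y in measured]
difference = mape_difference_between_halves(measured, simulated)
```

With a random split and no real performance change, the difference measures the noise floor of the method; a systematic change between a reference period and a later one must exceed that floor to be detectable.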
Therefore, the main lesson from this study, which can be particularly useful in the wind energy practitioners community, is that for application purposes, a multivariate wind turbine power curve model does not need to be overly complicated, but should rather contribute intelligently to each of the critical points regarding this type of problem.
The approach of this work was mainly methodological, but it should be emphasized that it has several practical applications. For example, a similar, albeit more complex, method was employed in [53] for quantifying the effect of aging on wind turbine performance. A further application of the present study, which is currently being pursued, is the estimation of the effects of icing on wind turbine performance. Indeed, the increasing exploitation of wind energy in harsh environments, driven by the growing demand for renewable energy production, has been posing the issue of characterizing the performance of wind turbines in extreme conditions. Blade icing can likely be identified as a reduction in rotor speed, and consequently in extracted power, for a given wind speed [56]: therefore, a model similar to the one proposed in the present paper should be useful for identifying the behavior of a wind turbine in icing conditions.
Furthermore, a topic that has recently been attracting attention in the wind energy literature is the effect of static and dynamic yaw error on wind turbine performance [60]; the incorporation of such an effect in multivariate wind turbine power curve models would be a valuable development. Finally, a useful remark is that the number of possible covariates could be enlarged quite arbitrarily; in that case, a dimension reduction algorithm such as principal component analysis [61] should be included in the modeling chain.