Article

Short-Term Energy Generation Forecasts at a Wind Farm—A Multi-Variant Comparison of the Effectiveness and Performance of Various Gradient-Boosted Decision Tree Models

by
Marcin Kopyt
,
Paweł Piotrowski
* and
Dariusz Baczyński
Electrical Power Engineering Institute, Warsaw University of Technology, Koszykowa 75 Street, 00-662 Warsaw, Poland
*
Author to whom correspondence should be addressed.
Energies 2024, 17(23), 6194; https://doi.org/10.3390/en17236194
Submission received: 3 November 2024 / Revised: 28 November 2024 / Accepted: 5 December 2024 / Published: 9 December 2024

Abstract:
High-quality short-term forecasts of wind farm generation are crucial for the dynamically developing renewable energy generation sector. This article addresses the selection of appropriate gradient-boosted decision tree (GBDT) models for forecasting wind farm energy generation with a 10-min time horizon. In most forecasting studies, authors utilize a single gradient-boosted decision tree model and compare its performance with other machine learning (ML) techniques and sometimes with a naive baseline model. This paper proposes a comprehensive comparison of all four major gradient-boosted decision tree models (GBDT, eXtreme Gradient Boosting (XGBoost), Light Gradient-Boosting Machine (LightGBM), and Categorical Boosting (CatBoost)) used for forecasting. The objective is to evaluate each model in terms of forecasting accuracy for wind farm energy generation (forecasting error) and computational time during model training. Computational time is a critical factor due to the necessity of testing numerous models with varying hyperparameters to identify the optimal settings that minimize forecasting error. Forecast quality using default hyperparameters is used here as a reference. The research also seeks to determine the most effective sets of input variables for the predictive models. The article concludes with findings and recommendations regarding the preferred GBDT models. Among the four tested models, the oldest GBDT model demonstrated a significantly longer training time, which should be considered a major drawback of this implementation of gradient-boosted decision trees. In terms of model quality testing, the lowest nRMSE error was achieved by the oldest model—GBDT in its tuned version (with the best hyperparameter values obtained from exploring 40,000 combinations).

1. Introduction

The growing importance of renewable energy sources (RES) and the rapid expansion of wind farms in most countries have underscored the need for accurate energy generation forecasts, particularly as wind farms contribute an increasing share of overall power system generation. Cost-effective and optimized management of power systems relies on achieving the highest possible accuracy in RES generation forecasts. Energy generation forecasts for wind farms, especially in the short term, have been extensively researched, among other approaches using machine learning (ML) techniques, including various gradient-boosted decision tree models [1]. This class of models, particularly the eXtreme Gradient-Boosting (XGBoost) model, is highly popular and often cited as one of the best for the mentioned purposes [2,3,4,5]. It is also worth noting that the Light Gradient-Boosting Machine (LightGBM) and Categorical Boosting (CatBoost) models are used significantly less frequently, as is the original, earliest version of the gradient-boosted decision tree (GBDT) model.
In theory, as suggested by model developers, newer and more complex models with a greater number of hyperparameters should achieve better forecast accuracy and perform computations more quickly. Gradient-boosted decision tree models were originally designed for regression problems, although they are now more commonly used in classification tasks, where they are considered among the best-performing methods. However, in the context of regression problems, particularly forecasting energy generation in RES, the issue of model quality and efficiency has not been extensively studied. Quite often, articles present forecast results but lack information on whether the models were “tuned”—i.e., whether hyperparameters other than the default ones were tested. The analysis in this work aims to determine whether model tuning is justified and how much improvement in forecast accuracy can be achieved through tuning.

1.1. Evolution of Gradient-Boosted Decision Tree Models

The GBDT model was introduced by J. H. Friedman in 1999. In article [6] (2001), Friedman describes the concept of gradient boosting and discusses the use of decision trees as base learners in boosting algorithms. This work is foundational for understanding the development of gradient-boosted decision trees. The book [7] (2001) provides a comprehensive overview of various statistical learning techniques, including decision trees and ensemble methods like boosting and bagging. It includes a description of the MART (Multiple Additive Regression Trees) algorithm, a generic gradient tree boosting algorithm.
XGBoost was introduced in 2014 by Tianqi Chen as part of his Ph.D. research at the University of Washington, and its description can be found in [8] (2016). XGBoost is an optimized gradient-boosting library designed to be highly efficient, flexible, and portable, with the algorithm providing parallel tree boosting (also known as GBDT or GBM) [9]. The code is available as an open-source package at GitHub [9]. The model gained popularity quickly due to its efficiency, scalability, and ability to handle large datasets and complex models. Its advantages over GBDT include the following: support for distributed computing, allowing it to scale across multiple machines and handle very large datasets; use of multiple CPU cores for parallel computation to speed up the training process; use of advanced optimization techniques, such as cache-aware access patterns and data compression, to make the learning process faster; built-in regularization (Lasso Regularization (L1) and Ridge Regularization (L2)) to prevent overfitting; and more sophisticated “max depth” tree pruning to avoid overfitting.
LightGBM was introduced in 2017 by Microsoft Research. In paper [10], there is a description of the LightGBM model. It is a fast, scalable, and efficient implementation of GBDT that can handle high-dimensional data. It was designed to outperform other gradient-boosting implementations, particularly in terms of speed and memory usage. The code is available at GitHub [11]. The advantages of the model include the following: fast training speed and high efficiency, low memory usage, high accuracy, support of parallel computing, GPU learning, and capability of handling large-scale data [11].
CatBoost was developed by Yandex, a Russian multinational IT company, in 2017. This algorithm handles categorical features without the need for extensive preprocessing, such as one-hot encoding. Paper [12] (2018) provides a description of the CatBoost model. The novelty of this solution lies in its native handling of categorical variables, its ordered boosting technique for reducing overfitting, and its use of oblivious trees (symmetric decision trees). All of these mechanisms contribute to the algorithm’s efficiency and accuracy, and the code is available as an open-source package at GitHub [13]. The advantages of the model include the following: fast training speed, best-in-class prediction speed, support for both numerical and categorical features, and fast GPU and multi-GPU support for training [13].

1.2. Review of Applications of Gradient-Boosted Decision Trees in Wind Energy Generation Forecasting

Gradient-boosted decision trees, especially XGBoost, have emerged as a powerful tool for predicting different kinds of time series. From the very beginning, they were also applied to wind energy generation [1,14]. Their ability to model non-linear relationships and interactions between variables makes them compatible with the dynamic nature of wind energy.
GBDT applications in wind energy forecasting are consistent with general trends in this area. Considering the forecast horizon, the most common publications concern ultra-short-term [2,15,16,17,18] and short-term forecasts [17,18,19], while long-term forecasting appears less frequently [20]. Some works were atypical, as they declared forecasting as their topic, while actually estimating the generated power without an actual time difference between timestamps of input and output [3,21].
Like other forecasting methods, GBDT can solely utilize the forecast time series data or use it with additional exogenous explanatory variables [4]. Most often, these data are numerical weather predictions (NWP) [22], but sometimes data from nearby wind farms are also used [17]. In general, it is necessary to select the input data to the model to increase its effectiveness [15,16], which is particularly important for large datasets. In this task, XGBoost was used [23] for feature selection to optimize a set of input variables for their temporal convolution network model, indicating that tailored features significantly impact forecasting accuracy.
In most of the analysed works, the best results were obtained using XGBoost. However, when the explanatory dataset was large, better results were obtained using more complicated approaches, including hybrid ones. Zheng et al. [2] presented a method that integrates Long Short-Term Memory (LSTM) networks with XGBoost to enhance ultra-short-term wind power forecasting using technical indicators. This combination captures temporal dependencies while benefiting from GBDT’s wind speed predictive capabilities. Gao [5] proposed the Lasso-CNN-LSTM LightGBM model, which showed superiority over the BP neural network and SVM across both very short-term and short-term forecasting time scales. A different XGBoost hybrid, this time with a Bidirectional Gated Recurrent Unit (BGRU) network, is presented in [24]. The BGRU is applied to establish the potential relation between a decomposed NWP wind speed sub-series and the measured wind speed, yielding the proposed wind speed correction model. The corrected NWP wind speed is then used by XGBoost to forecast wind power.
Among the analysed works, there are some that concern the application of a single method [15,18,20,21] from the GBDT family, but most of them present results for more than one [19,23,25,26,27,28,29]. Only a few of them present a discussion on the optimization of hyperparameters of machine learning models [3,19]. Hence, it can be assumed that usually the optimization of hyperparameters in individual works is rather limited or individual methods are used with default settings. This can lead to imprecise conclusions. Therefore, the authors of this article decided to extensively study the influence of hyperparameter values on the results obtained by individual GBDT methods.

1.3. Review of Studies Comparing the Effectiveness and Efficiency of Gradient-Boosted Decision Trees

The latest models (LightGBM, CatBoost) achieve significantly better results in efficiency (training speed tests) compared to XGBoost and GBDT. It is important to emphasize that for very large training datasets, a shorter training time has practical and economic significance (lower energy consumption). However, from the perspective of model accuracy (effectiveness), the results are not always as straightforward as in the case of evaluating model training speed (efficiency).
For example, in article [30], the Epsilon dataset was used for binary classification. The training dataset contains 400,000 objects. Each object is described by 2001 features (one is the label, the remaining are numerical features). The learning speed varied significantly across different models. The fastest was the CatBoost model (527 s), followed by LightGBM (1146 s), and finally XGBoost (4339 s). While the difference between CatBoost and LightGBM was not drastically large (twice the time), XGBoost took over 8 times longer. The same article also presented results for the HIGGS dataset for binary classification. The training dataset contained 10,500,000 objects. Each object was described by 29 features (one is the label, the remaining are numerical features). In this case, the number of training samples was over 26 times larger, but the number of input features was significantly smaller (almost 69 times). The shortest training time was achieved by the LightGBM model (438 s), followed by CatBoost (770 s) and finally XGBoost (881 s). For both datasets, the XGBoost model had the longest training time. It is worth noting that these times refer to training the models using the computer’s CPU. It should be emphasized that the use of GPUs significantly accelerates the training time of models in the vast majority of cases without affecting the ranking order of the models.
In article [31], a comparison of training speed for a regression problem is described. The goal was to predict the baseline sales for all different types of beverages on offer. The dataset comprises approximately 1.5 million records spanning several months, with 30 engineered features (25 numeric attributes and 5 categorical attributes). The shortest training time was achieved by the LightGBM model (17.79 min), followed by CatBoost (19.9 min) and finally XGBoost (39.2 min). In this case, the LightGBM model achieved the shortest time, with the CatBoost model achieving a slightly longer time, while the XGBoost model recorded a time more than twice as long as that of the best-performing model. When considering the problem from the perspective of forecasting quality, the ranking based on MAPE (Mean Absolute Percentage Error) is as follows: CatBoost (20.0%), XGBoost (21.1%), and LightGBM (28.3%). Therefore, taking into account both quality and training speed, the CatBoost model is the most advantageous.
In paper [32], the Kaggle dataset of flight delays for the year 2015 was used, as it included both categorical and numerical features (the total number of input features was 9). The number of training samples was nearly 5 million. The shortest training time was achieved by the LightGBM model (326 s), followed by CatBoost (390 s) and finally XGBoost (970 s). While the difference between CatBoost and LightGBM was very small, XGBoost took almost 3 times longer. Regarding the quality results for this classification problem (AUC score), the rankings from best to worst performance on the test data are as follows: CatBoost (0.816), XGBoost (0.789), and LightGBM (0.772). In this case, also taking into account both quality and training speed, the CatBoost model is the most advantageous.
Paper [30] also analyzed another crucial issue: which gradient-boosted decision tree model achieves the best performance with default hyperparameters (i.e., without tuning to improve quality). Nine large datasets were used for testing (encompassing both classification and regression problems). Interestingly, for all 9 analyses, the CatBoost model achieved the best performance. The XGBoost and LightGBM models consistently performed worse, by margins ranging from 0.76% to 21.04% depending on the dataset. Notably, in 8 out of 9 cases, XGBoost achieved a better (closer to CatBoost) result than LightGBM.

2. Data and Forecasting Methods

2.1. Data

The data were obtained from Sotavento Galicia, S. A., covering the period from 1 January 2020 to 14 August 2024. The dataset pertains to a wind farm consisting of 24 wind turbines with a total power rating of 17 MW [33]. Three time series with 10-min intervals, collected from an onshore wind farm, were available: wind speed, wind direction, and energy production. The data source contains only historical values for these three time series. Forecast values for meteorological data were not provided.

2.1.1. Preprocessing of Data

The data preprocessing was done in six steps. First, the data were roughly cleaned to eliminate measurements differing from the general range of values by an order of magnitude. Observations with values greater than the general range of the given variable were eliminated. This allowed for further plot-based analyses without extreme outliers distorting the plots and statistics of the data. On this basis, the data from the years 2023–2024 were discarded due to radical changes in the observed electricity generation range. Additionally, sudden atypical drops in energy generation values were spotted. This phenomenon started appearing at the beginning of November 2022 and lasted until the end of the year. The moment of change in the described behavior is shown in Figure 1.
Second, a moving average was proposed as a way of restoring the process values. Three different variants of the average were tested to obtain the new value for period T. The first one used observations from periods T (zero lag) and T − 1, the second one used values from T − 1 and T + 1, and the third one used values from T − 2, T − 1, T + 1, and T + 2. While the third method improved the shape most consistently, it did not solve the problem by itself. For some observations, drops lasting for extended periods of time made the bare moving average insufficient. Because of that, an additional criterion was introduced in data cleaning: observations for period T with a ratio of the values from periods T and T − 1 less than 1.3 were treated as probable; otherwise, they were replaced with empty values, to be filled during the imputation stage.
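The second-step cleaning can be sketched in plain Python; the function and variable names are illustrative, the toy series is invented, and the ratio criterion follows the paper's wording (values from periods T and T − 1):

```python
def moving_average_v3(values, t):
    """Third tested moving-average variant: mean of periods T-2, T-1, T+1, T+2."""
    neighbours = [values[t - 2], values[t - 1], values[t + 1], values[t + 2]]
    return sum(neighbours) / len(neighbours)

def is_probable(values, t, threshold=1.3):
    """Ratio criterion as worded in the text: the observation for period T is
    treated as probable when the ratio of the values from T and T-1 stays below
    the threshold; otherwise it should be replaced with an empty value."""
    return values[t] / values[t - 1] < threshold

series = [10.0, 11.0, 2.0, 10.5, 11.5, 12.0]   # index 2 is a suspicious drop
restored = moving_average_v3(series, 2)        # mean of 10.0, 11.0, 10.5, 11.5
```

The sudden recovery at index 3 (10.5 against the preceding 2.0) fails the ratio test and would be emptied for the imputation stage.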
The third step concerned data filtering. During this stage, improbable values were eliminated via two replace-with-empty conditions. The first condition concerned energy generation lower than 2% of the rated power recalculated to energy at wind speeds greater than 6 m/s (2–3 m/s above the typical start-up speed of a wind turbine). The second condition concerned generation greater than 30% of the rated power recalculated to energy. These cleaning conditions were applied to eliminate both possible erroneous generation measurements and possible temporary decreases in rated power due to control operations. Since using wind measurements does not introduce additional weather forecast errors, this can be treated as a reasonable method of filtering out improbable records.
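A sketch of the two replace-with-empty conditions (not the authors' code): the 17 MW rating and the 10-min interval come from the data description, while the function name, unit handling, and return convention are assumptions.

```python
RATED_POWER_MW = 17.0
INTERVAL_H = 10 / 60                               # 10-min interval in hours
RATED_ENERGY_MWH = RATED_POWER_MW * INTERVAL_H     # rated power recalculated to energy

def filter_record(energy_mwh, wind_speed_ms):
    """Return None (empty value) for improbable records, else the energy unchanged."""
    # Condition 1: almost no generation despite wind well above start-up speed.
    if wind_speed_ms > 6.0 and energy_mwh < 0.02 * RATED_ENERGY_MWH:
        return None
    # Condition 2: generation greater than 30% of rated power recalculated to energy.
    if energy_mwh > 0.30 * RATED_ENERGY_MWH:
        return None
    return energy_mwh
```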
In the next step, data imputation was performed. In the observed data, the sample-to-sample change was usually not constant, and variables often changed monotonicity. This behavior required an accurate interpolation method to reasonably replicate the process components. Time series interpolation with the Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) method was used to fill in missing values. The method was chosen for its non-linear character, and thus its ability to capture complex dependencies better, and for its ability to eliminate the overshoots that appear during cubic spline interpolation when the sample-to-sample change is not constant [34]. Since weather changes dynamically, a conservative assumption was made that missing values in data gaps longer than 30 min (at the 10-min time series resolution) could not be estimated reliably. Hence, a limit of three samples was assumed for interpolation, as a balance between preserving variable change patterns after interpolation, keeping lagged variables usable for the models (not possible without interpolating over a shorter time window), and the relevancy of the data restored by the interpolation.
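The three-sample gap limit can be illustrated in plain Python; linear interpolation stands in here for the PCHIP method actually used, since only the gap-length rule is being demonstrated, and all names are illustrative:

```python
def impute_short_gaps(values, max_gap=3):
    """Fill interior gaps of up to max_gap consecutive Nones; longer gaps
    (more than 30 min at 10-min resolution) are left empty.
    Linear filling stands in for the PCHIP interpolation used in the paper."""
    out = list(values)
    i, n = 0, len(values)
    while i < n:
        if out[i] is None:
            start = i
            while i < n and out[i] is None:
                i += 1
            gap = i - start
            # fill only interior gaps no longer than max_gap samples
            if gap <= max_gap and start > 0 and i < n:
                left, right = out[start - 1], out[i]
                for k in range(gap):
                    out[start + k] = left + (right - left) * (k + 1) / (gap + 1)
        else:
            i += 1
    return out
```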
During the fifth step, additional features were created. For the wind direction, new variables were introduced in order to eliminate the sudden jumps in variable values that occur when the direction oscillates around north. In the described situation, values change from close to 360 to close to 0, which can potentially be interpreted by models as a large change. The sine and cosine of the wind direction were used to better reflect the described phenomenon.
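This encoding can be sketched as follows (the function name is illustrative):

```python
import math

def encode_direction(degrees):
    """Encode wind direction as (sin, cos) to remove the 360-to-0 discontinuity."""
    rad = math.radians(degrees)
    return math.sin(rad), math.cos(rad)

# 2 degrees and 359 degrees are nearly the same physical direction;
# the encoded features agree, unlike the raw degree values.
s1, c1 = encode_direction(2.0)
s2, c2 = encode_direction(359.0)
```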
The last step of data preprocessing concerned the creation of lagged values of obtained variables. For each variable, lags up to ten periods were created as additional variables. The values were then left for further statistical analyses.
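The creation of lagged variables can be sketched as follows; the paper builds such lags for every variable, and the names and toy series here are illustrative:

```python
def make_lagged_features(series, max_lag=10):
    """For each time step t >= max_lag, collect [x_{t-1}, x_{t-2}, ..., x_{t-max_lag}]."""
    rows = []
    for t in range(max_lag, len(series)):
        rows.append([series[t - k] for k in range(1, max_lag + 1)])
    return rows

series = list(range(12))            # stand-in time series 0..11
rows = make_lagged_features(series)  # one feature row per forecastable step
```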

2.1.2. Statistical Analysis of Data and Development of Input Data Including Feature Engineering

The processed data after the preprocessing phase covers the period from 1 January 2021 to 31 December 2022. In the first step, an analysis was conducted to verify which lagged values of the forecasted time series of electricity generation could potentially serve as valuable input data for the predictive models. For this purpose, Pearson’s linear autocorrelation values for the electricity generation time series up to 60 h back (360 lagged 10-min intervals) were calculated. Figure 2 shows the autocorrelation function (ACF) of electricity generation from the wind farm. The first 12 ACF values were above 0.9, indicating a very strong correlation.
The six most recent lagged values (T − 1 to T − 6) were selected as input data. The first lagged value had an ACF value of 0.9974, while the sixth had an ACF value of 0.9515 (this is the variant with the largest amount of input data).
Figure 3, on the other hand, presents a scatter plot of energy generation in period T against wind speed in period T − 1. The relationship between energy generation and wind speed was nonlinear. Moreover, there was a noticeable amount of scatter in the plot, which may indicate that the lagged values of wind speed are less significant as input data than the lagged values of energy generation. The Pearson linear correlation coefficient between energy generation in period T and wind speed in period T − 1 was 0.9525, which was close to the correlation between energy generation in period T and energy generation in period T − 6 (0.9515). The six most recent lagged values of wind speed (T − 1 to T − 6) were selected as input data. The first lagged value had a correlation of 0.9525, while the sixth had a correlation of 0.8249. This represented the variant with the largest number of input data points.
In the case of Pearson’s linear correlation between energy generation and the lagged values of wind direction (in degrees), we observed two things: on the one hand, the correlation values were very similar (for wind direction lagged by 1 period, the correlation was 0.200, and for values lagged by 6 periods, it was almost the same, at 0.1959); on the other hand, these correlation values were very small and likely had a negligible impact on the quality of energy generation forecasts when used as input data. It is also worth noting that wind direction expressed in degrees can be quite misleading for a predictive model within a certain range of degrees. For example, wind directions of 2 degrees and 359 degrees represent almost the same direction, yet they come from nearly opposite ends of the range (close to the maximum and minimum values). A solution to this problem is to represent the wind direction using two values created during feature engineering, as described in Section 2.1.1. In this paper, the wind direction was represented as the sine and cosine of the wind angle. This way, oscillations of the wind direction around north are represented as oscillations around 0 for the sine and around 1 for the cosine, with changes much smoother than in the original form.
Another feature engineering technique is the use of a smoothed value of the forecasted time series as additional input data. Smoothing is recommended to reduce the random component. For this purpose, a new input variable was created as a weighted sum of the three lagged values of energy generation from T − 1 to T − 3, with expert-selected weights summing to one.
The smoothed time series values of energy generation were calculated using Equation (1).
EG_{t−1}^{smoothed} = EG_{t−1} · w_{t−1} + EG_{t−2} · w_{t−2} + EG_{t−3} · w_{t−3}, (1)
where EG_{t−1}^{smoothed} is the smoothed value of electricity generation for period t − 1, EG_{t−k} is the value of electricity generation for period t − k, w_{t−1} = 0.80, w_{t−2} = 0.15, and w_{t−3} = 0.05.
The Pearson correlation coefficient (R) between energy generation in period T (output data) and the smoothed energy generation in period T − 1 (input data) was very high (0.9969), slightly lower than the correlation for energy generation in period T − 1 (0.9977) but higher than for energy generation in period T − 2 (0.9919). This suggests the potentially significant importance of the smoothed energy generation as an input variable for the forecasting model.
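Equation (1) with the expert-selected weights can be sketched as follows (the generation values are invented toy data; the names are illustrative):

```python
WEIGHTS = (0.80, 0.15, 0.05)   # w_{t-1}, w_{t-2}, w_{t-3}; they sum to one

def smoothed_generation(eg, t):
    """Smoothed generation for period t-1 per Equation (1): a weighted sum
    of the three most recent lagged generation values."""
    return sum(w * eg[t - k] for k, w in enumerate(WEIGHTS, start=1))

eg = [5.0, 6.0, 8.0, 7.0]                  # toy generation series
value = smoothed_generation(eg, len(eg))   # 0.8*7.0 + 0.15*8.0 + 0.05*6.0
```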
Table 1 contains a list of all prediction inputs (two variants) and their corresponding codes.

2.1.3. Dataset Division

The number of complete data records (input–output pairs) amounted to 94,008. This represents 89.5% of all 10-min intervals over the 2-year period. Unfortunately, some of the data were so incomplete that they could not be used for forecasting purposes. The available dataset was divided as follows: the first 80% of the data formed the training subset, the next 10% constituted the validation subset (used for selecting the appropriate hyperparameters for the models), and the final 10% formed the test subset (used for the final evaluation of the models’ forecast accuracy). The chosen proportions (limiting the number of validation and test samples) were established due to the large number of records (nearly 100,000).
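The chronological 80/10/10 split can be sketched as follows (the function name and rounding convention are assumptions; the paper does not specify how fractional boundaries are rounded):

```python
def chronological_split(records, train=0.8, valid=0.1):
    """Chronological split into training, validation, and test subsets,
    preserving time order (no shuffling for time series data)."""
    n = len(records)
    n_train = int(n * train)
    n_valid = int(n * valid)
    return (records[:n_train],
            records[n_train:n_train + n_valid],
            records[n_train + n_valid:])

train_set, valid_set, test_set = chronological_split(list(range(94_008)))
```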

2.2. Gradient-Boosted Decision Tree Forecasting Methods

Historically, the algorithm for gradient-boosted decision trees evolved from the application of boosting methods to regression trees. The main premise is to create a sequence of (very) simple trees, each of which is constructed to predict the residuals generated by the previous ones. The method builds binary trees, meaning that the data are split into two subsets at each decision node. In successive boosting steps (of the tree-boosting algorithm), a single (optimal) data split is determined, and the deviations of observed values from the means (residuals in each split) are calculated. The next tree is fitted to these residuals and determines the next split, at which the variance of the residuals (i.e., the error) is further reduced (for the given sequence of trees) [35]. According to the incremental strategy used in gradient methods, the quality of the model improves with each subsequent iteration (errors decrease with each iteration). However, there is a risk that the model may become overfitted, as it becomes well-fitted only to the training data. In such a case, the model loses its ability to generalize results.
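The boosting loop described above can be illustrated with a minimal pure-Python sketch using depth-1 trees (stumps) fitted to residuals under squared loss. This is a didactic toy under simplifying assumptions (one feature, squared error, fixed learning rate), not any of the four benchmarked implementations, and all names are illustrative:

```python
def fit_stump(x, residuals):
    """Find the single binary split of 1-D feature x that minimizes the
    squared error of the residuals (one decision node, two leaf means)."""
    best = None
    order = sorted(range(len(x)), key=lambda i: x[i])
    for cut in range(1, len(x)):
        left = [residuals[order[i]] for i in range(cut)]
        right = [residuals[order[i]] for i in range(cut, len(x))]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((v - ml) ** 2 for v in left)
               + sum((v - mr) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, x[order[cut - 1]], ml, mr)
    _, thr, ml, mr = best
    return lambda v: ml if v <= thr else mr

def boost(x, y, n_trees=20, lr=0.3):
    """Each successive stump is fitted to the residuals left by the
    previous ensemble, per the incremental boosting strategy."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda v: base + lr * sum(s(v) for s in stumps)

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 0.1, 0.2, 2.0, 2.1, 2.2]   # a step-like target; easy for stumps
model = boost(x, y)
```

With each iteration the training error shrinks, which is exactly why overfitting control (tree count, learning rate, regularization) matters in the real libraries.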
The primary objective of the study was to compare the effectiveness and efficiency of four different gradient-boosted decision tree models in a regression problem (10-min horizon wind farm energy generation forecast). The analysis focused on the following models sorted by year of creation: GBDT (1999) (description in [6]), XGBoost (2014) (description in [8]), LightGBM (2017) (description in [10]), and CatBoost (2017) (description in [12]). A naive (persistence) model was used as a baseline for comparison (description in [36]).
Each model comes with default hyperparameter values, likely determined based on experimental results. The designers’ intent was almost certainly to maximize the universality of these default values, allowing for good or very good performance in both regression and classification problems without the need for additional hyperparameter tuning. However, these default values are likely not optimal for every problem. Therefore, it is worthwhile to verify whether and how much results can be improved by tuning the hyperparameters of these models for a regression problem (forecasting). Additionally, it is important to assess whether newer, more complex (and enhanced) models with more hyperparameters actually achieve better results in the context of regression problems. Table 2 provides a summary of the key hyperparameters for each of the four gradient-boosted decision tree models (regression version), along with their default values.
Based on the information provided in Table 2, several observations can be made. The oldest and first model, GBDT, has the smallest number of main hyperparameters (8). In contrast, the LightGBM model has the largest number of hyperparameters (19). It should be underlined that some hyperparameters have their default value set to an “off” state (inactive hyperparameter).
Another important observation is the considerably larger default number of trees in the CatBoost model, which is ten times greater than that of the other models. The most common learning rate is 0.1 (for GBDT and LightGBM). However, the learning rate for CatBoost (0.03) is noticeably lower compared to the other models, and in comparison to XGBoost (0.3), it is ten times smaller.
The number of tree levels (tree depth) ranges from 3 to 6, reflecting the nature of these models, which typically involve building relatively shallow trees, in contrast to models like Random Forest (RF), which have a much more complex tree structure. An exception is the LightGBM model, which does not impose any restrictions on the number of tree levels (no regularization when the model is run with default settings).
A distinctly notable feature of the CatBoost model, compared to the other three models, is the strategy used for growing trees (grow policy); the other three models do not offer such a feature. It determines how the trees are grown. The default option for the hyperparameter “grow_policy” is “SymmetricTree,” in which trees are grown symmetrically and each split is made in a balanced way, ensuring that the tree is balanced and efficient. For the “Depthwise” option, trees are grown level by level, with splits made across all levels of the tree before moving deeper. This policy allows for more complex trees with potentially more splits. For the “Lossguide” option, trees are grown according to the loss function, focusing on regions with the highest loss reduction. This can lead to more precise splits but can also result in overfitting if not properly controlled.
Another distinctive feature of the CatBoost model (not present in the other models) is its ability to control the intensity of the Bayesian bootstrap with sampling data for each iteration. The hyperparameter “bagging_temperature” is used to control the intensity of the Bayesian bootstrap during the data sampling process for each iteration of model training. A higher temperature increases the variability of the sample weights, leading to more randomness in the data used for each iteration. For low “bagging_temperature” values (close to 0), the sampling process becomes deterministic, meaning each data point is more likely to be sampled uniformly, similar to standard bootstrap sampling. The default value of “bagging_temperature” is 1, which provides a balance between randomness and uniform sampling.
Another distinctive feature of the CatBoost model (not present in the other models) is its ability to control randomness in the process of selecting features and splits. The hyperparameter “random_strength” manages the level of randomness introduced into the model to help with overfitting. The default value for “random_strength” is 1.0. This hyperparameter influences the regularization of the model by introducing randomness into the feature and split selection process, which can enhance generalization. In contrast, a unique hyperparameter found in the LightGBM model, which the other models do not have as a regularization option, is the “number of leaves in full trees,” set to 31.
The hyperparameter "minimum samples in leaf" defaults to 1 for three of the models; only the LightGBM model uses 20. From the perspective of regression problems (particularly forecasting), a value of 1 appears too low (effectively no regularization), and such models may be prone to overfitting.
The "ridge regularization (L2)" hyperparameter does not exist in the GBDT model and is disabled by default in the LightGBM model. In the CatBoost model it is active by default (default value = 3). The "l2_leaf_reg" hyperparameter adds an L2 regularization term to the objective function during training. This regularization penalizes large weights in the leaf nodes of the decision trees, which can reduce the variance of the model and prevent overfitting. Setting "l2_leaf_reg" to 3 applies a moderate amount of L2 regularization to the leaf weights; this value is often used as a default or starting point, balancing underfitting and overfitting. In the XGBoost model, the default value of "reg_lambda" is 1, meaning L2 regularization is active with moderate strength; it helps reduce overfitting by adding a penalty proportional to the squared magnitude of the leaf weights.
Interestingly, the CatBoost model employs only L2 regularization, unlike XGBoost and LightGBM models, which incorporate both L1 and L2 regularization hyperparameters. However, the basic GBDT model does not incorporate either L1 or L2 regularization.
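The role of L2 regularization in gradient boosting can be made concrete with the standard leaf-weight formula used by second-order (XGBoost-style) implementations, w* = -G / (H + lambda), where G and H are the sums of the first and second derivatives of the loss over the examples in a leaf, and lambda is the L2 coefficient ("reg_lambda" / "l2_leaf_reg"). A minimal sketch of this shrinkage effect (the numeric values are illustrative):

```python
def leaf_weight(grad_sum, hess_sum, l2_lambda):
    """Optimal leaf weight in second-order gradient boosting: w* = -G / (H + lambda).

    A larger l2_lambda shrinks leaf weights toward zero, reducing model variance.
    """
    return -grad_sum / (hess_sum + l2_lambda)

unregularized = leaf_weight(4.0, 2.0, 0.0)  # -2.0
xgboost_like = leaf_weight(4.0, 2.0, 1.0)   # -4/3, shrunk toward zero (default reg_lambda = 1)
```

The same mechanism explains why activating L2 (as CatBoost and XGBoost do by default) tends to produce more conservative leaf values than the unregularized GBDT model.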
The number of threads used during training is identical across three of the models, all of which utilize every available thread. The exception is the oldest, original GBDT model, which uses a single thread; for GBDT, however, this depends on the specific implementation.

3. Results and Discussion

In order to evaluate the results of individual prediction methods, the following quality measures were chosen:
-
nMAE (Normalized Mean Absolute Error):
$$ \mathrm{nMAE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| y_i - y_i^{*} \right|}{C_{norm}} \cdot 100\% $$
-
nRMSE (Normalized Root Mean Square Error):
$$ \mathrm{nRMSE} = \frac{1}{C_{norm}} \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - y_i^{*} \right)^2} \cdot 100\% $$
-
nMBE (Normalized Mean Bias Error):
$$ \mathrm{nMBE} = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i - y_i^{*}}{C_{norm}} \cdot 100\% $$
where:
y_i —actual value of the forecast variable y at time i,
y_i^* —predicted value of variable y at time i,
C_norm —normalizing factor.
It was decided to use the quality indicators in normalized form, which allows the results to be compared with those obtained for other forecasting cases. The value of the normalizing factor C_norm was taken as the highest value occurring in the time series (17,296 kWh). nMBE shows the forecast bias: with the sign convention above (actual minus predicted), positive values indicate that the model underpredicts on average, and negative values indicate that it overpredicts. nMAE and nMBE are related to the first-order error moment, while nRMSE is related to the second-order error moment. This means that MAE changes linearly with the errors made by the model, whereas RMSE depends on the square of these errors. The greater the difference between MAE and RMSE, the larger the occasional errors made by the model relative to the mean error.
The authors believe that the combination of these three quality measures allows for a good assessment of the model performance (compared to the simplicity and clarity of the results presentation).
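The three quality measures can be implemented directly from their definitions. Below is a self-contained sketch in pure Python; the short series and the normalizing constant are illustrative, and the persistence forecast at the end mirrors the naive reference model used later in the paper (the forecast at step T equals the value at step T − 1).

```python
def nmae(actual, predicted, c_norm):
    """Normalized Mean Absolute Error [%]."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / (n * c_norm) * 100.0

def nrmse(actual, predicted, c_norm):
    """Normalized Root Mean Square Error [%]."""
    n = len(actual)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    return mse ** 0.5 / c_norm * 100.0

def nmbe(actual, predicted, c_norm):
    """Normalized Mean Bias Error [%]; the sign shows the direction of the bias."""
    n = len(actual)
    return sum(a - p for a, p in zip(actual, predicted)) / (n * c_norm) * 100.0

# Naive persistence forecast: the prediction at step T is the actual value at T - 1.
y = [100.0, 120.0, 90.0, 110.0]          # toy generation series [kWh]
naive = y[:-1]                            # forecasts for steps 1..3
c_norm = 120.0                            # toy normalizing factor (series maximum)
errors = (nmae(y[1:], naive, c_norm),
          nrmse(y[1:], naive, c_norm),
          nmbe(y[1:], naive, c_norm))
```

Because RMSE squares the residuals, nRMSE is never smaller than nMAE on the same data, which is why the gap between the two signals the presence of occasional large errors.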
A PC with an AMD Ryzen 7 5800H (3.2 GHz) and 32 GB RAM was used for the computations. LightGBM, CatBoost, and XGBoost were used as standalone packages, while GradientBoostingRegressor was used via its scikit-learn implementation. All packages were accessed through their Python scikit-learn-style interfaces. Only the CPU was used for the calculations.
The study began by generating forecasts using the default hyperparameters of each of the four tested gradient-boosted decision tree models. Additionally, the runtime of each model during the training process was measured. The experiments were conducted for the variant with 19 inputs—SET 1 (full variant)—and the variant with 9 inputs—SET 2 (shortened variant). Table 3 presents the results for the SET 1 data variant, while Table 4 presents the results for the SET 2 data variant. In both variants, the results for the reference model (naive model, where the forecast at step T equals the energy generation at step T − 1) are also included. The data in Table 3 and Table 4 are sorted in ascending order based on the nRMSE error metric on the test subset. The best results (test range) for each quality metric are in bold and marked in blue, and the worst results for each quality metric are marked in red. The same notation is used for the training times of the models on the training range.
Based on the results obtained using the default hyperparameters of the models (without tuning) for both the SET1 and SET2 data variants, several conclusions can be drawn. The SET1 dataset proved to be more advantageous (according to the nRMSE metric) for all four models. This indicates that the models perform better when provided with a larger amount of input information (specifically, the number of lagged values of the available time series). The differences in quality between the results of the four models were not significant. For example, the best model, XGBoost, showed an nRMSE error for the SET1 dataset that was 2.3% lower than for the SET2 dataset, while for the LightGBM model, this improvement was 3.2%.
Each of the four models, for both input data variants, exhibited a lower nRMSE and nMAE error than the naive model. The best model, XGBoost (with input data SET1), had an nRMSE error that was 37.2% lower than the naive model. The quality ranking (nRMSE error) of the models was the same for both input datasets, SET1 and SET2: the best model was XGBoost, while the worst model was GBDT (excluding the naive model).
Analyzing the training time of the models with default settings, it can be observed that for both data variants, SET1 and SET2, the order of the models was identical: LightGBM was by far the fastest to train, followed by CatBoost, while GBDT was the slowest (with training time being several times longer compared to the fastest model). For the SET1 dataset, the training time of the models was longer than for the SET2 dataset, which was expected due to the significantly smaller number of input data in SET2. The exception was the LightGBM model. The most substantial reduction in training time for SET2 occurred with the GBDT model. For the remaining three models, the impact of the number of input data on training time was much smaller.
The next point of study was generating forecasts using the tuned hyperparameters of each of the four tested gradient-boosted decision tree models. The hyperparameter tuning process involved training models on various predefined combinations of hyperparameter values specific to each model. There are certain differences between the models in terms of the number of hyperparameters (with GBDT models having the fewest) as well as the types of hyperparameters. For each of the four models, the training process was performed on approximately 40,000 different hyperparameter combinations.
In the second step, after an expert analysis of the obtained results, additional training was conducted for each of the four models on over 10 additional hyperparameter combinations, based on those that yielded the most favorable results (validation range—test range results were not visible at this stage and could not be used to select the best model). Additionally, the runtime of each model during the training process was measured.
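The exhaustive tuning procedure described above amounts to enumerating a hyperparameter grid and keeping the combination with the lowest validation error. A minimal sketch of that loop is below; the grid values and the toy scoring function are illustrative assumptions (they stand in for training a model and computing validation nRMSE), not the paper's actual ~40,000-combination grids.

```python
import itertools

def tune(grid, evaluate):
    """Exhaustively evaluate every hyperparameter combination in a grid.

    `grid` maps hyperparameter names to candidate value lists; `evaluate`
    returns a validation score (lower is better, e.g. nRMSE) for one setting.
    """
    best_params, best_score = None, float("inf")
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy evaluation: pretend validation nRMSE is minimized at depth 4, 600 trees.
grid = {"max_depth": [2, 3, 4, 5], "n_estimators": [100, 300, 600]}
toy_nrmse = lambda p: abs(p["max_depth"] - 4) + abs(p["n_estimators"] - 600) / 100
best, score = tune(grid, toy_nrmse)
```

In practice the `evaluate` callback would train the given model on the training range and score it on the validation range, which is why the total tuning time scales with the number of combinations times the per-model training time.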
The experiments were conducted for the variant with 19 inputs—SET 1 (full variant)— and the variant with 9 inputs—SET 2 (shortened variant). Table 5 presents the results for the SET 1 data variant, while Table 6 presents the results for the SET 2 data variant. The data in Table 5 and Table 6 are sorted in ascending order based on the nRMSE error metric on the test subset. The best results (test range) for each quality metric are in bold and marked in blue, and the worst results for each quality metric are marked in red. The same notations are used for the training time of the models for the training range.
Figure 4 presents a comparison of the nRMSE error on the test range for four models using the SET1 dataset, with default hyperparameters and tuned hyperparameters. Similarly, Figure 5 presents a comparison of the nRMSE error on the test range for four models using the SET2 dataset, with default hyperparameters and tuned hyperparameters. In both figures, the models are ranked from best to worst (from left to right) based on the tuned hyperparameter variant.
Analyzing hyperparameter tuning impact on model accuracy, it can be concluded that tuning reduced the nRMSE error for each model. However, it is noteworthy that the degree of improvement was not the same for all models. The largest decrease in nRMSE error was observed for the GBDT model (21% for the SET1 dataset and 24% for the SET2 dataset). Clearly, this model required hyperparameter tuning to reduce the error magnitude, as the default hyperparameters were far from optimal. It is also important to emphasize that with default hyperparameters, the GBDT model had the worst performance of the four models analyzed, but after tuning, its performance was the best among all the tuned models. The hyperparameter tuning process had the least impact on improving the performance of the XGBoost model. For the SET1 dataset, the improvement was 1.88%, while for the SET2 dataset, the improvement was 1.56%. Therefore, it can be concluded that XGBoost had the best default hyperparameter settings among the four models considered for the analyzed regression task—forecasting energy generation in a wind farm. For the SET1 dataset, the tuned GBDT model had an nRMSE error 5.6% lower than the second-best tuned model, XGBoost (for SET2, the difference in favor of GBDT was 5.1%). It can be concluded that the difference in nRMSE error was quite significant in favor of the GBDT model. For the LightGBM and CatBoost models, the reduction in nRMSE error for the tuned models compared to the default models was approximately 4% for both the SET1 and SET2 datasets.
The best model, the tuned GBDT (SET1), achieved an nRMSE error that was nearly 42% lower than the naive model and 7.4% lower than the best default model, XGBoost.
Figure 6 shows the scatterplot between the forecast results of the naive model (reference model) and the actual energy generation values for the test range; however, Figure 7 shows the scatterplot between the actual energy generation values and the forecasts for the test range for the top-ranked model (based on nRMSE error)—GBDT (input data SET1).
Figure 8 presents a comparison of the actual energy generation values at the wind farm and the forecasts (test range) using the GBDT model (input data SET1) for default and tuned hyperparameter values over a one-day period (31 December 2022).
Analyzing the training time of the models using tuned values of model hyperparameters, it can be observed that for both data variants, SET1 and SET2, the order of the models was identical: LightGBM was by far the fastest to train, followed by XGBoost, while GBDT was the slowest. For the SET1 dataset, the training time of the models was longer than for the SET2 dataset. The comparison described above refers to the training times of a single model using the best hyperparameter variant (the one with the lowest nRMSE error) obtained for the validation range. From a practical perspective, when considering the process of searching for the best set of hyperparameters for a given model, the total training time for all hyperparameter combinations was more important. The number of different hyperparameter combinations was approximately 40,000 for each model. Figure 9 presents the total training times for each of the four models, taking into account two different input datasets—SET1 and SET2.
Table 7 presents a summary of the main hyperparameters for the four models, both for the default version and the tuned version (the best set of hyperparameters found from approximately 40,000 tested combinations). In Table 7, hyperparameter values identical to the default for a given model are highlighted in blue.
The hyperparameter tuning results for the four models presented in Table 7 indicate that for none of the models were the default values optimal according to the nRMSE minimization criterion on the validation data. However, among the four models, XGBoost retained the most default hyperparameter values identical to those found after tuning. Only the “tree depth” and “number of trees” parameters had better values identified through the tuning process.
For all four models, the optimal value for the “number of trees” hyperparameter was 600. Three models had a default value of 100, which proved to be far too low.
For 3 out of the 4 models, the optimal “tree depth” was 4, with the exception of the GBDT model, which performed better with a depth of 5. None of the models had a default “tree depth” value that was optimal. Regarding the “learning rate” hyperparameter, the default values for three models were either slightly different or identical to the optimal values. Only the CatBoost model had a default learning rate that was significantly too low compared to the tuned value.
A notable observation is that “early stopping” was inactive in three models; activating it increased the nRMSE error. For the “lasso regularization” and “minimum loss reduction required to make a further partition” hyperparameters, the optimal value was 0 for the two models that include these parameters.
Another interesting observation is that for 3 of the 4 models, a value of 0.8 was more advantageous than 1 for the “fraction of the training data randomly selected and used to fit each tree” hyperparameter. For the “minimum samples in leaf” hyperparameter, it was not possible to identify a single preferred value across all four models, as there were substantial differences in the optimal values after tuning.
Figure 10 presents a segment of the tuning process for three hyperparameters of the GBDT model for the SET1 input data variant. A clear influence of the learning rate on the nRMSE error was visible. The default value proved to be appropriate for the GBDT model, as both increasing and decreasing the learning rate led to a rise in nRMSE error. Conversely, the hyperparameter “minimum samples in leaf” had a very minimal impact on the nRMSE error—both a value of 20 and the default value of 1 were appropriate. Noteworthy, however, was the significant effect of the “tree depth” hyperparameter. A depth of 5 was more advantageous than values of 2 or 3 (default).
The influence of weather conditions on forecast quality was investigated for the best GBDT model using SET1 input data. Figure 11 and Figure 12 show the mean nMAE calculated for individual wind speed ranges. As can be seen, the mean error increased and reached a maximum at medium wind speeds, where the turbine power curve has its steepest slope. The errors then decreased, before rising again at wind speeds associated with violent weather phenomena, which may cause wind turbines to shut down.

4. Conclusions

The conducted multivariate analysis demonstrated that, for the regression problem under consideration (forecasting energy generation of a wind farm), hyperparameter tuning improved the performance (nRMSE error) of each model, although the degree of improvement was not identical. For the XGBoost model, the difference between default and tuned hyperparameters was very small (below 2%). It can therefore be concluded that the developers of the XGBoost library carefully analyzed numerous regression and classification problems to establish highly appropriate and universal default values for the XGBoost model.
The most noticeable improvement from hyperparameter tuning was observed in the oldest model, GBDT. Interestingly, this model turned out to be the best among all the models analyzed, achieving a result 5.6% better than the second-ranked model, XGBoost, after hyperparameter tuning.
The analysis of hyperparameter values obtained after the tuning process showed that the XGBoost model retained the most parameter values identical to those before tuning. It is also important that for the regression problem under consideration, the default values of the “number of trees” hyperparameter were generally too low for most models. Conversely, the CatBoost model had the least favorable default value for its “learning rate”. The use of an active “early stopping” hyperparameter appeared to be disadvantageous for all models. Based on the results obtained, it is recommended to use a value of 0.8 instead of 1 for the “fraction of the training data randomly selected and used to fit each tree” hyperparameter.
Looking at the results from the perspective of selecting the appropriate input data, it can be concluded that reducing the number of input data (the number of lagged values of the forecasted time series and lagged values of exogenous time series, such as meteorological data) was disadvantageous. However, the differences in the performance of the models between SET1 and SET2 (the reduced input dataset) were not very large. For example, the best-tuned model, GBDT, achieved an nRMSE error 3.2% lower using SET1 compared to SET2.
It is worth noting that, although the GBDT model achieved the lowest forecast errors for energy generation, it required a very long training time to identify the optimal hyperparameter values. For example, for the SET1 input dataset, the total training time was more than 220 times longer than that of the model with the shortest total training time, amounting to over 22 days, which presented a significant practical issue when the time available for hyperparameter tuning was limited. In contrast, the model with the shortest hyperparameter tuning time, LightGBM, required just over 2 h to search through a similar number of hyperparameter combinations.
There are three scenarios that can address most of the needs in terms of practical use of the discussed models:
  • If there is a need to obtain a good forecasting model in a short time, you should use the XGBoost model (on default hyperparameters) trained on the full range of explanatory variables. Using a reduced range of explanatory variables allows you to additionally reduce the training time without significantly affecting the quality.
  • If there is a need to obtain a forecasting model in the shortest possible time, you should use the LightGBM model.
  • If the most important element of the solution is to obtain the best possible quality of forecasts and the training time is not important, you should use the GBDT model with hyperparameter optimization.
The authors believe that the conducted analysis is worth continuing with the use of other data from the field of renewable energy forecasting (regression problems). The aim of this further analysis would be to verify the degree of reproducibility of the conclusions formulated in this article. In particular, this pertains to verifying the superiority of the tuned GBDT model over the other models.

Author Contributions

Conceptualization, M.K., P.P. and D.B.; methodology, M.K., P.P. and D.B.; software, M.K.; validation, P.P. and M.K.; formal analysis, M.K., P.P. and D.B.; investigation, M.K., P.P. and D.B.; resources, M.K.; data curation, M.K.; writing—original draft preparation, M.K., P.P. and D.B.; writing—review and editing, M.K., P.P. and D.B.; visualization, P.P. and M.K.; supervision, M.K.; project administration, M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the 2023 edition of the competition for grants of the Scientific Council for the Discipline Automatic Control, Electronics, Electrical Engineering and Space Technologies of the Warsaw University of Technology (to M.K., P.P. and D.B.).

Data Availability Statement

The data used in the article were obtained in accordance with the principles contained in the source [33].

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used throughout this manuscript:
ACF: Autocorrelation Function
ANN: Artificial Neural Network
BP: Backpropagation
BGRU: Bidirectional Gated Recurrent Unit
CatBoost: Categorical Boosting
CNN: Convolutional Neural Network
GBDT: Gradient-Boosting Decision Tree
GBM: Gradient-Boosting Machine
L1: Lasso Regularization
L2: Ridge Regularization
LightGBM: Light Gradient-Boosting Machine
LSTM: Long Short-Term Memory
MART: Multiple Additive Regression Trees
ML: Machine Learning
nMAE: Normalized Mean Absolute Error
nMBE: Normalized Mean Bias Error
nRMSE: Normalized Root Mean Squared Error
NWP: Numerical Weather Prediction
PCHIP: Piecewise Cubic Hermite Interpolating Polynomial
R: Pearson Linear Correlation Coefficient
RES: Renewable Energy Sources
RF: Random Forest
SVM: Support Vector Machine
XGBoost: eXtreme Gradient Boosting

References

  1. Piotrowski, P.; Rutyna, I.; Baczyński, D.; Kopyt, M. Evaluation Metrics for Wind Power Forecasts: A Comprehensive Review and Statistical Analysis of Errors. Energies 2022, 15, 9657. [Google Scholar] [CrossRef]
  2. Zheng, Y.; Guan, S.; Guo, K.; Zhao, Y.; Ye, L. Technical Indicator Enhanced Ultra-short-term Wind Power Forecasting Based on Long Short-term Memory Network Combined XGBoost Algorithm. IET Renew. Power Gen 2024, rpg2.12952. [Google Scholar] [CrossRef]
  3. Singh, U.; Rizwan, M. SCADA System Dataset Exploration and Machine Learning Based Forecast for Wind Turbines. Results Eng. 2022, 16, 100640. [Google Scholar] [CrossRef]
  4. Miele, E.S.; Ludwig, N.; Corsini, A. Multi-Horizon Wind Power Forecasting Using Multi-Modal Spatio-Temporal Neural Networks. Energies 2023, 16, 3522. [Google Scholar] [CrossRef]
  5. Gao, Q. Multi-Temporal Scale Wind Power Forecasting Based on Lasso-CNN-LSTM-LightGBM. EAI Endorsed Trans. Energy Web 2024, 11. [Google Scholar] [CrossRef]
  6. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  7. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2001. [Google Scholar]
  8. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA, 13–17 August 2016. [Google Scholar]
  9. eXtreme Gradient Boosting. Available online: https://github.com/dmlc/xgboost (accessed on 11 August 2024).
  10. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 3149–3157. [Google Scholar]
  11. Light Gradient Boosting Machine. Available online: https://github.com/Microsoft/LightGBM (accessed on 11 August 2024).
  12. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased Boosting with Categorical Features. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 2–8 December 2018; pp. 6638–6648. [Google Scholar]
  13. Yandex Catboost. Available online: https://github.com/catboost/catboost (accessed on 12 August 2024).
  14. Zheng, H.; Wu, Y. A XGBoost Model with Weather Similarity Analysis and Feature Engineering for Short-Term Wind Power Forecasting. Appl. Sci. 2019, 9, 3019. [Google Scholar] [CrossRef]
  15. Jiading, J.; Feng, W.; Rui, T.; Lingling, Z.; Xin, X. TS_XGB: Ultra-Short-Term Wind Power Forecasting Method Based on Fusion of Time-Spatial Data and XGBoost Algorithm. Procedia Comput. Sci. 2022, 199, 1103–1111. [Google Scholar] [CrossRef]
  16. Zha, W.; Liu, J.; Li, Y.; Liang, Y. Ultra-Short-Term Power Forecast Method for the Wind Farm Based on Feature Selection and Temporal Convolution Network. ISA Trans. 2022, 129, 405–414. [Google Scholar] [CrossRef]
  17. Keerthisinghe, C.; Silva, A.R.; Tardáguila, P.; Horváth, G.; Deng, A.; Theis, T.N. Improved Short-Term Wind Power Forecasts: Low-Latency Feedback Error Correction Using Ramp Prediction and Data From Nearby Farms. IEEE Access 2023, 11, 128697–128705. [Google Scholar] [CrossRef]
  18. Wu, Y.-K.; Huang, C.-L.; Wu, S.-H.; Hong, J.-S.; Chang, H.-L. Deterministic and Probabilistic Wind Power Forecasts by Considering Various Atmospheric Models and Feature Engineering Approaches. IEEE Trans. Ind. Applicat. 2023, 59, 192–206. [Google Scholar] [CrossRef]
  19. Ponkumar, G.; Jayaprakash, S.; Kanagarathinam, K. Advanced Machine Learning Techniques for Accurate Very-Short-Term Wind Power Forecasting in Wind Energy Systems Using Historical Data Analysis. Energies 2023, 16, 5459. [Google Scholar] [CrossRef]
  20. Ayele, S.T.; Ageze, M.B.; Zeleke, M.A.; Miliket, T.A. Adama II Wind Farm Long-Term Power Generation Forecasting Based on Machine Learning Models. Sci. Afr. 2023, 21, e01831. [Google Scholar] [CrossRef]
  21. Wang, J.; Niu, W.; Yang, Y. Wind Turbine Output Power Prediction by a Segmented Multivariate Polynomial-XGBoost Model. Energy Sources Part A Recovery Util. Environ. Eff. 2024, 46, 505–521. [Google Scholar] [CrossRef]
  22. Zhou, Y.; Ma, L.; Ni, W.; Yu, C. Data Enrichment as a Method of Data Preprocessing to Enhance Short-Term Wind Power Forecasting. Energies 2023, 16, 2094. [Google Scholar] [CrossRef]
  23. Fan, L.; Wang, Y.; Fang, X.; Jiang, J. To Predict the Power Generation Based on Machine Learning Method. J. Phys. Conf. Ser. 2022, 2310, 012084. [Google Scholar] [CrossRef]
  24. Li, Y.; Tang, F.; Gao, X.; Zhang, T.; Qi, J.; Xie, J.; Li, X.; Guo, Y. Numerical Weather Prediction Correction Strategy for Short-Term Wind Power Forecasting Based on Bidirectional Gated Recurrent Unit and XGBoost. Front. Energy Res. 2022, 9, 836144. [Google Scholar] [CrossRef]
  25. Cakiroglu, C.; Demir, S.; Hakan Ozdemir, M.; Latif Aylak, B.; Sariisik, G.; Abualigah, L. Data-Driven Interpretable Ensemble Learning Methods for the Prediction of Wind Turbine Power Incorporating SHAP Analysis. Expert Syst. Appl. 2024, 237, 121464. [Google Scholar] [CrossRef]
  26. Oyucu, S.; Aksöz, A. Integrating Machine Learning and MLOps for Wind Energy Forecasting: A Comparative Analysis and Optimization Study on Türkiye’s Wind Data. Appl. Sci. 2024, 14, 3725. [Google Scholar] [CrossRef]
  27. Ahmed, U.; Muhammad, R.; Abbas, S.S.; Aziz, I.; Mahmood, A. Short-Term Wind Power Forecasting Using Integrated Boosting Approach. Front. Energy Res. 2024, 12, 1401978. [Google Scholar] [CrossRef]
  28. Mou, X.; Chen, H.; Zhang, X.; Xu, X.; Yu, Q.; Li, Y. Short-Term Wind Power Prediction Method Based on Combination of Meteorological Features and CatBoost. Wuhan Univ. J. Nat. Sci. 2023, 28, 169–176. [Google Scholar] [CrossRef]
  29. Liao, S.; Tian, X.; Liu, B.; Liu, T.; Su, H.; Zhou, B. Short-Term Wind Power Prediction Based on LightGBM and Meteorological Reanalysis. Energies 2022, 15, 6287. [Google Scholar] [CrossRef]
  30. Brain, J. When to Choose CatBoost Over XGBoost or LightGBM. Available online: https://neptune.ai/blog/when-to-choose-catboost-over-xgboost-or-lightgbm (accessed on 26 August 2024).
  31. Keels, J. XGBoost, Light GBM and CatBoost. A Comparison of Decision Tree Algorithms and Applications to a Regression Problem. Available online: https://medium.com/octave-john-keells-group/xgboost-light-gbm-and-catboost-a-comparison-of-decision-tree-algorithms-and-applications-to-a-f1d2d376d89c (accessed on 16 August 2024).
  32. Swalin, A. CatBoost vs. Light GBM vs. XGBoost. Available online: https://www.kdnuggets.com/2018/03/catboost-vs-light-gbm-vs-xgboost.html (accessed on 16 August 2024).
  33. Historical. Available online: https://www.sotaventogalicia.com/en/technical-area/real-time-data/historical/ (accessed on 15 August 2024).
  34. Barker, P.M.; McDougall, T.J. Two Interpolation Methods Using Multiply-Rotated Piecewise Cubic Hermite Interpolating Polynomials. J. Atmos. Ocean. Technol. 2020, 37, 605–619. [Google Scholar] [CrossRef]
  35. StatSoft Electronic Statistics Textbook. Available online: https://www.statsoft.pl (accessed on 9 August 2024).
  36. Piotrowski, P.; Baczyński, D.; Kopyt, M.; Gulczyński, T. Advanced Ensemble Methods Using Machine Learning and Deep Learning for One-Day-Ahead Forecasts of Electric Energy Production in Wind Farms. Energies 2022, 15, 1252. [Google Scholar] [CrossRef]
Figure 1. Change in behavior for energy generation time series from 2022-10-29 to 2022-11-02.
Figure 1. Change in behavior for energy generation time series from 2022-10-29 to 2022-11-02.
Energies 17 06194 g001
Figure 2. Autocorrelation function (ACF) of the electricity generation up to 60 h back.
Figure 2. Autocorrelation function (ACF) of the electricity generation up to 60 h back.
Energies 17 06194 g002
Figure 3. Scatter plot between energy generation in period T and wind speed in period T − 1.
Figure 3. Scatter plot between energy generation in period T and wind speed in period T − 1.
Energies 17 06194 g003
Figure 4. Comparison of the nRMSE error on the test range for four models using the SET1 dataset, with default hyperparameters and tuned hyperparameters.
Figure 4. Comparison of the nRMSE error on the test range for four models using the SET1 dataset, with default hyperparameters and tuned hyperparameters.
Energies 17 06194 g004
Figure 5. Comparison of the nRMSE error on the test range for four models using the SET2 dataset, with default hyperparameters and tuned hyperparameters.
Figure 5. Comparison of the nRMSE error on the test range for four models using the SET2 dataset, with default hyperparameters and tuned hyperparameters.
Figure 6. Scatterplot between the forecast results of the naive model (reference model) and the actual energy generation values for the test range.
Figure 7. Scatterplot between the actual energy generation values and the forecasts for the test range for the top-ranked model (based on nRMSE error)—GBDT (input data SET1).
Figure 8. Comparison of the actual energy generation values at the wind farm and the forecasts (test range) using the GBDT model (input data SET1) for default and tuned hyperparameter values over a one-day period (31 December 2022).
Figure 9. Comparison between total training times on two input datasets for four tree methods.
Figure 10. Segment of the tuning process for three hyperparameters of the GBDT model for the SET1 input data variant.
Figure 11. Mean nMAE for different wind speeds for the SET1 input data variant.
Figure 12. Mean nMAE for different wind speeds for the SET2 input data variant.
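Figures 11 and 12 report mean nMAE per wind-speed interval. A minimal sketch of how such an error profile can be computed; the bin width of 2 m/s and the normalization by installed capacity are assumptions of this sketch, not values taken from the paper:

```python
def nmae_by_wind_speed(actual, forecast, wind, capacity, bin_width=2.0):
    """Mean nMAE (in %) per wind-speed bin, as in Figures 11 and 12.
    Normalization by `capacity` and the 2 m/s bin width are assumptions."""
    bins = {}
    for a, f, w in zip(actual, forecast, wind):
        bins.setdefault(int(w // bin_width), []).append(abs(f - a))
    return {k * bin_width: sum(v) / len(v) / capacity * 100
            for k, v in sorted(bins.items())}

# Toy data: energy [kWh], wind speed [m/s], reference capacity [kWh per 10 min]
profile = nmae_by_wind_speed([10, 20, 50, 60], [12, 18, 55, 58],
                             [1, 3, 9, 10], capacity=100)
```

Such a profile makes it easy to see in which wind-speed ranges a model contributes most of its error.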
Table 1. All selected potential input data (two variants) and their codes.

| Input Data Description (SET1—Full Variant) | Input Data Description (SET2—Shortened Variant) | Code |
|---|---|---|
| Generation in period T − n, n = 1, 2…6 [kWh] | Generation in period T − n, n = 1, 2, 3 [kWh] | EG(T − n) |
| Smoothed generation in period T − 1 [kWh] | Smoothed generation in period T − 1 [kWh] | SEG(T − 1) |
| Wind speed in period T − n, n = 1, 2…6 [m/s] | Wind speed in period T − n, n = 1, 2, 3 [m/s] | WS(T − n) |
| Wind direction—sine in period T − n, n = 1, 2, 3 | Wind direction—sine in period T − n, n = 1 | WD_sin(T − n) |
| Wind direction—cosine in period T − n, n = 1, 2, 3 | Wind direction—cosine in period T − n, n = 1 | WD_cos(T − n) |
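The inputs in Table 1 are lagged values plus a sine/cosine encoding of wind direction, which keeps the 0°/360° wrap-around continuous for the models. A minimal plain-Python sketch (function names are illustrative, not from the paper):

```python
import math

def wind_direction_features(deg):
    """Encode wind direction in degrees as (sin, cos), i.e., the WD_sin/WD_cos
    inputs of Table 1, so that 0 deg and 360 deg map to the same point."""
    rad = math.radians(deg)
    return math.sin(rad), math.cos(rad)

def lagged(series, lags):
    """Build rows of lagged values, e.g., EG(T-1)...EG(T-3) for SET2;
    the first max(lags) rows are dropped because their lags are undefined."""
    n = max(lags)
    return [[series[t - k] for k in lags] for t in range(n, len(series))]

generation = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0]  # 10-min energy [kWh]
X = lagged(generation, lags=[1, 2, 3])  # SET1 would use lags 1...6
s, c = wind_direction_features(90.0)
```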
Table 2. The summary of the key hyperparameters for each of the four gradient-boosted decision tree models (regression version).

| Description of the Hyperparameter | GBDT | XGBoost | LightGBM | CatBoost |
|---|---|---|---|---|
| Number of trees (number of iterations) | n_estimators = 100 | n_estimators = 100 | num_boost_round = 100 | iterations = 1000 |
| Learning rate (step size shrinkage) | learning_rate = 0.1 | learning_rate = 0.3 | learning_rate = 0.1 | learning_rate = 0.03 |
| Tree depth | max_depth = 3 | max_depth = 6 | max_depth = −1 (no limit) | depth = 6 |
| Strategy used for growing trees | - | - | - | grow_policy = SymmetricTree |
| Minimum samples in leaf | min_samples_leaf = 1 | min_child_weight = 1 | min_child_samples = 20 | min_data_in_leaf = 1 |
| Number of leaves in full trees (maximum) | - | - | num_leaves = 31 | - |
| Fraction of the training data randomly selected and used to fit each tree (bagging) | subsample = 1 | subsample = 1 | subsample = 1 | subsample = 1 |
| Frequency of applying subsampling (bagging), in boosting iterations | - | - | bagging_freq = 0 | - |
| Intensity of the Bayesian bootstrap when sampling data for each iteration | - | - | - | bagging_temperature = 1 |
| Random subsampling of features once per tree | - | colsample_bytree = 1 | colsample_bytree = 1 | - |
| Random subsampling of features at each level of the tree | - | colsample_bylevel = 1 | - | colsample_bylevel = 1 |
| Randomness in the selection of features and splits | - | - | - | random_strength = 1 |
| Random subsampling of features at each split (node) of the tree | - | colsample_bynode = 1 | - | - |
| Maximum number of bins that features can be divided into | - | - | max_bin = 255 | border_count = 254 |
| Minimum loss reduction required to make a further partition | - | gamma = 0 | min_gain_to_split = 0 | - |
| Manual adjustment of the weights of the positive and negative classes to handle class imbalance | - | scale_pos_weight = 1 | scale_pos_weight = 1 | - |
| Automatic adjustment of the weights of the positive and negative classes to handle class imbalance | - | - | is_unbalance = false | - |
| Lasso regularization (L1) | - | reg_alpha = 0 | lambda_l1 = 0 | - |
| Ridge regularization (L2) | - | reg_lambda = 1 | lambda_l2 = 0 | l2_leaf_reg = 3 |
| Stop training early if performance on a validation set does not improve for a certain number of rounds | - | early_stopping_rounds = none | early_stopping_rounds = none | early_stopping_rounds = none |
| Loss function | Least Squares (MSE) | objective = reg:squarederror (MSE) | regression (MSE) | loss_function = RMSE |
| Evaluation metric | - | eval_metric = rmse | l2 (MSE) | eval_metric = RMSE |
| Frequency of calculating the evaluation metric | every iteration by default | every iteration by default | metric_freq = 1 | metric_period = 1 |
| Number of threads used during training | n_jobs = None (single thread) | nthread = −1 (all) | num_threads (all if not specified) | thread_count = −1 (all) |
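Despite their different hyperparameter names, all four libraries in Table 2 implement the same core procedure: start from a constant prediction, then repeatedly fit a small tree to the current residuals and add it scaled by the learning rate. The following didactic pure-Python sketch with depth-1 trees (not any of the libraries' actual implementations) shows how the shared number-of-trees and learning-rate hyperparameters enter:

```python
class Stump:
    """Depth-1 regression tree: one threshold, two leaf means."""
    def fit(self, x, r):
        best = (float("inf"), x[0], 0.0, 0.0)
        for t in sorted(set(x))[:-1]:  # last value would leave the right side empty
            left = [ri for xi, ri in zip(x, r) if xi <= t]
            right = [ri for xi, ri in zip(x, r) if xi > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((ri - lm) ** 2 for ri in left)
                   + sum((ri - rm) ** 2 for ri in right))
            if sse < best[0]:
                best = (sse, t, lm, rm)
        _, self.t, self.lm, self.rm = best
        return self

    def predict(self, x):
        return [self.lm if xi <= self.t else self.rm for xi in x]

def boost(x, y, n_estimators=50, learning_rate=0.1):
    """F0 = mean(y); each stump fits the current residuals (MSE loss)
    and is added scaled by the learning rate."""
    pred = [sum(y) / len(y)] * len(y)
    for _ in range(n_estimators):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        update = Stump().fit(x, resid).predict(x)
        pred = [pi + learning_rate * ui for pi, ui in zip(pred, update)]
    return pred

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 3.0, 3.2]
pred = boost(x, y)
mse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)
```

The libraries differ mainly in how the trees are grown (depth-wise, leaf-wise, or symmetric), how features are binned, and which regularization terms are available.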
Table 3. Quality ranking of all analyzed models for the SET1 input data variant, using the default values of model hyperparameters, sorted in ascending order of the nRMSE error metric on the test subset (best model first, worst last).

| Model Name | Train nRMSE (%) | Train nMAE (%) | Train nMBE (%) | Train Time [s] | Valid. nRMSE (%) | Valid. nMAE (%) | Valid. nMBE (%) | Test nRMSE (%) | Test nMAE (%) | Test nMBE (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| XGBoost | 0.6451 | 0.4186 | 5 × 10−6 | 0.59 | 0.8047 | 0.4634 | −0.0062 | 0.9710 | 0.6404 | −0.0063 |
| CatBoost | 0.8483 | 0.5459 | 2 × 10−6 | 7.73 | 0.8424 | 0.4994 | −0.0276 | 1.0064 | 0.6752 | −0.0222 |
| LightGBM | 0.8438 | 0.5487 | −0.0402 | 0.25 | 0.8684 | 0.5186 | −0.0728 | 1.0195 | 0.6935 | −0.0642 |
| GBDT | 0.9781 | 0.6315 | −3 × 10−15 | 24.36 | 0.9639 | 0.5737 | −0.0570 | 1.1393 | 0.7719 | −0.0294 |
| Naive | 1.3121 | 0.8148 | −0.0103 | - | 1.2966 | 0.7467 | −0.0040 | 1.5464 | 1.0228 | −0.0069 |
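The nRMSE, nMAE, and nMBE columns are normalized percentage errors. A minimal sketch of their computation; normalization by a fixed reference value (e.g., installed capacity) is an assumption of this sketch, as the paper's exact normalization basis is not restated here:

```python
import math

def normalized_errors(actual, forecast, reference):
    """nRMSE, nMAE, and nMBE in percent. `reference` is the assumed
    normalization constant (e.g., installed capacity)."""
    n = len(actual)
    d = [f - a for a, f in zip(actual, forecast)]
    nrmse = math.sqrt(sum(e * e for e in d) / n) / reference * 100
    nmae = sum(abs(e) for e in d) / n / reference * 100
    nmbe = sum(d) / n / reference * 100  # signed: reveals systematic bias
    return nrmse, nmae, nmbe

nrmse, nmae, nmbe = normalized_errors([100, 120, 110], [104, 117, 110], 400)
```

Unlike nRMSE and nMAE, nMBE keeps the sign of the error, which is why the near-zero nMBE values in the tables indicate forecasts with little systematic over- or under-prediction.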
Table 4. Quality ranking of all analyzed models for the SET2 input data variant, using the default values of model hyperparameters, sorted in ascending order of the nRMSE error metric on the test subset.

| Model Name | Train nRMSE (%) | Train nMAE (%) | Train nMBE (%) | Train Time [s] | Valid. nRMSE (%) | Valid. nMAE (%) | Valid. nMBE (%) | Test nRMSE (%) | Test nMAE (%) | Test nMBE (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| XGBoost | 0.6969 | 0.4432 | −2 × 10−6 | 0.44 | 0.8419 | 0.4816 | −0.0425 | 0.9939 | 0.6552 | −0.0631 |
| CatBoost | 0.8614 | 0.5466 | 2 × 10−5 | 7.28 | 0.8895 | 0.5207 | −0.0629 | 1.0402 | 0.6939 | −0.0493 |
| LightGBM | 0.8639 | 0.5563 | −0.0380 | 0.38 | 0.9004 | 0.5363 | −0.1011 | 1.0527 | 0.7110 | −0.0989 |
| GBDT | 1.0224 | 0.6607 | 7 × 10−16 | 14.40 | 1.0481 | 0.6247 | −0.0962 | 1.2252 | 0.8318 | −0.0886 |
| Naive | 1.3121 | 0.8148 | −0.0103 | - | 1.2966 | 0.7467 | −0.0040 | 1.5464 | 1.0228 | −0.0069 |
Table 5. Quality ranking of all analyzed models for the SET1 input data variant, using tuned values of model hyperparameters, sorted in ascending order of the nRMSE error metric on the test subset.

| Model Name | Train nRMSE (%) | Train nMAE (%) | Train nMBE (%) | Train Time [s] | Valid. nRMSE (%) | Valid. nMAE (%) | Valid. nMBE (%) | Test nRMSE (%) | Test nMAE (%) | Test nMBE (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| GBDT | 0.6072 | 0.3937 | 0.0002 | 147.39 | 0.7626 | 0.4419 | −0.0104 | 0.8993 | 0.5917 | −0.0161 |
| XGBoost | 0.5893 | 0.3858 | −3 × 10−6 | 1.38 | 0.7860 | 0.4513 | −0.0118 | 0.9528 | 0.6317 | −0.0373 |
| LightGBM | 0.7113 | 0.4687 | −0.0479 | 0.64 | 0.8127 | 0.4849 | −0.0568 | 0.9715 | 0.6615 | −0.0694 |
| CatBoost | 0.7861 | 0.5067 | 5 × 10−6 | 2.76 | 0.8126 | 0.4791 | −0.0209 | 0.9768 | 0.6519 | −0.0126 |
Table 6. Quality ranking of all analyzed models for the SET2 input data variant, using tuned values of model hyperparameters, sorted in ascending order of the nRMSE error metric on the test subset.

| Model Name | Train nRMSE (%) | Train nMAE (%) | Train nMBE (%) | Train Time [s] | Valid. nRMSE (%) | Valid. nMAE (%) | Valid. nMBE (%) | Test nRMSE (%) | Test nMAE (%) | Test nMBE (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| GBDT | 0.6198 | 0.3965 | −0.0004 | 77.32 | 0.8083 | 0.4608 | −0.0348 | 0.9289 | 0.6102 | −0.0408 |
| XGBoost | 0.6463 | 0.4146 | 1 × 10−6 | 1.07 | 0.8331 | 0.4736 | −0.0368 | 0.9783 | 0.6450 | −0.0634 |
| CatBoost | 0.8020 | 0.5078 | 5 × 10−6 | 2.17 | 0.8537 | 0.4986 | −0.0495 | 0.9947 | 0.6616 | −0.0305 |
| LightGBM | 0.7574 | 0.4897 | −0.0521 | 0.57 | 0.8640 | 0.5097 | −0.0859 | 1.0084 | 0.6822 | −0.0997 |
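Comparing Tables 3 through 6 against the naive reference quantifies the overall gain. For example, for the tuned GBDT on SET1 (test nRMSE 0.8993%) versus the naive model (1.5464%):

```python
def nrmse_reduction(model_nrmse, naive_nrmse):
    """Percentage reduction in nRMSE relative to the naive reference model."""
    return (naive_nrmse - model_nrmse) / naive_nrmse * 100

# Test-range values taken from Tables 3 and 5 (SET1 variant)
gain = nrmse_reduction(0.8993, 1.5464)  # tuned GBDT vs. naive, approx. 42%
```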
Table 7. Summary of the main hyperparameters for the four models, both for the default version and the tuned version (the best set of hyperparameters found from approximately 40,000 tested combinations).

| Hyperparameter Description | GBDT Default | GBDT Tuned SET1 | GBDT Tuned SET2 | XGBoost Default | XGBoost Tuned SET1 | XGBoost Tuned SET2 | LightGBM Default | LightGBM Tuned SET1 | LightGBM Tuned SET2 | CatBoost Default | CatBoost Tuned SET1 | CatBoost Tuned SET2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of trees | 100 | 600 | 600 | 100 | 600 | 600 | 100 | 600 | 600 | 1000 | 600 | 600 |
| Learning rate | 0.1 | 0.1 | 0.2 | 0.3 | 0.3 | 0.3 | 0.1 | 0.2 | 0.2 | 0.03 | 0.12 | 0.12 |
| Tree depth | 3 | 5 | 5 | 6 | 4 | 4 | −1 | 4 | 4 | 6 | 4 | 4 |
| Minimum samples in leaf | 1 | 20 | 40 | 1 | 1 | 1 | 20 | 20 | 20 | 1 | 1 | 1 |
| Number of leaves in full trees (maximum) | - | - | - | - | - | - | 31 | 31 | 31 | - | - | - |
| Fraction of the training data randomly selected and used to fit each tree (bagging) | 1 | 0.8 | 0.8 | 1 | 1 | 1 | 1 | 0.8 | 0.8 | 1 | 0.8 | 0.8 |
| Random subsampling of features once per tree | - | - | - | 1 | 1 | 1 | 1 | 1 | 1 | - | - | - |
| Random subsampling of features at each level of the tree | - | - | - | 1 | 1 | 1 | - | - | - | 1 | 1 | 1 |
| Random subsampling of features at each split (node) of the tree | - | - | - | 1 | 1 | 1 | - | - | - | - | - | - |
| Maximum number of bins that features can be divided into | - | - | - | - | - | - | 255 | 4000 | 4000 | 254 | 1000 | 1000 |
| Minimum loss reduction required to make a further partition | - | - | - | 0 | 0 | 0 | 0 | 0 | 0 | - | - | - |
| Lasso regularization (L1) | - | - | - | 0 | 0 | 0 | 0 | 0 | 0 | - | - | - |
| Ridge regularization (L2) | - | - | - | 1 | 1 | 1 | 0 | 0 | 1 | 3 | 1.5 | 1.5 |
| Early stopping | - | - | - | no | no | no | no | no | no | no | no | no |
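The tuned values in Table 7 were found by exploring roughly 40,000 hyperparameter combinations per model. Modest per-parameter value lists multiply to that count quickly; the grid below is hypothetical (the paper's exact search space is not listed here) and only illustrates the scale:

```python
from itertools import product

# Hypothetical grid -- 10 x 10 x 8 x 5 x 10 value lists multiply to 40,000
# candidate configurations, matching the order of magnitude tested in the paper.
grid = {
    "n_estimators": list(range(100, 1100, 100)),          # 10 values
    "learning_rate": [i / 100 for i in range(1, 31, 3)],  # 10 values
    "max_depth": list(range(2, 10)),                      # 8 values
    "min_samples_leaf": [1, 5, 10, 20, 40],               # 5 values
    "subsample": [i / 10 for i in range(1, 11)],          # 10 values
}
combos = list(product(*grid.values()))  # each tuple = one model to train
# In the tuning loop, each combination is trained and the configuration
# minimizing validation nRMSE is kept.
```

This combinatorial growth is why training time per model (Tables 3 through 6) matters as much as final accuracy: a model that trains in seconds can be tuned exhaustively, while the slow GBDT implementation makes a 40,000-point search expensive.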
Kopyt, M.; Piotrowski, P.; Baczyński, D. Short-Term Energy Generation Forecasts at a Wind Farm—A Multi-Variant Comparison of the Effectiveness and Performance of Various Gradient-Boosted Decision Tree Models. Energies 2024, 17, 6194. https://doi.org/10.3390/en17236194