A Lightweight Framework for Rapid Response to Short-Term Forecasting of Wind Farms Using Dual Scale Modeling and Normalized Feature Learning

Chen, Yan; Yu, Miaolin; Wei, Haochong; Qi, Huanxing; Qin, Yiming; Hu, Xiaochun; Jiang, Rongxing

doi:10.3390/en18030580


by Yan Chen ¹, Miaolin Yu ², Haochong Wei ², Huanxing Qi ³, Yiming Qin ³, Xiaochun Hu ⁴,* and Rongxing Jiang ²

¹ School of Business, Guangxi University, Nanning 530004, China

² School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China

³ Dispatching Control Center of Guangxi Power Grid, Nanning 530004, China

⁴ Guangxi Key Laboratory of Big Data in Finance and Economics, Nanning 530004, China

* Author to whom correspondence should be addressed.

Energies 2025, 18(3), 580; https://doi.org/10.3390/en18030580

Submission received: 23 December 2024 / Revised: 18 January 2025 / Accepted: 21 January 2025 / Published: 26 January 2025

(This article belongs to the Topic Advances in Power Science and Technology, 2nd Edition)


Abstract:

Accurate wind power forecasting is crucial for optimizing grid scheduling and improving wind power utilization. However, real-world wind power time series exhibit dynamic statistical properties, such as a mean and variance that change over time, which make it difficult for models to apply patterns observed in the past to the future. Additionally, the slow execution and high computational resource demands of complex prediction models make them difficult to deploy on edge computing nodes such as wind farms. To address these issues, this paper explores the potential of linear models for wind power forecasting and constructs NFLM, a linear, lightweight, short-term wind power forecasting model that is better adapted to the characteristics of wind power data. The model captures both short-term and long-term sequence variations through continuous and interval sampling. To mitigate the interference of dynamic features, we propose a normalization feature learning block (NFLBlock) as the core component of NFLM for processing sequences. This module normalizes input data and uses a stacked multilayer perceptron to extract cross-temporal and cross-dimensional dependencies. Experiments on data from two real wind farms in Guangxi, China, show that, compared with other advanced wind power forecasting methods, NFLM reduces the MSE of 24-step-ahead forecasting at the two wind farms by 23.88% and 21.03%, respectively, while requiring only 36.366 M floating-point operations (FLOPs) and 0.59 M parameters. The results show that NFLM can achieve good prediction accuracy with fewer computing resources.

Keywords:

wind power forecasting; deep learning; multi-layer perceptron; dynamic features; lightweight modeling

1. Introduction

Against a backdrop of rising energy demand, the widespread use of traditional energy sources is exacerbating the energy crisis and intensifying the greenhouse effect [1]. Attention is therefore turning to renewable energy, especially wind energy. According to the 2022 Global Wind Energy Report released by the Global Wind Energy Council (GWEC), global installed wind energy capacity is expected to increase by 680 million kilowatts from 2023 to 2027 [2]. This indicates that wind energy, as an environmentally friendly and widely distributed renewable energy source, is being actively promoted. However, the inherent randomness and instability of wind resources result in a high degree of uncertainty and unpredictability in wind power generation, posing significant challenges to the stable operation of the power grid and to electricity dispatching [3]. Accurate short-term forecasting can help grid operators manage the power resources of wind farms efficiently, improving wind energy utilization and reducing operational costs. Moreover, real-time forecasting for wind farms places high demands on the efficiency of the forecasting model; accurate and efficient wind power forecasting is therefore crucial to meeting these challenges.

Current models applied in the field of wind power forecasting mainly include physical models, statistical models, machine learning models, and deep learning models. Physical models use hydrodynamics and thermodynamics to simulate the interaction between wind and the atmosphere; by incorporating factors such as terrain and landforms and solving physical equations, they predict wind speed and direction at specific locations and heights. The output power of the wind farm is then calculated based on the power curve of the wind turbines [4]. However, physical models typically require large computational resources and real-time data, making them unsuitable for short-term forecasting [5]. On the other hand, statistical models such as Autoregressive (AR) [6], Autoregressive Moving Average (ARMA) [7], and Autoregressive Integrated Moving Average (ARIMA) [8] predict by establishing mathematical relationships between historical meteorological data and wind power. Wang et al. proposed a forecast error compensation method based on the ARIMA model to improve forecasting accuracy [9]. However, statistical models may be limited in handling complex non-stationary wind power sequences due to assumptions they are based on, such as linear distribution, which can lead to underfitting and lower forecasting accuracy [10].

In recent years, many classic machine learning techniques have been applied to short-term wind power forecasting, including Support Vector Machine (SVM) [11,12], Random Forests (RF) [13], Extreme Gradient Boosting (XGBoost) [14], and Extreme Learning Machine (ELM) [15]. Compared to statistical methods, machine learning models are able to effectively model the nonlinear relationships between wind power data at different time points [16]. In addition, deep learning has become a popular technology in the field of wind power forecasting with its excellent data feature mining ability and complex network architecture [17]. Among them, the commonly used deep learning models include CNN [18], TCN [19], RNN [20], and its two variants, LSTM [21] and GRU [22], etc. Hybrid models usually have better forecasting performance than a single model due to the integration of the unique advantages of different models [23]. Zhao et al. [24] employed variational mode decomposition to reduce the volatility of the original wind speed series. The decomposed subsequences, combined with the original wind power series, are then fed into a CNN-GRU hybrid model for short-term wind power forecasting.

Moreover, in recent years, the model based on the attention mechanism has demonstrated its powerful performance in various time series forecasting tasks, including wind speed forecasting [25], solar irradiance forecasting [26], and traffic flow forecasting [27]. This mechanism typically focuses on key features that significantly impact the forecasting results to address time series forecasting problems [28]. The Transformer [29], with its encoder-decoder structure and self-attention mechanism, can capture complex time-dependency patterns and correlations among different sequential variables, thereby achieving better forecasting results. Building upon this, the model based on the attention mechanism has been gradually introduced into the field of wind power forecasting and has achieved notable success. For instance, Sun et al. [30] proposed a short-term wind power forecasting method using the Transformer that considers spatio-temporal correlations. Gong et al. [31] combined the strengths of TCN and Informer, significantly enhancing the accuracy of wind power forecasting. In addition, Xiang et al. [32] utilized the multi-head self-attention mechanism of Vision Transformer to fully exploit the complex nonlinear relationships among input data, thereby improving the accuracy of ultra-short-term wind power forecasting. Liu et al. [33] proposed an interpretable Transformer model integrating decoupled feature-temporal self-attention and variable-attention networks and enhanced wind power prediction accuracy through multi-task learning.

As mentioned earlier, Transformers and their variants have achieved high accuracy in wind power forecasting. However, because the correlation between distant data points in a sequence weakens, it is difficult for the Transformer to adequately capture the global information of the sequence [34]. Additionally, the time complexity of Transformers grows quadratically with the length of the time series [35]. Based on these considerations, Zeng et al. [36] questioned the applicability of Transformer-style models to time series forecasting and proposed a simple single-layer linear model, DLinear, which decomposes the data into a trend component, extracted with a moving average, and a seasonal (remainder) component and then applies a separate single linear layer to each of the two components. DLinear was shown experimentally to outperform complex Transformer-based models in both prediction accuracy and computational efficiency. Meanwhile, Zhang et al. [37] proposed LightTS, a lightweight multivariate forecasting architecture based on stacked multi-layer perceptrons (MLPs), further advancing research in lightweight forecasting. This model converts one-dimensional sequences into two-dimensional tensors through different sampling methods, allowing it to analyze short-term local variations and long-term global variations separately, and then uses MLPs to learn the sequential features within these tensors, improving forecasting accuracy while significantly reducing computational complexity. In summary, from the perspective of forecasting accuracy and efficiency, lightweight models have promising application prospects in time series modeling.

We have observed that training complex network models and tuning their parameters typically require significant computational resources and time, especially for Transformer-based models, which often have millions of parameters. This makes their application in resource-limited environments (such as edge computing nodes or remote wind farms) challenging [38]. In contrast, lightweight models are more practical due to their lower computational resource requirements. To ensure the load safety of wind turbines, the control system needs to adjust the pitch angle and rotational speed of the blades in response to changes in wind speed. However, due to delays in the control system, this can lead to excessive loads on the turbine. Moreover, wind speed measurement devices such as lidar are sensitive to weather conditions, making it difficult to provide reliable wind measurements. Therefore, deploying lightweight prediction models to provide short-term wind speed forecasts for each turbine is a technological solution that can help reduce the load on the turbines. Furthermore, lightweight models generally offer faster computation speeds. In applications that require forecasting wind power for a specific region [39], they can quickly process and predict the output of wind power clusters consisting of multiple wind farms across a large spatial area, which is crucial for grid scheduling of wind resources. Therefore, lightweight wind power forecasting models are of significant importance for real-time predictions in wind farms. By simplifying the structure and reducing the number of parameters, they maintain a good balance between performance and lightweight design, accelerate the wind farm’s response time and update frequency to fluctuations in wind power generation, and reduce operational costs.

In the field of wind power forecasting, Lai et al. [40] developed a lightweight spatiotemporal wind power forecasting network, which synchronously learns spatiotemporal representations through the spatial layer and asynchronously learns temporal representations through the asynchronous spatial layer. They also used MLP to update wind power information along the temporal dimension, improving the efficiency of capturing wind power’s spatiotemporal relationships. Moreover, since online learning emphasizes a model’s ability to learn from new data in real-time and continuously update itself, online learning models often require simpler structures. Zhong et al. [41] introduced external time-related data and designed a lightweight parallel network to process these external data, mitigating information transmission degradation and enhancing online forecasting performance. In this paper, based on the great potential of simple linear models and the limitations of complex network models, we decide to introduce LightTS, a lightweight deep learning architecture that utilizes MLP in time and channel dimensions and thus achieves a lightweight model structure. However, LightTS does not adapt well to the characteristics of wind power data due to the inherent non-stationarity of wind power sequences [42]. Therefore, we have decided to make further improvements based on LightTS.

In fact, the wind power series are typical time series with nonlinear and dynamic characteristics, and the value of each time step is affected by the past moments. This dependency arises because wind speed at a given moment is affected by various physical factors and temporal dynamics, such as atmospheric inertia, turbulence, and changing weather patterns, which introduce correlations between consecutive wind speed measurements. Specifically, wind speed time series often exhibit significant autocorrelation, meaning there is a statistical relationship between the current and past wind speeds [43]. Since wind speed affects the power generation of wind turbines, it creates a time dependency between wind power values at different moments. Although shallow models possess excellent nonlinear representation capabilities for time series modeling, they usually model the relationship between the input measurements and future predicted values of wind power series based on static space [44]. This approach may overlook the dynamic relationships that change over time between samples, leading to the patterns observed by the model in the past becoming inapplicable for the future, thus impacting forecasting accuracy. Based on this situation, we were inspired by the idea of RevIN [45] and proposed a normalization feature learning block, which is used as a key component of the proposed model to process sequence features. This block mitigates the negative impact of dynamic characteristics on model forecasting by removing dynamic statistical properties from the sequence features through preprocessing. Then, it applies MLPs in the temporal and channel dimensions for feature learning.

Considering the computational complexity of the model and the impact of the non-stationary nature of wind power data, we explore the potential of simple and lightweight deep learning network structures in the field of short-term wind power forecasting. The main contributions of this paper are as follows:

(1) To reduce the impact of dynamic relationships in wind power series on the forecasting performance of the model, we have proposed a Normalized Feature Learning Block (NFLBlock). This block removes the time-varying dynamic characteristics from the sequence features, allowing for better utilization of the stacked MLP structure for information interaction in both temporal and channel dimensions.

(2) To address the issues of overfitting and high training time costs associated with complex forecasting models, which hinder quick and accurate responses for real-time wind power forecasting in wind farms, we have developed a lightweight normalized feature learning forecasting model, NFLM, to achieve short-term multivariate wind power forecasting. This model is based on a stacked multi-layer perceptron architecture and, through continuous and interval sampling, can fully exploit the local and global dependencies in wind power sequences across different time scales.

(3) We tested the NFLM model at two wind farms in Guangxi, China. The results show that NFLM outperforms existing complex wind power forecasting models based on Transformers and recurrent structures in terms of forecasting accuracy. Additionally, NFLM maintains low computational and parameter requirements across all forecasting horizons.

The rest of this paper is organized as follows: Section 2 briefly introduces the framework of NFLM and its key components. Section 3 presents the experiments and discusses the model’s forecasting effectiveness and efficiency. Section 4 summarizes our work and outlines future directions.

2. Materials and Methods

2.1. Overview of NFLM

In the current field of wind power forecasting, many models employ complex neural network structures to capture the intricate temporal patterns of wind power series. However, such complex structures increase the computational complexity of the model and require more computational resources and time. Additionally, due to the intermittency and volatility of wind resources, wind power series often contain dynamic features with time variability, leading to patterns in future data that differ from those historically observed. This discrepancy makes it challenging for models to transfer the information learned during the training phase to unseen test data, thereby affecting forecasting accuracy. To address these challenges, we constructed NFLM, a lightweight model for multivariate wind power forecasting, and proposed a Normalized Feature Learning Block (NFLBlock), inspired by RevIN, for learning temporal features in the series. As shown in Figure 1, in the first part of the model, we perform continuous and interval sampling on the original series, generating two input sequences at different time scales that are processed independently. The dynamic statistical properties of these sequences are first removed in the NFLBlock, and then the stacked MLP structure performs feature projection in both the time and channel dimensions. After that, the statistical properties are restored to these features to capture the local and global temporal dependencies in the series. In the second part of the model, we concatenate the features extracted in the first part, and an NFLBlock is used to learn correlations between different time series variables and produce the forecast. The two sampling methods and the normalization feature learning module in NFLM are detailed in the following sections.

2.2. Continuous Sampling and Interval Sampling

An important characteristic of time series is that down-sampling time series can still largely preserve the position information within the series. However, traditional sampling methods usually remove some units within the time step, leading to irreversible information loss. Based on this observation, we adopt continuous and interval sampling without removing any base units. The advantage of this sampling strategy is that it transforms the original series into two forms with different feature distributions, enabling the model to extract sequence features at different time scales so as to comprehensively capture the change patterns of wind power data. Specifically, continuous sampling can better reflect the local view of the wind power series, while interval sampling focuses more on the global view of the series. By employing both sampling strategies simultaneously, the model can deeply mine hidden features in wind power data from different perspectives.

In multivariate wind power forecasting, we assume that $K$, $T$, and $L$ correspond to the number of variables, the length of the input sequence, and the length of model forecasting, respectively. At time step $t$, given the input sequence $X_{t} = \{x_{t-T+1}^{(i)}, \dots, x_{t-1}^{(i)}, x_{t}^{(i)} \mid i = 1, 2, \dots, K\}$, our down-sampling method transforms the time series into $K$ two-dimensional matrices of dimension $C \times T/C$. This means that, for a chosen subsequence length $C$, continuous sampling selects $C$ consecutive tokens to form a subsequence each time. Thus, each input sequence $X_{t} \in \mathbb{R}^{T}$ of length $T$ is down-sampled into $T/C$ non-overlapping subsequences to obtain the two-dimensional matrix $X_{t}^{con} \in \mathbb{R}^{C \times T/C}$, where the $j$-th column formed by continuously sampling the sequence of the $i$-th feature variable is:

$$(X_{t}^{con})_{\cdot j} = \{x_{t-T+1+(j-1) \cdot C}^{(i)}, x_{t-T+2+(j-1) \cdot C}^{(i)}, \dots, x_{t-T+j \cdot C}^{(i)}\} \tag{1}$$

In interval sampling, a fixed interval is used to select $C$ tokens each time to form a subsequence. Similarly, the $j$-th column of the two-dimensional matrix obtained by interval sampling of the $i$-th feature variable sequence is represented as follows:

$$(X_{t}^{int})_{\cdot j} = \{x_{t-T+j}^{(i)}, x_{t-T+j+\lfloor T/C \rfloor}^{(i)}, x_{t-T+j+2 \cdot \lfloor T/C \rfloor}^{(i)}, \dots, x_{t-T+j+(C-1) \cdot \lfloor T/C \rfloor}^{(i)}\} \tag{2}$$

The strategies of continuous and interval sampling can help the model learn different temporal patterns. Specifically, continuous sampling divides the original sequence into several short-term subsequences, providing a local perspective for the model, thereby aiding in the capture of short-term temporal patterns. Interval sampling focuses on the global information, where the model learns trends over longer time spans on sparsely spaced slices. Notably, this down-sampling does not discard any data points from the original sequence but rearranges them into non-overlapping subsequences. In the following content, we will detail how to further explore effective time series features based on these sampled subsequences using the NFLBlock.
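The two sampling schemes of Equations (1) and (2) can be illustrated with a short sketch. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names `continuous_sampling` and `interval_sampling` are hypothetical, indices are 0-based, a single variable is handled at a time, and $T$ is assumed to be divisible by $C$.

```python
def continuous_sampling(x, C):
    """Rearrange x into a C x (T/C) matrix whose columns are consecutive chunks (Eq. (1))."""
    T = len(x)
    if T % C != 0:
        raise ValueError("this sketch assumes T is divisible by C")
    n_cols = T // C
    # Row r, column j holds x[j*C + r]: column j is the j-th chunk of C consecutive points.
    return [[x[j * C + r] for j in range(n_cols)] for r in range(C)]


def interval_sampling(x, C):
    """Rearrange x into a C x (T/C) matrix whose columns are strided slices (Eq. (2))."""
    T = len(x)
    if T % C != 0:
        raise ValueError("this sketch assumes T is divisible by C")
    step = T // C  # floor(T / C): the fixed interval, which is also the number of columns
    # Row r, column j holds x[j + r*step]: column j picks every step-th point starting at j.
    return [[x[j + r * step] for j in range(step)] for r in range(C)]


def column(matrix, j):
    """Return column j of a row-major nested list."""
    return [row[j] for row in matrix]


# Toy sequence with T = 12 and C = 3, so each matrix is 3 x 4.
x = list(range(12))
X_con = continuous_sampling(x, 3)
X_int = interval_sampling(x, 3)
print(column(X_con, 0))  # [0, 1, 2]
print(column(X_int, 0))  # [0, 4, 8]
```

In this toy case, the first continuous column covers three adjacent points (a local view), while the first interval column spans the whole sequence (a global view). Both matrices contain every original point exactly once, which matches the statement that no data points are discarded.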

2.3. Reversible Instance Normalization

We consider the multivariate forecasting problem of predicting future wind power from multiple historical meteorological features and historical wind power. This section introduces reversible instance normalization (RevIN), which removes time-varying statistical properties from the wind power series features so that the temporal patterns learned during the training phase are more applicable to test data, thereby improving the predictive capability of the model. Generally, RevIN can be applied at any chosen layer to perform instance normalization, with the inverse normalization performed at a symmetrical position in another layer. For the input sequence $X_{t} \in \mathbb{R}^{T}$, we first calculate the mean $\mathbb{E}_{t}[x_{kt}^{(i)}]$ and variance $\mathrm{Var}[x_{kt}^{(i)}]$ of each sample $x_{k\cdot}^{(i)} \in \mathbb{R}^{T}$ within it:

$$\begin{aligned} \mathbb{E}_{t}[x_{kt}^{(i)}] &= \frac{1}{T} \sum_{j=1}^{T} x_{kj}^{(i)} \\ \mathrm{Var}[x_{kt}^{(i)}] &= \frac{1}{T} \sum_{j=1}^{T} \left(x_{kj}^{(i)} - \mathbb{E}_{t}[x_{kt}^{(i)}]\right)^{2} \end{aligned} \tag{3}$$

The original data are then normalized with these statistics and scaled and shifted by learnable parameters:

$$\hat{x}_{kt}^{(i)} = \gamma_{k} \left(\frac{x_{kt}^{(i)} - \mathbb{E}_{t}[x_{kt}^{(i)}]}{\sqrt{\mathrm{Var}[x_{kt}^{(i)}] + \epsilon}}\right) + \beta_{k} \tag{4}$$

where $\gamma, \beta \in \mathbb{R}^{K}$ are learnable affine parameter vectors and $\epsilon$ is a small constant that keeps the denominator away from zero. After normalization, the mean and variance of the sequence tend to be stable and consistent, thus mitigating the impact of dynamic factors. As a result, the normalization layer presents the model with inputs of stable mean and variance, so that it can effectively identify variation within the sequence.

Next, the transformed wind power series features $\hat{x}^{(i)}$ are used as the input to the module for feature extraction along different dimensions. However, such input data differ in statistical properties from the original distribution, and a module relying solely on normalized input data may not adequately capture the original features of the data. Therefore, the inverse normalization operation is performed at a position symmetrical to the normalization layer. The purpose of this operation is to reintroduce the statistical properties removed from the input into the output of the module so that the wind power value predicted by the model more closely matches the actual value of the original wind power series. We inverse normalize the output of the module by inverting the operations of Equation (4):

$$\hat{y}_{kt}^{(i)} = \sqrt{\mathrm{Var}[x_{kt}^{(i)}] + \epsilon} \cdot \left(\frac{\tilde{y}_{kt}^{(i)} - \beta_{k}}{\gamma_{k}}\right) + \mathbb{E}_{t}[x_{kt}^{(i)}] \tag{5}$$

Here, $\tilde{y}$ represents the wind power series obtained after feature projection by the module, and $\hat{y}$, obtained after reintroducing the original dynamic characteristics, is the final output of the module. Our focus is on the dynamic properties of the input data, which comprise the statistical properties mean $\mu$ and variance $\sigma^{2}$ together with the learnable affine parameters $\gamma$ and $\beta$. In the RevIN framework, normalization and inverse normalization layers are placed at symmetrical positions in the network, with the former used to remove statistical features and the latter to recover them. Through normalization, the original data are transformed into a zero-mean distribution, which reduces the distribution difference between different instances. To restore the removed information in the output, RevIN applies inverse normalization at the output layer, reintroducing the previously eliminated dynamic information. This enables each sample to regain its original distribution properties, which benefits the computation of the final forecasting target and makes the model’s forecasts more accurate.
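The normalization and inverse normalization steps of Equations (3) to (5) can be sketched for a single variable as follows. This is a minimal pure-Python illustration rather than the authors' code: the function names are hypothetical, and the affine parameters `gamma` and `beta` are passed in as fixed numbers, whereas in the model they would be learned.

```python
import math


def revin_normalize(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one instance of one variable over its T time steps (Eqs. (3) and (4))."""
    T = len(x)
    mean = sum(x) / T
    var = sum((v - mean) ** 2 for v in x) / T  # population variance, as in Eq. (3)
    std = math.sqrt(var + eps)
    x_hat = [gamma * (v - mean) / std + beta for v in x]
    return x_hat, (mean, std)  # the statistics are kept for the inverse step


def revin_denormalize(y_tilde, stats, gamma=1.0, beta=0.0):
    """Restore the removed statistics to a module output (Eq. (5))."""
    mean, std = stats
    return [std * (v - beta) / gamma + mean for v in y_tilde]


# Illustrative values only: a short window with mean 6 and variance 5.
x = [3.0, 5.0, 9.0, 7.0]
x_hat, stats = revin_normalize(x, gamma=2.0, beta=0.5)
x_back = revin_denormalize(x_hat, stats, gamma=2.0, beta=0.5)
print(sum(x_hat) / len(x_hat))  # close to beta = 0.5
print(x_back)                   # close to the original x
```

Applying the inverse step directly to the normalized input recovers the original values up to floating-point error, which is a quick check that Equation (5) inverts Equation (4). In the model, the inverse step is applied to the projected features rather than to the input itself.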

2.4. Normalized Feature Learning Block

Figure 2 shows the structure of the Normalized Feature Learning Block (NFLBlock), which is a key component in the NFLM model for feature extraction and forecasting. Its main function is to realize information exchange of input features in the time dimension and channel dimension. The input of the NFLBlock is a two-dimensional matrix with a time dimension of $H$ and a channel dimension of $W$. To alleviate the negative impact of dynamic characteristics in wind power series data on forecasting, we first perform an instance normalization operation on this matrix and then perform temporal projection and channel projection through the stacked MLP structure, producing a two-dimensional matrix of shape $F \times W$, where $F$ is a hyperparameter giving the number of output feature dimensions. The output matrix can be regarded as the features obtained after the feature projection of the input matrix.

We denote the input matrix by $R = (r_{ij})_{H \times W}$, where the $i$-th row is represented as $r_{i\cdot} = (r_{i1}, r_{i2}, \dots, r_{iW})$, $i = 1, 2, \dots, H$, and the $j$-th column is represented as $r_{\cdot j} = (r_{1j}, r_{2j}, \dots, r_{Hj})$, $j = 1, 2, \dots, W$. First, the instance normalization layer is applied to the input matrix $R \in \mathbb{R}^{H \times W}$ to obtain $\hat{R} = (\hat{r}_{ij})_{H \times W}$, thereby removing the dynamic statistical characteristics in the series:

$$(\hat{r}_{ij})_{H \times W} = \mathrm{RIN}\left((r_{ij})_{H \times W}\right) \tag{6}$$

Then, an MLP of $\mathbb{R}^{H} \to \mathbb{R}^{F'}$ ($F' \ll F$) is used to implement feature projection from high dimension to low dimension on each column, which we refer to as temporal projection (Equation (7)). Subsequently, another MLP of $\mathbb{R}^{W} \to \mathbb{R}^{W}$ is employed to perform channel projection on each row (Equation (8)):

$$r^{t}_{\cdot j} = \mathrm{MLP}(\hat{r}_{\cdot j}), \quad j = 1, 2, \dots, W \tag{7}$$

$$r^{c}_{i\cdot} = \mathrm{MLP}(r^{t}_{i\cdot}), \quad i = 1, 2, \dots, F' \tag{8}$$

Temporal projection and channel projection are feature transformations performed on the time and channel dimensions, respectively. In the next step, we use an MLP of $\mathbb{R}^{F'} \to \mathbb{R}^{F}$ to perform output projection on each column, projecting the feature dimension from $F'$ to $F$:

$$r^{o}_{\cdot j} = \mathrm{MLP}(r^{c}_{\cdot j}), \quad j = 1, 2, \dots, W \tag{9}$$

For the NFLBlock, relying solely on the normalized data $(\hat{r}_{ij})_{H \times W}$ is insufficient to fully capture the distribution characteristics of the original data. Therefore, we perform an inverse normalization operation at the symmetrical output position of this module, explicitly restoring the statistical properties removed from the input data to the module output:

$$(\tilde{r}_{ij})_{F \times W} = \mathrm{IRIN}\left((r^{o}_{ij})_{F \times W}\right) \tag{10}$$

where $\tilde{r}_{ij}$ represents the final output of the NFLBlock. The input dimensions $H$ and $W$ depend on where the block is used. In the first part of NFLM, $H$ and $W$ correspond to the subsequence length $C$ and the number of subsequences $T/C$, respectively. In the second part, $H$ corresponds to $2 \times F$, the sum of the output feature dimensions extracted in the first part, and $W$ represents the number of sequence variables $K$.

In the NFLBlock architecture, RevIN handles the dynamic statistical properties in the input matrix so that the feature distributions of the training and test data are more stable. First, the input sequence is normalized; the inverse normalization layer then reverses this operation on the module output to the same extent, restoring the statistical information of the original distribution to the module output. NFLBlock performs channel projection repeatedly, once per time step, so the computational cost increases significantly for longer time series. To reduce this cost, NFLBlock adopts a bottleneck structure. It is called a “bottleneck” because a smaller intermediate length $F'$ is introduced into the structure, which is much smaller than both $H$ and $F$. That is, the input sequence of length $H$ is first projected to length $F'$ by a temporal projection layer. Then, channel projection is performed $F'$ times to realize information exchange, and finally, the output projection layer projects the features of length $F'$ to the required output length $F$. The bottleneck structure greatly reduces the amount of repetitive channel projection calculation.
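The shape flow through the NFLBlock (Equations (6) to (10)) and the saving from the bottleneck can be traced with a small sketch. It is a simplified pure-Python illustration under several assumptions that are not taken from the paper: each MLP is reduced to a single bias-free linear map with random weights, the affine parameters of the normalization are omitted, statistics are computed per column over the time dimension, and all names and sizes are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply an m x n matrix by an n x p matrix (row-major nested lists)."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]


def init(rows, cols, rng):
    """Small random weights standing in for trained MLP parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def nfl_block(R, W_time, W_chan, W_out, eps=1e-5):
    """H x W input -> F x W output: RIN, temporal, channel, output projection, IRIN."""
    H, W = len(R), len(R[0])
    cols = list(zip(*R))
    means = [sum(c) / H for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / H + eps) for c, m in zip(cols, means)]
    # Eq. (6): remove per-column statistics.
    R_hat = [[(R[i][j] - means[j]) / stds[j] for j in range(W)] for i in range(H)]
    R_t = matmul(W_time, R_hat)  # Eq. (7): (F' x H)(H x W) -> F' x W
    R_c = matmul(R_t, W_chan)    # Eq. (8): (F' x W)(W x W) -> F' x W
    R_o = matmul(W_out, R_c)     # Eq. (9): (F x F')(F' x W) -> F x W
    # Eq. (10): restore the removed statistics column by column.
    return [[R_o[i][j] * stds[j] + means[j] for j in range(W)] for i in range(len(R_o))]


def channel_projection_mults(rows, W):
    """Multiplications needed to apply a W x W channel map to `rows` rows."""
    return rows * W * W


rng = random.Random(0)
H, W, F_mid, F = 8, 3, 2, 4  # illustrative sizes with F_mid smaller than H and F
R = [[rng.gauss(10.0, 2.0) for _ in range(W)] for _ in range(H)]
out = nfl_block(R, init(F_mid, H, rng), init(W, W, rng), init(F, F_mid, rng))
print(len(out), len(out[0]))                # 4 3
print(channel_projection_mults(H, W))       # 72 without the bottleneck
print(channel_projection_mults(F_mid, W))   # 18 with the bottleneck
```

With these illustrative sizes, the channel projection is applied to $F' = 2$ rows instead of $H = 8$, so its multiplication count drops by the factor $H / F'$. The same ratio applies for any $W$, since the channel map itself is unchanged.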

3. Experimental Results and Discussions

In this section, we present the two real datasets used for the experiment, the evaluation metrics, and some details of the experiment and analysis of the results.

3.1. Data Description

This study uses two real wind farm datasets from Guangxi, China, to verify the effectiveness of the proposed model. These two wind farms are the Fujia Tian Wind Farm, located in Guilin City, and the Xiayi Mountain Wind Farm in Nanning City, with their geographical locations marked in Figure 3a. These datasets cover the period from 1 January 2021 to 31 December 2021, with recordings every 15 min (96 recordings per day), comprising 33,994 and 34,736 observations, respectively. Each dataset includes six sequential variables: wind speed, wind direction, temperature, humidity, atmospheric pressure, and actual wind power, which are used to predict future wind power in the model. The original dataset of Xiayi Mountain Wind Farm is shown in Figure 3b. For the missing values in the dataset, this paper employs linear interpolation to fill all the missing values. In terms of dataset division, each dataset is split into training, test, and validation sets in the ratio of 7:2:1.
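The preprocessing described above (linear interpolation of missing values followed by a 7:2:1 split) can be sketched as follows. This is a hedged pure-Python illustration, not the authors' pipeline: the function names are hypothetical, missing values are represented by `None`, gaps at either end are filled with the nearest observed value, and the three subsets are taken as contiguous segments in the order the text lists them. The paper does not state the latter two choices, so they are assumptions.

```python
def fill_missing_linear(values):
    """Fill None entries by linear interpolation between the nearest observed neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("need at least one observed value")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:       # leading gap: copy the first observed value (assumption)
            filled[i] = values[right]
        elif right is None:    # trailing gap: copy the last observed value (assumption)
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def split_contiguous(series, ratios=(7, 2, 1)):
    """Cut a series into three contiguous segments with the given ratio."""
    n = len(series)
    total = sum(ratios)
    first_end = n * ratios[0] // total
    second_end = first_end + n * ratios[1] // total
    return series[:first_end], series[first_end:second_end], series[second_end:]


# Illustrative values only.
power = fill_missing_linear([1.0, None, None, 4.0, None])
train, test, val = split_contiguous(list(range(100)))
print(power)                             # approximately [1.0, 2.0, 3.0, 4.0, 4.0]
print(len(train), len(test), len(val))   # 70 20 10
```

Keeping the segments contiguous avoids mixing future observations into the training set, which matters for time series; a shuffled split would be a different design choice.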

In this study, all experiments were carried out on a computer equipped with the Windows 10 operating system, featuring an Intel(R) Core (TM) i5-8265U processor @ 1.60 GHz and 8 GB of memory. The graphics card used was a GeForce RTX 3090. The entire experimental process was conducted using the Python programming language on the Pycharm Professional Edition 2023.1 platform, with all programming and execution of the experiments completed using the PyTorch deep learning framework (version 2.0.1).

3.2. Experimental Comparison Model Setup

To evaluate the performance of our model, we selected several widely used wind power prediction models for comparison. These models include the Long Short-Term Memory network (LSTM), Gated Recurrent Unit (GRU), Temporal Convolutional Network (TCN), Transformer, Informer, DLinear, and LightTS; results for NLinear, MLP, and Linear are also reported in the result tables. The experiments use the Mean Squared Error (MSE) loss function and the Adam optimizer, with GELU as the activation function. To reduce the risk of overfitting, we also apply an early stopping strategy. The hyperparameters were set by combining grid search with manual tuning to obtain the best parameter combination for each model. The hyperparameters of the comparative models are shown in Table 1, while those of our proposed model are shown in Table 2. The hyperparameters of LightTS are the same as those of NFLM.
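The early stopping strategy can be sketched independently of any framework. The class below is a hypothetical helper, not the authors' implementation; the patience of 5 matches Tables 1 and 2, while `min_delta` and the loss values are illustrative assumptions.

```python
from typing import Iterable


class EarlyStopping:
    """Signal a stop once the validation loss has not improved for `patience` checks."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta      # assumed tolerance, not stated in the paper
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one validation loss and return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses: Iterable[float], patience: int = 5) -> int:
    """Return how many epochs are executed before early stopping triggers."""
    stopper = EarlyStopping(patience)
    count = 0
    for count, loss in enumerate(val_losses, start=1):
        if stopper.step(loss):
            break
    return count


# Made-up validation losses: the best value occurs at epoch 3.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.6]
print(epochs_run(losses))
```

In a real training loop, the model weights saved at the best validation loss would normally be restored after stopping.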

3.3. Evaluation Metric

We compared the performance of NFLM with other common deep learning models in wind power forecasting. To evaluate the predictive performance of these models, we used three metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), and the coefficient of determination (R²). MAE describes the average absolute deviation between the predicted and actual values, MSE represents the mean of the squared differences between the predicted and actual values, and R² measures how well the model fits the time series data. Generally, the smaller the MSE and MAE, the closer the predicted values are to the actual values, indicating better predictive accuracy. The larger R² is, the better the predictive power of the model.

MSE = \frac{1}{n} \sum_{i = 1}^{n} ({\tilde{x}}_{i} - x_{i})^{2}

(11)

MAE = \frac{1}{n} \sum_{i = 1}^{n} |{\tilde{x}}_{i} - x_{i}|

(12)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (x_{i} - {\tilde{x}}_{i})^{2}}{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}}

(13)

where $x_{i}$ represents the actual wind power, ${\tilde{x}}_{i}$ represents the predicted wind power, $\bar{x}$ represents the average of the wind power observations, and $n$ denotes the number of predicted samples.
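The three metrics can be written out directly from their definitions. The sketch below uses plain Python lists and hypothetical function names, and implements the standard coefficient of determination with squared residuals in the numerator; the sample values are made up.

```python
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the squared differences between predictions and observations."""
    _check(actual, predicted)
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the absolute differences between predictions and observations."""
    _check(actual, predicted)
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """One minus the ratio of residual to total sum of squares.

    Raises ZeroDivisionError if every observation is identical.
    """
    _check(actual, predicted)
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]   # made-up values
forecast = [2.0, 2.0, 3.0, 4.0]
print(mse(observed, forecast), mae(observed, forecast), r2(observed, forecast))
```

Predicting the mean of the observations gives an R² of exactly 0, and anything worse gives a negative value, which is how the negative R² entries for Transformer and Informer in Tables 3 and 5 should be read.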

3.4. Experimental Analysis of Multi-Step Forecasting

To validate the multi-step forecasting performance of the proposed model, this study designed two experiments based on real datasets, comparing NFLM with all comparison models. The first experiment forecasts the wind power of Xiayi Mountain Wind Farm, with results and analysis in Section 3.4.1. The second forecasts the wind power of Fujia Tian Wind Farm, with results and analysis in Section 3.4.2. We consider 24-, 48-, 72-, and 96-step ahead forecasts, and the length of the input historical data is 96 steps.
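The multi-step setting pairs each 96-step history with the following 24, 48, 72, or 96 steps. The sketch below shows one way to build such pairs; it is a hypothetical univariate illustration (the paper's inputs have six variables), and the stride of one step between consecutive samples is an assumption.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], input_len: int = 96,
                 horizon: int = 24) -> List[Tuple[Sequence[float], Sequence[float]]]:
    """Slide over the series and pair each history window with the steps that follow it."""
    samples = []
    last_start = len(series) - input_len - horizon
    for start in range(last_start + 1):   # stride of 1 is an assumption
        history = series[start:start + input_len]
        target = series[start + input_len:start + input_len + horizon]
        samples.append((history, target))
    return samples


def horizon_hours(steps: int, minutes_per_step: int = 15) -> float:
    """Convert a forecasting horizon in 15-min steps to hours."""
    return steps * minutes_per_step / 60


toy_series = list(range(200))             # stand-in for a wind power sequence
for steps in (24, 48, 72, 96):
    pairs = make_windows(toy_series, input_len=96, horizon=steps)
    print(steps, horizon_hours(steps), len(pairs))
```

With 15-min recordings, the four horizons correspond to 6 h, 12 h, 18 h, and 24 h, matching the column headers of Tables 3 and 5.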

3.4.1. Experiment 1: Wind Power Forecasting on the Xiayi Mountain Dataset

In Experiment 1, we tested the predictive performance of various models using the Xiayi Mountain dataset. Table 3 compares the MSE, MAE, and R² of the proposed model and the other models at different forecasting horizons, with the best and second-best results marked in bold and underlined, respectively. Smaller MSE and MAE values and larger R² values indicate better predictive performance. From this, we can draw the following experimental conclusions:

We observed that as the forecasting horizon increases, the prediction errors of all models gradually grow. Among them, the proposed model, NFLM, achieved the lowest MSE and MAE and the largest R² at every horizon, demonstrating that its forecasting accuracy surpasses that of the compared state-of-the-art forecasting models and that it has a strong capability in wind power sequence modelling.

At a forecasting horizon of 24 steps, NFLM achieved an MSE of 0.3525 and an MAE of 0.4131, which are 13.6% and 8.3% lower than those of the second-best model, LightTS, respectively. NFLM also achieved the highest R² value of 0.7574 among all models, indicating that the proposed model provides the best fit to the data at this prediction horizon. For 48-step forecasting, compared to DLinear and MLP, NFLM reduced the MSE by 16.6% and 15.8%, while the R² value increased by 14.8% and 13.9%, respectively. At 72-step forecasting, NFLM maintained a clear advantage with an MSE of 0.8539 and an MAE of 0.6991. The performance of LightTS, DLinear, and MLP was comparable, with MSE values of 0.9224, 0.9301, and 0.9408, while the MSE of models such as LSTM and GRU exceeded 0.9949. At 96-step forecasting, Transformer and Informer reached MSE values of 1.4832 and 1.4583, respectively; the MSE of NFLM was lower by 0.5031 and 0.4782. Additionally, MLP had the second-best MSE and R² at this horizon. The experimental results demonstrate that NFLM exhibits outstanding forecasting performance across different forecasting horizons. Although simpler models do not achieve the same level of accuracy as our method, they still provide an acceptable level of prediction accuracy.

Figure 4 compares the MSE, MAE, and R² of each model at every forecasting step more intuitively. The MSE values of LightTS and DLinear are close to each other at 72 and 96 steps, so the performance of these two models is comparable at these horizons. Across all four forecasting steps, the MSE and MAE curves for NFLM consistently lie at the bottom, while the R² curve remains at the top. NFLM also keeps a consistent margin over LightTS, one of the strongest comparison models. This indicates that even as the forecasting step length increases and forecasting becomes more difficult, NFLM still achieves the best forecasting performance. It also demonstrates the effectiveness of simple linear models in time series modeling.

Table 4 details the percentage improvement in MSE and MAE of NFLM at Xiayi Mountain Wind Farm relative to the comparison models. A positive value indicates that NFLM predicts better, while a negative value indicates that the proposed model’s predictive performance is not as good as that of the other model. The a-improvement, l-improvement, and m-improvement rows in the table give the average, minimum, and maximum percentage improvement of NFLM over the comparison models, respectively.

It can be seen from Table 4 that all MSE and MAE improvement percentages are positive, indicating that NFLM’s predictive performance is better than that of each comparison model at all forecasting horizons. The l-improvement row shows that NFLM improves on even the best-performing comparison model at each forecasting step. In the 24-step ahead forecast, NFLM reduces MSE and MAE by at least 13.65% and 8.28%, respectively, and by as much as 46.79% and 25.47%. Although the forecasting errors of all models increase at longer forecasting horizons, NFLM still reduces MSE by 19.23% and 14.07% on average in the 72-step and 96-step ahead forecasts relative to the comparison models, indicating that even within a larger forecasting horizon, NFLM still exhibits superior predictive performance.
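The improvement figures can be recomputed from Table 3. The sketch below assumes the improvement is defined as (baseline error − NFLM error) / baseline error, averaged over all ten comparison models; this definition is an assumption, although it reproduces the 24-step MSE entries of Table 4 to two decimals. The function name is hypothetical.

```python
def improvement_percent(baseline_error: float, model_error: float) -> float:
    """Relative error reduction of the model with respect to a baseline, in percent."""
    return (baseline_error - model_error) / baseline_error * 100.0


# 24-step MSE values copied from Table 3 (Xiayi Mountain wind farm).
baseline_mse_24 = {
    "LSTM": 0.4308, "GRU": 0.4693, "TCN": 0.4957, "Transformer": 0.6625,
    "Informer": 0.4897, "DLinear": 0.4262, "NLinear": 0.4441, "MLP": 0.4499,
    "Linear": 0.4340, "LightTS": 0.4082,
}
nflm_mse_24 = 0.3525

gains = {name: improvement_percent(value, nflm_mse_24)
         for name, value in baseline_mse_24.items()}

a_improvement = sum(gains.values()) / len(gains)   # average over all baselines
l_improvement = min(gains.values())                # smallest gain (vs. LightTS)
m_improvement = max(gains.values())                # largest gain (vs. Transformer)

print(round(a_improvement, 2), round(l_improvement, 2), round(m_improvement, 2))
```

The smallest gain is against the strongest baseline, so a positive l-improvement means NFLM beats every comparison model at that horizon.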

Figure 5 provides a comparison of the 24-step forecasting curves of different models during the testing phase. Compared to other comparison models, the forecasting curve of the proposed model is overall closer to the trend of the actual wind power value curve. The forecasting curves of the other comparison models fluctuate more significantly around the real curve. The forecasting of Transformer is noticeably lower than the actual values in some cases, which is reflected in its MSE and MAE. Meanwhile, Informer’s forecasting, particularly during the peaks and troughs, deviates more from the actual values compared to other models.

3.4.2. Experiment 2: Wind Power Forecasting on the Fujia Tian Dataset

To verify the generalization ability of the proposed model, we used the Fujia Tian dataset to evaluate the predictive performance of NFLM, with the input length set to 96 steps. Table 5 lists the evaluation metrics of NFLM and all comparison models in this experiment. The best results are highlighted in bold, and the second-best results are underlined. The experimental results are described below.

From the MSE and MAE results in Table 5, it is clear that wind power forecasting becomes more difficult as the number of forecasting steps increases. Across the 24-, 48-, 72-, and 96-step predictions, the MSE and MAE of the models generally increase, while the R² value generally decreases. Compared with LightTS and the rest of the compared models, NFLM shows the best performance at 24, 48, and 72 steps. At 24 steps, the performance of DLinear and LSTM is similar, with MSE values of 0.4231 and 0.4234, respectively. The prediction errors of NLinear, MLP, and Linear are slightly smaller than those of these two models, while NFLM’s MSE is significantly lower still. In the 48-step prediction, MLP achieved the second-best MSE and R², while NFLM performed the best in all three metrics. At 96 prediction steps, LightTS has a lower MSE than NFLM and DLinear, but NFLM’s MSE is still 7.1% lower than that of DLinear.

Figure 6 compares the MSE of all models at the four forecasting steps using bar graphs. As shown in Figure 6, NFLM’s MSE is clearly lower than that of the other comparison models in the first three prediction steps. Its prediction error at the 96-step horizon is similar to that of MLP, Linear, and LightTS. The Transformer model performs better than the TCN and Informer models in the 24-step and 48-step ahead forecasts. However, the MSE and MAE of the Transformer rise sharply as the forecasting step length increases, which we attribute to the limitations of its attention mechanism. This is particularly evident in the 72-step and 96-step ahead forecasts, where its errors are markedly higher than those of the other models.

Similar to Experiment 1, to observe more directly how much the proposed model improves on the other models, we calculated the percentage improvement of NFLM and report it in Table 6, where the percentage values have the same meaning as in Table 4 of Experiment 1. Specifically, the MSE of NFLM is 9.72–34.75%, 6.05–44.11%, and 9.80–37.78% lower than that of the comparison models at 24, 48, and 72 forecasting steps, respectively, demonstrating that NFLM maintains higher accuracy in wind power forecasting. Although NFLM does not perform as well as LightTS at 96 forecasting steps, the MSE and MAE of our model are only 1.81% and 1.82% higher than those of LightTS, respectively. Furthermore, at 96 forecasting steps, NFLM still improves MSE and MAE by 15.96% and 8.60% on average, respectively.

Figure 7 displays the comparison between the predicted wind power values and the actual power values of all models during the testing phase for the 24-step forecasting experiment. It can be observed that under the same trend, NFLM more closely matches the actual data values in most cases.

3.4.3. Model Efficiency Analysis

In this experiment, we used the Fujia Tian dataset to evaluate the model’s efficiency. The length of the input history sequence was set to 96 steps, and the forecasting horizons were {24, 48, 72, 96}. We compared all comparison models with NFLM in terms of the number of floating-point operations (FLOPs) and the number of parameters. Table 7 presents the FLOPs, parameters, training time, and time complexity of all models across all prediction horizons.

In Table 7, the parameters and FLOPs of most models grow, at different rates, as the forecasting step increases. We also find that the FLOPs and parameters of NFLM and LightTS are the same at all forecasting steps, which suggests that our improvement on LightTS does not increase the computational complexity or capacity of the original benchmark model. For example, in the 24-step ahead prediction, running NFLM requires only 36.366 M FLOPs and 0.59 M parameters. Simpler models such as DLinear, NLinear, and MLP have smaller FLOPs and parameter counts than NFLM. Although their accuracy is not as high as that of our method, they can still provide acceptable accuracy with lower computational complexity. Conversely, while NFLM has a larger computational load than simple linear models, it achieves higher prediction accuracy and is still very lightweight compared to models such as Transformer. The Transformer has the largest FLOPs at all four steps, followed by TCN. The parameters and FLOPs of Informer are lower than those of the Transformer, yet both models need tens of times more parameters and more than a hundred times more FLOPs than NFLM, which highlights the advantage of NFLM’s small, MLP-based model capacity. In terms of training time, NFLM shows a significant advantage over more complex models. Linear and MLP have even shorter training times because they use simple linear layers for prediction.

Based on the above analysis, models such as DLinear and NLinear have smaller parameters and FLOPs. Although NFLM has more parameters and FLOPs compared to them, its computational cost is still much lower than that of the other comparison models. Combining the multi-step forecasting results from the two wind farms, NFLM appropriately increases the computational load to achieve better prediction accuracy, striking a good balance between prediction performance and computational complexity.
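To make the size of these gaps concrete, the 24-step entries of Table 7 can be expressed as multiples of the NFLM cost. The sketch below only divides numbers copied from the table; the helper name is hypothetical and only a subset of the models is included.

```python
from typing import Dict


def ratio_to_nflm(costs: Dict[str, float]) -> Dict[str, float]:
    """Express every model's cost as a multiple of the NFLM cost."""
    reference = costs["NFLM"]
    return {name: value / reference for name, value in costs.items()}


# 24-step values copied from Table 7 (both in millions).
flops_m_24 = {"LSTM": 2512.89, "GRU": 5482.75, "TCN": 6925.04,
              "Transformer": 7281.23, "Informer": 5012.81,
              "DLinear": 0.903, "NFLM": 36.366}
params_m_24 = {"LSTM": 3.19, "GRU": 6.94, "TCN": 9.52,
               "Transformer": 22.0, "Informer": 19.7,
               "DLinear": 0.02, "NFLM": 0.59}

flops_ratio = ratio_to_nflm(flops_m_24)
params_ratio = ratio_to_nflm(params_m_24)

for name in flops_m_24:
    print(f"{name:12s} FLOPs x{flops_ratio[name]:8.2f}   params x{params_ratio[name]:6.2f}")
```

On these figures the Transformer needs roughly 200 times the FLOPs and 37 times the parameters of NFLM, while DLinear needs only a few percent of NFLM's cost.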

4. Conclusions

This paper presents a lightweight model based on a multi-layer perceptron, named NFLM, and applies it to short-term wind power forecasting. The main innovation of this model lies in its adoption of the RevIN model concept. It removes the statistical characteristics from the sequence features in the normalized feature learning module within the model so as to reduce the interference of the dynamic characteristics in the wind power sequence to the forecasting. On this basis, the MLP-based architecture is applied to learn the information in the sequence. Moreover, the model combines continuous and interval sampling strategies to arrange input sequences in various ways, enabling the model to capture both short-term local and long-term global features from sequences of different time scales.
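The two ideas summarized above can be illustrated with a small sketch. It is not the authors' NFLBlock: the normalization follows the general reversible instance normalization idea without learnable affine parameters, and the two sampling functions follow a LightTS-style arrangement that is assumed here to match NFLM. All function names and the `eps` value are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def normalize(window: Sequence[float], eps: float = 1e-5) -> Tuple[List[float], float, float]:
    """Remove the window's own mean and scale; return the statistics for later reversal."""
    mean = fmean(window)
    std = (pstdev(window) ** 2 + eps) ** 0.5   # eps avoids division by zero
    return [(v - mean) / std for v in window], mean, std


def denormalize(values: Sequence[float], mean: float, std: float) -> List[float]:
    """Restore the original scale, e.g. for the model's output sequence."""
    return [v * std + mean for v in values]


def continuous_sampling(seq: Sequence[float], sub_len: int) -> List[Sequence[float]]:
    """Consecutive chunks of length sub_len: short-term local patterns."""
    if len(seq) % sub_len:
        raise ValueError("sequence length must be a multiple of sub_len")
    return [seq[i:i + sub_len] for i in range(0, len(seq), sub_len)]


def interval_sampling(seq: Sequence[float], sub_len: int) -> List[Sequence[float]]:
    """Strided subsequences of length sub_len: long-term global patterns."""
    if len(seq) % sub_len:
        raise ValueError("sequence length must be a multiple of sub_len")
    step = len(seq) // sub_len
    return [seq[offset::step] for offset in range(step)]


history = [float(v) for v in range(96)]        # stand-in for one 96-step input
normed, mean, std = normalize(history)
restored = denormalize(normed, mean, std)
local_views = continuous_sampling(normed, 24)  # subsequence length 24 as in Table 2
global_views = interval_sampling(normed, 24)
print(len(local_views), len(global_views), len(global_views[0]))
```

Because the mean and scale are removed per input window and restored on the output, a shift in the level or variance of the series between training and deployment has less influence on the patterns the network sees.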

In multi-step wind power forecasting experiments conducted on two real datasets, compared with mainstream forecasting models such as LSTM, GRU, and Transformer, NFLM demonstrated a more significant reduction in MSE and MAE in most cases; furthermore, it required fewer computational resources and parameters. This validates the higher forecasting accuracy and effectiveness of the model constructed in this paper. Based on the comprehensive experimental results, the lightweight forecasting model based on MLP structure has shown great application potential in the field of wind power forecasting.

Although the simple MLP-based architecture makes the model more lightweight, it also leaves the model with few hyperparameters, so its generalization may be insufficient. In addition, the proposed model may neglect the dependence between different variables. In future research, we therefore intend to explore the correlations between variables in multivariate wind power series in more depth, which can help the model better understand the interactions among the variables and forecast wind power more accurately.

Author Contributions

Conceptualization, Y.C. and M.Y.; methodology, Y.C.; software, M.Y.; validation, M.Y. and H.W.; investigation, H.W.; resources, Y.C.; data curation, H.Q. and Y.Q.; writing—original draft preparation, Y.C.; writing—review and editing, M.Y., H.W., R.J. and X.H.; visualization, M.Y.; supervision, Y.C.; project administration, Y.C. and X.H.; funding acquisition, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 72461001.

Data Availability Statement

The data analyzed in this study are not available to the public under a confidentiality agreement.

Acknowledgments

The authors are grateful to the anonymous reviewers whose valuable suggestions helped to improve the quality of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ARMA	Autoregressive moving average
ARIMA	Autoregressive integrated moving average
SVM	Support vector machine
RF	Random forests
XGBoost	Extreme gradient boosting
ELM	Extreme learning machine
CNN	Convolutional neural networks
TCN	Temporal convolutional network
RNN	Recurrent neural network
LSTM	Long short-term memory
GRU	Gated recurrent unit
MLP	Multi-layer perceptron
NFLM	Normalized feature learning forecasting model
NFLBlock	Normalized feature learning block
FLOPs	Floating-point operations
MSE	Mean squared error
MAE	Mean absolute error

Figure 1. NFLM framework diagram.

Figure 2. Structure of Normalized Feature Learning Block (NFLBlock).

Figure 3. Description of two wind farms: (a) Location information of two wind farms; (b) Xiayi Mountain dataset.

Figure 4. The three metrics of NFLM and the comparative models for the Xiayi Mountain dataset. (a) MSE; (b) MAE; (c) R².

Figure 5. Forecasting curves of different models for the Xiayi Mountain wind farm.

Figure 6. MSE for all models in the Fujia Tian wind farm; (a) 24-step; (b) 48-step; (c) 72-step; (d) 96-step.

Figure 7. Forecasting curves of different models for the Fujia Tian wind farm.

Table 1. Comparison model hyperparameter settings.

Model	Hyperparameter	Value
LSTM	Learning rate	0.001
	Patience	5
	Number of hidden neurons	256
	LSTM layers	2
	Dropout rate	0.05
TCN	Learning rate	0.001
	Patience	5
	Dropout rate	0.03
	Number of hidden units per-layer	256
	Levels	5
GRU	Learning rate	0.002
	Patience	5
	Number of hidden neurons	256
	GRU layers	5
	Dropout rate	0.3
Transformer	Learning rate	0.0008
	Patience	5
	Dropout rate	0.05
	Encoder layers	2
	Decoder layers	2
Informer	Learning rate	0.0005
	Patience	5
	Dropout rate	0.3
	Encoder layers	2
	Decoder layers	1
DLinear	Learning rate	0.001
	Patience	5
	Dropout rate	0.3

Table 2. NFLM model hyperparameter settings.

Model	Hyperparameter	Value
NFLM	Learning rate	0.001
	Patience	5
	Dropout rate	0.1
	Down-sampling subsequence length	24

Table 3. Multi-step forecasting results of different models for the Xiayi Mountain wind farm.

Model	24 Steps (6 h)			48 Steps (12 h)			72 Steps (18 h)			96 Steps (24 h)
Model	MSE	MAE	R²	MSE	MAE	R²	MSE	MAE	R²	MSE	MAE	R²
LSTM	0.4308	0.4511	0.7036	0.8162	0.6685	0.4389	0.9964	0.8048	0.3154	1.0884	0.8176	0.2518
GRU	0.4693	0.4958	0.6770	0.8689	0.7061	0.4029	1.0885	0.7855	0.2523	1.0943	0.8206	0.2475
TCN	0.4957	0.5442	0.6589	0.8460	0.7313	0.4186	0.9949	0.8090	0.3165	1.0682	0.8316	0.2656
Transformer	0.6625	0.5543	0.5443	1.1764	0.7534	0.1913	1.4589	0.8841	−0.0029	1.4832	0.9318	−0.0129
Informer	0.4897	0.4946	0.6631	0.9418	0.6953	0.3526	1.4308	0.8599	0.0167	1.4583	0.8899	−0.0026
DLinear	0.4262	0.4597	0.7068	0.7460	0.6418	0.4869	0.9301	0.7392	0.3613	1.0581	0.7859	0.2725
NLinear	0.4441	0.4528	0.6943	0.8035	0.6386	0.4475	1.0389	0.7371	0.2858	1.1835	0.7982	0.1862
MLP	0.4499	0.4961	0.6904	0.7388	0.6525	0.4923	0.9408	0.7545	0.3534	1.0292	0.7864	0.2925
Linear	0.4340	0.4681	0.6943	0.8035	0.6386	0.4475	1.0389	0.7371	0.2858	1.0691	0.7988	0.2651
LightTS	0.4082	0.4504	0.7192	0.6854	0.6101	0.5285	0.9224	0.7216	0.3661	1.0621	0.7848	0.2689
NFLM	0.3525	0.4131	0.7574	0.6220	0.5772	0.5721	0.8539	0.6991	0.4134	0.9801	0.7527	0.3262

Table 4. Mean, minimum, and maximum percentage improvement of NFLM compared to MSE and MAE of other models at the Xiayi Mountain wind farm.

Improvement	24 Steps (6 h)		48 Steps (12 h)		72 Steps (18 h)		96 Steps (24 h)
Improvement	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE
a-improvement	23.88%	13.96%	24.66%	13.96%	19.23%	10.35%	14.07%	8.44%
l-improvement	13.65%	8.28%	9.25%	5.39%	7.43%	3.12%	4.77%	4.09%
m-improvement	46.79%	25.47%	47.13%	23.39%	41.47%	20.92%	33.92%	19.22%

Table 5. Multi-step forecasting results of different models for the Fujia Tian wind farm.

Model	24 Steps (6 h)			48 Steps (12 h)			72 Steps (18 h)			96 Steps (24 h)
Model	MSE	MAE	R²	MSE	MAE	R²	MSE	MAE	R²	MSE	MAE	R²
LSTM	0.4234	0.4541	0.6685	0.6942	0.6323	0.4564	0.8675	0.7205	0.3207	1.0919	0.8192	0.1454
GRU	0.4543	0.4637	0.6445	0.7125	0.6276	0.4425	0.9190	0.7325	0.2807	1.0232	0.7847	0.1996
TCN	0.5055	0.5540	0.6045	0.7546	0.6974	0.4100	0.8555	0.7413	0.3301	1.0781	0.8438	0.1562
Transformer	0.4722	0.5255	0.6306	0.7041	0.6448	0.4492	1.1312	0.7940	0.1143	1.5292	0.9331	−0.1966
Informer	0.5254	0.5389	0.5886	1.0140	0.7389	0.2060	0.9940	0.7614	0.2217	1.4554	0.9111	−0.1386
DLinear	0.4231	0.4635	0.6981	0.6590	0.6056	0.4842	0.7976	0.6818	0.3756	0.9368	0.7561	0.2669
NLinear	0.4071	0.4357	0.6811	0.6991	0.5949	0.4525	0.8988	0.6857	0.2959	1.0487	0.7437	0.1790
MLP	0.3981	0.4660	0.6884	0.6032	0.5850	0.5280	0.8243	0.7041	0.3543	0.8563	0.7179	0.3298
Linear	0.3984	0.4438	0.6881	0.6393	0.5882	0.4997	0.7919	0.6670	0.3801	0.8780	0.7103	0.3124
LightTS	0.3797	0.4388	0.7027	0.6264	0.5788	0.5097	0.7803	0.6675	0.3891	0.8545	0.7048	0.3313
NFLM	0.3428	0.4198	0.7316	0.5667	0.5554	0.5565	0.7038	0.6381	0.4489	0.8700	0.7176	0.3190

Table 7. Analysis of model efficiency.

Metric	Step	LSTM	GRU	TCN	Transformer	Informer	DLinear	NLinear	MLP	Linear	LighTS	NFLM
FLOPs (M)	24	2512.89	5482.75	6925.04	7281.23	5012.81	0.903	0.443	18.874	0.442	36.366	36.366
	48	2623.48	5593.34	7270.59	8322.30	5536.61	1.788	0.885	16.515	0.885	36.882	36.882
	72	2734.08	5703.93	7666.54	9363.67	6060.41	2.673	1.327	14.156	1.327	37.398	37.398
	96	2844.67	5814.52	8112.89	10,404.91	6584.21	3.557	1.770	11.797	1.770	37.914	37.914
Parameter (M)	24	3.19	6.94	9.52	22	19.7	0.02	0.02	0.06	0.02	0.59	0.59
	48	3.33	7.09	9.72	22	19.7	0.04	0.04	0.08	0.03	0.60	0.60
	72	3.47	7.23	9.94	22	19.7	0.06	0.05	0.09	0.04	0.61	0.61
	96	3.61	7.37	10.1	22	19.7	0.07	0.07	0.10	0.06	0.62	0.62
Training time (s)	24	276	318	172	709	562	45	22	19	37	104	112
	48	294	375	153	774	539	32	22	28	42	91	105
	72	242	354	205	698	591	41	23	21	34	102	89
	96	302	369	186	732	685	37	21	26	30	117	129
Time complexity		O(L)	O(L)	O(L)	O(L²)	O(L log L)	O(L)	O(L)	O(L)	O(L)	O(L)	O(L)

Table 6. Mean (a-improvement), minimum (l-improvement), and maximum (m-improvement) percentage improvement in MSE and MAE of NFLM over the other models at the Fujia Tian wind farm.

Improvement	24 Steps (6 h)		48 Steps (12 h)		72 Steps (18 h)		96 Steps (24 h)
Improvement	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE
a-improvement	21.03%	11.32%	18.72%	11.23%	19.60%	10.55%	15.96%	8.60%
l-improvement	9.72%	3.65%	6.05%	4.04%	9.80%	4.40%	−1.81%	−1.82%
m-improvement	34.75%	24.22%	44.11%	24.83%	37.78%	19.62%	43.11%	23.10%
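
The improvement rows can be checked against the raw error values. The sketch below assumes that each improvement is the relative error reduction (baseline − NFLM) / baseline, and that the a-, l-, and m-rows are the mean, minimum, and maximum of that quantity over the ten comparison models. This formula is inferred from the tabulated numbers rather than quoted from the paper, and the names `improvement_pct` and `summarize` are illustrative only. The input values are the 24-step MSE column of Table 5.

```python
from statistics import mean

# 24-step MSE values copied from Table 5 (Fujia Tian wind farm).
baseline_mse = {
    "LSTM": 0.4234, "GRU": 0.4543, "TCN": 0.5055, "Transformer": 0.4722,
    "Informer": 0.5254, "DLinear": 0.4231, "NLinear": 0.4071,
    "MLP": 0.3981, "Linear": 0.3984, "LighTS": 0.3797,
}
nflm_mse = 0.3428


def improvement_pct(baseline: float, proposed: float) -> float:
    """Relative error reduction of `proposed` over `baseline`, in percent.

    Assumed definition: (baseline - proposed) / baseline * 100.
    A negative value means the baseline had the lower error.
    """
    return (baseline - proposed) / baseline * 100.0


def summarize(baselines: dict, proposed: float) -> dict:
    """Mean (a), minimum (l), and maximum (m) improvement over all baselines."""
    gains = [improvement_pct(value, proposed) for value in baselines.values()]
    return {"a": mean(gains), "l": min(gains), "m": max(gains)}


summary = summarize(baseline_mse, nflm_mse)
print({key: round(value, 2) for key, value in summary.items()})
# → {'a': 21.03, 'l': 9.72, 'm': 34.75}
```

Under this assumed definition the output agrees with the 24-step MSE entries of Table 6. The same function gives a negative value when a baseline beats NFLM, which is how the 96-step l-improvement entries (−1.81% and −1.82%, both against LighTS) arise.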

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
