1. Introduction
Against the backdrop of growing energy demand, the widespread use of traditional energy sources is exacerbating the energy crisis and intensifying the greenhouse effect [1]. Attention has therefore turned to renewable energy, especially wind energy. According to the 2022 Global Wind Energy Report released by the Global Wind Energy Council (GWEC), global installed wind capacity is expected to increase by 680 million kilowatts (680 GW) from 2023 to 2027 [2]. This indicates that wind energy, as an environmentally friendly and widely distributed renewable energy source, is being actively promoted. However, the inherent randomness and instability of wind resources make wind power generation highly uncertain and unpredictable, posing significant challenges to stable grid operation and electricity dispatching [3]. Accurate short-term forecasting helps grid operators manage the power resources of wind farms efficiently, improving wind energy utilization and reducing operational costs. Moreover, real-time forecasting at wind farms places high demands on model efficiency; accurate and efficient wind power forecasting is therefore crucial to meeting these challenges.
Current wind power forecasting models fall mainly into four categories: physical models, statistical models, machine learning models, and deep learning models. Physical models use hydrodynamics and thermodynamics to simulate the interaction between wind and the atmosphere; by incorporating factors such as terrain and landforms and solving the governing physical equations, they predict wind speed and direction at specific locations and heights. The output power of the wind farm is then calculated from the power curves of the wind turbines [4]. However, physical models typically require substantial computational resources and real-time data, making them unsuitable for short-term forecasting [5]. Statistical models, on the other hand, such as the Autoregressive (AR) [6], Autoregressive Moving Average (ARMA) [7], and Autoregressive Integrated Moving Average (ARIMA) [8] models, forecast by establishing mathematical relationships between historical meteorological data and wind power. Wang et al. proposed a forecast error compensation method based on the ARIMA model to improve forecasting accuracy [9]. However, because of their underlying assumptions, such as linearity, statistical models may struggle with complex non-stationary wind power sequences, which can lead to underfitting and lower forecasting accuracy [10].
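To make the statistical approach concrete, the following is a minimal sketch of an AR(p) model fitted by ordinary least squares, assuming only NumPy. The function names `fit_ar` and `forecast_ar` are illustrative and are not taken from [6]; ARMA and ARIMA additionally include moving-average terms and differencing, which this sketch omits. Because each forecast is a fixed linear combination of past values, the sketch also shows the linearity assumption discussed above.

```python
import numpy as np

def fit_ar(series: np.ndarray, p: int) -> np.ndarray:
    """Estimate [c, phi_1, ..., phi_p] for x_t = c + sum_k phi_k * x_{t-k} + e_t."""
    n = len(series)
    y = series[p:]  # targets x_t for t = p, ..., n-1
    # Column k-1 holds the lag-k values x_{t-k} aligned with each target.
    lags = np.column_stack([series[p - k:n - k] for k in range(1, p + 1)])
    X = np.column_stack([np.ones(n - p), lags])  # leading column of ones for c
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def forecast_ar(series: np.ndarray, coef: np.ndarray, steps: int) -> np.ndarray:
    """Recursive multi-step forecast: each prediction is fed back as a new lag."""
    p = len(coef) - 1
    history = list(series[-p:])
    preds = []
    for _ in range(steps):
        recent = np.array(history[-p:][::-1])  # x_{t-1}, x_{t-2}, ..., x_{t-p}
        nxt = coef[0] + coef[1:] @ recent
        preds.append(nxt)
        history.append(nxt)
    return np.array(preds)
```

Feeding predictions back as lags is one common way to obtain multi-step forecasts from a one-step model; errors can accumulate over long horizons.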
In recent years, many classic machine learning techniques have been applied to short-term wind power forecasting, including Support Vector Machines (SVM) [11,12], Random Forests (RF) [13], Extreme Gradient Boosting (XGBoost) [14], and Extreme Learning Machines (ELM) [15]. Compared with statistical methods, machine learning models can effectively capture the nonlinear relationships between wind power values at different time points [16]. In addition, deep learning has become a popular technology in wind power forecasting owing to its excellent feature mining ability and its capacity to build complex network architectures [17]. Commonly used deep learning models include the convolutional neural network (CNN) [18], the temporal convolutional network (TCN) [19], and the recurrent neural network (RNN) [20] together with its two variants, long short-term memory (LSTM) [21] and the gated recurrent unit (GRU) [22]. Hybrid models usually outperform single models because they integrate the complementary strengths of different models [23]. For example, Zhao et al. [24] employed variational mode decomposition to reduce the volatility of the original wind speed series; the decomposed subsequences, combined with the original wind power series, were then fed into a CNN-GRU hybrid model for short-term wind power forecasting.
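Machine learning and deep learning forecasters of this kind are commonly trained on supervised samples cut from the historical record with a sliding window. The sketch below, assuming NumPy, shows one way to build such samples; the names `make_windows`, `lookback`, and `horizon`, and the convention that the last column holds wind power, are illustrative assumptions rather than details of the cited works.

```python
import numpy as np

def make_windows(series: np.ndarray, lookback: int, horizon: int):
    """Cut a (T, C) multivariate series into supervised (input, target) pairs.

    X[i] holds `lookback` consecutive steps of all C variables; y[i] holds the
    following `horizon` steps of the last column (assumed here to be wind power).
    """
    T = series.shape[0]
    n = T - lookback - horizon + 1  # number of complete windows
    if n <= 0:
        raise ValueError("series too short for the requested lookback and horizon")
    X = np.stack([series[i:i + lookback] for i in range(n)])
    y = np.stack([series[i + lookback:i + lookback + horizon, -1] for i in range(n)])
    return X, y
```

A model then learns a mapping from each X[i] to y[i]; models that expect flat feature vectors, such as an SVM, would use X[i] flattened.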
Moreover, attention-based models have recently demonstrated strong performance in various time series forecasting tasks, including wind speed forecasting [25], solar irradiance forecasting [26], and traffic flow forecasting [27]. The attention mechanism addresses time series forecasting problems by focusing on the key features that most strongly affect the forecasting results [28]. The Transformer [29], with its encoder-decoder structure and self-attention mechanism, can capture complex temporal dependency patterns and correlations among different sequential variables, thereby achieving better forecasting results. Building on this, attention-based models have gradually been introduced into wind power forecasting with notable success. For instance, Sun et al. [30] proposed a short-term wind power forecasting method using the Transformer that considers spatio-temporal correlations. Gong et al. [31] combined the strengths of TCN and Informer, significantly enhancing the accuracy of wind power forecasting. Xiang et al. [32] utilized the multi-head self-attention mechanism of the Vision Transformer to exploit the complex nonlinear relationships among input data, thereby improving the accuracy of ultra-short-term wind power forecasting. Liu et al. [33] proposed an interpretable Transformer model integrating decoupled feature-temporal self-attention and variable-attention networks, and enhanced wind power prediction accuracy through multi-task learning.
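For readers less familiar with self-attention, the following minimal single-head sketch of scaled dot-product attention shows the core computation of the Transformer [29]. It assumes NumPy, and random projection matrices stand in for learned ones. The (T, T) score matrix, in which every time step is compared with every other, is the source of the quadratic cost in sequence length.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (T, d_model) sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)     # (T, T): every step scores every other step
    weights = softmax(scores, axis=-1)  # each row is a distribution over time steps
    return weights @ v, weights
```

Doubling T quadruples the number of entries in `scores`, which is why long input windows become expensive for standard attention.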
As mentioned above, Transformers and their variants have achieved high accuracy in wind power forecasting. However, because the correlation between distant data points in a sequence weakens, it is difficult for the Transformer to adequately capture the global information of the sequence [34]. Additionally, the time complexity of the Transformer's self-attention grows quadratically with the length of the input time series [35]. Based on these considerations, Zeng et al. [36] questioned the applicability of Transformer-based models to time series forecasting and proposed DLinear, a simple single-layer linear model that decomposes the input into a trend component, extracted by a moving average, and a remainder (seasonal) component, applies a separate single linear layer to each component, and sums the two outputs. Their experiments showed that DLinear outperforms complex Transformer-based models in both prediction accuracy and computational efficiency. Meanwhile, Zhang et al. [37] proposed LightTS, a lightweight multivariate forecasting architecture based on stacked multi-layer perceptrons (MLPs), further advancing research on lightweight forecasting. LightTS converts one-dimensional sequences into two-dimensional tensors through two sampling methods, continuous sampling and interval sampling, which allow it to focus separately on short-term local variations and long-term global variations; it then uses MLPs to learn the sequential features within these tensors, improving forecasting accuracy while significantly reducing computational complexity. In summary, in terms of both forecasting accuracy and efficiency, lightweight models have promising prospects in time series modeling.
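The DLinear decomposition described above can be sketched as follows. This is a simplified NumPy reading of [36], not its reference implementation: the kernel size of 25, the edge-replication padding, and the random weights standing in for trained ones are assumptions, and the same two linear maps are shared across all channels.

```python
import numpy as np

def moving_average(x: np.ndarray, kernel: int) -> np.ndarray:
    """Trend of an (L, C) window via a moving average over time.

    The window is padded by repeating its first and last rows so the trend
    keeps length L.
    """
    front = (kernel - 1) // 2
    back = kernel - 1 - front
    padded = np.concatenate(
        [np.repeat(x[:1], front, axis=0), x, np.repeat(x[-1:], back, axis=0)], axis=0
    )
    csum = np.concatenate([np.zeros((1, x.shape[1])), np.cumsum(padded, axis=0)], axis=0)
    return (csum[kernel:] - csum[:-kernel]) / kernel  # mean of each length-kernel window

def dlinear_forward(x, w_trend, b_trend, w_season, b_season, kernel=25):
    """x: (L, C) input window; w_*: (H, L); b_*: (H,). Returns an (H, C) forecast."""
    trend = moving_average(x, kernel)
    season = x - trend  # remainder after removing the trend
    # One linear layer along the time axis per component; outputs are summed.
    out_trend = w_trend @ trend + b_trend[:, None]
    out_season = w_season @ season + b_season[:, None]
    return out_trend + out_season
```

With identical weights for both components, the forecast reduces to a single linear map of the raw window; the decomposition only matters once the two maps are trained separately.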
Training complex network models and tuning their parameters typically require significant computational resources and time, especially for Transformer-based models, which often have millions of parameters. This makes them difficult to apply in resource-limited environments such as edge computing nodes or remote wind farms [38]. In contrast, lightweight models are more practical because of their lower computational requirements. To ensure the load safety of wind turbines, the control system must adjust the pitch angle and rotational speed of the blades in response to changes in wind speed. However, delays in the control system can lead to excessive loads on the turbine. Moreover, wind measurement devices such as lidar are sensitive to weather conditions and may not provide reliable wind measurements. Deploying lightweight prediction models that provide short-term wind speed forecasts for each turbine is therefore a technical solution that can help reduce turbine loads. Furthermore, lightweight models generally compute faster. In applications that require forecasting wind power for a specific region [39], they can quickly predict the output of wind power clusters consisting of multiple wind farms across a large spatial area, which is crucial for the grid scheduling of wind resources. Lightweight wind power forecasting models are therefore of significant importance for real-time prediction at wind farms. By simplifying the structure and reducing the number of parameters, they strike a good balance between performance and model size, shorten the wind farm's response time to fluctuations in wind power generation, increase its update frequency, and reduce operational costs. In wind power forecasting, Lai et al. [40] developed a lightweight spatiotemporal wind power forecasting network, which synchronously learns spatiotemporal representations through the spatial layer and asynchronously learns temporal representations through the asynchronous spatial layer. They also used an MLP to update wind power information along the temporal dimension, improving the efficiency of capturing wind power's spatiotemporal relationships. Moreover, because online learning emphasizes a model's ability to learn from new data in real time and continuously update itself, online learning models often require simpler structures. Zhong et al. [41] introduced external time-related data and designed a lightweight parallel network to process these external data, mitigating information transmission degradation and enhancing online forecasting performance. In this paper, given the great potential of simple linear models and the limitations of complex network models, we adopt LightTS, a lightweight deep learning architecture that applies MLPs along the temporal and channel dimensions and thus achieves a lightweight model structure. However, LightTS does not adapt well to the characteristics of wind power data because of the inherent non-stationarity of wind power sequences [42]. We therefore make further improvements based on LightTS.
In fact, wind power series are typical time series with nonlinear and dynamic characteristics, in which the value at each time step is affected by past values. This dependency arises because the wind speed at a given moment is affected by various physical factors and temporal dynamics, such as atmospheric inertia, turbulence, and changing weather patterns, which introduce correlations between consecutive wind speed measurements. Specifically, wind speed time series often exhibit significant autocorrelation, meaning that there is a statistical relationship between current and past wind speeds [43]. Since wind speed affects the power generated by wind turbines, this creates a temporal dependency between wind power values at different moments. Although shallow models have excellent nonlinear representation capabilities for time series modeling, they usually treat the relationship between the input measurements and the future values of the wind power series as static [44]. This approach may overlook dynamic relationships between samples that change over time, so that patterns the model learned from past data may no longer hold in the future, reducing forecasting accuracy. Inspired by RevIN [45], we propose a normalized feature learning block as a key component of the proposed model for processing sequence features. This block mitigates the negative impact of dynamic characteristics on forecasting by removing time-varying statistical properties from the sequence features during preprocessing. It then applies MLPs along the temporal and channel dimensions for feature learning.
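The normalize-then-mix idea can be illustrated with the following hypothetical NumPy sketch. It is not the NFLBlock defined in Section 2: the function names, the residual connections, the ReLU MLPs, and the placement of de-normalization at the end of the block are illustrative assumptions, and RevIN's learnable affine parameters are omitted. It only shows how per-instance statistics are removed before MLPs act along the temporal and channel axes, and then restored.

```python
import numpy as np

def instance_norm(x: np.ndarray, eps: float = 1e-5):
    """RevIN-style step: remove each channel's mean and std over the window. x: (T, C)."""
    mean = x.mean(axis=0, keepdims=True)
    std = np.sqrt(x.var(axis=0, keepdims=True) + eps)
    return (x - mean) / std, mean, std

def denormalize(z: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Reverse step: put the removed statistics back."""
    return z * std + mean

def mlp(z, w1, b1, w2, b2):
    """Two-layer ReLU MLP applied to the last axis of z."""
    return np.maximum(z @ w1 + b1, 0.0) @ w2 + b2

def normalize_and_mix(x, time_params, chan_params):
    """Hypothetical block: normalize, mix along time, mix along channels, restore."""
    z, mean, std = instance_norm(x)
    z = z + mlp(z.T, *time_params).T  # temporal MLP acts on each channel's length-T vector
    z = z + mlp(z, *chan_params)      # channel MLP acts on each time step's length-C vector
    return denormalize(z, mean, std)
```

Because the MLPs only ever see zero-mean, unit-variance inputs, a shift in the level or scale of the wind power series between training and forecasting does not change what they receive.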
Considering the computational complexity of the model and the impact of the non-stationary nature of wind power data, we explore the potential of simple and lightweight deep learning network structures in the field of short-term wind power forecasting. The main contributions of this paper are as follows:
(1) To reduce the impact of dynamic relationships in wind power series on forecasting performance, we propose a Normalized Feature Learning Block (NFLBlock). This block removes time-varying dynamic characteristics from the sequence features, enabling the stacked MLP structure to better exchange information along both the temporal and channel dimensions.
(2) To address the overfitting and high training time costs of complex forecasting models, which hinder quick and accurate responses in real-time wind power forecasting at wind farms, we develop NFLM, a lightweight normalized feature learning model for short-term multivariate wind power forecasting. The model is built on a stacked multi-layer perceptron architecture and, through continuous and interval sampling, can fully exploit the local and global dependencies in wind power sequences across different time scales.
(3) We tested NFLM at two wind farms in Guangxi, China. The results show that NFLM outperforms existing complex wind power forecasting models based on Transformers and recurrent structures in forecasting accuracy, while maintaining low computational and parameter requirements across all forecasting horizons.
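The continuous and interval sampling mentioned in contribution (2), which follow LightTS [37], can be sketched in NumPy as below. This is a simplified illustration for a single univariate window whose length T is divisible by the sub-sequence length C, and the function names are hypothetical. Both functions return a (T/C, C) matrix whose rows are sub-sequences: contiguous chunks for continuous sampling, and points spaced T/C steps apart for interval sampling.

```python
import numpy as np

def continuous_sampling(x: np.ndarray, C: int) -> np.ndarray:
    """Row j = x[j*C:(j+1)*C]: contiguous chunks that expose short-term local variation."""
    T = len(x)
    if T % C:
        raise ValueError("window length must be divisible by C")
    return x.reshape(T // C, C)

def interval_sampling(x: np.ndarray, C: int) -> np.ndarray:
    """Row j = x[j::T//C]: points spread over the whole window, exposing long-term variation."""
    T = len(x)
    if T % C:
        raise ValueError("window length must be divisible by C")
    return x.reshape(C, T // C).T
```

For a window `np.arange(12)` and C = 3, continuous sampling yields rows such as [0, 1, 2], while interval sampling yields rows such as [0, 4, 8]; an MLP can then learn features within and across these rows.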
The rest of this paper is organized as follows. Section 2 briefly introduces the framework of NFLM and its key components. Section 3 presents the experiments and discusses the model's forecasting effectiveness and efficiency. Section 4 summarizes our work and outlines directions for future research.