1. Introduction
Over the past two decades, wind energy has become a key renewable energy source for addressing energy crises and climate change [1]. As fossil fuels are depleted and policy increasingly emphasizes energy conservation and emission reduction, wind power generation is receiving growing attention worldwide. However, a major drawback of wind power is that wind is intermittent and highly variable [2], which complicates the safe and stable operation and the economic dispatch of the power system. Based on time scale, wind power prediction can be classified into long-term [3], medium-term [4], short-term [5,6], and ultrashort-term [7,8] prediction. Short-term and ultrashort-term predictions are closely tied to the operation of integrated generating systems in wind farms. High-precision prediction can reduce wind curtailment and helps power plants adjust their short-term generation plans and reserve capacities promptly. Therefore, designing an accurate, fast, and reliable ultrashort-term wind power prediction model is of great significance.
Time series prediction models can generally be grouped into three main types: physical models [9], statistical models [10], and artificial intelligence methods [11]. Physical forecasting models, based on numerical weather prediction (NWP) and aerodynamics, are effective and interpretable in long-term time series forecasting. However, short-term NWP forecasts often lack precision and may not meet forecasting requirements. Statistical methods rely on the statistical analysis of historical time series data to predict wind power generation. These models mainly include autoregressive moving average (ARMA) [12], autoregressive integrated moving average (ARIMA) [13], and Markov models [14], among others. Statistical models were once widely used in wind power time series forecasting, but with the rapid increase in data volume, they have gradually lost ground to machine learning methods. In recent years, deep learning has developed rapidly, and many deep learning models have been applied to wind speed and wind power prediction. Among them, long short-term memory (LSTM) networks [15,16] and gated recurrent units (GRUs) [17,18] have been widely used.
Traditional recurrent neural networks and their variants have achieved good results in power prediction, but they are difficult to parallelize and struggle to model long-range temporal dependencies. With the advent of attention mechanisms, the Transformer model was proposed and subsequently widely adopted [19,20,21]. Thanks to their powerful ability to model temporal correlations, attention mechanisms have been widely used in time series prediction tasks, leading to a series of Transformer-based models for wind power prediction. Literature [22] introduced the FEDformer model, a frequency-enhanced Transformer, for wind power generation and wind speed prediction, marking the entry of Transformer-based models into the field of wind power prediction. Literature [23] proposed a hybrid deep learning model that uses historical data for calibration: a Transformer network accesses historical data in the encoder and uses it as correction information to enhance its predictions. Literature [24] proposed an ultrashort-term wind speed prediction model based on complementary ensemble empirical mode decomposition (CEEMD) and the BiLSTM-Transformer, using CEEMD to reduce the instability of wind speed sequences. Literature [25] introduced the Wind Transformer model, which addresses the segmentation problem and predicts the power output of each turbine individually.
However, the Transformer and its variants rely on simple linear layers for dimensionality reduction to produce the final prediction, which limits the model’s ability to capture nonlinear relationships. Kolmogorov–Arnold networks (KANs), introduced in 2024 [26], address this limitation and can improve prediction accuracy. KAN is a deep learning architecture based on the Kolmogorov–Arnold representation theorem (KART). This theorem states that a multivariate continuous function on a bounded domain can be represented as a finite superposition, through sums and compositions, of continuous univariate functions. A KAN initializes several B-spline functions and learns their parameters. These B-splines are summed to form learnable nonlinear univariate functions, and the univariate functions are then aggregated and mapped to the prediction dimension, which enhances the model’s capacity to capture nonlinear relationships. Because KANs are adept at capturing nonlinear relationships, they have subsequently been integrated with time-series models. For instance, literature [27] demonstrates that KANs, owing to their accurate modeling of complex nonlinear systems, are effective in energy management systems. Another study [28] incorporated KAN into time-series forecasting to improve interpretability and nonlinear relationship modeling, verifying the effectiveness of KAN in time-series prediction.
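For reference, the representation underlying KART and the spline parametrization described above can be written as follows. This is a sketch in assumed notation (the symbols n, \Phi_q, \phi_{q,p}, c_k, and B_k are not defined in this paper): n is the input dimension, \Phi_q and \phi_{q,p} are continuous univariate functions, and B_k are B-spline basis functions with learnable coefficients c_k.

```latex
% Kolmogorov-Arnold representation of a continuous f : [0,1]^n -> R
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

% Spline parametrization of one learnable univariate function in a KAN
\phi(x) = \sum_{k} c_k \, B_k(x)
```

The first line is an exact representation with fixed width 2n + 1; a KAN relaxes it to layers of arbitrary width and depth whose univariate functions are learned from data.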
Future wind power depends not only on past wind power values but also on other variables such as historical wind speed, wind direction, and temperature. Therefore, multiple variable features are often introduced alongside wind power features. However, noise and outliers in the multivariable data can reduce the accuracy of the prediction model. To address this, the exponentially weighted moving average (EWMA) method is introduced. By controlling the decay factor, EWMA smooths the data while retaining the original trends, thereby reducing the impact of noise on prediction accuracy. For example, in [29], EWMA analysis is used in the monitoring stage to detect abnormal changes in features. In [30], EWMA assigns greater weight to more recent data, smoothing time-series data and improving performance through this optimized data-mining approach.
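The smoothing behavior of EWMA can be illustrated with a minimal sketch of the standard recursive form, s[0] = x[0] and s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]. The function name `ewma`, the smoothing factor `alpha` (so that 1 - alpha plays the role of the decay), and the sample readings are assumptions made for illustration; they are not taken from this paper.

```python
from typing import List, Sequence


def ewma(values: Sequence[float], alpha: float) -> List[float]:
    """Exponentially weighted moving average in recursive form.

    s[0] = x[0]
    s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]

    A larger alpha tracks the raw data more closely; a smaller alpha
    smooths more strongly. alpha is a hypothetical parameter name.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed: List[float] = []
    for x in values:
        if not smoothed:
            # The first sample initializes the average.
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical wind-speed readings (m/s) with a single spike at index 3.
wind_speed = [5.0, 5.2, 5.1, 9.0, 5.3, 5.2]
smoothed = ewma(wind_speed, alpha=0.3)
```

With alpha = 0.3, the spike at index 3 is damped because the new sample contributes only 30% of the smoothed value, while the slowly varying level of the series is preserved.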
In summary, when a Transformer model with point-wise attention is used for wind power prediction, its accuracy suffers from noise in the wind power data and from the inability of the simple linear output layer to capture nonlinear relationships. To address this, this paper proposes a KAN-Transformer model incorporating EWMA data processing. The contributions of this paper are as follows: First, EWMA is employed to denoise the introduced multivariable features, improving the predictability of wind power data. Second, a KAN layer replaces the Transformer model’s final linear layer, enhancing the model’s ability to capture nonlinear relationships and thus improving prediction accuracy.
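The idea behind the second contribution, an output layer built from learnable univariate functions instead of a single linear map, can be sketched as follows. This is an illustrative simplification rather than the model used in this paper: it assumes degree-1 B-splines (hat functions) on a uniform grid, uses hand-set coefficients in place of training, and omits the higher-order splines and base activation that published KAN implementations typically use. The names `hat_basis`, `edge_function`, and `kan_layer` are hypothetical.

```python
from typing import List, Sequence


def hat_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Degree-1 B-spline (hat function) values at x, one per knot.

    Assumes uniformly spaced knots; x outside the grid is clamped.
    """
    h = knots[1] - knots[0]
    x = min(max(x, knots[0]), knots[-1])
    return [max(0.0, 1.0 - abs(x - k) / h) for k in knots]


def edge_function(x: float, coeffs: Sequence[float],
                  knots: Sequence[float]) -> float:
    """Univariate function phi(x) = sum_k c_k * B_k(x)."""
    return sum(c * b for c, b in zip(coeffs, hat_basis(x, knots)))


def kan_layer(inputs: Sequence[float],
              coeffs: Sequence[Sequence[Sequence[float]]],
              knots: Sequence[float]) -> List[float]:
    """KAN-style layer: output_j = sum_i phi_{j,i}(inputs[i]).

    coeffs[j][i] holds the spline coefficients of the edge that
    connects input i to output j.
    """
    return [
        sum(edge_function(x, c_ji, knots) for x, c_ji in zip(inputs, row))
        for row in coeffs
    ]


# Two input features mapped to one output. The coefficients are set by
# hand so that the edges interpolate x**2 and |x| at the knots; in a
# real model they would be learned by gradient descent.
knots = [-1.0, -0.5, 0.0, 0.5, 1.0]
coeffs = [[[k * k for k in knots], [abs(k) for k in knots]]]
output = kan_layer([0.5, -1.0], coeffs, knots)  # 0.5**2 + |-1.0| = 1.25
```

A linear layer would compute a weighted sum of the two inputs, whereas here each input first passes through its own nonlinear function before the sum, which is what allows the layer to represent nonlinear input-output relationships.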
The remainder of this paper is organized as follows. Section 2 describes the proposed method: EWMA is used to reduce the impact of noise on the model’s prediction accuracy, and KAN is combined with the Transformer model, where features are extracted through the encoder–decoder structure and then passed into the KAN layer to obtain the prediction results, enhancing the model’s ability to capture nonlinear relationships. Section 3 presents the simulation experiments and analysis. Section 4 summarizes the conclusions.